BPI-R2: Mini-PCIe to LAN data transfer rate

Hello,

I have been using the bpi-r2 as the base platform for prototyping our company’s 60 GHz WiFi module for a while now, and since the very beginning I’ve been struggling with a throughput drop issue. I would really appreciate it if you could share your thoughts on this with me.

Our 60 GHz WiFi module’s throughput reaches 1.3 Gbps at its best; ~1 Gbps is what I normally get in my day-to-day work. The module has a 1-lane mini-PCIe interface with a transfer rate of up to 2 Gbps (i.e., the PCIe 1.1 standard). You can think of it as an off-the-shelf mini-PCIe WiFi module working in the 60 GHz frequency range rather than 2.4/5 GHz. It is connected to the bpi-r2 board through the mini-PCIe slot, supports 4-address frames, and becomes part of the br-lan bridge after boot-up.
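For what it’s worth, the negotiated PCIe link and the bridge membership can be sanity-checked with standard tools; a rough sketch, assuming lspci and brctl/iproute2 are available on the board:

# negotiated PCIe link speed/width of the module (compare LnkCap vs LnkSta, e.g. "2.5GT/s, Width x1")
lspci -vv | grep -E 'LnkCap|LnkSta'

# confirm the WiFi netdev really sits in the br-lan bridge
brctl show br-lan
# or with iproute2
ip link show master br-lan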

The typical use case is connecting two bpi-r2 boards using the 60 GHz radio, one acting as an AP and the other as a station. Each bpi-r2 board is also connected to a PC through one of its LAN ports. My setup is like this:

PC #1, running iperf server <------ Ethernet cable -------> bpi-r2 board #1 <----------------- 60GHz Wifi ------------------> bpi-r2 board #2 <---------------- Ethernet cable ----------> PC #2, running iperf client

In this setup, the WiFi throughput between the bpi-r2 boards is ~1 Gbps (I use iperf for the throughput test). But the maximum throughput I can achieve between PC #1 and PC #2 is around 700 Mbps, and that’s what puzzles me. Replacing the wireless link with an Ethernet cable results in 1 Gbps end-to-end throughput as well. So I am left to believe that the throughput drop is due to the data transfer rate between the mini-PCIe slot and either the on-chip Ethernet frame engine or the on-board MT7530 switch, or maybe both. Is there anything I can try to confirm/pinpoint or even resolve this issue? Have any of you experienced similar behavior, or am I simply doing something wrong?
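For reference, the board-to-board WiFi number above comes from a plain iperf run over the wireless link, roughly like this (same 192.168.137.x addressing as used later in the thread; exact options may differ):

# on the receiving side
iperf -s
# on the sending side
iperf -c 192.168.137.1 -i 1 -t 30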

Thanks a lot

What is the system load during the throughput test? Since you have 4 CPU cores, you should make sure the IRQ handling load is nicely shared between them (i.e., set up IRQ affinity if needed).

Right, very good point. I thought about that at some point, and I know that all the interrupts are handled by cpu0 (I remember I kept checking the numbers in /proc/interrupts). I think because I was getting the 1 Gbps throughput when using the Ethernet cable, I kind of assumed that one CPU should still be fine. But in reality, the data path could be very different when I replace the WiFi link with an Ethernet cable. I was not running any other applications (I don’t have any!). I didn’t check the system load, but I am sure the CPU usage is always close to 0. One question though: for IRQ balancing, the driver should support that, right?
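In case it is useful, a quick way to see the per-core picture during a run (assuming sysstat is installed; plain top with the ‘1’ key works as well):

# per-CPU utilization, including the %irq and %soft columns, refreshed every second
mpstat -P ALL 1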

The mini-pcie and mt7623n Ethernet engine both use DMA. I wonder if there is anything I could do to take advantage of this situation…

There are probably still some interrupts firing to trigger DMA transfers and to notify the host about their completion. The best approach is to check /proc/interrupts and then redistribute the interrupts of the WiFi NIC and the embedded Ethernet to different CPU cores using echo $mask > /proc/irq/${irq}/smp_affinity, where $mask is a bitfield of the CPUs to be used for that interrupt.
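To spell out the bitfield on a 4-core SoC like the mt7623n (the usual hex encoding of /proc/irq/*/smp_affinity):

# mask is a hex CPU bitmask: 1 = CPU0, 2 = CPU1, 4 = CPU2, 8 = CPU3
echo 2 > /proc/irq/${irq}/smp_affinity    # pin this IRQ to CPU1
cat /proc/irq/${irq}/smp_affinity         # read back to verify
cat /proc/irq/${irq}/smp_affinity_list    # same information as a CPU list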

proc_interrupts_snapshots (attached): this is a snapshot of the interrupts before and after running iperf (i.e., iperf -c 192.168.137.1 -i 1 -t 30) for 30 seconds on PC #1. At least I could distribute the interrupts of the WiFi and Ethernet modules…
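If it helps, the same before/after comparison can also be watched live on the board while iperf is running; watch -d highlights the counters that are changing:

# on the bpi-r2, during the iperf run between the PCs
watch -d -n1 cat /proc/interrupts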

Yes, I’d suggest at least moving each of the interrupts involved there to a different core.

echo 2 > /proc/irq/214/smp_affinity    # IRQ 214 -> CPU1 (mask 0x2)
echo 4 > /proc/irq/215/smp_affinity    # IRQ 215 -> CPU2 (mask 0x4)
echo 8 > /proc/irq/223/smp_affinity    # IRQ 223 -> CPU3 (mask 0x8)
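One caveat: these settings do not survive a reboot, and the IRQ numbers themselves can change between boots or kernel versions, so re-check /proc/interrupts before pinning. A minimal way to make them persistent, assuming an /etc/rc.local that is executed at boot:

#!/bin/sh
# /etc/rc.local -- re-apply IRQ affinity at boot (adjust IRQ numbers to what /proc/interrupts shows)
echo 2 > /proc/irq/214/smp_affinity
echo 4 > /proc/irq/215/smp_affinity
echo 8 > /proc/irq/223/smp_affinity
exit 0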

Alright, this is what I was getting before IRQ distribution:

------------------------------------------------------------
Client connecting to 192.168.137.1, TCP port 5001
TCP window size:  408 KByte (default)
------------------------------------------------------------
[  3] local 192.168.137.124 port 36790 connected with 192.168.137.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  61.8 MBytes   518 Mbits/sec
[  3]  1.0- 2.0 sec  67.8 MBytes   568 Mbits/sec
[  3]  2.0- 3.0 sec  67.2 MBytes   564 Mbits/sec
[  3]  3.0- 4.0 sec  66.0 MBytes   554 Mbits/sec
[  3]  4.0- 5.0 sec  63.8 MBytes   535 Mbits/sec
[  3]  5.0- 6.0 sec  65.4 MBytes   548 Mbits/sec
[  3]  6.0- 7.0 sec  64.6 MBytes   542 Mbits/sec
[  3]  7.0- 8.0 sec  67.6 MBytes   567 Mbits/sec
[  3]  8.0- 9.0 sec  66.4 MBytes   557 Mbits/sec
[  3]  9.0-10.0 sec  68.0 MBytes   570 Mbits/sec
[  3] 10.0-11.0 sec  69.6 MBytes   584 Mbits/sec
[  3] 11.0-12.0 sec  67.9 MBytes   569 Mbits/sec
[  3] 12.0-13.0 sec  62.8 MBytes   526 Mbits/sec
[  3] 13.0-14.0 sec  66.1 MBytes   555 Mbits/sec
[  3] 14.0-15.0 sec  62.2 MBytes   522 Mbits/sec
[  3] 15.0-16.0 sec  65.6 MBytes   551 Mbits/sec
[  3] 16.0-17.0 sec  69.1 MBytes   580 Mbits/sec
[  3] 17.0-18.0 sec  67.4 MBytes   565 Mbits/sec
[  3] 18.0-19.0 sec  67.9 MBytes   569 Mbits/sec
[  3] 19.0-20.0 sec  67.2 MBytes   564 Mbits/sec
[  3] 20.0-21.0 sec  68.6 MBytes   576 Mbits/sec
[  3] 21.0-22.0 sec  66.0 MBytes   554 Mbits/sec
[  3] 22.0-23.0 sec  66.2 MBytes   556 Mbits/sec
[  3] 23.0-24.0 sec  70.2 MBytes   589 Mbits/sec
[  3] 24.0-25.0 sec  67.0 MBytes   562 Mbits/sec
[  3] 25.0-26.0 sec  68.1 MBytes   571 Mbits/sec
[  3] 26.0-27.0 sec  67.4 MBytes   565 Mbits/sec
[  3] 27.0-28.0 sec  65.8 MBytes   552 Mbits/sec
[  3] 28.0-29.0 sec  64.4 MBytes   540 Mbits/sec
[  3] 29.0-30.0 sec  64.9 MBytes   544 Mbits/sec
[  3]  0.0-30.0 sec  1.95 GBytes   557 Mbits/sec

And this is what I am getting after IRQ balancing:

------------------------------------------------------------
Client connecting to 192.168.137.1, TCP port 5001
TCP window size:  230 KByte (default)
------------------------------------------------------------
[  3] local 192.168.137.124 port 36796 connected with 192.168.137.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 1.0 sec  88.0 MBytes   738 Mbits/sec
[  3]  1.0- 2.0 sec  86.0 MBytes   721 Mbits/sec
[  3]  2.0- 3.0 sec  94.8 MBytes   795 Mbits/sec
[  3]  3.0- 4.0 sec  85.0 MBytes   713 Mbits/sec
[  3]  4.0- 5.0 sec  78.0 MBytes   654 Mbits/sec
[  3]  5.0- 6.0 sec  87.1 MBytes   731 Mbits/sec
[  3]  6.0- 7.0 sec  78.6 MBytes   660 Mbits/sec
[  3]  7.0- 8.0 sec  93.9 MBytes   787 Mbits/sec
[  3]  8.0- 9.0 sec  94.5 MBytes   793 Mbits/sec
[  3]  9.0-10.0 sec  88.0 MBytes   738 Mbits/sec
[  3] 10.0-11.0 sec  89.4 MBytes   750 Mbits/sec
[  3] 11.0-12.0 sec  86.5 MBytes   726 Mbits/sec
[  3] 12.0-13.0 sec  87.5 MBytes   734 Mbits/sec
[  3] 13.0-14.0 sec  86.5 MBytes   726 Mbits/sec
[  3] 14.0-15.0 sec  86.8 MBytes   728 Mbits/sec
[  3] 15.0-16.0 sec  86.8 MBytes   728 Mbits/sec
[  3] 16.0-17.0 sec  88.5 MBytes   742 Mbits/sec
[  3] 17.0-18.0 sec  90.8 MBytes   761 Mbits/sec
[  3] 18.0-19.0 sec  82.2 MBytes   690 Mbits/sec
[  3] 19.0-20.0 sec  87.4 MBytes   733 Mbits/sec
[  3] 20.0-21.0 sec  87.6 MBytes   735 Mbits/sec
[  3] 21.0-22.0 sec  83.1 MBytes   697 Mbits/sec
[  3] 22.0-23.0 sec  85.9 MBytes   720 Mbits/sec
[  3] 23.0-24.0 sec  81.0 MBytes   679 Mbits/sec
[  3] 24.0-25.0 sec  83.0 MBytes   696 Mbits/sec
[  3] 25.0-26.0 sec  84.6 MBytes   710 Mbits/sec
[  3] 26.0-27.0 sec  83.5 MBytes   700 Mbits/sec
[  3] 27.0-28.0 sec  76.8 MBytes   644 Mbits/sec
[  3] 28.0-29.0 sec  92.5 MBytes   776 Mbits/sec
[  3] 29.0-30.0 sec  88.0 MBytes   738 Mbits/sec
[  3]  0.0-30.0 sec  2.53 GBytes   724 Mbits/sec

The improvement is undeniable. Thanks a lot for the suggestion. It worked to perfection.

Having said that, I am still convinced that I should be able to do better, meaning pushing it up to 1 Gbps. I’ve manually assigned the IRQs to the CPUs. Do you think using an IRQ balancer would make any difference?

/usr/sbin/irqbalance achieves very similar numbers. I guess that’s the best I can get out of the IRQ balancing.
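For completeness, a sketch of what trying irqbalance looks like on a Debian-based image (package and service names are the usual ones and may differ on other distributions):

# install and enable as a service
apt-get install irqbalance
systemctl enable --now irqbalance
# or run a single balancing pass by hand
irqbalance --oneshot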