I have been using the bpi-r2 as the base platform for prototyping our company's 60GHz Wifi module for a while now, and since the very beginning I've been struggling with a throughput drop issue. I would really appreciate it if you could share your thoughts on this with me.
Our 60GHz Wifi module's throughput reaches 1.3Gbps at its best; ~1Gbps is what I normally get in my day-to-day work. The module has a 1-lane mini-PCIe interface with a transfer rate of up to 2Gbps (i.e., the PCIe 1.1 standard). You can think of it as one of the off-the-shelf mini-PCIe Wifi modules, just working in the 60GHz frequency range rather than 2.4/5GHz. It is connected to the bpi-r2 board through the mini-PCIe slot, supports 4-addr frames, and becomes part of the br-lan bridge after boot-up.
The typical use case is connecting two bpi-r2 boards over the 60GHz radio, one acting as an AP and the other as a station. Each bpi-r2 board is also connected to a PC through one of its LAN ports. My setup is like this:
PC #1, running iperf server <------ Ethernet cable -------> bpi-r2 board #1 <----------------- 60GHz Wifi ------------------> bpi-r2 board #2 <---------------- Ethernet cable ----------> PC #2, running iperf client
In this setup, the Wifi throughput between the bpi-r2 boards is ~1Gbps (I use iperf for the throughput tests). But the maximum throughput I can achieve between PC #1 and PC #2 is around 700Mbps, and that's what puzzles me. Replacing the wireless link with an Ethernet cable results in 1Gbps end to end as well. So I am left to believe that the throughput drop is due to the data transfer rate between the mini-PCIe slot and either the on-chip Ethernet frame engine or the on-board mt7530 switch, or maybe both. Is there anything I can try to confirm/pinpoint or even resolve this issue? Have any of you experienced similar behavior, or am I simply doing something wrong…
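For reference, this is roughly how I run the test; just a sketch, assuming a single TCP stream and the 192.168.137.x addressing from my setup:

# on the PC acting as iperf server
iperf -s

# on the PC acting as iperf client, pointed at the server's address
iperf -c 192.168.137.1 -i 1 -t 30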
What is the system load during the throughput test? As you have 4 CPU cores, you should make sure the IRQ handling load is nicely shared between them (i.e., set up IRQ affinity if needed).
Right, very good point. I thought about that at some point, and I know that all the interrupts are handled by cpu0 (I remember I kept checking the counts in /proc/interrupts).
I think because I was getting the 1Gbps throughput when using the Ethernet cable, I kind of assumed that one CPU should still be fine. But in reality the data path could be very different when I replace the Wifi link with an Ethernet cable. I was not running any other applications (I don't have any!). I didn't check the system load, but I am sure the CPU usage was always close to 0.
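Next time I run the test I'll actually watch both; something along these lines (assuming mpstat from the sysstat package is available on the board, and that the grep pattern matches however the drivers name their interrupts):

# per-CPU interrupt counts for the ethernet and wifi/PCIe interrupts
grep -iE 'eth|pcie' /proc/interrupts

# per-core utilisation (including softirq time) during the iperf run
mpstat -P ALL 1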
One question though: for IRQ balancing, the driver needs to support that, right?
There are probably still some interrupts firing to trigger DMA transfers and to notify the host about their completion. Best is to check /proc/interrupts and then redistribute the interrupts of the wifi NIC and the embedded ethernet to different CPU cores using echo $mask > /proc/irq/${irq}/smp_affinity, where $mask is a bitmask of the CPUs to be used for that interrupt.
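Something along these lines; a rough sketch only, the IRQ numbers below are made up, so look up the real ones in /proc/interrupts first:

# find the IRQ numbers (the names depend on the drivers)
grep -iE 'eth|pcie' /proc/interrupts

# example: pin the wifi NIC's IRQ (say 120) to CPU2 (mask bit 2 = 0x4)
echo 4 > /proc/irq/120/smp_affinity

# example: pin the ethernet IRQ (say 99) to CPU1 (mask bit 1 = 0x2)
echo 2 > /proc/irq/99/smp_affinity

# verify the masks took effect
cat /proc/irq/120/smp_affinity /proc/irq/99/smp_affinity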
This is a snapshot of the interrupts before and after running iperf (i.e., iperf -c 192.168.137.1 -i 1 -t 30) for 30 seconds on PC #1. At least I could distribute the interrupts of the wifi and ethernet modules across different cores…
The improvement is undeniable. Thanks a lot for the suggestion; I think it worked to perfection.
Having said that, I am still convinced that I should be able to do better, i.e. push it up to 1Gbps. I've manually assigned the IRQs to the CPUs. Do you think using an IRQ balancer would make any difference?
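In the meantime I'm planning to just pin the IRQs at boot; roughly something like this in a startup script (rc.local or equivalent; the IRQ numbers are placeholders and need to be looked up in /proc/interrupts, since they may differ on your board):

# e.g. near the end of /etc/rc.local, before 'exit 0'
echo 4 > /proc/irq/120/smp_affinity   # wifi NIC -> CPU2
echo 2 > /proc/irq/99/smp_affinity    # ethernet frame engine -> CPU1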