Banana Pi BPI-R4 iperf3 speed test
Which kernel version and which interface (if sfp which one) is used? I guess it is a 5gbit/s sfp but you have nearly 6gbit/s.
Looks like a 10gbit/s SFP and the board is maxing out processing at 5.8gbit/s unidirectional and 4.8gbit/s bidirectional.
@sinovoip, are you sure you did not apply the wrong driver on MT7988? In my experience, when MT advertises 10Gbps, it should typically reach 9.4Gbps.
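For context, the 9.4Gbps figure is roughly the TCP payload ceiling of a 10gbit/s link at a 1500-byte MTU once framing overhead is subtracted. A minimal sketch of that arithmetic, assuming IPv4, TCP with the 12-byte timestamp option, and untagged Ethernet framing (the function name is only for illustration):

```python
def tcp_goodput_gbps(link_gbps, mtu=1500, ip_hdr=20, tcp_hdr=20, tcp_opts=12):
    """Estimate the TCP payload rate on an Ethernet link.

    Assumptions (not from the thread): IPv4, TCP timestamps enabled,
    no VLAN tag, and a link that is otherwise fully utilised.
    """
    payload = mtu - ip_hdr - tcp_hdr - tcp_opts
    # Per-frame wire cost: Ethernet header (14) + FCS (4)
    # + preamble/SFD (8) + inter-frame gap (12).
    on_wire = mtu + 14 + 4 + 8 + 12
    return link_gbps * payload / on_wire


print(f"{tcp_goodput_gbps(10):.2f} Gbit/s")  # 9.41 Gbit/s
```

Anything well below that on a 10G link points at the endpoints (CPU, driver, offloads) rather than at the wire.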
I was being dumb when I used the cdc-ncm driver for RTL8156B, and that capped the speed at 1.8Gbps. After applying the correct r8152 driver, it goes to the full 2.35 Gbps.
When I was testing IPQ8072A, I only got 5.72 Gbps in the upstream direction, but 9.4Gbps in the downstream direction, simultaneously. So Filogic 880’s 4.5 Gbps ish bidirectional speed is quite disappointing.
It depends also on the other side and on cpu…maybe the cpu is limiting here (packet creation or forwarding in the network-stack). I’m not sure if anything here is already optimized. Which kernel/network driver is used here? Which network topology? Which device on the other side?
Maybe it can be tested from one sfp+ to the other while simultaneously watching the cpu load. But i guess i need to do some tricks so that traffic is not only sent through the sw stack, as both interfaces are locally connected. Currently i have no 10g endpoint. But it will be some work to get linux itself working…
We did not see the irq occupancy and atop output, so this is hard to tell. @sinovoip maybe run iperf3 as a daemon or fork to background.
Here is some partial 10G (AQC113CS, atlantic driver) on BPI-R2 Pro
Hi.
I did an iperf3 test here. SFP+ 10G fibre between two BPI-R4 units. TCP gets 5 Gbps in each direction. UDP gets about 850 Mbps in each direction.
I have not done proper tests yet, e.g. I have not tried 2 TCP streams at the same time. A single TCP stream has speed limits related to latency and bandwidth.
This is because mainline is missing the RSS+LRO implementation.
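The latency / bandwidth limit mentioned above is usually expressed as window size divided by round-trip time. A rough sketch of that relation, with window and RTT values that are purely hypothetical and not measured on the BPI-R4:

```python
def max_stream_gbps(window_bytes, rtt_ms):
    """Upper bound for one TCP stream: one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e9


def required_window_bytes(rate_gbps, rtt_ms):
    """Window needed to keep a link of rate_gbps full (bandwidth-delay product)."""
    return rate_gbps * 1e9 * (rtt_ms / 1000) / 8


# Hypothetical numbers for illustration only.
print(max_stream_gbps(window_bytes=65535, rtt_ms=1.0))
print(required_window_bytes(rate_gbps=10, rtt_ms=0.2))
```

If the measured RTT and socket buffer sizes give a bound well above the observed 5 Gbps, the window is probably not the limiting factor and running several parallel streams would mainly help by spreading work across cpus.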
Where might I find those? I can test them.
They need to be ported to mainline…i did most parts but something is missing
RSS: Diff - 18f46a84d87308a4f56f9176ca166dc75c38bb20^! - openwrt/feeds/mtk-openwrt-feeds - Gitiles
LRO: Diff - ddc366751fad05dade79b09932a999c5d5ae890c^! - openwrt/feeds/mtk-openwrt-feeds - Gitiles
A bit mixed with lvts and some debug for others, but it should be the last state: Commits · frank-w/BPI-Router-Linux · GitHub
I have noticed that the sfp1 and sfp2 have an MTU of 1500, with a max available of 2030. I think that if we could increase the MTU up to 9000 we might get better performance. Do we know what MTU the 10G network chips can actually go up to? Is it really only 2030?
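One way to see why a larger MTU could help if the cpu is the bottleneck: much of the per-packet work is fixed, and the packet count drops as the MTU grows. A small sketch, assuming full-size untagged Ethernet frames at a given wire rate (the helper name is illustrative):

```python
def packets_per_second(wire_gbps, mtu):
    """Frames per second needed to fill wire_gbps with full-size frames.

    Assumes untagged Ethernet: 14 header + 4 FCS + 8 preamble + 12 gap
    bytes of overhead per frame on the wire.
    """
    wire_bytes = mtu + 38
    return wire_gbps * 1e9 / (wire_bytes * 8)


for mtu in (1500, 2030, 9000):
    print(mtu, round(packets_per_second(10, mtu)))
```

At 10gbit/s this is roughly 813 thousand frames per second at MTU 1500 against about 138 thousand at MTU 9000, so jumbo frames would cut the per-packet load by close to a factor of six if the hardware and driver allowed them.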
maybe mt7988 does support more as it is the first supporting 10G, but i have no info for it yet…afaik for using it mtk_eth_soc needs a set_mtu callback defined (currently i see none).
in datasheet i found no MTU setting, but
151130D0 xfi_rx_cfg_2 XFI RX configuration 2 00000800
13:0 rx_max_len_cfg The maximum RX packet size (unit is byte)
sounds related to mtu.
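To read that datasheet entry, the 14-bit field can be extracted from the register value. A minimal sketch, assuming that 00000800 in the listing is the reset value of xfi_rx_cfg_2 (the helper name is made up for illustration):

```python
def bit_field(value, high, low):
    """Return bits high..low (inclusive) of a register value."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


reset_value = 0x00000800               # assumed to be the reset value
rx_max_len = bit_field(reset_value, 13, 0)

print(rx_max_len)                      # 2048 bytes by default
print((1 << 14) - 1)                   # 16383, largest value the field can hold
print(rx_max_len - 14 - 4)             # 2030 after Ethernet header and FCS
```

The default of 2048 bytes minus an untagged Ethernet header and FCS gives 2030, which matches the max MTU reported above. Whether the driver limit is actually tied to this register is not confirmed here, but the field width suggests the XFI MAC itself could accept frames well above 9000 bytes.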
@frank-w I will take a look at implementing RSS on top of Linux Kernel 6.9-rc1.
If I can get it working, it is likely to improve performance noticeably more than LRO would, so I will pick the bit that is likely to have the greatest benefit first.
So I’ve noticed that when I use iperf3 to test speeds over any 2.5 gig interface, either directly on ethernet (rj45 sfp) or over USB (2.5G eth to usb-A), the speeds TO (download) the router are nearly the full 2500Mb, but the speeds FROM (upload) the router are roughly half that (~1200).
Doesn’t matter if the router is the server or the client. Upload is roughly half.
But I only see this on 2500M, I do not see this on the 1000M ports. On the 1000M it's pretty much full speed up and down.
For the upload side, it takes cpu computing power to generate packets. I suppose you’ll notice one of the cpus showing 0% idle if you use the command below.
mpstat -P ALL 2
You need to use taskset to assign the iperf app to another cpu or change smp_affinity to ensure that all cpus are utilized.
https://openwrt.org/docs/guide-user/advanced/load_balancing_-_tuning_smp_irq
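The smp_affinity files under /proc/irq/ take a hexadecimal cpu bitmask, which is easy to get wrong by hand. A small sketch for building the mask from a list of cpu numbers (the function name is just for illustration, and any IRQ number you use it with has to be looked up in /proc/interrupts first):

```python
def cpu_mask(cpus):
    """Build the hex bitmask used by /proc/irq/<N>/smp_affinity."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


print(cpu_mask([2]))           # 4  -> cpu2 only
print(cpu_mask([0, 1, 2, 3]))  # f  -> all four cores
```

The resulting string is what gets written to /proc/irq/&lt;N&gt;/smp_affinity. For the iperf3 process itself, `taskset -c 2 iperf3 ...` pins it to cpu2 without needing a mask.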
I’ll check that, but interesting you bring that up. I noticed xhci (usb3) is always only using one cpu core. Nothing I do with affinities seems to get it to use all or multiple cores. Nor can I get it to simply change from cpu0 to a different cpu core.
If I use a usb to ethernet adapter or a cellular modem on the m.2 slot, cpu0 shoots up to 50-70% utilization when transferring data at any substantial speed.
Might need to make a different thread for that though.
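To check which core is actually servicing the xhci interrupts, the per-cpu counters in /proc/interrupts can be compared before and after a transfer. A rough sketch of a parser; the sample text below is invented for illustration and is not output from a BPI-R4:

```python
def irq_counts(text, name):
    """Return {irq: {cpu: count}} for /proc/interrupts lines containing name."""
    lines = text.strip("\n").splitlines()
    cpus = lines[0].split()            # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        if name not in line:
            continue
        irq, _, rest = line.partition(":")
        fields = rest.split()
        counts = [int(f) for f in fields[:len(cpus)]]
        result[irq.strip()] = dict(zip(cpus, counts))
    return result


# Hypothetical sample, only to show the expected layout.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 18:     912345          0          0          0     GICv3 205 Level     xhci-hcd:usb1
 21:      40210      39877      41002      40555     GICv3 229 Level     15100000.ethernet
"""

print(irq_counts(SAMPLE, "xhci"))
```

On a live system the text would come from `open("/proc/interrupts").read()`. If only the CPU0 column grows for the xhci line even after changing smp_affinity, the interrupt is staying on cpu0.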