If your ISP gives you a persistent (static) IP - try to use SNAT --to-source instead of MASQUERADE; in general it is faster as it has less overhead.
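A minimal sketch of what that rule could look like (the wan interface name and the address are placeholders for your own setup):

# static public IP: SNAT rewrites the source address without the per-packet
# output-address lookup that MASQUERADE has to do
iptables -t nat -A POSTROUTING -o wan -j SNAT --to-source 203.0.113.10
# instead of:
# iptables -t nat -A POSTROUTING -o wan -j MASQUERADE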
In my case I have the following speeds:
General ssl/tls (ssh or https) - max ~100 Mbit/s - the bottleneck is the ssl/tls encryption, 100% of one CPU core is used (when the R2 is client or server).
Traffic between bridged ports - 860-960 Mbit/s, not sure what it depends on.
It was tested on numerous kernels: 5.4, 5.5, 5.8, 5.9 and 4.14 (long time ago :))
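For reference, these kinds of numbers could be reproduced with something like the following (iperf3 and scp here are just my example tools, not necessarily what was used for the figures above):

# plain TCP throughput routed/bridged through the R2 (no encryption)
iperf3 -s                      # on one end
iperf3 -c <server-ip> -t 30    # on the other end

# encrypted path (ssh) - this is where a single CPU core hits 100%
scp bigfile user@<r2-ip>:/dev/null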
You can also try to use perf top to determine the bottleneck.
You can also try to spread the interrupts over 2 CPUs.
I don't remember the exact command, but you can google smp_affinity.
The command should look like:
echo <cpu_mask> > /proc/irq/<irq_num>/smp_affinity
I can find some details if needed.
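Roughly, assuming the ethernet IRQs show up with "eth" in their name in /proc/interrupts (the IRQ number 231 below is only an example):

# find the IRQ number(s) the ethernet driver is using
grep eth /proc/interrupts

# pin that IRQ to CPU1 (hex bitmask: bit 1 = 0x2); repeat with a different mask
# for a second IRQ so the load is spread over two cores
echo 2 > /proc/irq/231/smp_affinity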
But I think you should start with perf top during the speed test (installing perf may depend on your distro).
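For example (the package name differs per distro, e.g. linux-perf on Debian):

# run this while the speed test is going; the functions at the top show
# where the CPU time goes (e.g. crypto vs. network stack)
perf top -g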
Generally, if a client opens an https webpage through the R2, the R2 only forwards the packets…it does not need to encrypt/decrypt, so only the routing/NAT part has to be done on the CPU.
@marcin please do not use MBps as it is not clear if you mean Megabit or Megabyte per second
Retransmits may come from missing flow control on the switch/other side. Show ethtool -S eth0/br0/lanX after getting these retransmits.
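Something along these lines; the exact counter names depend on the driver, so what to look for is a guess:

# per-interface driver/hardware counters - look for pause, drop and error counters
ethtool -S eth0
ethtool -S lan0

# check whether pause-frame flow control is actually negotiated
ethtool -a eth0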
The pause patch for gmac is not yet in mainline 5.4, as it will be merged in 5.10…it is only in my 5.4…that can also cause the problem (but afair you use my repo)…just for others not using my repo.
I am still seeing some issues I think are related to what is going on in this thread. I updated to your 5.4-main branch (at 5.4.70) from about two days ago. I get the “eth0: transmit timed out” error in dmesg and the LANx interfaces stop working until the unit is restarted. There is a stack trace in dmesg as well, but it does not appear to contain anything helpful.
It's not easy to reproduce. Running things like iperf does not seem to trigger it; you have to create some CPU load on the R2 as well. Running samba on the R2, sharing a volume on a large SATA disk, I can trigger the failure by mounting the volume on a client and doing something like “cat * > /dev/null” (on the client) in a directory with a few 100 GB of files. Sometimes it will run for six or seven hours, sometimes it will die in 15 min.
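For anyone wanting to reproduce it, a rough sketch of that setup (share name, mount point and credentials are placeholders):

# on the client: mount the R2's samba share and stream lots of data from it
mount -t cifs //<r2-ip>/share /mnt/r2 -o user=<user>
cd /mnt/r2/<dir-with-large-files>
cat * > /dev/null    # leave it running and watch dmesg on the R2 for "transmit timed out"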
I realize this is perhaps not a helpful bug report, but I only just got a second unit to play with and cloned my images to it. Now that I have a separate test setup I can try things out and iterate faster.
Ok, that is what I thought. I apologize, I have been away from this project for a bit. I tried to get caught up, but the way this forum is organized it's a bit challenging, especially with the 64-bit board topics all in the same place here.
So if that is the case, I am curious whether anyone else is still seeing the timeout issue. I don't know if others here are not hitting it because they are not trying to use the R2 for any NAS functions. I definitely have not hit the problem just routing/NATing through it. I had 5.4.2 up for about 60 days with samba and nfs disabled.
Were there any other changes related to the timeout that went in with phylink? I read some stuff about a gmac pause patch. If there is a timeout or wait, maybe it needs to be more aggressive/pessimistic?
The missing pause was added in my 5.4-main a long time ago, but with 5.6/5.7 I stumbled over it again, so I posted it to mainline; it is not in 5.4 yet (about to be merged in 5.10).
The patches look interesting, but mtk_eth_soc.c looks to have changed a lot between 4.14 and 5.4.70. It is going to take some effort here to see if this interrupt vs poll ordering issue is even still present.