Right…port5 (eth1=wan) is rgmii only; I don't know why it only works if set to trgmii. Port6=eth0=lan supports trgmii. IMHO trgmii was not set up the right way, so it causes issues when adding the second GMAC with another mode.
The code I cited means that the mode you set for port5 in DTS is essentially ignored and it is hardcoded to always be rgmii. Probably there's some other place where the value of the port mode from DTS is used for something, and that's why we see differences (i.e. malformed packets) when the DTS port mode is set to rgmii.
You have 2 ends of the GMAC link, switch and SoC…IMHO this setting affects only the switch side. But the code you've found also sets the switch side…maybe it is overridden with the DTS value? I mean that this code is only an initialization value.
That's it. We both do not know enough about the code as a whole to be sure. Then again, when I saw the patch above I thought it might be a good explanation for the fact that we've got GMAC2 set to a mode it doesn't support but it still works. Might be a coincidence though.
I looked again over the code snippet above.
Have you seen the “!” in the condition? So only if port 5 is not a CPU port (IMHO it can be used as an external port) are the lines below used (trgmii mode makes no sense for an external port).
So it should be right and does not affect our trgmii setting on the R2, where p5 is a CPU port if activated by the 2nd-gmac patches; otherwise it is not used and then it does not matter which mode it is in.
Maybe you want to try the new phylink. Older code was reported not to have this issue.
Anyone here up for trying the current phylink code?
Can't open the link. GitHub said 404.
It's because 5.3-rc has been released and we have updated phylink.
Hi
I noticed the same problem on the 5.4.70 kernel. My configuration is:
- provider speed: 1 Gbps DL / 300 Mbps UL. Performance tested from a laptop connected to the provider's box
- LAN ports bridged to br0, connected to the internal gigabit network
- R2 runs from an SSD disk. Only MASQUERADE is used, all VLANs removed from the configuration
- a simple, old Linksys connected to the provider's box and the internal network to verify the connection
My tests show:
- download from the internet to the R2 using the WAN IP - 30 MBps (wget over https)
- download from the internet to the R2 using the LAN IP - 30 MBps
- download from the internal net to the R2 using the LAN IP - 30 MBps (wget over http)
- traffic between bridged ports - 112 MBps
- speedtest from the internal network through the R2 - depends on the restart, up to 100 Mbps DL / 200 Mbps UL, sometimes 14 Mbps DL / 14 Mbps UL.
- TCP download from the same https server as in tests 1 and 2 to the internal server - 200 kbps !!!
- traffic through the parallel Linksys - always 200 Mbps DL / 230 Mbps UL, no matter which internal server or protocol
I've got stuck. I plan to switch to OpenWrt but am fighting with compilation errors. Any hints?
Thanks,
Do you use the mainline kernel or the one from my repo (5.4-main)? Mainline is still missing the pause option on the GMAC.
Yes, it’s from your repo.
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
[remote "origin"]
    url = https://github.com/frank-w/BPI-R2-4.14.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "5.4-main"]
    remote = origin
    merge = refs/heads/5.4-main
BR,
Have you tried without bridging? Have you taken a look at CPU load/interrupts while generating traffic?
watch "cat /proc/interrupts | grep eth"
Hi
The results are:
> cat /proc/interrupts ...              TorontoR2: Tue Oct 13 22:45:42 2020
>
> 240:     997594          0          0          0   MT_SYSIRQ 199 Level   1b100000.ethernet
> 241:     519358          0          0          0   MT_SYSIRQ 198 Level   1b100000.ethernet
even though I've already added:
> post-up echo 2 > /sys/class/net/wan/queues/rx-0/rps_cpus
> post-up echo 3 > /sys/class/net/br0/queues/rx-0/rps_cpus
in /etc/network/interfaces
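(For reference, the values written to rps_cpus are hexadecimal CPU bitmasks; a minimal reading of the two lines above, under that assumption:)

echo 2 > /sys/class/net/wan/queues/rx-0/rps_cpus   # mask 0x2 = only CPU1 handles wan RX processing
echo 3 > /sys/class/net/br0/queues/rx-0/rps_cpus   # mask 0x3 = CPU0 and CPU1 share br0 RX processing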
I also updated the kernel yesterday from your repo, so the test was done on the updated kernel.
BR, Marcin
For the interrupts it depends on how fast they are counting…but I see you only use 1 CPU for both interrupts. You can try whether it works better if you use 2.
In my case it does not change anything…I get lags on streaming, but I don't know if they are caused by the R2 (maybe the TV, ISP or remote side).
Hello everyone!
If your ISP gives you a persistent IP, try using SNAT --to-source; in general it's faster as it has less overhead.
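A minimal sketch of the difference, assuming the upstream interface is called wan and 203.0.113.5 is the static address from the ISP (both are only illustrative):

iptables -t nat -A POSTROUTING -o wan -j MASQUERADE
# MASQUERADE looks up the outgoing address for every new flow;
# SNAT with a fixed address skips that lookup:
iptables -t nat -A POSTROUTING -o wan -j SNAT --to-source 203.0.113.5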
In my case I have the following speeds:
General SSL/TLS (ssh or https) - max ~100 Mb/s (Megabit/s) - the bottleneck is the SSL/TLS encryption - 100% of 1 CPU core used (when the R2 is client or server).
Traffic between bridged ports - 860-960 Mbit/s, not sure what it depends on.
It was tested on numerous kernels: 5.4, 5.5, 5.8, 5.9 and 4.14 (a long time ago :)).
You can also try to use perf top to determine the bottleneck.
Also you can try to spread the interrupts over 2 CPUs.
I don't remember the exact command, but you can google smp_affinity; the command should look like:
echo <cpu_mask> > /proc/irq/<irq_num>/smp_affinity
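For example, with the two ethernet IRQs from the /proc/interrupts output earlier in the thread (the IRQ numbers 240/241 are taken from that output and may differ on other boards), a sketch would be:

echo 1 > /proc/irq/240/smp_affinity   # hex mask 0x1 = pin this ethernet IRQ to CPU0
echo 2 > /proc/irq/241/smp_affinity   # hex mask 0x2 = pin the other ethernet IRQ to CPU1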
I can find some details if needed.
But I think you should start with perf top during the speed test. (Installing perf may depend on your distro.)
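A rough sketch of that, assuming a Debian-based image (the package name is an assumption; other distros ship perf as linux-tools-* or similar):

apt install linux-perf   # provides the perf tool on Debian-based systems
perf top                 # run while the speed test is going and watch which functions eat the CPU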
Thank you for your tests
See my linked thread here
Generally, if a client opens an https webpage through the R2, the R2 only forwards the packets…it does not need to encrypt/decrypt, so only the routing/NAT part has to be done on the CPU.
@marcin please do not use MBps, as it is not clear whether you mean Megabit or Megabyte per second.
I have strange iperf results from a local server to the R2 (without NAT):
> Connecting to host server, port 5201
> [  4] local r2 port 49210 connected to server port 5201
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [  4]   0.00-1.00   sec  11.8 MBytes  99.1 Mbits/sec  273   4.24 KBytes
> [  4]   1.00-2.00   sec  12.9 MBytes   108 Mbits/sec  263   2.83 KBytes
> [  4]   2.00-3.00   sec  8.45 MBytes  70.9 Mbits/sec  130   1.41 KBytes
> [  4]   3.00-4.00   sec  8.51 MBytes  71.4 Mbits/sec  183   5.66 KBytes
> [  4]   4.00-5.00   sec   954 KBytes  7.82 Mbits/sec   35   4.24 KBytes
> [  4]   5.00-6.00   sec  6.46 MBytes  54.2 Mbits/sec  154   12.7 KBytes
> [  4]   6.00-7.00   sec  2.42 MBytes  20.3 Mbits/sec   57   1.41 KBytes
> [  4]   7.00-8.00   sec  9.51 MBytes  79.7 Mbits/sec  148   7.07 KBytes
> [  4]   8.00-9.00   sec  16.5 MBytes   139 Mbits/sec  347   9.90 KBytes
> [  4]   9.00-10.00  sec  12.8 MBytes   107 Mbits/sec  300   2.83 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-10.00  sec  90.3 MBytes  75.8 Mbits/sec  1890  sender
> [  4]   0.00-10.00  sec  90.0 MBytes  75.5 Mbits/sec        receiver
>
> iperf Done.
>
> Connecting to host r2, port 5201
> [  5] local server port 53752 connected to r2 port 5201
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec  86.9 MBytes   729 Mbits/sec   68    851 KBytes
> [  5]   1.00-2.00   sec  86.2 MBytes   724 Mbits/sec    0   1.02 MBytes
> [  5]   2.00-3.00   sec  85.0 MBytes   713 Mbits/sec    0   1.17 MBytes
> [  5]   3.00-4.00   sec  86.2 MBytes   724 Mbits/sec    0   1.29 MBytes
> [  5]   4.00-5.00   sec  87.5 MBytes   734 Mbits/sec    0   1.39 MBytes
> [  5]   5.00-6.00   sec  85.0 MBytes   713 Mbits/sec    5   1.03 MBytes
> [  5]   6.00-7.00   sec  66.2 MBytes   556 Mbits/sec  740    153 KBytes
> [  5]   7.00-8.00   sec  86.2 MBytes   724 Mbits/sec  109   1.20 MBytes
> [  5]   8.00-9.00   sec  85.0 MBytes   713 Mbits/sec    0   1.32 MBytes
> [  5]   9.00-10.00  sec  85.0 MBytes   713 Mbits/sec    0   1.41 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec   839 MBytes   704 Mbits/sec  922   sender
> [  5]   0.00-10.00  sec   837 MBytes   702 Mbits/sec        receiver
>
> iperf Done.
No problem between two local net hosts.
BTW: I use MB for megabytes, Mb for megabits
BR Marcin
Retransmits may come from missing flow control on the switch/other side. Show ethtool -S eth0/br0/lanX after getting these retransmits.
The pause patch for the GMAC is not yet in mainline 5.4, as it will only be merged in 5.10…it is only in my 5.4…that can also cause the problem (but AFAIR you use my repo)…just for others not using my repo.
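A sketch of what to look at there (lanX stands for lan0…lan3 on the R2; the exact counter names depend on the driver, and ethtool -a needs driver support for the pause ops):

ethtool -a eth0                              # shows whether pause/flow control is currently active
ethtool -S eth0 | grep -iE 'pause|drop|err'  # look for growing drop/error/pause counters
ethtool -S lan0 | grep -iE 'pause|drop|err'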
Hi Frank,
I am still seeing some issues I think are related to what is going on in this thread. I updated to your 5.4-main branch (at 5.4.70) from about two days ago. I get the “eth0: transmit timed out” error in dmesg and the LANx interfaces stop working until the unit is restarted. There is a stack trace as well in dmesg, but it does not seem to contain anything helpful.
It's not easy to reproduce. Running things like iperf does not seem to trigger it; you have to create some CPU load on the R2 as well. Running Samba on the R2, sharing a volume on a large SATA disk, I can trigger the failure by mounting the volume on a client and doing something like “cat * > /dev/null” (on the client) in a directory with a few 100 GB of files. Sometimes it will run for six or seven hours, sometimes it will die in 15 minutes.
I realize this is perhaps not a helpful bug report, but I only just got a second unit to play with and cloned my images to it. Now that I have a separate guinea-pig setup I can try things out and iterate faster.