Right…port5 (eth1=wan) is rgmii only; I don't know why it only works if set to trgmii. Port6=eth0=lan supports trgmii. IMHO trgmii was not set up the right way, so it causes issues when adding the second GMAC with another mode.
The code I cited means that the mode you set for port5 in DTS is essentially ignored and it is hardcoded to always be rgmii. Probably there's some other place where the value of the port mode from DTS is used for something, and that's why we see differences (i.e. malformed packets) when the DTS port mode is set to rgmii.
You have 2 ends of the GMAC link, switch and SoC…IMHO this setting affects only the switch side. But the code you've found also sets the switch side…maybe it is overridden with the DTS value? I mean that this code is only an initialization value.
That's it. We both do not know enough about the code as a whole to be sure. Then again, when I saw the patch above I thought it might be a good explanation for the fact that we've got GMAC2 set to a mode it doesn't support but it still works. Might be a coincidence though.
I looked again over the code snippet above.
Have you seen the “!” in the condition? So only if port 5 is not a CPU port (IMHO it can be used as an external port) are the lines below used (trgmii mode makes no sense for an external port).
So it should be right and does not affect our trgmii setting on the R2, where p5 is a CPU port if activated by the 2nd-gmac patches; otherwise it is not used and then it does not matter which mode it is in.
Maybe you want to try the new phylink. Older code was reported not to have this issue.
Anyone here up for trying the current phylink code?
Can't open the link. GitHub said 404.
It's because 5.3-rc has been released and we have updated phylink.
Hi
I noticed the same problem on the 5.4.70 kernel. My configuration is:
- provider speed: 1 Gbps DL / 300 Mbps UL. Performance tested from a laptop connected to the provider's box
- LAN ports bridged to br0, connected to the internal gigabit network
- R2 runs from an SSD disk. Only MASQUERADE is used, all VLANs removed from the configuration
- a simple, old Linksys connected to the provider's box and the internal network to verify the connection
My tests show:
- download from the internet to the R2 using the WAN IP - 30 MBps (wget over https)
- download from the internet to the R2 using the LAN IP - 30 MBps
- download from the internal net to the R2 using the LAN IP - 30 MBps (wget over http)
- traffic between bridged ports - 112 MBps
- speedtest from the internal network through the R2 - depends on the restart, up to 100 Mbps DL / 200 Mbps UL, sometimes 14 Mbps DL / 14 Mbps UL.
- TCP download from the same https server as in tests 1 and 2 to the internal server - 200 kbps !!!
- traffic through the parallel Linksys - always 200 Mbps DL / 230 Mbps UL, no matter which internal server or protocol
I've got stuck. I plan to switch to OpenWrt but am fighting with compilation errors. Any hints?
Thanks,
Do you use the mainline kernel or the one from my repo (5.4-main)? Mainline is still missing the pause option on the GMAC.
Yes, it’s from your repo.
[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
[remote "origin"]
    url = https://github.com/frank-w/BPI-R2-4.14.git
    fetch = +refs/heads/*:refs/remotes/origin/*
[branch "5.4-main"]
    remote = origin
    merge = refs/heads/5.4-main
BR,
Have you tried without bridging? Have you taken a look at CPU load/interrupts while generating traffic?
watch "cat /proc/interrupts | grep eth"
Hi
The results are:
> cat /proc/interrupts ...              TorontoR2: Tue Oct 13 22:45:42 2020
>
> 240:     997594          0          0          0   MT_SYSIRQ 199 Level   1b100000.ethernet
> 241:     519358          0          0          0   MT_SYSIRQ 198 Level   1b100000.ethernet
even though I've already added:
> post-up echo 2 > /sys/class/net/wan/queues/rx-0/rps_cpus
> post-up echo 3 > /sys/class/net/br0/queues/rx-0/rps_cpus
in /etc/network/interfaces
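(For reference, the values written to rps_cpus are hexadecimal CPU bitmasks; a minimal reading of the two lines above, under that assumption:)

echo 2 > /sys/class/net/wan/queues/rx-0/rps_cpus   # mask 0x2 = only CPU1 handles wan RX processing
echo 3 > /sys/class/net/br0/queues/rx-0/rps_cpus   # mask 0x3 = CPU0 and CPU1 share br0 RX processing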
I also updated the kernel yesterday from your repo, so the test was done on the updated kernel.
BR, Marcin
For the interrupts it depends on how fast they are counting…but I see you only use 1 CPU for both interrupts. You can try whether it works better if you use 2.
In my case it does not change anything…I get lags on streaming, but I don't know if they are caused by the R2 (maybe the TV, ISP or remote side).
Hello everyone!
If your ISP gives you a persistent IP, try using SNAT --to-source; in general it's faster as it has less overhead.
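A minimal sketch of the difference, assuming the upstream interface is called wan and 203.0.113.5 is the static address from the ISP (both are only illustrative):

iptables -t nat -A POSTROUTING -o wan -j MASQUERADE
# MASQUERADE looks up the outgoing address for every new flow;
# SNAT with a fixed address skips that lookup:
iptables -t nat -A POSTROUTING -o wan -j SNAT --to-source 203.0.113.5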
In my case I have the following speeds:
General SSL/TLS (ssh or https) - max ~100 Mb/s (Megabit/s) - the bottleneck is the SSL/TLS encryption - 100% of 1 CPU core used (when the R2 is client or server).
Traffic between bridged ports - 860-960 Mbit/s, not sure what it depends on.
It was tested on numerous kernels: 5.4, 5.5, 5.8, 5.9 and 4.14 (a long time ago :)).
You can also try to use perf top to determine the bottleneck.
Also you can try to spread the interrupts over 2 CPUs.
I don't remember the exact command, but you can google smp_affinity; the command should look like:
echo <cpu_mask> > /proc/irq/<irq_num>/smp_affinity
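For example, with the two ethernet IRQs from the /proc/interrupts output earlier in the thread (the IRQ numbers 240/241 are taken from that output and may differ on other boards), a sketch would be:

echo 1 > /proc/irq/240/smp_affinity   # hex mask 0x1 = pin this ethernet IRQ to CPU0
echo 2 > /proc/irq/241/smp_affinity   # hex mask 0x2 = pin the other ethernet IRQ to CPU1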
I can find some details if needed.
But I think you should start with perf top during the speed test. (Installing perf may depend on your distro.)
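A rough sketch of that, assuming a Debian-based image (the package name is an assumption; other distros ship perf as linux-tools-* or similar):

apt install linux-perf   # provides the perf tool on Debian-based systems
perf top                 # run while the speed test is going and watch which functions eat the CPU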
Thank you for your tests
See my linked thread here
Generally, if a client opens an https webpage through the R2, the R2 only forwards the packets…it does not need to encrypt/decrypt, so only the routing/NAT part has to be done on the CPU.
@marcin please do not use MBps, as it is not clear whether you mean Megabit or Megabyte per second.
I have strange iperf results from a local server to the R2 (without NAT):
> Connecting to host server, port 5201
> [  4] local r2 port 49210 connected to server port 5201
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [  4]   0.00-1.00   sec  11.8 MBytes  99.1 Mbits/sec  273   4.24 KBytes
> [  4]   1.00-2.00   sec  12.9 MBytes   108 Mbits/sec  263   2.83 KBytes
> [  4]   2.00-3.00   sec  8.45 MBytes  70.9 Mbits/sec  130   1.41 KBytes
> [  4]   3.00-4.00   sec  8.51 MBytes  71.4 Mbits/sec  183   5.66 KBytes
> [  4]   4.00-5.00   sec   954 KBytes  7.82 Mbits/sec   35   4.24 KBytes
> [  4]   5.00-6.00   sec  6.46 MBytes  54.2 Mbits/sec  154   12.7 KBytes
> [  4]   6.00-7.00   sec  2.42 MBytes  20.3 Mbits/sec   57   1.41 KBytes
> [  4]   7.00-8.00   sec  9.51 MBytes  79.7 Mbits/sec  148   7.07 KBytes
> [  4]   8.00-9.00   sec  16.5 MBytes   139 Mbits/sec  347   9.90 KBytes
> [  4]   9.00-10.00  sec  12.8 MBytes   107 Mbits/sec  300   2.83 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-10.00  sec  90.3 MBytes  75.8 Mbits/sec  1890  sender
> [  4]   0.00-10.00  sec  90.0 MBytes  75.5 Mbits/sec        receiver
>
> iperf Done.
>
> Connecting to host r2, port 5201
> [  5] local server port 53752 connected to r2 port 5201
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec  86.9 MBytes   729 Mbits/sec   68    851 KBytes
> [  5]   1.00-2.00   sec  86.2 MBytes   724 Mbits/sec    0   1.02 MBytes
> [  5]   2.00-3.00   sec  85.0 MBytes   713 Mbits/sec    0   1.17 MBytes
> [  5]   3.00-4.00   sec  86.2 MBytes   724 Mbits/sec    0   1.29 MBytes
> [  5]   4.00-5.00   sec  87.5 MBytes   734 Mbits/sec    0   1.39 MBytes
> [  5]   5.00-6.00   sec  85.0 MBytes   713 Mbits/sec    5   1.03 MBytes
> [  5]   6.00-7.00   sec  66.2 MBytes   556 Mbits/sec  740    153 KBytes
> [  5]   7.00-8.00   sec  86.2 MBytes   724 Mbits/sec  109   1.20 MBytes
> [  5]   8.00-9.00   sec  85.0 MBytes   713 Mbits/sec    0   1.32 MBytes
> [  5]   9.00-10.00  sec  85.0 MBytes   713 Mbits/sec    0   1.41 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec   839 MBytes   704 Mbits/sec  922   sender
> [  5]   0.00-10.00  sec   837 MBytes   702 Mbits/sec        receiver
>
> iperf Done.
No problem between two local net hosts.
BTW: I use MB for megabytes, Mb for megabits
BR Marcin
Retransmits may come from missing flow control on the switch/other side. Show ethtool -S eth0/br0/lanX after getting these retransmits.
The pause patch for the GMAC is not yet in mainline 5.4, as it will only be merged in 5.10…it is only in my 5.4…that can also cause the problem (but AFAIR you use my repo)…just for others not using my repo.
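A sketch of what to look at there (lanX stands for lan0…lan3 on the R2; the exact counter names depend on the driver, and ethtool -a needs driver support for the pause ops):

ethtool -a eth0                              # shows whether pause/flow control is currently active
ethtool -S eth0 | grep -iE 'pause|drop|err'  # look for growing drop/error/pause counters
ethtool -S lan0 | grep -iE 'pause|drop|err'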
Hi Frank,
I am still seeing some issues I think are related to what is going on in this thread. I updated to your 5.4-main branch (at 5.4.70) from about two days ago. I get the “eth0: transmit timed out” error in dmesg and the LANx interfaces stop working until the unit is restarted. There is a stack trace as well in dmesg, but it does not seem to contain anything helpful.
It's not easy to reproduce. Running things like iperf does not seem to trigger it; you have to create some CPU load on the R2 as well. Running Samba on the R2, sharing a volume on a large SATA disk, I can trigger the failure by mounting the volume on a client and doing something like “cat * > /dev/null” (on the client) in a directory with a few 100 GB of files. Sometimes it will run for six or seven hours, sometimes it will die in 15 minutes.
I realize this is perhaps not a helpful bug report, but I only just got a second unit to play with and cloned my images to it. Now that I have a separate guinea-pig setup I can try things out and iterate faster.