If your ISP gives you a persistent (static) IP - try to use SNAT --to-source instead of MASQUERADE; in general it is faster as it has less overhead.
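A minimal sketch of what that rule could look like (the wan interface name and the address are placeholders for your own setup):

# static public IP: SNAT rewrites the source address without the per-packet
# output-address lookup that MASQUERADE has to do
iptables -t nat -A POSTROUTING -o wan -j SNAT --to-source 203.0.113.10
# instead of:
# iptables -t nat -A POSTROUTING -o wan -j MASQUERADE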
In my case I have the following speeds:
General ssl/tls (ssh or https) - max ~100 Mbit/s - the bottleneck is the ssl/tls encryption, 100% of one CPU core is used (when the R2 is client or server).
Traffic between bridged ports - 860-960 Mbit/s, not sure what it depends on.
It was tested on numerous kernels: 5.4, 5.5, 5.8, 5.9 and 4.14 (long time ago :))
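For reference, these kinds of numbers could be reproduced with something like the following (iperf3 and scp here are just my example tools, not necessarily what was used for the figures above):

# plain TCP throughput routed/bridged through the R2 (no encryption)
iperf3 -s                      # on one end
iperf3 -c <server-ip> -t 30    # on the other end

# encrypted path (ssh) - this is where a single CPU core hits 100%
scp bigfile user@<r2-ip>:/dev/null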
You can also try to use perf top to determine the bottleneck.
You can also try to spread the interrupts over 2 CPUs.
I don't remember the exact command, but you can google smp_affinity.
The command should look like:
echo <cpu_mask> > /proc/irq/<irq_num>/smp_affinity
I can find some details if needed.
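Roughly, assuming the ethernet IRQs show up with "eth" in their name in /proc/interrupts (the IRQ number 231 below is only an example):

# find the IRQ number(s) the ethernet driver is using
grep eth /proc/interrupts

# pin that IRQ to CPU1 (hex bitmask: bit 1 = 0x2); repeat with a different mask
# for a second IRQ so the load is spread over two cores
echo 2 > /proc/irq/231/smp_affinity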
But I think you should start with perf top during the speed test (installing perf may depend on your distro).
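For example (the package name differs per distro, e.g. linux-perf on Debian):

# run this while the speed test is going; the functions at the top show
# where the CPU time goes (e.g. crypto vs. network stack)
perf top -g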
Generally, if a client opens an https webpage through the R2, the R2 only forwards the packets…it does not need to encrypt/decrypt, so only the routing/NAT part has to be done on the CPU.
@marcin please do not use MBps as it is not clear if you mean Megabit or Megabyte per second
Retransmits may come from missing flow control on the switch/other side. Show ethtool -S eth0/br0/lanX after getting these retransmits.
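Something along these lines; the exact counter names depend on the driver, so what to look for is a guess:

# per-interface driver/hardware counters - look for pause, drop and error counters
ethtool -S eth0
ethtool -S lan0

# check whether pause-frame flow control is actually negotiated
ethtool -a eth0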
The pause patch for gmac is not yet in mainline 5.4, as it will be merged in 5.10…it is only in my 5.4…that can also cause the problem (but afair you use my repo)…just for others not using my repo.
I am still seeing some issues I think are related to what is going on in this thread. I updated to your 5.4-main branch (at 5.4.70) from about two days ago. I get the “eth0: transmit timed out” error in dmesg and the LANx interfaces stop working until the unit is restarted. There is a stack trace in dmesg as well, but it does not appear to contain anything helpful.
It's not easy to reproduce. Running things like iperf does not seem to trigger it; you have to create some CPU load on the R2 as well. Running samba on the R2, sharing a volume on a large SATA disk, I can trigger the failure by mounting the volume on a client and doing something like “cat * > /dev/null” (on the client) in a directory with a few 100 GB of files. Sometimes it will run for six or seven hours, sometimes it will die in 15 min.
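For anyone wanting to reproduce it, a rough sketch of that setup (share name, mount point and credentials are placeholders):

# on the client: mount the R2's samba share and stream lots of data from it
mount -t cifs //<r2-ip>/share /mnt/r2 -o user=<user>
cd /mnt/r2/<dir-with-large-files>
cat * > /dev/null    # leave it running and watch dmesg on the R2 for "transmit timed out"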
I realize this is perhaps not a helpful bug report, but I only just got a second unit to play with and cloned my images to it. Now that I have a separate test setup I can try things out and iterate faster.
Ok, that is what I thought. I apologize, I have been away from this project for a bit. I tried to get caught up, but the way this forum is organized it's a bit challenging, especially with the 64-bit board topics all in the same place here.
So if that is the case, I am curious whether anyone else is still seeing the timeout issue. I don't know if others here are not hitting it because they are not trying to use the R2 for any NAS functions. I definitely have not hit the problem just routing/NATing through it. I had 5.4.2 up for about 60 days with samba and nfs disabled.
Were there any other changes related to the timeout that went in with phylink? I read some stuff about a gmac pause patch. If there is a timeout or wait, maybe it needs to be more aggressive/pessimistic?
The missing pause was added in my 5.4-main a long time ago, but with 5.6/5.7 I stumbled over it again, so I posted it to mainline; it is not in 5.4 yet (about to be merged in 5.10).
The patches look interesting, but mtk_eth_soc.c looks to have changed a lot between 4.14 and 5.4.70. It is going to take some effort here to see if this interrupt vs poll ordering issue is even still present.