Is there anyone who feels responsible for fixing the problem? As far as we now know, it’s a firmware bug.
Since I can easily recreate the issue within seconds, I could always test new drivers/firmware if you so desire.
Maybe installing the latest OpenWrt SNAPSHOT firmware will solve your problem:
It actually doesn’t. This problem is also present on OpenWrt.
Any news on this? None of the newest driver and firmware releases fixed it.
I have been testing networking extensively these last couple of weeks in the R3.
However, this was on the BPI-R3 running archlinuxarm, connected to a R3mini, a rk3588 and a R64 (all on archlinuxarm), in all sorts of setups.
Except for some retries in iperf3, I do not experience any networking problems…
Did you try the solution described here:
Thank you for the tip. I changed the ssh configuration on the client and both servers as suggested.
Unfortunately the error still strikes within 20 to 30 seconds after the start of the reproduction test. I do not copy large files. The error mostly appears when both machines copy files of about 100 kB in size. I run both rsyncs with the verbose flag, so I can see the output flying by rapidly.
Setting
ethtool -K eth0 tx off
“fixes” the issue as outlined above.
And how does iperf3 perform? Also with the -R option (reverse direction)?
Is there any difference between the tx on and tx off settings?
I ran the iperf3 test with my setup from above between desktop A (client) and desktop B (server).
configure router to: /sbin/ethtool -K eth0 tx off
desktopA# iperf3 -fM -c desktopB
Connecting to host desktopB, port 5201
[ 5] local 192.168.140.196 port 60240 connected to 192.168.150.150 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 91.4 MBytes 91.3 MBytes/sec 0 2.90 MBytes
[ 5] 1.00-2.00 sec 91.2 MBytes 91.2 MBytes/sec 0 2.90 MBytes
[ 5] 2.00-3.00 sec 90.0 MBytes 90.0 MBytes/sec 0 2.90 MBytes
[ 5] 3.00-4.00 sec 91.2 MBytes 91.2 MBytes/sec 0 2.90 MBytes
[ 5] 4.00-5.00 sec 90.0 MBytes 90.0 MBytes/sec 0 2.90 MBytes
[ 5] 5.00-6.00 sec 90.0 MBytes 90.0 MBytes/sec 0 2.90 MBytes
[ 5] 6.00-7.00 sec 91.2 MBytes 91.2 MBytes/sec 0 2.90 MBytes
[ 5] 7.00-8.00 sec 87.5 MBytes 87.5 MBytes/sec 0 2.90 MBytes
[ 5] 8.00-9.00 sec 88.8 MBytes 88.8 MBytes/sec 0 2.90 MBytes
[ 5] 9.00-10.00 sec 87.5 MBytes 87.5 MBytes/sec 0 2.90 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 899 MBytes 89.9 MBytes/sec 0 sender
[ 5] 0.00-10.02 sec 898 MBytes 89.7 MBytes/sec receiver
iperf Done.
desktopA# iperf3 -R -fM -c desktopB
Connecting to host desktopB, port 5201
Reverse mode, remote host desktopB is sending
[ 5] local 192.168.140.196 port 36590 connected to 192.168.150.150 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 63.7 MBytes 63.7 MBytes/sec
[ 5] 1.00-2.00 sec 64.4 MBytes 64.4 MBytes/sec
[ 5] 2.00-3.00 sec 63.6 MBytes 63.6 MBytes/sec
[ 5] 3.00-4.00 sec 64.2 MBytes 64.2 MBytes/sec
[ 5] 4.00-5.00 sec 63.8 MBytes 63.8 MBytes/sec
[ 5] 5.00-6.00 sec 64.3 MBytes 64.3 MBytes/sec
[ 5] 6.00-7.00 sec 64.6 MBytes 64.6 MBytes/sec
[ 5] 7.00-8.00 sec 64.3 MBytes 64.3 MBytes/sec
[ 5] 8.00-9.00 sec 63.4 MBytes 63.4 MBytes/sec
[ 5] 9.00-10.00 sec 64.3 MBytes 64.3 MBytes/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.02 sec 644 MBytes 64.3 MBytes/sec 0 sender
[ 5] 0.00-10.00 sec 641 MBytes 64.1 MBytes/sec receiver
iperf Done.
configure router to: /sbin/ethtool -K eth0 tx on
desktopA# iperf3 -fM -c desktopB
Connecting to host desktopB, port 5201
[ 5] local 192.168.140.196 port 45992 connected to 192.168.150.150 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 115 MBytes 115 MBytes/sec 0 1.07 MBytes
[ 5] 1.00-2.00 sec 111 MBytes 111 MBytes/sec 0 1.61 MBytes
[ 5] 2.00-3.00 sec 111 MBytes 111 MBytes/sec 0 2.47 MBytes
[ 5] 3.00-4.00 sec 111 MBytes 111 MBytes/sec 0 2.70 MBytes
[ 5] 4.00-5.00 sec 112 MBytes 113 MBytes/sec 0 2.76 MBytes
[ 5] 5.00-6.00 sec 110 MBytes 110 MBytes/sec 0 2.84 MBytes
[ 5] 6.00-7.00 sec 108 MBytes 108 MBytes/sec 0 2.84 MBytes
[ 5] 7.00-8.00 sec 109 MBytes 109 MBytes/sec 0 2.84 MBytes
[ 5] 8.00-9.00 sec 110 MBytes 110 MBytes/sec 0 2.84 MBytes
[ 5] 9.00-10.00 sec 111 MBytes 111 MBytes/sec 0 2.84 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.08 GBytes 111 MBytes/sec 0 sender
[ 5] 0.00-10.01 sec 1.08 GBytes 111 MBytes/sec receiver
iperf Done.
desktopA# iperf3 -R -fM -c desktopB
Connecting to host desktopB, port 5201
Reverse mode, remote host desktopB is sending
[ 5] local 192.168.140.196 port 58096 connected to 192.168.150.150 port 5201
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 73.9 MBytes 73.9 MBytes/sec
[ 5] 1.00-2.00 sec 74.5 MBytes 74.5 MBytes/sec
[ 5] 2.00-3.00 sec 74.0 MBytes 74.0 MBytes/sec
[ 5] 3.00-4.00 sec 73.6 MBytes 73.6 MBytes/sec
[ 5] 4.00-5.00 sec 75.5 MBytes 75.5 MBytes/sec
[ 5] 5.00-6.00 sec 74.3 MBytes 74.3 MBytes/sec
[ 5] 6.00-7.00 sec 73.2 MBytes 73.2 MBytes/sec
[ 5] 7.00-8.00 sec 74.1 MBytes 74.1 MBytes/sec
[ 5] 8.00-9.00 sec 73.9 MBytes 73.9 MBytes/sec
[ 5] 9.00-10.00 sec 73.7 MBytes 73.7 MBytes/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.01 sec 744 MBytes 74.2 MBytes/sec 0 sender
[ 5] 0.00-10.00 sec 741 MBytes 74.1 MBytes/sec receiver
iperf Done.
Your connection looks good, so maybe when the connection is slowed a bit with /sbin/ethtool -K eth0 tx off, the fault in rsync simply does not occur…
Are all versions of rsync exactly the same? Try googling for related rsync issues…
I tried rsync's internal bandwidth limit option to slow down the connection, but with no success.
Other hardware (a router with the same Debian version and firewall framework, but running on an AMD-based PC Engines APU2 B4) never had any issues with rsync, though.
Does it help to run rsync from a screen session?
ssh [email protected]
screen -R
rsync ....
You can leave the session with Ctrl-a + d, and resume it with
screen -R
Running the rsyncs from new sessions unfortunately has no effect on the issue.
Are you sure flow offloading is switched off?
You can check with
conntrack -L | grep OFFLOAD
Run it on the BPI board during the file transfer (not a second later).
A line with OFFLOAD is never shown by conntrack -L. But setting ‘tx off’ instantly slows the connection by about 20%, and the speed recovers to 100% after resetting to ‘tx on’. So it must have at least ‘some’ effect.
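The roughly 20% figure can be cross-checked against the iperf3 sender totals posted earlier in the thread (111 vs. 89.9 MBytes/sec forward, 74.2 vs. 64.3 MBytes/sec with -R). A minimal sketch, assuming awk is available; the helper name slowdown_pct is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: percentage drop from a baseline rate to a reduced
# rate, printed with one decimal place.
slowdown_pct() {
    awk -v base="$1" -v reduced="$2" \
        'BEGIN { printf "%.1f\n", (base - reduced) / base * 100 }'
}

slowdown_pct 111 89.9    # forward direction, tx on vs. tx off (MBytes/sec)
slowdown_pct 74.2 64.3   # reverse direction (-R), tx on vs. tx off
```

With those numbers the drop works out to about 19% in the forward direction and about 13% in the reverse direction.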
The bpi-r3 may behave differently with checksum offloading, when the interfaces are part of a bridge.
You could try bridging lan0 and lan2, or if you need ipforward, then put them in separate bridges.
Also, did you check the MTU on all interfaces involved in the rsync transfer?
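One possible way to list the MTU of every interface on a Linux machine is to read it from sysfs. This is only a sketch: the function name list_mtu is invented here, and it assumes the usual /sys/class/net layout. It would have to be run on each machine involved in the transfer.

```shell
#!/bin/sh
# Hypothetical helper: print "<interface> <mtu>" for every interface found
# under a sysfs-style directory (default: /sys/class/net on Linux).
list_mtu() {
    root=${1:-/sys/class/net}
    for dir in "$root"/*; do
        # Skip entries that do not expose a readable mtu file.
        [ -r "$dir/mtu" ] || continue
        printf '%s %s\n' "${dir##*/}" "$(cat "$dir/mtu")"
    done
}

list_mtu
```

Any interface whose value differs from the others on the path is worth a closer look.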
You can try disabling these offloads, maybe one of them will solve your problem without reducing the connection speed like ‘tx off’ does:
Try disabling Generic Receive Offload (GRO) and then test:
ethtool -K eth0 gro off
Try disabling TCP Segmentation Offload (TSO) and then test:
ethtool -K eth0 tso off
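Before and after each change it may help to confirm what the driver actually applied, since the kernel can adjust dependent features on its own. The sketch below filters the output of ethtool -k (lowercase k, which shows the current feature state) down to the three settings discussed here. The function name offload_state is an assumption for illustration, and eth0 is simply the interface name used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: keep only the three top-level feature lines discussed
# in this thread from `ethtool -k <iface>` output read on stdin.
offload_state() {
    grep -E '^(tx-checksumming|tcp-segmentation-offload|generic-receive-offload):'
}

# Usage on the router (needs ethtool installed):
#   ethtool -k eth0 | offload_state
```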
@dangowrt recently solved an issue related to TX queues, maybe installing the latest OpenWrt SNAPSHOT firmware will solve your problem:
Source:
Thank you for all your advice.
I tried all permutations of setting ‘tx’, ‘tso’, ‘gro’ to ‘on’ or ‘off’.
Conclusion: only the combinations with ‘ethtool -K eth0 tx off’ ‘fix’ the issue, at the cost of a 20% slower connection.
But there is one interesting combination:
ethtool -K eth0 tso off
ethtool -K eth0 gro on
ethtool -K eth0 tx on
In that case, running a single rsync between desktop A (Gbit) and the Raspberry Pi (100Mbit) is sufficient to quickly reproduce the problem.
Under the same conditions, another test with only a single rsync between desktop A (Gbit) and desktop B (Gbit) runs without error, though.
This greatly simplifies the test environment, possibly making it much easier for others to reproduce the issue.
My minimal test setup to reproduce the issue:
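For anyone who wants to repeat the permutation test, the eight on/off combinations of tx, tso and gro can be generated with a small loop. This is just a sketch: the function name offload_permutations is hypothetical, eth0 is the interface name used in this thread, and the script only prints the commands rather than applying them.

```shell
#!/bin/sh
# Hypothetical helper: print every on/off combination of tx, tso and gro
# as a ready-to-run ethtool command for one interface (default: eth0).
offload_permutations() {
    iface=${1:-eth0}
    for tx in on off; do
        for tso in on off; do
            for gro in on off; do
                printf 'ethtool -K %s tx %s tso %s gro %s\n' \
                    "$iface" "$tx" "$tso" "$gro"
            done
        done
    done
}

# Print the eight commands; nothing is changed on the NIC here.
offload_permutations eth0
```

Each printed line can be run on the router before starting the rsync test, which keeps the record of which combination was active unambiguous.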
-----------------------------------------
desktop A (debian 11.11) (Gbit interface)
-----------------------------------------
|
-------------------------
lan0 (192.168.140.0/24)
BPI-R3 (debian 12.7)
lan2 (192.168.150.0/24)
-------------------------
|
-------------
Gbit switch B
-------------
|
----------------------------------------------
Raspberry Pi (debian 11.9) (100Mbit interface)
----------------------------------------------
and
ethtool -K eth0 tso off
ethtool -K eth0 gro on
ethtool -K eth0 tx on
running
desktopA# ssh raspberrypi rm -frv /tmp/YYYYY; rsync --delete -vaX --numeric-ids source_dir raspberrypi:/tmp/YYYYY
reproduces the issue very quickly.
It appears to be important to put a Gbit switch between the 100Mbit machine and the BPI-R3.
That 100Mbit devices cause errors (and the tx-timeout trace before Daniel's PR) is not new, but behind a Gbit switch… I guess it is something like pause handling.
Perhaps more info can be found with:
echo "file drivers/net/ethernet/* +p" > /sys/kernel/debug/dynamic_debug/control
So are these switches to blame?