[BPI-R3] weird networking issue (and weirder "solution")

Is there anyone who feels responsible for fixing the problem? As far as we now know, it’s a firmware bug.

Since I can easily recreate the issue within seconds, I could always test new drivers/firmware if you so desire.

Maybe installing the latest OpenWrt SNAPSHOT firmware will solve your problem:

It actually doesn’t. This problem is also present on OpenWrt.

Any news on this? None of the newest driver and firmware releases fixed it.

I have been testing networking extensively these last couple of weeks in the R3.

However, this was on the BPI-R3 running archlinuxarm, connected to an R3mini, an rk3588 and an R64 (all on archlinuxarm), in all sorts of setups.

Except for some retries on iperf3, I do not experience any networking problem…

Did you try the solution described here:

https://superuser.com/questions/1355421/rsync-stopped-working-and-returns-rsync-error-unexplained-error-code-255-at

Thank you for the tip. I changed the ssh configuration on the client and both servers as suggested.

Unfortunately the error still strikes within 20 to 30 seconds after the start of the reproduction test. I do not copy large files. The error mostly appears when both machines copy files of about 100 kB in size. I run both rsyncs with the verbose flag so I can see the output flying by rapidly.

setting

    ethtool -K eth0 tx off

“fixes” the issue as outlined above

And how does iperf3 perform? Also with the -R option (reverse direction)?

Is there any difference between the settings tx on and tx off?

I ran the iperf3 test with my setup from above, between desktop A (client) and desktop B (server):

  1. configure router to: /sbin/ethtool -K eth0 tx off

     desktopA# iperf3 -fM -c desktopB
     Connecting to host desktopB, port 5201
     [  5] local 192.168.140.196 port 60240 connected to 192.168.150.150 port 5201
     [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
     [  5]   0.00-1.00   sec  91.4 MBytes  91.3 MBytes/sec    0   2.90 MBytes
     [  5]   1.00-2.00   sec  91.2 MBytes  91.2 MBytes/sec    0   2.90 MBytes
     [  5]   2.00-3.00   sec  90.0 MBytes  90.0 MBytes/sec    0   2.90 MBytes
     [  5]   3.00-4.00   sec  91.2 MBytes  91.2 MBytes/sec    0   2.90 MBytes
     [  5]   4.00-5.00   sec  90.0 MBytes  90.0 MBytes/sec    0   2.90 MBytes
     [  5]   5.00-6.00   sec  90.0 MBytes  90.0 MBytes/sec    0   2.90 MBytes
     [  5]   6.00-7.00   sec  91.2 MBytes  91.2 MBytes/sec    0   2.90 MBytes
     [  5]   7.00-8.00   sec  87.5 MBytes  87.5 MBytes/sec    0   2.90 MBytes
     [  5]   8.00-9.00   sec  88.8 MBytes  88.8 MBytes/sec    0   2.90 MBytes
     [  5]   9.00-10.00  sec  87.5 MBytes  87.5 MBytes/sec    0   2.90 MBytes
     - - - - - - - - - - - - - - - - - - - - - - - - -
     [ ID] Interval           Transfer     Bitrate         Retr
     [  5]   0.00-10.00  sec   899 MBytes  89.9 MBytes/sec    0             sender
     [  5]   0.00-10.02  sec   898 MBytes  89.7 MBytes/sec                  receiver
    
     iperf Done.
    
     desktopA# iperf3 -R -fM -c desktopB
     Connecting to host desktopB, port 5201
     Reverse mode, remote host desktopB is sending
     [  5] local 192.168.140.196 port 36590 connected to 192.168.150.150 port 5201
     [ ID] Interval           Transfer     Bitrate
     [  5]   0.00-1.00   sec  63.7 MBytes  63.7 MBytes/sec
     [  5]   1.00-2.00   sec  64.4 MBytes  64.4 MBytes/sec
     [  5]   2.00-3.00   sec  63.6 MBytes  63.6 MBytes/sec
     [  5]   3.00-4.00   sec  64.2 MBytes  64.2 MBytes/sec
     [  5]   4.00-5.00   sec  63.8 MBytes  63.8 MBytes/sec
     [  5]   5.00-6.00   sec  64.3 MBytes  64.3 MBytes/sec
     [  5]   6.00-7.00   sec  64.6 MBytes  64.6 MBytes/sec
     [  5]   7.00-8.00   sec  64.3 MBytes  64.3 MBytes/sec
     [  5]   8.00-9.00   sec  63.4 MBytes  63.4 MBytes/sec
     [  5]   9.00-10.00  sec  64.3 MBytes  64.3 MBytes/sec
     - - - - - - - - - - - - - - - - - - - - - - - - -
     [ ID] Interval           Transfer     Bitrate         Retr
     [  5]   0.00-10.02  sec   644 MBytes  64.3 MBytes/sec    0             sender
     [  5]   0.00-10.00  sec   641 MBytes  64.1 MBytes/sec                  receiver
    
     iperf Done.
    
  2. configure router to: /sbin/ethtool -K eth0 tx on

     desktopA# iperf3 -fM -c desktopB
     Connecting to host desktopB, port 5201
     [  5] local 192.168.140.196 port 45992 connected to 192.168.150.150 port 5201
     [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
     [  5]   0.00-1.00   sec   115 MBytes   115 MBytes/sec    0   1.07 MBytes
     [  5]   1.00-2.00   sec   111 MBytes   111 MBytes/sec    0   1.61 MBytes
     [  5]   2.00-3.00   sec   111 MBytes   111 MBytes/sec    0   2.47 MBytes
     [  5]   3.00-4.00   sec   111 MBytes   111 MBytes/sec    0   2.70 MBytes
     [  5]   4.00-5.00   sec   112 MBytes   113 MBytes/sec    0   2.76 MBytes
     [  5]   5.00-6.00   sec   110 MBytes   110 MBytes/sec    0   2.84 MBytes
     [  5]   6.00-7.00   sec   108 MBytes   108 MBytes/sec    0   2.84 MBytes
     [  5]   7.00-8.00   sec   109 MBytes   109 MBytes/sec    0   2.84 MBytes
     [  5]   8.00-9.00   sec   110 MBytes   110 MBytes/sec    0   2.84 MBytes
     [  5]   9.00-10.00  sec   111 MBytes   111 MBytes/sec    0   2.84 MBytes
     - - - - - - - - - - - - - - - - - - - - - - - - -
     [ ID] Interval           Transfer     Bitrate         Retr
     [  5]   0.00-10.00  sec  1.08 GBytes   111 MBytes/sec    0             sender
     [  5]   0.00-10.01  sec  1.08 GBytes   111 MBytes/sec                  receiver
    
     iperf Done.
    
     desktopA# iperf3 -R -fM -c desktopB
     Connecting to host desktopB, port 5201
     Reverse mode, remote host desktopB is sending
     [  5] local 192.168.140.196 port 58096 connected to 192.168.150.150 port 5201
     [ ID] Interval           Transfer     Bitrate
     [  5]   0.00-1.00   sec  73.9 MBytes  73.9 MBytes/sec
     [  5]   1.00-2.00   sec  74.5 MBytes  74.5 MBytes/sec
     [  5]   2.00-3.00   sec  74.0 MBytes  74.0 MBytes/sec
     [  5]   3.00-4.00   sec  73.6 MBytes  73.6 MBytes/sec
     [  5]   4.00-5.00   sec  75.5 MBytes  75.5 MBytes/sec
     [  5]   5.00-6.00   sec  74.3 MBytes  74.3 MBytes/sec
     [  5]   6.00-7.00   sec  73.2 MBytes  73.2 MBytes/sec
     [  5]   7.00-8.00   sec  74.1 MBytes  74.1 MBytes/sec
     [  5]   8.00-9.00   sec  73.9 MBytes  73.9 MBytes/sec
     [  5]   9.00-10.00  sec  73.7 MBytes  73.7 MBytes/sec
     - - - - - - - - - - - - - - - - - - - - - - - - -
     [ ID] Interval           Transfer     Bitrate         Retr
     [  5]   0.00-10.01  sec   744 MBytes  74.2 MBytes/sec    0             sender
     [  5]   0.00-10.00  sec   741 MBytes  74.1 MBytes/sec                  receiver
    
     iperf Done.
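To put a number on the difference between the two settings, here is a minimal sketch that computes the relative slowdown from the sender averages in the iperf3 summaries above. The helper name `percent_drop` is made up for this example; the four figures are copied from the output.

```shell
#!/bin/sh
# percent_drop BASELINE REDUCED  (hypothetical helper, not part of iperf3)
# Prints how much lower REDUCED is than BASELINE, in percent (one decimal).
percent_drop() {
    awk -v base="$1" -v reduced="$2" \
        'BEGIN { printf "%.1f\n", (base - reduced) / base * 100 }'
}

# Sender averages in MBytes/sec, taken from the iperf3 summaries above:
# tx on = 111 (forward) and 74.2 (reverse), tx off = 89.9 and 64.3.
echo "forward: $(percent_drop 111 89.9)% slower with tx off"
echo "reverse: $(percent_drop 74.2 64.3)% slower with tx off"
```

With these numbers the forward direction comes out about 19% slower and the reverse direction about 13% slower.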
    
Your connection looks good, so maybe slowing it down a bit with /sbin/ethtool -K eth0 tx off simply keeps a fault in rsync from occurring…

Are all versions of rsync exactly the same? Try googling for related rsync issues…

I tried rsync’s built-in bandwidth limit option to slow down the connection, but without success.

A different router (same Debian version and firewall framework, but running on an AMD-based PC Engines APU2 B4) never had any issues with rsync, though.

Does it help to run rsync from a screen session?

    ssh [email protected]
    screen -R
    rsync ....

You can leave the session with ctrl-a + d and resume it with

    screen -R

Running the rsyncs from new sessions unfortunately has no effect on the issue.

Are you sure flow offloading is switched off?

You can check with

    conntrack -L | grep OFFLOAD

on the BPI board during the file transfer (not a second later).
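Since the check only means something while the transfer is running, it can help to count the offloaded flows repeatedly instead of looking once. A minimal sketch, assuming conntrack marks such entries with `[OFFLOAD]` or `[HW_OFFLOAD]`; the function name `count_offloaded` is invented for this example.

```shell
#!/bin/sh
# count_offloaded: read `conntrack -L` output on stdin and print the number
# of entries flagged [OFFLOAD] or [HW_OFFLOAD] (hypothetical helper).
count_offloaded() {
    # grep -c exits non-zero when the count is 0; "|| true" hides that.
    grep -c -E '\[(HW_)?OFFLOAD\]' || true
}

# On the router, as root, while rsync is running (stop with ctrl-c):
#   while sleep 1; do conntrack -L 2>/dev/null | count_offloaded; done
```

If the counter stays at 0 for the whole transfer, flow offloading is most likely not involved.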

A line with OFFLOAD is never shown by conntrack -L. But setting ‘tx off’ instantly slows the connection by about 20%, and the speed recovers to 100% after resetting to ‘tx on’. So it must have at least ‘some’ effect.

The BPI-R3 may behave differently with checksum offloading when the interfaces are part of a bridge.

You could try bridging lan0 and lan2, or if you need IP forwarding, put them in separate bridges.

Also, did you check the MTU on all interfaces involved in the rsync transfer?
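For the MTU check, a minimal sketch that condenses the one-line-per-interface output of `ip -o link show` into interface/MTU pairs, so a mismatch is easy to spot. The function name `list_mtus` is made up for this example.

```shell
#!/bin/sh
# list_mtus: read `ip -o link show` output on stdin and print
# "<interface> <mtu>" per line (hypothetical helper).
list_mtus() {
    awk '{
        name = $2
        sub(/:$/, "", name)      # drop the trailing colon
        sub(/@.*$/, "", name)    # drop an "@parent" suffix (DSA ports, VLANs)
        for (i = 3; i < NF; i++)
            if ($i == "mtu") print name, $(i + 1)
    }'
}

# On each machine involved in the transfer:
#   ip -o link show | list_mtus
```

Run it on the router, both endpoints and anything else in the path that offers a shell, then compare the values.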

You can try disabling these offloads, maybe one of them will solve your problem without reducing the connection speed like ‘tx off’ does:

  1. Try disabling Generic Receive Offload (GRO) and then test:

    ethtool -K eth0 gro off
    
  2. Try disabling TCP Segmentation Offload (TSO) and then test:

    ethtool -K eth0 tso off
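Before and after each change it is worth confirming what the driver actually applied, because a requested setting is not always accepted. A minimal sketch that filters the output of `ethtool -k` (lowercase k, which only reads the settings) down to the three features discussed in this thread; the function name `show_offloads` is invented here.

```shell
#!/bin/sh
# show_offloads: read `ethtool -k <iface>` output on stdin and keep only the
# lines for tx checksumming, TSO and GRO (hypothetical helper).
show_offloads() {
    grep -E '^(tx-checksumming|tcp-segmentation-offload|generic-receive-offload):'
}

# On the router:
#   ethtool -k eth0 | show_offloads
```

The long names correspond to the short `tx`, `tso` and `gro` keywords accepted by `ethtool -K`.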
    

@dangowrt recently solved an issue related to TX queues, so installing the latest OpenWrt SNAPSHOT firmware may solve your problem.

Thank you for all your advice.

I tried all permutations of setting ‘tx’, ‘tso’, ‘gro’ to ‘on’ or ‘off’.

Conclusion: only the combinations that include ‘ethtool -K eth0 tx off’ ‘fix’ the issue, at the cost of a 20% slower connection.

But there is one interesting combination:

    ethtool -K eth0 tso off
    ethtool -K eth0 gro on
    ethtool -K eth0 tx on

In that case, running a single rsync between desktop A (Gbit) and the Raspberry Pi (100Mbit) is sufficient to quickly reproduce the problem.

Under the same conditions, another test with only a single rsync between desktop A (Gbit) and desktop B (Gbit) runs with no error, though.

This greatly simplifies the test environment, possibly making it much easier for others to reproduce the issue.
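For anyone who wants to repeat the permutation test, here is a minimal sketch that prints the eight combinations of ‘tx’, ‘tso’ and ‘gro’ as ethtool command lines. It is a dry run that only prints; the function name `offload_permutations` is made up, and the interface name eth0 is taken from the commands above.

```shell
#!/bin/sh
# offload_permutations IFACE  (hypothetical helper)
# Prints one ethtool command line per combination of tx/tso/gro (8 in total).
offload_permutations() {
    iface=$1
    for tx in on off; do
        for tso in on off; do
            for gro in on off; do
                echo "ethtool -K $iface tx $tx tso $tso gro $gro"
            done
        done
    done
}

# Dry run: nothing is changed, the commands are only listed.
offload_permutations eth0
```

Run one printed line as root on the router, start the reproduction test, note the result, then move on to the next line. Some combinations may not be independent, since the kernel can adjust one feature when another is switched off, so checking the result with `ethtool -k eth0` after each change is a good idea.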

My minimal test setup to reproduce the issue:

        -----------------------------------------
        desktop A (debian 11.11) (Gbit interface)
        -----------------------------------------
                    |
         -------------------------
          lan0 (192.168.140.0/24)
            BPI-R3 (debian 12.7)
          lan2 (192.168.150.0/24)
         -------------------------
                    |
               -------------
               Gbit switch B
               -------------
                    |
    ----------------------------------------------
    Raspberry Pi (debian 11.9) (100Mbit interface)
    ----------------------------------------------

and

    ethtool -K eth0 tso off
    ethtool -K eth0 gro on
    ethtool -K eth0 tx on

running

    desktopA# ssh raspberrypi rm -frv /tmp/YYYYY; rsync --delete -vaX --numeric-ids source_dir raspberrypi:/tmp/YYYYY

reproduces the issue very quickly.

It appears to be important to put a Gbit switch between the 100Mbit machine and the BPI-R3.
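To make "very quickly" measurable, the reproduction command can be wrapped in a loop that stops at the first failure and reports which run failed. This is a minimal sketch; the function name `run_until_failure` and the run limit are assumptions for illustration, not part of the original test.

```shell
#!/bin/sh
# run_until_failure MAX_RUNS CMD [ARGS...]  (hypothetical helper)
# Runs CMD up to MAX_RUNS times and stops at the first non-zero exit status.
run_until_failure() {
    max_runs=$1
    shift
    run=1
    while [ "$run" -le "$max_runs" ]; do
        status=0
        "$@" || status=$?
        if [ "$status" -ne 0 ]; then
            echo "run $run failed with exit status $status"
            return 1
        fi
        run=$((run + 1))
    done
    echo "no failure in $max_runs runs"
    return 0
}

# Example on desktop A, using the rsync command from this post:
#   run_until_failure 50 rsync --delete -vaX --numeric-ids source_dir raspberrypi:/tmp/YYYYY
```

Clearing the target directory with the ssh command above before each run keeps every iteration copying the full set of files.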

The information about 100Mbit devices causing errors (and the tx-timeout trace before Daniel’s PR) is not new, but behind a Gbit switch… I guess it is something like pause handling.

Perhaps more info can be found with:

    echo "file drivers/net/ethernet/* +p" > /sys/kernel/debug/dynamic_debug/control

So are these switches to blame?