[BPI-R3] weird networking issue (and weirder "solution")

I guess it is a different issue… there were some reports involving 100 Mbit devices on the R3/R4.

E.g. for the R4: [BPI-R4] bad switch performance in upload

I tested with a manually set speed, which worked, but it could be another issue.

When you say traffic between lan0 and lan2 is forwarded, I assume they are in different LAN segments (different subnets), right?

Have you tried disabling autoneg and setting the speed manually?

Maybe it is an autoneg issue.

Similar to this: BPI-R4: 100Mbit broken
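For reference, forcing the speed on one of the BPI-R3's switch ports can be done with ethtool. A minimal sketch, assuming the port is lan0 and 100 Mbit/s full duplex is wanted (both placeholders, run as root on the router). Gigabit copper links generally rely on autonegotiation, so forcing the speed may only be practical for 10/100:

```shell
#!/bin/sh
# Sketch: force a fixed link speed on a DSA user port with ethtool(8).
# The interface name and speed are placeholders; run as root on the BPI-R3.
force_speed() {
    iface=$1
    speed=$2
    ethtool -s "$iface" speed "$speed" duplex full autoneg off
}

# Undo the change and let the PHY negotiate again.
restore_autoneg() {
    ethtool -s "$1" autoneg on
}

# Example: force_speed lan0 100 ; (run the test) ; restore_autoneg lan0
```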

Both ‘lan0’ and ‘lan2’ are first connected to simple 1 Gbit switches, so basically the BPI-R3 sees only 1 Gbit ports.

‘Traffic between lan0 and lan2’ means: the BPI-R3 router forwards packets between ‘lan0’ and ‘lan2’, which are on different subnets (192.168.140.0/24 and 192.168.150.0/24).

There are no ‘dmesg’ messages at all on any of the machines involved at the time of the error. But I will try disabling autoneg.

To ‘turn off tx-checksumming’ would be much more interesting IMHO. It appears not to work with the current kernel/drivers (please see above). Do you know how to turn it off?

This is the same error I was having. You need to run it on the switch interface, i.e. the part after the @ in the port name, like @eth0. I'm really curious to see whether it fixes the issue for you too.
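To find that interface name without reading through `ip link` by hand, the part after the @ can be extracted as below. A small sketch, assuming iproute2's one-line output (`ip -o link show`), where a DSA user port is listed as e.g. `lan0@eth0`; `dsa_conduit` is a hypothetical helper name:

```shell
#!/bin/sh
# Print the interface after '@' for a port line from `ip -o link show`,
# e.g. "5: lan0@eth0: <BROADCAST,...>" -> "eth0".
# Reads the ip(8) output from stdin so it can also be fed from a saved file.
dsa_conduit() {
    awk '{
        name = $2
        sub(/:$/, "", name)            # drop the trailing colon
        n = split(name, parts, "@")    # "lan0@eth0" -> lan0, eth0
        if (n == 2) print parts[2]
    }'
}

# Example on the router: ip -o link show dev lan0 | dsa_conduit
```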

Hmm. I can describe the setup that was failing most ‘reliably’ for me. Connect two BPI-R3s with an Ethernet cable. Use a PC/device to connect to one of the BPI-R3s over WLAN. Then do some intensive TCP data transfers between the PC and the BPI-R3 you are not directly connected to. I am using the unison tool to back up my files, which uses rsync over ssh. This would fail to finish 99% of the time, with the error @sparkie printed above.

It’s fantastic!

I can reproduce my issue within seconds in my LAN environment. So I tried toggling between ‘tx on’ and ‘tx off’ on the interface originally named ‘eth0’ (thanks @meehien for the hint). The actual test runs between the interfaces originally named ‘lan0’ and ‘lan2’, though.

My issue no longer appears with ‘tx off’, but instantly reappears after setting ‘tx on’.

After lots of experimenting, this is finally the easiest setup I found to reproduce the issue within seconds:

hardware setup:

  • desktop A (Gbit) connected to BPI-R3 (lan0) via Gbit switch A (in 192.168.140.0/24)
  • desktop B (Gbit) connected to BPI-R3 (lan2) via Gbit switch B (in 192.168.150.0/24)
  • Raspberry Pi (100 Mbit) connected to BPI-R3 (lan2) via Gbit switch B (in 192.168.150.0/24)

Some illustrative ASCII art:

    ------------------------
    desktop A (debian 11.11)
    ------------------------
                |
           -------------
           Gbit switch A
           -------------
                |
     -------------------------
      lan0 (192.168.140.0/24)
        BPI-R3 (debian 12.7)
      lan2 (192.168.150.0/24)
     -------------------------
                |
           -------------
           Gbit switch B
           -------------
                |        \
                |         \
-------------------------- \
Raspberry Pi (debian 11.9)  \
--------------------------   \
                              \
                             -----------------------
                             desktop B (debian 12.5)
                             -----------------------

software setup:

Basically, all that is needed is two concurrent ‘rsync’ runs copying some files around. All commands are started from desktop A:

desktopA# ssh raspberrypi rm -frv /tmp/YYYYY; rsync --delete -vaX --numeric-ids source_dir raspberrypi:/tmp/YYYYY
desktopA# ssh desktopB rm -frv /tmp/YYYYY; rsync --delete -vaX --numeric-ids source_dir desktopB:/tmp/YYYYY

(started concurrently in different shells)
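If you'd rather not juggle two shells, both transfers can also be launched from one. A sketch under the assumption that SRC, DEST and LOGDIR are placeholders for your own paths; `sync_one` and `run_parallel` are hypothetical helper names, not part of rsync or ssh:

```shell
#!/bin/sh
# Sketch: start the reproduction rsyncs in parallel from desktop A and report
# which target failed. SRC, DEST and LOGDIR are placeholders.
SRC=${SRC:-source_dir}
DEST=${DEST:-/tmp/YYYYY}
LOGDIR=${LOGDIR:-.}

# Same two steps as the manual commands: wipe the target, then copy.
sync_one() {
    ssh "$1" rm -fr "$DEST"
    rsync --delete -vaX --numeric-ids "$SRC" "$1:$DEST"
}

run_parallel() {
    for host in "$@"; do
        # Each background job records its own exit status next to its log.
        ( sync_one "$host" > "$LOGDIR/rsync-$host.log" 2>&1
          echo $? > "$LOGDIR/rsync-$host.rc" ) &
    done
    wait    # block until every background job has finished
    rc=0
    for host in "$@"; do
        if [ "$(cat "$LOGDIR/rsync-$host.rc")" != 0 ]; then
            echo "FAILED: $host (see $LOGDIR/rsync-$host.log)"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: run_parallel raspberrypi desktopB
```

Writing each job's exit status to a file keeps the script POSIX sh (no bash arrays) while still telling you which of the two copies broke.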

error symptoms:

In case of error, the ‘rsync’ running between ‘desktop A’ and ‘desktop B’ breaks with:

client_loop: send disconnect: Broken pipe
rsync: [sender] write error: Broken pipe (32)
rsync error: unexplained error (code 255) at io.c(823) [sender=3.2.3]

The other ‘rsync’, running between ‘desktop A’ and ‘raspberrypi’, is mostly not affected.

successful workaround:

ethtool -K eth0 tx off

Thanks to @meehien for providing this 🙂
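To confirm that the setting actually took effect, `ethtool -k eth0` (lowercase k) lists the current offload state. A small sketch that extracts the tx-checksumming value from that output; `tx_csum_state` is a hypothetical helper name. Disabling tx checksumming usually also forces off features that depend on it, such as TCP segmentation offload, which may account for part of the throughput cost described below:

```shell
#!/bin/sh
# Report the tx-checksumming state from `ethtool -k <iface>` output (stdin).
# ethtool prints lines such as "tx-checksumming: on" or
# "tx-checksumming: off [fixed]"; print just "on" or "off".
tx_csum_state() {
    awk -F': *' '$1 == "tx-checksumming" { split($2, v, " "); print v[1] }'
}

# Usage on the router (as root), eth0 being the switch interface:
#   ethtool -K eth0 tx off              # the workaround
#   ethtool -k eth0 | tx_csum_state     # expect "off"
#   ethtool -K eth0 tx on               # back to the default
```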

caveats:

Setting ‘tx’ to ‘off’ impacts network performance.

With ‘tx on’ (the default), the ‘iftop’ utility shows a stunning 117 MB/s when running a simple ‘netcat’ between ‘desktop A’ and ‘desktop B’. Excellent for a device that is actually routing and firewalling.

Alas, with ‘tx off’ (the workaround), ‘iftop’ shows no more than about 94 MB/s for the same test:

desktopB# netcat -l 9000 > /dev/zero
desktopA# netcat desktopB 9000 < /dev/zero

without workaround:

desktopA# iftop -B

238MB          477MB         715MB          954MB    1.16GB
└─────────────┴──────────────┴─────────────┴──────────────┴──────────────
desktopA              <=> desktopB                 117MB   117MB   116MB

with workaround:

desktopA# iftop -B

238MB          477MB         715MB          954MB    1.16GB
└─────────────┴──────────────┴─────────────┴──────────────┴──────────────
desktopA              <=> desktopB                94.6MB  94.2MB  93.0MB
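To put these numbers next to the 1 Gbit/s link speed, multiply bytes per second by 8. A small helper, assuming 1 MB = 10^6 bytes by default; the unit base used by iftop and iperf3 is not stated in this thread, so it is a parameter (`mbytes_to_mbit` is a hypothetical name):

```shell
#!/bin/sh
# Convert a MB/s figure into Mbit/s.
# $1: rate in MB/s; $2: bytes per "MB" (default 1000000, use 1048576 for 2^20).
mbytes_to_mbit() {
    awk -v mb="$1" -v unit="${2:-1000000}" \
        'BEGIN { printf "%.0f\n", mb * unit * 8 / 1000000 }'
}

# mbytes_to_mbit 117     -> 936
# mbytes_to_mbit 94.6    -> 757
```

Under that assumption, 117 MB/s is about 936 Mbit/s, close to gigabit wire speed, while 94.6 MB/s is about 757 Mbit/s.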

Is there anyone who feels responsible for fixing the problem? As far as we know now, it’s a firmware bug.

Since I can easily recreate the issue within seconds, I could always test new drivers/firmware if you so desire.

Maybe installing the latest OpenWrt SNAPSHOT firmware will solve your problem:

It actually doesn’t. This problem is also present on OpenWrt.

Any news on this? None of the newest driver and firmware releases has fixed it.

I have been testing networking extensively on the R3 these last couple of weeks.

However, this was on the BPI-R3 running archlinuxarm, connected to an R3mini, an RK3588 and an R64 (all on archlinuxarm), in all sorts of setups.

Except for some retries on iperf3, I do not experience any networking problems…

Did you try the solution described here:

https://superuser.com/questions/1355421/rsync-stopped-working-and-returns-rsync-error-unexplained-error-code-255-at

Thank you for the tip. I changed the ssh configuration on the client and both servers as suggested.

Unfortunately, the error still strikes within 20-30 seconds after starting the reproduction test. I do not copy large files. The error mostly appears when both machines are copying files of about 100 kB each. I run both rsyncs with the verbose flag, so I can see the output flying by rapidly.

Setting

    ethtool -K eth0 tx off

“fixes” the issue as outlined above.

And how does iperf3 perform? Also with the -R option (reverse direction)?

Is there any difference between tx on and tx off?

I ran the iperf3 test with my setup from above, between desktop A (client) and desktop B (server):

  1. configure router to: /sbin/ethtool -K eth0 tx off

     desktopA# iperf3 -fM -c desktopB
     Connecting to host desktopB, port 5201
     [  5] local 192.168.140.196 port 60240 connected to 192.168.150.150 port 5201
     [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
     [  5]   0.00-1.00   sec  91.4 MBytes  91.3 MBytes/sec    0   2.90 MBytes
     [  5]   1.00-2.00   sec  91.2 MBytes  91.2 MBytes/sec    0   2.90 MBytes
     [  5]   2.00-3.00   sec  90.0 MBytes  90.0 MBytes/sec    0   2.90 MBytes
     [  5]   3.00-4.00   sec  91.2 MBytes  91.2 MBytes/sec    0   2.90 MBytes
     [  5]   4.00-5.00   sec  90.0 MBytes  90.0 MBytes/sec    0   2.90 MBytes
     [  5]   5.00-6.00   sec  90.0 MBytes  90.0 MBytes/sec    0   2.90 MBytes
     [  5]   6.00-7.00   sec  91.2 MBytes  91.2 MBytes/sec    0   2.90 MBytes
     [  5]   7.00-8.00   sec  87.5 MBytes  87.5 MBytes/sec    0   2.90 MBytes
     [  5]   8.00-9.00   sec  88.8 MBytes  88.8 MBytes/sec    0   2.90 MBytes
     [  5]   9.00-10.00  sec  87.5 MBytes  87.5 MBytes/sec    0   2.90 MBytes
     - - - - - - - - - - - - - - - - - - - - - - - - -
     [ ID] Interval           Transfer     Bitrate         Retr
     [  5]   0.00-10.00  sec   899 MBytes  89.9 MBytes/sec    0             sender
     [  5]   0.00-10.02  sec   898 MBytes  89.7 MBytes/sec                  receiver
    
     iperf Done.
    
     desktopA# iperf3 -R -fM -c desktopB
     Connecting to host desktopB, port 5201
     Reverse mode, remote host desktopB is sending
     [  5] local 192.168.140.196 port 36590 connected to 192.168.150.150 port 5201
     [ ID] Interval           Transfer     Bitrate
     [  5]   0.00-1.00   sec  63.7 MBytes  63.7 MBytes/sec
     [  5]   1.00-2.00   sec  64.4 MBytes  64.4 MBytes/sec
     [  5]   2.00-3.00   sec  63.6 MBytes  63.6 MBytes/sec
     [  5]   3.00-4.00   sec  64.2 MBytes  64.2 MBytes/sec
     [  5]   4.00-5.00   sec  63.8 MBytes  63.8 MBytes/sec
     [  5]   5.00-6.00   sec  64.3 MBytes  64.3 MBytes/sec
     [  5]   6.00-7.00   sec  64.6 MBytes  64.6 MBytes/sec
     [  5]   7.00-8.00   sec  64.3 MBytes  64.3 MBytes/sec
     [  5]   8.00-9.00   sec  63.4 MBytes  63.4 MBytes/sec
     [  5]   9.00-10.00  sec  64.3 MBytes  64.3 MBytes/sec
     - - - - - - - - - - - - - - - - - - - - - - - - -
     [ ID] Interval           Transfer     Bitrate         Retr
     [  5]   0.00-10.02  sec   644 MBytes  64.3 MBytes/sec    0             sender
     [  5]   0.00-10.00  sec   641 MBytes  64.1 MBytes/sec                  receiver
    
     iperf Done.
    
  2. configure router to: /sbin/ethtool -K eth0 tx on

     desktopA# iperf3 -fM -c desktopB
     Connecting to host desktopB, port 5201
     [  5] local 192.168.140.196 port 45992 connected to 192.168.150.150 port 5201
     [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
     [  5]   0.00-1.00   sec   115 MBytes   115 MBytes/sec    0   1.07 MBytes
     [  5]   1.00-2.00   sec   111 MBytes   111 MBytes/sec    0   1.61 MBytes
     [  5]   2.00-3.00   sec   111 MBytes   111 MBytes/sec    0   2.47 MBytes
     [  5]   3.00-4.00   sec   111 MBytes   111 MBytes/sec    0   2.70 MBytes
     [  5]   4.00-5.00   sec   112 MBytes   113 MBytes/sec    0   2.76 MBytes
     [  5]   5.00-6.00   sec   110 MBytes   110 MBytes/sec    0   2.84 MBytes
     [  5]   6.00-7.00   sec   108 MBytes   108 MBytes/sec    0   2.84 MBytes
     [  5]   7.00-8.00   sec   109 MBytes   109 MBytes/sec    0   2.84 MBytes
     [  5]   8.00-9.00   sec   110 MBytes   110 MBytes/sec    0   2.84 MBytes
     [  5]   9.00-10.00  sec   111 MBytes   111 MBytes/sec    0   2.84 MBytes
     - - - - - - - - - - - - - - - - - - - - - - - - -
     [ ID] Interval           Transfer     Bitrate         Retr
     [  5]   0.00-10.00  sec  1.08 GBytes   111 MBytes/sec    0             sender
     [  5]   0.00-10.01  sec  1.08 GBytes   111 MBytes/sec                  receiver
    
     iperf Done.
    
     desktopA# iperf3 -R -fM -c desktopB
     Connecting to host desktopB, port 5201
     Reverse mode, remote host desktopB is sending
     [  5] local 192.168.140.196 port 58096 connected to 192.168.150.150 port 5201
     [ ID] Interval           Transfer     Bitrate
     [  5]   0.00-1.00   sec  73.9 MBytes  73.9 MBytes/sec
     [  5]   1.00-2.00   sec  74.5 MBytes  74.5 MBytes/sec
     [  5]   2.00-3.00   sec  74.0 MBytes  74.0 MBytes/sec
     [  5]   3.00-4.00   sec  73.6 MBytes  73.6 MBytes/sec
     [  5]   4.00-5.00   sec  75.5 MBytes  75.5 MBytes/sec
     [  5]   5.00-6.00   sec  74.3 MBytes  74.3 MBytes/sec
     [  5]   6.00-7.00   sec  73.2 MBytes  73.2 MBytes/sec
     [  5]   7.00-8.00   sec  74.1 MBytes  74.1 MBytes/sec
     [  5]   8.00-9.00   sec  73.9 MBytes  73.9 MBytes/sec
     [  5]   9.00-10.00  sec  73.7 MBytes  73.7 MBytes/sec
     - - - - - - - - - - - - - - - - - - - - - - - - -
     [ ID] Interval           Transfer     Bitrate         Retr
     [  5]   0.00-10.01  sec   744 MBytes  74.2 MBytes/sec    0             sender
     [  5]   0.00-10.00  sec   741 MBytes  74.1 MBytes/sec                  receiver
    
     iperf Done.
    
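The four measurements above can be scripted so that the tx setting on the router is toggled between runs. A sketch, assuming passwordless ssh from desktop A to the router as a user allowed to run ethtool; ROUTER and SERVER are placeholder host names and `iperf_matrix` is a hypothetical helper. The loop does tx off first, so the router ends up back on the default (tx on):

```shell
#!/bin/sh
# Sketch: run iperf3 in both directions with tx checksumming off, then on.
# ROUTER and SERVER are placeholders; eth0 is the router's switch interface.
ROUTER=${ROUTER:-bpi-r3}
SERVER=${SERVER:-desktopB}

iperf_matrix() {
    for mode in off on; do
        # Toggle the offload on the router; stop if ssh/ethtool fails.
        ssh "$ROUTER" /sbin/ethtool -K eth0 tx "$mode" || return 1
        echo "=== tx $mode, desktop A -> $SERVER ==="
        iperf3 -fM -c "$SERVER"
        echo "=== tx $mode, $SERVER -> desktop A (-R) ==="
        iperf3 -R -fM -c "$SERVER"
    done
}

# Usage (from desktop A): iperf_matrix | tee iperf-results.txt
```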

Your connection looks good, so maybe slowing the connection down a bit with /sbin/ethtool -K eth0 tx off simply keeps the fault in rsync from being triggered…

Are all versions of rsync exactly the same? Try googling for related rsync issues…

I tried slowing down the connection with rsync's built-in bandwidth limit option (‘--bwlimit’), but with no success.

Different hardware (a router with the same Debian version and firewall framework, but running on an AMD-based PC Engines APU2 B4) never had any issues with rsync, though.

Does it help running rsync from a screen session?

ssh <user>@<bpi-r3>
screen -R
rsync ....

You can detach from the session with Ctrl-a d and resume it later with

screen -R

Running the rsyncs from new sessions unfortunately has no effect on the issue.

Are you sure flow offloading is switched off?

You can check with

conntrack -L | grep OFFLOAD

Run it on the BPI board during the file transfer (not a second later).
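Since timing matters here, it can help to sample conntrack repeatedly while the rsyncs are running. A sketch, assuming conntrack-tools is installed on the router and that offloaded flows are tagged with OFFLOAD in `conntrack -L` output (the same string the grep above relies on); `count_offloaded` and `watch_offload` are hypothetical helper names:

```shell
#!/bin/sh
# Count conntrack entries flagged as offloaded (reads `conntrack -L` on stdin).
# grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
count_offloaded() {
    grep -c 'OFFLOAD' || true
}

# Print a timestamped count once per second, $1 times (default 30).
watch_offload() {
    samples=${1:-30}
    i=0
    while [ "$i" -lt "$samples" ]; do
        n=$(conntrack -L 2>/dev/null | count_offloaded)
        echo "$(date +%T) offloaded flows: $n"
        i=$((i + 1))
        sleep 1
    done
}

# Usage on the BPI-R3 (as root), started right before the rsyncs: watch_offload 60
```

A non-zero count during the transfer would suggest flow offloading is still active despite being believed off.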
