[BPI-R4] NIC Goes Down At High Load

I’m running into a problem where under ‘high’ network load (e.g. over 500 Mb/s) sustained for more than ~10 s, the network interfaces on my R4 go down.

The logs/dmesg don’t show very much information, just saying that the link(s) go down and come back up straight away. This has been happening since I got the device.

Tue Sep 17 04:29:33 2024 kern.info kernel: [ 3123.556862] mt7530-mmio 15020000.switch lan3: Link is Down
Tue Sep 17 04:29:34 2024 kern.info kernel: [ 3124.066579] br-lan: port 5(lan3) entered disabled state
Tue Sep 17 04:29:34 2024 daemon.notice netifd: Network device 'lan3' link is down
Tue Sep 17 04:29:35 2024 kern.info kernel: [ 3125.972606] mt7530-mmio 15020000.switch lan3: Link is Up - 1Gbps/Full - flow control off
Tue Sep 17 04:29:35 2024 kern.info kernel: [ 3125.973199] br-lan: port 5(lan3) entered blocking state
Tue Sep 17 04:29:35 2024 daemon.notice netifd: Network device 'lan3' link is up
Tue Sep 17 04:29:35 2024 kern.info kernel: [ 3125.985924] br-lan: port 5(lan3) entered forwarding state
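
To put numbers on how often and for how long a port drops, the log can be scanned for "Link is Down" / "Link is Up" pairs. The following is only a rough sketch: the function name `link_flaps` is made up for illustration, the regex assumes the kernel line format shown above, and it assumes Python is available (on the router or on another machine the log is copied to).

```python
import re

# Assumed line format, taken from the excerpt above:
# "... kernel: [ 3123.556862] mt7530-mmio 15020000.switch lan3: Link is Down"
LINK_RE = re.compile(r"\[\s*(\d+\.\d+)\].*?\s(\S+): Link is (Up|Down)")


def link_flaps(lines):
    """Return (interface, down_timestamp, seconds_down) for each Down -> Up pair.

    Timestamps are the kernel uptime values in square brackets.
    """
    down_at = {}  # interface -> kernel timestamp of the last "Link is Down"
    flaps = []
    for line in lines:
        m = LINK_RE.search(line)
        if not m:
            continue  # not a link state line (bridge or netifd message)
        ts, iface, state = float(m.group(1)), m.group(2), m.group(3)
        if state == "Down":
            down_at[iface] = ts
        elif iface in down_at:
            start = down_at.pop(iface)
            flaps.append((iface, start, ts - start))
    return flaps


sample = [
    "Tue Sep 17 04:29:33 2024 kern.info kernel: [ 3123.556862] mt7530-mmio 15020000.switch lan3: Link is Down",
    "Tue Sep 17 04:29:35 2024 kern.info kernel: [ 3125.972606] mt7530-mmio 15020000.switch lan3: Link is Up - 1Gbps/Full - flow control off",
]
for iface, start, duration in link_flaps(sample):
    print(f"{iface}: down at {start:.3f}s for {duration:.3f}s")
```

For the two link lines above this reports a single flap on lan3 lasting roughly 2.4 s. Run over a longer log, it would show whether the drops cluster around the periods of heavy traffic.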

The only thing I noticed was a massive spike in CPU context switches, going from under 1k/s to ~100k/s.
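
For anyone who wants to watch the same number, Linux exposes a cumulative context switch counter as the `ctxt` line in `/proc/stat`, and the rate is just the difference between two samples divided by the interval. A minimal sketch, with the helper names `read_ctxt`, `ctxt_rate` and `monitor` invented for illustration and Python assumed to be installed:

```python
import time


def read_ctxt(path="/proc/stat"):
    """Return the cumulative context switch count from the 'ctxt' line."""
    with open(path) as f:
        for line in f:
            if line.startswith("ctxt "):
                return int(line.split()[1])
    raise RuntimeError("no ctxt line found in " + path)


def ctxt_rate(prev, curr, seconds):
    """Context switches per second between two counter samples."""
    return (curr - prev) / seconds


def monitor(interval=1.0, samples=10):
    """Print the context switch rate once per interval."""
    prev = read_ctxt()
    for _ in range(samples):
        time.sleep(interval)
        curr = read_ctxt()
        print(f"{ctxt_rate(prev, curr, interval):.0f} ctxt/s")
        prev = curr
```

Calling `monitor()` while the load test is running and comparing the timestamps with the link-down messages would show whether the spike starts before the link drops or only afterwards.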

I am running OpenWrt SNAPSHOT (r27426-232cc239b8), kernel 6.6.51.

Not sure if this is a configuration, software, or hardware problem.

Has anybody else had similar issues? Would appreciate any help or troubleshooting tips.

I’m using the SFP connectors and did not see this behavior - but have not checked the 1Gbit ports yet. Just to make sure:

  • Are you sure it’s not an overheating problem? Do you have a fan/heatsink installed?
  • What kind of power supply do you use?

Thanks for the ideas. The heatsink and fan are installed. I monitored the CPU temperature and it is ~50 °C, so that doesn’t seem to be the issue. I’m not sure about the other chips though (nothing else shows up with sensors). I’ll stick some extra/better heatsinks on and see if that helps.
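
To double-check what the kernel exposes beyond the CPU reading, the standard sysfs locations can be read directly: thermal zones under `/sys/class/thermal` and hwmon inputs under `/sys/class/hwmon`, both reporting millidegrees Celsius. This is only a sketch. The function name `read_temps` is made up, Python is assumed to be installed, and which sensors (if any) actually appear on the R4 is not something I can confirm.

```python
from pathlib import Path


def read_temps(sysfs_root="/sys/class"):
    """Return {label: degrees_C} for every thermal zone and hwmon temp input found."""
    temps = {}
    root = Path(sysfs_root)

    # Thermal zones: <root>/thermal/thermal_zoneN/{type,temp}
    for zone in sorted(root.glob("thermal/thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            milli = int((zone / "temp").read_text())
        except (OSError, ValueError):
            continue  # zone is unreadable or has no valid reading
        temps[f"{zone.name} ({name})"] = milli / 1000.0

    # hwmon devices: <root>/hwmon/hwmonN/{name,tempM_input}
    for inp in sorted(root.glob("hwmon/hwmon*/temp*_input")):
        try:
            name = (inp.parent / "name").read_text().strip()
            milli = int(inp.read_text())
        except (OSError, ValueError):
            continue
        temps[f"{inp.parent.name}/{inp.name} ({name})"] = milli / 1000.0

    return temps


for label, celsius in read_temps().items():
    print(f"{label}: {celsius:.1f} C")
```

If only the CPU zone shows up, the switch and PHY temperatures are simply not exposed to software, and the extra heatsinks are the only practical way to test the overheating theory.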

I’m using a decent USB-C power supply rated for 80W, the BPI-R4 is the only device plugged into it. In case it was an issue with the power supply I put a multimeter on the contacts on the board and it’s a consistent ~20V under load, at least as far as my crappy multimeter can tell.

Also updated to the latest snapshot (r28146-52b6c92479 / LuCI Master 24.330.73959~fefb9ac) and played around with enabling/disabling irqbalance, sqm, software/hardware offloading, and packet steering, but nothing helped…

TBH the only thing that frequently saturates my network is my server. I was already considering using the 10G SFP for it, so maybe moving to that will help solve the issue or narrow down the cause.