BPI-R4 as a bridge: LAN host seems to struggle

So, after experimenting with my BPI-R4, I took it into production as an AP with a switch. The uplink is connected to sfp2, but all ports are bridged via the OpenWrt br-lan bridge. After some GB of traffic, it seems to struggle with one of the LAN ports (LAN1): incoming connections via the SFP port and the PCIe AP adapter sometimes seem to struggle, and packet loss gets very high. Rebooting the BPI helps, but only for some time.

How do I diagnose this issue? I don’t have anything useful to go on, aside from “rebooting helps, so it’s probably a software issue on the BPI-R4”.

EDIT: dmesg shows nothing when it happens, and the system load is < 1.0, so it is completely unclear what is happening and why.
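Since dmesg and load give no hint, one option is to log the per-port error and drop counters and see which ones start climbing when the packet loss begins. This is only a sketch: `read_counters` and the `SYSFS_NET` variable are names made up for illustration, and it assumes the standard `/sys/class/net/<iface>/statistics/` files are present for the port.

```shell
#!/bin/sh
# Print the generic error/drop counters of one interface on a single line.
# SYSFS_NET is a hypothetical override so the function can be tried off-device.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

read_counters() {
    iface="$1"
    line="$iface"
    for c in rx_errors tx_errors rx_dropped tx_dropped; do
        f="$SYSFS_NET/$iface/statistics/$c"
        if [ -r "$f" ]; then
            v=$(cat "$f")
        else
            v="n/a"    # counter file missing or unreadable
        fi
        line="$line $c=$v"
    done
    echo "$line"
}

# Example: one timestamped sample every 10 seconds, appended to a log file.
# while true; do echo "$(date +%s) $(read_counters lan1)" >> /tmp/lan1.log; sleep 10; done
```

Comparing the samples from before and after the problem starts should at least show whether the drops are counted on lan1 itself or somewhere else.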

Tell people what software your R4 is running (which version of OpenWrt, and built by whom), so that people have a better idea and can chime in if they know something about the issue (or don’t see it themselves).

I doubt it’s relevant, but if you really want to know: OpenWrt on the master branch, compiled by myself.

You can turn on debug messages; see [BPI-R4] bad switch performance in upload - #12 by frank-w

Maybe that helps to figure out what’s going on behind the scenes.

So far I couldn’t find anything, but the idea is interesting. What also seems interesting (for now) is that the command listed there

ethtool -A lan1 autoneg off rx off tx off

seems to fix things … it’s too early to say, but it seems more reliable now. Does that point to anything?

It’s a known problem, yeah… auto-negotiation is somehow broken, but it should have an impact immediately, not after some GB. At least that’s what I noticed.

Autoneg is known to be broken for some SFPs, as 2g5 is not part of the autoneg spec. But I wonder why you see this on a DSA switch port… What does ethtool show when the issue occurs? Normally autoneg is only used when a new link comes up; once the link is established, autoneg is not done again. So you would need at least a link down/up.
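To answer the "what does ethtool show" question, it may help to grab the same three views of the port once while it works and once while it is failing, then diff them. A rough sketch follows; `snapshot_link` is just an illustrative helper name, and the counters printed by `ethtool -S` are driver-specific, so their names will vary.

```shell
#!/bin/sh
# Dump link state, pause settings and driver statistics for one port.
snapshot_link() {
    iface="$1"
    echo "=== $iface $(date) ==="
    ethtool "$iface"       # speed, duplex, autoneg, link detected
    ethtool -a "$iface"    # pause (flow control) settings
    ethtool -S "$iface"    # driver/switch counters, names are driver-specific
}

# Example:
# snapshot_link lan1 > /tmp/lan1-good.txt    # right after a reboot
# snapshot_link lan1 > /tmp/lan1-bad.txt     # while the packet loss is happening
```

A changed speed/duplex line or a link-down/up in between would point towards autoneg; unchanged link state with growing pause or drop counters would point elsewhere.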

Afaik it’s not (only) related to SFP. I have an autonegotiation problem on lan1-lan4 and don’t have 2g5. SFP works perfectly fine on my side.

Maybe there’s some power saving on the client PC negotiating a lower link speed, or EEE. I’ll have a look.

Hmm, but autoneg handling for SFP is afaik in the MAC/PCS driver, and for the RJ45 ports in the DSA driver (mt7530). Strange that two drivers would then have the same issue.

Not sure both have EEE settings… I saw a patchset from arinc adding it to the DSA driver, but imho it is not yet merged.
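Whether a given port exposes EEE at all can be checked from userspace. The sketch below assumes nothing about the driver: if the EEE ops are not implemented, `ethtool` simply returns an error for that port and the loop carries on. `eee_check` is a made-up helper name.

```shell
#!/bin/sh
# Show the EEE state of each given port; with "off" as first argument,
# also try to disable EEE on it.
eee_check() {
    action="$1"; shift
    for iface in "$@"; do
        echo "--- $iface ---"
        ethtool --show-eee "$iface" || echo "$iface: EEE query not supported"
        if [ "$action" = "off" ]; then
            ethtool --set-eee "$iface" eee off || echo "$iface: could not change EEE"
        fi
    done
}

# Example:
# eee_check show lan1 lan2 lan3 lan4
# eee_check off lan1
```

If the query fails on the LAN ports, EEE on the router side can probably be ruled out, and the client PC's settings are the next thing to look at.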

To be honest, I never truly believed it could be involved until I looked at it just now… It sounds like some TX ring buffer not functioning properly.