BPI-R2 Pro stmmac CoE/DSA Checksum Offloading Issue [Expanded implications to other boards]

Hello @frank-w,

I upgraded my kernel to 6.6-r2pro to pull in the BayLibre patches for the crypto extension. After that I got only 0.6 Mbps download speed on both eth0 and eth1. I plugged in a PCIe NIC and the result is the same.

I initially thought this had to do with the hw offload driver, so I removed the kernel modules and re-tested, but the problem is still there.

So this points to something in the net stack of kernel 6.6-r2pro?

Thanks.

I guess the problem is at the other end of the network cable… I have not seen such issues with 6.6. By the way, you can use the 6.6-main tree and change board in build.conf to “bpi-r2pro”.

Also check with ethtool what the advertised/applied speed/duplex settings are. You can also connect to another device and run iperf3 over this link before connecting to your internet modem/router.

If you compile the kernel without modules and you still have the modules on your card in the modules dir with the same kernel name, it may still load the module.

I think your mt7530 DSA config is outdated. It has changed to NET_DSA_MT7530_MDIO and NET_DSA_MT7530_MMIO, see "net: dsa: mt7530: introduce separate MDIO driver" (torvalds/linux@cb675af on GitHub).
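For reference, the link check suggested above could look roughly like this. It is only a sketch: the interface name eth1 and the address 192.168.1.2 are placeholders, and link_summary is a helper name made up for this example.

```shell
# Keep only the negotiated speed/duplex/autoneg lines from `ethtool <iface>`
# output read on stdin. link_summary is a hypothetical helper name.
link_summary() {
    grep -E '^[[:space:]]*(Speed|Duplex|Auto-negotiation):'
}

# Usage on the board (eth1 is an assumed interface name):
#   ethtool eth1 | link_summary
#
# Throughput test against another device on the same cable/switch
# (192.168.1.2 is a placeholder address):
#   iperf3 -s                # on the other device
#   iperf3 -c 192.168.1.2    # on the board
```

If the speed/duplex lines look sane but iperf3 still crawls, the link negotiation itself is probably not the culprit.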

When I run Frank’s ./build.sh config, then save, NET_DSA_MT7530_MDIO and NET_DSA_MT7530_MMIO are automatically selected. I unchecked MMIO, but it does not help with my problem.

Frank, I am still using your Ubuntu image and went from 5.15 to 6.1 (no problem), then to 6.4-6.6 (problems). The issue is now isolated to the DSA user ports; the onboard eth0 and external NICs are fine.

I can ping, which means ARP is fine, but SSH and iperf3 do not work. Wireshark reports a bunch of TCP retransmissions. I apt-removed iptables and nftables, although there are no kernel modules for these anyway, but the problem persists.

tcpdump on the Pi can see the packets, but I believe the applications themselves never receive them.

The issue is further isolated to eth1, which is attached to the DSA switch, and to TCP connections only. DHCP, mDNS, and ncat over UDP are all fine. I decided to check whether this is a TCP checksum offloading issue and IT IS!!!

ethtool -K eth1 rx 0 tx 0

This works around the issue for now, albeit at the cost of increased CPU usage.

Now, who should I contact to get this fixed? I know the MT7531 driver changed between kernel 6.1 and 6.6, but I do not know whether that driver needs amending or some code on the RK chip side.
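To confirm that the workaround above actually took effect, something like the following could be used. This is a sketch under assumptions: eth1 is the interface in question, and csum_state is a helper name invented here. It simply filters the feature list that `ethtool -k` prints.

```shell
# Print the top-level rx/tx checksum offload state from `ethtool -k <iface>`
# output read on stdin. csum_state is a hypothetical helper name.
csum_state() {
    awk -F': *' '/^(rx|tx)-checksumming:/ { print $1 "=" $2 }'
}

# Usage on the board (eth1 is an assumed interface name):
#   ethtool -k eth1 | csum_state          # state before the change
#   ethtool -K eth1 rx off tx off         # disable checksum offload
#   ethtool -k eth1 | csum_state          # both lines should now report off
```

Note that settings made with ethtool -K are lost on reboot, so the command has to be re-applied (for example from a boot script) until the kernel side is fixed.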

Thanks.

I vaguely remember that there was something with offload, but I guess this cannot be fixed in Linux.

I’m not sure whether offloading was enabled on the switch or the SoC side… but it looks like it is not working well.

But with the 6.1 kernel I don’t need to turn off tx/rx checksum offload, so this points to some change made in the MediaTek driver or the stmmac driver?

https://lore.kernel.org/netdev/[email protected]/#r

What timing! A similar issue was reported on Dec 15, 2023. The DSA’s framers and developers have already been summoned…

"A bisection on the transmitting side’s kernel showed that the commit that caused the bug was:

6b2c6e4a938f (net: stmmac: propagate feature flags to vlan, 2023-04-17)"

Nice, just try reverting the commit or patching out the first 3 lines pointed out there.
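A rough sketch of the revert, assuming the kernel source is a git checkout whose history contains the commit named in the bisection. revert_if_present is a helper name made up for this example, and on a newer tree the revert may not apply cleanly and might need manual conflict resolution.

```shell
# Revert one commit in the current git checkout, but only if that commit is
# actually part of the checked-out history. revert_if_present is a
# hypothetical helper name.
revert_if_present() {
    commit=$1
    if git merge-base --is-ancestor "$commit" HEAD 2>/dev/null; then
        git revert --no-edit "$commit"
    else
        echo "commit $commit is not in this tree" >&2
        return 1
    fi
}

# Usage from the root of the kernel source tree:
#   revert_if_present 6b2c6e4a938f
# then rebuild and install the kernel as usual and re-test with checksum
# offload left enabled.
```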