Strange packet loss


#1

Just wanted to toss this out there in case anyone else experiences it. This is an intermittent issue that so far has been really hard to nail down because there are not really any logs or errors being generated that are helpful.

Once in a while, after a cold boot from power-on, network performance is just awful on the LAN ports. I am running Frank’s 2.4.16 kernel on Slackware ARM current, for what that is worth.

If you run tcpdump on a system communicating with the BananaPi-R2, you see tons and tons of TCP retransmissions. I have ruled out all the other network hardware by taking it out of the picture and cabling directly to the port on the Pi: same behavior as with a switch between it and the rest of the network. It’s like packets are just disappearing…
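As a rough complement to watching tcpdump, the retransmissions can also be counted from the kernel's own TCP statistics on a Linux peer. This is only a sketch: `retrans_segs` is a hypothetical helper name, and it assumes the peer exposes the standard `/proc/net/snmp` file with a `RetransSegs` column. It counts segments retransmitted by the host it runs on, not by the R2.

```shell
#!/bin/sh
# Hypothetical helper: print the cumulative TCP RetransSegs counter from a
# file in /proc/net/snmp format (defaults to the live file on Linux).
retrans_segs() {
    awk '
        # First "Tcp:" line is the header: find the RetransSegs column.
        $1 == "Tcp:" && !seen {
            for (i = 2; i <= NF; i++) if ($i == "RetransSegs") col = i
            seen = 1
            next
        }
        # Second "Tcp:" line holds the values.
        $1 == "Tcp:" && seen && col { print $col; exit }
    ' "${1:-/proc/net/snmp}"
}

# Example use on the peer (not run here):
#   before=$(retrans_segs)
#   ... copy a file to or from the R2 ...
#   after=$(retrans_segs)
#   echo "retransmitted segments: $((after - before))"
```

On a good boot the difference should stay near zero for a short LAN transfer; a large jump would match the behavior described above.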

There are no errors reported by ifconfig on any interface, nor underruns. There is nothing in syslog to indicate anything is wrong, and nothing in dmesg that differs from a working boot. All daemons start normally, etc. The only thing logged is NFS errors: if an NFS share is mounted, you can see log events like “Sent X only acknowledged Y giving up.”

The problem can be detected immediately: just run something like dmesg via SSH. If the text takes time to scroll, the issue is occurring; normally the buffer is written out instantly. No performance issues can be observed on the physical console, and CPU utilization is normal.
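The dmesg-over-SSH check can be turned into a crude number by timing the command from another machine. A minimal sketch, assuming a POSIX shell with `date +%s`; `elapsed_s` is a hypothetical helper and `user@bpi-r2` is a placeholder for however you reach the board.

```shell
#!/bin/sh
# Hypothetical helper: run a command, discard its output, and print how many
# whole seconds it took (one-second resolution).
elapsed_s() {
    es_start=$(date +%s)
    "$@" >/dev/null 2>&1
    es_end=$(date +%s)
    echo $((es_end - es_start))
}

# Example use from a peer on the LAN (not run here; host name is a placeholder):
#   elapsed_s ssh user@bpi-r2 dmesg
```

On a healthy boot this should print 0 or 1; several seconds for a plain dmesg dump would suggest the bad state.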

This has never occurred after a “reboot”, only on a cold start. A simple reboot has corrected the problem every time I have tried it, without doing anything else. This has happened four times now in, I think, 25 power cycles. I only started making notes after the second time. I have left the device running for 9 days following a successful boot and there were no issues, so if it starts up okay it’s stable. Given the latter, this isn’t a huge issue for me, but I wanted to put this out there in case anyone else sees it.


(Frank W.) #2

Are the retransmitted packets only from NFS, or other types as well? Which kernel do you use exactly, and which image (I don’t know of a Slackware image)? Are there damaged packets incoming/outgoing?


(rainfall83) #3

Maybe running ethtool -S on eth0, eth1, lan0 to lan3, and wan can show more interesting information about the strange behavior.
For example:
ethtool -S eth0
ethtool -S lan0
frank-w’s thread below shows example output of the command.
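To avoid reading every counter by hand, the ethtool -S output for all of those interfaces could be filtered down to the non-zero counters that look like trouble. This is just a sketch: `nonzero_errors` and `scan_ifaces` are hypothetical helper names, and since counter names are driver-specific, the name pattern is a guess that may need adjusting for the R2's driver.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -S` output on stdin and keep only
# counters whose name hints at a problem and whose value is non-zero.
nonzero_errors() {
    awk -F: '
        tolower($1) ~ /err|drop|discard|crc|collision|pause/ {
            v = $2
            gsub(/[ \t]/, "", v)        # strip whitespace around the value
            if (v + 0 != 0) {
                sub(/^[ \t]+/, "")      # drop the leading indentation
                print
            }
        }'
}

# Hypothetical helper: run the filter over each interface named in the post.
scan_ifaces() {
    for ifc in eth0 eth1 lan0 lan1 lan2 lan3 wan; do
        echo "== $ifc"
        ethtool -S "$ifc" | nonzero_errors
    done
}

# Example use on the R2 (not run here):
#   scan_ifaces
```

Running it once after a bad cold boot and once after a good one, then comparing the two outputs, may show which counter moves.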


#4

It’s not just NFS; it’s everything as far as I can tell. I was seeing the loss with other daemons like ssh as well. As I stated, it’s intermittent. I was running tcpdump on the remote side and not seeing any broken packets, but I don’t recall whether I did that without the switch in the picture. It’s likely the switch simply did not forward broken frames, if they were present.

I am testing a DVB daemon here that we have ported to ARM. It’s long running, and I am currently profiling it to make sure, one, it’s stable and, two, it’s not leaking memory, so I have not been restarting the R2 lately. Unfortunately I have to test on the real hardware because I can’t pass the USB DVB sticks to qemu emulating ARM. I’ll look at this more when I am able to restart the R2 again.