Strange packet loss

These lines come from the stock kernel, not from any patch applied by me

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/tree/drivers/net/ethernet/mediatek/mtk_eth_soc.c?h=linux-4.14.y#n1954

The lines were added by a patch nearly 1.5 years ago: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/commit/drivers/net/ethernet/mediatek/mtk_eth_soc.c?h=linux-4.14.y&id=656e705243fd0c2864b89634ea16ed444ef64dc6

The link quality between mt7623 gmac2 and mt7530 port5 is not good and causes packet drops by gmac2 or mt7530 port5 because of CRC errors. After changing driving to 0x600, the issue was gone. I suggest you run an iperf server on the DUT and test UDP dut->wan and lan->dut first, to check that peak throughput can reach 9xx Mbps. If both of them are OK, I think the low uplink throughput is caused by the kernel netfilter configuration, thanks.
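The throughput check suggested in the quote could be sketched roughly as below. This is only an illustration under assumptions: it uses iperf3 (the quote just says "iperf"), the address 192.168.0.1 is a placeholder for whichever DUT address the test host can reach, and with DRY_RUN=1 the functions only print the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: DUT_IP is a placeholder and iperf3 (not iperf2) is an assumption.
DUT_IP="${DUT_IP:-192.168.0.1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, anything else = execute it

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# On the DUT: start the iperf3 server.
server_cmd() { run iperf3 -s; }

# On a LAN host: UDP towards the DUT (lan->dut), 1 Gbit/s offered load for 30 s.
lan_to_dut_cmd() { run iperf3 -c "$DUT_IP" -u -b 1G -t 30; }

# On a WAN-side host: -R reverses the direction so that the DUT sends (dut->wan).
dut_to_wan_cmd() { run iperf3 -c "$DUT_IP" -u -b 1G -t 30 -R; }
```

Calling `lan_to_dut_cmd` on a LAN client and `dut_to_wan_cmd` on a WAN-side host (with DUT_IP set to the address reachable from that side) covers the two directions named in the quote.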

Can you provide a patch based on my repo (4.14-main)?

Btw. @gwalton first tried with 4.16 (right?)… there is no second gmac there (only cpu-port0 and port 6 of mt7530). So it seems it’s another problem

That is correct - I first encountered this with 4.16

So 2nd gmac, hwnat and hwqos are not the problem. Have you tried with 4.18? Maybe the problem is fixed there and needs to be backported. Which image do you use? Is the problem persistent on other images? If you are using Ubuntu 18.04, try my debian-image

Can you explain your testcase?

  • which port has which device connected
  • which network-settings on r2
  • commands on clients to generate traffic

I can confirm a somewhat similar issue on frank’s 4.14 kernel.
So far I have observed that I can’t use “git clone”; I always get this error:

fatal: unable to access ‘…’: Empty reply from server

Also the cloudflared-dns daemon is not working: it can’t get headers or something from 1.1.1.1 and 1.0.0.1

Regards.

Can you try a newer kernel like 4.19 or 5.3 (maybe phylink)?

Unfortunately with kernel 4.19 I can’t get an IP address from my ISP.
The network configuration and the MAC addresses stay the same, but dhclient gets no response on the WAN side, so I can’t test the 4.19 kernel.
I will try 5.3-phylink-2.5 and let you know.

Can you try manually on 4.19? Put gmac2 up, put wan up and run dhclient

Also 4.19-without2gmac can be tested to find out whether the problem is in mainline or caused by my patches

any info in dmesg/syslog?

how do you put gmacs up? i guess it’s a timing-problem (dhclient running before gmac is up)
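If it is a timing problem, one way to rule it out is to wait for the link before starting the DHCP client. The following is a minimal sketch, not something taken from the images discussed here: it polls the standard sysfs `carrier` attribute once per second, and the `SYSFS_NET` variable exists only so the function can be tried without real hardware.

```shell
#!/bin/sh
# Sketch: wait until an interface reports carrier before starting dhclient.
# SYSFS_NET is overridable only for experimenting without real interfaces.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

wait_for_carrier() {
    iface="$1"
    tries="${2:-10}"          # number of one-second attempts
    while [ "$tries" -gt 0 ]; do
        # <iface>/carrier reads 1 when the link is up; the read fails
        # while the interface is still administratively down.
        if [ "$(cat "$SYSFS_NET/$iface/carrier" 2>/dev/null)" = "1" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

A possible use would be `wait_for_carrier wan 15 && dhclient wan`, so that dhclient only starts once the port actually has link.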

So I tested with kernel 4.19. I must set up a VLAN because my ISP also offers IPTV, and internet is only on VLAN id 300.

4.19.72-bpi-r2-main

ip link set dev eth1 up:
		ip a:
		eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
		link/ether 02:03:03:03:03:03 brd ff:ff:ff:ff:ff:ff
		inet6 fe80::3:3ff:fe03:303/64 scope link 
		valid_lft forever preferred_lft forever

ip link add link eth1 name wan.300 type vlan id 300:
		journalctl:
		systemd-udevd[743]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
		systemd-udevd[743]: Using default interface naming scheme 'v243'.
		ip a:
		wan.300@eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
		link/ether 02:03:03:03:03:03 brd ff:ff:ff:ff:ff:ff

ip link set dev wan.300 up:
		ip a:
		wan.300@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
		link/ether 02:03:03:03:03:03 brd ff:ff:ff:ff:ff:ff
		inet6 fe80::3:3ff:fe03:303/64 scope link 
		valid_lft forever preferred_lft forever

systemctl start [email protected]:
		journalctl:
		dhclient[811]: Internet Systems Consortium DHCP Client 4.4.1
		dhclient[811]: Internet Systems Consortium DHCP Client 4.4.1
		dhclient[811]: Copyright 2004-2018 Internet Systems Consortium.
		dhclient[811]: All rights reserved.
		dhclient[811]: For info, please visit https://www.isc.org/software/dhcp/
		dhclient[811]: Copyright 2004-2018 Internet Systems Consortium.
		dhclient[811]: All rights reserved.
		dhclient[811]: For info, please visit https://www.isc.org/software/dhcp/
		dhclient[811]: 
		dhclient[811]: Listening on LPF/wan.300/02:03:03:03:03:03
		dhclient[811]: Sending on   LPF/wan.300/02:03:03:03:03:03
		dhclient[811]: Listening on LPF/wan.300/02:03:03:03:03:03
		dhclient[811]: Sending on   LPF/wan.300/02:03:03:03:03:03
		dhclient[811]: Sending on   Socket/fallback
		dhclient[811]: DHCPDISCOVER on wan.300 to 255.255.255.255 port 67 interval 3
		dhclient[811]: Sending on   Socket/fallback
		dhclient[811]: DHCPDISCOVER on wan.300 to 255.255.255.255 port 67 interval 3
		dhclient[811]: DHCPDISCOVER on wan.300 to 255.255.255.255 port 67 interval 3
		dhclient[811]: DHCPDISCOVER on wan.300 to 255.255.255.255 port 67 interval 3
		dhclient[811]: DHCPDISCOVER on wan.300 to 255.255.255.255 port 67 interval 3
		dhclient[811]: DHCPDISCOVER on wan.300 to 255.255.255.255 port 67 interval 3
		dhclient[811]: DHCPDISCOVER on wan.300 to 255.255.255.255 port 67 interval 8
		dhclient[811]: DHCPDISCOVER on wan.300 to 255.255.255.255 port 67 interval 8
		dhclient[811]: DHCPDISCOVER on wan.300 to 255.255.255.255 port 67 interval 15
		dhclient[811]: DHCPDISCOVER on wan.300 to 255.255.255.255 port 67 interval 15
		dhclient[811]: DHCPDISCOVER on wan.300 to 255.255.255.255 port 67 interval 15
		dhclient[811]: DHCPDISCOVER on wan.300 to 255.255.255.255 port 67 interval 15
		dhclient[811]: DHCPDISCOVER on wan.300 to 255.255.255.255 port 67 interval 14
		dhclient[811]: DHCPDISCOVER on wan.300 to 255.255.255.255 port 67 interval 14
		dhclient[811]: No DHCPOFFERS received.
		dhclient[811]: No DHCPOFFERS received.
		dhclient[811]: No working leases in persistent database - sleeping.
		dhclient[811]: No working leases in persistent database - sleeping.

This configuration works on 4.4 and 4.14. The next step is to test without the second gmac.
I will use 4.19-without2ndgmac but I must compile it first. I will let you know.

Regards.

maybe you need to change the mac? my isp needs different mac addresses for each vlan.

the journalctl message is strange:

systemd-udevd[743]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.

i guess duplex/speed is only available for the physical port (wan) and not for the vlan, because the tagging is done only in software, but it is strange that this message appears on a virtual port. have you put wan up before setting the vlan up?

I tried that already and it does not help. On kernel 4.14 the MAC addresses are the same and I get the same errors from systemd-udevd, but I get an IP address, so I really don’t understand the problem. I tried with this order of commands:
eth1 up -> wan up -> vlan 300 -> dhclient
no success :frowning:

The only difference I see is that on the 4.14 kernel I have wan.300@wan but on 4.19 I have wan.300@eth1.

Oh…it should be wan.300@wan…you cannot define vlan on gmacs…something is wrong there
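The part after the “@” in the `ip link` output is the device the vlan is stacked on, so it can be checked mechanically. A small sketch, assuming the one-line format printed by `ip -o link show` (the function name is made up for this example):

```shell
#!/bin/sh
# Sketch: extract the parent device from one line of `ip -o link show` output,
# e.g. "10: wan.140@wan: <BROADCAST,MULTICAST> mtu 1500 ..." gives "wan".
vlan_parent() {
    # Field 2 is "name@parent:"; drop the trailing colon, then keep what follows "@".
    printf '%s\n' "$1" | awk '{
        sub(/:$/, "", $2)
        n = index($2, "@")
        if (n) print substr($2, n + 1)
    }'
}
```

For example, `vlan_parent "$(ip -o link show wan.300)"` should print `wan` when the vlan sits on the dsa-port and `eth1` when it was created on the gmac; for an interface without a parent it prints nothing.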

if i add a vlan in kernel 4.19 it looks right (as in 4.14)

ip link add link wan name wan.140 type vlan id 140
root@bpi-r2:~# ip link show wan.140
10: wan.140@wan: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 02:03:03:03:03:03 brd ff:ff:ff:ff:ff:ff

you have created it the wrong way:

ip link add link eth1 name wan.300 type vlan id 300 #see eth1

i wonder why this is working in 4.14…it’s basically wrong…you create a vlan on the gmac instead of the dsa-port
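Put together, the order discussed in this thread (gmac up, dsa-port up, vlan on the dsa-port) could look like the sketch below. The interface names eth1 and wan and the vlan id 300 come from the posts above; the function name and the DRY_RUN switch are only there for illustration, and this only addresses where the vlan is attached, not whether the isp then answers.

```shell
#!/bin/sh
# Sketch: bring up the gmac and the dsa-port, then stack the vlan on the dsa-port.
# With DRY_RUN=1 the commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

setup_wan_vlan() {
    gmac="$1"; port="$2"; vid="$3"
    run ip link set dev "$gmac" up
    run ip link set dev "$port" up
    # The vlan is created on the dsa-port (e.g. wan), not on the gmac (e.g. eth1).
    run ip link add link "$port" name "${port}.${vid}" type vlan id "$vid"
    run ip link set dev "${port}.${vid}" up
}
```

With the names from this thread the call would be `setup_wan_vlan eth1 wan 300`, followed by starting dhclient on wan.300.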

I just checked 5.3.0-rc1-bpi-r2-phylink-2.5 and it works fine, but I had some problems (I can’t bridge the lan interfaces: kernel panic when a cable is plugged in)

I can use git and cloudflared now. As I could not use cloudflared before, on 4.14, I switched to dnscrypt-proxy. On 4.14 I had a lot of timeouts when the daemon was looking for servers. Now on 5.3 I have no timeouts at all and all servers are available.
I’m going back to 4.19 to set up the vlan as you mentioned above.

rc1 has the bridging-error that was fixed in rc4 afaik…

Unfortunately it is not working.
I have wan.300@wan but still no IP address from my ISP.

I’ll try patched 5.3.
fatal: bad object 17494d3884cd0c5cf8367ae6e8219e00fa53983c

:frowning:

“net: dsa: Check existence of .port_mdb_add callback before calling it” is now 58799865be84e2a895dab72de0e1b996ed943f22

have now updated 5.3 to release (5.3.0) and renamed branch to 5.3-main

I used 5.3-main and now have the same problem as on 4.19: my ISP is not giving me an IP address.
Changing the MAC address does not help.

you can try 5.4-rc now, it contains phylink-patches and will be next lts

i expect the problems are caused by a wrong mt7530 setup in the older mainline driver

Still the same :frowning:
I get an IP address from my ISP without problems only on 4.4 and 4.14.