Issues with NAT for LAN on 4.14.80 kernel

The iptables rules look good so far. I don’t use a bridge here in my setup.

Can you try removing a lan-port from the bridge and test with that?

Tried that now; unfortunately it does not change anything. I liked that the 4 lan ports were a single interface in my previous 4.4 install. Is there a way to use eth0 or eth1 (whichever one carries the 4 ports)? Maybe that would help the kernel speed things up.

imho this is not possible: if you drop the dsa-driver you have no defined connection between eth0/1 and wan/lan. i don’t know exactly how it’s done in 4.4…i guess there is a hardcoded connection

do you use dhcp on your bridge? did all clients get a valid config? maybe the problem is there

you can also run tcpdump on lanx+wan to see what’s happening there
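for example something like this in two shells, so you can see whether the forwarded packets actually go out on wan (the interface names and the icmp filter are just an example):

tcpdump -ni lan0 icmp
tcpdump -ni wan icmp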

i expect you have forwarding activated for all interfaces…

echo 1 > /proc/sys/net/ipv4/ip_forward

per-interface entries are also possible…
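e.g. via the standard per-interface sysctl paths (the interface names are just the ones from your setup):

echo 1 > /proc/sys/net/ipv4/conf/br0/forwarding
echo 1 > /proc/sys/net/ipv4/conf/wan/forwarding
# persistent across reboots: net.ipv4.ip_forward=1 in /etc/sysctl.conf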

can you ping lanx from your client? can you also ping the wan-address from it?

hmm…btw, you need “auto wan” before “allow-hotplug wan” to bring it up on bootup…

and please comment out the lines for lan0-lan3 if you’re using a bridge

and do a clean reboot afterwards

i hope it fixes that strange LOWERLAYERDOWN on lanx while br0 is up (or is there nothing physically connected?)

Yeah, lan1-3 are just not connected, that’s why it says LOWERLAYERDOWN.

DHCP is on for br0, but I used a manual setup for lan0 on both ends. Ping works to the gateway and also to 1.1.1.1. The ping rtt is about 10 ms higher than via ap0, which is also strange. The Pi is not doing anything else right now besides being a router, so there shouldn’t be any CPU bottlenecks.

lan0 is also LOWERLAYERDOWN in your output…

have you tried removing the lan0-lan3 config and adding “auto wan”?

so your interfaces should look like this:

auto eth0
iface eth0 inet manual
  pre-up ip link set $IFACE up
  post-down ip link set $IFACE down

#2nd gmac (cpu-port) only in 4.14.53+
auto eth1
iface eth1 inet manual
  pre-up ip link set $IFACE up
  post-down ip link set $IFACE down

#wan-port as dhcp-client
auto wan
allow-hotplug wan
iface wan inet dhcp

auto br0
iface br0 inet static
    address 192.168.0.1
    netmask 255.255.0.0
    bridge_ports lan0 lan1 lan2 lan3
    bridge_fd 5
    bridge_stp no

source-directory /etc/network/interfaces.d

what is the IP 1.1.1.1? i mean, ping it from your client (connected on lan0)

br0: 192.168.0.1
wan: 10.10.61.54

1.1.1.1 is just the Cloudflare DNS server, similar to Google’s 8.8.8.8; I always use it for connectivity checks. Also, wan is not brought up automatically when I add allow-hotplug wan; just auto wan is enough. Removing the lan0-lan3 manual block didn’t change anything either.

In my log lan0 also says LOWERLAYERDOWN because nothing was plugged in at the time I captured the output; I was connected via ap0 since the lan ports don’t work. Even SSH seems to be affected: when I connect to the Pi via SSH, it stops replying whenever I send a command that produces more than a couple of lines of output (’ip a’ for example).

It feels like the lan ports are just not up to speed at all. It could be the same problem that @troumad has here regarding the packet loss.

I did some more testing just now, and I have set up my interfaces differently:

  • I removed ap0 from the bridge
  • ap0 is assigned its own subnet, 192.168.2.1/24
  • ap0 has its own dhcp range as well

I did a tcpdump on lan0 again and noticed when I ctrl-c’ed that it said this:

^C
1372 packets captured
1701 packets received by filter
329 packets dropped by kernel
1 packet dropped by interface

Looks quite suspicious to me, tbh. Why would the kernel drop any packets, though?
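(As far as I know, “dropped by kernel” here just means tcpdump’s capture buffer overflowed before the packets could be read, so by itself it doesn’t prove a driver problem; a larger buffer and writing raw packets instead of decoding them usually brings that counter down, e.g.:)

tcpdump -ni lan0 -B 8192 -w /tmp/lan0.pcap   # -B sets the capture buffer size in KiB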

Can you check whether hwqos/hwnat is enabled on your system and try to disable it?
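something like this (the exact module names depend on the build, the patterns grepped for here are just guesses):

lsmod | grep -iE 'hnat|qos'    # check whether a hw-nat / hw-qos module is loaded
# rmmod <modulename>           # unload it if one shows up (name taken from the lsmod output)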

Tried with and without the hwnat module. No difference in behavior.

if you can recompile the kernel and install it, you can drop HWQOS. i got a report that it causes some problems, but i can’t reproduce them

run “./build.sh defconfig” and comment out this line (at the end) with a #:

CONFIG_NET_MEDIATEK_HW_QOS=m

then importconfig and build
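so roughly this sequence (a sketch only; i’m assuming importconfig and build are called via build.sh as described above, adjust to what the script actually offers):

./build.sh defconfig      # opens the defconfig, comment out the HW_QOS line at the end with #
./build.sh importconfig   # re-import the changed defconfig into .config
./build.sh build          # then build the kernel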

otherwise i guess we need help from an expert. imho hwqos is part of hwnat and should not be active if hwnat is dropped

edit: i found modifications in the base ethernet driver, so it can break traffic even if hwnat is unloaded

./drivers/net/ethernet/mediatek/mtk_eth_soc.c:27:#if defined(CONFIG_NET_MEDIATEK_HW_QOS)
./drivers/net/ethernet/mediatek/mtk_eth_soc.c:739:#ifdef CONFIG_NET_MEDIATEK_HW_QOS
./drivers/net/ethernet/mediatek/mtk_eth_soc.c:2136:#if defined(CONFIG_NET_MEDIATEK_HW_QOS)
./drivers/net/ethernet/mediatek/mtk_eth_soc.c:2143:#if defined(CONFIG_NET_MEDIATEK_HW_QOS)

Boy, the kernel has always been one of those things I avoided touching because I don’t have a clue what I’m doing there. But I guess I’ll have to give it a shot. While I’m at it: should I use your 4.19 kernel or just the 4.14 kernel again?

try 4.19 first, hwnat/hwqos is not supported there and maybe the bug is fixed

Maybe a dumb question, but is it enough to just install the debian package from your github releases page?

you can try it…i have not tested this much

I have loaded the new 4.19.11 kernel now, but the issue remains.

For anyone interested: I installed @frank-w’s deb package for kernel 4.19.11-testing from his Github and then changed /boot/bananapi/bpi-r2/linux/uEnv.txt so it loads the new uImage_4.19.11-main file, by changing the boot1 entry which is loaded by default.

Is there a build of the 4.4 kernel with a working wifi AP with WPA2 somewhere? Then I would roll back to that one for now.

you can find a compiled 4.4 on my gdrive or compile it from my repo

https://drive.google.com/open?id=1kkFyxeHoskszI7CpSsL6Wi6ROjDrBLOB

note that you need to assign the IP address to eth0 instead of lanx

Btw, boot1 should use the variable “kernel”, which is easier to change than boot1 itself. imho the deb package should do this automatically…
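i.e. roughly like this in uEnv.txt (just a sketch of the idea; the real boot1 command on the board is longer, and the device/partition and load address here are placeholders):

kernel=uImage_4.19.11-main
boot1=fatload mmc 1:1 ${loadaddr} ${kernel}; bootm ${loadaddr}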

Do you see the same strange packet drops with 4.19 in tcpdump?

mhm…the lan-ports need to be set to up after the bridge is created…before that there is no traffic and the port+bridge are down. now do some ping-tests to the br0-ip…or do you only have the problem with forwarded traffic?

True, that makes more sense; I will change it back later.

No, it’s not just the forwarded traffic that has these issues. Direct pings over the wire show the same packet loss; about 10% of pings to the BPI get lost. I don’t think it’s an issue with my network setup. The interfaces are in the right states when they have to be, no issues there.

I downgraded to 4.4 now and had a look there. hostapd does its thing nicely, but iptables complains:

 modprobe: ERROR: ../libkmod/libkmod-module.c:192 kmod_module_parse_depline() ctx=0x441008 path=/lib/modules/4.4.160-BPI-R2-Kernel/kernel/net/ipv4/netfilter/ip_tables.ko error=No such file or directory

is it not compiled in your build?
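you can quickly check whether the module tree is complete, e.g.:

ls /lib/modules/$(uname -r)/kernel/net/ipv4/netfilter/   # ip_tables.ko should show up here
modinfo ip_tables                                         # also fails if the module tree is incomplete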

--- 192.168.0.18 ping statistics ---
1001 packets transmitted, 1001 received, 0% packet loss, time 1023906ms
rtt min/avg/max/mdev = 0.285/0.449/0.671/0.060 ms

root@bpi-r2:~# tcpdump -i br0 >/dev/null
[  347.217490] device br0 entered promiscuous mode
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C2649 packets captured
[ 1317.278048] device br0 left promiscuous mode
2650 packets received by filter
0 packets dropped by kernel
root@bpi-r2:~# uname -r
4.19.11-bpi-r2-main
root@bpi-r2:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 32:c1:83:3a:48:5a brd ff:ff:ff:ff:ff:ff
3: wan@eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
    link/ether 32:c1:83:3a:48:5a brd ff:ff:ff:ff:ff:ff
4: lan0@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 32:c1:83:3a:48:5a brd ff:ff:ff:ff:ff:ff
5: lan1@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP group default qlen 1000
    link/ether 32:c1:83:3a:48:5a brd ff:ff:ff:ff:ff:ff
6: lan2@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop master br0 state DOWN group default qlen 1000
    link/ether 32:c1:83:3a:48:5a brd ff:ff:ff:ff:ff:ff
7: lan3@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 32:c1:83:3a:48:5a brd ff:ff:ff:ff:ff:ff
8: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 32:c1:83:3a:48:5a brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.18/24 brd 192.168.0.255 scope global br0
       valid_lft forever preferred_lft forever
9: wan.60@wan: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN group default qlen 1000
    link/ether 32:c1:83:3a:48:5a brd ff:ff:ff:ff:ff:ff
    inet 192.168.60.1/24 brd 192.168.60.255 scope global wan.60
       valid_lft forever preferred_lft forever
10: lxcbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 32:b6:a2:b4:7e:cc brd ff:ff:ff:ff:ff:ff
    inet 10.0.3.1/24 brd 10.0.3.255 scope global lxcbr0
       valid_lft forever preferred_lft forever

direct pings do not have any issues here

regarding iptables…imho i used the config from the official repo…only added things. since getting 4.14 running i do not use 4.4 anymore…i only use it to compare when there are problems

have you unpacked the lib/modules folder from BPI-ROOT? the config contains iptables as a module, so it should be included…
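i.e. roughly like this (the archive name is only an example; it contains the BPI-BOOT and BPI-ROOT trees):

tar -xzf bpi-r2_kernel.tar.gz                  # example archive name
cp -r BPI-ROOT/lib/modules/* /lib/modules/     # module tree goes onto the root filesystem
depmod -a                                      # rebuild the module dependency files afterwards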

currently i’m not getting the lan-port up on system boot…i always need to enable it manually after boot

https://wiki.debian.org/BridgeNetworkConnections says that this line for each port is enough before the bridge:

iface eth0 inet manual

so i configured it like this:

iface lan1 inet manual
iface lan2 inet manual

auto br0
iface br0 inet static
    address 192.168.0.18
    netmask 255.255.255.0
    bridge_ports lan1 lan2
    bridge_fd 5
    bridge_stp no

Okay, I went back to kernel 4.14 now because I couldn’t get 4.4 to work and I want 4.14 or newer anyway. And yes, I unpacked both folders to their respective directories.

I checked the ping stats again and I am not getting any kernel-dropped packets on the direct route anymore either, which is strange.

I have to go now and probably won’t be back for a week or so. Maybe something will come up in the meantime…

bringing the bridge up seems to need this:

auto lan1
allow-hotplug lan1
iface lan1 inet manual
   pre-up   ifconfig $IFACE up
   pre-down ifconfig $IFACE down

auto lan2
allow-hotplug lan2
iface lan2 inet manual
   pre-up   ifconfig $IFACE up
   pre-down ifconfig $IFACE down

auto br0
iface br0 inet static
    address 192.168.0.18
    netmask 255.255.255.0
    bridge_ports lan1 lan2
    bridge_fd 5
    bridge_stp no

https://www.cyberciti.biz/faq/debian-network-interfaces-bridge-eth0-eth1-eth2/ but you also need the “auto …” line

your topic is named “issues with nat”, so i guess the problem only exists if NAT (masquerading) is used…
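for reference, a typical masquerading setup looks roughly like this (just a sketch with the interface names from your config; your actual rules may differ):

echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -o wan -j MASQUERADE
iptables -A FORWARD -i br0 -o wan -j ACCEPT
iptables -A FORWARD -i wan -o br0 -m state --state ESTABLISHED,RELATED -j ACCEPT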

ping over nat (my main-router) to cloudflare

--- 1.1.1.1 ping statistics ---
2108 packets transmitted, 2094 received, 0% packet loss, time 2110253ms
rtt min/avg/max/mdev = 35.064/44.800/1070.474/65.431 ms, pipe 2

Not much, but 14 packets lost out of 2100

tcpdump on main-router:

[18:03] frank@bpi-r2-e:/var/lib/tftp$ sudo tcpdump -v icmp -i lan0 >/dev/null
tcpdump: listening on lan0, link-type EN10MB (Ethernet), capture size 262144 bytes
^C4022 packets captured
4022 packets received by filter
0 packets dropped by kernel

[18:37] frank@bpi-r2-e:/var/lib/tftp$ uname -r
4.14.78-bpi-r2-main

did a tcpdump on ppp0 (my wan, where masquerading is set up)

[19:04] frank@bpi-r2-e:/var/lib/tftp$ sudo tcpdump -v icmp -i ppp0 >/dev/null
[sudo] Passwort für frank: 
tcpdump: listening on ppp0, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
^C2645 packets captured
2645 packets received by filter
0 packets dropped by kernel

the client says:

--- 1.1.1.1 ping statistics ---
1046 packets transmitted, 1034 received, 1% packet loss, time 1046582ms
rtt min/avg/max/mdev = 36.604/43.582/1066.369/52.799 ms, pipe 2

you can run the following (maybe prefixed with watch to see the counters change)

netstat -i

to see where packets are dropped. i do indeed have some for lan0, which is my main lan interface (but there is a 25 m cable between the r2 and the switch, which may cause errors)
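e.g. like this (the interval is just an example):

watch -n1 netstat -i    # the RX-DRP / TX-DRP columns are the per-interface drop counters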

pinged the google dns overnight and this is the result:

--- 8.8.8.8 ping statistics ---
46758 packets transmitted, 46742 received, +4 errors, 0% packet loss, time 46826538ms
rtt min/avg/max/mdev = 10.029/22.975/1042.775/87.433 ms, pipe 2

16 packets lost out of ~46700, and these are for sure caused by my 24h connection reset (done by a cronjob)