Problem with NAT/ip_forward

Hi. In my company we use a lot of BPI-R2 boards as routers. On some of them we have a problem with packet forwarding: packets are lost or the speed is very low. I have checked this on two different OSes: Raspbian based on Debian Stretch and SlackARM. For example, here is a download on a desktop connected behind the BPI-R2 router:

Package: iptables-1.8.3-arm-1.txz
                        Downloading ftp://ftp.arm.slackware.com/slackwarearm/slackwarearm-current/./slackware/n/iptables-1.8.3-arm-1.txz...
--2019-07-16 14:35:27--  ftp://ftp.arm.slackware.com/slackwarearm/slackwarearm-current/slackware/n/iptables-1.8.3-arm-1.txz
           => ‘//var/cache/packages/./slackware/n/iptables-1.8.3-arm-1.txz’
Resolving ftp.arm.slackware.com (ftp.arm.slackware.com)... 91.109.244.240
Connecting to ftp.arm.slackware.com (ftp.arm.slackware.com)|91.109.244.240|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /slackwarearm/slackwarearm-current/slackware/n ... done.
==> SIZE iptables-1.8.3-arm-1.txz ... 345752
==> PASV ... done.    ==> RETR iptables-1.8.3-arm-1.txz ... done.
Length: 345752 (338K) (unauthoritative)

iptables-1.8.3-arm-   0%[                    ]   1.41K  --.-KB/s    eta 37h 51m

2019-07-16 14:50:43 (1.60 B/s) - Data connection: Connection timed out;

How can I find the cause of this problem? Or is it a hardware problem that I cannot do anything about, other than replacing the BPI-R2?

Are you sure the configuration is right on r2 (forwarding enabled and not blocked by firewall, nat set up right, …)? Which kernel are you using?

You can try to sniff the packets with tcpdump on the incoming and outgoing interfaces. Is traffic TO r2 affected?

Without knowing your config and network topology it is impossible to help you…
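A minimal sketch of the first check asked for above, assuming a Linux system where the forwarding flag is exposed at /proc/sys/net/ipv4/ip_forward. The function name and its optional path argument are made up for illustration; the argument only exists so the function can be tried against a test file.

```shell
#!/bin/sh
# Report whether IPv4 forwarding is enabled.
# $1 (optional, hypothetical): path of the flag file; defaults to the kernel's.
forwarding_state() {
    flag_file="${1:-/proc/sys/net/ipv4/ip_forward}"
    if [ ! -r "$flag_file" ]; then
        echo "unknown"
        return 2
    fi
    if [ "$(cat "$flag_file")" = "1" ]; then
        echo "enabled"
    else
        echo "disabled"
        return 1
    fi
}

# Print the state of the running system; never abort the script on "disabled".
forwarding_state || true
```

For the firewall side, `iptables -t nat -S` and `iptables -S FORWARD` (as root) list the NAT and forwarding rules. Running `tcpdump -n -i <interface> icmp` once on the incoming and once on the outgoing interface shows whether a ping that enters the router also leaves it.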

Hi @frank-w Yes, I’m sure that everything that is required is configured correctly and completely :slight_smile:

I tested this on the same BPI-R2 board with kernels 4.16 and 4.19 and it doesn’t work. When I run the same board with the official kernel 4.4.70, NAT starts working without any issues. Interestingly, on some BPI-R2 boards with kernel 4.16 (the same OS and configuration) NAT works OK. Strange :pensive:

The topology of my network is very simple: the 1st BPI-R2 is connected to the local network via its wan port, runs a DHCP server on the lan ports and has NAT set up; the 2nd BPI-R2 is connected directly via its wan port to a lan port of the 1st BPI-R2. That’s all :wink:

4.4 and 4.9+ have completely different network configuration because of the dsa driver. In 4.9+ you bring ethx up and configure the ip on the dsa ports (wan, lanx). 4.4 only has ethx.

I know about these differences and set up NAT as required on each kernel. The problem is that on the same BPI-R2 board NAT works OK with kernel 4.4 but doesn’t work with kernel 4.16/4.19. When we ping from the second BPI-R2 with a packet size bigger than the default, packets are lost: ping -s 12500 10.10.0.1 (10.10.0.1 is the GW in my local network).
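For reference, a ping with -s 12500 cannot fit into a single Ethernet frame, so it exercises IP fragmentation in addition to plain forwarding. The arithmetic can be sketched as follows, assuming IPv4 without options (20-byte header), an 8-byte ICMP header and an MTU of 1500 (the MTU is an assumption; the thread does not state it). The function name is hypothetical.

```shell
#!/bin/sh
# Estimate how many IPv4 fragments a ping with the given -s payload produces.
# $1: ICMP payload size in bytes (the value passed to ping -s)
# $2: link MTU in bytes (optional, assumed to be 1500)
fragments_for_ping() {
    payload=$1
    mtu=${2:-1500}
    # Data carried per fragment: MTU minus the 20-byte IPv4 header,
    # rounded down to a multiple of 8 (fragment offsets count 8-byte units).
    per_frag=$(( (mtu - 20) / 8 * 8 ))
    # The IP payload is the ICMP header (8 bytes) plus the ping payload.
    total=$(( payload + 8 ))
    # Ceiling division.
    echo $(( (total + per_frag - 1) / per_frag ))
}

fragments_for_ping 12500   # prints 9
fragments_for_ping 56      # prints 1 (the default ping size fits in one frame)
```

If the default size works and the large one does not, either the fragments or the full-size frames are getting lost somewhere on the path.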

you can try phylink, maybe this fixes the problems

this is last version (for r2):

default-size packets working?

Yes, the default packet size works well. It is a good way to test for problems with the ethernet port(s). After checking with the official kernel 4.14 https://github.com/BPI-SINOVOIP/BPI-R2-bsp-4.14 I had the same situation: ping with the default packet size is OK, but with a bigger size (>1024) we don’t receive any reply (I’m pinging my local GW, which does not drop ICMP packets bigger than the default, I’m sure of that ;)).
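To narrow down the size at which replies stop, the ping test can be repeated over a list of sizes. This is only a sketch: the function name and the PING override variable are made up for illustration (the override lets the loop be tried with a stub instead of the real ping), and the sizes in the example are arbitrary.

```shell
#!/bin/sh
# Ping a target with several payload sizes and report which sizes get replies.
# $1: target address; remaining arguments: payload sizes for ping -s.
# PING (hypothetical override): command to use instead of the system ping.
ping_sweep() {
    target=$1
    shift
    for size in "$@"; do
        if ${PING:-ping} -c 3 -s "$size" "$target" >/dev/null 2>&1; then
            echo "$size ok"
        else
            echo "$size FAILED"
        fi
    done
}

# Example call (not executed here), using the gateway from this thread:
#   ping_sweep 10.10.0.1 56 512 1024 1100 1400 1472 2000 12500
```

Running the same sweep once against the router itself and once through it separates a problem on the lan port from a problem in forwarding.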

@frank-w As I have a few boards with this issue (out of the many we use), I’m sure this is a problem with the hardware (the BPI-R2 board), not with the kernel.

if it works with 4.4 it can’t be a hardware issue…

Imho official 4.14 was based on my repo in early phase

Maybe you need to enable support for jumbo-frames in kernel/sysfs
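Whether jumbo frames are in play can be judged from the MTU of each interface, which Linux exposes in sysfs. A small sketch, assuming the usual /sys/class/net layout; the function name and its optional directory argument are hypothetical.

```shell
#!/bin/sh
# List every network interface together with its MTU.
# $1 (optional, hypothetical): sysfs network directory; defaults to /sys/class/net.
list_mtus() {
    net_dir="${1:-/sys/class/net}"
    for dev in "$net_dir"/*; do
        # Skip entries that do not expose an mtu attribute.
        [ -r "$dev/mtu" ] || continue
        echo "$(basename "$dev") $(cat "$dev/mtu")"
    done
}

list_mtus
```

An MTU can be changed at runtime with `ip link set dev <interface> mtu <value>` (as root), provided the driver accepts the value.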

Hmm, @frank-w maybe. But if that were true, ip_forward/NAT shouldn’t work on any of the BPI-R2 boards I use, right? Or am I thinking wrong? The problem occurs, as I wrote earlier, on only a few of all the boards we use.

afaik you had the problem only on 4.14+? and not on 4.4…

Config between 4.4 and my 4.9+ may differ

@frank-w I can run the in-depth tests again, but to keep the situation as simple as possible I want to have the same names for the network interfaces on all kernels. Do you know whether I can set the interface name in u-boot, or maybe somewhere else (ideally without kernel recompilation)?

except kernel 4.4 the interfaces have the same name…ethx for gmacs, wan+lan0-3 for the physical ports on the board. eth1 (second gmac) only exists in my modified kernels (4.14,4.19,x.xx-gmac). in newer phylink branches second gmac is simply renamed to wan (= same network-config as in mainline-kernel)

in uboot there are imho all interfaces bridged in switch and only one interface is configured, but you cannot set any interface-name in uboot

or do i understand your question wrong?

You understand me correctly. It’s clear. I’ll do the NAT tests again and come back with the result.

configure nat on the outgoing (public) interface not a lan-interface…

i don’t know your topology, but here in germany public interface is a pppX…
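A sketch of what NAT on the outgoing interface could look like with iptables, assuming wan is the outgoing interface and lan3 the LAN side (names taken from this thread; with a pppX uplink the outgoing interface would be that instead). The run and setup_nat helpers and the APPLY variable are made up for illustration. By default the script only prints the commands; the FORWARD rules only matter when the FORWARD policy is not ACCEPT.

```shell
#!/bin/sh
# Dry run by default: each command is only printed.
# Set APPLY=1 (hypothetical switch) and run as root to actually execute them.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# $1: outgoing (public) interface, $2: LAN-side interface.
setup_nat() {
    out_if=$1
    in_if=$2
    # Let the kernel route packets between interfaces.
    run sysctl -w net.ipv4.ip_forward=1
    # Masquerade everything that leaves through the outgoing interface.
    run iptables -t nat -A POSTROUTING -o "$out_if" -j MASQUERADE
    # Allow LAN -> outgoing, and replies for established connections back.
    run iptables -A FORWARD -i "$in_if" -o "$out_if" -j ACCEPT
    run iptables -A FORWARD -i "$out_if" -o "$in_if" -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
}

setup_nat wan lan3
```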

I have this BPI-R2 connected inside my local network, so I use the wan port (the separate port) connected to my local network as the outgoing interface and the rest of the ports (lan0-3) as ports to which I connect the other devices from which I try to download/ping/etc.

I assume that HDMI will work correctly on all kernels with my setting in the uEnv file: video=HDMI-A-1:1280x1024-32@60D

which kernel do you use? I have not merged it in all non-lts

Wan needs to have another subnet (not used by any of the lan-ports)…

I have kernels

  • 4.16.18
  • 4.19.16

recompiled a long time ago. A few days ago I compiled kernel version 4.14.34 from the official SINOVOIP repository.

I know that I must use different subnets :wink:

Official 4.14 afair has no hdmi…

4.14,4.16 and 4.19 have hdmi-support…maybe not all resolutions but i also use 1280x1024.

Two BPI-R2 boards. The 1st probably has hardware problems; here is why…

The 1st BPI-R2 booted Raspbian with this kernel:

root@bpi-iot-ros-ai:/home/pi# uname -a
Linux bpi-iot-ros-ai 4.16.18-bpi-r2-main #7 SMP Fri Aug 10 16:36:25 CEST 2018 armv7l GNU/Linux

NAT is up (with MASQUERADE) and a DHCP server runs on lan3.

ping and wget results from the test desktop connected to port lan3 of the BPI-R2:

root@slackware:~# ping 10.10.0.1
PING 10.10.0.1 (10.10.0.1): 56 data bytes
64 bytes from 10.10.0.1: seq=0 ttl=254 time=1.726 ms
64 bytes from 10.10.0.1: seq=1 ttl=254 time=1.385 ms
64 bytes from 10.10.0.1: seq=2 ttl=254 time=1.330 ms
^C
--- 10.10.0.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 1.330/1.480/1.726 ms
root@slackware:~# wget http://noc.pirx.pl/100mb.bin -O /dev/null 
Connecting to noc.pirx.pl (217.73.181.197:80)
null                   0% |                                                                                                                                 |     2k  - stalled -

And the wget result when the 2nd BPI-R2, with the same Raspbian and kernel, is used as the router:

root@slackware:~# wget http://noc.pirx.pl/100mb.bin -O /dev/null 
Connecting to noc.pirx.pl (217.73.181.197:80)
null                  17% |***********************                                                                                                          | 18269k  0:00:27 ETA^C

I know it looks strange, but it is true. I got reports about other BPI-R2 boards which “can’t work” as a router. :tired_face:

//10.10.0.1 is the GW in my local network//

With the 1st BPI-R2 (which has problems on kernel 4.16.18) running kernel 4.4.70:

root@slackware:~# wget http://noc.pirx.pl/100mb.bin -O /dev/null 
Connecting to noc.pirx.pl (217.73.181.197:80)
null                  42% |******************************************************                                                                           | 43616k  0:00:06 ETA^C

I’m very tired of this situation, which is not clear to me. Why do some BPI-R2 boards with kernel 4.16.18 work well as routers (with NAT/ip_forward) while some do not? What is the reason for this strange (to me) situation?

you have enabled ip forwarding? and configured default route? try to make a traceroute to your internet-target…does the next r2 respond to it?
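The default-route question can be checked without extra tools by reading the kernel's IPv4 routing table. A minimal sketch, assuming the standard /proc/net/route layout (destination in column 2, netmask in column 8, both in hex); the function name and its optional file argument are hypothetical.

```shell
#!/bin/sh
# Print the interface(s) that carry an IPv4 default route.
# $1 (optional, hypothetical): routing table file; defaults to /proc/net/route.
# Exit status is non-zero when no default route is found.
default_route_ifaces() {
    route_file="${1:-/proc/net/route}"
    # A default route has destination 0.0.0.0 and netmask 0.0.0.0.
    awk 'NR > 1 && $2 == "00000000" && $8 == "00000000" { print $1; found = 1 }
         END { exit found ? 0 : 1 }' "$route_file"
}

default_route_ifaces || echo "no default route found"
```

`ip route show default` gives the same information in readable form, and `traceroute -n <target>` from the client shows which hop stops answering.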

can you draw your topology, else it is hard to understand what is 1st r2 and second :wink:

can you check soldering of ethernet-ports? i heard of some manufacturing problems