New netfilter flow table based HNAT

Which ND proxy do you use here, and how does this work? I only know the “normal” way of giving each subnet its own /64 and advertising it with radvd - I haven’t done much with IPv6 yet. I have a /48 via tunnelbroker.net, so I have enough subnets to use radvd… By default you get a /64 first, and after a while (collecting points) a larger subnet can be “purchased” with those points, like I did some time ago.

Oh yeah, I remember playing the HE IPv6 qualification game a long time ago. By now, most commercial private VDSL and DOCSIS connections I’ve seen already provide native IPv6 and usually delegate /64 or even /56 to the CPE.

ND proxy is a Linux kernel feature (like ARP proxy on IPv4); you can enable it using sysctl. In OpenWrt we have our own IPv6 management stack (the ISC stack would be overkill and dnsmasq doesn’t allow doing IPv6 delegation properly). See

Do you mean the sysctl feature, e.g. net.ipv6.conf.wan.proxy_ndp = 1, plus ip -6 neigh add proxy <client_ipv6_addr> dev <wan>?

It’s quite inconvenient to use unless every client has a static IP, or your IPv6 management tracks every client and its address.

For a general-purpose Linux distro I used ndppd.
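Roughly, the plain kernel approach mentioned above would look like this (a minimal sketch; the wan interface name and the client address are just placeholders):

    # enable NDP proxying on the upstream interface
    sysctl -w net.ipv6.conf.wan.proxy_ndp=1
    # answer neighbour solicitations for one specific downstream address
    ip -6 neigh add proxy 2001:db8::1234 dev wan

That needs one proxy entry per client address, which is why a daemon like ndppd is the more practical choice on a general-purpose distro.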

As far as Alex has debugged it, the problem with the slowdown is here:

https://elixir.bootlin.com/linux/v5.12-rc6/source/lib/rhashtable.c#L429

Maybe a static-size hashtable would be better?

I’ve found (I hope :slight_smile: ) a typo in the rhashtable_params definition that likely causes the halts and slowdowns.

For now I’m testing this patch - it looks great after 40 minutes. Previously the problems started after 15-30 min, depending on traffic.


Nice, head_offset was defined twice.

Also, the lookup should use the key, not the head, and the key it uses is the cookie.

Can you test it on 5.10? If not, I can, but a bit later. @graphine can you also test it?

P.S. my R2 uptime is 1:21 - far beyond the previous maximum.


Is the patch above enough? I’ve added it to my repo on 5.12 and 5.10 (hnat branches).

Btw. I rebased my 5.12 trees (added the mt6625l patches to rc, updated it to rc6 and rebased hnat onto rc).

I see you converted the mutex to a spinlock… but this doesn’t fix anything, right? Was it preinitialized? I see no DEFINE_SPINLOCK() and no spin_lock_init().

I ran 50 iterations of iperf3 without slowdown (100 Mbit only so far).

At 1 Gbit/s I see a throughput of ~850 Mbit with 10-30 retransmits each round and no CPU usage on test-r2 (main-r2, where the iperf3 server is running, is at ~150% CPU: ksoftirqd ~90% + iperf3 ~55%). Sometimes I see “connection refused” (on 1G only) when starting a new round (rounds 4, 50)… iperf_hnat.log (59,9 KB)
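The ksoftirqd/iperf3 load can be watched during a run with something as simple as this (just a sketch; exact top options may vary with the top version):

    # batch-mode top, 1 s interval, filtered to the interesting processes
    top -b -d 1 -n 5 | grep -E 'ksoftirqd|iperf3'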

Seems like the connection reset is fixed by a sleep 1:

( for i in {1..50};do echo "======== ROUND $i ========"; iperf3 -c 192.168.0.10;sleep 1;done ) 2>&1 | tee -a ~/iperf_hnat_sleep.log

iperf_hnat_sleep.log (62,3 KB)

The retransmits could be due to the ~25 m cable between my switch and main-r2 (inside a wall).

With a second laptop I still see retransmits (maybe switch config) but a throughput of ~930 Mbit/s. I checked the switch config and enabled flow-control auto-negotiation for the second laptop’s port, but the retransmits stay the same - still no slowdown though. After ~150 iterations (~930 Mbit, 20-35 retransmits each round), test-r2 is now up 3h30m.

Last 2 iperf sessions: iperf_hnat_sleep2.log (62,3 KB) iperf_hnat_sleep.log (187,1 KB)

If I run iperf3 between the client laptop and test-r2, I see no retransmits; from test-r2 to the iperf server laptop I get <10 retransmits in each full round. So the retransmits seem to be caused by HNAT, but not that many :wink:

It doesn’t fix anything, so we likely don’t need it, nor the mutex. I use the spinlock defined in rhashtable; spin_lock_init is here: https://elixir.bootlin.com/linux/v5.12-rc6/source/lib/rhashtable.c#L1027

P.S. R2 uptime is 9:12 - still alive :slight_smile:

Did 250 iperf3 iterations (~42 min) after the last log, still stable, no hang… always 930 Mbit/s, ~20-35 retransmits on each run (I guess the laptops’ network cards cause more of these than the R2, but this is ok).

root@bpi-r2:~# uname -a
Linux bpi-r2 5.10.25-bpi-r2-hnat #1 SMP Sat Apr 10 13:36:00 CEST 2021 armv7l GNU/Linux
root@bpi-r2:~# uptime
 19:37:19 up  5:04,  1 user,  load average: 0.00, 0.00, 0.00
root@bpi-r2:~# nft --version
nftables v0.9.8 (E.D.S.)
root@bpi-r2:~# 

iperf_hnat_sleep3.log.gz (22,0 KB)

So it looks like 5.10-hnat (including the 2 fixes: mutex + rhashtable_params) is stable enough for production use.

root@bpi-r2:~# nft list ruleset
table ip filter {
        flowtable f {
                hook ingress priority filter + 10
                devices = { lan3, lan0, wan }
                flags offload;
        }

        chain input {
                type filter hook input priority filter; policy accept;
        }

        chain output {
                type filter hook output priority filter; policy accept;
        }

        chain forward {
                type filter hook forward priority filter; policy accept;
                ip protocol { tcp, udp } flow add @f
        }
}
table ip nat {
        chain post {
                type nat hook postrouting priority filter; policy accept;
                oifname "wan" masquerade
        }

        chain pre {
                type nat hook prerouting priority filter; policy accept;
        }
}
root@bpi-r2:~# 

Btw. I’ve noticed that nft reverses the order of the devices :smiley: I had defined them as

devices = { wan, lan0, lan3 }

in the file and loaded it with nft -f file.

But it seems that HTTPS connections are not NAT’ed :thinking:

In the firewall I sometimes see my client laptop’s IP (192.168.90.x instead of the test-r2 WAN address I see in iperf3), while HTTP/80 (e.g. this forum) goes through correctly… very strange… already rebooted test-r2… maybe firewall/routing related.
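What I’d check next (a sketch, assuming conntrack-tools is installed and the table/flowtable names from the ruleset above):

    # offloaded connections are flagged [OFFLOAD] in the conntrack table
    conntrack -L | grep -i offload
    # look specifically at the https flows in question
    conntrack -L | grep 'dport=443'
    # double-check the flowtable definition the kernel actually has
    nft list flowtables

If the 443 flows show up with the un-masqueraded source while the port-80 flows look fine, that would at least narrow it down to the NAT rules rather than the offload path.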

Applied to OpenWrt; seems the previous issues are gone.


How did you measure retransmits?

See my iperf logs…
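iperf3 reports them in the Retr column; the sender summary line of each round carries the total, so for example (using the log from the loop above):

    # sender summary lines show the total retransmit count per round
    grep sender ~/iperf_hnat_sleep.log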

I see…

I’ve also tested it:

9 rounds with offload: 2 attempts with retransmit (56 and 1); 7 attempts without it.

perf_hnat.log (11.6 KB)

(slow speed because of Wi-Fi on the server, but R2-NAT is connected via wan and lan0)

Both cables are ~2 m, both links 1G.

UPD:

I’ve also dropped the spinlock/mutex patches - after fixing rhashtable_params it seems to work fine without the locks. (I’ll test it for a while.)

Do you use 5.10 or 5.12? Afair 5.12 has the patch for the trgmii clock.

Mhm, seems like I accidentally removed it:

https://patchwork.kernel.org/project/linux-mediatek/patch/[email protected]/

5.12 - your branch with the mutex rolled back.

P.S. Sometimes there are retransmits, but I may also get them with offload off.

Deng Qingfang posted your fix:

https://patchwork.kernel.org/project/linux-mediatek/patch/[email protected]/

And there is another fix for the PPE:

https://patchwork.kernel.org/project/linux-mediatek/patch/[email protected]/


Tried to add a flowtable to my router running 5.10-hnat:

define iflan="lanbr0"
define ifwan="ppp8"
define ifoffload={$ifwan,$iflan}

table ip filter {
    flowtable ft {
        hook ingress priority filter + 10;
        devices = $ifoffload;
        flags offload;
    }
...

# nft.sh
/usr/local/sbin/firewall/ruleset_new_v4+v6.nft:89:12-13: Error: Could not process rule: Operation not supported
	flowtable ft {
	          ^^

If I remove the devices line, I get no error… I guess I need something to allow a flowtable on ppp or bridge devices.

But I found nothing related ;(

If I hardcode the interfaces I get this:

/usr/local/sbin/firewall/ruleset_new_v4+v6.nft:93:14-19: Error: syntax error, unexpected quoted string, expecting string or '$'
		devices = {"ppp8","lanbr0"};

If I leave out the quotes, I get the “Operation not supported” error again.
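Fwiw, the same thing can be reproduced straight from the shell, which would confirm the error comes back from the kernel when the flowtable devices are registered, not from the ruleset parsing (just a sketch, reusing the names from above):

    # the unquoted names parse fine; the kernel then rejects ppp8/lanbr0 as offload flowtable devices
    nft add table ip filter
    nft add flowtable ip filter ft '{ hook ingress priority filter + 10; devices = { ppp8, lanbr0 }; flags offload; }'

Dropping flags offload falls back to the software flowtable path, which might at least tell whether it’s the hardware-offload binding that rejects ppp8/lanbr0.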

There are mentions of PPPoE in the mtk_offload code (do you use it?); maybe you should try using the wan interface that carries the ppp session instead of ppp itself?

Traffic is forwarded to ppp8, not to the vlan or wan port. I see patches for PPPoE push and bridge support,

so I guess the problem is more in the nft command, but I found no device type check:

http://git.netfilter.org/nftables/commit/src?id=92911b362e9067a9a335ac1a63e15119fb69a47d