I’ve found (I hope) a typo in the rhashtable_params definition that likely causes the halts and slow-downs.
For now I’m testing this patch - it looks great after 40 minutes. Previously, problems started after 15-30 minutes, depending on traffic.
Nice, head_offset was defined twice.
Also, the lookup should use the key, not the head, and it uses cookie as the key field.
Can you test it on 5.10? If not, I can, but a bit later. @graphine, can you also test it?
P.S. My R2 uptime is 1:21 - far beyond the previous maximum.
Is the patch above enough? I’ve added it to my repo in 5.12 and 5.10 (the hnat branches).
Btw. I rebased my 5.12 trees (added the mt6625l patches to rc, updated them to rc6 and rebased hnat onto rc).
I see you did a conversion from mutex to spinlock... but this didn’t fix anything, right? Was it pre-initialized? I see no DEFINE_SPINLOCK() and no spin_lock_init().
Had run 50 iterations of iperf3 without slowdown (100 Mbit/s only so far).
On 1 Gbit/s I see a throughput of ~850 Mbit/s with 10-30 retransmits each round, no CPU usage on test-r2 (main-r2, where the iperf3 server runs, is at ~150% CPU: ksoftirqd ~90% + iperf3 ~55%). Sometimes I see "connection refused" (on 1G only) when starting a new round (rounds 4, 50)... iperf_hnat.log (59,9 KB)
Seems like the connection reset is fixed by a sleep 1:
( for i in {1..50};do echo "======== ROUND $i ========"; iperf3 -c 192.168.0.10;sleep 1;done ) 2>&1 | tee -a ~/iperf_hnat_sleep.log
iperf_hnat_sleep.log (62,3 KB)
The retransmits may be caused by a ~25 m cable between my switch and main-r2 (inside a wall).
With a second laptop I still see retransmits (maybe switch config) but a throughput of ~930 Mbit/s. I checked the switch config and enabled flow-control auto-negotiation for the second laptop’s port, but still the same retransmits - no slowdown, though. After ~150 iterations (~930 Mbit/s, 20-35 retransmits each round), test-r2 is now up 3h30m.
The last 2 iperf sessions: iperf_hnat_sleep2.log (62,3 KB) iperf_hnat_sleep.log (187,1 KB)
If I run iperf3 between the client laptop and test-r2, I see no retransmits; test-r2 to the iperf-server laptop has <10 retransmits in each full round. So the retransmits seem to be caused by hnat, but not that much.
It doesn’t fix anything, so we likely don’t need it, or the mutex. I use the spinlock defined in rhashtable; spin_lock_init is here:
https://elixir.bootlin.com/linux/v5.12-rc6/source/lib/rhashtable.c#L1027
P.S. R2 uptime is 9:12 - still alive.
Did 250 iperf3 iterations (~42 min) after the last log, still stable, no hang... always 930 Mbit/s, ~20-35 retransmits on each run (I guess the laptops’ network cards do more than the R2, but this is OK).
root@bpi-r2:~# uname -a
Linux bpi-r2 5.10.25-bpi-r2-hnat #1 SMP Sat Apr 10 13:36:00 CEST 2021 armv7l GNU/Linux
root@bpi-r2:~# uptime
19:37:19 up 5:04, 1 user, load average: 0.00, 0.00, 0.00
root@bpi-r2:~# nft --version
nftables v0.9.8 (E.D.S.)
root@bpi-r2:~#
iperf_hnat_sleep3.log.gz (22,0 KB)
So it looks like 5.10-hnat (including the 2 fixes: mutex + rhashtable_params) is stable enough for production use.
root@bpi-r2:~# nft list ruleset
table ip filter {
flowtable f {
hook ingress priority filter + 10
devices = { lan3, lan0, wan }
flags offload;
}
chain input {
type filter hook input priority filter; policy accept;
}
chain output {
type filter hook output priority filter; policy accept;
}
chain forward {
type filter hook forward priority filter; policy accept;
ip protocol { tcp, udp } flow add @f
}
}
table ip nat {
chain post {
type nat hook postrouting priority filter; policy accept;
oifname "wan" masquerade
}
chain pre {
type nat hook prerouting priority filter; policy accept;
}
}
root@bpi-r2:~#
Btw. I’ve noticed that nft reverses the order of the devices. I have defined it as
devices = { wan, lan0, lan3 }
in the file and loaded it with nft -f file.
But it seems that HTTPS connections are not NAT’ed.
In the firewall I sometimes see my client laptop’s IP (192.168.90.x instead of the test-r2 WAN IP I see in iperf3); http/80 (e.g. this forum) goes through correctly... very strange... I already rebooted test-r2... maybe it’s firewall/routing related.
Applied it to OpenWrt; the previous issues seem to be gone.
How did you measure retransmits?
See my iperf logs…
I see…
I’ve also tested for it:
9 rounds with offload: 2 attempts with retransmits (56 and 1); 7 attempts without any.
perf_hnat.log (11.6 KB)
(Slow speed because of Wi-Fi on the server, but R2-NAT is connected via wan and lan0.)
Both cables are ~2 m, both links are 1G.
UPD:
I’ve also deleted the spinlock/mutex patches - after fixing rhashtable_params it seems to be functional without locks. (I’ll test it for a while.)
Do you use 5.10 or 5.12? AFAIR 5.12 has a patch for the trgmii clock.
Hmm, seems like I accidentally removed it:
https://patchwork.kernel.org/project/linux-mediatek/patch/[email protected]/
5.12 - your branch with the mutex rolled back.
P.S. Sometimes there are retransmits, but I may also get them with offload off.
Deng Qingfang posted your fix:
https://patchwork.kernel.org/project/linux-mediatek/patch/[email protected]/
And there is another fix for the PPE:
https://patchwork.kernel.org/project/linux-mediatek/patch/[email protected]/
Tried to add a flowtable to my router running 5.10-hnat:
define iflan="lanbr0"
define ifwan="ppp8"
define ifoffload={$ifwan,$iflan}
table ip filter {
flowtable ft {
hook ingress priority filter + 10;
devices = $ifoffload;
flags offload;
}
...
# nft.sh
/usr/local/sbin/firewall/ruleset_new_v4+v6.nft:89:12-13: Error: Could not process rule: Operation not supported
flowtable ft {
^^
If I remove the devices line, I get no error... I guess I need something to allow a flowtable on ppp or bridge devices,
but I found nothing related ;(
If I hardcode the interfaces, I get this:
/usr/local/sbin/firewall/ruleset_new_v4+v6.nft:93:14-19: Error: syntax error, unexpected quoted string, expecting string or '$'
devices = {"ppp8","lanbr0"};
If I drop the quotes, I get the "Operation not supported" error again.
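For what it’s worth, nft wants bare (unquoted) interface names inside a flowtable’s devices list - the quotes are what triggers the "unexpected quoted string" error. A fragment like this should at least parse (whether the kernel then accepts ppp/bridge devices is the separate "Operation not supported" problem; ppp8/lanbr0 are of course just my interface names):

```
table ip filter {
	flowtable ft {
		hook ingress priority filter + 10;
		devices = { ppp8, lanbr0 };
		flags offload;
	}
}
```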
There are mentions of PPPoE in the mtk_offload code (do you use it?). Maybe you should try the WAN interface that carries the ppp session instead of ppp itself?
Traffic is forwarded to ppp8, not to a VLAN or the WAN port. I see patches for PPPoE push and bridge support,
so I guess the problem is more in the nft command, but I found no device-type check:
http://git.netfilter.org/nftables/commit/src?id=92911b362e9067a9a335ac1a63e15119fb69a47d
In my case I use a bridge (lan0-3 + wifi) for LAN and wan for WAN, but in the firewall I use lan0-3 and wan:
Bridge:
r2-gentoo /dev # brctl show
bridge name bridge id STP enabled interfaces
br0 8000.6ac4b308d134 no lan0
lan1
lan2
lan3
wlp1s0
ip addr:
r2-gentoo /dev # ip a s br0
16: br0: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 6a:c4:b3:08:d1:34 brd ff:ff:ff:ff:ff:ff
inet 10.0.1.1/24 brd 10.0.1.255 scope global br0
valid_lft forever preferred_lft forever
inet6 fe80::68c4:b3ff:fe08:d134/64 scope link
valid_lft forever preferred_lft forever
firewall:
nft list ruleset
....
flowtable f {
hook ingress priority filter + 1
devices = { lan2, lan1, lan0, wan }
flags offload;
}
....
All my ethernet devices are connected via lan0-3, which are members of br0.
I suppose ppp and bridge devices have no offload flags as they are virtual interfaces, and you still need to use the physical interfaces in your firewall.
P.S. Only in the offload flowtable, of course - you still need ppp8 for NAT.
Thanks, I’ll try it... After thinking about it, it basically makes sense: each virtual device is only software and only adds headers to the packets (ppp, vlan), and the hardware only knows the hardware ports...
For ppp I have packets containing Dot1q header + PPP header + payload, sent out of wan.
For a bridge there are no additional headers, as the traffic is only grouped on the software side; the hardware does not see it.
Btw. where does the flowtable rule have to be linked - before or after the other rules in the forward chain?
chain FORWARD {
type filter hook forward priority 0; policy drop;
#ip protocol { tcp, udp } flow add @f
oifname $ifexternal ip saddr $iprangesblocked jump REJECTED comment "block internal ip ranges to have only internal access"
#ipv6 in ipv4 tunnel
udp dport {41,43,44,58,59,60} jump REJECTED comment "block ipv6 in ipv4 tunnel"
oifname $ifwan ip saddr 192.168.0.9 jump REJECTED comment "Block internet-access for cisco switch"
oifname $ifwan tcp dport domain jump REJECTED comment "block external dns in forward"
It seems that at the position I had prepared it breaks forwarding; but IMHO accept/drop exits the chain, so a later "flow add" position would never be reached.
So it looks like a bug in the ppp handling (maybe introduced while porting to 5.10):
table ip filter {
flowtable f {
hook ingress priority 10
#devices = $ifoffload;
#use HW interfaces here!
devices = { wan, lan0, lan1, lan2, lan3 }
flags offload
}
If I look into the entries, I see only UNB ones, many with "new=0.0.0.0:0->0.0.0.0:0 eth=00:00:00:00:00:00->00:00:00:00:00:00 etype=0000 vlan=0,0", but some (also with public IPs) have MAC addresses and the VLAN:
01ac6 UNB IPv4 5T orig=192.168.0.21:52136->195.20.250.26:443 new=217.61.147.xx:57010->217.72.196.71:443 eth=02:11:02:03:04:05->00:00:5e:00:01:02 etype=0101 vlan=140,0 ib1=1000019d ib2=007ff020
I wonder why the target IP has changed... IMHO only the source needs to be changed to the public IP (the IP and VLAN 140 are correct).
IMHO the ethtype should be 8100/88a8 for VLAN, or maybe 8863/8864 for PPPoE, and not 101.
I’ve found out that I can install a PPPoE server on Ubuntu for testing (apt install pppoe), but this needs some configuration, of course.
OK, it seems to be only an etype problem... I activated it and made a request to https://wiki.ubuntuusers.de/tcpdump/ (I see the packets with tcpdump on the ppp interface, not on wan.140 or wan, but I do see PPPoE packets with IP address information, so it seems the port filter does not work in this case):
# tcpdump -n -i ppp8 port 443
16:32:01.256873 IP 80.245.76.249.33630 > 213.95.41.4.443: Flags [.], ack 764, win 501, options [nop,nop,TS val 128775260 ecr 153299079], length 0
16:32:01.261032 IP 213.95.41.4.443 > 80.245.76.249.33634: Flags [P.], seq 4984:5298, ack 1985, win 262, options [nop,nop,TS val 153299081 ecr 128775254], length 314
# nslookup 213.95.41.4
4.41.95.213.in-addr.arpa name = ha.ubuntu-eu.org.
cat /sys/kernel/debug/mtk_ppe/entries | grep BND | grep 140 | grep '213.95.41.4'
00c22 BND IPv4 5T orig=192.168.0.21:33676->213.95.41.4:443 new=80.245.76.249:33676->213.95.41.4:443 eth=02:11:02:03:04:05->00:00:5e:00:01:02 etype=0101 vlan=140,0 ib1=214949a7 ib2=007ff020
The MAC is the one for wan.140 (wan has a different one, as I need to set it for both VLANs).
If the etype in the entries is really the Ethernet type, it needs to be 8100/88xx (VLAN/PPPoE); I guess the first Ethernet type needs to be 8100 for the VLAN.
Hmm, on my previous test with AFAIR wan+lan3 I got an etype with bits 12/8 set, so again it looks like the ethtype here is set to the DSA port and not to the VLAN... but I cannot see whether all headers are added.
Did some more tests with a local PPPoE server, and it seems it works with 5.12 but not with 5.10... so I guess I missed something while porting.
It seems the flowtable itself is breaking, not the HW offload (it is broken with the offload flag disabled too, but works if the "flow add" line is disabled); yet offload with 5.10-hnat works without ppp... very strange.
It looks like with 5.10 I only have an MTU problem when the flowtable is active; it seems the flowtable breaks the normal path-MTU-discovery/fragmentation behaviour. Without the flowtable I can access websites through the ppp tunnel (MTU 1492); with the flowtable (even without offload) I only get "connection refused". If I reduce the MTU, it works with flowtable + offload. Idk why I don’t see these problems with 5.12 or without the flowtable. I will now try MSS-fix settings:
https://wiki.nftables.org/wiki-nftables/index.php/Mangling_packet_headers
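What I plan to try from that page is clamping the TCP MSS to the route MTU, so the segment size is negotiated to fit the 1492-byte ppp MTU even when the flowtable bypasses the normal PMTU handling. A sketch of what I mean (the placement in my forward chain is an assumption):

```
table ip filter {
	chain forward {
		type filter hook forward priority filter; policy accept;
		# clamp the TCP MSS to the outgoing route's MTU on SYN packets
		tcp flags syn tcp option maxseg size set rt mtu
		ip protocol { tcp, udp } flow add @f
	}
}
```

The clamp rule only has to match the SYN packets, which always traverse the forward chain before the flow is offloaded.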
I got an answer about the location of the rule here: https://marc.info/?l=netfilter&m=162012856832116&w=2
"flow add" should be the last rule in forward, so that all other rules are processed first. The forward chain is only traversed for SYN and SYN-ACK (the first 2 TCP packets of each connection).
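For my FORWARD chain above, that would mean something like this sketch (the final ct rule is my assumption of where the chain accepts traffic): the block/REJECT rules come first so that filtered traffic is never offloaded, and "flow add" sits just before the accepting rule, because an accept verdict ends chain traversal.

```
chain FORWARD {
	type filter hook forward priority 0; policy drop;
	# block rules first, so rejected traffic never gets offloaded
	oifname $ifexternal ip saddr $iprangesblocked jump REJECTED
	udp dport {41,43,44,58,59,60} jump REJECTED
	oifname $ifwan ip saddr 192.168.0.9 jump REJECTED
	oifname $ifwan tcp dport domain jump REJECTED
	# offload last, just before the rule that finally accepts the traffic
	ip protocol { tcp, udp } flow add @f
	ct state established,related accept
}
```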
Have you succeeded with the tests? I’ve tested 5.10 + PPPoE with MSSFIX on the server’s side - it works without any problems.
Not yet, due to missing time; my timeslots are currently only 30 min max, too short for setting up the complete test environment.