Bridged interfaces flowtable HW_OFFLOAD

@frank-w @dangowrt @ghostbuster91 and any others interested

I have finished a rough version of my newest patch-set. It still needs cleaning up, including removal of the printk()'s, and I need to do more testing. Perhaps some of you would care to test or review it before I send it as an [RFC PATCH] to netfilter, sometime in the coming weeks or months.

The patch-set makes it possible to add hardware offloaded flows through nftables for bridged interfaces. Until now this was only possible using the userspace tool bridger, which runs on OpenWrt only. Since I’m running ArchLinuxARM on my BananaPi routers, I can’t use bridger. Of course this patch-set will also work on Debian/Ubuntu, OpenWrt and other distributions.

You can find it in this branch, starting with the commit COVER LETTER:

https://github.com/ericwoud/linux/commits/bpir-nftflow-stable
https://github.com/ericwoud/linux/commits/bpir-nftflow-net-next

This branch will get rebased/fixedup/squashed very often.

The contents of the cover-letter commit are only used for debugging. Its description holds the [0/X] description of the patch-set.

The patch-set also addresses a few other issues that came to light while testing the bridged flow.

After applying the patches, one can add the bridged flow. It needs CONFIG_NF_CONNTRACK_BRIDGE enabled. The flowtable is added as an extra flowtable, called fb in this example:

table inet filter {
        flowtable f {
                hook ingress priority filter
                devices = { eth1, lan0, lan1, lan2, lan3, lan4, wlan0, wlan1 }
                flags offload
        }
        chain forward {
                type filter hook forward priority filter
                ct state established flow add @f
        }
}

table bridge filter {
        flowtable fb {
                hook ingress priority filter
                devices = { lan0, lan1, lan2, lan3, lan4, wlan0, wlan1 }
                flags offload
        }
        chain forward {
                type filter hook forward priority filter
                ct state established flow add @fb
        }
}

In this example eth1 is used as wan port.
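A quick way to check the CONFIG_NF_CONNTRACK_BRIDGE requirement before loading the ruleset could look like the sketch below. The function name has_conntrack_bridge and the config locations named in the comment are my assumptions, and not every distribution ships either file.

```shell
#!/bin/sh
# Sketch: report whether a kernel config file enables CONFIG_NF_CONNTRACK_BRIDGE.
# has_conntrack_bridge is a hypothetical helper name, not part of the patch-set.
# Typical config locations (assumptions): /proc/config.gz, /boot/config-$(uname -r)

has_conntrack_bridge() {
    # $1: path to a plain-text or gzip-compressed kernel config
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | grep -E '^CONFIG_NF_CONNTRACK_BRIDGE=[ym]$' >/dev/null
}
```

For example: has_conntrack_bridge /proc/config.gz && echo "bridge conntrack available". The function returns success for both built-in (=y) and module (=m) builds.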

         forward fastpath bypass
 .----------------------------------------.
/                                          \
|                        IP - forwarding    |
|                       /                \  v
|                      /                  wan ...
|                     /
|                     |
|                     |
|                   brlan.1
|                     |
|    +-------------------------------+
|    |           vlan 1              |
|    |                               |
|    |     brlan (vlan-filtering)    |
|    +---------------+               |
|    |  DSA-SWITCH   |               |
|    |               |    vlan 1     |
|    |               |      to       |
|    |   vlan 1      |   untagged    |
|    +---------------+---------------+
.         /                   \
 ------>lan0                 wlan1
        .  ^                 ^
        .  |                 |
        .  \_________________/
        .  bridge fastpath bypass
        .
        ^
     vlan 1 tagged packets

This combination of vlan-aware bridge and use of vlan-device turns out to be an excellent test setup. It also represents a pretty common vlan setup.

For pppoe, for now, it is assumed the pppoe-device is only in the forward path, not in the bridge path. Only passing along the pppoe tag, if it is present, is supported in the bridge fastpath.

TODO:

* Test wenxu's rebased selftest script.
* Test all related to pppoe (in both fastpath types).
* Find out if pkt->skb->dev->ifindex == out_dev->ifindex is always true.
* Replace "dev_get_by_index(dev_net(nft_in(pkt)), nft_in(pkt)->ifindex)" by "nft_in(pkt)"?
* General cleanup.

Update:

Tested and fixed pppoe in forward fastpath as the fastpath does not go through the pppoe device anymore.

Fixed bug in handling vlan in bridge fastpath, also looks ok now.

So basically all functionality is now complete.

TODO:

* Test passing pppoe tagged packets in bridge fastpath.
* Test IEEE 802.1ad (double vlan tagging)
* Test wenxu's rebased selftest script.
* General cleanup.

The latest changes are now in the bpir-nftflow-net-next branch, which is based on net-next.

Now functional: pppoe and double vlan (tested in forward path). These tags are now pushed in the software fastpath, in the same manner as in the hardware offloaded path. There is no more need to use the vlan- and/or pppoe-device as part of the software fastpath (which is not possible in the bridge fastpath).

TODO:

* Test vlan + pppoe combined.
* Test passing pppoe tagged packets in bridge fastpath.
* Test IEEE 802.1ad / QinQ in bridge fastpath.
* General cleanup.

For anyone interested, I’ve posted my test-setup scripts for testing all different kinds of network connections:

Test scripts

One script runs on the host nearest to the internet connection (upstream). The other script runs on a host that is connected to the first host (downstream).

The scripts assume that the selected interface $intf is not managed by NetworkManager, systemd-networkd or any other network manager.

Start both scripts with the same argument, see script for argument options.

no argument = b
b = bridge (on upstream host)
i = ipforwarded (on upstream host)
p = pppoe setup
q = 802.1q single vlan
qq = 802.1q QinQ double vlan
ad = 802.1ad double vlan
qp = vlan and pppoe combined

None of it is very difficult to set up, but an easy pppoe setup was not documented very well, so maybe this info can help more testers.

One may need to change some IP addresses. I published this more as an example than as ready-to-go scripts.

Edited:

All is looking good, but checking with:

cat /sys/kernel/debug/ppe0/bind

I only get 1 entry, only 1 direction.

Since some patches were added in 6.11, the other direction ends up on another ppe, so I need to cat the other ppe as well…

So from 6.11 we need to check with:

cat /sys/kernel/debug/ppe*/bind
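With more than one ppe it helps to see which entry belongs to which instance. A minimal sketch, assuming the ppe*/bind debugfs layout shown above; the function name dump_ppe_bind and its optional base-directory argument are made up for illustration.

```shell
#!/bin/sh
# Sketch: print the bind table of every ppe instance, with a header per instance.
# dump_ppe_bind and its optional argument are hypothetical, not an existing tool.

dump_ppe_bind() {
    # $1: debugfs mount point, defaults to the usual /sys/kernel/debug
    base="${1:-/sys/kernel/debug}"
    for f in "$base"/ppe*/bind; do
        # Skips the literal, unexpanded pattern when no ppe directory exists
        [ -r "$f" ] || continue
        ppe=$(basename "$(dirname "$f")")
        echo "== $ppe =="
        cat "$f"
    done
}
```

Running dump_ppe_bind as root should then show both directions of a flow, each under the ppe that holds it.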

So back to testing the last items…


So it turns out that the conntrack system for the bridge family does not track these packets (double vlan and pppoe) at all (yet). It only tracks untagged and 802.1q tagged packets.

Edit:

So I’ve added a patch to add double vlan and pppoe to bridge conntrack. These packets can now be conntracked when going through a bridge.

So back to testing these packets in the (hardware offloaded) bridge fastpath…

Edit2:

After also adding double vlan and pppoe to the bridge filter chain, it looks like all netfilter patches are finished.

However, when passing 802.1ad and QinQ packets through a bridge (eth0 and eth1 on R3mini), the packets are not HW_OFFLOAD-ed, even though they are OFFLOAD-ed. Strangely enough, PPPoE-in-Q packets, which also carry 2 encapsulations, are HW_OFFLOAD-ed.
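To tell the two states apart at a glance, the [OFFLOAD] and [HW_OFFLOAD] markers in a conntrack listing can be counted. This is only a sketch: it assumes the listing is available as a file such as /proc/net/nf_conntrack (which needs procfs conntrack support in the kernel), and the function name count_offload_flags is made up for illustration.

```shell
#!/bin/sh
# Sketch: count software-offloaded and hardware-offloaded conntrack entries.
# count_offload_flags is a hypothetical helper; the default path is an assumption.

count_offload_flags() {
    # $1: file holding a conntrack listing
    src="${1:-/proc/net/nf_conntrack}"
    # grep -c exits non-zero on a count of 0, so guard it with "|| true"
    hw=$(grep -c '\[HW_OFFLOAD\]' "$src" || true)
    sw=$(grep -c '\[OFFLOAD\]' "$src" || true)
    echo "OFFLOAD=$sw HW_OFFLOAD=$hw"
}
```

A flow that stays counted under OFFLOAD and never moves to HW_OFFLOAD matches the behaviour described above for 802.1ad and QinQ.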

Edit3:

Looks like it is not supported by mediatek ppe:

https://elixir.bootlin.com/linux/v6.11.1/source/drivers/net/ethernet/mediatek/mtk_ppe_offload.c#L351

		case FLOW_ACTION_VLAN_PUSH:
			if (data.vlan.num == 1 ||
			    act->vlan.proto != htons(ETH_P_8021Q))
				return -EOPNOTSUPP;

So no more than 1 vlan tag is supported, and the 802.1ad tag is not supported either.


So the mtk-ppe does not seem to support the 802.1ad tag, however, the code in mainline does already partially support Q-in-Q double vlan.

So I’ve written a patch that allows Q-in-Q offload and it works fine on the BPI-R3(mini), on non-dsa-ports and dsa-ports.

That will be a separate patch that I could send as an RFC.

https://github.com/ericwoud/linux/commit/806d26b7d7fc579d26afa7e36eeaf6da91c596f2


For PPE documentation you mentioned in the commit, currently the only available one is MT7620’s register manual. Maybe you can check that?

Follow the RFC v1 here:

Patchwork: Show patches with: Series = bridge-fastpath and related improvements

And the mediatek patch:

Patchwork: [RFC,v1,net-next] net: ethernet: mtk_ppe_offload: Allow QinQ


Thanks

2.19.2 PPE Features
...
 4 VLAN tagging (Q-in-Q)
---

So in this case, it should be supported by hardware, but is the code in mtk_ppe.c and the new code in mtk_ppe_offload.c sufficient for use in all affected socs?

I’m hoping to get an answer in a reply to the RFC.

You can also try to check hnat code https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/

Not all hardware functions are implemented by mainline driver, but hnat did that.

All patches work for dsa, but not for other switchdevs.

I’ve ordered an AM335x board, so I can test the patches on a switchdev.

Once I get it set up, I can continue with the patches for net-next. So expect a delay of a couple of weeks while I get this board set up for development.

I also found an issue where the flag BR_VLFLAG_ADDED_BY_SWITCHDEV should never be set for a foreign dsa port. As it turns out, the entire implementation around BR_VLFLAG_ADDED_BY_SWITCHDEV is probably broken and needs a fix, even disregarding my patch-set.


RFC v2 here:

Patchwork: Show patches with: Series = V2 bridge-fastpath and related improvements

Cover letter:

https://www.spinics.net/lists/netdev/msg1047466.html


Actually cover letter is also here: https://patchwork.kernel.org/project/netdevbpf/cover/[email protected]/

Also, I’m curious whether you will keep trying to push the PPE QinQ patch?

There was no response. Maybe we need to contact someone at Mediatek for this…


How did you get to the cover letter?

There’s an “expand” button

Maybe Felix can help?