Vlan enabled bridge bug?

hi

In most cases it makes no sense to delete the MAC addresses from the member interfaces of a bridge.

The VLAN filter of the Linux bridge is rarely necessary.

Check out the documentation about VLAN filtering on the Linux bridge.

holger

Hi Holger,

I would like to use vlan for setting up guest wifi.

I would like to be able to delete fdb self entries so that I can set up a home network entirely at the layer 2 bridge level, between router and access points. The fdb needs to be tidied up to support roaming over this network. After that I would like to set up 802.11r fast roaming. I already have this set up successfully on the Marvell chips of the Linksys routers, but development of the wifi drivers has come to a full stop. Therefore I am migrating to MediaTek.

As long as the fdb self entry stays stuck on lanx to a remote router, it is impossible for a wifi client to connect to local wifi in the same bridge.

See the links below about this roaming issue. It is DSA-related, and I’ve experienced that it also happens on the MediaTek DSA switch.

For now I use my own userspace program to tidy up the fdb. I call it bridgefdbd, but it cannot delete fdb self entries, for the same reason iproute2 cannot. I will use the program until the kernel drivers are fixed. The last link seems to be a fix for Marvell that automatically deletes the stuck self entries within the kernel.

https://www.spinics.net/lists/netdev/msg642583.html

https://gitlab.nic.cz/turris/os/build/-/issues/165

https://github.com/Chadster766/McDebian/issues/70

https://www.spinics.net/lists/netdev/msg645130.html

GOOD NEWS:

It seems that for Marvell they finally fixed this, after years, in the following commit: yay :blush:

It says: The switch may generate an “ATU violation” warning when a client moves from the CPU port to a switch port because the static ATU entry added by DSA core still points to the CPU port. DSA core will then clear the static entry so it is not fatal. Disable the warning so it will not confuse users.

As the commit is titled “dsa roaming fix”, there was one for mt7530 too (and, I guess, mt7531, as it’s the same driver):

https://patchwork.kernel.org/project/linux-mediatek/patch/[email protected]/

Btw, it was the same author as for Marvell :slight_smile:

Thanks. Have to check this out then. Maybe first try without vlan.

Ok, now I have tested wifi roaming on a home network connected router-lan to bpir64-lan.

It works when vlan is disabled, but it also works if vlan is enabled, but with vlan 1 only!

This is with enabled vlan, vlan 1. First command, phone connected to remote router, second command, phone roamed to BPI-R64.

root@bpi-r64:~# bridge fdb | grep aa:bb:cc:dd:ee:ff
aa:bb:cc:dd:ee:ff dev lan0 vlan 1 master brlan 
aa:bb:cc:dd:ee:ff dev lan0 vlan 1 self 
root@bpi-r64:~# bridge fdb | grep aa:bb:cc:dd:ee:ff
aa:bb:cc:dd:ee:ff dev wlan0 vlan 1 master brlan 

Phone has good internet connectivity :slight_smile:

Now the same setup but with vlan 2

root@bpi-r64:~# bridge fdb | grep aa:bb:cc:dd:ee:ff
aa:bb:cc:dd:ee:ff dev lan0 vlan 2 master brlan 
aa:bb:cc:dd:ee:ff dev lan0 vlan 2 self 
root@bpi-r64:~# bridge fdb | grep aa:bb:cc:dd:ee:ff
aa:bb:cc:dd:ee:ff dev lan0 vlan 2 self 
aa:bb:cc:dd:ee:ff dev wlan0 vlan 2 master brlan 

This last result makes the phone not able to reach the dhcp server (on the remote router). No connectivity.
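To spot this stuck state automatically, something like the following could work. It is only a sketch in Python, not the bridgefdbd code: `find_stale_self_entries` is a made-up helper name, and it assumes the `bridge fdb` line layout shown above (MAC, `dev`, port, optional `vlan N`, then flags).

```python
def find_stale_self_entries(fdb_text):
    """Return (mac, dev, vlan) for every 'self' entry that sits on a different
    port than the software bridge's 'master' entry for the same MAC and VLAN."""
    master = {}        # (mac, vlan) -> port known to the software bridge
    self_entries = []  # (mac, dev, vlan) as programmed into the switch
    for line in fdb_text.splitlines():
        tok = line.split()
        if len(tok) < 3 or tok[1] != "dev":
            continue  # not an fdb line in the assumed layout
        mac, dev, rest = tok[0], tok[2], tok[3:]
        vlan = int(rest[rest.index("vlan") + 1]) if "vlan" in rest else None
        if "master" in rest:
            master[(mac, vlan)] = dev
        elif "self" in rest:
            self_entries.append((mac, dev, vlan))
    return [(mac, dev, vlan) for mac, dev, vlan in self_entries
            if (mac, vlan) in master and master[(mac, vlan)] != dev]


# The vlan 2 output after roaming, as pasted above.
sample = (
    "aa:bb:cc:dd:ee:ff dev lan0 vlan 2 self \n"
    "aa:bb:cc:dd:ee:ff dev wlan0 vlan 2 master brlan \n"
)
print(find_stale_self_entries(sample))  # [('aa:bb:cc:dd:ee:ff', 'lan0', 2)]
```

In the healthy vlan 1 case the ‘self’ and ‘master’ entries are on the same port (or the ‘self’ entry is gone), so the list comes back empty.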

The fix ‘net: dsa: mt7530: fix roaming from DSA user ports’ does not work anymore when vlan id does not equal 1.

I guess it suffers from the same bug: fdb ‘self’ entries with VLANs other than 1 cannot be deleted.

And where to post this bug report? Would be nice if someone could fix it.

You could look up the author/maintainers/reviewers of the driver (get_maintainer.pl) and ask them (better to include the mailing list):

$ scripts/get_maintainer.pl drivers/net/dsa/mt7530.c

Sent, let’s see if someone can help…

Found a fix so that deleting manually works now. However, automatic deletion when roaming still does not work.

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=11d8d98cbeef

Now at least I can use my bridgefdbd program. It can now keep the fdb cleaned up.
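For reference, a manual deletion of a stuck entry comes down to a `bridge fdb del ... self` call. A minimal sketch of building that command in Python: `fdb_del_self_argv` is a hypothetical helper (not part of bridgefdbd or iproute2), and actually running the command needs root and a kernel whose driver accepts the deletion.

```python
def fdb_del_self_argv(mac, dev, vlan=None):
    """Build the argv for deleting a 'self' (switch hardware) fdb entry."""
    argv = ["bridge", "fdb", "del", mac, "dev", dev]
    if vlan is not None:
        argv += ["vlan", str(vlan)]
    argv.append("self")
    return argv


# Print the command line for the stuck vlan 2 entry from the output above.
print(" ".join(fdb_del_self_argv("aa:bb:cc:dd:ee:ff", "lan0", 2)))

# To execute it (as root), one could pass the list to subprocess:
#   import subprocess
#   subprocess.run(fdb_del_self_argv("aa:bb:cc:dd:ee:ff", "lan0", 2), check=True)
```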

So I am happy for now…

And a fix has already been applied on top of it: only use the IVL bit for VIDs larger than 1.

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=7e777021780e9c373fc0c04d40b8407ce8c3b5d5

Is everything working now? I thought you still had problems with vlan 1.

No, only for vid larger than 1. This is now fixed.

I was quite happy, being able to delete the ‘self’ entries manually.

Upgrading to kernel 5.14 just broke that again.

The DSA driver has changed and now manipulation of ‘self’ entries using the bridge command (and also bridgefdbd) is no longer implemented. This should be great, and wifi roaming on vlan 1 still works great.

However, wifi roaming on VLANs greater than 1 is still not working correctly. The automatic deletion of the ‘self’ entry is still not working (as in kernel 5.12), but manual deletion is now also no longer possible…

In 5.14, I guess the intent is that this ‘self’ entry is deleted together with the ‘master static’ entry, but that does not happen…

I guess that’s what Vladimir is talking about… https://patchwork.kernel.org/project/linux-mediatek/patch/[email protected]/#24317677

Vladimir found an issue, I need to try it one of these days:

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=2b0a5688493a

Can confirm it works now.

Vladimir is working on a major makeover of the coupling between the software bridge’s fdb and the DSA switch fdb. It is now in netdev net-next git. It should be included starting from kernel 5.15-rc1.

Only need to add:

	ds->assisted_learning_on_cpu_port = true;

to mt7531_setup() in mt7530.c

The bridge fdb now looks like this when roaming from the router to the (local) access point and back to the router:

aa:bb:cc:dd:ee:ff dev lan0 vlan 2 master brlan 
aa:bb:cc:dd:ee:ff dev lan0 vlan 2 self 
root@bpi-r64:~# bridge fdb | grep aa:bb:cc:dd:ee:ff
aa:bb:cc:dd:ee:ff dev eth0 self permanent
aa:bb:cc:dd:ee:ff dev wlan0 vlan 2 offload master brlan 
root@bpi-r64:~# bridge fdb | grep aa:bb:cc:dd:ee:ff
aa:bb:cc:dd:ee:ff dev lan0 vlan 2 master brlan 
aa:bb:cc:dd:ee:ff dev lan0 vlan 2 self 

All done inside the kernel.

There is no need to do anything with the fdb in userspace anymore, so no need for my bridgefdbd program for this. Anyway, it is undergoing a major changeover to support 802.11k and 802.11v, since it already communicates with all instances of hostapd on the entire local network.

The issue is now solved: the IVL bit is always set and assisted learning is enabled, in the following commit:

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d851798584ffd497b6cf0ae68f9ba75afced0ec3

Thanks to DENG Qingfang

Hey, I came across a 4-year-old thread that seems to describe similar issues we’re facing on OpenWrt:

The fixes mentioned there should already be included, but it looks like the issues haven’t been fully resolved in current OpenWrt versions with the 6.6 kernel.

Try disabling flow offloading.

I added to the firewall options in openwrt:


config defaults
    option flow_offloading '0'
    option flow_offloading_hw '0'

However, it did not help. :confused: Sorry, I completely forgot about this thread. Still searching for a solution.