mark05
(Holger Glaess)
May 26, 2021, 7:15am
25
Hi,
In most cases it makes no sense to delete the MAC addresses from the
member interfaces of a bridge.
The VLAN filter of the Linux bridge is almost never necessary.
Check out the documentation about VLAN filtering on the Linux bridge.
holger
ericwoud
(Eric W.)
May 26, 2021, 7:26am
26
Hi Holger,
I would like to use vlan for setting up guest wifi.
I would like to be able to delete fdb self entries, so that I can set up a home network entirely at the layer 2 bridge level, between router and access points. The fdb needs to be tidied up to support roaming over this network. After that I would like to set up 802.11r fast roaming. I already have this set up successfully on the Marvell chips of the Linksys routers, but code development for their wifi drivers is at a full stop. Therefore I am migrating to MediaTek.
As long as the fdb self entry stays stuck on lanx, pointing to a remote router, it is impossible for a wifi client to connect to the local wifi in the same bridge.
See the links below about this roaming issue. It is DSA related, and I’ve experienced that it also happens on the MediaTek DSA switch.
For now I use my own userspace program to tidy up the fdb. I call it bridgefdbd, but it cannot delete fdb self entries, for the same reason iproute2 cannot. I will use the program until the kernel drivers are fixed. The last link seems to be a fix for Marvell that automatically deletes the stuck self entries within the kernel.
https://www.spinics.net/lists/netdev/msg642583.html
https://gitlab.nic.cz/turris/os/build/-/issues/165
https://github.com/Chadster766/McDebian/issues/70
https://www.spinics.net/lists/netdev/msg645130.html
GOOD NEWS:
It seems that for Marvell they finally fixed this, after years, in the following commit: yay
It says: The switch may generate an “ATU violation” warning when a client moves
from the CPU port to a switch port because the static ATU entry added by
DSA core still points to the CPU port. DSA core will then clear the static
entry so it is not fatal. Disable the warning so it will not confuse users.
Marvell mv88e6xxx switch series cannot perform MAC learning from
CPU-injected (FROM_CPU) DSA frames, which results in 2 issues.
- excessive flooding, due to...
changed 15 files with 1180 additions and 0 deletions.
frank-w
(Frank W.)
May 26, 2021, 11:04am
27
As the commit is titled “dsa roaming fix”, there was one for mt7530 too (I guess it covers mt7531 as well, as it’s the same driver):
https://patchwork.kernel.org/project/linux-mediatek/patch/[email protected] /
Btw, it was the same author as for Marvell.
ericwoud
(Eric W.)
May 26, 2021, 11:25am
28
Thanks. Have to check this out then. Maybe first try without vlan.
ericwoud
(Eric W.)
May 26, 2021, 7:23pm
29
Ok, now I have tested wifi roaming on a home network connected router-lan to bpir64-lan.
It works when vlan is disabled, but it also works if vlan is enabled, but with vlan 1 only!
This is with vlan enabled, vlan 1. First command: phone connected to the remote router. Second command: phone roamed to the BPI-R64.
root@bpi-r64:~# bridge fdb | grep aa:bb:cc:dd:ee:ff
aa:bb:cc:dd:ee:ff dev lan0 vlan 1 master brlan
aa:bb:cc:dd:ee:ff dev lan0 vlan 1 self
root@bpi-r64:~# bridge fdb | grep aa:bb:cc:dd:ee:ff
aa:bb:cc:dd:ee:ff dev wlan0 vlan 1 master brlan
Phone has good internet connectivity
Now the same setup but with vlan 2
root@bpi-r64:~# bridge fdb | grep aa:bb:cc:dd:ee:ff
aa:bb:cc:dd:ee:ff dev lan0 vlan 2 master brlan
aa:bb:cc:dd:ee:ff dev lan0 vlan 2 self
root@bpi-r64:~# bridge fdb | grep aa:bb:cc:dd:ee:ff
aa:bb:cc:dd:ee:ff dev lan0 vlan 2 self
aa:bb:cc:dd:ee:ff dev wlan0 vlan 2 master brlan
This last result leaves the phone unable to reach the DHCP server (on the remote router). No connectivity.
The fix ‘net: dsa: mt7530: fix roaming from DSA user ports’ does not work when the vlan id is not 1.
I guess it suffers from the same bug, namely that fdb ‘self’ entries with vlans other than 1 cannot be deleted.
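The stuck state shown above can be spotted mechanically. The following is a minimal sketch, not part of bridgefdbd or any kernel code: it parses text in the format printed by `bridge fdb` and reports every ‘self’ entry whose device differs from the device of the ‘master’ entry for the same MAC and vlan. The names `FdbEntry`, `parse_fdb` and `stale_self_entries` are made up for this illustration, and the parser only understands the fields that appear in this thread.

```python
from typing import NamedTuple, Optional


class FdbEntry(NamedTuple):
    mac: str
    dev: str
    vlan: Optional[int]
    flags: frozenset


def parse_fdb(text):
    """Parse lines such as 'aa:bb:.. dev lan0 vlan 2 master brlan'."""
    entries = []
    for line in text.splitlines():
        tok = line.split()
        # Skip shell prompts, empty lines and anything not shaped like an entry.
        if len(tok) < 3 or tok[1] != "dev":
            continue
        mac, dev = tok[0], tok[2]
        vlan, flags = None, set()
        i = 3
        while i < len(tok):
            if tok[i] == "vlan" and i + 1 < len(tok):
                vlan = int(tok[i + 1])
                i += 2
            elif tok[i] == "master" and i + 1 < len(tok):
                flags.add("master")  # the bridge name itself is not needed here
                i += 2
            else:
                flags.add(tok[i])  # e.g. self, offload, permanent
                i += 1
        entries.append(FdbEntry(mac, dev, vlan, frozenset(flags)))
    return entries


def stale_self_entries(entries):
    """Return 'self' entries that point at a different device than the
    software bridge ('master') entry for the same MAC and vlan."""
    master_dev = {(e.mac, e.vlan): e.dev for e in entries if "master" in e.flags}
    stale = []
    for e in entries:
        if "self" not in e.flags or "permanent" in e.flags:
            continue
        dev = master_dev.get((e.mac, e.vlan))
        if dev is not None and dev != e.dev:
            stale.append(e)
    return stale
```

Fed with the last vlan 2 output above, this reports the `lan0 vlan 2 self` entry as stale. Fed with the vlan 1 output after roaming, it reports nothing.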
ericwoud
(Eric W.)
June 10, 2021, 9:26am
30
And where should I post this bug report? It would be nice if someone could fix it.
frank-w
(Frank W.)
June 10, 2021, 12:51pm
31
You could look up the author/maintainers/reviewers of the driver (get_maintainer.pl) and ask them (better to include the mailing list):
$ scripts/get_maintainer.pl drivers/net/dsa/mt7530.c
ericwoud
(Eric W.)
June 19, 2021, 12:52pm
32
Sent, let’s see if someone can help…
ericwoud
(Eric W.)
July 16, 2021, 9:04pm
33
Found a fix so that deleting manually works now. However automatic deletion when roaming still does not work.
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=11d8d98cbeef
Now at least I can use my bridgefdbd program. It can now keep the fdb cleaned up.
So I am happy for now…
ericwoud
(Eric W.)
July 20, 2021, 3:33pm
34
frank-w
(Frank W.)
July 20, 2021, 3:59pm
35
Is everything working now? I thought you still had problems with vlan 1.
ericwoud
(Eric W.)
July 20, 2021, 8:24pm
36
No, only for vid larger than 1. This is now fixed.
ericwoud
(Eric W.)
July 21, 2021, 2:51pm
37
I was quite happy, being able to delete the ‘self’ entries manually.
Upgrading to kernel 5.14 just broke that again.
The DSA driver has changed, and manipulation of ‘self’ entries using the bridge command (and also bridgefdbd) is no longer implemented. That should be fine in itself, and wifi roaming on vlan 1 still works great.
However, wifi roaming on a vlan greater than 1 is still not working correctly. The automatic deletion of the ‘self’ entry is still not working (as in kernel 5.12), but now manual deletion is not possible anymore either…
In 5.14, I guess the intention is that this ‘self’ entry is deleted together with the ‘master static’ entry, but that does not happen…
frank-w
(Frank W.)
July 21, 2021, 4:06pm
38
ericwoud
(Eric W.)
July 22, 2021, 8:16am
39
ericwoud
(Eric W.)
July 26, 2021, 7:42pm
40
Can confirm it works now.
Vladimir is working on a major makeover of the coupling between the software bridge’s fdb and the dsa-switch fdb. It is now in the netdev net-next git tree. It should be included starting from kernel 5.15-rc1.
Only need to add:
ds->assisted_learning_on_cpu_port = true;
to mt7531_setup() in mt7530.c
Bridge fdb now looks like this, when roaming from router to (local) accesspoint to router:
aa:bb:cc:dd:ee:ff dev lan0 vlan 2 master brlan
aa:bb:cc:dd:ee:ff dev lan0 vlan 2 self
root@bpi-r64:~# bridge fdb | grep aa:bb:cc:dd:ee:ff
aa:bb:cc:dd:ee:ff dev eth0 self permanent
aa:bb:cc:dd:ee:ff dev wlan0 vlan 2 offload master brlan
root@bpi-r64:~# bridge fdb | grep aa:bb:cc:dd:ee:ff
aa:bb:cc:dd:ee:ff dev lan0 vlan 2 master brlan
aa:bb:cc:dd:ee:ff dev lan0 vlan 2 self
All done inside the kernel.
There is no need to do anything with the fdb in userspace anymore, so my bridgefdbd program is no longer needed for this. Anyway, that program is in a major changeover to support 802.11k and 802.11v, since it already communicates with all instances of hostapd on the entire local network.
ericwoud
(Eric W.)
August 4, 2021, 10:40am
41
The issue is now solved: the IVL bit is always set and assisted learning is enabled.
See the following commit:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d851798584ffd497b6cf0ae68f9ba75afced0ec3
Thanks to DENG Qingfang
Hey, I came across a 4-year-old thread that seems to describe similar issues we’re facing on OpenWrt:
opened 06:07PM - 30 Dec 22 UTC
bug
### Describe the bug
Subject: Regression: DSA breaks roaming to WLAN bridged to… VLAN
# Summary
When bridging a WLAN with a VLAN on DSA-enabled OpenWRT versions, an erroneous FDB entry creates 100% packet loss for 300 seconds.
# Summary analysis
When a VLAN-tagged packet is received by the device from the Ethernet (LAN) port side, 2 FDB entries for the source MAC address of the packet are created in the forwarding database (FDB) of the bridge:
1. one VLAN-tagged FDB entry and
2. one VLAN-untagged FDB entry
Both FDB entries tell the switch to forward packets whose destination is that MAC address to that Ethernet port.
When a packet from the same source is then received by the device from the WiFi (WLAN) side (e.g. after roaming a client device to that WiFi) and forwarded to the Ethernet (LAN) side (with a VLAN tag), only 1 of these 2 FDB entries is updated (namely the VLAN-tagged FDB entry). The other, VLAN-untagged FDB entry erroneously remains unchanged.
When another VLAN-tagged packet (e.g. an ARP reply packet or a DHCP reply packet) is then received by the device from the Ethernet (LAN) side and that packet's destination is the client device's MAC address (= the VLAN-untagged FDB entry's MAC address), the switch believes that this packet ought to be forwarded to the Ethernet port (even though it was received from exactly this port), while it actually should be forwarded to the CPU of the router (and then further to the WLAN). This packet then gets dropped.
This packet loss continues until, after about 300 seconds (5 minutes), the erroneous FDB entry expires.
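The sequence described above can be replayed with a toy model. This is only a sketch of the behaviour as described in this report, not of how any real switch or the Linux bridge is implemented. The class `ToySwitchFdb` and its methods are hypothetical, and the lookup order (the VLAN-untagged entry is consulted first) is an assumption chosen to match the observed packet loss.

```python
AGEING_SECONDS = 300  # expiry time observed in this report


class ToySwitchFdb:
    """Toy FDB keyed by (mac, vid); vid None stands for the untagged entry."""

    def __init__(self):
        self.entries = {}  # (mac, vid or None) -> (port, time learned)

    def learn_from_lan(self, mac, vid, port, now):
        # Incoming tagged packet on an Ethernet port: 2 entries are written.
        self.entries[(mac, vid)] = (port, now)
        self.entries[(mac, None)] = (port, now)

    def learn_from_wlan(self, mac, vid, now):
        # Packet from the WLAN side via the CPU: only the tagged entry moves.
        self.entries[(mac, vid)] = ("cpu", now)

    def _live(self, key, now):
        entry = self.entries.get(key)
        if entry is None or now - entry[1] >= AGEING_SECONDS:
            return None  # unknown or aged out
        return entry[0]

    def forward(self, dst_mac, vid, ingress_port, now):
        # Assumption: the untagged entry wins over the tagged one.
        port = self._live((dst_mac, None), now) or self._live((dst_mac, vid), now)
        if port is None:
            return "flood"
        if port == ingress_port:
            return "drop"  # never send a frame back out of its ingress port
        return port


if __name__ == "__main__":
    mac = "02:ff:04:05:06:07"
    fdb = ToySwitchFdb()
    fdb.learn_from_lan(mac, 31, "lan2", now=0)   # client is behind router B
    fdb.learn_from_wlan(mac, 31, now=10)         # client roams to device A
    print(fdb.forward(mac, 31, "lan2", now=11))  # reply arriving on lan2
    print(fdb.forward(mac, 31, "lan2", now=301)) # after the stale entry aged out
```

In this model the reply at second 11 is dropped because the untagged entry still points at `lan2`, and delivery to the CPU resumes once that entry is older than 300 seconds.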
# Impact of this bug
This bug affects potentially all DSA-enabled platforms, but at least a sizable subset of all DSA-enabled platforms.
It has been verified to exist on:
1. OpenWRT 22.03 on BT HomeHub 5a (hh5a).
2. Turris OS 4.0 on [Turris Omnia](https://forum.turris.cz/t/omnia-vlan-on-dsa-port-breaks-arp-responses-tos-4-0-5/12584/37) already since the year 2020.
# How to reproduce
## Steps
1. Ensure that your OpenWRT device A and its software support Distributed Switch Architecture (DSA).
2. Set up your OpenWRT device A to have at least one WiFi interface and at least one Ethernet port with VLAN support:
    1. Create bridge device "br-switch", with the list of bridge ports only consisting of port "lan2".
    2. Enable "VLAN filtering" for that bridge device and create a VLAN with VLAN ID 31, local enabled and Egress tagged for port "lan2".
    3. Create OpenWRT interface "users" with device "br-switch.31".
    4. Create an OpenWRT WiFi interface with SSID "test". Ensure (under Interface Configuration/General Setup/Network) that it is connected to network "users".
    5. Verify by running "brctl show" on the command line (e.g. using SSH) that there is a bridge called "br-switch" with at least 2 members (one member being "lan2" and the other member being a WiFi interface, for example "wlan1-1").
    6. Reboot the OpenWRT device A.
3. Connect your OpenWRT device A to another WiFi router B.
    1. Ensure that WiFi router B also supports VLANs.
    2. Connect an Ethernet cable to port "lan2" of OpenWRT device A and a suitable Ethernet port of WiFi router B.
    3. Set up WiFi router B such that it has a VLAN with VLAN ID 31 which is available (tagged) at the Ethernet port of WiFi router B.
    4. Set up a WiFi interface on WiFi router B with the same SSID "test". Similarly to OpenWRT device A, ensure that this WiFi interface is bridged to VLAN ID 31.
4. Connect a client device C to OpenWRT device A.
    1. Run "ping" from the client device C to the IP address of WiFi router B.
    2. Observe that client device C receives ping replies.
5. Roam to WiFi router B.
    1. Change the physical position of client device C to be close to WiFi router B and away from OpenWRT device A. (You may need to reduce the output power of OpenWRT device A if the devices are close to each other.)
    2. Wait 15 seconds.
    3. Verify that client device C is now associated with WiFi router B. (You may verify this by looking at the UI of WiFi router B or at the output of running "iw dev wlan0 link" on client device C.)
    4. Observe that client device C still receives ping replies although its WLAN association has changed.
6. Roam to OpenWRT device A again.
    1. Change the physical position of client device C to be close to OpenWRT device A and away from WiFi router B. (You may need to reduce the output power of WiFi router B if the devices are close to each other.)
    2. Wait 15 seconds.
    3. Verify that client device C is now associated with OpenWRT device A. (You may verify this by looking at the UI of WiFi router B or at the output of running "iw dev wlan0 link" on client device C.)
## Expected result
It is expected that client device C still receives ping replies although its WLAN association has changed again.
## Observed result
It can be observed that client device C does not receive ping replies once its WLAN association has changed again.
# How to analyze
1. Install the OpenWRT package "ip-bridge" on your OpenWRT device A.
2. Re-run the reproduction steps 4, 5, and 6. Assume that the MAC address of client device C is "02:ff:04:05:06:07".
3. Run "bridge fdb show | grep 02:ff:04:05:06:07" after step 3 but before step 4. Observe that the output is empty.
4. Run "bridge fdb show | grep 02:ff:04:05:06:07" after step 4 but before step 5. Observe that the output is akin to
```
02:ff:04:05:06:07 dev wlan1-1 vlan 31 master br-switch
```
This shows that there is one FDB entry which says that if there is a packet which is tagged with VLAN 31 and which has the destination MAC address 02:ff:04:05:06:07, then it should be forwarded through device "wlan1-1".
5. Run "bridge -statistics fdb show | grep 02:ff:04:05:06:07" after step 5 but before step 6. Observe that the output is akin to
```
02:ff:04:05:06:07 dev lan2 vlan 31 master br-switch
02:ff:04:05:06:07 dev lan2 self
```
This shows that there is one FDB entry which says that if there is a packet which is tagged with VLAN 31 and which has the destination MAC address 02:ff:04:05:06:07, then it should be forwarded through device "lan2", and another FDB entry which says that if there is a packet which has the destination MAC address 02:ff:04:05:06:07, then it should be forwarded through device "lan2".
6. Run "bridge -statistics fdb show | grep 02:ff:04:05:06:07" after step 6. Observe that the output is akin to
```
02:ff:04:05:06:07 dev lan2 self
02:ff:04:05:06:07 dev wlan1-1 vlan 31 master br-switch
```
This shows that there is one FDB entry which says that if there is a packet which is tagged with VLAN 31 and which has the destination MAC address 02:ff:04:05:06:07, then it should be forwarded through device "wlan1-1", and another FDB entry which says that if there is a packet which has the destination MAC address 02:ff:04:05:06:07, then it should be forwarded through device "lan2". The second entry is the stale one.
# Working workarounds
## Wait 5 minutes
After 5 minutes, the erroneous FDB entry expires automatically.
## Delete the erroneous FDB entry explicitly once
Run "bridge fdb del 02:ff:04:05:06:07 dev lan2". Once you do this, you will observe immediately that the packets are not dropped anymore.
## Delete the erroneous FDB entry explicitly automatically
This bug is so pervasive that [someone has created a workaround](https://forum.turris.cz/t/omnia-vlan-on-dsa-port-breaks-arp-responses-tos-4-0-5/12584/44) which deletes the erroneous FDB entries automatically.
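To illustrate what such an automatic cleanup has to do, here is a minimal sketch. It is not the linked workaround. It takes the text printed by "bridge fdb show" and builds one "bridge fdb del <mac> dev <dev>" command (the form used in the manual workaround above) for every VLAN-untagged "self" entry whose MAC address the bridge currently reaches through a different device. The function names `cleanup_commands` and `run_cleanup` are made up for this example, and tracking one bridge device per MAC address is a simplification that ignores clients present in several VLANs.

```python
import subprocess


def cleanup_commands(fdb_text):
    """Build 'bridge fdb del' argv lists for stale untagged 'self' entries."""
    master_dev = {}  # mac -> device of the bridge ("master") entry
    self_devs = []   # (mac, dev) of "self" entries that carry no vlan
    for line in fdb_text.splitlines():
        tok = line.split()
        if len(tok) < 3 or tok[1] != "dev":
            continue  # not an FDB entry line
        mac, dev, rest = tok[0], tok[2], tok[3:]
        if "permanent" in rest:
            continue  # never touch permanent entries
        if "master" in rest:
            master_dev[mac] = dev
        elif "self" in rest and "vlan" not in rest:
            self_devs.append((mac, dev))
    return [
        ["bridge", "fdb", "del", mac, "dev", dev]
        for mac, dev in self_devs
        if mac in master_dev and master_dev[mac] != dev
    ]


def run_cleanup(fdb_text, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for argv in cleanup_commands(fdb_text):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)
```

With the output observed after step 6 this yields exactly one command, deleting the entry on "lan2". With the output observed after step 5 it yields none, because both entries agree on "lan2".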
## Downgrade to swconfig-enabled-OpenWRT versions (instead of DSA-enabled OpenWRT versions)
This is possible and works perfectly. However, not upgrading is not viable in the long run.
# Non-working workarounds
## Enabling learning_sync
Running
```
bridge link set dev lan2 learning_sync on
```
has no effect on the bug, it still occurs.
## Disabling learning
Running
```
bridge link set dev lan2 learning off
```
fails with
```
RTNETLINK answers: Not supported
```
# Analysis
The root cause seems to be confusion about whether VLAN-untagged FDB entries should also apply to VLAN-tagged packets.
This confusion results in unequal treatment for VLAN-tagged packets incoming through the Ethernet port and forwarded to the CPU on the one hand and VLAN-tagged packets outgoing from the CPU through the Ethernet port on the other hand:
1. When a VLAN-tagged packet is incoming, 2 FDB entries are created (or updated)
2. When a VLAN-tagged packet is outgoing, only 1 FDB entry is created (or updated)
## Solution 1: VLAN-untagged FDB entries should not apply to VLAN-tagged packets
In this case,
1. when a VLAN-tagged packet is incoming, only 1 FDB entry should be created (or updated),
2. when a VLAN-tagged packet is outgoing, only 1 FDB entry should be created (or updated).
## Solution 2: VLAN-untagged FDB entries should also apply to VLAN-tagged packets
In this case,
1. when a VLAN-tagged packet is incoming, 2 FDB entries should be created (or updated),
2. when a VLAN-tagged packet is outgoing, 2 FDB entries should be created (or updated).
This confusion is evidently software-based, as non-DSA versions of OpenWRT do not exhibit this bug.
The exact location of this confusion is (currently) unknown.
### OpenWrt version
r19803-9a599fee93
### OpenWrt target/subtarget
lantiq/xrx200
### Device
hh5a (BT HomeHub 5a)
### Image kind
Official downloaded image
The fixes mentioned there should already be included, but it looks like the issue hasn’t been fully resolved in the current OpenWrt versions with the 6.6 kernel.
ericwoud
(Eric W.)
January 28, 2025, 9:33pm
43
Try disabling flow offloading.
I added this to the firewall options in OpenWrt:
config defaults
option flow_offloading '0'
option flow_offloading_hw '0'
However, it did not help. Sorry, I completely forgot about this thread. Still searching for a solution.