Null pointer dereference after enabling WLAN

That happened on r21897-d40f59825a: when I enabled WLAN, my Banana Pi started dropping off the network completely. After investigating, I noticed that it seems to crash in the kernel:

    [  603.909990] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
    [  603.918768] Mem abort info:
    [  603.921556]   ESR = 0x0000000096000005
    [  603.925289]   EC = 0x25: DABT (current EL), IL = 32 bits
    [  603.930587]   SET = 0, FnV = 0
    [  603.933626]   EA = 0, S1PTW = 0
    [  603.936752]   FSC = 0x05: level 1 translation fault
    [  603.941614] Data abort info:
    [  603.944478]   ISV = 0, ISS = 0x00000005
    [  603.948296]   CM = 0, WnR = 0
    [  603.951251] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000045e98000
    [  603.957671] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
    [  603.966354] Internal error: Oops: 96000005 [#1] SMP
    [  603.971214] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_qn
    [  604.040226] CPU: 3 PID: 1592 Comm: hostapd Not tainted 5.15.89 #0
    [  604.046299] Hardware name: Bananapi BPI-R3 (DT)
    [  604.050812] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [  604.057752] pc : sta_set_sinfo+0xa10/0xbb0 [mac80211]
    [  604.062822] lr : sta_set_sinfo+0x5b4/0xbb0 [mac80211]
    [  604.067871] sp : ffffffc00a1fb7a0
    [  604.071169] x29: ffffffc00a1fb7a0 x28: 0000000000000001 x27: ffffffc00a1fbdd0
    [  604.078284] x26: ffffff800007e880 x25: ffffff8005404900 x24: ffffff80054e8735
    [  604.085400] x23: 000000000009f15c x22: ffffff80057ccb00 x21: 0000058d1783bf82
    [  604.092517] x20: ffffff80054e8000 x19: ffffff8003da8900 x18: 0000000000000000
    [  604.099631] x17: 00000000000007c0 x16: ffffffc008ee6000 x15: 00000000000003e0
    [  604.106747] x14: ffffff80054e8df8 x13: ffffff80054e8df8 x12: 0000000000000000
    [  604.113863] x11: 0000000000000000 x10: ffffff80054e8e00 x9 : 0000000000000000
    [  604.120977] x8 : ffffff8003da8a00 x7 : 0000000000002010 x6 : ffffff8003b303a0
    [  604.128093] x5 : ffffff8003b30880 x4 : 0000000000000000 x3 : ffffff8003da8944
    [  604.135207] x2 : 0000000000000000 x1 : 0000000000000001 x0 : 0000000000000000
    [  604.142322] Call trace:
    [  604.144755]  sta_set_sinfo+0xa10/0xbb0 [mac80211]
    [  604.149461]  sta_set_sinfo+0xb0c/0xbb0 [mac80211]
    [  604.154163]  sta_info_destroy_addr_bss+0x4c/0x70 [mac80211]
    [  604.159733]  ieee80211_color_change_finish+0x1bf8/0x1e80 [mac80211]
    [  604.165994]  cfg80211_check_station_change+0x1384/0x4720 [cfg80211]
    [  604.172254]  genl_family_rcv_msg_doit+0xb4/0x110
    [  604.176857]  genl_rcv_msg+0xd0/0x1c0
    [  604.180416]  netlink_rcv_skb+0x58/0x120
    [  604.184235]  genl_rcv+0x34/0x50
    [  604.187361]  netlink_unicast+0x1f0/0x2ec
    [  604.191270]  netlink_sendmsg+0x19c/0x3d0
    [  604.195178]  ____sys_sendmsg+0x21c/0x260
    [  604.199089]  ___sys_sendmsg+0x80/0xf0
    [  604.202735]  __sys_sendmsg+0x44/0xa0
    [  604.206295]  __arm64_sys_sendmsg+0x20/0x30
    [  604.210374]  invoke_syscall.constprop.0+0x4c/0xe0
    [  604.215061]  do_el0_svc+0x40/0xd0
    [  604.218359]  el0_svc+0x14/0x50
    [  604.221401]  el0t_64_sync_handler+0xe0/0x110
    [  604.225655]  el0t_64_sync+0x158/0x15c
    [  604.229305] Code: d3441c42 12000c00 8b020cc2 f9409c42 (f9400446)
    [  604.235378] ---[ end trace 460a361aac9e5e4a ]---
    spi-nand: spi_nand spi_nand@1: Winbond SPI NAND was found.
    spi-nand: spi_nand spi_nand@1: 128 MiB, block size: 128 KiB, page size: 2048, OOB size: 64
    jedec_spi_nor spi_nor@0: unrecognized JEDEC id bytes: 00, ef, aa
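The "Mem abort info" and "Data abort info" lines are the kernel's breakdown of the ESR value. Below is a minimal sketch of that breakdown, assuming the Arm ESR_ELx field layout for data aborts (EC in bits [31:26], IL in bit 25, ISS in bits [24:0]); `decode_esr` is a hypothetical helper for illustration, not a kernel or OpenWrt tool.

```python
# Minimal sketch: decode an AArch64 ESR_ELx value for a data abort, mirroring the
# kernel's "Mem abort info" / "Data abort info" lines. Field positions assume the
# Arm ESR_ELx layout for EC 0x24/0x25 (data abort); only common fields are handled.

FSC_NAMES = {
    0x04: "level 0 translation fault",
    0x05: "level 1 translation fault",
    0x06: "level 2 translation fault",
    0x07: "level 3 translation fault",
}


def decode_esr(esr: int) -> dict:
    ec = (esr >> 26) & 0x3F   # Exception Class, bits [31:26]
    il = (esr >> 25) & 0x1    # Instruction Length, bit 25 (1 = 32-bit instruction)
    iss = esr & 0x1FFFFFF     # Instruction Specific Syndrome, bits [24:0]
    return {
        "EC": ec,
        "IL": 32 if il else 16,
        "ISS": iss,
        "ISV": (iss >> 24) & 0x1,   # syndrome valid
        "SET": (iss >> 11) & 0x3,
        "FnV": (iss >> 10) & 0x1,
        "EA": (iss >> 9) & 0x1,
        "CM": (iss >> 8) & 0x1,
        "S1PTW": (iss >> 7) & 0x1,
        "WnR": (iss >> 6) & 0x1,    # 0 = read, 1 = write
        "FSC": iss & 0x3F,          # data fault status code
        "FSC_name": FSC_NAMES.get(iss & 0x3F, "other"),
    }


if __name__ == "__main__":
    info = decode_esr(0x96000005)  # ESR value from the oops above
    print(f"EC = {info['EC']:#x}, IL = {info['IL']} bits")
    print(f"ISS = {info['ISS']:#010x}, WnR = {info['WnR']}")
    print(f"FSC = {info['FSC']:#04x}: {info['FSC_name']}")
```

For 0x96000005 this gives EC = 0x25 (data abort at the current EL), WnR = 0 (a read) and FSC = 0x05 (level 1 translation fault), matching the log: the kernel tried to read from an address with no mapping.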

After that, the router rebooted into NAND even though the boot configuration is set to eMMC, and subsequent reboots also went to NAND. Is this a known problem? How can I work around it, and where should I report it?

I have now disabled WLAN roaming and updated to the newest snapshot (r21950-90dbdb4941); I'll see how it goes.

If you have a self-compiled OpenWrt, you can try to find the cause by adding printk() calls to the function where it crashes (after the OpenWrt build chain has applied its patches).

https://elixir.bootlin.com/linux/latest/source/net/mac80211/sta_info.c#L2474
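Before adding printk() calls, it may help to map the offsets in the call trace to source lines. A minimal sketch, assuming you have a mac80211.ko with debug info from your own OpenWrt build: the script only parses frames of the form `func+0xoff/0xsize [module]` and prints a gdb `list *(func+0xoff)` command for each mac80211 frame. `parse_frames` and `gdb_commands` are hypothetical helper names; loading the module into gdb (or using the kernel's `scripts/faddr2line`) is left to you.

```python
import re

# Matches call-trace frames such as "sta_set_sinfo+0xa10/0xbb0 [mac80211]".
# The module suffix is optional because built-in functions have none.
FRAME_RE = re.compile(
    r"(?P<func>[A-Za-z0-9_.]+)\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log: str):
    """Return (function, offset, size, module) for every frame found in the log."""
    frames = []
    for line in log.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((m["func"], int(m["off"], 16), int(m["size"], 16), m["module"]))
    return frames


def gdb_commands(frames, module="mac80211"):
    """Build gdb 'list *(func+off)' commands for frames that belong to one module."""
    return [f"list *({func}+{off:#x})" for func, off, _size, mod in frames if mod == module]


TRACE = """\
[  604.144755]  sta_set_sinfo+0xa10/0xbb0 [mac80211]
[  604.149461]  sta_set_sinfo+0xb0c/0xbb0 [mac80211]
[  604.154163]  sta_info_destroy_addr_bss+0x4c/0x70 [mac80211]
[  604.176857]  genl_rcv_msg+0xd0/0x1c0
"""

if __name__ == "__main__":
    for cmd in gdb_commands(parse_frames(TRACE)):
        print(cmd)
```

Since the elixir link above points at "latest", its line number can drift from the 5.15.89 sources OpenWrt actually builds; resolving offsets against your own module avoids that mismatch.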

A better place to report this is the issues section of the OpenWrt GitHub repository.

From the stack trace, it looks like the function calls itself (recursion), and on the second call it tries to dereference a pointer that is NULL.
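The NULL-pointer reading can be checked against the oops itself. On arm64, the word in parentheses on the `Code:` line is the instruction at pc. A minimal sketch that decodes those words, handling only the 64-bit `LDR Xt, [Xn, #imm]` (unsigned immediate offset) encoding and skipping everything else; `decode_ldr_x_imm` is a hypothetical helper:

```python
# Minimal sketch: decode words from the oops "Code:" line. Only the AArch64
# encoding "LDR Xt, [Xn, #imm]" (64-bit, unsigned immediate offset) is handled;
# other instructions are skipped. Rn = 31 (SP) is not special-cased.


def decode_ldr_x_imm(word: int):
    """Return (rt, rn, byte_offset) for LDR Xt, [Xn, #imm], else None."""
    if (word >> 22) != 0b1111100101:   # size=11, V=0, opc=01, unsigned offset
        return None
    imm12 = (word >> 10) & 0xFFF
    rn = (word >> 5) & 0x1F
    rt = word & 0x1F
    return rt, rn, imm12 * 8           # imm12 is scaled by the 8-byte access size


CODE_LINE = "d3441c42 12000c00 8b020cc2 f9409c42 (f9400446)"

if __name__ == "__main__":
    for token in CODE_LINE.split():
        faulting = token.startswith("(")
        word = int(token.strip("()"), 16)
        decoded = decode_ldr_x_imm(word)
        if decoded:
            rt, rn, off = decoded
            marker = "  <-- instruction at pc" if faulting else ""
            print(f"{word:08x}: ldr x{rt}, [x{rn}, #{off:#x}]{marker}")
```

The faulting word decodes to `ldr x6, [x2, #0x8]`, and the register dump shows `x2 : 0000000000000000`, which matches the faulting address 0000000000000008. The preceding word in the dump, `ldr x2, [x2, #0x138]`, loads that pointer from a structure field; which field it is would have to be confirmed against a disassembly of the exact mac80211.ko.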