Do all of you who have this issue have bridged LAN ports? Have you checked whether the issue is still there without the bridge?
Hi all, it looks like this thread has mixed together several different issues related to network performance.
Let's try to decouple them, because an issue that is a combination of different problems with different causes is not easy to fix.
First of all, let's keep encryption performance off the table, as (a) it is expected to be not that good on an ARM embedded CPU and (b) there's no point in speeding up encryption if the underlying network performance is unstable/broken.
Next, I think that it is a good idea to distinguish the following use cases and their performance:
- BPiR2 acting as a simple switch. All lanX ports are bridged together; the communicating boxes have 1Gbps network cards and reasonably fast CPUs and are all connected to LAN ports of R2. This makes it possible to test the DSA/hw switching performance of the R2. In the ideal case there should be no CPU load on R2, and throughput between boxes connected to LAN1<->LAN3 and LAN2<->LAN4 should be ~1Gbps in each direction, all running at the same time. I haven't checked the hw specs of the R2 switch chip, so I don't know if it is fast enough to provide sustained 4Gbps internal switching throughput. It would also be a good idea to check whether using jumbo frames has any effect on the achieved bandwidth.
- BPiR2 acting as a simple router. All lanX ports should be configured as standalone interfaces with separate subnets, ip_forwarding should be enabled, all netfilter rules cleaned out and default policies set to ACCEPT. For the same LAN1<->LAN3 + LAN2<->LAN4 bidirectional communication scenario, throughput is expected to be 1/4 of 1Gbps at best, as it will be physically limited by the single 1Gbps CPU port that all traffic is routed through, and I'd expect speed to be limited even more by the R2 CPU performance.
- BPiR2 acting as a NAT router. Here a good test setup would be one box connected to the WAN port and another box connected to any LAN port. The best performance should be achieved when using MTK hwnat, but I wouldn't raise hopes too high: 500-600Mbps in each direction would be a really good result. Tests should be repeated both with the LAN ports bridged together and with each LAN port set up as a separate interface.
- Same as pt.3 above but coupled with two more boxes connected to LAN ports and bidirectionally communicating with each other but not with WAN. All LAN ports should be bridged together.
- BPiR2 own network performance. Here the communication speed between R2 itself and boxes connected to the LAN or WAN ports should be tested in all possible combinations. The general expectation is that R2 can receive traffic at a reasonable pace but fails to sustain the same throughput when generating traffic, due to its CPU limits.
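The preparation for the "simple router" case above (forwarding on, netfilter out of the way) could look roughly like the sketch below. It is only an illustration: the helper names run and setup_plain_router are made up here, and the script prints the commands by default instead of executing them, so it can be reviewed before anything is changed on the board.

```shell
#!/bin/sh
# Sketch only. Prints each command; set APPLY=1 (and run as root) to execute them.
run() {
    echo "+ $*"
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    fi
}

# Hypothetical helper: enable IPv4 forwarding and neutralise netfilter.
setup_plain_router() {
    run sysctl -w net.ipv4.ip_forward=1
    # Flush all rules and delete user-defined chains in the common tables.
    for table in filter nat mangle; do
        run iptables -t "$table" -F
        run iptables -t "$table" -X
    done
    # Default policies of the filter table set to ACCEPT.
    for chain in INPUT FORWARD OUTPUT; do
        run iptables -P "$chain" ACCEPT
    done
}
```

Calling setup_plain_router prints the ten commands; calling it with APPLY=1 in the environment actually applies them. On OpenWRT the firewall service may re-add its rules on the next reload, so stopping it first is probably a good idea.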
And, last but not least, there might be more serious issues with the network stack. I believe they show up as the stack traces in dmesg that I get on my test R2, which look the same as the stack trace posted in this thread by @cioby23. I will post my stack traces and the conditions under which I get them in a separate post.
As mentioned earlier, here are my findings on BPiR2 ethernet speed and behavior with kernel 4.14 (tested with OpenWRT compiled from source, namely the BananaPI BPI-R2 Openwrt18.06 Demo Image and Source Code Release 2019-03-06, and with the version from frank's 4.14-main branch). For now I'm leaving performance figures aside, as it seems that generating 1Gbps of traffic using iperf or iperf3 from R2 towards a host connected to any of the lanX ports leads to a network stall accompanied by the NETDEV WATCHDOG message in the kernel log. Here is one from frank's 4.14-main kernel:
[ 75.991517] ------------[ cut here ]------------
[ 75.996146] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x244/0x248
[ 76.004376] NETDEV WATCHDOG: eth1 (mtk_soc_eth): transmit queue 0 timed out
[ 76.011282] Modules linked in: ipv6 qcserial pppoe ppp_async usb_wwan rndis_host qmi_wwan pppox ppp_generic pl2303 nf_nat_pptp nf_conntrack_pptp lz4 iptable_nat ipt_REJECT ipt_MASQUERADE cdc_ether ax88179_178a asix xt_time xt_tcpudp xt_state xt_recent xt_policy xt_nat xt_multiport xt_mark xt_mac xt_limit xt_helper xt_esp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_CT usbserial usbnet slhc nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_redirect nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc nf_conntrack_ipv4 nf_nat_ipv4 nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_nat nf_log_ipv4 nf_log_common nf_defrag_ipv4 nf_conntrack_tftp nf_conntrack_snmp nf_conntrack_sip nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_irc nf_conntrack_h323
[ 76.081771] nf_conntrack_ftp nf_conntrack_broadcast ts_kmp nf_conntrack_amanda nf_conntrack lz4_decompress lz4_compress iptable_raw iptable_mangle iptable_filter ipt_ah ip_tables crc_ccitt cdc_wdm fuse xt_set x_tables ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink nfsv4 nfsv3 nfsd nfs ipcomp xfrm4_tunnel xfrm4_mode_tunnel xfrm4_mode_transport xfrm4_mode_beet esp4 ah4 tunnel4 rpcsec_gss_krb5 auth_rpcgss oid_registry tun loop af_key xfrm_user xfrm_ipcomp xfrm_algo lockd sunrpc grace dns_resolver raid10 raid1 raid0 md_mod nls_utf8 nls_cp866 nls_cp1251
[ 76.152022] zram zsmalloc md5 echainiv ecb des_generic cts cbc authenc nls_iso8859_1 nls_cp437 uas usb_storage leds_gpio ohci_platform ohci_hcd ehci_pci ehci_platform ehci_hcd mii
[ 76.168053] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.14.117-4.14-main #2
[ 76.174959] Hardware name: Mediatek Cortex-A7 (Device Tree)
[ 76.180508] [<c010ec84>] (unwind_backtrace) from [<c010abf0>] (show_stack+0x10/0x14)
[ 76.188199] [<c010abf0>] (show_stack) from [<c05af784>] (dump_stack+0x78/0x8c)
[ 76.195370] [<c05af784>] (dump_stack) from [<c011738c>] (__warn+0xe8/0x100)
[ 76.202284] [<c011738c>] (__warn) from [<c01173dc>] (warn_slowpath_fmt+0x38/0x48)
[ 76.209714] [<c01173dc>] (warn_slowpath_fmt) from [<c051d240>] (dev_watchdog+0x244/0x248)
[ 76.217837] [<c051d240>] (dev_watchdog) from [<c016af54>] (call_timer_fn.constprop.3+0x28/0x98)
[ 76.226472] [<c016af54>] (call_timer_fn.constprop.3) from [<c016b04c>] (expire_timers+0x88/0x94)
[ 76.235193] [<c016b04c>] (expire_timers) from [<c016b124>] (run_timer_softirq+0xcc/0x194)
[ 76.243314] [<c016b124>] (run_timer_softirq) from [<c0101578>] (__do_softirq+0xe8/0x25c)
[ 76.251345] [<c0101578>] (__do_softirq) from [<c011c00c>] (irq_exit+0xbc/0x104)
[ 76.258604] [<c011c00c>] (irq_exit) from [<c0157410>] (__handle_domain_irq+0x80/0xec)
[ 76.266379] [<c0157410>] (__handle_domain_irq) from [<c010144c>] (gic_handle_irq+0x4c/0x90)
[ 76.274670] [<c010144c>] (gic_handle_irq) from [<c010b7cc>] (__irq_svc+0x6c/0xa8)
[ 76.282093] Exception stack(0xdf071f80 to 0xdf071fc8)
[ 76.287104] 1f80: 00000003 c069b9a4 1ef9f000 c0114000 ffffe000 c0903c74 c0903c28 c092809c
[ 76.295217] 1fa0: c069e04c 410fc073 00000000 00000000 00000000 df071fd0 c01082ac c01082b0
[ 76.303327] 1fc0: 60000013 ffffffff
[ 76.306796] [<c010b7cc>] (__irq_svc) from [<c01082b0>] (arch_cpu_idle+0x38/0x3c)
[ 76.314140] [<c01082b0>] (arch_cpu_idle) from [<c014c53c>] (do_idle+0xd0/0x138)
[ 76.321396] [<c014c53c>] (do_idle) from [<c014c84c>] (cpu_startup_entry+0x18/0x1c)
[ 76.328909] [<c014c84c>] (cpu_startup_entry) from [<8010178c>] (0x8010178c)
[ 76.335861] ---[ end trace 6601a547925c5504 ]---
[ 76.340452] mtk_soc_eth 1b100000.ethernet eth1: transmit timed out
And here is the same from OpenWRT's 4.14 18.02.1 kernel:
[35491.038734] ------------[ cut here ]------------
[35491.043342] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:320 dev_watchdog+0x158/0x224
[35491.051569] NETDEV WATCHDOG: eth1 (mtk_soc_eth): transmit queue 0 timed out
[35491.058472] Modules linked in: qcserial pppoe ppp_async usb_wwan rndis_host qmi_wwan pppox ppp_generic pl2303 nf_nat_pptp nf_conntrack_pptp mt76x2e mt76x2_common mt76x02_lib mt7603e mt76 mac80211 lz4 iptable_nat ipt_REJECT ipt_MASQUERADE cfg80211 cdc_ether ax88179_178a asix xt_time xt_tcpudp xt_state xt_recent xt_policy xt_nat xt_multiport xt_mark xt_mac xt_limit xt_helper xt_esp xt_conntrack xt_connmark xt_connlimit xt_connbytes xt_comment xt_TCPMSS xt_REDIRECT xt_LOG xt_FLOWOFFLOAD xt_CT usbserial usbnet ts_fsm ts_bm slhc nf_reject_ipv4 nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_redirect nf_nat_proto_gre nf_nat_masquerade_ipv4 nf_nat_irc nf_conntrack_ipv4 nf_nat_ipv4 nf_nat_h323 nf_nat_ftp nf_nat_amanda nf_nat nf_log_ipv4 nf_log_common nf_flow_table_hw nf_flow_table nf_defrag_ipv4 nf_conntrack_tftp
[35491.129389] nf_conntrack_snmp nf_conntrack_sip nf_conntrack_rtcache nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp nf_conntrack_broadcast ts_kmp nf_conntrack_amanda nf_conntrack lz4_decompress lz4_compress iptable_raw iptable_mangle iptable_filter ipt_ah ip_tables crc_ccitt compat cdc_wdm fuse cryptodev xt_set x_tables ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink nfsv4 nfsv3 nfsd nfs ipcomp xfrm4_tunnel xfrm4_mode_tunnel xfrm4_mode_transport xfrm4_mode_beet esp4 ah4 tunnel4 rpcsec_gss_krb5 auth_rpcgss
[35491.199843] oid_registry tun loop af_key xfrm_user xfrm_ipcomp xfrm_algo lockd sunrpc grace dns_resolver raid10 raid1 raid0 md_mod nls_utf8 nls_cp866 nls_cp1251 zram zsmalloc md5 echainiv ecb des_generic cts cbc authenc nls_iso8859_1 nls_cp437 uas usb_storage leds_gpio ohci_platform ohci_hcd ehci_pci ehci_platform ehci_hcd gpio_button_hotplug mii
[35491.230449] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.105 #0
[35491.236489] Hardware name: Mediatek Cortex-A7 (Device Tree)
[35491.242032] [<c010eb14>] (unwind_backtrace) from [<c010ac1c>] (show_stack+0x10/0x14)
[35491.249717] [<c010ac1c>] (show_stack) from [<c05a351c>] (dump_stack+0x78/0x8c)
[35491.256885] [<c05a351c>] (dump_stack) from [<c0117068>] (__warn+0xe4/0x100)
[35491.263793] [<c0117068>] (__warn) from [<c01170bc>] (warn_slowpath_fmt+0x38/0x48)
[35491.271218] [<c01170bc>] (warn_slowpath_fmt) from [<c050c0a8>] (dev_watchdog+0x158/0x224)
[35491.279334] [<c050c0a8>] (dev_watchdog) from [<c01689b8>] (call_timer_fn.constprop.3+0x28/0x94)
[35491.287964] [<c01689b8>] (call_timer_fn.constprop.3) from [<c0168aa0>] (expire_timers+0x7c/0x98)
[35491.296680] [<c0168aa0>] (expire_timers) from [<c0168b38>] (run_timer_softirq+0x7c/0x160)
[35491.304791] [<c0168b38>] (run_timer_softirq) from [<c010155c>] (__do_softirq+0xe4/0x250)
[35491.312818] [<c010155c>] (__do_softirq) from [<c011bc0c>] (irq_exit+0xac/0xf4)
[35491.319984] [<c011bc0c>] (irq_exit) from [<c0155b9c>] (__handle_domain_irq+0xbc/0xe4)
[35491.327752] [<c0155b9c>] (__handle_domain_irq) from [<c0101440>] (gic_handle_irq+0x5c/0x90)
[35491.336035] [<c0101440>] (gic_handle_irq) from [<c010b7cc>] (__irq_svc+0x6c/0xa8)
[35491.343454] Exception stack(0xdf06ff88 to 0xdf06ffd0)
[35491.348463] ff80: 00000002 c06cd8b0 1ef95000 c0113d60 ffffe000 c0903c74
[35491.356573] ffa0: c0903c28 c092d610 8000406a 410fc073 00000000 00000000 00000000 df06ffd8
[35491.364682] ffc0: c010834c c0108350 60000013 ffffffff
[35491.369694] [<c010b7cc>] (__irq_svc) from [<c0108350>] (arch_cpu_idle+0x34/0x38)
[35491.377034] [<c0108350>] (arch_cpu_idle) from [<c014af48>] (do_idle+0xa8/0x11c)
[35491.384286] [<c014af48>] (do_idle) from [<c014b240>] (cpu_startup_entry+0x18/0x1c)
[35491.391795] [<c014b240>] (cpu_startup_entry) from [<8010176c>] (0x8010176c)
[35491.398715] ---[ end trace d583d83ddc5b60c6 ]---
[35491.403299] mtk_soc_eth 1b100000.ethernet eth1: transmit timed out
What is interesting is that while the actual traffic exchange happens on eth0, the "transmit timed out" message in the logs above refers to eth1. As soon as these messages appear in the kernel log, all network exchange on both wan and lanX stalls, and the only way to access the R2 is the serial console. I tried to "reanimate" the network with /etc/init.d/network restart to no avail, then tried to bring the interfaces down and back up using the iproute2 tools, which also didn't help. I didn't try to rmmod/insmod mtk_soc_eth and mt7530, as these were compiled into the kernel.
The issue is pretty easy to reproduce: boot the board, execute iperf -c 192.168.1.2 -i 1 -t600 -p 5001
or iperf3 -c 192.168.1.2 -i1 -fm -t600 -p 5002
and wait for 5-10 seconds. Most of the time the measured bandwidth starts at ~950Mbps but then drops to 0, with the NETDEV WATCHDOG message appearing in the kernel log. Sometimes, if I leave the board running in this stalled network state, I start to periodically get new stack traces in the kernel log.
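To check from the serial console whether a given run has already hit the problem, a small filter over the kernel log is enough. The sketch below is an assumption about how one might do it, not part of the original test procedure; find_watchdog is a made-up name, and the pattern is taken from the message format shown in the logs above.

```shell
#!/bin/sh
# Reads kernel log text on stdin and prints any NETDEV WATCHDOG transmit
# timeout lines. Exit status is 0 if at least one was found, non-zero otherwise.
find_watchdog() {
    grep -E 'NETDEV WATCHDOG: .* transmit queue [0-9]+ timed out'
}

# Typical use on the board while iperf runs in another session:
#   dmesg | find_watchdog && echo "TX watchdog fired"
```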
First one:
[ 108.541232] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 108.546864] 0-...: (1 GPs behind) idle=87e/140000000000002/0 softirq=1347/1347 fqs=1051
[ 108.554971] (detected by 1, t=2103 jiffies, g=405, c=404, q=38)
[ 108.560935] Sending NMI from CPU 1 to CPUs 0:
[ 108.565453] NMI backtrace for cpu 0
[ 108.565457] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.14.117-4.14-main #2
[ 108.565459] Hardware name: Mediatek Cortex-A7 (Device Tree)
[ 108.565461] task: c0906c40 task.stack: c0900000
[ 108.565463] PC is at __do_softirq+0x9c/0x25c
[ 108.565465] LR is at __do_softirq+0x8c/0x25c
[ 108.565467] pc : [<c010152c>] lr : [<c010151c>] psr: 20000113
[ 108.565469] sp : c0901eb0 ip : 0d01d83a fp : 00000000
[ 108.565471] r10: df008000 r9 : c069b9a4 r8 : 00000001
[ 108.565474] r7 : ffffe000 r6 : 000000e3 r5 : 00000000 r4 : 00000008
[ 108.565476] r3 : 00000000 r2 : c092dd00 r1 : c06a8078 r0 : 00000000
[ 108.565478] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 108.565481] Control: 10c5387d Table: 9d51006a DAC: 00000051
[ 108.565483] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.14.117-4.14-main #2
[ 108.565485] Hardware name: Mediatek Cortex-A7 (Device Tree)
[ 108.565488] [<c010ec84>] (unwind_backtrace) from [<c010abf0>] (show_stack+0x10/0x14)
[ 108.565490] [<c010abf0>] (show_stack) from [<c05af784>] (dump_stack+0x78/0x8c)
[ 108.565493] [<c05af784>] (dump_stack) from [<c05b4eb4>] (nmi_cpu_backtrace+0x6c/0xb4)
[ 108.565496] [<c05b4eb4>] (nmi_cpu_backtrace) from [<c010dbb0>] (handle_IPI+0xec/0x1b0)
[ 108.565498] [<c010dbb0>] (handle_IPI) from [<c010148c>] (gic_handle_irq+0x8c/0x90)
[ 108.565501] [<c010148c>] (gic_handle_irq) from [<c010b7cc>] (__irq_svc+0x6c/0xa8)
[ 108.565502] Exception stack(0xc0901e60 to 0xc0901ea8)
[ 108.565505] 1e60: 00000000 c06a8078 c092dd00 00000000 00000008 00000000 000000e3 ffffe000
[ 108.565508] 1e80: 00000001 c069b9a4 df008000 00000000 0d01d83a c0901eb0 c010151c c010152c
[ 108.565509] 1ea0: 20000113 ffffffff
[ 108.565512] [<c010b7cc>] (__irq_svc) from [<c010152c>] (__do_softirq+0x9c/0x25c)
[ 108.565514] [<c010152c>] (__do_softirq) from [<c011c00c>] (irq_exit+0xbc/0x104)
[ 108.565517] [<c011c00c>] (irq_exit) from [<c0157410>] (__handle_domain_irq+0x80/0xec)
[ 108.565520] [<c0157410>] (__handle_domain_irq) from [<c010144c>] (gic_handle_irq+0x4c/0x90)
[ 108.565522] [<c010144c>] (gic_handle_irq) from [<c010b7cc>] (__irq_svc+0x6c/0xa8)
[ 108.565524] Exception stack(0xc0901f48 to 0xc0901f90)
[ 108.565527] 1f40: 00000000 00000001 df7a4ac0 ffffa8a3 c08337e0 c092d480
[ 108.565530] 1f60: c0903c00 ffffffff c092d480 c0823a28 e07fcb80 00000000 00000000 c0901f98
[ 108.565531] 1f80: c017bae8 c017c188 20000013 ffffffff
[ 108.565534] [<c010b7cc>] (__irq_svc) from [<c017c188>] (tick_nohz_idle_enter+0x44/0x78)
[ 108.565537] [<c017c188>] (tick_nohz_idle_enter) from [<c014c47c>] (do_idle+0x10/0x138)
[ 108.565539] [<c014c47c>] (do_idle) from [<c014c84c>] (cpu_startup_entry+0x18/0x1c)
[ 108.565542] [<c014c84c>] (cpu_startup_entry) from [<c0800c80>] (start_kernel+0x3c0/0x3cc)
Second one:
[ 171.590892] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 171.596526] 0-...: (1 GPs behind) idle=87e/140000000000001/0 softirq=1347/1347 fqs=4204
[ 171.604633] (detected by 1, t=8408 jiffies, g=405, c=404, q=123)
[ 171.610681] Sending NMI from CPU 1 to CPUs 0:
[ 171.615197] NMI backtrace for cpu 0
[ 171.615201] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.14.117-4.14-main #2
[ 171.615203] Hardware name: Mediatek Cortex-A7 (Device Tree)
[ 171.615205] task: c0906c40 task.stack: c0900000
[ 171.615207] PC is at __do_softirq+0x9c/0x25c
[ 171.615209] LR is at __do_softirq+0x8c/0x25c
[ 171.615211] pc : [<c010152c>] lr : [<c010151c>] psr: 20000113
[ 171.615213] sp : c0901eb0 ip : 0d01d83a fp : 00000000
[ 171.615215] r10: df008000 r9 : c069b9a4 r8 : 00000001
[ 171.615217] r7 : ffffe000 r6 : 000000e3 r5 : 00000000 r4 : 00000008
[ 171.615220] r3 : 00000000 r2 : c092dd00 r1 : c06a8078 r0 : 00000000
[ 171.615222] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 171.615224] Control: 10c5387d Table: 9d51006a DAC: 00000051
[ 171.615227] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.14.117-4.14-main #2
[ 171.615229] Hardware name: Mediatek Cortex-A7 (Device Tree)
[ 171.615232] [<c010ec84>] (unwind_backtrace) from [<c010abf0>] (show_stack+0x10/0x14)
[ 171.615234] [<c010abf0>] (show_stack) from [<c05af784>] (dump_stack+0x78/0x8c)
[ 171.615237] [<c05af784>] (dump_stack) from [<c05b4eb4>] (nmi_cpu_backtrace+0x6c/0xb4)
[ 171.615239] [<c05b4eb4>] (nmi_cpu_backtrace) from [<c010dbb0>] (handle_IPI+0xec/0x1b0)
[ 171.615242] [<c010dbb0>] (handle_IPI) from [<c010148c>] (gic_handle_irq+0x8c/0x90)
[ 171.615244] [<c010148c>] (gic_handle_irq) from [<c010b7cc>] (__irq_svc+0x6c/0xa8)
[ 171.615246] Exception stack(0xc0901e60 to 0xc0901ea8)
[ 171.615249] 1e60: 00000000 c06a8078 c092dd00 00000000 00000008 00000000 000000e3 ffffe000
[ 171.615251] 1e80: 00000001 c069b9a4 df008000 00000000 0d01d83a c0901eb0 c010151c c010152c
[ 171.615253] 1ea0: 20000113 ffffffff
[ 171.615256] [<c010b7cc>] (__irq_svc) from [<c010152c>] (__do_softirq+0x9c/0x25c)
[ 171.615258] [<c010152c>] (__do_softirq) from [<c011c00c>] (irq_exit+0xbc/0x104)
[ 171.615261] [<c011c00c>] (irq_exit) from [<c0157410>] (__handle_domain_irq+0x80/0xec)
[ 171.615263] [<c0157410>] (__handle_domain_irq) from [<c010144c>] (gic_handle_irq+0x4c/0x90)
[ 171.615266] [<c010144c>] (gic_handle_irq) from [<c010b7cc>] (__irq_svc+0x6c/0xa8)
[ 171.615268] Exception stack(0xc0901f48 to 0xc0901f90)
[ 171.615270] 1f40: 00000000 00000001 df7a4ac0 ffffa8a3 c08337e0 c092d480
[ 171.615273] 1f60: c0903c00 ffffffff c092d480 c0823a28 e07fcb80 00000000 00000000 c0901f98
[ 171.615275] 1f80: c017bae8 c017c188 20000013 ffffffff
[ 171.615278] [<c010b7cc>] (__irq_svc) from [<c017c188>] (tick_nohz_idle_enter+0x44/0x78)
[ 171.615280] [<c017c188>] (tick_nohz_idle_enter) from [<c014c47c>] (do_idle+0x10/0x138)
[ 171.615283] [<c014c47c>] (do_idle) from [<c014c84c>] (cpu_startup_entry+0x18/0x1c)
[ 171.615285] [<c014c84c>] (cpu_startup_entry) from [<c0800c80>] (start_kernel+0x3c0/0x3cc)
And so on. Another possible but rare case is when the NETDEV WATCHDOG message appears in the kernel log but the network exchange doesn't stall. In this case, if I leave iperf running, a new stack trace appears in the kernel log after some time:
[ 897.112518] mtk_soc_eth 1b100000.ethernet eth1: transmit timed out
[ 921.512426] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 921.518062] 0-...: (1 GPs behind) idle=a7e/2/0 softirq=6121/6123 fqs=526
[ 921.524878] (detected by 1, t=2363 jiffies, g=1819, c=1818, q=20)
[ 921.531014] Sending NMI from CPU 1 to CPUs 0:
[ 921.535539] NMI backtrace for cpu 0
[ 921.535543] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.14.117-4.14-main #2
[ 921.535545] Hardware name: Mediatek Cortex-A7 (Device Tree)
[ 921.535547] task: c0906c40 task.stack: c0900000
[ 921.535549] PC is at __do_softirq+0x9c/0x25c
[ 921.535551] LR is at __do_softirq+0x8c/0x25c
[ 921.535553] pc : [<c010152c>] lr : [<c010151c>] psr: 20000113
[ 921.535555] sp : c0901eb0 ip : 86f75b2f fp : 00000000
[ 921.535557] r10: df008000 r9 : c069b9a4 r8 : 00000001
[ 921.535560] r7 : ffffe000 r6 : 000000e3 r5 : 00000000 r4 : 00000008
[ 921.535562] r3 : 00000000 r2 : c092dd00 r1 : c06a8078 r0 : 00000000
[ 921.535564] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 921.535566] Control: 10c5387d Table: 9e1a406a DAC: 00000051
[ 921.535569] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.14.117-4.14-main #2
[ 921.535571] Hardware name: Mediatek Cortex-A7 (Device Tree)
[ 921.535574] [<c010ec84>] (unwind_backtrace) from [<c010abf0>] (show_stack+0x10/0x14)
[ 921.535576] [<c010abf0>] (show_stack) from [<c05af784>] (dump_stack+0x78/0x8c)
[ 921.535579] [<c05af784>] (dump_stack) from [<c05b4eb4>] (nmi_cpu_backtrace+0x6c/0xb4)
[ 921.535581] [<c05b4eb4>] (nmi_cpu_backtrace) from [<c010dbb0>] (handle_IPI+0xec/0x1b0)
[ 921.535584] [<c010dbb0>] (handle_IPI) from [<c010148c>] (gic_handle_irq+0x8c/0x90)
[ 921.535586] [<c010148c>] (gic_handle_irq) from [<c010b7cc>] (__irq_svc+0x6c/0xa8)
[ 921.535588] Exception stack(0xc0901e60 to 0xc0901ea8)
[ 921.535591] 1e60: 00000000 c06a8078 c092dd00 00000000 00000008 00000000 000000e3 ffffe000
[ 921.535594] 1e80: 00000001 c069b9a4 df008000 00000000 86f75b2f c0901eb0 c010151c c010152c
[ 921.535595] 1ea0: 20000113 ffffffff
[ 921.535598] [<c010b7cc>] (__irq_svc) from [<c010152c>] (__do_softirq+0x9c/0x25c)
[ 921.535600] [<c010152c>] (__do_softirq) from [<c011c00c>] (irq_exit+0xbc/0x104)
[ 921.535603] [<c011c00c>] (irq_exit) from [<c0157410>] (__handle_domain_irq+0x80/0xec)
[ 921.535606] [<c0157410>] (__handle_domain_irq) from [<c010144c>] (gic_handle_irq+0x4c/0x90)
[ 921.535608] [<c010144c>] (gic_handle_irq) from [<c010b7cc>] (__irq_svc+0x6c/0xa8)
[ 921.535610] Exception stack(0xc0901f48 to 0xc0901f90)
[ 921.535613] 1f40: 00000000 c069b9a4 1ef6f000 c0114000 ffffe000 c0903c74
[ 921.535615] 1f60: c0903c28 c092809c c069e04c c0823a28 e07fcb80 00000000 00000000 c0901f98
[ 921.535617] 1f80: c01082ac c01082b0 60000013 ffffffff
[ 921.535620] [<c010b7cc>] (__irq_svc) from [<c01082b0>] (arch_cpu_idle+0x38/0x3c)
[ 921.535622] [<c01082b0>] (arch_cpu_idle) from [<c014c53c>] (do_idle+0xd0/0x138)
[ 921.535624] [<c014c53c>] (do_idle) from [<c014c84c>] (cpu_startup_entry+0x18/0x1c)
[ 921.535627] [<c014c84c>] (cpu_startup_entry) from [<c0800c80>] (start_kernel+0x3c0/0x3cc)
From this stack trace onward the serial console becomes non-responsive but not dead: it echoes back all characters I type and reacts to Ctrl+S/Ctrl+Q, but doesn't react to Ctrl-C or Ctrl-Z, i.e. the kernel hasn't hung completely but the system is utterly broken.
I repeated my tests with frank's kernel 4.9-main and wasn't able to reproduce the problem. It also seems to be fixed in frank's kernel 5.1-p5detect2. I haven't tested with 5.0 or other kernels between 4.14 and 5.1 yet. Any ideas what might be the cause of the problem here?
Thanks for your test. Can you please try with branch 5.1-main to find out which problems are present in mainline (no net changes except wifi)? If you get the same problems, maybe revert the wifi changes there, because they touch the watchdog.
Could anyone using 4.14 try reverting this patch:
And see if problem persists?
Thanks to @LeXa2 for finding this
Yep, as @frank-w already reported here, I managed to bisect my problem down to the patch 0027-net-next-mediatek-fix-DQL-support.patch. As soon as I remove/revert it, the network stall under high load is gone. As a stress test I ran iperf and iperf3 in client mode on R2 with the test duration set to 6000 seconds. This testing finished moments ago and everything went fine. So, with this problem sorted out, I'm about to measure some performance figures of R2 in line with the performance testing scheme I described earlier.
So, performance benchmarks.
Initial setup
- Box A: x86-64 Linux-based NAS, CPU AMD A4-5300 @ 3.4GHz, MB Gigabyte GA-F2A88XM-HD3, built-in ethernet based on RTL8168e, connected to the local gigabit ethernet infrastructure.
- Box B: x86-64 ultrabook running Debian, CPU Intel Core i5-4210U, USB3.0 gigabit ethernet card based on the ax88179 chip.
- Box R2: Banana Pi R2
Test 0: "Getting the baseline without R2"
B connected to the local network infrastructure (i.e. R2 is not involved in this testing).
iperf3 B->A: ~920Mbps
iperf3 A->B: ~910Mbps
iperf A<->B (full duplex): ~795Mbps A->B, ~875Mbps B->A
Test 1: "Performance of R2 switch chip"
B connected to the lan3 port of R2. Port lan1 of R2 connected to the local network infrastructure.
R2 DSA interfaces lan1…4 bridged together into interface br-lan.
iperf3 B->A: ~920Mbps
iperf3 A->B: ~910Mbps
iperf A<->B (full duplex): ~798Mbps A->B, ~889Mbps B->A
Analysis: R2 switching speed for this use case seems to be good enough and does not cause any slowdown compared to the baseline numbers.
Test 2: "Is wan port any different to lanX?"
B connected to the wan port of R2. Port lan1 of R2 connected to the local network infrastructure.
R2 DSA interfaces lan1…4 and wan all bridged together into interface br-lan.
iperf3 B->A: ~920Mbps
iperf3 A->B: ~909Mbps
iperf A<->B (full duplex): ~798Mbps A->B, ~895Mbps B->A
Analysis: when bridged with the other DSA ports, the wan port seems to behave exactly the same way as the lanX ports do. Switching speed for this use case seems to be good enough and does not cause any slowdown compared to the baseline numbers.
Test 3: "R2 CPU routing performance"
B connected to the wan port of R2. Port lan1 of R2 connected to the local network infrastructure.
R2 DSA interfaces lan1…4 bridged together into interface br-lan. R2 interface wan assigned its own subnet, box B assigned an IP address from this subnet. Routing rule added on A specifying the R2 IP assigned to br-lan as gateway for wan's subnet. Routing rule added on B specifying the R2 IP assigned to wan as gateway for br-lan's subnet. Iptables flushed on R2 and all policies set to ACCEPT.
iperf3 B->A: ~840Mbps
iperf3 B->A (data from /dev/urandom): ~832Mbps
iperf3 A->B: ~780Mbps
iperf3 A->B (data from /dev/urandom): ~785Mbps
iperf A<->B (full duplex): ~694Mbps A->B, ~793Mbps B->A
Analysis: R2 raw CPU forwarding performance seems to be OK. There is no NAT or conntrack in this test case, though. Entropy of the forwarded data seems to have no influence on throughput.
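For anyone repeating Test 3, the two routing rules from the setup can be sketched as below. The function name test3_routes and all addresses in the usage example are hypothetical; substitute whatever subnets and R2 addresses are actually configured. The function only prints the commands, so nothing is changed until they are run by hand on the respective box.

```shell
#!/bin/sh
# Prints the static routes needed on boxes A and B for the routed test.
# Arguments: <br-lan subnet> <R2 IP on br-lan> <wan subnet> <R2 IP on wan>
test3_routes() {
    lan_net=$1
    lan_gw=$2
    wan_net=$3
    wan_gw=$4
    echo "# on box A (behind br-lan): reach wan's subnet via R2"
    echo "ip route add $wan_net via $lan_gw"
    echo "# on box B (behind wan): reach br-lan's subnet via R2"
    echo "ip route add $lan_net via $wan_gw"
}
```

For example, test3_routes 192.168.1.0/24 192.168.1.1 192.168.2.0/24 192.168.2.1 prints one "ip route add" line for each box, assuming R2 holds the .1 address in both subnets.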
Test 4: "R2 CPU NAT performance"
B connected to the wan port of R2. Port lan1 of R2 connected to the local network infrastructure.
R2 DSA interfaces lan1…4 bridged together into interface br-lan. R2 interface wan assigned its own subnet, box B assigned an IP address from this subnet. Routing rule added on A specifying the R2 IP assigned to br-lan as gateway for wan's subnet. Routing rule added on B specifying the R2 IP assigned to wan as gateway for br-lan's subnet. Iptables on R2 configured with default OpenWRT fw3 to perform NAT for connections from br-lan to wan. No in-kernel mtk hwnat.
iperf3 B->A: ~751Mbps
iperf3 B->A (data from /dev/urandom): ~750Mbps
iperf3 A->B: ~615Mbps
iperf3 A->B (data from /dev/urandom): ~612Mbps
Analysis: R2 CPU-based NAT forwarding performance seems to be OK. Entropy of the forwarded data seems to have a slight influence on throughput, but the scale of this influence is really low.
Conclusions
R2 seems to perform well enough to be used as a SOHO router. It'd be useful to perform additional tests involving two pairs of cross-communicating hosts to determine whether the switch chip in R2 is good enough to allow non-blocking switching of two simultaneous full-duplex gigabit streams. Hardware NAT performance was not tested because the 4.14 kernel used was not patched to include it.
Which kernel/system have you used?
Could you try to create >1 Gbit/s of traffic (multiple iperf processes)?
The existing GMAC should do trgmii (2.5 Gbit/s), but in my tests I get only 1 Gbit/s (lan+lan to local and wan+lan to local with the second GMAC active). I read in a technical document (pointed out by rene) that the frame engine is wired at only 1 Gbit/s, which AFAIK makes trgmii impossible.
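For the multi-process test suggested above, keep in mind that a single iperf3 server handles one test at a time, so each client process needs its own server port (started with iperf3 -s -p PORT on the target). A minimal sketch that generates the client command lines follows; parallel_iperf_cmds is a made-up helper, and the host, port and duration values in the example are placeholders.

```shell
#!/bin/sh
# Prints one backgrounded iperf3 client command per stream, each aimed at its
# own server port, followed by "wait". Pipe the output to sh to run it.
# Arguments: <server host> <number of streams> <first port> <seconds>
parallel_iperf_cmds() {
    host=$1
    streams=$2
    base_port=$3
    secs=$4
    i=0
    while [ "$i" -lt "$streams" ]; do
        echo "iperf3 -c $host -p $((base_port + i)) -t $secs -i 1 &"
        i=$((i + 1))
    done
    echo "wait"
}
```

For example, parallel_iperf_cmds 192.168.1.2 2 5201 60 prints two client commands for ports 5201 and 5202. To push more than 1 Gbit/s in aggregate, the streams would have to target hosts behind different ports of R2, so the function would be called once per target host.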
The kernel is basically the same as the one you get by compiling the BananaPI BPI-R2 Openwrt18.06 Demo Image and Source Code Release 2019-03-06 image, except for:
- Applied DTS-fix patches: 0065-dts-fix-bpi-r2-memory-to-2GB.patch, 0066-dts-mt7623-add-pcie.patch and 0067-dts-bpi-r2-fix-second-gmac.patch.
- Removed patch: 0027-net-next-mediatek-fix-DQL-support.patch.
- Kernel config adjusted a bit:
diff --git a/target/linux/mediatek/mt7623/config-4.14 b/target/linux/mediatek/mt7623/config-4.14
index a38f02f2..b13675c6 100644
--- a/target/linux/mediatek/mt7623/config-4.14
+++ b/target/linux/mediatek/mt7623/config-4.14
@@ -1,4 +1,12 @@
# CONFIG_AIO is not set
+CONFIG_AHCI_MTK=y
+CONFIG_ATA=y
+CONFIG_SATA_AHCI=y
+CONFIG_SATA_AHCI_PLATFORM=y
+CONFIG_SCSI=y
+CONFIG_BLK_DEV_SD=y
+CONFIG_RTC_DRV_MT7622=y
+CONFIG_WATCHDOG_SYSFS=y
CONFIG_ALIGNMENT_TRAP=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_ARCH_HAS_DEBUG_VIRTUAL=y
@@ -33,7 +41,9 @@ CONFIG_ARM=y
CONFIG_ARM_APPENDED_DTB=y
CONFIG_ARM_ARCH_TIMER=y
CONFIG_ARM_ARCH_TIMER_EVTSTREAM=y
-# CONFIG_ARM_ATAG_DTB_COMPAT is not set
+CONFIG_ARM_ATAG_DTB_COMPAT=y
+CONFIG_ARM_ATAG_DTB_COMPAT_CMDLINE_FROM_BOOTLOADER=y
+# CONFIG_ARM_ATAG_DTB_COMPAT_CMDLINE_EXTEND is not set
CONFIG_ARM_CPU_SUSPEND=y
# CONFIG_ARM_CPU_TOPOLOGY is not set
CONFIG_ARM_GIC=y
@@ -490,3 +500,5 @@ CONFIG_ZBOOT_ROM_BSS=0
CONFIG_ZBOOT_ROM_TEXT=0
CONFIG_ZLIB_DEFLATE=y
CONFIG_ZLIB_INFLATE=y
+CONFIG_EXT4_FS=y
+CONFIG_VFAT_FS=y
Not this time, as (1) I have finally put this particular R2 under the workload it was initially bought for and (2) I haven't got enough traffic-generating boxes on hand right now. I will try to do it in the future, as I have ordered two more R2s and they are already halfway from China to my place. I will have to borrow my girlfriend's laptop to use as a "box C" in the tests.
I'm not sure R2 is fast enough to see a real benefit from trgmii even if all the HW wiring is in place for 2.5Gbps. Here are some interesting figures illustrating the max possible kernel network stack bandwidth with the kernel I use now on R2:
I opened two terminals connected to the R2. Then, in the first terminal, I executed:
root@lx2bpir2:/# /etc/init.d/firewall stop
... lots of output stripped...
root@lx2bpir2:/# iperf -s -i1 -p 5001
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
Then in second terminal:
root@lx2bpir2:/etc/config# iperf -c 127.0.0.1 -i1 -t5 -p5001
------------------------------------------------------------
Client connecting to 127.0.0.1, TCP port 5001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[ 3] local 127.0.0.1 port 35816 connected with 127.0.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 1.0 sec 316 MBytes 2.65 Gbits/sec
[ 3] 1.0- 2.0 sec 315 MBytes 2.64 Gbits/sec
[ 3] 2.0- 3.0 sec 317 MBytes 2.66 Gbits/sec
[ 3] 3.0- 4.0 sec 318 MBytes 2.66 Gbits/sec
[ 3] 4.0- 5.0 sec 318 MBytes 2.67 Gbits/sec
[ 3] 0.0- 5.0 sec 1.55 GBytes 2.66 Gbits/sec
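As a sanity check on the units in this output: the Transfer and Bandwidth columns are consistent with MBytes counted as 1024*1024 bytes and Gbits as 10^9 bits. The small helper below (mbytes_to_gbits is a made-up name) reproduces the conversion with awk; it is a sketch for cross-checking figures, not something iperf itself provides.

```shell
#!/bin/sh
# Converts a transfer in MBytes (1024*1024 bytes) over a given interval
# (default 1 second) into Gbits/sec (10^9 bits), rounded to two decimals.
mbytes_to_gbits() {
    awk -v mb="$1" -v secs="${2:-1}" \
        'BEGIN { printf "%.2f\n", mb * 1048576 * 8 / secs / 1000000000 }'
}

# Example: mbytes_to_gbits 316   # first one-second interval above
```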
What this loopback test shows is that pure localhost TCP communication through the in-kernel network layer maxes out at around 2.5Gbps (2.66 Gbits/sec in the run above), and this is with iptables cleared out and without diving deeper into kernel skb and netdev related stuff. Enabling the default OpenWRT firewall slows things down to ~2.2Gbps; this is the cost of having "-A INPUT -i lo -j ACCEPT" and "-A OUTPUT -o lo -j ACCEPT" at the top of the respective iptables chains. So I wouldn't have high hopes of seeing a benefit from the trgmii connection between MACs and PHYs in R2. And for 2.5Gbps to be possible it is not enough to have a trgmii link in the GMAC, as 2.5Gbps must also be active on the switch ASIC side for these MACs. Are you sure that this is the case?
Iām not sure because rene says me that on available patches trgmii-mode is not initialized the right way and patch for second gmac brings another wrong setting: phy-mode have to be the same for mac0 and mac1ā¦else there are damaged packetsā¦both trgmii is not possible because only 1 gmac is really trgmii (the other only rgmii) and both rgmii leaves unused bandwidth of 1.5gbit/s (if trgmii is possible)
Well, to know for sure we need to somehow fetch and read the R2 HW interconnection schematics and the datasheets for all participating chips. My experience from digging through this kind of docs tells me that it won't be an easy ride even if we'd get the mentioned docs into our hands.
From the official specs on the Mediatek site (quote): "MT7623N also implements 10/100/1000 Ethernet RGMII and TRGMII interfaces". That leaves us with 3.5Gbps max theoretical bandwidth (2.5Gbps trgmii + 1Gbps rgmii) in case everything is fine with the wiring and the speed modes are supported on the switch ASIC side (which I expect they are). I've got doubts that R2 is able to handle so much traffic when routing using the CPU. It should be pretty easy to test though with four boxes. The test setup should be like this: lan0 and lan1 bridged together into br-lan, lan2 and lan3 bridged together into br-lan-1, br-lan and br-lan-1 set to use separate subnets, ip_forwarding enabled on R2, firewall cleared. Communication should be like lan0->lan2 and lan1->lan3 simultaneously. If the aggregate bandwidth ends up any higher than 1Gbps then (a) trgmii is working and (b) I was underestimating R2. It's only a matter of finding enough boxes and a moment to perform this test now.
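To make the arithmetic behind that test explicit, here is a small Python sketch under stated assumptions. The 1Gbps rgmii and 2.5Gbps trgmii rates are the nominal figures used in this thread, not measurements; the function names and the sample flow rates are made up for illustration; and the "anything above 1Gbps means trgmii is working" rule is simply the criterion proposed above.

```python
# Nominal per-direction link rates in Gbit/s as discussed in this thread.
# These are assumptions for illustration, not values measured on an R2.
LINK_RATE_GBPS = {"rgmii": 1.0, "trgmii": 2.5}


def cpu_port_budget(phy_modes):
    """Sum the nominal rates of the given GMAC phy-modes."""
    return sum(LINK_RATE_GBPS[mode] for mode in phy_modes)


def trgmii_indicated(flow_rates_gbps, single_link_gbps=LINK_RATE_GBPS["rgmii"]):
    """Apply the test criterion: the simultaneous routed flows together
    exceed what a single 1Gbps rgmii CPU link could carry."""
    return sum(flow_rates_gbps) > single_link_gbps


mixed = cpu_port_budget(["trgmii", "rgmii"])      # one trgmii + one rgmii GMAC
both_rgmii = cpu_port_budget(["rgmii", "rgmii"])  # both GMACs forced to rgmii
print(f"trgmii + rgmii budget: {mixed} Gbit/s")
print(f"rgmii + rgmii budget:  {both_rgmii} Gbit/s")
print(f"left unused when both run rgmii: {mixed - both_rgmii} Gbit/s")

# Hypothetical iperf results for the two simultaneous routed flows.
print(trgmii_indicated([0.62, 0.58]))  # aggregate above 1 Gbit/s
print(trgmii_indicated([0.47, 0.45]))  # aggregate below 1 Gbit/s
```

Feed it the two per-flow rates reported by iperf for the simultaneous transfers; the result is only as good as the assumption that nothing else (CPU, firewall, test boxes) is the bottleneck.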
Where did you get the dts-fix-patches (they are not in upstream)?
Created them myself using your kernel 4.14 as a base. There's nothing new in there compared to what you already know and use.
I have reverted the DQL patch in my 4.14-main; anyone who has problems should test whether this fixes the issue.
release built by travis:
Strange that 4.19/5.x is also reported to have the slowdown, even though this patch was not applied there.
Btw, the ethernet-fixes patch on 4.9 also reverts the following earlier patches:
0029-net-next-ethernet-mediatek-add-CDM-able-to-recognize.patch
0054-net-ethernet-mediatek-fixed-deadlock-captured-by-loc.patch (spinlock-change)
and partially reverts this one (restores the old mtk_rx_alloc + mtk_w32 calls and adds new mtk_rx_alloc_qdma, mtk_rx_clean_qdma):
0037-net-next-mediatek-bring-up-QDMA-RX-ring-0.patch (mtk_rx_alloc)
and this:
0051-net-mediatek-increase-tx_timeout.patch (watchdog_timeo 5=>30=>15)
and these (removes the complete CDM block):
0029-net-next-ethernet-mediatek-add-CDM-able-to-recognize.patch
0043-net-next-mediatek-enable-special-tag-indication-for-.patch
It also removes the new block introduced by
0042-net-next-mediatek-honour-special-tag-bit-inside-RX-D.patch (find out which mac the packet comes from)
and changes it to a new variant (not the old code).
Maybe we/I can use 4.9 as a base (https://github.com/frank-w/BPI-R2-4.14/tree/4.9-main_new) to split and squash the patches together and see which code is not reverted and needs to be ported.
Also interesting are the settings for cpumask_set_cpu and irq_set_affinity_hint, which may be increasing speed in 4.9 but not in other kernel versions.
Well, it is important here to keep efforts structured. I mean, there's a mediatek linux kernel team out there somewhere working on getting new features into the mainline. So I'd say that this work on tidying up patches and determining what to port and what to leave behind should be coordinated with those chaps.
It is also important to line up the goals for porting. I'd say that as end-users we're interested in having a properly working (1) mainline kernel and (2) all LTS kernels. From what I know, 4.9 from your repo seems to be more or less working, at least when talking about DSA/networking stuff. At the same time 4.14 and 4.19 seem to need more love to be stable enough for R2 to be used as an everyday router. Thus I'd outline the priorities as: first, pushing as hard as possible to get the driver updated to use the phylink layer into mainline (having it in stable mainline will result in getting it into the next LTS down the road), and second, pushing to make networking stable in 4.19 and 4.14.
Are you working on phylink too?
I'm not working on it in the typical meaning of the term "to work". I'm following the work that is happening around porting mtk_eth_soc to phylink (both on the mediatek mailing list and here on this forum, and in your kernel repo too) and from time to time I try to build a kernel with a new version of the phylink patches and test it.
P.S. Received two new R2 boards today, arrived surprisingly fast - took only 8 days in transit.
I have not seen phylink patches on the mtk mailing list... maybe netdev (I'm not watching that one)?
It is probably me being wrong here. I was under the impression that I'd seen approximately the same phylink patches as you've got in your git repo somewhere else, and as the only other place I've been visiting regarding mtk_eth_soc is the mediatek kernel development mailing lists, I thought that the phylink patches originated there. If that's not the case then it's my error/imagination, sorry for that.
Imho only Rene's repo (https://github.com/vDorst/linux-1) has the same patches...
Have you done some further tests? Any problems with 4.14/4.19?