[BPI-R4] any plans for jumbo frame support?

Yes, patches are currently only in the 6.18-jumbo branch.

See this patch in the MTK SDK for fixing the DSA warnings:

https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/91956213e315348639dc350ce567b893e1d85ed3^!/#F0

Updated the 6.18-jumbo branch, but it is only compile-tested so far… did a first bootup and got an oops… still looking into why this happens. It seems to happen in probe (I think I found it, but need to think about how to solve it correctly).

Edit: I have fixed the crash in 6.19-jumbo (and dropped that update from the 6.18 tree as it breaks things), but jumbo-frame handling is now completely different… I have to set the MTU for eth0 first before I can set the DSA user port MTU:

#BPI-R4
# ip link set eth0 mtu 9000 up
# ip link set dev lan3 mtu 9000 up
# ip a a 192.168.90.1/24 dev lan3

#other side (laptop):
$ sudo ip link set dev enx00e04c68001b mtu 9000 up
$ sudo ip a a 192.168.90.2/24 dev enx00e04c68001b
$ ping -M do -s 8972  192.168.90.1
PING 192.168.90.1 (192.168.90.1) 8972(9000) bytes of data.
8980 bytes from 192.168.90.1: icmp_seq=1 ttl=64 time=0.738 ms
...
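The `-s 8972` value is not arbitrary: it is the largest ICMP payload that fits a 9000-byte MTU once the IPv4 and ICMP headers are counted. A quick sanity check:

```shell
# Largest ping payload that fits an IPv4 MTU without fragmentation:
#   MTU = IPv4 header (20) + ICMP header (8) + payload
mtu=9000
payload=$((mtu - 20 - 8))
echo "ping -M do -s $payload <target>"
```

The 8980 bytes reported per reply are this payload plus the 8-byte ICMP header.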

Mhm, it seems the MAC down/up is still needed, but the mt753x error seems gone:

root@bpi-r4-v11:~
# ip a a 192.168.90.1/24 dev lan3
root@bpi-r4-v11:~
# ip link set eth0 mtu 9000 up
Error: mtu greater than device maximum.
root@bpi-r4-v11:~
# ip link set eth0 up
[   51.854102] mtk_soc_eth 15100000.ethernet eth0: configuring for fixed/internal link mode
[   51.862259] mtk_soc_eth 15100000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx/tx
root@bpi-r4-v11:~
# ip link set eth0 down
[   54.964902] mtk_soc_eth 15100000.ethernet eth0: Link is Down
root@bpi-r4-v11:~
# ip link set eth0 mtu 9000 up
[   95.284815] mtk_soc_eth 15100000.ethernet eth0: configuring for fixed/internal link mode
[   95.293009] mtk_soc_eth 15100000.ethernet eth0: Link is Up - 10Gbps/Full - flow control rx/tx
root@bpi-r4-v11:~
# ip link set dev lan3 mtu 9000 up
[  131.224631] mt7530-mmio 15020000.switch lan3: configuring for phy/internal link mode
[  131.233282] mt7530-mmio 15020000.switch lan3: Link is Up - 1Gbps/Full - flow control rx/tx
root@bpi-r4-v11:~
# 
root@bpi-r4-v11:~
# dmesg | grep mt753
[    3.359175] mt7530-mmio 15020000.switch: configuring for fixed/internal link mode
[    3.369729] mt7530-mmio 15020000.switch: Link is Up - 10Gbps/Full - flow control rx/tx
[    3.370382] mt7530-mmio 15020000.switch wan (uninitialized): PHY [mt7530-0:00] driver [Generic PHY] (irq=POLL)
[    3.371955] mt7530-mmio 15020000.switch lan1 (uninitialized): PHY [mt7530-0:01] driver [Generic PHY] (irq=POLL)
[    3.373384] mt7530-mmio 15020000.switch lan2 (uninitialized): PHY [mt7530-0:02] driver [Generic PHY] (irq=POLL)
[    3.374836] mt7530-mmio 15020000.switch lan3 (uninitialized): PHY [mt7530-0:03] driver [Generic PHY] (irq=POLL)
[  131.224631] mt7530-mmio 15020000.switch lan3: configuring for phy/internal link mode
[  131.233282] mt7530-mmio 15020000.switch lan3: Link is Up - 1Gbps/Full - flow control rx/tx
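The workaround session above can be summarized as one sequence (interface names taken from the session; run as root):

```shell
# Workaround observed above: the MAC has to be brought up once
# (so the driver sets its max-mtu), then down again, before the
# jumbo MTU is accepted.
ip link set eth0 up              # first up: driver sets max-mtu of the MAC
ip link set eth0 down
ip link set eth0 mtu 9000 up     # now the jumbo MTU is accepted
ip link set dev lan3 mtu 9000 up
ip a a 192.168.90.1/24 dev lan3
```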

Thanks! I saw your newest commits and I’ve been trying to backport them to 6.12 (since I’m running OpenWrt). The mtk_eth_soc driver has changed a lot in the past year or so – I’ll need a while to figure out how.

CMIIW: we’re looking at “ci: fix R3 filename (used in images-repo)” … “net: mediatek: fix crash after jumbo-patch”, correct?

I reverted my fix and simply moved mtk_set_max_mtu further down in mtk_add_mac in a later commit (6.19-jumbo).

So basically you need these commits:

a3207a383794 2025-12-28 net: mtk_eth_soc: try to fix crash the right way Frank Wunderlich  (HEAD -> 6.19-jumbo, origin/6.19-jumbo)
ca4c976dce22 2025-11-12 net: ethernet: mtk_eth_soc: add dynamic rx buffer adjustment support Mason Chang 
b4c6cb1d5532 2025-10-17 net: ethernet: mtk_eth_soc: add 9k jumbo frame support Mason Chang 
92bfdbdd4de1 2025-10-08 net: ethernet: mtk_eth_soc: change default rx buffer length Mason Chang

Most of the changes are due to RSS/LRO support, but it shouldn’t be much of a problem to adapt them for 6.12.
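Note that the list above is newest-first, so when backporting to 6.12 you would apply them in reverse (oldest-first) order. A sketch, assuming the tree containing those hashes is already fetched:

```shell
# Hypothetical backport sketch: apply the jumbo commits oldest-first.
# Assumes the 6.19-jumbo tree has been added as a remote and fetched.
git cherry-pick 92bfdbdd4de1   # change default rx buffer length
git cherry-pick b4c6cb1d5532   # add 9k jumbo frame support
git cherry-pick ca4c976dce22   # add dynamic rx buffer adjustment support
git cherry-pick a3207a383794   # try to fix crash the right way
```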

I will merge the top 2 into one tomorrow.

Thanks for the update. I’ve gotten those commits into my build – and just managed to compile the kernel after fixing a few merge conflicts (as you mentioned, mostly for RSS and LRO). I’ll figure out how to roll it out for testing in the next few days.

Unfortunately, when I compiled and ran the updated kernel on my BPI, I could no longer talk to it over Ethernet while plugged into the 1 Gbps switch ports. It booted – does the change only work for the 10 Gbps SFP ports, or?

I don’t have a UART either, so that may be the next thing I have to sort out to figure out what’s wrong.

Hello everyone, I’ve committed 4 more commits to my repo (synced with Frank’s latest 6.18-jumbo, at least as of a few days ago). The first 3 address the same issue as above.

The last one should add some performance.

I haven’t tested it heavily yet, and didn’t test the 4th commit at all.

Feel free to join the testing. I’m planning to do it by the end of the week and will update with any results.

Moderated: added link

Frank, thanks for adding the link, +1 commit

Short description - these commits address the following issues:

  • Global MTU config issue: changing the MTU on a single port can lead to memory corruption when reading the DMA ring buffer

  • Coordinated MTU change across ports: as all ports share the same RX DMA buffers, stopping and re-initializing all ports is required for a safe MTU change

  • Some performance issues with mixed-MTU ports: after enabling jumbo frames on a single port, the whole system, including the MTU-1500 ports, switches to jumbo frame mode to stay consistent with the DMA ring buffers; this causes cache allocation/cleaning overhead and reduces single-flow RX performance on a 10G port from ~5G to ~2.2G

and

address the performance drop - with these 2 commits, the avg RX speed for a single flow is ~2.7G.

before:
[  5]  20.00-21.00  sec   256 MBytes  2.15 Gbits/sec    1   1.17 MBytes        
[  5]  21.00-22.00  sec   259 MBytes  2.17 Gbits/sec    0   1.33 MBytes        
[  5]  22.00-23.00  sec   260 MBytes  2.18 Gbits/sec    0   1.46 MBytes        
[  5]  23.00-24.00  sec   259 MBytes  2.17 Gbits/sec    0   1.59 MBytes        
[  5]  24.00-25.00  sec   260 MBytes  2.18 Gbits/sec    1   1.25 MBytes        
[  5]  25.00-26.00  sec   261 MBytes  2.19 Gbits/sec    0   1.39 MBytes        
[  5]  26.00-27.00  sec   260 MBytes  2.18 Gbits/sec    0   1.53 MBytes        
[  5]  27.00-28.00  sec   258 MBytes  2.16 Gbits/sec    3   1.17 MBytes    

after:
[  5]   9.00-10.00  sec   321 MBytes  2.69 Gbits/sec    0   1.40 MBytes        
[  5]  10.00-11.00  sec   328 MBytes  2.75 Gbits/sec    0   1.56 MBytes        
[  5]  11.00-12.00  sec   325 MBytes  2.73 Gbits/sec   26   1.24 MBytes        
[  5]  12.00-13.00  sec   318 MBytes  2.66 Gbits/sec    0   1.40 MBytes        
[  5]  13.00-14.00  sec   324 MBytes  2.72 Gbits/sec    0   1.56 MBytes        
[  5]  14.00-15.00  sec   329 MBytes  2.76 Gbits/sec   35   1.25 MBytes        
[  5]  15.00-16.00  sec   326 MBytes  2.74 Gbits/sec    0   1.43 MBytes   

CAUTION: changing the MTU under heavy load (iperf) may halt the system.

IMHO the MTU should be changed at system startup, which is currently broken because of the flow: enable the MAC first (to set max-mtu), set the MTU, and then reset the MAC again.

Could you add descriptions to the corresponding commits? I saw that you did many cleanups (removing spaces, linebreaks, etc.), so it’s hard to find the real changes. Maybe do the cleanup first (separately).

Thanks for working on getting this in better shape!

Yes, it makes sense to split the cleanup and the functional changes.

Sure, it will take some time

Consider it as a draft version now :slight_smile:

If I understood correctly, the runtime MTU change halts under load due to DMA racing during the reset. We likely need to make sure that all buffers are flushed before changing the MTU.

+2

commit fc44b11c6870c9a3f8e87d2d8eb2c86105d8de56
commit af1b4adcffa8cfdc6781696a6f9f68f673000a68

Have improved the 10G port throughput from

to:

root@pve1:~# iperf3 -c 10.0.1.1
Connecting to host 10.0.1.1, port 5201
[  5] local 10.0.1.10 port 52804 connected to 10.0.1.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   386 MBytes  3.24 Gbits/sec   69   1.40 MBytes       
[  5]   1.00-2.00   sec   391 MBytes  3.28 Gbits/sec    0   1.59 MBytes       
[  5]   2.00-3.00   sec   374 MBytes  3.14 Gbits/sec    8   1.31 MBytes       
[  5]   3.00-4.00   sec   362 MBytes  3.04 Gbits/sec    0   1.48 MBytes       
[  5]   4.00-5.00   sec   389 MBytes  3.26 Gbits/sec   15   1.19 MBytes       
[  5]   5.00-6.00   sec   368 MBytes  3.08 Gbits/sec    0   1.39 MBytes       
[  5]   6.00-7.00   sec   370 MBytes  3.10 Gbits/sec    0   1.56 MBytes       
[  5]   7.00-8.00   sec   379 MBytes  3.18 Gbits/sec   18   1.28 MBytes       
[  5]   8.00-9.00   sec   356 MBytes  2.99 Gbits/sec    0   1.44 MBytes       
[  5]   9.00-10.00  sec   381 MBytes  3.20 Gbits/sec   18   1.13 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  3.67 GBytes  3.15 Gbits/sec  128             sender
[  5]   0.00-10.00  sec  3.66 GBytes  3.15 Gbits/sec                  receiver

with mixed MTU ports

perf top:

Samples: 421K of event 'cycles:P', 4000 Hz, Event count (approx.): 38034979025 lost: 0/0 drop: 0/0
Overhead  Shared Object                                          Symbol
  23.85%  [kernel]                                               [k] __pi_dcache_clean_poc
  15.10%  [kernel]                                               [k] __arch_copy_to_user
   7.90%  [kernel]                                               [k] __pi_dcache_inval_poc
   6.04%  [kernel]                                               [k] page_pool_alloc_pages
   3.87%  [kernel]                                               [k] _copy_to_iter
   3.30%  [kernel]                                               [k] gro_receive_skb
   2.40%  [kernel]                                               [k] default_idle_call
   1.95%  [kernel]                                               [k] mtk_poll_rx
   1.84%  [kernel]                                               [k] finish_task_switch.isra.0

It shows that ~24% of CPU time on RX is still spent in cache cleaning for DMA (__pi_dcache_clean_poc).

changed: typos fixed :slight_smile:

I’ve renamed the branch to 6.18-jumbo-sandbox

I’ll continue here.

At this point I’ll appreciate any testing, especially on an MTU-9000-enabled port, as in my current config I’m unable to test it any time soon.

The expected performance for RX should be close to wire-speed.
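For reference, “close to wire speed” at MTU 9000 on 10GbE works out to roughly 9.9 Gbit/s of TCP goodput once per-frame overhead is subtracted. A back-of-the-envelope estimate (header sizes assume IPv4 + TCP with the timestamp option):

```shell
mtu=9000
l1_overhead=38  # preamble+SFD (8) + Ethernet header (14) + FCS (4) + inter-frame gap (12)
hdrs=52         # IPv4 (20) + TCP (20) + TCP timestamp option (12)
goodput=$(awk -v m=$mtu -v o=$l1_overhead -v h=$hdrs \
  'BEGIN { printf "%.2f", 10 * (m - h) / (m + o) }')
echo "$goodput Gbit/s"   # ~9.90 Gbit/s
```

This matches the ~9.89-9.91 Gbit/s iperf3 intervals reported later in the thread.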

@frank-w I’ve committed a few additional patches.

It’s still unsafe to change the MTU during high load (on any port, even when the target port is down). Everything else works pretty well for me (with the described performance trade-off). Is there a chance you can test it with jumbo-frame traffic?


I hope I can take a look at the weekend… currently working on the R4Pro (preparing the mxl switch PR for OpenWrt).


Made a small test, but the annoying configuration steps are still needed.

And when sending (only R4 => laptop) there are many retransmits which were, imho, not there before:

root@bpi-r4-v11:~
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute 
       valid_lft forever preferred_lft forever
2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
3: eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 72:d7:04:8f:23:3a brd ff:ff:ff:ff:ff:ff
4: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 86:bb:71:05:51:34 brd ff:ff:ff:ff:ff:ff
5: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 06:64:14:99:27:35 brd ff:ff:ff:ff:ff:ff
6: wan@eth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 72:d7:04:8f:23:3a brd ff:ff:ff:ff:ff:ff
7: lan1@eth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 72:d7:04:8f:23:3a brd ff:ff:ff:ff:ff:ff
8: lan2@eth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 72:d7:04:8f:23:3a brd ff:ff:ff:ff:ff:ff
9: lan3@eth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 72:d7:04:8f:23:3a brd ff:ff:ff:ff:ff:ff
root@bpi-r4-v11:~
# ip link set dev lan3 mtu 9000                                                                                       
RTNETLINK answers: Numerical result out of range
root@bpi-r4-v11:~
# ip link set dev eth0 mtu 9000                                                                                                               
Error: mtu greater than device maximum.
root@bpi-r4-v11:~
# ip link set eth0 up
[  102.451800] mtk_soc_eth 15100000.ethernet eth0: configuring for fixed/internal link mode
[  102.459947] mtk_soc_eth 15100000.ethernet eth0: mtk_open: set max-mtu of mac #0 to 9190 (9K+XGMII)
[  102.459974] mtk_soc_eth 15100000.ethernet eth0: Link is Up - 10Gbps/Full - flow control off
root@bpi-r4-v11:~
# ip link set eth0 down
[  114.419842] mtk_soc_eth 15100000.ethernet eth0: Link is Down
root@bpi-r4-v11:~
# ip link set dev eth0 mtu 9000
root@bpi-r4-v11:~
# ip link set dev lan3 mtu 9000
root@bpi-r4-v11:~
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute 
       valid_lft forever preferred_lft forever
2: sit0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/sit 0.0.0.0 brd 0.0.0.0
3: eth0: <BROADCAST,MULTICAST> mtu 9004 qdisc mq state DOWN group default qlen 1000
    link/ether 72:d7:04:8f:23:3a brd ff:ff:ff:ff:ff:ff
4: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 86:bb:71:05:51:34 brd ff:ff:ff:ff:ff:ff
5: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 06:64:14:99:27:35 brd ff:ff:ff:ff:ff:ff
6: wan@eth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 72:d7:04:8f:23:3a brd ff:ff:ff:ff:ff:ff
7: lan1@eth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 72:d7:04:8f:23:3a brd ff:ff:ff:ff:ff:ff
8: lan2@eth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 72:d7:04:8f:23:3a brd ff:ff:ff:ff:ff:ff
9: lan3@eth0: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noop state DOWN group default qlen 1000
    link/ether 72:d7:04:8f:23:3a brd ff:ff:ff:ff:ff:ff
root@bpi-r4-v11:~
# uname -a
Linux bpi-r4-v11 6.18.0-rc1-bpi-r4-jumbo-sandbox #14 SMP Sat Jan 31 18:02:56 CET 2026 aarch64 GNU/Linux
root@bpi-r4-v11:~
# ip link set dev lan3 mtu 9000 up
[  247.259304] mtk_soc_eth 15100000.ethernet eth0: configuring for fixed/internal link mode
[  247.267454] mtk_soc_eth 15100000.ethernet eth0: mtk_open: set max-mtu of mac #0 to 9190 (9K+XGMII)
[  247.267491] mtk_soc_eth 15100000.ethernet eth0: Link is Up - 10Gbps/Full - flow control off
[  247.276570] mt7530-mmio 15020000.switch lan3: configuring for phy/internal link mode
[  247.293435] mt7530-mmio 15020000.switch lan3: Link is Up - 1Gbps/Full - flow control rx/tx
root@bpi-r4-v11:~
# ip a a 192.168.90.1/24 dev lan3                                                                                     
root@bpi-r4-v11:~
# ping -M do -s 8972  192.168.90.1
PING 192.168.90.1 (192.168.90.1) 8972(9000) bytes of data.
8980 bytes from 192.168.90.1: icmp_seq=1 ttl=64 time=0.101 ms
8980 bytes from 192.168.90.1: icmp_seq=2 ttl=64 time=0.103 ms
8980 bytes from 192.168.90.1: icmp_seq=3 ttl=64 time=0.097 ms
8980 bytes from 192.168.90.1: icmp_seq=4 ttl=64 time=0.109 ms
8980 bytes from 192.168.90.1: icmp_seq=5 ttl=64 time=0.103 ms
^C
--- 192.168.90.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4155ms
rtt min/avg/max/mdev = 0.097/0.102/0.109/0.004 ms
root@bpi-r4-v11:~
# iperf3 -c 192.168.90.2
Connecting to host 192.168.90.2, port 5201
[  5] local 192.168.90.1 port 40100 connected to 192.168.90.2 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   117 MBytes   979 Mbits/sec   46    489 KBytes       
[  5]   1.00-2.00   sec   118 MBytes   987 Mbits/sec  121    507 KBytes       
[  5]   2.00-3.00   sec   117 MBytes   983 Mbits/sec  115    524 KBytes       
[  5]   3.00-4.00   sec   116 MBytes   976 Mbits/sec  115    481 KBytes       
[  5]   4.00-5.00   sec   114 MBytes   953 Mbits/sec   94    428 KBytes       
[  5]   5.00-6.00   sec   117 MBytes   986 Mbits/sec   94    454 KBytes       
[  5]   6.00-7.00   sec   118 MBytes   989 Mbits/sec  115    472 KBytes       
[  5]   7.00-8.00   sec   117 MBytes   982 Mbits/sec   93    533 KBytes       
[  5]   8.00-9.00   sec   117 MBytes   986 Mbits/sec  119    227 KBytes       
[  5]   9.00-10.00  sec   117 MBytes   985 Mbits/sec  114    507 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.14 GBytes   980 Mbits/sec  1026             sender
[  5]   0.00-10.00  sec  1.14 GBytes   979 Mbits/sec                  receiver

iperf Done.
root@bpi-r4-v11:~
# iperf3 -c 192.168.90.2 -R
Connecting to host 192.168.90.2, port 5201
Reverse mode, remote host 192.168.90.2 is sending
[  5] local 192.168.90.1 port 51038 connected to 192.168.90.2 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   117 MBytes   984 Mbits/sec                  
[  5]   1.00-2.00   sec   118 MBytes   987 Mbits/sec                  
[  5]   2.00-3.00   sec   118 MBytes   990 Mbits/sec                  
[  5]   3.00-4.00   sec   117 MBytes   983 Mbits/sec                  
[  5]   4.00-5.00   sec   117 MBytes   979 Mbits/sec                  
[  5]   5.00-6.00   sec   118 MBytes   990 Mbits/sec                  
[  5]   6.00-7.00   sec   117 MBytes   982 Mbits/sec                  
[  5]   7.00-8.00   sec   118 MBytes   990 Mbits/sec                  
[  5]   8.00-9.00   sec   118 MBytes   990 Mbits/sec                  
[  5]   9.00-10.00  sec   118 MBytes   990 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.15 GBytes   989 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.15 GBytes   986 Mbits/sec                  receiver

iperf Done.

Could you try turning off TSO and GSO and see if anything changes?

ethtool -K eth0 tso off gso off
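To verify the offloads are actually disabled afterwards (these are the standard ethtool feature names):

```shell
# Both should read "off" after the command above.
ethtool -k eth0 | grep -E 'tcp-segmentation-offload|generic-segmentation-offload'
```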

I’ve ordered a 10G laptop NIC - will conduct some tests after it’s delivered.

UPD:

My test on a 10G mtu-9000 port:

bpi-r4 ~ # ifconfig eth1
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet 172.16.0.2  netmask 255.255.255.0  broadcast 172.16.0.255
        inet6 fe80::b8bc:94ff:fe68:76f2  prefixlen 64  scopeid 0x20<link>
        ether ba:bc:94:68:76:f2  txqueuelen 1000  (Ethernet)
        RX packets 54039502  bytes 307146278665 (286.0 GiB)
        RX errors 0  dropped 23  overruns 0  frame 0
        TX packets 46428135  bytes 239507656697 (223.0 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 102

bpi-r4 ~ #
bpi-r4 ~ # iperf3 -c 172.16.0.1
^C- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
iperf3: interrupt - the client has terminated by signal Interrupt(2)
bpi-r4 ~ # iperf3 -c 172.16.0.1
Connecting to host 172.16.0.1, port 5201
[  5] local 172.16.0.2 port 53726 connected to 172.16.0.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.15 GBytes  9.84 Gbits/sec    0   2.10 MBytes
[  5]   1.00-2.00   sec  1.15 GBytes  9.89 Gbits/sec    0   2.10 MBytes
[  5]   2.00-3.00   sec  1.15 GBytes  9.91 Gbits/sec    0   2.10 MBytes
[  5]   3.00-4.00   sec  1.15 GBytes  9.89 Gbits/sec    0   2.10 MBytes
[  5]   4.00-5.00   sec  1.15 GBytes  9.90 Gbits/sec    0   2.10 MBytes
[  5]   5.00-6.00   sec  1.15 GBytes  9.90 Gbits/sec    0   2.10 MBytes
[  5]   6.00-7.00   sec  1.15 GBytes  9.89 Gbits/sec    0   2.10 MBytes
[  5]   7.00-8.00   sec  1.15 GBytes  9.91 Gbits/sec    0   2.10 MBytes
[  5]   8.00-9.00   sec  1.15 GBytes  9.89 Gbits/sec    0   2.10 MBytes
[  5]   9.00-10.00  sec  1.15 GBytes  9.89 Gbits/sec    0   2.10 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  11.5 GBytes  9.89 Gbits/sec    0            sender
[  5]   0.00-10.00  sec  2.00 GBytes  1.72 Gbits/sec                  receiver

iperf Done.
bpi-r4 ~ # iperf3 -c 172.16.0.1 -R
Connecting to host 172.16.0.1, port 5201
Reverse mode, remote host 172.16.0.1 is sending
[  5] local 172.16.0.2 port 55804 connected to 172.16.0.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  1.14 GBytes  9.80 Gbits/sec
[  5]   1.00-2.00   sec  1.15 GBytes  9.90 Gbits/sec
[  5]   2.00-3.00   sec  1.15 GBytes  9.90 Gbits/sec
[  5]   3.00-4.00   sec  1.15 GBytes  9.90 Gbits/sec
[  5]   4.00-5.00   sec  1.15 GBytes  9.90 Gbits/sec
[  5]   5.00-6.00   sec  1.15 GBytes  9.90 Gbits/sec
[  5]   6.00-7.00   sec  1.15 GBytes  9.90 Gbits/sec
[  5]   7.00-8.00   sec  1.15 GBytes  9.90 Gbits/sec
[  5]   8.00-9.00   sec  1.15 GBytes  9.90 Gbits/sec
[  5]   9.00-10.00  sec  1.15 GBytes  9.90 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  2.00 GBytes  1.72 Gbits/sec                  sender
[  5]   0.00-10.00  sec  11.5 GBytes  9.89 Gbits/sec                  receiver

iperf Done.
bpi-r4 ~ #

No retransmits on 10G; it seems to run at wire speed.

tx perf top:

Samples: 63K of event 'cycles:P', 4000 Hz, Event count (approx.): 10598656413 lost: 0/0 drop: 0/0
Overhead  Shared Object                                          Symbol
  19.88%  [kernel]                                               [k] __pi_dcache_clean_poc
  17.42%  [kernel]                                               [k] __arch_copy_from_user
   2.35%  [kernel]                                               [k] default_idle_call
   2.21%  [kernel]                                               [k] handle_softirqs
   1.52%  [kernel]                                               [k] finish_task_switch.isra.0
   1.33%  [kernel]                                               [k] dma_map_page_attrs
   1.21%  [kernel]                                               [k] __free_frozen_pages
   1.20%  [kernel]                                               [k] mtk_start_xmit
   1.04%  [kernel]                                               [k] tcp_sendmsg_locked
   0.97%  [kernel]                                               [k] dma_unmap_phys
   0.87%  [kernel]                                               [k] fq_codel_dequeue
   0.85%  [kernel]                                               [k] __pi_dcache_inval_poc
   0.81%  [kernel]                                               [k] el0_svc
   0.78%  [kernel]                                               [k] mtk_poll_rx
   0.75%  [kernel]                                               [k] get_page_from_freelist
   0.71%  [nf_conntrack]                                         [k] nf_conntrack_tcp_packet
   0.63%  [kernel]

top

top - 23:23:50 up 6 days, 23:44,  2 users,  load average: 0.67, 0.33, 0.18
Tasks: 126 total, 1 running, 125 sleep, 0 d-sleep, 0 stopped, 0 zombie
%Cpu0  :  1.3 us, 20.9 sy,  0.0 ni, 76.7 id,  0.3 wa,  0.3 hi,  0.3 si,  0.0 st
%Cpu1  :  0.0 us, 19.9 sy,  0.0 ni, 79.5 id,  0.0 wa,  0.3 hi,  0.3 si,  0.0 st
%Cpu2  :  0.3 us,  1.3 sy,  0.0 ni, 58.7 id,  0.0 wa,  1.7 hi, 38.0 si,  0.0 st
%Cpu3  :  0.0 us,  2.7 sy,  0.0 ni, 74.9 id,  0.0 wa,  1.0 hi, 21.4 si,  0.0 st
MiB Mem :   3927.7 total,    726.6 free,    368.1 used,   2888.1 buff/cache
MiB Swap:   8192.0 total,   8189.8 free,      2.2 used.   3559.6 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 4193 root      20   0   41472   3892   3172 S  40.2   0.1   4:14.73 iperf3
 1823 root      20   0   15320   6308   4564 S   0.7   0.2   0:00.46 sshd-session
 4152 root      20   0   14660   6680   5404 S   0.7   0.2  16:02.51 hostapd
   32 root      20   0       0      0

rx perf top:

Samples: 174K of event 'cycles:P', 4000 Hz, Event count (approx.): 12894998253 lost: 0/0 drop: 0/0
Overhead  Shared Object                                          Symbol
  19.93%  [kernel]                                               [k] __arch_copy_to_user
  12.12%  [kernel]                                               [k] __pi_dcache_clean_poc
  11.81%  [kernel]                                               [k] __pi_dcache_inval_poc
   3.29%  [kernel]                                               [k] finish_task_switch.isra.0
   3.23%  [kernel]                                               [k] default_idle_call
   2.46%  [kernel]                                               [k] mtk_poll_rx
   2.13%  [kernel]                                               [k] _copy_to_iter
   1.67%  [kernel]                                               [k] gro_receive_skb
   1.18%  [kernel]                                               [k] page_pool_alloc_pages
   0.84%  [kernel]                                               [k] handle_softirqs
   0.76%  [kernel]                                               [k] el0_svc
   0.70%  [kernel]                                               [k] __check_object_size
   0.62%  [kernel]                                               [k] dev_gro_receive
   0.61%  [kernel]                                               [k] tick_nohz_idle_exit
   0.56%  [nf_conntrack]                                         [k] nf_conntrack_tcp_packet
   0.55%  [kernel]                                               [k] page_pool_put_unrefed_netmem
   0.47%  [kernel]                                               [k] __pi_memset_generic
   0.45%  [kernel]                                               [k] mtk_start_xmit
   0.45%  [kernel]

top:

top - 23:23:03 up 6 days, 23:43,  2 users,  load average: 0.46, 0.24, 0.14
Tasks: 126 total, 1 running, 125 sleep, 0 d-sleep, 0 stopped, 0 zombie
%Cpu0  :  0.0 us,  8.2 sy,  0.0 ni, 91.4 id,  0.0 wa,  0.3 hi,  0.0 si,  0.0 st
%Cpu1  :  0.0 us,  5.5 sy,  0.0 ni, 19.2 id,  0.0 wa,  1.0 hi, 74.3 si,  0.0 st
%Cpu2  :  0.7 us, 26.4 sy,  0.0 ni, 67.8 id,  0.0 wa,  2.1 hi,  3.1 si,  0.0 st
%Cpu3  :  1.0 us, 25.3 sy,  0.0 ni, 73.0 id,  0.0 wa,  0.7 hi,  0.0 si,  0.0 st
MiB Mem :   3927.7 total,    722.2 free,    372.6 used,   2888.0 buff/cache
MiB Swap:   8192.0 total,   8189.8 free,      2.2 used.   3555.1 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 4193 root      20   0   41472   3892   3172 S  60.6   0.1   3:48.74 iperf3
   22 root      20   0       0      0      0 S   0.7   0.0   0:14.36 ksoftirqd/1
 2089 root      20   0       0      0      0 S   0.3   0.0   9:01.65 napi/phy0-0
 2090 root      20   0       0      0      0 S   0.3   0.0  10:05.01 napi/phy0-0
 4025 named     20   0  986484  35044   6928 S   0.3   0.9  13:28.81 named
 4124 root      20   0   14656

I’ve cleaned and refactored the commits from sandbox branch.

Not tested yet, building in progress.

Will update after testing.

P.S. no retransmit issue was addressed yet.

UPD: the kernel seems to be functional, no difference from the previous tests spotted.

For anyone interested in trying out jumbo frames, here is a test script you can run on one machine to test a connection using MTU 9000.

You’ll need to remove control of the 2 interfaces from any manager like systemd-networkd or NetworkManager… Connect the 2 interfaces with real copper, then edit intf1 and intf2 in the script.

When the first argument is veth, you can test the script using a veth device pair (if veth support is enabled in the kernel).
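One way to take the interfaces away from NetworkManager (the `nmcli device set` command is standard; interface names are from the script below and will differ on your machine):

```shell
# NetworkManager: stop managing the two test interfaces
nmcli device set eth1 managed no
nmcli device set enu1u2c2 managed no
```

For systemd-networkd, the equivalent is an `Unmanaged=yes` entry in a matching .network file.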

#!/bin/bash

# Run with sudo

mode="$1"   # captured so cleanup() can see it (trap handlers don't receive the script's positional args)

if [[ "$mode" == "veth" ]]; then
  ip link add name veth1a type veth peer name veth1b
  intf1="veth1a"
  intf2="veth1b"
else
  intf1="eth1"
  intf2="enu1u2c2"
fi
mtu="9000"

cleanup() {
  kill -9 $(pidof iperf3)
  if [[ "$mode" == "veth" ]]; then
    ip -net ns1 link set $intf1 down
    ip -net ns2 link set $intf2 down
    ip -net ns1 link del name $intf1
  fi
  ip netns delete ns1
  ip netns delete ns2
  echo
}

trap cleanup EXIT

# NS1
ip netns add ns1
ip link set $intf1 netns ns1

# cleanup from any previous attempt
ip -net ns1 link set $intf1 down
ip -net ns1 link set dev $intf1 mtu 1500
ip -net ns1 link set dev $intf1 nomaster         2> /dev/null
ip -net ns1 route del default                    2> /dev/null
ip -net ns1 addr  del 192.168.22.1/24 dev $intf1 2> /dev/null

ip -net ns1 link set $intf1 up mtu $mtu
ip -net ns1 addr add 192.168.22.1/24 broadcast 192.168.22.255 dev $intf1
ip -net ns1 route add default via 192.168.22.2 dev $intf1

# NS2
ip netns add ns2
ip link set $intf2 netns ns2

# cleanup from any previous attempt
ip -net ns2 link set $intf2 down
ip -net ns2 link set dev $intf2 mtu 1500
ip -net ns2 link set dev $intf2 nomaster         2> /dev/null
ip -net ns2 route del default                    2> /dev/null
ip -net ns2 addr  del 192.168.22.2/24 dev $intf2 2> /dev/null

ip -net ns2 link set $intf2 up mtu $mtu
ip -net ns2 addr add 192.168.22.2/24 broadcast 192.168.22.255 dev $intf2
ip -net ns2 route add default via 192.168.22.1 dev $intf2

while ! ip -net ns1 link show dev $intf1 up 2>/dev/null | grep -q "state UP"
do sleep 0.2; done
while ! ip -net ns2 link show dev $intf2 up 2>/dev/null | grep -q "state UP"
do sleep 0.2; done

ip -net ns1 a show dev $intf1
ip -net ns1 r

ip -net ns2 a show dev $intf2
ip -net ns2 r

echo GO!

ip  netns exec ns2 ping -l 10 -c 3 192.168.22.1

ip  netns exec ns1 iperf3 -s &
ip  netns exec ns2 iperf3 -c 192.168.22.1 --bidir -t 2

Result:

#sudo ./mtutest.sh veth 
168: veth1a@if167: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether 9e:a9:8a:2a:ba:38 brd ff:ff:ff:ff:ff:ff link-netns ns2
    inet 192.168.22.1/24 brd 192.168.22.255 scope global veth1a
       valid_lft forever preferred_lft forever
    inet6 fe80::9ca9:8aff:fe2a:ba38/64 scope link tentative proto kernel_ll 
       valid_lft forever preferred_lft forever
default via 192.168.22.2 dev veth1a 
192.168.22.0/24 dev veth1a proto kernel scope link src 192.168.22.1 
167: veth1b@if168: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether 82:ef:f7:5e:61:66 brd ff:ff:ff:ff:ff:ff link-netns ns1
    inet 192.168.22.2/24 brd 192.168.22.255 scope global veth1b
       valid_lft forever preferred_lft forever
    inet6 fe80::80ef:f7ff:fe5e:6166/64 scope link tentative proto kernel_ll 
       valid_lft forever preferred_lft forever
default via 192.168.22.1 dev veth1b 
192.168.22.0/24 dev veth1b proto kernel scope link src 192.168.22.2 
GO!
PING 192.168.22.1 (192.168.22.1) 56(84) bytes of data.
64 bytes from 192.168.22.1: icmp_seq=1 ttl=64 time=0.039 ms
64 bytes from 192.168.22.1: icmp_seq=2 ttl=64 time=0.014 ms
64 bytes from 192.168.22.1: icmp_seq=3 ttl=64 time=0.015 ms

--- 192.168.22.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.014/0.022/0.039/0.011 ms, pipe 3
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Connecting to host 192.168.22.1, port 5201
Accepted connection from 192.168.22.2, port 44594
[  5] local 192.168.22.1 port 5201 connected to 192.168.22.2 port 44602
[  5] local 192.168.22.2 port 44602 connected to 192.168.22.1 port 5201
[  7] local 192.168.22.2 port 44616 connected to 192.168.22.1 port 5201
[  8] local 192.168.22.1 port 5201 connected to 192.168.22.2 port 44616
[ ID][Role] Interval           Transfer     Bitrate         Retr  Cwnd
[  5][TX-C]   0.00-1.00   sec  8.20 GBytes  70.4 Gbits/sec    1   1.60 MBytes       
[  7][RX-C]   0.00-1.03   sec  5.79 GBytes  48.4 Gbits/sec                  
[ ID][Role] Interval           Transfer     Bitrate         Retr  Cwnd
[  5][RX-S]   0.00-1.00   sec  8.20 GBytes  70.4 Gbits/sec                  
[  8][TX-S]   0.00-1.00   sec  5.66 GBytes  48.6 Gbits/sec    0   1.37 MBytes       
[  5][RX-S]   1.00-2.00   sec  8.14 GBytes  69.9 Gbits/sec                  
[  8][TX-S]   1.00-2.00   sec  6.19 GBytes  53.2 Gbits/sec    0   1.37 MBytes       
[  5][RX-S]   2.00-2.00   sec   512 KBytes  6.05 Gbits/sec                  
[  8][TX-S]   2.00-2.00   sec  0.00 Bytes  0.00 bits/sec    0   1.37 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][RX-S]   0.00-2.00   sec  16.3 GBytes  70.1 Gbits/sec                  receiver
[  8][TX-S]   0.00-2.00   sec  12.4 GBytes  53.2 Gbits/sec    0            sender
[  5][TX-C]   1.00-2.00   sec  7.89 GBytes  67.7 Gbits/sec    0   1.29 MBytes       
[  7][RX-C]   1.03-2.00   sec  6.61 GBytes  58.3 Gbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID][Role] Interval           Transfer     Bitrate         Retr
[  5][TX-C]   0.00-2.00   sec  16.3 GBytes  70.1 Gbits/sec    1            sender
[  5][TX-C]   0.00-2.00   sec  16.3 GBytes  70.1 Gbits/sec                  receiver
[  7][RX-C]   0.00-2.00   sec  12.4 GBytes  53.2 Gbits/sec    0            sender
[  7][RX-C]   0.00-2.00   sec  12.4 GBytes  53.2 Gbits/sec                  receiver

iperf Done.
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------

It is a lot easier to test with this script than using 2 clients.

You could use eth1 - eth2, or a USB Ethernet adapter, e.g. enu1u2c2 - eth1.

You could even connect eth1 - eth2 with a DAC cable and compare the speed at MTU 1500 with MTU 9000.
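A possible way to run that comparison, assuming you first change the script’s `mtu="9000"` line to read the value from the second argument (`mtu="${2:-9000}"`):

```shell
# Hypothetical wrapper: run the test once per MTU and compare the bitrates.
# Assumes mtutest.sh was edited to take the MTU as its second argument.
for m in 1500 9000; do
  echo "=== MTU $m ==="
  sudo ./mtutest.sh real "$m"
done
```

Any non-"veth" first argument ("real" here) makes the script use the physical interfaces.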