I would like to ask for some guidance, because I can’t reach my goal, which is the following. I have a BPI-R2 and a 1Gbps connection from my ISP. With the latest SNAPSHOT (21.02) the download bandwidth is around 600Mbps, but with a prebuilt 19.07 image (downloaded from frank-w’s Drive folder and from another Sinovoip shared folder) the download speed goes up to 900Mbps. From this point of view the prebuilt images would meet my requirement, but with them the RAM amount is 1GB instead of 2GB. So I tried to clone the 19.07 image builder from the OpenWrt GitHub repo and build my own image. This image builder has no ‘Boot loaders’ entry in menuconfig, so I get no image file that I could write to an SD card (while 21.02 has this). I am not an expert in how the whole procedure is set up and how the kernel is compiled, so if anyone could help me out in this matter, I would really appreciate it.
Thanks in advance!
I guess the CPU is the limiting factor, since you are probably using NAT, right?
So you can tune in two places:
- change the CPU affinity of the Ethernet IRQs
- activate hwnat (kernel 5.15+, if OpenWrt supports this), but I guess OpenWrt is still at 5.10
In OpenWrt we got hwnat backported to Linux 5.10 and also @frank-w’s patch enabling it on MT7623.
However, you have to use nightly snapshots of the main development tree (not 21.02-SNAPSHOT) to benefit from that, as 21.02 is using Linux 5.4.
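Whether a running image is new enough can be checked from the kernel version. A minimal sketch, assuming a POSIX shell: `kernel_at_least` is a hypothetical helper (not part of OpenWrt), and 5.10 is the threshold mentioned above for the hwnat backport.

```shell
# kernel_at_least WANT [HAVE]: succeed if HAVE (default: the running kernel)
# is at least WANT, comparing only the major.minor part.
# Hypothetical helper for illustration, not an OpenWrt tool.
kernel_at_least() {
    want=$1
    have=${2:-$(uname -r)}
    want_major=${want%%.*}
    want_rest=${want#*.}
    want_minor=${want_rest%%[!0-9]*}
    have_major=${have%%.*}
    have_rest=${have#*.}
    have_minor=${have_rest%%[!0-9]*}
    [ "$have_major" -gt "$want_major" ] && return 0
    [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]
}
```

On the router, `kernel_at_least 5.10 && echo "new enough for the hwnat backport"` would then tell a 5.4-based 21.02 build apart from a 5.10-based snapshot.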
Thanks for your suggestions!
@frank-w on my eMMC I have r18454 (coming with 5.10.88) and tried to change the affinity:
echo 4 > /proc/irq/51/smp_affinity
echo 4 > /proc/irq/52/smp_affinity
After this, counts appeared on CPU2 (before the change only CPU0 had any), but the speedtest results did not increase.
51: 60662 0 161953 0 MT_SYSIRQ 199 Level 1b100000.ethernet
52: 67423 0 168018 0 MT_SYSIRQ 198 Level 1b100000.ethernet
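The value written to smp_affinity is a hexadecimal bitmask with one bit per CPU, which is why 4 selects CPU2. A small sketch of that mapping, assuming a POSIX shell; `cpu_mask` and `pin_irq` are hypothetical helper names, and the optional base directory argument exists only so the function can be tried outside /proc.

```shell
# cpu_mask N: print the hex bitmask that selects only CPU N (CPU2 -> 4).
cpu_mask() {
    printf '%x\n' "$((1 << $1))"
}

# pin_irq IRQ CPU [BASE]: write that mask to BASE/IRQ/smp_affinity.
# BASE defaults to /proc/irq; overriding it is only meant for dry runs.
pin_irq() {
    cpu_mask "$2" > "${3:-/proc/irq}/$1/smp_affinity"
}
```

With the IRQ numbers from the output above, `pin_irq 51 2` and `pin_irq 52 3` would put the two Ethernet interrupts on different cores. Whether that split helps throughput on this board is not something the thread establishes.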
I downloaded r18498-b61d756b6c and wrote it to SD; it does indeed have 5.10.89, but it did not even reach 700Mbps.
In addition, I don’t know anything about the patch from Frank that would help.
Is there any way to create an SD image for 19.07.x? This is my biggest problem… preloader, U-Boot, partitions and such; I don’t understand how to make an image that is accepted by the BPI.
Did you enable flow offloading in the firewall settings?
I.e. either do that in the Firewall settings in LuCI, or edit
/etc/config/firewall and add the two flow_offloading options to the defaults section:
config defaults
option input 'ACCEPT'
option output 'ACCEPT'
option forward 'REJECT'
option synflood_protect '1'
option flow_offloading '1'
option flow_offloading_hw '1'
(this all only works on recent snapshots)
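The same two options could also be set from the command line with uci. A minimal sketch, assuming the firewall config has a single defaults section (addressed as `@defaults[0]`); `enable_flow_offloading` is a hypothetical wrapper, and the `UCI` variable is only there so the calls can be printed instead of executed.

```shell
# enable_flow_offloading: set both offloading options on the first
# "defaults" section of /etc/config/firewall and commit the change.
# Set UCI=echo to print the uci calls instead of running them.
enable_flow_offloading() {
    ${UCI:-uci} set "firewall.@defaults[0].flow_offloading=1" &&
    ${UCI:-uci} set "firewall.@defaults[0].flow_offloading_hw=1" &&
    ${UCI:-uci} commit firewall
}
```

After running it on the router, the firewall still has to be restarted (for example with `/etc/init.d/firewall restart`) for the change to take effect.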
Also enable packet steering to make use of all CPU cores for packet processing. This can be found in the Network -> Global network options tab in LuCI, or enabled by setting the packet_steering option in the
globals section in /etc/config/network, for example with uci:
uci set network.globals.packet_steering='1'
uci commit network
I never tried using 19.07 or anything older than Linux 5.10 on that board, so I wonder why out-of-the-box performance differs so significantly. What I do know is that the image generation code was a bit messy and I cleaned it up this year, so with a recent snapshot build (and the upcoming OpenWrt 22.xx release) ImageBuilder should work.
Honestly, I did not notice any change when flow_offloading_hw was turned on through LuCI. SW offload was always on, and now I’ve added packet steering as well (and added the affinity changes to rc.local). The speedtest.net download is now around 700Mbps, and with speedtest-netperf.sh:
Download: 679.82 Mbps
Latency: [in msec, 60 pings, 0.00% packet loss]
CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 54 samples]
cpu0: 54.4 +/- 12.8 @ 1300 MHz
cpu1: 43.6 +/- 10.3 @ 1300 MHz
cpu2: 61.3 +/- 12.0 @ 1300 MHz
cpu3: 72.0 +/- 16.1 @ 1300 MHz
Overhead: [in % used of total CPU available]
Upload: 69.64 Mbps
Latency: [in msec, 62 pings, 0.00% packet loss]
CPU Load: [in % busy (avg +/- std dev) @ avg frequency, 58 samples]
cpu0: 5.9 +/- 5.5 @ 1300 MHz
cpu1: 2.0 +/- 3.6 @ 1300 MHz
cpu2: 10.6 +/- 6.1 @ 1300 MHz
cpu3: 22.3 +/- 9.6 @ 1300 MHz
Overhead: [in % used of total CPU available]
51: 69 0 1252773 0 MT_SYSIRQ 199 Level 1b100000.ethernet
52: 88 0 1451500 0 MT_SYSIRQ 198 Level 1b100000.ethernet
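To see at a glance where the Ethernet interrupts end up, the per-CPU counters can be turned into percentages. A rough sketch using awk; `irq_share` is a hypothetical helper, and it assumes the /proc/interrupts layout shown above (IRQ number, one counter per CPU, then the controller name).

```shell
# irq_share PATTERN: read /proc/interrupts-style lines on stdin and print
# the share of matching interrupts handled by each CPU.
irq_share() {
    awk -v pat="$1" '
        $0 ~ pat {
            # Counters start at field 2 and end at the first non-numeric field.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) {
                count[i - 2] += $i
                total += $i
                if (i - 1 > ncpu) ncpu = i - 1
            }
        }
        END {
            for (c = 0; c < ncpu; c++)
                printf "cpu%d %.1f%%\n", c, (total ? 100 * count[c] / total : 0)
        }'
}
```

Running `irq_share ethernet < /proc/interrupts` on the router should show nearly all Ethernet interrupts on cpu2 for the counters above, even though the netperf CPU load is spread over all four cores.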
If you would like, I can do more tests tomorrow with different combinations of the mentioned settings.
I assume you are using the device as a router, i.e. with NAT and connection tracking, because only in that case do you benefit from flow offloading. If so, you should notice at least a change in CPU load while routing packets, depending on whether
flow_offloading_hw is switched on or not.
That’s basically two times the same thing. Doing either of the two will do the job.
Thanks for the explanation regarding the packet steering!
Yes, I would use the board as an Ethernet router, nothing fancy: no HDMI, no SATA, no wireless.
While playing with the offload settings and packet steering, changes in CPU utilization were indeed visible, but the bandwidth did not go above 700Mbps. This is my biggest problem: “the Mercedes AMG can only go at the speed of a Smart”.
OpenWrt also currently supports only one of the two Ethernet units found in the SoC. Both are connected to the switch. To achieve full bandwidth, you may need to use both GMACs.
I have created a branch in my staging tree which adds the patches from @frank-w’s 5.10-hnat+gmac2 tree to OpenWrt.
@frank-w do 2nd-gmac and hwnat now both work in your 5.10-hnat+gmac2 tree?
If so, this should now work with OpenWrt in the same way.
At this moment, this cannot go into mainline OpenWrt as the support for multiple CPU ports in DSA has not yet been upstreamed successfully.
However, it is fine to try it in a staging tree.
This is how to build it, so you can try:
git clone -b bpi-multicpu https://git.openwrt.org/openwrt/staging/dangole.git openwrt-dsa-multicpu
cd openwrt-dsa-multicpu
scripts/feeds update -a
scripts/feeds install -a
make menuconfig
# select MediaTek ARM -> MT7623 -> BananaPi BPi-R2
# enable LuCI and/or whatever else you would like
make -j$(nproc)
# find the resulting image in bin/targets/mediatek/mt7623
You will need all build prerequisites installed; see the distribution-specific instructions in the OpenWrt Wiki on how to do that on most Linux distributions.
If you don’t want to build yourself, poke me and I will share my binary images with you.
AFAIR it was working so far.
Btw., there is another option: using a VLAN-aware bridge to bypass the DSA CPU port and connect wan to the second GMAC.
I have done it with systemd and help from @ericwoud; maybe it is possible with OpenWrt too.
See second ethernet lane (gmac)
And my patch that Daniel may add to OpenWrt is for supporting hwnat on MT7623 after backporting the hw-offloading patches. Maybe also the DSA second-GMAC patches from DENG Qingfang.
This method of using a VLAN to connect the 2nd GMAC to wan has the advantage that it does not need any patch to the source code. The only things needed are to add the port in the device tree and to set up the VLAN-aware bridge correctly.
Like so: https://github.com/ericwoud/buildR64ubuntu/blob/master/linux-5.16-rc3/dsaport5.patch
And systemd-networkd setup like so: https://github.com/ericwoud/buildR64ubuntu/tree/master/rootfs-arch/etc/systemd/network-RT
This puts the aux and wan ports in a separate VLAN on the bridge. Be sure not to use this VLAN ID anywhere else on the bridge, especially not on the bridge’s self port.
Finally, set up eth1 as the WAN interface instead of the wan port itself.
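For setups without systemd-networkd, roughly the same bridge layout could be expressed with iproute2. This is only a sketch under assumptions: the bridge name `br0` and VLAN ID 99 are made up, the port names `wan` and `aux` follow the description above, and `vlan_bridge_cmds` is a hypothetical helper that prints the commands for review rather than running them. The LAN ports and their own VLAN are left out.

```shell
# vlan_bridge_cmds BRIDGE VID PORT...: print (not run) iproute2 commands
# that create a VLAN-aware bridge and make each PORT an untagged member
# of VLAN VID only, removing it from the default VLAN 1.
vlan_bridge_cmds() {
    br=$1
    vid=$2
    shift 2
    echo "ip link add name $br type bridge vlan_filtering 1"
    for port in "$@"; do
        echo "ip link set dev $port master $br"
        echo "bridge vlan del dev $port vid 1"
        echo "bridge vlan add dev $port vid $vid pvid untagged"
    done
    echo "ip link set dev $br up"
}
```

`vlan_bridge_cmds br0 99 wan aux` prints the command list; once it looks right it can be piped to `sh`. The upstream configuration (DHCP or PPPoE) then goes on eth1, as described above.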
Is there an ongoing effort to mainline multi-CPU-port support in DSA? Or has this been dropped in favor of just connecting GMAC2 as if it was a client port? (which, of course, works but multi-CPU-port DSA seems to be the better model if you ask me).
Do you gain any performance by doing so?! And shouldn’t we speak TAG_MTK proto on eth1 as well rather than plain 802.3 (if the setup is with multi-CPU-port patch)?
AFAIR the last attempt was dropped, as the DSA maintainers had no really detailed plan for supporting a second CPU port.
They wanted it to be configurable with userspace tools (ip), and there were multiple ideas on how to do it (setting one port as master and adding the others as slaves), but no detailed way to follow.
The VLAN-aware bridge is a way to configure this in userspace without kernel patches (if the second GMAC is available as an interface). There is no need for additional tagging except the VLAN tag applied in the bridge.
edit: just a link to my tree containing the last second gmac-patches i know
The vlanbridge solution also divides traffic over eth0 and eth1, so the gain would be only under heavy traffic conditions, as with the multi-cpu-dsa solution.
The multi-cpu-dsa would indeed be better, but implementing the vlanbridge solution is so much simpler that I personally still prefer using it as long as multi-cpu-dsa is not mainline.
Hi @dangowrt and gents,
I’ve built and tested the staging tree you shared a few times, and I was able to reach 900+Mbps. (I get ~650 now, but I assume the backbone is under higher load at the moment.) I had some trouble with WireGuard (then realized I won’t use it), so I removed it, and since then I have experienced nothing unusual. So I would say that’s a good approach to get the most out of this board. Thank you all for your work!