BPI R2 - improved network performance with RPS

Hi,

I was trying to figure out how to improve BPI R2 network performance and I stumbled upon this article https://www.spinics.net/lists/linux-doc/msg02975.html

I tried configuring RPS and it actually worked. The result was really surprising: I went from 600 Mbit/s to 940 Mbit/s. In addition, the CPU load is now at about 1%, instead of 100% on core0.

What RPS (Receive Packet Steering) does, as I understand it, is spread packet processing across other CPU cores instead of leaving everything on the one core that receives the interrupt.

I compiled the latest kernel (v5.4) from https://github.com/frank-w/BPI-R2-4.14/

You can turn on RPS by writing a CPU bitmask to /sys/class/net/{iface}/queues/rx-0/rps_cpus. What you want is to avoid core0, which is the only core handling the interrupt. I used the value “e” (binary 1110, i.e. cores 1-3): echo e > /sys/class/net/lan1/queues/rx-0/rps_cpus.
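In case it helps, this is the minimal sequence I mean, shown here for lan1 (use whatever interface you route over):

echo e > /sys/class/net/lan1/queues/rx-0/rps_cpus   # hex e = binary 1110 = cores 1-3
cat /sys/class/net/lan1/queues/rx-0/rps_cpus        # read it back to confirm the mask was accepted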

BTW, thank you frank-w for your work for the community!


Did you test this with iperf, or did you run some speed tests over the Internet?

I enabled RPS on my BananaPi-R2 box and did some Internet speed tests, and indeed I saw an improvement. Before, the speed was around 185 Mbps; afterwards it increased to around 240-250 Mbps.

I have a gigabit connection with 500 Mbps download / 250 Mbps upload.

I was testing it with iperf.

Setup was like this:

auto lan1
iface lan1 inet static
  address 10.128.0.1
  netmask 255.255.255.0

auto lan2
iface lan2 inet static
  address 10.128.1.1
  netmask 255.255.255.0

With default routing.
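One thing my posts take for granted: for the R2 to route between the two LANs, IP forwarding has to be enabled. A minimal sketch in case anyone is reproducing the setup from scratch:

sysctl -w net.ipv4.ip_forward=1                   # enable IPv4 forwarding until reboot
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf  # make it persistent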

One computer had IP 10.128.0.10 and the other had 10.128.1.10. The only way I was able to reach full gigabit without RPS was when I had both computers connected to the same (bridged) interface. With RPS I got 940 Mbit/s, as I wrote in the first post.
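For reference, the measurement itself was just a plain iperf run between the two hosts, roughly like this (shown with iperf3 syntax for illustration; the exact version and flags I used may have differed):

iperf3 -s               # on the server, e.g. 10.128.1.10
iperf3 -c 10.128.1.10   # on the client at 10.128.0.10, traffic is routed through the R2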

I was also testing it with my main router (Asus), which has a switch with the local network 192.168.0.0/24. Using iperf without RPS I got max. 400 Mbit/s, but sometimes it went as low as 300 Mbit/s. Once I enabled RPS the speed was ~750 Mbit/s, but I don’t think the bottleneck was on the BPI side. The server was at 192.168.0.92 and the client was at 10.128.0.10.

I think the main problem is that the NIC has only one queue and it interrupts only the first core, which by default is also the only core that does the routing work. Because that core gets interrupted all the time, routing is slow. Moving the routing work to the other cores brought the performance up to the maximum. At least that is what I think is happening.
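If you want to check this on your own board, something along these lines should show it (the interface and interrupt names depend on your kernel and driver; eth0 here is just the usual DSA master on the R2):

ls /sys/class/net/eth0/queues/   # a single rx-0 means one receive queue
grep -i eth /proc/interrupts     # the counters should pile up in the CPU0 column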

Anyway, I am glad that it helped!

EDIT: the RPS setting is not persisted across reboots, so I have to re-apply it every time in a post-up hook.

I also have 500 Mbit download, so I tested it. With most servers I couldn’t get more than 250 Mbit/s, but there are some that support more. I live in Prague and used the “ISP Alliance” server. With that I reached 443 Mbit/s, but I think the bottleneck is not the BPI but the server / ISP.

LAN0 goes to my main router and LAN1 to my notebook. I don’t use NAT on my BPI, just routing.

I see now that you are using LAN0 as the WAN port. In my case all 4 LAN ports are bridged and I use the separate WAN port for the Internet connection.

Right now I’m using NAT and I’m also behind another TP-Link gigabit router, which I plan to get rid of soon. I think this might also slow down the download speed. From that router I can reach up to 480 Mbps to some nearby servers.

I’m using an older custom build of the 5.4 kernel (5.4.2), also from Frank’s repository.

Also, what distro are you running on the BananaPi-R2? I’m using a self-made Debian Buster image.

I used WAN instead of LAN0 and swapped the computer I was testing with for a more recently bought notebook running Windows. The results were more consistent, and I got the same numbers from the network behind the BPI (10.128.0.1/24) as when I was connected to the local network of my main router (192.168.0.1/24). Below is a result from the network behind the BPI (10.128.0.0/24) when using WAN. But like I said, our results are not comparable, because I don’t do NAT on my BPI but on my main router. The R2 is supposed to have HW NAT, so I don’t understand where you lose bandwidth.

I downloaded the Debian Buster image from GDrive and overwrote its kernel with one compiled from this commit

[screenshot of the speed test result]

HW NAT is only ported to 4.9 and 4.14 (where it is reported as unstable); later kernels do not have the hwnat driver.

I tested performance with software NAT. The higher CPU utilization is visible, but the R2 with RPS still handles it pretty well. You need to enable RPS on all active interfaces.

EDIT: I found out a very interesting thing: for the best performance you need to assign each interface its own CPU core. For example, if I have two different networks behind wan and br0, I have to assign one core to wan and a different core to br0. I updated my config below: wan uses core1 and br0 uses core2. If I had one more network, I would assign it core3. If I used NAT on wan, I would give it more CPUs, so instead of echo 2 (core1) I would use echo 6 (core1 and core2), and core3 would be left for br0, which you set with echo 8. See the mask cheat sheet below.
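For clarity (this is just the standard meaning of the rps_cpus mask, summarised for the R2’s four cores): it is a hex bitmask where 1 = core0, 2 = core1, 4 = core2 and 8 = core3, and values add up when you combine cores.

echo 2 > /sys/class/net/wan/queues/rx-0/rps_cpus   # wan -> core1 only (my current setup)
echo 4 > /sys/class/net/br0/queues/rx-0/rps_cpus   # br0 -> core2 only (my current setup)
# with NAT on wan I would use these instead:
# echo 6 > /sys/class/net/wan/queues/rx-0/rps_cpus   # wan -> cores 1+2 (2+4)
# echo 8 > /sys/class/net/br0/queues/rx-0/rps_cpus   # br0 -> core3 only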

Here is my /etc/network/interfaces

auto wan
iface wan inet manual
  post-up echo 2 > /sys/class/net/wan/queues/rx-0/rps_cpus
  hwaddress ether e6:82:86:7f:6d:23
  up /sbin/dhclient -v -cf /etc/dhcp/dhclient.conf wan
  down /sbin/dhclient -v -r -cf /etc/dhcp/dhclient.conf wan

auto lan0
iface lan0 inet manual

auto lan1
iface lan1 inet manual
#iface lan1 inet static
#  address 10.128.1.1
#  netmask 255.255.255.0

auto lan2
iface lan2 inet manual
#iface lan2 inet static
#  address 10.128.0.1
#  netmask 255.255.255.0

auto lan3
iface lan3 inet manual

auto br0
iface br0 inet static
  post-up echo 4 > /sys/class/net/br0/queues/rx-0/rps_cpus
  address 10.128.0.1
  netmask 255.255.255.0
  bridge_ports lan0 lan1 lan2 lan3
  bridge_fd 5
  bridge_stp no

and this is my ip route output

default via 78.102.53.1 dev wan
10.128.0.0/24 dev br0 proto kernel scope link src 10.128.0.1
78.102.53.0/24 dev wan proto kernel scope link src 78.102.53.13
192.168.0.0/25 dev lan0 proto kernel scope link src 192.168.0.11

I did a new speed test using the recommendations from @daniel.vrsek and applied the two commands below:

echo 6 > /sys/class/net/wan/queues/rx-0/rps_cpus

echo 8 > /sys/class/net/br0/queues/rx-0/rps_cpus

Initially the RPS masks were 0 for both the wan and br0 interfaces, and I got a maximum of 210 Mbps on the speed test.

After applying those two commands the speed test results improved greatly.

These values are lost after a reboot, so if you’re running Debian/Ubuntu and using /etc/network/interfaces you need to add the two commands as post-up lines for each interface:

auto wan
iface wan inet manual
  post-up echo 6 > /sys/class/net/wan/queues/rx-0/rps_cpus

auto br0
iface br0 inet static
  post-up echo 8 > /sys/class/net/br0/queues/rx-0/rps_cpus

For those using systemd-networkd: to make these values permanent you need to create a systemd service (I called it rps-enable.service) with the following content

[Service]
Type=oneshot
ExecStart=/usr/bin/bash -c "/usr/bin/echo 6 > /sys/class/net/wan/queues/rx-0/rps_cpus"
ExecStart=/usr/bin/bash -c "/usr/bin/echo 8 > /sys/class/net/br0/queues/rx-0/rps_cpus"

[Install]
WantedBy=systemd-networkd.service

and then enable the service to start at boot

systemctl enable rps-enable.service
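To check that it worked you can, for example, start the service once by hand and read the masks back (the expected values are the ones from the two ExecStart lines, possibly printed with leading zeros):

systemctl start rps-enable.service
cat /sys/class/net/wan/queues/rx-0/rps_cpus   # should show 6
cat /sys/class/net/br0/queues/rx-0/rps_cpus   # should show 8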

Hope this will help someone in the future. Thanks again @daniel.vrsek for the advice.
