[mtk_eth_soc] Noob question: why are there no prefetches in the mtk drivers, as there are in many other Ethernet drivers?

The same could be done for the mt76 driver, right?

Yes, I’ll try that when I have time.

Btw, I’m porting a fix to the mt76 driver for mt7986/81 that fixes a bug when WDS and WED are enabled together. You’re welcome to test it:

https://patchwork.kernel.org/project/linux-wireless/patch/OS3P286MB259748F6FA6BE628C3295D7498D92@OS3P286MB2597.JPNP286.PROD.OUTLOOK.COM/


On MT7981+MT7976CN, decreasing the ring sizes increased single-stream throughput by about 20%.

Setup: two MT7981+MT7976CN devices without WED, kernel 6.6, mt76 with patches from mtk-feeds and cmonroe; 2x2, 160 MHz, AX.

AP (iperf3) → STA (iperf3): from 1100 Mbps to 1300 Mbps

AP (iperf3) ← STA (iperf3): from 950-980 Mbps to 1200 Mbps

Fewer cache misses, because less data gets pushed out of the cache? The mt76 driver stores a lot of data for each descriptor. Can we change q->ndesc on the fly, i.e. allocate with XX_RING_SIZE but only use as many descriptors as are needed based on some conditions? That way we would not push as much data out of the cache.

--- a/dma.c
+++ b/dma.c
@@ -883,6 +883,7 @@ mt76_dma_rx_process(struct mt76_dev *dev, struct mt76_queue *q, int budget)
                    !(dev->drv->rx_check(dev, data, len)))
                        goto free_frag;

+               net_prefetch(data + q->buf_offset);
                skb = napi_build_skb(data, q->buf_size);
                if (!skb)
                        goto free_frag;
diff --git a/mt7915/mt7915.h b/mt7915/mt7915.h
index a30d08e..81a6b88 100644
--- a/mt7915/mt7915.h
+++ b/mt7915/mt7915.h
@@ -19,11 +19,11 @@
 #define MT7915_WATCHDOG_TIME           (HZ / 10)
 #define MT7915_RESET_TIMEOUT           (30 * HZ)

-#define MT7915_TX_RING_SIZE            2048
+#define MT7915_TX_RING_SIZE            256
 #define MT7915_TX_MCU_RING_SIZE                256
 #define MT7915_TX_FWDL_RING_SIZE       128

-#define MT7915_RX_RING_SIZE            1536
+#define MT7915_RX_RING_SIZE            128
 #define MT7915_RX_MCU_RING_SIZE                512

 #define MT7915_FIRMWARE_WA             "mediatek/mt7915_wa.bin"

I don’t think it has anything to do with the CPU cache; it could be because the DMA is too slow and transferring shorter buffers works better. Did you run several tests? It’s hard to benchmark over Wi-Fi.

I got 1100 Mbps AP → STA without the net_prefetch before napi_build_skb, so it should be part of the boost. I also ran many tests to isolate a problem that came in with >100 patches for mt76 and mac80211, so I’m sure about these numbers.

With small rings and net_prefetch

AP → STA

1S

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.01  sec  4.62 GBytes  1.32 Gbits/sec    0             sender
[  5]   0.00-30.00  sec  4.62 GBytes  1.32 Gbits/sec                  receiver

8S

[SUM]   0.00-30.29  sec  4.84 GBytes  1.37 Gbits/sec   18             sender
[SUM]   0.00-30.00  sec  4.83 GBytes  1.38 Gbits/sec                  receiver

STA → AP

1S

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  4.18 GBytes  1.20 Gbits/sec    0             sender
[  5]   0.00-30.01  sec  4.18 GBytes  1.20 Gbits/sec                  receiver

8S

[SUM]   0.00-30.00  sec  4.49 GBytes  1.28 Gbits/sec    0             sender
[SUM]   0.00-30.06  sec  4.48 GBytes  1.28 Gbits/sec                  receiver

With big rings and net_prefetch

AP → STA

1S

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.01  sec  4.16 GBytes  1.19 Gbits/sec    0             sender
[  5]   0.00-30.00  sec  4.16 GBytes  1.19 Gbits/sec                  receiver

8S

[SUM]   0.00-30.05  sec  4.64 GBytes  1.33 Gbits/sec   10             sender
[SUM]   0.00-30.00  sec  4.63 GBytes  1.32 Gbits/sec                  receiver

STA → AP

1S

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-30.00  sec  3.42 GBytes   980 Mbits/sec    0             sender
[  5]   0.00-30.01  sec  3.42 GBytes   979 Mbits/sec                  receiver

8S

[SUM]   0.00-30.03  sec  4.79 GBytes  1.37 Gbits/sec    9             sender
[SUM]   0.00-30.04  sec  4.79 GBytes  1.37 Gbits/sec                  receiver

I’m also seeing something strange with the standard rings.

With big rings, during an iperf RX test:

2.62% napi/phy0-10 [mt76] [k] mt76_dma_queue_reset

With small rings it is below 0.15% or absent (I didn’t copy the whole report).

What about small rings without prefetch?

Double-checked: the prefetch here doesn’t actually help. The post with iperf3 results compared small rings vs. big rings, both with the prefetch.

With small rings, mt76_dma_queue_reset doesn’t appear in the perf report at all.

That’s what I thought. I’ll look into the queue reset later.

The problem depends on MT7915_TX_RING_SIZE and starts from around 300.

openwrt/mt76 issue #902 on GitHub: "MT7981: MT7915_TX_RING_SIZE > 300 affects performance"

About your patches: I think mt7621 also requires 2-byte alignment.

page_pool doesn’t work on mt7621 without adding 2 bytes:

#define MTK_PP_HEADROOM (XDP_PACKET_HEADROOM + NET_IP_ALIGN)

https://lore.kernel.org/all/YyB2L8dfnJfnrqWI@lore-desk/