The same could be done for the mt76 driver, right?
Yes, I’ll try that when I have time
Btw, I’m porting a fix to the mt76 driver for mt7986/81. It fixes a bug that occurs when WDS and WED are enabled together, and you’re welcome to test it:
On MT7981+MT7976CN, decreasing the ring sizes increased single-stream speed by about 20%.
Setup: two devices with MT7981+MT7976CN without WED, kernel 6.6, mt76 with patches from mtk-feeds and cmonroe, 2x2 160 MHz AX.
AP (iperf3) → STA (iperf3): from 1100 Mbps to 1300 Mbps
AP (iperf3) ← STA (iperf3): from 950-980 Mbps to 1200 Mbps
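For reference, the relative gains implied by these figures can be checked with a few lines of C. This is only a sketch: `demo_percent_gain` is a hypothetical helper, and the inputs are the rounded numbers quoted above, not new measurements.

```c
#include <assert.h>

/* Hypothetical helper: relative throughput gain in percent.
 * Both arguments use the same unit (Mbps here). */
double demo_percent_gain(double before_mbps, double after_mbps)
{
	assert(before_mbps > 0.0);
	return (after_mbps - before_mbps) / before_mbps * 100.0;
}
```

With the numbers above, `demo_percent_gain(1100, 1300)` comes out at roughly 18%, and the reverse direction lands between roughly 22% and 26% depending on whether 980 or 950 Mbps is taken as the baseline.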
Fewer cache misses, because less data gets pushed out of the cache? The mt76 driver stores a lot of data for each descriptor. Can we change q->ndesc on the fly? That is, use XX_RING_SIZE for allocation, but use only as many descriptors as needed based on some conditions, so that we don’t push data out of the cache.
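To make the "allocate big, use small" idea concrete, here is a minimal userspace sketch. It is not mt76 code: `demo_ring`, `demo_desc` and all `demo_*` functions are hypothetical names, and it only models the index arithmetic. A real driver would presumably also have to drain the queue and tell the hardware the new ring length before changing the active size; none of that is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical descriptor and ring; NOT the real mt76 structures. */
struct demo_desc {
	unsigned int buf_addr;
	unsigned int ctrl;
};

struct demo_ring {
	struct demo_desc *desc;
	size_t capacity; /* descriptors allocated (XX_RING_SIZE) */
	size_t ndesc;    /* descriptors actually cycled through */
	size_t head;     /* next descriptor index to use */
};

/* Allocate the full ring once; initially all descriptors are active. */
int demo_ring_alloc(struct demo_ring *r, size_t capacity)
{
	if (capacity == 0)
		return -1;
	r->desc = calloc(capacity, sizeof(*r->desc));
	if (!r->desc)
		return -1;
	r->capacity = capacity;
	r->ndesc = capacity;
	r->head = 0;
	return 0;
}

/* Shrink or grow the active window without reallocating.
 * Assumption: the caller has already emptied the ring. */
int demo_ring_set_active(struct demo_ring *r, size_t ndesc)
{
	if (ndesc == 0 || ndesc > r->capacity)
		return -1;
	r->ndesc = ndesc;
	r->head = 0;
	return 0;
}

/* Return the current index and advance, wrapping at ndesc
 * rather than at capacity, so only ndesc slots are touched. */
size_t demo_ring_advance(struct demo_ring *r)
{
	size_t idx = r->head;

	assert(r->ndesc > 0);
	r->head = (r->head + 1) % r->ndesc;
	return idx;
}

void demo_ring_free(struct demo_ring *r)
{
	free(r->desc);
	r->desc = NULL;
	r->capacity = r->ndesc = r->head = 0;
}
```

For example, after `demo_ring_alloc(&r, 2048)` and `demo_ring_set_active(&r, 256)`, repeated calls to `demo_ring_advance(&r)` only ever return indices 0 to 255, so the remaining descriptors are never touched.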
--- a/dma.c
+++ b/dma.c
@@ -883,6 +883,7 @@ mt76_dma_rx_process(struct mt76_dev *dev, struct mt76_queue *q, int budget)
 		    !(dev->drv->rx_check(dev, data, len)))
 			goto free_frag;
 
+		net_prefetch(data + q->buf_offset);
 		skb = napi_build_skb(data, q->buf_size);
 		if (!skb)
 			goto free_frag;
diff --git a/mt7915/mt7915.h b/mt7915/mt7915.h
index a30d08e..81a6b88 100644
--- a/mt7915/mt7915.h
+++ b/mt7915/mt7915.h
@@ -19,11 +19,11 @@
 #define MT7915_WATCHDOG_TIME (HZ / 10)
 #define MT7915_RESET_TIMEOUT (30 * HZ)
 
-#define MT7915_TX_RING_SIZE 2048
+#define MT7915_TX_RING_SIZE 256
 #define MT7915_TX_MCU_RING_SIZE 256
 #define MT7915_TX_FWDL_RING_SIZE 128
 
-#define MT7915_RX_RING_SIZE 1536
+#define MT7915_RX_RING_SIZE 128
 #define MT7915_RX_MCU_RING_SIZE 512
 
 #define MT7915_FIRMWARE_WA "mediatek/mt7915_wa.bin"
I don’t think it has anything to do with the CPU cache. It could be that the DMA is too slow and transferring shorter buffers works better. Did you run several tests? It’s hard to benchmark on Wi-Fi.
I had 1100 Mbps AP → STA without net_prefetch before napi_build_skb, so it should be part of the boost. I also ran many tests to isolate a problem that came with the >100 patches for mt76 and mac80211, so I’m sure about these numbers.
With small rings and net_prefetch
AP → STA
1S
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-30.01 sec 4.62 GBytes 1.32 Gbits/sec 0 sender
[ 5] 0.00-30.00 sec 4.62 GBytes 1.32 Gbits/sec receiver
8S
[SUM] 0.00-30.29 sec 4.84 GBytes 1.37 Gbits/sec 18 sender
[SUM] 0.00-30.00 sec 4.83 GBytes 1.38 Gbits/sec receiver
STA → AP
1S
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-30.00 sec 4.18 GBytes 1.20 Gbits/sec 0 sender
[ 5] 0.00-30.01 sec 4.18 GBytes 1.20 Gbits/sec receiver
8S
[SUM] 0.00-30.00 sec 4.49 GBytes 1.28 Gbits/sec 0 sender
[SUM] 0.00-30.06 sec 4.48 GBytes 1.28 Gbits/sec receiver
With big rings and net_prefetch
AP → STA
1S
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-30.01 sec 4.16 GBytes 1.19 Gbits/sec 0 sender
[ 5] 0.00-30.00 sec 4.16 GBytes 1.19 Gbits/sec receiver
8S
[SUM] 0.00-30.05 sec 4.64 GBytes 1.33 Gbits/sec 10 sender
[SUM] 0.00-30.00 sec 4.63 GBytes 1.32 Gbits/sec receiver
STA → AP
1S
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-30.00 sec 3.42 GBytes 980 Mbits/sec 0 sender
[ 5] 0.00-30.01 sec 3.42 GBytes 979 Mbits/sec receiver
8S
[SUM] 0.00-30.03 sec 4.79 GBytes 1.37 Gbits/sec 9 sender
[SUM] 0.00-30.04 sec 4.79 GBytes 1.37 Gbits/sec receiver
Also, I’m seeing strange things with the standard rings.
With iperf RX and big rings:
2.62% napi/phy0-10 [mt76] [k] mt76_dma_queue_reset
With small rings it is less than 0.15% or zero (I didn’t copy the whole report).
What about small rings without prefetch?
Double-checked. Prefetch actually doesn’t help here. The post with the iperf3 results compared small rings against big rings, both with prefetch.
With small rings there is no mt76_dma_queue_reset in the perf report.
That’s what I thought. I’ll look into the queue reset later
The problem depends on MT7915_TX_RING_SIZE and starts at around 300.
Issue MT7981: MT7915_TX_RING_SIZE > 300 affects performance · Issue #902 · openwrt/mt76 · GitHub
About your patches: I think mt7621 also requires 2-byte alignment.
page_pool doesn’t work on mt7621 without adding 2 bytes:
#define MTK_PP_HEADROOM (XDP_PACKET_HEADROOM + NET_IP_ALIGN)
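A small sketch of why the extra 2 bytes matter, and why the macro is safer with parentheses. The `DEMO_*` names are hypothetical stand-ins. The values assume XDP_PACKET_HEADROOM is 256, NET_IP_ALIGN is 2 (its generic default; some architectures override it to 0), and a 14-byte Ethernet header. Whether this is the full explanation for the mt7621 page_pool problem is not something this sketch can confirm.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values; see the assumptions stated above. */
#define DEMO_XDP_HEADROOM 256
#define DEMO_NET_IP_ALIGN 2
#define DEMO_ETH_HLEN     14

/* Unparenthesized form: expands textually inside larger expressions. */
#define DEMO_PP_HEADROOM_UNSAFE DEMO_XDP_HEADROOM + DEMO_NET_IP_ALIGN
/* Parenthesized form: always evaluates as a single value. */
#define DEMO_PP_HEADROOM (DEMO_XDP_HEADROOM + DEMO_NET_IP_ALIGN)

/* Offset of the IP header from the start of the buffer when the
 * Ethernet frame is placed right after the headroom. */
size_t demo_ip_header_offset(size_t headroom)
{
	return headroom + DEMO_ETH_HLEN;
}

/* The IP header is 4-byte aligned only if that offset is a multiple of 4. */
int demo_ip_header_aligned(size_t headroom)
{
	return demo_ip_header_offset(headroom) % 4 == 0;
}

/* Self-check that doubles as a usage example. */
void demo_self_check(void)
{
	/* 256 + 14 = 270: IP header is misaligned. */
	assert(!demo_ip_header_aligned(DEMO_XDP_HEADROOM));
	/* 258 + 14 = 272: IP header is 4-byte aligned. */
	assert(demo_ip_header_aligned(DEMO_PP_HEADROOM));
	/* Precedence hazard: 2 * 256 + 2 versus 2 * (256 + 2). */
	assert(2 * DEMO_PP_HEADROOM_UNSAFE == 514);
	assert(2 * DEMO_PP_HEADROOM == 516);
}
```

The 2-byte pad shifts the 14-byte Ethernet header so that the IP header that follows starts on a 4-byte boundary, which matters on CPUs where unaligned loads are slow or trap.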