Topic name is a question.
your hardware is BPI-R3 or BPI-R4, BPI-R64 ? which one ?
OP raised an āinterestingā question but likely left most readers scratching their heads.
What is the āprefetchā feature? Does BPI R4ās ethernet driver have it?
Yes, a bit more context would be goodā¦and smaller title.
I guess we can add few prefetchs in the hotpaths, Iām not sure if we would see any great improvement though. Good suggestion
Iām talking about net_prefetch in mtk_eth_soc drivers. Like prefetching next RX/TX descriptor
grepping ānet_prefetchā on the kernel tree doesnāt show itās widely called. Particularly, none of the ARM SBCs seem to use it.
ARM has hardware data prefetch. Additionally, compilers should have auto optimisation to prefetch data (?). Are there real advantages to āmanuallyā do prefetch?
If you tell me where in mtk_eth_soc to insert ānet_prefetchā, I can try and test if there is any measurable benefit.
HW prefetch might fail to recognise the pattern to prefetch the desired data. I believe the compiler inserts prefetch instructions only on simple loops, I could be wrong though. We can give it a try on the hotpaths, itās common to add manual hints there. Obviously it might have a negative effect as well
And this all thing is basically unnecessary when using HW offloading
I have a concrete test case. Might be able to help quickly determine any benefits.
ARM SBCs are getting more & more capable. Like MT7988a, itās marketed as for routers and/or application servers such NAS. If there is measurable benefit of ānet_prefetchā, we can find good uses.
letās try this quick modification
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 29b4aea..03759f7 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -2081,6 +2081,8 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
struct xdp_buff xdp;
u32 ret;
+ net_prefetch(data + MTK_PP_HEADROOM);
+
new_data = mtk_page_pool_get_buff(ring->page_pool,
&dma_addr,
GFP_ATOMIC);
give it a try without HW offload
I prepared two images, one without the patch and one with the above patch. And made sure the code patched in the patched image. Back and forth refreshed a couple of times and performed tests. And run the following test many times for each image.
I ran Ooklaās speedtest CLI on BPI R4. Itās a lightweight application, surely lighter than a torrent client. āspeedtestā easily saturated my 1000M/1000M internet connection, one direction each time. My measured CPU utilisation was only ~7.8% that already included ~2.5% consumed by āspeedtestā process itself.
I would say the difference is in measuring error. Without the patch, CPU utilisation is 7-ish percent. With the patch, itās in the low 8-ish percent. Consistently reproducible. The speedtest server is fixed. Iām only less than 1ms away from it. Achieved speed is about the same: download between 940-943Mbps. upload slightly lower.
I think itās worth looking further where else to add prefetch or if could add in different ways. Iāll help to test with further changes.
(fyi. Iām stepping out for a little while. Will test & respond while Iām back to my desk again)
EDIT: corrected typo: āsecondsā to āpercentā
letās try this one
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 29b4aea..03759f7 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -2104,6 +2104,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
if (ret != XDP_PASS)
goto skip_rx;
+ net_prefetch(xdp->data_meta);
skb = build_skb(data, PAGE_SIZE);
if (unlikely(!skb)) {
I suspect that it will be the same as the one before though. if it doesnāt help then iāll try to prefetch the next index instead
another approach
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 29b4aea..03759f7 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -2017,7 +2017,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
struct dim_sample dim_sample = {};
struct mtk_rx_ring *ring;
bool xdp_flush = false;
- int idx;
+ int idx, next_idx;
struct sk_buff *skb;
u64 addr64 = 0;
u8 *data, *new_data;
@@ -2040,6 +2040,10 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
rxd = ring->dma + idx * eth->soc->rx.desc_size;
data = ring->data[idx];
+ next_idx = NEXT_DESP_IDX(idx, ring->dma_size);
+ prefetch(ring->dma + next_idx * eth->soc->rx.desc_size);
+ net_prefetch(ring->data[next_idx] + MTK_PP_HEADROOM);
+
if (!mtk_rx_get_desc(eth, &trxd, rxd))
break;
EDIT: prefetch the next rxd as well
For Patch 2, I had to change from āxdp->data_metaā to āxdp.data_metaā to get it compiled.
Also think it through my previous test again. The differences in no patch and various are too little. I need to eliminate the variance of my ISP assigning me to a different public IP block. So I came up with a new test.
Now the new test is all local, essentially performing āiperf3 -B 192.168.2.1 -c 192.168.1.138 -i 0 -t 15 -Rā on my BPI-R4.
Here are the interesting results. Patch 2 looks interesting and worth a further look.
I just ran a little test myself, the second patch did improve the max RX rate from avg 4.73Gbps to 4.97Gbps. Will investigate it further soon
Iām now running OpenWrt snapshot with Patch 2 on my BPI R4. The benefit is around less than 5% but surely can be measured
Iāll prepare a proper patch when I have time. The TX needs to be taken care for as well. (And the other RX case)
@Dale also, do you think that itās useful? Iām not seeing it in upstream.
From d8a2975939a12686c4a95c40db21efdc3f821f63 Mon Sep 17 00:00:00 2001
From: developer <[email protected]>
Date: Fri, 19 Aug 2022 13:32:03 +0800
Subject: [PATCH] [][kernel][common][eth][Fix GMAC data corruption issue]
[Description]
Fix GMAC data corruption issue.
If without this patch, kernel might receive invalid packets that are corrupted by GMAC.
[Release-log]
N/A
Change-Id: I8a9f00402e4b8d9181a8c4c518665a16619cdc0a
Reviewed-on: https://gerrit.mediatek.inc/c/openwrt/feeds/mtk_openwrt_feeds/+/6398721
---
.../files-5.4/drivers/net/ethernet/mediatek/mtk_eth_soc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target/linux/mediatek/files-5.4/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/target/linux/mediatek/files-5.4/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 60939f20..54674ada 100755
--- a/target/linux/mediatek/files-5.4/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/target/linux/mediatek/files-5.4/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -581,7 +581,7 @@
/* Mac control registers */
#define MTK_MAC_MCR(x) (0x10100 + (x * 0x100))
#define MAC_MCR_MAX_RX_1536 BIT(24)
-#define MAC_MCR_IPG_CFG (BIT(18) | BIT(16))
+#define MAC_MCR_IPG_CFG (BIT(18) | BIT(16) | BIT(12))
#define MAC_MCR_FORCE_MODE BIT(15)
#define MAC_MCR_TX_EN BIT(14)
#define MAC_MCR_RX_EN BIT(13)
itās there, just more clearly defined.
#define MAC_MCR_RX_FIFO_CLR_DIS
BTW, iāve submitted this patch as RFC https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
will send it to net-next when it opens again