[mtk_eth_soc] noob question, why there are no prefetches in mtk drivers like in many others ethernet drivers?

Topic name is a question.

your hardware is BPI-R3 or BPI-R4, BPI-R64 ? which one ?

OP raised an ā€˜interestingā€™ question but likely left most readers scratching their heads. :smiley:

What is the ā€œprefetchā€ feature? Does BPI R4ā€™s ethernet driver have it?

Yes, a bit more context would be goodā€¦and smaller title.

I guess we can add few prefetchs in the hotpaths, Iā€™m not sure if we would see any great improvement though. Good suggestion

1 Like

Iā€™m talking about net_prefetch in mtk_eth_soc drivers. Like prefetching next RX/TX descriptor

grepping ā€˜net_prefetchā€™ on the kernel tree doesnā€™t show itā€™s widely called. Particularly, none of the ARM SBCs seem to use it.

ARM has hardware data prefetch. Additionally, compilers should have auto optimisation to prefetch data (?). Are there real advantages to ā€˜manuallyā€™ do prefetch?

If you tell me where in mtk_eth_soc to insert ā€˜net_prefetchā€™, I can try and test if there is any measurable benefit.

HW prefetch might fail to recognise the pattern to prefetch the desired data. I believe the compiler inserts prefetch instructions only on simple loops, I could be wrong though. We can give it a try on the hotpaths, itā€™s common to add manual hints there. Obviously it might have a negative effect as well

And this all thing is basically unnecessary when using HW offloading

I have a concrete test case. Might be able to help quickly determine any benefits.

ARM SBCs are getting more & more capable. Like MT7988a, itā€™s marketed as for routers and/or application servers such NAS. If there is measurable benefit of ā€˜net_prefetchā€™, we can find good uses.

letā€™s try this quick modification

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 29b4aea..03759f7 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -2081,6 +2081,8 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
 			struct xdp_buff xdp;
 			u32 ret;
 
+			net_prefetch(data + MTK_PP_HEADROOM);
+
 			new_data = mtk_page_pool_get_buff(ring->page_pool,
 							  &dma_addr,
 							  GFP_ATOMIC);

give it a try without HW offload

I prepared two images, one without the patch and one with the above patch. And made sure the code patched in the patched image. Back and forth refreshed a couple of times and performed tests. And run the following test many times for each image.

I ran Ooklaā€™s speedtest CLI on BPI R4. Itā€™s a lightweight application, surely lighter than a torrent client. ā€˜speedtestā€™ easily saturated my 1000M/1000M internet connection, one direction each time. My measured CPU utilisation was only ~7.8% that already included ~2.5% consumed by ā€˜speedtestā€™ process itself.

I would say the difference is in measuring error. Without the patch, CPU utilisation is 7-ish percent. With the patch, itā€™s in the low 8-ish percent. Consistently reproducible. The speedtest server is fixed. Iā€™m only less than 1ms away from it. Achieved speed is about the same: download between 940-943Mbps. upload slightly lower.

I think itā€™s worth looking further where else to add prefetch or if could add in different ways. Iā€™ll help to test with further changes.

(fyi. Iā€™m stepping out for a little while. Will test & respond while Iā€™m back to my desk again)

EDIT: corrected typo: ā€˜secondsā€™ to ā€˜percentā€™

letā€™s try this one

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 29b4aea..03759f7 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -2104,6 +2104,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
 
 			if (ret != XDP_PASS)
 				goto skip_rx;
+			net_prefetch(xdp->data_meta);
 
 			skb = build_skb(data, PAGE_SIZE);
 			if (unlikely(!skb)) {

I suspect that it will be the same as the one before though. if it doesnā€™t help then iā€™ll try to prefetch the next index instead

another approach

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 29b4aea..03759f7 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -2017,7 +2017,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
 	struct dim_sample dim_sample = {};
 	struct mtk_rx_ring *ring;
 	bool xdp_flush = false;
-	int idx;
+	int idx, next_idx;
 	struct sk_buff *skb;
 	u64 addr64 = 0;
 	u8 *data, *new_data;
@@ -2040,6 +2040,10 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
 		rxd = ring->dma + idx * eth->soc->rx.desc_size;
 		data = ring->data[idx];
 
+		next_idx = NEXT_DESP_IDX(idx, ring->dma_size);
+		prefetch(ring->dma + next_idx * eth->soc->rx.desc_size);
+		net_prefetch(ring->data[next_idx] + MTK_PP_HEADROOM);
+
 		if (!mtk_rx_get_desc(eth, &trxd, rxd))
 			break;
 

EDIT: prefetch the next rxd as well

For Patch 2, I had to change from ā€˜xdp->data_metaā€™ to ā€˜xdp.data_metaā€™ to get it compiled.

Also think it through my previous test again. The differences in no patch and various are too little. I need to eliminate the variance of my ISP assigning me to a different public IP block. So I came up with a new test.

Now the new test is all local, essentially performing ā€˜iperf3 -B 192.168.2.1 -c 192.168.1.138 -i 0 -t 15 -Rā€™ on my BPI-R4.

Here are the interesting results. Patch 2 looks interesting and worth a further look.

I just ran a little test myself, the second patch did improve the max RX rate from avg 4.73Gbps to 4.97Gbps. Will investigate it further soon

Iā€™m now running OpenWrt snapshot with Patch 2 on my BPI R4. The benefit is around less than 5% but surely can be measured :smiley:

Iā€™ll prepare a proper patch when I have time. The TX needs to be taken care for as well. (And the other RX case)

2 Likes

@Dale also, do you think that itā€™s useful? Iā€™m not seeing it in upstream.

From d8a2975939a12686c4a95c40db21efdc3f821f63 Mon Sep 17 00:00:00 2001
From: developer <[email protected]>
Date: Fri, 19 Aug 2022 13:32:03 +0800
Subject: [PATCH] [][kernel][common][eth][Fix GMAC data corruption issue]

[Description]
Fix GMAC data corruption issue.

If without this patch, kernel might receive invalid packets that are corrupted by GMAC.

[Release-log]
N/A

Change-Id: I8a9f00402e4b8d9181a8c4c518665a16619cdc0a
Reviewed-on: https://gerrit.mediatek.inc/c/openwrt/feeds/mtk_openwrt_feeds/+/6398721
---
 .../files-5.4/drivers/net/ethernet/mediatek/mtk_eth_soc.h       | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/linux/mediatek/files-5.4/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/target/linux/mediatek/files-5.4/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 60939f20..54674ada 100755
--- a/target/linux/mediatek/files-5.4/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/target/linux/mediatek/files-5.4/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -581,7 +581,7 @@
 /* Mac control registers */
 #define MTK_MAC_MCR(x)		(0x10100 + (x * 0x100))
 #define MAC_MCR_MAX_RX_1536	BIT(24)
-#define MAC_MCR_IPG_CFG		(BIT(18) | BIT(16))
+#define MAC_MCR_IPG_CFG		(BIT(18) | BIT(16) | BIT(12))
 #define MAC_MCR_FORCE_MODE	BIT(15)
 #define MAC_MCR_TX_EN		BIT(14)
 #define MAC_MCR_RX_EN		BIT(13)

itā€™s there, just more clearly defined. #define MAC_MCR_RX_FIFO_CLR_DIS

BTW, iā€™ve submitted this patch as RFC https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

will send it to net-next when it opens again

2 Likes