Have you had a look at the last comment of the pull request above?
I’m not sure if 6.12 uses the same SRAM structure as 6.6 (inside the eth node) or a backport of the upstream version with a dedicated sram node.
I was aware of these two patches. They are from the MTK feed and address the error in kernel 6.6 raised in post #1 of this thread.
As I said in my previous reply, whatever resolved the error in post #1 was not picked up by the stock OpenWrt 24.10.x releases, though apparently Dangort himself was involved in the GitHub pull request. lol
I had also checked earlier that these patches are not needed for kernel 6.12, so the stall riddle remains for kernel 6.12.
Any more ideas, folks?
I assume this means you acknowledge that crypto-safexcel doesn’t work for you on kernel 6.12 either. Am I correct?
That is my original question: do the crypto-safexcel and/or crypto-eip drivers work on kernel 6.12 for anybody?
Also ping @moore
Just to clarify my concern and original question: if you’re confused or don’t have time to go through the minute, irrelevant details of today’s discussion, just forget them.
I compiled my own OpenWrt snapshot (kernel 6.12), which comes with the crypto-safexcel driver. Then I also ported the crypto-eip driver from the MTK feed to this build.
I successfully had both (one at a time) loaded and initialised on kernel 6.12.55. However, when I tested them with “openssl speed” through the AF_ALG or cryptodev interfaces, they all stalled.
Right now, I suspect it’s a driver issue, but of course I’m open to it being user error, i.e. me, as always.
My question is about the readiness of the EIP197 drivers on kernel 6.12. Let’s focus on that and move the discussion on from here.
well … openwrt aside … i don’t see the safexcel entry in the 6.17 dts as there is in openwrt. So i presume it hasn’t been upstreamed, or @frank-w hasn’t updated the dts yet
If i knew how it should look when working i could upstream it… but i do not know of any working state for mainline
That’s why i asked for the changes in openwrt, i cannot follow them completely
Well, it’s really OFF TOPIC, but since we’re here already: my advice for you is to work on OpenWrt’s kernel.
You don’t have to give up your own edition of the kernel. Not only will you learn how to properly maintain a kernel tree and improve your own kernel tree, you can also upstream more patches effectively and give your work more impact.
The problem (as I see it) with OpenWrt isn’t the kernel, which is solidly maintained.
Mainline changes will end up in openwrt sooner or later anyway.
This is like saying nothing burger.
Seems you don’t get how vendors, OpenWrt and mainline Linux work together. A vendor like MediaTek bases its SDK on a version of OpenWrt. That means the vendor’s patches (the vendor being the major contributor of code for its own HW) are usually diffs against the master branch of OpenWrt. But changes from MediaTek and OpenWrt always lag behind mainline Linux, usually by a few versions.
Ideally MediaTek should try to upstream as much as possible. However, it’s a challenge because 1) both MediaTek and OpenWrt lag behind current mainline Linux: for example, OpenWrt is now on 6.12, but mainline is already at 6.18; 2) in order to upstream a change, you always have to work on current mainline; 3) getting code reviewed and merged into the next mainline release is itself a daunting task, and can drag on forever.
That’s why it seems so difficult to get changes mainlined, not to mention that relatively complex changes drag on and on and take a long time to get merged upstream.
Also, since MediaTek’s SDK is based on OpenWrt, there is little reason to spend time massaging patches outside OpenWrt on a non-current version of the Linux kernel and hope that this helps to upstream MediaTek’s changes.
Working directly on OpenWrt cuts through lots of unnecessary work in upstreaming the vendor’s code. It streamlines down to two steps:
See how much more effective and impactful frank-w would become?
Now you see why i work on mainline and not openwrt. Openwrt is more interesting for users, as features land earlier there, based on the last or previous lts. But the upstreaming process depends on the last rc1 or -next, and there is much work in fixing patches on mainline.
TBH i find the structure in openwrt a hell, with all the patches instead of a linear git history. I understand why it is done (e.g. different vendors carry conflicting patches), but imho this is broken by design. Mainline requires a more generic approach plus vendor-specific adaptation, so it takes more time to get in.
The openwrt people do great work as preparation for devs like me or daniel who are upstreaming things, but it is still additional work to collect related parts and upport them to a recent version, handle reviews etc.
But yes, it would be great if mtk put more work into upstreaming.
Btw. mainline is at 6.18-rc4 and a bunch of patches are in next too for 6.19. Feature patches are placed in next based on the last rc1. Only fixes are applied to current torvalds/master until the merge window (the 2 weeks between a stable release and the next rc1).
Eric already upstreamed some things (e.g. the realtek phy for 2.5g sfp), so he has some experience with mainline… but maybe is not that deep into openwrt, similar to me. And upstreaming results in fewer patches needing to be added and upported in the next openwrt release. It is more work to upstream than to get into openwrt… but there is also the security perspective: are patches in openwrt updated if security flaws are found in them? That only really works in mainline, where more people look at the code.
But back to topic… if there is something in openwrt related to crypto, tell us the needed parts and how to test correctly and we can try upstreaming it.
the dts node still seems wrong for mt7988:
or the binding has to be changed, as this requires 6 interrupts currently (+eip +mem).
yes but it is not just a matter of dts - openwrt has patched the safexcel code as it wasn’t detecting the crypto module in spite of the dts change
thanks for the 6.12 patches… i had only found the 6.6 ones, but the 6.12 ones applied to 6.18-rc1
does it work for you? how did you test it?
made a quick test with 6.12 patches on 6.18-rc1, but not sure if hw-acceleration is really active as i do not see any dmesg logs.
root@bpi-r4-v11:~
# dmesg | grep crypto
[ 0.000000] Linux version 6.18.0-rc1-bpi-r4-crypto (frank@frank-u24) (aarch64-linux-gnu-gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #1 SMP Mon No5
[ 5.925051] systemd[1]: kmod-static-nodes.service - Create List of Static Device Nodes was skipped because of an unmet condition check (ConditionFileNotEmpty=/lib/modules/6.18.0-rc1-b.
root@bpi-r4-v11:~
# openssl speed -evp aes128 -elapsed
You have chosen to measure elapsed time instead of user CPU time.
Doing AES-128-CBC for 3s on 16 size blocks: 46652751 AES-128-CBC's in 3.00s
Doing AES-128-CBC for 3s on 64 size blocks: 30687788 AES-128-CBC's in 3.00s
Doing AES-128-CBC for 3s on 256 size blocks: 12992738 AES-128-CBC's in 3.00s
Doing AES-128-CBC for 3s on 1024 size blocks: 3867389 AES-128-CBC's in 3.00s
Doing AES-128-CBC for 3s on 8192 size blocks: 521652 AES-128-CBC's in 3.00s
Doing AES-128-CBC for 3s on 16384 size blocks: 262087 AES-128-CBC's in 3.00s
version: 3.0.15
built on: Sun Oct 27 14:16:28 2024 UTC
options: bn(64,64)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -fzero-call-used-regs=used-gpr -DOPENSSL_TLS_SECURITY_LEVEL=2 -Wa,--noexecstack -g -O2 -ffile-prefix-map=/build/reproducible-path/ope2
CPUINFO: OPENSSL_armcap=0xbd
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
AES-128-CBC 248814.67k 654672.81k 1108713.64k 1320068.78k 1424457.73k 1431344.47k
root@bpi-r4-v11:~
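As a sanity check on the table above: the “k” figures are just the operation counts multiplied by the block size, divided by the elapsed time and by 1000. A minimal sketch in Python, using the counts from this run and assuming the elapsed time is exactly the 3.00s that openssl prints (`throughput_k` is a name made up for this example):

```python
def throughput_k(ops: int, block_size: int, seconds: float = 3.0) -> float:
    """Throughput in units of 1000 bytes per second, as openssl speed reports it."""
    return ops * block_size / seconds / 1000.0


# Operation counts copied from the "Doing AES-128-CBC for 3s ..." lines above.
runs = {
    16: 46652751,
    64: 30687788,
    256: 12992738,
    1024: 3867389,
    8192: 521652,
    16384: 262087,
}

for size, ops in runs.items():
    print(f"{size:>6} bytes: {throughput_k(ops, size):.2f}k")
```

The results reproduce the AES-128-CBC row, so the table only restates the counts. By themselves the numbers do not say whether the CPU or the EIP197 did the work.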
to me it looks like it is not probed, but i have the driver built in
# zgrep -i 'safexcel' /proc/config.gz
CONFIG_CRYPTO_DEV_SAFEXCEL=y
but i have no entries for crypto in /proc/interrupts
safexcel_probe_generic returns -19 (no such device)
it exits here (version read before is 0):
	/* So it wasn't an EIP97 ... maybe it's an EIP197? */
	version = readl(priv->base + EIP197_HIA_AIC_BASE +
			EIP197_HIA_VERSION);
	if (EIP197_REG_LO16(version) == EIP197_HIA_VERSION_LE) {
		priv->hwconfig.hiaver = EIP197_VERSION_MASK(version);
		priv->flags |= SAFEXCEL_HW_EIP197;
	} else if (EIP197_REG_HI16(version) ==
		   EIP197_HIA_VERSION_BE) {
		/* read back byte-swapped, so complement swap bits */
		mask = EIP197_MST_CTRL_BYTE_SWAP_BITS;
		priv->hwconfig.hiaver = EIP197_VERSION_SWAP(version);
		priv->flags |= SAFEXCEL_HW_EIP197;
	} else {
		dev_err(dev, "DBG %s:%d\n", __func__, __LINE__);
		return -ENODEV; // <<<<<<<<<<<<
	}
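To make the failing branch concrete, here is a small Python sketch of the same decision. The two signature constants are quoted from memory of mainline safexcel.h and should be treated as assumptions to verify against the tree in use; `classify_hia_version` and the non-zero sample register values are made up for illustration.

```python
# Assumed values of EIP197_HIA_VERSION_LE / EIP197_HIA_VERSION_BE (check safexcel.h).
EIP197_HIA_VERSION_LE = 0xCA35
EIP197_HIA_VERSION_BE = 0x35CA


def classify_hia_version(version: int) -> str:
    """Mirror the if / else-if / else chain of the probe snippet above."""
    lo16 = version & 0xFFFF            # EIP197_REG_LO16()
    hi16 = (version >> 16) & 0xFFFF    # EIP197_REG_HI16()
    if lo16 == EIP197_HIA_VERSION_LE:
        return "eip197 (native byte order)"
    if hi16 == EIP197_HIA_VERSION_BE:
        return "eip197 (byte-swapped)"
    return "-ENODEV"


# 0 is the value reported in this thread; the other two are hypothetical reads.
for raw in (0x00000000, 0x0000CA35, 0x35CA0000):
    print(f"{raw:#010x} -> {classify_hia_version(raw)}")
```

A read of 0 matches neither signature, so the probe ends in the else branch. The sketch only shows that this follows from the comparison; it does not explain why the register reads 0 on this board.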
Your numbers don’t look EIP197-accelerated (?). How much CPU utilisation did you see when running the above test?
Here are my CPU-accelerated numbers (as EIP197 didn’t work for me):
# openssl speed -evp aes128 -elapsed
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
AES-128-CBC 347630.23k 868588.22k 1376191.15k 1597686.78k 1695727.62k 1702816.43k
Yes, it’s not accelerated, as the probe fails silently with -ENODEV because the version reads as 0.
so here are my changes
diff --git a/arch/arm64/boot/dts/mediatek/mt7988a.dtsi b/arch/arm64/boot/dts/mediatek/mt7988a.dtsi
index 6a2e2e265508..e5b1cc5e2765 100644
--- a/arch/arm64/boot/dts/mediatek/mt7988a.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt7988a.dtsi
@@ -374,6 +374,20 @@ serial2: serial@11000200 {
status = "disabled";
};
+ crypto: crypto@15600000 {
+ compatible = "inside-secure,safexcel-eip197b",
+ "security-ip-197-srv";
+ reg = <0 0x15600000 0 0x180000>;
+ interrupts = <GIC_SPI 213 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 214 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 215 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 216 IRQ_TYPE_LEVEL_HIGH>,
+ <GIC_SPI 217 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-names = "global", "ring0", "ring1", "ring2", "ring3";
+ status = "okay";
+ eth = <&eth>;
+ };
+
i2c0: i2c@11003000 {
compatible = "mediatek,mt7981-i2c";
reg = <0 0x11003000 0 0x1000>,
diff --git a/drivers/crypto/inside-secure/safexcel.c b/drivers/crypto/inside-secure/safexcel.c
index c3b2b22934b7..b7c3321a7177 100644
--- a/drivers/crypto/inside-secure/safexcel.c
+++ b/drivers/crypto/inside-secure/safexcel.c
@@ -608,7 +608,14 @@ static int safexcel_hw_init(struct safexcel_crypto_priv *priv)
val |= EIP197_MST_CTRL_TX_MAX_CMD(5);
writel(val, EIP197_HIA_AIC(priv) + EIP197_HIA_MST_CTRL);
}
-
+ /*
+ * Set maximum number of TX commands to 2^4 = 16 for EIP97 HW2.1/HW2.3
+ */
+ else {
+ val = 0;
+ val |= EIP97_MST_CTRL_TX_MAX_CMD(4);
+ writel(val, EIP197_HIA_AIC(priv) + EIP197_HIA_MST_CTRL);
+ }
/* Configure wr/rd cache values */
writel(EIP197_MST_CTRL_RD_CACHE(RD_CACHE_4BITS) |
EIP197_MST_CTRL_WD_CACHE(WR_CACHE_4BITS),
diff --git a/drivers/crypto/inside-secure/safexcel.h b/drivers/crypto/inside-secure/safexcel.h
index 0f27367a85fa..64179265fa99 100644
--- a/drivers/crypto/inside-secure/safexcel.h
+++ b/drivers/crypto/inside-secure/safexcel.h
@@ -315,6 +315,7 @@
#define EIP197_MST_CTRL_RD_CACHE(n) (((n) & 0xf) << 0)
#define EIP197_MST_CTRL_WD_CACHE(n) (((n) & 0xf) << 4)
#define EIP197_MST_CTRL_TX_MAX_CMD(n) (((n) & 0xf) << 20)
+#define EIP97_MST_CTRL_TX_MAX_CMD(n) (((n) & 0xf) << 4)
#define EIP197_MST_CTRL_BYTE_SWAP BIT(24)
#define EIP197_MST_CTRL_NO_BYTE_SWAP BIT(25)
#define EIP197_MST_CTRL_BYTE_SWAP_BITS GENMASK(25, 24)
@@ -741,6 +742,7 @@ struct safexcel_priv_data {
/* Priority we use for advertising our algorithms */
#define SAFEXCEL_CRA_PRIORITY 300
+#define SYSTEM_CACHELINE_SIZE 64
/* SM3 digest result for zero length message */
#define EIP197_SM3_ZEROM_HASH "\x1A\xB2\x1D\x83\x55\xCF\xA1\x7F" \
diff --git a/drivers/crypto/inside-secure/safexcel_hash.c b/drivers/crypto/inside-secure/safexcel_hash.c
index ef0ba4832928..e5a55807260a 100644
--- a/drivers/crypto/inside-secure/safexcel_hash.c
+++ b/drivers/crypto/inside-secure/safexcel_hash.c
@@ -55,9 +55,10 @@ struct safexcel_ahash_req {
u8 block_sz; /* block size, only set once */
u8 digest_sz; /* output digest size, only set once */
__le32 state[SHA3_512_BLOCK_SIZE /
- sizeof(__le32)] __aligned(sizeof(__le32));
+ sizeof(__le32)] __aligned(SYSTEM_CACHELINE_SIZE);
+
- u64 len;
+ u64 len __aligned(SYSTEM_CACHELINE_SIZE);
u64 processed;
u8 cache[HASH_CACHE_SIZE] __aligned(sizeof(u32));
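The two TX_MAX_CMD macros in the safexcel.h hunk above differ only in where they place the 4-bit field. A quick Python sketch of the resulting register values, mirroring the C macros (the Python function names are just for this example):

```python
def eip197_mst_ctrl_tx_max_cmd(n: int) -> int:
    """Mirror of EIP197_MST_CTRL_TX_MAX_CMD(n): 4-bit field at bit 20."""
    return (n & 0xF) << 20


def eip97_mst_ctrl_tx_max_cmd(n: int) -> int:
    """Mirror of the added EIP97_MST_CTRL_TX_MAX_CMD(n): 4-bit field at bit 4."""
    return (n & 0xF) << 4


# Arguments as used in safexcel_hw_init() in the diff: 5 for EIP197, 4 for EIP97.
print(hex(eip197_mst_ctrl_tx_max_cmd(5)))
print(hex(eip97_mst_ctrl_tx_max_cmd(4)))
```

Note that in the added else branch `val` is reset to 0 before the field is set, so on that path the value written to EIP197_HIA_MST_CTRL contains only the TX_MAX_CMD field.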
cat /proc/crypto shows (just a sample):
driver : safexcel-rfc4309-ccm-aes
module : crypto_safexcel
driver : safexcel-rfc4543-gcm-aes
driver : safexcel-rfc4106-gcm-aes
driver : safexcel-authenc-hmac-sha384-cbc-des
module : crypto_safexcel
driver : safexcel-authenc-hmac-sha512-cbc-des
module : crypto_safexcel
driver : safexcel-authenc-hmac-sha224-cbc-des
module : crypto_safexcel
driver : safexcel-authenc-hmac-sha256-cbc-des
module : crypto_safexcel
driver : safexcel-authenc-hmac-sha384-cbc-des3_ede
module : crypto_safexcel
driver : safexcel-authenc-hmac-sha512-cbc-des3_ede
module : crypto_safexcel
driver : safexcel-authenc-hmac-sha224-cbc-des3_ede
module : crypto_safexcel
driver : safexcel-authenc-hmac-sha256-cbc-des3_ede
module : crypto_safexcel
driver : safexcel-authenc-hmac-sha1-cbc-des
module : crypto_safexcel
driver : safexcel-cmac-aes
module : crypto_safexcel
driver : safexcel-xcbc-aes
module : crypto_safexcel
driver : safexcel-cbcmac-aes
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
AES-128-CBC 200650.32k 532561.54k 913213.44k 1107125.93k 1186736.81k 1192831.66k
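Seeing safexcel entries in /proc/crypto only shows that the driver registered its algorithms. For a request by generic name such as cbc(aes), the kernel crypto API hands out the registered implementation with the highest priority, so it is worth checking which driver wins. A rough Python sketch follows; the sample text is made up for illustration (the second driver name and its priority are hypothetical, and 300 is taken from SAFEXCEL_CRA_PRIORITY in the diff), and `parse_proc_crypto` and `preferred_driver` are names invented here.

```python
# Hypothetical excerpt in /proc/crypto layout: blank-line separated "key : value" blocks.
SAMPLE = """\
name         : cbc(aes)
driver       : safexcel-cbc-aes
module       : crypto_safexcel
priority     : 300

name         : cbc(aes)
driver       : example-sw-cbc-aes
module       : kernel
priority     : 100
"""


def parse_proc_crypto(text: str) -> list:
    """Split the text into one dict per algorithm entry."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def preferred_driver(entries: list, name: str):
    """Return the driver with the highest priority for an algorithm name, or None."""
    candidates = [e for e in entries if e.get("name") == name]
    if not candidates:
        return None
    return max(candidates, key=lambda e: int(e.get("priority", 0)))["driver"]


entries = parse_proc_crypto(SAMPLE)
print(preferred_driver(entries, "cbc(aes)"))
```

On the board, replace `SAMPLE` with the contents of /proc/crypto. How ties between equal priorities are resolved is not covered by this sketch.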
compared with openwrt i have a /dev/crypto … unsure if this makes a difference
note i am on 6.17
Just to double check… are your numbers from the EIP197 accelerator?
Performance looks disappointing: about the same as the CPU, if not lower.
Not sure…
These are my changes (i reverted the 6.6 patches, then applied the 6.12 ones)
Currently at work so cannot do additional tests.
unsure if the engine is picking up the safexcel, and i don’t see any interrupts increasing
131: 0 0 0 0 GICv3 246 Level 15600000.crypto
132: 0 0 0 0 GICv3 247 Level 15600000.crypto
133: 0 0 0 0 GICv3 248 Level 15600000.crypto
134: 0 0 0 0 GICv3 249 Level 15600000.crypto
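One way to check whether the engine is actually being used is to snapshot /proc/interrupts before and after a test run and compare the crypto lines. A minimal Python sketch, assuming the line layout shown above (IRQ number, per-CPU counters, then the controller and device name); the “after” snapshot is invented for illustration and `irq_totals` / `irq_deltas` are hypothetical helper names:

```python
BEFORE = """\
131: 0 0 0 0 GICv3 246 Level 15600000.crypto
132: 0 0 0 0 GICv3 247 Level 15600000.crypto
133: 0 0 0 0 GICv3 248 Level 15600000.crypto
134: 0 0 0 0 GICv3 249 Level 15600000.crypto
"""

# Hypothetical snapshot after a test run, NOT taken from the thread.
AFTER = """\
131: 5 0 0 2 GICv3 246 Level 15600000.crypto
132: 0 0 0 0 GICv3 247 Level 15600000.crypto
133: 0 0 0 0 GICv3 248 Level 15600000.crypto
134: 0 0 0 0 GICv3 249 Level 15600000.crypto
"""


def irq_totals(text: str, needle: str = "crypto") -> dict:
    """Sum the per-CPU counters of every /proc/interrupts line containing needle."""
    totals = {}
    for line in text.splitlines():
        if needle not in line:
            continue
        irq, _, rest = line.partition(":")
        total = 0
        for token in rest.split():
            if not token.isdigit():
                break  # counters end where the controller name starts
            total += int(token)
        totals[irq.strip()] = total
    return totals


def irq_deltas(before: dict, after: dict) -> dict:
    """Per-IRQ difference between two snapshots."""
    return {irq: count - before.get(irq, 0) for irq, count in after.items()}


deltas = irq_deltas(irq_totals(BEFORE), irq_totals(AFTER))
print(deltas)
```

If every delta stays at 0 across an openssl speed run, as in the snapshot posted above, the requests most likely never reached the EIP197 rings.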
But you at least have probe working… do i need additional drivers besides safexcel? I wonder why the version read is 0