Is it possible to have the crypto extensions working?

I downloaded the kernel source code and compiled it, then the cryptodev driver and compiled it, and then openssl for testing and compiled that too. The details differ everywhere between architectures, but anyway, it works (a comment about this at the end).

Speed without cryptodev:

openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 3255382 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 910647 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 235052 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 59220 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 7419 aes-256-cbc's in 3.00s
OpenSSL 1.0.2g  1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr) 
compiler: cc -I. -I.. -I../include  -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM

The 'numbers' are in 1000s of bytes per second processed.

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc      17362.04k    19427.14k    20057.77k    20213.76k    20258.82k

Speed with cryptodev:

openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 113991 aes-256-cbc's in 0.06s
Doing aes-256-cbc for 3s on 64 size blocks: 84804 aes-256-cbc's in 0.02s
Doing aes-256-cbc for 3s on 256 size blocks: 74743 aes-256-cbc's in 0.03s
Doing aes-256-cbc for 3s on 1024 size blocks: 74032 aes-256-cbc's in 0.01s
Doing aes-256-cbc for 3s on 8192 size blocks: 41348 aes-256-cbc's in 0.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 28113 aes-256-cbc's in 0.06s
OpenSSL 1.1.1-dev  xx XXX xxxx
built on: reproducible build, date unspecified
options:bn(64,32) rc4(char) des(long) aes(partial) idea(int) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\""  -march=armv7-a -Wa,--noexecstack

The 'numbers' are in 1000s of bytes per second processed.

type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc      30397.60k   271372.80k   637806.93k  7580876.80k         infk  7676723.20k
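
A note on reading the cryptodev table: unless the -elapsed option is given, openssl speed divides by the CPU user time of the process, not by wall-clock time. With cryptodev most of the work happens in the kernel and the hardware, so the measured time is close to zero (0.06s, 0.00s) and the resulting figures, including the infk, are inflated. The sketch below reproduces the reported column from the raw counters and then gives a rough wall-clock estimate. The 3-second duration is an assumption taken from the "for 3s" in the log, not a measured value, and throughput_k is only an illustrative helper name.

```python
# Recompute the "1000s of bytes per second" column from the raw counters.
# Assumption: each cryptodev run lasted roughly the nominal 3 s of wall-clock time.

def throughput_k(blocks, block_size, seconds):
    """Throughput in 1000s of bytes per second, the unit openssl speed prints."""
    if seconds == 0:
        return float("inf")  # openssl prints this case as "infk"
    return blocks * block_size / seconds / 1000.0

# (block size, blocks processed, CPU seconds reported) from the cryptodev run
cryptodev_run = [
    (16, 113991, 0.06),
    (64, 84804, 0.02),
    (256, 74743, 0.03),
    (1024, 74032, 0.01),
    (8192, 41348, 0.00),
    (16384, 28113, 0.06),
]

NOMINAL_WALL_SECONDS = 3.0  # assumed, see the note above

for size, blocks, cpu_seconds in cryptodev_run:
    reported = throughput_k(blocks, size, cpu_seconds)
    estimate = throughput_k(blocks, size, NOMINAL_WALL_SECONDS)
    print(f"{size:>6} bytes: reported {reported:>12.2f}k, "
          f"wall-clock estimate {estimate:>10.2f}k")
```

Running the real benchmark again with openssl speed -elapsed -evp aes-256-cbc would give measured wall-clock figures instead of this estimate.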

Now the comment.

I spent my whole day on this: downloading, learning, checking, testing. In this case, MTK kernel updates are not very regular and they lack these things. How will BPI help users to use this extremely important feature? The problem is not only the amount of work, but also that the different packages keep being updated (without this feature) and there is no clear method to assemble everything together to use the R2 to its real potential.

And this is important, because without this extension to use the crypto driver, it is impossible to have a well-behaved secure channel (for example, with SSH). Simply put, the ARM cores are not powerful enough.

Try this command: openssl speed -evp aes-128-cbc -engine cryptodev

Then run cat /proc/interrupts to observe the HW crypto engine behavior.

I don’t have cryptodev as an engine … the openssl I am using for these tests was compiled to use cryptodev directly.

Numbers (with cryptodev)

As the interrupt counters are cumulative, these are the numbers before (first table) and after (second table) the speed command:

           CPU0       CPU1       CPU2       CPU3       
 16:   18380457  116330246  184033929  142284130       GIC  29 Edge      arch_timer
 71:    6110223          0          0          0  MT_SYSIRQ  82 Level     mtk-aes
 72:          0          0          0          0  MT_SYSIRQ  83 Level     mtk-aes
 73:          3          0          0          0  MT_SYSIRQ  84 Level     mtk-sha
 74:          0          0          0          0  MT_SYSIRQ  91 Level     mtk-sha
           CPU0       CPU1       CPU2       CPU3       
 16:   18382717  116344862  184048824  142306755       GIC  29 Edge      arch_timer
 71:    6504905          0          0          0  MT_SYSIRQ  82 Level     mtk-aes
 72:          0          0          0          0  MT_SYSIRQ  83 Level     mtk-aes
 73:          3          0          0          0  MT_SYSIRQ  84 Level     mtk-sha
 74:          0          0          0          0  MT_SYSIRQ  91 Level     mtk-sha

To check this, I repeated the test with the regular openssl (no cryptodev)

CPU before test

 16:   18385875  116407765  184154131  142381108       GIC  29 Edge      arch_timer

CPU after test

 16:   18387585  116412097  184158991  142385465       GIC  29 Edge      arch_timer

mtk-aes didn’t change.

First run (with cryptodev)
Total change for mtk-aes: 394682
Total change CPU (arch_timer interrupts): 2260 + 14616 + 14895 + 22625 = 54396

Second run (without cryptodev)
Total change for mtk-aes: 0
Total change CPU (arch_timer interrupts): 1710 + 4332 + 4860 + 4357 = 15259
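
For anyone who wants to repeat this comparison without doing the subtraction by hand, here is a minimal sketch that parses two /proc/interrupts snapshots and sums the change per interrupt name. The snapshots are the ones pasted above for the cryptodev run; parse_interrupts and delta are illustrative names, and the parser assumes four CPU columns, as on the R2.

```python
# Sum the change per interrupt name between two /proc/interrupts snapshots.
# The snapshot text is copied from the cryptodev run shown earlier.

BEFORE = """
           CPU0       CPU1       CPU2       CPU3
 16:   18380457  116330246  184033929  142284130       GIC  29 Edge      arch_timer
 71:    6110223          0          0          0  MT_SYSIRQ  82 Level     mtk-aes
 72:          0          0          0          0  MT_SYSIRQ  83 Level     mtk-aes
 73:          3          0          0          0  MT_SYSIRQ  84 Level     mtk-sha
 74:          0          0          0          0  MT_SYSIRQ  91 Level     mtk-sha
"""

AFTER = """
           CPU0       CPU1       CPU2       CPU3
 16:   18382717  116344862  184048824  142306755       GIC  29 Edge      arch_timer
 71:    6504905          0          0          0  MT_SYSIRQ  82 Level     mtk-aes
 72:          0          0          0          0  MT_SYSIRQ  83 Level     mtk-aes
 73:          3          0          0          0  MT_SYSIRQ  84 Level     mtk-sha
 74:          0          0          0          0  MT_SYSIRQ  91 Level     mtk-sha
"""

def parse_interrupts(text, n_cpus=4):
    """Map each interrupt name (last column) to per-CPU counters.

    Rows that share a name (the two mtk-aes lines, for example) are added up.
    """
    totals = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue  # skip the CPU header row
        counts = [int(value) for value in fields[1:1 + n_cpus]]
        name = fields[-1]
        previous = totals.get(name, [0] * n_cpus)
        totals[name] = [a + b for a, b in zip(previous, counts)]
    return totals

def delta(before, after):
    """Total increase per interrupt name, summed over all CPUs."""
    return {name: sum(after[name]) - sum(before[name]) for name in before}

changes = delta(parse_interrupts(BEFORE), parse_interrupts(AFTER))
for name, change in changes.items():
    print(f"{name}: {change}")
```

On a live system the two snapshots could be read from /proc/interrupts immediately before and after the openssl speed run instead of being pasted in.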

This matches another test I did, where the version using cryptodev performed terribly with blocks smaller than 1024 bytes and then shot up (with a 16K block size it was around 6 times faster than using the CPU). It could be because of the penalty of triggering so many interrupts for small blocks (my opinion).

Yes, currently small chunks are the bottleneck. BTW, if you want to run throughput tests with multiple pairs (e.g. IPSec, MACSec), you can route these interrupts to different CPUs (so-called SMP affinity), since we use several rings for parallel processing.

I hadn’t noticed before that the crypto driver only uses the first CPU, even though its affinity value is F (use all CPUs). So, if I put “other” technology on a different CPU, will the cryptographic hardware work in parallel with the first one?

If that is the case, then using IPSec will not interfere with openssl … am I right? … A crazy idea … would it be possible to add a copy of cryptodev with a different internal name, bound to a different CPU?

With only cryptodev it is a little complicated to do other tests. But when I deliberately desynchronized my 4 tasks using the crypto and the networking, I was able to achieve around the same throughput even when I let two threads wait for 20 seconds. In that case, that spare capacity could be used by different concurrent tasks.

In general, sequential speed can’t reveal all the secrets a platform like the R2 has. When using it wisely, we can get surprises.

These DMA interrupts cause overhead. I guess CPU0 is busy all the time while the others are idle. That’s why I suggest you try binding IRQ 71 to IRQ 74 to other CPUs.

Please refer to https://cs.uwaterloo.ca/~brecht/servers/apic/SMP-affinity.txt.

echo 8 > /proc/irq/71/smp_affinity …

Besides, you can use something like “RPS” for network testing.

echo 4 > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 8 > /sys/class/net/eth1/queues/rx-0/rps_cpus
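
The values written to smp_affinity and rps_cpus above are hexadecimal bitmasks with one bit per CPU (bit 0 is CPU0), so 8 selects CPU3 and 4 selects CPU2. A small sketch for working out other masks follows; cpu_mask and cpus_in_mask are illustrative helper names, and the IRQ-to-CPU plan at the end is a made-up example, not a recommended layout.

```python
# Convert between CPU numbers and the hex bitmask used by
# /proc/irq/<n>/smp_affinity and .../rps_cpus (bit 0 = CPU0).

def cpu_mask(*cpus):
    """Hex mask string selecting the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

def cpus_in_mask(mask_hex):
    """List of CPU numbers selected by a hex mask string."""
    value = int(mask_hex, 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]

# Hypothetical plan: spread the four crypto IRQs over CPU1..CPU3.
plan = {71: 1, 72: 2, 73: 3, 74: 3}
for irq, cpu in plan.items():
    print(f"echo {cpu_mask(cpu)} > /proc/irq/{irq}/smp_affinity")
```

The script only prints the echo commands; they still have to be run as root on the board.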

I have some test reports.


I have been playing with the machine for a while … and there is something related to the encryption that seems not to work correctly.

When using cryptsetup / LUKS, it is necessary to specify aes as the cipher, and then the operating system uses the hardware extensions to accelerate the processing. This is very good.

However, this does not always work. After rebooting, the machine can no longer mount the partitions, nor can it create new ones. It behaves as if the SoC had entered some incoherent state and could no longer work reliably. In fact, the only way out is to apply some kind of “electroshock” by erasing and recreating the physical partition … something that obviously is not a valid solution. This is happening with the latest Ubuntu from November. As a reference, I executed the same commands without specifying the cipher, so that cryptsetup uses the default software-based functionality, and the result is reliable encryption before and after rebooting the computer.

I merged the current cryptodev into the main branch of my 4.14 kernel on GitHub.

Where exactly must cryptodev be copied to? Can I use cryptodev without compiling openssl? And what about the engine parameter … how do I use it?

MT7621 used MTK CryptoEngine ‘Proprietary’ code. https://forum.mqmaker.com/t/is-crypto-engine-enabled-in-witi-firmware/241. Does MT7623 have similar code to get better performance?

The performance/code quality of MT7623 upstream crypto driver is better than MT7621 proprietary driver, so there is no reason to request such driver for mt7623, thanks.

Is the way I added cryptodev to my GitHub enough, or is anything missing?

We can use the cryptodev module to create /dev/crypto and the openssl command to test performance.

with hw crypto:

root@LEDE:/# openssl speed -evp aes-128-cbc
    Doing aes-128-cbc for 3s on 16 size blocks: 103906 aes-128-cbc's in 0.07s
    Doing aes-128-cbc for 3s on 64 size blocks: 103955 aes-128-cbc's in 0.09s
    Doing aes-128-cbc for 3s on 256 size blocks: 103742 aes-128-cbc's in 0.11s
    Doing aes-128-cbc for 3s on 1024 size blocks: 85149 aes-128-cbc's in 0.09s
    Doing aes-128-cbc for 3s on 8192 size blocks: 51995 aes-128-cbc's in 0.05s

without hw crypto:

root@LEDE:/# openssl speed -evp aes-128-cbc
    Doing aes-128-cbc for 3s on 16 size blocks: 3140090 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 64 size blocks: 930010 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 256 size blocks: 244271 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 1024 size blocks: 61847 aes-128-cbc's in 3.00s
    Doing aes-128-cbc for 3s on 8192 size blocks: 7761 aes-128-cbc's in 3.00s

Finally, if you see many mtk-aes interrupts after testing, the hw crypto works.

root@LEDE:/# cat /proc/interrupts
           CPU0       CPU1       CPU2       CPU3
 ................
 61:     448747          0          0          0  MT_SYSIRQ  82 Level     mtk-aes
 62:          0          0          0          0  MT_SYSIRQ  83 Level     mtk-aes
 63:         32          0          0          0  MT_SYSIRQ  84 Level     mtk-sha


I checked http://cryptodev-linux.org/ and found that it is a general driver, not optimized for a specific SoC. I am curious how it achieves better performance than the mainstream kernel.

AF_ALG and cryptodev expose the kernel's cryptographic primitives to user space.

So, when using these APIs, we are calling the kernel. If the kernel has access to cryptographic hardware, these APIs will use that hardware. If the kernel has no access to it, or the platform has no cryptographic hardware at all, software will be used instead.

So, when the hardware is available, it will be faster to use cryptodev or AF_ALG than a pure software cryptography library such as Botan. And the time difference can be huge.

Sorry for the late reply.

How did you deal with this problem - “After rebooting, the machine no longer can mount the partitions”?

I don’t get the same result using kernel 4.9: whether the cryptodev module is loaded or not doesn’t affect the result. Is there something I am missing?

crypto device:

# ll /dev/crypto 
crw-rw-rw- 1 root root 10, 58 Mar 30 14:14 /dev/crypto

openssl command:

# openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 3146569 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 1097400 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 305013 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 78430 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 8192 size blocks: 9888 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 4944 aes-128-cbc's in 3.00s
OpenSSL 1.1.0f  25 May 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr) 
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/lib/ssl\"" -DENGINESDIR="\"/usr/lib/arm-linux-gnueabihf/engines-1.1\"" 
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc      16781.70k    23411.20k    26027.78k    26860.31k    27000.83k    27000.83k

You can see the mtk interrupts:

# cat /proc/interrupts | grep mtk
....
214:          0          0          0          0  MT_SYSIRQ  82 Level     mtk-aes
215:          0          0          0          0  MT_SYSIRQ  83 Level     mtk-aes
216:          0          0          0          0  MT_SYSIRQ  84 Level     mtk-sha
217:          0          0          0          0  MT_SYSIRQ  91 Level     mtk-sha

I guess your openssl wasn’t built with “Crypto acceleration support” enabled. You can refer to the link below, thanks.

https://wiki.openwrt.org/doc/hardware/cryptographic.hardware.accelerators

openssl speed -evp aes-128-cbc -engine cryptodev

http://processors.wiki.ti.com/index.php/Cryptography_Users_Guide#Using_Cryptographic_Hardware_Accelerators

Hi, Ryder.Lee … I really gave up on using the hardware crypto extensions for LUKS in this case, as it was unstable. For that I only use the pure software option. Encryption is a very sensitive thing; if your endianness is different, or if a bit is lost here or there, or whatever, then everything comes out different. OpenSSL was OK.

Right now I am testing other things with AF_ALG, but with a Banana M2+ (no LUKS yet), and it works well in both directions. Maybe I could try that with the R2 later, to see how it works.

I used Debian 9 armv7l, with kernel 4.9 patched from openwrt. I have “<*> Mediatek Random Number Generator support” included in the kernel config, as you can see from the interrupts. You are right, I don’t have “ARM Accelerated Cryptographic Algorithms”. Which ones should I enable, all except ARMv8?

  • SHA1 digest algorithm (ARM-asm)
  • SHA1 digest algorithm (ARM NEON)
  • SHA1 digest algorithm (ARM v8 Crypto Extensions)
  • SHA-224/256 digest algorithm (ARM v8 Crypto Extensions)
  • SHA-224/256 digest algorithm (ARM-asm and NEON)
  • SHA-384/512 digest algorithm (ARM-asm and NEON)
  • AES cipher algorithms (ARM-asm)
  • Bit sliced AES using NEON instructions
  • Accelerated AES using ARMv8 Crypto Extensions
  • PMULL-accelerated GHASH using ARMv8 Crypto Extensions

I use kernel 4.9 as it is the only kernel with all the features, e.g. VLAN, DSA, 2GMAC, HWNAT.