The press release about the MT7623 mentioned a “1Gbps VPN crypto engine”. The SoC information on the MediaTek web site also refers to “Crypto:500Mbps Crypto Suite”.
Are these capabilities available in the R2 machine?
If they are available, can they be used with the basic operating system, or is some type of license needed to enable them?
1) Download it from here: http://cryptodev-linux.org/download.html
2) Decompress it on the host PC where the BPI R2 BSP is located
3) Run the commands below to compile it:
cd cryptodev-linux-1.9
make KERNEL_DIR=/work/BPI-R2-bsp_bk/linux-mt/ CROSS_COMPILE=arm-linux-gnueabi- ARCH=arm
4) Copy cryptodev.ko to the board and load it:
insmod cryptodev.ko
After this step, the device /dev/crypto will be available.
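To confirm that step 4 worked, one can check that a module named cryptodev is listed in /proc/modules and that the /dev/crypto node exists. The Python sketch below does just that; the function names are hypothetical, and it only reads the standard procfs and device paths, so it is harmless to run on the board.

```python
import os


def cryptodev_loaded(modules_text: str) -> bool:
    """True if a module named 'cryptodev' appears in /proc/modules-style text."""
    for line in modules_text.splitlines():
        fields = line.split()
        # The first field of every /proc/modules line is the module name.
        if fields and fields[0] == "cryptodev":
            return True
    return False


def check_cryptodev(dev_path: str = "/dev/crypto",
                    modules_path: str = "/proc/modules") -> dict:
    """Report whether the module is listed and whether the device node exists."""
    try:
        with open(modules_path) as f:
            loaded = cryptodev_loaded(f.read())
    except OSError:
        loaded = False  # e.g. not running on Linux, or procfs not mounted
    return {"module_loaded": loaded, "device_present": os.path.exists(dev_path)}


if __name__ == "__main__":
    print(check_cryptodev())
```

Both values should be True after a successful insmod.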
How to validate that it works:
1) Decompress cryptodev-linux-1.9.tar.gz on the R2 board
2) Build the test tool
cd cryptodev-linux-1.9/tests
make (if it fails, please install libssl-dev first)
I downloaded the kernel source code … compiled it, then the cryptodev driver … compiled it … and OpenSSL for testing … compiled it. Different details everywhere for different architectures … but anyway, it works (a comment about this at the end).
Speed without cryptodev:
openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 3255382 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 64 size blocks: 910647 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 256 size blocks: 235052 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 1024 size blocks: 59220 aes-256-cbc's in 3.00s
Doing aes-256-cbc for 3s on 8192 size blocks: 7419 aes-256-cbc's in 3.00s
OpenSSL 1.0.2g 1 Mar 2016
built on: reproducible build, date unspecified
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) blowfish(ptr)
compiler: cc -I. -I.. -I../include -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-Bsymbolic-functions -Wl,-z,relro -Wa,--noexecstack -Wall -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-cbc     17362.04k    19427.14k    20057.77k    20213.76k    20258.82k
Speed with cryptodev:
openssl speed -evp aes-256-cbc
Doing aes-256-cbc for 3s on 16 size blocks: 113991 aes-256-cbc's in 0.06s
Doing aes-256-cbc for 3s on 64 size blocks: 84804 aes-256-cbc's in 0.02s
Doing aes-256-cbc for 3s on 256 size blocks: 74743 aes-256-cbc's in 0.03s
Doing aes-256-cbc for 3s on 1024 size blocks: 74032 aes-256-cbc's in 0.01s
Doing aes-256-cbc for 3s on 8192 size blocks: 41348 aes-256-cbc's in 0.00s
Doing aes-256-cbc for 3s on 16384 size blocks: 28113 aes-256-cbc's in 0.06s
OpenSSL 1.1.1-dev xx XXX xxxx
built on: reproducible build, date unspecified
options:bn(64,32) rc4(char) des(long) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/local/ssl\"" -DENGINESDIR="\"/usr/local/lib/engines-1.1\"" -march=armv7-a -Wa,--noexecstack
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-256-cbc     30397.60k   271372.80k   637806.93k  7580876.80k         infk  7676723.20k
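The with-cryptodev figures above need careful reading. By default openssl speed divides the bytes processed by the user CPU time of its own process (the "in 0.06s" part), not by elapsed time. With cryptodev the work is done in the kernel and in the crypto engine, so almost no user time is recorded and the result balloons, up to "infk" when the time rounds to 0.00s. The sketch below redoes the arithmetic both ways, assuming each test really ran for its nominal 3 s of wall-clock time; `speed_k` and `CRYPTODEV_RUN` are illustrative names, with the numbers copied from the run above.

```python
RUN_SECONDS = 3.0  # nominal duration of each "Doing ... for 3s" test

# (block size in bytes, operations done, CPU seconds printed) from the cryptodev run above
CRYPTODEV_RUN = [
    (16, 113991, 0.06),
    (64, 84804, 0.02),
    (256, 74743, 0.03),
    (1024, 74032, 0.01),
    (8192, 41348, 0.00),
    (16384, 28113, 0.06),
]


def speed_k(count: int, block_size: int, seconds: float) -> float:
    """Throughput in 1000s of bytes per second, the unit openssl speed prints."""
    if seconds == 0:
        return float("inf")  # openssl prints this case as "infk"
    return count * block_size / seconds / 1000


if __name__ == "__main__":
    for size, count, cpu_s in CRYPTODEV_RUN:
        reported = speed_k(count, size, cpu_s)    # divisor: user CPU time
        wall = speed_k(count, size, RUN_SECONDS)  # divisor: elapsed time
        print(f"{size:>6} bytes: reported {reported:>13.2f}k  wall-clock {wall:>10.2f}k")
```

With the run above, the wall-clock column comes out at about 607.95k for 16-byte blocks and about 153534.46k for 16384-byte blocks, against 17362.04k (16 bytes) and 20258.82k (8192 bytes) for the CPU-only run. openssl speed also has an -elapsed option that uses wall-clock time as the divisor, which should give comparable figures directly.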
Now the comment.
I spent my whole day on this: downloading, learning, checking, testing. MTK kernel updates are not very regular, and they lack these things. How will BPI help users take advantage of this extremely important feature? The problem is not only the work involved, but also that the different packages keep being updated (without this feature), and there is no clear method for assembling everything together to use the R2 to its real potential.
And this is important, because without this extension to use the crypto driver, it is impossible to have a well-behaved secure channel (for example, with SSH). Simply put, the ARM cores are not powerful enough.
First run (with cryptodev)
Total change for mtk-aes: 394682
Total change CPU : 2260 + 14616 + 14895 + 22625 = 54396
Second run (without cryptodev)
Total change for mtk-aes: 0
Total change CPU : 1710 + 4332 + 4860 + 4357 = 15259
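The "Total change" figures appear to be differences between two readings of the per-CPU interrupt counters. A sketch of that bookkeeping in Python, assuming the usual /proc/interrupts layout (a CPU header line, then one row per IRQ with one counter per CPU followed by a description); the function names are hypothetical.

```python
def parse_interrupts(text: str) -> dict:
    """Map each IRQ label to (per-CPU counts, description) from /proc/interrupts text."""
    lines = [line for line in text.splitlines() if line.strip()]
    ncpu = len(lines[0].split())  # header line: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        fields = line.split()
        label = fields[0].rstrip(":")
        counts = []
        for field in fields[1:1 + ncpu]:
            if not field.isdigit():  # some rows (e.g. ERR) have a single counter
                break
            counts.append(int(field))
        table[label] = (counts, " ".join(fields[1 + len(counts):]))
    return table


def interrupt_deltas(before: str, after: str) -> dict:
    """Per-CPU increase of every IRQ counter between two snapshots."""
    old = parse_interrupts(before)
    deltas = {}
    for label, (counts, desc) in parse_interrupts(after).items():
        prev = old.get(label, ([0] * len(counts), desc))[0]
        deltas[label] = ([a - b for a, b in zip(counts, prev)], desc)
    return deltas


def total_for(deltas: dict, name: str) -> int:
    """Sum the increases of all IRQs whose description mentions `name`."""
    return sum(sum(counts) for counts, desc in deltas.values() if name in desc)
```

Take one snapshot with open("/proc/interrupts").read(), run openssl speed, take a second snapshot, and pass both to interrupt_deltas; total_for(deltas, "mtk-aes") then gives a figure comparable to the ones above.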
This follows another test I did, in which the version using cryptodev performed terribly with blocks smaller than 1024 bytes and then shot up (16K blocks were around 6 times faster than using the CPU). In my opinion, this could be due to the penalty of triggering so many interrupts for small packets.
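The small-block penalty can be made concrete with a simple fixed-overhead model, which is an assumption rather than something measured in this thread: each request costs a fixed time t_0 (system call, DMA setup, interrupt) plus n/B for n bytes at an engine bandwidth B, so the throughput T(n) is

```latex
T(n) = \frac{n}{t_0 + n/B}
\qquad\Longrightarrow\qquad
T(n) \approx \frac{n}{t_0} \ \text{ for } n \ll B\,t_0,
\qquad
T(n) \to B \ \text{ for } n \gg B\,t_0

t_{\mathrm{req}}(16) \approx \frac{3\ \mathrm{s}}{113991} \approx 26.3\ \mu\mathrm{s},
\qquad
T(16) \approx \frac{16\ \mathrm{B}}{26.3\ \mu\mathrm{s}} \approx 0.61\ \mathrm{MB/s}
```

The second line uses the cryptodev run above (113991 operations on 16-byte blocks in about 3 s of wall-clock time): about 0.61 MB/s, against roughly 17.4 MB/s for the CPU-only run at that block size. Only for large n does the n/B term dominate, so T(n) approaches B.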
Yes, small chunks are currently the bottleneck. BTW, if you want to do a throughput test with multiple pairs (e.g. IPSec, MACsec), you can route these interrupts to different CPUs (so-called SMP affinity), since we use several rings for parallel processing.
I hadn’t noticed before that the crypto driver is only using the first CPU, even when its affinity value is F (use all CPUs). So, if I run “other” technology on a different CPU, will the cryptographic hardware work in parallel with the first one?
If that is the case, then using IPSec will not interfere with OpenSSL … am I right? … A crazy idea: would it be possible to add a copy of cryptodev with a different internal name, affined to a different CPU?
With only cryptodev, it is a little complicated to do other tests. But when I deliberately desynchronized my 4 tasks using the crypto engine and the networking, I achieved around the same throughput even when I let two threads wait for 20 seconds. In that case, that capacity could be used by different concurrent tasks.
In general, sequential speed can’t reveal all the secrets of a platform like the R2. Used wisely, it can bring surprises.
These DMA interrupts cause overhead; I guess CPU0 is busy all the time while the others are idle. That’s why I suggest you try binding IRQ 71 to IRQ 74 to other CPUs.
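Binding IRQ 71 to IRQ 74 means writing a hexadecimal CPU bitmask to /proc/irq/<n>/smp_affinity (bit k selects CPU k, so F means CPUs 0-3). On ARM GIC-based SoCs the kernel typically delivers each interrupt to a single CPU taken from that mask, which may be why an affinity of F still left the work on CPU0. The Python sketch below spreads the four IRQs over CPUs 1-3; the CPU choice and the function names are illustrative, the IRQ numbers may differ on other kernels (check /proc/interrupts), and a running irqbalance daemon may rewrite the masks.

```python
from pathlib import Path


def affinity_mask(cpu: int) -> str:
    """Hex CPU bitmask, as accepted by /proc/irq/<n>/smp_affinity, selecting one CPU."""
    return format(1 << cpu, "x")


def spread_irqs(irqs, cpus) -> dict:
    """Assign IRQs round-robin over the given CPUs: {irq: hex mask}."""
    return {irq: affinity_mask(cpus[i % len(cpus)]) for i, irq in enumerate(irqs)}


def apply_plan(plan: dict, dry_run: bool = True) -> None:
    """Print the equivalent shell commands, or write the masks (requires root)."""
    for irq, mask in plan.items():
        path = Path(f"/proc/irq/{irq}/smp_affinity")
        if dry_run:
            print(f"echo {mask} > {path}")
        else:
            path.write_text(mask + "\n")


if __name__ == "__main__":
    # IRQ 71-74 as named in the reply above; CPUs 1-3 so that CPU0 is left alone.
    apply_plan(spread_irqs(range(71, 75), [1, 2, 3]))
```

By default apply_plan only prints the equivalent echo commands; calling it with dry_run=False as root writes the masks.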
I have been playing with the machine for a while … and something related to the encryption does not seem to work correctly.
When using cryptsetup / LUKS, it is necessary to specify aes as the cipher, and then the operating system uses the hardware extensions to accelerate the processing. This is very good.
However, this does not always work. After rebooting, the machine can no longer mount the partitions, nor can it create new ones. It behaves as if the SoC enters some incoherent state and can no longer work reliably. In fact, the only way out is some type of “electroshock”: erasing and recreating the physical partition … something that obviously is not a valid solution. This is happening with the latest Ubuntu from November. For reference, I executed the same commands without specifying the cipher, so cryptsetup used its default software-based implementation, and the result was reliable encryption before and after rebooting the computer.
The performance and code quality of the MT7623 upstream crypto driver are better than those of the MT7621 proprietary driver, so there is no reason to request such a driver for the MT7623, thanks.
We can use the cryptodev module to create /dev/crypto and the openssl command to test performance.
With hw crypto:
root@LEDE:/# openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 103906 aes-128-cbc's in 0.07s
Doing aes-128-cbc for 3s on 64 size blocks: 103955 aes-128-cbc's in 0.09s
Doing aes-128-cbc for 3s on 256 size blocks: 103742 aes-128-cbc's in 0.11s
Doing aes-128-cbc for 3s on 1024 size blocks: 85149 aes-128-cbc's in 0.09s
Doing aes-128-cbc for 3s on 8192 size blocks: 51995 aes-128-cbc's in 0.05s
Without hw crypto:
root@LEDE:/# openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 3140090 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 930010 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 244271 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 61847 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 7761 aes-128-cbc's in 3.00s
Finally, if you see many interrupts after testing, the hw crypto is working.
I checked http://cryptodev-linux.org/ and found that it is a generic driver, not optimized for any specific SoC. I am curious how it achieves better performance than the mainstream kernel.
AF_ALG and Cryptodev give user space access to the cryptography primitives exposed by the kernel.
So, when using these APIs, we are calling the kernel. If the kernel has access to the hardware cryptography engine, these APIs will use the cryptography hardware. If the kernel has no such access, or the hardware has no cryptography capability, a software implementation will be used instead.
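Which implementation the kernel uses for a given algorithm can be checked in /proc/crypto: every registered implementation is listed with its name (e.g. cbc(aes)), its driver and a priority, and when several share a name the kernel prefers the one with the highest priority. The sketch below parses that file and picks the preferred driver; the function names are hypothetical, and it only looks at the name, driver, priority and internal fields (selftest results are ignored for simplicity).

```python
def parse_proc_crypto(text: str) -> list:
    """Split /proc/crypto text into one dict per algorithm entry."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():  # a blank line ends an entry
            if current:
                entries.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        entries.append(current)
    return entries


def preferred_driver(entries: list, name: str):
    """Driver with the highest priority for `name`, or None if none is registered."""
    candidates = [e for e in entries
                  if e.get("name") == name and "priority" in e
                  and e.get("internal", "no") == "no"]  # skip internal-only helpers
    if not candidates:
        return None
    return max(candidates, key=lambda e: int(e["priority"]))["driver"]
```

For example, preferred_driver(parse_proc_crypto(open("/proc/crypto").read()), "cbc(aes)") should name the MediaTek driver when the hardware engine is registered, and a software driver otherwise.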
So, using Cryptodev or AF_ALG will be faster than pure software cryptography such as Botan, and the time difference can be huge.
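As an illustration of the AF_ALG route, Python's standard socket module can drive the kernel's cbc(aes) directly (Linux, Python 3.6 or later). The kernel picks the implementation, hardware or software, so the same code works either way. The function name is hypothetical; the calls follow the socket module's AF_ALG interface.

```python
import socket


def afalg_aes_cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """Encrypt with the kernel's cbc(aes) through an AF_ALG socket (no padding)."""
    if len(plaintext) % 16:
        raise ValueError("CBC without padding needs a multiple of 16 bytes")
    with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as algo:
        # Ask the kernel for a symmetric cipher transform named cbc(aes).
        algo.bind(("skcipher", "cbc(aes)"))
        algo.setsockopt(socket.SOL_ALG, socket.ALG_SET_KEY, key)
        op, _ = algo.accept()  # one operation socket per request stream
        with op:
            op.sendmsg_afalg([plaintext], op=socket.ALG_OP_ENCRYPT, iv=iv)
            return op.recv(len(plaintext))
```

As a check, the first block of the NIST SP 800-38A CBC-AES128 example (key 2b7e151628aed2a6abf7158809cf4f3c, IV 000102030405060708090a0b0c0d0e0f, plaintext 6bc1bee22e409f96e93d7e117393172a) should encrypt to 7649abac8119b246cee98e9b12e9197d. Each call still pays the fixed per-request cost discussed earlier in the thread, so small buffers will not benefit much from the engine.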
# openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 3146569 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 1097400 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 305013 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 78430 aes-128-cbc's in 2.99s
Doing aes-128-cbc for 3s on 8192 size blocks: 9888 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 16384 size blocks: 4944 aes-128-cbc's in 3.00s
OpenSSL 1.1.0f 25 May 2017
built on: reproducible build, date unspecified
options:bn(64,32) rc4(char) des(long) aes(partial) blowfish(ptr)
compiler: gcc -DDSO_DLFCN -DHAVE_DLFCN_H -DNDEBUG -DOPENSSL_THREADS -DOPENSSL_NO_STATIC_ENGINE -DOPENSSL_PIC -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DAES_ASM -DBSAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DOPENSSLDIR="\"/usr/lib/ssl\"" -DENGINESDIR="\"/usr/lib/arm-linux-gnueabihf/engines-1.1\""
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
aes-128-cbc     16781.70k    23411.20k    26027.78k    26860.31k    27000.83k    27000.83k