Mali-450 support by lima

OK - it worked successfully for off-screen rendering - but rendering directly to the screen requires modifications to Mesa - specifically a DRI shim for the MediaTek HDMI output (basically it allows rendering on Lima (/dev/dri/card1) but with the final output written to an area of graphics memory using dmabuf).

After discussions on #dri-devel and #lima on freenode IRC - I have successfully written the shim and the R2 can now render directly to the screen :slight_smile:

I’m cleaning up my code now - making sure everything is compliant and any unneeded testing is reverted.

Then I’ll be updating my repositories with both the 4.20 kernel driver and the modified mesa with the Mediatek DRI shim driver - I’ll keep you posted.

How far are you with code-cleaning? We want to see how lima works :slight_smile:

Should be tomorrow :slight_smile: sorry for the delay - been busy, might have a surprise addition too :wink:

any news here? can you please upload to your github and make a pull request?

@JohnnyWednesday can you upload your code for testing?

Hiya - sorry for disappearing - I’ve had health issues but resolved now :slight_smile:

I’m going to port and test my changes for lima to your latest kernel revisions - there are additions to Xorg as well as the lima kernel driver. It was quite the pain because I needed to submit my additions up the lima project trees on gitlab and to your trees on github - sorting through it all was quite a nightmare given that ultimately only half a dozen files were touched or added.

lima performance wasn’t great - perhaps there were DTS related things / clock related things - perhaps it’s lima being too incomplete.

So my main focus was taking what I’d learned from the lima changes and getting some official mali blobs working on the R2 - for this I was using the sunxi version of the mali kernel driver along with their xorg code.

I’m confident I can get this working given that all the prerequisites have been proven with a working lima setup - and indeed I’ve gotten the kernel part working and detecting the mali450 - there are minor errors, the reported clock frequency was way too low (DTS? same reason lima is slow?) - but the module loaded just fine.

next will be the xorg driver part - there are caveats - given the FB device and the mali device are different nodes, you need to create a kind of stub driver which essentially blits from one part of memory to the FB display memory - the setup for this stub and the functions it makes use of are already proven with the lima code.

I need a few days to hack away at this - if I fail I will return to the lima driver and focus on performance.

lima is incomplete of course - BUT there’s enough functionality working that if you take care to only use working features, it’s useful. I.e. lima is perfectly capable of texture mapping a couple of triangles with filtering - so hardware rescaling for video / emulators is definitely in sight :smiley:

I’ve already written a kind of emulator/media front end called ‘BananaBox’ that brings all the best ARM optimized code that exists into one place, with the fast NEON rescaling code it works great for many things - but for emulating more powerful systems perfectly? I need the GPU to take some load off the CPU.

You start writing a front end then a few months later you’re knee deep in the kernel. Don’t you just love programming? :wink:

did you enable iommu in this case?

According to this post https://www.phoronix.com/scan.php?page=news_item&px=Lima-DRM-Ready-For-Linux-5.2 it seems that the Lima driver for Arm Mali 400/450 graphics processors will be included in the 5.2 linux kernel.

it seems to be this commit:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.2-rc4&id=a1d2a6339961efc078208dc3b2f006e9e9a8e119

needs to be added to dts too…maybe someone with more knowledge about lima can try this based on my 5.2-hdmi tree

https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.2-DRM-Update is maybe also interesting, link to merge for any troubles caused by it

Anyone tried getting lima on 5.2 working?

It works fine - you just need to add the mali DTS nodes, compile lima as a module and then you need the latest MESA source, add a stub for Mediatek - compile and install.

KMSCube as well as X11 egl tests work fine - however the framerate is very low.

I’ve also successfully compiled the SunXI mali kernel modules but none of the available userspace binary blobs will work. The interrupts exposed by the mali DTS nodes differ from other MP4 nodes on other boards, the order is different and the expected PMU is not present. Modifying the kernel driver to skip over the pmu allows it to function - but the userspace blobs do not.

It would appear that the vendor specific blobs that are in the wild? expect a different configuration.

With Lima - I’m currently trying to figure out how to solve the low framerate (around 5fps with KMSCube at 1080p) - given performance is apparently fine on other boards? the bottleneck likely lies with the mediatek DRM / framebuffer implementation.

But yes, with your 5.2-hdmi branch? only the Mali DTS nodes need adding to the kernel (and the lima module compiled - the FB enabled) - the greater part of the configuration is compiling MESA.

When I got it working with 4.19, I had to do extensive work with MESA in order to create a stub for mediatek that would allow rendering to the framebuffer.

But the latest MESA uses some form of generic access now (kmsro), so generating a new stub is as simple as copying and modifying a couple of config lines in order for the compilation to generate the needed dri/mediatek_dri.so

(src/gallium/targets/dri/target.c and src/gallium/targets/dri/meson.build - maybe somewhere else I’ve forgotten)

Here is the meson command I use for cross compiling MESA :

meson build/ \
  --cross-file=/home/johnny/R2/mesa-lima.cross.meson \
  --prefix=/usr/local \
  --libdir=lib/arm-linux-gnueabihf \
  -Dplatforms=x11,drm,surfaceless \
  -Dgallium-drivers=lima,kmsro \
  -Ddri-drivers=swrast \
  -Dgles2=true -Degl=true -Dgbm=true \
  -Dglx=auto \
  -Dllvm=false \
  -Dlibunwind=false "$@"

So yep - all working - just trying to figure out why it’s so slow!

Here are the DTS nodes as per the patches originally released by Ryder Lee? can’t remember where I got them from :

g3dsys: clock-controller@13000000 {
	compatible = "mediatek,mt7623-g3dsys",
		     "mediatek,mt2701-g3dsys",
		     "syscon";
	reg = <0 0x13000000 0 0x200>;
	#clock-cells = <1>;
	#reset-cells = <1>;
};

larb3: larb@13010000 {
	compatible = "mediatek,mt7623-smi-larb",
		     "mediatek,mt2701-smi-larb";
	reg = <0 0x13010000 0 0x1000>;
	mediatek,smi = <&smi_common>;
	mediatek,larb-id = <3>;
	clocks = <&clk26m>, <&clk26m>;
	clock-names = "apb", "smi";
	power-domains = <&scpsys MT2701_POWER_DOMAIN_MFG>;
};

mali: gpu@13040000 {
	compatible = "mediatek,mt7623-mali", "arm,mali-450";
	reg = <0 0x13040000 0 0x30000>;
	interrupts = <GIC_SPI 170 IRQ_TYPE_LEVEL_LOW>,
		     <GIC_SPI 171 IRQ_TYPE_LEVEL_LOW>,
		     <GIC_SPI 172 IRQ_TYPE_LEVEL_LOW>,
		     <GIC_SPI 173 IRQ_TYPE_LEVEL_LOW>,
		     <GIC_SPI 174 IRQ_TYPE_LEVEL_LOW>,
		     <GIC_SPI 175 IRQ_TYPE_LEVEL_LOW>,
		     <GIC_SPI 176 IRQ_TYPE_LEVEL_LOW>,
		     <GIC_SPI 177 IRQ_TYPE_LEVEL_LOW>,
		     <GIC_SPI 180 IRQ_TYPE_LEVEL_LOW>;
	interrupt-names = "gp", "gpmmu", "pp0", "ppmmu0", "pp1", "ppmmu1", "pp2", "ppmmu2", "pp";
	clocks = <&topckgen CLK_TOP_MMPLL>, <&g3dsys CLK_G3DSYS_CORE>;
	clock-names = "bus", "core";
	power-domains = <&scpsys MT2701_POWER_DOMAIN_MFG>;
	mediatek,larb = <&larb3>;
	resets = <&g3dsys MT2701_G3DSYS_CORE_RST>;
};

I don’t know if this is correct. Other Mali-450 boards list a “pmu” interrupt, which is also required by the official mali kernel driver.

Note that the SPI numbers jump from 177 to 180. I’ve seen other MP4 boards list “pp3” and “ppmmu3” interrupts, which makes sense given that the Mali-450 MP4 (which is apparently what the R2 has) is supposed to have 4 pp units (pp0 to pp3) and there’s a gap of two in that list.

Let’s be clear - I have no idea whatsoever if this entry is correct and I do suspect that it’s incomplete. I’ve been through all of the documentation I have, including the datasheets, but I couldn’t find anything pertaining to how the Mali-450 is implemented here. These DTS entries are all we have to go on right now.
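The gap is easy to check mechanically. A quick Python sketch, using only the numbers and names from the mali node above (it says nothing about what the skipped lines are actually wired to):

```python
# SPI numbers and interrupt names exactly as listed in the mali node above
spi_numbers = [170, 171, 172, 173, 174, 175, 176, 177, 180]
interrupt_names = ["gp", "gpmmu", "pp0", "ppmmu0", "pp1", "ppmmu1",
                   "pp2", "ppmmu2", "pp"]

# Every interrupt name needs exactly one interrupt specifier
assert len(spi_numbers) == len(interrupt_names)

# SPI numbers skipped inside the otherwise contiguous range
present = set(spi_numbers)
missing = [n for n in range(min(spi_numbers), max(spi_numbers) + 1)
           if n not in present]

print("missing SPI numbers:", missing)
```

It reports 178 and 179 as the only skipped numbers, which is where a fourth pp / ppmmu pair would fit if the ordering follows the same pattern as the other entries.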

also larb3 needs adding to iommu :

iommu: mmsys_iommu@10205000 {
	compatible = "mediatek,mt7623-m4u",
		     "mediatek,mt2701-m4u";
	reg = <0 0x10205000 0 0x1000>;
	interrupts = <GIC_SPI 106 IRQ_TYPE_LEVEL_LOW>;
	clocks = <&infracfg CLK_INFRA_M4U>;
	clock-names = "bclk";
	mediatek,larbs = <&larb0 &larb1 &larb2 &larb3>;
	#iommu-cells = <1>;
};

Yes, you should add <GIC_SPI 178 IRQ_TYPE_LEVEL_LOW>, <GIC_SPI 179 IRQ_TYPE_LEVEL_LOW>,

so

mfg_irq_gp_b, mfg_irq_gpmmu_b, mfg_irq_pp0_b, mfg_irq_ppmmu0_b, mfg_irq_pp1_b, mfg_irq_ppmmu1_b, mfg_irq_pp2_b, mfg_irq_ppmmu2_b, mfg_irq_pp3_b, mfg_irq_ppmmu3_b, mfg_irq_pp_b,

Just note that there’s no need to add mediatek,larb = <&larb3>; for the latest kernel. https://lore.kernel.org/patchwork/cover/1086579/

and mali could also work even if you remove the larb3 and iommu stuff
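Putting those suggestions together, the mali node would look roughly like this. It is only a sketch assembled from the snippets in this thread: the assignment of SPI 178/179 to pp3/ppmmu3 follows the signal order listed above, the mediatek,larb property is dropped as suggested for recent kernels, and everything else is copied unchanged from the earlier node.

```dts
mali: gpu@13040000 {
	compatible = "mediatek,mt7623-mali", "arm,mali-450";
	reg = <0 0x13040000 0 0x30000>;
	interrupts = <GIC_SPI 170 IRQ_TYPE_LEVEL_LOW>,
		     <GIC_SPI 171 IRQ_TYPE_LEVEL_LOW>,
		     <GIC_SPI 172 IRQ_TYPE_LEVEL_LOW>,
		     <GIC_SPI 173 IRQ_TYPE_LEVEL_LOW>,
		     <GIC_SPI 174 IRQ_TYPE_LEVEL_LOW>,
		     <GIC_SPI 175 IRQ_TYPE_LEVEL_LOW>,
		     <GIC_SPI 176 IRQ_TYPE_LEVEL_LOW>,
		     <GIC_SPI 177 IRQ_TYPE_LEVEL_LOW>,
		     <GIC_SPI 178 IRQ_TYPE_LEVEL_LOW>, /* pp3 (added) */
		     <GIC_SPI 179 IRQ_TYPE_LEVEL_LOW>, /* ppmmu3 (added) */
		     <GIC_SPI 180 IRQ_TYPE_LEVEL_LOW>;
	interrupt-names = "gp", "gpmmu", "pp0", "ppmmu0", "pp1", "ppmmu1",
			  "pp2", "ppmmu2", "pp3", "ppmmu3", "pp";
	clocks = <&topckgen CLK_TOP_MMPLL>, <&g3dsys CLK_G3DSYS_CORE>;
	clock-names = "bus", "core";
	power-domains = <&scpsys MT2701_POWER_DOMAIN_MFG>;
	/* mediatek,larb = <&larb3>; left out, see the note above */
	resets = <&g3dsys MT2701_G3DSYS_CORE_RST>;
};
```

The order of interrupt-names has to match the order of the interrupts list entry for entry, so the two new names go between "ppmmu2" and "pp".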

@JohnnyWednesday can you try https://github.com/frank-w/BPI-R2-4.14/tree/5.3-lima ? i’ve got Patches from @ryder.lee for testing

can you please also check if fbcon is working (show console on hdmi while bootup) and if it works give me your settings (resolution, maybe hsync/vsync)?

i tested also with 1080p on my TV which does not work in console-mode…only x-server works on both resolutions

Thank you very much for the additional information - I will make the changes and experiment

Will test ASAP - all I did was enable the fbcon in the kernel (as well as enabling lima as a module along with the above DTS changes) and it worked (to confirm - yes, the console on HDMI while booting up - I don’t automatically start an X server and have been testing KMSCube by launching it from the console as well)

If it’s also working with this specific branch? I will endeavour to get as much information about the mode with these specific TVs (the TVs report 1080p - the size of the console characters screams 1080p) and report back.

EDIT :

framebuffer console resolution - TV reports 1080p 1920x1080 60Hz

fbset reports

mode "1920x1080"
geometry 1920 1080 1920 2160 32
timings 0 0 0 0 0 0 0
accel true
rgba 8/16,8/8,8/0,0/0
endmode
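For anyone reading that output: the geometry line is xres, yres, virtual xres, virtual yres and bits per pixel. A small Python sketch of what those numbers mean in memory terms (the values are copied from the fbset output above; reading the doubled virtual height as double buffering is an assumption, not something the output states):

```python
# geometry <xres> <yres> <vxres> <vyres> <depth>, copied from fbset above
xres, yres, vxres, vyres, depth_bits = 1920, 1080, 1920, 2160, 32

bytes_per_pixel = depth_bits // 8

# Memory for one visible frame and for the whole virtual framebuffer
visible_bytes = xres * yres * bytes_per_pixel
virtual_bytes = vxres * vyres * bytes_per_pixel

# How many full frames fit in the virtual area (2 would suggest page flipping)
pages = virtual_bytes // visible_bytes

print(f"one frame     : {visible_bytes} bytes")
print(f"virtual buffer: {virtual_bytes} bytes")
print(f"frames in virtual buffer: {pages}")
```

So each full-screen frame is a little under 8 MiB, and anything that copies a rendered buffer into the framebuffer has to move that much per frame at 1080p.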

Here’s the output from dmesg running 5.2.0-rc1-bpi-r2-hdmi :

dmesg.txt (58.0 KB)

arg…missed that…

now i see a login-prompt…no messages while booting, but a prompt…also had an error in my cmdline for tftp (loaded last kernels with tftp) …

setenv bootargs "${netbootargs} ${console} ${graphic}"

in definition of variable bootnet (if anyone does the same as me)

IMG_20190727_110905_888

btw. i had not enabled lima yet…have done it now in 5.3-lima

Oh dear - that’s my fault. Yeah it didn’t work the very first time I tried it and while pondering the issue I just happened to notice that it wasn’t enabled in the kernel. I didn’t think it was worth mentioning as I thought it had been turned off for a reason - sorry!

At least that’s one less problem to solve :wink:

Adding pp3 and ppmmu3 to the DTS did result in a very minor improvement to framerate (tested on 5.2 and 5.3). Removing larb3 and its references entirely works perfectly fine, and performance is seemingly identical without it.

Assuming that people don’t have an issue compiling MESA? it’s very trivial to modify and compile to generate the needed “dri/mediatek_dri.so” as part of the ‘kmsro’ feature of MESA.

I’ve seen discussions on the Lima GIT that suggest I should be seeing much better performance (framerates) than I do - es2gears_x11 gives me 18fps while kmscube is seemingly lower at about 7fps.

Given es2gears_x11 uses a small window (about 400x400) and kmscube was fullscreen (at 1080p) - that relative performance MIGHT be scaling with the size of the egl buffer - which is ultimately being transferred to our video framebuffer.

If the relative performance is a factor of buffer size? that would strongly hint at the bottleneck being related to the blitting/copying operation - or supporting functions.
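To put rough numbers on that, here is a back-of-the-envelope Python sketch using only the figures quoted in this thread. The 400x400 window size is the approximate value mentioned above, not a measurement, and the two programs are not otherwise identical, so this is illustrative only.

```python
# Figures quoted in this thread; the es2gears_x11 window size is the
# "about 400x400" estimate, not a measured value.
gears_w, gears_h, gears_fps = 400, 400, 18.0
cube_w, cube_h, cube_fps = 1920, 1080, 7.0

gears_pixels = gears_w * gears_h
cube_pixels = cube_w * cube_h

pixel_ratio = cube_pixels / gears_pixels   # how much bigger the buffer is
fps_ratio = gears_fps / cube_fps           # how much slower kmscube runs

# If frame time were purely proportional to buffer size,
# kmscube would be expected to run at this framerate:
predicted_cube_fps = gears_fps / pixel_ratio

# Pixels pushed per second in each case
gears_throughput = gears_pixels * gears_fps
cube_throughput = cube_pixels * cube_fps

print(f"buffer is {pixel_ratio:.2f}x larger, framerate is {fps_ratio:.2f}x lower")
print(f"pure size scaling would predict {predicted_cube_fps:.2f} fps for kmscube")
print(f"throughput: {gears_throughput:.0f} px/s vs {cube_throughput:.0f} px/s")
```

With these numbers the buffer is about 13 times larger while the framerate drops by only about 2.6 times, so the cost does not look purely proportional to buffer size; a fixed per-frame overhead plus a size-dependent part would fit better. Two data points from two different programs can't settle that - running the same test at several buffer sizes would be the next thing to measure.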

Hopefully, now that more people can play around with it, we can improve performance.

Lima isn’t ‘complete’ yet - but many functions are - discussions on GIT suggest that MythTV and even Kodi will work with Lima.

From a programmer’s perspective? we’ll have a fast, real-time way to scale graphics - so it’ll enable rescaling video, emulators etc

Lima also has various ‘hooks’ into acceleration on Xorg - I’m not sure what form they take currently but I imagine that scaling and various color operations / compositing etc can at least benefit from Lima at this stage.

How does your dts differ from mine (5.3-lima)? i have also removed larb3, and have pp3/ppmmu3 in mali-node as interrupts…

Can you make a step-by step guide? Maybe with crosscompile? And how to register it with xorg