Faster DrawImage?

Started by Blainiac, September 08, 2018, 06:18:16


Blainiac

Hey everyone!  I was able to fix my last problem thanks to col pointing me in the right direction, so thank you.

I have a DrawImage-intensive program that's drawing a few thousand images and the framerate is subpar.  Is there a way to improve the performance using a different driver/routine or some other trick?  Thanks!

Yellownakji

I think BlitzMax can be multi-threaded?  Have you looked into that?

Derron

Threading won't help as all rendering needs to be done in the main thread.

Things for improvement:

Fewer draw calls
DrawImage = 1 draw call
DrawText("123456789") = 9 draw calls (one per glyph)
...


Fewer texture switches
With "a" and "b" being different images ("textures"):
DrawImage(a, 10,10) + DrawImage(b, 10, 20) + DrawImage(a, 10, 30)
leads to 3 texture switches (A, then B, then A again).

Reordering to
DrawImage(a, 10,10) + DrawImage(a, 10, 30) + DrawImage(b, 10, 20)
leads to only 2 switches (A stays active for the first two draws, then B).
Other engines call this "batching"; in BlitzMax you need to organize the draw order manually.
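
The switch counts above can be checked with a tiny model. This is only a sketch in Python, not BlitzMax code: each list entry stands in for one DrawImage call, and the function name `count_texture_switches` is made up for illustration.

```python
def count_texture_switches(draw_order):
    """Count how often the active texture changes across a sequence of draws."""
    switches = 0
    bound = None  # nothing is active before the first draw
    for texture in draw_order:
        if texture != bound:
            switches += 1
            bound = texture
    return switches


# Each entry stands in for one DrawImage call; only the image name matters here.
interleaved = ["a", "b", "a"]
grouped = sorted(interleaved)  # ["a", "a", "b"]

print(count_texture_switches(interleaved))  # 3
print(count_texture_switches(grouped))      # 2
```

Note that grouping draws by image changes the order in which things are painted, so it is only safe where overlapping sprites do not depend on that order.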


----

So what could you do: use sprite atlases (multiple sprites/"images" on one single bigger image). That means you use "DrawSubImageRect()" to draw just the relevant portion of the image. Assume you have all tiles of a jump'n'run level in one spritesheet. Drawing 1000 tiles still leads to 1000 draw calls, but only to 1 texture switch to activate the spritesheet texture.
Texture switches are rather "expensive".
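
To pick a tile out of a spritesheet you need its source rectangle. A minimal sketch in Python, assuming a hypothetical atlas where all tiles have the same size and are stored row by row; the helper name `tile_source_rect` and the 256-pixel atlas width are invented for the example. The four values it returns are the kind of source rectangle a sub-image draw such as DrawSubImageRect() works with.

```python
def tile_source_rect(tile_index, tile_w, tile_h, atlas_w):
    """Return (sx, sy, sw, sh) of a tile inside a row-major grid atlas."""
    columns = atlas_w // tile_w          # tiles per atlas row
    col = tile_index % columns
    row = tile_index // columns
    return (col * tile_w, row * tile_h, tile_w, tile_h)


# Assumed layout: 256 pixel wide atlas with 32x32 tiles, so 8 tiles per row.
print(tile_source_rect(0, 32, 32, 256))   # (0, 0, 32, 32)
print(tile_source_rect(10, 32, 32, 256))  # (64, 32, 32, 32)
```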

To reduce draw calls you might consider merging separate stuff into a single new texture. E.g. instead of rendering the glyphs of "1, 2, 3, 4, ... 9" in the DrawText command above, you would render them _one_time_ onto a new texture, which then contains "123456789" as a single image. You then just need to draw this single image, reducing 9 draw calls to a single one (but adding a texture switch). Text blocks especially benefit from such "caching" (imagine you draw big tooltips with a lot of formatted text). When the text changes you invalidate the cache and create a new texture...
If you do not want to code your own software rendering (drawing pixel by pixel onto a newly created pixmap) you could use "render to texture", which our beloved user "Col" nicely wrapped into a convenient module/source files. Others did similar stuff, but for either GL _or_ DX. Col made it easier for us cross-platform users.

Col's code:
https://github.com/davecamp/Render2Texture

For stuff that does not move separately and is not animated, such "r2t" images will help to improve performance:
- texts
- sprites built from different components
- gui stuff...
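
The "render once, invalidate when the text changes" idea can be modelled like this. It is a sketch in Python rather than BlitzMax, and `TextCache` and `fake_render` are invented names: `fake_render` merely stands in for whatever render-to-texture step you use and does not reflect the Render2Texture API.

```python
class TextCache:
    """Keeps one rendered 'texture' and rebuilds it only when the text changes."""

    def __init__(self, render):
        self._render = render    # hypothetical expensive render-to-texture step
        self._text = None
        self._texture = None
        self.render_count = 0    # how often the expensive step actually ran

    def get(self, text):
        if text != self._text:   # text changed: invalidate and rebuild
            self._texture = self._render(text)
            self._text = text
            self.render_count += 1
        return self._texture


def fake_render(text):
    # Stand-in for drawing every glyph once into an off-screen texture.
    return "texture<" + text + ">"


cache = TextCache(fake_render)
for _ in range(60):              # 60 frames with unchanged text
    cache.get("123456789")
cache.get("987654321")           # the text changes once
print(cache.render_count)        # 2
```

Per frame you would then draw the cached texture with a single draw call instead of one call per glyph.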


bye
Ron

Blainiac

Ron,

Thank you for this valuable information, I will be playing around with some new ideas based on what you said and see if I can improve the performance.  I'll leave a longer response soon but for now I wanted to say thank you!  There is a lot there I didn't even know about.

-Blain

Rooster

@Derron
What's the technical reason that rendering needs to be done on the main thread?

col

Quote:
@Derron
What's the technical reason that rendering needs to be done on the main thread?

If I may chime in
I did start writing some information as to why this is the case, but it turned into a huge wall of text and not very good reading, so I've cut it all down to this...

The rendering (using the GL, Dx7, Dx9 and Dx11 APIs in BMax) is done on the main thread because that's the thread used for UI message processing, with which the driver needs to sync presentation and rendering. Presentation is tightly coupled with the OS, while rendering is done on the GPU.

Rendering doesn't HAVE to be on the main thread at all (but literally everyone's code does it that way when using BMax). The drivers expect all API commands to come from the SAME thread. Internally there is not much synchronisation (if any) done for the API calls. Rendering from different threads suffers from race conditions in the same way CPU code does when multi-threading: there is only 1 GPU. You can come up with elaborate schemes yourself to allow rendering from different threads by using your own command buffers to store the commands and data. However, those commands still have to be sent to the GPU from the same 'submitting' thread. That's one of the main reasons (other than the political ones) that the newer APIs were brought to desktop: fine-grained control over rendering from multiple threads.
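
A rough model of such a command-buffer scheme, written in Python rather than BMax and not tied to any real graphics API: worker threads only record commands into a thread-safe queue, and one thread alone drains it and "submits". The names `record_draws` and `submitted`, and the numbers 4 and 100, are made up for illustration.

```python
import queue
import threading

commands = queue.Queue()  # thread-safe command buffer


def record_draws(worker_id, count):
    # Worker threads only record commands; they never touch the graphics API.
    for i in range(count):
        commands.put(("draw", worker_id, i))


workers = [threading.Thread(target=record_draws, args=(w, 100)) for w in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()

submitted = []
submit_thread = threading.current_thread().name
while not commands.empty():
    cmd = commands.get()
    # Stand-in for the real API call, always issued from this one thread.
    submitted.append((submit_thread, cmd))

print(len(submitted))  # 400
```

The recording work is spread over several threads, but every "API call" still comes from the single submitting thread.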
https://github.com/davecamp

"When you observe the world through social media, you lose your faith in it."

Rooster