Threaded Loader Test NG with animated Shaders

Started by Krischan, April 18, 2021, 15:57:43

Krischan

I've experimented with threads in BlitzMax NG to speed up the loading process in my current project, using parallel loader threads and a loading animation driven by shaders in OpenB3D.

The shaders are mostly taken from www.shadertoy.com (the link to the original shader is always given in the shader file; the license should mostly be CC 3.0, see the Shadertoy terms) and adjusted to fit the loading process. Some are just for fun or to test performance. I've tried to organize the fragment shaders into categories, as there are currently 333 :o different loading shaders available. I've been very busy adding and testing them, and it was a lot of fun.

When you start the demo, the loader asks for a single fragment shader to load: either select one from the "shader" subfolder structure or press Escape to use the default one I've created for this demo. There are three different approaches available: basic, simple and advanced:

- Basic just loads in the background and only animates the shader
- Simple also shows minimal info about the loading process (percent and bytes)
- Advanced only works on Windows, as it uses Win32 extensions to get and show detailed information about the system during the loading process.

And here is the demo, have fun. Be aware that A LOT OF MEMORY is used during the loading process, so if you don't have at least 16 to 32GB of RAM, quit the demo with Escape before it blows up your system ;D

If you have problems compiling the source, I've uploaded a ZIP with my portable, lightweight BlitzMax NG + BLIDE + OpenB3D development environment. It should work out of the box from any path, at least it does here 8) "Lightweight" means I've excluded Brucey's modules from this ZIP to save space, as they're not needed for this demo yet.

Download: LoaderTestNG.zip
Download: BlitzMaxNGlight.zip

EDIT: adding a custom Shadertoy shader is quite simple:

1. copy the file "shader/empty.frag" to myshader.frag
2. edit myshader.frag in a text editor
3. open a Shadertoy shader, for example https://www.shadertoy.com/view/NdsSDM
4. copy the Shadertoy code and paste it into myshader.frag between the two comment blocks in the middle of the file
5. in the last line, add: fragColor.rgb*=iFade; (just before the second comment block!)
6. optional: in the last line, add: fragColor.a=1.0; (if there is an issue with alpha shining through)
7. in the shader block, replace all "iTime" with "TIME" (or leave it, as it may run already)
8. save and test the shader
9. if something is not working, check the fragment shader for errors with the included "glslangValidator" tool
    (syntax: glslangValidator.exe myshader.frag)
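
Steps 5 and 7 are mechanical text edits, so they can be scripted. The following is a hypothetical Python helper (not part of the demo) that assumes "the last line" means the last statement inside mainImage(), i.e. just before the final closing brace of the pasted code:

```python
import re

def patch_shadertoy_main(src: str, force_opaque: bool = False) -> str:
    """Apply steps 5-7 to pasted Shadertoy code (hypothetical helper)."""
    # Step 7: rename the time uniform; \b keeps names like "iTimeDelta" untouched.
    src = re.sub(r"\biTime\b", "TIME", src)
    # Steps 5/6: insert the fade (and optional alpha fix) before the last closing
    # brace, assuming that brace closes mainImage().
    end = src.rstrip().rfind("}")
    if end < 0:
        raise ValueError("no closing brace found")
    extra = "    fragColor.rgb*=iFade;\n"
    if force_opaque:
        extra += "    fragColor.a=1.0;\n"
    return src[:end] + extra + src[end:]
```

Shaders using iDate, iFrame or iChannelResolution still need manual rewriting; the helper only covers the simple cases.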

Sometimes Shadertoy variables like iDate or iFrame are used. This is too complex to explain here, but the shader must then be rewritten slightly to use the iTime variable instead. The same applies when iChannelResolution arrays are used. It is currently impossible to use multilayer (multi-pass) shaders, which require the output of another shader as an input; only a shader with a single layer can be used in this loader. Some shaders have a bad coding style and don't account for resolutions or aspect ratios other than those of the Shadertoy window. Then it depends on your GLSL knowledge whether you get it running.

Another case is when the shader has a "Common" section. Then just add the "Common" section directly after the first comment block but before the main shader. It contains functions needed by the main shader and must appear before it (example: https://www.shadertoy.com/view/ssfXD4)
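
The ordering rule exists because GLSL requires a function to be declared before it is called. As a Python sketch, assuming the template is split into a hypothetical header (everything up to the first comment block) and footer (from the second comment block on), the assembly looks like this:

```python
def assemble_frag(header: str, common: str, main: str, footer: str) -> str:
    """Join the template parts so Common helpers precede mainImage() (hypothetical)."""
    # GLSL needs functions declared before use, so Common goes before the main shader.
    parts = [header, common, main, footer]
    # Skip empty parts (most shaders have no Common tab) and trim stray blank lines.
    return "\n".join(p.strip("\n") for p in parts if p.strip())
```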
Kind regards
Krischan

Windows 10 Pro | i7 9700K @ 3.6GHz | RTX 2080 8GB
Metaverse | Blitzbasic Archive | My Github projects

Pingus

That's impressive!
I didn't know that OpenB3D was working that well.
I guess that there is no way to use 2D rendering and 3D rendering at the same time?

Krischan

What do you mean by "there is no way to use 2D rendering and 3D rendering at the same time"? You're already seeing 2D and 3D at the same time in this demo: the loader is just a sprite in front of the 3D camera, with the shader running as a texture attached to this sprite. It obscures only the background, which is normal 3D. When the loading process is finished, you can see the sprite fading away (becoming transparent) and revealing the 3D environment behind it.
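
The fade-out is ordinary alpha blending of the sprite over the 3D scene. A Python sketch of the per-pixel math, assuming a linear fade with hypothetical fade_start and fade_duration values (the demo's actual timing isn't stated):

```python
def loader_alpha(elapsed: float, fade_start: float, fade_duration: float) -> float:
    """Sprite opacity: 1.0 while loading, then a linear fade down to 0.0."""
    if elapsed <= fade_start:
        return 1.0
    if fade_duration <= 0:
        return 0.0
    t = (elapsed - fade_start) / fade_duration
    return max(0.0, 1.0 - t)

def composite(loader_rgb, scene_rgb, alpha):
    """Standard 'over' blend of the loader sprite in front of the 3D scene."""
    return tuple(alpha * l + (1.0 - alpha) * s for l, s in zip(loader_rgb, scene_rgb))
```

At alpha 1.0 only the loader is visible; at 0.0 only the 3D scene remains.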

And during the whole loading process, you see 2D on 3D - the text is 2D. And yes, OpenB3D is very sophisticated and capable of many things, if you know how.
Kind regards
Krischan

Pingus

I thought that it was not possible to draw 2D over a 3D rendering, which obviously works ::)
Running your BLIDE version allowed me to compile the samples.
In debug mode it is pretty unstable and quickly crashes my PC. A standard build works somehow but becomes unstable after a few runs. Still, this is very interesting.
I wonder now if 3D can be rendered over 2D. It would mean that 3D effects could enhance a 2D game, which would be great.

Krischan

Well, debug mode crashes because the LoadTest function is not very advanced yet. There is only a simple "stumble" test of whether the mesh/texture/pixmap has been loaded or not, but no test of whether the mesh is really available. That's why this works in the basic demo but can crash in simple/advanced mode.

It looks like the crashes in debug mode happen when mesh creation is too slow there, so the texture is loaded before the mesh has been created. This almost never happens in release mode running at full speed. I do not have much experience with threaded code yet, but there is much more to consider than in single-threaded code.

In my project, the textures will be applied after loading/creating the meshes, so I didn't put much effort into this timing issue. It's a test, not a polished product ;D
Kind regards
Krischan

col

Hiya,

I enjoy a bit of multi-threading hehe.

Upon first looking at your LoadTest function: you have race conditions on the Global variables, and the iLoaded variable is especially bad. Any thread can execute the statement that updates those Globals at any unpredictable time, which will mess things up badly further down the function, where the loaded resource is assigned to the array slot. I'm sure there are more issues like this.

You have to think about what threads will do all at the same time, and also assume that if something can happen, then it WILL happen. Keep as many variables as possible local to the threaded function so no other thread interferes with them, and guard global values so that only 1 thread can update them at a time. Then it will all work harmoniously :)

https://github.com/davecamp

"When you observe the world through social media, you lose your faith in it."

Krischan

Thanks for the hint, col. I'm still impressed by how fast some AAA games load their huge textures and environments today. Compared to that, my old single-threaded serial loading approach takes ages, so I decided to dig deeper into this multithreaded loading stuff.

OK, this demo loads the same texture over and over again, so it's cached, but even with many different textures multithreaded loading is much faster. So I guess if I separate mesh creation from texture loading and combine them later, when everything is created and loaded, they can't interfere with each other, right?
Kind regards
Krischan

col

Quote: I'm still impressed how fast some AAA games can load their huge textures and environments today

Yep, not only are they multi-threaded with loading but they also stream geometry and textures in too (which definitely requires more than one thread to do :) ).

With pretty much every computer (even your phone) having multiple cores, it's such a waste of potential processing speed not to use them.

Quote: So I guess when I separate the mesh creation process from the texture loading and combine them later when everything is created and loaded it can't interfere each other, right?
Once everything is loaded and you only have 1 thread accessing them then it's perfectly Ok.
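
A Python sketch of that split, with hypothetical load_texture and build_mesh stand-ins. Each task only returns its own result and never writes shared state. The pairing happens afterwards on a single thread, so no locking is needed. Whether OpenB3D lets you build meshes off the main thread is not covered here:

```python
from concurrent.futures import ThreadPoolExecutor

def load_texture(path):   # hypothetical stand-in for decoding a texture file
    return f"pixels of {path}"

def build_mesh(name):     # hypothetical stand-in for mesh creation
    return f"mesh {name}"

def load_scene(names, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each task returns its result instead of writing to a shared global.
        textures = list(pool.map(load_texture, [n + ".tga" for n in names]))
        meshes = list(pool.map(build_mesh, names))
    # list(pool.map(...)) has already waited for every task, so from here on
    # only this thread touches the results: combining them needs no mutex.
    return {n: (m, t) for n, m, t in zip(names, meshes, textures)}
```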


Here are 2 simple rules to remember that will help keep you out of trouble:-

Resource :- Absolutely anything including all object instances and even simple integer/float/string variables.

READING a shared "resource" can be done safely without protection against race conditions from multiple threads, as long as ALL threads are only READING that "resource". If there is a chance that any other thread will update or WRITE to that "resource" while a thread is READING from it, then you need to protect the updating and reading so that only one thread can READ/WRITE it at a single moment in time.

WRITING to a shared "resource" can be done safely without protection against race conditions as long as you can guarantee that ONLY 1 THREAD is writing at a single moment and no other thread is reading it. If there is a chance that any other thread will WRITE to or READ that "resource" at the same time, then you need to protect the updating and reading of that "resource" so that only one thread can READ/WRITE it at a single moment in time.



Those rules are essentially the same except seen from the READING point of view and the WRITING point of view.

As an example, take your loading function, which is run by all threads at the same time. You have the 'iLoaded' variable that is modified and later used as an index. Remember that more than one thread is running that function at the same time: one thread may execute the 'iLoaded :+ 1' statement at the SAME time as another thread is executing the 'iPixmaps[iLoaded] =... ' statement (there is more to think about than this, but I want to keep it simple to understand). What happens here is this:-

Thread1 reads iLoaded as '2', increments it to '3' and goes to the next statement; meanwhile...
Thread2 reads iLoaded as '3', increments it to '4' and goes to the next statement.

Thread1 now loads a pixmap and puts the result into the array at index 'iLoaded', but iLoaded is now 4, although Thread1 set it to 3! This is known as a 'race condition'.
Thread2 loads a pixmap at the same time and puts the result into the array at the same index as Thread1! Data from 2 pixmaps could be mixed into 1, and slot 3 is never filled.
At best you get scrambled, intermixed data; at worst, a crash.
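
Real races are timing dependent and hard to reproduce, so this Python sketch replays the exact interleaving above step by step, without real threads, to show the outcome deterministically:

```python
def simulate_unprotected(start: int):
    """Replay the interleaving described above (no real threads involved)."""
    shared = {"iLoaded": start}
    slots = {}
    shared["iLoaded"] += 1   # Thread1: iLoaded :+ 1  (2 -> 3)
    shared["iLoaded"] += 1   # Thread2: iLoaded :+ 1  (3 -> 4) before Thread1 stores
    # Both threads now index the array with the *current* global value.
    slots.setdefault(shared["iLoaded"], []).append("pixmap from Thread1")
    slots.setdefault(shared["iLoaded"], []).append("pixmap from Thread2")
    return shared["iLoaded"], slots
```

Slot 4 receives two writes and slot 3 stays empty, which matches the scenario above.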

The best way is to not use global variables in threaded functions, but you can't always have that luxury so one way to protect against this situation is to protect concurrent access to the globals with a Mutex.
Take for example your 'Global iLoaded':

Code (BlitzMax)

' Make a Global mutex. Only one thread can lock a mutex at a time: even if several threads try to lock it
' at the same time, only one succeeds and all other threads sleep until the mutex is unlocked. The OS then
' wakes ANY thread waiting to lock it; threads that don't win the lock go back to sleep and wait again.
' You can see why this isn't the most efficient method, but it will work.
Global iLoadedMutex:TMutex = CreateMutex()

' Loading function - called from each thread
Local Loaded:Int
LockMutex(iLoadedMutex)       '  \   Use a mutex to prevent more than one thread from running this small section of code
iLoaded :+ 1                  '   \  You can safely update the Global knowing only 1 thread can do so at a time now
Loaded = iLoaded              '   /  Keep a local copy of the Global, as no other thread can touch this Local
UnlockMutex(iLoadedMutex)     '  /   Unlock the mutex so that another thread can now run this protected section of code

[...]
' Use the 'Local Loaded' variable, as no other thread can modify a 'Local' in a threaded function.
iPixmap[Loaded] =...
iTexture[Loaded] =...
[...]


One of the worst things about multi-threading is 'It works perfectly OK on my machine but crashes on another.' This is just down to blind luck, good and bad :P

Derron

I am not sure why col posted it (maybe I just skipped the part where someone asked for it), so I'll just chime in now and do as col did :D

In my game (TVTower) I load all resources in an extra thread (hmm, I could spawn more ...). Each resource gets a TRessourceItem (or similar) assigned, which is placed in a queue. Resource items can lead to more resources being loaded (e.g. an XML resource containing more XML links, or images, or ...).
Each resource can be defined as "immediate load" or not (the default). This is useful if you want to display a start screen as soon as possible, etc.
All this loading is protected by lockmutex/unlockmutex.
Besides "updating/writing" the resource queue, you can also lock things that run _after_ loading is done. E.g. you could load a "script" that needs to execute and would create objects.

OK, so I am using mutexes as col proposed. But I also mentioned deferred loading: resources that are not "immediate" just get appended to a queue and are processed once there is "time" to do it. This way I only make sure that the GUI stuff, the mouse cursor and specific sounds are loaded, so the start screen can be displayed (with animation, language selection ...) while the "game resources" (needed in all "games", so nothing "scenario related") are still being loaded in the extra thread.

If you set e.g. "XML files" to immediate load mode, you can calculate the total number of resources "to load" and so are easily able to show a progress bar.
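
A Python sketch of the immediate/deferred idea. It is not TVTower's actual code; ResourceLoader and load_fn are hypothetical. Immediate items are loaded on the caller's thread, and deferred ones go into a queue drained by one background thread. The counters sit behind a lock so the progress bar can read them safely from the main thread:

```python
import queue
import threading

class ResourceLoader:
    """Hypothetical immediate/deferred loading queue with progress reporting."""

    def __init__(self, load_fn):
        self.load_fn = load_fn
        self.lock = threading.Lock()
        self.deferred = queue.Queue()
        self.loaded = {}
        self.total = 0

    def add(self, name, immediate=False):
        with self.lock:
            self.total += 1            # resources discovered later also raise the total
        if immediate:
            self._store(name, self.load_fn(name))   # blocks the caller (start screen assets)
        else:
            self.deferred.put(name)                 # handled later by the background thread

    def _store(self, name, data):
        with self.lock:
            self.loaded[name] = data

    def progress(self):
        with self.lock:
            return len(self.loaded) / self.total if self.total else 1.0

    def run_background(self):
        def worker():
            while True:
                name = self.deferred.get()
                if name is None:       # sentinel: no more work
                    break
                self._store(name, self.load_fn(name))
        thread = threading.Thread(target=worker)
        thread.start()
        return thread

    def finish(self, thread):
        self.deferred.put(None)
        thread.join()
```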

For BlitzMax it is very important to know that you CANNOT load TImages (LoadImage()) in threads ... and if it somehow works, avoid it! So all the images I load are actually loaded as TPixmap (i.e. just into RAM, not into GPU/VRAM). They are then marked for further "processing", and the main thread turns them into TImages if required. The slow part of "LoadImage()" is decoding the image file into a pixmap, not creating the image from it.
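
A Python sketch of that hand-off, with string stand-ins for LoadPixmap and LoadImage. A worker decodes files into RAM and queues them, and the main thread drains the queue once per frame. The per-frame upload budget is an assumption, one way to spread GPU uploads over several frames:

```python
import queue
import threading

ready = queue.Queue()   # decoded "pixmaps" waiting for the main thread

def decode_worker(paths):
    for p in paths:
        pixels = f"decoded {p}"        # stand-in for LoadPixmap: file -> RAM, thread-safe
        ready.put((p, pixels))

def main_thread_frame(images, budget=2):
    """Call once per frame on the main thread: turn at most `budget` pixmaps into images."""
    for _ in range(budget):
        try:
            path, pixels = ready.get_nowait()
        except queue.Empty:
            break
        images[path] = f"GPU image from {pixels}"   # stand-in for LoadImage(pixmap)
```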


TL;DR: Yeah, use mutexes to protect the "data collections" in which you store your loaded information. Do not load TImages from within a thread. Think about "lazy loading" assets (don't load all of them right at the start; show the game screen earlier).


bye
Ron

Krischan

Whoah thanks guys. But now you have scared me a bit about multithreading. :o

I think I have to dig deeper into this subject, but I won't use it excessively, only for loading. I've already experimented with my own streamlined texture format that combines my diffuse/normal/PBR textures into one file, but the speed increase was not that big compared to TGA loading. I want to keep this "KISS", silly simple.

Another "trick" I've seen in AAA games is to load all textures at a low resolution first, determine which textures are close to the player at the start, and then continuously replace them with higher-resolution versions loaded in the background while the player moves through the world. This is a clever solution for impatient players, as the level can be loaded very quickly with only a small, barely noticeable loss of detail.
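
A Python sketch of the scheduling part of that trick. Textures are identified by hypothetical names with world positions. Each time the background loader is free, it picks the nearest texture that is still at low resolution. Because the choice is re-evaluated on every call, moving the player changes the priority:

```python
import math

def next_upgrade(textures, player_pos, upgraded):
    """Closest texture still at low resolution, or None when all are upgraded."""
    pending = [name for name in textures if name not in upgraded]
    if not pending:
        return None
    return min(pending, key=lambda name: math.dist(textures[name], player_pos))
```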

Another question I couldn't solve yet is the micro-stuttering in the shader animation when using more than one thread. I guess the threads need to run in some kind of exclusive realtime mode? But I don't know how to achieve this in NG yet.
Kind regards
Krischan

col

Quote: Another question I couldn't solve yet is about micro stuttering
One of the reasons the later APIs (Vulkan/D3D12/Metal) were created was to give the programmer some control to eliminate this but those APIs definitely fall outside your KISS.

I'd guess that the stuttering is caused by stalls because of uploading the texture - but it is just a guess.
Some of the older APIs will let you CREATE resources on different threads - you would have to look into those APIs details to know if you can do this. D3D9 onwards definitely does but you have to set a flag when creating the device instance to allow the functionality. I haven't a clue about OpenGL so maybe someone with better GL experience could chip in and let us know?

Quote: But now you have scared me a bit about multithreading
There's nothing to be scared of at all. Just requires different thinking that's all. Go on... you can do it, you've already got the threads up and running, you just need to sync everything up  8)

Krischan

Quote from: col on April 20, 2021, 11:09:03
I'd guess that the stuttering is caused by stalls because of uploading the texture - but it is just a guess.

I haven't seen this stuttering when just creating meshes without loading textures or pixmaps, so something interferes during the loading process (it also happens when loading only TPixmaps, without creating a texture). With complex shaders it can happen too, as such a shader sometimes takes too much GPU power and causes the stuttering itself (but I wouldn't recommend complex shaders for a loading animation; I've included them just for fun).

Well, it's not that important, but it's annoying :D
Kind regards
Krischan

Derron

Do you load the TPixmaps in the additional threads or in the main thread? As I said, TPixmap loading can be done in extra spawned threads without affecting the main thread much.
But make sure you do not create 20 threads simultaneously trying stuff; this will lead to hiccups and "stalls" on e.g. (real) Windows. Creating a thread is not a "cost-free" operation in BlitzMax (with GC etc.). This is why brl.mod has a threadpool submodule, which you can create and assign tasks to.
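
To illustrate the same principle outside BlitzMax, here is a Python sketch using concurrent.futures (not the brl.threadpool API). Twenty load jobs are handed to a fixed pool of four worker threads, so threads are created once and reused instead of one per file:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def load_all(paths, workers=4):
    used_threads = set()
    lock = threading.Lock()

    def load(path):
        with lock:
            used_threads.add(threading.get_ident())   # record which worker ran this job
        return path.upper()                           # stand-in for decoding the file

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(load, paths))         # results keep the input order
    return results, len(used_threads)
```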


bye
Ron

Krischan

The pixmaps are loaded in the additional threads only. I use a maximum of 4 to 8 threads, but I noticed that the speed increase is not linear: 2 is already better than single-threaded, 4 is optimal, while 8 gives only a little more speed at the cost of "stumbling".

I'm using a TWorkQueue here that I found in the old Blitzbasic archives ;D I don't really understand how it works, but it works. I've seen something similar in the NG mods but couldn't get it running there.

EDIT: here on my rig I have 8 "real" cores (and only real cores), and according to the Task Manager they are all kept busy, so CPU multithreading is working well. The SSD is an NVMe drive with 4 PCIe lanes and a constant sequential read speed of about 1.5 GB/s, the memory is DDR4 at 3.6GHz (DC), and the GPU is an RTX 2080, so the hardware is more than sufficient. That's why I wonder about these micro-stutters. I've experimented with a very small delay in the TWorkQueue, but it has no effect. I think it is a software issue.
Kind regards
Krischan

Derron

https://github.com/bmx-ng/brl.mod/blob/master/threadpool.mod/examples/example_01.bmx

-> TTask is something you could extend and configure so it knows what to load (e.g. the URL). In "TLoadTask.run()" you would then call LoadPixmap, LoadSound, ...

But it would need more to get what I described above (identifying "delayable/deferred" loading). Still, it should work for multithreaded resource loading.
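
A Python analogue of such a load task: a hypothetical LoadTask class that knows its URL and picks a loader by file extension in run(). It is submitted to a fixed pool. This mirrors the shape of the idea, not the NG threadpool API; the loaders are string stand-ins for LoadPixmap and LoadSound:

```python
import os
from concurrent.futures import ThreadPoolExecutor

class LoadTask:
    """Hypothetical task: knows what to load; run() does the loading."""

    LOADERS = {
        ".png": lambda p: ("pixmap", p),   # stand-in for LoadPixmap
        ".ogg": lambda p: ("sound", p),    # stand-in for LoadSound
    }

    def __init__(self, url):
        self.url = url
        self.result = None

    def run(self):
        ext = os.path.splitext(self.url)[1].lower()
        loader = self.LOADERS.get(ext)
        if loader is None:
            raise ValueError(f"no loader for {self.url}")
        self.result = loader(self.url)

tasks = [LoadTask(u) for u in ("gfx/logo.png", "sfx/click.ogg")]
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(task.run) for task in tasks]
for f in futures:
    f.result()   # re-raises any exception raised inside run()
```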


bye
Ron