OpenB3D Shader Cache

Started by Krischan, February 28, 2018, 17:38:35


Krischan

I found an issue which might not be a bug in OpenB3D, but I'd like to improve it. In my game project I have a map which consists of floor/wall/ceiling geometry and a lot of doors (50+) which use an alpha texture. Because of the alpha flag, each of these doors is a separate mesh instead of all doors being grouped into one single mesh (which leads to Z-sorting issues). Each unique surface (and therefore each door) has its own shader. It works fine so far, and I've now tested my map on a slower PC which has an Intel graphics card.

There I noticed that the map loads much slower than on my Nvidia card here. I traced the problem to the CreateShader() function. I've already reduced my shader initialization time by loading the shader code from a text file once and calling CreateShader() with the strings instead of loading each shader separately with LoadShader(), which saves a lot of time. I'm using only one single shader source for the whole 3D scene, but with different textures per surface.

I'm not very familiar with the OpenB3D mechanics behind the command, but it looks as if the shader source (as text) gets compiled again each time, which takes a few milliseconds per call and adds up to longer loading times on slower graphics cards.

Is there any kind of optimization possible to speed up the CreateShader() processing? Precompiling, or instancing like the CopyEntity command (with a CopyShader() command)? Or have I misunderstood the shader concept here?
Kind regards
Krischan

Windows 10 Pro | i7 9700K @ 3.6GHz | RTX 2080 8GB
Metaverse | Blitzbasic Archive | My Github projects

col

#1
Are you using CreateShader for each mesh/door? That would cause a lot of compilation of the shader with each instance using up gpu memory.

Quickly looking through the c++ source... there are ShadeEntity, ShadeMesh and ShadeSurface functions. I imagine there are equivalent 'Max functions with similar names? If so then they should let you load and compile the shader once and apply the shader as a material ( if I understand OpenB3D correctly ).
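
The compile-once idea can be sketched roughly as follows. This is only an illustration in plain C++: `FakeDriver`, `ShaderCache` and `compilesForDoors` are hypothetical names invented for this sketch, not OpenB3D or OpenGL functions, and the "compile" step is a stub that merely counts how often it is called.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the expensive driver-side compile and link.
// NOT an OpenB3D or OpenGL call; it only counts how often it is invoked.
struct FakeDriver {
    int compileCount = 0;
    int nextHandle = 1;

    int compileAndLink(const std::string& vert, const std::string& frag) {
        assert(!vert.empty() && !frag.empty());
        ++compileCount;
        return nextHandle++;
    }
};

// Cache keyed by the (vertex source, fragment source) pair.
class ShaderCache {
public:
    explicit ShaderCache(FakeDriver& driver) : driver_(driver) {}

    // Returns an existing handle for known sources, compiles otherwise.
    int get(const std::string& vert, const std::string& frag) {
        const auto key = std::make_pair(vert, frag);
        const auto it = cache_.find(key);
        if (it != cache_.end()) {
            return it->second;  // cache hit: no recompilation
        }
        const int handle = driver_.compileAndLink(vert, frag);
        cache_.emplace(key, handle);
        return handle;
    }

private:
    FakeDriver& driver_;
    std::map<std::pair<std::string, std::string>, int> cache_;
};

// Simulates a map with `doorCount` doors that all share one shader source
// and returns how many compiles were actually needed.
int compilesForDoors(int doorCount) {
    FakeDriver driver;
    ShaderCache cache(driver);
    for (int i = 0; i < doorCount; ++i) {
        cache.get("basic.vert source", "basic.frag source");
    }
    return driver.compileCount;
}
```

In OpenB3D terms this would correspond to creating the shader once and passing the same handle to ShadeSurface for every door, assuming the wrapper behaves the way col expects.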
https://github.com/davecamp

"When you observe the world through social media, you lose your faith in it."

Krischan

Quote from: col on March 01, 2018, 04:52:56
Are you using CreateShader for each mesh/door? That would cause a lot of compilation of the shader with each instance using up gpu memory.

Quickly looking through the c++ source... there are ShadeEntity, ShadeMesh and ShadeSurface functions. I imagine there are equivalent 'Max functions with similar names? If so then they should let you load and compile the shader once and apply the shader as a material ( if I understand OpenB3D correctly ).

Yes, I create a shader for each NEW surface which doesn't already exist (I have implemented a basic cache for that). So I don't use ShadeEntity, only the ShadeSurface command. Do you mean that I apply that single shader to ALL entities once and only change the material? Hmm, interesting, but I haven't worked with materials in GLSL yet. Unfortunately I couldn't find any material example in the OpenB3D examples.
Kind regards
Krischan


col

#3
I'm not sure on how OpenB3D handles its materials, but I would expect that you can apply a single shader ( that works with textures ) to all entities and have different textures for each entity.

You wouldn't normally have each single entity have its own single shader.

If you wanted to render 10 textured cubes with each cube having its own vertex colors and textures, you would write/use a single shader that takes the vertex colors and a texture into account. You would set the shader, then change the texture as you render each cube - ie no need to change shaders as the same shader is used for all cubes.
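
The ten-cube example can be sketched like this. It is a simplified model rather than real rendering code: `Cube`, `RenderStats` and `renderCubes` are hypothetical names made up for the illustration, and the binds are only counted, not sent to any graphics API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-frame counters; nothing here talks to a GPU.
struct RenderStats {
    int shaderBinds = 0;
    int textureBinds = 0;
    int draws = 0;
    int boundTexture = 0;  // id of the texture bound most recently
};

// Each cube only carries its own texture id; the shader is shared.
struct Cube {
    int textureId;
};

// Binds the shared shader once, then switches only the texture per cube.
void renderCubes(const std::vector<Cube>& cubes, int sharedShader,
                 RenderStats& stats) {
    assert(sharedShader != 0);
    ++stats.shaderBinds;  // one shader bind for the whole batch

    for (const Cube& cube : cubes) {
        stats.boundTexture = cube.textureId;  // per-cube texture switch
        ++stats.textureBinds;
        ++stats.draws;
    }
}
```

With ten cubes this gives one shader bind, ten texture binds and ten draws, which is the pattern described above: the shader stays the same and only the texture changes.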

When using a library such as OpenB3D you would like to think that these details are taken care of for you - looking at the code it certainly looks that way.

markcwm

#4
Hi Krischan,
I'm not sure, as I've not really studied this, but I'm fairly sure the problem is as col suggested: Angros simplified loading shaders, so the whole process is done again for each shader object.

Looking at the NeHe GLSL tutorial, it simplifies it to: 1. pass source to shader object, 2. compile shader source, 3. link shader to program object (which is used by the renderer). In OpenB3D, LoadShader calls CreateShaderMaterial (step 1 for the program object), then it calls AddShader (steps 1 to 3 for the shader object), which calls CreateFragShader (steps 1 & 2) and then AttachFragShader (step 3).

So if the attach and link function was split up that should speed it up. To do that I'll need access to the program and shader objects which are internal. So I'll see how this goes.

col

#5
Hiya markcwm,

Not meaning to tell you what to do or how to do it but if I may suggest...

As you say... each shader object ( vertex or pixel shader ) has its source, gets compiled to its binary form and then gets linked with other shader objects to form a shader program - the shader program consisting of a vertex shader object binary and a pixel shader object binary. As an optimization it would be perfectly OK to compile each shader object once and store the binary id ( or the shader 'name' or whatever OpenGL calls the handle to the shader object ). When you want to create a complete shader program ( consisting of the vertex and pixel/fragment shader ) you can choose the binary shader object(s) as needed without needing to recompile each shader object, and link them.

IE... create/compile all shader objects once ( for a binary )... then use the existing shader object binaries to form different shader programs. As long as the output of the vertex shader matches the input of the pixel shader then it should link and execute ok.

It's exactly the same as compiling a C++ file to create an object file and then linking many existing object files to create an executable. Unless the source of a file changes, you only need to compile it once. Changing the source requires recompiling that file to a new object, which then needs to be linked in to create the new executable.
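
As a rough sketch of that object-file analogy: the code below caches compiled stage objects by source and reuses them when linking different programs. `FakeGL`, `Stage` and `StageCache` are hypothetical names for this illustration only; the compile and link steps are stubs that count calls and are not the real OpenGL or OpenB3D functions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

enum class Stage { Vertex, Fragment };

// Hypothetical stand-in for the driver: counts compiles and links.
struct FakeGL {
    int compiles = 0;
    int links = 0;
    int nextId = 1;

    int compile(Stage, const std::string& source) {
        assert(!source.empty());
        ++compiles;
        return nextId++;
    }

    int link(int vertObj, int fragObj) {
        assert(vertObj != 0 && fragObj != 0);
        ++links;
        return nextId++;
    }
};

// Compiles each (stage, source) pair once and reuses the object afterwards.
class StageCache {
public:
    explicit StageCache(FakeGL& gl) : gl_(gl) {}

    int object(Stage stage, const std::string& source) {
        const auto key = std::make_pair(stage, source);
        const auto it = objects_.find(key);
        if (it != objects_.end()) {
            return it->second;  // already compiled, like an existing .o file
        }
        const int id = gl_.compile(stage, source);
        objects_.emplace(key, id);
        return id;
    }

    // Links a program from cached stage objects, compiling only what is new.
    int program(const std::string& vertSrc, const std::string& fragSrc) {
        const int vertObj = object(Stage::Vertex, vertSrc);
        const int fragObj = object(Stage::Fragment, fragSrc);
        return gl_.link(vertObj, fragObj);
    }

private:
    FakeGL& gl_;
    std::map<std::pair<Stage, std::string>, int> objects_;
};
```

Building two programs that share one vertex source but use different fragment sources costs three compiles and two links here instead of four compiles. A real implementation would probably also cache the linked programs, since this sketch links again on every call.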

I hope that makes sense?

markcwm

Thanks col, that makes sense and is a better idea. So what you're suggesting is to split AddShader and AddShaderFromString up into new functions (CreateFragShader, AttachFragShader, etc., as well as CreateShaderMaterial) while keeping the all-in-one CreateShader functions.

markcwm

#7
Okay, I added the advanced shader functions; hopefully this will fix Krischan's issues. See shaders/createshader2.bmx and greyscale2.bmx for usage of TShaderObject. I also added DeleteFragShader, which gives you more flexibility than FreeShader; check out greyscale2 for delete usage.

I couldn't get DetachFragShader to work, which would have been preferable, but it seems that once the shader object is attached to the program object it can't simply be detached; detach is only used when deleting.

Krischan

Well, the new feature works (thanks guys) but updating to the current OpenB3D version created another problem here: my doors are transparent models with FX flag 32. I'm getting the alpha information in the shader from the diffusemap alpha channel (color.a). In my previous OpenB3D installation everything was fine. It looks like the shader gets no specific alpha value from the diffuse texture anymore and I don't know why.

Doing a quick comparison between the two OpenB3D versions, I noticed that not much has changed since October 2017 (I compiled the previous mod on October 31st). I'm loading uncompressed 32-bit TGA textures with an alpha channel and never had any problems with them. I only noticed that there was a large change in the texture load function in TTexture.bmx in the current OpenB3D version:

Code (BlitzMax)
' TODO - mysterious memory corruption with Brl tga loader
If tex.width[0]=0 Or tex.width[0]=1 ' Stbimage failed try pixmap, progressive jpgs
    'If (flags & 128) Then LoadCubeMapTexture(file,flags,tex) ' TODO load cubemaps
    map=LoadPixmap(file)
    If map=Null Then Return tex
    tex=CreateTexture(PixmapWidth(map),PixmapHeight(map),flags)
    If map.format<>PF_RGBA8888 Then map=map.Convert(PF_RGBA8888)

    If (flags & 2) Then ApplyAlpha(map)
    If (flags & 4) Then ApplyMask(map,10,10,10)

    glBindTexture(GL_TEXTURE_2D,tex.texture[0])
    gluBuild2DMipmaps(GL_TEXTURE_2D,GL_RGBA,tex.width[0],tex.height[0],GL_RGBA,GL_UNSIGNED_BYTE,PixmapPixelPtr(map,0,0))

    tex.BufferToTex PixmapPixelPtr(map,0,0)
EndIf


The two shaders compile without any errors against the glslangValidator.exe tool. Do you have an idea what might be wrong here?

basic.vert
#version 130

#define MAX_LIGHTS 8
#define NUM_LIGHTS 2

out vec2 Vertex_UV;
out vec3 Vertex_Normal;
out vec3 Vertex_LightDir[MAX_LIGHTS];
out vec4 Vertex_EyeVec;
out vec4 position;

void main()
{
gl_TexCoord[0] = gl_MultiTexCoord0; // diffuseMap
gl_TexCoord[1] = gl_MultiTexCoord1; // normalMap
gl_TexCoord[2] = gl_MultiTexCoord2; // lightMap

Vertex_Normal = gl_NormalMatrix * gl_Normal;
vec4 view_vertex = gl_ModelViewMatrix * gl_Vertex;

for (int i = 0; i < NUM_LIGHTS; ++i)
{
Vertex_LightDir[i] = gl_LightSource[i].position.xyz+vec3(0,0,0) - view_vertex.xyz;
}

Vertex_EyeVec = -view_vertex;
position = gl_ModelViewMatrix * gl_Vertex;

gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}


basic.frag
#version 130
#define MAX_LIGHTS 8
#define NUM_LIGHTS 2

precision mediump float;

// desaturate RGB constants
const float r = 0.2126;
const float g = 0.7152;
const float b = 0.0722;

const float specularintensity = 0.5;
const float normalhighlight_on = 0.5;
const float normalhighlight_off = 0.125;
const float colormapintensity = 1.0;
const float lightmapintensity = 1.0;
const float fogrange = 1024.0;

// input textures
uniform sampler2D diffuseMap;
uniform sampler2D normalMap;
uniform sampler2D lightMap;

// input variables from vertex shader
in vec2 Vertex_UV;
in vec3 Vertex_Normal;
in vec3 Vertex_LightDir[MAX_LIGHTS];
in vec4 Vertex_EyeVec;
in vec4 position;

// input flags from game
uniform int flagFog;
uniform int flagTone;
uniform int flagSpec;
uniform int flagBump;
uniform int flagLight;
uniform int flagAtt;

// input variables from game
uniform float gamma;
uniform float lightningshaderintensity;
uniform float torchintensity;
//uniform int bumpmode;
//uniform int fog;
//uniform int lightswitch;
uniform vec3 fogColor;
uniform vec4 baseColor;

struct FloatArray { float Float; };
uniform FloatArray lightradius[MAX_LIGHTS];

uniform vec4 vambient=vec4(0.1,0.1,0.1,1.0);

// function: tone mapping
vec3 ToneMapping(vec3 color)
{
float A = 0.15;
float B = 0.50;
float C = 0.10;
float D = 0.20;
float E = 0.02;
float F = 0.30;
float W = 11.2;

float exposure = 2.0;

color *= exposure;
color = ((color * (A * color + C * B) + D * E) / (color * (A * color + B) + D * F)) - E / F;
float white = ((W * (A * W + C * B) + D * E) / (W * (A * W + B) + D * F)) - E / F;
color /= white;
color = pow(color, vec3(1. / (1.0 + gamma)))*1.5;
return color;
}

// TBN matrix calculations #1
mat3 cotangent_frame(vec3 N, vec3 p, vec2 uv)
{
vec3 dp1 = dFdx(p);
vec3 dp2 = dFdy(p);
vec2 duv1 = dFdx(uv);
vec2 duv2 = dFdy(uv);

vec3 dp2perp = cross(dp2, N);
vec3 dp1perp = cross(N, dp1);
vec3 T = dp2perp * duv1.x + dp1perp * duv2.x;
vec3 B = dp2perp * duv1.y + dp1perp * duv2.y;

float invmax = inversesqrt(max(dot(T, T), dot(B, B)));
return mat3(T * invmax, B * invmax, N);
}

// TBN matrix calculations #2
vec3 perturb_normal(vec3 N, vec3 V, vec2 texcoord)
{
vec3 map = texture2D(normalMap, texcoord).xyz * vec3(1.0, 1.0, 1.0);

map = map * 255. / 127. - 128. / 127.;
mat3 TBN = cotangent_frame(N, -V, texcoord);
return normalize(TBN * map);
}

vec3 DirectIllumination(vec3 P, vec3 N, vec3 lightCentre, float lightRadius, vec3 lightColour, float cutoff)
{
// calculate normalized light vector and distance to sphere light surface
float r = lightRadius;
vec3 L = lightCentre - P;
float distance = length(L);
float d = max(distance - r, 0);
L /= distance;
     
// calculate basic attenuation
float denom = d / r + 1;
float attenuation = 1 / (denom * denom);
     
// scale and bias attenuation such that:
//   attenuation == 0 at extent of max influence
//   attenuation == 1 when d == 0
attenuation = (attenuation - cutoff) / (1 - cutoff);
attenuation = max(attenuation, 0);
     
float dot = max(dot(L, N), 0);
return lightColour * dot * attenuation;
}

// main fragment shader
void main()
{
float distSqr, att;
float specular, lambertTerm;
float attenuation;
float normalhighlight;
vec3 E, R, L;

// UV coordinates (1st & second set)
vec2 uv1 = gl_TexCoord[0].xy;
vec2 uv2 = gl_TexCoord[2].xy;

// texture samplers
vec4 color = texture(diffuseMap, uv1);
vec4 normal = texture(normalMap, uv1);
vec4 spec = texture(normalMap, uv1);
vec4 light = texture(lightMap, uv2);

// mix lightmap with a base color to create a subtle ambient light
vec4 amb = (max(light * 1, vec4(0.025, 0.03, 0.035, 1.0)) * color);

// normalize normal/view vector and perturbated normal vector
vec3 N = normalize(Vertex_Normal.xyz);
vec3 V = normalize(Vertex_EyeVec.xyz);
vec3 PN = perturb_normal(N, V, uv1);

// final color starts with ambient light only
//vec4 final_color = vec4(0.0, 1.0, 0.0, 1.0);//amb + amb + amb;
vec4 final_color = amb + amb + amb;

// distance to frag texel
float dist = abs(position.z);

for (int i = 0; i < NUM_LIGHTS; ++i)
{
// calculate light distance
vec3 positionToLightSource = vec3(gl_LightSource[i].position - position);
vec3 lightDirection = normalize(positionToLightSource);

float distance = length(positionToLightSource);
float lightdist = gl_LightSource[i].constantAttenuation
+ (gl_LightSource[i].linearAttenuation * distance)
+ (gl_LightSource[i].quadraticAttenuation * sqrt(distance));

// spotlight attenuation
attenuation = 1.0 / lightdist;
if (gl_LightSource[i].spotCutoff <= 90.0)
{

float clampedCosine = clamp(max(0.0, dot(-lightDirection, gl_LightSource[i].spotDirection)), 0.0, 1.0);

// spotlight
if (clampedCosine < gl_LightSource[i].spotCosCutoff)
{
// no light outside spotlight cone
attenuation = 0.0;
}
else
{
// light intensity inside spotlight cone
attenuation = attenuation * pow(clampedCosine, gl_LightSource[i].spotExponent);
}
}

// calculate lambert term
distSqr = dot(Vertex_LightDir[i], Vertex_LightDir[i]);
att = clamp(1.0 - lightradius[i].Float * sqrt(distSqr), 0.0, 1.0);
L = normalize(Vertex_LightDir[i].xyz * inversesqrt(distSqr)); 
lambertTerm = dot(PN, L);

// specularity
final_color += gl_LightSource[i].diffuse * gl_FrontMaterial.diffuse *  color * att; 
E = normalize(Vertex_EyeVec.xyz);
R = reflect(-L, PN);
specular = pow(max(dot(R, E), 0.0), gl_FrontMaterial.shininess)*att* 10.0 / dist;

// add specularity to final color
final_color += gl_LightSource[i].specular * gl_FrontMaterial.specular * specular * lambertTerm * att; 

}

// decrease attenuation by number of lights
attenuation*=1.0/NUM_LIGHTS;

// calculate direct illumination
vec4 dir = vec4(DirectIllumination(position.xyz, V, vec3(0, 0, 0), 1024, vec3(1.0, 0.9, 0.8), 0.25),1.0);

// mix ambient, direct illumination, base color, attenuation with final color
final_color += vambient * color*amb;
final_color*=dir;
final_color*=baseColor;
final_color*=attenuation;
final_color+=amb;

float fogFactor;

// fog
if (flagFog == 1)
{
float density = (dist / fogrange);
const float e = 2.71828;

// calculate fog factor
fogFactor = (density * gl_FragCoord.z);
fogFactor *= fogFactor;
fogFactor = clamp(pow(e, -fogFactor), 0.0, 1.0);

// mix fog color with colormap
final_color = mix(vec4(fogColor.rgb, 1.0), final_color, fogFactor);
}

// lightning strike
if (lightningshaderintensity > 0.0)
{
float dist = 100.0 / (distance(position, vec4(0.0, 0.0, 0.0, 1.0)));
//final_color = color * lightningshaderintensity * dist;

float luminance = r * final_color.r + g * final_color.g + b * final_color.b;
final_color.r = max(final_color.r, luminance * lightningshaderintensity * (2.5 * fogFactor));
final_color.g = max(final_color.g, luminance * lightningshaderintensity * (2.5 * fogFactor));
final_color.b = max(final_color.b, luminance * lightningshaderintensity * (2.5 * fogFactor));

//final_color*=lightningshaderintensity;
}

// apply Tonemapping
gl_FragColor.rgb = ToneMapping(final_color.rgb);
gl_FragColor.a = color.a;

if (flagBump == 1)
{
gl_FragColor=vec4(color.rgb*attenuation,color.a)+amb;
}

}
Kind regards
Krischan


markcwm

Well, I'm not sure if your doors are supposed to have alpha or not; I'm guessing not. It sounds like the new ApplyAlpha function is overwriting your alpha. This should only happen if flag 2 is set in LoadTexture.

The code you posted is only supposed to run for a progressive JPG, which isn't supported by stbimage, so it should return a zero-width texture.

So try this: in texture.cpp, find LoadAnimTexture and comment out:
if(flags&2) ApplyAlpha(tex,buffer);

Krischan

OK, the alpha textures were still loaded with flags 2+8 (from the pre-shader variant), which never caused problems before. Changing it to 1+8 works, thanks.
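
Why 2+8 behaves differently from 1+8 comes down to a bit test. The sketch below mirrors the `flags & 2` and `flags & 4` checks quoted earlier in the thread. The enum names follow the usual Blitz3D texture-flag convention and are an assumption here, and `LoadSteps` and `stepsFor` are hypothetical helpers, not OpenB3D functions.

```cpp
#include <cassert>

// Flag values as they appear in the thread: 2 triggers ApplyAlpha and
// 4 triggers ApplyMask. The names are assumed from Blitz3D convention.
enum TextureFlag : int {
    kColor = 1,
    kAlpha = 2,
    kMasked = 4,
    kMipmapped = 8
};

// Which optional processing steps a given flag combination would trigger.
struct LoadSteps {
    bool applyAlpha;
    bool applyMask;
};

LoadSteps stepsFor(int flags) {
    assert(flags >= 0);
    return LoadSteps{(flags & kAlpha) != 0, (flags & kMasked) != 0};
}
```

Under these assumptions, `stepsFor(2 + 8)` reports that ApplyAlpha would run, while `stepsFor(1 + 8)` reports that it would not, so the alpha channel stored in the TGA is left untouched.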
Kind regards
Krischan


Krischan

I'm still having problems with alpha meshes. See this screenshot: the alpha information is visible, but behind the mesh there is no level geometry, only the CameraClsColor (which is violet here) and all the other alpha meshes, but no solid meshes. Very strange. I played with FX 32 and tried to learn from your alpha example, but I couldn't fix it. Disabling all shaders always results in a solid block; even FX 1 and EntityAlpha don't work. With the October 2017 version of OpenB3D, everything is fine.

So Mark, perhaps you can take a closer look at my game project (I've sent you a PM with a download link) and tell me what's wrong there? Thanks!
Kind regards
Krischan


markcwm

#12
Hi Krischan,

I think it's fixed now. It was caused by a fix to mesh::alpha, which never worked right in MiniB3D, but I forgot to test shaders/alphamap.bmx. I didn't test your LOF project as it had dependencies.

Krischan

Works, great! Thanks!
Kind regards
Krischan
