OGRE 1.10.4: Object-Oriented Graphics Rendering Engine
The HLMS is the new material system used in Ogre. It's more user friendly and performs faster.
HLMS stands for “High Level Material System”, because for the user, the HLMS means just defining the material and starting to look at it (no need for coding or shader knowledge!). In retrospect, however, tweaking the shader code for an HLMS is much lower level than the old Materials have ever been (and that makes them very powerful).
Described in detail in the Blocks section, many parameters have been grouped into blocks. Changing depth checks means changing the whole Macroblock.
You could be thinking the reason I came up with these two is to fit with D3D11’s grand scheme of things while being compatible with OpenGL. But that’s a half truth and an awesome side effect. I’ve been developing the Hlms using OpenGL this whole time.
An OpenGL fan will tell you that grouping these together in a single call like D3D11 did barely reduces API overhead in practice (as long as you keep sorting by state), and they’re right about that.
However, there are big advantages for using blocks:
It is much easier to do:
hash |= (macroblock->getId() << bits) & mask
than to do:
hash |= m->depth_check | m->depthWrite << 1 | m->depthBias << 2 | m->depth_slope_bias << 3 | m->cullMode << 18 | ... ;
The second form would also need a lot more bits, which we can’t afford. Ogre 2.0 imposes a limit on the amount of live Macroblocks you can have at the same time, as we run out of hashing space (by the way, D3D11 has its own limit). It operates around the idea that most setting combinations won’t be used in practice.
Of course it’s not perfect; it can’t fit every use case. We inherit the same problems D3D11 has. If a particular rendering technique relies on regularly changing a property that lives in a Macroblock (e.g. alternating the depth comparison function between less & greater with every draw call, or gradually incrementing the depth bias on each draw call), you’ll end up redundantly changing a lot of other states (culling mode, polygon mode, depth check & write flags, depth bias) alongside it. This is rare. We’re aiming for the general use case.
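The block-ID hashing described above can be illustrated with a minimal sketch. The bit widths, the shift, and the names `Macroblock` and `packMacroblock` are assumptions made for this example; they are not Ogre's actual render queue layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout, chosen only for illustration.
constexpr std::uint32_t kMacroblockBits  = 10u; // room for 1024 live macroblocks
constexpr std::uint32_t kMacroblockShift = 20u; // where the ID sits inside the sort key

// Stand-in for a macroblock: for sorting, only its unique ID matters.
struct Macroblock
{
    std::uint16_t id;
};

// Writes the macroblock's ID into its bit field of the sort key,
// leaving all other bits of the key untouched.
inline std::uint32_t packMacroblock( std::uint32_t hash, const Macroblock &macroblock )
{
    constexpr std::uint32_t maxId = ( 1u << kMacroblockBits ) - 1u;
    // A limited bit field is why the number of live blocks must be capped.
    assert( macroblock.id <= maxId && "ran out of hashing space" );

    const std::uint32_t mask = maxId << kMacroblockShift;
    hash &= ~mask; // clear any previous ID
    hash |= ( std::uint32_t( macroblock.id ) << kMacroblockShift ) & mask;
    return hash;
}
```

Sorting renderables by such a key places objects that share a macroblock next to each other, so the whole state group only changes when the ID changes.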
These problems make me wonder if D3D11 made the right choice of using blocks from an API perspective, since I’m not used to driver development. However from an engine perspective, blocks make sense.
Let me get this straight: You should be using the HLMS. The usual “Materials” are slow. Very slow. They’re inefficient and not suitable for rendering most of your models.
However, materials are still useful for:
Under the hood there is an HLMS C++ implementation (HLMS_LOW_LEVEL) that acts just as a proxy to the material. The HLMS is an integral part of Ogre 2.0, not just a fancy add-in.
Materials have been refactored, and thus your old code may need a few changes. Most notably Macroblocks & Blendblocks have been added to Materials, thus functions like Pass::setDepthCheck & Co have been replaced by two calls: Pass::setMacroblock & Pass::setBlendblock.
Based on your skillset and needs, you can pick which parts you want to work with. Most users will just use the scripts to define materials, advanced users will change the template, and very advanced users who need something entirely different will change all three (scripts, templates and the C++ implementation).
For example the PBS (Physically Based Shading) type has its own C++ implementation and its own set of shader templates. The Toon Shading has its own C++ implementation and set of shaders. There is also an “Unlit” implementation, specifically meant to deal with GUI and simple particle FXs (ignores normals & lighting, manages multiple UVs, can mix multiple textures with Photoshop-like blend modes, can animate the UVs, etc.).
It is theoretically possible to implement both Toon & PBS in the same C++ module, but that would be crazy, hard to maintain and not very modular.
We’re introducing the concept of blocks, most of which are immutable. Being immutable means you can’t change the Macro-, Blend- & Samplerblocks after they have been created. If you want to make a change, you have to create a new block and assign the new one. The previous one won’t be destroyed until asked explicitly.
Technically on OpenGL render systems (GL3+, GLES2) you could const_cast the pointers, change the block’s parameters (mind you, the pointer is shared by other datablocks, so you will be changing them as well as a side effect) and it would probably work. But it will definitely fail on the D3D11 render system.
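To make the "create a new block and assign it" workflow concrete, here is a minimal self-contained sketch. `MacroblockDesc`, `BlockManager` and `Datablock` are hypothetical stand-ins written for this example, not the Ogre classes; the sketch only assumes that identical descriptions are shared and that old blocks stay alive until explicitly destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical description of a macroblock's settings.
struct MacroblockDesc
{
    bool depthCheck = true;
    bool depthWrite = true;
    int  cullMode   = 0;

    bool operator==( const MacroblockDesc &o ) const
    {
        return depthCheck == o.depthCheck && depthWrite == o.depthWrite &&
               cullMode == o.cullMode;
    }
};

// Hands out pointers to immutable blocks; identical descriptions share one block.
class BlockManager
{
public:
    const MacroblockDesc *getMacroblock( const MacroblockDesc &desc )
    {
        for( const auto &block : mBlocks )
        {
            if( *block == desc )
                return block.get(); // reuse the existing immutable block
        }
        mBlocks.push_back( std::make_unique<const MacroblockDesc>( desc ) );
        return mBlocks.back().get();
    }

    std::size_t liveBlocks() const { return mBlocks.size(); }

private:
    // Blocks are kept alive here; nothing is destroyed implicitly.
    std::vector<std::unique_ptr<const MacroblockDesc>> mBlocks;
};

// The mutable "material": it only points at a shared, immutable block.
struct Datablock
{
    const MacroblockDesc *macroblock = nullptr;

    void setMacroblock( const MacroblockDesc *block )
    {
        assert( block != nullptr );
        macroblock = block;
    }
};
```

To change a setting, copy the current description, modify the copy, request a block for it from the manager and assign the result. Other datablocks that share the old block are unaffected.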
A Datablock is a “material” from the user’s perspective. It is the only mutable block. It holds data (i.e. material properties) that will be passed directly to the shaders, and also holds which Macroblock, Blendblocks and Samplerblocks are assigned to it.
Most Hlms implementations will create a derived class for Datablocks to hold their data. For example, HlmsPbs creates a datablock called HlmsPbsDatablock. This datablock contains roughness and fresnel values, which do not make any sense in (e.g.) a GUI implementation.
Macroblocks are named like that because most entities end up using the macroblock. Except for transparents, we sort by macroblock first. These contain information like depth check & depth write, culling mode, polygon mode (point, wireframe, solid). They’re quite analogous to D3D11_RASTERIZER_DESC. And not without reason: under the hood Macroblocks hold an ID3D11RasterizerState, and thanks to the render queue’s sorting, we change them as little as possible. In other words, we reduce API overhead. On GL backends, we just change the individual states on each block change. Macroblocks can be shared by many Datablocks.
Even in OpenGL, there are performance benefits, because there are enumeration translations (i.e. CMPF_LESS -> GL_LESS) that are performed and cached when the macroblock gets created, instead of doing it every time the setting changes.
Blendblocks are like Macroblocks, but they hold alpha blending operation information (blend factors: One, One_Minus_Src_Alpha; blending modes: add, subtract, min, max, etc.). They’re analogous to D3D11_BLEND_DESC. We also sort by blendblocks to reduce state changes.
Samplerblocks hold information about texture units, like filtering options, addressing modes (wrap, clamp, etc), Lod bias, anisotropy, border colour, etc. They're analogous to D3D11_SAMPLER_DESC.
GL3+ and D3D11 both support samplerblocks natively. On GLES2, the functionality is emulated (performance has still improved, since we can cache the samplerblock's GL value translations and whether a texture has already been set to a given samplerblock's parameters).
The diagram shows a typical layout of a datablock. Note that Samplerblocks do not live inside base HlmsDatablock, but rather in its derived implementation. This is because some implementations may not need textures at all, and the number of samplerblocks is unknown. Some implementations may want one samplerblock per texture, whereas others may just need one.
For Macroblocks and Blendblocks, on the other hand, we just need one per material.
The Hlms will parse the template files from the template folder according to the following rules:
The Hlms takes a template file (i.e. a file written in GLSL or HLSL) and spits out valid shader code. Templates can take advantage of the Hlms' preprocessor, which is a simple yet powerful macro-like preprocessor that helps writing the required code.
The preprocessor was written with speed and simplicity in mind. It does not implement an AST or anything fancy. This is very important to keep in mind while writing templates, because there will be cases when using the preprocessor may feel counter-intuitive or frustrating.
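For example, wrapping code in @property( IncludeLighting ) ... @end is analogous to wrapping it in #if IncludeLighting != 0 ... #endif.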
However you can't evaluate IncludeLighting to anything other than zero and non-zero, i.e. you can't check whether IncludeLighting == 2 with the Hlms preprocessor. A simple workaround is to define, from C++, the variable “IncludeLightingEquals2” and check whether it's non-zero. Another solution is to use the GLSL/HLSL preprocessor itself instead of Hlms'. However, the advantage of Hlms is that you can see its generated output in a file for inspection, whereas you can't see the GLSL/HLSL after the macro preprocessor without vendor-specific tools. Plus, in the case of GLSL, you'll depend on the driver implementation having a good macro preprocessor.
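The C++ side of the “IncludeLightingEquals2” workaround could look roughly like the sketch below. `PropertyMap`, `isPropertyTrue` and `setLightingProperties` are hypothetical helpers that only model a name-to-integer property table; in a real implementation the values would be set through the Hlms property interface instead.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the Hlms property table (name -> integer value).
using PropertyMap = std::map<std::string, int>;

// Models what the template can test: zero versus non-zero.
// An undeclared property counts as false.
inline bool isPropertyTrue( const PropertyMap &props, const std::string &name )
{
    const auto it = props.find( name );
    return it != props.end() && it->second != 0;
}

// The comparison the template cannot express is done here, in C++,
// and exposed as an extra boolean property.
inline void setLightingProperties( PropertyMap &props, int includeLighting )
{
    props["IncludeLighting"]        = includeLighting;
    props["IncludeLightingEquals2"] = ( includeLighting == 2 ) ? 1 : 0;
}

inline void exampleUsage()
{
    PropertyMap props;
    setLightingProperties( props, 2 );
    assert( isPropertyTrue( props, "IncludeLighting" ) );
    assert( isPropertyTrue( props, "IncludeLightingEquals2" ) );
}
```

The template then only needs to test whether IncludeLightingEquals2 is non-zero.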
The preprocessor always starts with @ followed by the command, and often with arguments inside parentheses. Note that the preprocessor is always case-sensitive. The following keywords are recognized:
@property: Checks whether the variables in the expression are true; if so, the text inside the block is printed. Must be finalized with @end. The expression is case-sensitive. When the variable hasn't been declared, it evaluates to false.
The logical operands && || ! are valid.
Examples:
It is very similar to #if hlms_skeleton != 0 #endif; however there is no equivalent #else or #elif syntax. As a simple workaround you can do:
Newlines are not necessary. The following is perfectly valid:
Which will print:
@foreach: Loop that prints the text inside the block. The text is repeated count - start times. Must be finalized with @end.
Newlines are very important, as they will be printed with the loop.
Examples:
Expression | Output |
---|---|
@foreach( 4, n ) @n@end | 0 1 2 3 |
@foreach( 4, n ) @n@end | 0 1 2 3 |
@foreach( 4, n ) @n @end | 0 1 2 3 |
@foreach( 4, n, 2 ) @n@end | 2 3 |
@pset( myStartVar, 1 ) @pset( myCountVar, 3 ) @foreach( myCountVar, n, myStartVar ) @n@end | 1 2 |
@foreach( 2, n ) @insertpiece( pieceName@n )@end | @insertpiece( pieceName0 ) @insertpiece( pieceName1 ) |
Attention #1!
Don't use the common letter i for the loop counter. It will conflict with other keywords.
i.e. “@foreach( 1, i )@insertpiece( pieceName )@end” will print “0nsertpiece( pieceName )” which is probably not what you intended.
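The behaviour in the table and in the warning above follows from plain text substitution. The sketch below is not the Ogre parser; `expandForeach` is a hypothetical helper that assumes the loop simply replaces every occurrence of "@" followed by the counter name with the current index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Repeats 'body' for n in [start, count), replacing every "@<var>" with n.
// Purely textual: no knowledge of other keywords that begin with the same letters.
inline std::string expandForeach( const std::string &body, const std::string &var,
                                  int count, int start = 0 )
{
    assert( !var.empty() );
    const std::string token = "@" + var;

    std::string out;
    for( int n = start; n < count; ++n )
    {
        std::string iteration = body;
        const std::string value = std::to_string( n );

        std::size_t pos = 0;
        while( ( pos = iteration.find( token, pos ) ) != std::string::npos )
        {
            iteration.replace( pos, token.size(), value );
            pos += value.size(); // continue after the inserted number
        }
        out += iteration;
    }
    return out;
}
```

With `i` as the counter, the "@i" at the start of "@insertpiece" matches the token and gets replaced, which is exactly the conflict described above. Any whitespace or newline inside the body is repeated on every iteration.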
Attention #2!
foreach is parsed after property math (pset, padd, etc). That means that driving each iteration through a combination of properties and padd functions will not work as you would expect.
e.g. the following code will not work:

    @pset( myVar, 1 )
    @foreach( 2, n )
        //Code
        @psub( myVar, 1 ) //Decrement myVar on each loop
        @property( myVar )
            //Code that shouldn't be printed in the last iteration
        @end
    @end

Because psub will be evaluated before expanding the foreach.
@counter: Prints the current value of the variable and increments it by 1. If the variable hasn't been declared yet, it is initialized to 0.
Examples:
@value: Prints the current value of the variable without incrementing it. If the variable hasn't been declared, prints 0.
@set, @add, @sub, @mul, @div, @mod, @min, @max: Sets a variable to a given value, adds, subtracts, multiplies, divides, calculates modulus, or the minimum/maximum of a variable and a constant, or two variables. This family of functions gets evaluated after foreach(s) have been expanded and pieces have been inserted. Doesn't print its value.
Arguments can be in the form @add( a, b ) meaning a += b; or in the form @add( a, b, c ) meaning a = b + c.
Useful in combination with @counter and @value.
Expression | Output | Math |
---|---|---|
@set( myVar, 1 ) @value( myVar ) | 1 | myVar = 1 |
@add( myVar, 5 ) @value( myVar ) | 6 | myVar = 1 + 5 |
@div( myVar, 2 ) @value( myVar ) | 3 | myVar = 6 / 2 |
@mul( myVar, myVar ) @value( myVar ) | 9 | myVar = 3 * 3 |
@mod( myVar, 5 ) @value( myVar ) | 4 | myVar = 9 % 5 |
@add( myVar, 1, 1 ) @value( myVar ) | 2 | myVar = 1 + 1 |
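The two-argument and three-argument forms can be modelled in a few lines. `PropertyMath` below is a hypothetical class written for this example (not part of Ogre) that mirrors the semantics in the table: two arguments update the variable in place, three arguments overwrite it with the result.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of the math keywords, operating on named integer variables.
class PropertyMath
{
public:
    // Like @value: an undeclared variable reads as 0.
    int value( const std::string &name ) const
    {
        const auto it = mVars.find( name );
        return it != mVars.end() ? it->second : 0;
    }

    void set( const std::string &a, int b ) { mVars[a] = b; }         // a = b
    void add( const std::string &a, int b ) { mVars[a] += b; }        // a += b
    void add( const std::string &a, int b, int c ) { mVars[a] = b + c; } // a = b + c
    void mul( const std::string &a, int b ) { mVars[a] *= b; }        // a *= b

    void div( const std::string &a, int b ) // a /= b
    {
        assert( b != 0 );
        mVars[a] /= b;
    }

    void mod( const std::string &a, int b ) // a %= b
    {
        assert( b != 0 );
        mVars[a] %= b;
    }

private:
    std::map<std::string, int> mVars;
};
```

Passing a variable as the second operand, as in @mul( myVar, myVar ), corresponds to `math.mul( "myVar", math.value( "myVar" ) )` in this model.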
@piece: Saves all the text inside the block as a named piece. If a piece with the given name already exists, a compiler error will be thrown. The text that was inside the block won't be printed. Useful in combination with @insertpiece. Pieces can also be defined from C++ or collected from piece template files.
Example:
@insertpiece: Prints a block of text that was previously saved with @piece (or from C++). If no piece with such name exists, prints nothing.
Example:
@pset, @padd, @psub, @pmul, @pdiv, @pmod: Analogous to the family of math functions without the 'p' prefix. The difference is that the math is evaluated before anything else. There is not much use to these functions, except perhaps for quickly testing whether a given flag/variable is being properly set from C++ without having to recompile.
e.g. if you suspect hlms_normal is never being set, try @pset( hlms_normal, 1 )
One important use worth mentioning, is that variables retain their values across shader stages. First the vertex shader template is parsed, then the pixel shader one. If 'myVal' is 0 and the vertex shader contains @counter( myVal ); when the pixel shader is parsed @value( myVal ) will return 1, not 0.
If you need to reset these variables across shader stages, you can use @pset( myVal, 0 ), which is guaranteed to reset your variable to 0 before anything else happens, even if the @pset is stored in a piece file.
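The retention of variables across stages can be pictured as one variable table that outlives the parsing of each template. `TemplateVariables` below is a hypothetical model written for this example; it only assumes the counter, value and pset semantics described in this section.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical variable table shared by all shader stages of one shader set.
class TemplateVariables
{
public:
    // Like @counter: returns the current value, then increments it.
    // An undeclared variable starts at 0.
    int counter( const std::string &name ) { return mVars[name]++; }

    // Like @value: returns the current value without modifying it.
    int value( const std::string &name ) const
    {
        const auto it = mVars.find( name );
        return it != mVars.end() ? it->second : 0;
    }

    // Like @pset: overwrites the variable.
    void pset( const std::string &name, int v ) { mVars[name] = v; }

private:
    std::map<std::string, int> mVars;
};

inline void exampleStages()
{
    TemplateVariables vars;

    // Vertex shader template: @counter( myVal ) prints 0 and leaves myVal at 1.
    assert( vars.counter( "myVal" ) == 0 );

    // Pixel shader template, parsed afterwards with the same table.
    assert( vars.value( "myVal" ) == 1 );

    // A reset at the start of the stage restores the initial state.
    vars.pset( "myVal", 0 );
    assert( vars.value( "myVal" ) == 0 );
}
```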
There are two components that need to be evaluated that may affect the shader itself and would require it to be recompiled:
When calling Renderable::setDatablock(), what happens is that Hlms::calculateHashFor will get called and this function evaluates both the mesh and datablock compatibility. If they're incompatible (i.e. the Datablock or the Hlms implementation requires the mesh to have a certain feature, e.g. the Datablock needs 2 UV sets but the mesh only has one set of UVs) it throws.
If they're compatible, all the variables (aka properties) and pieces are generated and cached in a structure (mRenderableCache) with a hash key to this cache entry. If a different pair of datablock-mesh ends up having the same properties and pieces, they will get the same hash (and share the same shader).
The following graph summarizes the process:
Later on during rendering, at the start of each render pass, a similar process is done, which ends up generating a “pass hash” instead of a renderable hash. Pass data stores settings like number of shadow casting lights, number of lights per type (directional, point, spot).
While iterating each renderable for render, the hash key is read from the Renderable and merged with the pass' hash. With the merged hash, the shader is retrieved from a cache. If it's not in the cache, the shader will be generated and compiled by merging the cached data (pieces and variables) from the Renderable and the Pass. The following graph illustrates the process:
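The two-level caching described above can be sketched in a self-contained way. Everything here is a simplified model: `PropertyCache`, `ShaderCache`, the 16-bit split of the final hash and the property name "num_lights" are assumptions for the example, and "compiling" is reduced to merging the two property sets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Properties = std::map<std::string, int>;

// Maps each distinct property set to a small index that serves as its hash.
// Two renderables (or passes) with equal properties get the same hash.
class PropertyCache
{
public:
    std::uint32_t getHash( const Properties &props )
    {
        for( std::size_t i = 0; i < mEntries.size(); ++i )
        {
            if( mEntries[i] == props )
                return static_cast<std::uint32_t>( i );
        }
        mEntries.push_back( props );
        return static_cast<std::uint32_t>( mEntries.size() - 1u );
    }

    const Properties &get( std::uint32_t hash ) const { return mEntries.at( hash ); }

private:
    std::vector<Properties> mEntries;
};

struct Shader
{
    Properties mergedProperties; // stands in for the compiled shader
};

class ShaderCache
{
public:
    const Shader &getShader( const PropertyCache &renderables, std::uint32_t renderableHash,
                             const PropertyCache &passes, std::uint32_t passHash )
    {
        assert( renderableHash <= 0xFFFFu && passHash <= 0xFFFFu );
        const std::uint32_t finalHash = ( passHash << 16u ) | renderableHash;

        auto it = mShaders.find( finalHash );
        if( it == mShaders.end() )
        {
            // Cache miss: generate the shader from the cached renderable and pass data.
            Shader shader;
            shader.mergedProperties = renderables.get( renderableHash );
            for( const auto &kv : passes.get( passHash ) )
                shader.mergedProperties[kv.first] = kv.second;
            ++mCompileCount;
            it = mShaders.emplace( finalHash, std::move( shader ) ).first;
        }
        return it->second;
    }

    int compileCount() const { return mCompileCount; }

private:
    std::unordered_map<std::uint32_t, Shader> mShaders;
    int mCompileCount = 0;
};
```

In this model, two meshes with identical properties rendered in the same pass trigger a single compilation and share the resulting shader.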
Note: This section is relevant to those seeking to write their own Hlms implementation.
C++ can use Hlms::setProperty( "key", value ) to set “key” to the given value. This value can be read by @property, @foreach, @add/sub/mul/div/mod, @counter, @value and @padd/psub/pmul/pdiv/pmod.
To create pieces (or read them) you need to pass your custom Hlms::PiecesMap to Hlms::addRenderableCache.
The recommended place to do this is in Hlms::calculateHashForPreCreate and Hlms::calculateHashForPreCaster. Both are virtual. The former gets called right before adding the set of properties, pieces and hash to the cache, while the latter happens right before adding the similar set for the shadow caster pass.
In those two functions you get the chance to call setProperty to set your own variables and add your own pieces.
Another option is to override Hlms::calculateHashFor, which gives you more control, but you'll have to do some of the work the base class does.
For some particularly complex features, the Hlms preprocessor may not be enough, too difficult, or just impossible to implement, and thus you can generate the string from C++ and send it as a piece. The template shader can insert it using @insertpiece.
The function Hlms::createShaderCacheEntry is mainly responsible for generating the shaders and parsing the template through the Hlms preprocessor. If you override it, you can ignore pieces and properties; basically you can override the entire Hlms system and provide the source for the shaders yourself. See the HlmsLowLevel implementation, which overrides the Hlms entirely and acts as a mere proxy to the old Material system from Ogre 1.x; the flexibility is really high.
Properties starting with 'hlms_' prefix are common to all or most Hlms implementations. i.e. 'hlms_skeleton' is set to 1 when a skeleton is present and hardware skinning should be performed.
Save properties' IdStrings (hashed strings) into constants as a performance optimization. Ideally the compiler should detect the constant propagation and this shouldn't be needed, but this often isn't the case.
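The idea of hashing a property name once and reusing the result can be sketched as follows. This is an illustration only: `hashString` uses FNV-1a as a stand-in, which is an assumption and not the hash Ogre's IdString uses, and `PropertyIds`, `HashedProperties` and `getProperty` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Stand-in string hash (32-bit FNV-1a), evaluated at compile time when possible.
constexpr std::uint32_t hashString( const char *str )
{
    std::uint32_t hash = 2166136261u;
    while( *str != '\0' )
    {
        hash ^= static_cast<std::uint8_t>( *str++ );
        hash *= 16777619u;
    }
    return hash;
}

// Hashed once, stored in constants, and reused for every lookup.
namespace PropertyIds
{
    constexpr std::uint32_t Skeleton = hashString( "hlms_skeleton" );
    constexpr std::uint32_t Normal   = hashString( "hlms_normal" );
}

using HashedProperties = std::unordered_map<std::uint32_t, int>;

// Looks a property up by its precomputed hash; missing properties read as 0.
inline int getProperty( const HashedProperties &props, std::uint32_t key )
{
    const auto it = props.find( key );
    return it != props.end() ? it->second : 0;
}

inline void exampleUsage()
{
    HashedProperties props;
    props[PropertyIds::Skeleton] = 1;
    // No string hashing happens here; the constant is used directly.
    assert( getProperty( props, PropertyIds::Skeleton ) == 1 );
}
```

The constants make the intent explicit instead of relying on the compiler to fold a repeated hash of the same string literal.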
For mobile, avoid mat4 and do the math yourself. As for 4x3 matrices (i.e. skinning), perform the math manually, as many GLES2 drivers have issues compiling valid GLSL code.
Properties in underscore_case are set from C++; properties in camelCase are set from the template.
Properties and pieces starting with 'custom_' are for user customizations of the template.
TBD
Hlms supports modifying the template files externally and reloading them, taking immediate effect. Call Hlms::reloadFrom to achieve this. How to get notified when the files were changed is up to the user.
By default if a template isn't present, the shader stage won't be created. e.g. if there is no GeometryShader_gs.glsl file, no geometry shader will be created. However there are times where you want to use a template but only use this stage in particular scenarios (e.g. toggled by a material parameter, disable it for shadow mapping, etc.). In this case, set the property hlms_disable_stage to non-zero from within the template (i.e. using @set). The value of this property is reset to 0 for every stage.
Note that even when disabled, the Hlms template will be fully parsed and dumped to disk; and any modification you perform to the Hlms properties will be carried over to the next stages. Setting hlms_disable_stage is not an early out or an abort.
In many cases, users may want to slightly customize the shaders to achieve a particular look, implement a specific feature, or solve a unique problem; without having to rewrite the whole implementation.
Maximum flexibility can be achieved by directly modifying the original source code. However this isn't modular, making it difficult to merge when the original source code has changed. Most of the customizations don't require such an intrusive approach.
Note: For performance reasons, the listener interface does not allow you to add customizations that work per Renderable, as that loop is performance sensitive. The only listener callback that works inside Hlms::fillBuffersFor is hlmsTypeChanged which only gets evaluated when the previous Renderable used a different Hlms implementation; which is rare, and since we sort the RenderQueue, it often branch predicts well.
There are different levels in which an Hlms implementation can be customized:
Variable | Description |
---|---|
custom_passBuffer | Piece where users can add extra information for the pass buffer (only useful if the user is using HlmsListener or overloaded HlmsPbs). |
custom_VStoPS | Piece where users can add more interpolants for passing data from the vertex to the pixel shader. |
custom_vs_attributes | Custom vertex shader attributes in the Vertex Shader (i.e. a special texcoord, etc). |
custom_vs_uniformDeclaration | Data declaration (textures, texture buffers, uniform buffers) in the Vertex Shader. |
custom_vs_preExecution | Executed before Ogre's code from the Vertex Shader. |
custom_vs_posExecution | Executed after all code from the Vertex Shader has been performed. |
custom_ps_uniformDeclaration | Same as custom_vs_uniformDeclaration, but for the Pixel Shader |
custom_ps_preExecution | Executed before Ogre's code from the Pixel Shader. |
custom_ps_posMaterialLoad | Executed right after loading material data; and before anything else. May not get executed if there is no relevant material data (i.e. doesn't have normals or QTangents for lighting calculation) |
custom_ps_preLights | Executed right before any light (i.e. to perform your own ambient / global illumination pass). All relevant texture data should be loaded by now. |
custom_ps_posExecution | Executed after all code from the Pixel Shader has been performed. |