@ohlidalp
Contributor

@ohlidalp ohlidalp commented Sep 25, 2025

I wanted to see the ResourceBackgroundQueue::prepare() functionality in action before I use it anywhere else, so I created a proof-of-concept app in the form of an OGRE Sample. I was particularly interested in the common case: loading a mesh, complete with linked material(s) and texture(s).

Opening as draft because of things left to do:

  • Add thumbnail
  • Add slider (maybe also Apply button) to set number of worker threads
  • Implement threaded prep of textures, too (not doable out of the box as linked materials aren't known until mesh load)
  • Implement threaded prep of skeletons, too (same reason)
  • Squash the commits together

@paroj
Member

paroj commented Sep 25, 2025

and what kind of improvement could you measure?

@ohlidalp
Contributor Author

ohlidalp commented Sep 25, 2025

On my laptop with a Ryzen 7 (7435HS, 3.10 GHz) and an NVIDIA RTX 4070, in a Debug build, reloading all meshes at once:

  • Sync prep: 20 reloads = avg loading time 1.062sec
  • Threaded prep: 20 reloads = avg loading time 1.037sec

I think the gain is so minor because textures and skeletons are still prepared synchronously when load()-ing the mesh resource.
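For reference, an average like the ones above can be collected with a small timing helper. This is only a minimal sketch, not the code used in the sample: `averageReloadSeconds` and the `reload` callback are hypothetical names, and the callback stands in for "reload all meshes at once".

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `reload` `runs` times and returns the average wall-clock time in seconds.
// `reload` is a hypothetical callback standing in for "reload all meshes at once".
double averageReloadSeconds(int runs, const std::function<void()>& reload) {
    assert(runs > 0);
    using Clock = std::chrono::steady_clock;
    std::chrono::duration<double> total{0.0};
    for (int i = 0; i < runs; ++i) {
        const Clock::time_point start = Clock::now();
        reload();                        // the work being measured
        total += Clock::now() - start;   // accumulate elapsed seconds
    }
    return total.count() / runs;
}
```

Calling `averageReloadSeconds(20, reloadAllMeshes)` would correspond to the "20 reloads" figures, assuming `reloadAllMeshes` blocks until loading has finished.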

        /** Copy out the raw data fetched from disk after resource preparation completes (state `LOADSTATE_PREPARED`).
        * For advanced users only - you can manually parse the mesh data to retrieve linked resources (materials & skeleton).
        * @returns Mesh file data if in state `LOADSTATE_PREPARED`, otherwise null.
        */
from Ogre.log:
03:19:39: [ThreadedResourcePrep]facial.mesh: skeletons:0, materials:5
03:19:39: [ThreadedResourcePrep]penguin.meshskeletons:1, materials:1
03:19:39: [ThreadedResourcePrep]knot.mesh: skeletons:0, materials:1
03:19:39: [ThreadedResourcePrep]sibenik.mesh: skeletons:0, materials:18
03:19:39: [ThreadedResourcePrep]ninja.mesh: skeletons:1, materials:2
03:19:39: [ThreadedResourcePrep]spine.mesh: skeletons:1, materials:1
03:19:39: [ThreadedResourcePrep]fish.mesh: skeletons:1, materials:1
03:19:39: [ThreadedResourcePrep]jaiqua.mesh: skeletons:1, materials:1
03:19:39: [ThreadedResourcePrep]robot.mesh: skeletons:1, materials:1
03:19:39: [ThreadedResourcePrep]Sword.mesh: skeletons:0, materials:4
03:19:39: [ThreadedResourcePrep]Barrel.mesh: skeletons:0, materials:1
03:19:39: [ThreadedResourcePrep]Sinbad.mesh: skeletons:1, materials:7
03:19:39: [ThreadedResourcePrep]DamagedHelmet.mesh: skeletons:0, materials:1
03:19:40: [ThreadedResourcePrep]razor.mesh: skeletons:0, materials:1
03:19:40: [ThreadedResourcePrep]geosphere4500.mesh: skeletons:0, materials:1
03:19:40: [ThreadedResourcePrep]dragon.mesh: skeletons:0, materials:3
03:19:40: [ThreadedResourcePrep]geosphere8000.mesh: skeletons:0, materials:1
@ohlidalp
Contributor Author

I've confirmed my previous assumption. When I also prepare textures & skeletons in the background, the stat becomes:

  • 20 reloads = avg loading time 0.233sec

Major win, but it required me to hack OgreCore by adding DataStreamPtr Mesh::copyPreparedMeshFileData() and to reimplement part of MeshSerializer inside the sample.
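To put the three averages from this thread side by side, here is a small arithmetic sketch. The constants are the numbers quoted above; the function and constant names are made up for illustration.

```cpp
#include <cassert>
#include <cstdio>

// Average load times in seconds quoted in this thread (Debug build, 20 reloads each).
constexpr double kSyncPrep = 1.062;          // everything prepared synchronously
constexpr double kThreadedMeshOnly = 1.037;  // only the mesh prepared in the background
constexpr double kThreadedAll = 0.233;       // mesh + textures + skeletons in the background

// Speedup factor of `candidate` relative to `baseline` (both in seconds).
double speedup(double baseline, double candidate) {
    assert(candidate > 0.0);
    return baseline / candidate;
}

// Percentage of loading time saved relative to `baseline`.
double percentSaved(double baseline, double candidate) {
    assert(baseline > 0.0);
    return 100.0 * (baseline - candidate) / baseline;
}

// Prints both comparisons against the synchronous baseline.
void printSummary() {
    std::printf("mesh only: %.1f%% saved\n", percentSaved(kSyncPrep, kThreadedMeshOnly));
    std::printf("mesh + textures + skeletons: %.1f%% saved, %.2fx faster\n",
                percentSaved(kSyncPrep, kThreadedAll), speedup(kSyncPrep, kThreadedAll));
}
```

That works out to roughly 2.4% saved when only the mesh is prepared in the background, versus roughly 78% saved (about 4.6x faster) once textures and skeletons are included.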

@paroj
Member

paroj commented Sep 26, 2025

yeah.. maybe you should just bg load all materials in a resource group instead of that..

@ohlidalp
Contributor Author

ohlidalp commented Sep 26, 2025

That would be pretty disappointing though. I'm looking for a supported (non-deprecated) solution for Rigs of Rods, which has literally hundreds of user-made mods distributed in ZIP archives, most having multiple variants (plus multiple skins) inside a single ZIP. Brute-forcing prep for all resources just isn't a reasonable solution. Plus, using this approach in a sample would sort of send out a message that threaded prep is technically here but not useful for the common case. Frankly, my motivation to create this sample was to either find a workaround or point a finger at this issue.

I'm open to suggestions on how to tackle this. I noticed there is a clone(bool copy, HwBufferMan* newMan) method in both VertexData and IndexData, which made me wonder if Mesh::loadImpl() could be run with a dummy HwBufferMan belonging to a dummy rendersystem that just creates dummy buffers in CPU memory, to be cloned to the actual rendersystem later. This would most likely show even better results in the benchmark. Another option would be to restore THREAD_SUPPORT 2. I'm a fan of https://preshing.com/20111118/locks-arent-slow-lock-contention-is/ and skimming the OGRE source left me with an impression of heavy-handed locking (meaning the developer thought "locks are expensive per se, let's leave each locked as long as possible"). So maybe the arguments in #454 aren't entirely valid.
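To illustrate the dummy HwBufferMan idea, here is a minimal standalone sketch of the two phases. It deliberately uses no OGRE types: `CpuVertexBuffer`, `GpuVertexBuffer`, `buildOnWorker`, `cloneToRenderSystem` and `loadMeshTwoPhase` are all hypothetical stand-ins, and whether Mesh::loadImpl() can actually be split this way is exactly the open question.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical CPU-only buffer, standing in for what a "dummy" buffer manager would create.
struct CpuVertexBuffer { std::vector<float> data; };

// Hypothetical render-system buffer; `uploaded` marks that the main thread created it.
struct GpuVertexBuffer { std::vector<float> data; bool uploaded; };

// Worker-thread part: fill vertex data without touching the render system.
CpuVertexBuffer buildOnWorker(std::size_t vertexCount) {
    CpuVertexBuffer buf;
    buf.data.assign(vertexCount * 3, 0.0f);  // xyz per vertex, zeroed placeholder data
    return buf;
}

// Main-thread part: the only step that would need the real render system.
GpuVertexBuffer cloneToRenderSystem(const CpuVertexBuffer& cpu) {
    return GpuVertexBuffer{cpu.data, true};
}

// Builds the CPU-side buffer on a worker thread, then clones it on the calling thread.
GpuVertexBuffer loadMeshTwoPhase(std::size_t vertexCount) {
    std::future<CpuVertexBuffer> pending =
        std::async(std::launch::async, buildOnWorker, vertexCount);
    return cloneToRenderSystem(pending.get());  // blocks here only for brevity
}
```

In a real application the main thread would poll the future once per frame instead of blocking on `get()`.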

EDIT: I'll also explore the possibility of using a custom loader to load the mesh the same way while parsing it on the go. It should save me the copy of the whole stream which Mesh::copyPreparedMeshFileData() currently makes.

@paroj
Member

paroj commented Sep 27, 2025

  • aren't you loading the mods in RoR into separate resource groups anyway?
  • I don't think loading the mesh and its material in parallel is that beneficial. The bottleneck should be texture loading (especially coming from PNG/JPG). You can just load the mesh normally and then load its material in bg.
  • the "dummy" HwBufMgr is called DefaultHardwareBufferManager and is used inside the LOD system like that. There is also Mesh::setHardwareBufferManager to force its use.

@ohlidalp
Contributor Author

ohlidalp commented Sep 27, 2025

• Yes, RoR loads every mod into a separate RG, so brute prepping all content would be bearable. I will explore the ManualLoader option first, though.
• I was under the impression that loading a mesh also synchronously loads the textures, but I was probably wrong. This should do the trick.
• Thanks for the info, I couldn't make it out from the source.

@ohlidalp
Contributor Author

ohlidalp commented Oct 7, 2025

I looked at ManualResourceLoader and I can't figure out how to use it to just prepare meshes. Resource::prepare() calls it and then sets LOADSTATE_PREPARED which is a lie in case of Mesh because mFreshFromDisk cannot be filled from the outside.

@paroj can you advise?

@paroj
Member

paroj commented Oct 7, 2025

you cannot just prepare meshes. The ManualResourceLoader API is made to mainly handle load.

This reverts the `Mesh::copyPreparedMeshFileData()` hack ~ commit cd1c47d. However, the early discovery logic is kept, the mesh file is simply loaded twice.

To double-check that load()-ing the mesh doesn't prepare any other resources, the reload test can be performed without the early discovery: the mesh simply gets load()-ed, then background prep of textures/skeletons is queued.
@ohlidalp
Contributor Author

ohlidalp commented Oct 8, 2025

I added a "Threaded early discovery" mode. To enable it, tick all the checkboxes in the top left (to be cleaned up).
This means the mesh is prepared via ResourceBackgroundQueue, then a custom task is queued to WorkQueue which:

  1. loads the mesh file again (I reverted the `Mesh::copyPreparedMeshFileData()` hack, so I need to load the data myself)
  2. discovers skeletons+textures using custom code
  3. and then queues all of them via ResourceBackgroundQueue.

I expected to see almost no FPS spike with this approach, as everything is both fetched to RAM and analyzed in the background; only the final mesh loading (which constructs the actual hardware buffers) is done in the foreground.
However, the results are basically identical to just loading the mesh in the foreground first and then queuing textures+skeletons via ResourceBackgroundQueue.
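The flow of the three steps can be modelled outside OGRE roughly as below. This is a sketch under stated assumptions, not the sample's code: `FakeDisk`, `PreparedResource`, `discoverDependencies`, `prepareInBackground` and `threadedEarlyDiscovery` are hypothetical names, `std::async` stands in for both WorkQueue and ResourceBackgroundQueue, and the "discovery" is a table lookup instead of real .mesh parsing.

```cpp
#include <cassert>
#include <future>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the archive: mesh name -> names of linked resources.
// In the real sample this information comes from parsing the .mesh file.
using FakeDisk = std::map<std::string, std::vector<std::string>>;

// Models a resource whose raw data was fetched to RAM (not a GPU-side load).
struct PreparedResource {
    std::string name;
    bool prepared;
};

// Steps 1+2: "load the file again" and discover linked skeletons/textures (here a lookup).
std::vector<std::string> discoverDependencies(const FakeDisk& disk, const std::string& mesh) {
    auto it = disk.find(mesh);
    return it == disk.end() ? std::vector<std::string>{} : it->second;
}

// Models one background prepare request.
std::future<PreparedResource> prepareInBackground(const std::string& name) {
    return std::async(std::launch::async, [name] { return PreparedResource{name, true}; });
}

// Runs the whole pipeline off the calling thread and returns what was prepared.
std::vector<PreparedResource> threadedEarlyDiscovery(const FakeDisk& disk, const std::string& mesh) {
    auto task = std::async(std::launch::async, [&disk, mesh] {
        std::vector<std::future<PreparedResource>> pending;
        pending.push_back(prepareInBackground(mesh));             // mesh prep itself
        for (const auto& dep : discoverDependencies(disk, mesh))  // steps 1+2
            pending.push_back(prepareInBackground(dep));          // step 3
        std::vector<PreparedResource> done;
        for (auto& f : pending) done.push_back(f.get());
        return done;
    });
    return task.get();  // the foreground would do the final load() after this point
}
```

Only the final load, which creates the hardware buffers, would remain on the main thread after `threadedEarlyDiscovery` returns.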

@paroj
Member

paroj commented Oct 9, 2025

maybe this is better suited as a VTest, as there is nothing visual here; it rather tests the workflow.
VTests are here:

PlayPen_CameraSetDirection::PlayPen_CameraSetDirection()

and are executed as part of our CI.

They generate this overview page:
https://ogrecave.github.io/ogre/vtests1.12/TestResults_GL.html

however, we do not validate the images but only fail if the test crashes
