Explore possible Gaussian Splat optimizations #1814

@j9liu

Feature

Our Gaussian splat implementation has worked fairly well with the growing scale of sample data that we're testing. However, we've recently tested a ~110 million splat dataset (2GB when zipped!) that, naturally, is pushing the system to its limits.

Of course, performance depends on the number of splats loaded in, because those get aggregated into the buffers passed to Niagara. Accumulating too many can result in a video memory budget error like so:

[Image: screenshot of the video memory budget error]

This sadly worsens when trying to render sequences, because Unreal wants to preload all the necessary tiles ahead of the render. Ultimately this overwhelms the GPU to the point where it hangs and disconnects, causing Unreal to crash:

[2026.03.20-20.47.21:121][694]LogD3D12RHI: Error: GPU crash detected:
	- Device 0 Removed: DXGI_ERROR_DEVICE_HUNG
[2026.03.20-20.47.21:121][694]LogD3D12RHI: Error: Shader diagnostic messages and asserts:
	Device: 0, Queue 3D:
		No shader diagnostics found for this queue.
	Device: 0, Queue Copy:
		No shader diagnostics found for this queue.
	Device: 0, Queue Compute:
		No shader diagnostics found for this queue.
[2026.03.20-20.47.21:124][694]LogRHI: Error: Active GPU breadcrumbs:

(omitted unhelpful shader info dump)

On the one hand, LODs are meant to mitigate the sheer number of splats being loaded, so increasing Maximum Screen Space Error is our go-to workaround.

On the other hand, this dataset doesn't crash in CesiumJS at the same SSE. Obviously JS is working with very different constraints and a different rendering environment, but it's worth understanding how it avoids that problem. At the very least, if our splat system had some sort of bail-out mechanism to prevent itself from taking the GPU hostage, that would be better than nothing.
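To make the "bail" idea a bit more concrete, here is a minimal sketch of a budget check that could run before tiles are aggregated into the buffers. Everything here is hypothetical: `PendingSplatTile`, `countTilesWithinBudget`, the bytes-per-splat figure, and the budget value are illustrative names and parameters, not existing Cesium for Unreal or Niagara APIs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-tile record; not an existing Cesium for Unreal type.
struct PendingSplatTile {
  std::uint64_t splatCount;
};

// Returns how many of the pending tiles (taken in order) can be added before
// the estimated buffer size would exceed the budget. Tiles past that point
// would be skipped for now instead of being uploaded.
// Assumption: memory cost is roughly splatCount * bytesPerSplat.
inline std::size_t countTilesWithinBudget(
    const std::vector<PendingSplatTile>& pending,
    std::uint64_t bytesAlreadyUsed,
    std::uint64_t bytesPerSplat,
    std::uint64_t budgetBytes) {
  std::uint64_t used = bytesAlreadyUsed;
  std::size_t accepted = 0;
  for (const PendingSplatTile& tile : pending) {
    const std::uint64_t cost = tile.splatCount * bytesPerSplat;
    // Stop as soon as the next tile would push us over the budget.
    if (cost > budgetBytes || used > budgetBytes - cost) {
      break;
    }
    used += cost;
    ++accepted;
  }
  return accepted;
}
```

The caller would upload only the first `accepted` tiles and leave the rest pending, so the worst case is lower visual fidelity rather than a hung GPU.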

Idea Dump

These are extremely half-baked (it's the end of a Friday), but I'm hoping that naming even some bad ideas can start a convo:

Throttling LOD updates

CesiumJS seems to do this (intentionally?) -- it takes a few seconds to update after the camera has moved. Ideally we want that time to be shorter, but preventing back-to-back uploads could be helpful.
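A rough sketch of what preventing back-to-back uploads could look like: a small gate that only allows an upload if a minimum interval has elapsed since the last accepted one. `UploadThrottle` and its interval are hypothetical names for illustration, and this is not a description of what CesiumJS actually does.

```cpp
#include <chrono>

// Hypothetical helper: allows at most one buffer upload per minimum interval.
class UploadThrottle {
public:
  using Clock = std::chrono::steady_clock;

  explicit UploadThrottle(Clock::duration minInterval)
      : _minInterval(minInterval) {}

  // Returns true (and records `now`) if enough time has passed since the
  // last accepted upload; otherwise returns false and changes nothing.
  bool tryAcquire(Clock::time_point now) {
    if (_hasLast && now - _last < _minInterval) {
      return false;
    }
    _last = now;
    _hasLast = true;
    return true;
  }

private:
  Clock::duration _minInterval;
  Clock::time_point _last{};
  bool _hasLast = false;
};
```

The tile selection could still update every frame; only the expensive re-upload would be gated by `tryAcquire(UploadThrottle::Clock::now())`. The interval is a tuning knob that trades responsiveness against upload pressure.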

More aggressive tile unloading

Not sure whether this would go on the Unreal side or in Cesium Native...

For voxels in #1685, we sort tiles by SSE to prioritize which ones should be kept alive in the megatexture. Tiles will be rotated in and out of available texture slots as needed to balance texture limits with visual fidelity.

This can't translate one-to-one to splats; every tile has a different splat count, so it's not as simple as rotating them in and out. But I'm wondering if we can still be more aggressive about limiting the resources that the splat system can take up, and find a heuristic to reasonably boot tiles. For example, if a tile has been hidden for a while but not unloaded (like a tile that is frustum culled), it may be worth unloading it if we keep looking at tiles in front of us that need to be loaded.
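One possible shape for such a heuristic, as a sketch only: cap the total splat count, and when over the cap, evict hidden tiles starting with the ones that have been hidden longest, breaking ties by a priority value. `LoadedSplatTile`, `chooseTilesToEvict`, `framesHidden`, and `priority` are hypothetical; in particular, how `priority` would be derived from SSE is an assumption left open here.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical bookkeeping for a tile whose splats are currently resident.
struct LoadedSplatTile {
  int id;
  std::uint64_t splatCount;
  double priority;            // assumed: higher means more worth keeping
  std::uint32_t framesHidden; // 0 means the tile is currently visible
};

// Returns the ids of tiles to evict so the total splat count drops to at most
// maxSplats. Only hidden tiles are considered; visible tiles are never evicted
// in this sketch, so the result may still leave the total over the cap.
inline std::vector<int> chooseTilesToEvict(
    std::vector<LoadedSplatTile> tiles,
    std::uint64_t maxSplats) {
  std::uint64_t total = 0;
  for (const LoadedSplatTile& tile : tiles) {
    total += tile.splatCount;
  }

  // Eviction order: hidden longest first, then lowest priority first.
  std::sort(
      tiles.begin(),
      tiles.end(),
      [](const LoadedSplatTile& a, const LoadedSplatTile& b) {
        if (a.framesHidden != b.framesHidden) {
          return a.framesHidden > b.framesHidden;
        }
        return a.priority < b.priority;
      });

  std::vector<int> evicted;
  for (const LoadedSplatTile& tile : tiles) {
    if (total <= maxSplats || tile.framesHidden == 0) {
      break;
    }
    total -= tile.splatCount;
    evicted.push_back(tile.id);
  }
  return evicted;
}
```

Because tiles have different splat counts, the cap is expressed in splats rather than slots, which is the main difference from rotating fixed-size entries in and out of a megatexture.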

Breaking up datasets into multiple Niagara systems?

This one is probably not a good idea -- I imagine it could potentially break sorting, and that's probably why we're only using one system in the first place.

Compute Shader?

Obviously this would take a lot of effort and a whole revamp of the Gaussian Splat system, but given that compute shaders tend to be more performant, it could be a distant future option.
