Smooth normal calculation Update! #941

devshgraphicsprogramming · 2025-10-21T06:20:18Z

Description

Testing

TODO list:

…eldVertices into CVertexWelder

include/nbl/builtin/hlsl/shapes/triangle.hlsl

include/nbl/core/algorithm/radix_sort.h

include/nbl/asset/utils/CVertexHashGrid.h

include/nbl/asset/utils/CPolygonGeometryManipulator.h

include/nbl/core/algorithm/radix_sort.h

include/nbl/asset/utils/CVertexHashGrid.h

devshgraphicsprogramming · 2025-10-25T16:17:18Z

include/nbl/asset/utils/CVertexHashGrid.h

+		hlsl::float32_t3 cellfloatcoord = floor(vertex.getPosition() / m_cellSize - hlsl::float32_t3(0.5));
+		hlsl::uint32_t3 baseCoord = hlsl::uint32_t3(static_cast<uint32_t>(cellfloatcoord.x), static_cast<uint32_t>(cellfloatcoord.y), static_cast<uint32_t>(cellfloatcoord.z));


you might wanna summarize in comments and an ASCII diagram the logic I had to explain on discord

i meant the justification for the 0.5, so far you only have the floor

include/nbl/asset/utils/CVertexWelder.h

devshgraphicsprogramming · 2025-10-27T17:30:22Z

include/nbl/asset/utils/CVertexWelder.h

+public:
+  using WeldPredicateFn = std::function<bool(const ICPUPolygonGeometry* geom, uint64_t idx1, uint64_t idx2)>;
+
+  template <typename AccelStructureT>


there should be some concept you require here

devshgraphicsprogramming · 2025-10-27T17:35:13Z

include/nbl/asset/utils/CVertexWelder.h

+    for (uint64_t vertexData_i = 0u; vertexData_i < as.getVertexCount(); vertexData_i++)
+    {
+      const auto& vertexData = as.vertices()[vertexData_i];
+      vertexIndexToAsIndex[vertexData.index] = vertexData.index;


but this is the same as initializing your vertexIndexToAsIndex to IOTA !? but with gaps (you get 0 wherever there's no such index value in the acceleration structure)

it should be vertexIndexToAsIndex[vertexData.index] = vertexData_i.

devshgraphicsprogramming · 2025-10-27T17:37:48Z

include/nbl/asset/utils/CVertexWelder.h

+    // iterate by index, so that we always use the smallest index when multiple vertexes can be welded together
+    for (uint64_t index = 0; index < as.getVertexCount(); index++)
+    {
+      const auto asIndex = vertexIndexToAsIndex[index];


why have that whole vector when you can just as.vertices()[index].index or even better for (const auto& asVertRef : as.vertices()) and const auto asIndex = asVertRef.index;

also the asIndex you get right now will be same as index due to how you initialized

this needs a different construction https://github.com/Devsh-Graphics-Programming/Nabla/pull/941/files?diff=unified&w=1#r2466545963

Grab the vertices 1 by 1

decode the position view for the vertex

make an overload of iterateBroadphaseCandidates takes a position and not a whole vertex data (because you only use the position from the vertex data and re-hash anyway - you only use the vertex data to make sure you don't tap yourself which is fine here - or the if (&vertex != &neighborVertex) check should be moved out into the iterations lambdas themselves? )

in the iteration lambda you want to check that neighbour.index==selfIndex || neighborRemappedIndex != INVALID_INDEX && shouldWeldFn(polygon, index, neighbor.index)

Now any vertex thats not in the acceleration structure itself or has no equivalent will be set Invalid in the remap vector

when you later iterate over the index buffer to create a new one, if you encounter such an invalid remapping it means the vertex was never the in Acceleration Structure and therefore you can abort with an error

devshgraphicsprogramming · 2025-10-27T17:45:17Z

include/nbl/asset/utils/CVertexWelder.h

+    // iterate by index, so that we always use the smallest index when multiple vertexes can be welded together
+    for (uint64_t index = 0; index < as.getVertexCount(); index++)
+    {


you cannot assume that as.getVertexCount() == polygon->getIndexCount() because its only the Acceleration Structure left over from smooth normal computation that has 3 vertices per triangle and will happily insert duplicate verts.

you need to go from 0 to <=Max Index Value as reported by the polygon (the thing it gives to BLAS build triangle geometry data)

devshgraphicsprogramming · 2025-10-27T17:46:15Z

include/nbl/asset/utils/CVertexWelder.h

+      const auto asIndex = vertexIndexToAsIndex[index];
+      const auto& vertexData = as.vertices()[asIndex];
+      auto& remappedVertexIndex = remappedVertexIndexes[index];
+      as.iterateBroadphaseCandidates(vertexData, [&, polygon, index](const typename AccelStructureT::vertex_data_t& neighbor) {


btw & captures everything, no need to list

devshgraphicsprogramming · 2025-10-27T18:00:39Z

include/nbl/asset/utils/CVertexHashGrid.h

+			for (; bounds.begin != bounds.end; bounds.begin++)
+			{
+				const vertex_data_t& neighborVertex = *bounds.begin;
+				if (&vertex != &neighborVertex)


imho the "not checking against oneself" thing could be done in the fn if we change the iterateBroadphaseCandidates to take a position and not VertexData

devshgraphicsprogramming · 2025-10-27T18:01:25Z

include/nbl/asset/utils/CVertexWelder.h

+      auto& remappedVertexIndex = remappedVertexIndexes[index];
+      as.iterateBroadphaseCandidates(vertexData, [&, polygon, index](const typename AccelStructureT::vertex_data_t& neighbor) {
+        const auto neighborRemappedIndex = remappedVertexIndexes[neighbor.index];
+        if (shouldWeldFn(polygon, index, neighbor.index) && neighborRemappedIndex != INVALID_INDEX) {


check the neighbour remap for being initialized first, remember && operator short-circuits in C++, I'd rather check that neighbour vertex hasn't been visited yet before starting to decode and compare all vertex attributes of two vertices

devshgraphicsprogramming · 2025-10-27T18:02:24Z

include/nbl/asset/utils/CVertexWelder.h

+    {
+      const auto asIndex = vertexIndexToAsIndex[index];
+      const auto& vertexData = as.vertices()[asIndex];
+      auto& remappedVertexIndex = remappedVertexIndexes[index];


if you went through vertices in order from 0 to max vertex reference, then you could initialize to INVALID here, and not std::fill(remappedVertexIndexes

devshgraphicsprogramming · 2025-10-27T18:05:00Z

include/nbl/asset/utils/CVertexWelder.h

+      as.iterateBroadphaseCandidates(vertexData, [&, polygon, index](const typename AccelStructureT::vertex_data_t& neighbor) {
+        const auto neighborRemappedIndex = remappedVertexIndexes[neighbor.index];
+        if (shouldWeldFn(polygon, index, neighbor.index) && neighborRemappedIndex != INVALID_INDEX) {
+          remappedVertexIndex = neighborRemappedIndex;


btw because of funny cache and aliasing effects (tapping same array), you want to make the remappedVertexIndex a local variable and then assign its result to remappedVertexIndexes[index] after the iteration completes

devshgraphicsprogramming · 2025-10-27T18:07:01Z

include/nbl/asset/utils/CVertexWelder.h

+      if (remappedVertexIndex != INVALID_INDEX) {
+        remappedVertexIndex = vertexData.index;


why are you setting it back?

devshgraphicsprogramming · 2025-10-29T08:34:51Z

include/nbl/builtin/hlsl/shapes/triangle.hlsl

+  vector<float_t, 3> compInternalAngle(NBL_CONST_REF_ARG(vector<float_t, 3>) e1, NBL_CONST_REF_ARG(vector<float_t, 3>) e2, NBL_CONST_REF_ARG(vector<float_t, 3>) e3)
  {
    // Calculate this triangle's weight for each of its three m_vertices
    // start by calculating the lengths of its sides
-    const float_t a = dot(e1, e1);
-    const float_t asqrt = sqrt(a);
-    const float_t b = dot(e2, e2);
-    const float_t bsqrt = sqrt(b);
-    const float_t c = dot(e3, e3);
-    const float_t csqrt = sqrt(c);
+    const float_t a = hlsl::dot(e1, e1);
+    const float_t asqrt = hlsl::sqrt(a);
+    const float_t b = hlsl::dot(e2, e2);
+    const float_t bsqrt = hlsl::sqrt(b);
+    const float_t c = hlsl::dot(e3, e3);
+    const float_t csqrt = hlsl::sqrt(c);

+    const float_t angle1 = hlsl::acos((b + c - a) / (2.f * bsqrt * csqrt));
+    const float_t angle2 = hlsl::acos((-b + c + a) / (2.f * asqrt * csqrt));
+    const float_t angle3 = hlsl::numbers::pi<float_t> - (angle1 + angle2);
    // use them to find the angle at each vertex
-    return vector<float_t, 3>(
-      acosf((b + c - a) / (2.f * bsqrt * csqrt)),
-      acosf((-b + c + a) / (2.f * asqrt * csqrt)),
-      acosf((b - c + a) / (2.f * bsqrt * asqrt)));
+    return vector<float_t, 3>(angle1, angle2, angle3);


0-index your edges and angles

btw normal convention in a triangle should be

v2 -- e1 --- v1 | / | / e2 e0 | / | / v0

or

v2 -- e0 --- v1 | / | / e1 e2 | / | / v0

because of CCW winding being natural (right hand rule and boundary integrals), the edge being a diff of subsequent triangles

Make sure to document whether you're using the e_i = v_{i+1}-v_{i} or e_i = v_{i+2}-v_{i+1} convention (the addition is always modulo 3)

P.S. it would be nice to be consistent with manual barycentric computation if possible, or denote the manual barycentric comp needs to change

devshgraphicsprogramming · 2025-10-29T08:54:09Z

include/nbl/core/algorithm/radix_sort.h

+			for (histogram_t i = 0u; i < rangeSize; i++)
+				++m_histogram[comp.template operator()<shift,radix_mask>(input[i])];


iterate from back to front, it doesn't matter the order in which you increase the histogram, but doing one pass in reverse over input then the scatter pass forward means you don't skip from front to back or back to front between end and start of previous loop so no cache miss

devshgraphicsprogramming · 2025-10-29T08:59:21Z

include/nbl/asset/utils/CVertexHashGrid.h

-	CVertexHashGrid(size_t _vertexCount, uint32_t _hashTableMaxSize, float _cellSize) :
-		m_sorter(createSorter(_vertexCount)),
-		m_hashTableMaxSize(_hashTableMaxSize),
-		m_cellSize(_cellSize)
+	CVertexHashGrid(size_t cellSize, uint32_t hashTableMaxSizeLog2, float vertexCountReserve) :


why is cellSize now a integer and vertex count a float !?

add a default value for vertexCountReserve like 8k

devshgraphicsprogramming · 2025-10-29T09:05:13Z

include/nbl/asset/utils/CVertexHashGrid.h

 	void add(VertexData&& vertex)
 	{
 		vertex.setHash(hash(vertex));
-		m_vertices.push_back(vertex);
+		m_vertices.push_back(std::move(vertex));
 	}

-	void validate()
+	void bake()


btw all your functions defined in include/nbl/PUBLIC headers need the inline keyword.

(many people will tell you inline does nothing for the compiler and decides whether inline or deduplicate by tiself, but it does matter acrosss DLL boundaries - this is why for src/PRIVATE headers it doesn't matter)

If a class has functions defined in .cpp it needs class NBL_API2

This is all to account for DLL linking.

devshgraphicsprogramming · 2025-10-29T11:28:34Z

include/nbl/asset/utils/CVertexHashGrid.h

 	struct KeyAccessor
 	{
-		_NBL_STATIC_INLINE_CONSTEXPR size_t key_bit_count = 32ull;
+		constexpr static size_t key_bit_count = 32ull;


inline is missing, its 3 keywords

devshgraphicsprogramming · 2025-10-29T11:48:41Z

include/nbl/asset/utils/CVertexWelder.h

+            hlsl::uint64_t4 val1, val2;
+            view.decodeElement<hlsl::uint64_t4>(index1, val1);
+            view.decodeElement<hlsl::uint64_t4>(index2, val2);
+            for (auto channel_i = 0u; channel_i < channelCount; channel_i++)
+              if (val1[channel_i] != val2[channel_i]) return false;
+            break;
+          }
+          case IGeometryBase::EAABBFormat::S64:
+          case IGeometryBase::EAABBFormat::S32:
+          {
+            hlsl::int64_t4 val1, val2;
+            view.decodeElement<hlsl::int64_t4>(index1, val1);
+            view.decodeElement<hlsl::int64_t4>(index2, val2);
+            for (auto channel_i = 0u; channel_i < channelCount; channel_i++)
+              if (val1[channel_i] != val2[channel_i]) return false;
+            break;
+          }
+          default:
+          {
+            // TODO: Handle 16,32,64 bit float vectors once the pixel encode/decode functions get reimplemented in HLSL and decodeElement can actually benefit from that.
+            hlsl::float64_t4 val1, val2;
+            view.decodeElement<hlsl::float64_t4>(index1, val1);
+            view.decodeElement<hlsl::float64_t4>(index2, val2);
+            for (auto channel_i = 0u; channel_i < channelCount; channel_i++)
+            {
+              const auto diff = abs(val1[channel_i] - val2[channel_i]);
+              if (diff > epsilon) return false;
+            }
+            break;
+          }
+        }
+        return true;
+      }
+
+      static bool isAttributeDirEqual(const ICPUPolygonGeometry::SDataView& view, uint32_t index1, uint32_t index2, float epsilon)
+      {
+        if (!view) return true;
+        const auto channelCount = getFormatChannelCount(view.composed.format);
+        switch (view.composed.rangeFormat)
+        {
+          case IGeometryBase::EAABBFormat::U64:
+          case IGeometryBase::EAABBFormat::U32:
+          {
+            hlsl::uint64_t4 val1, val2;
+            view.decodeElement<hlsl::uint64_t4>(index1, val1);
+            view.decodeElement<hlsl::uint64_t4>(index2, val2);
+            return (1.0 - hlsl::dot(val1, val2)) < epsilon;
+          }
+          case IGeometryBase::EAABBFormat::S64:
+          case IGeometryBase::EAABBFormat::S32:
+          {
+            hlsl::int64_t4 val1, val2;
+            view.decodeElement<hlsl::int64_t4>(index1, val1);
+            view.decodeElement<hlsl::int64_t4>(index2, val2);
+            return (1.0 - hlsl::dot(val1, val2)) < epsilon;
+          }


for formats which don't decode to floats you need to use memcmp

There's a getTexeBlockSize(E_FORMAT) function
#941 (comment)

letf overs:

Smooth normal calculation Update! #941 (comment)

Smooth normal calculation Update! #941 (comment)

devshgraphicsprogramming · 2025-10-29T12:01:33Z

include/nbl/asset/utils/CVertexWelder.h

+class CVertexWelder {
+
+public:
+  using WeldPredicateFn = std::function<bool(const ICPUPolygonGeometry* geom, uint32_t idx1, uint32_t idx2)>;


at this point it can be a pure virtual interface class with pure virtual bool operator()(const ICPUPolygonGeometry* geom, uint32_t idx1, uint32_t idx2) and bool init(const ICPUPolygonGeometry* geom), why?

Because the init needs to check validity of the polygon geometry, e.g. your default predicate should fail the initialization if there are any attribute views which are not formatted (format is unknown) because it doesn't know what data in the buffer belongs to what vertex (because there's no clear mapping)

See #921 (comment)

devshgraphicsprogramming · 2025-10-29T12:03:01Z

include/nbl/asset/utils/CVertexWelder.h

+    auto outPolygon = core::move_and_static_cast<ICPUPolygonGeometry>(polygon->clone(0u));
+    outPolygon->setIndexing(IPolygonGeometryBase::TriangleList());


you can actually keep the same indexing #921 (comment)

Also for the weld you can use any polygon kind, not just triangle.

Because you're just substituting index values in original index buffer (or implicit IOTA).

devshgraphicsprogramming · 2025-10-29T12:06:36Z

include/nbl/asset/utils/CVertexWelder.h

+      });
+      if (remappedVertexIndex != INVALID_INDEX) {
+        remappedVertexIndex = vertexData.index;
+        maxRemappedIndex = vertexData.index;


no, not the max of the original vertex index, you want the max you FOUND so if (maxRemappedIndex<remappedVertexIndex) maxRemappedIndex = remappedVertexIndex

devshgraphicsprogramming · 2025-10-29T12:07:01Z

include/nbl/asset/utils/CVertexWelder.h

+      if (indexView.composed.rangeFormat == IGeometryBase::EAABBFormat::U16) {
+        remappedIndexes.template operator()<uint16_t>();
+      }
+      else if (indexView.composed.rangeFormat == IGeometryBase::EAABBFormat::U32) {


thats not reliable, you really want to deduce from maxRemappedIndex value

#921 (comment)

devshgraphicsprogramming · 2025-10-29T12:07:35Z

include/nbl/asset/utils/CVertexWelder.h

+      auto remappedIndexView = [&]
+      {
+        const auto bytesize = indexView.src.size;
+        auto indices = ICPUBuffer::create({bytesize,IBuffer::EUF_INDEX_BUFFER_BIT});
+
+        auto retval = indexView;
+        retval.src.buffer = std::move(indices);
+        if (retval.composed.rangeFormat == IGeometryBase::EAABBFormat::U16)
+          retval.composed.encodedDataRange.u16.maxVx[0] = maxRemappedIndex;
+        else if (retval.composed.rangeFormat == IGeometryBase::EAABBFormat::U32)
+          retval.composed.encodedDataRange.u32.maxVx[0] = maxRemappedIndex;
+
+        return retval;
+      }();


create it inside remapIndices lambda below

devshgraphicsprogramming · 2025-10-29T12:08:18Z

include/nbl/asset/utils/CVertexWelder.h

+    {
+      auto remappedIndexView = [&]
+      {
+        const auto bytesize = indexView.src.size;


compute it properly from getIndexCount() and sizeof(IndexT)

devshgraphicsprogramming · 2025-10-29T12:09:20Z

include/nbl/asset/utils/CVertexWelder.h

+          retval.composed.encodedDataRange.u16.maxVx[0] = maxRemappedIndex;
+        else if (retval.composed.rangeFormat == IGeometryBase::EAABBFormat::U32)
+          retval.composed.encodedDataRange.u32.maxVx[0] = maxRemappedIndex;


if https://github.com/Devsh-Graphics-Programming/Nabla/pull/941/files#r2472746800 gets solved, then this is correct, otherwise its the wrong max

devshgraphicsprogramming · 2025-10-29T12:24:31Z

include/nbl/asset/utils/CVertexWelder.h

+      outPolygon->setIndexView(std::move(remappedIndexView));
+    } else
+    {
+      const uint32_t indexSize = (outPolygon->getPositionView().getElementCount() - 1 < std::numeric_limits<uint16_t>::max()) ? sizeof(uint16_t) : sizeof(uint32_t);


wrong, use maxRemappedIndex

Why?

Because my iota buffer of 0,1,2,3,..,N might have later vertices which are identical, and the actual values in the remappedVertexIndices are much lower than N

Btw this choice of index size, creation of new index buffer and view is common to both branches.
( actually the only thing that changes is what lambda to fire off to fill the index index buffer with data)

devshgraphicsprogramming · 2025-10-29T12:25:04Z

include/nbl/asset/utils/CVertexWelder.h

+        .composed = {
+          .stride = indexSize,
+        },
+        .src = {
+          .offset = 0,
+          .size = remappedIndexBuffer->getSize(),
+          .buffer = std::move(remappedIndexBuffer)
+        }
+      };


correct format, range format and range needs to be set

devshgraphicsprogramming · 2025-10-29T12:27:07Z

include/nbl/asset/utils/CVertexWelder.h

+      auto fillRemappedIndex = [&]<typename IndexT>(){
+        auto remappedIndexBufferPtr = reinterpret_cast<IndexT*>(remappedIndexBuffer->getPointer());
+        for (uint64_t index = 0; index < outPolygon->getPositionView().getElementCount(); index++)
+        {
+          remappedIndexBufferPtr[index] = remappedVertexIndexes[index];
+        }
+      };


outPolygon->getPositionView().getElementCount() is the vertex count, not the index count, so this loop is wrong

I you can just use std::copy_n(remappedVertexIndices.begin(),polygon->getVertexReferenceCount(),reinterpret_cast<IndexT*>(outIndexBufferPtr));

in the two if-cases below.

devshgraphicsprogramming · 2025-10-29T12:27:38Z

include/nbl/asset/utils/CVertexWelder.h

+      if (indexView.composed.rangeFormat == IGeometryBase::EAABBFormat::U16) {
+        fillRemappedIndex.template operator()<uint16_t>();
+      }
+      else if (indexView.composed.rangeFormat == IGeometryBase::EAABBFormat::U32) {


again, switch based on maxRemappedIndex

or set up the remappedIndexView.composed.rangeFormat from maxRemappedIndex properly first

kevyuu added 11 commits September 18, 2025 13:31

Remove wage and store weighted normal instead

efa37a7

Move getAngleWeight to hlsl/shapes/triangle.hlsl

cc4a219

Refactor erase to resize

c018762

Remove parentTriangleFaceNormal from SSNGVertexData

42ccf48

Refactor getNeighboringCells

05ee8ed

Slight improvement for readability

f4b8ac6

Refactor Radix Sorter and use histogram as skip list

0dd02e8

Move VertexHashMap into its own file

ce99287

Refactor smooth normal generator to use VertexHashGrid and separate w…

28f73f5

…eldVertices into CVertexWelder

Fix normal comparison to use dot instead of value by value comparison

5acb40d

Merge branch 'master' into smooth_normal_calculation

0feb4e9

devshgraphicsprogramming commented Oct 21, 2025

View reviewed changes

include/nbl/builtin/hlsl/shapes/triangle.hlsl Outdated Show resolved Hide resolved

devshgraphicsprogramming commented Oct 21, 2025

View reviewed changes

include/nbl/builtin/hlsl/shapes/triangle.hlsl Outdated Show resolved Hide resolved

devshgraphicsprogramming commented Oct 21, 2025

View reviewed changes

include/nbl/builtin/hlsl/shapes/triangle.hlsl Outdated Show resolved Hide resolved

devshgraphicsprogramming commented Oct 21, 2025

View reviewed changes

include/nbl/core/algorithm/radix_sort.h Outdated Show resolved Hide resolved

devshgraphicsprogramming commented Oct 22, 2025

View reviewed changes

include/nbl/asset/utils/CVertexHashGrid.h Show resolved Hide resolved

devshgraphicsprogramming commented Oct 22, 2025

View reviewed changes

include/nbl/asset/utils/CPolygonGeometryManipulator.h Outdated Show resolved Hide resolved

devshgraphicsprogramming commented Oct 25, 2025

View reviewed changes

include/nbl/core/algorithm/radix_sort.h Outdated Show resolved Hide resolved

devshgraphicsprogramming commented Oct 25, 2025

View reviewed changes

include/nbl/core/algorithm/radix_sort.h Outdated Show resolved Hide resolved

devshgraphicsprogramming commented Oct 25, 2025

View reviewed changes

include/nbl/asset/utils/CVertexHashGrid.h Outdated Show resolved Hide resolved