Skip to content

Conversation

@devshgraphicsprogramming
Copy link
Member

Description

Testing

TODO list:

Comment on lines +140 to +141
hlsl::float32_t3 cellfloatcoord = floor(vertex.getPosition() / m_cellSize - hlsl::float32_t3(0.5));
hlsl::uint32_t3 baseCoord = hlsl::uint32_t3(static_cast<uint32_t>(cellfloatcoord.x), static_cast<uint32_t>(cellfloatcoord.y), static_cast<uint32_t>(cellfloatcoord.z));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you might wanna summarize in comments and an ASCII diagram the logic I had to explain on discord

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i meant the justification for the 0.5, so far you only have the floor

public:
using WeldPredicateFn = std::function<bool(const ICPUPolygonGeometry* geom, uint64_t idx1, uint64_t idx2)>;

template <typename AccelStructureT>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there should be some concept you require here

for (uint64_t vertexData_i = 0u; vertexData_i < as.getVertexCount(); vertexData_i++)
{
const auto& vertexData = as.vertices()[vertexData_i];
vertexIndexToAsIndex[vertexData.index] = vertexData.index;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but this is the same as initializing your vertexIndexToAsIndex to IOTA !? but with gaps (you get 0 wherever there's no such index value in the acceleration structure)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be vertexIndexToAsIndex[vertexData.index] = vertexData_i.

// iterate by index, so that we always use the smallest index when multiple vertexes can be welded together
for (uint64_t index = 0; index < as.getVertexCount(); index++)
{
const auto asIndex = vertexIndexToAsIndex[index];
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why have that whole vector when you can just as.vertices()[index].index or even better for (const auto& asVertRef : as.vertices()) and const auto asIndex = asVertRef.index;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also the asIndex you get right now will be same as index due to how you initialized

Copy link
Member Author

@devshgraphicsprogramming devshgraphicsprogramming Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs a different construction https://github.com/Devsh-Graphics-Programming/Nabla/pull/941/files?diff=unified&w=1#r2466545963

  1. Grab the vertices 1 by 1
  2. decode the position view for the vertex
  3. make an overload of iterateBroadphaseCandidates takes a position and not a whole vertex data (because you only use the position from the vertex data and re-hash anyway - you only use the vertex data to make sure you don't tap yourself which is fine here - or the if (&vertex != &neighborVertex) check should be moved out into the iterations lambdas themselves? )
  4. in the iteration lambda you want to check that neighbour.index==selfIndex || neighborRemappedIndex != INVALID_INDEX && shouldWeldFn(polygon, index, neighbor.index)

Now any vertex thats not in the acceleration structure itself or has no equivalent will be set Invalid in the remap vector

when you later iterate over the index buffer to create a new one, if you encounter such an invalid remapping it means the vertex was never the in Acceleration Structure and therefore you can abort with an error

Comment on lines 34 to 36
// iterate by index, so that we always use the smallest index when multiple vertexes can be welded together
for (uint64_t index = 0; index < as.getVertexCount(); index++)
{
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you cannot assume that as.getVertexCount() == polygon->getIndexCount() because its only the Acceleration Structure left over from smooth normal computation that has 3 vertices per triangle and will happily insert duplicate verts.

you need to go from 0 to <=Max Index Value as reported by the polygon (the thing it gives to BLAS build triangle geometry data)

const auto asIndex = vertexIndexToAsIndex[index];
const auto& vertexData = as.vertices()[asIndex];
auto& remappedVertexIndex = remappedVertexIndexes[index];
as.iterateBroadphaseCandidates(vertexData, [&, polygon, index](const typename AccelStructureT::vertex_data_t& neighbor) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw & captures everything, no need to list

for (; bounds.begin != bounds.end; bounds.begin++)
{
const vertex_data_t& neighborVertex = *bounds.begin;
if (&vertex != &neighborVertex)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imho the "not checking against oneself" thing could be done in the fn if we change the iterateBroadphaseCandidates to take a position and not VertexData

auto& remappedVertexIndex = remappedVertexIndexes[index];
as.iterateBroadphaseCandidates(vertexData, [&, polygon, index](const typename AccelStructureT::vertex_data_t& neighbor) {
const auto neighborRemappedIndex = remappedVertexIndexes[neighbor.index];
if (shouldWeldFn(polygon, index, neighbor.index) && neighborRemappedIndex != INVALID_INDEX) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check the neighbour remap for being initialized first, remember && operator short-circuits in C++, I'd rather check that neighbour vertex hasn't been visited yet before starting to decode and compare all vertex attributes of two vertices

{
const auto asIndex = vertexIndexToAsIndex[index];
const auto& vertexData = as.vertices()[asIndex];
auto& remappedVertexIndex = remappedVertexIndexes[index];
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you went through vertices in order from 0 to max vertex reference, then you could initialize to INVALID here, and not std::fill(remappedVertexIndexes

as.iterateBroadphaseCandidates(vertexData, [&, polygon, index](const typename AccelStructureT::vertex_data_t& neighbor) {
const auto neighborRemappedIndex = remappedVertexIndexes[neighbor.index];
if (shouldWeldFn(polygon, index, neighbor.index) && neighborRemappedIndex != INVALID_INDEX) {
remappedVertexIndex = neighborRemappedIndex;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw because of funny cache and aliasing effects (tapping same array), you want to make the remappedVertexIndex a local variable and then assign its result to remappedVertexIndexes[index] after the iteration completes

Comment on lines +48 to +49
if (remappedVertexIndex != INVALID_INDEX) {
remappedVertexIndex = vertexData.index;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you setting it back?

Comment on lines 22 to 37
vector<float_t, 3> compInternalAngle(NBL_CONST_REF_ARG(vector<float_t, 3>) e1, NBL_CONST_REF_ARG(vector<float_t, 3>) e2, NBL_CONST_REF_ARG(vector<float_t, 3>) e3)
{
// Calculate this triangle's weight for each of its three m_vertices
// start by calculating the lengths of its sides
const float_t a = dot(e1, e1);
const float_t asqrt = sqrt(a);
const float_t b = dot(e2, e2);
const float_t bsqrt = sqrt(b);
const float_t c = dot(e3, e3);
const float_t csqrt = sqrt(c);
const float_t a = hlsl::dot(e1, e1);
const float_t asqrt = hlsl::sqrt(a);
const float_t b = hlsl::dot(e2, e2);
const float_t bsqrt = hlsl::sqrt(b);
const float_t c = hlsl::dot(e3, e3);
const float_t csqrt = hlsl::sqrt(c);

const float_t angle1 = hlsl::acos((b + c - a) / (2.f * bsqrt * csqrt));
const float_t angle2 = hlsl::acos((-b + c + a) / (2.f * asqrt * csqrt));
const float_t angle3 = hlsl::numbers::pi<float_t> - (angle1 + angle2);
// use them to find the angle at each vertex
return vector<float_t, 3>(
acosf((b + c - a) / (2.f * bsqrt * csqrt)),
acosf((-b + c + a) / (2.f * asqrt * csqrt)),
acosf((b - c + a) / (2.f * bsqrt * asqrt)));
return vector<float_t, 3>(angle1, angle2, angle3);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0-index your edges and angles

btw normal convention in a triangle should be

v2 -- e1 --- v1
|                /
|             /
e2      e0
|       /
|    /
v0 

or

v2 -- e0 --- v1
|                /
|             /
e1      e2
|       /
|    /
v0 

because of CCW winding being natural (right hand rule and boundary integrals), the edge being a diff of subsequent triangles

Make sure to document whether you're using the e_i = v_{i+1}-v_{i} or e_i = v_{i+2}-v_{i+1} convention (the addition is always modulo 3)

P.S. it would be nice to be consistent with manual barycentric computation if possible, or denote the manual barycentric comp needs to change

Comment on lines +87 to +88
for (histogram_t i = 0u; i < rangeSize; i++)
++m_histogram[comp.template operator()<shift,radix_mask>(input[i])];
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterate from back to front, it doesn't matter the order in which you increase the histogram, but doing one pass in reverse over input then the scatter pass forward means you don't skip from front to back or back to front between end and start of previous loop so no cache miss

Comment on lines 29 to 36
CVertexHashGrid(size_t _vertexCount, uint32_t _hashTableMaxSize, float _cellSize) :
m_sorter(createSorter(_vertexCount)),
m_hashTableMaxSize(_hashTableMaxSize),
m_cellSize(_cellSize)
CVertexHashGrid(size_t cellSize, uint32_t hashTableMaxSizeLog2, float vertexCountReserve) :
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is cellSize now a integer and vertex count a float !?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a default value for vertexCountReserve like 8k

Comment on lines 45 to 51
void add(VertexData&& vertex)
{
vertex.setHash(hash(vertex));
m_vertices.push_back(vertex);
m_vertices.push_back(std::move(vertex));
}

void validate()
void bake()
Copy link
Member Author

@devshgraphicsprogramming devshgraphicsprogramming Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw all your functions defined in include/nbl/PUBLIC headers need the inline keyword.

(many people will tell you inline does nothing for the compiler and decides whether inline or deduplicate by tiself, but it does matter acrosss DLL boundaries - this is why for src/PRIVATE headers it doesn't matter)

If a class has functions defined in .cpp it needs class NBL_API2

This is all to account for DLL linking.

struct KeyAccessor
{
_NBL_STATIC_INLINE_CONSTEXPR size_t key_bit_count = 32ull;
constexpr static size_t key_bit_count = 32ull;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inline is missing, its 3 keywords

Comment on lines +28 to +83
hlsl::uint64_t4 val1, val2;
view.decodeElement<hlsl::uint64_t4>(index1, val1);
view.decodeElement<hlsl::uint64_t4>(index2, val2);
for (auto channel_i = 0u; channel_i < channelCount; channel_i++)
if (val1[channel_i] != val2[channel_i]) return false;
break;
}
case IGeometryBase::EAABBFormat::S64:
case IGeometryBase::EAABBFormat::S32:
{
hlsl::int64_t4 val1, val2;
view.decodeElement<hlsl::int64_t4>(index1, val1);
view.decodeElement<hlsl::int64_t4>(index2, val2);
for (auto channel_i = 0u; channel_i < channelCount; channel_i++)
if (val1[channel_i] != val2[channel_i]) return false;
break;
}
default:
{
// TODO: Handle 16,32,64 bit float vectors once the pixel encode/decode functions get reimplemented in HLSL and decodeElement can actually benefit from that.
hlsl::float64_t4 val1, val2;
view.decodeElement<hlsl::float64_t4>(index1, val1);
view.decodeElement<hlsl::float64_t4>(index2, val2);
for (auto channel_i = 0u; channel_i < channelCount; channel_i++)
{
const auto diff = abs(val1[channel_i] - val2[channel_i]);
if (diff > epsilon) return false;
}
break;
}
}
return true;
}

static bool isAttributeDirEqual(const ICPUPolygonGeometry::SDataView& view, uint32_t index1, uint32_t index2, float epsilon)
{
if (!view) return true;
const auto channelCount = getFormatChannelCount(view.composed.format);
switch (view.composed.rangeFormat)
{
case IGeometryBase::EAABBFormat::U64:
case IGeometryBase::EAABBFormat::U32:
{
hlsl::uint64_t4 val1, val2;
view.decodeElement<hlsl::uint64_t4>(index1, val1);
view.decodeElement<hlsl::uint64_t4>(index2, val2);
return (1.0 - hlsl::dot(val1, val2)) < epsilon;
}
case IGeometryBase::EAABBFormat::S64:
case IGeometryBase::EAABBFormat::S32:
{
hlsl::int64_t4 val1, val2;
view.decodeElement<hlsl::int64_t4>(index1, val1);
view.decodeElement<hlsl::int64_t4>(index2, val2);
return (1.0 - hlsl::dot(val1, val2)) < epsilon;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for formats which don't decode to floats you need to use memcmp

There's a getTexeBlockSize(E_FORMAT) function
#941 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class CVertexWelder {

public:
using WeldPredicateFn = std::function<bool(const ICPUPolygonGeometry* geom, uint32_t idx1, uint32_t idx2)>;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at this point it can be a pure virtual interface class with pure virtual bool operator()(const ICPUPolygonGeometry* geom, uint32_t idx1, uint32_t idx2) and bool init(const ICPUPolygonGeometry* geom), why?

Because the init needs to check validity of the polygon geometry, e.g. your default predicate should fail the initialization if there are any attribute views which are not formatted (format is unknown) because it doesn't know what data in the buffer belongs to what vertex (because there's no clear mapping)

See #921 (comment)

Comment on lines +122 to +123
auto outPolygon = core::move_and_static_cast<ICPUPolygonGeometry>(polygon->clone(0u));
outPolygon->setIndexing(IPolygonGeometryBase::TriangleList());
Copy link
Member Author

@devshgraphicsprogramming devshgraphicsprogramming Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can actually keep the same indexing #921 (comment)

Also for the weld you can use any polygon kind, not just triangle.

Because you're just substituting index values in original index buffer (or implicit IOTA).

});
if (remappedVertexIndex != INVALID_INDEX) {
remappedVertexIndex = vertexData.index;
maxRemappedIndex = vertexData.index;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, not the max of the original vertex index, you want the max you FOUND so if (maxRemappedIndex<remappedVertexIndex) maxRemappedIndex = remappedVertexIndex

Comment on lines +188 to +191
if (indexView.composed.rangeFormat == IGeometryBase::EAABBFormat::U16) {
remappedIndexes.template operator()<uint16_t>();
}
else if (indexView.composed.rangeFormat == IGeometryBase::EAABBFormat::U32) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats not reliable, you really want to deduce from maxRemappedIndex value

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +161 to +174
auto remappedIndexView = [&]
{
const auto bytesize = indexView.src.size;
auto indices = ICPUBuffer::create({bytesize,IBuffer::EUF_INDEX_BUFFER_BIT});

auto retval = indexView;
retval.src.buffer = std::move(indices);
if (retval.composed.rangeFormat == IGeometryBase::EAABBFormat::U16)
retval.composed.encodedDataRange.u16.maxVx[0] = maxRemappedIndex;
else if (retval.composed.rangeFormat == IGeometryBase::EAABBFormat::U32)
retval.composed.encodedDataRange.u32.maxVx[0] = maxRemappedIndex;

return retval;
}();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create it inside remapIndices lambda below

{
auto remappedIndexView = [&]
{
const auto bytesize = indexView.src.size;
Copy link
Member Author

@devshgraphicsprogramming devshgraphicsprogramming Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compute it properly from getIndexCount() and sizeof(IndexT)

Comment on lines +169 to +171
retval.composed.encodedDataRange.u16.maxVx[0] = maxRemappedIndex;
else if (retval.composed.rangeFormat == IGeometryBase::EAABBFormat::U32)
retval.composed.encodedDataRange.u32.maxVx[0] = maxRemappedIndex;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if https://github.com/Devsh-Graphics-Programming/Nabla/pull/941/files#r2472746800 gets solved, then this is correct, otherwise its the wrong max

outPolygon->setIndexView(std::move(remappedIndexView));
} else
{
const uint32_t indexSize = (outPolygon->getPositionView().getElementCount() - 1 < std::numeric_limits<uint16_t>::max()) ? sizeof(uint16_t) : sizeof(uint32_t);
Copy link
Member Author

@devshgraphicsprogramming devshgraphicsprogramming Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong, use maxRemappedIndex

Why?

Because my iota buffer of 0,1,2,3,..,N might have later vertices which are identical, and the actual values in the remappedVertexIndices are much lower than N

Btw this choice of index size, creation of new index buffer and view is common to both branches.
( actually the only thing that changes is what lambda to fire off to fill the index index buffer with data)

Comment on lines +201 to +209
.composed = {
.stride = indexSize,
},
.src = {
.offset = 0,
.size = remappedIndexBuffer->getSize(),
.buffer = std::move(remappedIndexBuffer)
}
};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct format, range format and range needs to be set

Comment on lines +211 to +217
auto fillRemappedIndex = [&]<typename IndexT>(){
auto remappedIndexBufferPtr = reinterpret_cast<IndexT*>(remappedIndexBuffer->getPointer());
for (uint64_t index = 0; index < outPolygon->getPositionView().getElementCount(); index++)
{
remappedIndexBufferPtr[index] = remappedVertexIndexes[index];
}
};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outPolygon->getPositionView().getElementCount() is the vertex count, not the index count, so this loop is wrong

I you can just use std::copy_n(remappedVertexIndices.begin(),polygon->getVertexReferenceCount(),reinterpret_cast<IndexT*>(outIndexBufferPtr));

in the two if-cases below.

Comment on lines +219 to +222
if (indexView.composed.rangeFormat == IGeometryBase::EAABBFormat::U16) {
fillRemappedIndex.template operator()<uint16_t>();
}
else if (indexView.composed.rangeFormat == IGeometryBase::EAABBFormat::U32) {
Copy link
Member Author

@devshgraphicsprogramming devshgraphicsprogramming Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, switch based on maxRemappedIndex

or set up the remappedIndexView.composed.rangeFormat from maxRemappedIndex properly first

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants