-
Couldn't load subscription status.
- Fork 67
Smooth normal calculation Update! #941
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…eldVertices into CVertexWelder
| hlsl::float32_t3 cellfloatcoord = floor(vertex.getPosition() / m_cellSize - hlsl::float32_t3(0.5)); | ||
| hlsl::uint32_t3 baseCoord = hlsl::uint32_t3(static_cast<uint32_t>(cellfloatcoord.x), static_cast<uint32_t>(cellfloatcoord.y), static_cast<uint32_t>(cellfloatcoord.z)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you might wanna summarize in comments and an ASCII diagram the logic I had to explain on discord
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i meant the justification for the 0.5, so far you only have the floor
| public: | ||
| using WeldPredicateFn = std::function<bool(const ICPUPolygonGeometry* geom, uint64_t idx1, uint64_t idx2)>; | ||
|
|
||
| template <typename AccelStructureT> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there should be some concept you require here
| for (uint64_t vertexData_i = 0u; vertexData_i < as.getVertexCount(); vertexData_i++) | ||
| { | ||
| const auto& vertexData = as.vertices()[vertexData_i]; | ||
| vertexIndexToAsIndex[vertexData.index] = vertexData.index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but this is the same as initializing your vertexIndexToAsIndex to IOTA !? but with gaps (you get 0 wherever there's no such index value in the acceleration structure)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be vertexIndexToAsIndex[vertexData.index] = vertexData_i.
| // iterate by index, so that we always use the smallest index when multiple vertexes can be welded together | ||
| for (uint64_t index = 0; index < as.getVertexCount(); index++) | ||
| { | ||
| const auto asIndex = vertexIndexToAsIndex[index]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why have that whole vector when you can just as.vertices()[index].index or even better for (const auto& asVertRef : as.vertices()) and const auto asIndex = asVertRef.index;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also the asIndex you get right now will be same as index due to how you initialized
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs a different construction https://github.com/Devsh-Graphics-Programming/Nabla/pull/941/files?diff=unified&w=1#r2466545963
- Grab the vertices 1 by 1
- decode the position view for the vertex
- make an overload of
iterateBroadphaseCandidatestakes a position and not a whole vertex data (because you only use the position from the vertex data and re-hash anyway - you only use the vertex data to make sure you don't tap yourself which is fine here - or theif (&vertex != &neighborVertex)check should be moved out into the iterations lambdas themselves? ) - in the iteration lambda you want to check that
neighbour.index==selfIndex || neighborRemappedIndex != INVALID_INDEX && shouldWeldFn(polygon, index, neighbor.index)
Now any vertex thats not in the acceleration structure itself or has no equivalent will be set Invalid in the remap vector
when you later iterate over the index buffer to create a new one, if you encounter such an invalid remapping it means the vertex was never the in Acceleration Structure and therefore you can abort with an error
| // iterate by index, so that we always use the smallest index when multiple vertexes can be welded together | ||
| for (uint64_t index = 0; index < as.getVertexCount(); index++) | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you cannot assume that as.getVertexCount() == polygon->getIndexCount() because its only the Acceleration Structure left over from smooth normal computation that has 3 vertices per triangle and will happily insert duplicate verts.
you need to go from 0 to <=Max Index Value as reported by the polygon (the thing it gives to BLAS build triangle geometry data)
| const auto asIndex = vertexIndexToAsIndex[index]; | ||
| const auto& vertexData = as.vertices()[asIndex]; | ||
| auto& remappedVertexIndex = remappedVertexIndexes[index]; | ||
| as.iterateBroadphaseCandidates(vertexData, [&, polygon, index](const typename AccelStructureT::vertex_data_t& neighbor) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw & captures everything, no need to list
| for (; bounds.begin != bounds.end; bounds.begin++) | ||
| { | ||
| const vertex_data_t& neighborVertex = *bounds.begin; | ||
| if (&vertex != &neighborVertex) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imho the "not checking against oneself" thing could be done in the fn if we change the iterateBroadphaseCandidates to take a position and not VertexData
| auto& remappedVertexIndex = remappedVertexIndexes[index]; | ||
| as.iterateBroadphaseCandidates(vertexData, [&, polygon, index](const typename AccelStructureT::vertex_data_t& neighbor) { | ||
| const auto neighborRemappedIndex = remappedVertexIndexes[neighbor.index]; | ||
| if (shouldWeldFn(polygon, index, neighbor.index) && neighborRemappedIndex != INVALID_INDEX) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check the neighbour remap for being initialized first, remember && operator short-circuits in C++, I'd rather check that neighbour vertex hasn't been visited yet before starting to decode and compare all vertex attributes of two vertices
| { | ||
| const auto asIndex = vertexIndexToAsIndex[index]; | ||
| const auto& vertexData = as.vertices()[asIndex]; | ||
| auto& remappedVertexIndex = remappedVertexIndexes[index]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you went through vertices in order from 0 to max vertex reference, then you could initialize to INVALID here, and not std::fill(remappedVertexIndexes
| as.iterateBroadphaseCandidates(vertexData, [&, polygon, index](const typename AccelStructureT::vertex_data_t& neighbor) { | ||
| const auto neighborRemappedIndex = remappedVertexIndexes[neighbor.index]; | ||
| if (shouldWeldFn(polygon, index, neighbor.index) && neighborRemappedIndex != INVALID_INDEX) { | ||
| remappedVertexIndex = neighborRemappedIndex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw because of funny cache and aliasing effects (tapping same array), you want to make the remappedVertexIndex a local variable and then assign its result to remappedVertexIndexes[index] after the iteration completes
| if (remappedVertexIndex != INVALID_INDEX) { | ||
| remappedVertexIndex = vertexData.index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you setting it back?
| vector<float_t, 3> compInternalAngle(NBL_CONST_REF_ARG(vector<float_t, 3>) e1, NBL_CONST_REF_ARG(vector<float_t, 3>) e2, NBL_CONST_REF_ARG(vector<float_t, 3>) e3) | ||
| { | ||
| // Calculate this triangle's weight for each of its three m_vertices | ||
| // start by calculating the lengths of its sides | ||
| const float_t a = dot(e1, e1); | ||
| const float_t asqrt = sqrt(a); | ||
| const float_t b = dot(e2, e2); | ||
| const float_t bsqrt = sqrt(b); | ||
| const float_t c = dot(e3, e3); | ||
| const float_t csqrt = sqrt(c); | ||
| const float_t a = hlsl::dot(e1, e1); | ||
| const float_t asqrt = hlsl::sqrt(a); | ||
| const float_t b = hlsl::dot(e2, e2); | ||
| const float_t bsqrt = hlsl::sqrt(b); | ||
| const float_t c = hlsl::dot(e3, e3); | ||
| const float_t csqrt = hlsl::sqrt(c); | ||
|
|
||
| const float_t angle1 = hlsl::acos((b + c - a) / (2.f * bsqrt * csqrt)); | ||
| const float_t angle2 = hlsl::acos((-b + c + a) / (2.f * asqrt * csqrt)); | ||
| const float_t angle3 = hlsl::numbers::pi<float_t> - (angle1 + angle2); | ||
| // use them to find the angle at each vertex | ||
| return vector<float_t, 3>( | ||
| acosf((b + c - a) / (2.f * bsqrt * csqrt)), | ||
| acosf((-b + c + a) / (2.f * asqrt * csqrt)), | ||
| acosf((b - c + a) / (2.f * bsqrt * asqrt))); | ||
| return vector<float_t, 3>(angle1, angle2, angle3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0-index your edges and angles
btw normal convention in a triangle should be
v2 -- e1 --- v1
| /
| /
e2 e0
| /
| /
v0
or
v2 -- e0 --- v1
| /
| /
e1 e2
| /
| /
v0
because of CCW winding being natural (right hand rule and boundary integrals), the edge being a diff of subsequent triangles
Make sure to document whether you're using the e_i = v_{i+1}-v_{i} or e_i = v_{i+2}-v_{i+1} convention (the addition is always modulo 3)
P.S. it would be nice to be consistent with manual barycentric computation if possible, or denote the manual barycentric comp needs to change
| for (histogram_t i = 0u; i < rangeSize; i++) | ||
| ++m_histogram[comp.template operator()<shift,radix_mask>(input[i])]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iterate from back to front, it doesn't matter the order in which you increase the histogram, but doing one pass in reverse over input then the scatter pass forward means you don't skip from front to back or back to front between end and start of previous loop so no cache miss
| CVertexHashGrid(size_t _vertexCount, uint32_t _hashTableMaxSize, float _cellSize) : | ||
| m_sorter(createSorter(_vertexCount)), | ||
| m_hashTableMaxSize(_hashTableMaxSize), | ||
| m_cellSize(_cellSize) | ||
| CVertexHashGrid(size_t cellSize, uint32_t hashTableMaxSizeLog2, float vertexCountReserve) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is cellSize now a integer and vertex count a float !?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a default value for vertexCountReserve like 8k
| void add(VertexData&& vertex) | ||
| { | ||
| vertex.setHash(hash(vertex)); | ||
| m_vertices.push_back(vertex); | ||
| m_vertices.push_back(std::move(vertex)); | ||
| } | ||
|
|
||
| void validate() | ||
| void bake() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw all your functions defined in include/nbl/PUBLIC headers need the inline keyword.
(many people will tell you inline does nothing for the compiler and decides whether inline or deduplicate by tiself, but it does matter acrosss DLL boundaries - this is why for src/PRIVATE headers it doesn't matter)
If a class has functions defined in .cpp it needs class NBL_API2
This is all to account for DLL linking.
| struct KeyAccessor | ||
| { | ||
| _NBL_STATIC_INLINE_CONSTEXPR size_t key_bit_count = 32ull; | ||
| constexpr static size_t key_bit_count = 32ull; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inline is missing, its 3 keywords
| hlsl::uint64_t4 val1, val2; | ||
| view.decodeElement<hlsl::uint64_t4>(index1, val1); | ||
| view.decodeElement<hlsl::uint64_t4>(index2, val2); | ||
| for (auto channel_i = 0u; channel_i < channelCount; channel_i++) | ||
| if (val1[channel_i] != val2[channel_i]) return false; | ||
| break; | ||
| } | ||
| case IGeometryBase::EAABBFormat::S64: | ||
| case IGeometryBase::EAABBFormat::S32: | ||
| { | ||
| hlsl::int64_t4 val1, val2; | ||
| view.decodeElement<hlsl::int64_t4>(index1, val1); | ||
| view.decodeElement<hlsl::int64_t4>(index2, val2); | ||
| for (auto channel_i = 0u; channel_i < channelCount; channel_i++) | ||
| if (val1[channel_i] != val2[channel_i]) return false; | ||
| break; | ||
| } | ||
| default: | ||
| { | ||
| // TODO: Handle 16,32,64 bit float vectors once the pixel encode/decode functions get reimplemented in HLSL and decodeElement can actually benefit from that. | ||
| hlsl::float64_t4 val1, val2; | ||
| view.decodeElement<hlsl::float64_t4>(index1, val1); | ||
| view.decodeElement<hlsl::float64_t4>(index2, val2); | ||
| for (auto channel_i = 0u; channel_i < channelCount; channel_i++) | ||
| { | ||
| const auto diff = abs(val1[channel_i] - val2[channel_i]); | ||
| if (diff > epsilon) return false; | ||
| } | ||
| break; | ||
| } | ||
| } | ||
| return true; | ||
| } | ||
|
|
||
| static bool isAttributeDirEqual(const ICPUPolygonGeometry::SDataView& view, uint32_t index1, uint32_t index2, float epsilon) | ||
| { | ||
| if (!view) return true; | ||
| const auto channelCount = getFormatChannelCount(view.composed.format); | ||
| switch (view.composed.rangeFormat) | ||
| { | ||
| case IGeometryBase::EAABBFormat::U64: | ||
| case IGeometryBase::EAABBFormat::U32: | ||
| { | ||
| hlsl::uint64_t4 val1, val2; | ||
| view.decodeElement<hlsl::uint64_t4>(index1, val1); | ||
| view.decodeElement<hlsl::uint64_t4>(index2, val2); | ||
| return (1.0 - hlsl::dot(val1, val2)) < epsilon; | ||
| } | ||
| case IGeometryBase::EAABBFormat::S64: | ||
| case IGeometryBase::EAABBFormat::S32: | ||
| { | ||
| hlsl::int64_t4 val1, val2; | ||
| view.decodeElement<hlsl::int64_t4>(index1, val1); | ||
| view.decodeElement<hlsl::int64_t4>(index2, val2); | ||
| return (1.0 - hlsl::dot(val1, val2)) < epsilon; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for formats which don't decode to floats you need to use memcmp
There's a getTexeBlockSize(E_FORMAT) function
#941 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| class CVertexWelder { | ||
|
|
||
| public: | ||
| using WeldPredicateFn = std::function<bool(const ICPUPolygonGeometry* geom, uint32_t idx1, uint32_t idx2)>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at this point it can be a pure virtual interface class with pure virtual bool operator()(const ICPUPolygonGeometry* geom, uint32_t idx1, uint32_t idx2) and bool init(const ICPUPolygonGeometry* geom), why?
Because the init needs to check validity of the polygon geometry, e.g. your default predicate should fail the initialization if there are any attribute views which are not formatted (format is unknown) because it doesn't know what data in the buffer belongs to what vertex (because there's no clear mapping)
See #921 (comment)
| auto outPolygon = core::move_and_static_cast<ICPUPolygonGeometry>(polygon->clone(0u)); | ||
| outPolygon->setIndexing(IPolygonGeometryBase::TriangleList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can actually keep the same indexing #921 (comment)
Also for the weld you can use any polygon kind, not just triangle.
Because you're just substituting index values in original index buffer (or implicit IOTA).
| }); | ||
| if (remappedVertexIndex != INVALID_INDEX) { | ||
| remappedVertexIndex = vertexData.index; | ||
| maxRemappedIndex = vertexData.index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, not the max of the original vertex index, you want the max you FOUND so if (maxRemappedIndex<remappedVertexIndex) maxRemappedIndex = remappedVertexIndex
| if (indexView.composed.rangeFormat == IGeometryBase::EAABBFormat::U16) { | ||
| remappedIndexes.template operator()<uint16_t>(); | ||
| } | ||
| else if (indexView.composed.rangeFormat == IGeometryBase::EAABBFormat::U32) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thats not reliable, you really want to deduce from maxRemappedIndex value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| auto remappedIndexView = [&] | ||
| { | ||
| const auto bytesize = indexView.src.size; | ||
| auto indices = ICPUBuffer::create({bytesize,IBuffer::EUF_INDEX_BUFFER_BIT}); | ||
|
|
||
| auto retval = indexView; | ||
| retval.src.buffer = std::move(indices); | ||
| if (retval.composed.rangeFormat == IGeometryBase::EAABBFormat::U16) | ||
| retval.composed.encodedDataRange.u16.maxVx[0] = maxRemappedIndex; | ||
| else if (retval.composed.rangeFormat == IGeometryBase::EAABBFormat::U32) | ||
| retval.composed.encodedDataRange.u32.maxVx[0] = maxRemappedIndex; | ||
|
|
||
| return retval; | ||
| }(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create it inside remapIndices lambda below
| { | ||
| auto remappedIndexView = [&] | ||
| { | ||
| const auto bytesize = indexView.src.size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compute it properly from getIndexCount() and sizeof(IndexT)
| retval.composed.encodedDataRange.u16.maxVx[0] = maxRemappedIndex; | ||
| else if (retval.composed.rangeFormat == IGeometryBase::EAABBFormat::U32) | ||
| retval.composed.encodedDataRange.u32.maxVx[0] = maxRemappedIndex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if https://github.com/Devsh-Graphics-Programming/Nabla/pull/941/files#r2472746800 gets solved, then this is correct, otherwise its the wrong max
| outPolygon->setIndexView(std::move(remappedIndexView)); | ||
| } else | ||
| { | ||
| const uint32_t indexSize = (outPolygon->getPositionView().getElementCount() - 1 < std::numeric_limits<uint16_t>::max()) ? sizeof(uint16_t) : sizeof(uint32_t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrong, use maxRemappedIndex
Why?
Because my iota buffer of 0,1,2,3,..,N might have later vertices which are identical, and the actual values in the remappedVertexIndices are much lower than N
Btw this choice of index size, creation of new index buffer and view is common to both branches.
( actually the only thing that changes is what lambda to fire off to fill the index index buffer with data)
| .composed = { | ||
| .stride = indexSize, | ||
| }, | ||
| .src = { | ||
| .offset = 0, | ||
| .size = remappedIndexBuffer->getSize(), | ||
| .buffer = std::move(remappedIndexBuffer) | ||
| } | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct format, range format and range needs to be set
| auto fillRemappedIndex = [&]<typename IndexT>(){ | ||
| auto remappedIndexBufferPtr = reinterpret_cast<IndexT*>(remappedIndexBuffer->getPointer()); | ||
| for (uint64_t index = 0; index < outPolygon->getPositionView().getElementCount(); index++) | ||
| { | ||
| remappedIndexBufferPtr[index] = remappedVertexIndexes[index]; | ||
| } | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outPolygon->getPositionView().getElementCount() is the vertex count, not the index count, so this loop is wrong
I you can just use std::copy_n(remappedVertexIndices.begin(),polygon->getVertexReferenceCount(),reinterpret_cast<IndexT*>(outIndexBufferPtr));
in the two if-cases below.
| if (indexView.composed.rangeFormat == IGeometryBase::EAABBFormat::U16) { | ||
| fillRemappedIndex.template operator()<uint16_t>(); | ||
| } | ||
| else if (indexView.composed.rangeFormat == IGeometryBase::EAABBFormat::U32) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, switch based on maxRemappedIndex
or set up the remappedIndexView.composed.rangeFormat from maxRemappedIndex properly first
Description
Testing
TODO list: