bitonic_sort #940

Conversation

[CI]: Can one of the admins verify this patch?
NBL_REF_ARG(value_t) loVal, NBL_REF_ARG(value_t) hiVal)
{
    comparator_t comp;
    const bool shouldSwap = ascending ? comp(hiKey, loKey) : comp(loKey, hiKey);
The compiler is probably dumb and might not realize the right term is the negation of the left term. Ternaries in SPIR-V usually get compiled to an OpSelect which treats both terms after the ? not as branches to conditionally execute, but as operands whose result must be evaluated before the select operation runs. That is to say, if the compiler is stupid you're going to run two comparisons. If you make the right term the negation of the left one, CSE is likely to kick in and evaluate the comparison only once.
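For instance, hoisting the comparison makes the sharing explicit (a sketch using the same variables as the quoted diff; note that for keys comparing equivalent this differs from comp(loKey, hiKey), which only affects tie order, not sort correctness):

const bool hiBeforeLo = comp(hiKey, loKey);
const bool shouldSwap = ascending ? hiBeforeLo : !hiBeforeLo;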
const uint32_t invocationID = glsl::gl_SubgroupInvocationID();
const uint32_t subgroupSizeLog2 = glsl::gl_SubgroupSizeLog2();
[unroll]
for (uint32_t stage = 0; stage <= subgroupSizeLog2; stage++)
don't add indentation after compiler directives
};
template<bool Ascending, typename Config, class device_capabilities = void>
struct bitonic_sort;
template<bool Ascending, typename KeyType, typename ValueType, typename Comparator, class device_capabilities>
I get that Ascending is used because when moving onto workgroup you're going to need to call alternating subgroup sorts. However, as a front-facing API, if I wanted a single subgroup sort I'd usually want it in the order specified by the Comparator. Maybe push it after the Config and give it a default value of true. Or better yet, since Ascending can be confusing, consider calling it ReverseOrder or something simpler that conveys the intent better.
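One possible shape of that suggestion (just a sketch, defaults chosen for illustration):

template<typename Config, bool Ascending = true, class device_capabilities = void>
struct bitonic_sort;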
Ascending and later names like takeLarger implicitly assume the comparator is less (lo and hi don't, those are related to the "lane" order in the bitonic sort diagram). That's fine on its own, it makes the code more readable vs naming them with a more generic option. However, there should be comments mentioning that names assume this implicitly so there's no confusion.
if (takeLarger)
{
    if (comp(loKey, pLoKey)) { loKey = pLoKey; loVal = pLoVal; }
    if (comp(hiKey, pHiKey)) { hiKey = pHiKey; hiVal = pHiVal; }
Are you sure this isn't reversed? Assume a less comparator, bitonicAscending = true for the current stage and upperHalf = true for the current thread. Then takeLarger semantically conveys that this thread wants to keep the larger values. And yet this code assigns the smaller values.
else
{
    if (comp(pLoKey, loKey)) { loKey = pLoKey; loVal = pLoVal; }
    if (comp(pHiKey, hiKey)) { hiKey = pHiKey; hiVal = pHiVal; }
Again, if the compiler is dumb this code is very costly: half your threads in a subgroup will have upperHalf = true and the other half will have it set to false. Parallel code execution needs to be uniform across threads in the same SM, so this section of code will run twice: first some half of your threads (say, those in the upper half) will run, then the other half. This kills your throughput.
Inside each branch, the inner ifs will likely get compiled down to two OpSelects each. You can make this whole code branchless by doing

loKey = loCondition ? loKey : pLoKey;
loVal = loCondition ? loVal : pLoVal;
hiKey = hiCondition ? hiKey : pHiKey;
hiVal = hiCondition ? hiVal : pHiVal;

where loCondition and hiCondition are predicates that depend on both takeLarger and the result of the key comparison.
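One way to form those predicates with a single comparison per pair, as a sketch assuming a less-style comparator (ties resolve slightly differently than the original else branch, which doesn't affect sort correctness; the negated-ternary form also keeps CSE happy per the comment above):

const bool loCondition = takeLarger ? !comp(loKey, pLoKey) : comp(loKey, pLoKey);
const bool hiCondition = takeLarger ? !comp(hiKey, pHiKey) : comp(hiKey, pHiKey);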
static void lastMergeStage(uint32_t stage, uint32_t invocationID, NBL_REF_ARG(key_t) loKey, NBL_REF_ARG(key_t) hiKey,
In the end this is just mergeStage with bitonicAscending = true, right? I think you can just have mergeStage and avoid having this function duplicated
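i.e., assuming mergeStage keeps the parameter order used by the workgroup variant quoted further down, the call site would just be:

mergeStage(stage, true, invocationID, loKey, hiKey, loVal, hiVal);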
const value_t pLoVal = glsl::subgroupShuffleXor<value_t>(loVal, threadStride);
const value_t pHiVal = glsl::subgroupShuffleXor<value_t>(hiVal, threadStride);
comparator_t comp;
if (comp(loKey, pLoKey)) { loKey = pLoKey; loVal = pLoVal; }
unlike the other method, both threads keep the min elements? Like upperHalf is not being considered here, so I'm inclined to believe this function is going to fail. I'd delete this method and just use mergeStage, since this is just that method but with a forced bitonicAscending = true.
Test this code to make sure it's right, but it feels wrong. Either way, just use mergeStage and avoid having this duped.
[unroll]
for (uint32_t stage = 0; stage < subgroupSizeLog2; stage++)
{
    const bool bitonicAscending = (stage == subgroupSizeLog2) ? Ascending : !bool(invocationID & (1u << stage));
stage == subgroupSizeLog2 is never true in this loop, so just assign the term for the false clause.
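i.e. just:

const bool bitonicAscending = !bool(invocationID & (1u << stage));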
for (uint32_t pass = 0; pass <= stage; pass++)
{
    const uint32_t stride = 1u << ((stage - pass) + subgroupSizeLog2); // Element stride shifts to inter-subgroup scale
    // Shuffle from partner using WG XOR, need to implement
We already have a workgroup shuffle:
void shuffleXor(NBL_REF_ARG(T) value, uint32_t mask, NBL_REF_ARG(SharedMemoryAdaptor) sharedmemAdaptor)
You need to template this bitonic sort workgroup struct on a shared memory accessor. You can either do the shuffle in two rounds (shuffle keys -> barrier -> shuffle values -> barrier) or in a single round, by shuffling a pair of both key and value together. Ideally this would be a setting you can choose as well. For now settle on one and leave a comment on how we can also consider the other case down the line (two rounds is probably easier at this stage, and useful for the bigger array sizes)
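A sketch of the two-round variant, reusing the shuffleXor helper above (variable and adaptor names assumed from the surrounding code):

key_t pLoKey = loKey;
shuffleXor(pLoKey, stride, sharedmemAdaptor);
sharedmemAdaptor.workgroupExecutionAndMemoryBarrier(); // keys round done, unalias before values
value_t pLoVal = loVal;
shuffleXor(pLoVal, stride, sharedmemAdaptor);
sharedmemAdaptor.workgroupExecutionAndMemoryBarrier();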
const uint32_t subgroupSizeLog2 = glsl::gl_SubgroupSizeLog2();

// first sort all subgroups inside the WG
subgroup::bitonic_sort<true, SortConfig>::__call(loKey, hiKey, loVal, hiVal);
Ascending should be a parameter you pass to __call and not a template parameter. That way, you can control whether this starting subgroup sort is ascending or descending (notice that whenever you do a bigger-than-subgroup sort, some subgroup sorts are descending and some ascending, depending on the parity of the subgroupID. This is a condition you can't control from compiletime since the subgroupID is only known at runtime
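e.g., something along these lines (a sketch; the exact parameter list is up to you):

static void __call(bool ascending, NBL_REF_ARG(key_t) loKey, NBL_REF_ARG(key_t) hiKey, NBL_REF_ARG(value_t) loVal, NBL_REF_ARG(value_t) hiVal);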
This is also true of the workgroup struct btw. When you try to do virtual threading down the line, you will have to do the same thing based on "virtual workgroup ID", which will be known at runtime. So nbl::hlsl::workgroup::bitonic_sort::bitonic_sort should not be templated on Ascending, it should instead be a parameter you pass to __call.
On that note, nbl::hlsl::workgroup::bitonic_sort::bitonic_sort is an ugly namespace. The struct you use to run a bitonic sort should be nbl::hlsl::workgroup::BitonicSort (and similarly for subgroup). nbl::hlsl::workgroup::bitonic_sort should have structs related to the bitonic sort, but not the functional struct itself.
In the FFT, for example, we have the struct nbl::hlsl::workgroup::FFT to run the FFT, and nbl::hlsl::workgroup::fft is a namespace that has structs useful for running an FFT, such as the config struct it's templated on.
using SortConfig = subgroup::bitonic_sort_config<uint32_t, uint32_t, less<uint32_t> >;

static void mergeWGStage(uint32_t stage, bool bitonicAscending, uint32_t invocationID, NBL_REF_ARG(key_t) loKey, NBL_REF_ARG(key_t) hiKey,
This is already in the workgroup namespace, you can just call it mergeStage
using value_t = ValueType;
using comparator_t = Comparator;

NBL_CONSTEXPR_STATIC_INLINE uint16_t ElementsPerInvocationLog2 = _ElementsPerInvocationLog2;
NBL_CONSTEXPR_STATIC_INLINE resolves to const static when preprocessed. DXC is stupid and when it sees a static it WILL initialize a variable. But if it sees const on its own it does compile it down to a constant, which is the behaviour you would expect (and what would happen in C++). So for now just replace these usages with just const.
@devshgraphicsprogramming do we just change this macro to resolve to const in HLSL in master?
{
namespace workgroup
NBL_CONSTEXPR_STATIC_INLINE uint32_t WorkgroupSize = config_t::WorkgroupSize;
using adaptor_t = accessor_adaptors::StructureOfArrays<SharedMemoryAccessor, key_t, value_t, 1, WorkgroupSize>;
This looks wrong. You're making the index type for the array be value_t. If your values were floats, for example, this would blow up. You are getting lucky here because (I guess) the types you're testing with are all integers.
You'll want the shared memory accessor to satisfy this concept:
#define NBL_CONCEPT_NAME GenericSharedMemoryAccessor
That concept basically states that the accessor can read and write uint32_ts. This is because shared memory (in most architectures) works with certain restrictions due to memory banking and size per transaction for each bank. It is the adaptor that is later in charge of reading/writing from/to shared memory with your actual type.
What you want here is to have a SharedMemoryAccessor accessing shared memory that is at least max(sizeof(key_t), sizeof(value_t)) * ArraySize bytes (this is unenforceable via concepts but you can make a utility for the config that returns this value so the user can allocate such an array).
Then you want TWO different adaptors: one is going to be
key_adaptor = accessor_adaptors::StructureOfArrays<SharedMemoryAccessor, key_t, uint32_t, 1, WorkgroupSize>
and the other is going to be
value_adaptor = accessor_adaptors::StructureOfArrays<SharedMemoryAccessor, value_t, uint32_t, 1, WorkgroupSize>
You would then shuffle the keys using the key adaptor, barrier, then shuffle the values using the value adaptor
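As using-declarations that would look something like this (the _t names are just for illustration):

using key_adaptor_t = accessor_adaptors::StructureOfArrays<SharedMemoryAccessor, key_t, uint32_t, 1, WorkgroupSize>;
using value_adaptor_t = accessor_adaptors::StructureOfArrays<SharedMemoryAccessor, value_t, uint32_t, 1, WorkgroupSize>;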
// Separate shuffles for lo/hi streams (two-round shuffle as per PR review)
// TODO: Consider single-round shuffle of key-value pairs for better performance
key_t pLoKey = loKey;
shuffleXor(pLoKey, threadStride, sharedmemAdaptor);
this would be a shuffle using the key_adaptor
key_t pLoKey = loKey;
shuffleXor(pLoKey, threadStride, sharedmemAdaptor);
value_t pLoVal = loVal;
shuffleXor(pLoVal, threadStride, sharedmemAdaptor);
and this one would be a shuffle using the value_adaptor
key_t pHiKey = hiKey;
shuffleXor(pHiKey, threadStride, sharedmemAdaptor);
value_t pHiVal = hiVal;
In between shuffles, your array is aliased. The first shuffle has the following behaviour: all threads write values, then there's a barrier (so all threads are done writing before they start reading), then they start reading. On the next shuffle, they immediately start writing again. If you don't barrier in between these shuffles, you risk writing before some other thread was done reading, overwriting what needed to be read. Between shuffles, therefore, you need to barrier to unalias the memory.
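Concretely, the quoted code needs a barrier between the consecutive shuffles; a sketch of the fix, using the adaptor's barrier method as it appears elsewhere in this PR:

key_t pHiKey = hiKey;
shuffleXor(pHiKey, threadStride, sharedmemAdaptor);
sharedmemAdaptor.workgroupExecutionAndMemoryBarrier(); // unalias the array before the next round of writes
value_t pHiVal = hiVal;
shuffleXor(pHiVal, threadStride, sharedmemAdaptor);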
for (uint32_t stage = 0; stage < numSubgroupsLog2; ++stage)
{
    const bool isLastStage = (stage == numSubgroupsLog2 - 1);
    const bool bitonicAscending = isLastStage ? true : !bool(invocationID & (subgroupSize << (stage + 1)));
might be wrong here but it feels like the formula on the right yields true even for the last stage (at that point the single 1 in the bitmask is too far to the left, so the result of the & is a 0)
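Worked example: with subgroupSize = 32 and numSubgroupsLog2 = 3 (so WorkgroupSize = 256), the last stage is stage = 2 and the mask is 32 << 3 = 256. Since invocationID < 256, the & is always 0 and the right-hand term already evaluates to true, making the isLastStage special case redundant.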
mergeStage(sharedmemAccessor, stage, bitonicAscending, invocationID, loKey, hiKey, loVal, hiVal);

const uint32_t subgroupInvocationID = glsl::gl_SubgroupInvocationID();
pull this one out of the loop
if (shouldSwap)
{
    // Swap keys
    key_t tempKey = loKey;
    loKey = hiKey;
    hiKey = tempKey;
    // Swap values
    value_t tempVal = loVal;
    loVal = hiVal;
    hiVal = tempVal;
}
Make this branchless like you did the swaps in the subgroup branch
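For instance (a sketch mirroring the select style suggested for the subgroup path; the new* temporaries are needed because both selects read the old values):

const key_t newLoKey = shouldSwap ? hiKey : loKey;
const value_t newLoVal = shouldSwap ? hiVal : loVal;
hiKey = shouldSwap ? loKey : hiKey;
hiVal = shouldSwap ? loVal : hiVal;
loKey = newLoKey;
loVal = newLoVal;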
// lo update
const bool loSelfSmaller = comp(loKey, pLoKey);
const bool takePartnerLo = takeLarger ? loSelfSmaller : !loSelfSmaller;
loKey = takePartnerLo ? pLoKey : loKey;
loVal = takePartnerLo ? pLoVal : loVal;

// hi update
const bool hiSelfSmaller = comp(hiKey, pHiKey);
const bool takePartnerHi = takeLarger ? hiSelfSmaller : !hiSelfSmaller;
hiKey = takePartnerHi ? pHiKey : hiKey;
hiVal = takePartnerHi ? pHiVal : hiVal;
It feels like both the lo and hi update can be expressed using the compareAndSwap method above, using !takeLarger (or maybe takeLarger but I feel it's negated) instead of ascending
// Separate shuffles for lo/hi streams (two-round shuffle as per PR review)
// TODO: Consider single-round shuffle of key-value pairs for better performance
key_t pLoKey = loKey;
shuffleXor(pLoKey, threadStride, sharedmemAdaptorKey);
value_t pLoVal = loVal;
shuffleXor(pLoVal, threadStride, sharedmemAdaptorValue);

sharedmemAdaptorKey.workgroupExecutionAndMemoryBarrier();
sharedmemAdaptorValue.workgroupExecutionAndMemoryBarrier();
You are using the same accessor here (the adaptors might be different, but notice you create both from the same SharedMemoryAccessor and therefore they both address the same memory), so these shuffles are not independent. Imagine this scenario: thread 0 is running this code.
On the first shuffleXor, it will write loKey to position 0 of the sharedmem array and wait on a workgroup barrier. Meanwhile, thread threadStride will write its own loKey (which is pLoKey for thread 0, since 0 XOR threadStride = threadStride) and block on the same barrier.
The thing that thread 0 wants to do next is therefore to read pLoKey from position threadStride. But what if, after the barrier, it's thread threadStride that gets to run first? Well, this thread wants to read its own pLoKey from position 0 of the array to complete the first shuffleXor, and immediately afterwards it's going to call the next shuffleXor, which will cause it to write its loVal into position threadStride of the array. If all of this happens before thread 0 has read its own pLoKey from that position, you've overwritten what that thread wanted to get.
So what do we do? There's two options:
- You ask the user to provide two sharedmem accessors that access different memory; one will be used to shuffle keys and the other will be used to shuffle values.
- The user provides a single shared memory accessor with enough space for all keys and all values (max(sizeof(key_t), sizeof(value_t)) * WorkgroupSize), but you MUST barrier between each consecutive shuffle. You only call the barrier on the adaptor that was used last. Also, if pass > 0 you also barrier before the first shuffleXor, because you might have aliased memory from the last shuffleXor of the previous loop iteration.
Ideally you'd want this to also be something you can choose in the Config struct (use more sharedmem with less barriers or use a single array with more barriers) but that's an optimization for the future. Right now just do the second one.
sharedmemAdaptorKey.workgroupExecutionAndMemoryBarrier();
sharedmemAdaptorValue.workgroupExecutionAndMemoryBarrier();
Don't barrier at the end of an iteration, but rather at the start of every iteration past the first. This saves one barrier at the end of the call, which might be unnecessary.
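Sketch of the suggested loop structure (loop shape assumed from the quoted code; which adaptor to barrier on follows the "last used" rule above):

for (uint32_t pass = 0; pass <= stage; pass++)
{
    // barrier only between iterations, never after the last one
    if (pass > 0)
        sharedmemAdaptorValue.workgroupExecutionAndMemoryBarrier();
    // ... shuffles and compare-and-swaps ...
}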
// lo update
const bool loSelfSmaller = comp(loKey, pLoKey);
const bool takePartnerLo = takeLarger ? loSelfSmaller : !loSelfSmaller;
loKey = takePartnerLo ? pLoKey : loKey;
loVal = takePartnerLo ? pLoVal : loVal;

// hi update
const bool hiSelfSmaller = comp(hiKey, pHiKey);
const bool takePartnerHi = takeLarger ? hiSelfSmaller : !hiSelfSmaller;
hiKey = takePartnerHi ? pHiKey : hiKey;
hiVal = takePartnerHi ? pHiVal : hiVal;
This is again compareAndSwap, identical to the one in the subgroup code. Make that method branchless like you did the inter-thread swaps in the subgroup sort, pull it out into bitonic_sort/common.hlsl then reuse it
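A sketch of such a shared helper (hypothetical name and signature), branchless as suggested:

// takeLarger assumes a less-style comparator: true means keep the larger of {own, partner}
template<typename KeyType, typename ValueType, typename Comparator>
void compareAndSwap(bool takeLarger, NBL_REF_ARG(KeyType) key, NBL_REF_ARG(ValueType) val, KeyType pKey, ValueType pVal)
{
    Comparator comp;
    const bool selfSmaller = comp(key, pKey);
    const bool takePartner = takeLarger ? selfSmaller : !selfSmaller;
    key = takePartner ? pKey : key;
    val = takePartner ? pVal : val;
}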
// Load this thread's 2 elements from accessor
const uint32_t loIdx = invocationID * 2;
const uint32_t hiIdx = loIdx + 1;
Since loIdx is even you can get away with doing this addition as hiIdx = loIdx | 1
// Final: ensure lo <= hi within each thread (for ascending sort)
comparator_t comp;
if (comp(hiKey, loKey))
{
    // Swap keys
    key_t tempKey = loKey;
    loKey = hiKey;
    hiKey = tempKey;
    // Swap values
    value_t tempVal = loVal;
    loVal = hiVal;
    hiVal = tempVal;
}
Doesn't the last pass of subgroup::bitonic_sort<SortConfig>::mergeStage ensure this already?
NBL_CONSTEXPR_STATIC_INLINE uint32_t WorkgroupSize = config_t::WorkgroupSize;
NBL_CONSTEXPR_STATIC_INLINE uint32_t ElementsPerThread = config_t::ElementsPerInvocation;
NBL_CONSTEXPR_STATIC_INLINE uint32_t TotalElements = WorkgroupSize * ElementsPerThread;
NBL_CONSTEXPR_STATIC_INLINE uint32_t ElementsPerSimpleSort = WorkgroupSize * 2; // E=1 handles WG*2 elements
make these const
NBL_CONSTEXPR_STATIC_INLINE uint32_t WorkgroupSize = config_t::WorkgroupSize;
NBL_CONSTEXPR_STATIC_INLINE uint32_t ElementsPerThread = config_t::ElementsPerInvocation;
NBL_CONSTEXPR_STATIC_INLINE uint32_t TotalElements = WorkgroupSize * ElementsPerThread;
NBL_CONSTEXPR_STATIC_INLINE uint32_t ElementsPerSimpleSort = WorkgroupSize * 2;
Make these const as well