
Conversation

@mertbilgic

No description provided.


// Calculate sparsity ratio to detect sparse primary key distributions
// Example: Snowflake IDs where minValue=1, maxValue=7234567890123456789, but only 1000 rows
sparsityRatio := float64(totalRange) / float64(rowCount)
Author

In large distributed ID systems (e.g., snowflake/timestamp IDs), it’s common to switch from offset/range chunking to keyset pagination based on a “gap density” (sparsity) check.
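The heuristic described above can be sketched as follows. This is a minimal illustration of the gap-density check, not the PR's actual implementation; the threshold constant is an assumption chosen for demonstration.

```go
package main

import "fmt"

// pickChunkingMode sketches the sparsity heuristic: when the ID space is
// much larger than the row count, fixed range chunks would be mostly empty,
// so keyset pagination is the better fit. The threshold is illustrative.
func pickChunkingMode(minValue, maxValue, rowCount int64) string {
	const sparsityThreshold = 10.0 // assumed: >10x gaps counts as "sparse"
	totalRange := maxValue - minValue + 1
	sparsityRatio := float64(totalRange) / float64(rowCount)
	if sparsityRatio > sparsityThreshold {
		// Sparse IDs (e.g. Snowflake IDs): fall back to keyset pagination.
		return "keyset"
	}
	// Dense IDs: fixed-size range chunks are cheap and parallelizable.
	return "range"
}

func main() {
	fmt.Println(pickChunkingMode(1, 1000, 1000))                // dense sequence -> range
	fmt.Println(pickChunkingMode(1, 7234567890123456789, 1000)) // Snowflake IDs -> keyset
}
```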

Comment on lines +308 to +315
type SnapshotChunkingMode string

const (
	SnapshotChunkingModeAuto   SnapshotChunkingMode = "auto"
	SnapshotChunkingModeRange  SnapshotChunkingMode = "range"
	SnapshotChunkingModeKeyset SnapshotChunkingMode = "keyset"
	SnapshotChunkingModeOffset SnapshotChunkingMode = "offset"
)
Author

The user might not want the decision made automatically by the heuristic. I added this so that, where possible, the user can pick a mode explicitly and steer the snapshot toward the chunking type they want.
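A usage sketch of the explicit override: only the `SnapshotChunkingMode` type and constants come from the PR; the `TableConfig` shape and `resolveMode` helper are hypothetical names invented here for illustration.

```go
package main

import "fmt"

type SnapshotChunkingMode string

const (
	SnapshotChunkingModeAuto   SnapshotChunkingMode = "auto"
	SnapshotChunkingModeRange  SnapshotChunkingMode = "range"
	SnapshotChunkingModeKeyset SnapshotChunkingMode = "keyset"
	SnapshotChunkingModeOffset SnapshotChunkingMode = "offset"
)

// TableConfig is a hypothetical per-table config; only the
// SnapshotChunkingMode field mirrors the PR.
type TableConfig struct {
	Name                 string
	Schema               string
	SnapshotChunkingMode SnapshotChunkingMode
}

// resolveMode honors an explicit user choice; "auto" (or unset) would
// defer to the sparsity heuristic instead.
func resolveMode(t TableConfig) SnapshotChunkingMode {
	if t.SnapshotChunkingMode == "" || t.SnapshotChunkingMode == SnapshotChunkingModeAuto {
		return SnapshotChunkingModeAuto
	}
	return t.SnapshotChunkingMode
}

func main() {
	t := TableConfig{Name: "orders", Schema: "public", SnapshotChunkingMode: SnapshotChunkingModeKeyset}
	fmt.Println(resolveMode(t)) // keyset
}
```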

Comment on lines +363 to +368
// ChunkResult contains the result of processing a chunk
type ChunkResult struct {
	RowCount int64
	LastPK   *int64 // Last processed primary key value (for keyset pagination)
}

Author

In keyset chunking, the starting cursor for the next chunk must be reliably derived from the last primary key the previous chunk processed. That's why we needed this struct.
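The cursor hand-off can be sketched like this. A sorted slice stands in for the table, and `processChunk` is a stand-in for the real chunk reader; both names are illustrative, only `ChunkResult` mirrors the PR.

```go
package main

import "fmt"

// ChunkResult mirrors the struct above.
type ChunkResult struct {
	RowCount int64
	LastPK   *int64 // last processed primary key (next chunk's cursor)
}

// processChunk scans up to chunkSize rows with pk > afterPK from a sorted
// slice that stands in for the table (WHERE pk > $cursor ORDER BY pk LIMIT n).
func processChunk(pks []int64, afterPK int64, chunkSize int) ChunkResult {
	var res ChunkResult
	for _, pk := range pks {
		if pk <= afterPK {
			continue
		}
		pkCopy := pk
		res.LastPK = &pkCopy
		res.RowCount++
		if res.RowCount == int64(chunkSize) {
			break
		}
	}
	return res
}

func main() {
	table := []int64{3, 10, 57, 1000, 4096, 900000} // sparse PKs
	cursor := int64(0)
	for {
		res := processChunk(table, cursor, 2)
		if res.RowCount == 0 {
			break // no rows left
		}
		fmt.Printf("chunk of %d rows, last pk %d\n", res.RowCount, *res.LastPK)
		cursor = *res.LastPK // next chunk starts after the last processed key
	}
}
```

Note the gaps between keys never produce empty chunks, which is exactly why keyset beats range chunking on sparse IDs.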

Comment on lines -420 to +467
"SELECT * FROM %s.%s WHERE %s >= %d AND %s <= %d ORDER BY %s LIMIT %d",
`SELECT * FROM "%s"."%s" WHERE "%s" >= %d AND "%s" <= %d ORDER BY %s LIMIT %d`,
Author

I have updated the SQL generation logic to wrap column names in double quotes ("column_name"). This ensures our queries are compliant with the PostgreSQL Lexical Structure for "Quoted Identifiers."
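For identifiers that may themselves contain double quotes, escaping by doubling is also required. The `quoteIdent` helper below is illustrative, not part of the PR (which inlines the quotes in the format string):

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps a PostgreSQL identifier in double quotes and escapes
// embedded quotes by doubling them, per the Quoted Identifiers rules in
// PostgreSQL's lexical structure.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

func main() {
	q := fmt.Sprintf(
		`SELECT * FROM %s.%s WHERE %s >= %d AND %s <= %d ORDER BY %s LIMIT %d`,
		quoteIdent("public"), quoteIdent("Order-Items"),
		quoteIdent("id"), 100, quoteIdent("id"), 200,
		quoteIdent("id"), 50,
	)
	fmt.Println(q)
}
```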

Comment on lines +505 to +511
heartbeat_at = '%s',
-- Update range_start from previous chunk's last_pk for sequential keyset chunks
range_start = CASE
WHEN c.range_end IS NULL AND c.range_start < 0 AND c.chunk_index > 0
THEN COALESCE((SELECT last_pk FROM prev_chunk_info), c.range_start)
ELSE c.range_start
END
Author

The range_end IS NULL condition is scoped to sequential keyset only; range, offset, and parallel keyset chunks are unaffected.

Comment on lines +797 to +813
// Use NTILE to divide rows into equal groups and get boundary values
// Use quoted identifiers to handle special characters in table/column names
query := fmt.Sprintf(`
	WITH chunk_boundaries AS (
		SELECT
			"%s" as pk_value,
			NTILE(%d) OVER (ORDER BY "%s") as chunk_num
		FROM "%s"."%s"
	)
	SELECT
		chunk_num - 1 as chunk_index,
		MIN(pk_value) as range_start,
		MAX(pk_value) as range_end
	FROM chunk_boundaries
	GROUP BY chunk_num
	ORDER BY chunk_num
`, pkColumn, numChunks, pkColumn, table.Schema, table.Name)
Author

We discussed this with @Abdulsametileri: there is currently a performance problem here, since the NTILE window function has to sort the entire table to compute the boundaries.

@Abdulsametileri
Member

Abdulsametileri commented Jan 4, 2026

let's use ctid partitioning for this, see #67:

{
	Name:                      "yourTable",
	Schema:                    "yourSchema",
	SnapshotPartitionStrategy: publication.SnapshotPartitionStrategyCTIDBlock,
},
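The ctid-block approach referenced above avoids the full sort entirely: each chunk reads a range of physical heap blocks. A minimal sketch, assuming illustrative table names and block arithmetic; the query builder below is not the actual #67 implementation.

```go
package main

import "fmt"

// ctidChunkQuery builds a query that reads one physical block range via
// PostgreSQL's ctid system column ('(block,offset)'::tid literals). Unlike
// the NTILE scan, no ORDER BY or window function is needed, and chunks map
// directly onto heap blocks.
func ctidChunkQuery(schema, table string, startBlock, endBlock int64) string {
	return fmt.Sprintf(
		`SELECT * FROM "%s"."%s" WHERE ctid >= '(%d,0)'::tid AND ctid < '(%d,0)'::tid`,
		schema, table, startBlock, endBlock,
	)
}

func main() {
	// Split the heap into chunks of 1000 blocks each (3000 blocks total assumed).
	for start := int64(0); start < 3000; start += 1000 {
		fmt.Println(ctidChunkQuery("public", "orders", start, start+1000))
	}
}
```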

@Abdulsametileri
Member

We added ctid partitioning and fixed the problem with it for now, so I am closing this; it may be needed again in the future.
