
Conversation

@smallinsky
Contributor

@smallinsky smallinsky commented Oct 30, 2025

What

Add PutBatch support for Postgres.

Why not UPSERT: Postgres has no dedicated UPSERT statement; the equivalent is INSERT ... ON CONFLICT DO UPDATE.

@smallinsky smallinsky force-pushed the smallinsky/pg/pug_batch branch 3 times, most recently from ee56fb5 to 131a7ee Compare October 31, 2025 10:54
@smallinsky smallinsky force-pushed the smallinsky/pg/pug_batch branch 4 times, most recently from b98b8b9 to 48e2da7 Compare October 31, 2025 15:09
@smallinsky smallinsky marked this pull request as ready for review October 31, 2025 15:29
}

if _, err := pgcommon.Retry(ctx, b.log, func() (struct{}, error) {
_, err := b.pool.Exec(ctx, putBatchStmt, keys, values, expires, revs)
Contributor

I've liked having the SQL right in the Exec/Query call in the rest of pgbk so it's obvious what the positional parameters are.

Contributor

Any chance of moving the SQL here so we can see that the parameters are in the right order?

)

const (
defaultUpsertBatchChunk = 300
Contributor

This feels a little high; until we switch to a weaker transaction isolation level, every range read that touches one of the items in the batch will cause a serialization failure, I think.

I think this might be the same sort of question that prompted gravitational/teleport.e#4390 in CRDB. cc @dboslee, especially since we're going to want this in crdbbk as well.

@smallinsky smallinsky force-pushed the smallinsky/pg/pug_batch branch from 48e2da7 to cb7dd4c Compare October 31, 2025 18:06
@smallinsky smallinsky force-pushed the smallinsky/pg/pug_batch branch from cb7dd4c to 9e910de Compare October 31, 2025 18:10
@smallinsky smallinsky requested a review from espadolini November 3, 2025 18:07
}

if _, err := pgcommon.Retry(ctx, b.log, func() (struct{}, error) {
_, err := b.pool.Exec(ctx, putBatchStmt, keys, values, expires, revs)
Contributor

Any chance of moving the SQL here so we can see that the parameters are in the right order?

Comment on lines +34 to +36
type PutBatcher interface {
PutBatch(ctx context.Context, items []backend.Item) ([]string, error)
}
Contributor

Isn't this just backend.BatchPutter?

Comment on lines +105 to +106
case <-time.After(watchInitTimeout):
t.Fatal("timed out waiting for init event")
Contributor

I think that waiting for arbitrary timeouts on the execution of steps in the test is an antipattern. There's already a mechanism to set and enforce a timeout in the test harness, and a test runner that's slow enough might fail this for no reason other than needing some more time.

Comment on lines +197 to +198
case <-deadline.C:
t.Fatalf("timed out waiting for events: got=%d want=%d", len(out), wantCount)
Contributor

Comment on lines +34 to +47
putBatchStmt = `
INSERT INTO kv (key, value, expires, revision)
SELECT * FROM UNNEST(
$1::bytea[],
$2::bytea[],
$3::timestamptz[],
$4::uuid[]
)
ON CONFLICT (key) DO UPDATE
SET
value = EXCLUDED.value,
expires = EXCLUDED.expires,
revision = EXCLUDED.revision;
`
Contributor

Putting this in a single line makes pg_stat_activity bearable to look at, and avoids accidental mixed indentation.

Suggested change
putBatchStmt = `
INSERT INTO kv (key, value, expires, revision)
SELECT * FROM UNNEST(
$1::bytea[],
$2::bytea[],
$3::timestamptz[],
$4::uuid[]
)
ON CONFLICT (key) DO UPDATE
SET
value = EXCLUDED.value,
expires = EXCLUDED.expires,
revision = EXCLUDED.revision;
`
putBatchStmt = "INSERT INTO kv (key, value, expires, revision) SELECT * FROM UNNEST($1::bytea[], $2::bytea[], $3::timestamptz[], $4::uuid[]) ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value, expires = EXCLUDED.expires, revision = EXCLUDED.revision;"

for i := 0; i < itemCount; i++ {
items = append(items, backend.Item{
Key: prefix(fmt.Sprintf("item/%04d", i)),
Value: bytes.Repeat([]byte(fmt.Sprintf("%d", i)), payloadSize),
Contributor

Do we want variable-sized payloads? I think the byte slice can vary from 1-3 bytes * 300 * 1024.

Wouldn't this consume a large amount of memory in order to run this test? Could we perhaps make the value fixed or reduce the number of items while getting the same test coverage?

for _, item := range chunk {
keys = append(keys, nonNilKey(item.Key))
values = append(values, nonNil(item.Value))
expires = append(expires, zeronull.Timestamptz(item.Expires.UTC()))
Contributor

Does this handle as expected if item.Expires is zeroed aka it never expires?

Contributor Author

It should. The same approach is used in the Put flow, where zeronull.Timestamptz handles IsZero.

Contributor

Gotcha. Just make sure we have a test case for this, please 🙏🏾 🙏🏾 🙏🏾

Comment on lines +57 to +69
keys := make([][]byte, 0, len(chunk))
values := make([][]byte, 0, len(chunk))
expires := make([]zeronull.Timestamptz, 0, len(chunk))
revs := make([]revision, 0, len(chunk))

for _, item := range chunk {
keys = append(keys, nonNilKey(item.Key))
values = append(values, nonNil(item.Value))
expires = append(expires, zeronull.Timestamptz(item.Expires.UTC()))

revVal := newRevision()
revs = append(revs, revVal)
revOut = append(revOut, revisionToString(revVal))
Contributor

Suggested change
keys := make([][]byte, 0, len(chunk))
values := make([][]byte, 0, len(chunk))
expires := make([]zeronull.Timestamptz, 0, len(chunk))
revs := make([]revision, 0, len(chunk))
for _, item := range chunk {
keys = append(keys, nonNilKey(item.Key))
values = append(values, nonNil(item.Value))
expires = append(expires, zeronull.Timestamptz(item.Expires.UTC()))
revVal := newRevision()
revs = append(revs, revVal)
revOut = append(revOut, revisionToString(revVal))
keys := make([][]byte, len(chunk))
values := make([][]byte, len(chunk))
expires := make([]zeronull.Timestamptz, len(chunk))
revs := make([]revision, len(chunk))
for i, item := range chunk {
keys[i] = nonNilKey(item.Key)
values[i] = nonNil(item.Value)
expires[i] = zeronull.Timestamptz(item.Expires.UTC())
revVal := newRevision()
revs[i] = revVal
revOut = append(revOut, revisionToString(revVal))

nit:

Direct indexing is slightly more efficient since the runtime can skip bounds checks.

Disclaimer: I haven't actually tested this.

Contributor

I'd really like to see a benchmark for that before we go and use a pattern that's more error-prone.

Contributor

@cthach cthach Nov 5, 2025

I'd really like to see a benchmark for that before we go and use a pattern that's more error-prone.

I gotchu. Here's a simple benchmark.

package teleport_test

import "testing"

func BenchmarkAppendNoPrealloc(b *testing.B) {
	for b.Loop() {
		var s []int

		for i := range 10000 {
			s = append(s, i)
		}
	}
}

func BenchmarkAppendWithPrealloc(b *testing.B) {
	for b.Loop() {
		s := make([]int, 0, 10000)

		for i := range 10000 {
			s = append(s, i)
		}
	}
}

func BenchmarkDirectIndexWithPrealloc(b *testing.B) {
	for b.Loop() {
		s := make([]int, 10000)

		for i := range 10000 {
			s[i] = i
		}
	}
}

Results:

❯ go test -bench=. -benchmem ./benchmark_prealloc_test.go

goos: darwin
goarch: arm64
cpu: Apple M4 Pro
BenchmarkAppendNoPrealloc-12               12636             90313 ns/op          357627 B/op         19 allocs/op
BenchmarkAppendWithPrealloc-12             94554             21342 ns/op           81920 B/op          1 allocs/op
BenchmarkDirectIndexWithPrealloc-12        65499             15631 ns/op           81920 B/op          1 allocs/op
PASS

Contributor

With a chunk size of 1000, copying the types we have in the loop into four slices (but with the revision getting copied as a string), I'm getting

BenchmarkAppendWithPrealloc-14         	   45082	     25864 ns/op	   98113 B/op	    1004 allocs/op
BenchmarkDirectIndexWithPrealloc-14    	   48951	     24134 ns/op	   98113 B/op	    1004 allocs/op

and if we add the generation of the revision, like we have in code

BenchmarkAppendWithPrealloc-14         	    5415	    220249 ns/op	  114116 B/op	    2004 allocs/op
BenchmarkDirectIndexWithPrealloc-14    	    5626	    216876 ns/op	  114113 B/op	    2004 allocs/op

an improvement of 6.6% and 1.5% respectively, which is a lot less worth it compared to the potential for misuse that direct indexing has, especially considering that this is minor preparation for a much larger amount of network I/O. It's definitely worth keeping in mind for very tight loops that mainly deal in memory, though. Thank you for pointing that out.

Contributor

@espadolini espadolini Nov 5, 2025

benchmark code
package foo

import (
	"fmt"
	"testing"
	"time"

	"github.com/google/uuid"
	"github.com/jackc/pgx/v5/pgtype/zeronull"

	"github.com/gravitational/teleport/lib/backend"
)

func BenchmarkAppendWithPrealloc(b *testing.B) {
	var chunk []backend.Item
	for i := range 1000 {
		chunk = append(chunk, backend.Item{
			Key:      backend.KeyFromString(fmt.Sprintf("/%d", i)),
			Value:    []byte("foo"),
			Expires:  time.Now(),
			Revision: uuid.NewString(),
		})
	}

	for b.Loop() {
		keys := make([][]byte, 0, len(chunk))
		values := make([][]byte, 0, len(chunk))
		expires := make([]zeronull.Timestamptz, 0, len(chunk))
		revs := make([]uuid.UUID, 0, len(chunk))
		// revs := make([]string, 0, len(chunk))

		for _, item := range chunk {
			keys = append(keys, []byte(item.Key.String()))
			values = append(values, item.Value)
			expires = append(expires, zeronull.Timestamptz(item.Expires.UTC()))
			revs = append(revs, uuid.New())
			// revs = append(revs, item.Revision)
		}
	}
}

func BenchmarkDirectIndexWithPrealloc(b *testing.B) {
	var chunk []backend.Item
	for i := range 1000 {
		chunk = append(chunk, backend.Item{
			Key:      backend.KeyFromString(fmt.Sprintf("/%d", i)),
			Value:    []byte("foo"),
			Expires:  time.Now(),
			Revision: uuid.NewString(),
		})
	}

	for b.Loop() {
		keys := make([][]byte, len(chunk))
		values := make([][]byte, len(chunk))
		expires := make([]zeronull.Timestamptz, len(chunk))
		revs := make([]uuid.UUID, len(chunk))
		// revs := make([]string, len(chunk))

		for i, item := range chunk {
			keys[i] = []byte(item.Key.String())
			values[i] = item.Value
			expires[i] = zeronull.Timestamptz(item.Expires.UTC())
			revs[i] = uuid.New()
			// revs[i] = item.Revision
		}
	}
}

Contributor

Nice, thanks for providing that! Yep, the efficiency gain is very small and probably only worth the tradeoff in performance-critical code.

That being said, I don't have a preference, so will defer to @smallinsky on which direction we go with.


4 participants