Description
Describe the bug
The JSON transformer cannot properly mask individual object keys within JSON arrays while preserving the original length of each value. When using the `#.field` path syntax to target all fields in an array of objects, the transformer either:
- Returns a template execution error due to receiving `[]interface{}` instead of individual string values
- Concatenates all masked values into a single string and applies it to every object in the array
Why this should be considered a bug rather than a limitation:
- sjson is working correctly: When using the `#.field` path, sjson correctly extracts all matching values and returns them as an array, which is the expected behavior per the sjson documentation
- pgstream's template system is incomplete: pgstream's template execution context passes `.GetValue` containing the array (`[]interface{}`) but doesn't provide proper array handling capabilities
- Other tools solve this: Tools like `jq` can easily process arrays with expressions like `.[] | .field = mask(.field)`
- Template capabilities exist: pgstream uses Go templates with Sprig functions, and Go templates already support array iteration (`range`, etc.), but pgstream doesn't expose it properly for this use case
- The data is available: The array of values is correctly extracted by sjson; pgstream just needs better template context handling
To Reproduce
Steps to reproduce the behavior:
- Use this configuration:
transformations:
  validation_mode: relaxed
  table_transformers:
    - schema: my_schema
      table: my_table
      column_transformers:
        json_column:
          name: json
          parameters:
            operations:
              - operation: set
                path: "#.sensitive_field"
                value_template: '{{ masking "default" .GetValue }}'
                error_not_exist: false
                skip_not_exist: true

- Run pgstream with a JSONB column containing:
[
{"sensitive_field": "ABC123DEF456GHI789", "type": "system_a"},
{"sensitive_field": "XYZ789UVW012RST345QWE678", "type": "system_b"}
]
- Perform replication or snapshot operation
- See error:
error executing template: template: op[0] set #.sensitive_field:1:21: executing "op[0] set #.sensitive_field" at <.GetValue>: wrong type for value; expected string; got []interface {}
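The same type error can be reproduced outside pgstream with plain `text/template`. This is only a minimal sketch: `valueCtx` and `maskSameLength` are hypothetical stand-ins for pgstream's template context and its `masking` function, not the real implementation. The assumption is that `masking` accepts a string while `.GetValue` returns an untyped value.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// valueCtx is a hypothetical stand-in for the template context. It only
// mimics a GetValue method that returns an untyped value.
type valueCtx struct{ value interface{} }

func (c valueCtx) GetValue() interface{} { return c.value }

// maskSameLength is a stand-in for a masking function that takes a string.
// It is not pgstream's implementation.
func maskSameLength(_ string, s string) string {
	return strings.Repeat("*", len(s))
}

// renderMask executes the value_template from the configuration against value.
func renderMask(value interface{}) (string, error) {
	tmpl, err := template.New("op").
		Funcs(template.FuncMap{"masking": maskSameLength}).
		Parse(`{{ masking "default" .GetValue }}`)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, valueCtx{value: value}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// A single string satisfies the function's parameter type.
	out, err := renderMask("ABC123")
	fmt.Println(out, err)

	// A slice, as returned for a "#.field" path, fails the type check.
	_, err = renderMask([]interface{}{"ABC123", "XYZ789"})
	fmt.Println(err)
}
```

The second call fails because Go templates check argument types at execution time, so the problem is the shape of the value handed to the template rather than the masking function itself.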
Expected behavior
The JSON transformer should be able to:
- Target individual object keys within JSON arrays using path syntax like `#.field`
- Apply masking functions to each individual value while preserving the original length
- Maintain the JSON array structure with each object having its own masked field
Expected output:
[
{"sensitive_field": "******************", "type": "system_a"},
{"sensitive_field": "*************************", "type": "system_b"}
]
Current Workaround Limitations
- Using indexed paths (`0.field`, `1.field`, etc.) defeats the purpose of handling variable-length arrays
- Using literal replacement values (`value: "***MASKED***"`) doesn't preserve original length
- The `template` transformer doesn't support JSONB data types
- Template approaches that attempt to process the array return concatenated results applied to all objects
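To illustrate the last point, here is a rough sketch using plain `text/template` of why iterating inside the template yields one concatenated string. `valueCtx` and `maskSameLength` are hypothetical stand-ins for pgstream's template context and `masking` function. A template produces a single text output, so every iteration of `range` appends to the same result.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// valueCtx is a hypothetical stand-in for the template context.
type valueCtx struct{ value interface{} }

func (c valueCtx) GetValue() interface{} { return c.value }

// maskSameLength is a stand-in masking function, not pgstream's own.
func maskSameLength(_ string, s string) string {
	return strings.Repeat("*", len(s))
}

// renderRange masks each element inside the template using range.
func renderRange(values []interface{}) (string, error) {
	tmpl, err := template.New("op").
		Funcs(template.FuncMap{"masking": maskSameLength}).
		Parse(`{{ range .GetValue }}{{ masking "default" . }}{{ end }}`)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, valueCtx{value: values}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Two values of length 3 and 5 come back as one string of length 8.
	out, err := renderRange([]interface{}{"ABC", "VWXYZ"})
	fmt.Printf("%q %v\n", out, err)
}
```

Since the `set` operation receives that one string, it has no way to tell which part belongs to which array element.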
Potential Solutions
The fix would be in pgstream's JSON transformer to either:
- Detect when `.GetValue` returns an array and provide iteration helpers in the template context
- Provide template functions that can work with arrays (e.g., `{{ range .GetValue }}{{ masking "default" . }}{{ end }}`)
- Change the approach to process array elements individually before template execution
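As a rough sketch of the third option, the snippet below processes each array element individually using only the Go standard library. It is not based on pgstream's code: `maskArrayField` and `maskSameLength` are hypothetical names, and the real transformer would presumably keep using sjson rather than `encoding/json`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"unicode/utf8"
)

// maskSameLength replaces every character with '*'. It is a stand-in for
// a length-preserving masking function, not pgstream's implementation.
func maskSameLength(s string) string {
	return strings.Repeat("*", utf8.RuneCountInString(s))
}

// maskArrayField applies mask to the named string field of every object
// in a JSON array, one element at a time.
func maskArrayField(doc []byte, field string, mask func(string) string) ([]byte, error) {
	var items []map[string]interface{}
	if err := json.Unmarshal(doc, &items); err != nil {
		return nil, err
	}
	for _, item := range items {
		// Objects where the field is missing or not a string are left as-is.
		if s, ok := item[field].(string); ok {
			item[field] = mask(s)
		}
	}
	// Note: json.Marshal writes map keys in sorted order.
	return json.Marshal(items)
}

func main() {
	doc := []byte(`[
		{"sensitive_field": "ABC123DEF456GHI789", "type": "system_a"},
		{"sensitive_field": "XYZ789UVW012RST345QWE678", "type": "system_b"}
	]`)
	out, err := maskArrayField(doc, "sensitive_field", maskSameLength)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

The point of the sketch is that the mask function only ever sees one string, so each object keeps its own masked value with its own length. Round-tripping through `map[string]interface{}` does not preserve key order, and this version rejects arrays containing non-object elements, so a real fix would need to handle both.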
Setup (please complete the following information):
- pgstream version: v0.8.3
- Postgres version: 15
- Postgres environment: Docker container
- Column type: JSONB
Additional context
This limitation significantly impacts the ability to anonymize JSON data containing arrays of objects with sensitive identifiers. The current JSON transformer works well for simple key-value pairs but struggles with array iteration and individual element processing, despite having all the necessary underlying capabilities through sjson and Go templates.