merge develop into feat/human-annotation #6426

Conversation
* initial commit
* don't send ETag back on 412; easy way to cheat the system
* remove unused
* refactor, keeping match closer to db
* address PR comments, clean up save code, and refactor tests to account for If-Match
* check for ETag on success
* update last_modified_at on If-Match save
* tweaks
* use ETag.create
* fix pipeline type from/to JSON; add tests
* use dataclass func instead for to_json
* fix test
Walkthrough

Shifts the pipeline JSON format from a list to a dict with a "stages" key; centralizes JSON utilities (dumps/loads, JSONResponse) and jsonpatch under fiftyone.server.utils.json; removes the json_transform registry; adds an HTTP ETag helper; overhauls sample routes to support If-Match/ETag and optimistic concurrency; updates decorators to use the new JSON utilities; and adds/updates tests accordingly.

Changes
Sequence Diagram(s)

sequenceDiagram
autonumber
participant C as Client
participant R as Sample Route
participant UJ as utils.json (parse/deserialize)
participant DB as Dataset/Sample Store
participant UH as utils.http.ETag
Note over C,R: PATCH /samples/{id} with If-Match
C->>R: HTTP PATCH (If-Match, JSON Patch)
R->>R: get_if_last_modified_at()
R->>DB: get_sample(dataset_id, sample_id, if_last_modified_at?)
alt If-Match provided and mismatch
DB-->>R: last_modified_at != If-Match
R-->>C: 412 Precondition Failed
else Match or no If-Match
R->>UJ: parse_jsonpatch + deserialize values
UJ-->>R: patch operations
R->>DB: save_sample(sample, if_last_modified_at?)
DB-->>R: persisted sample
R->>UH: generate_sample_etag()
UH-->>R: ETag
R-->>C: 200 OK (ETag header, JSON body)
end
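The If-Match flow in the diagram above can be sketched from the client side. This is a hypothetical sketch only: the host, port, route shape, sample ID, and payload are illustrative assumptions, not the actual FiftyOne server API surface.

```python
import json
from urllib import request as urlrequest


def build_patch_request(base_url, sample_id, patch_ops, etag):
    """Build (but do not send) a JSON Patch request guarded by If-Match."""
    return urlrequest.Request(
        f"{base_url}/samples/{sample_id}",  # assumed route shape
        data=json.dumps(patch_ops).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json-patch+json",
            # ETag from a previous response; a stale value should yield 412
            "If-Match": etag,
        },
    )


ops = [{"op": "replace", "path": "/tags", "value": ["reviewed"]}]
req = build_patch_request(
    "http://localhost:5151", "abc123", ops, '"1714000000.0"'
)
```

Per the diagram, the server compares the If-Match value against the sample's last_modified_at and returns 412 Precondition Failed on a mismatch, otherwise applies the patch and echoes a fresh ETag.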
sequenceDiagram
autonumber
participant D as Decorator.wrapper
participant UJ as utils.json
participant H as Handler
participant JR as utils.json.JSONResponse
D->>UJ: loads(request.body)
UJ-->>D: payload dict
D->>H: call with payload
H-->>D: result
D->>UJ: dumps(result)
D->>JR: JSONResponse(content)
JR-->>D: Response
D-->>Client: HTTP Response
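The decorator flow above can be reduced to a miniature sketch. This toy uses stdlib json and plain bytes in place of the real Starlette request/response objects and the project's utils.json helpers.

```python
import json
from functools import wraps


def json_route(handler):
    """Toy version of the decorator flow: loads(body) -> handler -> dumps."""
    @wraps(handler)
    def wrapper(raw_body: bytes) -> bytes:
        # Decorator parses the body, calls the handler, serializes the result
        payload = json.loads(raw_body) if raw_body else {}
        result = handler(payload)
        return json.dumps(result).encode("utf-8")
    return wrapper


@json_route
def echo(payload):
    return {"received": payload}
```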
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Warning: Review ran into problems

🔥 Problems: Git: Failed to clone repository.
Actionable comments posted: 4
🧹 Nitpick comments (5)
fiftyone/server/utils/json/jsonpatch/__init__.py (1)
Lines 50-59: Defensive type check for iterable elements in parse()

Strings/bytes are Iterable and will currently raise a confusing TypeError when indexed like a dict. Guard each element to fail fast with a clear message.
Apply this diff:
 return_one = False
 if isinstance(patches, dict):
     patches = [patches]
     return_one = True
 elif not isinstance(patches, Iterable):
     raise TypeError("Patches must be a dict or an iterable of dicts")

 parsed = []
 for patch in patches:
+    if not isinstance(patch, dict):
+        raise TypeError("Each patch must be a dict")
     try:
         op_str = patch["op"]
         path = patch["path"]
     except KeyError as err:
         raise ValueError(f"Missing {err} field") from err

fiftyone/server/decorators.py (1)
Lines 24-27: Prefer media_type over a manual Content-Type header

Use the Response media_type parameter for clarity.

-    return Response(
-        await run_sync_task(lambda: utils.json.dumps(response)),
-        headers={"Content-Type": "application/json"},
-    )
+    return Response(
+        await run_sync_task(lambda: utils.json.dumps(response)),
+        media_type="application/json",
+    )

fiftyone/server/utils/json/__init__.py (1)
Lines 14-18: Consider re-exporting jsonpatch.parse under a short alias

Providing parse as a top-level alias improves ergonomics and maintains parity with prior usage.

 from fiftyone.server.utils.json.jsonpatch import parse as parse_jsonpatch
 from fiftyone.server.utils.json.serialization import deserialize, serialize

 def dumps(obj: Any) -> str:
 @@
 class JSONResponse(StarletteJSONResponse):
     """Custom JSON response that uses the custom Encoder."""

     def render(self, content: Any) -> bytes:
         return dumps(content).encode("utf-8")
+
+# Backwards/ergonomic alias
+parse = parse_jsonpatch

tests/unittests/sample_route_tests.py (2)
Lines 354-361: Make the malformed payload JSON-serializable to test server-side validation

Using a Python set risks serialization errors in the test itself. Use a dict (still invalid for detections) so the server can surface the intended 400.

-    patch_payload = {
-        # Detections object is missing the required 'detections' list
-        "ground_truth": {
-            "_cls": "Detections",
-            "detections": {"some messed up map"},
-        }
-    }
+    patch_payload = {
+        # Detections object is missing the required 'detections' list
+        "ground_truth": {
+            "_cls": "Detections",
+            "detections": {"some": "messed up map"},
+        }
+    }
Lines 592-606: Remove duplicate MagicMock assignment

Redundant reinitialization; keep a single MagicMock.

-    mock_request = MagicMock()
-    mock_request = MagicMock()
+    mock_request = MagicMock()
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- fiftyone/operators/_types/pipeline.py (1 hunks)
- fiftyone/server/decorators.py (2 hunks)
- fiftyone/server/routes/sample.py (5 hunks)
- fiftyone/server/utils/__init__.py (1 hunks)
- fiftyone/server/utils/http.py (1 hunks)
- fiftyone/server/utils/json/__init__.py (1 hunks)
- fiftyone/server/utils/json/encoder.py (1 hunks)
- fiftyone/server/utils/json/jsonpatch/__init__.py (2 hunks)
- fiftyone/server/utils/json/jsonpatch/patch.py (6 hunks)
- fiftyone/server/utils/json/serialization.py (1 hunks)
- fiftyone/server/utils/json_transform/__init__.py (0 hunks)
- fiftyone/server/utils/json_transform/transform.py (0 hunks)
- fiftyone/server/utils/json_transform/types.py (0 hunks)
- tests/unittests/factory/delegated_operation_doc_tests.py (1 hunks)
- tests/unittests/operators/types_tests.py (1 hunks)
- tests/unittests/sample_route_tests.py (9 hunks)
- tests/unittests/server/utils/jsonpatch/test_json_patch_patch.py (1 hunks)
- tests/unittests/server/utils/jsonpatch/test_jsonpatch.py (1 hunks)
- tests/unittests/server/utils/jsonpatch/test_jsonpatch_methods.py (1 hunks)
💤 Files with no reviewable changes (3)
- fiftyone/server/utils/json_transform/types.py
- fiftyone/server/utils/json_transform/transform.py
- fiftyone/server/utils/json_transform/__init__.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-14T19:09:00.286Z
Learnt from: j053y
PR: voxel51/fiftyone#6416
File: fiftyone/server/utils/json/serialization.py:25-0
Timestamp: 2025-10-14T19:09:00.286Z
Learning: In `fiftyone/server/utils/json/serialization.py`, the `deserialize()` function intentionally raises a `ValueError` when encountering an unregistered `_cls` value that is not in the hard-coded list of supported label types (Classification, Classifications, Detection, Detections, Polyline, Polylines). This strict validation is by design and should not be changed to silently return unknown values.
Applied to files:
fiftyone/server/utils/json/serialization.py
🔇 Additional comments (2)
fiftyone/server/utils/json/jsonpatch/__init__.py (1)
Lines 12-21: Import path migration looks good

Re-exports and Operation class mapping remain intact. No API changes observed.
tests/unittests/server/utils/jsonpatch/test_jsonpatch_methods.py (1)
Lines 14-23: Import path update is correct

Tests now target the relocated jsonpatch module under utils.json. No further changes needed.
     def from_json(cls, json_dict):
         """Loads the pipeline from a JSON/python dict.

-        Ex., [
-            {"operator_uri": "@voxel51/test/blah", "name": "my_stage", ...},
-            ...,
-        ]
+        Ex., {
+            "stages": [
+                {"operator_uri": "@voxel51/test/blah", "name": "my_stage"},
+                ...,
+            ]
+        }

         Args:
-            json_list: a list of JSON / python dicts
+            json_dict: a JSON / python dict representation of the pipeline
         """
-        stages = [PipelineStage(**stage) for stage in json_list]
+        stages = [
+            PipelineStage(**stage) for stage in json_dict.get("stages") or []
+        ]
         return cls(stages=stages)
Handle legacy list payloads in Pipeline.from_json

Existing persisted documents still store pipelines as bare lists. The new implementation unconditionally calls json_dict.get(...), so deserializing those records now raises an AttributeError instead of rebuilding the pipeline. Please keep the old list shape supported while accepting the new dict form.
@classmethod
def from_json(cls, json_dict):
"""Loads the pipeline from a JSON/python dict.
@@
- stages = [
- PipelineStage(**stage) for stage in json_dict.get("stages") or []
- ]
+ if json_dict is None:
+ stage_defs = []
+ elif isinstance(json_dict, list):
+ stage_defs = json_dict
+ elif isinstance(json_dict, dict):
+ stage_defs = json_dict.get("stages") or []
+ else:
+ raise TypeError(
+ "Pipeline.from_json expected a dict with a 'stages' key or a list"
+ )
+
+ stages = [PipelineStage(**stage) for stage in stage_defs]
return cls(stages=stages)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    @classmethod
    def from_json(cls, json_dict):
        """Loads the pipeline from a JSON/python dict.

        Ex., {
            "stages": [
                {"operator_uri": "@voxel51/test/blah", "name": "my_stage"},
                ...,
            ]
        }

        Args:
            json_dict: a JSON / python dict representation of the pipeline
        """
        if json_dict is None:
            stage_defs = []
        elif isinstance(json_dict, list):
            stage_defs = json_dict
        elif isinstance(json_dict, dict):
            stage_defs = json_dict.get("stages") or []
        else:
            raise TypeError(
                "Pipeline.from_json expected a dict with a 'stages' key or a list"
            )

        stages = [PipelineStage(**stage) for stage in stage_defs]
        return cls(stages=stages)
🤖 Prompt for AI Agents
In fiftyone/operators/_types/pipeline.py around lines 99 to 115, from_json
currently assumes json_dict is a dict and calls json_dict.get("stages"), which
breaks when persisted pipelines are stored as a bare list; update the method to
accept both legacy list payloads and the new dict shape by detecting if
json_dict is a list (or tuple) and using it directly as the stages list,
otherwise extract stages = json_dict.get("stages") or [] (also handle json_dict
being None), then build PipelineStage objects from that stages iterable and
return cls(stages=stages).
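The dual-shape handling proposed above can be exercised in isolation. This is a standalone sketch: Stage here is a stand-in dataclass for the real PipelineStage, and stages_from_json mirrors the suggested branching, not the actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """Stand-in for the real PipelineStage; fields are illustrative."""
    operator_uri: str
    name: str = ""


def stages_from_json(json_dict):
    """Accept both the legacy bare-list payload and the {"stages": [...]} dict."""
    if json_dict is None:
        defs = []
    elif isinstance(json_dict, list):
        # Legacy persisted shape: a bare list of stage dicts
        defs = json_dict
    elif isinstance(json_dict, dict):
        defs = json_dict.get("stages") or []
    else:
        raise TypeError("expected a dict with a 'stages' key or a list")
    return [Stage(**d) for d in defs]


legacy = [{"operator_uri": "@voxel51/test/blah", "name": "s1"}]
wrapped = {"stages": legacy}
```

Both shapes deserialize to the same stage list, so old documents keep loading after the format change.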
     try:
         if_last_modified_at = datetime.datetime.fromtimestamp(
             float(if_match)
         )
     except Exception:
         ...

     if if_last_modified_at is None:
         raise HTTPException(
             status_code=400, detail="Invalid If-Match header"
         )
     return if_last_modified_at


 def get_sample(
     dataset_id: str,
     sample_id: str,
     if_last_modified_at: Union[datetime.datetime, None],
 ) -> fo.Sample:
     """Retrieves a sample from a dataset.

     Args:
         dataset_id: The ID of the dataset
         sample_id: The ID of the sample
         if_last_modified_at: The if last modified date, if it exists

     Raises:
-        HTTPException: if the dataset or sample is not found
+        HTTPException: If the dataset or sample is not found or the if last
+            modified date is present and does not match the sample

     Returns:
         The sample
     """
     try:
         dataset = fou.load_dataset(id=dataset_id)
-    except ValueError:
+    except ValueError as err:
         raise HTTPException(
             status_code=404,
             detail=f"Dataset '{dataset_id}' not found",
-        )
+        ) from err

     try:
         sample = dataset[sample_id]
-    except KeyError:
+    except KeyError as err:
         raise HTTPException(
             status_code=404,
             detail=f"Sample '{sample_id}' not found in dataset '{dataset_id}'",
-        )
+        ) from err

     # Fail early, if very out-of-date
     if if_last_modified_at is not None:
         if sample.last_modified_at != if_last_modified_at:
             raise HTTPException(
                 status_code=412, detail="If-Match condition failed"
             )
Normalize parsed datetimes before comparison

datetime.fromtimestamp() yields a naive datetime, so when the If-Match header supplies a Unix timestamp we end up comparing a naive value against the UTC-aware sample.last_modified_at; the equality check then always reports a mismatch (and ordering comparisons raise TypeError). The same risk exists for ISO strings without an explicit offset. Please ensure any successfully parsed value is made timezone-aware (e.g., defaulting to timezone.utc) before the equality check and Mongo filter.
🤖 Prompt for AI Agents
In fiftyone/server/routes/sample.py around lines 63 to 118, the parsed If-Match
datetime can be naive (fromtimestamp or ISO parse) while sample.last_modified_at
is UTC-aware, causing TypeError; ensure any successfully parsed datetime is
timezone-aware (prefer timezone.utc). Replace
datetime.fromtimestamp(float(if_match)) with
datetime.fromtimestamp(float(if_match), tz=datetime.timezone.utc) or, if parsing
ISO strings, use datetime.fromisoformat(...) and if result.tzinfo is None call
result = result.replace(tzinfo=datetime.timezone.utc); use this timezone-aware
value for the equality check and any DB filter.
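The naive-vs-aware pitfall and the proposed normalization can be demonstrated directly. This is a standalone sketch of the recommended parsing, not the route's actual code; parse_if_match is a hypothetical helper name.

```python
import datetime

# A naive parse (no tz) vs a normalized UTC parse of the same timestamp
naive = datetime.datetime.fromtimestamp(1714000000.0)
aware = datetime.datetime.fromtimestamp(
    1714000000.0, tz=datetime.timezone.utc
)


def parse_if_match(if_match: str) -> datetime.datetime:
    """Parse a Unix timestamp or ISO string and normalize to UTC."""
    try:
        return datetime.datetime.fromtimestamp(
            float(if_match), tz=datetime.timezone.utc
        )
    except ValueError:
        # Not a float: fall back to ISO 8601, defaulting naive values to UTC
        dt = datetime.datetime.fromisoformat(if_match)
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=datetime.timezone.utc)
        return dt
```

Equality between a naive and an aware datetime is always False in Python 3, which is exactly why the unnormalized parse would make every If-Match check fail with 412.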
    def create(value: Any, is_weak: bool = False) -> str:
        """Creates an ETag string from the given value."""
        # Wrap in quotes if not already quoted
        if not (value.startswith('"') and value.endswith('"')):
            value = f'"{value}"'

        # Add weak prefix if necessary
        if is_weak:
            return f'W/"{value}"'

        return value
ETag.create quoting bug and type handling

- Non-str values will raise AttributeError on startswith/endswith.
- Weak ETags are double-quoted: returns W/""abc"" instead of W/"abc".

Fix by coercing to str once, quoting once, and prefixing without re-quoting.
- def create(value: Any, is_weak: bool = False) -> str:
- """Creates an ETag string from the given value."""
- # Wrap in quotes if not already quoted
- if not (value.startswith('"') and value.endswith('"')):
- value = f'"{value}"'
-
- # Add weak prefix if necessary
- if is_weak:
- return f'W/"{value}"'
-
- return value
+ def create(value: Any, is_weak: bool = False) -> str:
+ """Creates an ETag string from the given value."""
+ value_str = str(value)
+ if not (value_str.startswith('"') and value_str.endswith('"')):
+ value_str = f'"{value_str}"'
+ return f"W/{value_str}" if is_weak else value_str
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    def create(value: Any, is_weak: bool = False) -> str:
        """Creates an ETag string from the given value."""
        value_str = str(value)
        if not (value_str.startswith('"') and value_str.endswith('"')):
            value_str = f'"{value_str}"'
        return f"W/{value_str}" if is_weak else value_str
🤖 Prompt for AI Agents
In fiftyone/server/utils/http.py around lines 16 to 27, the current create()
calls startswith/endswith on value and re-quotes when is_weak causing
AttributeError for non-str inputs and double quotes for weak ETags; coerce value
to str once (e.g. s = str(value)), then ensure it is quoted only if not already
(if not (s.startswith('"') and s.endswith('"')): s = f'"{s}"'), and when is_weak
prefix with W/ without adding extra quotes (return f'W/{s}' for weak, otherwise
return s).
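The fixed quoting behavior is easy to verify against the RFC 7232 shapes (strong: "abc", weak: W/"abc"). This sketch mirrors the suggested fix as a free function; make_etag is an illustrative name, not the project's API.

```python
from typing import Any


def make_etag(value: Any, is_weak: bool = False) -> str:
    """Sketch of the proposed fix: coerce once, quote once, prefix once."""
    s = str(value)  # non-str inputs (e.g. floats) no longer raise
    if not (s.startswith('"') and s.endswith('"')):
        s = f'"{s}"'
    # Weak prefix goes outside the existing quotes, never re-quoting
    return f"W/{s}" if is_weak else s
```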
class Encoder(json.JSONEncoder):
    """Custom JSON encoder that handles numpy types."""

    def default(self, o):
        """Override the default method to handle numpy types."""

        if isinstance(o, np.floating):
            return float(o)

        if isinstance(o, np.integer):
            return int(o)

        return json.JSONEncoder.default(self, o)


def dumps(obj: Any) -> str:
    """Serializes an object to a JSON-formatted string."""
    return json_util.dumps(obj, cls=Encoder)


def loads(s: Union[str, bytes, bytearray, None]) -> Any:
    """Deserializes a JSON-formatted string to a Python object."""
    return json_util.loads(s) if s else {}


class JSONResponse(StarletteJSONResponse):
    """Custom JSON response that uses the custom Encoder."""

    def render(self, content: Any) -> bytes:
        return dumps(content).encode("utf-8")
Restore numpy handling before delegating to json_util

json_util.dumps always passes its own default callable. Because Encoder doesn't override __init__, json.JSONEncoder.__init__ assigns that callable to the instance attribute self.default, shadowing this method. As a result our numpy checks never run and we fall straight back to BSON extended JSON. Please capture the fallback first and reinstate the method so we can coerce numpy scalars before delegating.
class Encoder(json.JSONEncoder):
"""Custom JSON encoder that handles numpy types."""
+ def __init__(self, *args, **kwargs):
+ self._fallback_default = kwargs.pop("default", None)
+ super().__init__(*args, default=None, **kwargs)
+
def default(self, o):
"""Override the default method to handle numpy types."""
if isinstance(o, np.floating):
return float(o)
if isinstance(o, np.integer):
return int(o)
- return json.JSONEncoder.default(self, o)
+ if self._fallback_default is not None:
+ return self._fallback_default(o)
+
+ return super().default(o)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
import json
import numpy as np
from typing import Any, Union

from bson import json_util
from starlette.responses import JSONResponse as StarletteJSONResponse


class Encoder(json.JSONEncoder):
    """Custom JSON encoder that handles numpy types."""

    def __init__(self, *args, **kwargs):
        # Capture any default passed in (e.g. by json_util.dumps)
        self._fallback_default = kwargs.pop("default", None)
        # Force JSONEncoder to use our .default method
        super().__init__(*args, default=None, **kwargs)

    def default(self, o):
        """Override the default method to handle numpy types."""
        if isinstance(o, np.floating):
            return float(o)

        if isinstance(o, np.integer):
            return int(o)

        # Delegate to the captured fallback, if provided
        if self._fallback_default is not None:
            return self._fallback_default(o)

        # Otherwise use the normal JSONEncoder logic
        return super().default(o)


def dumps(obj: Any) -> str:
    """Serializes an object to a JSON-formatted string."""
    return json_util.dumps(obj, cls=Encoder)


def loads(s: Union[str, bytes, bytearray, None]) -> Any:
    """Deserializes a JSON-formatted string to a Python object."""
    return json_util.loads(s) if s else {}


class JSONResponse(StarletteJSONResponse):
    """Custom JSON response that uses the custom Encoder."""

    def render(self, content: Any) -> bytes:
        return dumps(content).encode("utf-8")
🤖 Prompt for AI Agents
In fiftyone/server/utils/json/encoder.py around lines 16 to 45, the custom
Encoder's numpy handling is shadowed because json_util.dumps supplies its own
default callable via JSONEncoder.__init__, so our default method never runs; fix
by overriding Encoder.__init__ to capture any incoming default into an instance
attribute (e.g. self._fallback_default), then set self.default to a wrapper
method that first checks for numpy types (returning float/int as currently
implemented) and otherwise delegates to the captured fallback_default (or to
super().default if no fallback), ensuring unhandled objects are passed through
to json_util's fallback behavior; keep dumps/loads usage unchanged and preserve
signatures and return types.
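The shadowing mechanism described above can be reproduced with stdlib json alone. This toy encoder handles sets, standing in for the numpy handling; it is an illustration of the JSONEncoder behavior, not the project's code.

```python
import json


class DemoEncoder(json.JSONEncoder):
    """Toy encoder whose default() handles sets."""

    def default(self, o):
        if isinstance(o, set):
            return sorted(o)
        return super().default(o)


# Passing default= to dumps makes JSONEncoder.__init__ store that callable
# on the instance, shadowing DemoEncoder.default, so our handler never runs
shadowed = json.dumps({1, 2}, cls=DemoEncoder, default=str)

# Without an external default, the method is found on the class and runs
unshadowed = json.dumps({1, 2}, cls=DemoEncoder)
```

This is exactly why json_util.dumps (which always supplies its own default) silently bypasses the numpy branches until __init__ captures the fallback.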
What changes are proposed in this pull request?
Merge from develop for latest API changes
How is this patch tested? If it is not, please explain why.
local
Release Notes
Is this a user-facing change that should be mentioned in the release notes?
notes for FiftyOne users.
(Details in 1-2 sentences. You can just refer to another PR with a description
if this PR is part of a larger change.)
What areas of FiftyOne does this PR affect?
fiftyone
Python library changes

Summary by CodeRabbit