Skip to content

Conversation

swheaton
Copy link
Contributor

@swheaton swheaton commented Sep 10, 2025

$function is deprecated. It's also slow and bad security (which is why it's deprecated).

What changes are proposed in this pull request?

  1. Translate ViewExpression.sort to mongo native $sortArray. This was added in 5.2 which wouldn't have been OK originally but is fine now since we require >=6.0.
  2. In server count list items, replace the JS function with a fairly complicated mongo expression that also makes a histogram of values. Unfortunately it is not nearly as succinct as the JS function, but it will probably be faster since it's mongo native

NOTE: ViewExpression.sort docstring examples do not work in any configuration I've tried, with code going back 4+ years. I've tried mongodb versions back to 4.2. Both local (Mac) and some deployed (Linux). Either the docstrings have been totally wrong forever, the function has been broken forever, or I'm doing something silly. Would love to know in any case.

How is this patch tested? If it is not, please explain why.

App runs if you disable serverside JS 🤷🏼
Gemini says they're logically equivalent 🤷🏼 🤖
Letting unit tests run to see results of that.
I'm open to more tests to run? I don't really know what the server function populates (2)

Release Notes

Is this a user-facing change that should be mentioned in the release notes?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release
    notes for FiftyOne users.

Remove usage of deprecated $function MongoDB stage

What areas of FiftyOne does this PR affect?

  • App: FiftyOne application changes
  • Build: Build and test infrastructure changes
  • Core: Core fiftyone Python library changes
  • Documentation: FiftyOne documentation changes
  • Other

Summary by CodeRabbit

  • Refactor

    • Array sorting now uses database-side sorting for deterministic results; the previous client-side function-based sorter was removed. Reversing arrays remains supported.
    • List-item counting now uses a database aggregation to produce item-count maps for lists, deriving counts from centralized label data for improved consistency.
  • Chores

    • Updated internal dependencies to support the new aggregation-based approaches.
  • Documentation

    • Updated docs to reflect the sorting removal and new counting behavior.

@swheaton swheaton requested review from a team as code owners September 10, 2025 01:57
Copy link
Contributor

coderabbitai bot commented Sep 10, 2025

Walkthrough

Replaces JS-based array sorting and counting with MongoDB aggregation expressions: adds a pymongo import and changes ViewExpression.sort to emit a $sortArray stage (using pymongo.ASCENDING/DESCENDING) and removes ViewExpression._function; replaces server-side JS list-item counting with a MongoDB aggregation expression assigned via set_field(..., _allow_missing=True).

Changes

Cohort / File(s) Summary
Expressions: sort refactor & API removal
fiftyone/core/expressions.py
Added pymongo import. Reworked ViewExpression.sort(...) to produce a $sortArray expression (uses pymongo.DESCENDING / pymongo.ASCENDING and optional key) and removed the internal ViewExpression._function(function) JS wrapper and the JS-based sorting path.
Server: list-item counting aggregation
fiftyone/server/view.py
Replaced JS-based _count_list_items implementation with a MongoDB aggregation expression (uses $arrayToObject, $map/$setUnion and $size/$filter to build a counts map) and applies the expression via view.set_field(path, expr, _allow_missing=True) instead of a JS function.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant App(ViewExpression / Server)
    participant MongoDB

    Client->>App: Request view update (sort or count)
    alt sort -> $sortArray
        App->>MongoDB: Build expression with $sortArray (input: expr, sortBy: {...})
        MongoDB-->>App: Evaluated/represented sorted array
        App-->>Client: Return ViewExpression containing $sortArray stage
    else count -> aggregation expr
        App->>MongoDB: Build aggregation expr ($map / $setUnion -> $filter/$size -> $arrayToObject)
        MongoDB-->>App: Evaluated counts map (aggregation result or expression)
        App->>App: set_field(path, expr, _allow_missing=True)
        App-->>Client: Acknowledge field set / return expression
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title "Remove serverside JS usages" is concise and accurately captures the primary intent of the changeset — removing server-side JavaScript ($function) in favor of MongoDB-native aggregation (e.g., $sortArray and native expressions) — and is directly related to the files and summaries in the PR. It is short, specific, and free of extraneous details.
Description Check ✅ Passed The PR description follows the repository template and provides the required elements: a clear summary of proposed changes, motivation (deprecation/security of $function), testing notes, release-note selection, and affected areas. It documents the two main code changes, notes testing status and observed differences, and flags outstanding investigation items, so it is sufficiently complete for review.

Poem

I twitch my whiskers, nudge the heap,
No JS sly tricks to wake from sleep.
I sort with Mongo, count with maps,
Hop, merge, and tidy up the scraps.
A rabbit's nod — the query's neat. 🐇

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/remove-serverside-js-mongo

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
fiftyone/core/expressions.py (1)

18-18: Remove unused pymongo import

pymongo isn't referenced in this module. Drop it to avoid lint failures and unnecessary dependency coupling.

- import pymongo
fiftyone/server/view.py (1)

949-982: Optional: use $setField to simplify and avoid $mergeObjects+$arrayToObject

MongoDB 5.0+ supports $setField, which directly sets/updates a key on an object. This is clearer and avoids building single-element objects each step.

 def _count_list_items(path, view):
-    expr = {
-        "$reduce": {
-            "input": F(path),  # The field to be processed
-            "initialValue": {},
-            "in": {
-                "$mergeObjects": [
-                    "$$value",
-                    {
-                        "$arrayToObject": [
-                            [
-                                {
-                                    "k": "$$this",
-                                    "v": {
-                                        "$add": [
-                                            {
-                                                "$ifNull": [
-                                                    {
-                                                        "$getField": {
-                                                            "input": "$$value",
-                                                            "field": "$$this",
-                                                        }
-                                                    },
-                                                    0,
-                                                ]
-                                            },
-                                            1,
-                                        ]
-                                    },
-                                }
-                            ]
-                        ]
-                    },
-                ]
-            },
-        }
-    }
+    expr = {
+        "$reduce": {
+            "input": F(path),
+            "initialValue": {},
+            "in": {
+                "$setField": {
+                    "input": "$$value",
+                    "field": "$$this",
+                    "value": {
+                        "$add": [
+                            {
+                                "$ifNull": [
+                                    {
+                                        "$getField": {
+                                            "input": "$$value",
+                                            "field": "$$this",
+                                        }
+                                    },
+                                    0,
+                                ]
+                            },
+                            1,
+                        ]
+                    },
+                }
+            },
+        }
+    }
     return view.set_field(path, expr, _allow_missing=True)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 23b4ae2 and 9119a27.

📒 Files selected for processing (2)
  • fiftyone/core/expressions.py (1 hunks)
  • fiftyone/server/view.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
fiftyone/server/view.py (2)
fiftyone/core/collections.py (2)
  • view (1059-1066)
  • set_field (6332-6441)
fiftyone/core/expressions.py (1)
  • set_field (2061-2123)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: test-windows / test-python (windows-latest, 3.12)
  • GitHub Check: test-windows / test-python (windows-latest, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: test-windows / test-python (windows-latest, 3.10)
  • GitHub Check: test-windows / test-python (windows-latest, 3.9)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
  • GitHub Check: test / test-app
  • GitHub Check: lint / eslint
  • GitHub Check: build / build
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build
🔇 Additional comments (1)
fiftyone/server/view.py (1)

949-982: Confirm tag key validity for object field names

This builds a document whose keys are tag strings. If tags can contain '.' or start with '$', MongoDB will error on invalid field names. If such tags are possible, either sanitize keys or store counts as an array of {k, v} pairs instead. Otherwise, please confirm upstream constraints disallow these characters.

Copy link
Contributor

@kaixi-wang kaixi-wang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't seem like it gives the correct results.
added 8 label tags 'test' to sample then ran


view= _add_labels_tags_counts(ds)
pipeline2=view._pipeline()
pipeline2.extend([{
    "$match": {
        "_label_tags": {"$ne": {}}
    }
},  {"$project":{"_label_tags":1}}])

# proposed
In [37]: list(ds._sample_collection.aggregate(pipeline))
Out[37]: [{'_id': ObjectId('6829068fb59076c98c88bdcc'), '_label_tags': {'test': 1}}]

# existing
list(ds._sample_collection.aggregate(pipeline2))
Out[34]: [{'_id': ObjectId('6829068fb59076c98c88bdcc'), '_label_tags': {'test': 8.0}}]

@swheaton
Copy link
Contributor Author

Fixed. The only difference now is that the count is an integer instead of a float, idk if that matters

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
fiftyone/core/expressions.py (1)

2667-2741: Fix behavior/doc drift in ViewExpression.sort; deprecate/handle numeric and update docs

  • The docstring still references JS .sort() and implies string-based sorting; the new impl uses MongoDB $sortArray. Update docs accordingly.
  • numeric is now ignored. Either implement a cast workflow or at least emit a deprecation warning to avoid silent behavior changes.

Apply:

@@
-        If no ``key`` is provided, this array must contain elements whose
-        BSON representation can be sorted by JavaScript's ``.sort()`` method.
+        If no ``key`` is provided, the array elements are sorted by their
+        values using MongoDB's ``$sortArray`` (ascending by default).
@@
-            numeric (False): whether the array contains numeric values. By
-                default, the values will be sorted alphabetically by their
-                string representations
+            numeric (False): deprecated; ``$sortArray`` sorts numeric types
+                natively. If your data are numeric strings, cast to numbers
+                before sorting (e.g. via ``map(...).to_double()``).
@@
-        sort_order = pymongo.DESCENDING if reverse else pymongo.ASCENDING
+        if numeric:
+            warnings.warn(
+                "numeric is deprecated; $sortArray sorts numeric types natively. "
+                "For numeric strings, cast to numbers before sorting.",
+                DeprecationWarning,
+            )
+        sort_order = -1 if reverse else 1
🧹 Nitpick comments (2)
fiftyone/core/expressions.py (2)

18-18: Drop hard dependency on pymongo for sort constants

$sortArray accepts 1/-1; importing pymongo here adds an unnecessary import to core expressions.

Apply:

- import pymongo
+ # keep this module pymongo-agnostic; $sortArray accepts 1 (asc) and -1 (desc)

And (already covered above in sort()):

- sort_order = pymongo.DESCENDING if reverse else pymongo.ASCENDING
+ sort_order = -1 if reverse else 1

Also applies to: 2727-2727


2729-2733: Type expectations for key

If callers pass a ViewField instead of a string, the resulting $sortArray.sortBy would be incorrect (keys must be field names, not $-prefixed paths). Guard or coerce to string, or raise with a clear error.

Apply:

         if key is not None:
-            sort_by = {key: sort_order}
+            if not isinstance(key, str):
+                raise TypeError("sort(key=...) must be a field path string")
+            sort_by = {key: sort_order}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9119a27 and a374307.

📒 Files selected for processing (2)
  • fiftyone/core/expressions.py (2 hunks)
  • fiftyone/server/view.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • fiftyone/server/view.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: test-windows / test-python (windows-latest, 3.9)
  • GitHub Check: test-windows / test-python (windows-latest, 3.12)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.12)
  • GitHub Check: test-windows / test-python (windows-latest, 3.11)
  • GitHub Check: test-windows / test-python (windows-latest, 3.10)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
  • GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
  • GitHub Check: e2e / test-e2e
  • GitHub Check: build

@swheaton
Copy link
Contributor Author

idk why tests would pass in ubuntu but fail in windows. those tests shouldn't have anything to do with the OS? Unless their MongoDB installations differ in behavior which would be wild?

@swheaton
Copy link
Contributor Author

Ok so confusingly this is because windows downloads v6 while ubuntu downloads v8 of mongod. So it's a version thing not a platform thing

Copy link
Contributor

@brimoor brimoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW- the example code for sort() still runs successfully for me on develop. I'm running MongoDB 5.0.2 on an x86 mac.

I'm onboard with swapping out $function for $sortArray though. Although looks like I'll have to upgrade to at least Mongo 5.2 if I want to use this function going forward 😝

comp = "(a, b) => a - b"
else:
comp = ""
sort_order = pymongo.DESCENDING if reverse else pymongo.ASCENDING
Copy link
Contributor

@brimoor brimoor Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple comments on the implementation:

  • The numeric parameter is no longer used. Assuming that the new implementation sorts strings alphabetically and numbers numerically by default, can you delete the numeric parameter?
  • The docstring currently mentions "JavaScript's .sort() method". Can you update to say something like this?
If no ``key`` is provided, this array must contain sortable non-document values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright will clean up if we actually want to go forward with this. Sort of a prototype at the moment

return view.set_field(
path, F(path)._function(function), _allow_missing=True
)
expr = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defer to @benjaminpkane to evaluate the suitability of this new implementation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah it's not working in mongo6. Solution is getting really ugly ...

Copy link
Contributor

@benjaminpkane benjaminpkane Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think changing this to existence instead of counts could work. These counts are for the tag:count grid bubble values but showing the distinct tags is sufficient imo if we are technically blocked (also a performance improvement for the grid, less computation)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got a working version now I think. Check it out and lmk.
Of course if you want to change the grid behavior that would work too but that's up to you + your squad

@swheaton
Copy link
Contributor Author

FWIW- the example code for sort() still runs successfully for me on develop. I'm running MongoDB 5.0.2 on an x86 mac.

I'm onboard with swapping out $function for $sortArray though. Although looks like I'll have to upgrade to at least Mongo 5.2 if I want to use this function going forward 😝

well I have no idea what's going on then because I definitely tried Mongo 5.
But yes you should upgrade, as FiftyOne requires a minimum of Mongo 6 😉

@swheaton swheaton marked this pull request as draft September 10, 2025 13:56
@swheaton swheaton marked this pull request as ready for review September 11, 2025 03:07
@swheaton swheaton marked this pull request as draft September 11, 2025 03:07
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
fiftyone/server/view.py (1)

950-970: Verify tag values won’t produce illegal document keys

$arrayToObject will fail if any tag contains . or starts with $. If such tags are possible, encode keys (e.g., replace . with a safe char and track a reverse map) or return an array of {tag, count} objects instead.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a374307 and 2c2a7ba.

📒 Files selected for processing (1)
  • fiftyone/server/view.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
fiftyone/server/view.py (2)
fiftyone/core/collections.py (2)
  • view (1059-1066)
  • set_field (6332-6441)
fiftyone/core/expressions.py (1)
  • set_field (2061-2123)

Comment on lines +950 to +970
expr = {
"$arrayToObject": {
"$map": {
"input": {"$setUnion": "$_label_tags"},
"as": "tag",
"in": {
"k": "$$tag",
"v": {
"$size": {
"$filter": {
"input": "$_label_tags",
"as": "t",
"cond": {"$eq": ["$$t", "$$tag"]},
}
}
},
},
}
}
}
return view.set_field(path, expr, _allow_missing=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Critical: $setUnion invocation is invalid (and hard-codes the source path)

  • $setUnion expects an array of arrays; passing a single array expression will error on MongoDB 6.x.
  • The source is hard-coded to $_label_tags, ignoring the path param.

Fix both by referencing path and unioning with []:

 def _count_list_items(path, view):
-    expr = {
-        "$arrayToObject": {
-            "$map": {
-                "input": {"$setUnion": "$_label_tags"},
-                "as": "tag",
-                "in": {
-                    "k": "$$tag",
-                    "v": {
-                        "$size": {
-                            "$filter": {
-                                "input": "$_label_tags",
-                                "as": "t",
-                                "cond": {"$eq": ["$$t", "$$tag"]},
-                            }
-                        }
-                    },
-                },
-            }
-        }
-    }
+    expr = {
+        "$arrayToObject": {
+            "$map": {
+                "input": {"$setUnion": [f"${path}", []]},
+                "as": "tag",
+                "in": {
+                    "k": "$$tag",
+                    "v": {
+                        "$size": {
+                            "$filter": {
+                                "input": f"${path}",
+                                "as": "t",
+                                "cond": {"$eq": ["$$t", "$$tag"]},
+                            }
+                        }
+                    },
+                },
+            }
+        }
+    }
     return view.set_field(path, expr, _allow_missing=True)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
expr = {
"$arrayToObject": {
"$map": {
"input": {"$setUnion": "$_label_tags"},
"as": "tag",
"in": {
"k": "$$tag",
"v": {
"$size": {
"$filter": {
"input": "$_label_tags",
"as": "t",
"cond": {"$eq": ["$$t", "$$tag"]},
}
}
},
},
}
}
}
return view.set_field(path, expr, _allow_missing=True)
expr = {
"$arrayToObject": {
"$map": {
"input": {"$setUnion": [f"${path}", []]},
"as": "tag",
"in": {
"k": "$$tag",
"v": {
"$size": {
"$filter": {
"input": f"${path}",
"as": "t",
"cond": {"$eq": ["$$t", "$$tag"]},
}
}
},
},
}
}
}
return view.set_field(path, expr, _allow_missing=True)
🤖 Prompt for AI Agents
In fiftyone/server/view.py around lines 950 to 970, the aggregation expr
incorrectly calls $setUnion with a single array and hard-codes $_label_tags;
change it to reference the provided path and pass an array-of-arrays by unioning
the field with an empty array (e.g. use {"$setUnion": [f"${path}", []]}), and
ensure subsequent references to the tags use the same dynamic field expression;
keep returning view.set_field(path, expr, _allow_missing=True).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants