Skip to content

[Cosmos][Query] Support for mixed-type order by queries #43965

@simorenoh

Description

@simorenoh

From a user:

while doing some tests I came across an odd behavior. I have a collection with millions of docs where all but one have "value" property defined and this property is always a string (see below). The single document where this property is not defined appears to break ORDER BY clause. The specific error I am running into is "Expected NoValue, but got string. Cannot execute cross partition order-by queries on mixed types. Consider filtering your query using IS_STRING or IS_NUMBER to get around this exception."

Why does query engine or SDK consider missing property a "mixed type"? This feels very unintuitive. After some searching, I came across Azure/azure-cosmos-dotnet-v3#733 which seems to refer to the exact same issue. However, the discussion also references two different PRs from some years ago that mention the issue was fixed. Is there a recent regression or has it actually never worked/been fully fixed?

I ran the queries using Portal as well as Python SDK with same result. My queries:

SELECT TOP 20 * FROM c ORDER BY c.value ASC -- this fails with "Expected string, but got NoValue. Cannot execute cross partition order-by queries on mixed types. Consider filtering your query using IS_STRING or IS_NUMBER to get around this exception."
SELECT TOP 20 * FROM c ORDER BY c.value DESC -- this runs fine
SELECT TOP 20 * FROM c WHERE IS_DEFINED(c.value) ORDER BY c.value ASC -- this runs fine
SELECT TOP 20 * FROM c WHERE IS_DEFINED(c.value) ORDER BY c.value DESC -- this runs fine
SELECT DISTINCT c.value FROM c -- for reference here's output of distinct [{"value": "20"}, {"value": "21"}, {}] note that there's empty object because one doc doesn't have value defined at all.

SELECT TOP 20 * FROM c WHERE (c.id = 'some-id') OR (c.id = 'some-id2') ORDER BY c.value ASC -- this runs fine; doc id 'some-id' is the test doc without value and query returns it first in the result set; note both docs belong to the same physical partition
SELECT TOP 20 * FROM c WHERE (c.id = 'some-id') OR (c.id = 'some-id2') ORDER BY c.value DESC -- just for sanity check, this works too; query returns doc 'some-id' last in the result set
SELECT TOP 20 * FROM c WHERE (c.id = 'some-id') OR (c.id = 'some-id3') ORDER BY c.value ASC -- this is same as example before last one, but it fails; the difference is now each doc comes from different partition and somehow SDK or engine can't figure it out;
SELECT TOP 20 * FROM c WHERE (c.id = 'some-id') OR (c.id = 'some-id3') ORDER BY c.value DESC -- this fails too

We currently call this out as a limitation in our SDK README: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/cosmos/azure-cosmos/README.md#data-plane-limitations. This would be the official tracking item for this work.

Metadata

Metadata

Assignees

No one assigned

    Labels

    ClientThis issue points to a problem in the data-plane of the library.CosmosService AttentionWorkflow: This issue is responsible by Azure service team.cosmos-python-p0needs-team-attentionWorkflow: This issue needs attention from Azure service team or SDK team

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions