-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Description
From a user:
while doing some tests I came across an odd behavior. I have a collection with millions of docs where all but one have "value" property defined and this property is always a string (see below). The single document where this property is not defined appears to break ORDER BY clause. The specific error I am running into is "Expected NoValue, but got string. Cannot execute cross partition order-by queries on mixed types. Consider filtering your query using IS_STRING or IS_NUMBER to get around this exception."
Why does query engine or SDK consider missing property a "mixed type"? This feels very unintuitive. After some searching, I came across Azure/azure-cosmos-dotnet-v3#733 which seems to refer to the exact same issue. However, the discussion also references two different PRs from some years ago that mention the issue was fixed. Is there a recent regression or has it actually never worked/been fully fixed?
I ran the queries using Portal as well as Python SDK with same result. My queries:
SELECT TOP 20 * FROM c ORDER BY c.value ASC -- this fails with "Expected string, but got NoValue. Cannot execute cross partition order-by queries on mixed types. Consider filtering your query using IS_STRING or IS_NUMBER to get around this exception."
SELECT TOP 20 * FROM c ORDER BY c.value DESC -- this runs fine
SELECT TOP 20 * FROM c WHERE IS_DEFINED(c.value) ORDER BY c.value ASC -- this runs fine
SELECT TOP 20 * FROM c WHERE IS_DEFINED(c.value) ORDER BY c.value DESC -- this runs fine
SELECT DISTINCT c.value FROM c -- for reference here's output of distinct [{"value": "20"}, {"value": "21"}, {}] note that there's empty object because one doc doesn't have value defined at all.
SELECT TOP 20 * FROM c WHERE (c.id = 'some-id') OR (c.id = 'some-id2') ORDER BY c.value ASC -- this runs fine; doc id 'some-id' is the test doc without value and query returns it first in the result set; note both docs belong to the same physical partition
SELECT TOP 20 * FROM c WHERE (c.id = 'some-id') OR (c.id = 'some-id2') ORDER BY c.value DESC -- just for sanity check, this works too; query returns doc 'some-id' last in the result set
SELECT TOP 20 * FROM c WHERE (c.id = 'some-id') OR (c.id = 'some-id3') ORDER BY c.value ASC -- this is same as example before last one, but it fails; the difference is now each doc comes from different partition and somehow SDK or engine can't figure it out;
SELECT TOP 20 * FROM c WHERE (c.id = 'some-id') OR (c.id = 'some-id3') ORDER BY c.value DESC -- this fails too
We currently call this out as a limitation in our SDK README: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/cosmos/azure-cosmos/README.md#data-plane-limitations. This would be the official tracking item for this work.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status