-
-
Notifications
You must be signed in to change notification settings - Fork 619
Feature/multiple count distinct #3682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
- Replace single distinct string with list of distinct expressions - Update parsing, grouping, and sorting to handle multiple distincts - Maintain backward compatibility using first distinct alias - Enable multiple COUNT(DISTINCT) clauses in one query - Propagate distinct expressions across multi-statement queries
- Use .cstr() to convert CSphString to const char* for constructor - Fixes compilation error introduced in previous commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the COUNT(DISTINCT) functionality to support multiple distinct expressions instead of being limited to a single distinct expression. The change migrates from storing a single distinct attribute in m_sGroupDistinct to storing multiple distinct attributes in a vector m_dGroupDistinct.
- Replaces the single
CSphString m_sGroupDistinctfield withCSphVector<CSphString> m_dGroupDistinctto store multiple distinct expressions - Updates all references from
@distinctto use indexed naming scheme@distinct_Nwhere N is the index in the vector - Modifies parser logic to handle multiple COUNT(DISTINCT) clauses and generate unique identifiers for each
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/sphinx.h | Changes data structure from single string to vector for storing distinct expressions |
| src/sphinx.cpp | Implements new AddDistinct method and updates existing logic to use vector-based distinct storage |
| src/sphinxselect.y | Updates parser to call new AddDistinct method instead of directly adding "@distinct" item |
| src/searchdsql.cpp | Refactors SQL parser to handle multiple distinct expressions with indexed naming |
| src/sphinxjsonquery.cpp | Updates JSON query handling to use new "@distinct_" prefix pattern |
| src/sphinxfilter.cpp | Updates filter logic to recognize new "@distinct_" naming pattern |
| src/sortergroup.cpp | Changes hardcoded "@distinct" reference to "@distinct_0" for backward compatibility |
| src/queuecreator.cpp | Major refactoring to handle multiple distinct expressions in queue creation logic |
| src/joinsorter.cpp | Updates join sorter to iterate over multiple distinct expressions |
| src/frontendschema.cpp | Updates schema sorting to recognize new "@distinct_" pattern |
| src/daemon/search_handler.cpp | Updates search handler to use new distinct vector and naming pattern |
| src/daemon/api_search.cpp | Maintains API compatibility by using first distinct expression for legacy clients |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Linux debug test results0 tests 0 ✅ 0s ⏱️ Results for commit f9b386b. ♻️ This comment has been updated with latest results. |
Linux release test results1 134 tests 846 ✅ 20m 51s ⏱️ For more details on these failures, see this check. Results for commit 1197bee. ♻️ This comment has been updated with latest results. |
Windows test results 3 files 3 suites 43m 23s ⏱️ For more details on these failures, see this check. Results for commit 1197bee. ♻️ This comment has been updated with latest results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see the issue with master agent communication that could fail with the wrong error message
| tOut.SendInt ( q.m_iCutoff ); | ||
| tOut.SendInt ( q.m_iRetryCount<0 ? 0 : q.m_iRetryCount ); // runaround for old clients. | ||
| tOut.SendInt ( q.m_iRetryDelay<0 ? 0 : q.m_iRetryDelay ); | ||
| tOut.SendString ( q.m_sGroupDistinct.cstr() ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can not just replace the logic as new master could send request into the old agent and agent will fail with the wrong error message.
It could be better to increase VER_COMMAND_SEARCH and handle the old behavior for single distinct and allow agent to faile the incoming request from the newer master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it also worth for all tests with the new result set to ask LLM to calculate these results set from the raw table data and compare that output with the result sets tests got to make sure the new result sets are valid. Otherwise somebody need to check these results sets by hand.
- Increase protocol version to 0x11A - Support multiple group distinct fields in queries
- Align VER_COMMAND_SEARCH values across Ruby, PHP, MySQL, and C++ code to 0x119/0x121 - Update protocol version comments to reflect new version numbers - Fix minor whitespace inconsistencies in ha_sphinx.cc
| tQuery.m_iRetryDelay = tReq.GetInt (); | ||
| tQuery.m_sGroupDistinct = tReq.GetString (); | ||
| sphColumnToLowercase ( const_cast<char *>( tQuery.m_sGroupDistinct.cstr() ) ); | ||
| // Version 0x121+: Read multiple distinct fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still wrong usage of the uVer
// v.1.27
if ( uVer>=0x121+ )
{
read multiple distincts
} else
{
read the single distinct string
ie the usual nodes upgrade strategy from the manual
If you are running Manticore Search in a distributed environment with multiple instances, make sure to first upgrade the agents, then the masters.
- first upgrade the agents. The master will send the old protocol request to the new agent should be able to parse it then reply to that master
- next upgrade the master. The master will send the new protocol request to the new agent and the new agent should be able to parse it. But if there is an old agent it will fail request with the error message the agent has the old protocol version
That is why now al ubertest fails as sphinxapi.php uses the old protocol version and send the old packet format the new daemon unable to handle. If the receive will support the new and old format version it should fix the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the is still lack of the ubertest for that new functionality to make sure it works well and the cases with multiple distincts and alias to the multiple distincts does not break.
It also strange that test 20 failed for the simple count cases with error messages about schema but all other cases works and another test now returns multiple distinct values. As I said
it also worth for all tests with the new result set to ask LLM to calculate these results set from the raw table data and compare that output with the result sets tests got to make sure the new result sets are valid. Otherwise somebody need to check these results sets by hand.
Type of Change (select one):
Description of the Change:
Related Issue (provide the link):