Skip to content

Conversation

@donhardman
Copy link
Contributor

Type of Change (select one):

  • Bug fix
  • New feature
  • Documentation update

Description of the Change:

Related Issue (provide the link):

- Replace single distinct string with list of distinct expressions
- Update parsing, grouping, and sorting to handle multiple distincts
- Maintain backward compatibility using first distinct alias
- Enable multiple COUNT(DISTINCT) clauses in one query
- Propagate distinct expressions across multi-statement queries
- Use .cstr() to convert CSphString to const char* for constructor
- Fixes compilation error introduced in previous commit
@donhardman donhardman requested a review from Copilot August 18, 2025 05:16
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the COUNT(DISTINCT) functionality to support multiple distinct expressions instead of being limited to a single distinct expression. The change migrates from storing a single distinct attribute in m_sGroupDistinct to storing multiple distinct attributes in a vector m_dGroupDistinct.

  • Replaces the single CSphString m_sGroupDistinct field with CSphVector<CSphString> m_dGroupDistinct to store multiple distinct expressions
  • Updates all references from @distinct to use indexed naming scheme @distinct_N where N is the index in the vector
  • Modifies parser logic to handle multiple COUNT(DISTINCT) clauses and generate unique identifiers for each

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/sphinx.h Changes data structure from single string to vector for storing distinct expressions
src/sphinx.cpp Implements new AddDistinct method and updates existing logic to use vector-based distinct storage
src/sphinxselect.y Updates parser to call new AddDistinct method instead of directly adding "@distinct" item
src/searchdsql.cpp Refactors SQL parser to handle multiple distinct expressions with indexed naming
src/sphinxjsonquery.cpp Updates JSON query handling to use new "@distinct_" prefix pattern
src/sphinxfilter.cpp Updates filter logic to recognize new "@distinct_" naming pattern
src/sortergroup.cpp Changes hardcoded "@distinct" reference to "@distinct_0" for backward compatibility
src/queuecreator.cpp Major refactoring to handle multiple distinct expressions in queue creation logic
src/joinsorter.cpp Updates join sorter to iterate over multiple distinct expressions
src/frontendschema.cpp Updates schema sorting to recognize new "@distinct_" pattern
src/daemon/search_handler.cpp Updates search handler to use new distinct vector and naming pattern
src/daemon/api_search.cpp Maintains API compatibility by using first distinct expression for legacy clients

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 18, 2025

Linux debug test results

0 tests   0 ✅  0s ⏱️
0 suites  0 💤
0 files    0 ❌

Results for commit f9b386b.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 18, 2025

Linux release test results

1 134 tests   846 ✅  20m 51s ⏱️
    1 suites   53 💤
    1 files    235 ❌

For more details on these failures, see this check.

Results for commit 1197bee.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 18, 2025

Windows test results

    3 files    3 suites   43m 23s ⏱️
1 111 tests 828 ✅ 53 💤 230 ❌
1 113 runs  830 ✅ 53 💤 230 ❌

For more details on these failures, see this check.

Results for commit 1197bee.

♻️ This comment has been updated with latest results.

Copy link
Contributor

@tomatolog tomatolog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see the issue with master agent communication that could fail with the wrong error message

tOut.SendInt ( q.m_iCutoff );
tOut.SendInt ( q.m_iRetryCount<0 ? 0 : q.m_iRetryCount ); // runaround for old clients.
tOut.SendInt ( q.m_iRetryDelay<0 ? 0 : q.m_iRetryDelay );
tOut.SendString ( q.m_sGroupDistinct.cstr() );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can not just replace the logic as new master could send request into the old agent and agent will fail with the wrong error message.
It could be better to increase VER_COMMAND_SEARCH and handle the old behavior for single distinct and allow agent to faile the incoming request from the newer master.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it also worth for all tests with the new result set to ask LLM to calculate these results set from the raw table data and compare that output with the result sets tests got to make sure the new result sets are valid. Otherwise somebody need to check these results sets by hand.

donhardman and others added 4 commits August 19, 2025 11:33
- Increase protocol version to 0x11A
- Support multiple group distinct fields in queries
- Align VER_COMMAND_SEARCH values across Ruby, PHP, MySQL, and C++ code to 0x119/0x121
- Update protocol version comments to reflect new version numbers
- Fix minor whitespace inconsistencies in ha_sphinx.cc
tQuery.m_iRetryDelay = tReq.GetInt ();
tQuery.m_sGroupDistinct = tReq.GetString ();
sphColumnToLowercase ( const_cast<char *>( tQuery.m_sGroupDistinct.cstr() ) );
// Version 0x121+: Read multiple distinct fields
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still wrong usage of the uVer

	// v.1.27
	if ( uVer>=0x121+ )
	{
read multiple distincts
} else
{
read the single distinct string

ie the usual nodes upgrade strategy from the manual

If you are running Manticore Search in a distributed environment with multiple instances, make sure to first upgrade the agents, then the masters.

  • first upgrade the agents. The master will send the old protocol request to the new agent should be able to parse it then reply to that master
  • next upgrade the master. The master will send the new protocol request to the new agent and the new agent should be able to parse it. But if there is an old agent it will fail request with the error message the agent has the old protocol version

That is why now al ubertest fails as sphinxapi.php uses the old protocol version and send the old packet format the new daemon unable to handle. If the receive will support the new and old format version it should fix the issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the is still lack of the ubertest for that new functionality to make sure it works well and the cases with multiple distincts and alias to the multiple distincts does not break.

It also strange that test 20 failed for the simple count cases with error messages about schema but all other cases works and another test now returns multiple distinct values. As I said

it also worth for all tests with the new result set to ask LLM to calculate these results set from the raw table data and compare that output with the result sets tests got to make sure the new result sets are valid. Otherwise somebody need to check these results sets by hand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants