Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimize ch dicts #7051

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

optimize ch dicts #7051

wants to merge 10 commits into from

Conversation

pjain1
Copy link
Member

@pjain1 pjain1 commented Apr 2, 2025

Fixes #6909

Checklist:

  • Covered by tests
  • Ran it and it works as intended
  • Reviewed the diff before requesting a review
  • Checked for unhandled edge cases
  • Linked the issues it closes
  • Checked if the docs need to be updated
  • Intend to cherry-pick into the release branch
  • I'm proud of this work!

@pjain1 pjain1 marked this pull request as draft April 2, 2025 20:46
@pjain1 pjain1 requested review from Copilot and begelundmuller and removed request for Copilot April 4, 2025 05:54
@pjain1 pjain1 marked this pull request as ready for review April 4, 2025 18:23
@@ -39,4 +39,5 @@ message Subquery {
repeated string measures = 2;
Expression where = 3;
Expression having = 4;
string raw_sql = 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't safe – it would enable even users with just viewer access to inject arbitrary SQL

Comment on lines +179 to +184
Lookup lookup = 8; // Lookup fields for the dimension
}
message Lookup {
string table = 1;
string key_column = 2;
string value_column = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's okay to nest the properties in YAML, but in the protos it's easiest to keep types flat (most places we only use nested types for arrays or maps). E.g.:

string lookup_table = 8;
string lookup_key_column = 9;
string lookup_value_column = 10;

Comment on lines +772 to +777
func (d Dialect) DictGetExpr(lookupTable, lookupColumn, lookupKey string) string {
switch d {
case DialectClickHouse:
return fmt.Sprintf("dictGet('%s', '%s', %s)", lookupTable, lookupColumn, lookupKey)
default:
return ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this easily be generalized for Druid and DuckDB also? E.g. call the function MetricsViewDimensionLookupExpression and use lookup in Druid and a subquery in DuckDB?

@@ -40,6 +40,11 @@ type MetricsViewYAML struct {
Ignore bool `yaml:"ignore"` // Deprecated
Unnest bool
URI string
Lookup *struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't make this a pointer, the code gets simpler:

Suggested change
Lookup *struct {
Lookup struct {

@@ -186,6 +187,7 @@ type Condition struct {
}

type Subquery struct {
RawSQL string `mapstructure:"raw_sql"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as earlier – dangerous to expose raw_sql in the Query type where anyone with access can set it

Value any `mapstructure:"val"`
Condition *Condition `mapstructure:"cond"`
Subquery *Subquery `mapstructure:"subquery"`
Identifier string `mapstructure:"identifier"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar comment – I think it's unsafe even to expose Identifier in the metrics query API because it can leak underlying fields that are not exposed through dimension/measure expressions.


// Rewrites filter on dictionary columns to use the key column instead of the value column. For example, if the dictionary value column is "country" and key column is "country_key",
// then the filter will be rewritten to country_key = 'US' instead of country = 'USA'.
func (e *Executor) rewriteClickhouseDictFilters(qry *Query) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since lookups are introduced as a general concept in the metrics view spec, I wonder if it should be handled directly inside astexpr.go when converting an expression to SQL?

I think that would also avoid the need for exposing internal fields in the Query type (which is supposed to be user-facing, as opposed to AST which is intended to be internal).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize Query time lookups for ClickHouse
2 participants