Description
Is your feature request related to a problem or challenge?
As described in #11679, we can do better for high cardinality aggregates.
One thing that consumes significant time in such queries is hashing, and I think we can reduce it significantly.
Specifically, for the multi-phase repartitioned plan, the number of hashed rows is roughly
(input cardinality) + 2 * (intermediate group cardinality) * (number of partitions)
For low cardinality aggregates (e.g. when the intermediate group cardinality is 1,000) the second term is small: a few thousand extra hashes isn't a big deal.
However, for high cardinality aggregates (e.g. when the intermediate group cardinality is around 1,000,000 and there are 16 partitions) the second term is substantial.
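To make the formula concrete, here is a small sketch that evaluates it for the two scenarios above (the function name and the input row count are illustrative, not DataFusion code):

```rust
/// Rough count of hash computations in the two-phase plan:
/// the partial GroupBy hashes every input row once, and the
/// Repartition + final GroupBy each hash every intermediate
/// group in every partition.
fn total_hashes(input_rows: u64, intermediate_groups: u64, partitions: u64) -> u64 {
    input_rows + 2 * intermediate_groups * partitions
}

fn main() {
    // Low cardinality: the second term adds only 32,000 hashes.
    let low = total_hashes(10_000_000, 1_000, 16);
    // High cardinality: the second term adds 32,000,000 hashes
    // on top of the 10,000,000 input-row hashes.
    let high = total_hashes(10_000_000, 1_000_000, 16);
    println!("low={low} high={high}");
}
```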
In pictures, this looks like
▲ ▲
│ │
│ │
│ │
│ │
│ │
┌───────────────────────┐ ┌───────────────────────┐ 4. The AggregateMode::Final
│GroupBy │ │GroupBy │ GroupBy computes hash(group keys)
│(AggregateMode::Final) │ │(AggregateMode::Final) │ *AGAIN* to find the correct hash
│ │ │ │ bucket
└───────────────────────┘ └───────────────────────┘
▲ ▲
│ │
└─────────────┬────────────┘
│
│
│
┌─────────────────────────┐ 3. The output of the first phase
│ Repartition │ is repartitioned by computing
│ HASH(x) │ hash(group keys) -- this is the
└─────────────────────────┘ same hash as computed in step 2.
▲
│
┌───────────────┴─────────────┐
│ │
│ │
┌─────────────────────────┐ ┌──────────────────────────┐ 2. Each AggregateMode::Partial
│ GroupBy                 │ │ GroupBy                  │ GroupBy hashes the group keys to
│(AggregateMode::Partial) │ │ (AggregateMode::Partial) │ find the correct hash bucket.
└─────────────────────────┘ └──────────────────────────┘
▲ ▲
│ ┌┘
│ │
.─────────. .─────────.
,─' '─. ,─' '─.
; Input : ; Input : 1. Input is read
: Partition 0 ; : Partition 1 ;
╲ ╱ ╲ ╱
'─. ,─' '─. ,─'
`───────' `───────'
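To spell out the redundancy in steps 2–4, here is a minimal sketch using std's `DefaultHasher` as a stand-in for DataFusion's hashing (`hash_key` is a hypothetical helper, not a DataFusion API):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Stand-in for hashing a row's group keys.
fn hash_key(key: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let group = "UserID=42,SearchPhrase=abc";
    let h_partial = hash_key(group);              // step 2: bucket in the partial GroupBy
    let target = (hash_key(group) % 16) as usize; // step 3: pick 1 of 16 output partitions
    let h_final = hash_key(group);                // step 4: bucket in the final GroupBy
    // All three calls recompute the identical value from the same group keys.
    assert_eq!(h_partial, h_final);
    println!("hash computed 3 times; routed to partition {target}");
}
```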
This effect can be seen in profiling for ClickBench Q17:
SELECT "UserID", "SearchPhrase", COUNT(*) FROM "hits.parquet" GROUP BY "UserID", "SearchPhrase" ORDER BY COUNT(*) DESC LIMIT 10
$ datafusion-cli -c 'SELECT "UserID", "SearchPhrase", COUNT(*) FROM "hits.parquet" GROUP BY "UserID", "SearchPhrase" ORDER BY COUNT(*) DESC LIMIT 10;'
Here is the profiling from Instruments:
Describe the solution you'd like
The basic idea is to avoid recomputing the hash values in RepartitionExec and AggregateMode::Final by reusing the values from AggregateMode::Partial (which has already computed a hash value for each input group).
Something like this:
▲ ▲
│ │
│ │
│ │
│ │
│ │
┌───────────────────────┐ ┌───────────────────────┐ 4. The AggregateMode::Final
│GroupBy │ │GroupBy │ GroupBy also gets the hash values
│(AggregateMode::Final) │ │(AggregateMode::Final) │ and does not recompute them
│ │ │ │
└───────────────────────┘ └───────────────────────┘
▲ ▲ ▲
│ │ │
└─────────────┬────────────┘
Pass hash │ │
values up the │
plan rather │ │
than ┌─────────────────────────┐ 3. In addition to the partial
recomputing │ │ Repartition │ aggregates and group values, *ALSO*
them │ PRECOMPUTED_HASH │ pass the hash values to the
│ └─────────────────────────┘ RepartitionExec which also passes
▲ them on to the AggregateMode::Final
│ │
┌───────────────┴─────────────┐
│ │ │
│ │
┌─────────────────────────┐ ┌──────────────────────────┐ 2. Each AggregateMode::Partial
│ GroupBy                 │ │ GroupBy                  │ GroupBy hashes the group keys to
│(AggregateMode::Partial) │ │ (AggregateMode::Partial) │ find the correct hash bucket.
└─────────────────────────┘ └──────────────────────────┘
▲ ▲
│ ┌┘
│ │
.─────────. .─────────.
,─' '─. ,─' '─.
; Input : ; Input : 1. Input is read
: Partition 0 ; : Partition 1 ;
╲ ╱ ╲ ╱
'─. ,─' '─. ,─'
`───────' `───────'
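A sketch of the proposed flow, with hypothetical types and functions (not DataFusion APIs): the partial aggregate emits the hash it already computed alongside each group, so the repartitioner and the final aggregate reuse it instead of hashing the group keys again. A real implementation would still have to compare group keys on hash collisions; this sketch skips that for brevity.

```rust
use std::collections::HashMap;

// Hypothetical row emitted by AggregateMode::Partial.
struct PartialOutput {
    group_key: String,
    partial_count: u64,
    hash: u64, // precomputed once by the partial GroupBy
}

// Step 3: route each row using the precomputed hash -- no rehashing.
fn repartition(rows: Vec<PartialOutput>, n: u64) -> Vec<Vec<PartialOutput>> {
    let mut parts: Vec<Vec<PartialOutput>> = (0..n).map(|_| Vec::new()).collect();
    for row in rows {
        let p = (row.hash % n) as usize;
        parts[p].push(row);
    }
    parts
}

// Step 4: the final merge also indexes by the precomputed hash.
fn final_merge(partition: Vec<PartialOutput>) -> Vec<(String, u64)> {
    let mut acc: HashMap<u64, (String, u64)> = HashMap::new();
    for row in partition {
        let entry = acc.entry(row.hash).or_insert((row.group_key, 0));
        entry.1 += row.partial_count;
    }
    let mut out: Vec<_> = acc.into_values().collect();
    out.sort(); // deterministic output for the example
    out
}

fn main() {
    let rows = vec![
        PartialOutput { group_key: "a".into(), partial_count: 2, hash: 17 },
        PartialOutput { group_key: "a".into(), partial_count: 3, hash: 17 },
        PartialOutput { group_key: "b".into(), partial_count: 1, hash: 4 },
    ];
    // 16 output partitions; rows with equal hashes land together.
    for part in repartition(rows, 16) {
        for (key, count) in final_merge(part) {
            println!("{key}={count}");
        }
    }
}
```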
Describe alternatives you've considered
We could perhaps pass the hash values as an explicit new column somehow, or maybe as a field in a struct array 🤔
Additional context
No response