Unnecessary RDD repartition if RDD is already indexed. [Improvement] #76

Open
@merlintang

Description

For distance joins such as RDJSpark, the left RDD is always repartitioned using STRPartition.
However, if the left RDD is already indexed and partitioned, this repartition is redundant and costly. How about adding a function inside STRPartition that checks whether the RDD already carries an index partitioner? That would avoid the unnecessary shuffle.
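
To make the proposal concrete, here is a minimal sketch of such a check, written against the plain Spark core API only. `CellPartitioner`, `PartitionCheck`, `alreadyPartitionedBy` and `repartitionIfNeeded` are hypothetical names used for illustration; none of them exist in this project, and the sketch makes no claim about how STRPartition is implemented. It assumes the index partitioner defines a meaningful `equals`, since Spark's `Partitioner` does not supply one.

```scala
import org.apache.spark.Partitioner
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical stand-in for an index partitioner. A case class gives
// structural equals/hashCode, which the check below relies on.
case class CellPartitioner(cells: Int) extends Partitioner {
  override def numPartitions: Int = cells

  // Map the key's hash code to a non-negative partition id.
  override def getPartition(key: Any): Int = {
    val m = key.hashCode % cells
    if (m < 0) m + cells else m
  }
}

// Hypothetical helper object, not part of the project.
object PartitionCheck {
  // True when the RDD already carries a partitioner equal to `target`.
  def alreadyPartitionedBy(rdd: RDD[_], target: Partitioner): Boolean =
    rdd.partitioner.exists(_ == target)

  // Reuse the RDD as-is when it is already partitioned by `target`;
  // otherwise fall back to a regular shuffle via partitionBy.
  def repartitionIfNeeded[K: ClassTag, V: ClassTag](
      rdd: RDD[(K, V)],
      target: Partitioner): RDD[(K, V)] =
    if (alreadyPartitionedBy(rdd, target)) rdd else rdd.partitionBy(target)
}
```

Note that Spark's own `partitionBy` already returns the input RDD unchanged when its partitioner equals the requested one, so an explicit check like this mostly matters on code paths that build the shuffle directly. Either way, the comparison is only useful if two partitioners built from the same bounds compare equal.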
