import into merge stage is extremely slow for large datasets due to suboptimal resource usage #60375

Open
shaoxiqian opened this issue Apr 2, 2025 · 0 comments · May be fixed by #60395

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  1. Data size is 8 TB, with about 48 billion rows
  2. DXF (Distributed eXecution Framework) is enabled
  3. import into xxx from s3:xxx

2. What did you expect to see? (Required)

The merge phase should be optimized to make full use of all available TiDB resources.

3. What did you see instead (Required)

Only 2 subtasks were allocated for the merge step. One finished quickly, but the other took nearly 2 hours.
[next-step=merge-sort] [subtask-count=2]
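
To illustrate why a subtask count of 2 hurts here, below is a minimal sketch of how the wall-clock time of a stage depends on the slowest subtask. It is not TiDB's scheduler; `stage_makespan`, the worker count of 10, and all durations other than the roughly 2-hour straggler are hypothetical values chosen for illustration.

```python
import heapq


def stage_makespan(subtask_minutes, workers):
    """Return the wall-clock minutes needed to run all subtasks.

    Greedy model: each subtask is handed to whichever executor frees up first.
    This is a simplification, not the DXF scheduling algorithm.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    free_at = [0.0] * workers  # min-heap: time at which each executor is free
    heapq.heapify(free_at)
    for minutes in subtask_minutes:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + minutes)
    return max(free_at)


# Hypothetical durations, not measurements from this cluster.
skewed = [5, 120]        # 2 subtasks, one of them a straggler (about 2 hours)
balanced = [12.5] * 10   # same total work (125 minutes) in 10 even subtasks

print(stage_makespan(skewed, workers=10))    # 120.0
print(stage_makespan(balanced, workers=10))  # 12.5
```

With only 2 subtasks, 8 of the 10 assumed executors stay idle and the stage takes as long as the straggler. Splitting the same work into more, evenly sized subtasks is what would let the merge step use all available nodes.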

4. What is your TiDB version? (Required)

v8.5.1

shaoxiqian added the type/bug and component/import labels and removed the type/bug label on Apr 2, 2025
tangenta self-assigned this on Apr 3, 2025
tangenta linked a pull request on Apr 3, 2025 that will close this issue