improve performance by removing use_tensor_core dependency #2496
Conversation
…d_size_compiled_for_decode_kernels
Previously, we found an accuracy regression when this was removed. Did you test the accuracy of all models?
Result of test_eval_accuracy_large.py:
What is your hardware?
A100 (40GB)
Can you test the command in PR #1511? If it passes, we can merge this.
Ran the test on L20*4 twice; here are the results:
Thanks for testing this! The previous bug only happens on certain hardware; we reproduced it once on 8xA100 40GB. That is why I am asking which hardware you are using. Is it possible for you to run it again on 8xA100 40GB with the exact command? In particular, @yzh119 also acknowledged it is a bug, so we need to be very careful here.
@merrymercy Ran the test on 8xA100 40GB three times; here are the results:
Hi @merrymercy, do you have time to review this PR? Thank you!
We probably won't merge this because
Can you try to run the exact command from #1511 again? It uses TP4 on A100 80GB.
Tested on 4x A100 (80GB); no accuracy degradation now.
Motivation
According to the discussion in this PR
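For context, the `use_tensor_core` flag this PR removes is typically driven by a heuristic over the GQA group size (query heads per KV head). The sketch below is a hypothetical illustration of such a heuristic; the function name, threshold, and signature are assumptions for illustration, not the actual sglang/flashinfer code.

```python
# Hypothetical sketch: decide whether decode attention should use
# tensor-core (prefill-style) kernels based on the GQA group size.
# The threshold of 4 is an assumed value, not taken from the source.

def should_use_tensor_core(num_qo_heads: int, num_kv_heads: int) -> bool:
    """Return True when the query-heads-per-KV-head ratio is large enough
    that batching queries onto tensor cores is expected to win."""
    group_size = num_qo_heads // num_kv_heads
    # Large groups pack more query rows per KV head into one matmul,
    # favoring tensor cores; small groups tend to favor CUDA-core kernels.
    return group_size >= 4

print(should_use_tensor_core(32, 8))   # GQA group size 4
print(should_use_tensor_core(32, 32))  # MHA, group size 1
```

Removing the dependency means the backend no longer branches on this flag, so the same kernel path is taken regardless of group size, which is why the accuracy checks across hardware above matter.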
Modifications
E2E Test
Checklist