microsoft / DeepSpeed Public


Issues: microsoft/DeepSpeed


988 Open 1,922 Closed


Issues list

[REQUEST] Deepspeed Inference Supports VL (vision language) model

enhancement

#6917 opened Dec 26, 2024 by ethen8181

This problem occurs when precompiling on a Windows system. I tried to modify setup but could not solve it. How can I solve it?

#6915 opened Dec 26, 2024 by rununnnn

Initialization problem

#6914 opened Dec 25, 2024 by lckkkk02

[BUG] Cannot access local variable 'locations' where it is not associated with a value

bug

compression

#6913 opened Dec 25, 2024 by Guodanding

[BUG] Convergence Issue: Training BERT for Embedding with Zero2 and 3 as compared to Torchrun

bug

training

#6911 opened Dec 24, 2024 by dawnik17

[BUG] RuntimeError: The size of tensor a (2048) must match the size of tensor b (1024) at non-singleton dimension 2

bug

deepspeed-chat

#6910 opened Dec 24, 2024 by Lowlowlowlowlowlow

[REQUEST] is fp8 training supported?

enhancement

#6908 opened Dec 24, 2024 by janelu9

[BUG] RuntimeError: Unable to JIT load the fp_quantizer op due to it not being compatible due to hardware/software issue. FP Quantizer is using an untested triton version (3.1.0), only 2.3.(0, 1) and 3.0.0 are known to be compatible with these kernels

bug

compression

#6906 opened Dec 23, 2024 by GHBigD

[BUG] triton kernel, loss 0, grad-norm nan

bug

training

#6902 opened Dec 22, 2024 by mdy666

[REQUEST] Support for XLA/TPU

enhancement

#6901 opened Dec 21, 2024 by radna0

prterun noticed that process rank 7 with PID 0 on node gpu0304 exited on signal 6 (Aborted).

#6896 opened Dec 19, 2024 by fabiogeraci

MPI environment variables are not set

#6895 opened Dec 18, 2024 by fabiogeraci

DeepSpeed with ZeRO3 strategy cannot build 'fused_adam'

bug

training

#6892 opened Dec 18, 2024 by LeonardoZini

How to perform inference MoE model with expert parallel

#6891 opened Dec 18, 2024 by Guodanding

How can DeepSpeed be configured to prevent the merging of parameter groups

#6878 opened Dec 16, 2024 by CLL112

How do I know if stage-3 is a success by using deepspeed?

training

#6877 opened Dec 16, 2024 by hwhyyds

[BUG] Cannot use --hostfile to start multi-node training in Docker.

bug

training

#6875 opened Dec 16, 2024 by Ind1x1

Windows wheel build error - Tried everything with all requirements you have

build

windows

#6871 opened Dec 14, 2024 by FurkanGozukara

[BUG] Invalidate trace cache @ step 10: expected module 11, but got module 19

bug

training

#6870 opened Dec 14, 2024 by yafuly

[BUG] Mismatch of model parameters when using Sequence Parallel

bug

training

#6868 opened Dec 13, 2024 by chetwin-character

[BUG] When fine-tuning an LLM, the following error occurs after training for some time: self.optimizer.param_groups[param_group_id]['params'] = [] IndexError: list index out of range

bug

training

#6857 opened Dec 12, 2024 by tdtgi

[BUG] Unable to Use quantization_setting for Customizing MoQ in DeepSpeed Inference

bug

compression

#6853 opened Dec 11, 2024 by cyx96

DeepSpeed with trl

bug

training

#6852 opened Dec 11, 2024 by sagie-dekel

Opinion on Refactoring Ulysses

enhancement

#6843 opened Dec 9, 2024 by Eugene29

Question about Ulysses and loss aggregation

#6841 opened Dec 9, 2024 by pavelgein
