
Is anyone still working on dynamic embedding, and when will it be in an official release? #2735

Open
freshduer opened this issue Feb 8, 2025 · 8 comments

Comments

@freshduer

Is anyone still working on dynamic embedding, and when will it be in an official release?

@dstaay-fb
Contributor

Dynamic embeddings (i.e., collision-less embeddings) are supported in a few different ways as Beta / Prototype features, and this is an area of active investment.

Curious, what specific use cases did you have in mind? Most OSS use cases we see can be handled today using Managed Collision Collection modules, mapping a large raw id range down to a physical memory space. [i.e., what is your trainer hardware / memory budget vs. the number of unique sparse ids?]
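For intuition, here is a minimal conceptual sketch of that mapping: a bounded physical embedding table fronted by a raw-id -> slot remapping. It is illustrative only and is not the torchrec Managed Collision Collection API; `RemapTable` and its naive slot-assignment policy are made up for this example.

```python
# Conceptual sketch only: NOT the torchrec Managed Collision Collection API.
# `RemapTable` is a made-up name; a real managed-collision module learns and
# maintains the remapping (evicting low-frequency ids when the table is full).
import torch
import torch.nn as nn

class RemapTable(nn.Module):
    def __init__(self, num_physical_slots: int, embedding_dim: int):
        super().__init__()
        self.num_physical_slots = num_physical_slots
        self.raw_to_slot = {}  # raw sparse id -> physical slot index
        self.emb = nn.EmbeddingBag(num_physical_slots, embedding_dim, mode="sum")

    def remap(self, raw_ids: torch.Tensor) -> torch.Tensor:
        slots = []
        for rid in raw_ids.tolist():
            if rid not in self.raw_to_slot:
                # Naive policy: wrap around once the table fills up. A managed
                # collision module would instead evict low-frequency ids.
                self.raw_to_slot[rid] = len(self.raw_to_slot) % self.num_physical_slots
            slots.append(self.raw_to_slot[rid])
        return torch.tensor(slots, dtype=torch.long)

    def forward(self, raw_ids: torch.Tensor) -> torch.Tensor:
        # Look up pooled embeddings in the small physical table.
        return self.emb(self.remap(raw_ids).unsqueeze(1))

# Billions of possible raw ids, but only 1M physical rows in memory.
table = RemapTable(num_physical_slots=1_000_000, embedding_dim=64)
pooled = table(torch.tensor([12_345_678_901, 7, 42]))  # shape: (3, 64)
```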

@WhoisZihan

Curious about this too. I found very little introduction to MCH in torchrec, so I took a quick glance at the code. It looks like it trades away low-frequency features to save memory, treating them as a default. How does it handle the situation when the table grows full?

I also wonder: some frameworks already support models with TB-level embeddings; is MCH enough in such cases? How does it compare to alternatives like HKV?

@freshduer
Author

> Dynamic embeddings (i.e., collision-less embeddings) are supported in a few different ways as Beta / Prototype features, and this is an area of active investment.
>
> Curious, what specific use cases did you have in mind? Most OSS use cases we see can be handled today using Managed Collision Collection modules, mapping a large raw id range down to a physical memory space. [i.e., what is your trainer hardware / memory budget vs. the number of unique sparse ids?]

Thanks for replying. What should I do if user behavior changes during inference and the embeddings need to change?

@dstaay-fb
Contributor

> Curious about this too. I found very little introduction to MCH in torchrec, so I took a quick glance at the code. It looks like it trades away low-frequency features to save memory, treating them as a default. How does it handle the situation when the table grows full?
>
> I also wonder: some frameworks already support models with TB-level embeddings; is MCH enough in such cases? How does it compare to alternatives like HKV?

I agree, the MCH (aka ZCH) examples could be better; that's good feedback. The tutorials tend not to show our more advanced features (in part because they are hard to demonstrate on public datasets). Let me chat with the team (it's probably a combination of more industry papers + examples to demonstrate this).

@dstaay-fb
Contributor

> Thanks for replying. What should I do if user behavior changes during inference and the embeddings need to change?

In practice, you learn a mapping of raw sparse id -> physical memory address during training. As you hit on, during online training you may encounter a new id, which will trigger an insert/re-sort of the remapping vector. To propagate this change to the inference tier you would need tensor replacement; but generally most parameters need to be synced anyway, so this isn't really any different.

I.e., the typical pattern: run distributed training on online data -> call online_training_model.state_dict() -> quantize relevant params/buffers (optional) -> load into the predictor/inference tier.
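For concreteness, here is a hedged sketch of that pattern in plain PyTorch (not a torchrec API); `sync_to_inference`, the model objects, and the checkpoint path are placeholders, and the optional quantization step is only noted in a comment since the exact conversion depends on the model.

```python
# Hedged sketch of the training -> inference sync pattern described above.
# The function name, model objects, and checkpoint path are hypothetical.
import torch

def sync_to_inference(online_training_model: torch.nn.Module,
                      inference_model: torch.nn.Module,
                      ckpt_path: str = "/tmp/online_ckpt.pt") -> None:
    # 1. Snapshot the online-training model; the state dict carries the learned
    #    raw-id -> physical-slot remapping buffers alongside the weights.
    state = online_training_model.state_dict()

    # 2. (Optional) quantize the relevant params/buffers before shipping them
    #    to the inference tier; the exact conversion depends on your model, so
    #    it is omitted here.

    # 3. Persist and load into the predictor / inference tier. strict=False
    #    allows the inference module hierarchy to differ slightly.
    torch.save(state, ckpt_path)
    inference_model.load_state_dict(torch.load(ckpt_path), strict=False)
```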

@freshduer
Author

> > Thanks for replying. What should I do if user behavior changes during inference and the embeddings need to change?
>
> In practice, you learn a mapping of raw sparse id -> physical memory address during training. As you hit on, during online training you may encounter a new id, which will trigger an insert/re-sort of the remapping vector. To propagate this change to the inference tier you would need tensor replacement; but generally most parameters need to be synced anyway, so this isn't really any different.
>
> I.e., the typical pattern: run distributed training on online data -> call online_training_model.state_dict() -> quantize relevant params/buffers (optional) -> load into the predictor/inference tier.

I see, thanks for the reply. Does torchrec have a tutorial document on updating the model during online training?

@freshduer
Author

> > Thanks for replying. What should I do if user behavior changes during inference and the embeddings need to change?
>
> In practice, you learn a mapping of raw sparse id -> physical memory address during training. As you hit on, during online training you may encounter a new id, which will trigger an insert/re-sort of the remapping vector. To propagate this change to the inference tier you would need tensor replacement; but generally most parameters need to be synced anyway, so this isn't really any different.
>
> I.e., the typical pattern: run distributed training on online data -> call online_training_model.state_dict() -> quantize relevant params/buffers (optional) -> load into the predictor/inference tier.

Sorry to bother you, but I have another question: during online training, can torchrec only update the model in full, or can it replace just some embedding vectors? Thank you.

@WhoisZihan

> I agree, the MCH (aka ZCH) examples could be better; that's good feedback. The tutorials tend not to show our more advanced features (in part because they are hard to demonstrate on public datasets). Let me chat with the team (it's probably a combination of more industry papers + examples to demonstrate this).

Great, so MCH is broadly used and satisfies your industrial requirements? It would be great to just use existing components in torchrec rather than incorporate a new third-party library.
