Many thanks for the great work!
-
Recommendation systems are a very important application of AI. I was wondering whether there is a set of recommended settings (e.g., data scale, embedding table sizes, feature-interaction architecture, etc.) for emulating production-level recsys inference/training, so we can understand what the exact workload looks like.
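To make the question concrete, here is the kind of workload specification I have in mind. All the numbers below are illustrative guesses loosely modeled on public DLRM/Criteo-style setups, not settings documented by RecIS:

```python
# Hypothetical workload spec; every value here is an assumption for
# illustration, NOT a recommendation from the RecIS project.
workload = {
    "batch_size": 4096,                      # typical CTR training batch
    "dense_features": 13,                    # continuous inputs (Criteo-like)
    "sparse_features": 26,                   # categorical inputs
    "embedding_dim": 64,
    "embedding_rows_per_table": 10_000_000,  # large tables dominate memory
    "interaction": "dot_product",            # DLRM-style pairwise interaction
    "mlp_bottom": [512, 256, 64],
    "mlp_top": [1024, 512, 256, 1],
}

# Back-of-envelope memory footprint of the embedding tables alone (fp32),
# which is usually the dominant cost in such a model.
emb_bytes = (workload["sparse_features"]
             * workload["embedding_rows_per_table"]
             * workload["embedding_dim"] * 4)
print(f"embedding tables: {emb_bytes / 2**30:.1f} GiB")
```

Even at these modest assumed sizes the tables alone run to tens of GiB, which is why I am asking what scales you consider representative of production.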
-
Besides, what sort of hardware is best suited for inference and for training: CPUs, GPUs, or both? I'm very curious what each component of a recsys looks like, and which parts are memory-, compute-, or communication-bound. What parallelization methods (TP, DP, mixed butterfly, etc.) are employed?
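For context, this toy sketch shows the hybrid scheme I have in mind (as used in DLRM-style systems): embedding tables sharded across workers (model parallel, memory-bound), the dense MLP replicated (data parallel, compute-bound), with an all-to-all exchange in between (communication-bound). It is purely an illustration of the question, not a claim about how RecIS works:

```python
# Toy illustration of DLRM-style hybrid parallelism; worker/table counts
# are arbitrary assumptions.
NUM_WORKERS = 4
TABLES = [f"table_{i}" for i in range(8)]

# Model parallelism for embeddings: shard the tables round-robin across
# workers, since they are too large to replicate on every device.
shards = {w: [t for i, t in enumerate(TABLES) if i % NUM_WORKERS == w]
          for w in range(NUM_WORKERS)}

# Data parallelism for the dense layers: every worker holds a full MLP
# replica and its own slice of the batch; an all-to-all collective moves
# looked-up embeddings from table owners to batch owners in between.
for w, owned in shards.items():
    print(f"worker {w}: owns {owned}, plus a full MLP replica")
```

I'd love to know whether RecIS follows a similar split or does something different.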
-
Additionally, if a recsys is combined with LLMs (or is itself based on an LLM architecture), how do the two interact, and does RecIS support this?
Kind regards