Add probe usage practice for super large models, including multi-node case #782
Labels
area/performance
kind/documentation
Improvements or additions to documentation
kind/enhancement
New feature or request
priority/critical-urgent
Highest priority. Must be actively worked on as someone's top priority right now.
🚀 Feature Description and Motivation
When we deploy the DeepSeek 671B model across multiple nodes, startup takes a very long time. This causes a few problems.
startupProbe and livenessProbe, readinessProbe to control the interval separately. We need to build some practice on this: how to make the two mechanisms work together, or just use the application one instead.
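As a starting point for that discussion, here is a minimal sketch of sizing the three probes separately. It relies on the Kubernetes behavior that liveness and readiness checks are held off until the startup probe succeeds, and that the startup budget is roughly `failureThreshold * periodSeconds`. The `/health` path, port `8000`, the one-hour startup budget, and the helper names are all assumptions for illustration, not values taken from this project.

```python
import math


def startup_probe(max_startup_seconds, period_seconds=30,
                  path="/health", port=8000):
    """Build a startupProbe whose total budget covers max_startup_seconds.

    path and port are hypothetical; use whatever the serving container exposes.
    """
    # Kubernetes tolerates about failureThreshold * periodSeconds of failed
    # checks before restarting the container, so round the threshold up.
    failure_threshold = math.ceil(max_startup_seconds / period_seconds)
    return {
        "httpGet": {"path": path, "port": port},
        "periodSeconds": period_seconds,
        "failureThreshold": failure_threshold,
    }


def steady_state_probes(path="/health", port=8000):
    """Liveness and readiness probes with short intervals.

    They only start running after the startup probe has succeeded,
    so they do not need to account for model loading time.
    """
    check = {"httpGet": {"path": path, "port": port}}
    return {
        "livenessProbe": {**check, "periodSeconds": 10, "failureThreshold": 3},
        "readinessProbe": {**check, "periodSeconds": 5, "failureThreshold": 1},
    }


# Assumed worst case: one hour to load weights across nodes.
probes = {"startupProbe": startup_probe(3600), **steady_state_probes()}
```

The resulting dictionary maps directly onto the `startupProbe`, `livenessProbe`, and `readinessProbe` fields of a container spec. The real startup budget should come from measured multi-node load times rather than the placeholder used here.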
Use Case
fault tolerance and high availability
Proposed Solution
No response