Wire error categories into executor event reporting #4745
dejanzele wants to merge 3 commits into armadaproject:master
Greptile Summary
This PR wires a new Key points:
Confidence Score: 4/5
Sequence Diagram
sequenceDiagram
participant App as application.go
participant Cfg as Config<br/>(ErrorCategories)
participant C as categorizer.Classifier
participant PIH as PodIssueHandler
participant JSR as JobStateReporter
participant Ev as reporter/event.go
participant PS as util/pod_status.go
participant Pulsar as Pulsar (EventSequence)
App->>Cfg: read ErrorCategories
App->>C: NewClassifier(ErrorCategories)
App->>PIH: NewPodIssuerHandler(..., classifier)
App->>JSR: NewJobStateReporter(..., classifier)
Note over PIH: handleNonRetryableJobIssue
PIH->>C: classifier.Classify(pod)
C-->>PIH: []string{categories}
PIH->>PS: ExtractFailureInfo(pod, retryable, msg, categories)
PS-->>PIH: *FailureInfo
PIH->>Ev: CreateSimpleJobFailedEvent(..., failureInfo)
Ev-->>Pulsar: Error{PodError, FailureInfo}
Note over JSR: reportCurrentStatus (PodFailed)
JSR->>Ev: CreateEventForCurrentState(pod, clusterId, classifier)
Ev->>C: classifier.Classify(pod)
C-->>Ev: []string{categories}
Ev->>PS: ExtractFailureInfo(pod, false, "", categories)
PS-->>Ev: *FailureInfo
Ev-->>Pulsar: Error{PodError, FailureInfo}
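The flow in the diagram can be sketched in Go roughly as follows. This is a minimal illustration, not the code from this PR: `Pod`, `Rule`, `ErrorEvent`, and the method names `HandleNonRetryableIssue` and `ReportFailed` are hypothetical stand-ins, and matching on exit code is an assumed rule shape chosen only to keep the example small. The point it shows is that one classifier is built at startup and shared by both reporting paths.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in for a Kubernetes pod, reduced to what the sketch needs.
type Pod struct {
	Name     string
	ExitCode int32
}

// Rule is an assumed rule shape: a category that matches one exit code.
type Rule struct {
	Category string
	ExitCode int32
}

// Classifier maps a failed pod to zero or more category names.
type Classifier struct{ rules []Rule }

func NewClassifier(rules []Rule) *Classifier { return &Classifier{rules: rules} }

func (c *Classifier) Classify(pod Pod) []string {
	var categories []string
	for _, r := range c.rules {
		if r.ExitCode == pod.ExitCode {
			categories = append(categories, r.Category)
		}
	}
	return categories
}

// FailureInfo is a simplified version of the structured failure data.
type FailureInfo struct {
	ExitCode   int32
	Retryable  bool
	Message    string
	Categories []string
}

func ExtractFailureInfo(pod Pod, retryable bool, msg string, categories []string) *FailureInfo {
	return &FailureInfo{ExitCode: pod.ExitCode, Retryable: retryable, Message: msg, Categories: categories}
}

// ErrorEvent stands in for the Error event sent through Pulsar.
type ErrorEvent struct {
	PodName     string
	FailureInfo *FailureInfo
}

// PodIssueHandler and JobStateReporter both receive the same classifier.
type PodIssueHandler struct{ classifier *Classifier }

func (h *PodIssueHandler) HandleNonRetryableIssue(pod Pod, msg string) ErrorEvent {
	info := ExtractFailureInfo(pod, false, msg, h.classifier.Classify(pod))
	return ErrorEvent{PodName: pod.Name, FailureInfo: info}
}

type JobStateReporter struct{ classifier *Classifier }

func (r *JobStateReporter) ReportFailed(pod Pod) ErrorEvent {
	info := ExtractFailureInfo(pod, false, "", r.classifier.Classify(pod))
	return ErrorEvent{PodName: pod.Name, FailureInfo: info}
}

func main() {
	// Startup wiring: build the classifier once, then inject it into both components.
	classifier := NewClassifier([]Rule{{Category: "oom", ExitCode: 137}})
	handler := &PodIssueHandler{classifier: classifier}
	reporter := &JobStateReporter{classifier: classifier}

	pod := Pod{Name: "job-1", ExitCode: 137}
	fmt.Println(handler.HandleNonRetryableIssue(pod, "pod stuck").FailureInfo.Categories)
	fmt.Println(reporter.ReportFailed(pod).FailureInfo.Categories)
}
```

Passing the classifier in as a dependency, rather than constructing it inside each component, keeps the two paths consistent and makes either one easy to test with a fixed rule set.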
@greptile
Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
What type of PR is this?
This is the second PR in the error categorization series. It wires the classifier and FailureInfo into the executor's event reporting path.
What this PR does / why we need it
- Constructs a Classifier from config at executor startup and passes it to the pod issue handler and job state reporter.
- Calls ExtractFailureInfo() + classifier.Classify() on every pod failure, attaching structured FailureInfo to the Error events sent through Pulsar.
Which issue(s) this PR fixes
Part of #4713 (Error Categorization) and #4683 (Native support for retry policies)
Special notes for your reviewer
- If no errorCategories are configured, FailureInfo is still populated (condition + exit code) but categories will be empty
- application.go (startup wiring), pod_issue_handler.go, and job_state_reporter.go
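
The empty-configuration behaviour described in the first note might look something like the sketch below. It is an assumption-laden illustration: the `classify` and `buildFailureInfo` helpers, the map-based rule format, and the `"OOMKilled"` condition string are all hypothetical and not taken from this PR. It relies only on the fact that reading from a nil map is safe in Go, so an unconfigured rule set yields no categories while the rest of the struct is still filled in.

```go
package main

import "fmt"

// FailureInfo is a simplified stand-in carrying the fields named in the note.
type FailureInfo struct {
	Condition  string
	ExitCode   int32
	Categories []string
}

// classify is a hypothetical helper: rules map an exit code to a category name.
// A nil map (no errorCategories configured) simply never matches.
func classify(rules map[int32]string, exitCode int32) []string {
	categories := []string{}
	if name, ok := rules[exitCode]; ok {
		categories = append(categories, name)
	}
	return categories
}

// buildFailureInfo always records condition and exit code; categories depend on config.
func buildFailureInfo(rules map[int32]string, condition string, exitCode int32) *FailureInfo {
	return &FailureInfo{
		Condition:  condition,
		ExitCode:   exitCode,
		Categories: classify(rules, exitCode),
	}
}

func main() {
	// No rules configured: condition and exit code survive, categories stay empty.
	info := buildFailureInfo(nil, "OOMKilled", 137)
	fmt.Printf("%s %d %d\n", info.Condition, info.ExitCode, len(info.Categories))
}
```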