Currently, we fail a Job as soon as there is a problem with queueing at submission time.
This was triggered in Stage when 8K small objects were submitted in parallel.
This ticket is to introduce retry logic in the HandlerSubmit handler.
The relevant code segment is shown below.
    } catch (Exception e) {
        e.printStackTrace();
        String msg = "Failed to create Job queue submission: " + jproperties.toString();
        System.err.println("[error] " + msg);
        // Batch failure: mark the whole batch as Failed on the first queueing error (no retry)
        if (job != null) {
            try {
                Batch batch = new Batch(job.bid());
                batch.setStatus(zooKeeper, org.cdlib.mrt.zk.BatchState.Failed);
            } catch (Exception e2) {
                // NOTE: e2 is silently ignored here
            }
        }
        return new HandlerResult(false, "FAIL: " + NAME + " Submission failed: " + msg, 0);
    }
}
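
One possible shape for the retry is a small helper that re-runs the queue submission a bounded number of times, with a growing pause between attempts, and only lets the exception reach the catch block above once every attempt has failed. This is a minimal sketch, not the existing implementation: the class `SubmitRetry`, the method `withRetry`, and the `maxAttempts` and `initialDelayMs` parameters are all hypothetical names, and the doubling backoff is an assumed policy rather than something decided in this ticket.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper; not part of the existing code base. */
final class SubmitRetry {
    private SubmitRetry() {}

    /**
     * Runs the action up to maxAttempts times, sleeping between failed
     * attempts and doubling the delay each time (assumed backoff policy).
     * Rethrows the last failure if every attempt fails.
     */
    static <T> T withRetry(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (InterruptedException ie) {
                // Do not retry on interruption; restore the flag and propagate.
                Thread.currentThread().interrupt();
                throw ie;
            } catch (Exception e) {
                last = e;
                System.err.println("[warn] submission attempt " + attempt
                        + " of " + maxAttempts + " failed: " + e);
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                    delayMs *= 2;
                }
            }
        }
        throw last;
    }
}
```

In HandlerSubmit, the call that creates the Job queue entry could be wrapped in `SubmitRetry.withRetry(...)` inside the existing try block, so the Batch is set to Failed only after the final attempt. Whether every exception type from the queue is safe to retry would still need to be checked.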