output from automatic split probe jobs is not discarded #8938
Comments
indeed from one probe job stdout
|
for the good task, the
full spec for one probe job node is
|
But I find the same
also when using crabtaskworker:v3.250215-stable which was the same TW used by the problematic task reported above. I am waiting for those jobs to run (at CERN) to see what's in job stdout ads... |
I need to make sure that I use a PSet which does produce output files :-( |
I have initially focused on But my jobs haven't run at CERN yet to see if local stageout was performed. pff... |
one sure thing is that output transfer for probe jobs is disabled in PostJob, i.e. no ASO. CRABServer/src/python/TaskWorker/Actions/PostJob.py Lines 2621 to 2630 in 2283a7b
|
from a quick look at cmscp.py, the relevant classAd should be things get interesting. Looking at how it is set in submit files for probes
belforte@vocms059/SPOOL_DIR> grep v3 TaskWorker/__init__.py
__version__ = "v3.250109.patch1" #Automatically added during build process
belforte@vocms059/SPOOL_DIR> grep CRAB_TransferOutputs *0-*submit
Job.0-1.submit:+CRAB_TransferOutputs = 0
Job.0-2.submit:+CRAB_TransferOutputs = 0
Job.0-3.submit:+CRAB_TransferOutputs = 0
Job.0-4.submit:+CRAB_TransferOutputs = 0
Job.0-5.submit:+CRAB_TransferOutputs = 0
belforte@vocms059/SPOOL_DIR>
belforte@vocms059/SPOOL_DIR> grep CRAB_TransferOutputs Job.submit
belforte@vocms059/SPOOL_DIR>
belforte@vocms0194/SPOOL_DIR> grep v3 TaskWorker/__init__.py
__version__ = "v3.250215" #Automatically added during build process
belforte@vocms0194/SPOOL_DIR> grep CRAB_TransferOutputs *0-*submit
Job.0-1.submit:My.CRAB_TransferOutputs = 0
Job.0-1.submit:My.CRAB_TransferOutputs = 1
Job.0-2.submit:My.CRAB_TransferOutputs = 0
Job.0-2.submit:My.CRAB_TransferOutputs = 1
Job.0-3.submit:My.CRAB_TransferOutputs = 0
Job.0-3.submit:My.CRAB_TransferOutputs = 1
Job.0-4.submit:My.CRAB_TransferOutputs = 0
Job.0-4.submit:My.CRAB_TransferOutputs = 1
Job.0-5.submit:My.CRAB_TransferOutputs = 0
Job.0-5.submit:My.CRAB_TransferOutputs = 1
belforte@vocms0194/SPOOL_DIR>
belforte@vocms0194/SPOOL_DIR> grep CRAB_TransferOutputs Job.submit
My.CRAB_TransferOutputs = 1
belforte@vocms0194/SPOOL_DIR>
It seems that new code inserts CRABServer/src/python/TaskWorker/Actions/PreJob.py Lines 320 to 321 in 2283a7b
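To spot this kind of double definition without eyeballing grep output, something like the sketch below could be run over the submit files. It is only an illustration: the function name `find_conflicting_ads` and the sample text are made up here, and the regular expression only covers the two spellings seen in this thread (`+Attr = value` and `My.Attr = value`).

```python
import re
from collections import defaultdict

# Matches the two custom-attribute spellings seen in this thread:
#   +CRAB_TransferOutputs = 0      (older TaskWorker)
#   My.CRAB_TransferOutputs = 1    (newer TaskWorker)
AD_LINE = re.compile(r"^\s*(?:\+|[Mm][Yy]\.)(\w+)\s*=\s*(.*?)\s*$")


def find_conflicting_ads(submit_text):
    """Return {attribute: [values in file order]} for every attribute
    that is set more than once with different values."""
    seen = defaultdict(list)
    for line in submit_text.splitlines():
        match = AD_LINE.match(line)
        if match:
            name, value = match.groups()
            seen[name].append(value)
    return {name: values for name, values in seen.items()
            if len(set(values)) > 1}


# Made-up sample mimicking the Job.0-N.submit files shown above.
example = """\
My.CRAB_TransferOutputs = 0
My.Example_Other = 1
My.CRAB_TransferOutputs = 1
queue
"""
conflicts = find_conflicting_ads(example)
print(conflicts)
```

The values are kept in file order, so the report also shows which definition comes first and which comes last.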
|
bloody mess. IIUC code before the "refactoring", One clear goal of the refactoring was to make all such obscure information passing explicit. I.e. I could move
to after the Job.submit file was created, after this comment CRABServer/src/python/TaskWorker/Actions/DagmanCreator.py Lines 1142 to 1143 in 2283a7b
Another option could be to enforce that PreJob overrides existing ad in Job.submit by changing CRABServer/src/python/TaskWorker/Actions/PreJob.py Lines 352 to 354 in 2283a7b
so that new_submit_text is added after the existing file, rather than the other way around. I have no idea why the current code is like that. But so far I have hesitated to change code w/o a full understanding; maybe there were reasons for the current implementation? |
OK. I have submitted an auto-split task to v3.250215 with no sitewhitelist and indeed all jobs performed local stageout.
belforte@lxplus802/TC3> crab status -d ./crab_20250219_222943 --long
Rucio client intialized for account belforte
CRAB project directory: /afs/cern.ch/work/b/belforte/CRAB3/TC3/crab_20250219_222943
Task name: 250219_212948:belforte_crab_20250219_222943
Grid scheduler - Task Worker: [email protected] - crab-dev-tw01
[...]
Job State Most Recent Site Runtime Mem (MB) CPU % Retries Restarts Waste Exit Code
0-1 no output T1_US_FNAL 0:23:37 1494 60 0 0 0:00:08 0
0-2 no output T1_FR_CCIN2P3 0:16:01 1196 90 0 0 0:00:08 0
0-3 no output T1_FR_CCIN2P3 0:16:12 1207 90 0 0 0:00:09 0
0-4 no output T1_FR_CCIN2P3 0:16:24 1444 86 0 0 0:00:09 0
0-5 no output T1_RU_JINR 0:18:35 1086 79 0 0 0:00:09 0
probe 0-1 ran at FNAL, where local stageout failed, and the file was indeed pushed to CERN
belforte@lxplus802/~> ls /eos/cms/store/user/belforte/GenericTTbar/crab_20250219_222943/250219_212948/0000
kk_0-1.root
belforte@lxplus802/~> |
same task submitted to prod TW (v3.250109.patch1) indeed now
belforte@vocms059/250219_220503:belforte_crab_20250219_230459> condor_q -con crab_reqname==\"250219_220503:belforte_crab_20250219_230459\" -af jobuniverse jobstatus CRAB_TransferOutputs
7 2 1
12 2 undefined
5 2 0
5 2 0
5 2 0
5 2 0
5 2 0
belforte@vocms059/250219_220503:belforte_crab_20250219_230459> |
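For reading `condor_q -af jobuniverse jobstatus CRAB_TransferOutputs` output like the above, a small tally by universe can help tell the probe jobs apart from the other rows. The sketch below is only an illustration: the function name `tally_transfer_flag` is made up, and the name table lists just the three HTCondor universe codes that appear here (5 = vanilla, 7 = scheduler, 12 = local). Taking the five vanilla rows to be the probes is my reading; the thread does not spell it out.

```python
from collections import Counter

# HTCondor JobUniverse codes appearing in the output above.
UNIVERSE_NAMES = {5: "vanilla", 7: "scheduler", 12: "local"}


def tally_transfer_flag(condor_q_output):
    """Count rows per (universe name, CRAB_TransferOutputs value).

    Expects three whitespace-separated columns per line:
    jobuniverse, jobstatus, CRAB_TransferOutputs.
    """
    counts = Counter()
    for line in condor_q_output.splitlines():
        fields = line.split()
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skip prompts, blank lines, anything unexpected
        universe, _status, transfer = fields
        name = UNIVERSE_NAMES.get(int(universe), universe)
        counts[(name, transfer)] += 1
    return counts


# Rows copied from the condor_q output above.
sample = """\
7 2 1
12 2 undefined
5 2 0
5 2 0
5 2 0
5 2 0
5 2 0
"""
summary = tally_transfer_flag(sample)
for key, count in sorted(summary.items()):
    print(key, count)
```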
CONCLUSION |
OTOH.. since
!!!!!!!!!!!!! |
So I will go for having PreJob add its stuff AFTER the common Job.submit template, reversing Line 354 here CRABServer/src/python/TaskWorker/Actions/PreJob.py Lines 352 to 354 in 2283a7b
|
rats. it did not work!
belforte@vocms059/cluster10126689.proc0.subproc0> grep CRAB_TransferOutputs *submit
Job.0-1.submit:My.CRAB_TransferOutputs = 1
Job.0-1.submit:My.CRAB_TransferOutputs = 0
Job.0-2.submit:My.CRAB_TransferOutputs = 1
Job.0-2.submit:My.CRAB_TransferOutputs = 0
Job.0-3.submit:My.CRAB_TransferOutputs = 1
Job.0-3.submit:My.CRAB_TransferOutputs = 0
Job.0-4.submit:My.CRAB_TransferOutputs = 1
Job.0-4.submit:My.CRAB_TransferOutputs = 0
Job.0-5.submit:My.CRAB_TransferOutputs = 1
Job.0-5.submit:My.CRAB_TransferOutputs = 0
Job.submit:My.CRAB_TransferOutputs = 1
belforte@vocms059/cluster10126689.proc0.subproc0>
yet submitted jobs have the wrong value
belforte@vocms059/cluster10126689.proc0.subproc0> condor_q -con crab_reqname==\"250220_123754:belforte_crab_20250220_133751\" -af CRAB_TransferOutputs
1
undefined
1
1
1
1
1
belforte@vocms059/cluster10126689.proc0.subproc0> |
maybe lines added in JDL file after the
belforte@vocms059/SPOOL_DIR> condor_q -con crab_reqname==\"250220_123754:belforte_crab_20250220_133751\" -af DESIRED_SITES
undefined
undefined
undefined
undefined
undefined
undefined
undefined
belforte@vocms059/SPOOL_DIR> |
Need a bit more care in PreJob to make sure that I.e. I will also take this chance to change |
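One way the "bit more care" could look is sketched below. This is NOT the actual PreJob code. It rests on an assumption the thread does not state outright: that the per-job lines only take effect if they appear before the `queue` statement of the submit file, since in HTCondor submit files commands placed after a `queue` line do not apply to the jobs that line already queued. The function name `merge_submit_text` and the sample texts are made up for illustration.

```python
import re

# Custom attributes as spelled in this thread: "+Attr = v" or "My.Attr = v".
AD_LINE = re.compile(r"^\s*(?:\+|[Mm][Yy]\.)(\w+)\s*=")
QUEUE_LINE = re.compile(r"^\s*queue\b", re.IGNORECASE)


def merge_submit_text(template_text, override_text):
    """Merge per-job overrides into a common submit template (sketch).

    - template definitions of any overridden attribute are dropped,
      so each attribute ends up defined exactly once;
    - overrides are placed right before the first 'queue' line
      (assumption: later lines would not reach the queued job).
    """
    override_lines = [l for l in override_text.splitlines() if l.strip()]
    # ClassAd attribute names are case-insensitive, so compare lowercased.
    overridden = set()
    for line in override_lines:
        match = AD_LINE.match(line)
        if match:
            overridden.add(match.group(1).lower())

    merged = []
    inserted = False
    for line in template_text.splitlines():
        match = AD_LINE.match(line)
        if match and match.group(1).lower() in overridden:
            continue  # drop the template value, the override wins
        if QUEUE_LINE.match(line) and not inserted:
            merged.extend(override_lines)
            inserted = True
        merged.append(line)
    if not inserted:  # template had no queue line: just append
        merged.extend(override_lines)
    return "\n".join(merged) + "\n"


# Made-up template and per-job override.
template = "universe = vanilla\nMy.CRAB_TransferOutputs = 1\nqueue\n"
override = 'My.CRAB_TransferOutputs = 0\nMy.DESIRED_SITES = "T2_CH_CERN"\n'
merged = merge_submit_text(template, override)
print(merged)
```

Dropping the template definition, rather than relying on which duplicate wins, avoids depending on the order in which the submit parser resolves repeated attributes.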
see https://cmsweb.cern.ch/crabserver/ui/task/250217_154906%3Acerminar_crab_DoublePhoton_FlatPt-1To100_PU200
While the transfer tab in there and
crab getoutput --dump
only show processing and tail jobs, the user showed me that the probe jobs' output was also present in the destination directory. One possibility is that this is due to direct stageout from the WN in case the job ran at the same site as the destination.
It may even be that it has been like this for a while... need to check.