This repository contains all the code, links, and information needed to run the Q/G (quark/gluon) classification and comparisons.
You can produce the Monte Carlo Pythia samples using the `Pythia_jets_HEPMC3` folder:
- Write your own script or use one of the existing ones: `pythia_Zgjets.cc`, `pythia_Zqjets.cc`, or `dijets.cc`. Adapt the script to your needs (the sketch below illustrates the kind of process settings involved), then run `source setup_cvmfs_hepmc3_alma.sh` and build with `make`.
- To run on the cluster, use the `submit_Zq` or `submit_Zg` shell scripts, which call `run_pythia_Zjets.sh`. You'll need to adjust this file accordingly.
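For orientation, this is roughly what the `.cc` scripts configure, shown here through Pythia's optional Python bindings as an illustration only; the beam energy, pT cut, and Z decay mode below are assumptions, not the repository's actual settings (those live in the `.cc` files).

```python
# Illustrative only: the production code is the C++ in Pythia_jets_HEPMC3.
# Requires Pythia8 built with its Python bindings.
import pythia8

pythia = pythia8.Pythia()
pythia.readString("Beams:eCM = 13000.")               # assumed beam energy
pythia.readString("WeakBosonAndParton:qg2gmZq = on")  # Z + quark-jet process
pythia.readString("PhaseSpace:pTHatMin = 400.")       # assumed cut for ~500 GeV jets
pythia.readString("23:onMode = off")
pythia.readString("23:onIfAny = 13")                  # assumed Z -> mumu decay
pythia.init()

for _ in range(100):       # generate a few events
    if not pythia.next():  # skip failed events
        continue
```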
Note: if you cannot build Pythia, it might be because you need to build Pythia8 and HepMC3 first (consider using the provided build directories to simplify this process).
For those particularly interested in building their own Pythia8 and HepMC3:

- Build HepMC3
  Follow the instructions at the HepMC3 GitLab (use the manual installation). The specific commands I use are:

```bash
conda activate appenv
source /cvmfs/sft.cern.ch/lcg/releases/LCG_96b/ROOT/6.18.04/x86_64-centos7-gcc8-opt/ROOT-env.sh
cmake -DCMAKE_INSTALL_PREFIX=../hepmc3-install \
      -DHEPMC3_ENABLE_ROOTIO:BOOL=ON \
      -DROOT_DIR=$ROOTSYS \
      -DHEPMC3_ENABLE_PROTOBUFIO:BOOL=OFF \
      -DHEPMC3_ENABLE_TEST:BOOL=OFF \
      -DHEPMC3_INSTALL_INTERFACES:BOOL=ON \
      -DHEPMC3_BUILD_STATIC_LIBS:BOOL=OFF \
      -DHEPMC3_BUILD_DOCS:BOOL=OFF \
      ../HepMC3
make
make install
```
- Build Pythia8
  Download and unpack Pythia, then:

```bash
tar xvfz pythia8311.tgz
export HEPMC_HOME=/storage/agrp/annai/QURK-GLUON/COCOA/Patrick_Dev_Br/hepmc3-install/  # or the directory where you just installed HepMC3
ln -s ${HEPMC_HOME}/lib64 ${HEPMC_HOME}/lib  # link lib64 to lib - a very important step
cd pythia8311   # the directory unpacked from the tarball
./configure --with-hepmc3=${HEPMC_HOME}
make
```
Note: I have put my built `pythia8` and `hepmc3` directories, with all the necessary files, into the git repository. You can try using my build directory, but don't forget to change the paths in the main setup file, `setup_cvmfs_hepmc3_alma.sh`, to your own.
After the setup succeeds, you need to pass the events through COCOA. There is a development branch of cocoa-hep from Patrick (`origin/dev_g410_pythia8HepMCPlugin`), but I made changes, and my modified version of COCOA, which works with the setup above, is available in the `cocoa-hep` folder.
To run this:
- Add configuration files to the `cocoa-hep/COCOA/macro/HepMC/` folder, manually specifying which sample you want to process (give the full path; see the examples in `HepMC`). Note: this step is required because of a bug in Patrick's script, which prevents specifying the file at runtime, so it must be set manually. (A helper sketch for generating these macros is shown after this list.)
- After adding all the files you want to process, go to `cocoa_submit/` and update the paths to your input and output directories, as well as the macro files you will be using.
- Run the jobs using `submit_*.sh`.
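Since the input path must be hardcoded in every macro, a small helper can stamp out one macro per sample from a template. This is a hypothetical sketch: `macro_template.mac`, the `{input_file}` placeholder, and the sample directory are all assumptions, not files in the repository.

```python
"""Generate one COCOA macro per HepMC sample from a template.

Hypothetical helper: create macro_template.mac yourself, containing the
literal placeholder {input_file} where the full sample path belongs.
"""
from pathlib import Path

TEMPLATE = Path("macro_template.mac")       # assumed template file
SAMPLES = Path("/path/to/hepmc/samples")    # adjust to your storage
OUT_DIR = Path("cocoa-hep/COCOA/macro/HepMC")

template = TEMPLATE.read_text()
for sample in sorted(SAMPLES.glob("*.hepmc")):
    macro = template.format(input_file=sample.resolve())
    target = OUT_DIR / f"run_{sample.stem}.mac"
    target.write_text(macro)
    print(f"wrote {target}")
```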
- After successfully preparing the COCOA ROOT files, you must prepare them for training the HGPFlow NN.
- First, you need to remove COCOA events with no tracks, cells, or clusters, using the code in `scripts/copyTree.c`. I took it from Etienne; I know one could do better, but I have no strength for this. Inside the code, just specify the path to the COCOA directory files and write down the names of the inputs, then run it as `root -l scripts/copyTree.c`. (A Python alternative is sketched after this list.)
- Now you are almost ready to train the HyperGraph; before that, you need to run the segmentation on those files so each event is divided up and processed faster.
- For this purpose, ask Nilotpal to give you the HGPFlow mini-version. It is supposed to be located here: HGPflow.
- To submit the segmentation, use the `scripts/submit_job_cocoa_chunks_new.py` script; it already has everything set up to process our files (500 GeV Zq/g jets). Here, set `is_inference=False` and `mod_e_th = 0.1` for the train and validation samples, or else `is_inference=True` and `mod_e_th = 0`.
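If you would rather do the skim in Python than with the ROOT macro, the same selection can be written with `uproot` and `awkward`. This is a minimal sketch of an alternative to `copyTree.c`, assuming hypothetical tree and branch names (`Out_Tree`, `track_pt`, `cell_e`, `cluster_e`); check the actual COCOA file for the real ones.

```python
"""Skim COCOA files: drop events with no tracks, cells, or clusters.

Sketch of an alternative to scripts/copyTree.c, using uproot.
Tree and branch names below are assumptions; inspect the real
tree with tree.keys() first.
"""
import awkward as ak
import uproot

IN_FILE, TREE = "cocoa_input.root", "Out_Tree"   # assumed names
OUT_FILE = "cocoa_input_skim.root"

with uproot.open(IN_FILE) as fin:
    events = fin[TREE].arrays()  # awkward array of all branches

# keep only events with at least one track, cell, and cluster
mask = (
    (ak.num(events["track_pt"]) > 0)
    & (ak.num(events["cell_e"]) > 0)
    & (ak.num(events["cluster_e"]) > 0)
)
skimmed = events[mask]

with uproot.recreate(OUT_FILE) as fout:
    fout[TREE] = {name: skimmed[name] for name in skimmed.fields}

print(f"kept {ak.sum(mask)} / {len(mask)} events")
```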
Congratulations! You have reached the exciting part of the project: running the HGPFlow training on our samples (500 GeV jets).
Make sure you copy the config files from this git to the folder where you set up your HGPFlow code: `configs` --> `hgpflow/configs`, and `hgpflow/pbs_scripts` --> the `hgpflow_v2` git folder.
- Running the stage 1 training:
  `python pbs_scripts/submit_job.py -m train -cv configs/model_configs/var_mini.yml -cms1 configs/model_configs/model_stage1_mini.yml -ct configs/train_incidence_200k.yml -gt A6000`
- Preparation for stage 2:
  `python pbs_scripts/submit_job.py -m hyperedge_data_prep -i configs/inference_stage1.yml -gt A6000 -mem 3gb --io 20`
- Running the stage 2 training:
  `python pbs_scripts/submit_job.py -m train -cms2 configs/model_configs/model_stage2.yml -ct configs/train_hyperedge_200k.yml -gt A6000 -nc 5`
  (I ran this one locally because something didn't work for me; the local command is `python -m hgpflow_v2.train -cms2 configs/model_configs/model_stage2.yml -ct configs/train_hyperedge_200k.yml`.)
- Running the evaluation:
  `python pbs_scripts/submit_job.py -m eval -i configs/inference_200k.yml -gt A6000 -nc 5 --io 20`
In the general configs such as `train_incidence_200k.yml` or `train_hyperedge_200k.yml`, you need to fill in your own directories.
In general, Nathalie, you can just copy my whole hgpflow folder so you will not need to do anything; in our storage you find it at `...annai/QURK-GLUON/samples_produce/Cocoa/HGPFLOW_v2/hgpflow_v2`.
After you have the predictions, you need to make npz files to be processed later by the QG processor. For that you can use the notebook `scripts/performance_save.ipynb`.
- A few caveats here. This notebook was created for hgpflow_full, not the mini version, and I had to change a number of things in the performance code to save the needed information correctly. If Nilotpal says it is OK, you can use the same code once you have the prediction files. I have already modified it a bit because there was a problem with `event_offset`; I ran it on new predictions and it looks OK. You might want to change the jet pT range that is saved (~400-800 GeV, perhaps); up to you. This notebook also shuffles the files between Pythia Zq and Zg; please do not miss this step (a sketch of the shuffling is shown below).
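For reference, this is roughly what the shuffling step amounts to; a minimal sketch assuming hypothetical file names (`zq.npz`, `zg.npz`) and a `features` key, which you should match to what `performance_save.ipynb` actually writes.

```python
"""Mix Zq and Zg events into one shuffled, labelled dataset.

Sketch only: zq.npz / zg.npz and the 'features' key are assumptions;
match them to what performance_save.ipynb actually writes.
"""
import numpy as np

zq = np.load("zq.npz")["features"]   # quark-jet events
zg = np.load("zg.npz")["features"]   # gluon-jet events

X = np.concatenate([zq, zg])
y = np.concatenate([np.ones(len(zq)), np.zeros(len(zg))])  # 1 = quark, 0 = gluon

rng = np.random.default_rng(seed=42)   # fixed seed for reproducibility
perm = rng.permutation(len(X))
np.savez("zjets_shuffled.npz", features=X[perm], labels=y[perm])
```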
- Now the shuffled npz files need to be processed to match the Q/G format. The script you need is `scripts/convert_qg_datasets_new_hgpflow.py`; after this step you are ready to train the QG script on the HGPFlow inputs.
- Additionally, you may want to do the truth-jet training. For this purpose you need to convert ROOT --> npz --> QG format, as in the HGPFlow case, but now using the notebook `scripts/make_datasets.ipynb`. You can just `hadd` all the ROOT files you have and process the single merged file there (see the sketch below). After that, use `scripts/convert_qg_datasets_new.py`.
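Merging the ROOT files can be scripted; a minimal sketch that wraps ROOT's `hadd` CLI from Python, where the input directory is a placeholder:

```python
"""Merge all ROOT files into one input for scripts/make_datasets.ipynb."""
import subprocess
from pathlib import Path

files = sorted(str(p) for p in Path("/path/to/rootfiles").glob("*.root"))  # placeholder dir
subprocess.run(["hadd", "-f", "merged.root", *files], check=True)          # -f overwrites the target
```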
Congratulations! Now you are finally ready to train ParticleNet. I have made a package that gathers all the needed functions and libraries in one place and trains the models. Initially, I combined these two: [particle_transformer](https://github.com/jet-universe/particle_transformer/) (the new and old model setups) and [weaver-core](https://github.com/hqucms/weaver-core), which is where the models and all configurations are actually stored. I could not get weaver-core set up properly, got fed up, and just made a new, self-contained package. It works well for our ParticleNet model, which is what we need. If you find something that needs to be added, go ahead and change it.
- You can use the git version of my new code to train and test the files: [PartNet_Local](https://github.com/annaivina/PartNet_Local/tree/main).
- What you need to do is very simple: create a submit script for training (or use one of the existing ones; their names are easy to understand).
- The most important choices are what to train on (kinematics only, or kinematics + PID) and which model to use (ParticleNet or ParticleTransformer). That's about it.
- If, for whatever reason, this does not work, you can always go to my storage directory at Weizmann: ... annai/QURK-GLUON/PartNet_Local
- To evaluate, you can use `rates.ipynb` in my ParticleNet folder, or create your own (a sketch of the core computation follows).
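If you write your own version of `rates.ipynb`, its core is a ROC curve over the classifier scores. A minimal sketch with scikit-learn, where `scores.npz` and its `score`/`label` keys are assumptions:

```python
"""Quark/gluon ROC curve from saved classifier scores.

Sketch of what a rates notebook computes; scores.npz with
'score' (quark probability) and 'label' (1 = quark) is assumed.
"""
import numpy as np
from sklearn.metrics import auc, roc_curve

data = np.load("scores.npz")
fpr, tpr, _ = roc_curve(data["label"], data["score"])

print(f"AUC = {auc(fpr, tpr):.3f}")
# background (gluon) rejection at 50% quark efficiency
idx = min(np.searchsorted(tpr, 0.5), len(tpr) - 1)
print(f"gluon rejection at 50% quark eff: {1.0 / max(fpr[idx], 1e-12):.1f}")
```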
Samples
All samples should be accessible, provided I have granted the relevant permissions.
- Pythia samples can be found in my work directory under .. annai/QURK-GLUON/samples_produce/Pythia_Zgjets or .. annai/QURK-GLUON/samples_produce/Pythia_Zqjets
- Cocoa samples are located in .. annai/QURK-GLUON/samples_produce/Cocoa/Cocoa_Zjets/. The relevant folders are 500GeVPythiaZq, 500GeVPythiaZg, 500GeVHerwigZq, and 500GeVHerwigZg.
- Inside the Cocoa sample directories, you will also find the skimmed versions of the samples, as well as the segmented samples for the HGPFlow task in train/valid/test folders.
- Inside my Weizmann storage there are also the Herwig and Pythia samples from the original paper; you can use those as well: annai/QURK-GLUON/QG_herwig or annai/QURK-GLUON/QG_pythia
Statistics
- We generated Zq and Zg samples with both Pythia and Herwig, passed through COCOA (399k events each)
- Out of those, we used 100k Zq Pythia and 100k Zg Pythia for training HGPflow, and 10k Zq Pythia plus 10k Zg Pythia for validation
- Thus, for predictions and further use, we have 399k - (100k + 10k) = 289k events for Pythia Zq and Zg each, and the full 399k for Herwig Zq and Zg each