Document download models and jsonl format

nielstron · nielstron · commit 4b5019968151 · 2025-05-07T10:11:02.000+02:00
diff --git a/README.md b/README.md
@@ -142,13 +142,22 @@ bash ./setup_env.sh
 # NOTE: Some models are guarded on huggingface, so you will need to visit their model page, accept the EULA and enter the huggingface Access Token to your account when prompted. See section "Requirements" for more details.
 ```
 
-Before running the experiments, you need to download the models and datasets.
-To download the models and datasets, run the following command:
+> Important note: Before running the experiments, you need to download the models and datasets used for the experiments.
+
+We provide a script that downloads the datasets and models required for our experiments. It must be run before starting the experiments.
+You may specify which models to download by passing a comma-separated list to the `--models` parameter.
+
+```bash
+python3 experiments/main/download_models.py --models google/gemma-2-2b-it,google/gemma-2-9b-it
+```
+
+To download all required models and datasets, run the following command:
 
 ```bash
-CUDA_VISIBLE_DEVICES=0 python3 experiments/main/download_models.py
+python3 experiments/main/download_models.py
 ```
 
+
 ### Warming up
 
 To warm up, we start by reproducing the result for synthesis of the smallest model (Gemma 2 2B) and the MBPP dataset. To avoid using busy GPUs in a shared setting, use command `nvidia-smi` to check which GPUs are free. Then specify the IDs of GPUs you want to use by setting the `CUDA_VISIBLE_DEVICES` environment variable.  If you want to use GPU 0 and 1, run the following command:
@@ -159,9 +168,9 @@ CUDA_VISIBLE_DEVICES=0,1 python3 experiments/main/run_experiments_syn_tran.py --
 
 This reproduces the results for Gemma-2B on the synthesis task on MBPP.
 The experiment should finish within approximately 4 hours on a single GPU.
-The results of the experiment (and all other results) will be stored in `experiments/main/results` in an appropriately named `jsonl` file, in this concrete example `experiments/main/results/mbpp_google_gemma-2-2b-it_s=0_t=1_synth_nc.jsonl` and `..._c.jsonl` for the unconstrained and type-constrained variants respectively.
+The results of the experiment (and all other results) will be stored in `experiments/main/results` in an appropriately named `jsonl` file. The general schema is `experiments/main/results/<subset>_<model>_s=<seed>_t=<temperature>_<task>_<constrained>.jsonl`. In this concrete example, the files are `experiments/main/results/mbpp_google_gemma-2-2b-it_s=0_t=1_synth_nc.jsonl` and `..._c.jsonl` for the unconstrained and type-constrained variants, respectively.
 
-> The experiment runs can be cancelled at any time, intermediate results are stored in the `results` folder. Upon restarting, the script will automatically pick up the last completed instance and continue from there. It may happen that running tasks daemonize and continue running (check `nvidia-smi`). Make sure to kill them manually before restarting.
+> The experiment runs can be cancelled at any time; intermediate results are stored in the `jsonl` files. Upon restarting, the script will automatically pick up the last completed instance and continue from there. Running tasks may daemonize and keep running after cancellation (check `nvidia-smi`). Make sure to kill them manually before restarting.
 
 Our experiment script automatically distributes jobs over indicated GPUs.
 The script then repeatedly queries whether running jobs are completed and new GPUs are available. You will therefore see something like the following output:
@@ -290,4 +299,4 @@ The type reachability algorithm is implemented in `typesafe_llm/parser/types_ts.
 
 The automaton for statements is defined in `typesafe_llm/automata/parser_ts.py` in the class `StatementParserState`.
 It handles the constraining for valid return types.
-The automaton for the entire program is defined in `typesafe_llm/automata/parser_ts.py` in the class `ProgramParserState`.
+The automaton for the entire program is defined in `typesafe_llm/automata/parser_ts.py` in the class `ProgramParserState`.