This repository contains the code for experiments using a domain-specific knowledge graph (KG).
We build on the PyTorch implementation of the EMNLP 2020 (Findings) paper: Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering [arxiv][project page]
The code in the folders below is based on Path-Generator-QA.
- Commonsense-Path-Generator
Please refer to the following link for installation.
The raw file of the domain KG is located at /home/yujin/dot/preprocess_DoKG/raw_data/baby-domain-hiararchy-refined.csv.
If you need anything else, please ask.
-
You can download the trained path generators:
- path generator trained with the domain KG: checkpoint
- path generator trained with ConceptNet: checkpoint
- path generator trained with the KG that integrates the domain KG and ConceptNet using the naive method: checkpoint
- path generator trained with the KG that integrates the domain KG and ConceptNet using pivoting: checkpoint
-
Before running the code below, check the source file path and the save path.
cd learning-generator
sh run_test_pg.sh
-
Preprocess
Before running the code below, check the source file path and the save path. You can download an example input file here.
cd preprocess_DoKG
sh convert_csv_to_nx.sh
The outputs of this script are the domain KG files (baby_domain_graph.nx / entity_vocab.pkl / relation_freq.pkl / relation_vocab.pkl).
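As a rough illustration of what the vocabulary and frequency files might hold, here is a minimal sketch that builds entity and relation vocabularies from triples. It assumes a three-column (head, relation, tail) CSV layout and integer ids assigned in order of first appearance. Both are assumptions made for illustration, and the sample rows and the `build_vocab` helper are hypothetical; the actual format produced by convert_csv_to_nx.sh may differ.

```python
import csv
import io
from collections import Counter


def build_vocab(rows):
    """Collect vocabularies from (head, relation, tail) rows.

    Hypothetical helper: ids are assigned in order of first appearance.
    """
    entity_vocab = {}
    relation_vocab = {}
    relation_freq = Counter()
    for head, relation, tail in rows:
        for entity in (head, tail):
            # setdefault keeps the existing id if the entity was seen before
            entity_vocab.setdefault(entity, len(entity_vocab))
        relation_vocab.setdefault(relation, len(relation_vocab))
        relation_freq[relation] += 1
    return entity_vocab, relation_vocab, relation_freq


# Made-up sample rows standing in for the raw domain KG CSV (assumed layout)
sample_csv = "stroller,is_a,baby_gear\ncrib,is_a,baby_gear\ncrib,used_for,sleeping\n"
rows = [tuple(row) for row in csv.reader(io.StringIO(sample_csv))]
entity_vocab, relation_vocab, relation_freq = build_vocab(rows)
```

Each of the three objects could then be written to disk with `pickle.dump`, which would match the `.pkl` extensions listed above.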
-
Path sampling for path generator
Before running the code below, check the input settings.
- data_dir : directory containing the domain KG files, for example the save_dir used by convert_csv_to_nx.sh above.
- output_dir : directory for the output files
- graph_file_name : file name of the graph file in data_dir
- split_dataset : if set to True, the sampled dataset is split for training at a ratio of 0.9:0.05:0.05 and saved under output_dir.
cd learning-generator
sh run_path_sampling_dokg_multi_di_rw_no_hiar.sh
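To give an idea of what path sampling over a KG can look like, the sketch below draws one path by a simple random walk. It assumes the sampler is a random walk, which the script name hints at but the text does not confirm. The adjacency dictionary, the `sample_path` helper, and the rule of never revisiting an entity are all hypothetical choices made for illustration, not the repository's implementation.

```python
import random


def sample_path(adjacency, start, max_hops, rng):
    """Random walk from `start`; returns [entity, relation, entity, ...].

    Hypothetical helper: stops early when no unvisited neighbour remains.
    """
    path = [start]
    current = start
    visited = {start}
    for _ in range(max_hops):
        candidates = [
            (relation, neighbour)
            for relation, neighbour in adjacency.get(current, [])
            if neighbour not in visited
        ]
        if not candidates:
            break
        relation, current = rng.choice(candidates)
        path.extend([relation, current])
        visited.add(current)
    return path


# Toy directed graph: entity -> list of (relation, neighbour) pairs (made-up data)
adjacency = {
    "crib": [("is_a", "baby_gear"), ("used_for", "sleeping")],
    "baby_gear": [("used_by", "infant")],
}
rng = random.Random(0)  # fixed seed so repeated runs sample the same path
path = sample_path(adjacency, "crib", max_hops=2, rng=rng)
```

The returned list alternates entities and relations, so a path with k hops has 2k + 1 elements.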
-
Train path generator using domainKG
Before running the code below, check the config file (~/learning-generator/config/).
- data_dir : directory of the training dataset, for example the output_dir of the path sampling step above.
- model : name of the language model (LM)
To train a path generator:
sh ./run_dokg_multi_di_rev_no_hiar.sh $gpu_device
-
Preprocess
Before running the code below, check the source file path and the save path. You can download example input files from the following links.
cd preprocess_integrate_kg/integrateKG
sh integrate_kg.sh
The outputs of this script are the integrated KG files (integrate_graph.nx / entity_vocab.pkl / relation_vocab.pkl).
-
Path sampling for path generator
cd learning-generator
sh run_path_sampling_inte_pivot.sh
If 'split_dataset' is set to True, the sampled dataset is split for training at a ratio of 0.9:0.05:0.05 and saved under output_dir.
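The 0.9:0.05:0.05 split can be sketched as follows. This is a minimal illustration under stated assumptions: the `split_dataset` function here is a hypothetical helper (not the repository's code), the data is shuffled with a fixed seed before splitting, and the three parts are treated as train, dev, and test, which the text above does not state explicitly.

```python
import random


def split_dataset(paths, ratios=(0.9, 0.05, 0.05), seed=0):
    """Shuffle `paths` and cut them into three parts by `ratios`.

    Hypothetical helper: the last part receives whatever remains after rounding.
    """
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    total = len(shuffled)
    first_end = round(total * ratios[0])
    second_end = first_end + round(total * ratios[1])
    return (
        shuffled[:first_end],
        shuffled[first_end:second_end],
        shuffled[second_end:],
    )


# Made-up sampled paths; the names train/dev/test are an assumption
sampled_paths = [f"path_{i}" for i in range(100)]
train, dev, test = split_dataset(sampled_paths)
```

Shuffling before slicing keeps the three parts from inheriting any ordering in the sampled paths, and the fixed seed makes the split reproducible.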
-
Train path generator using the integrated KG
./run_inte_pivot.sh $gpu_device
The output of this script is the trained path generator model.
-
Download Data
First, download all the data needed to train the model:
cd commonsense-qa
bash scripts/download.sh
-
Preprocess
To preprocess the data, run:
python preprocess.py
-
Using the path generator to connect question-answer entities (Modify ./config/path_generate.config to specify the dataset and gpu device)
./scripts/run_generate.sh
3.1. For the domain-specific KG (modify ./config/path_generate_dokg_multi_di.config to specify the dataset and GPU device)
bash ./scripts/run_generate_dokg_multi_di.sh
3.2. For the domain-specific KG (modify ./config/path_generate_dokg_multi_di.config to specify the dataset and GPU device)
bash ./scripts/run_generate_dokg_multi_di.sh
-
Commonsense QA system training
bash scripts/run_main.sh ./config/csqa.config
The training process and final evaluation results are stored in './saved_models/'.