Evaluation

Evaluation config

To run the evaluation, you need to create an evaluation configuration file. Some examples can be found in the config directory. Each test suite, such as Wikidata_Tekgen or DBpedia_WebNLG, contains a set of ontologies, and the config file contains parametrized path patterns that point to the files for each ontology in the test suite.

The following shows one example.

{
  "onto_list" : [
    "1_movie",  "2_music", "3_sport", "4_book", "5_military", 
    "6_computer", "7_space", "8_politics", "9_nature", "10_culture"
  ],
  "path_patterns": {
    "sys": "../../data/wikidata_tekgen/baselines/Vicuna-13B/llm_responses/ont_$$onto$$_llm_responses.jsonl",
    "gt":"../../data/wikidata_tekgen/ground_truth/ont_$$onto$$_ground_truth.jsonl",
    "selected_ids": "../../data/wikidata_tekgen/manually_verified_sentences/selected_ont_$$onto$$.txt",
    "onto": "../../data/wikidata_tekgen/ontologies/$$onto$$_ontology.json",
    "output": "../../data/wikidata_tekgen/baselines/Vicuna-13B/eval_metrics/ont_$$onto$$_llm_stats.jsonl"
  },
  "avg_out_file": "../../data/wikidata_tekgen/baselines/Vicuna-13B/eval_metrics/ont_llm_avg_stats.jsonl"
}

Here is the description of each of the parameters in the config.

Parameter	Description
onto_list	List of ontology ids. Each id replaces the $$onto$$ placeholder in the path patterns to produce the concrete file paths.
path_patterns/sys	The path pattern to system outputs for test sentences in each ontology.
path_patterns/gt	The path pattern to ground truth triples for each ontology.
path_patterns/selected_ids	(Optional) The path pattern to a selected list of manually validated test cases if applicable.
path_patterns/onto	The path pattern to the ontology file.
path_patterns/output	The path pattern for the detailed output file with metrics for each individual test sentence in each ontology.
avg_out_file	The path to the output file with average metrics for each ontology and globally for the whole dataset.
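
The placeholder substitution described in the table can be sketched as follows. This is a minimal illustration, not the code used in run_eval.py: the helper name expand_path_patterns is hypothetical, and the config is shortened to two ontologies and two patterns taken from the example above.

```python
import json


def expand_path_patterns(config: dict) -> dict:
    """Hypothetical helper: replace $$onto$$ in every pattern for every ontology id."""
    expanded = {}
    for onto in config["onto_list"]:
        expanded[onto] = {
            key: pattern.replace("$$onto$$", onto)
            for key, pattern in config["path_patterns"].items()
        }
    return expanded


# Shortened version of the example config shown above.
config = json.loads("""
{
  "onto_list": ["1_movie", "2_music"],
  "path_patterns": {
    "gt": "../../data/wikidata_tekgen/ground_truth/ont_$$onto$$_ground_truth.jsonl",
    "onto": "../../data/wikidata_tekgen/ontologies/$$onto$$_ontology.json"
  }
}
""")

paths = expand_path_patterns(config)
print(paths["1_movie"]["gt"])
# ../../data/wikidata_tekgen/ground_truth/ont_1_movie_ground_truth.jsonl
```

Each ontology id yields one concrete path per pattern, which is why the same config file can cover all ten ontologies of a test suite.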

Running the evaluation script

In order to run the run_eval.py script, set the working directory to the Text2KGBench\src\evaluation directory:

cd Text2KGBench\src\evaluation

|   README.md
|   run_eval.py
|
\---config
        tekgen_vicuna_config.json
        tikgen_alpaca_config.json
        tikgen_unseen_alpaca_config.json
        tikgen_unseen_vicuna_config.json
        tikgen_vicuna_config.json
        webnlg_alpaca_config.json
        webnlg_vicuna_config.json

To see the available options, run the run_eval.py script with the -h (or --help) flag:

python run_eval.py -h

It will generate an output similar to the following.

usage: run_eval.py [-h] --eval_config_path EVAL_CONFIG_PATH

options:
  -h, --help            show this help message and exit
  --eval_config_path EVAL_CONFIG_PATH

To run the evaluation, we need an evaluation configuration file as discussed in the previous section. You can find evaluation configurations for various setups in the config directory.

We run the run_eval.py script with a configuration file as a parameter:

python run_eval.py --eval_config_path config/tekgen_vicuna_config.json

It will generate one results file per ontology, plus one results file with average results for each ontology and for the whole dataset. You can find examples of the generated files in data\wikidata_tekgen\baselines\Vicuna-13B\eval_metrics. The output locations are defined in the configuration file by path_patterns/output and avg_out_file.

File
ont_1_movie_llm_stats.jsonl
ont_2_music_llm_stats.jsonl
ont_3_sport_llm_stats.jsonl
ont_4_book_llm_stats.jsonl
ont_5_military_llm_stats.jsonl
ont_6_computer_llm_stats.jsonl
ont_7_space_llm_stats.jsonl
ont_8_politics_llm_stats.jsonl
ont_9_nature_llm_stats.jsonl
ont_10_culture_llm_stats.jsonl
ont_llm_avg_stats.jsonl

The results file for an individual ontology contains one record per test sentence.

{
   "id":"ont_1_movie_test_1",
   "precision":"1.00",
   "recall":"0.50",
   "f1":"0.67",
   "onto_conf":"1.00",
   "rel_halluc":"0.00",
   "sub_halluc":"0.00",
   "obj_halluc":"0.00",
   "llm_triples":[
      [
         "Bleach: Hell Verse",
         "director",
         "Noriyuki Abe"
      ]
   ],
   "filtered_llm_triples":[
      [
         "Bleach: Hell Verse",
         "director",
         "Noriyuki Abe"
      ]
   ],
   "gt_triples":[
      [
         "Bleach : Hell Verse",
         "director",
         "Noriyuki Abe"
      ],
      [
         "Bleach : Hell Verse",
         "publication date",
         "01 January 2010"
      ]
   ],
   "sent":"Bleach: Hell Verse (Japanese: BLEACH , Hepburn: Bur\u00c4\u00abchi Jigoku-Hen) is a 2010 Japanese animated film directed by Noriyuki Abe."
}, 
...

Each record in the file contains the following fields:

id: the identifier of the test sentence within the ontology, e.g. "ont_1_movie_test_1"
precision: precision, P = correct_triples / predicted_triples
recall: recall, R = correct_triples / gold_triples
f1: F1 score, the harmonic mean of precision and recall, F1 = 2 * (P * R) / (P + R)
onto_conf (OC): ontology conformance, the number of system triples whose relation is in the ontology divided by the total number of system triples
rel_halluc (RH): relation hallucination, the number of system triples whose relation is not in the ontology divided by the total number of system triples; it is the complement of OC, RH = 1 - OC
sub_halluc (SH): subject hallucination, calculated by checking whether the pre-processed triple subject is mentioned in the test sentence and/or the ontology concepts
obj_halluc (OH): object hallucination, calculated by checking whether the pre-processed triple object is mentioned in the test sentence and/or the ontology concepts
llm_triples: the triples produced by the LLM system, e.g. [["Bleach: Hell Verse", "director", "Noriyuki Abe"]]
filtered_llm_triples: the filtered LLM triples, e.g. [["Bleach: Hell Verse", "director", "Noriyuki Abe"]]
gt_triples: the ground truth triples, e.g. [["Bleach : Hell Verse", "director", "Noriyuki Abe"], ["Bleach : Hell Verse", "publication date", "01 January 2010"]]
sent: the original test sentence from which the facts are extracted, e.g. "Bleach: Hell Verse (Japanese: BLEACH , Hepburn: Bur\u00c4\u00abchi Jigoku-Hen) is a 2010 Japanese animated film directed by Noriyuki Abe."
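
To make the formulas concrete, the sketch below recomputes precision, recall, F1, OC and RH for the example record shown earlier. It is an illustration under stated assumptions, not the logic of run_eval.py: the normalize function (lowercase, keep only letters and digits) is an assumed stand-in for the script's pre-processing, the value returned for an empty system output is an arbitrary choice, and ontology_relations is an illustrative subset rather than the real content of the movie ontology file.

```python
import re


def normalize(text: str) -> str:
    # Assumed pre-processing: lowercase and keep only letters and digits.
    return re.sub(r"[^a-z0-9]", "", text.lower())


def normalize_triple(triple):
    return tuple(normalize(part) for part in triple)


def precision_recall_f1(system_triples, gold_triples):
    sys_set = {normalize_triple(t) for t in system_triples}
    gold_set = {normalize_triple(t) for t in gold_triples}
    correct = len(sys_set & gold_set)
    p = correct / len(sys_set) if sys_set else 0.0
    r = correct / len(gold_set) if gold_set else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    return p, r, f1


def ontology_conformance(system_triples, ontology_relations):
    if not system_triples:
        return 1.0  # arbitrary choice for the empty case in this sketch
    allowed = {normalize(rel) for rel in ontology_relations}
    conforming = sum(1 for _, rel, _ in system_triples if normalize(rel) in allowed)
    return conforming / len(system_triples)


llm_triples = [["Bleach: Hell Verse", "director", "Noriyuki Abe"]]
gt_triples = [
    ["Bleach : Hell Verse", "director", "Noriyuki Abe"],
    ["Bleach : Hell Verse", "publication date", "01 January 2010"],
]
# Illustrative subset only; the real relations come from the ontology file.
ontology_relations = ["director", "publication date"]

p, r, f1 = precision_recall_f1(llm_triples, gt_triples)
oc = ontology_conformance(llm_triples, ontology_relations)
rh = 1 - oc
print(f"{p:.2f} {r:.2f} {f1:.2f} {oc:.2f} {rh:.2f}")
# 1.00 0.50 0.67 1.00 0.00
```

The one system triple matches one of the two ground truth triples, which gives P = 1/1, R = 1/2 and F1 = 2 * 0.5 / 1.5, in line with the values in the example record.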

The average results file contains one record per ontology, plus the global averages.

{
   "onto":"1_movie",
   "type":"all_test_cases",
   "avg_precision":"0.33",
   "avg_recall":"0.23",
   "avg_f1":"0.25",
   "avg_onto_conf":"0.89",
   "avg_sub_halluc":"0.26",
   "avg_rel_halluc":"0.11",
   "avg_obj_halluc":"0.26"
}, ...

The average metrics file contains the following fields:

onto: the ontology identifier, e.g. "1_movie"
type: the type of average calculation, one of "all_test_cases", "selected_test_cases", or "global" (global average figures)
avg_precision (AP): average precision for the ontology
avg_recall (AR): average recall for the ontology
avg_f1 (AF1): average F1 score for the ontology
avg_onto_conf (AOC): average ontology conformance for the ontology
avg_sub_halluc (ASH): average subject hallucination for the ontology
avg_rel_halluc (ARH): average relation hallucination for the ontology, ARH = 1 - AOC
avg_obj_halluc (AOH): average object hallucination for the ontology
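
As a rough illustration of how the per-sentence records relate to the averages, the sketch below takes records in the per-sentence format and computes an unweighted mean for each metric. It rests on several assumptions: that the files hold one JSON object per line (as the .jsonl extension suggests), that the averages are plain arithmetic means, and that the function name average_metrics is hypothetical. The single input line reuses the metric values from the example record above.

```python
import json
from statistics import mean

METRICS = ["precision", "recall", "f1", "onto_conf",
           "rel_halluc", "sub_halluc", "obj_halluc"]


def average_metrics(jsonl_lines):
    """Hypothetical helper: unweighted mean of each metric over all records."""
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    # Metric values are stored as strings such as "0.67", so convert them first.
    return {
        f"avg_{metric}": f"{mean(float(rec[metric]) for rec in records):.2f}"
        for metric in METRICS
    }


# Metric fields of the example record shown earlier, as one JSON line.
lines = [
    '{"id": "ont_1_movie_test_1", "precision": "1.00", "recall": "0.50", '
    '"f1": "0.67", "onto_conf": "1.00", "rel_halluc": "0.00", '
    '"sub_halluc": "0.00", "obj_halluc": "0.00"}'
]

avg = average_metrics(lines)
print(avg["avg_precision"], avg["avg_recall"], avg["avg_f1"])
# 1.00 0.50 0.67
```

With real data, the list of lines would come from reading one of the ont_$$onto$$_llm_stats.jsonl files, and the result could be compared against the corresponding record in the average results file.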
