@@ -9,24 +9,124 @@ pip install -e .
```

## Usage
- ### CLI
- If installed as a python package, the following command is available:
+ If installed locally, the `py_css` command is available. Otherwise, call the entry point directly:
```bash
- py_css cli
+ python py_css/main.py
+ # OR, if installed locally:
+ py_css
```

- Otherwise, the equivalent can be achieved by navigating into the repository and running the following:
+ A detailed help page can be displayed with:
```bash
- python py_css/main.py cli
+ py_css --help
+ ```
+
+ ### CLI Mode
+ If installed as a Python package, the following command is available:
+ ```bash
+ py_css cli
```

### Run Queries File
```bash
- python py_css/main.py run_file --log=INFO --queries=data/queries_train.csv --output=output/train.txt
+ py_css run_file --log=INFO --queries=data/queries_train.csv --output=output/train.txt
```

### Run Queries and Evaluate Performance
```bash
- python py_css/main.py eval --log=INFO --queries=data/queries_train.csv --qrels=data/qrels_train.txt
+ py_css eval --log=INFO --queries=data/queries_train.csv --qrels=data/qrels_train.txt
+ ```
+
+ ### Create Kaggle Runfile Format
+ ```bash
+ py_css kaggle --log=INFO --queries=data/queries_test.csv --output=output/kaggle-prf.csv
+ ```
+
+
+ ## Retrieval Pipelines
+ As outlined in the paper, four retrieval pipelines were implemented:
+
+ ### Baseline
+ Select this pipeline by specifying the following parameters:
+ ```bash
+ --method=baseline
+ --baseline-params=1000,1000,50
+ ```
+
+ #### Indexing
+ For indexing, the document collection must be placed in the `data/` folder.
+ <br>
+ [Further Instructions](data/README.md)
+
+ #### Parameters
+ | Position | ID | Description | Constraints |
+ | --- | --- | --- | --- |
+ | 0 | `bm25_docs` | The number of documents to be retrieved using `BM25`. | |
+ | 1 | `mono_t5_docs` | The number of documents to be reranked by `monoT5` after retrieval. | `bm25_docs >= mono_t5_docs` |
+ | 2 | `duo_t5_docs` | The number of documents to be reranked by `duoT5` after `monoT5` reranking. | `mono_t5_docs >= duo_t5_docs` |
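
The positional layout of `--baseline-params` can be sketched as a small parser. This is a minimal illustration, not the code used by `py_css`: the `BaselineParams` class and the `parse_baseline_params` function are hypothetical names, and the checks assume that each reranking stage receives at most as many documents as the stage before it (which holds for the example value `1000,1000,50`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineParams:
    """Hypothetical container mirroring the three positional values."""

    bm25_docs: int
    mono_t5_docs: int
    duo_t5_docs: int


def parse_baseline_params(raw: str) -> BaselineParams:
    """Parse a value such as "1000,1000,50" and check the stage sizes."""
    parts = [part.strip() for part in raw.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 comma-separated values, got {len(parts)}")
    bm25_docs, mono_t5_docs, duo_t5_docs = (int(part) for part in parts)
    # Assumption: a later stage cannot rerank more documents than it receives.
    if bm25_docs < mono_t5_docs:
        raise ValueError("bm25_docs must be >= mono_t5_docs")
    if mono_t5_docs < duo_t5_docs:
        raise ValueError("mono_t5_docs must be >= duo_t5_docs")
    return BaselineParams(bm25_docs, mono_t5_docs, duo_t5_docs)


params = parse_baseline_params("1000,1000,50")
print(params)
```

A value such as `10,20,5` would be rejected by this sketch, because `monoT5` would be asked to rerank more documents than `BM25` retrieved.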
+
+ ### Baseline + `RM3`
+
+ Select this pipeline by specifying the following parameters:
+ ```bash
+ --method=baseline-prf
+ --baseline-prf-params=1000,17,26,1000,50
```

+ #### Indexing
+ For indexing, the document collection must be placed in the `data/` folder.
+ <br>
+ [Further Instructions](data/README.md)
+
+ #### Parameters
+ | Position | ID | Description | Constraints |
+ | --- | --- | --- | --- |
+ | 0 | `bm25_docs` | The number of documents to be retrieved using `BM25`. | |
+ | 1 | `rm3_fb_docs` | The number of feedback documents used for `RM3` query expansion. | |
+ | 2 | `rm3_fb_terms` | The number of expansion terms `RM3` adds to the query. | |
+ | 3 | `mono_t5_docs` | The number of documents to be reranked by `monoT5` after retrieval. | `bm25_docs >= mono_t5_docs` |
+ | 4 | `duo_t5_docs` | The number of documents to be reranked by `duoT5` after `monoT5` reranking. | `mono_t5_docs >= duo_t5_docs` |
+
+
+ ### `doc2query`
+ Select this pipeline by specifying the following parameters:
+ ```bash
+ --method=doc2query
+ --doc2query-params=1000,1000,50
+ ```
+
+ #### Indexing
+ For indexing, the document collection must be placed in the `data/` folder.
+ Additionally, descriptive queries for each document must be generated using [this script](scripts/doc2query-t5.py).
+ <br>
+ [Further Instructions](data/README.md)
+
+ #### Parameters
+ | Position | ID | Description | Constraints |
+ | --- | --- | --- | --- |
+ | 0 | `bm25_docs` | The number of documents to be retrieved using `BM25`. | |
+ | 1 | `mono_t5_docs` | The number of documents to be reranked by `monoT5` after retrieval. | `bm25_docs >= mono_t5_docs` |
+ | 2 | `duo_t5_docs` | The number of documents to be reranked by `duoT5` after `monoT5` reranking. | `mono_t5_docs >= duo_t5_docs` |
+
+ ### `doc2query` + `RM3`
+
+ Select this pipeline by specifying the following parameters:
+ ```bash
+ --method=doc2query-prf
+ --doc2query-prf-params=1000,17,26,1000,50
+ ```
+
+ #### Indexing
+ For indexing, the document collection must be placed in the `data/` folder.
+ Additionally, descriptive queries for each document must be generated using [this script](scripts/doc2query-t5.py).
+ <br>
+ [Further Instructions](data/README.md)
+
+ #### Parameters
+ | Position | ID | Description | Constraints |
+ | --- | --- | --- | --- |
+ | 0 | `bm25_docs` | The number of documents to be retrieved using `BM25`. | |
+ | 1 | `rm3_fb_docs` | The number of feedback documents used for `RM3` query expansion. | |
+ | 2 | `rm3_fb_terms` | The number of expansion terms `RM3` adds to the query. | |
+ | 3 | `mono_t5_docs` | The number of documents to be reranked by `monoT5` after retrieval. | `bm25_docs >= mono_t5_docs` |
+ | 4 | `duo_t5_docs` | The number of documents to be reranked by `duoT5` after `monoT5` reranking. | `mono_t5_docs >= duo_t5_docs` |
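
All four pipelines follow the same flag pattern, `--method=<name>` plus `--<name>-params=<values>`, so a full command line could be assembled as sketched below. This is only an illustration: `build_command` and `EXAMPLE_PARAMS` are hypothetical helpers, and the sketch assumes that the method flags can be appended to the `run_file`, `eval`, and `kaggle` subcommands, which the README does not state explicitly (`py_css --help` is the authoritative reference).

```python
from __future__ import annotations

import shlex

# Example values copied from the sections above; the mapping is illustrative.
EXAMPLE_PARAMS = {
    "baseline": "1000,1000,50",
    "baseline-prf": "1000,17,26,1000,50",
    "doc2query": "1000,1000,50",
    "doc2query-prf": "1000,17,26,1000,50",
}


def build_command(mode: str, method: str, *extra: str) -> list[str]:
    """Assemble an argv list for one pipeline.

    Assumption: the method flags may follow any subcommand.
    """
    if method not in EXAMPLE_PARAMS:
        raise ValueError(f"unknown method: {method}")
    return [
        "py_css",
        mode,
        *extra,
        f"--method={method}",
        f"--{method}-params={EXAMPLE_PARAMS[method]}",
    ]


cmd = build_command(
    "eval",
    "doc2query-prf",
    "--log=INFO",
    "--queries=data/queries_train.csv",
    "--qrels=data/qrels_train.txt",
)
# Print a shell-ready string; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Building an argument list rather than a single string avoids quoting problems if a path contains spaces.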