
Commit 143d6fc

Editing READMEs
1 parent 9110d6b commit 143d6fc


2 files changed (+112, -7 lines)


README.md (+107, -7)

```bash
pip install -e .
```

## Usage
If the package is installed locally, the `py_css` command is available. Otherwise, call the entry point directly:
```bash
python py_css/main.py
# OR, if installed locally:
py_css
```

A detailed help page can be displayed with:
```bash
py_css --help
```

### CLI Mode
If installed as a Python package, CLI mode can be started with:
```bash
py_css cli
```

### Run Queries File
```bash
py_css run_file --log=INFO --queries=data/queries_train.csv --output=output/train.txt
```

### Run Queries and Evaluate Performance
```bash
py_css eval --log=INFO --queries=data/queries_train.csv --qrels=data/qrels_train.txt
```

### Create Kaggle Runfile Format
```bash
py_css kaggle --log=INFO --queries=data/queries_test.csv --output=output/kaggle-prf.csv
```

## Retrieval Pipelines
As outlined in the paper, four retrieval pipelines were implemented:

### Baseline
This pipeline is selected with the following parameters:
```bash
--method=baseline
--baseline-params=1000,1000,50
```

#### Indexing
For indexing, the document collection must be placed in the `data/` folder.
<br>
[Further Instructions](data/README.md)

#### Parameters
| Position | ID | Description | Constraints |
| --- | --- | --- | --- |
| 0 | `bm25_docs` | The number of documents to be retrieved using `BM25`. | |
| 1 | `mono_t5_docs` | The number of documents to be reranked by `monoT5` after retrieval. | `bm25_docs >= mono_t5_docs` |
| 2 | `duo_t5_docs` | The number of documents to be reranked by `duoT5` after `monoT5` reranking. | `mono_t5_docs >= duo_t5_docs` |
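
The ordering constraints above can be checked before starting a run. The following is a minimal Bash sketch under stated assumptions: `check_rerank_params` and `is_count` are hypothetical helpers, not part of `py_css`; the parameter string is assumed to be a plain comma-separated list in table order; and each reranking stage is assumed to work on a subset of the previous stage's output, so the sketch enforces `bm25_docs >= mono_t5_docs >= duo_t5_docs`, which the default `1000,1000,50` satisfies. The same three-position layout is used by `--doc2query-params`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of py_css. Assumes the string is
# "bm25_docs,mono_t5_docs,duo_t5_docs" as in --baseline-params.

# Succeeds if $1 is a non-negative integer without leading zeros.
is_count() {
  case "$1" in
    0) return 0 ;;
    '' | 0* | *[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Splits the string and checks that each stage receives no more documents
# than the stage before it.
check_rerank_params() {
  local bm25_docs mono_t5_docs duo_t5_docs extra value
  # A fourth variable collects any surplus fields.
  IFS=',' read -r bm25_docs mono_t5_docs duo_t5_docs extra <<< "$1"
  for value in "$bm25_docs" "$mono_t5_docs" "$duo_t5_docs"; do
    if ! is_count "$value"; then
      echo "expected three comma-separated integers, got '$1'" >&2
      return 1
    fi
  done
  if [ -n "$extra" ]; then
    echo "expected exactly three values, got '$1'" >&2
    return 1
  fi
  if [ "$bm25_docs" -lt "$mono_t5_docs" ]; then
    echo "bm25_docs must be >= mono_t5_docs" >&2
    return 1
  fi
  if [ "$mono_t5_docs" -lt "$duo_t5_docs" ]; then
    echo "mono_t5_docs must be >= duo_t5_docs" >&2
    return 1
  fi
  echo "bm25_docs=$bm25_docs mono_t5_docs=$mono_t5_docs duo_t5_docs=$duo_t5_docs"
}

check_rerank_params 1000,1000,50
# prints: bm25_docs=1000 mono_t5_docs=1000 duo_t5_docs=50
```

An invalid string such as `100,1000,50` makes the function print an error to stderr and return a non-zero status, so it can guard a subsequent `py_css` call with `&&`.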

### Baseline + `RM3`

This pipeline is selected with the following parameters:
```bash
--method=baseline-prf
--baseline-prf-params=1000,17,26,1000,50
```

#### Indexing
For indexing, the document collection must be placed in the `data/` folder.
<br>
[Further Instructions](data/README.md)

#### Parameters
| Position | ID | Description | Constraints |
| --- | --- | --- | --- |
| 0 | `bm25_docs` | The number of documents to be retrieved using `BM25`. | |
| 1 | `rm3_fb_docs` | The number of documents to be used for `RM3` query expansion. | |
| 2 | `rm3_fb_terms` | The number of terms to expand the query with using `RM3`. | |
| 3 | `mono_t5_docs` | The number of documents to be reranked by `monoT5` after retrieval. | `bm25_docs >= mono_t5_docs` |
| 4 | `duo_t5_docs` | The number of documents to be reranked by `duoT5` after `monoT5` reranking. | `mono_t5_docs >= duo_t5_docs` |
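
The five-position strings of `--baseline-prf-params` and `--doc2query-prf-params` can be hard to read at a glance, since the `RM3` values sit between the retrieval and reranking depths. Below is a minimal Bash sketch that labels each position and checks it, assuming a plain comma-separated list in table order; `check_prf_params` and `is_count` are hypothetical helpers, not part of `py_css`. The `RM3` values are only checked to be non-negative integers because the table lists no constraints for them, and the reranking depths are checked under the assumption that each stage reranks a subset of the previous stage's output (as in the default `1000,17,26,1000,50`).

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of py_css: labels the five positions of a
# --baseline-prf-params or --doc2query-prf-params string and checks them.

# Succeeds if $1 is a non-negative integer without leading zeros.
is_count() {
  case "$1" in
    0) return 0 ;;
    '' | 0* | *[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

check_prf_params() {
  local names=(bm25_docs rm3_fb_docs rm3_fb_terms mono_t5_docs duo_t5_docs)
  local -a values
  local i
  # Split on commas into an array; exactly five fields are expected.
  IFS=',' read -r -a values <<< "$1"
  if [ "${#values[@]}" -ne 5 ]; then
    echo "expected five comma-separated integers, got '$1'" >&2
    return 1
  fi
  for i in 0 1 2 3 4; do
    if ! is_count "${values[i]}"; then
      echo "position $i (${names[i]}) is not a non-negative integer" >&2
      return 1
    fi
  done
  # Positions 1 and 2 (RM3) have no ordering constraint here.
  if [ "${values[0]}" -lt "${values[3]}" ]; then
    echo "bm25_docs must be >= mono_t5_docs" >&2
    return 1
  fi
  if [ "${values[3]}" -lt "${values[4]}" ]; then
    echo "mono_t5_docs must be >= duo_t5_docs" >&2
    return 1
  fi
  # Print one "name=value" line per position.
  for i in 0 1 2 3 4; do
    echo "${names[i]}=${values[i]}"
  done
}

check_prf_params 1000,17,26,1000,50
# prints:
# bm25_docs=1000
# rm3_fb_docs=17
# rm3_fb_terms=26
# mono_t5_docs=1000
# duo_t5_docs=50
```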

### `doc2query`
This pipeline is selected with the following parameters:
```bash
--method=doc2query
--doc2query-params=1000,1000,50
```

#### Indexing
For indexing, the document collection must be placed in the `data/` folder.
Additionally, descriptive queries for each document must be generated using [this script](scripts/doc2query-t5.py).
<br>
[Further Instructions](data/README.md)

#### Parameters
| Position | ID | Description | Constraints |
| --- | --- | --- | --- |
| 0 | `bm25_docs` | The number of documents to be retrieved using `BM25`. | |
| 1 | `mono_t5_docs` | The number of documents to be reranked by `monoT5` after retrieval. | `bm25_docs >= mono_t5_docs` |
| 2 | `duo_t5_docs` | The number of documents to be reranked by `duoT5` after `monoT5` reranking. | `mono_t5_docs >= duo_t5_docs` |

### `doc2query` + `RM3`

This pipeline is selected with the following parameters:
```bash
--method=doc2query-prf
--doc2query-prf-params=1000,17,26,1000,50
```

#### Indexing
For indexing, the document collection must be placed in the `data/` folder.
Additionally, descriptive queries for each document must be generated using [this script](scripts/doc2query-t5.py).
<br>
[Further Instructions](data/README.md)

#### Parameters
| Position | ID | Description | Constraints |
| --- | --- | --- | --- |
| 0 | `bm25_docs` | The number of documents to be retrieved using `BM25`. | |
| 1 | `rm3_fb_docs` | The number of documents to be used for `RM3` query expansion. | |
| 2 | `rm3_fb_terms` | The number of terms to expand the query with using `RM3`. | |
| 3 | `mono_t5_docs` | The number of documents to be reranked by `monoT5` after retrieval. | `bm25_docs >= mono_t5_docs` |
| 4 | `duo_t5_docs` | The number of documents to be reranked by `duoT5` after `monoT5` reranking. | `mono_t5_docs >= duo_t5_docs` |

data/README.md (+5)

# Data

The document collection is MS MARCO Passages and has to be stored in `collection.tsv`.
Furthermore, the `doc2query`-based approaches require descriptive queries for each document in the collection, stored in `doc2query.tsv`.
This file can be generated automatically using [this script](../scripts/doc2query-t5.py). :warning: This may take several days.

An MS MARCO document collection has been provided [here](https://gustav1.ux.uis.no/dat640/msmarco-passage.tar.gz).
A pre-generated `doc2query.tsv` file has been made available [here](https://drive.google.com/file/d/1vGGGu0eprxG_iUm9Z5xkbsKEwjJoAf_A/view?usp=drive_link).
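
Before indexing, a quick check that the expected files are in place can save a failed run. This is a minimal Bash sketch, assuming both `collection.tsv` and `doc2query.tsv` sit directly in the data folder and that the method names match the `--method` values of the main README; `check_data_files` is a hypothetical helper, not part of `py_css`.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of py_css. Assumes collection.tsv and
# doc2query.tsv live directly in the data directory and that METHOD uses
# the same names as the --method values in the main README.

# Usage: check_data_files [DATA_DIR] [METHOD]
# Returns 0 if every file required by METHOD exists and is non-empty.
check_data_files() {
  local data_dir="${1:-data}"
  local method="${2:-baseline}"
  local required=("$data_dir/collection.tsv")
  local missing=0
  local file

  # Only the doc2query-based pipelines need the generated queries file.
  case "$method" in
    doc2query | doc2query-prf) required+=("$data_dir/doc2query.tsv") ;;
  esac

  # Report every missing or empty file instead of stopping at the first.
  for file in "${required[@]}"; do
    if [ ! -s "$file" ]; then
      echo "missing or empty: $file" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_data_files data doc2query-prf` reports whichever of the two files is missing or empty and returns a non-zero status in that case.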