CodingTil
diff --git a/report/main.pdf b/report/main.pdf
486 Bytes
diff --git a/report/main.tex b/report/main.tex
Lines changed: 1 addition & 2 deletions
@@ -154,7 +154,6 @@ \subsection{Re-ranking}
 The re-ranking stage of our baseline system consists of two steps: First, the top 1000 documents retrieved by the \texttt{BM25} retrieval method are re-ranked using the \texttt{monoT5} reranker. Afterwards, the top 50 documents of the previous re-ranking step are rearranged using the \texttt{duoT5} reranker (see Section \ref{sec:rerankers}). The precise count of documents subject to reranking at each step is a hyperparameter of our system, allowing us to balance computational cost and result quality. These rerankers are implemented in the \texttt{pyterrier\_t5} library.\footnote{URL: \url{https://github.com/terrierteam/pyterrier_t5}} Again, since low latency of our retrieval pipeline is crucial to us, we utilized smaller \texttt{T5} models: \texttt{castorini\-/monot5\--base\--msmarco}\footnote{URL: \url{https://huggingface.co/castorini/monot5-base-msmarco}} for \texttt{monoT5} and \texttt{castorini\-/duot5\--base\--msmarco}\footnote{URL: \url{https://huggingface.co/castorini/duot5-base-msmarco}} for \texttt{duoT5}.
 
 \section{Incorporating Pseudo-Relevance Feedback into Our Baseline}\label{sec:baseline+rm3}
-
 Recognizing the substantial performance enhancements associated with pseudo-relevance feedback, we felt compelled to integrate a query expansion mechanism into our baseline retrieval method, see Section \ref{sec:baseline}. Our choice fell upon the \texttt{RM3} query expansion technique, well-established for its robustness and acceptance within the information retrieval community. For a deeper dive into its mechanics and principles, readers are directed to Section \ref{sec:prf}.
 
 In the \texttt{PyTerrier} framework, any query expansion must follow an initial retrieval phase. This initial retrieval fetches the top $p$ documents, which form the basis for expanding the query by $n$ words using \texttt{RM3}. The expanded query is then passed into a second retrieval phase, which retrieves the final document set for the end-user. Finally, to refine the output, we again apply re-ranking using both \texttt{monoT5} and \texttt{duoT5}.
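The control flow described above can be sketched in plain Python. This is a minimal illustration and not the \texttt{PyTerrier} implementation: the callables \texttt{retrieve}, \texttt{expand}, \texttt{mono\_rerank} and \texttt{duo\_rerank} are hypothetical stand-ins for \texttt{BM25}, \texttt{RM3}, \texttt{monoT5} and \texttt{duoT5}. The default cut-offs of 1000 and 50 are the ones stated for the baseline and are assumed to carry over. The text does not state which query the rerankers receive; the sketch passes the original query, which is an assumption.

```python
from typing import Callable, List

Ranking = List[str]  # document ids, best first


def prf_pipeline(
    query: str,
    retrieve: Callable[[str, int], Ranking],     # stand-in for BM25
    expand: Callable[[str, Ranking, int], str],  # stand-in for RM3
    mono_rerank: Callable[[str, Ranking], Ranking],
    duo_rerank: Callable[[str, Ranking], Ranking],
    p: int,
    n: int,
    k_mono: int = 1000,
    k_duo: int = 50,
) -> Ranking:
    # 1. Initial retrieval: the top p documents serve as pseudo-relevant feedback.
    feedback_docs = retrieve(query, p)
    # 2. Expand the query by n terms drawn from the feedback documents.
    expanded_query = expand(query, feedback_docs, n)
    # 3. Second retrieval with the expanded query.
    candidates = retrieve(expanded_query, k_mono)
    # 4. Pointwise re-ranking of all candidates
    #    (assumption: the rerankers see the original query).
    mono_ranked = mono_rerank(query, candidates)
    # 5. Pairwise re-ranking of the head only; the tail keeps its order.
    head = duo_rerank(query, mono_ranked[:k_duo])
    return head + mono_ranked[k_duo:]
```

Keeping the two cut-offs as parameters mirrors their role as hyperparameters that trade computational cost against result quality.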
@@ -177,7 +176,7 @@ \section{Document Expansion Method}\label{sec:doc2query-method}
 
 The core idea behind the \texttt{doc2query-T5} model is to dynamically generate specific questions or queries that are closely related to the content of a given document. These generated questions are then seamlessly incorporated into the document. The goal of this process is to expand the document's content, thereby providing additional information that can significantly improve the effectiveness of our information retrieval system. By generating relevant queries based on the document's content, we are essentially expanding the scope of potential search terms, enabling our system to better capture the user's intent and find more relevant documents.
 
-The integration of the \texttt{T5} model allows us to transform the document into highly relevant queries tailored to the content of the document. This is achieved by utilizing a \texttt{T5} model fine-tuned on this task of understanding the contextual relationships within the document and generate queries that effectively summarize the key points of the document.
+The integration of the \texttt{T5} model allows us to transform the document into highly relevant queries tailored to its content. This is achieved by utilizing a \texttt{T5} model fine-tuned on the task of understanding the contextual relationships within the document and generating queries that effectively summarize its key points. In particular, we use the \texttt{castorini\-/doc2query\--t5\--large\--msmarco}\footnote{URL: \url{https://huggingface.co/castorini/doc2query-t5-large-msmarco}} model.
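The expansion step itself can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: \texttt{generate\_queries} is a hypothetical callable standing in for the fine-tuned \texttt{T5} model, and joining the original text and the generated queries with single spaces is an assumed convention, not one taken from the model's documentation.

```python
from typing import Callable, List


def expand_document(
    text: str,
    generate_queries: Callable[[str, int], List[str]],  # hypothetical T5 stand-in
    num_queries: int,
) -> str:
    """Append generated queries to a document's text before indexing."""
    queries = generate_queries(text, num_queries)
    # The original text comes first and is left unchanged; the generated
    # queries only add further terms that the index can match against.
    return " ".join([text, *queries])
```

The returned string is what would be handed to the indexing stage in place of the original document text.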
 
 We add \texttt{doc2query-T5} to our baseline, which otherwise remains unchanged. In particular, \texttt{doc2query-T5} can be seen as a preprocessing step to indexing, in which $m$ queries are first generated for each document in the collection and then appended to the original document to form the input for the indexing stage. The system architecture for this pipeline, which we will refer to as ``\texttt{doc2query-T5}'', will therefore take the following form:
 \begin{enumerate}