Main contributors: Roxanne El Baff and Valentin Edelsbrunner.
The main goal of this demo is to use a zero-shot LLM to:
- Improve and expand the `query` provided by the user using the QueryOptimizer.
- Summarize the result snippets returned by MOSAIC using the Summarizer.
The user provides a query, q, which is fed to the QueryOptimizer. The QueryOptimizer improves the query and expands it with three sub-queries. We then search MOSAIC (via its REST API) with q, the improved query, and each of the three sub-queries, fetching the top N results for each one: five searches in total (q, improved_q, subquery_1, subquery_2, subquery_3). In the last phase, we feed the snippets to the Summarizer three times: once for the results of q, once for the results of improved_q, and once for the combined results of subquery_1 + subquery_2 + subquery_3. The goal is to compare the three summaries and check whether the optimized or expanded queries improve the results.
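The data flow described above fits in a short sketch. This is a minimal illustration, not the package's implementation: `run_pipeline` and the injected `optimize`, `search`, and `summarize` callables are hypothetical stand-ins for the QueryOptimizer, the MOSAIC search, and the Summarizer, and the result keys only mirror the summary attribute names used in the usage example.

```python
from typing import Callable

# Hypothetical stand-ins for the three components (not the MosaicLLM API).
Optimizer = Callable[[str], dict]
Search = Callable[[str, int], list[str]]
Summarize = Callable[[list[str]], str]


def run_pipeline(q: str, optimize: Optimizer, search: Search,
                 summarize: Summarize, top_n: int = 5) -> dict:
    """Sketch of the data flow: 1 optimization, 5 searches, 3 summaries."""
    optimized = optimize(q)  # expects 'rationale', 'clarified_query', 'subqueries'
    improved_q = optimized["clarified_query"]
    subqueries = optimized["subqueries"][:3]

    # Five searches in total: q, improved_q and the three sub-queries.
    results_q = search(q, top_n)
    results_improved = search(improved_q, top_n)
    results_sub = [snippet for sq in subqueries for snippet in search(sq, top_n)]

    # Three summaries: the sub-query results are pooled into one list first.
    return {
        "summary_query": summarize(results_q),
        "summary_clarified_query": summarize(results_improved),
        "summary_subqueries": summarize(results_sub),
    }
```

Injecting the components as callables keeps the sketch testable without network access or an API key.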
Check data/mosaicllm_results.csv for sample queries and results from mosaicllm.
Example outputs (the three summaries):
- Summary for the original query q: 'The focus is on Apple. Apples are a type of pome fruit, which also includes pears, and grow on trees. They are in the same family as quinces. In the technology context, an applet is a computer program, often written in Java, that can be run in a web browser. In a different setting, the iodine test uses iodine as a chemical indicator to detect starch, and is used in various applications such as brewing beer and determining apple ripeness. In Greek mythology, Ladon is a dragon who guarded the golden apples in the Garden of the Hesperides.'
- Summary for the improved query: 'The Apple T1 is a chip developed by Apple, derived from the Apple S2. Apple Inc. also produces the Apple Pencil, a wireless stylus pen accessory used with supported iPad tablets. Unrelated to Apple Inc., a pond-apple is a type of fruit that is not associated with the tech company or its products.'
- Summary for subquery_1-3 (combined results): 'Apple Inc. is an American company that produces personal computers, including the Apple Macintosh, and the Apple I, which was their first official product. Apple also offers the Apple Pencil, a wireless stylus pen accessory for use with supported iPad tablets. The Macintosh was one of the first computers to incorporate a graphical user interface, allowing users to point and click on icons using a mouse. Under the leadership of Steve Jobs, Apple Inc. has become a successful technology company, but specific financial performance details are not provided in the text snippets.'
Below, we explain each component in more detail:
I. QueryOptimizer
Input: a search query (e.g., Apple, Climate Change).
Output: a JSON object with the following keys and values:
- `rationale`: brief reasoning for the choices made
- `clarified_query`: the improved query
- `subqueries`: an array of the top 3 sub-queries
Approach:
- Zero-shot with an instruction-tuned LLM.
- The prompt used is in `prompts/query_optimization.txt`.
- Code: `MosaicLLM.optimize_query`
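The QueryOptimizer's answer arrives as text and has to be turned into the three fields listed under Output. A minimal sketch of that step, assuming the model returns a single JSON object, possibly wrapped in a Markdown code fence; `parse_optimizer_output` is a hypothetical helper, not the code of `MosaicLLM.optimize_query`.

```python
import json

REQUIRED_KEYS = ("rationale", "clarified_query", "subqueries")
FENCE = "`" * 3  # a Markdown code fence marker


def parse_optimizer_output(raw: str) -> dict:
    """Parse and validate the QueryOptimizer's JSON answer (hypothetical helper)."""
    text = raw.strip()
    # Assumption: the model may wrap its JSON in a Markdown code fence; strip it.
    if text.startswith(FENCE):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    data = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if not isinstance(data["subqueries"], list):
        raise ValueError("'subqueries' must be a list")
    data["subqueries"] = data["subqueries"][:3]  # keep at most the top 3
    return data
```

Failing loudly on missing keys makes a malformed LLM answer visible before it reaches the MOSAIC searches.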
Example output for q = "Apple":
```python
{
    'rationale': "The original query is too vague. I will clarify it to specify that the user is likely looking for information about Apple Inc., a major technology company. I will also create sub-queries to gather information about Apple's products, services, and company performance.",
    'clarified_query': 'Apple Inc.',
    'subqueries': ['Apple Inc. products', 'Apple Inc. services', 'Apple Inc. financial performance']
}
```

II. MOSAIC Search
Input: a query as a string
Output: a list containing n text snippets
Approach:
- We use the MOSAIC REST API, which returns the top `n` results as JSON; each item contains the text snippet and other metadata:
  `https://qnode.eu/ows/mosaic/service/search?q={query}&index={mosaic_index}&lang={lang}&limit={n}`
- We post-process the JSON response into a list of strings containing only the text snippets.
- Code: `MosaicLLM.query_mosaic`, then `MosaicLLM.extract_textsnippet_from_mosaic_response`
Check the MOSAIC main page for further information.
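A minimal sketch of the two steps (request, then post-processing), assuming the URL template above and a response whose result items are dicts. `build_search_url`, `fetch_mosaic_json`, and `extract_snippets` are hypothetical helpers, not the package's `MosaicLLM.query_mosaic` or `MosaicLLM.extract_textsnippet_from_mosaic_response`, and the snippet field name must be read from a real MOSAIC response.

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the URL template above.
MOSAIC_SEARCH_URL = "https://qnode.eu/ows/mosaic/service/search"


def build_search_url(query: str, index: str = "demo-simplewiki",
                     lang: str = "en", limit: int = 5) -> str:
    """Build the search URL; urlencode escapes spaces and special characters."""
    params = {"q": query, "index": index, "lang": lang, "limit": limit}
    return f"{MOSAIC_SEARCH_URL}?{urlencode(params)}"


def fetch_mosaic_json(query: str, **kwargs: Any) -> Any:
    """Send the GET request and decode the JSON body (requires network access)."""
    with urlopen(build_search_url(query, **kwargs), timeout=30) as response:
        return json.load(response)


def extract_snippets(items: list, snippet_field: str) -> list:
    """Keep only the text snippet of each result item, dropping the metadata.

    `snippet_field` is an assumption: read the real field name (and where the
    result list sits in the response) from an actual MOSAIC response.
    """
    return [item[snippet_field] for item in items if item.get(snippet_field)]
```

Building the query string with `urlencode` rather than plain string formatting escapes queries such as `climate change` or `C++` correctly.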
III. Summarizer
Input: the list of top N text snippets returned by MOSAIC (II.), passed as a string; for the sub-queries, the snippets of all three searches are combined.
Output: a string containing the summary of the snippets
Approach:
- Zero-shot with an instruction-tuned LLM.
- The prompt used is in `prompts/result_summarization.txt`.
- Code: `MosaicLLM.summarize_results`

Check the overview section for an example of the three main outputs.
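Because the Summarizer receives the snippet list as a string, each of the three summaries comes down to filling one prompt template and calling the LLM. A minimal sketch, assuming a template with a single `{snippets}` placeholder; the template text, the placeholder name, and the `llm` callable are hypothetical and do not reproduce `prompts/result_summarization.txt`.

```python
from typing import Callable

# Hypothetical template: the real prompt is in prompts/result_summarization.txt,
# and its wording and placeholder name may differ.
TEMPLATE = (
    "Summarize the following search result snippets in one paragraph.\n"
    "Snippets: {snippets}"
)


def build_summary_prompt(snippets: list, template: str = TEMPLATE) -> str:
    """Insert the snippet list, converted to a string, into the prompt template."""
    return template.format(snippets=str(snippets))


def summarize(snippets: list, llm: Callable[[str], str]) -> str:
    """Call any prompt -> answer function (e.g. a wrapper around a chat model)."""
    return llm(build_summary_prompt(snippets))
```

For the sub-query summary, the three snippet lists would be concatenated first (`sub_1 + sub_2 + sub_3`) so that a single call summarizes all of them.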
- Model: `open-mixtral-8x7b`, accessed via the Mistral API
- Temperature: defaults to `0.7`
- Framework: LangChain
- Requirements: see `requirements.txt`
- Python version: we tested the code with Python > 3.9.2
The main class, `MosaicLLM`, is implemented in `mosaic_llm/mosaicllm.py`. Its constructor takes the following parameters:
- `model_name: str`: the LLM used by the Optimizer and the Summarizer. Default: `open-mixtral-8x7b`. Note: for now, only Mistral models are supported.
- `temperature: float`: the temperature used when calling the LLM. Default: `0.7`.
- `root: str`: the root path where the `prompts` folder is located. Default: `../`.
- `mosaic_top_n: int`: the number of top results to return from MOSAIC. Default: `5`.
- `mosaic_index: str`: the index name used when searching MOSAIC. Default: `demo-simplewiki`.
- `mosaic_lang: str`: the language of the results returned from MOSAIC. Default: `en`.
The primary method, `run`, takes the query q and executes the whole pipeline. It returns the outputs of the QueryOptimizer and the Summarizer, together with the prompt string used by each of these modules; the summaries are accessed as attributes of the returned object (e.g., `summary_query`).
Requirement to run the code: create a `.env` file in the main folder and add the following:
```
MISTRAL_API_KEY = "<MISTRAL_API_KEY_VALUE>"
```
Then install the package using one of the following options.

Option 1
```shell
git clone <repo_git_url>
cd MOSAIC_LLM
pip install .
```
Option 2
```shell
pip install git+<repo_git_https_url>
```
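Then run the pipeline from Python:
```python
from mosaic_llm.mosaicllm import MosaicLLM

q = "Apple"
# root: path where the prompts/ folder is located ("" = relative to the current directory)
mosaicllm = MosaicLLM(root="")
# run() executes the full pipeline: query optimization, MOSAIC searches, summaries
apple_query_result = mosaicllm.run(query=q)

print("original query summary:")
print(apple_query_result.summary_query)
print("\nimproved query summary:")
print(apple_query_result.summary_clarified_query)
print("\nsubquery_1-3 summary:")
print(apple_query_result.summary_subqueries)
```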