
Merge pull request #27 from ittia-research/dev
Update docs and infra
etwk authored Sep 10, 2024
2 parents dcac942 + b1279d7 commit 80e6c3c
Showing 4 changed files with 144 additions and 96 deletions.
110 changes: 26 additions & 84 deletions README.md
@@ -2,20 +2,36 @@
True, false, or just opinions? Maybe not binary, but a percentage.

Fact-checking tools to combat disinformation.

## How It Works
Extract a list of statements from the given text.
For each statement, search via a search engine and read the top URLs.
Treat each hostname as one source, and extract the most related info from the content read.
For each source, generate one verdict and citation from the extracted content.
Combine all verdicts for one statement into a final verdict.
Return a list of statements with verdicts, citations, and other related info.
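The steps above can be sketched as a Python pipeline. The stage functions passed in (`extract_statements`, `search_urls`, `read_url`, `extract_related`, `judge`) are placeholders for illustration, not the project's actual API:

```python
# A sketch of the fact-checking pipeline described above. All stage
# functions are hypothetical placeholders, not the project's real names.

def check(text, extract_statements, search_urls, read_url, extract_related, judge):
    """Return one result per statement, each with one verdict per source (hostname)."""
    results = []
    for statement in extract_statements(text):
        # Group the top search results by hostname: one hostname = one source.
        by_host = {}
        for url in search_urls(statement):
            host = url.split("/")[2]
            by_host.setdefault(host, []).append(url)
        verdicts = []
        for host, urls in by_host.items():
            content = " ".join(read_url(u) for u in urls)
            related = extract_related(statement, content)  # keep only related info
            verdicts.append(judge(statement, related))     # one verdict per source
        results.append({"statement": statement, "verdicts": verdicts})
    return results
```

Each stage can then be swapped for a real implementation (LLM extraction, a search API, a reader) without changing the overall flow.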

## Get Started
Online demo: `https://check.ittia.net`
Online demo: https://check.ittia.net

Using existing API: https://github.com/ittia-research/check/tree/main/packages/ittia_check

Use the pip package `ittia-check` to connect to the API: https://github.com/ittia-research/check/tree/main/packages/ittia_check
### Self-hosting API Server
Main components:
- Check server: see docker-compose.yml
- LLM: any OpenAI compatible API, self-hosting via vllm or Ollama
- Embedding: self-hosting via Ollama or Infinity
- Rerank: self-hosting via Infinity
- Search: https://search.ittia.net

API docs: `https://check.ittia.net/docs`
### Other Tools
- Start a wiki_dpr retrieval server (ColBERTv2) for development: https://github.com/ittia-research/check/tree/main/datasets/wiki_dpr

### Search backend
- Using `search.ittia.net` for better optimization.
- API doc: `https://search.ittia.net/docs`
- Features:
- Customizable source count.
- Supports search sessions: streaming, resuming.
- Utilizes state-of-the-art search engine (currently Google).
- Using `search.ittia.net` for better optimization.
- Features:
- Customizable source count.
- Supports search sessions: streaming, resuming.
- Utilizes state-of-the-art search engine (currently Google).

## Design
Input something.
@@ -40,88 +56,14 @@
Input types:
Verdicts:
- false
- true
- tie: the counts of false and true verdicts are equal and above zero
- irrelevant: the processed context is irrelevant to the statement
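A minimal sketch of how these rules might combine per-source verdicts into a final one; the order in which the rules are checked is an assumption based on the list above, not the project's actual code:

```python
from collections import Counter

def final_verdict(source_verdicts):
    """Combine per-source verdicts ("true"/"false") into a final verdict (sketch)."""
    counts = Counter(source_verdicts)
    true_n, false_n = counts["true"], counts["false"]
    if true_n == 0 and false_n == 0:
        return "irrelevant"  # no source was relevant to the statement
    if true_n == false_n:
        return "tie"         # equal, non-zero counts of true and false
    return "true" if true_n > false_n else "false"
```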

## Todo
### Frontend
- [ ] API: input a string or URL, output analysis
- [ ] Optional more detailed output: correction, explanation, references

### Backend
- [ ] Get a list of facts from the input; improve performance
- [ ] Get search results for each fact and check whether it is true or false
- [ ] Get weight of facts and opinions
- [ ] Compare different search engines.
- [ ] Add support for URL input
- [ ] Performance benchmark.

LLM
- [ ] Better way to handle LLM output formatting: list, JSON.

Embedding:
- [ ] Optimize chunk size

Contexts
- [ ] Filter out unrelated contexts before sending for verdict

Retrieval
- [ ] Retrieve the latest info when facts might change

### Pipeline
DSPy:
- [ ] choose the right LLM temperature
- [ ] better training datasets

### Retrieval
- [ ] Better retrieval solution: high performance, concurrency, multiple index, index editable.
- [ ] Getting more sources when needed.

### Verdict
- [ ] Set final verdict standards.

### Toolchain
- [ ] Evaluate MLOps pipeline
- https://kitops.ml
- [ ] Evaluate data quality of searching and URL fetching. Better error handling.
- [ ] Use multiple sources for fact-check.

### Stability
- [ ] Stress test.

### Extend
- [ ] To other types of media: image, audio, video, etc.
- [ ] Shall we try to answer questions if provided?
- [ ] Multi-language support.
- [ ] Add logging and long-term memory.
- [ ] Integrate with other fact-check services.

### Calculate
- [ ] Shall we calculate the percentage of true and false in the input? Any better calculation than item counts?

### Logging
- [ ] Full logging on chain of events for re-producing and debugging.

### Checkout
- [ ] Chroma #retrieve

## Issues
- [ ] Uses many different types of models, which makes performance optimization and maintenance difficult.
- [ ] LLM verdicts sometimes wrongly contradict the provided context.

## References
### Reports
- [ ] AI-generated misinformation

### Fact-check
- https://www.snopes.com
- https://www.bmi.bund.de/SharedDocs/schwerpunkte/EN/disinformation/examples-of-russian-disinformation-and-the-facts.html

### Resources
Inference:
- https://console.groq.com/docs/ (free tier)

Search and fetch:
- https://jina.ai/read

## Acknowledgements
- TPU Research Cloud team at Google
- Google Search
44 changes: 35 additions & 9 deletions docker-compose.yml
@@ -8,15 +8,7 @@
services:
- ./infra/env.d/check
ports:
- 8000:8000
restart: always

ollama:
image: ollama/ollama
container_name: ollama
ports:
- "11434:11434"
volumes:
- /data/volumes/ollama:/root/.ollama
# Remove the GPU section below if not running inference locally
deploy:
resources:
reservations:
@@ -26,6 +18,21 @@
capabilities: [gpu]
restart: always

# Using vllm for LLM inference on Google TPU
# Tested on Google v4-8
vllm:
image: ittia/vllm:0.6.0-tpu
privileged: true
ports:
- "8010:8010"
shm_size: 128G
volumes:
- /mnt/cache:/root/.cache
env_file:
- ./env.d/huggingface
command: vllm serve mistralai/Mistral-Nemo-Instruct-2407 --tensor-parallel-size 4 --port 8010 --trust-remote-code --max-model-len 12288
restart: always

# Infinity supports embedding and rerank models; the v2 version supports serving multiple models
infinity:
image: michaelf34/infinity:latest
@@ -46,3 +53,22 @@
count: all
capabilities: [gpu]
restart: always

# Services below are not actively in use.
# Kept here for reference.

ollama:
image: ollama/ollama
container_name: ollama
ports:
- "11434:11434"
volumes:
- /data/volumes/ollama:/root/.ollama
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
restart: always
78 changes: 78 additions & 0 deletions docs/to-do.md
@@ -0,0 +1,78 @@
## Roadmap
- [ ] Check one line containing a single statement.
- [ ] Check long paragraphs or a content source (URL, file, etc.)
- [ ] What are the ultimate goals?
- [ ] Long-term memory and caching.
- [ ] Fact-check standards and database.

## Work
### Frontend
- [ ] API: input a string or URL, output analysis
- [ ] Optional more detailed output: correction, explanation, references

### Backend
- [ ] Get a list of facts from the input; improve performance
- [ ] Get search results for each fact and check whether it is true or false
- [ ] Get weight of facts and opinions
- [ ] Compare different search engines.
- [ ] Add support for URL input
- [ ] Performance benchmark.

LLM
- [ ] Better way to handle LLM output formatting: list, JSON.

Embedding:
- [ ] Optimize chunk size
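For reference, a common baseline against which chunk size could be tuned is fixed-size chunking with overlap; the sizes below are illustrative defaults, not project settings:

```python
def chunk(text, size=512, overlap=64):
    """Split text into fixed-size chunks with overlap (sketch; tune size/overlap)."""
    assert 0 <= overlap < size
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` chars of context
    return chunks
```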

Contexts
- [ ] Filter out unrelated contexts before sending for verdict

Retrieval
- [ ] Retrieve the latest info when facts might change

### Pipeline
DSPy
- [ ] choose the right LLM temperature
- [ ] better training datasets

### Retrieval
- [ ] Better retrieval solution: high performance, concurrency, multiple index, index editable.
- [ ] Getting more sources when needed.

### Verdict
- [ ] Set final verdict standards.

### Toolchain
- [ ] Evaluate MLOps pipeline: https://kitops.ml
- [ ] Evaluate data quality of searching and URL fetching. Better error handling.
- [ ] Use multiple sources for fact-check.

### Infra
- [ ] Stress test
- [ ] Meaningful health endpoint
- [ ] Monitoring service health

### Calculate
- [ ] Shall we calculate the percentage of true and false in the input? Any better calculation than item counts?
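One possible calculation for the question above: count only true/false items, with optional per-statement weights (e.g. by source count). This is an illustration of the open question, not a settled design:

```python
def truth_percentage(verdicts, weights=None):
    """Percentage of 'true' among true/false verdicts; ties and irrelevant
    items are excluded. Weights are an assumed alternative to raw counts."""
    weights = weights or [1.0] * len(verdicts)
    true_w = sum(w for v, w in zip(verdicts, weights) if v == "true")
    false_w = sum(w for v, w in zip(verdicts, weights) if v == "false")
    total = true_w + false_w
    return None if total == 0 else 100.0 * true_w / total
```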

### Logging
- [ ] Full logging on chain of events for re-producing and debugging.

### Issues
- [ ] Uses many different types of models, which makes performance optimization and maintenance difficult.
- [ ] LLM verdicts sometimes wrongly contradict the provided context.

### Data
- [ ] A standard for saving fact-check-related data.
- [ ] Store fact-check data according to standards.

### Research
- [ ] Chroma #retrieve
- [ ] AI-generated misinformation

### Extend
- [ ] To other types of media: image, audio, video, etc.
- [ ] Shall we try to answer questions if provided?
- [ ] Multi-language support.
- [ ] Add logging and long-term memory.
- [ ] Integrate with other fact-check services.
8 changes: 5 additions & 3 deletions infra/env.d/infinity
@@ -1,4 +1,6 @@
INFINITY_API_KEY=<CHANGE_ME>
INFINITY_LOG_LEVEL=trace
INFINITY_MODEL_ID=jinaai/jina-reranker-v2-base-multilingual

INFINITY_LOG_LEVEL=debug
INFINITY_MODEL_ID="jinaai/jina-embeddings-v2-base-en;jinaai/jina-reranker-v2-base-multilingual;"
INFINITY_MODEL_WARMUP=false
# batch size: small to save VRAM, big to improve performance
INFINITY_BATCH_SIZE=8
