Merge pull request #27 from ittia-research/dev
Update docs and infra
Showing 4 changed files with 144 additions and 96 deletions.
@@ -0,0 +1,78 @@
## Roadmap
- [ ] Check a single one-line statement.
- [ ] Check long paragraphs or a content source (URL, file, etc.).
- [ ] What are the ultimate goals?
- [ ] Long-term memory and caching.
- [ ] Fact-check standards and database.

## Work
### Frontend
- [ ] API: input a string or URL, output an analysis.
- [ ] Optional, more detailed output: correction, explanation, references.

### Backend
- [ ] Get the list of facts from the input; improve performance.
- [ ] Get search results for each fact and check whether it is true or false.
- [ ] Get the weight of facts and opinions.
- [ ] Compare different search engines.
- [ ] Add support for URL input.
- [ ] Performance benchmark.

LLM
- [ ] Better way to handle LLM output formatting: list, JSON.

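One possible way to approach the LLM output formatting item, sketched with the standard library only: try to read a JSON array first, then fall back to bullet or numbered lines. The function name `parse_llm_list` is hypothetical and not part of this repository, and the fallback is deliberately naive (any non-empty line counts as an item).

```python
import json
import re


def parse_llm_list(text: str) -> list[str]:
    """Extract a list of strings from raw LLM output (hypothetical helper).

    Tries a JSON array first, then falls back to one item per line.
    """
    # Take the span from the first '[' to the last ']', which tolerates
    # prose or code fences wrapped around the JSON.
    start, end = text.find("["), text.rfind("]")
    if start != -1 and end > start:
        try:
            data = json.loads(text[start : end + 1])
            if isinstance(data, list):
                return [str(item).strip() for item in data if str(item).strip()]
        except json.JSONDecodeError:
            pass  # not valid JSON; fall through to line-based parsing

    items = []
    for line in text.splitlines():
        # Strip a leading bullet ("-", "*", "•") or numbering ("1.", "2)").
        cleaned = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s+", "", line).strip()
        if cleaned:
            items.append(cleaned)
    return items
```

A stricter variant could reject output that is neither valid JSON nor a clean list and ask the model to retry; which behaviour is preferable depends on how the pipeline handles failures.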
Embedding
- [ ] Optimize chunk size.

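To compare chunk sizes, a small word-based chunker with configurable overlap may be enough as a starting point. This is only a sketch: `chunk_words` is a hypothetical helper, and splitting on whitespace is an assumption (the real pipeline may chunk by tokens or sentences).

```python
def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `chunk_size` words (hypothetical helper).

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in one of them.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start : start + chunk_size]))
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Running the same evaluation set through several `chunk_size` and `overlap` values and comparing retrieval quality would then give a basis for choosing one.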
Contexts
- [ ] Filter out unrelated contexts before sending them for a verdict.

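A minimal sketch of context filtering, assuming each context already has an embedding vector: keep only contexts whose cosine similarity to the claim exceeds a threshold. The names `cosine` and `filter_contexts` and the default threshold of 0.5 are illustrative assumptions; scores from a reranker model could be used in place of cosine similarity.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_contexts(
    claim_vec: list[float],
    contexts: list[tuple[str, list[float]]],
    min_score: float = 0.5,  # arbitrary threshold, to be tuned
) -> list[str]:
    """Return context texts scoring at least `min_score`, best first."""
    scored = [(cosine(claim_vec, vec), text) for text, vec in contexts]
    kept = [pair for pair in scored if pair[0] >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in kept]
```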
Retrieval
- [ ] Retrieve the latest information when facts might change.

### Pipeline
DSPy
- [ ] Choose the right LLM temperature.
- [ ] Better training datasets.

### Retrieval
- [ ] Better retrieval solution: high performance, concurrency, multiple indexes, editable index.
- [ ] Get more sources when needed.

### Verdict
- [ ] Set final verdict standards.

### Toolchain
- [ ] Evaluate MLOps pipeline: https://kitops.ml
- [ ] Evaluate the data quality of searching and URL fetching. Better error handling.
- [ ] Use multiple sources for fact-checking.

### Infra
- [ ] Stress test.
- [ ] Meaningful health endpoint.
- [ ] Monitor service health.

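One reading of a "meaningful" health endpoint is that it reports the state of each dependency instead of a bare 200. The sketch below is framework-agnostic; `health_report` and the check names in the usage note are hypothetical, and each check is assumed to be a zero-argument callable returning a truthy value when healthy.

```python
import time
from typing import Callable


def health_report(checks: dict[str, Callable[[], bool]]) -> dict:
    """Run every dependency check and summarise the results (hypothetical helper)."""
    results = {}
    for name, check in checks.items():
        started = time.perf_counter()
        try:
            ok, error = bool(check()), None
        except Exception as exc:  # a failing check must not crash the endpoint
            ok, error = False, str(exc)
        results[name] = {
            "ok": ok,
            "latency_ms": round((time.perf_counter() - started) * 1000, 2),
            "error": error,
        }
    # Overall status is "ok" only when every individual check passed.
    status = "ok" if all(r["ok"] for r in results.values()) else "degraded"
    return {"status": status, "checks": results}
```

A web handler could call `health_report({"llm": ping_llm, "search": ping_search})` (both callables are placeholders) and return the dictionary as JSON, which a monitoring service could then poll.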
### Calculate
- [ ] Should we calculate the percentage of true and false statements in the input? Is there a better calculation than item count?

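As one alternative to a plain item count, each statement could carry a weight and the result could be the share of total weight per verdict. This is only a sketch of the arithmetic: `verdict_shares` is a hypothetical helper, and how weights would be assigned is left open.

```python
def verdict_shares(items: list[tuple[str, float]]) -> dict[str, float]:
    """Return the percentage of total weight per verdict label.

    `items` holds (verdict, weight) pairs; with all weights equal to 1.0
    this reduces to a plain item-count percentage.
    """
    total = sum(weight for _, weight in items)
    if total <= 0:
        return {}  # nothing to score
    sums: dict[str, float] = {}
    for verdict, weight in items:
        sums[verdict] = sums.get(verdict, 0.0) + weight
    return {verdict: round(100 * weight / total, 1) for verdict, weight in sums.items()}
```

With equal weights, two true and two false statements give 50% each; if a single true statement carries weight 3.0 against a false one with weight 1.0, the split becomes 75% and 25%.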
### Logging
- [ ] Full logging of the chain of events for reproducing and debugging.

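A rough sketch of what chain-of-events logging could look like: every pipeline step records an event under a shared run ID, and the run can be dumped as JSON Lines for later replay. The class `EventLog` and the step names used with it are assumptions for illustration, not existing code.

```python
import json
import time
import uuid


class EventLog:
    """Collects ordered pipeline events under one run ID (hypothetical helper)."""

    def __init__(self) -> None:
        self.run_id = uuid.uuid4().hex
        self.events: list[dict] = []

    def record(self, step: str, **payload) -> None:
        # `seq` preserves ordering even if two timestamps are identical.
        # Payload values must be JSON-serialisable for dump() to succeed.
        self.events.append(
            {
                "run_id": self.run_id,
                "seq": len(self.events),
                "ts": time.time(),
                "step": step,
                "payload": payload,
            }
        )

    def dump(self) -> str:
        """Serialise all events as JSON Lines, one event per line."""
        return "\n".join(json.dumps(e, ensure_ascii=False) for e in self.events)
```

Recording the inputs and outputs of each step (extracted statements, search queries, retrieved contexts, verdict) under one `run_id` would make it possible to reproduce a single fact-check end to end.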
### Issues
- [ ] Uses many different types of models, which makes performance optimization and maintenance difficult.
- [ ] LLM verdict is wrong, contradicting the provided context.

### Data
- [ ] Define a standard for saving fact-check-related data.
- [ ] Store fact-check data according to that standard.

### Research
- [ ] Chroma #retrieve
- [ ] AI-generated misinformation

### Extend
- [ ] Extend to other types of media: image, audio, video, etc.
- [ ] Should we try to answer questions if provided?
- [ ] Multi-language support.
- [ ] Add logging and long-term memory.
- [ ] Integrate with other fact-check services.
@@ -1,4 +1,6 @@
 INFINITY_API_KEY=<CHANGE_ME>
-INFINITY_LOG_LEVEL=trace
-INFINITY_MODEL_ID=jinaai/jina-reranker-v2-base-multilingual
-
+INFINITY_LOG_LEVEL=debug
+INFINITY_MODEL_ID="jinaai/jina-embeddings-v2-base-en;jinaai/jina-reranker-v2-base-multilingual;"
+INFINITY_MODEL_WARMUP=false
+# batch size: small to save VRAM, big to improve performance
+INFINITY_BATCH_SIZE=8
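The updated `INFINITY_MODEL_ID` value appears to pack two model IDs into one semicolon-separated string with a trailing semicolon. The sketch below shows how such a value splits into individual IDs, for example to validate the file before starting the service. The helper `model_ids_from_env` is hypothetical, and the semicolon-list interpretation is an assumption based on the value shown above.

```python
import os
from typing import Mapping, Optional


def model_ids_from_env(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Split INFINITY_MODEL_ID into a list of model IDs (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("INFINITY_MODEL_ID", "")
    # Drop surrounding quotes and the empty entry left by a trailing ';'.
    return [part.strip() for part in raw.strip('"').split(";") if part.strip()]
```

With the value from the diff, this yields the embedding model followed by the reranker model.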