
Leveraging large language models with RAG and classic machine learning to create knowledge graphs and conceptual hierarchies


zhutchens/rag-kg-generation



Overview

Ongoing research project investigating the use of AI-enabled RAG systems to build knowledge graphs and conceptual hierarchies. Current functionality includes identifying concepts, outcomes, main topics, and the relationships between main topics in text sections/chapters; building and visualizing knowledge graphs and hierarchies; and evaluating the concepts and outcomes produced by the LLM (when ground-truth values are available).
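The evaluation step is only described at a high level here. As a rough illustration of what comparing generated concepts with ground truth can look like, the sketch below computes set-based precision, recall, and F1 using exact, case-insensitive string matching. The function name `concept_scores` and the matching rule are assumptions for illustration only; they are not the metric classes shipped in this repository, which may score matches differently (for example semantically or with an LLM judge).

```python
def concept_scores(predicted, ground_truth):
    """Precision, recall and F1 of predicted concepts against ground truth.

    Hypothetical helper: concepts are compared by exact, case-insensitive
    string match after stripping whitespace.
    """
    def normalize(items):
        return {s.strip().lower() for s in items if s.strip()}

    pred, gold = normalize(predicted), normalize(ground_truth)
    hits = len(pred & gold)  # concepts found in both sets
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


p, r, f = concept_scores(
    ["Binary Search", "recursion", "hash tables"],
    ["binary search", "Recursion", "graphs", "sorting"],
)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 0.5 0.57
```

Exact matching is strict: "hash table" and "hash tables" count as different concepts, which is one reason real evaluations often use fuzzier matching.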

The knowledge graphs and conceptual hierarchies are only prototypes and are still in development, so please don't expect anything extravagant yet. However, in our testing and observations, the functions that generate concepts and outcomes from text seem to work well.
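To make the idea of a conceptual hierarchy concrete, here is a minimal sketch that turns (parent, child) topic relationships into a nested tree and prints it with indentation. The helper names `build_hierarchy` and `print_tree` are hypothetical, and the sketch assumes the relationships contain no cycles; it is not how this project builds its hierarchies internally.

```python
from collections import defaultdict


def build_hierarchy(edges):
    """Turn (parent, child) topic pairs into a nested dict tree.

    Hypothetical helper: assumes the relationships form a forest
    (no cycles); topics that never appear as a child become roots.
    """
    children = defaultdict(list)
    child_nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        child_nodes.add(child)
    roots = [node for node in children if node not in child_nodes]

    def subtree(node):
        # .get avoids inserting new keys into the defaultdict
        return {c: subtree(c) for c in children.get(node, [])}

    return {root: subtree(root) for root in roots}


def print_tree(tree, depth=0):
    """Print each topic indented two spaces per level."""
    for node, sub in tree.items():
        print("  " * depth + node)
        print_tree(sub, depth + 1)


hierarchy = build_hierarchy([
    ("Data Structures", "Trees"),
    ("Data Structures", "Graphs"),
    ("Trees", "Binary Search Trees"),
])
print_tree(hierarchy)
# Data Structures
#   Trees
#     Binary Search Trees
#   Graphs
```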

We also offer language model, metric, and retriever classes that can be used standalone. You'll need some text (raw text, files, or links) to get started with these.
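As a rough picture of what a retriever does in a RAG setup, the sketch below ranks text chunks against a query and returns the top k. To stay dependency-free it uses bag-of-words cosine similarity; this is a deliberate simplification and an assumption, since the project mentions sentence transformers and its retriever classes are likely embedding-based. The class name `BagOfWordsRetriever` and its `retrieve` method are hypothetical, not this repository's API.

```python
import math
import re
from collections import Counter


def _vectorize(text):
    """Lowercased word counts (a crude bag-of-words representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class BagOfWordsRetriever:
    """Hypothetical stand-in for a retriever: ranks chunks by cosine similarity."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.vectors = [_vectorize(c) for c in self.chunks]

    def retrieve(self, query, k=2):
        q = _vectorize(query)
        scored = sorted(
            zip(self.chunks, self.vectors),
            key=lambda cv: _cosine(q, cv[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in scored[:k]]


retriever = BagOfWordsRetriever([
    "A binary search tree keeps keys in sorted order.",
    "Hash tables map keys to values using a hash function.",
    "Dijkstra's algorithm finds shortest paths in weighted graphs.",
])
top = retriever.retrieve("shortest paths in a graph", k=1)
print(top)  # ["Dijkstra's algorithm finds shortest paths in weighted graphs."]
```

The retrieved chunks would then be placed into the LLM prompt as context; swapping `_vectorize` for a sentence-embedding model is the usual next step.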

The main class, RAGKGGenerator, needs a language model and some text to get started. Although it's optional, you can also provide a course syllabus, a retriever type, and a sentence transformer if you don't want the defaults.

If you only wish to build a knowledge graph, we offer two main functions to do this for you:

One of them, text_pipeline(), generates a knowledge graph from the provided text. It takes no arguments; you only need to instantiate a RAGKGGenerator to use it.

The other, syllabus_pipeline(), uses the textbook and the syllabus to generate a knowledge graph. It also takes no arguments, but for it to work you must provide a syllabus when instantiating the RAGKGGenerator class.
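The internals of the pipelines aren't described here, but a knowledge graph ultimately reduces to nodes joined by labeled relationships. The sketch below shows one plausible in-memory form: (subject, relation, object) triples grouped into an adjacency map, with duplicates dropped. The helper `build_knowledge_graph` and the triple format are assumptions for illustration; the actual pipelines also render an HTML visualization, which this sketch leaves out.

```python
from collections import defaultdict


def build_knowledge_graph(triples):
    """Group (subject, relation, object) triples into an adjacency map.

    Hypothetical helper: each node maps to a list of (relation, target)
    edges; repeated triples are stored only once.
    """
    graph = defaultdict(list)
    seen = set()
    for subj, rel, obj in triples:
        if (subj, rel, obj) in seen:
            continue
        seen.add((subj, rel, obj))
        graph[subj].append((rel, obj))
        graph.setdefault(obj, [])  # make sure leaf nodes are present too
    return dict(graph)


kg = build_knowledge_graph([
    ("Sorting", "includes", "Merge Sort"),
    ("Merge Sort", "uses", "Recursion"),
    ("Sorting", "includes", "Merge Sort"),  # duplicate, ignored
])
for node, edges in kg.items():
    for rel, target in edges:
        print(f"{node} --{rel}--> {target}")
# Sorting --includes--> Merge Sort
# Merge Sort --uses--> Recursion
```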

Examples

To build a knowledge graph from text:

gen = RAGKGGenerator(
    chapters = <chapters>, # put your text sections/chapters here
    llm = <your llm>, # your LLM; it MUST inherit from DeepEvalLLM (three prebuilt LLMs are provided in src.llms)
    texts = <your text here>, # either a list or a single item (raw text, files, and links are fine)
)

kg = gen.text_pipeline() # this generates an HTML file in ./visualizations

Building a knowledge graph from a syllabus and text is very similar; you just need to provide a syllabus as well.

gen = RAGKGGenerator(
    chapters = <chapters>, # put your text sections/chapters here
    llm = <your llm>, # your LLM; it MUST inherit from DeepEvalLLM (three prebuilt LLMs are provided in src.llms)
    texts = <your text here>, # either a list or a single item (raw text, files, and links are fine)
    syllabus = <syllabus here>, # text, file, or link
)

kg = gen.syllabus_pipeline() # this generates an HTML file in ./visualizations

If you would like to use the functions independently:

gen = RAGKGGenerator(
    chapters = <chapters>, # put your text sections/chapters here
    llm = <your llm>, # your LLM; it MUST inherit from DeepEvalLLM (three prebuilt LLMs are provided in src.llms)
    texts = <your text here>, # either a list or a single item (raw text, files, and links are fine)
    syllabus = <syllabus here>, # text, file, or link
)

gen.identify_concepts(5) # generates 5 concepts per text section/chapter
gen.identify_outcomes(5) # generates 5 outcomes per text section/chapter
gen.summarize() # summarizes the text, if it is short enough to fit in the LLM's context window
gen.objectives_from_syllabus() # gets course objectives from provided syllabus
...
and more!
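For intuition about what a call like gen.identify_concepts(5) involves, the sketch below loops over chapters, prompts a model for n concepts per chapter, and parses a comma-separated reply. It is a simplified stand-in: the function `identify_concepts` here, the plain `generate(prompt) -> str` callable, the prompt wording, and the comma-separated reply format are all assumptions, and it skips retrieval entirely. The real method uses the configured LLM class and RAG, and its prompts and parsing may differ.

```python
from typing import Callable, Dict, List


def identify_concepts(chapters: Dict[str, str],
                      generate: Callable[[str], str],
                      n: int = 5) -> Dict[str, List[str]]:
    """Ask a language model for n concepts per chapter.

    Hypothetical helper: `generate` is any function that takes a prompt
    and returns the model's text reply; the reply is assumed to be a
    comma-separated list.
    """
    results = {}
    for title, text in chapters.items():
        prompt = (f"List the {n} most important concepts in the following "
                  f"section as a comma-separated list.\n\n{title}\n{text}")
        reply = generate(prompt)
        concepts = [c.strip() for c in reply.split(",") if c.strip()]
        results[title] = concepts[:n]  # models sometimes return extra items
    return results


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return "arrays, linked lists, stacks, queues, trees, heaps"


concepts = identify_concepts(
    {"Chapter 1": "Basic data structures and their operations."}, fake_llm, n=5
)
print(concepts)
# {'Chapter 1': ['arrays', 'linked lists', 'stacks', 'queues', 'trees']}
```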

Environment Setup

Using conda:

conda env create -f environment.yml

Activation and installation

On Windows and Linux, activate the conda virtual environment using:

conda activate rag_kg_generation
