@@ -0,0 +1 @@
{"result": {"response": {"id": "BaBTafzEPOiejuMP-KO-kAY", "created": 1767088122, "model": "gemini-2.5-flash-lite", "object": "chat.completion", "system_fingerprint": null, "choices": [{"finish_reason": "stop", "index": 0, "message": {"content": "{\n \"title\": \"GraphRAG Ecosystem and LLM Integration\",\n \"summary\": \"The community centers around GraphRAG, a graph-based retrieval-augmented generation system developed by Microsoft Research. GraphRAG's primary function is to enhance Large Language Model (LLM) outputs by extracting structured data from unstructured text using knowledge graphs. It directly supports major LLM providers such as OpenAI and Google Gemini, indicating a significant role in advancing LLM capabilities. The core technology relies on knowledge graphs for its data extraction and enhancement processes.\",\n \"findings\": [\n {\n \"summary\": \"GraphRAG as a Central System\",\n \"explanation\": \"GraphRAG is the central entity in this community, defined as a graph-based retrieval-augmented generation system. Its purpose is to improve LLM outputs by leveraging knowledge graphs to extract structured data from unstructured text [Data: Entities (0)]. This system's architecture and functionality are key to understanding the interactions within this community. The system's high degree (5) suggests it is a significant node with many connections and potential influence within its domain [Data: Entities (0)].\"\n },\n {\n \"summary\": \"Microsoft Research's Development Role\",\n \"explanation\": \"Microsoft Research is identified as the developer of the GraphRAG system. This relationship highlights Microsoft's involvement in cutting-edge AI research, specifically in the area of enhancing LLM performance through graph-based methods [Data: Entities (1), Relationships (0)]. The combined degree of 6 for this relationship indicates a strong and significant connection between Microsoft Research and GraphRAG, suggesting substantial investment or strategic importance [Data: Relationships (0)].\"\n },\n {\n \"summary\": \"Support for Major LLM Providers\",\n \"explanation\": \"GraphRAG directly supports prominent LLM providers, including OpenAI and Google Gemini. This indicates that GraphRAG is a foundational technology enabling or improving the capabilities of these major AI players [Data: Entities (2), Entities (3), Relationships (1), Relationships (2)]. The shared combined degree of 6 for these relationships underscores the critical nature of GraphRAG's support for these LLM providers, suggesting a deep integration or dependency [Data: Relationships (1), Relationships (2)].\"\n },\n {\n \"summary\": \"Enhancement of Large Language Models (LLMs)\",\n \"explanation\": \"A primary function of GraphRAG is to enhance LLM outputs. This is achieved by integrating knowledge graphs, which allow for the extraction of structured data from unstructured text, thereby providing LLMs with richer context and more accurate information [Data: Entities (4), Entities (5), Relationships (3), Relationships (4)]. The relationship between GraphRAG and LLMs, with a combined degree of 6, signifies a core operational link and highlights GraphRAG's role in advancing the general capabilities of LLMs [Data: Relationships (3)].\"\n },\n {\n \"summary\": \"Reliance on Knowledge Graphs\",\n \"explanation\": \"Knowledge Graphs are a fundamental component of the GraphRAG system. 
They are utilized to extract structured data from unstructured text, which is then used to enhance LLM outputs [Data: Entities (4), Relationships (4)]. The description of Knowledge Graphs emphasizes their role in structuring information, which is crucial for GraphRAG's retrieval-augmented generation process. This dependency indicates that the effectiveness and capabilities of GraphRAG are directly tied to the quality and structure of the knowledge graphs it employs [Data: Entities (4), Relationships (4)].\"\n }\n ],\n \"rating\": 7.5,\n \"rating_explanation\": \"The impact severity rating is high due to GraphRAG's foundational role in enhancing major LLM providers and its reliance on knowledge graphs, indicating significant potential influence and dependency in the AI landscape.\"\n}", "role": "assistant", "tool_calls": null, "function_call": null, "images": [], "thinking_blocks": [], "provider_specific_fields": null}}], "usage": {"completion_tokens": 815, "prompt_tokens": 2337, "total_tokens": 3152, "completion_tokens_details": null, "prompt_tokens_details": {"audio_tokens": null, "cached_tokens": null, "text_tokens": 2337, "image_tokens": null}}, "vertex_ai_grounding_metadata": [], "vertex_ai_url_context_metadata": [], "vertex_ai_safety_results": [], "vertex_ai_citation_metadata": []}}}
@@ -0,0 +1 @@
{"result": {"response": {"id": "_p9TaZ2HNIK_g8UPzOPvgAI", "created": 1767088116, "model": "gemini-2.5-flash-lite", "object": "chat.completion", "system_fingerprint": null, "choices": [{"finish_reason": "stop", "index": 0, "message": {"content": "(\"entity\"<|>GRAPHRAG<|>ORGANIZATION<|>GraphRAG is a graph-based retrieval-augmented generation system that uses knowledge graphs to enhance LLM outputs by extracting structured data from unstructured text.)\n##\n(\"entity\"<|>MICROSOFT RESEARCH<|>ORGANIZATION<|>Microsoft Research developed the GraphRAG system.)\n##\n(\"entity\"<|>OPENAI<|>ORGANIZATION<|>OpenAI is an LLM provider supported by GraphRAG.)\n##\n(\"entity\"<|>GOOGLE GEMINI<|>ORGANIZATION<|>Google Gemini is an LLM provider supported by GraphRAG.)\n##\n(\"relationship\"<|>GRAPHRAG<|>MICROSOFT RESEARCH<|>GraphRAG was developed by Microsoft Research.<|>9)\n##\n(\"relationship\"<|>GRAPHRAG<|>OPENAI<|>GraphRAG supports OpenAI as an LLM provider.<|>7)\n##\n(\"relationship\"<|>GRAPHRAG<|>GOOGLE GEMINI<|>GraphRAG supports Google Gemini as an LLM provider.<|>7)\n<|COMPLETE|>", "role": "assistant", "tool_calls": null, "function_call": null, "images": [], "thinking_blocks": [], "provider_specific_fields": null}}], "usage": {"completion_tokens": 225, "prompt_tokens": 1820, "total_tokens": 2045, "completion_tokens_details": null, "prompt_tokens_details": {"audio_tokens": null, "cached_tokens": null, "text_tokens": 1820, "image_tokens": null}}, "vertex_ai_grounding_metadata": [], "vertex_ai_url_context_metadata": [], "vertex_ai_safety_results": [], "vertex_ai_citation_metadata": []}}}
@@ -0,0 +1 @@
{"result": {"response": {"id": "AKBTaYfTI4uag8UPjcyRmQY", "created": 1767088119, "model": "gemini-2.5-flash-lite", "object": "chat.completion", "system_fingerprint": null, "choices": [{"finish_reason": "stop", "index": 0, "message": {"content": "(\"entity\"<|>LLM<|>ORGANIZATION<|>LLM refers to Large Language Models, which GraphRAG aims to enhance.)\n##\n(\"entity\"<|>KNOWLEDGE GRAPH<|>ORGANIZATION<|>Knowledge Graphs are used by GraphRAG to extract structured data from unstructured text and enhance LLM outputs.)\n##\n(\"relationship\"<|>GRAPHRAG<|>LLM<|>GraphRAG is designed to enhance LLM outputs.<|>8)\n##\n(\"relationship\"<|>GRAPHRAG<|>KNOWLEDGE GRAPH<|>GraphRAG uses knowledge graphs to extract structured data and enhance LLM outputs.<|>9)\n<|COMPLETE|>", "role": "assistant", "tool_calls": null, "function_call": null, "images": [], "thinking_blocks": [], "provider_specific_fields": null}}], "usage": {"completion_tokens": 138, "prompt_tokens": 2081, "total_tokens": 2219, "completion_tokens_details": null, "prompt_tokens_details": {"audio_tokens": null, "cached_tokens": null, "text_tokens": 2081, "image_tokens": null}}, "vertex_ai_grounding_metadata": [], "vertex_ai_url_context_metadata": [], "vertex_ai_safety_results": [], "vertex_ai_citation_metadata": []}}}

Large diffs are not rendered by default. (3 files)

147 changes: 115 additions & 32 deletions graphrag/cli/query.py
@@ -4,7 +4,10 @@
"""CLI implementation of the query subcommand."""

import asyncio
import contextlib
import io
import sys
import warnings
from pathlib import Path
from typing import TYPE_CHECKING, Any

@@ -18,7 +21,72 @@
if TYPE_CHECKING:
import pandas as pd

# ruff: noqa: T201
# Suppress harmless asyncio cleanup warnings on Windows
warnings.filterwarnings("ignore", message=".*coroutine.*was never awaited")
warnings.filterwarnings("ignore", message=".*Fatal error on SSL transport.*")
warnings.filterwarnings("ignore", message=".*Event loop is closed.*")
warnings.filterwarnings("ignore", category=RuntimeWarning, message=".*coroutine.*")

# Install filtered stderr to suppress SSL cleanup errors
_original_stderr = sys.stderr


class FilteredStderr:
"""Filter stderr to suppress harmless SSL cleanup errors on Windows."""

def __init__(self, original_stderr):
self.original_stderr = original_stderr
self.suppress_file_patterns = [
"sslproto.py",
"proactor_events.py",
]
self.in_ssl_traceback = False

def write(self, message):
msg = message
msg_lower = message.lower()

# Detect start of SSL cleanup exception - check for "Exception ignored" or "Traceback"
if ("exception ignored" in msg_lower or msg_lower.strip().startswith("traceback")) and \
("ssl" in msg_lower or "_sslprotocol" in msg_lower or any(p in msg for p in self.suppress_file_patterns)):
self.in_ssl_traceback = True
return

# If we're in an SSL traceback, suppress everything until blank line
if self.in_ssl_traceback:
# Stop suppressing on blank line (end of traceback)
if message.strip() == "":
self.in_ssl_traceback = False
# Continue suppressing traceback lines
return

# Also suppress individual SSL/asyncio cleanup error lines that aren't in a traceback
if any(p in msg for p in self.suppress_file_patterns) and \
("runtimeerror" in msg_lower or "attributeerror" in msg_lower or "fatal error" in msg_lower or "event loop is closed" in msg_lower):
return

self.original_stderr.write(message)

def flush(self):
self.original_stderr.flush()

def __getattr__(self, name):
return getattr(self.original_stderr, name)


# Install filtered stderr to suppress SSL cleanup errors during program execution
sys.stderr = FilteredStderr(_original_stderr)


def _run_async_with_cleanup(coro):
"""Run an async coroutine and suppress harmless SSL cleanup errors on Windows."""
result = asyncio.run(coro)
# Small delay to allow async cleanup
try:
asyncio.run(asyncio.sleep(0.1))
except (RuntimeError, AttributeError):
pass
return result


def run_global_search(
@@ -59,21 +127,26 @@ def run_global_search(
final_community_reports_list = dataframe_dict["community_reports"]
index_names = dataframe_dict["index_names"]

response, context_data = asyncio.run(
api.multi_index_global_search(
config=config,
entities_list=final_entities_list,
communities_list=final_communities_list,
community_reports_list=final_community_reports_list,
index_names=index_names,
community_level=community_level,
dynamic_community_selection=dynamic_community_selection,
response_type=response_type,
streaming=streaming,
query=query,
verbose=verbose,
)
)
async def run_with_cleanup():
try:
return await api.multi_index_global_search(
config=config,
entities_list=final_entities_list,
communities_list=final_communities_list,
community_reports_list=final_community_reports_list,
index_names=index_names,
community_level=community_level,
dynamic_community_selection=dynamic_community_selection,
response_type=response_type,
streaming=streaming,
query=query,
verbose=verbose,
)
finally:
# Give time for async cleanup
await asyncio.sleep(0.1)

response, context_data = asyncio.run(run_with_cleanup())
print(response)
return response, context_data

@@ -113,21 +186,31 @@ def on_context(context: Any) -> None:
print()
return full_response, context_data

return asyncio.run(run_streaming_search())
async def run_streaming_with_cleanup():
try:
return await run_streaming_search()
finally:
await asyncio.sleep(0.1)

return asyncio.run(run_streaming_with_cleanup())
# not streaming
response, context_data = asyncio.run(
api.global_search(
config=config,
entities=final_entities,
communities=final_communities,
community_reports=final_community_reports,
community_level=community_level,
dynamic_community_selection=dynamic_community_selection,
response_type=response_type,
query=query,
verbose=verbose,
)
)
async def run_search_with_cleanup():
try:
return await api.global_search(
config=config,
entities=final_entities,
communities=final_communities,
community_reports=final_community_reports,
community_level=community_level,
dynamic_community_selection=dynamic_community_selection,
response_type=response_type,
query=query,
verbose=verbose,
)
finally:
await asyncio.sleep(0.1)

response, context_data = asyncio.run(run_search_with_cleanup())
print(response)

return response, context_data
@@ -416,7 +499,7 @@ def run_basic_search(
final_text_units_list = dataframe_dict["text_units"]
index_names = dataframe_dict["index_names"]

response, context_data = asyncio.run(
response, context_data = _run_async_with_cleanup(
api.multi_index_basic_search(
config=config,
text_units_list=final_text_units_list,
@@ -461,7 +544,7 @@ def on_context(context: Any) -> None:

return asyncio.run(run_streaming_search())
# not streaming
response, context_data = asyncio.run(
response, context_data = _run_async_with_cleanup(
api.basic_search(
config=config,
text_units=final_text_units,
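
Taken together, the changes to this file follow one pattern: wrap `sys.stderr` in a pass-through filter that swallows known-noisy teardown lines, and yield back to the event loop (`await asyncio.sleep(0.1)`) in a `finally` block so SSL transports can finish closing before `asyncio.run` tears the loop down. A minimal, self-contained sketch of that pattern — the names below are illustrative, not the module's:

```python
import asyncio
import sys


class QuietStderr:
    """Pass-through stderr wrapper that drops asyncio/SSL teardown noise."""

    def __init__(self, wrapped):
        self._wrapped = wrapped

    def write(self, message: str) -> None:
        # Swallow the known-noisy teardown lines; forward everything else.
        if "Event loop is closed" in message or "sslproto.py" in message:
            return
        self._wrapped.write(message)

    def __getattr__(self, name):
        # Delegate flush(), fileno(), etc. to the real stream.
        return getattr(self._wrapped, name)


async def with_cleanup(coro):
    try:
        return await coro
    finally:
        # Yield to the loop so transport close callbacks run before shutdown.
        await asyncio.sleep(0.1)


sys.stderr = QuietStderr(sys.stderr)
print(asyncio.run(with_cleanup(asyncio.sleep(0, result="ok"))))  # -> ok
```

The trade-off is that the filter is heuristic: a genuine error whose traceback happens to pass through `sslproto.py` would also be hidden, which is why the patch matches on file names and specific error phrases rather than suppressing stderr wholesale.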
8 changes: 4 additions & 4 deletions graphrag/config/defaults.py
@@ -47,13 +47,13 @@
DEFAULT_OUTPUT_BASE_DIR = "output"
DEFAULT_CHAT_MODEL_ID = "default_chat_model"
DEFAULT_CHAT_MODEL_TYPE = ModelType.Chat
DEFAULT_CHAT_MODEL = "gpt-4-turbo-preview"
DEFAULT_CHAT_MODEL = "gemini-2.5-flash-lite"
DEFAULT_CHAT_MODEL_AUTH_TYPE = AuthType.APIKey
DEFAULT_EMBEDDING_MODEL_ID = "default_embedding_model"
DEFAULT_EMBEDDING_MODEL_TYPE = ModelType.Embedding
DEFAULT_EMBEDDING_MODEL = "text-embedding-3-small"
DEFAULT_EMBEDDING_MODEL = "gemini-embedding-001"
DEFAULT_EMBEDDING_MODEL_AUTH_TYPE = AuthType.APIKey
DEFAULT_MODEL_PROVIDER = "openai"
DEFAULT_MODEL_PROVIDER = "gemini"
DEFAULT_VECTOR_STORE_ID = "default_vector_store"

ENCODING_MODEL = "cl100k_base"
@@ -176,7 +176,7 @@ class EmbedGraphDefaults:
class EmbedTextDefaults:
"""Default values for embedding text."""

model: str = "text-embedding-3-small"
model: str = "gemini-embedding-001"
batch_size: int = 16
batch_max_tokens: int = 8191
model_id: str = DEFAULT_EMBEDDING_MODEL_ID
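
The provider default changes together with both model defaults because LiteLLM routes requests on a provider-prefixed model string, so the three values must agree. A rough sketch of the resolution these defaults imply — the helper below is illustrative, not the repo's code:

```python
def litellm_model_string(model_provider: str, model: str) -> str:
    """Build the provider-prefixed name that LiteLLM routes on."""
    return f"{model_provider}/{model}"


assert litellm_model_string("gemini", "gemini-2.5-flash-lite") == "gemini/gemini-2.5-flash-lite"
assert litellm_model_string("gemini", "gemini-embedding-001") == "gemini/gemini-embedding-001"
```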
6 changes: 3 additions & 3 deletions graphrag/config/init_content.py
@@ -22,7 +22,7 @@
type: {defs.DEFAULT_CHAT_MODEL_TYPE.value}
model_provider: {defs.DEFAULT_MODEL_PROVIDER}
auth_type: {defs.DEFAULT_CHAT_MODEL_AUTH_TYPE.value} # or azure_managed_identity
api_key: ${{GRAPHRAG_API_KEY}} # set this in the generated .env file, or remove if managed identity
api_key: ${{GEMINI_API_KEY}} # set this in the generated .env file, or remove if managed identity
model: {defs.DEFAULT_CHAT_MODEL}
# api_base: https://<instance>.openai.azure.com
# api_version: 2024-05-01-preview
@@ -37,7 +37,7 @@
type: {defs.DEFAULT_EMBEDDING_MODEL_TYPE.value}
model_provider: {defs.DEFAULT_MODEL_PROVIDER}
auth_type: {defs.DEFAULT_EMBEDDING_MODEL_AUTH_TYPE.value}
api_key: ${{GRAPHRAG_API_KEY}}
api_key: ${{GEMINI_API_KEY}}
model: {defs.DEFAULT_EMBEDDING_MODEL}
# api_base: https://<instance>.openai.azure.com
# api_version: 2024-05-01-preview
@@ -160,5 +160,5 @@
"""

INIT_DOTENV = """\
GRAPHRAG_API_KEY=<API_KEY>
GEMINI_API_KEY=<API_KEY>
"""
11 changes: 10 additions & 1 deletion graphrag/config/load_config.py
@@ -75,10 +75,19 @@ def _load_dotenv(config_path: Path | str) -> None:
config_path : Path | str
The path to the config file.
"""
import os
from dotenv import dotenv_values

config_path = Path(config_path)
dotenv_path = config_path.parent / ".env"
if dotenv_path.exists():
load_dotenv(dotenv_path)
# Parse with dotenv_values and copy keys into os.environ explicitly
env_config = dotenv_values(str(dotenv_path))
for key, value in env_config.items():
if value is not None:
# Strip BOM and whitespace from key names
clean_key = key.strip().lstrip('\ufeff')
os.environ[clean_key] = str(value)


def _get_config_path(root_dir: Path, config_filepath: Path | None) -> Path:
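
The `lstrip('\ufeff')` guard matters because a `.env` file saved as UTF-8 with a BOM (a common default in Windows editors) can leave the BOM glued to the first key name, in which case `os.environ["GEMINI_API_KEY"]` would never match. A small reproduction of the failure mode this change targets — the exact parsed key depends on the python-dotenv version:

```python
from pathlib import Path

from dotenv import dotenv_values

env_file = Path("bom.env")
env_file.write_bytes("GEMINI_API_KEY=abc123\n".encode("utf-8-sig"))  # UTF-8 with BOM

raw_key = next(iter(dotenv_values(str(env_file))))
print(repr(raw_key))  # may show '\ufeffGEMINI_API_KEY' when the BOM survives parsing

clean_key = raw_key.strip().lstrip("\ufeff")  # same normalization as the patch
print(repr(clean_key))  # 'GEMINI_API_KEY'
```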
6 changes: 6 additions & 0 deletions graphrag/language_model/providers/litellm/chat_model.py
@@ -5,9 +5,12 @@

import inspect
import json
import logging
from collections.abc import AsyncGenerator, Generator
from typing import TYPE_CHECKING, Any, cast

logger = logging.getLogger(__name__)

import litellm
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from litellm import (
@@ -221,6 +224,9 @@ def __init__(
self.name = name
self.config = config
self.cache = cache.child(self.name) if cache else None
model_provider = config.model_provider
model = config.deployment_name or config.model
logger.info(f"Using LiteLLM provider with model: {model_provider}/{model}")
self.completion, self.acompletion = _create_completions(
config, self.cache, "chat"
)
@@ -3,8 +3,11 @@

"""Embedding model implementation using Litellm."""

import logging
from typing import TYPE_CHECKING, Any

logger = logging.getLogger(__name__)

import litellm
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from litellm import (
@@ -185,6 +188,9 @@ def __init__(
self.name = name
self.config = config
self.cache = cache.child(self.name) if cache else None
model_provider = config.model_provider
model = config.deployment_name or config.model
logger.info(f"Using LiteLLM provider with model: {model_provider}/{model}")
self.embedding, self.aembedding = _create_embeddings(
config, self.cache, "embeddings"
)
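
Both provider classes now log the resolved provider/model pair at INFO level. These lines only show up if logging is configured at that level, so a minimal setup for debugging provider routing looks like this (standard-library `logging`, nothing GraphRAG-specific):

```python
import logging

# Surfaces messages such as:
#   Using LiteLLM provider with model: gemini/gemini-2.5-flash-lite
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)
```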
9 changes: 9 additions & 0 deletions input/sample.txt
@@ -0,0 +1,9 @@
GraphRAG is a graph-based retrieval-augmented generation system that uses knowledge graphs to enhance LLM outputs. It extracts structured data from unstructured text using the power of large language models.

The system works by:
1. Processing input documents and extracting entities and relationships
2. Building a knowledge graph from the extracted information
3. Creating community reports and embeddings
4. Enabling semantic search across the knowledge graph

GraphRAG was developed by Microsoft Research and supports various LLM providers including OpenAI and Google Gemini.
Binary file added output/communities.parquet
Binary file not shown.
Binary file added output/community_reports.parquet
Binary file not shown.
1 change: 1 addition & 0 deletions output/context.json
@@ -0,0 +1 @@
{}
Binary file added output/documents.parquet
Binary file not shown.
Binary file added output/entities.parquet
Binary file not shown.
Binary file added output/relationships.parquet
Binary file not shown.