
fix: limit resolve_extracted_nodes context to prevent max_tokens overflow#1276

Open
rafaelreis-r wants to merge 1 commit into getzep:main from
rafaelreis-r:fix/resolve-nodes-context-overflow

Conversation

@rafaelreis-r

Problem

When deduplicating extracted nodes in resolve_extracted_nodes, the LLM prompt includes the full candidate.attributes dict for every candidate returned by the similarity search. In production workloads with:

  • 10–15 extracted nodes per episode
  • Up to 10 candidates per node from vector/BM25 search

This produces ~100–150 candidates, each carrying verbose attributes (summaries, descriptions, relationship lists, etc.). The resulting prompt regularly exceeds 17k tokens, pushing the JSON output past the max_tokens=16384 limit and causing truncated/invalid responses.
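The back-of-envelope arithmetic behind that overflow can be sketched as follows (the per-candidate token cost is an illustrative assumption, not a number measured from graphiti):

```python
# Illustrative worst-case math for the resolution prompt, using the workload
# figures from the PR description and an assumed per-candidate token cost.
nodes_per_episode = 15       # extracted nodes in one episode
candidates_per_node = 10     # vector/BM25 hits returned per node
tokens_per_candidate = 120   # rough cost of one verbose attributes dict (assumption)

total_candidates = nodes_per_episode * candidates_per_node
prompt_tokens = total_candidates * tokens_per_candidate

print(total_candidates)  # 150 candidates in a single resolution prompt
print(prompt_tokens)     # 18000 tokens, already past the 16384 max_tokens limit
```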

Reported in issue #1275.

Fix

Two surgical changes to graphiti_core/utils/maintenance/node_operations.py:

1. Remove candidate.attributes from the resolution context

Only name and entity_types are needed for identity-level deduplication. The full attribute payload adds thousands of tokens without improving match quality.

```python
# Before
existing_nodes_context = [
    {
        **{'name': candidate.name, 'entity_types': candidate.labels},
        **candidate.attributes,  # ← verbose, causes overflow
    }
    for candidate in indexes.existing_nodes
]
```

```python
# After
existing_nodes_context = [
    {
        'name': candidate.name,
        'entity_types': candidate.labels,
        # attributes omitted to prevent token overflow
    }
    for candidate in indexes.existing_nodes[:MAX_RESOLVE_CANDIDATES]
]
```

2. Add MAX_RESOLVE_CANDIDATES = 50 constant

Caps the candidate list to bound worst-case context size regardless of search result count. Placed alongside the existing MAX_NODES constant.
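Put together, the two changes amount to a small helper like this (a minimal sketch using the names from the PR description; the real code lives inline in node_operations.py rather than in a standalone function):

```python
# Cap on candidates sent to the deduplication LLM, sitting alongside the
# existing MAX_NODES constant in node_operations.py.
MAX_RESOLVE_CANDIDATES = 50

def build_resolution_context(existing_nodes):
    """Build the slim, capped context for the deduplication prompt.

    Only identity-level fields (name, entity_types) are kept; the verbose
    attributes dict is dropped, and the list is truncated at
    MAX_RESOLVE_CANDIDATES regardless of how many search hits came back.
    """
    return [
        {'name': candidate.name, 'entity_types': candidate.labels}
        for candidate in existing_nodes[:MAX_RESOLVE_CANDIDATES]
    ]
```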

Impact

  • No change to resolution logic or search behavior
  • Eliminates context overflow for typical workloads (10–15 nodes × 10 candidates)
  • MAX_RESOLVE_CANDIDATES = 50 provides a safety ceiling for extreme cases
  • The name + entity_types fields are sufficient for the deduplication LLM to make correct identity judgments
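The size difference per candidate can be made concrete with hypothetical payloads (the field contents below are invented stand-ins, not real graphiti data):

```python
import json

# Hypothetical candidate payloads illustrating the gap between the old
# attribute-laden context entry and the new identity-only one.
verbose = {
    'name': 'Acme Corp',
    'entity_types': ['Organization'],
    'summary': 'Acme Corp is a multinational...' * 20,  # stand-in for a long summary
    'relationships': ['EMPLOYS -> Jane Doe'] * 30,      # stand-in for relationship lists
}
slim = {'name': 'Acme Corp', 'entity_types': ['Organization']}

# The slim entry serializes to a small fraction of the verbose one.
print(len(json.dumps(verbose)) > 10 * len(json.dumps(slim)))  # True
```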


When resolving deduplicated nodes, the LLM prompt included full
candidate.attributes (summaries, descriptions, etc.) for every
candidate returned by similarity search. With 10-15 extracted nodes
and up to 10 candidates each, this produces ~150 verbose candidates in
a single context, routinely exceeding the max_tokens limit (16384)
and causing the JSON output to be truncated mid-response.

Two surgical changes:
1. Remove candidate.attributes from the resolution context — only
   name and entity_types are needed for identity matching.
2. Cap the candidate list at MAX_RESOLVE_CANDIDATES = 50 to bound
   worst-case context growth regardless of search result count.

Closes getzep#1275
@danielchalef
Member


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a Pull Request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


Rafael Reis does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.
