Security: SQL/filter injection in VectorStore.structured_search via unsanitized person/entity names #53

@CrepuscularIRIS

Bug Description

The structured_search method in database/vector_store.py constructs LanceDB .where() filter clauses by directly interpolating user-derived values (persons, entities, timestamps) into query strings using f-strings. The persons and entities fields have no escaping at all, allowing an attacker to inject arbitrary filter logic.

Location

database/vector_store.py:206-216

Reproduction

from database.vector_store import VectorStore

store = VectorStore(db_path="./test_db")
# ... add some entries ...

# Attacker crafts a person name with injection payload:
# The person name breaks out of make_array() and injects arbitrary conditions
malicious_persons = ["Alice')) OR true--"]

# This produces the where clause:
# array_has_any(persons, make_array('Alice')) OR true--'))
# Which bypasses the intended filter and returns ALL entries
results = store.structured_search(persons=malicious_persons)

The vulnerable code:

# Line 206-208: No escaping of person names
if persons:
    values = ", ".join([f"'{p}'" for p in persons])
    conditions.append(f"array_has_any(persons, make_array({values}))")

# Line 214-216: Same issue with entities
if entities:
    values = ", ".join([f"'{e}'" for e in entities])
    conditions.append(f"array_has_any(entities, make_array({values}))")

Note: The location field (line 211) does have basic replace("'", "''") escaping, but persons and entities do not.
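The string construction can be checked without a database. The sketch below copies the interpolation pattern from the vulnerable code into a standalone function (`build_persons_clause` is a hypothetical name used only for illustration, not part of `VectorStore`) and prints the clause with and without quote doubling:

```python
def build_persons_clause(persons, escape=False):
    """Mimic the f-string interpolation from the vulnerable code (illustrative only)."""
    if escape:
        # Same quote doubling that the location field already uses
        persons = [p.replace("'", "''") for p in persons]
    values = ", ".join(f"'{p}'" for p in persons)
    return f"array_has_any(persons, make_array({values}))"


payload = ["Alice')) OR true--"]

# Unescaped: the payload closes the string literal and both function calls,
# then appends OR true and comments out the trailing characters.
print(build_persons_clause(payload))
# array_has_any(persons, make_array('Alice')) OR true--'))

# Escaped: the embedded quote is doubled, so the whole payload stays
# inside a single string literal.
print(build_persons_clause(payload, escape=True))
# array_has_any(persons, make_array('Alice'')) OR true--'))
```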

Impact

  • Data exfiltration: Bypass filters to read all stored memories across tenants
  • Filter manipulation: Inject conditions to return specific records or no records

Suggested Fix

# Apply the same escaping used for location to all string interpolations:
if persons:
    safe_persons = [p.replace("'", "''") for p in persons]
    values = ", ".join([f"'{p}'" for p in safe_persons])
    conditions.append(f"array_has_any(persons, make_array({values}))")

if entities:
    safe_entities = [e.replace("'", "''") for e in entities]
    values = ", ".join([f"'{e}'" for e in safe_entities])
    conditions.append(f"array_has_any(entities, make_array({values}))")

# Also escape timestamp_range values
# (assumes timestamp_range is a (start, end) pair):
if timestamp_range:
    start_time, end_time = timestamp_range
    start_time = str(start_time).replace("'", "''")
    end_time = str(end_time).replace("'", "''")
    conditions.append(f"timestamp >= '{start_time}' AND timestamp <= '{end_time}'")

Ideally, use parameterized queries if LanceDB supports them.
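If parameter binding is not available, one option is to route every interpolated value through a single quoting helper so that no field can be missed again. The following is a minimal sketch, not the project's actual code: the helper names (`sql_literal`, `array_filter`, `timestamp_filter`) are hypothetical, column names are assumed to be trusted constants, and timestamps are assumed to be ISO 8601 strings, which the issue does not confirm. It also assumes quote doubling is the only escape the filter parser needs.

```python
from datetime import datetime


def sql_literal(value):
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def array_filter(column, values):
    """Build array_has_any(column, make_array(...)) with every value quoted.

    `column` must be a trusted constant, never user input.
    """
    quoted = ", ".join(sql_literal(v) for v in values)
    return f"array_has_any({column}, make_array({quoted}))"


def timestamp_filter(start, end):
    """Build a timestamp range condition from validated values.

    Assumption: timestamps are ISO 8601 strings. Anything that does not
    parse raises ValueError instead of reaching the query string.
    """
    start_iso = datetime.fromisoformat(str(start)).isoformat()
    end_iso = datetime.fromisoformat(str(end)).isoformat()
    return f"timestamp >= '{start_iso}' AND timestamp <= '{end_iso}'"


conditions = [
    array_filter("persons", ["Alice", "O'Brien"]),
    timestamp_filter("2024-01-01T00:00:00", "2024-12-31T23:59:59"),
]
where_clause = " AND ".join(conditions)
```

Parsing timestamps rather than escaping them rejects malformed input outright, which is stricter than quote doubling for a field that should never contain free text.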


Found via automated codebase analysis (confirmed by independent architecture review). Happy to submit a PR if confirmed.
