@codeflash-ai codeflash-ai bot commented Nov 5, 2025

📄 15% (0.15x) speedup for format_entities in mem0/memory/utils.py

⏱️ Runtime: 560 microseconds → 485 microseconds (best of 250 runs)

📝 Explanation and details

The optimized code replaces an explicit for-loop with a list comprehension, achieving a 15% speedup by removing the per-item .append() call and the temporary variable that held each formatted string.

Key optimization: The original code creates an empty list and then repeatedly calls .append() in a loop, which involves multiple function calls and intermediate variable assignments. The optimized version uses a single list comprehension that builds the entire list in one operation.
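The PR page does not reproduce the function body, so the following is only a sketch of the two shapes described above, reconstructed from the behaviour the generated tests exercise (empty or None input returns "", lines are joined with newlines, fields are separated by " -- "). The function names and the `formatted_lines` variable are assumptions for illustration; only `simplified` is named in the explanation.

```python
from typing import Optional


def format_entities_loop(entities: Optional[list]) -> str:
    """Loop form: format each entity, store it, then append it."""
    if not entities:
        return ""
    formatted_lines = []  # assumed name; not taken from the PR
    for entity in entities:
        # `simplified` is the intermediate variable the explanation mentions
        simplified = f"{entity['source']} -- {entity['relationship']} -- {entity['destination']}"
        formatted_lines.append(simplified)
    return "\n".join(formatted_lines)


def format_entities_comprehension(entities: Optional[list]) -> str:
    """Comprehension form: same output, no explicit append or temporary."""
    if not entities:
        return ""
    formatted_lines = [
        f"{entity['source']} -- {entity['relationship']} -- {entity['destination']}"
        for entity in entities
    ]
    return "\n".join(formatted_lines)


sample = [
    {"source": "A", "relationship": "likes", "destination": "B"},
    {"source": "B", "relationship": "hates", "destination": "C"},
]
print(format_entities_comprehension(sample))
```

Both forms raise the same KeyError or TypeError on malformed input, since the subscript expressions are unchanged; only the way results are collected differs.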

Performance benefits:

  • Eliminates intermediate variable: No need for the simplified variable that temporarily stores each formatted string
  • Reduces function call overhead: .append() is called thousands of times in the original version (5,036 hits in profiler), but the list comprehension builds the list directly
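  • Cheaper list growth: In CPython, a list comprehension adds each item through a dedicated bytecode instruction (LIST_APPEND) instead of looking up and calling the .append() method. It does not know the final size in advance, so the list still grows incrementally; the saving comes from the cheaper append path, not from pre-allocation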
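One way to check how each form adds items to the list is to disassemble both with the standard `dis` module. The sketch below is CPython-specific, and the helper names (`build_with_loop`, `build_with_comprehension`, `opnames`) are hypothetical, not part of mem0. It recurses into nested code objects because, depending on the Python version, a comprehension may be compiled either inline or as a separate code object.

```python
import dis


def build_with_loop(entities):
    lines = []
    for entity in entities:
        lines.append(f"{entity['source']} -- {entity['relationship']} -- {entity['destination']}")
    return lines


def build_with_comprehension(entities):
    return [
        f"{entity['source']} -- {entity['relationship']} -- {entity['destination']}"
        for entity in entities
    ]


def opnames(code):
    """Collect opcode names from a code object and any nested code objects."""
    names = {instr.opname for instr in dis.get_instructions(code)}
    for const in code.co_consts:
        if hasattr(const, "co_code"):  # nested code object (older comprehension layout)
            names |= opnames(const)
    return names


loop_ops = opnames(build_with_loop.__code__)
comp_ops = opnames(build_with_comprehension.__code__)
print("LIST_APPEND in loop version:", "LIST_APPEND" in loop_ops)
print("LIST_APPEND in comprehension:", "LIST_APPEND" in comp_ops)
```

The loop version has to load the `append` attribute and perform a call on every iteration, while the comprehension uses the list-append instruction directly, which is consistent with the reduced call overhead reported above.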

Profiler evidence: The original version shows significant time spent on string formatting (44.6%) and list appending (30.3%), totaling ~75% of execution time. The optimized version consolidates these operations into a single list comprehension line that accounts for 96.5% of the time but completes faster overall.

Test case performance: The gain shows up on larger inputs. The 1000-entity tests run 19-21% faster and the 1000-entity mixed test 16% faster, while most small tests (1-3 entities) run slower, by roughly 2-27% in the timings below, which the report attributes to list comprehension setup overhead. At that size the absolute difference is a few hundred nanoseconds per call. The optimization therefore pays off mainly in batch processing with many entities, which is likely the primary use case for a utility function that formats relationship data.
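The size dependence can be probed locally with a small `timeit` harness. This is a rough sketch rather than Codeflash's measurement method: the two functions are stand-ins written for this example, the repeat counts are arbitrary, and results will vary with the Python version and machine.

```python
import timeit


def make_entities(n):
    return [
        {"source": f"S{i}", "relationship": f"R{i}", "destination": f"D{i}"}
        for i in range(n)
    ]


def loop_version(entities):
    lines = []
    for e in entities:
        lines.append(f"{e['source']} -- {e['relationship']} -- {e['destination']}")
    return "\n".join(lines)


def comp_version(entities):
    return "\n".join(
        [f"{e['source']} -- {e['relationship']} -- {e['destination']}" for e in entities]
    )


def best_time(func, entities, number=200, repeat=5):
    """Best per-call time in seconds over `repeat` batches of `number` calls."""
    batches = timeit.repeat(lambda: func(entities), number=number, repeat=repeat)
    return min(batches) / number


for n in (1, 3, 1000):
    entities = make_entities(n)
    t_loop = best_time(loop_version, entities)
    t_comp = best_time(comp_version, entities)
    print(f"n={n:>4}: loop {t_loop * 1e6:8.2f} us, comprehension {t_comp * 1e6:8.2f} us")
```

Taking the minimum over several batches reduces noise from other processes, which matters most for the sub-microsecond small-input cases.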

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 42 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
import pytest  # used for our unit tests
from mem0.memory.utils import format_entities

# unit tests

# -------------------------------
# Basic Test Cases
# -------------------------------

def test_empty_list_returns_empty_string():
    # Test that an empty list returns an empty string
    codeflash_output = format_entities([]) # 447ns -> 399ns (12.0% faster)

def test_single_entity():
    # Test formatting a single entity
    entities = [{'source': 'A', 'relationship': 'likes', 'destination': 'B'}]
    expected = "A -- likes -- B"
    codeflash_output = format_entities(entities) # 1.02μs -> 1.23μs (17.4% slower)

def test_multiple_entities():
    # Test formatting multiple entities
    entities = [
        {'source': 'A', 'relationship': 'likes', 'destination': 'B'},
        {'source': 'B', 'relationship': 'hates', 'destination': 'C'},
        {'source': 'C', 'relationship': 'knows', 'destination': 'A'}
    ]
    expected = "A -- likes -- B\nB -- hates -- C\nC -- knows -- A"
    codeflash_output = format_entities(entities) # 1.60μs -> 1.71μs (6.38% slower)

def test_entities_with_numeric_values():
    # Test entities where values are numbers (should be stringified)
    entities = [
        {'source': 1, 'relationship': 'knows', 'destination': 2},
        {'source': 3, 'relationship': 'likes', 'destination': 4}
    ]
    expected = "1 -- knows -- 2\n3 -- likes -- 4"
    codeflash_output = format_entities(entities) # 1.57μs -> 1.75μs (10.1% slower)

def test_entities_with_special_characters():
    # Test entities with special characters in the fields
    entities = [
        {'source': 'A&B', 'relationship': 'loves', 'destination': 'C/D'},
        {'source': 'E', 'relationship': 'follows', 'destination': 'F@G'}
    ]
    expected = "A&B -- loves -- C/D\nE -- follows -- F@G"
    codeflash_output = format_entities(entities) # 1.37μs -> 1.54μs (11.5% slower)

# -------------------------------
# Edge Test Cases
# -------------------------------

def test_none_input_returns_empty_string():
    # Test that None input returns an empty string
    codeflash_output = format_entities(None) # 332ns -> 311ns (6.75% faster)

def test_missing_fields_raises_keyerror():
    # Test that missing fields in an entity raises KeyError
    entities_missing_source = [{'relationship': 'likes', 'destination': 'B'}]
    with pytest.raises(KeyError):
        format_entities(entities_missing_source) # 959ns -> 1.31μs (26.8% slower)

    entities_missing_relationship = [{'source': 'A', 'destination': 'B'}]
    with pytest.raises(KeyError):
        format_entities(entities_missing_relationship) # 696ns -> 870ns (20.0% slower)

    entities_missing_destination = [{'source': 'A', 'relationship': 'likes'}]
    with pytest.raises(KeyError):
        format_entities(entities_missing_destination) # 531ns -> 638ns (16.8% slower)

def test_entity_with_empty_strings():
    # Test that entities with empty strings are formatted correctly
    entities = [
        {'source': '', 'relationship': '', 'destination': ''},
        {'source': 'A', 'relationship': '', 'destination': 'B'}
    ]
    expected = " --  -- \nA --  -- B"
    codeflash_output = format_entities(entities) # 1.36μs -> 1.58μs (13.5% slower)

def test_entity_with_whitespace_strings():
    # Test that whitespace in values is preserved
    entities = [
        {'source': ' A ', 'relationship': ' likes ', 'destination': ' B '}
    ]
    expected = " A  --  likes  --  B "
    codeflash_output = format_entities(entities) # 956ns -> 1.18μs (18.7% slower)

def test_entity_with_non_string_types():
    # Test that entities with other types (bool, float, None) are stringified
    entities = [
        {'source': None, 'relationship': True, 'destination': 3.14}
    ]
    expected = "None -- True -- 3.14"
    codeflash_output = format_entities(entities) # 3.15μs -> 3.21μs (2.12% slower)

def test_entity_with_nested_dicts():
    # Test that entities with nested dicts are stringified as dicts
    entities = [
        {'source': {'id': 1}, 'relationship': 'related', 'destination': [2, 3]}
    ]
    expected = "{'id': 1} -- related -- [2, 3]"
    codeflash_output = format_entities(entities) # 2.88μs -> 3.02μs (4.41% slower)

def test_entity_with_unicode_characters():
    # Test that unicode characters are handled correctly
    entities = [
        {'source': '😀', 'relationship': 'loves', 'destination': '🍕'}
    ]
    expected = "😀 -- loves -- 🍕"
    codeflash_output = format_entities(entities) # 1.35μs -> 1.57μs (13.7% slower)

def test_entity_with_long_strings():
    # Test that long strings are handled correctly
    long_str = "a" * 100
    entities = [{'source': long_str, 'relationship': long_str, 'destination': long_str}]
    expected = f"{long_str} -- {long_str} -- {long_str}"
    codeflash_output = format_entities(entities) # 885ns -> 1.07μs (17.3% slower)

# -------------------------------
# Large Scale Test Cases
# -------------------------------

def test_large_number_of_entities():
    # Test formatting a large number of entities (1000)
    n = 1000
    entities = [
        {'source': f"S{i}", 'relationship': f"R{i}", 'destination': f"D{i}"}
        for i in range(n)
    ]
    expected = "\n".join([f"S{i} -- R{i} -- D{i}" for i in range(n)])
    codeflash_output = format_entities(entities) # 96.9μs -> 81.1μs (19.5% faster)

def test_large_entity_fields():
    # Test entities with very large string fields
    big = "x" * 1000
    entities = [
        {'source': big, 'relationship': big, 'destination': big}
        for _ in range(3)
    ]
    expected = "\n".join([f"{big} -- {big} -- {big}"] * 3)
    codeflash_output = format_entities(entities) # 2.14μs -> 2.23μs (3.95% slower)

def test_performance_on_large_input():
    # Test that function completes in reasonable time for 1000 entities
    import time
    n = 1000
    entities = [
        {'source': f"S{i}", 'relationship': f"R{i}", 'destination': f"D{i}"}
        for i in range(n)
    ]
    start = time.time()
    codeflash_output = format_entities(entities); result = codeflash_output # 97.1μs -> 81.6μs (19.0% faster)
    duration = time.time() - start

# -------------------------------
# Additional Robustness Tests
# -------------------------------

def test_input_is_not_a_list():
    # Test that non-list input (e.g., dict) raises TypeError
    with pytest.raises(TypeError):
        format_entities({'source': 'A', 'relationship': 'likes', 'destination': 'B'}) # 1.13μs -> 1.44μs (21.2% slower)

def test_list_with_non_dict_elements():
    # Test that list with non-dict elements raises TypeError
    entities = ['not a dict', 123, None]
    # The function will raise TypeError when trying to subscript a non-dict
    with pytest.raises(TypeError):
        format_entities(entities) # 1.03μs -> 1.33μs (22.7% slower)

def test_list_with_mixed_valid_and_invalid_entities():
    # Test that list with a mix of valid and invalid entities raises error at the invalid one
    entities = [
        {'source': 'A', 'relationship': 'likes', 'destination': 'B'},
        {'relationship': 'hates', 'destination': 'C'},  # missing 'source'
        {'source': 'C', 'relationship': 'knows', 'destination': 'A'}
    ]
    with pytest.raises(KeyError):
        format_entities(entities) # 1.38μs -> 1.62μs (14.8% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import pytest  # used for our unit tests
from mem0.memory.utils import format_entities

# unit tests

# 1. BASIC TEST CASES

def test_single_entity():
    # Test with a single entity
    entities = [
        {'source': 'A', 'relationship': 'likes', 'destination': 'B'}
    ]
    expected = "A -- likes -- B"
    codeflash_output = format_entities(entities) # 1.01μs -> 1.20μs (15.2% slower)

def test_multiple_entities():
    # Test with multiple entities
    entities = [
        {'source': 'A', 'relationship': 'likes', 'destination': 'B'},
        {'source': 'C', 'relationship': 'hates', 'destination': 'D'}
    ]
    expected = "A -- likes -- B\nC -- hates -- D"
    codeflash_output = format_entities(entities) # 1.31μs -> 1.45μs (9.70% slower)

def test_three_entities():
    # Test with three entities
    entities = [
        {'source': 'X', 'relationship': 'knows', 'destination': 'Y'},
        {'source': 'Y', 'relationship': 'trusts', 'destination': 'Z'},
        {'source': 'Z', 'relationship': 'helps', 'destination': 'X'}
    ]
    expected = "X -- knows -- Y\nY -- trusts -- Z\nZ -- helps -- X"
    codeflash_output = format_entities(entities) # 1.51μs -> 1.64μs (7.40% slower)

# 2. EDGE TEST CASES

def test_empty_list():
    # Test with empty list
    entities = []
    expected = ""
    codeflash_output = format_entities(entities) # 326ns -> 334ns (2.40% slower)

def test_none_input():
    # Test with None input
    entities = None
    expected = ""
    codeflash_output = format_entities(entities) # 319ns -> 333ns (4.20% slower)

def test_missing_source_key():
    # Test with missing 'source' key
    entities = [
        {'relationship': 'likes', 'destination': 'B'}
    ]
    with pytest.raises(KeyError):
        format_entities(entities) # 997ns -> 1.28μs (22.0% slower)

def test_missing_relationship_key():
    # Test with missing 'relationship' key
    entities = [
        {'source': 'A', 'destination': 'B'}
    ]
    with pytest.raises(KeyError):
        format_entities(entities) # 1.02μs -> 1.27μs (19.3% slower)

def test_missing_destination_key():
    # Test with missing 'destination' key
    entities = [
        {'source': 'A', 'relationship': 'likes'}
    ]
    with pytest.raises(KeyError):
        format_entities(entities) # 1.05μs -> 1.28μs (17.7% slower)

def test_empty_strings():
    # Test with empty strings for keys
    entities = [
        {'source': '', 'relationship': '', 'destination': ''}
    ]
    expected = " --  -- "
    codeflash_output = format_entities(entities) # 1.04μs -> 1.19μs (13.1% slower)

def test_whitespace_strings():
    # Test with whitespace strings for keys
    entities = [
        {'source': ' ', 'relationship': ' ', 'destination': ' '}
    ]
    expected = "  --   --  "
    codeflash_output = format_entities(entities) # 999ns -> 1.13μs (11.9% slower)

def test_non_string_values():
    # Test with non-string values for keys
    entities = [
        {'source': 123, 'relationship': None, 'destination': ['X', 'Y']}
    ]
    expected = "123 -- None -- ['X', 'Y']"
    codeflash_output = format_entities(entities) # 2.53μs -> 2.44μs (3.90% faster)

def test_extra_keys():
    # Test with extra keys in the entity dict
    entities = [
        {'source': 'A', 'relationship': 'likes', 'destination': 'B', 'extra': 42}
    ]
    expected = "A -- likes -- B"
    codeflash_output = format_entities(entities) # 932ns -> 1.13μs (17.3% slower)

def test_order_preservation():
    # Test that order of entities is preserved
    entities = [
        {'source': 'A', 'relationship': 'likes', 'destination': 'B'},
        {'source': 'B', 'relationship': 'likes', 'destination': 'C'},
        {'source': 'C', 'relationship': 'likes', 'destination': 'D'}
    ]
    expected = "A -- likes -- B\nB -- likes -- C\nC -- likes -- D"
    codeflash_output = format_entities(entities) # 1.61μs -> 1.71μs (5.56% slower)

def test_dict_instead_of_list():
    # Test with a dict instead of a list
    entities = {'source': 'A', 'relationship': 'likes', 'destination': 'B'}
    with pytest.raises(TypeError):
        format_entities(entities) # 1.10μs -> 1.39μs (20.6% slower)

def test_list_with_non_dict():
    # Test with a list containing a non-dict
    entities = [
        {'source': 'A', 'relationship': 'likes', 'destination': 'B'},
        "not a dict"
    ]
    with pytest.raises(TypeError):
        format_entities(entities) # 1.40μs -> 1.74μs (19.8% slower)

def test_unicode_characters():
    # Test with unicode characters
    entities = [
        {'source': 'Α', 'relationship': 'αγαπά', 'destination': 'Β'}
    ]
    expected = "Α -- αγαπά -- Β"
    codeflash_output = format_entities(entities) # 1.41μs -> 1.70μs (17.1% slower)

def test_special_characters():
    # Test with special characters in strings
    entities = [
        {'source': 'A$', 'relationship': 'li&kes', 'destination': 'B*'}
    ]
    expected = "A$ -- li&kes -- B*"
    codeflash_output = format_entities(entities) # 983ns -> 1.19μs (17.3% slower)

# 3. LARGE SCALE TEST CASES

def test_large_number_of_entities():
    # Test with a large number of entities (1000)
    entities = [
        {'source': f'S{i}', 'relationship': f'R{i}', 'destination': f'D{i}'}
        for i in range(1000)
    ]
    expected = "\n".join([
        f"S{i} -- R{i} -- D{i}" for i in range(1000)
    ])
    codeflash_output = format_entities(entities) # 97.2μs -> 80.6μs (20.7% faster)

def test_large_entity_strings():
    # Test with very large strings in the entity
    long_str = "A" * 1000
    entities = [
        {'source': long_str, 'relationship': long_str, 'destination': long_str}
    ]
    expected = f"{long_str} -- {long_str} -- {long_str}"
    codeflash_output = format_entities(entities) # 1.18μs -> 1.41μs (16.5% slower)

def test_large_mixed_entities():
    # Test with 500 normal and 500 edge-case entities
    normal_entities = [
        {'source': f'N{i}', 'relationship': f'R{i}', 'destination': f'D{i}'}
        for i in range(500)
    ]
    edge_entities = [
        {'source': '', 'relationship': None, 'destination': 0}
        for _ in range(500)
    ]
    entities = normal_entities + edge_entities
    expected = (
        "\n".join([f"N{i} -- R{i} -- D{i}" for i in range(500)]) + "\n" +
        "\n".join([" -- None -- 0" for _ in range(500)])
    )
    codeflash_output = format_entities(entities) # 127μs -> 109μs (16.0% faster)

def test_performance_large_input():
    # Test that function runs efficiently on large input (timing test, not strict)
    import time
    entities = [
        {'source': f'S{i}', 'relationship': f'R{i}', 'destination': f'D{i}'}
        for i in range(1000)
    ]
    start = time.time()
    codeflash_output = format_entities(entities); result = codeflash_output # 96.1μs -> 80.4μs (19.4% faster)
    end = time.time()
    # The result should be correct and function should run in less than 0.5 seconds
    expected = "\n".join([
        f"S{i} -- R{i} -- D{i}" for i in range(1000)
    ])
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-format_entities-mhloyv2z and push.

@codeflash-ai codeflash-ai bot requested a review from mashraf-222 November 5, 2025 07:44
@codeflash-ai codeflash-ai bot added the "⚡️ codeflash" (Optimization PR opened by Codeflash AI) and "🎯 Quality: High" (Optimization Quality according to Codeflash) labels Nov 5, 2025