⚡ Bolt: [performance improvement] by glacy · Pull Request #97 · glacy/evolutIA

glacy · 2026-05-09T18:13:50Z

💡 What: Replaced the O(N*M) nested loops in get_all_exercises (MaterialExtractor) and index_materials (RAGIndexer) with dictionary lookups.
🎯 Why: To match solutions to exercises, the previous implementation ran an inner loop over every solution for each exercise. In documents with many exercises and solutions, the cost grew with the product of the two counts.
📊 Impact: Reduces the cost of mapping solutions to exercises from $O(N \times M)$ to $O(N + M)$: one $O(M)$ pass builds the lookup dict, then each exercise needs a single average-case $O(1)$ lookup. This speeds up both material extraction and RAG indexing.
🔬 Measurement: On mock workloads, get_all_exercises dropped from ~0.9 s to ~0.007 s, and index_materials from ~0.75 s to ~0.003 s.
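
A minimal Python sketch of the pattern described above, not the PR's actual code: it assumes exercises and solutions are plain dicts and that each solution names its exercise through a hypothetical `exercise_label` key matched against the exercise's `label`. The nested version rescans the solutions for each exercise; the lookup version builds a dict in one pass and then does one average-case O(1) lookup per exercise.

```python
from typing import Any, Dict, List, Optional

Exercise = Dict[str, Any]
Solution = Dict[str, Any]


def attach_solutions_nested(exercises: List[Exercise], solutions: List[Solution]) -> List[Exercise]:
    # O(N x M): for every exercise, scan the solutions until the first match
    result: List[Exercise] = []
    for ex in exercises:
        match: Optional[Solution] = None
        for sol in solutions:
            # "exercise_label" is a hypothetical field name used for illustration
            if sol.get("exercise_label") == ex.get("label"):
                match = sol
                break  # first match wins
        result.append({**ex, "solution": match})
    return result


def attach_solutions_lookup(exercises: List[Exercise], solutions: List[Solution]) -> List[Exercise]:
    # O(M) to build the lookup, then O(1) on average per exercise: O(N + M) overall
    solution_lookup: Dict[Any, Solution] = {}
    for sol in solutions:
        # setdefault keeps the FIRST solution per label, mirroring the `break` above
        solution_lookup.setdefault(sol.get("exercise_label"), sol)
    return [{**ex, "solution": solution_lookup.get(ex.get("label"))} for ex in exercises]


exercises = [{"label": "ex1"}, {"label": "ex2"}, {"label": "ex3"}]
solutions = [
    {"exercise_label": "ex2", "text": "a"},
    {"exercise_label": "ex2", "text": "b"},
    {"exercise_label": "ex1", "text": "c"},
]
# Both versions must produce identical results, including for duplicate labels
assert attach_solutions_nested(exercises, solutions) == attach_solutions_lookup(exercises, solutions)
```

The dict version only works as a drop-in replacement because matching is plain equality on a hashable key; fuzzier matching rules would not map onto a dict this directly.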

PR created automatically by Jules for task 2468833882201891242 started by @glacy

google-labs-jules · 2026-05-09T18:13:51Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

Copilot

Pull request overview

This PR optimizes solution→exercise matching by replacing nested O(N×M) scans with per-material O(1) dictionary lookups in both the material extraction pipeline and the RAG indexing pipeline.

Changes:

- Refactor MaterialExtractor.get_all_exercises to precompute a solution_lookup map (preserving first-match semantics).
- Refactor RAGIndexer.index_materials to precompute a solution_lookup map when attaching solutions during indexing.
- Document the “preserve first-match when building lookup dicts” optimization guideline in .jules/bolt.md.
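
Why the first-match caveat matters, as a small Python sketch with hypothetical `(label, text)` pairs: a dict comprehension lets later duplicates overwrite earlier ones (last match wins), while a nested loop that stops at its first hit keeps the earliest one, so the lookup has to be built with a guard.

```python
from typing import Dict, List, Tuple

# Hypothetical data: "ex1" appears twice
pairs: List[Tuple[str, str]] = [("ex1", "first"), ("ex1", "second"), ("ex2", "only")]

# Naive comprehension: later duplicates overwrite earlier ones (last-match semantics)
last_match: Dict[str, str] = {label: text for label, text in pairs}

# Guarded build: skip keys that are already present (first-match semantics)
first_match: Dict[str, str] = {}
for label, text in pairs:
    if label not in first_match:
        first_match[label] = text

print(last_match["ex1"])   # second
print(first_match["ex1"])  # first
```

Building the comprehension over `reversed(pairs)` would also yield first-match semantics, though the explicit guard makes the intent easier to see.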

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| evolutia/rag/rag_indexer.py | Uses a per-material dict lookup to map solutions to exercises during indexing; also includes formatting/quoting normalization in the touched region. |
| evolutia/material_extractor.py | Uses a per-material dict lookup to map solutions to exercises when flattening all extracted exercises. |
| .jules/bolt.md | Adds a note about preserving first-match semantics when refactoring nested search loops to dict lookups. |

Comments suppressed due to low confidence (1)

evolutia/material_extractor.py:337

_is_cache_valid completely ignores the timestamp stored in self._file_cache[file_path]['timestamp'] (and the _cache_ttl TTL). Moreover, for nonexistent files (which extract_from_file caches to avoid retries), file_path.stat() raises OSError, so the cache is never considered valid, which defeats the purpose of caching errors. Consider validating against the cached timestamp + TTL when stat() fails, and using the entry's own timestamp (not _last_scan_timestamp) for mtime-based invalidation.

        if file_path not in self._file_cache:
            return False

        # Check whether the file was modified
        try:
            _ = self._file_cache[file_path]
            file_mtime = file_path.stat().st_mtime

        # Use the most recent scan timestamp for the check
            if file_mtime > self._last_scan_timestamp:
                return False

            return True
        except (OSError, KeyError):
            return False
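
A sketch of the reviewer's suggestion, not code from the PR: `FileCacheSketch` is a hypothetical stand-in that assumes each cache entry stores the time it was written under `'timestamp'` (as the comment describes) and that `_cache_ttl` is measured in seconds. When `stat()` fails, the entry is judged by its age against the TTL, so a cached error for a missing file is reused instead of retried; otherwise the file's mtime is compared with the entry's own timestamp rather than `_last_scan_timestamp`.

```python
import time
from pathlib import Path
from typing import Any, Dict


class FileCacheSketch:
    """Hypothetical stand-in for MaterialExtractor's file cache (illustration only)."""

    def __init__(self, cache_ttl: float = 300.0) -> None:
        # Assumption: each entry records when it was cached under "timestamp" (epoch seconds)
        self._file_cache: Dict[Path, Dict[str, Any]] = {}
        self._cache_ttl = cache_ttl  # assumed to be in seconds

    def _is_cache_valid(self, file_path: Path) -> bool:
        entry = self._file_cache.get(file_path)
        if entry is None or "timestamp" not in entry:
            return False
        cached_at = entry["timestamp"]
        try:
            file_mtime = file_path.stat().st_mtime
        except OSError:
            # Missing or unreadable file (e.g. a cached error entry): honor the TTL
            return (time.time() - cached_at) <= self._cache_ttl
        # Invalid only if the file changed after THIS entry was cached
        return file_mtime <= cached_at
```

Whether the TTL should also expire entries for files that still exist is a separate design decision; the sketch applies it only where the comment proposes it, when `stat()` fails.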

+        # Generate embeddings
+        embeddings = self._generate_embeddings_batch(chunks)
+
+        # Sync chunks with embeddings (in case empty chunks were filtered out in _generate_embeddings_batch)
+        # Although here we would rather filter beforehand to keep them consistent
+        valid_indices = [i for i, chunk in enumerate(chunks) if chunk and chunk.strip()]
+        chunks = [chunks[i] for i in valid_indices]
+
+        if not chunks:
+            logger.warning(
+                f"Ejercicio {exercise.get('label', 'unknown')} no tiene contenido válido para indexar"
+            )
+            return []
+
+        # Create IDs and documents
+        chunk_ids = []
+        documents = []
+        metadatas = []
+
+        for i, (chunk, embedding) in enumerate(zip(chunks, embeddings)):
+            chunk_id = self._create_chunk_id(f"{source}_{i}", i)
+            chunk_ids.append(chunk_id)
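
In the diff excerpt above, embeddings are generated from the unfiltered chunks and empty chunks are dropped only afterwards, so `zip(chunks, embeddings)` pairs them correctly only if `_generate_embeddings_batch` happens to drop exactly the same items; the code's own comment says filtering first is preferable. A minimal Python sketch of that ordering, where `embed_non_empty` and `fake_embed` are hypothetical helpers standing in for the indexer's methods:

```python
from typing import Callable, List, Sequence, Tuple

Embedding = List[float]


def embed_non_empty(
    chunks: Sequence[str],
    embed_batch: Callable[[List[str]], List[Embedding]],
) -> List[Tuple[str, Embedding]]:
    # Drop empty or whitespace-only chunks BEFORE embedding
    kept = [chunk for chunk in chunks if chunk and chunk.strip()]
    if not kept:
        return []
    embeddings = embed_batch(kept)
    if len(embeddings) != len(kept):
        raise ValueError(f"expected {len(kept)} embeddings, got {len(embeddings)}")
    # Exactly one embedding per kept chunk, in order, so positional pairing is safe
    return list(zip(kept, embeddings))


def fake_embed(batch: List[str]) -> List[Embedding]:
    # Hypothetical embedder: a 1-dimensional "embedding" equal to the text length
    return [[float(len(text))] for text in batch]


pairs = embed_non_empty(["alpha", "   ", "", "beta"], fake_embed)
print(pairs)  # [('alpha', [5.0]), ('beta', [4.0])]
```

The explicit length check turns a silent misalignment into an error; on Python 3.10+ `zip(..., strict=True)` would serve the same purpose.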


⚡ Bolt: Optimize nested loops to O(N) lookup in MaterialExtractor and…

b5ae21a

… RAGIndexer Co-authored-by: glacy <1131951+glacy@users.noreply.github.com>

Copilot AI review requested due to automatic review settings May 9, 2026 18:13

Copilot started reviewing on behalf of glacy May 9, 2026 18:14

Copilot AI reviewed May 9, 2026

glacy wants to merge 1 commit into main from bolt-optimize-loops-2468833882201891242

2 participants
