This repo contains scripts to analyze an IMF/World Bank LIC-DSF Excel workbook:
- Dependency mapping: identify formula cells in configured indicator rows, build a dependency graph, and enrich nodes with human-readable row/column labels.
- Code generation: export workbook formulas as a standalone Python package that can be published to PyPI and used without Excel.
- RAG-based annotation: retrieve relevant context from the LIC-DSF guidance note using a local embeddings collection, then call DeepSeek (deepseek-chat) to generate short annotations for indicator groups.
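To make the dependency-mapping idea concrete, here is a minimal sketch. It is not the repo's implementation (the export step relies on excel-grapher); it assumes formulas are plain strings containing only same-sheet A1-style references, and `extract_refs` and `build_graph` are hypothetical names chosen for illustration.

```python
import re

# Matches simple same-sheet A1 references such as B2 or $C$10.
# Assumption: no ranges, no sheet prefixes, no function names that look like cells (e.g. LOG10).
CELL_REF = re.compile(r"\$?([A-Z]{1,3})\$?(\d+)")


def extract_refs(formula: str) -> list[str]:
    """Return the cell addresses a formula references, in order, without duplicates."""
    seen: dict[str, None] = {}
    for col, row in CELL_REF.findall(formula):
        seen.setdefault(f"{col}{row}", None)
    return list(seen)


def build_graph(formulas: dict[str, str]) -> dict[str, list[str]]:
    """Map each formula cell to the cells it depends on."""
    return {cell: extract_refs(formula) for cell, formula in formulas.items()}


# Toy workbook: C1 depends on A1 and B1; D1 depends on C1 and A1.
formulas = {"C1": "=A1+B1", "D1": "=C1*$A$1"}
graph = build_graph(formulas)
print(graph)
```

A real workbook needs a proper formula parser to handle ranges, cross-sheet references, and named ranges, which is why the sketch restricts itself to the simplest case.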
The World Bank periodically releases new LIC-DSF template workbooks. Each template version can
differ in structure (sheet layout, cell ranges, formulas), so all template-specific configuration
lives in its own directory under src/configs/<date>/:
    src/configs/
        2025-08-12/
            config.py              # workbook path, export ranges, constraints, region config, etc.
            input_groups.json      # generated artifact
            enrichment_audit.json
Each template version produces an independent PyPI package (e.g. lic-dsf-2025-08-12) so that
packages for different template versions can coexist. When a new template is released:
- Add the workbook to workbooks/
- Create src/configs/<date>/config.py (copy the most recent config and adjust)
- Run the pipeline with --template <date>
- Test and publish the generated package
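The naming convention can be sketched as follows. This is an illustration inferred from the names shown in this README (lic-dsf-2025-08-12 for the distribution, lic_dsf_2025_08_12 for the import package); `package_names` is a hypothetical helper, not a function from the repo.

```python
import re


def package_names(template: str) -> tuple[str, str]:
    """Return (distribution name, import name) for a template date like '2025-08-12'."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", template):
        raise ValueError(f"expected YYYY-MM-DD, got {template!r}")
    dist = f"lic-dsf-{template}"
    # Hyphens are not valid in Python identifiers, so the import name uses underscores.
    return dist, dist.replace("-", "_")


print(package_names("2025-08-12"))
```

Because the date is part of both names, two template versions never share a distribution or a module name.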
- workbooks/: source-of-truth workbooks (one per template version)
- src/configs/<date>/config.py: per-template configuration (ranges, constraints, region config)
- src/configs/<date>/*.json: per-template generated artifacts
- src/lic_dsf_config.py: shared type definitions and utility functions
- src/lic_dsf_pipeline.py: shared graph + classification utilities
- src/lic_dsf_labels.py: label extraction helpers
- src/lic_dsf_export.py: code generation + enrichment audit
- src/lic_dsf_group_inputs.py: input grouping + input_groups.json export
- src/lic_dsf_input_setters.py: shared setter helpers used by generated export package
- src/lic_dsf_annotate.py: DeepSeek annotations
- guidance_note/: LIC-DSF guidance note PDF and text
- dist/lic-dsf-<date>/: generated Python packages (one per template)
- Python version per pyproject.toml
- Dependencies installed via uv
- A DeepSeek API key for annotation runs
Create a virtual environment and install deps:

    uv sync

Set your DeepSeek key (used by src/lic_dsf_annotate.py):

    export DEEPSEEK_API_KEY="..."

Optionally, store it in a .env file (loaded by src/lic_dsf_annotate.py):

    DEEPSEEK_API_KEY=...

All scripts require a --template argument specifying which template version to use. Available
templates are auto-discovered from src/configs/.
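One plausible way to implement this kind of auto-discovery is sketched below. It assumes a template is any subdirectory of src/configs/ that contains a config.py; the actual rule used by the scripts may differ, and `discover_templates` is a hypothetical name.

```python
from pathlib import Path


def discover_templates(configs_dir: Path) -> list[str]:
    """List subdirectories of configs_dir that contain a config.py, sorted by name."""
    if not configs_dir.is_dir():
        return []
    return sorted(
        entry.name
        for entry in configs_dir.iterdir()
        if entry.is_dir() and (entry / "config.py").is_file()
    )


# Run from the repo root; prints an empty list if src/configs/ does not exist.
print(discover_templates(Path("src/configs")))
```

Since template directories are named by ISO date, sorting by name also sorts them chronologically.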
Builds a dependency graph, enriches nodes with row/column labels, and writes an audit JSON.
    uv run python -m src.lic_dsf_export --template 2025-08-12 --audit-only

Inputs: workbook and configuration from src/configs/2025-08-12/config.py
Output: src/configs/2025-08-12/enrichment_audit.json (overwritten on every run)
Discovers targets, builds a dependency graph, and uses excel-grapher's CodeGenerator to emit a
standalone Python package.
    uv run python -m src.lic_dsf_export --template 2025-08-12

Output: dist/lic-dsf-2025-08-12/lic_dsf_2025_08_12/ (overwritten on every run)
Groups hardcoded input cells into semantically labeled clusters for setter code generation.
    uv run python -m src.lic_dsf_group_inputs --template 2025-08-12

Output: src/configs/2025-08-12/input_groups.json (overwritten on every run)
Retrieves guidance-note context via embeddings and calls DeepSeek to generate concise annotations.
    uv run python -m src.lic_dsf_annotate --template 2025-08-12

Inputs: workbook, guidance note text (guidance_note/lic-dsf-guidance-note.txt), DEEPSEEK_API_KEY
Output: src/configs/2025-08-12/annotations.json (overwritten on every run)
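The retrieval step can be illustrated with a small, dependency-free sketch. It is not the script's code: in the repo the vectors come from the llm embeddings collection, whereas here toy 2D vectors stand in for real embeddings, and `cosine` and `top_k` are hypothetical helper names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunk_vectors: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k chunks most similar to the query, best first."""
    ranked = sorted(
        chunk_vectors,
        key=lambda chunk_id: cosine(query, chunk_vectors[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy data (assumption): three chunk embeddings and one query embedding.
chunk_vectors = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [0.7, 0.7]}
query = [1.0, 0.1]
print(top_k(query, chunk_vectors, k=2))
```

The text of the selected chunks would then be placed in the prompt sent to DeepSeek alongside the indicator group being annotated.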
    # 1. (Optional) Generate enrichment audit
    uv run python -m src.lic_dsf_export --template 2025-08-12 --audit-only

    # 2. (Optional) Generate input groups for setters
    uv run python -m src.lic_dsf_group_inputs --template 2025-08-12

    # 3. (Optional) Generate annotations
    uv run python -m src.lic_dsf_annotate --template 2025-08-12

    # 4. Core export step: generates the Python package
    uv run python -m src.lic_dsf_export --template 2025-08-12

The generated package exposes a context object with helper setters derived from input_groups.json.
- Year-series setters: accept {year: value} (primary) and also values + start_year (secondary).
- Range setters (scalars / 1D / 2D tables): accept a scalar, 1D sequence, or 2D sequence-of-sequences matching the range shape.
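How the two year-series forms might be reconciled internally is sketched below. The parameter names `values` and `start_year` come from the description above; the function name `normalize_year_series` and its exact behavior are assumptions, not the generated package's API.

```python
from collections.abc import Mapping, Sequence
from typing import Optional


def normalize_year_series(
    data: Optional[Mapping[int, Optional[float]]] = None,
    *,
    values: Optional[Sequence[Optional[float]]] = None,
    start_year: Optional[int] = None,
) -> dict[int, Optional[float]]:
    """Convert either input form into a single {year: value} dict."""
    if data is not None:
        if values is not None or start_year is not None:
            raise TypeError("pass either a {year: value} mapping or values + start_year, not both")
        return {int(year): value for year, value in data.items()}
    if values is None or start_year is None:
        raise TypeError("values and start_year must be given together")
    # Consecutive years starting at start_year, one per value.
    return {start_year + offset: value for offset, value in enumerate(values)}


print(normalize_year_series({2023: 123, 2026: None}))
print(normalize_year_series(values=[1, 2, 3], start_year=2023))
```

Normalizing to one dict form up front means the code that writes values into cells only has to handle a single shape.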
Example:
    import lic_dsf_2025_08_12 as lic_dsf

    ctx = lic_dsf.make_context()

    # Year-series: dict form (recommended)
    assignment = ctx.set_ext_debt_data_external_debt_excluding_locally_issued_debt({2023: 123, 2026: None})

    # 1D range
    ctx.set_ext_debt_data_ida_new_60_year_credits([1] * 14)

    # Load all inputs from a filled-out template (requires optional fastpyxl)
    ctx.load_inputs_from_workbook("workbooks/lic-dsf-template-2025-08-12.xlsm")

Semantic search uses the llm library's embeddings database:
- DB location: ~/.config/io.datasette.llm/embeddings.db
- Collection name: lic-dsf-guidance
- Embedding model: text-embedding-3-small
When bootstrapping, src/lic_dsf_annotate.py:
- Splits the guidance note text into ~1500-character chunks
- Stores embeddings for those chunks in the lic-dsf-guidance collection
- Optionally writes chunk files under lic-dsf-chunks/ if none are present
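The chunking step above could look roughly like this. It is a minimal sketch, assuming a simple strategy of cutting at the last space within each 1500-character window; the script's actual splitting rule is not documented here, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, size: int = 1500) -> list[str]:
    """Split text into chunks of at most `size` characters, breaking at spaces when possible."""
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        if end < len(text):
            # Prefer the last space inside the window so words are not split.
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        start = end
    return chunks


# Toy input (assumption): 5000 characters of repeated words.
sample = "word " * 1000
chunks = chunk_text(sample)
print(len(chunks), max(len(c) for c in chunks))
```

Each chunk would then be embedded and stored under its own id in the collection.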
If you need to force a rebuild, delete the collection using the llm collections entrypoint:
    uv run llm collections list
    uv run llm collections delete lic-dsf-guidance

Then rerun:

    uv run python -m src.lic_dsf_annotate --template 2025-08-12