Rules-Based llms.txt generation #64

@malcolmgreaves

Action

  • Add a purely rules/heuristics-based (i.e. non-LLM) method for generating an llms.txt from HTML.
  • Use explicitly marked elements (title, description, etc.).
  • Deterministic: the same HTML input always produces the same llms.txt.
  • Refactor the llms.txt generation traits to support both rules-based and LLM-based methods.

  • Allow CLI + worker to use the rules-based method.

  • Add a new prompt that combines the rules-based llms.txt output with an LLM to produce the final llms.txt file.

  • Allow CLI + worker to use this method too.

  • Update the llms_txt table to include a "generation_method" column.

    • Specifies how the llms.txt file was generated.
    • Values are enum variants: LLM(provider: str), Rules, RulesLLM(provider: str), Other(unknown: str)
      • The Other variant lets us handle future migrations: encode a new variant as a formatted string in unknown, then update the enum and backfill.
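
As a rough illustration of the "generation_method" values listed above, here is a minimal Rust sketch of the enum with a string round trip suitable for a text column. The type name `GenerationMethod`, the Rust-style variant spellings, and the stored string forms (`llm:<provider>`, `rules`, `rules_llm:<provider>`) are all assumptions made for this example, not something the issue specifies. Any string that is not recognised is preserved verbatim in `Other`, which is one possible way to get the migration behaviour described for that variant.

```rust
use std::convert::Infallible;
use std::fmt;
use std::str::FromStr;

/// How an llms.txt file was produced (hypothetical shape for the
/// proposed "generation_method" column).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum GenerationMethod {
    Llm { provider: String },
    Rules,
    RulesLlm { provider: String },
    /// Catch-all: keeps the raw stored text so rows written by a newer
    /// schema can still be read, then backfilled once the enum is updated.
    Other { unknown: String },
}

impl fmt::Display for GenerationMethod {
    // Assumed encoding: "<tag>" or "<tag>:<provider>".
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Llm { provider } => write!(f, "llm:{provider}"),
            Self::Rules => write!(f, "rules"),
            Self::RulesLlm { provider } => write!(f, "rules_llm:{provider}"),
            Self::Other { unknown } => write!(f, "{unknown}"),
        }
    }
}

impl FromStr for GenerationMethod {
    // Parsing never fails: unrecognised text falls through to `Other`.
    type Err = Infallible;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(match s.split_once(':') {
            Some(("llm", provider)) => Self::Llm { provider: provider.to_string() },
            Some(("rules_llm", provider)) => Self::RulesLlm { provider: provider.to_string() },
            None if s == "rules" => Self::Rules,
            _ => Self::Other { unknown: s.to_string() },
        })
    }
}
```

With this encoding, parsing a stored value and formatting it again yields the original text, so an unrecognised value survives a read/write cycle unchanged. One consequence of the design is that an `Other` whose text happens to match a known encoding will parse as that known variant on the next read.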

Rationale

  • Need to make better use of explicit HTML structure elements.
  • Need to lay groundwork for experimentation on llms.txt file generation.
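
To make "explicit HTML structure elements" concrete, the following is a small, std-only Rust sketch of a deterministic extractor that reads the `<title>` element and the `<meta name="description">` tag and emits an llms.txt-style header (an H1 followed by a blockquote summary). All function names here are hypothetical, and the plain string scanning is a simplification chosen for this example: it assumes double-quoted attributes and does not decode HTML entities. A real implementation would more likely use a proper HTML parser.

```rust
/// Text between the first `<title ...>` and `</title>`, trimmed.
pub fn extract_title(html: &str) -> Option<String> {
    // ASCII lowercasing keeps byte offsets aligned with the original text.
    let lower = html.to_ascii_lowercase();
    let open = lower.find("<title")?;
    let start = open + lower[open..].find('>')? + 1;
    let end = start + lower[start..].find("</title>")?;
    let title = html[start..end].trim();
    if title.is_empty() { None } else { Some(title.to_string()) }
}

/// `content` of the first `<meta name="description" ...>` tag.
pub fn extract_meta_description(html: &str) -> Option<String> {
    let lower = html.to_ascii_lowercase();
    let mut from = 0;
    while let Some(rel) = lower[from..].find("<meta") {
        let tag_start = from + rel;
        let tag_end = tag_start + lower[tag_start..].find('>')?;
        let tag = &lower[tag_start..tag_end];
        if tag.contains("name=\"description\"") {
            let key = "content=\"";
            let value_start = tag.find(key)? + key.len();
            let value_len = tag[value_start..].find('"')?;
            let abs = tag_start + value_start;
            return Some(html[abs..abs + value_len].trim().to_string());
        }
        from = tag_end;
    }
    None
}

/// Builds the header of an llms.txt file from explicit HTML metadata.
/// Returns `None` when the page has no usable title.
pub fn rules_based_header(html: &str) -> Option<String> {
    let title = extract_title(html)?;
    let mut out = format!("# {title}\n");
    if let Some(description) = extract_meta_description(html) {
        out.push_str(&format!("\n> {description}\n"));
    }
    Some(out)
}
```

Because the functions depend only on the input string, the same HTML always yields the same output, which is the determinism property the issue asks for.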
