Suggestion: reference WFGY Problem Map (RAG / LLM debugging checklist) for Modin users #7696

@onestardao

Hi Modin team,

Thank you for making it easier to scale pandas-style workloads. I see Modin used more and more in pipelines where people prepare large corpora, logs, and features that eventually feed into vector stores and LLM / RAG systems.

I maintain an MIT-licensed project called WFGY Problem Map, which is a 16-question checklist for debugging real-world RAG / LLM pipelines. It focuses on data and retrieval failure modes that appear when you move from a notebook prototype to a production system.

Why this could matter for Modin users:

  • Modin is often used when the volume of documents or events is large enough that mistakes in preprocessing become very expensive to debug later.
  • Several of the 16 failure modes describe exactly these “worked on a sample, broke at scale” issues once an LLM and retriever are added.
  • The checklist is framework-agnostic and can be used no matter which backend Modin is running on.

External references for WFGY Problem Map include:

  • Harvard MIMS Lab ToolUniverse
  • QCRI LLM Lab Multimodal RAG Survey
  • Rankify (University of Innsbruck)

Suggestion:

If you think this is useful for your community, one option would be to add a small “Further reading” link in the docs for users who combine Modin with vector stores and LLMs:

“RAG / LLM debugging checklist: WFGY Problem Map (16 failure modes)”
https://github.com/onestardao/WFGY/blob/main/ProblemMap/README.md

Project home: https://github.com/onestardao/WFGY

Thank you for your time and for all your work on Modin.

Best,
PSBigBig
