A lightweight Python CLI utility to automate dataset downloads, organization, and management for ML/DL workflows — starting with Kaggle, built for speed and structure.

hashedUser/dsfetch

dsfetch

dsfetch is a lightweight, CLI-first tool for machine learning practitioners who want to search, preview, and download datasets from platforms such as Kaggle, Hugging Face, and UCI without leaving the terminal. Support starts with Kaggle. With a clean, unified interface and support for platform-specific identifiers, dsfetch streamlines dataset discovery and retrieval, making experimentation faster, more reproducible, and less dependent on the browser. Whether you're prototyping a new model or curating datasets for training, dsfetch gets you the data you need, fast.

⚙️ Core Features (MVP)

  • 🔹 CLI interface to input Kaggle dataset handles
  • 🔹 Automated dataset download using kagglehub
  • 🔹 Directory-based organization under a central /datasets folder
  • 🔹 Smart detection of file vs. directory downloads
  • 🔹 Optional renaming and clean-up logic
  • 🔹 Local environment ready — no cloud dependency
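
The features above could fit together roughly as follows. This is a minimal sketch, not the project's actual code: the names `DATASETS_ROOT`, `default_name`, `organize`, `fetch`, and `main`, the `--name` flag, and the relative `datasets` folder are all assumptions made for illustration. The only external call is `kagglehub.dataset_download`, which returns the local path of the downloaded dataset.

```python
import argparse
import shutil
from pathlib import Path
from typing import List, Optional

# Assumed location of the central datasets folder (hypothetical default).
DATASETS_ROOT = Path("datasets")


def default_name(handle: str) -> str:
    """Derive a folder name from a handle such as 'owner/dataset-name'."""
    return handle.strip("/").split("/")[-1]


def organize(downloaded: Path, root: Path, name: str) -> Path:
    """Copy a downloaded file or directory into root/name and return that path."""
    target = root / name
    if target.is_dir():
        shutil.rmtree(target)  # clean-up: replace any stale copy
    if downloaded.is_dir():
        # Directory download: mirror the whole tree (parents are created).
        shutil.copytree(downloaded, target)
    else:
        # Single-file download: wrap it in its own folder.
        target.mkdir(parents=True)
        shutil.copy2(downloaded, target / downloaded.name)
    return target


def fetch(handle: str, root: Path = DATASETS_ROOT, name: Optional[str] = None) -> Path:
    """Download a Kaggle dataset and place it under the datasets root."""
    import kagglehub  # imported lazily so the helpers above work without it

    cached = Path(kagglehub.dataset_download(handle))
    return organize(cached, root, name or default_name(handle))


def main(argv: Optional[List[str]] = None) -> None:
    """Hypothetical CLI entry point: dsfetch <handle> [--name NAME]."""
    parser = argparse.ArgumentParser(prog="dsfetch")
    parser.add_argument("handle", help="Kaggle dataset handle, e.g. owner/dataset-name")
    parser.add_argument("--name", help="optional folder name under the datasets root")
    args = parser.parse_args(argv)
    print(fetch(args.handle, name=args.name))
```

Calling `main(["owner/dataset-name", "--name", "my-data"])` would download the dataset and copy it to `datasets/my-data`, assuming Kaggle credentials are configured where the dataset requires them. The sketch copies rather than moves, so the kagglehub cache stays intact and repeated fetches can reuse it.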

🧗‍♂️ Vision

The long-term vision is to evolve this into a general-purpose dataset orchestration framework — integrating intelligent agents, metadata tracking, and reproducible workflows — so developers can spend less time managing datasets and more time building great models.

🤖 Built with ❤️ by a data + automation enthusiast
