Commands for interacting with Kaggle datasets.
Lists available datasets.
Usage:
kaggle datasets list [options]
Options:
--sort-by <SORT_BY>: Sort results. Valid options: hottest, votes, updated, active (default: hottest).
--size <SIZE_CATEGORY>: DEPRECATED. Use --min-size and --max-size.
--file-type <FILE_TYPE>: Filter by file type. Valid options: all, csv, sqlite, json, bigQuery.
--license <LICENSE_NAME>: Filter by license. Valid options: all, cc, gpl, odb, other.
--tags <TAG_IDS>: Filter by tags (comma-separated tag IDs).
-s, --search <SEARCH_TERM>: Search term.
-m, --mine: Display only your datasets.
--user <USER>: Filter by a specific user or organization.
-p, --page <PAGE>: Page number for results (default: 1).
-v, --csv: Print results in CSV format.
--max-size <BYTES>: Maximum dataset size in bytes.
--min-size <BYTES>: Minimum dataset size in bytes.
Examples:
- List your own datasets:
  kaggle datasets list -m
- List CSV datasets, page 2, sorted by last updated, containing "student" in their title, with size between 13000 and 15000 bytes:
  kaggle datasets list --file-type csv --page 2 --sort-by updated -s student --min-size 13000 --max-size 15000
- List datasets with an ODB license, tagged with "internet", and matching the search term "telco":
  kaggle datasets list --license odb --tags internet --search telco
Purpose:
This command helps you find datasets on Kaggle based on various criteria like owner, file type, tags, and size.
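For scripted searches, the filters above can be assembled into an argument list before the command is run. The following is a minimal Python sketch, not part of the Kaggle CLI: the helper name `build_list_command` and its keyword arguments are hypothetical, and it only builds and prints a command from the flags documented above rather than executing it.

```python
import shlex

# Valid values copied from the option descriptions above.
SORT_OPTIONS = {"hottest", "votes", "updated", "active"}
FILE_TYPES = {"all", "csv", "sqlite", "json", "bigQuery"}


def build_list_command(search=None, file_type=None, sort_by=None,
                       page=None, min_size=None, max_size=None, mine=False):
    """Build the argv for `kaggle datasets list` (hypothetical helper)."""
    if sort_by is not None and sort_by not in SORT_OPTIONS:
        raise ValueError(f"unsupported --sort-by value: {sort_by!r}")
    if file_type is not None and file_type not in FILE_TYPES:
        raise ValueError(f"unsupported --file-type value: {file_type!r}")

    argv = ["kaggle", "datasets", "list"]
    if mine:
        argv.append("--mine")
    # Emit only the flags the caller actually set.
    pairs = [
        ("--file-type", file_type),
        ("--page", page),
        ("--sort-by", sort_by),
        ("--search", search),
        ("--min-size", min_size),
        ("--max-size", max_size),
    ]
    for flag, value in pairs:
        if value is not None:
            argv += [flag, str(value)]
    return argv


# Rebuilds the second example above using long option names.
cmd = build_list_command(file_type="csv", page=2, sort_by="updated",
                         search="student", min_size=13000, max_size=15000)
print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run` once the Kaggle CLI is installed and authenticated; keeping it as a list avoids shell quoting problems with search terms that contain spaces.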
Lists files for a specific dataset.
Usage:
kaggle datasets files <DATASET> [options]
Arguments:
<DATASET>: Dataset URL suffix in the format owner/dataset-name (e.g., kerneler/brazilian-bird-observation-metadata-from-wikiaves).
Options:
-v, --csv: Print results in CSV format.
--page-token <PAGE_TOKEN>: Page token for results paging.
--page-size <PAGE_SIZE>: Number of items to show on a page (default: 20, max: 200).
Example:
List the first 7 files for the dataset kerneler/brazilian-bird-observation-metadata-from-wikiaves:
kaggle datasets files kerneler/brazilian-bird-observation-metadata-from-wikiaves --page-size=7
Purpose:
Use this command to see the individual files within a dataset before downloading.
Downloads dataset files.
Usage:
kaggle datasets download <DATASET> [options]
Arguments:
<DATASET>: Dataset URL suffix (e.g., willianoliveiragibin/pixar-films).
Options:
-f, --file <FILE_NAME>: Specific file to download (downloads all if not specified).
-p, --path <PATH>: Folder to download files to (defaults to current directory).
-w, --wp: Download files to the current working path.
--unzip: Unzip the downloaded file (deletes the .zip file afterwards).
-o, --force: Force download, overwriting existing files.
-q, --quiet: Suppress verbose output.
Examples:
- Download all files for the dataset willianoliveiragibin/pixar-films:
  kaggle datasets download -d willianoliveiragibin/pixar-films
- Download the dataset goefft/public-datasets-with-file-types-and-columns, unzip it into the tmp folder, overwriting if necessary, and suppress output:
  kaggle datasets download goefft/public-datasets-with-file-types-and-columns -p tmp --unzip -o -q
- Download the specific file dataset_results.csv from goefft/public-datasets-with-file-types-and-columns to the current working directory, quietly, and force overwrite:
  kaggle datasets download goefft/public-datasets-with-file-types-and-columns -f dataset_results.csv -w -q -o
Purpose:
This command allows you to retrieve dataset files for local use.
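When downloads are driven from a script, it can help to check the dataset reference before building the command. The sketch below is an illustration under stated assumptions: `split_dataset_ref` and `build_download_command` are hypothetical helpers, the check only confirms that an owner and a dataset name are both present, and nothing is executed.

```python
import shlex


def split_dataset_ref(ref):
    """Split 'owner/dataset-name' at the first slash (hypothetical helper).

    Only checks that both parts are non-empty; it does not verify that
    the dataset exists on Kaggle.
    """
    owner, sep, name = ref.partition("/")
    if not sep or not owner or not name:
        raise ValueError(f"expected owner/dataset-name, got {ref!r}")
    return owner, name


def build_download_command(dataset, file_name=None, path=None,
                           unzip=False, force=False, quiet=False):
    """Build the argv for `kaggle datasets download` (hypothetical helper)."""
    split_dataset_ref(dataset)  # raises ValueError on a malformed reference
    argv = ["kaggle", "datasets", "download", dataset]
    if file_name is not None:
        argv += ["--file", file_name]
    if path is not None:
        argv += ["--path", path]
    if unzip:
        argv.append("--unzip")
    if force:
        argv.append("--force")
    if quiet:
        argv.append("--quiet")
    return argv


# Rebuilds the second example above using long option names.
cmd = build_download_command(
    "goefft/public-datasets-with-file-types-and-columns",
    path="tmp", unzip=True, force=True, quiet=True,
)
print(shlex.join(cmd))
```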
Initializes a metadata file (dataset-metadata.json) for creating a new dataset. See metadata file format.
Usage:
kaggle datasets init -p <FOLDER_PATH>
Options:
-p, --path <FOLDER_PATH>: The path to the folder where the dataset-metadata.json file will be created (defaults to the current directory).
Example:
Initialize a dataset metadata file in the tests/dataset folder:
kaggle datasets init -p tests/dataset
Purpose:
This command creates a template dataset-metadata.json file that you need to edit before creating a new dataset on Kaggle. This file contains information like the dataset title, ID (slug), and licenses.
Creates a new dataset on Kaggle.
Usage:
kaggle datasets create -p <FOLDER_PATH> [options]
Options:
-p, --path <FOLDER_PATH>: Path to the folder containing the data files and the dataset-metadata.json file (defaults to the current directory).
-u, --public: Make the dataset public (default is private).
-q, --quiet: Suppress verbose output.
-t, --keep-tabular: Do not convert tabular files to CSV (default is to convert).
-r, --dir-mode <MODE>: How to handle directories: skip (ignore), zip (compressed upload), tar (uncompressed upload) (default: skip).
Example:
Create a new public dataset from the files in tests/dataset, quietly, without converting tabular files, and skipping subdirectories. (Assumes dataset-metadata.json in tests/dataset has been properly edited with title and slug):
# Example: Edit dataset-metadata.json first
# sed -i 's/INSERT_TITLE_HERE/My Dataset Title/' tests/dataset/dataset-metadata.json
# sed -i 's/INSERT_SLUG_HERE/my-dataset-slug/' tests/dataset/dataset-metadata.json
kaggle datasets create -p tests/dataset --public -q -t -r skip
Purpose:
This command uploads your local data files and the associated metadata to create a new dataset on Kaggle.
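The commented sed commands in the example above replace the INSERT_TITLE_HERE and INSERT_SLUG_HERE placeholders in dataset-metadata.json. A portable equivalent can be sketched in Python. The helper name `fill_placeholders` is hypothetical, and the stand-in file written below uses an assumed JSON layout for demonstration only; it is not the exact template that `kaggle datasets init` produces.

```python
import tempfile
from pathlib import Path


def fill_placeholders(metadata_path, title, slug):
    """Replace the template placeholders in place (hypothetical helper)."""
    path = Path(metadata_path)
    text = path.read_text(encoding="utf-8")
    text = text.replace("INSERT_TITLE_HERE", title)
    text = text.replace("INSERT_SLUG_HERE", slug)
    path.write_text(text, encoding="utf-8")
    return text


# Demonstration on a temporary stand-in file (assumed layout, see above).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "dataset-metadata.json"
    sample.write_text(
        '{"title": "INSERT_TITLE_HERE", "id": "username/INSERT_SLUG_HERE"}',
        encoding="utf-8",
    )
    result = fill_placeholders(sample, "My Dataset Title", "my-dataset-slug")
    print(result)
```

Like the sed version, this is plain text substitution, so a title containing double quotes or backslashes would need JSON escaping before being inserted.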
Creates a new version of an existing dataset.
Usage:
kaggle datasets version -p <FOLDER_PATH> -m <VERSION_NOTES> [options]
Options:
-p, --path <FOLDER_PATH>: Path to the folder containing the updated data files and dataset-metadata.json (defaults to current directory).
-m, --message <VERSION_NOTES>: (Required) Message describing the new version.
-q, --quiet: Suppress verbose output.
-t, --keep-tabular: Do not convert tabular files to CSV.
-r, --dir-mode <MODE>: Directory handling mode (skip, zip, tar).
-d, --delete-old-versions: Delete old versions of this dataset.
Example:
Create a new version of a dataset using files from tests/dataset with version notes "Updated data", quietly, keeping tabular formats, skipping directories, and deleting old versions:
kaggle datasets version -m "Updated data" -p tests/dataset -q -t -r skip -d
Purpose:
Use this command to update an existing dataset with new files or metadata changes.
Downloads metadata for a dataset, or updates the existing dataset's metadata from a local metadata file.
Usage:
kaggle datasets metadata <DATASET> [options]
Arguments:
<DATASET>: Dataset URL suffix (e.g., goefft/public-datasets-with-file-types-and-columns).
Options:
-p, --path <PATH>: Directory to download/update metadata file (dataset-metadata.json). Defaults to current working directory.
--update: Update the existing dataset version's metadata using the contents of the local metadata JSON file. (e.g. "push" from local)
Example:
Download metadata for the dataset goefft/public-datasets-with-file-types-and-columns into the tests/dataset folder:
kaggle datasets metadata goefft/public-datasets-with-file-types-and-columns -p tests/dataset
Purpose:
This command allows you to fetch the dataset-metadata.json file for an existing dataset, which can be useful for inspection or as a template for creating a new version.
Gets the creation status of a dataset.
Usage:
kaggle datasets status <DATASET>
Arguments:
<DATASET>: Dataset URL suffix (e.g., goefft/public-datasets-with-file-types-and-columns).
Example:
Get the status of the dataset goefft/public-datasets-with-file-types-and-columns:
kaggle datasets status goefft/public-datasets-with-file-types-and-columns
Purpose:
After creating or updating a dataset, this command helps you check if the process was successful or if there were any issues.
Deletes a dataset from Kaggle.
Usage:
kaggle datasets delete <DATASET> [options]
Arguments:
<DATASET>: Dataset URL suffix (e.g., username/dataset-slug).
Options:
-y, --yes: Automatically confirm deletion without prompting.
Example:
Delete the dataset username/dataset-slug and automatically confirm:
kaggle datasets delete username/dataset-slug --yes
Purpose:
This command permanently removes one of your datasets from Kaggle. Use with caution.
Lists discussion topics for a dataset.
Usage:
kaggle datasets topics list <DATASET> [options]
Arguments:
<DATASET>: Dataset ref in format <owner>/<dataset-slug> (e.g., zillow/zecon).
Options:
--sort-by <SORT_BY>: Sort order. Valid options: hot, top, new, recent, active, relevance.
-s, --search <SEARCH_TERM>: Search query to filter topics.
--page-size <PAGE_SIZE>: Number of items per page.
--page-token <PAGE_TOKEN>: Page token for pagination.
-v, --csv: Print results in CSV format.
-q, --quiet: Suppress verbose output.
Example:
List recent topics for the zillow/zecon dataset:
kaggle datasets topics list zillow/zecon --sort-by recent
Purpose:
This command lets you browse discussion topics for a specific dataset.
Displays a dataset discussion topic with all comments in tree form.
Usage:
kaggle datasets topics show <TOPIC_REF> [options]
Arguments:
<TOPIC_REF>: A topic reference, which can be:
  - <dataset>/<topic-id> (e.g., zillow/zecon/12345; note that this supports multi-slash dataset slugs)
  - <dataset> <topic-id> (two separate arguments, where <topic-id> is passed as the second argument)
  - <topic-id> (bare numeric ID)
Options:
--page-size <PAGE_SIZE>: Number of comments to show per page.
--page-token <PAGE_TOKEN>: Page token for comment pagination.
-v, --csv: Print results in CSV format.
-q, --quiet: Suppress verbose output.
Example:
kaggle datasets topics show zillow/zecon/12345
Purpose:
This command displays a full discussion topic along with all of its comments rendered in an indented tree structure.
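The three accepted <TOPIC_REF> forms can be told apart by splitting on the last slash, which is also why slashes inside the dataset part do not cause ambiguity. The following Python sketch illustrates that idea only. `parse_topic_ref` is a hypothetical helper, and how the CLI itself parses the argument is not stated in this document.

```python
def parse_topic_ref(*args):
    """Return (dataset, topic_id) for the documented TOPIC_REF forms.

    Illustrative only; this is not the CLI's own parser. `dataset` is
    None when a bare numeric ID is given.
    """
    if len(args) == 2:
        # Form: <dataset> <topic-id> as two separate arguments.
        dataset, topic = args
    elif len(args) == 1:
        # Forms: <dataset>/<topic-id> or a bare <topic-id>.
        # rpartition splits on the LAST slash, so any slashes inside
        # the dataset part are preserved.
        dataset, _, topic = args[0].rpartition("/")
    else:
        raise ValueError("expected one or two arguments")
    if not topic.isdecimal():
        raise ValueError(f"topic id must be numeric, got {topic!r}")
    return (dataset or None, int(topic))


print(parse_topic_ref("zillow/zecon/12345"))
print(parse_topic_ref("zillow/zecon", "12345"))
print(parse_topic_ref("12345"))
```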