Datasets Commands

Commands for interacting with Kaggle datasets.

`kaggle datasets list`

Lists available datasets.

Usage:

kaggle datasets list [options]

Options:

--sort-by <SORT_BY>: Sort results. Valid options: hottest, votes, updated, active (default: hottest).
--size <SIZE_CATEGORY>: DEPRECATED. Use --min-size and --max-size.
--file-type <FILE_TYPE>: Filter by file type. Valid options: all, csv, sqlite, json, bigQuery.
--license <LICENSE_NAME>: Filter by license. Valid options: all, cc, gpl, odb, other.
--tags <TAG_IDS>: Filter by tags (comma-separated tag IDs).
-s, --search <SEARCH_TERM>: Search term.
-m, --mine: Display only your datasets.
--user <USER>: Filter by a specific user or organization.
-p, --page <PAGE>: Page number for results (default: 1).
-v, --csv: Print results in CSV format.
--max-size <BYTES>: Maximum dataset size in bytes.
--min-size <BYTES>: Minimum dataset size in bytes.

Examples:

List your own datasets:
```
kaggle datasets list -m
```
List CSV datasets, page 2, sorted by last updated, matching the search term "student", with size between 13000 and 15000 bytes:
```
kaggle datasets list --file-type csv --page 2 --sort-by updated -s student --min-size 13000 --max-size 15000
```
List datasets with an ODB license, tagged with "internet", and matching the search term "telco":
```
kaggle datasets list --license odb --tags internet --search telco
```

Purpose:

This command helps you find datasets on Kaggle based on various criteria like owner, file type, tags, and size.
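
The size filters take raw byte counts, which are easy to mistype. The following is a minimal sketch of a wrapper that accepts bounds in MiB and converts them. The function name `list_csv_between_mib` and the choice of 1 MiB = 1048576 bytes are assumptions made for illustration and are not part of the CLI; only options documented above are passed through.

```shell
# Hypothetical helper: list CSV datasets whose size falls between two MiB bounds.
# The MiB-to-bytes conversion is our own convention, not something the CLI defines.
list_csv_between_mib() {
  min_mib=$1
  max_mib=$2
  # --min-size and --max-size expect bytes, so convert before calling the CLI.
  min_bytes=$((min_mib * 1048576))
  max_bytes=$((max_mib * 1048576))
  kaggle datasets list --file-type csv --sort-by updated \
    --min-size "$min_bytes" --max-size "$max_bytes"
}
```

For example, `list_csv_between_mib 1 2` would request CSV datasets between 1 MiB and 2 MiB.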

`kaggle datasets files`

Lists files for a specific dataset.

Usage:

kaggle datasets files <DATASET> [options]

Arguments:

<DATASET>: Dataset URL suffix in the format owner/dataset-name (e.g., kerneler/brazilian-bird-observation-metadata-from-wikiaves).

Options:

-v, --csv: Print results in CSV format.
--page-token <PAGE_TOKEN>: Page token for results paging.
--page-size <PAGE_SIZE>: Number of items to show on a page (default: 20, max: 200).

Example:

List the first 7 files for the dataset kerneler/brazilian-bird-observation-metadata-from-wikiaves:

kaggle datasets files kerneler/brazilian-bird-observation-metadata-from-wikiaves --page-size=7

Purpose:

Use this command to see the individual files within a dataset before downloading.
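
A common follow-up is checking whether a dataset contains a particular file before downloading it. The sketch below assumes that the `--csv` output starts with a header row and carries the file name in its first column; that layout is an assumption about the output, so verify it against your CLI version. The function name `dataset_has_file` is hypothetical, and only the first page (up to 200 files) is inspected.

```shell
# Hypothetical helper: succeed if <name> appears in the first page of a dataset's file list.
# Assumes CSV output with a header line and the file name in column 1.
dataset_has_file() {
  ref=$1    # owner/dataset-name
  name=$2   # exact file name to look for
  kaggle datasets files "$ref" --csv --page-size 200 |
    awk -F, -v want="$name" 'NR > 1 && $1 == want { found = 1 } END { exit !found }'
}
```

The function prints nothing and reports the result through its exit status, so it can be used directly in an `if` statement.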

`kaggle datasets download`

Downloads dataset files.

Usage:

kaggle datasets download <DATASET> [options]

Arguments:

<DATASET>: Dataset URL suffix (e.g., willianoliveiragibin/pixar-films).

Options:

-f, --file <FILE_NAME>: Specific file to download (downloads all if not specified).
-p, --path <PATH>: Folder to download files to (defaults to current directory).
-w, --wp: Download files to the current working path.
--unzip: Unzip the downloaded file (deletes the .zip file afterwards).
-o, --force: Force download, overwriting existing files.
-q, --quiet: Suppress verbose output.

Examples:

Download all files for the dataset willianoliveiragibin/pixar-films:
```
kaggle datasets download willianoliveiragibin/pixar-films
```
Download the dataset goefft/public-datasets-with-file-types-and-columns, unzip it into the tmp folder, overwriting if necessary, and suppress output:
```
kaggle datasets download goefft/public-datasets-with-file-types-and-columns -p tmp --unzip -o -q
```
Download the specific file dataset_results.csv from goefft/public-datasets-with-file-types-and-columns to the current working directory, quietly, and force overwrite:
```
kaggle datasets download goefft/public-datasets-with-file-types-and-columns -f dataset_results.csv -w -q -o
```

Purpose:

This command allows you to retrieve dataset files for local use.
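
When several datasets are downloaded into one project, it can help to keep each in its own folder. The sketch below derives the folder name from the part of the dataset reference after the slash. The function name `fetch_dataset` and the default parent folder `data` are assumptions for illustration; the CLI options used are the ones listed above.

```shell
# Hypothetical helper: download a dataset and unzip it under <base>/<dataset-name>.
fetch_dataset() {
  ref=$1            # owner/dataset-name
  base=${2:-data}   # assumed default parent folder
  slug=${ref##*/}   # text after the last slash in the reference
  dest="$base/$slug"
  mkdir -p "$dest" || return 1
  kaggle datasets download "$ref" -p "$dest" --unzip -q
}
```

For example, `fetch_dataset willianoliveiragibin/pixar-films` would place the files under `data/pixar-films`.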

`kaggle datasets init`

Initializes a metadata file (dataset-metadata.json) for creating a new dataset. See metadata file format.

Usage:

kaggle datasets init -p <FOLDER_PATH>

Options:

-p, --path <FOLDER_PATH>: The path to the folder where the dataset-metadata.json file will be created (defaults to the current directory).

Example:

Initialize a dataset metadata file in the tests/dataset folder:

kaggle datasets init -p tests/dataset

Purpose:

This command creates a template dataset-metadata.json file that you need to edit before creating a new dataset on Kaggle. This file contains information like the dataset title, ID (slug), and licenses.
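
Editing the template can be scripted. The sketch below fills in the `INSERT_TITLE_HERE` and `INSERT_SLUG_HERE` placeholders, the names used in the `kaggle datasets create` example in this document. It writes through a temporary file rather than `sed -i`, because the `-i` syntax differs between GNU and BSD sed. The function name `fill_metadata` is hypothetical, and the title and slug are assumed to contain no `/`, `&`, or backslash characters.

```shell
# Hypothetical helper: replace the title and slug placeholders in dataset-metadata.json.
fill_metadata() {
  file=$1/dataset-metadata.json
  title=$2
  slug=$3
  [ -f "$file" ] || { echo "missing $file" >&2; return 1; }
  # Assumes title and slug hold no '/', '&', or backslash (they are special to sed here).
  sed -e "s/INSERT_TITLE_HERE/$title/" -e "s/INSERT_SLUG_HERE/$slug/" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}
```

For example, `fill_metadata tests/dataset "My Dataset Title" my-dataset-slug` edits the file in `tests/dataset` in place.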

`kaggle datasets create`

Creates a new dataset on Kaggle.

Usage:

kaggle datasets create -p <FOLDER_PATH> [options]

Options:

-p, --path <FOLDER_PATH>: Path to the folder containing the data files and the dataset-metadata.json file (defaults to the current directory).
-u, --public: Make the dataset public (default is private).
-q, --quiet: Suppress verbose output.
-t, --keep-tabular: Do not convert tabular files to CSV (default is to convert).
-r, --dir-mode <MODE>: How to handle directories: skip (ignore), zip (compressed upload), tar (uncompressed upload) (default: skip).

Example:

Create a new public dataset from the files in tests/dataset, quietly, without converting tabular files, and skipping subdirectories. (Assumes dataset-metadata.json in tests/dataset has been properly edited with title and slug):

# Example: Edit dataset-metadata.json first
# sed -i 's/INSERT_TITLE_HERE/My Dataset Title/' tests/dataset/dataset-metadata.json
# sed -i 's/INSERT_SLUG_HERE/my-dataset-slug/' tests/dataset/dataset-metadata.json

kaggle datasets create -p tests/dataset --public -q -t -r skip

Purpose:

This command uploads your local data files and the associated metadata to create a new dataset on Kaggle.

`kaggle datasets version`

Creates a new version of an existing dataset.

Usage:

kaggle datasets version -p <FOLDER_PATH> -m <VERSION_NOTES> [options]

Options:

-p, --path <FOLDER_PATH>: Path to the folder containing the updated data files and dataset-metadata.json (defaults to current directory).
-m, --message <VERSION_NOTES>: (Required) Message describing the new version.
-q, --quiet: Suppress verbose output.
-t, --keep-tabular: Do not convert tabular files to CSV.
-r, --dir-mode <MODE>: Directory handling mode (skip, zip, tar).
-d, --delete-old-versions: Delete old versions of this dataset.

Example:

Create a new version of a dataset using files from tests/dataset with version notes "Updated data", quietly, keeping tabular formats, skipping directories, and deleting old versions:

kaggle datasets version -m "Updated data" -p tests/dataset -q -t -r skip -d

Purpose:

Use this command to update an existing dataset with new files or metadata changes.
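
For scheduled refreshes, the version message can be generated automatically. The sketch below checks that the folder contains `dataset-metadata.json` and appends the current UTC date to the notes. The function name `publish_version` and the message format are assumptions chosen for illustration.

```shell
# Hypothetical helper: publish a new dataset version with a dated message.
publish_version() {
  dir=$1    # folder with the data files and dataset-metadata.json
  note=$2   # free-text version notes
  if [ ! -f "$dir/dataset-metadata.json" ]; then
    echo "no dataset-metadata.json in $dir" >&2
    return 1
  fi
  # Append the UTC date so each version message is distinguishable.
  kaggle datasets version -p "$dir" -m "$note ($(date -u +%Y-%m-%d))"
}
```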

`kaggle datasets metadata`

Downloads the metadata for a dataset, or updates the dataset's existing metadata on Kaggle from a local metadata file.

Usage:

kaggle datasets metadata <DATASET> [options]

Arguments:

<DATASET>: Dataset URL suffix (e.g., goefft/public-datasets-with-file-types-and-columns).

Options:

-p, --path <PATH>: Directory to download/update metadata file (dataset-metadata.json). Defaults to current working directory.
--update: Update the existing dataset version's metadata using the contents of the local metadata JSON file (that is, "push" the local file to Kaggle).

Example:

Download metadata for the dataset goefft/public-datasets-with-file-types-and-columns into the tests/dataset folder:

kaggle datasets metadata goefft/public-datasets-with-file-types-and-columns -p tests/dataset

Purpose:

This command fetches the dataset-metadata.json file for an existing dataset, which is useful for inspection or as a template for creating a new version. With --update, it pushes local metadata edits back to Kaggle instead.

`kaggle datasets status`

Gets the creation status of a dataset.

Usage:

kaggle datasets status <DATASET>

Arguments:

<DATASET>: Dataset URL suffix (e.g., goefft/public-datasets-with-file-types-and-columns).

Example:

Get the status of the dataset goefft/public-datasets-with-file-types-and-columns:

kaggle datasets status goefft/public-datasets-with-file-types-and-columns

Purpose:

After creating or updating a dataset, this command helps you check if the process was successful or if there were any issues.
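
Because processing can take a while, the status check is often run in a loop. The sketch below polls until the output contains the word "ready". That status string is an assumption about what the CLI prints, so adjust the match if your version reports something different. The function name `wait_for_dataset` and its default attempt count and delay are hypothetical.

```shell
# Hypothetical helper: poll `kaggle datasets status` until it reports "ready".
# Assumes the command prints a short status string containing "ready" on success.
wait_for_dataset() {
  ref=$1
  attempts=${2:-10}   # maximum number of checks
  delay=${3:-30}      # seconds to wait between checks
  i=1
  while [ "$i" -le "$attempts" ]; do
    status=$(kaggle datasets status "$ref")
    echo "attempt $i: $status"
    case "$status" in
      *ready*) return 0 ;;
    esac
    sleep "$delay"
    i=$((i + 1))
  done
  return 1   # never became ready within the allowed attempts
}
```

The function returns 0 once the dataset is ready and 1 if the attempts run out, so a script can stop when the dataset never finishes processing.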

`kaggle datasets delete`

Deletes a dataset from Kaggle.

Usage:

kaggle datasets delete <DATASET> [options]

Arguments:

<DATASET>: Dataset URL suffix (e.g., username/dataset-slug).

Options:

-y, --yes: Automatically confirm deletion without prompting.

Example:

Delete the dataset username/dataset-slug and automatically confirm:

kaggle datasets delete username/dataset-slug --yes

Purpose:

This command permanently removes one of your datasets from Kaggle. Use with caution.
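
Since deletion is permanent, scripts may want a guard in front of it. The sketch below rejects any reference that is not exactly of the form owner/dataset-slug and deliberately omits `--yes`, so the CLI still asks for confirmation. The function name `safe_delete` is hypothetical.

```shell
# Hypothetical guard: only pass well-formed owner/dataset-slug references to delete.
safe_delete() {
  ref=$1
  case "$ref" in
    */*/* | /* | */ | *" "*)
      echo "refusing: '$ref' is not owner/dataset-slug" >&2
      return 1 ;;
    */*) ;;   # exactly one slash with text on both sides
    *)
      echo "refusing: '$ref' is not owner/dataset-slug" >&2
      return 1 ;;
  esac
  # No --yes here, so the CLI prompts before deleting.
  kaggle datasets delete "$ref"
}
```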

`kaggle datasets topics list`

Lists discussion topics for a dataset.

Usage:

kaggle datasets topics list <DATASET> [options]

Arguments:

<DATASET>: Dataset ref in format <owner>/<dataset-slug> (e.g., zillow/zecon).

Options:

--sort-by <SORT_BY>: Sort order. Valid options: hot, top, new, recent, active, relevance.
-s, --search <SEARCH_TERM>: Search query to filter topics.
--page-size <PAGE_SIZE>: Number of items per page.
--page-token <PAGE_TOKEN>: Page token for pagination.
-v, --csv: Print results in CSV format.
-q, --quiet: Suppress verbose output.

Example:

List recent topics for the zillow/zecon dataset:

kaggle datasets topics list zillow/zecon --sort-by recent

Purpose:

This command lets you browse discussion topics for a specific dataset.

`kaggle datasets topics show`

Displays a dataset discussion topic with all comments in tree form.

Usage:

kaggle datasets topics show <TOPIC_REF> [options]

Arguments:

<TOPIC_REF>: A topic reference, which can be:
- <dataset>/<topic-id> (e.g., zillow/zecon/12345 - note that this supports multi-slash dataset slugs)
- <dataset> <topic-id> (two separate arguments, where <topic-id> is passed as second argument)
- <topic-id> (bare numeric ID)

Options:

--page-size <PAGE_SIZE>: Number of comments to show per page.
--page-token <PAGE_TOKEN>: Page token for comment pagination.
-v, --csv: Print results in CSV format.
-q, --quiet: Suppress verbose output.

Example:

Show topic 12345 of the zillow/zecon dataset together with its comments:

kaggle datasets topics show zillow/zecon/12345

Purpose:

This command displays a full discussion topic along with all of its comments rendered in an indented tree structure.
