Lance Deep Learning - recipes

Dive into building deep learning pipelines using Lance datasets! This repository contains examples to help you use Lance datasets in your deep learning projects.

These are built using Lance, a free, open-source, columnar data format that requires no setup.
High-performance random access: More than 1000x faster than Parquet.
Zero-copy, automatic versioning: manage versions of your data automatically, and reduce redundancy with zero-copy logic built-in.

Join our community for support - Discord • Twitter

Why Lance

Convenience
The Lance columnar file format is designed for large-scale deep learning workloads. Its columnar layout lets you manage complex, unstructured, multi-modal datasets easily and efficiently. Updates, filtering, and zero-copy versioning let you iterate faster on large datasets. It is designed for images, videos, 3D point clouds, audio, and of course tabular data. It supports any POSIX file system as well as cloud storage such as AWS S3 and Google Cloud Storage.

Performance
The Lance format supports fast reads and writes, which makes data loading during training significantly faster.

Dataset Examples

Examples of how to convert existing datasets to the Lance format.

Example	Scripts	Read The Blog!
Creating text dataset for LLM pre-training
Creating Instruction dataset for LLM fine-tuning
Creating Image Captioning Dataset for Multi-Modal Model Training

Training Examples

Practical examples showcasing how to adapt your Lance dataset to popular deep learning projects.

Example	Notebook & Scripts
PEFT Supervised Fine-tuning of Gemma using Huggingface Trainer
LLM pre-training
COCO Image segmentation
FSDP LLM pre-training
Wikiart Diffusion Training
CLIP Training
Image Classification
Training a Variational AutoEncoder from scratch with Lance file format

Contributing Examples

If you're working on some cool deep learning examples using Lance that you'd like to add to this repo, please open a PR! More detailed instructions on contributing can be found on the CONTRIBUTING.md page.

lancedb/lance-deeplearning-recipes
