Llama-Recipes Quickstart

If you are new to developing with Meta Llama models, this is where you should start. This folder contains introductory notebooks covering different techniques for working with Meta Llama.

  • The notebooks demonstrate how to run Llama inference across Linux, Mac and Windows platforms using the appropriate tooling.
  • The notebook showcases the various ways to elicit appropriate outputs from Llama. Take this notebook for a spin to get a feel for how Llama responds to different inputs and generation parameters.
  • The folder contains scripts to deploy Llama for inference on server and mobile. See also and for hosting Llama on open-source model servers.
  • The folder contains a simple Retrieval-Augmented Generation application using Llama 3.
  • The folder contains resources to help you finetune Llama 3 on your custom datasets, for both single- and multi-GPU setups. The scripts use the native llama-recipes finetuning code found in which supports these features:
      • HF support for finetuning
      • Deferred initialization (meta init)
      • HF support for inference
      • Low CPU mode for multi GPU
      • Mixed precision
      • Single node quantization
      • Flash attention
      • PEFT
      • Activation checkpointing FSDP
      • Hybrid Sharded Data Parallel (HSDP)
      • Dataset packing & padding
      • BF16 Optimizer (Pure BF16)
      • Profiling & MFU tracking
      • Gradient accumulation
      • CPU offloading
      • FSDP checkpoint conversion to HF for inference
      • W&B experiment tracker
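
To give a feel for one of the listed features, dataset packing, here is a minimal sketch in plain Python. It is not the llama-recipes implementation: the function name `pack_sequences`, its parameters, and the choice to drop the trailing partial chunk are assumptions made for illustration. The idea is to concatenate tokenized samples, separated by an end-of-sequence token, and cut the stream into fixed-length chunks so that little of each training batch is spent on padding.

```python
from typing import Iterable, List


def pack_sequences(
    samples: Iterable[List[int]], chunk_size: int, eos_id: int
) -> List[List[int]]:
    """Pack tokenized samples into fixed-length chunks.

    Hypothetical helper for illustration only; the names and the
    drop-the-remainder policy are assumptions, not the library's API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    buffer: List[int] = []
    chunks: List[List[int]] = []
    for sample in samples:
        # Append the sample followed by an end-of-sequence marker.
        buffer.extend(sample)
        buffer.append(eos_id)
        # Emit as many full chunks as the buffer currently holds.
        while len(buffer) >= chunk_size:
            chunks.append(buffer[:chunk_size])
            buffer = buffer[chunk_size:]
    # Any leftover tokens shorter than chunk_size are dropped here;
    # padding them to chunk_size would be the alternative.
    return chunks


if __name__ == "__main__":
    token_ids = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    for chunk in pack_sequences(token_ids, chunk_size=4, eos_id=0):
        print(chunk)
```

Every chunk returned has exactly `chunk_size` tokens, and a single sample may be split across two neighboring chunks.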