Pull request #1 (Open): wants to merge 150 commits into base: master
c9ad067
parallelize multi-head attention
leloykun Jul 24, 2023
c9b1f10
Speed up rmsnorm by using sqrtf/expf
kris-jusiak Jul 24, 2023
90ae37c
git push origin masterMerge branch 'admu-progvar-master'
karpathy Jul 24, 2023
791be9d
tweak argparse. fix steps=256, even if some models may support longer…
karpathy Jul 24, 2023
687473c
Update README.md with TinyStories model series
karpathy Jul 24, 2023
669b75d
Merge pull request #43 from krzysztof-jusiak/rmsnorm
karpathy Jul 24, 2023
bf9f6f2
Add discord link to Readme
karpathy Jul 24, 2023
16edfe6
add a simple makefile
karpathy Jul 24, 2023
2be7d78
MSVC Compatibility fix for timer
richinseattle Jul 24, 2023
e6e3f13
candidate memmap implementation
karpathy Jul 24, 2023
496466f
add rundebug to makefile, useful for spotting issues and such
karpathy Jul 24, 2023
cae88df
tune readme around timings etc
karpathy Jul 24, 2023
f121f5f
Merge branch 'karpathy:master' into richinseattle-patch-1
richinseattle Jul 24, 2023
b2857c6
Switch to using timespec_get() for cross OS compatibility
richinseattle Jul 24, 2023
d18e9ef
Merge pull request #48 from richinseattle/richinseattle-patch-1
karpathy Jul 24, 2023
a1f6b46
merge conflict resolve with imports
karpathy Jul 25, 2023
133ad3f
Merge pull request #50 from karpathy/memmap
karpathy Jul 25, 2023
c3e0d73
we can inference Meta's Llama 2 7B, yay
karpathy Jul 25, 2023
cf625ec
Update README.md
karpathy Jul 25, 2023
81c90bf
Update README.md: small tweaks
karpathy Jul 25, 2023
98ec4ba
Update README.md
karpathy Jul 25, 2023
6ce91b1
Fixed time_in_ms() compile time error (termux and neoterm)
emma-eva Jul 25, 2023
f3a1e22
intimately
RichardScottOZ Jul 25, 2023
d359fae
Merge pull request #69 from RichardScottOZ/patch-1
karpathy Jul 25, 2023
05ee4cb
fix bug in timing - use steps not max seq len doh
karpathy Jul 25, 2023
94730f1
add the 110m model, as it finished training
karpathy Jul 25, 2023
34ccb64
fix typo in readme after adding the 110m model
karpathy Jul 25, 2023
6cf34d6
Update README.md
karpathy Jul 25, 2023
ac22fbc
Update README.md: formate output samples
madroidmaq Jul 25, 2023
4d1fa2f
Export llama without llama
python273 Jul 25, 2023
366711a
Merge pull request #77 from madroidmaq/master
karpathy Jul 25, 2023
614bf91
Merge pull request #60 from emma-eva/patch-1
karpathy Jul 25, 2023
5bcd19a
Merge pull request #85 from python273/export-llama-without-llama
karpathy Jul 25, 2023
7f9f5ca
Update README.md: new llama model export
karpathy Jul 25, 2023
f565089
honestly at this point this is a lot more my nanogpt code than llama …
karpathy Jul 25, 2023
36c522a
Improve locality
aegkmq Jul 26, 2023
36bf904
Refactor freqs_cis into freqs_cos and freqs_sin, and remove complex64…
ai-doge Jul 26, 2023
8986005
Minor cleanup
aegkmq Jul 26, 2023
3aedfe5
Merge branch 'aegkmq-master'
karpathy Jul 26, 2023
f5d8797
Update README.md
karpathy Jul 26, 2023
7496ea8
Update README.md
karpathy Jul 26, 2023
7059d7d
Update README.md
karpathy Jul 26, 2023
2711ae8
make compiler tunable in Makefile, i think potentially nice and useful
karpathy Jul 26, 2023
f0f43b7
small note on traing times
karpathy Jul 26, 2023
5703448
add some code comments
kroggen Jul 26, 2023
4085e89
Merge pull request #119 from kroggen/code-comments
karpathy Jul 26, 2023
7a4ca4a
add contributing section to readme, and also notable forks section
karpathy Jul 26, 2023
c2bbe9c
link to the huggingface hub models instead
karpathy Jul 27, 2023
5f681b6
oops missed a section somehow, updating readme
karpathy Jul 27, 2023
7887133
Center align cute llama image in README
som-sama Jul 27, 2023
7f7a3b2
update openmp pragmas for MSVC compatibility
richinseattle Jul 27, 2023
530ef8e
light touchups to export script so one doesn't need to pass in a slas…
karpathy Jul 27, 2023
34cce6a
Merge pull request #126 from som-sama/patch-1
karpathy Jul 27, 2023
539dc73
fix whitespace
richinseattle Jul 27, 2023
815ce33
Merge branch 'patch-1' of https://github.com/richinseattle/llama2.c i…
karpathy Jul 27, 2023
b35e82f
Merge branch 'richinseattle-patch-1'
karpathy Jul 27, 2023
37e8c20
Windows compat: Use GetTickCount for delta timer
richinseattle Jul 27, 2023
5c55d59
Merge pull request #128 from richinseattle/patch-1
karpathy Jul 27, 2023
eff1c1b
Merge branch 'master' of github.com:karpathy/llama2.c
karpathy Jul 27, 2023
0d18fa7
Merge branch 'patch-2' of https://github.com/richinseattle/llama2.c i…
karpathy Jul 27, 2023
b7efb1b
Merge branch 'richinseattle-patch-2'
karpathy Jul 27, 2023
4a6b7a4
Include windows support header (for mmap)
richinseattle Jul 27, 2023
5b405a7
Add Windows support files with mmap impl
richinseattle Jul 27, 2023
01c06fa
readme: Include reference to go port
tmc Jul 27, 2023
14e90b5
Merge pull request #131 from tmc/patch-2
karpathy Jul 27, 2023
de6f2fc
Merge pull request #130 from richinseattle/patch-3
karpathy Jul 27, 2023
b18d325
add windows build commands
richinseattle Jul 27, 2023
a03ce1e
Merge pull request #132 from richinseattle/master
karpathy Jul 27, 2023
f19f50a
stylistic changes for the windows support ifdefs
karpathy Jul 27, 2023
4e23ad8
touchups to readme: reshuffle todos, and add a windows note
karpathy Jul 27, 2023
d281777
Update README.md
nikolaydubina Jul 27, 2023
9c0850d
add llama2.c-android to readme
Manuel030 Jul 27, 2023
abfcdf1
Improve readme: clarify dependencies and other things to install
tatellos Jul 27, 2023
1bdf5af
Replace the rand() with a portable PRNG
aegkmq Jul 27, 2023
bddde33
add Makefile option to support builds on amazon linux & centos
tairov Jul 27, 2023
2566ddf
add README section for centos 7 & amazon linux make target
tairov Jul 27, 2023
e970c27
Update README.md
nikolaydubina Jul 27, 2023
4a4663a
Merge pull request #134 from Manuel030/sync-with-upstream
karpathy Jul 27, 2023
79933a8
Merge pull request #137 from tatellos/master
karpathy Jul 27, 2023
acf1e18
remove second ifdefs for windows timing by introducing ported version…
tairov Jul 27, 2023
3435726
minor whitespaces cleanup
tairov Jul 27, 2023
9253d45
Merge pull request #139 from tairov/gnu
karpathy Jul 27, 2023
71200f3
Fix random_f32
aegkmq Jul 27, 2023
b4b9ef5
add github actions workflow to validate builds on changes in *.c, *.h…
tairov Jul 25, 2023
677bb8f
Merge branch 'win-timing' of https://github.com/tairov/llama2.c into …
karpathy Jul 27, 2023
b6d63a9
Merge branch 'tairov-win-timing'
karpathy Jul 27, 2023
cc66a20
Merge pull request #86 from tairov/master
karpathy Jul 27, 2023
459b9c8
Merge branch 'master' into patch-1
nikolaydubina Jul 27, 2023
b63cb91
Add llama2.cpp to notable forks section
leloykun Jul 27, 2023
6b3a689
Merge pull request #146 from admu-progvar/master
karpathy Jul 27, 2023
747db60
Merge pull request #133 from nikolaydubina/patch-1
karpathy Jul 27, 2023
5177633
HF checkpoints i removed the optimizer to save space, init Adam witho…
karpathy Jul 27, 2023
78952fb
propagate the dropout flag
karpathy Jul 27, 2023
0e1b0d4
Merge branch 'master' of github.com:karpathy/llama2.c
karpathy Jul 27, 2023
25b50ee
small stylistic fixes and adjustments, fix bug in Makefile, and chang…
karpathy Jul 27, 2023
e5752e1
strip leading whitespace
karpathy Jul 27, 2023
568a651
slightly tune todos of the project
karpathy Jul 27, 2023
72ba34c
fix: Use correct compiler for Win64 GCC in Makefile
murilocurti Jul 28, 2023
b4bb47b
big change: adding prompting. many LOC, but critical. ty @atamurad fo…
karpathy Jul 28, 2023
7cbb47c
update export_meta_llama_bin, get freqs_cos, freqs_sin independently.
ai-doge Jul 28, 2023
905c5c5
add mention of prompting into readme
karpathy Jul 28, 2023
2efc197
oops readme smallfix
karpathy Jul 28, 2023
9949c50
readme tweaks
karpathy Jul 28, 2023
6ce28fb
Merge branch 'master' into better-rng
aegkmq Jul 28, 2023
d04336c
Merge pull request #138 from aegkmq/better-rng
karpathy Jul 28, 2023
fd68dd2
reshuffle blocks of code a bit
karpathy Jul 28, 2023
3418fed
added repository in readme
Jul 28, 2023
356f74c
Add fp fast for better performance on windows
GabrielJadderson Jul 28, 2023
43d19ed
Update README.md
epicure Jul 28, 2023
f61807d
Merge pull request #163 from epicure/patch-1
karpathy Jul 28, 2023
6f156fd
Added Julia port to notable forks section in README.md
juvi21 Jul 29, 2023
bc36686
Add build step for win64 msys2/mingw
tairov Jul 29, 2023
ab39930
Update README.md
cgbur Jul 30, 2023
cddb05d
use ssize_t/int64 and 64bit version of ftell on windows
richinseattle Jul 30, 2023
13789ff
Added julia port to notable forks section in README.md
juvi21 Jul 30, 2023
6b6ed3d
update mmap.c to use ssize_t instead of off_t for 64bit
richinseattle Jul 30, 2023
b63dfd5
clean up windows mmap, drop 32bit support
richinseattle Jul 30, 2023
ce05cc2
Merge pull request #178 from cgbur/patch-1
karpathy Jul 31, 2023
68fc522
add vodkaslime llama.zig to readme
vodkaslime Jul 31, 2023
3b446ba
update readme
leo-du Jul 31, 2023
d0702ed
README.md - Update notable forks section
trholding Jul 31, 2023
4c0a882
add link to scala port
jrudolph Jul 31, 2023
883cda1
fix freq_cos, freq_sin serialize
ai-doge Aug 1, 2023
338f606
Merge branch 'master' into patch-1
juvi21 Aug 1, 2023
163e264
Merge pull request #197 from jrudolph/add-scala-port
karpathy Aug 1, 2023
13d22ef
Merge branch 'master' into llama2.c
karpathy Aug 1, 2023
9942a33
Merge pull request #194 from celikin/patch-1
karpathy Aug 1, 2023
f971b76
Merge pull request #188 from leo-du/llama2.c
karpathy Aug 1, 2023
502f681
Merge branch 'master' of https://github.com/ai-doge/llama2.c into ai-…
karpathy Aug 1, 2023
71e5de2
Merge branch 'ai-doge-master'
karpathy Aug 1, 2023
4a1250e
Merge pull request #149 from murilocurti/fix/makefile-win64-gcc
karpathy Aug 1, 2023
9023840
Merge branch 'master' into master
karpathy Aug 1, 2023
217667d
Merge branch 'master' into notable-forks-patch
karpathy Aug 1, 2023
c1a0c6e
Merge pull request #198 from trholding/notable-forks-patch
karpathy Aug 1, 2023
e06ff42
Merge pull request #160 from GabrielJadderson/add-fp-fast
karpathy Aug 1, 2023
221f4f9
Merge branch 'master' into patch-1
karpathy Aug 1, 2023
def12a2
Merge pull request #173 from juvi21/patch-1
karpathy Aug 1, 2023
23f6083
Merge branch 'master' into master
karpathy Aug 1, 2023
e270c6e
Update README.md: add mention of -f unroll loops option for gcc
karpathy Aug 1, 2023
e2d4a38
Merge pull request #186 from vodkaslime/master
karpathy Aug 1, 2023
b7f026f
Merge pull request #179 from richinseattle/windows-ftell64-fix
karpathy Aug 1, 2023
a8f3e1c
Merge pull request #175 from tairov/ci-mingw
karpathy Aug 1, 2023
e592ed5
Add tinyshakespeare dataset
wlamond Jul 30, 2023
b2b5514
Add link to Emscripten port in README
gohai Aug 2, 2023
8dd9bad
Update README.md
gohai Aug 2, 2023
3097430
Add Java port.
mukel Aug 2, 2023
574be29
Merge pull request #217 from mukel/llama2.java
karpathy Aug 2, 2023
5b47cd1
Merge pull request #211 from wlamond/tinyshakespeare
karpathy Aug 2, 2023
9819ae4
Merge branch 'master' into patch-1
karpathy Aug 2, 2023
af8708d
Merge pull request #216 from gohai/patch-1
karpathy Aug 2, 2023
124 changes: 124 additions & 0 deletions .github/workflows/build.yml
@@ -0,0 +1,124 @@
name: Continuous Integration

on:
  push:
    branches:
      - master
    paths: ['.github/workflows/**', '**/Makefile', '**/*.c', '**/*.h']
  pull_request:
    types: [opened, synchronize, reopened]
    paths: ['**/Makefile', '**/*.c', '**/*.h']

env:
  BRANCH_NAME: ${{ github.head_ref || github.ref_name }}

jobs:
  # check basic builds to avoid breaking changes
  ubuntu-focal-make:
    runs-on: ubuntu-20.04

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v3

      - name: Dependencies
        id: depends
        run: |
          sudo apt-get update
          sudo apt-get install build-essential -y

      - name: Build
        id: make_build
        run: |
          make

      - name: Build runfast
        id: make_build_runfast
        run: |
          make runfast

  macOS-latest-make:
    runs-on: macos-latest

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v3

      - name: Dependencies
        id: depends
        continue-on-error: true
        run: |
          brew update

      - name: Build
        id: make_build
        run: |
          make

      - name: Build runfast
        id: make_build_runfast
        run: |
          make runfast

      - name: Build clang
        id: make_build_clang
        run: |
          make run CC=clang

  windows-latest-make:
    runs-on: windows-latest

    strategy:
      matrix:
        arch:
          - amd64
          - amd64_x86
          - amd64_arm64

    steps:
      - name: Clone
        id: checkout
        uses: actions/checkout@v3

      - name: Setup MSBuild
        uses: microsoft/setup-msbuild@v1

      - name: Setup MSVC ${{ matrix.arch }}
        uses: ilammy/msvc-dev-cmd@v1
        with:
          arch: ${{ matrix.arch }}

      - name: Build ${{ matrix.arch }}
        id: build_msvc
        run: |
          .\build_msvc.bat

  windows-latest-mingw:
    runs-on: windows-latest

    defaults:
      run:
        shell: msys2 {0}

    strategy:
      matrix:
        include:
          - { sys: mingw64, env: x86_64 }

    steps:
      - name: Checkout
        id: checkout
        uses: actions/checkout@v3

      - uses: msys2/setup-msys2@v2
        id: setup-msys2
        with:
          msystem: ${{ matrix.sys }}
          install: mingw-w64-${{matrix.env}}-gcc make

      - name: Build ${{ matrix.sys }} ${{ matrix.env }}
        id: build_mingw
        run: |
          make win64
50 changes: 50 additions & 0 deletions Makefile
@@ -0,0 +1,50 @@
# choose your compiler, e.g. gcc/clang
# example override to clang: make run CC=clang
CC = gcc

# the most basic way of building that is most likely to work on most systems
.PHONY: run
run: run.c
	$(CC) -O3 -o run run.c -lm

# useful for a debug build, can then e.g. analyze with valgrind, example:
# $ valgrind --leak-check=full ./run out/model.bin 1.0 3
rundebug: run.c
	$(CC) -g -o run run.c -lm

# https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
# https://simonbyrne.github.io/notes/fastmath/
# -Ofast enables all -O3 optimizations.
# It also disregards strict standards compliance, enabling optimizations
# that are not valid for all standards-compliant programs.
# It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific
# -fstack-arrays (unless -fmax-stack-var-size is specified) and -fno-protect-parens.
# It turns off -fsemantic-interposition.
# In our specific application this is *probably* okay to use
.PHONY: runfast
runfast: run.c
	$(CC) -Ofast -o run run.c -lm

# additionally compiles with OpenMP, allowing multithreaded runs
# make sure to also enable multiple threads when running, e.g.:
# OMP_NUM_THREADS=4 ./run out/model.bin
.PHONY: runomp
runomp: run.c
	$(CC) -Ofast -fopenmp -march=native run.c -lm -o run

.PHONY: win64
win64:
	x86_64-w64-mingw32-gcc -Ofast -D_WIN32 -o run.exe -I. run.c win.c

# compiles with gnu11 standard flags for amazon linux, coreos, etc. compatibility
.PHONY: rungnu
rungnu:
	$(CC) -Ofast -std=gnu11 -o run run.c -lm

.PHONY: runompgnu
runompgnu:
	$(CC) -Ofast -fopenmp -std=gnu11 run.c -lm -o run

.PHONY: clean
clean:
	rm -f run
193 changes: 137 additions & 56 deletions README.md
(large diff not rendered)
1 change: 1 addition & 0 deletions build_msvc.bat
@@ -0,0 +1 @@
cl.exe /fp:fast /Ox /openmp /I. run.c win.c
112 changes: 112 additions & 0 deletions export_meta_llama_bin.py
@@ -0,0 +1,112 @@
"""
This script exports the Llama 2 weights in llama2c.bin format.
"""
import os
import sys
import struct
from pathlib import Path
import json

import torch

from model import precompute_freqs_cis


def export(p, state_dict, filepath='model.bin'):
"""export the model weights in fp32 into .bin file to be read from C"""
f = open(filepath, 'wb')

def serialize(key):
print(f"writing {key}...")
t = state_dict[key].contiguous().view(-1).type(torch.float32).numpy()
f.write(memoryview(t))
del state_dict[key]

# first write out the header
hidden_dim = state_dict['layers.0.feed_forward.w1.weight'].shape[0]
p['vocab_size'] = 32000
p['max_seq_len'] = 2048

n_kv_heads = p.get('n_kv_heads') or p['n_heads']
header = struct.pack(
'iiiiiii',
p['dim'], hidden_dim, p['n_layers'], p['n_heads'],
n_kv_heads, -p['vocab_size'], p['max_seq_len']
)
# NOTE ABOVE: -ve vocab_size is indicating that the classifier weights are present
# in the checkpoint and should be loaded.
f.write(header)

# next write out the embedding weights
print("writing tok_embeddings...")
serialize('tok_embeddings.weight')

# now all the layers
# attention weights
for i in range(p['n_layers']): serialize(f'layers.{i}.attention_norm.weight')
for i in range(p['n_layers']): serialize(f'layers.{i}.attention.wq.weight')
for i in range(p['n_layers']): serialize(f'layers.{i}.attention.wk.weight')
for i in range(p['n_layers']): serialize(f'layers.{i}.attention.wv.weight')
for i in range(p['n_layers']): serialize(f'layers.{i}.attention.wo.weight')
# ffn weights
for i in range(p['n_layers']): serialize(f'layers.{i}.ffn_norm.weight')
for i in range(p['n_layers']): serialize(f'layers.{i}.feed_forward.w1.weight')
for i in range(p['n_layers']): serialize(f'layers.{i}.feed_forward.w2.weight')
for i in range(p['n_layers']): serialize(f'layers.{i}.feed_forward.w3.weight')

# final rmsnorm
serialize('norm.weight')
# freqs_cos, freqs_sin
freqs_cos, freqs_sin = precompute_freqs_cis(p['dim'] // p['n_heads'], p['max_seq_len'] * 2)
state_dict['freqs_cos'] = freqs_cos[:p['max_seq_len']]
state_dict['freqs_sin'] = freqs_sin[:p['max_seq_len']]
serialize('freqs_cos')
serialize('freqs_sin')

# finally write the output weights
serialize('output.weight')

f.close()
print(f"wrote {filepath}")


def concat_weights(models):
state_dict = {}
for name in list(models[0]):
tensors = [model[name] for model in models]
if len(tensors) == 1 or len(tensors[0].shape) == 1:
state_dict[name] = tensors[0]
continue
is_axis_1 = (
name.startswith('tok_embeddings.')
or name.endswith('.attention.wo.weight')
or name.endswith('.feed_forward.w2.weight')
)
axis = 1 if is_axis_1 else 0
state_dict[name] = torch.cat(tensors, dim=axis)
for model in models:
del model[name]
return state_dict


def load_and_export(model_path, output_path):
params_path = os.path.join(model_path, 'params.json')
with open(params_path) as f:
params = json.load(f)
print(params)

model_paths = sorted(list(Path(model_path).glob('consolidated.*.pth')))
models = [torch.load(p, map_location='cpu') for p in model_paths]
state_dict = concat_weights(models)
del models
export(params, state_dict, output_path)


if __name__ == '__main__':
if len(sys.argv) == 1:
print('[Llama model folder path] [output path]')
exit()

model_path = sys.argv[1]
output_path = sys.argv[2]
load_and_export(model_path, output_path)
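The header written by `export()` above is just seven little-endian int32 values (28 bytes). A minimal sketch of packing and re-reading it, using made-up config numbers (these are illustrative only, not from a real checkpoint):

```python
import struct

# Hypothetical config values, for illustration only.
dim, hidden_dim, n_layers, n_heads = 288, 768, 6, 6
n_kv_heads, vocab_size, max_seq_len = 6, 32000, 256

# As in export(): a negative vocab_size flags that the classifier
# weights ('output.weight') are stored in the checkpoint.
header = struct.pack('iiiiiii', dim, hidden_dim, n_layers, n_heads,
                     n_kv_heads, -vocab_size, max_seq_len)

fields = struct.unpack('iiiiiii', header)
shared_classifier = fields[5] > 0  # False here: classifier weights follow
print(len(header), abs(fields[5]), shared_classifier)  # prints: 28 32000 False
```

The C side reads these same seven ints into its `Config` struct before mmap-ing the weights that follow.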