add gist model generation utils to library #23

uSaiPrashanth · 2024-06-20T17:58:14Z

Other modifications worth mentioning:

- Changed scripts/convert_hf_checkpoint.py to support loading finetuned Llama-3 models from a safetensors state dict.
- Added finetuned configs to model.py (finetuned models use a vocab size one token larger than Llama 3's).
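
The vocab-size note above can be illustrated with a minimal sketch. It is not the code from model.py: `ModelConfig`, `finetuned_variant`, and `check_embedding_rows` are hypothetical names introduced here for illustration, and the only facts assumed are that Llama 3 has a 128256-token vocabulary and that the finetuned checkpoints add exactly one token, as the description states.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical stand-in for a model config entry (not the repo's class)."""

    vocab_size: int
    dim: int
    n_layer: int


# Llama-3-8B base values: 128256-token vocabulary, 4096 hidden dim, 32 layers.
LLAMA3_8B = ModelConfig(vocab_size=128256, dim=4096, n_layer=32)


def finetuned_variant(base: ModelConfig, extra_tokens: int = 1) -> ModelConfig:
    """Return a copy of `base` whose vocabulary is `extra_tokens` larger."""
    return replace(base, vocab_size=base.vocab_size + extra_tokens)


def check_embedding_rows(num_rows: int, config: ModelConfig) -> None:
    """Raise if a checkpoint's embedding row count disagrees with the config."""
    if num_rows != config.vocab_size:
        raise ValueError(
            f"embedding has {num_rows} rows, config expects {config.vocab_size}"
        )


finetuned = finetuned_variant(LLAMA3_8B)
check_embedding_rows(128257, finetuned)  # passes: base vocabulary plus one token
```

A check like this would fail early if a finetuned checkpoint were loaded with the base config, since the embedding matrix would have one more row than the config expects.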


Griffin Adams and others added 30 commits May 20, 2024 19:35

Prepare Llama-3-8B-Instruct.

b5904ff

Start prompt summarization code. Partial update since need to restart runpod server.

5b2eb6a

more

d3811ec

Reformatting with ruff.

c2af69b

Adds fixed-window KV-cache.

06aa103

- Creates cache.py
- Introduces global_tokens
- Formats repo with ruff
- Speed parity with full KV-cache

Merge pull request #9 from AnswerDotAI/window

a4dd428

Implements window KV-Cache Compression Strategy

Update sdpa to allow for return_attn=True flag. Enables work on Heavy Hitter methods which require attention probs.

0f7192d

Layer specific max_cache_length. Switch to a list arg.

2e983d8

Moving cache prefill logic from attention module's forward() to cache update() function.

5f04c78

Apply ruff formatting, minor cosmetic changes.

5324011

Add default prompts into ./prompts.

b933300

Make Llama-2 chat the default and support its chat template.

84dff2d

Implement Scissorhands paper as KVCacheScissorHands.

5bbdcf1

- https://arxiv.org/abs/2305.17118

Implements prompt compression with SnapKV and Remove Middle (keep window + global).

45e9854

Merge pull request #11 from AnswerDotAI/snap

0669421

Implement Scissorhands KV-cache compression & SnapKV prompt compression

Add KVCacheRandom as lowerbound baseline.

54c7212

Fix pre-existing bug in max_new_tokens. Removes T_new.

6f1cf31

dolomites compatibility

759a6d9

dtype auto

f7e71bb

Minor change to max_cache_length assertion equality statement.

59f2416

Ruff formatting and don't err out for max cache length longer than max seq length.

07e4a4b

Squashed commit of the following:

6892ea1

commit 0528eef
Author: Faisal Ladhak <[email protected]>
Date: Thu Jun 13 20:30:35 2024 +0000

Reformatting with ruff

commit 4c3cfc5
Author: Faisal Ladhak <[email protected]>
Date: Thu Jun 13 04:35:05 2024 +0000

Added support for Qwen2 models.

Fix various bugs

e7a2a91

Merge pull request #14 from AnswerDotAI/vik/optims

d4b4154

Fix various bugs

Fix minor Llama-3 chat template bug and apply ruff formatting.

df8da8d

Comment out assertion.

0d22ad8

Adds code for evaluation, along with Squality and reference-based metrics.

a074f75

Merge pull request #16 from AnswerDotAI/evals

1b5bda7

Adding code for evaluation, along with Squality and reference-based metrics.

Update Tasks to allow for train, val, test splits to be used.

a4ecbba

- This is necessary for training models outside of eval.py

Merge pull request #17 from AnswerDotAI/eval2

59d133c

Update Tasks to allow for train, val, test splits to be used.

rbiswasfc and others added 3 commits June 27, 2024 19:11

added 4 tasks from ruler

17a6f66

ruff

cc8c039

Merge pull request #28 from AnswerDotAI/rb/ruler

6a02bef

Adding four tasks from RULER

griff4692 reviewed Jun 27, 2024

generate.py (outdated; resolved)

Faisal Ladhak and others added 18 commits June 27, 2024 18:07

Filter based on tokenized length

1b76bc3

Add task stats and fix bug in task.py where self.mandatory_cols was being modified across sub-classes.

f7b5316

Update metric.

269439e

Minor bugfixes.

8014804

Bugfix for LLM metrics.

a5f3e71

Add --tasks all option for bulk eval.

0a44ff7

Minor change to save path to include model name.

19594b5

Add eval_multi.py for running hparam sweep evals.

0ee7c30

Switch from 4k to 8k RULER tasks.

c2896e7

Add 4k as a cache length.

267e9f5

Add 8k.

c4c87b5

Changed eval order for hyperparam

0f719ca

Remove default attention thresholding.

c608e80

Add random prompt compression strategy.

127773a

Update default configs.

12a6435

Changes to FastGen.

9c9845b

Add KVCacheAnalysis which computes attention loss.

008175b

Update FastGen to use new attention loss calculation.

12be67c

uSaiPrashanth closed this Jul 4, 2024

uSaiPrashanth force-pushed the gist-token-merged branch from 3f9be25 to 12be67c on July 4, 2024 14:11

uSaiPrashanth reopened this Jul 4, 2024

add gist model generation utils to library

9056c97

uSaiPrashanth force-pushed the gist-token-merged branch from 0146e6c to 9056c97 on July 4, 2024 15:47

uSaiPrashanth requested a review from griff4692 July 4, 2024 15:49

uSaiPrashanth self-assigned this Jul 8, 2024

griff4692 force-pushed the main branch from fff6c03 to ced3b9c on July 22, 2024 19:55
