LLM pipeline implementation #1040

mohitmundhragithub · 2025-08-26T05:41:43Z

No description provided.

github-actions · 2025-08-26T05:41:51Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

anhappdev · 2025-10-07T06:50:28Z

This PR should resolve the issue with the iOS build: #1064
However, the Windows build still fails. Here's the log: 2025-10-07-windows.log

farook-edev · 2025-10-07T12:02:24Z

> This PR should resolve the issue with the iOS build: #1064
> However, the Windows build still fails. Here's the log: 2025-10-07-windows.log

Thanks a ton! I'll look into the Windows issue.

freedomtan · 2025-10-21T05:19:44Z

@freedomtan to test the app (and the accuracy of tinyMMLU).

freedomtan · 2025-10-21T05:39:12Z

For performance:

- time to first token (TTFT)
- tokens/s

For accuracy:

- tinyMMLU
- ifeval
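A minimal sketch of how the two performance metrics above could be derived from per-response timestamps. The `GenerationTiming` type and its field names are hypothetical and not taken from this PR. Whether tokens/s should count the first token (and therefore prefill time) is not stated in the thread; this sketch assumes it covers the decode phase only.

```python
from dataclasses import dataclass


@dataclass
class GenerationTiming:
    """Wall-clock timestamps in seconds for one response (hypothetical type)."""
    request_start_s: float   # when the prompt was submitted
    first_token_s: float     # when the first output token was produced
    last_token_s: float      # when the last output token was produced
    output_tokens: int       # total number of output tokens


def time_to_first_token(t: GenerationTiming) -> float:
    """TTFT: delay between submitting the prompt and the first output token."""
    return t.first_token_s - t.request_start_s


def decode_tokens_per_second(t: GenerationTiming) -> float:
    """Decode throughput, assuming the first token is attributed to prefill."""
    decode_tokens = t.output_tokens - 1
    decode_time_s = t.last_token_s - t.first_token_s
    if decode_tokens <= 0 or decode_time_s <= 0:
        return 0.0  # a single-token response has no measurable decode phase
    return decode_tokens / decode_time_s


# Illustrative numbers only, not measurements from this PR.
timing = GenerationTiming(request_start_s=0.0, first_token_s=0.5,
                          last_token_s=2.5, output_tokens=41)
print(time_to_first_token(timing))
print(decode_tokens_per_second(timing))
```

Keeping the two phases separate means a long prompt raises TTFT without distorting the reported decode rate.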

Mostelk · 2025-10-22T08:15:31Z

@farook-edev please share links to all assets used for the LLM benchmark: the TFRecords for the datasets, and the models.

freedomtan · 2025-10-22T09:19:22Z

Confirmed that I can run the APK (https://github.com/mlcommons/mobile_app_open/actions/runs/18638579937/artifacts/4313148049) on a Pixel 10.

However, there seems to be no TTFT or tokens/s information in mlperf_log_summary.txt. I think we need that information in both the detail log and the summary log.

The following is the mlperf_log_summary.txt from performance mode. The performance_sample_count of 1 looks odd too.

mustang:/sdcard/Android/data/org.mlcommons.android.mlperfbench/files/logs/2025-10-22T16-49-11.233245/llm-performance $ cat mlperf_log_summary.txt
================================================
MLPerf Results Summary
================================================
SUT name : TFLite
Scenario : SingleStream
Mode     : PerformanceOnly
90th percentile latency (ns) : 5324192452
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Skipped
  Early stopping satisfied: Yes
Early Stopping Result:
 * Processed at least 64 queries (77).
 * Would discard 0 highest latency queries.
 * Early stopping 90th percentile estimate: 6144549223
 * Not enough queries processed for 99th percentile
 early stopping estimate (would need to process at
 least 662 total queries).

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 0.25
QPS w/o loadgen overhead        : 0.25

Min latency (ns)                : 3479648544
Max latency (ns)                : 6144549223
Mean latency (ns)               : 4035065622
50.00 percentile latency (ns)   : 3731987816
90.00 percentile latency (ns)   : 5324192452
95.00 percentile latency (ns)   : 5677467322
97.00 percentile latency (ns)   : 5766271567
99.00 percentile latency (ns)   : 6144549223
99.90 percentile latency (ns)   : 6144549223

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 1000
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 300000
min_query_count : 100
max_query_count : 0
qsl_rng_seed : 3066443479025735752
sample_index_rng_seed : 10688027786191513374
schedule_rng_seed : 14962580496156340209
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 1

No warnings encountered during test.

No errors encountered during test.
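A small sketch for checking a summary like the one above programmatically. It assumes only the `key : value` layout visible in the log; `parse_summary`, `qps_from_mean_latency`, and `token_metric_keys` are hypothetical helpers, not part of LoadGen or this PR. The substring match on "token" and "ttft" is a guess at how token metrics would be labelled, since the thread does not show them.

```python
NS_PER_S = 1_000_000_000

# Excerpt copied from the summary above.
SAMPLE = """\
Scenario : SingleStream
Mode     : PerformanceOnly
90th percentile latency (ns) : 5324192452
QPS w/o loadgen overhead        : 0.25
Mean latency (ns)               : 4035065622
performance_sample_count : 1
"""


def parse_summary(text: str) -> dict[str, str]:
    """Collect every 'key : value' line; banners and bare notes are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key, value = key.strip(" *"), value.strip()
        if sep and key and value:
            fields[key] = value
    return fields


def qps_from_mean_latency(fields: dict[str, str]) -> float:
    """With one query in flight at a time, QPS is roughly 1 / mean latency."""
    return NS_PER_S / int(fields["Mean latency (ns)"])


def token_metric_keys(fields: dict[str, str]) -> list[str]:
    """Keys that look like token metrics (the label match is an assumption)."""
    return [k for k in fields if "token" in k.lower() or "ttft" in k.lower()]


fields = parse_summary(SAMPLE)
print(round(qps_from_mean_latency(fields), 2))  # consistent with the reported 0.25
print(token_metric_keys(fields))                # empty: no token metrics in this excerpt
```

The mean latency of about 4.04 s agrees with the reported QPS of 0.25, so the summary is timing whole responses rather than individual tokens.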

farook-edev · 2025-10-22T10:27:37Z

> @farook-edev please share links to all assets used for the LLM benchmark: the TFRecords for the datasets, and the models.

You should be able to find them here.

freedomtan · 2025-10-28T05:33:41Z

@farook-edev please help get meaningful numbers into mlperf_log_summary.txt so that we can determine the constraints for performance mode (minimum running time, number of output tokens).

@anhappdev: we always update to the latest LoadGen version when there is a new release. Please do so. Of course, we should test it carefully.

freedomtan · 2025-10-28T05:39:26Z

Disable the C++ exception handling when building Eigen for iOS.

freedomtan · 2025-10-28T05:52:25Z

@mohitmundhragithub: please provide the link to the discussed mlperf_client testing method.

mohitmundhragithub · 2025-10-28T05:58:57Z

Prompts currently in the client: https://github.com/mlcommons/mlperf_client_dev/tree/main/data/llama3/prompts

Current media document for the 1.5 submission, which describes all aspects of the benchmark: https://docs.google.com/document/d/1QUWJa-iKyXznzco3trM6P-UOlvN33sKV3Yp7MyQCzrs/edit?pli=1&tab=t.0

Performance sheet (contains all the formulas used):
https://docs.google.com/spreadsheets/d/1xWVcISuvgljNUry2kFlqgFpGAv1DhyxKXQj9HhmUkZI/edit?gid=1936702666#gid=1936702666

freedomtan · 2025-10-28T06:06:39Z

Note on samples:

Reproducible results: when the random number seeds are changed, the results must still be comparable.

@mohitmundhragithub, @Mostelk

sonarqubecloud · 2025-10-28T22:13:25Z

Quality Gate passed

Issues
74 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
2.6% Duplication on New Code

See analysis details on SonarQube Cloud

farook-edev added 9 commits August 2, 2025 08:11

WIP LLM pipeline and dataset implementation

46346a7

fixed issues preventing libraries from compiling, runtime errors not included

5aab20a

upgrade TensorFlow to 2.18.0

f598e57

upgraded llm pipeline to use TFLite C++ api + small bug fixes

fe32950

basic flutter app support for icon and dataset

24ad1d5

added linux x86_64 config for internal testing

aa09439

updated bazel config to use SSE/MMX instructions

84b164e

fixed incorrect answer format and compression

d57040c

got pipeline and dataset to produce proper results + fixed issues where pipeline cannot handle an input size larger than the max prefill size

f9e40a5

mohitmundhragithub requested review from a team and anhappdev as code owners August 26, 2025 05:41

mohitmundhragithub assigned farook-edev Aug 26, 2025

mohitmundhragithub requested review from Mostelk and freedomtan August 26, 2025 05:42

mohitmundhragithub marked this pull request as draft August 26, 2025 05:42

farook-edev added 2 commits September 1, 2025 07:07

added support for loadgen's token based performance measurement + implemented performance benchmark for LLM pipeline

057c9f8

fixed bugs in inference process, first token function now handles only input and issue_query only handles output tokens

3c8b4f5

freedomtan mentioned this pull request Sep 2, 2025

Master issue: LLM Benchmark #940

Open

farook-edev changed the title ~~Feat llm~~ LLM pipeline implementation Sep 2, 2025

farook-edev linked an issue Sep 2, 2025 that may be closed by this pull request

Master issue: LLM Benchmark #940

Open

farook-edev added 5 commits September 8, 2025 00:54

optimized tensor retrieval for inference + added check for input size vs KV cache size

a03fbea

clang-format

69a630a

mmlu dataset cleanup and formatting

816f282

slight code cleanup

fca2905

fixed issue with genai ops import

20e7805

mohitmundhragithub commented Sep 22, 2025

flutter/cpp/backends/external.h (outdated)

mohitmundhragithub commented Sep 22, 2025

flutter/cpp/datasets/mmlu_gen.cc

code/config cleanup

83aea46

freedomtan mentioned this pull request Oct 7, 2025

"Accuracy" metric for LLM model(s) #986

Open

farook-edev added 5 commits October 11, 2025 09:30

disabled XNNPACK AVX-VNNI for windows due to C2440 error

97ba25f

moved accuracy calculation away from ProcessOutput, ifeval accuracy is calculated per instruction not per sample

5383e75

fixed issue with app not finding model/tokenizer

8e21ed1

properly format 0-shot prompts + allow for file/directory for model path

94f3cd5

formatting

e56d622

farook-edev added 7 commits October 27, 2025 09:21

potential fix for windows C2440

9120d63

fix for aligned free for windows

002d2d0

potential fix for IOS / windows CI issues

9f81bdd

ifeval check cleanup and bugfixes

93d5352

formatting

fc0f241

all possible configs for removing eigen exceptions

15880a9

removed objc opts

5ddbb87

farook-edev mentioned this pull request Oct 28, 2025

Migrate from Bazel v6.3.2 to v8.4.2 #1066

Draft

farook-edev added 3 commits October 28, 2025 23:57

use token latencies in app

7a4042a

enable exceptions for IOS

cfc719b

disable FP16 AVX for x86 simulator

f87ef86

6 participants

github-actions bot commented Aug 26, 2025 •

edited

Loading

freedomtan commented Oct 28, 2025 •

edited

Loading

mohitmundhragithub commented Oct 28, 2025 •

edited

Loading