Add ARM NEON int8 qgemm acceleration, quantized kernel selection, and S8 context storage unification for CoNgram on macOS ARM by Copilot · Pull Request #249 · bab2min/Kiwi

Copilot · 2026-03-28T05:37:25Z

Re-check latest Arm NEON CI runs/jobs and collect failure logs for current head
Revert the previous misguided fix commit as requested
Identify the actual cong-global quantized-kernel root cause on NEON
Apply the smallest correct fix and run targeted validation
Run code review and security scan checks
Reply to new PR comments with commit hash and summary


bab2min · 2026-03-28T07:27:09Z

@copilot The architecture selection in Kiwi.cpp should also pick the quantized kernel instead of the fallback.
Also, as things stand, gemv runs as S8U8, which is inefficient; please change that to S8S8 as well.

Copilot · 2026-03-28T07:34:57Z

@copilot The architecture selection in Kiwi.cpp should also pick the quantized kernel instead of the fallback.
Also, as things stand, gemv runs as S8U8...

Done. I enabled NEON in Kiwi's quantized architecture selection so that macOS ARM takes the quantized kernel instead of the fp32 fallback, and changed the quantized gemv on the context→output path to use S8S8 (qgemm::gemvS8S8) instead of S8U8 (qgemm::gemv). The storage format is unchanged; the U8→S8 conversion happens right before the call. Commit: 9e6e283
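
The S8U8-to-S8S8 switch described in the reply above can be illustrated with a small reference sketch. It assumes, purely for illustration, that a U8 weight `u` encodes the signed value `s = u - 128`; the actual encoding used by Kiwi's qgemm is not shown in this thread. The helper names (`u8ToS8`, `refDotS8U8`, `refDotS8S8`, `sumS8`, `dotS8U8ViaS8S8`) are hypothetical and are not Kiwi's kernels. Under that assumption, an S8×U8 dot product equals the S8×S8 dot product plus a correction term of 128 times the sum of the signed operand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative reference routines, not Kiwi's qgemm kernels.
// Assumption: a U8 weight u encodes the signed value s = u - 128.

// Convert U8 weights to S8 by removing the assumed zero point of 128.
std::vector<int8_t> u8ToS8(const std::vector<uint8_t>& u)
{
    std::vector<int8_t> s(u.size());
    for (std::size_t i = 0; i < u.size(); ++i)
        s[i] = static_cast<int8_t>(static_cast<int>(u[i]) - 128);
    return s;
}

// Signed x unsigned dot product with an int32 accumulator.
int32_t refDotS8U8(const std::vector<int8_t>& a, const std::vector<uint8_t>& b)
{
    assert(a.size() == b.size());
    int32_t acc = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    return acc;
}

// Signed x signed dot product with an int32 accumulator.
int32_t refDotS8S8(const std::vector<int8_t>& a, const std::vector<int8_t>& b)
{
    assert(a.size() == b.size());
    int32_t acc = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    return acc;
}

// Sum of the signed operand, needed for the zero-point correction.
int32_t sumS8(const std::vector<int8_t>& a)
{
    int32_t acc = 0;
    for (int8_t v : a) acc += v;
    return acc;
}

// sum(a*u) = sum(a*(s + 128)) = sum(a*s) + 128 * sum(a)
int32_t dotS8U8ViaS8S8(const std::vector<int8_t>& a, const std::vector<uint8_t>& u)
{
    return refDotS8S8(a, u8ToS8(u)) + 128 * sumS8(a);
}
```

Converting on every call, as in `dotS8U8ViaS8S8`, costs a pass over the weights each time, which is one reason a reviewer might prefer converting once when the model is loaded.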

bab2min

@copilot There is a serious error here; look carefully and fix it.

src/CoNgramModel.cpp

Copilot · 2026-03-28T07:48:41Z

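@copilot There is a serious error here; look carefully and fix it.

Fixed. I separated the per-architecture behavior again: x86-64 and non-NEON builds keep the existing qgemm::gemv (S8U8) path, and only Arm NEON uses gemvS8S8. I also removed the inference-time conversion; on NEON the context embedding is now converted to S8 and stored ahead of time, at load time. Commit: 379bca1 (no UI changes)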

bab2min · 2026-03-28T07:55:50Z

@copilot In that case, if other code on Arm NEON uses the U8-typed contextEmbPtr, shouldn't all of it be changed to use S8 as well? Then on Arm NEON we would simply store the weights in contextEmbPtr as S8 in the first place.

Copilot · 2026-03-28T08:05:07Z

@copilot In that case, if other code on Arm NEON uses the U8-typed contextEmbPtr, shouldn't all of it be changed to use S8 as well? Then on Arm NE...

Done as requested. On NEON, contextEmbPtr itself now stores an S8 payload, and the context paths that previously used U8 now use invNormS8/dotS8S8/gemvS8S8 on NEON only. The related scattered path (the scatteredGEMMOpt baseline) was also adjusted to handle the S8 context as-is on NEON. Commit: 94164e6 (no UI changes)

bab2min · 2026-03-28T08:19:58Z

@copilot Running CI/CD on Arm NEON produces a segfault. Check again whether any memory address is computed incorrectly as a result of the switch to S8.

Copilot · 2026-03-28T08:26:32Z

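@copilot Running CI/CD on Arm NEON produces a segfault. Check again whether any memory address is computed incorrectly as a result of the switch to S8...

Checked and fixed. The cause was in the NEON S8 path: the return value of dotS8S8 (a float with the scale already applied) was cast to int32 and then multiplied by the scale again, which corrupted the scores. In the affected paths (the single-score computation in CoNgramModel and the NEON branch of qgemm::scatteredGEMMBaseline) I removed the duplicate scaling and the cast so the result is used directly as score + bias. Commit: 5989a4a (no UI changes)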
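
The double-scaling problem described in the reply above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the commit: `scaledDotS8S8` is a hypothetical stand-in for a kernel that returns a float with both scales already applied, and `scoreBuggy` and `scoreFixed` are made-up names for the before and after behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a kernel whose return value is already scaled.
float scaledDotS8S8(const std::vector<int8_t>& a, const std::vector<int8_t>& b,
                    float scaleA, float scaleB)
{
    assert(a.size() == b.size());
    int32_t acc = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    return static_cast<float>(acc) * scaleA * scaleB;
}

// Buggy pattern: treats the scaled float as a raw integer accumulator,
// truncating it to int32 and applying the scales a second time.
float scoreBuggy(const std::vector<int8_t>& a, const std::vector<int8_t>& b,
                 float scaleA, float scaleB, float bias)
{
    const int32_t raw = static_cast<int32_t>(scaledDotS8S8(a, b, scaleA, scaleB));
    return static_cast<float>(raw) * scaleA * scaleB + bias;
}

// Fixed pattern: the kernel result is already a score, so only the bias is added.
float scoreFixed(const std::vector<int8_t>& a, const std::vector<int8_t>& b,
                 float scaleA, float scaleB, float bias)
{
    return scaledDotS8S8(a, b, scaleA, scaleB) + bias;
}
```

With `a = {10, -20, 30}`, `b = {4, 5, -6}`, scales 0.5 and 0.25, and bias 1.0, the integer accumulator is -240 and the scaled dot product is -30.0. The fixed path yields -29.0, while the buggy path yields -2.75 because the scales are applied twice.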

bab2min · 2026-03-28T09:05:39Z

@copilot Some test cases fail on Arm NEON. It does not seem to fail everywhere the quantized kernels are used, only in cong-global, so please re-examine whether the quantized kernels used there have a bug.

Copilot · 2026-03-28T09:13:37Z

@copilot Some test cases fail on Arm NEON. It does not seem to fail everywhere the quantized kernels are used, only in cong-global...

Reviewed and fixed. The cause was that four scatteredGEMMOpt calls in the quantized path of cong-global/sbg were still passing getContextQuantEmb(0) (U8 layout) on NEON, which did not match the layout the NEON S8 kernels expect and corrupted the results. I changed those four calls to use reinterpret_cast<const uint8_t*>(getContextQuantEmbS8(0)) on NEON only, and kept the existing U8 path on non-NEON targets. Commit: 5f1421c (no UI changes)

bab2min

@copilot It looks like you are fixing the wrong place. Revert the last commit and look for a different cause.

src/CoNgramModel.cpp

Copilot · 2026-03-28T11:00:26Z

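@copilot It looks like you are fixing the wrong place. Revert the last commit and look for a different cause.

As requested, I abandoned the approach of the last commit and looked for the cause again. The actual cause was that in cong-global the history-side quantized embedding (distantEmbPtr) was still stored as U8 on NEON, which did not match the NEON S8 path and corrupted the scores. I changed the toUint8 argument of requantizePackedInts(..., toUint8) to arch != ArchType::neon so that the embedding is stored as S8 on NEON. Commit: 857f73a (no UI changes)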
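
The idea of choosing the storage format from the target architecture, as in the `arch != ArchType::neon` change mentioned above, can be sketched like this. Everything here is an assumption made for illustration: the local `ArchType` enum, `quantizeRow`, and `storeForArch` are hypothetical, the U8 form is assumed to be the S8 value plus 128, and the real signature of `requantizePackedInts` is not shown in this thread.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical local enum; only the neon enumerator is taken from the discussion.
enum class ArchType { none, sse2, avx2, neon };

// Quantize a float row to 8 bits. With toUint8 the byte holds q + 128 (assumed
// U8 encoding); otherwise it holds the two's complement S8 value.
std::vector<uint8_t> quantizeRow(const std::vector<float>& row, float scale, bool toUint8)
{
    std::vector<uint8_t> out;
    out.reserve(row.size());
    for (float v : row)
    {
        int q = static_cast<int>(std::lround(v / scale));
        q = std::max(-127, std::min(127, q)); // clamp to a symmetric int8 range
        if (toUint8) q += 128;
        out.push_back(static_cast<uint8_t>(q));
    }
    return out;
}

// Pick the storage format once, from the architecture, so that every buffer
// consumed by the same kernels shares one layout.
std::vector<uint8_t> storeForArch(const std::vector<float>& row, float scale, ArchType arch)
{
    const bool toUint8 = (arch != ArchType::neon);
    return quantizeRow(row, scale, toUint8);
}
```

Deriving the flag from the architecture in one place keeps the context-side and history-side buffers from drifting apart, which is the kind of mismatch the reply identifies as the cause.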

Initial plan

569f8e5

Copilot AI assigned Copilot and bab2min Mar 28, 2026

Copilot started work on behalf of bab2min March 28, 2026 05:37 View session

Copilot AI linked an issue Mar 28, 2026 that may be closed by this pull request

Add quantization support for the cong model on macOS ARM #248

Closed

Copilot AI changed the title ~~[WIP] Add support for int8 qgemm in macOS ARM environment~~ Add ARM NEON int8 qgemm acceleration for cong model on macOS ARM Mar 28, 2026

Copilot finished work on behalf of bab2min March 28, 2026 06:00

Copilot AI requested a review from bab2min March 28, 2026 06:00

Repository owner deleted a comment from Copilot AI Mar 28, 2026

Copilot started work on behalf of bab2min March 28, 2026 07:27 View session

Copilot AI changed the title ~~Add ARM NEON int8 qgemm acceleration for cong model on macOS ARM~~ Add ARM NEON int8 qgemm acceleration and quantized kernel selection for cong model on macOS ARM Mar 28, 2026

Copilot finished work on behalf of bab2min March 28, 2026 07:35

bab2min marked this pull request as ready for review March 28, 2026 07:36

bab2min requested changes Mar 28, 2026

src/CoNgramModel.cpp (outdated)

src/CoNgramModel.cpp (outdated)

Copilot started work on behalf of bab2min March 28, 2026 07:40 View session

Copilot finished work on behalf of bab2min March 28, 2026 07:49

Copilot AI requested a review from bab2min March 28, 2026 07:54

Copilot started work on behalf of bab2min March 28, 2026 07:55 View session

Copilot AI changed the title ~~Add ARM NEON int8 qgemm acceleration and quantized kernel selection for cong model on macOS ARM~~ Add ARM NEON int8 qgemm acceleration, quantized kernel selection, and S8 context storage unification for CoNgram on macOS ARM Mar 28, 2026

Copilot finished work on behalf of bab2min March 28, 2026 08:06

Copilot started work on behalf of bab2min March 28, 2026 08:20 View session

Copilot finished work on behalf of bab2min March 28, 2026 08:27

Copilot started work on behalf of bab2min March 28, 2026 09:05 View session

Copilot finished work on behalf of bab2min March 28, 2026 09:14

bab2min reviewed Mar 28, 2026

src/CoNgramModel.cpp (outdated)

Copilot started work on behalf of bab2min March 28, 2026 10:49 View session

Copilot finished work on behalf of bab2min March 28, 2026 11:01

Copilot AI requested a review from bab2min March 28, 2026 11:01

Copilot AI and others added 8 commits March 28, 2026 21:18

Add ARM NEON int8 qgemm support for cong model on macOS ARM

670b75f

Agent-Logs-Url: https://github.com/bab2min/Kiwi/sessions/04036891-f852-4efa-be12-41f0608b1ba5 Co-authored-by: bab2min <19266222+bab2min@users.noreply.github.com>

Enable quantized NEON selection and switch quantized context gemv to S8S8 path

5e7caea

Agent-Logs-Url: https://github.com/bab2min/Kiwi/sessions/1d6eb2dc-5d39-49be-bd49-62979013056f Co-authored-by: bab2min <19266222+bab2min@users.noreply.github.com>

Limit S8S8 context gemv to NEON and precompute S8 context storage at load

e2f8d96

Agent-Logs-Url: https://github.com/bab2min/Kiwi/sessions/926e3fce-732d-4ef4-a9a3-0e31189bed2c Co-authored-by: bab2min <19266222+bab2min@users.noreply.github.com>

Apply NEON-only S8 context storage and keep x86 quantized path unchanged

08efc19

Agent-Logs-Url: https://github.com/bab2min/Kiwi/sessions/926e3fce-732d-4ef4-a9a3-0e31189bed2c Co-authored-by: bab2min <19266222+bab2min@users.noreply.github.com>

Use S8 in NEON contextEmbPtr and switch remaining NEON context ops to S8 kernels

ea86fd1

Agent-Logs-Url: https://github.com/bab2min/Kiwi/sessions/bf42e30c-e964-40d6-8203-6c13ca39fb03 Co-authored-by: bab2min <19266222+bab2min@users.noreply.github.com>

Fix NEON S8 score computation to avoid double scaling in quantized context paths

ed13bc7

Agent-Logs-Url: https://github.com/bab2min/Kiwi/sessions/30a34f1b-6366-4a6b-9214-d808ddbd18b7 Co-authored-by: bab2min <19266222+bab2min@users.noreply.github.com>

Fix remaining NEON quantized scatteredGEMM context buffer mismatch

4e73491

Agent-Logs-Url: https://github.com/bab2min/Kiwi/sessions/a35f0d17-5ccb-478b-a492-1517e6ebe57d Co-authored-by: bab2min <19266222+bab2min@users.noreply.github.com>

Revert scatteredGEMM detour and fix NEON distant quantization storage for cong-global

5cdb33d

Agent-Logs-Url: https://github.com/bab2min/Kiwi/sessions/f84195ba-57e0-4451-8722-675f76968ba1 Co-authored-by: bab2min <19266222+bab2min@users.noreply.github.com>

bab2min force-pushed the copilot/add-int8-qgemm-support-macos-arm branch from 857f73a to 5cdb33d Compare March 28, 2026 12:18

bab2min merged commit d9a43a8 into main Mar 28, 2026
9 of 10 checks passed

bab2min deleted the copilot/add-int8-qgemm-support-macos-arm branch March 28, 2026 12:20
