feat: Nonstreaming API #85
Conversation
@JNeuvonen hey sorry about the slow review on my end, I've been pretty busy with summer chores/errands and also other work xD... Also was investigating #62 and why the upstream llama metal doesn't seem to work on Mac anymore :d..... Will get to this by Wednesday. Is it ok for me to cook it up a bit if I find something wrong/missing, or would you prefer just comment and you can take care of it? LMK what type of feedback is cool for you :)
Thanks, no problem at all, totally understandable. Comment & let me figure it out would be the preferred form of feedback, but if it's a very simple change you can do it yourself as well.
```rust
});

HttpResponse::Ok()
    .append_header(("Content-Type", "text/plain"))
```
We should return the application/json content type here instead, I think; it helps the client know to do JSON chunk parsing as needed based on that header.
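A minimal sketch of what that change might look like, assuming the response is built with actix-web's HttpResponse builder as in the hunk above (the helper name is made up for illustration):

```rust
use actix_web::HttpResponse;

// Hypothetical helper: return the collected completion as JSON so clients can
// key their chunk/body parsing off the Content-Type header.
fn completion_response(body: String) -> HttpResponse {
    HttpResponse::Ok()
        .append_header(("Content-Type", "application/json"))
        .body(body)
}
```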
I think that makes sense, yes. Will fix those. Thanks for looking at my code.
```rust
    tx: Some(tx),
});

rx.recv().unwrap();
```
We should match on the error and return an HTTP error here IMO, otherwise it would be hard to triage :d
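A sketch of the suggested handling, assuming rx is the flume::Receiver<()> created earlier in this handler (the helper name and error message are illustrative):

```rust
use actix_web::HttpResponse;

// Illustrative helper: surface a recv failure as an HTTP 500 instead of panicking,
// so a dropped or crashed inference thread is visible to the caller.
fn wait_for_inference(rx: &flume::Receiver<()>) -> Result<(), HttpResponse> {
    match rx.recv() {
        Ok(()) => Ok(()),
        Err(e) => Err(HttpResponse::InternalServerError()
            .body(format!("inference thread disconnected before completing: {e}"))),
    }
}
```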
```rust
} else {
    if let Some(tx) = req.tx {
        // Tell server thread that inference completed, and let it respond
        let _ = tx.send(());
```
Do we need that _ or can we just call send here?
```rust
println!("Feeding prompt ...");
req.send_event("FEEDING_PROMPT");

if stream_enabled {
```
Can we do this check at the trait level instead? That way we can unify the interface call (in this file) and handle the stream/non-stream logic at the trait implementation level, which would make it much nicer and more cohesive :)
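A rough sketch of that idea under assumed names (TokenSink, handle_token, and finish are not from the PR): the call site in process.rs talks to one interface, and each implementation decides whether to stream or buffer.

```rust
// Trait over "where generated tokens go"; the caller no longer branches on stream_enabled.
trait TokenSink {
    fn handle_token(&mut self, token: &str);
    fn finish(self);
}

// Streaming: forward each token to the HTTP response as it is produced.
struct StreamingSink {
    token_sender: flume::Sender<String>,
}

impl TokenSink for StreamingSink {
    fn handle_token(&mut self, token: &str) {
        let _ = self.token_sender.send(token.to_string());
    }
    fn finish(self) {}
}

// Non-streaming: buffer everything and hand it back in one message at the end.
struct BufferingSink {
    buffer: String,
    done: flume::Sender<String>,
}

impl TokenSink for BufferingSink {
    fn handle_token(&mut self, token: &str) {
        self.buffer.push_str(token);
    }
    fn finish(self) {
        let _ = self.done.send(self.buffer);
    }
}
```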
```rust
pub model_guard: ModelGuard,
pub completion_request: CompletionRequest,
pub nonstream_completion_tokens: Arc<Mutex<String>>,
```
I think we can make this private if we use it as trait state for the non-stream feature. Making it pub would allow other code to inspect it while it's being written to / locked, which could potentially deadlock the Mutex writer if we're not careful... :d
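A small sketch of keeping the buffer encapsulated (names are illustrative, not the PR's): only short, self-contained lock scopes exist inside the type, so outside code never holds the Mutex across generation.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical wrapper: the Arc<Mutex<String>> stays private.
pub struct NonStreamBuffer {
    completion_tokens: Arc<Mutex<String>>,
}

impl NonStreamBuffer {
    // Append one generated token; the lock is released as soon as the call returns.
    pub fn push(&self, token: &str) {
        self.completion_tokens.lock().unwrap().push_str(token);
    }

    // Take a copy of the finished text once inference is done.
    pub fn snapshot(&self) -> String {
        self.completion_tokens.lock().unwrap().clone()
    }
}
```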
```rust
} else {
    let abort_flag = Arc::new(RwLock::new(false));
    let completion_tokens = Arc::new(Mutex::new(String::new()));
    let (tx, rx) = flume::unbounded::<()>();
```
I wonder if we can make the token_sender generic so that we can reuse that argument. The token_sender and the tx serve a very similar function here; we just need to reconcile the Byte/String type. That'd make for a nicer interface, I think.
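One possible shape for that, sketched with assumed names: make the request generic over the payload its sender carries, so the streaming (bytes) and non-streaming (final string) paths reuse the same field.

```rust
// Illustrative only: a request whose sender is generic over its payload type.
pub struct InferenceThreadRequest<T: Send + 'static> {
    pub token_sender: flume::Sender<T>,
    // ... model_guard, abort_flag, completion_request, etc. elided ...
}

// Streaming path: chunks are forwarded to the HTTP response as they arrive.
pub type StreamingRequest = InferenceThreadRequest<actix_web::web::Bytes>;

// Non-streaming path: a single String is sent once when inference completes.
pub type BlockingRequest = InferenceThreadRequest<String>;
```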
```rust
        }),
    )
})

if let Some(true) = payload.stream {
```
This should be payload.0.stream I think, since it's a JSON.
If we can reconcile our trait above, we can infer the stream boolean via the completion_request as well, skipping a couple of lookup hoops!
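For illustration only (the handler signature and struct below are stand-ins, not the PR's code): `payload` is the actix-web JSON extractor and `.0` reaches the deserialized request inside it.

```rust
use actix_web::{web, HttpResponse};
use serde::Deserialize;

// Minimal stand-in for the real CompletionRequest; only the relevant field.
#[derive(Deserialize)]
struct CompletionRequest {
    stream: Option<bool>,
}

async fn completions(payload: web::Json<CompletionRequest>) -> HttpResponse {
    // `.0` is the inner CompletionRequest wrapped by the Json extractor.
    let stream_enabled = matches!(payload.0.stream, Some(true));
    if stream_enabled {
        // streaming branch ...
    }
    HttpResponse::Ok().finish()
}
```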
louisgv left a comment
The overall idea is great thus far; added some comments and ideas for improvement 👍
```rust
start(InferenceThreadRequest {
    model_guard: model_guard.clone(),
    abort_flag: abort_flag.clone(),
    token_sender,
    completion_request: payload.0,
    nonstream_completion_tokens: str_buffer.clone(),
    stream: true,
    tx: None,
}),
```
I have this idea which I think would make this nicer: we can create the InferenceThreadRequest before the isStream check, since it's non-blocking state. We can then do

```rust
let request = InferenceThreadRequest {
    model_guard: model_guard.clone(),
    abort_flag: abort_flag.clone(),
    token_sender,
    completion_request: payload.0,
    nonstream_completion_tokens: str_buffer.clone(),
};

if request.is_stream() {
    // streaming path
} else {
    // non-streaming path
}
```

and is_stream is a trait public method we expose via InferenceThreadRequest, which basically returns completion_request.stream.
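A possible sketch of that helper, assuming completion_request.stream is an Option<bool> as in the handler above (shown as an inherent method; the exact trait placement is up to the PR):

```rust
impl InferenceThreadRequest {
    // True only when the client explicitly asked for a streamed response.
    pub fn is_stream(&self) -> bool {
        matches!(self.completion_request.stream, Some(true))
    }
}
```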
I really like your attention to detail and design thinking! I will try to implement this one, I agree, it is indeed cleaner.
@JNeuvonen invited you as a repo collaborator
@JNeuvonen lmk if you're still able to update the PR - otherwise I can get on it sometime next week!
Hey, I apologize that I didn't come back earlier. Back when I was working on this, I was on summer vacation; now I am back on my work schedule and have less time & focus. Please feel free to finish the feature.
The implementation uses the same `start` function inside `process.rs` for multithreading, but instead of sending server events back to the request sender on every new token, it collects the tokens into a string buffer.

Currently, there is no client-side implementation, so merging should not affect the client side at all. Next, we could open an issue for the client-side implementation as well.
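Pieced together from the hunks in this review, a rough sketch of the non-streaming flow (the glue code and helper name are assumptions; names like completion_tokens and tx mirror the diff):

```rust
use std::sync::{Arc, Mutex, RwLock};
use actix_web::HttpResponse;

// Rough sketch only: spawn inference, buffer tokens, respond once at the end.
fn nonstream_completion(/* model_guard, payload, ... */) -> HttpResponse {
    let abort_flag = Arc::new(RwLock::new(false));
    let completion_tokens = Arc::new(Mutex::new(String::new()));
    let (tx, rx) = flume::unbounded::<()>();

    // The inference thread appends each generated token to `completion_tokens`
    // instead of emitting a server event, then signals completion over `tx`:
    // start(InferenceThreadRequest { nonstream_completion_tokens: completion_tokens.clone(),
    //                                abort_flag: abort_flag.clone(), tx: Some(tx), ... });

    // Block until the inference thread reports it is done, then respond once.
    if rx.recv().is_err() {
        return HttpResponse::InternalServerError().body("inference thread exited early");
    }
    let body = completion_tokens.lock().unwrap().clone();
    HttpResponse::Ok()
        .append_header(("Content-Type", "application/json"))
        .body(body)
}
```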
Here is a request body for quickly testing the API (stream flag is false):
{"sampler":"top-p-top-k","prompt":"AI: Greeting! I am a friendly AI assistant. Feel free to ask me anything.\nHuman: Hello world\nAI: ","max_tokens":200,"temperature":1,"seed":147,"frequency_penalty":0.6,"presence_penalty":0,"top_k":42,"top_p":1,"stop":["AI: ","Human: "],"stream":false}Issue