feat: add machine tag and inference timings #4577
Conversation
core/http/endpoints/openai/files.go
Outdated
@@ -120,6 +122,7 @@ func getFileFromRequest(c *fiber.Ctx) (*schema.File, error) {
 // @Router /v1/files/{file_id} [get]
 func GetFilesEndpoint(cm *config.BackendConfigLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
 	return func(c *fiber.Ctx) error {
+		c.Set("LocalAI-Machine-Tag", appConfig.MachineTag)
this could be quite sensitive to add to each reply without the user explicitly opting in. Maybe it should be behind a feature flag in the config.
yeah, I'll create that opt-in as well
core/http/endpoints/openai/files.go
Outdated
@@ -82,6 +83,7 @@ func getNextFileId() int64 {
 func ListFilesEndpoint(cm *config.BackendConfigLoader, appConfig *config.ApplicationConfig) func(c *fiber.Ctx) error {
 	return func(c *fiber.Ctx) error {
+		c.Set("LocalAI-Machine-Tag", appConfig.MachineTag)
maybe we should have a middleware instead that adds this to each response?
let me know if you need any help here!
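For context, a minimal sketch of what such a middleware could look like, assuming Fiber v2 and the `appConfig.MachineTag` field already shown in the diffs above (the function name and the empty-tag guard are hypothetical, not taken from this PR):

```go
// machineTagMiddleware stamps every response with the configured machine
// tag, so individual handlers no longer need to set the header themselves.
func machineTagMiddleware(appConfig *config.ApplicationConfig) fiber.Handler {
	return func(c *fiber.Ctx) error {
		// Only emit the header when a tag was explicitly configured (opt-in).
		if appConfig.MachineTag != "" {
			c.Set("Machine-Tag", appConfig.MachineTag)
		}
		return c.Next()
	}
}
```

Registered once with `app.Use(machineTagMiddleware(appConfig))`, this would cover every endpoint instead of repeating the `c.Set` call per handler.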
Liking that, thanks! The addition of two new fields in the reply looks reasonable. My other comments are mostly about adding the machine tag via a middleware to avoid repetition, and about controlling with a flag whether that information is returned or not.
I've written a help message for the LOCALAI_MACHINE_TAG config env variable. I have also renamed the LocalAI-Machine-Tag header to just Machine-Tag, since the '-'-separated parts of a header name can only be capitalized: in further use the tag transforms into Localai-Machine-Tag and confuses some header parser boilerplate, which is strange, because AFAIK headers are case insensitive. Waiting for your feedback!
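The transformation described here is Go's standard header canonicalization: each '-'-separated token gets its first letter upper-cased and the rest lowered. A quick way to see it:

```go
package main

import (
	"fmt"
	"net/textproto"
)

func main() {
	// "LocalAI" is one token, so it canonicalizes to "Localai".
	fmt.Println(textproto.CanonicalMIMEHeaderKey("LocalAI-Machine-Tag")) // Localai-Machine-Tag
	// "Machine-Tag" is already in canonical form and survives unchanged.
	fmt.Println(textproto.CanonicalMIMEHeaderKey("Machine-Tag")) // Machine-Tag
}
```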
 		CompletionTokens: tokenUsage.Completion,
 		TotalTokens:      tokenUsage.Prompt + tokenUsage.Completion,
 	}
+	if extraUsage {
any specific reason to put this under a feature flag?
Since the data is already part of the response there is no extra computation penalty as far as I can see, and as it's just statistical data it would probably be safe to always return it as part of the response
I thought that since LocalAI is a drop-in replacement for the OpenAI API, it needs to have strictly the same response models, since some parsers can fail due to extra fields
if you think that it's safe, I can just remove that feature flag so it's always included
or it can be made opt-out as well
I see your point. I'm fine keeping it behind a flag, but we would need to update the docs in this case, as it would otherwise go unnoticed (it does not surface in the CLI help either)
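For reference, a hedged sketch of what the extended usage block could look like behind the flag discussed here (the timing field names are illustrative assumptions, not copied from the PR):

```go
// Usage mirrors the OpenAI usage block, plus the opt-in timing fields
// that are only populated when extraUsage is enabled.
type Usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
	// Extra, non-OpenAI fields: inference timings in milliseconds.
	TimingPromptProcessing float64 `json:"timing_prompt_processing,omitempty"`
	TimingTokenGeneration  float64 `json:"timing_token_generation,omitempty"`
}
```

Keeping the extra fields behind `omitempty` plus the flag preserves byte-compatibility with strict OpenAI response parsers by default.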
Thank you @mintyleaf! Just a few questions inline - direction looks good to me! Also - could you sign off your commits?
…> endpoint extraUsage data is broken for now Signed-off-by: mintyleaf <[email protected]>
Signed-off-by: mintyleaf <[email protected]>
Signed-off-by: mintyleaf <[email protected]>
Force-pushed from 708c566 to 4f9ed3a
@mintyleaf I'm merging this as-is, but will have to make sure to adapt the docs for the headers that enable extra stats - maybe care to open a follow-up? Thanks anyway!
Description
For my P2P endeavors I need to measure inference time and to know exactly which machine handled a given request.
Since #3687 is unfinished (it was never even wired up to the routers) and didn't provide enough control in a P2P environment, I created an opt-in extension to the OpenAI response that includes timing data in ms. Token count plus duration in ms is enough to derive the other data, like tokens per second, which the grpc-server.cpp glue backend calculates just for printing out the timings.
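To illustrate that derivation (a sketch with hypothetical names, not code from this PR), tokens per second falls straight out of the two values the response carries:

```go
// tokensPerSecond derives throughput from the generated token count and
// the generation duration in milliseconds returned in the extra usage data.
func tokensPerSecond(completionTokens int, generationMs float64) float64 {
	if generationMs <= 0 {
		return 0 // avoid division by zero on empty or instant replies
	}
	return float64(completionTokens) * 1000.0 / generationMs
}
```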
There is also a header carrying the machine hostname, or an opt-in custom tag if one is specified, which is useful when roaming through rented machines on services like vast.ai to get a stable identifier for them.
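A hedged usage sketch for the header (the endpoint path and default port are assumptions about a typical local setup): any HTTP client can read the tag off the response.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Assumes a local LocalAI instance on the default port with the
	// opt-in machine tag configured.
	resp, err := http.Get("http://localhost:8080/v1/models")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("served by:", resp.Header.Get("Machine-Tag"))
}
```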
I'm still not sure about the naming, and about putting two more doubles into the Reply protobuf packet, so I'm waiting for feedback.
Signed commits