Add Llama API Support #1820

Open
wants to merge 3 commits into base: main

Conversation

@cmodi-meta commented Apr 30, 2025

Features
This PR adds Llama API support via the OpenAI-spec-compatible endpoint.

This includes support for the following models:

  • Llama-4-Scout-17B-16E-Instruct-FP8
  • Llama-4-Maverick-17B-128E-Instruct-FP8
  • Llama-3.3-8B-Instruct
  • Llama-3.3-70B-Instruct
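
For illustration, a `config2.yaml` along these lines should exercise the new provider. The field names follow MetaGPT's existing OpenAI-style provider config; the `api_type` value and base URL are placeholders I'm assuming here, not values confirmed by this PR:

```yaml
llm:
  api_type: "openai"  # assumed: the PR rides on the OpenAI-compatible spec
  model: "Llama-4-Scout-17B-16E-Instruct-FP8"
  base_url: "https://api.llama.com/compat/v1/"  # placeholder endpoint, check the docs PR
  api_key: "YOUR_LLAMA_API_KEY"
```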

I was initially unable to run because of the issue here: #1819

I can now run the hello_world.py test with a Llama API configured YAML file and get the following results:

(metagpt) cmodi@cmodi-mbp examples % python hello_world.py
2025-05-15 12:30:08.588 | INFO     | metagpt.const:get_metagpt_package_root:15 - Package root set to /Users/cmodi/Documents/ai/llama-api/MetaGPT
2025-05-15 12:30:08.588 | INFO     | metagpt.const:get_metagpt_package_root:15 - Package root set to /Users/cmodi/Documents/ai/llama-api/MetaGPT
2025-05-15 12:30:10.782 | INFO     | __main__:ask_and_print:15 - Q: what's your name?
I'm Llama, a model designed by Meta. What’s your name, or should I start guessing?
2025-05-15 12:30:11.627 | INFO     | metagpt.utils.token_counter:count_message_tokens:438 - Warning: model Llama-4-Scout-17B-16E-Instruct-FP8 not found in tiktoken. Using cl100k_base encoding.
2025-05-15 12:30:11.799 | WARNING  | metagpt.provider.openai_api:_calc_usage:278 - usage calculation failed: num_tokens_from_messages() is not implemented for model Llama-4-Scout-17B-16E-Instruct-FP8. See https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken for information on how messages are converted to tokens.
2025-05-15 12:30:11.799 | INFO     | __main__:ask_and_print:19 - A: I'm Llama, a model designed by Meta. What’s your name, or should I start guessing?
2025-05-15 12:30:11.799 | INFO     | __main__:ask_and_print:15 - Q: who are you?
I am a robot.
2025-05-15 12:30:12.449 | INFO     | metagpt.utils.token_counter:count_message_tokens:438 - Warning: model Llama-4-Scout-17B-16E-Instruct-FP8 not found in tiktoken. Using cl100k_base encoding.
2025-05-15 12:30:12.451 | WARNING  | metagpt.provider.openai_api:_calc_usage:278 - usage calculation failed: num_tokens_from_messages() is not implemented for model Llama-4-Scout-17B-16E-Instruct-FP8. See https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken for information on how messages are converted to tokens.
2025-05-15 12:30:12.451 | INFO     | __main__:ask_and_print:19 - A: I am a robot.
2025-05-15 12:30:12.451 | INFO     | __main__:lowlevel_api_example:24 - low level api example
2025-05-15 12:30:13.128 | WARNING  | metagpt.utils.cost_manager:update_cost:49 - Model Llama-4-Scout-17B-16E-Instruct-FP8 not found in TOKEN_COSTS.
2025-05-15 12:30:14.170 | WARNING  | metagpt.utils.cost_manager:update_cost:49 - Model Llama-4-Scout-17B-16E-Instruct-FP8 not found in TOKEN_COSTS.
2025-05-15 12:30:14.171 | INFO     | __main__:lowlevel_api_example:25 - Hi! How are you today? Is there something I can help you with or would you like to chat?
**Python "Hello, World!" Example**
```python
def main():
    """Prints a simple 'Hello, World!' message."""
    print("Hello, World!")

if __name__ == "__main__":
    main()
```
You can save this to a file (e.g., `hello.py`) and run it using Python (e.g., `python hello.py`).

Alternatively, you can also use a one-liner:
```python
print("Hello, World!")
```
2025-05-15 12:30:14.692 | WARNING  | metagpt.utils.cost_manager:update_cost:49 - Model Llama-4-Scout-17B-16E-Instruct-FP8 not found in TOKEN_COSTS.
2025-05-15 12:30:14.693 | INFO     | __main__:lowlevel_api_example:28 - ChatCompletion(id='AowL2Liryun9ghKRRMv2Cp6', choices=[Choice(finish_reason='stop', index=0, logprobs=ChoiceLogprobs(content=None, refusal=None), message=ChatCompletionMessage(content='Here is the count from 1 to 10, split by newlines:\n\n\n1\n2\n3\n4\n5\n6\n7\n8\n9\n10', refusal='', role='assistant', audio=None, function_call=None, tool_calls=None, id='AowL2Liryun9ghKRRMv2Cp6'))], created=1747337414, model='Llama-4-Scout-17B-16E-Instruct-FP8', object='chat.completions', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=36, prompt_tokens=22, total_tokens=58, completion_tokens_details=None, prompt_tokens_details=None))
2025-05-15 12:30:15.209 | WARNING  | metagpt.utils.cost_manager:update_cost:49 - Model Llama-4-Scout-17B-16E-Instruct-FP8 not found in TOKEN_COSTS.
2025-05-15 12:30:15.210 | INFO     | __main__:lowlevel_api_example:29 - Here is the count from 1 to 10, split by newlines:


1
2
3
4
5
6
7
8
9
10
Here is the count from 1 to 10, split by newlines:


1
2
3
4
5
6
7
8
9
10
2025-05-15 12:30:15.980 | INFO     | metagpt.utils.token_counter:count_message_tokens:438 - Warning: model Llama-4-Scout-17B-16E-Instruct-FP8 not found in tiktoken. Using cl100k_base encoding.
2025-05-15 12:30:15.980 | WARNING  | metagpt.provider.openai_api:_calc_usage:278 - usage calculation failed: num_tokens_from_messages() is not implemented for model Llama-4-Scout-17B-16E-Instruct-FP8. See https://cookbook.openai.com/examples/how_to_count_tokens_with_tiktoken for information on how messages are converted to tokens.
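
The `TOKEN_COSTS` warnings in the log come from MetaGPT's cost accounting, which looks up per-model prices by name. A minimal sketch of that lookup follows; the table name is taken from the log, but the per-1k-token convention and the prices are placeholder assumptions, not real Llama API pricing or MetaGPT's exact implementation:

```python
# Sketch of the lookup behind the "not found in TOKEN_COSTS" warnings above.
# Prices are placeholders (assumed USD per 1k tokens), not real Llama API pricing.
TOKEN_COSTS = {
    "Llama-4-Scout-17B-16E-Instruct-FP8": {"prompt": 1.0, "completion": 2.0},
}

def update_cost(model: str, prompt_tokens: int, completion_tokens: int):
    """Return the request cost, or None for an unregistered model
    (the case that produces the warning in the log)."""
    rates = TOKEN_COSTS.get(model)
    if rates is None:
        return None  # cost_manager logs the warning and skips accounting
    return (prompt_tokens * rates["prompt"]
            + completion_tokens * rates["completion"]) / 1000
```

Registering the new model names in the real table would silence the warnings and make cost tracking work for Llama API runs.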

Feature Docs
(geekan/MetaGPT-docs#182)

Result

  • TODO

Other

@cmodi-meta (Author)

Hello MetaGPT team! I'm a Partner Engineer on the GenAI team at Meta.

Llama API was launched yesterday at LlamaCon (article), and it would be great for MetaGPT users to be able to use Llama API as well! You can get access to Llama API via the waitlist form if you haven't done so already.

Help Required: I've created this PR to support Llama API; however, I am unable to run it because of an issue with the initial setup (#1819). It would be great to get support on this issue.

Thanks,
Chirag (@cmodi-meta)

cc: @WuhanMonkey

@cmodi-meta cmodi-meta marked this pull request as draft April 30, 2025 23:36
@cmodi-meta cmodi-meta marked this pull request as ready for review May 1, 2025 19:49
@cmodi-meta (Author)

Adding @better629 (seen on previous PRs) in case you can help take a look at the issue and PR.

Also, is a contributor license agreement needed here?

Thank you!

@cmodi-meta (Author)

Hello @better629 and @XiangJinyu! A friendly ping for help with #1819 so we can look into testing the PR. Thank you!

@cmodi-meta (Author) commented May 13, 2025

Added documentation in PR here: geekan/MetaGPT-docs#182 .

cc: @better629 , @XiangJinyu, @geekan - I would really appreciate your review. Thank you!

@cmodi-meta (Author)

@better629 the unit test failure is unclear to me. Can you share more details on how to resolve it?

As I put in the PR description, I am able to run hello_world.py now and get appropriate results.

@better629 (Collaborator)

@cmodi-meta hello_world.py is the minimal script for testing an LLM API. We recommend adding LLM models that can run a full MetaGPT user requirement, e.g. `metagpt "create a 2048 game"`. Do we have other, stronger models here?

@cmodi-meta (Author)

> @cmodi-meta hello_world.py is the minimal script to test llm-api. We recommend adding some llm models that can run through metagpt "user_requirements" like metagpt "create a 2048 game", do we have other stronger models here?

In these cases, I've noticed some issues with JSON parsing. For a baseline setup, I've tried to use other models and continue to run into issues (#1819). Can you help with this issue to unblock my setup?
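
For what it's worth, JSON-parsing failures on model replies are commonly worked around by extracting the JSON payload from the surrounding prose and markdown fences before parsing. A minimal, hypothetical sketch (not code from this PR, and the helper name is my own):

```python
import json
import re

def extract_json(text):
    """Best-effort extraction of the first JSON object embedded in an LLM
    reply, tolerating surrounding prose and markdown code fences (a common
    source of parsing failures). Returns None when nothing parses."""
    # Prefer the contents of a ``` or ```json fenced block, if one exists.
    fenced = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text
    # Fall back to the outermost {...} span.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(candidate[start:end + 1])
    except json.JSONDecodeError:
        return None
```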
