Autogen + local LLM = Messy conversations. #471
-
Hello everyone! I hope you're all doing well. Like many others, I've been exploring Autogen, but for cost reasons I've opted for LM Studio as a substitute for ChatGPT. I've tested various models, and with all of them I run into an issue in conversations that use GroupChat and GroupChatManager: the agents seem to struggle to finish their sentences, and one agent's speech overlaps and merges into the next agent's. The same issue is visible in a video shared by another user here (around the 12-minute mark): https://youtu.be/5f7MQDSNxmk?t=736

I'm facing a similar problem, as shown in the image below. Here's the code I'm using, in this gist and at the end of this message: https://gist.github.com/macintoxic/064f478b312e516b24dcffc9f2c3f5ce

I've been troubleshooting this for a while now but haven't found a solution. If anyone has encountered and resolved a similar issue, or has insights into optimizing the conversation flow with Autogen and LM Studio, I would greatly appreciate your assistance. Interestingly, when testing against the official OpenAI API everything works flawlessly; the problem only appears with a local LLM.

```python
from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager, ChatCompletion, retrieve_utils

# Default data and functions
def write_file(file_name, file_content):
    with open(file_name, "w") as file1:
        file1.writelines(file_content)

function_map_definition = []
function_map_definition.append({
    'name': 'write_file',
    'description': 'write a file to disk',
    'parameters': {
        'type': 'object',
        'properties': {
            'file_name': {
                'type': 'string',
                'description': 'A valid file name'
            },
            'file_content': {
                'type': 'string',
                'description': 'The content of the file.'
            }
        },
        # 'required' belongs inside 'parameters' in the OpenAI function schema
        'required': ['file_name', 'file_content']
    }
})

config_list = [{
    "api_type": "open_ai",
    "api_key": "NULL",
    'model': 'gpt-3.5-turbo',
    # "api_base": "https://api.openai.com/v1",   # official OpenAI endpoint
    "api_base": "http://localhost:1234/v1",      # LM Studio local server
    'functions': function_map_definition,
}]

llm_config = {
    "request_timeout": 9600,
    "seed": 42,
    "config_list": config_list,
    "temperature": 0.1,
    "max_tokens": 4096,
}

ChatCompletion.start_logging()

user_proxy = UserProxyAgent(
    name="Admin",
    system_message="""A human admin.
Interact with the planner to discuss the plan. Plan execution needs to be approved by this admin.
Reply TERMINATE if the task has been solved to full satisfaction; otherwise reply CONTINUE, or explain why the task is not solved yet.
""",
    code_execution_config=False,
    human_input_mode="TERMINATE",
    llm_config=llm_config,
)

engineer = AssistantAgent(
    name="Engineer",
    llm_config=llm_config,
    system_message='''Engineer. You follow an approved plan. You write python/shell or csharp code to solve tasks.
Wrap the code in a code block that specifies the script type.
The user can't modify your code, so do not suggest incomplete code which requires others to modify it.
Don't use a code block if it's not intended to be executed by the executor.
Don't include multiple code blocks in one response.
Do not ask others to copy and paste the result.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes.
If the error can't be fixed, or if the task is not solved even after the code executes successfully, analyze the problem, revisit your assumptions,
collect any additional info you need, and think of a different approach to try.
''',
    max_consecutive_auto_reply=2,
    code_execution_config={"work_dir": "coding"},
)

planner = AssistantAgent(
    name="Planner",
    system_message='''Planner. Suggest a plan. Break the task down into smaller steps. Revise the plan based on feedback from the admin and the critic, until admin approval.
The plan may involve an engineer who can write code.
Explain the plan first. Be clear about which step is performed by an engineer. Do not write code; ask an engineer to do it.
''',
    llm_config=llm_config,
    max_consecutive_auto_reply=5,
)
planner.register_function(function_map={"write_file": write_file})

executor = UserProxyAgent(
    name="Executor",
    system_message="Executor. Execute the code written by the engineers and report the result. When you receive a csharp or sql file, don't execute it; just write it down.",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=2,
    llm_config=llm_config,
    code_execution_config={"work_dir": "coding"},
)

sr_python = AssistantAgent(
    name='sr_python_engineer',
    system_message='''Python engineer. You follow an approved plan. You write python/shell code to solve tasks.
Wrap the code in a code block that specifies the script type.
The user can't modify your code, so do not suggest incomplete code which requires others to modify it.
Don't use a code block if it's not intended to be executed by the executor.
Don't include multiple code blocks in one response.
Do not ask others to copy and paste the result. Check the execution result returned by the executor.
If the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes.
If the error can't be fixed, or if the task is not solved even after the code executes successfully, analyze the problem, revisit your assumptions,
collect any additional info you need, and think of a different approach to try.
''',
    llm_config=llm_config,
    code_execution_config={"work_dir": "coding"},
)
sr_python.register_function(function_map={"write_file": write_file})

sr_dot_net = AssistantAgent(
    name='csharp_engineer',
    system_message='''Csharp engineer. You follow an approved plan. You write csharp code to solve tasks.
Wrap the code in a code block that specifies the script type.
The user can't modify your code, so do not suggest incomplete code which requires others to modify it.
Don't use a code block if it's not intended to be executed by the executor.
Don't include multiple code blocks in one response.
Do not ask others to copy and paste the result.
After each file you generate, write it to disk using the write_file function.
''',
    # is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
    llm_config=llm_config,
    code_execution_config={"work_dir": "coding"},
    max_consecutive_auto_reply=2,
)
sr_dot_net.register_function(function_map={"write_file": write_file})

sr_sql = AssistantAgent(
    name='sql_engineer',
    system_message='''Sql engineer. You follow an approved plan. You write sql code to solve tasks.
Wrap the code in a code block that specifies the script type.
The user can't modify your code, so do not suggest incomplete code which requires others to modify it.
Don't use a code block if it's not intended to be executed by the executor.
Don't include multiple code blocks in one response.
Do not ask others to copy and paste the result.
Unless specified otherwise, your code targets a postgres database.
After each file you generate, write it to disk using the write_file function.
''',
    llm_config=llm_config,
)
sr_sql.register_function(function_map={"write_file": write_file})

critic = AssistantAgent(
    name="Critic",
    system_message="Critic. Double-check the plan, claims and code from the other agents and provide feedback. Check whether the plan includes adding verifiable info such as a source URL.",
    llm_config=llm_config,
)

groupchat = GroupChat(agents=[user_proxy, planner, sr_dot_net, sr_sql, critic], messages=[], max_round=30)
manager = GroupChatManager(groupchat=groupchat, llm_config=llm_config)

task = """
Given the following class, follow these steps:
Write a csharp controller for the CRUD methods.
Write a csharp service for the CRUD methods.
Write a csharp repository for the CRUD methods using Entity Framework.
Write unit tests aiming for 100% code coverage using xunit, moq and bogus.
Write the SQL command to create the table in a postgres database.
The default namespace is BuscaCep.
The class:

//Cep.cs
public class Cep
{
    [Key]
    //max length 8
    public string ZipCode { get; set; } = null!;
    public string TipoLogradouro { get; set; } = null!;
    public string Logradouro { get; set; } = null!;
    public string Complemento { get; set; } = null!;
    public string Local { get; set; } = null!;
    public string Bairro { get; set; } = null!;
    public string Cidade { get; set; } = null!;
    public string CodCidade { get; set; } = null!;
    public string Uf { get; set; } = null!;
    public string Estado { get; set; } = null!;
    public string CodEstado { get; set; } = null!;
}

Planner, break this plan down for best execution. You have a csharp engineer, a sql engineer and a critic that verifies what was done.
Please use the function write_file to write the files to disk, and wait for the planner to finish planning before continuing.
"""

try:
    # proxy_agent.initiate_chat(assistant, message=task3)
    user_proxy.initiate_chat(
        manager,
        message=task,
        clear_history=True,
    )
except Exception as ex:
    print(50 * '*')
    print(ex)
```
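One way to narrow this down is to check whether the local server is cutting replies off at a token limit before AutoGen ever sees a complete message. Below is a minimal sketch of that check, assuming the OpenAI-compatible response shape that LM Studio emulates; the payloads are made up for illustration, not real server output.

```python
# Sketch: detect replies that were cut off by a token limit, which is one
# plausible cause of agents "failing to finish their sentences".
# Assumes the OpenAI-compatible chat completion response shape.

def is_truncated(response: dict) -> bool:
    """Return True if the first choice stopped because of the token limit."""
    choice = response["choices"][0]
    return choice.get("finish_reason") == "length"

def merge_continuation(first: dict, continuation: dict) -> str:
    """Join a truncated reply with its continuation into one message."""
    head = first["choices"][0]["message"]["content"]
    tail = continuation["choices"][0]["message"]["content"]
    return head + tail

# Illustrative payloads only:
truncated = {"choices": [{"finish_reason": "length",
                          "message": {"content": "public class CepController "}}]}
rest = {"choices": [{"finish_reason": "stop",
                     "message": {"content": ": ControllerBase { }"}}]}

print(is_truncated(truncated))             # True
print(merge_continuation(truncated, rest))
```

If the local server reports `finish_reason: "length"` where the OpenAI API reports `"stop"`, that would point at a token-limit setting in LM Studio rather than at AutoGen itself.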
Replies: 4 comments 6 replies
-
Hi @macintoxic. Are you using gpt-3.5-turbo for all these experiments? If so, the difference must be in LM Studio or in how it's being called. I'm not familiar with LM Studio myself, but there are discussions about it on our Discord channel, like this one.
-
I am facing the same issue, and the only explanation I have found is that LM Studio is "pausing" at around 199 tokens and then continuing with the rest. AutoGen then attaches the remainder of the text coming from LM Studio to the next agent's turn. I nevertheless haven't found a workaround for this.
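If LM Studio really does flush a reply in two chunks like that, a crude guard on the calling side is to check whether a message looks finished before the floor is handed to the next agent. A minimal sketch; the heuristic and the punctuation set are my own assumptions, not anything from AutoGen or LM Studio:

```python
# Sketch of a heuristic for spotting messages that were cut mid-sentence.
# Purely illustrative; tune the punctuation set for your own output style.

def looks_cut_off(message: str) -> bool:
    """True if a message ends without terminal punctuation or a closed fence."""
    text = message.rstrip()
    if not text:
        return False
    if text.endswith("```"):          # a closed code block is a clean ending
        return False
    return text[-1] not in ".!?:\"'`)"

print(looks_cut_off("The plan has three steps."))   # False
print(looks_cut_off("and then the repository sh"))  # True
```

A message flagged this way could be held back and re-requested (or merged with the next chunk) instead of being treated as a complete turn.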
-
Ok, reality check here: the LM tools available today are nothing more than simple pattern-based predictive algorithms. The patterns they were trained on were sourced from sites like Stack Overflow, Quora, etc. So stop yourself here a moment and do two things.

1. Read the text again as though you were encountering it on Stack Overflow or in another user's GitHub repository. Is this a question you'd answer? If you were searching for an answer, would you expect this question to attract high-quality answers? How do you anticipate Stack Overflow or Reddit posters would respond to this wording? That's what will have determined how it shapes the patterns that emerge in the prediction.
2. The paradigm-shifting component in this tech is 'attention', and you are absolutely wasting it on negatives here. Referring back to the first item, we're basically back in the 'prompt crafting' business, where the goal is to find the right combination of seed and prelude patterns to guide the algorithm towards the subset of training documents that provides the reasoning and knowledge required to answer your question. In ChatDev, for example, a lot of the coder prompts list things the coder should not do, such as write methods consisting of nothing but 'pass'. Should those elements end up spread across attention boundaries ('should' and 'should not' are fairly weak signals to start with), then 1000 tokens later what attention sees amounts to 'methods ... pass'. Guess what ChatDev likes to do when you aren't asking it to verbatim recreate an example it was explicitly trained on?
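Following that argument, the negation-heavy lines in the system messages above could be rephrased as positive instructions. A hypothetical before/after, purely to illustrate the idea (these strings are mine, not from any framework):

```python
# Illustrative only: rephrasing a negation-heavy instruction positively,
# per the argument above that 'do not' is a weak signal for attention.

negative_prompt = (
    "Don't include multiple code blocks in one response. "
    "Do not suggest incomplete code which requires others to modify it."
)

positive_prompt = (
    "Reply with exactly one code block per response. "
    "Always provide complete, self-contained code that runs as-is."
)

# The positive version states the desired behaviour directly,
# so a partial read still conveys the intended action.
print(positive_prompt)
```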
-
I've got it working really well with two local LLMs; I'd been at it for three weeks straight trying to get it to work. I pieced parts together from three different sources, and it took lots of trial and error. Both the models and the chat conversation have to be right for it to work. Try a different model and see if you get better results.