Code for the TMLR paper: "Do Think Tags Really Help LLMs Plan? A Critical Evaluation of ReAct-Style Prompting"
Setup:
- Make sure VSCode has the Dev Containers extension installed.
- Make sure Docker is already set up (you should be able to run `docker ps` and `docker images` without errors; a quick check is shown below).
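For example, the following sanity check from any shell only verifies the prerequisites above; the exact output will differ on your machine:

```
# Both commands should succeed if the Docker daemon is running
# and your user has permission to talk to it.
docker ps        # lists running containers
docker images    # lists locally available images
```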
Running:
- Clone the repository: `git clone https://github.com/sbhambr1/react_brittleness`
- Run the devcontainer: VSCode should show a popup offering to reopen the code inside a devcontainer. If it does not, press Cmd + Shift + P to open the VSCode command palette and search for `Rebuild Container`, which should start the devcontainer.
- Set `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` as environment variables (an example is shown below).
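For example, the keys can be exported in the devcontainer shell; the values below are placeholders, not real keys:

```
# Replace the placeholder values with your own API keys.
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
```

Alternatively, they can be set in the devcontainer configuration (e.g. via `remoteEnv` in `devcontainer.json`) so they persist across rebuilds.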
Running Webshop
- In the devcontainer, use the docker image `famishedrover/taxonomy_llm:webshop` (a pull command is shown after the run commands below in case the image is not present locally).
- Run the webshop:

```
source /webvenv/bin/activate
cd /webshop/
./run_dev.sh
```
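If the `famishedrover/taxonomy_llm:webshop` image is not already available locally, it can typically be pulled first (this assumes the image is published under that name on a registry such as Docker Hub):

```
docker pull famishedrover/taxonomy_llm:webshop
```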
- Open the webpage. VSCode should prompt you; otherwise, Flask will log a message saying that the website is accessible at a link like `172.0.0.6:3000` (use the link mentioned in the message!).
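A quick way to confirm the server is up is to request the page from a terminal inside the container; substitute the address that Flask actually prints (`172.0.0.6:3000` below is only the example from the log message):

```
# Should return an HTTP status line if the webshop is serving pages.
curl -I http://172.0.0.6:3000
```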
- Run the OpenAI code using native Python (not the webvenv):

```
pip install openai anthropic ratelimit alfworld
```
- Set up the repository in a fresh conda environment and run the ALFWorld runner:

```
git clone https://github.com/sbhambr1/react_brittleness
conda create -n react_test python=3.9
conda activate react_test
pip install -r requirements.txt
mkdir data
python runners/react_alfworld.py
```
- Run `patchfix.sh` for each container. It updates `/webshop/web_agent_site/utils.py` to use the larger dataset and downloads it using the webvenv virtual environment present in the container.
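The exact steps live in `patchfix.sh` itself; applying it would look roughly like the following, run from a shell inside each container (the script path is a placeholder, adjust it to wherever the repository is mounted):

```
# Placeholder path; point this at the repository checkout inside the container.
bash /workspaces/react_brittleness/patchfix.sh
```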