[chrome extension] wrong cursor position #350

Open
songkeys opened this issue Feb 5, 2025 · 7 comments

@songkeys

songkeys commented Feb 5, 2025

When I run "help me search today's news on google" with Action, it keeps clicking the wrong position on the webpage.

LLM output:

. The Google homepage is displayed, with the search bar prominently located in the center of the screen.\n2. To proceed with searching for today's news, I need to click on the search bar to activate it and bring up the keyboard for text input.\n3. Clicking on the search bar will allow me to type in my query and initiate the search process.\nAction: click(start_box='<|box_start|>(459,978)<|box_end|>')
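The action line above encodes the click target as a coordinate pair wrapped in `<|box_start|>` / `<|box_end|>` markers. Below is a minimal Python sketch of how a client might turn that string into a pixel position. It assumes the model emits coordinates on a relative 0-1000 grid (an assumption about this UI-TARS version, not something confirmed in this thread) and uses a hypothetical 1280x800 viewport. `parse_click` and `BOX_PATTERN` are illustrative names, not Midscene's API.

```python
import re

# Matches the action format seen in the log, e.g.
# click(start_box='<|box_start|>(459,978)<|box_end|>')
BOX_PATTERN = re.compile(
    r"(\w+)\(start_box='<\|box_start\|>\((\d+),\s*(\d+)\)<\|box_end\|>'\)"
)


def parse_click(action: str, width: int, height: int, scale: int = 1000):
    """Return (action_name, x_px, y_px) for a UI-TARS style action string.

    Assumption: coordinates are relative on a 0..scale grid (scale=1000 is
    assumed, not confirmed) and are mapped onto a screenshot of
    width x height pixels.
    """
    match = BOX_PATTERN.search(action)
    if match is None:
        raise ValueError(f"unrecognized action: {action!r}")
    name = match.group(1)
    rel_x, rel_y = int(match.group(2)), int(match.group(3))
    # Scale the relative grid position to absolute pixels.
    x_px = round(rel_x * width / scale)
    y_px = round(rel_y * height / scale)
    return name, x_px, y_px


if __name__ == "__main__":
    raw = "Action: click(start_box='<|box_start|>(459,978)<|box_end|>')"
    # Hypothetical 1280x800 viewport; the real screenshot size will differ.
    print(parse_click(raw, 1280, 800))
```

Under those assumptions the click maps to roughly y=782 of 800, near the bottom of the viewport rather than the centered search bar the model itself describes, which would be consistent with the wrong-position behavior reported here.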


report: https://raw.githubusercontent.com/songkeys/oss/refs/heads/master/mac/midscene_report.html

@yuyutaotao
Collaborator

Hi @songkeys, which version of vlm-ui-tars are you using?

@songkeys
Author

songkeys commented Feb 5, 2025

@yuyutaotao
Collaborator

According to the README from the official UI-TARS model team, the GGUF versions are no longer recommended. You may want to try another version.

https://github.com/bytedance/UI-TARS?tab=readme-ov-file#%EF%B8%8F-important-announcement-gguf-model-performance

@songkeys
Author

songkeys commented Feb 5, 2025

When I searched for UI-TARS in LM Studio, all the results were GGUF versions. Which alternative version would you recommend? Could you share a Hugging Face repo? Many thanks!


@yuyutaotao
Collaborator

This is the official Hugging Face repo for UI-TARS: https://huggingface.co/bytedance-research/UI-TARS-7B-DPO

@songkeys
Author

songkeys commented Feb 5, 2025

Thank you. I don't know whether running the model you linked would solve this. On my MacBook (M1), LM Studio and Ollama only support GGUF models, so I had to run it with vLLM, but it was extremely slow and froze my computer. Not a single request got a response. I'll leave this until I find a way to run it.

@yuyutaotao
Collaborator

@songkeys You can try deploying it in the cloud. An M1 MacBook cannot handle the 7B model well.


3 participants