UI-TARS Desktop is a GUI Agent application based on the UI-TARS Vision-Language Model. It enables users to control their computers using natural language, integrating seamlessly with browsers, command lines, and file systems. This desktop application leverages multimodal AI to interpret web pages visually and execute tasks with precision.
| Instruction | Video |
|---|---|
| Get the current weather in SF using the web browser | new_mac_action_weather.mp4 |
| Send a tweet with the content "hello world" | new_send_twitter_windows.mp4 |
To get started with UI-TARS Desktop, follow the Quick Start Guide.
For deployment instructions, including cloud deployment options, refer to the Deployment Guide.
The UI-TARS SDK is a cross-platform toolkit for building GUI automation agents. Learn more about it in the SDK Documentation.
Contributions are welcome! Please review the CONTRIBUTING.md file for guidelines.
UI-TARS Desktop is licensed under the Apache License 2.0.
If you find this project useful, please consider citing our work:
@article{qin2025ui,
title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
journal={arXiv preprint arXiv:2501.12326},
year={2025}
}
UI-TARS Desktop is developed by ByteDance and is part of the broader Agent TARS project. For more information, visit agent-tars.com.