bytedance_ui_tars_desktop

by bytedance
A GUI agent application that enables natural language control of your computer using a Vision-Language Model.

UI-TARS Desktop: Natural Language Control for GUI Automation

Overview

UI-TARS Desktop is a GUI agent application built on the UI-TARS Vision-Language Model. It lets users control their computers with natural language and works with browsers, command lines, and file systems. The application uses multimodal AI to interpret what is on screen, including web pages, and carries out tasks through mouse and keyboard input.

Features

  • Natural Language Control: Powered by the UI-TARS Vision-Language Model for intuitive task execution.
  • Visual Recognition: Screenshot and visual interpretation capabilities for browser operations.
  • Precise Input Control: Accurate mouse and keyboard control for complex tasks.
  • Cross-Platform Support: Compatible with Windows and macOS.
  • Real-Time Feedback: Immediate status updates and task progress tracking.
  • Local Processing: Ensures privacy and security by processing data locally.
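The features above suggest a capture, predict, act cycle: take a screenshot, ask the model for the next action, then apply it with mouse or keyboard input. The following is a minimal sketch of that cycle, not the UI-TARS implementation or the UI-TARS SDK API. Every name in it (`Action`, `ScreenDriver`, `Model`, `runAgent`, `FakeDesktop`, `scriptedModel`) is hypothetical, the "screenshot" is a plain string, and the model is a scripted stand-in so the example runs without any external service. A real agent would also be asynchronous; the loop is kept synchronous here for brevity.

```typescript
// Hypothetical types for illustration only; these are NOT part of the UI-TARS SDK.
type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "finished" };

interface ScreenDriver {
  // Returns a text description standing in for a real screenshot.
  capture(): string;
  // Applies a mouse or keyboard action to the (fake) desktop.
  apply(action: Action): void;
}

// Stand-in for a Vision-Language Model: maps (instruction, screenshot) to the next action.
type Model = (instruction: string, screenshot: string) => Action;

// Capture, predict, act loop with a step cap so a confused model cannot run forever.
function runAgent(
  instruction: string,
  driver: ScreenDriver,
  model: Model,
  maxSteps = 10
): Action[] {
  const trace: Action[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const screenshot = driver.capture();
    const action = model(instruction, screenshot);
    trace.push(action);
    if (action.kind === "finished") break;
    driver.apply(action);
  }
  return trace;
}

// A fake desktop with a single search box, used only to exercise the loop.
class FakeDesktop implements ScreenDriver {
  focused = false;
  typed = "";

  capture(): string {
    if (!this.focused) return "search box not focused";
    return this.typed === ""
      ? "search box focused, empty"
      : `search box contains: ${this.typed}`;
  }

  apply(action: Action): void {
    if (action.kind === "click") this.focused = true;
    if (action.kind === "type" && this.focused) this.typed += action.text;
  }
}

// Scripted "model": click the box, type the instruction, then report completion.
const scriptedModel: Model = (instruction, screenshot) => {
  if (screenshot === "search box not focused") return { kind: "click", x: 100, y: 40 };
  if (screenshot === "search box focused, empty") return { kind: "type", text: instruction };
  return { kind: "finished" };
};

const fakeDesktop = new FakeDesktop();
const trace = runAgent("weather in SF", fakeDesktop, scriptedModel);
console.log(trace.map((a) => a.kind).join(" -> ")); // click -> type -> finished
```

The step cap and the explicit `finished` action are design choices of this sketch: they give the loop two independent ways to stop, one decided by the model and one enforced by the caller.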

Showcases

Instruction | Video
Get the current weather in SF using the web browser | new_mac_action_weather.mp4
Send a tweet with the content "hello world" | new_send_twitter_windows.mp4

Quick Start

To get started with UI-TARS Desktop, follow the Quick Start Guide.

Deployment

For deployment instructions, including cloud deployment options, refer to the Deployment Guide.

SDK (Experimental)

The UI-TARS SDK is a cross-platform toolkit for building GUI automation agents. Learn more about it in the SDK Documentation.

Contributing

Contributions are welcome! Please review the CONTRIBUTING.md file for guidelines.

License

UI-TARS Desktop is licensed under the Apache License 2.0.

Citation

If you find this project useful, please consider citing our work:

@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}

About

UI-TARS Desktop is developed by ByteDance and is part of the broader Agent TARS project. For more information, visit agent-tars.com.

Categories
mcp_server model_context_protocol typescript javascript vision_language_model gui_agents cross_platform natural_language_processing electron vite

Stats

9300 GitHub Stars

Repository Info

bytedance Organization
