Seed-VC is a zero-shot voice conversion technology based on in-context learning that produces high-quality audio with close timbre similarity to the target voice. It requires no speaker-specific training: a 1 to 30-second reference voice sample is enough for voice cloning and conversion. This makes it well suited to voice conversion research, entertainment, media production, and speech synthesis. Seed-VC also supports zero-shot singing voice conversion, re-voicing a sung performance in the timbre of the reference voice. It ships with command-line tools and a Gradio web interface for running conversions.
Animate Anyone is an open-source framework developed by Alibaba Group's Institute for Intelligent Computing that transforms static images of characters or people into animated videos. Built on a diffusion model, it combines ReferenceNet, a Pose Guider, and a temporal generation module to keep the output videos consistent, controllable, and stable. The framework has gained nearly 13,000 stars on GitHub and has sparked widespread discussion both in China and internationally. Alibaba's AI chatbot, Tongyi Qianwen, offers a "Tongyi Dance King" feature based on this technology, which lets characters in photos perform dances like "Subject 3," "Shoulder Shake," and "Shuffle."
Agent TARS is an open-source multimodal AI agent developed by ByteDance. It visually interprets web content and seamlessly integrates with browsers, command lines, and file systems to plan and execute complex tasks. The tool offers a desktop client that showcases multimodal elements and conversational workflows, making it a powerful solution for AI-assisted task execution and research. Currently in technical preview, it supports macOS and is designed to optimize development processes through intelligent agent-driven workflows.
Bolt.new is an AI-powered full-stack web programming tool that simplifies web development by automatically writing, running, editing, and deploying applications directly in the browser. Leveraging WebContainers technology, it runs a full Node.js environment without requiring local installation or configuration. Users can generate code through simple prompts, test it immediately in the browser, and deploy it with one click to cloud services like Netlify. Bolt.new also features automatic error detection and repair, making it accessible even to non-technical users.
EchoMimicV2, developed by Ant Group, an Alibaba affiliate, is a digital human project designed to create high-quality animation videos. It uses a reference image, an audio clip, and a hand pose sequence to generate synchronized upper body movements. Its predecessor, EchoMimicV1, focused on head animation; EchoMimicV2 extends this to the upper body and supports both Chinese and English speech. The project uses techniques such as Audio-Pose Dynamic Harmonization, Head Partial Attention, and Phase-specific Denoising Loss to improve animation quality and reduce redundancy in the conditioning inputs.
XingliuAI is an AI image generation platform developed by LiblibAI and built on its self-developed Star-3 Alpha general image generation model. It integrates what LiblibAI describes as the world's largest LoRA enhancement model library, along with AI image control technologies. Aimed at designers, photographers, and visual creators, XingliuAI offers high-precision image generation, intelligent recommendations, color control, regional redrawing, intelligent image expansion, and detail restoration. It supports applications such as e-commerce, advertising, and artistic creation across a range of styles.
Buzz is an offline speech-to-text tool built on OpenAI's Whisper model and available for Windows, macOS, and Linux. It transcribes microphone input in real time and converts audio or video files to text, exporting results in formats such as TXT, SRT, and VTT. Buzz offers fast conversion, high accuracy, multi-language recognition, and translation of results into English, and it runs entirely offline, which keeps user audio private.
Deep-Live-Cam is an open-source AI tool that enables real-time face swapping in videos using just one image. It supports multiple hardware platforms including CPU, NVIDIA CUDA, Apple Silicon, and Core ML to ensure smooth video processing. The software includes anti-abuse mechanisms, adheres to legal and ethical standards, and reminds users to obtain consent from the person whose face is being swapped.
EchoMimic is an open-source AI digital human project launched by Ant Group, an Alibaba affiliate, designed to bring static images to life with voice and expressions. By combining deep learning models with audio and facial landmarks, it creates highly realistic animated portrait videos. It can generate videos from audio alone, from facial landmarks alone, or from both together for more natural and smooth lip-syncing. EchoMimic supports both Chinese and English and handles scenarios such as singing. It is used in entertainment, education, and virtual reality.
Pi (Presentation Intelligence) is an AI-native platform designed to streamline the creation and sharing of presentations. It supports several content generation methods, including generation from a single sentence, file import, and URL import. The platform features an AI-native editor with intelligent editing and dynamic layout that adapts presentations to different devices. Suited to business presentations, education, and training, Pi helps users create professional-level presentations with ease.
An AI-powered video and audio editing platform that makes content creation as easy as editing a document.
A user-friendly AI chatbot that can help with writing, analysis, answering questions, and creative tasks.
The industry-standard image editing software now enhanced with powerful AI features for generative fill, neural filters, and creative editing.
An AI writing assistant integrated into Notion's workspace, helping users write, edit, and organize content more efficiently.
An AI content creation platform that helps create marketing copy, blog posts, social media content, and more.
An AI-powered writing assistant that helps improve grammar, style, tone, and clarity in real-time.
A user-friendly video editing app with AI-powered features for automatic editing, effects, and content creation.
A design platform with powerful AI features for creating graphics, presentations, and marketing materials with ease.
An AI meeting assistant that provides real-time transcription, summarization, and collaboration features for meetings and conversations.
An AI-powered assistant integrated with Microsoft 365 apps, helping users create content, analyze data, and boost productivity.