OmniHuman, developed by ByteDance, is a multimodal framework that generates realistic human videos from a single image plus a motion signal such as audio or video. It uses a hybrid training strategy to work around the scarcity of high-quality training data, accepts images of any aspect ratio, and performs well on singing, conversation, and gesture-heavy scenes. It also handles a range of visual and audio styles, which makes it suitable for varied video content creation.
LivePortrait, developed by Kuaishou, is an open-source framework for portrait animation. It transfers expressions and head poses from a driving video to a source portrait, which can be a still image or a video. The framework is built on implicit keypoints and relies on large-scale, high-quality training data and a mixed image-video training strategy to improve generalization and motion control. It generates a single frame in 12.8 milliseconds on an RTX 4090 GPU. It supports portraits in multiple styles and high-resolution animation, and includes stitching and retargeting modules for complex scenes. The project's GitHub page offers detailed usage guides and resources, and the open-source community has adopted it widely.
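To put the 12.8 ms per-frame figure in context, the short sketch below converts a per-frame latency into frames per second and compares it with a 30 FPS playback budget. The helper name and the 30 FPS target are illustrative assumptions, not part of LivePortrait.

```python
def frames_per_second(ms_per_frame: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    if ms_per_frame <= 0:
        raise ValueError("ms_per_frame must be positive")
    return 1000.0 / ms_per_frame


# Latency quoted above for LivePortrait on an RTX 4090.
LIVEPORTRAIT_MS_PER_FRAME = 12.8
# Assumed playback target, chosen only for comparison.
TARGET_FPS = 30.0

fps = frames_per_second(LIVEPORTRAIT_MS_PER_FRAME)
budget_ms = 1000.0 / TARGET_FPS  # time available per frame at the target rate

print(f"throughput: {fps:.1f} FPS")
print(f"budget at {TARGET_FPS:.0f} FPS: {budget_ms:.1f} ms per frame")
print(f"fits real-time budget: {LIVEPORTRAIT_MS_PER_FRAME <= budget_ms}")
```

At 12.8 ms per frame the generation step alone runs at roughly 78 FPS, which leaves headroom for decoding, pre-processing, and display in a real-time pipeline.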
Tora is an AI video generation framework developed by Alibaba and built on a Trajectory-oriented Diffusion Transformer (DiT). It conditions generation on text, visual, and trajectory inputs at the same time. Its main components are the Trajectory Extractor, the Spatial-Temporal DiT, and the Motion-guidance Fuser, which together give precise control over motion and support videos of up to 204 frames at 720p resolution. Tora is notable for its motion fidelity and for producing movement that is consistent with real-world physical dynamics.
MindSearch is an open-source AI search framework developed by the Shanghai AI Lab. Built on the InternLM2.5 7B dialogue model, it can gather relevant information from over 300 web pages in 3 minutes, a task that would typically take a person about 3 hours. It uses a multi-agent framework that mimics how people research a question: it plans first and searches second, which improves the accuracy and completeness of the results. The project is fully open-source, so users can try it and deploy it locally for free.
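The plan-before-search idea can be illustrated with a toy sketch: a planner splits a question into sub-questions, several searcher workers handle them in parallel, and the results are merged. This is not MindSearch's code or API. Every function name here is hypothetical, the planner is a plain string split standing in for an LLM call, and the searcher returns canned text instead of querying the web.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def plan(query: str) -> list[str]:
    """Hypothetical planner: break a question into sub-questions.

    A real system would ask a language model to do this; splitting on
    " and " is only a placeholder for that step.
    """
    parts = [part.strip() for part in query.split(" and ") if part.strip()]
    return parts or [query]


def fake_search(sub_query: str) -> str:
    """Stand-in for a searcher agent; returns canned notes, no network access."""
    return f"notes on: {sub_query}"


def answer(query: str, search: Callable[[str], str] = fake_search) -> dict[str, str]:
    """Plan first, then run one search per sub-question in parallel."""
    sub_queries = plan(query)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in the same order as sub_queries.
        results = list(pool.map(search, sub_queries))
    return dict(zip(sub_queries, results))


report = answer("who built MindSearch and which model does it use")
for sub_query, notes in report.items():
    print(f"{sub_query} -> {notes}")
```

Running the sub-searches concurrently is what lets a system of this shape cover many pages in a short time, while the planning step keeps each search narrow enough to return focused results.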
Unique3D is an open-source framework developed by Tsinghua University that turns a single image into a high-fidelity textured 3D mesh. It combines a multi-view diffusion model and a normal diffusion model with an efficient multi-level upsampling strategy to generate meshes quickly. It also integrates the ISOMER algorithm to keep geometry and color consistent, and reports better results than other image-to-3D models such as InstantMesh, CRM, and OpenLRM.