BEHAVIOR Robot Suite (BRS) is a framework developed by Fei-Fei Li's team at Stanford University that enables robots to perform complex household tasks through whole-body manipulation. It integrates a low-cost teleoperation interface, JoyLo, for efficient data collection and a novel imitation learning algorithm, WB-VIMA, for precise whole-body control. BRS focuses on three key capabilities that real-world household environments demand: bimanual coordination, stable navigation, and extensive end-effector reachability. The framework has been demonstrated on tasks such as cleaning, garbage disposal, and clothing organization, showing its potential for autonomous robot operation.
Evolving Agents is an open-source framework designed for creating, managing, and evolving AI agents. It enables intelligent agents to communicate, collaborate, and evolve based on semantic understanding and past experiences, making it ideal for solving complex tasks. The framework supports core functionalities like intelligent agent evolution, inter-agent communication, and semantic search. It is widely applicable in fields such as document processing, healthcare, financial analysis, and customer service, enhancing efficiency and effectiveness through collaborative agent workflows.
Chitu is an open-source, high-performance inference engine for large models, developed by Tsinghua University's High-Performance Computing Institute and Qingcheng Jizhi. It addresses the high cost and inefficiency of the inference phase through broad hardware support, covering multiple generations of NVIDIA GPUs as well as Chinese-made chips. Chitu reduces GPU usage and increases inference speed, and it targets deployment scenarios ranging from CPU-only machines to large-scale clusters.
I2V3D is an image-to-video generation framework developed by City University of Hong Kong and Microsoft GenAI. It transforms static images into dynamic videos with precise control over animations and camera movements using 3D geometry guidance. The framework combines the precision of traditional computer graphics pipelines with the visual fidelity of generative AI models. It employs a two-stage process: 3D-guided keyframe generation and video interpolation, ensuring high-quality and controllable video output. I2V3D supports complex 3D animations, allowing users to start animations from any point and generate videos of arbitrary length. It simplifies video production, making it accessible for animation, video editing, and content creation.
Motia is an AI agent framework for software engineers that streamlines the development, testing, and deployment of AI agents. It supports multiple programming languages, including Python, TypeScript, and Ruby, so developers can write agent logic in a language they already know instead of learning a proprietary domain-specific language. Motia also offers zero-infrastructure, one-click deployment of agents without complex configuration.
InfiniteYou (InfU) is an identity-preserving image generation framework developed by ByteDance's Intelligent Creation Team. It builds on Diffusion Transformers (DiTs) such as FLUX and uses InfuseNet to inject identity features into the diffusion model, achieving high identity similarity while preserving the base model's generation capabilities. The framework employs a multi-stage training strategy, consisting of pre-training and supervised fine-tuning on synthetic single-person multi-sample (SPMS) data, to improve text-image alignment, image quality, and aesthetics. InfiniteYou is compatible with various existing tools and methods, making it a versatile solution for generative AI applications.
OLMo (Open Language Model) is a fully open-source large language model (LLM) framework developed by the Allen Institute for AI (AI2). It provides researchers with comprehensive resources, including data, training code, model weights, and evaluation tools, to foster collaborative advances in language model science.
FoleyCrafter is an AI video sound effects framework developed by the Shanghai AI Lab and the Chinese University of Hong Kong (Shenzhen). It automatically detects actions in videos and adds appropriate sound effects, such as footsteps, animal sounds, wind, and water sounds. FoleyCrafter can also take user prompts to adjust the sound effects, making video production simpler and more realistic.
ControlNeXt is an AI framework for controllable image and video generation, jointly developed by The Chinese University of Hong Kong and SenseTime. It employs lightweight control modules and a cross-normalization technique to reduce computational cost and training complexity while maintaining high-quality, diverse content generation. The framework supports various conditional control signals, such as human poses and edge maps, and integrates with base models and LoRA weights, enabling style changes without additional training.
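The cross-normalization idea can be illustrated with a minimal sketch. It assumes that the technique normalizes the control-branch features using the mean and variance of the main-branch features before the two are added together; the function names `cross_normalize` and `inject_control`, the `gamma` scale, and the flat list representation of features are all hypothetical choices made for illustration, not ControlNeXt's actual API.

```python
from statistics import fmean, pvariance


def cross_normalize(control, main, gamma=1.0, eps=1e-5):
    """Normalize control features with statistics taken from the main branch.

    Assumption: 'cross normalization' means the mean and variance come from
    the main-branch features rather than from the control features themselves.
    """
    mu = fmean(main)            # mean of the main-branch features
    var = pvariance(main, mu)   # population variance of the main-branch features
    scale = gamma / (var + eps) ** 0.5
    return [(c - mu) * scale for c in control]


def inject_control(main, control, gamma=1.0):
    """Add the cross-normalized control features to the main-branch features."""
    normalized = cross_normalize(control, main, gamma)
    return [m + n for m, n in zip(main, normalized)]


if __name__ == "__main__":
    main_features = [1.0, 2.0, 3.0, 4.0]
    control_features = [10.0, 20.0, 30.0, 40.0]
    print(inject_control(main_features, control_features, gamma=0.1))
```

Under this assumption, the control signal is rescaled using the main branch's own statistics before it is added, regardless of how the control module was initialized. This is one plausible reason such a module can be kept lightweight.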
PhotoMaker V2, developed by Tencent, is an advanced AI image generation framework that produces highly realistic human photos in seconds. It builds on its predecessor by offering enhanced character consistency and controllability, allowing users to fine-tune results through text instructions. The framework supports integration with tools like ControlNet, T2I-Adapter, IP-Adapter-FaceID, and InstantID, enabling personalized character generation for diverse applications such as gaming, film production, and social media.