DeepSeek-R1 is a high-performance reasoning model developed by DeepSeek, an AI company based in Hangzhou. It is designed to match OpenAI's o1, excelling at tasks such as math, coding, and natural-language reasoning. The model is trained primarily with large-scale reinforcement learning, achieving strong performance with minimal labeled data. DeepSeek-R1 is open-sourced under the MIT License, which also permits using its outputs to distill smaller models.
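The released distilled models are trained by fine-tuning smaller models on R1-generated reasoning data; for intuition, the classic logit-matching formulation of distillation (a minimal stdlib sketch for illustration, not DeepSeek's actual pipeline) looks like:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_kl(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between softened distributions.

    Minimizing this pushes the student's output distribution toward
    the teacher's; a higher temperature exposes more of the teacher's
    'dark knowledge' about near-miss classes.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero when the two distributions match and positive otherwise, so gradient descent on the student's logits shrinks the gap.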
Wan2.1 is an open-source AI video generation model developed by Alibaba Cloud, featuring robust visual generation capabilities. It supports text-to-video and image-to-video tasks and ships in two sizes: a 14B-parameter professional version that excels at complex motion generation and physical modeling, and a 1.3B-parameter fast version that runs on consumer-grade GPUs with modest VRAM, making it suitable for secondary development and academic research. Architecturally, Wan2.1 combines a causal 3D VAE with a video Diffusion Transformer, enabling efficient spatiotemporal compression and long-range dependency modeling. On the VBench evaluation, the 14B version scores 86.22%, taking first place ahead of models such as Sora, Luma, and Pika. It is released under the Apache 2.0 license, supports multiple mainstream frameworks, and is available on GitHub, Hugging Face, and the ModelScope community for easy deployment.
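"Causal" here means each latent frame may depend only on the current and earlier frames, never on future ones. A toy 1-D illustration of a causal temporal convolution (scalar frames and a hypothetical kernel, standing in for the model's real 3-D convolutions over time, height, and width):

```python
def causal_temporal_conv(frames, kernel):
    """1-D convolution along the time axis with causal left-padding.

    Replicating the first frame on the left means output[t] is a
    function of frames[0..t] only: no future frame ever leaks in.
    """
    k = len(kernel)
    padded = [frames[0]] * (k - 1) + list(frames)
    return [sum(kernel[i] * padded[t + i] for i in range(k))
            for t in range(len(frames))]
```

Because the very first output can only see the first frame, a causal encoder can process video chunk by chunk as frames arrive, which is what makes long clips tractable.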
Grok-1 is a large language model developed by xAI, Elon Musk's AI startup. It is a Mixture-of-Experts (MoE) model with 314 billion parameters, the largest open-weights language model at the time of its release. In keeping with open-source principles, its weights and network architecture are publicly available under the Apache 2.0 license, allowing free use, modification, and distribution for both personal and commercial purposes.
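In an MoE layer, a learned router activates only a few experts per token, so most of the 314B parameters sit idle on any given forward pass. A minimal sketch of top-2 routing (expert count and logits here are illustrative, not Grok-1's exact configuration):

```python
import math

def top2_route(router_logits):
    """Softmax the router logits, keep the two highest-scoring
    experts, and renormalize their weights to sum to 1.

    Returns [(expert_index, weight), ...] sorted by score; the token's
    output is the weighted sum of just these two experts' outputs.
    """
    exps = [math.exp(l) for l in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)[:2]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]
```

With 8 experts and top-2 routing, each token pays the compute cost of roughly a quarter of the network while the full parameter count stays available across tokens.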
Loopy is an audio-driven AI video generation model developed by ByteDance. It animates a static photo by synchronizing facial expressions and head movements with a supplied audio track, producing realistic dynamic videos. Built on diffusion model technology, Loopy captures long-term motion information from the audio alone, without requiring additional spatial conditioning signals, making it versatile for applications in entertainment, education, and beyond.
CosyVoice 2.0 is an upgraded speech generation model developed by Alibaba's Tongyi Lab. It improves codebook utilization with finite scalar quantization (FSQ), simplifies the text-to-speech architecture, and introduces a chunk-aware causal flow-matching model to support diverse synthesis scenarios. The release significantly enhances pronunciation accuracy, timbre consistency, prosody, and audio quality, raising the MOS score from 5.4 to 5.53. It supports streaming inference with first-packet synthesis latency as low as 150 ms, making it suitable for real-time speech synthesis applications.
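Finite scalar quantization replaces a learned codebook with a fixed per-dimension grid, which is why utilization improves: every grid point is reachable by construction, so no codebook entries go dead. A minimal sketch on toy latent vectors (the real model additionally maps grid positions to discrete token indices):

```python
import math

def fsq_quantize(z, levels):
    """Finite scalar quantization of one latent vector.

    Each dimension is squashed into (-1, 1) with tanh, then snapped
    to the nearest of levels[d] uniformly spaced values in [-1, 1].
    The usable 'codebook' is simply the product of the per-dim grids.
    """
    out = []
    for x, L in zip(z, levels):
        bounded = math.tanh(x)          # bound the latent value
        half = (L - 1) / 2
        out.append(round(bounded * half) / half)  # snap to the grid
    return out
```

Since the grid is fixed rather than learned, there is no codebook-collapse failure mode to fight during training.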
CogVideoX is an open-source AI video generation model developed by Zhipu AI. It generates 6-second videos from English text prompts at 720×480 resolution and 8 frames per second. Inference requires 7.8–26 GB of VRAM, and the model uses a 3D causal VAE for video reconstruction. The release also provides a CLI/web demo, an online playground, API interface examples, and fine-tuning guides.
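Part of why inference fits in that VRAM range is that the 3D causal VAE compresses the video before the diffusion transformer ever runs. Assuming a 4× temporal and 8× spatial downsampling with 16 latent channels (hypothetical values chosen for illustration, not quoted from the release), the latent shape can be estimated as:

```python
def latent_shape(frames, height, width, t_down=4, s_down=8, channels=16):
    """Estimated latent tensor shape (C, T, H, W) after a causal 3D VAE.

    Causal 3D VAEs typically encode the first frame on its own and
    each subsequent group of t_down frames into one latent frame,
    hence 1 + (frames - 1) // t_down along the time axis.
    """
    return (channels,
            1 + (frames - 1) // t_down,
            height // s_down,
            width // s_down)
```

For a 49-frame clip (6 s × 8 fps plus a separately encoded first frame) at 480×720, this estimate gives a (16, 13, 60, 90) latent, a small fraction of the raw pixel volume the diffusion model would otherwise have to process.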
ChatTTS is an open-source text-to-speech (TTS) model optimized for dialogue scenarios, supporting both Chinese and English. Trained on roughly 100,000 hours of speech data, it produces high-quality, natural-sounding output. The model offers fine-grained control over prosodic features such as laughter and pauses, supports multiple speakers, and is well suited to conversational tasks, surpassing most open-source TTS models in fluency and naturalness.
An open-source text-to-image model capable of generating detailed images from text descriptions, with a strong community and multiple deployment options.
Meta's open-source large language model family, offering strong performance across various tasks with different model sizes.
GPT-4 is OpenAI's most advanced large language model, demonstrating human-level performance on various academic and professional tests.