AI Models


All Models: a complete list of AI models and foundation models, sorted by newest first.

RMBG-2.0 by BRIA AI

RMBG-2.0 is an open-source image background removal model developed by BRIA AI, designed to achieve high-precision separation of foreground and background in images. Leveraging advanced AI technology, it reaches state-of-the-art (SOTA) levels of accuracy, outperforming its predecessor and even well-known paid tools like remove.bg. Trained on over 15,000 high-resolution images, RMBG-2.0 is highly accurate and applicable across various fields such as e-commerce, advertising, and game development.

Image Processing, Background Removal, Open Source, Computer Vision, AI Model, Deep Learning, E-commerce, Advertising, Game Development, High-Precision
vision · production · Open Source
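At its core, background removal reduces to predicting a per-pixel foreground mask and compositing it onto the image as an alpha channel. A minimal NumPy sketch of that final compositing step (the mask here is hand-made for illustration, not produced by RMBG-2.0):

```python
import numpy as np

def apply_background_mask(image, mask):
    """Composite a predicted foreground mask onto an RGB image as an
    alpha channel, yielding an RGBA image with the background removed.

    image: (H, W, 3) uint8 RGB array
    mask:  (H, W) float array in [0, 1], where 1.0 = foreground
    """
    alpha = (np.clip(mask, 0.0, 1.0) * 255).astype(np.uint8)
    return np.dstack([image, alpha])  # (H, W, 4) RGBA

# Toy 2x2 gray image with a hypothetical mask; in practice the mask
# would come from a segmentation model such as RMBG-2.0.
img = np.full((2, 2, 3), 200, dtype=np.uint8)
mask = np.array([[1.0, 0.0],
                 [0.0, 1.0]])
rgba = apply_background_mask(img, mask)  # background pixels get alpha 0
```

Viewers and editors treat alpha 0 as fully transparent, so saving the RGBA array as PNG gives the familiar "cut-out" result.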
DeepSeek-VL2 by DeepSeek

DeepSeek-VL2 is an open-source series of large-scale Mixture-of-Experts (MoE) vision-language models developed by DeepSeek. It significantly improves upon its predecessor, DeepSeek-VL, and excels in tasks such as visual question answering, optical character recognition, document/table/chart understanding, and visual grounding. The model series includes three versions: DeepSeek-VL2-Tiny, DeepSeek-VL2-Small, and DeepSeek-VL2, with 1.0B, 2.8B, and 4.5B activated parameters, respectively. DeepSeek-VL2 supports resolutions up to 1152x1152 and extreme aspect ratios of 1:9 or 9:1, making it versatile for various applications. It also features advanced capabilities like understanding scientific charts and generating Python code from images using the Plot2Code feature.

Vision-Language Model, Mixture-of-Experts, AI, Open Source, Computer Vision, Natural Language Processing, Multimodal AI, Document Understanding, Code Generation, Visual Grounding
multimodal · production · Open Source
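The "activated parameters" figures (1.0B/2.8B/4.5B) come from the Mixture-of-Experts design: a router selects only the top-k experts per token, so far fewer parameters run than the model contains in total. A simplified single-token sketch of top-k MoE routing (not DeepSeek's actual architecture, which adds shared experts and load balancing):

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Top-k Mixture-of-Experts routing for a single token.

    x:       (d,) token representation
    gate_w:  (n_experts, d) router weights
    experts: list of callables, each mapping (d,) -> (d,)

    Only the k experts with the highest router scores are evaluated,
    which is why an MoE model's activated parameter count is much
    smaller than its total parameter count.
    """
    logits = gate_w @ x
    top = np.argsort(logits)[-k:]                 # indices of the top-k experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                      # softmax over the top-k only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n = 4, 8
# Each "expert" is just a random linear map in this toy example.
experts = [(lambda W: (lambda x: W @ x))(rng.normal(size=(d, d))) for _ in range(n)]
y = moe_forward(rng.normal(size=d), rng.normal(size=(n, d)), experts, k=2)
```

With k=2 of 8 experts, only a quarter of the expert parameters participate in this token's forward pass.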
Willow Quantum Chip by Google

The Willow Quantum Chip, developed by Google, is a cutting-edge quantum processor featuring 105 physical qubits. It addresses a 30-year challenge in quantum error correction, significantly reducing error rates while increasing qubit count. The chip completes a standard benchmark calculation in less than five minutes, a task that would take the fastest supercomputer 10^25 years. This innovation marks a significant step toward the commercialization of quantum computing, with potential applications in medicine, energy, AI, and more.

Quantum Computing, Error Correction, Google, Quantum Technology, AI Hardware, Superconducting Processors, Quantum Error Correction, Computational Efficiency, Quantum Applications, Quantum Hardware
quantum · production
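Willow's headline result is that logical error rates fall as more physical qubits are added, rather than rising with system size. The classical intuition behind error correction can be shown with a repetition code, the simplest (non-quantum) error-correcting code: spreading one logical bit over several physical bits lets a majority vote survive a minority of flips. This is only an analogy; the surface-code correction Willow performs on real qubits is far more involved.

```python
from collections import Counter

def encode(bit, n=3):
    """Repetition code: copy one logical bit onto n physical bits."""
    return [bit] * n

def correct(bits):
    """Majority vote recovers the logical bit if fewer than half the
    physical bits flipped."""
    return Counter(bits).most_common(1)[0][0]

codeword = encode(1, n=5)
codeword[0] ^= 1          # one physical error
codeword[3] ^= 1          # a second error -- still a minority of 5
assert correct(codeword) == 1  # the logical bit survives
```

Adding more physical bits raises the number of simultaneous errors the code tolerates, which mirrors (loosely) why scaling up qubit count can drive logical error rates down instead of up.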
DeepSeek-V2.5-1210 by DeepSeek AI

DeepSeek-V2.5-1210 is the final fine-tuned release of DeepSeek V2.5, produced through a round of post-training that improves performance in math, programming, writing, and role-playing. It adds online search support: the model automatically extracts keywords from a query, runs parallel web searches, and quickly returns comprehensive, accurate, and personalized answers drawn from diverse results. The model weights are open-sourced on Hugging Face for developers and researchers.

AI Model, Fine-Tuning, Online Search, Natural Language Processing, Math Problem-Solving, Programming, Content Creation, Role-Playing, Open Source, Hugging Face
language · production · Open Source
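The keyword-extraction-plus-parallel-search pattern described above can be sketched with Python's standard thread pool. Both `extract_keywords` and `search` are hypothetical stand-ins (the real system does extraction with the LLM and queries a live search backend); only the orchestration pattern is the point:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_keywords(query):
    """Toy keyword extraction: keep words longer than 3 characters.
    (The real model extracts keywords with the LLM itself.)"""
    return [w for w in query.lower().split() if len(w) > 3]

def search(keyword):
    """Stand-in for a web search call; returns fake hits."""
    return [f"result for {keyword}"]

def parallel_search(query):
    """Fan the extracted keywords out as concurrent searches, then
    flatten the per-keyword hits into one result list."""
    keywords = extract_keywords(query)
    with ThreadPoolExecutor() as pool:
        hits = pool.map(search, keywords)   # order is preserved
    return [h for sub in hits for h in sub]

results = parallel_search("latest DeepSeek model benchmarks")
```

Running the searches concurrently means total latency is roughly that of the slowest single query rather than the sum of all of them.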
LAM by Microsoft

LAM, or "Large Action Model," is an AI model developed by Microsoft designed to autonomously operate Windows programs. Unlike traditional language models, LAM converts user requests into specific actions, such as launching applications or controlling devices. It is optimized for Microsoft Office and other Windows applications, achieving a 71% task completion success rate in Word, outperforming GPT-4. LAM excels in understanding user intent, generating actions, and dynamically adapting to complex tasks.

AI, Windows Automation, Microsoft Office Automation, Task Execution, User Intent, Dynamic Planning, Smart Home, Customer Support
multimodal · production
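The request-to-action conversion that distinguishes an action model from a language model can be illustrated with a trivial dispatch table. A real Large Action Model replaces the keyword lookup below with a learned policy; the action names and table here are entirely hypothetical:

```python
def plan_actions(request, action_table):
    """Map a natural-language request to a sequence of concrete actions.
    A real action model would use a learned planner; a keyword table
    stands in for it in this sketch."""
    request = request.lower()
    return [action for keyword, action in action_table.items() if keyword in request]

# Hypothetical (keyword -> action) table for a couple of Office tasks.
ACTIONS = {
    "word": ("launch", "winword.exe"),
    "save": ("invoke", "File.Save"),
}
steps = plan_actions("open Word and save the document", ACTIONS)
# steps is now an executable plan: launch the app, then invoke Save
```

The key difference from a chat model is the output type: a sequence of machine-executable actions rather than text for a human to read.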
MinMo by Alibaba Tongyi Lab

MinMo is a multimodal large model developed by the FunAudioLLM team at Alibaba Tongyi Lab, designed to achieve seamless voice interaction. With approximately 8 billion parameters, MinMo is trained on 1.4 million hours of diverse voice data and supports controlling emotion, dialect, and speaking style in generated audio. It enables full-duplex voice interaction with low latency, making multi-turn conversations smoother and more natural.

Voice Interaction, Multimodal Model, AI, Alibaba, Speech Recognition, Text-to-Speech, Full-Duplex, Emotion Control, Dialect Support, Low Latency
multimodal · production
Goku by University of Hong Kong, ByteDance

Goku is a cutting-edge video generation model developed by the University of Hong Kong and ByteDance. Built on a rectified flow Transformer framework, it excels at generating high-quality videos and images from text or image inputs. Goku supports multiple modes, including text-to-video, image-to-video, and text-to-image, making it versatile for creative and commercial applications; it is particularly effective at cutting advertising video production costs, by a factor of up to 100 compared with traditional methods. Goku is trained on a massive dataset of 36 million videos and 160 million images, and advanced parallel training strategies and fault-tolerant mechanisms further enhance its efficiency and stability.

Video Generation, Image Generation, Multimodal AI, Advertising, Content Creation, AI Model, ByteDance, HKU, Transformer, Rectified Flow
multimodal · production · Open Source
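The "rectified flow" framework Goku builds on trains a velocity field along straight-line paths between noise and data, and samples by integrating that field. A minimal NumPy sketch of the core idea (a toy 3-vector stands in for a video latent, and the exact velocity field stands in for the learned Transformer):

```python
import numpy as np

def rectified_flow_pair(x0, x1, t):
    """Rectified flow supervises a velocity field on straight-line
    interpolants between noise x0 and data x1:
        x_t = (1 - t) * x0 + t * x1,  target velocity v = x1 - x0
    """
    xt = (1 - t) * x0 + t * x1
    v_target = x1 - x0
    return xt, v_target

def euler_sample(x0, velocity_fn, steps=10):
    """Generate by integrating the velocity field from t=0 to t=1."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt)  # step along the flow
    return x

# With the exact velocity field, Euler integration recovers the data.
x0, x1 = np.zeros(3), np.array([1.0, 2.0, 3.0])
x_end = euler_sample(x0, lambda x, t: x1 - x0, steps=10)
```

Because the target paths are straight, few integration steps suffice at sampling time, which is a large part of rectified flow's appeal for expensive modalities like video.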
YAYI-Ultra by Wenge Research

YAYI-Ultra is the flagship enterprise-level large language model developed by Wenge Research. It excels in multi-domain expertise and multimodal content generation, covering fields such as mathematics, coding, finance, public opinion, traditional Chinese medicine, and security. The model accepts context windows of up to 128k tokens and offers multimodal capabilities aligned on over 10 million text-image data pairs. It also features multi-turn dialogue role-playing, content-security risk control, and the invocation of 10+ intelligent plugins.

Large Language Model, Multimodal Content Generation, Enterprise AI, Natural Language Processing, Text Generation, Image Generation, Data Analysis, Task Planning, Content Security, Intelligent Plugins
multimodal · production · Open Source
HumanOmni by HumanMLLM

HumanOmni is a multimodal large model designed for human-centric scenarios, integrating visual and auditory modalities. It processes video, audio, or a combination of both to understand human behavior, emotions, and interactions. Pre-trained on over 2.4 million video clips and 14 million instructions, HumanOmni employs a dynamic weight adjustment mechanism to flexibly fuse visual and auditory information. It excels in tasks like emotion recognition, facial description, and speech recognition, making it suitable for applications such as movie analysis, close-up video interpretation, and real-time video understanding.

Multimodal, Human-Centric, Emotion Recognition, Speech Recognition, Video Analysis, Audio Processing, AI Models, Behavior Understanding, Interaction Analysis, Real-Time Processing
multimodal · production · Open Source
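A "dynamic weight adjustment mechanism" for fusing modalities is typically a small gating network that scores each modality per input and mixes the features by the softmax of those scores. A NumPy sketch of that pattern (the description above gives only this level of detail, so this is an illustrative guess at the mechanism, not HumanOmni's actual code):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse(visual, audio, gate_w):
    """Dynamically weighted fusion of visual and auditory features.

    A gating layer scores the two modalities from the concatenated
    input, so the mixing weights shift per example: speech-heavy clips
    can lean on audio, close-ups on vision.
    """
    scores = gate_w @ np.concatenate([visual, audio])  # (2,) modality scores
    w_v, w_a = softmax(scores)                         # convex weights
    return w_v * visual + w_a * audio

rng = np.random.default_rng(1)
v, a = rng.normal(size=8), rng.normal(size=8)
fused = fuse(v, a, rng.normal(size=(2, 16)))  # same shape as either input
```

Because the weights are a softmax, the fused vector is always a convex combination of the two modality features, which keeps the fusion stable whatever the gate outputs.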
Soundwave by The Chinese University of Hong Kong (Shenzhen)

Soundwave is an open-source speech understanding model developed by The Chinese University of Hong Kong (Shenzhen). It specializes in the intelligent alignment and comprehension of speech and text, leveraging innovative alignment and compression adapter technologies to bridge the representation gap between speech and text. This enables efficient speech feature compression and enhanced performance in various speech-related tasks.

Speech Understanding, Text Alignment, Speech Translation, Speech Q&A, Emotion Recognition, Multimodal Interaction, Open Source, AI Model, Speech Processing, Language Learning
multimodal · production · Open Source
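The "compression adapter" idea addresses a basic mismatch: speech encoders emit far more frames per second than there are text tokens. Soundwave's adapter is learned, but the shape of the operation can be shown with simple average pooling along the time axis:

```python
import numpy as np

def compress_features(feats, ratio=4):
    """Shrink a speech feature sequence along time by average-pooling
    non-overlapping windows, closing the length gap between dense
    speech frames and sparse text tokens. A learned adapter (as in
    Soundwave) replaces this fixed pooling; only the shape change is
    illustrated here.

    feats: (T, d) frame features -> (ceil(T / ratio), d)
    """
    T, d = feats.shape
    pad = (-T) % ratio                       # zero-pad to a multiple of ratio
    if pad:
        feats = np.concatenate([feats, np.zeros((pad, d))])
    return feats.reshape(-1, ratio, d).mean(axis=1)

x = np.random.default_rng(0).normal(size=(10, 6))
y = compress_features(x, ratio=4)            # 10 frames -> 3 pooled frames
```

After compression, the shortened speech sequence is cheap enough to interleave with text tokens in an LLM's context, which is what makes joint speech-text reasoning practical.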