AI Models

Trending Models Most popular AI models and foundation models

GPT-4
GPT-4 by OpenAI
50000

GPT-4 is OpenAI's most advanced large language model, demonstrating human-level performance on various academic and professional tests.

LLM NLP AI ChatGPT OpenAI
language production
1000036 views
LLaMA 2
LLaMA 2 by Meta
45000

Meta's open-source large language model family, offering strong performance across various tasks with different model sizes.

LLM Open Source Meta AI Foundation Model
language production Open Source
700041 views
Stable Diffusion
Stable Diffusion by Stability AI
40000

An open-source text-to-image model capable of generating detailed images from text descriptions, with a strong community and multiple deployment options.

AI Art Text-to-Image Open Source Local Deployment
vision production Open Source
600035 views
Claude 3
Claude 3 by Anthropic
30000

Anthropic's most capable AI model, featuring enhanced reasoning, analysis, and creative capabilities with improved accuracy and safety.

LLM AI Constitutional AI Anthropic
language production
500033 views
PaLM 2
PaLM 2 by Google
30000

Google's advanced language model optimized for reasoning, coding, and multilingual tasks with strong capabilities across various domains.

LLM Google AI Multilingual Reasoning
language production
450033 views
Gemini
Gemini by Google
25000

Google's most capable and flexible AI model, designed to be multimodal from the ground up with superior reasoning capabilities.

LLM Multimodal Google AI Vision
multimodal production
400047 views
DALL-E 3
DALL-E 3 by OpenAI
20000

OpenAI's advanced text-to-image generation model capable of creating highly detailed and accurate images from natural language descriptions.

AI Art Text-to-Image OpenAI Image Generation
vision production
300034 views
Mistral AI
Mistral AI by Mistral AI
20000

A family of powerful open-source language models known for their efficiency and strong performance across various tasks.

LLM Open Source Efficient AI French Tech
language production Open Source
250027 views
Whisper
Whisper by OpenAI
15000

OpenAI's advanced speech recognition system capable of transcribing and translating multiple languages with high accuracy.

ASR Speech-to-Text OpenAI Open Source
audio production Open Source
200034 views
Ola
Ola by Tsinghua University, Tencent Hunyuan Research Team, NUS S-Lab
0

Ola is a full-modal language model developed by Tsinghua University, Tencent Hunyuan Research Team, and NUS S-Lab. It employs a progressive modal alignment strategy to gradually expand the modalities supported by the language model, starting with images and text, and then introducing audio and video data. Ola's architecture supports full-modal inputs, including text, images, video, and audio, and can process these inputs simultaneously. It also features a sentence-by-sentence decoding scheme for streaming speech generation, enhancing interactive experiences.

Multimodal AI Language Model Speech Generation Text Processing Image Understanding Audio Processing Video Understanding AI Research Natural Language Processing Progressive Modal Alignment
multimodal experimental Open Source
86 views

All Models Complete list of AI models and foundation models, sorted by newest first

Ola
Ola by Tsinghua University, Tencent Hunyuan Research Team, NUS S-Lab
0

Ola is a full-modal language model developed by Tsinghua University, Tencent Hunyuan Research Team, and NUS S-Lab. It employs a progressive modal alignment strategy to gradually expand the modalities supported by the language model, starting with images and text, and then introducing audio and video data. Ola's architecture supports full-modal inputs, including text, images, video, and audio, and can process these inputs simultaneously. It also features a sentence-by-sentence decoding scheme for streaming speech generation, enhancing interactive experiences.

Multimodal AI Language Model Speech Generation Text Processing Image Understanding Audio Processing Video Understanding AI Research Natural Language Processing Progressive Modal Alignment
multimodal experimental Open Source
86 views
Zonos
Zonos by Zyphra
0

Zonos is a high-fidelity text-to-speech (TTS) model developed by Zyphra. It includes two models: a 1.6 billion parameter Transformer model and an SSM hybrid model, both open-sourced under the Apache 2.0 license. Zonos generates natural and expressive speech based on text prompts and speaker embeddings, supporting voice cloning and adjustable parameters such as speed, pitch, and emotion. The output sampling rate is 44kHz. The model is trained on approximately 200,000 hours of multilingual speech data, primarily supporting English with limited support for other languages. Zonos provides an optimized inference engine for fast speech generation, making it suitable for real-time applications.

Text-to-Speech Voice Cloning Multilingual Speech Synthesis AI Models Natural Language Processing Real-Time Applications Open Source High-Fidelity Audio Developer Tools
Text-to-Speech beta Open Source
105 views
Step-Video-T2V
Step-Video-T2V by Leapfrogging Star
0

Step-Video-T2V is an open-source text-to-video model developed by Leapfrogging Star, featuring 30 billion parameters and capable of generating high-quality videos up to 204 frames long. The model uses a deeply compressed variational autoencoder (Video-VAE) for efficient training and inference, supporting bilingual text inputs in Chinese and English. It employs a diffusion-based Transformer (DiT) architecture with a 3D full-attention mechanism, optimized for generating videos with strong motion dynamics and high aesthetic quality.

Text-to-Video AI Model Open Source Video Generation Bilingual Support Deep Learning Diffusion Models Transformer Architecture Video Content Creation High-Quality Video
multimodal production Open Source
68 views
HealthGPT
HealthGPT by Zhejiang University, University of Electronic Science and Technology, Alibaba
0

HealthGPT is an advanced medical visual language model (Med-LVLM) developed by Zhejiang University, University of Electronic Science and Technology, and Alibaba. It integrates visual comprehension and generation tasks using Heterogeneous Low-Rank Adaptation (H-LoRA) technology, enabling efficient medical image analysis, diagnostic assistance, and text generation. The model offers two versions: HealthGPT-M3 (3.8 billion parameters) and HealthGPT-L14 (14 billion parameters), optimized for different performance and resource needs. HealthGPT employs Hierarchical Visual Perception (HVP) and a Three-Stage Learning Strategy (TLS) to enhance visual feature learning and task adaptation.

Medical AI Visual Language Model HealthTech Medical Imaging AI in Healthcare Multimodal AI Diagnostic Assistance Medical Text Generation Image Analysis Personalized Treatment
multimodal production Open Source
56 views
MistralSaba
MistralSaba by Mistral AI
0

Mistral Saba is a 24-billion-parameter AI model developed by Mistral AI, specifically designed to handle Middle Eastern and South Asian languages and cultures. It excels in processing Arabic and Indian-origin languages such as Tamil and Malayalam, offering efficient deployment on single GPU systems with a response speed of 150 tokens per second. The model addresses the limitations of general-purpose models in understanding regional language nuances and cultural contexts.

AI Language Model Regional Arabic South Asian Languages Multilingual Natural Language Processing Cultural Context Efficient Deployment Custom-Trained Models
language production
76 views
SignLLM

SignLLM is a groundbreaking multilingual sign language generation model that transforms text input into corresponding sign language videos. It supports multiple sign languages, including American Sign Language (ASL), German Sign Language (GSL), Argentine Sign Language (LSA), and Korean Sign Language (KSL). The model leverages the Prompt2Sign dataset, utilizing automated techniques to collect and process sign language videos from the web. It incorporates new loss functions and reinforcement learning modules to achieve efficient data extraction and model training.

Sign Language AI Model Multilingual Video Generation Reinforcement Learning Education Healthcare Legal Services Entertainment Daily Communication
multimodal production
149 views
Flame
0

Flame is an open-source multimodal AI model that transforms UI design screenshots into high-quality modern frontend code. It leverages visual language modeling, automated data synthesis, and structured training processes to generate code that adheres to modern frontend frameworks like React. Flame supports componentization, state management, and dynamic interactions, overcoming the limitations of traditional models that generate static code. The model's training data, models, and test sets are open-source, providing an efficient design-to-code conversion tool for frontend development.

AI Frontend Development Code Generation React Open Source Multimodal AI Visual Language Modeling UI Design Componentization State Management
multimodal Open Source
24 views
VLM-R1
VLM-R1 by Om AI Lab
0

VLM-R1, developed by Om AI Lab, is a cutting-edge visual language model that leverages reinforcement learning to accurately identify and locate target objects in images using natural language instructions. Built on the Qwen2.5-VL architecture and enhanced with DeepSeek's R1 method, VLM-R1 excels in complex scenes and cross-domain data, offering superior generalization and stability. It supports multimodal reasoning, joint image-text processing, and efficient training, making it a powerful tool for applications ranging from smart assistants to medical imaging analysis.

Visual Language Model Reinforcement Learning Natural Language Processing Computer Vision Multimodal AI Object Detection Image Analysis AI Research Open Source Deep Learning
multimodal production Open Source
34 views
WarriorCoder
WarriorCoder by Microsoft, South China University of Technology
0

WarriorCoder is a code generation large language model (LLM) developed by the School of Computer Science and Engineering at South China University of Technology in collaboration with Microsoft. It generates high-quality training data through adversarial simulations between expert models, enhancing model performance. Unlike traditional methods, WarriorCoder does not rely on existing proprietary models or datasets. Instead, it mines instructions from scratch, using the Elo rating system and a referee model to evaluate adversarial outcomes and select the best responses as training data. WarriorCoder integrates the strengths of multiple open-source code expert models, avoiding human intervention and system bias during data collection. Experiments show that WarriorCoder achieves new SOTA performance in tasks such as code generation, code reasoning, and library usage, demonstrating strong generalization capabilities and data diversity.

Code Generation Large Language Model AI Machine Learning Programming Code Optimization Code Debugging Code Reasoning Library Usage Multi-language Support
language production
45 views
CSM
CSM by Sesame Team
0

CSM (Conversational Speech Model) is a voice dialogue model developed by the Sesame Team, designed to improve the naturalness and emotional interaction capabilities of voice assistants. It uses a multimodal learning framework, combining text and voice data, and leverages the Transformer architecture to generate natural and coherent speech. CSM dynamically adjusts tone, rhythm, and emotional expression based on conversation history and context, providing a more human-like interaction experience. It is optimized for training efficiency and trained on large-scale datasets to enhance performance and expressiveness.

Conversational AI Voice Assistants Multimodal Learning Transformer Architecture Emotional Interaction Natural Language Processing Speech Generation Real-Time Interaction Multilingual Support Contextual Adaptation
multimodal production
46 views