GPT-4 is OpenAI's most advanced large language model, demonstrating human-level performance on various academic and professional tests.
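As an illustration, a minimal sketch of calling a chat model of this kind through the official OpenAI Python SDK (openai>=1.0); the "gpt-4" model identifier and the prompts are placeholders, not a prescribed configuration:

```python
# Minimal sketch of a chat completion with the OpenAI Python SDK (openai>=1.0).
# The model identifier and prompts are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Summarize the transformer architecture in two sentences."},
    ],
)
print(response.choices[0].message.content)
```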
Meta's open-source large language model family, available in a range of sizes and offering strong performance across a wide variety of tasks.
An open-source text-to-image model capable of generating detailed images from text descriptions, with a strong community and multiple deployment options.
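Assuming this entry refers to Stable Diffusion served through the Hugging Face diffusers library, a minimal text-to-image sketch might look as follows; the checkpoint id is an assumption and any compatible weights would work:

```python
# Minimal text-to-image sketch using Hugging Face diffusers.
# The checkpoint id is an assumption; requires a CUDA GPU for float16 inference.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

image = pipe("a watercolor painting of a lighthouse at dawn").images[0]
image.save("lighthouse.png")
```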
Anthropic's most capable AI model, featuring enhanced reasoning, analysis, and creative capabilities with improved accuracy and safety.
Google's advanced language model, optimized for reasoning, coding, and multilingual tasks across a broad range of domains.
Google's most capable and flexible AI model, designed to be multimodal from the ground up with superior reasoning capabilities.
OpenAI's advanced text-to-image generation model capable of creating highly detailed and accurate images from natural language descriptions.
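A minimal sketch of image generation through the OpenAI Images API, assuming this entry refers to DALL·E 3; the model identifier, prompt, and size are placeholders:

```python
# Minimal image-generation sketch with the OpenAI Images API (openai>=1.0).
# "dall-e-3" is an assumed model identifier for the entry described above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="dall-e-3",
    prompt="an isometric illustration of a solar-powered research station on Mars",
    size="1024x1024",
    n=1,
)
print(result.data[0].url)  # URL of the generated image
```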
A family of powerful open-source language models known for their efficiency and strong performance across various tasks.
OpenAI's advanced speech recognition system capable of transcribing and translating multiple languages with high accuracy.
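Assuming this entry refers to the open-source Whisper package, transcription reduces to a short script; the model size and audio path are placeholders:

```python
# Minimal transcription sketch with the open-source whisper package
# (pip install openai-whisper). Model size and audio path are placeholders.
import whisper

model = whisper.load_model("base")
result = model.transcribe("meeting_recording.mp3")

print(result["text"])  # full transcript
for segment in result["segments"]:
    print(segment["start"], segment["end"], segment["text"])  # timestamped segments
```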
Ola is a full-modal language model developed by Tsinghua University, Tencent Hunyuan Research Team, and NUS S-Lab. It employs a progressive modal alignment strategy to gradually expand the modalities supported by the language model, starting with images and text, and then introducing audio and video data. Ola's architecture supports full-modal inputs, including text, images, video, and audio, and can process these inputs simultaneously. It also features a sentence-by-sentence decoding scheme for streaming speech generation, enhancing interactive experiences.
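The sentence-by-sentence decoding scheme can be illustrated with a small, model-agnostic sketch: generated text is buffered until a sentence boundary appears, and each completed sentence is handed to a speech synthesizer immediately instead of waiting for the full response. The token stream and synth_speech function below are hypothetical stand-ins, not Ola's actual interfaces.

```python
# Model-agnostic sketch of sentence-by-sentence streaming speech generation.
# `token_stream` and `synth_speech` are hypothetical stand-ins, not Ola's real API.
import re

def stream_speech(token_stream, synth_speech):
    """Buffer streamed text and emit audio as soon as each sentence completes."""
    buffer = ""
    for token in token_stream:
        buffer += token
        # Split on sentence-ending punctuation, keeping any trailing fragment buffered.
        parts = re.split(r"(?<=[.!?])\s+", buffer)
        for sentence in parts[:-1]:
            yield synth_speech(sentence)   # audio for one finished sentence
        buffer = parts[-1]
    if buffer.strip():
        yield synth_speech(buffer)         # flush any final fragment

# Toy usage with dummy components:
stream = (word + " " for word in
          "Hello there. This is streamed sentence by sentence. Goodbye!".split())
for audio_chunk in stream_speech(stream, synth_speech=lambda s: f"<audio for: {s!r}>"):
    print(audio_chunk)
```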
Zonos is a high-fidelity text-to-speech (TTS) model developed by Zyphra. It includes two models: a 1.6 billion parameter Transformer model and an SSM hybrid model, both open-sourced under the Apache 2.0 license. Zonos generates natural and expressive speech based on text prompts and speaker embeddings, supporting voice cloning and adjustable parameters such as speed, pitch, and emotion. The output sampling rate is 44kHz. The model is trained on approximately 200,000 hours of multilingual speech data, primarily supporting English with limited support for other languages. Zonos provides an optimized inference engine for fast speech generation, making it suitable for real-time applications.
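A rough sketch of how such a voice-cloning TTS interface is typically driven: a reference clip yields a speaker embedding, and prosody controls accompany the text. The zonos_tts module and every function name and parameter below are hypothetical illustrations, not Zonos's documented API.

```python
# Hypothetical sketch of a voice-cloning TTS call; the module `zonos_tts` and all
# function names/parameters are illustrative assumptions, not Zonos's documented API.
import soundfile as sf
import zonos_tts  # hypothetical package name

model = zonos_tts.load("transformer-1.6b")             # assumed checkpoint name
speaker = model.embed_speaker("reference_voice.wav")   # speaker embedding from a short clip

audio = model.generate(
    text="Welcome back. Your build finished without errors.",
    speaker_embedding=speaker,
    speed=1.0,          # prosody controls mentioned in the model description
    pitch_shift=0.0,
    emotion="neutral",
)
sf.write("output.wav", audio, samplerate=44100)        # 44.1 kHz output per the description
```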
Step-Video-T2V is an open-source text-to-video model developed by StepFun (阶跃星辰), featuring 30 billion parameters and capable of generating high-quality videos up to 204 frames long. The model uses a deeply compressed variational autoencoder (Video-VAE) for efficient training and inference, supporting bilingual text inputs in Chinese and English. It employs a diffusion-based Transformer (DiT) architecture with a 3D full-attention mechanism, optimized for generating videos with strong motion dynamics and high aesthetic quality.
HealthGPT is an advanced medical visual language model (Med-LVLM) developed by Zhejiang University, the University of Electronic Science and Technology of China, and Alibaba. It integrates visual comprehension and generation tasks using Heterogeneous Low-Rank Adaptation (H-LoRA) technology, enabling efficient medical image analysis, diagnostic assistance, and text generation. The model offers two versions: HealthGPT-M3 (3.8 billion parameters) and HealthGPT-L14 (14 billion parameters), optimized for different performance and resource needs. HealthGPT employs Hierarchical Visual Perception (HVP) and a Three-Stage Learning Strategy (TLS) to enhance visual feature learning and task adaptation.
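H-LoRA builds on low-rank adaptation; the generic LoRA layer below (plain LoRA only, without H-LoRA's heterogeneous routing) illustrates the underlying idea of adding a trainable low-rank update to a frozen weight matrix.

```python
# Generic LoRA sketch in PyTorch: a frozen linear layer plus a trainable low-rank update.
# This shows plain LoRA only; HealthGPT's H-LoRA adds heterogeneous routing on top,
# which is not reproduced here.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank update: W x + (B A) x * (alpha / r)
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(nn.Linear(512, 512), rank=8)
out = layer(torch.randn(2, 512))
print(out.shape)  # torch.Size([2, 512])
```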
Mistral Saba is a 24-billion-parameter AI model developed by Mistral AI, specifically designed to handle Middle Eastern and South Asian languages and cultures. It excels in processing Arabic and Indian-origin languages such as Tamil and Malayalam, offering efficient deployment on single GPU systems with a response speed of 150 tokens per second. The model addresses the limitations of general-purpose models in understanding regional language nuances and cultural contexts.
SignLLM is a groundbreaking multilingual sign language generation model that transforms text input into corresponding sign language videos. It supports multiple sign languages, including American Sign Language (ASL), German Sign Language (GSL), Argentine Sign Language (LSA), and Korean Sign Language (KSL). The model leverages the Prompt2Sign dataset, utilizing automated techniques to collect and process sign language videos from the web. It incorporates new loss functions and reinforcement learning modules to achieve efficient data extraction and model training.
Flame is an open-source multimodal AI model that transforms UI design screenshots into high-quality modern frontend code. It leverages visual language modeling, automated data synthesis, and structured training processes to generate code that adheres to modern frontend frameworks like React. Flame supports componentization, state management, and dynamic interactions, overcoming the limitations of traditional models that generate static code. The model's training data, models, and test sets are open-source, providing an efficient design-to-code conversion tool for frontend development.
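A hypothetical sketch of what a design-to-code call could look like; the flame_client module and its methods are illustrative stand-ins, not Flame's published interface:

```python
# Hypothetical sketch of a design-to-code call; `flame_client` and its methods are
# illustrative stand-ins, not Flame's published interface.
from pathlib import Path
import flame_client  # hypothetical package name

model = flame_client.load("flame-base")       # assumed checkpoint name

screenshot = Path("login_form.png").read_bytes()
component = model.generate_code(
    image=screenshot,
    instruction="Produce a React functional component with useState for the form fields.",
    framework="react",                         # the description mentions React-style output
)
Path("LoginForm.jsx").write_text(component)
```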
VLM-R1, developed by Om AI Lab, is a cutting-edge visual language model that leverages reinforcement learning to accurately identify and locate target objects in images using natural language instructions. Built on the Qwen2.5-VL architecture and enhanced with DeepSeek's R1 method, VLM-R1 excels in complex scenes and cross-domain data, offering superior generalization and stability. It supports multimodal reasoning, joint image-text processing, and efficient training, making it a powerful tool for applications ranging from smart assistants to medical imaging analysis.
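A hypothetical sketch of the grounding task VLM-R1 targets: an image plus a natural-language instruction in, a bounding box out. The vlm_r1 module and its methods are illustrative stand-ins, not the actual release interface:

```python
# Hypothetical sketch of referring-expression grounding; `vlm_r1` and its methods are
# illustrative stand-ins, not the Om AI Lab release's actual interface.
from PIL import Image
import vlm_r1  # hypothetical package name

model = vlm_r1.load("vlm-r1-base")           # assumed checkpoint name
image = Image.open("kitchen.jpg")

# Natural-language instruction -> bounding box of the referred object.
box = model.ground(image, "the red mug closest to the window")
print(box)  # e.g. {"x1": 412, "y1": 188, "x2": 506, "y2": 291, "score": 0.93}
```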
WarriorCoder is a code generation large language model (LLM) developed by the School of Computer Science and Engineering at South China University of Technology in collaboration with Microsoft. It generates high-quality training data through adversarial simulations between expert models, enhancing model performance. Unlike traditional methods, WarriorCoder does not rely on existing proprietary models or datasets. Instead, it mines instructions from scratch, using the Elo rating system and a referee model to evaluate adversarial outcomes and select the best responses as training data. WarriorCoder integrates the strengths of multiple open-source code expert models, avoiding human intervention and systematic bias during data collection. Experiments show that WarriorCoder achieves state-of-the-art (SOTA) results in tasks such as code generation, code reasoning, and library usage, demonstrating strong generalization and data diversity.
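The Elo mechanism referenced above is the standard pairwise rating scheme; the compact sketch below shows how referee-judged duel outcomes could be folded into per-model ratings. The K-factor and 1000-point starting rating are conventional defaults, not values taken from the WarriorCoder paper.

```python
# Compact Elo-rating sketch for pairwise duels between expert models.
# K-factor and the 1000-point starting rating are conventional defaults,
# not values taken from the WarriorCoder paper.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a draw (per the referee model)."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

ratings = {"model_x": 1000.0, "model_y": 1000.0}
# The referee judges model_x's response better in one duel:
ratings["model_x"], ratings["model_y"] = update_elo(ratings["model_x"], ratings["model_y"], 1.0)
print(ratings)  # model_x gains the points model_y loses
```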
CSM (Conversational Speech Model) is a voice dialogue model developed by the Sesame Team, designed to improve the naturalness and emotional interaction capabilities of voice assistants. It uses a multimodal learning framework, combining text and voice data, and leverages the Transformer architecture to generate natural and coherent speech. CSM dynamically adjusts tone, rhythm, and emotional expression based on conversation history and context, providing a more human-like interaction experience. It is optimized for training efficiency and trained on large-scale datasets to enhance performance and expressiveness.