AI Models

Trending Models
Most popular AI models and foundation models

GPT-4
GPT-4 by OpenAI
50000

GPT-4 is OpenAI's most advanced large language model, demonstrating human-level performance on various academic and professional tests.

LLM NLP AI ChatGPT OpenAI
language production
1000000 views
LLaMA 2
LLaMA 2 by Meta
45000

Meta's open-source large language model family, offering strong performance across various tasks with different model sizes.

LLM Open Source Meta AI Foundation Model
language production Open Source
700000 views
Stable Diffusion
Stable Diffusion by Stability AI
40000

An open-source text-to-image model capable of generating detailed images from text descriptions, with a strong community and multiple deployment options.

AI Art Text-to-Image Open Source Local Deployment
vision production Open Source
600000 views
Claude 3
Claude 3 by Anthropic
30000

Anthropic's most capable AI model, featuring enhanced reasoning, analysis, and creative capabilities with improved accuracy and safety.

LLM AI Constitutional AI Anthropic
language production
500000 views
PaLM 2
PaLM 2 by Google
30000

Google's advanced language model optimized for reasoning, coding, and multilingual tasks with strong capabilities across various domains.

LLM Google AI Multilingual Reasoning
language production
450000 views
Gemini
Gemini by Google
25000

Google's most capable and flexible AI model, designed to be multimodal from the ground up with superior reasoning capabilities.

LLM Multimodal Google AI Vision
multimodal production
400000 views
DALL-E 3
DALL-E 3 by OpenAI
20000

OpenAI's advanced text-to-image generation model capable of creating highly detailed and accurate images from natural language descriptions.

AI Art Text-to-Image OpenAI Image Generation
vision production
300000 views
Mistral AI
Mistral AI by Mistral AI
20000

A family of powerful open-source language models known for their efficiency and strong performance across various tasks.

LLM Open Source Efficient AI French Tech
language production Open Source
250000 views
Whisper
Whisper by OpenAI
15000

OpenAI's advanced speech recognition system capable of transcribing and translating multiple languages with high accuracy.

ASR Speech-to-Text OpenAI Open Source
audio production Open Source
200000 views
LongWriter
LongWriter by Tsinghua University and Zhipu AI
0

LongWriter is a long-text generation model developed by Tsinghua University in collaboration with Zhipu AI. It is designed to overcome the output-length limits of existing large language models by generating coherent texts that exceed 10,000 words. The model is trained on the "LongWriter-6k" dataset and uses Direct Preference Optimization (DPO) to improve output quality and adherence to length constraints. LongWriter is open-source, making it accessible for both academic research and practical applications.

Text Generation AI Model Open Source Natural Language Processing Long Context Processing Direct Preference Optimization Academic Research Content Creation Publishing Education
language production Open Source

All Models
Complete list of AI models and foundation models, sorted by newest first

LongWriter
LongWriter by Tsinghua University and Zhipu AI
0

LongWriter is a long-text generation model developed by Tsinghua University in collaboration with Zhipu AI. It is designed to overcome the output-length limits of existing large language models by generating coherent texts that exceed 10,000 words. The model is trained on the "LongWriter-6k" dataset and uses Direct Preference Optimization (DPO) to improve output quality and adherence to length constraints. LongWriter is open-source, making it accessible for both academic research and practical applications.

Text Generation AI Model Open Source Natural Language Processing Long Context Processing Direct Preference Optimization Academic Research Content Creation Publishing Education
language production Open Source
Pixtral12B
Pixtral12B by Mistral AI
0

Pixtral 12B is a multimodal AI model developed by Mistral AI, designed to handle both text and image data. With 12 billion parameters and a size of approximately 24GB, it excels in tasks such as image captioning, object counting, and answering questions based on image content. Built on the Nemo 12B text model, it incorporates a 400-million-parameter vision adapter, enabling high-resolution image processing up to 1024x1024 pixels. The model is open-sourced under the Apache 2.0 license, allowing users to download, fine-tune, and deploy it for various applications. Pixtral 12B is optimized for inference using the TensorRT-LLM engine and supports dynamic batching and quantization on NVIDIA GPUs.

Multimodal AI Image Processing Text Processing Open Source High Performance Natural Language Processing Computer Vision AI Models Machine Learning Deep Learning
multimodal production Open Source
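
The "12 billion parameters, approximately 24GB" figure in the Pixtral 12B description is consistent with storing each weight in 16 bits (2 bytes). The following is a rough sketch of that arithmetic, not an official sizing tool: the function name `weight_size_gb` and the 2-byte default are assumptions made for illustration, and the estimate ignores the vision adapter, activations, and runtime overhead.

```python
def weight_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the raw model weights in gigabytes (1 GB = 1e9 bytes).

    bytes_per_param is an assumption: 2 for 16-bit weights, 1 for 8-bit quantization.
    """
    return num_params * bytes_per_param / 1e9


# 12 billion parameters stored as 16-bit values
print(weight_size_gb(12e9))

# The same parameter count after a hypothetical 8-bit quantization
print(weight_size_gb(12e9, bytes_per_param=1))
```

Under these assumptions, quantizing to fewer bytes per parameter shrinks the weight footprint proportionally, which is why quantization support matters for deployment on a single GPU.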
LongCite
LongCite by Tsinghua University
0

LongCite is an open-source project by Tsinghua University designed to enhance the credibility and verifiability of large language models (LLMs) in long-text question-answering tasks. It generates fine-grained sentence-level citations, allowing users to verify the accuracy of the model's responses. The project includes the LongBench-Cite evaluation benchmark, the CoF automated data construction process, the LongCite-45k dataset, and the LongCite-8B and LongCite-9B models trained on this dataset. These models can process long texts and provide accurate answers with direct citations, improving transparency and reliability.

LLM Citation Verifiability Natural Language Processing Open Source Academic Research Legal Consultation Financial Analysis Medical Consultation News Reporting
language production Open Source
OpenMusic
OpenMusic by Hugging Face
0

OpenMusic is a text-to-music model based on QA-MDT (Quality-aware Masked Diffusion Transformer). It generates music from text descriptions, and its quality-aware training strategy is intended to keep the output musically rich, aligned with the text description, and high in fidelity. OpenMusic supports various music creation functions, including audio editing, processing, and recording. It is designed to help musicians, content creators, and educators generate music for applications such as music production, multimedia content creation, and music therapy.

Text-to-Music AI Music Audio Editing Music Generation Multimedia Music Therapy Content Creation AI Algorithms High-Fidelity Audio Quality-Aware Training
Text-to-Music production Open Source
CogView3
CogView3 by Tsinghua University and Zhipu AI
0

CogView3 is an open-source AI image generation model developed by Tsinghua University and Zhipu AI. It uses relay diffusion to generate high-resolution images in stages: it first produces a low-resolution image, then enhances it with relay super-resolution. This staged approach improves efficiency and reduces cost, and the model is reported to surpass existing open-source models such as SDXL in both quality and speed. CogView3 significantly reduces inference time while preserving image detail.

AI Image Generation Open Source Relay Diffusion High-Resolution Images Efficiency Cost-Effective Tsinghua University Zhipu AI SDXL Inference Speed
vision production Open Source
PyramidFlow
PyramidFlow by Peking University, Kuaishou Technology, Beijing University of Posts and Telecommunications
0

Pyramid-Flow is a video generation model developed by researchers from Peking University, Kuaishou Technology, and Beijing University of Posts and Telecommunications. From a text prompt, it generates high-definition videos up to 10 seconds long at 1280x768 resolution and 24 frames per second. The model uses a pyramid flow matching algorithm that decomposes video generation into multiple pyramid stages of different resolutions; only the final stage runs at full resolution, which reduces computational cost. A temporal pyramid structure compresses the full-resolution history to improve training efficiency. Pyramid-Flow supports end-to-end optimization and is trained as a single unified diffusion transformer (DiT), which simplifies implementation.

Video Generation AI Model Text-to-Video High-Resolution Video Autoregressive Video End-to-End Optimization Diffusion Transformer Pyramid Flow Matching Temporal Pyramid Spatial Pyramid
video generation experimental Open Source
Mochi1
Mochi1 by Genmo
0

Mochi 1 is an open-source video generation model developed by Genmo, designed to produce high-quality videos with smooth motion and strong adherence to user prompts. Released under the Apache 2.0 license, it is free for both personal and commercial use. The model currently offers a 480p base version, with a 720p HD version planned for release later this year. Mochi 1's architecture and weights are available on Hugging Face, and Genmo provides a hosted playground for users to experiment with the model for free.

AI Video Generation Open Source High Fidelity Motion Quality Prompt Adherence Apache 2.0 Hugging Face Video Content Creation Real-Time Video Generation Asymmetric Diffusion Transformer
video generation beta Open Source
DocMind
DocMind by SmartRead
0

DocMind is a document intelligence model developed by SmartRead, based on the Transformer architecture, integrating deep learning, NLP, and CV technologies. It handles complex structures and visual information in rich-text documents, improving the accuracy of information extraction. DocMind supports precise identification of document entities, capturing text dependencies, and deep understanding of document content. It integrates with knowledge bases to enhance the understanding of professional documents and automates tasks like Q&A, document classification, and organization, applicable in fields like law, education, and finance.

Document Intelligence NLP CV Transformer Deep Learning Information Extraction Knowledge Integration Task Automation Multimodal Fusion Professional Documents
multimodal production
RecraftV3
RecraftV3 by Recraft
0

Recraft V3 is an advanced AI text-to-image generation model developed by Recraft, designed to produce high-quality images with precise design control. It has achieved the top position on Hugging Face's text-to-image model leaderboard with an ELO score of 1172. The model allows users to customize brand styles, control text and element positioning, and supports long text generation. Recraft V3 is accessible via a user-friendly interface, mobile apps, and API, making it a versatile tool for designers and creative professionals.

AI Text-to-Image Design Creative Tools Image Generation Branding Content Creation E-commerce Game Development API
vision production
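
To give a sense of what an ELO score such as Recraft V3's 1172 means in a head-to-head comparison, here is a minimal sketch using the standard Elo expected-score formula. It assumes the leaderboard follows the conventional 400-point logistic scale, which the listing does not confirm; the opponent rating of 1100 is purely hypothetical.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


recraft_v3 = 1172            # score quoted in the listing
hypothetical_rival = 1100    # illustrative value, not taken from any leaderboard

# Probability that a voter would prefer the higher-rated model's image
print(round(elo_expected_score(recraft_v3, hypothetical_rival), 3))
```

Under this model, two equally rated models each have an expected score of 0.5, and the two expected scores in any pairing always sum to 1.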
AlphaFold3
AlphaFold3 by Google DeepMind
0

AlphaFold 3, developed by Google DeepMind, is an advanced AI model designed to predict the 3D structures of various biomolecules, including proteins, nucleic acids, small molecules, ions, and modified residues. This open-source model has significantly improved the accuracy of structural predictions, making it a valuable tool in drug design, scientific research, and biomedical applications. By enabling global scientists to accelerate the development of new drugs and vaccines, AlphaFold 3 is transforming the field of structural biology.

AI Biomolecular Structure Drug Design Open Source Structural Prediction Deep Learning Protein Folding Scientific Research Biomedical Applications Molecular Interactions
multimodal production Open Source