AI Models

All Models
Complete list of AI models and foundation models, sorted by newest first

Universal-1
Universal-1 by AssemblyAI

Universal-1 is a multilingual speech recognition and transcription model developed by AssemblyAI. Trained on over 12.5 million hours of multilingual audio data, it supports languages such as English, Spanish, French, and German. The model is designed to stay accurate in difficult conditions, including noisy backgrounds, diverse accents, and natural conversations. It also features fast response times, improved timestamp accuracy, and reduced hallucination rates, making it suitable for building speech-enabled products and services.

Tags: Speech Recognition, Transcription, Multilingual, AI, Natural Language Processing, Audio Processing, Machine Learning, Developer Tools, API, High Accuracy
speech recognition | production
FalconMamba7B
FalconMamba7B by Technology Innovation Institute (TII)

Falcon Mamba 7B is an open-source AI model developed by the Technology Innovation Institute (TII) in the UAE. According to TII, it outperforms models such as Meta's Llama 3.1 8B on benchmarks. Rather than an attention-based Transformer, it uses the attention-free Mamba state space model (SSM) architecture, which handles long sequences efficiently. It can run on a single A10 24GB GPU and was trained on a curated dataset of approximately 5,500 GT (gigatokens, roughly 5.5 trillion tokens), using a learning-rate schedule that combines a constant phase with learning-rate decay.

Tags: AI, Open Source, Natural Language Processing, State Space Model, Text Generation, Machine Learning, Long Sequence Processing, Content Creation
Causal decoder-only | production | Open Source
LongWriter
LongWriter by Tsinghua University and Zhipu AI

LongWriter is a long-text generation model developed by Tsinghua University in collaboration with Zhipu AI. It targets the output-length limits of existing large language models by generating coherent texts of more than 10,000 words. The model is trained on the "LongWriter-6k" dataset and uses Direct Preference Optimization (DPO) to improve output quality and adherence to length constraints. LongWriter is open-source, making it accessible for both academic research and practical applications.
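
As a rough illustration of the DPO objective mentioned above, the following minimal sketch computes the standard DPO loss for a single preference pair. The function name, the `beta` value, and the argument names are illustrative assumptions, not details of LongWriter's actual training setup.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Inputs are summed token log-probabilities of each response under the
    policy being trained and under a frozen reference model.
    beta=0.1 is an assumed, commonly used value, not LongWriter's setting.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # -log(sigmoid(beta * margin)), written in a numerically stable form.
    z = beta * margin
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy and the reference model agree exactly, the margin is zero and the loss equals log 2; the loss falls as the policy favors the chosen response more strongly than the reference does.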

Tags: Text Generation, AI Model, Open Source, Natural Language Processing, Long Context Processing, Direct Preference Optimization, Academic Research, Content Creation, Publishing, Education
language | production | Open Source
Pixtral12B
Pixtral12B by Mistral AI

Pixtral 12B is a multimodal AI model developed by Mistral AI, designed to handle both text and image data. With 12 billion parameters and a size of approximately 24GB, it excels in tasks such as image captioning, object counting, and answering questions based on image content. Built on the Mistral Nemo 12B text model, it adds a 400-million-parameter vision adapter that enables high-resolution image processing at up to 1024x1024 pixels. The model is open-sourced under the Apache 2.0 license, allowing users to download, fine-tune, and deploy it for various applications. Pixtral 12B is optimized for inference using the TensorRT-LLM engine and supports dynamic batching and quantization on NVIDIA GPUs.
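
The quoted size of roughly 24GB is consistent with storing 12 billion parameters at 16-bit precision. The short sketch below shows the arithmetic; the 2-bytes-per-parameter figure and the helper name `weight_size_gb` are assumptions made for illustration, and the listing does not state whether the 12 billion figure already includes the vision adapter.

```python
def weight_size_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of model weights in gigabytes (1 GB = 1e9 bytes).

    bytes_per_param=2 assumes 16-bit weights (an assumption, not a
    figure taken from the listing).
    """
    return num_params * bytes_per_param / 1e9


text_model = weight_size_gb(12e9)       # 12B parameters at 16-bit precision
vision_adapter = weight_size_gb(400e6)  # 400M-parameter vision adapter
print(f"text model: {text_model:.1f} GB, vision adapter: {vision_adapter:.1f} GB")
```

The same helper gives a quick estimate for quantized deployments, for example `weight_size_gb(12e9, bytes_per_param=1)` for 8-bit weights.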

Tags: Multimodal AI, Image Processing, Text Processing, Open Source, High Performance, Natural Language Processing, Computer Vision, AI Models, Machine Learning, Deep Learning
multimodal | production | Open Source
LongCite
LongCite by Tsinghua University

LongCite is an open-source project by Tsinghua University designed to enhance the credibility and verifiability of large language models (LLMs) in long-text question-answering tasks. It generates fine-grained sentence-level citations, allowing users to verify the accuracy of the model's responses. The project includes the LongBench-Cite evaluation benchmark, the CoF (Coarse to Fine) automated data construction pipeline, the LongCite-45k dataset, and the LongCite-8B and LongCite-9B models trained on this dataset. These models can process long texts and answer with direct citations to the source, improving transparency and reliability.

Tags: LLM, Citation, Verifiability, Natural Language Processing, Open Source, Academic Research, Legal Consultation, Financial Analysis, Medical Consultation, News Reporting
language | production | Open Source
OpenMusic
OpenMusic by Hugging Face

OpenMusic is a text-to-music model based on QA-MDT (Quality-aware Masked Diffusion Transformer) that generates music from text descriptions. Its quality-aware training strategy is intended to keep the generated music musically rich, aligned with the text description, and high in fidelity. OpenMusic also supports music creation functions such as audio editing, processing, and recording. It is designed to assist musicians, content creators, and educators with applications such as music production, multimedia content creation, and music therapy.

Tags: Text-to-Music, AI Music, Audio Editing, Music Generation, Multimedia, Music Therapy, Content Creation, AI Algorithms, High-Fidelity Audio, Quality-Aware Training
Text-to-Music | production | Open Source
CogView3
CogView3 by Tsinghua University and Zhipu AI

CogView3 is an open-source AI image generation model developed by Tsinghua University and Zhipu AI. It uses relay diffusion to generate high-resolution images in stages: it first generates a low-resolution image and then refines it with relay super-resolution. This staged approach reduces inference time and cost while preserving image detail, and the model is reported to surpass existing open-source models such as SDXL in both quality and speed.

Tags: AI Image Generation, Open Source, Relay Diffusion, High-Resolution Images, Efficiency, Cost-Effective, Tsinghua University, Zhipu AI, SDXL, Inference Speed
vision | production | Open Source
PyramidFlow
PyramidFlow by Peking University, Kuaishou Technology, Beijing University of Posts and Telecommunications

Pyramid-Flow is a video generation model developed by researchers from Peking University, Kuaishou Technology, and Beijing University of Posts and Telecommunications. Based on text prompts, it generates high-definition videos up to 10 seconds long at a resolution of 1280x768 and 24 frames per second. The model uses a pyramid flow matching algorithm that decomposes video generation into multiple pyramid stages of different resolutions, with only the final stage processed at full resolution, which reduces computational cost. It also uses a temporal pyramid structure that compresses the full-resolution history of earlier frames to improve training efficiency. Pyramid-Flow supports end-to-end optimization and is trained as a single unified diffusion transformer (DiT), which simplifies the model's implementation.
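
To see why running only the last stage at full resolution saves computation, the sketch below lists per-stage resolutions for a spatial pyramid ending at 1280x768. The number of stages (three) and the downsampling factor (2 per stage) are illustrative assumptions, as is the helper name `pyramid_resolutions`; the listing does not give these values.

```python
def pyramid_resolutions(width, height, num_stages=3, factor=2):
    """Return per-stage (width, height), coarsest first, full resolution last.

    num_stages and factor are assumed values chosen for illustration.
    """
    stages = []
    for k in range(num_stages - 1, -1, -1):
        scale = factor ** k  # stage k is downsampled by factor**k
        stages.append((width // scale, height // scale))
    return stages


full_pixels = 1280 * 768
for w, h in pyramid_resolutions(1280, 768):
    share = w * h / full_pixels
    print(f"{w}x{h}: {share:.4f} of full-resolution pixels per frame")
```

Under these assumptions the two coarser stages handle 1/16 and 1/4 of the pixels of a full-resolution frame, so most of the generation trajectory operates on far fewer pixels than the final stage.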

Tags: Video Generation, AI Model, Text-to-Video, High-Resolution Video, Autoregressive Video, End-to-End Optimization, Diffusion Transformer, Pyramid Flow Matching, Temporal Pyramid, Spatial Pyramid
video generation | experimental | Open Source
Mochi1
Mochi1 by Genmo

Mochi 1 is an open-source video generation model developed by Genmo, designed to produce high-quality videos with smooth motion and strong adherence to user prompts. Released under the Apache 2.0 license, it is free for both personal and commercial use. The model currently offers a 480p base version, with a 720p HD version planned for release later this year. Mochi 1's architecture and weights are available on Hugging Face, and Genmo provides a hosted playground for users to experiment with the model for free.

Tags: AI Video Generation, Open Source, High Fidelity, Motion Quality, Prompt Adherence, Apache 2.0, Hugging Face, Video Content Creation, Real-Time Video Generation, Asymmetric Diffusion Transformer
video generation | beta | Open Source
DocMind
DocMind by SmartRead

DocMind is a document intelligence model developed by SmartRead. Built on the Transformer architecture, it combines deep learning, natural language processing (NLP), and computer vision (CV) to handle the complex structure and visual information in rich-text documents, which improves the accuracy of information extraction. DocMind supports precise identification of document entities, captures dependencies across text, and builds a deep understanding of document content. It integrates with knowledge bases to better understand specialized documents and automates tasks such as Q&A, document classification, and document organization, with applications in fields like law, education, and finance.

Tags: Document Intelligence, NLP, CV, Transformer, Deep Learning, Information Extraction, Knowledge Integration, Task Automation, Multimodal Fusion, Professional Documents
multimodal | production