GPT-4 is OpenAI's most advanced large language model, demonstrating human-level performance on various academic and professional tests.
Meta's open-source family of large language models, available in multiple sizes and offering strong performance across a wide range of tasks.
An open-source text-to-image model capable of generating detailed images from text descriptions, with a strong community and multiple deployment options.
Anthropic's most capable AI model, featuring enhanced reasoning, analysis, and creative capabilities with improved accuracy and safety.
Google's advanced language model, optimized for reasoning, coding, and multilingual tasks.
Google's most capable and flexible AI model, designed to be multimodal from the ground up with superior reasoning capabilities.
OpenAI's advanced text-to-image generation model capable of creating highly detailed and accurate images from natural language descriptions.
A family of powerful open-source language models known for their efficiency and strong performance across various tasks.
OpenAI's advanced speech recognition system capable of transcribing and translating multiple languages with high accuracy.
LongWriter is a state-of-the-art long-text generation model developed by Tsinghua University in collaboration with Zhipu AI. It is designed to overcome the output-length limits of existing large language models by generating coherent texts that exceed 10,000 words. The model is trained on the "LongWriter-6k" dataset and uses Direct Preference Optimization (DPO) to improve output quality and adherence to length constraints. LongWriter is open-source, making it accessible for both academic research and practical applications.
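The entry above mentions Direct Preference Optimization (DPO). As a rough illustration only (not LongWriter's actual training code), the core DPO loss on a single preference pair can be sketched in pure Python; the log-probability values below are hypothetical placeholders:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a response under the
    policy being trained (logp_*) or a frozen reference model (ref_logp_*).
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin: small when the policy clearly
    # prefers the chosen response, large otherwise.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical log-probabilities for one preference pair.
loss = dpo_loss(-120.0, -150.0, -125.0, -148.0)
```

Minimizing this loss pushes the policy to assign relatively higher probability to the preferred (here, length-compliant) response than the reference model does.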
Pixtral 12B is a multimodal AI model developed by Mistral AI, designed to handle both text and image data. With 12 billion parameters and a size of approximately 24GB, it excels in tasks such as image captioning, object counting, and answering questions based on image content. Built on the Nemo 12B text model, it incorporates a 400-million-parameter vision adapter, enabling high-resolution image processing up to 1024x1024 pixels. The model is open-sourced under the Apache 2.0 license, allowing users to download, fine-tune, and deploy it for various applications. Pixtral 12B is optimized for inference using the TensorRT-LLM engine and supports dynamic batching and quantization on NVIDIA GPUs.
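The quantization mentioned above is handled by TensorRT-LLM's own tooling, but the basic idea of symmetric int8 weight quantization can be sketched with NumPy (a toy illustration, not TensorRT-LLM's implementation):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization of a weight array."""
    scale = np.max(np.abs(w)) / 127.0   # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float32 weight array."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.max(np.abs(w - w_hat))  # bounded by roughly scale / 2
```

Storing int8 weights plus one float scale quarters the memory footprint of float32 weights, at the cost of a small, bounded rounding error per element.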
LongCite is an open-source project from Tsinghua University designed to enhance the credibility and verifiability of large language models (LLMs) in long-text question answering. It generates fine-grained sentence-level citations, allowing users to verify the accuracy of the model's responses. The project includes the LongBench-Cite evaluation benchmark, the CoF (Coarse to Fine) automated data-construction pipeline, the LongCite-45k dataset, and the LongCite-8B and LongCite-9B models trained on that dataset. These models can process long texts and provide accurate answers with direct citations, improving transparency and reliability.
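The sentence-level citation idea described above can be illustrated with a toy sketch (a hypothetical format, not LongCite's exact output schema): the context is split into numbered sentences, and each answer statement carries the indices of its supporting sentences.

```python
def number_sentences(context):
    """Split a context into sentences, each paired with a 1-based index."""
    sentences = [s.strip() for s in context.split('.') if s.strip()]
    return list(enumerate(sentences, start=1))

def cite(statement, supporting_indices):
    """Append fine-grained citation markers to an answer statement."""
    markers = ''.join(f'[{i}]' for i in supporting_indices)
    return f'{statement} {markers}'

context = 'The sky is blue. Water boils at 100 C. Cats are mammals.'
numbered = number_sentences(context)
answer = cite('Water boils at 100 degrees Celsius', [2])
# answer == 'Water boils at 100 degrees Celsius [2]'
```

A reader can then jump from marker [2] directly to the supporting context sentence, which is the verifiability property the project targets.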
OpenMusic is a high-quality text-to-music generation model based on QA-MDT (Quality-aware Masked Diffusion Transformer) technology. Its quality-aware training strategy ensures that the generated music is musically rich, aligns with the text description, and maintains high fidelity. OpenMusic supports a range of music-creation functions, including audio editing, processing, and recording, and is designed to help musicians, content creators, and educators generate music for applications such as music production, multimedia content creation, and music therapy.
CogView3 is an open-source AI image generation model developed by Tsinghua University and Zhipu AI. It utilizes relay diffusion technology to generate high-resolution images in stages, starting with low-resolution images and enhancing them with relay super-resolution. This approach improves efficiency and reduces cost, and CogView3 surpasses existing open-source models such as SDXL in both quality and speed. It significantly reduces inference time while maintaining image detail, making it a powerful tool for various applications.
Pyramid-Flow is an advanced video generation model developed by researchers from Peking University, Kuaishou Technology, and Beijing University of Posts and Telecommunications. From a text prompt, it generates high-definition videos up to 10 seconds long at 1280x768 resolution and 24 frames per second. The model uses an innovative pyramid flow matching algorithm that decomposes video generation into multiple pyramid stages at different resolutions, with only the final stage processed at full resolution, reducing computational cost. A temporal pyramid structure compresses full-resolution historical information to improve training efficiency. Pyramid-Flow supports end-to-end optimization and is trained with a single unified diffusion transformer (DiT), simplifying the model's implementation.
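The multi-resolution pyramid decomposition described above can be illustrated with a toy NumPy sketch (hypothetical shapes, not Pyramid-Flow's actual code): each stage works at half the previous resolution, and only the final stage runs at full resolution.

```python
import numpy as np

def downsample(frame):
    """Halve spatial resolution by 2x2 average pooling."""
    h, w = frame.shape
    return frame.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def build_pyramid(frame, num_stages=3):
    """Return [full_res, half_res, quarter_res, ...], coarsest last."""
    levels = [frame]
    for _ in range(num_stages - 1):
        levels.append(downsample(levels[-1]))
    return levels

# Hypothetical single-channel frame at the model's 768x1280 resolution.
frame = np.zeros((768, 1280), dtype=np.float32)
pyramid = build_pyramid(frame)
shapes = [lvl.shape for lvl in pyramid]
# Generation would proceed coarsest-to-finest over reversed(pyramid).
```

Because compute for attention and convolution scales with pixel count, running early stages at quarter or half resolution and only the last stage at full resolution cuts total cost substantially.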
Mochi 1 is an open-source video generation model developed by Genmo, designed to produce high-quality videos with smooth motion and strong adherence to user prompts. Released under the Apache 2.0 license, it is free for both personal and commercial use. The model currently offers a 480p base version, with a 720p HD version planned for release later this year. Mochi 1's architecture and weights are available on Hugging Face, and Genmo provides a hosted playground for users to experiment with the model for free.
DocMind is a document intelligence model developed by SmartRead, based on the Transformer architecture, integrating deep learning, NLP, and CV technologies. It handles complex structures and visual information in rich-text documents, improving the accuracy of information extraction. DocMind supports precise identification of document entities, capturing text dependencies, and deep understanding of document content. It integrates with knowledge bases to enhance the understanding of professional documents and automates tasks like Q&A, document classification, and organization, applicable in fields like law, education, and finance.
Recraft V3 is an advanced AI text-to-image generation model developed by Recraft, designed to produce high-quality images with precise design control. It has achieved the top position on Hugging Face's text-to-image model leaderboard with an ELO score of 1172. The model allows users to customize brand styles, control the positioning of text and other elements, and generate long passages of legible text within images. Recraft V3 is accessible via a user-friendly interface, mobile apps, and an API, making it a versatile tool for designers and creative professionals.
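For context on the ELO figure above: under the standard ELO model, a 1172-rated model's expected win rate against a lower-rated opponent can be computed directly (the generic ELO formula, not necessarily the leaderboard's exact methodology; the 1072 opponent rating is hypothetical):

```python
def elo_expected_score(rating_a, rating_b):
    """Probability that model A beats model B under the ELO model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# A 1172-rated model vs. a hypothetical 1072-rated competitor:
p = elo_expected_score(1172, 1072)  # ~0.64
```

A 100-point ELO gap thus corresponds to winning roughly 64% of head-to-head comparisons.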
AlphaFold 3, developed by Google DeepMind, is an advanced AI model designed to predict the 3D structures of various biomolecules, including proteins, nucleic acids, small molecules, ions, and modified residues. Its code has been released for non-commercial academic use, and it has significantly improved the accuracy of structural predictions, making it a valuable tool in drug design, scientific research, and biomedical applications. By enabling scientists worldwide to accelerate the development of new drugs and vaccines, AlphaFold 3 is transforming the field of structural biology.