AI Frameworks

Trending Frameworks
Popular AI Frameworks

TPO

TPO (Test-Time Preference Optimization) is an AI framework that optimizes language model outputs during inference to better align with human preferences. It converts reward signals into textual feedback, marking high-quality responses as "chosen" and low-quality ones as "rejected." TPO generates "textual loss" and "textual gradients" to iteratively improve outputs without updating model parameters. This approach significantly enhances performance across benchmarks, even for models without prior alignment training.
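
The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `tpo_refine` and the four callables (`generate`, `reward`, `critique`, `revise`) are hypothetical stand-ins for the language model and the reward model, and the toy stubs exist only so the example runs.

```python
from typing import Callable, List, Tuple


def tpo_refine(
    prompt: str,
    generate: Callable[[str], str],            # hypothetical: samples one response
    reward: Callable[[str, str], float],       # hypothetical: scores a response
    critique: Callable[[str, str, str], str],  # hypothetical: writes textual feedback
    revise: Callable[[str, str, str], str],    # hypothetical: applies the feedback
    n_samples: int = 3,
    n_iters: int = 2,
) -> str:
    """Return the best-scoring response found; no model weights are updated."""
    pool: List[Tuple[float, str]] = []
    for _ in range(n_samples):
        response = generate(prompt)
        pool.append((reward(prompt, response), response))

    for _ in range(n_iters):
        pool.sort(key=lambda item: item[0], reverse=True)
        chosen, rejected = pool[0][1], pool[-1][1]
        # The critique plays the role of the "textual loss" and "textual gradient".
        feedback = critique(prompt, chosen, rejected)
        for _ in range(n_samples):
            response = revise(prompt, chosen, feedback)
            pool.append((reward(prompt, response), response))

    return max(pool, key=lambda item: item[0])[1]


# Toy stubs (assumptions for illustration only): longer answers score higher.
def toy_generate(prompt: str) -> str:
    return "draft"


def toy_reward(prompt: str, response: str) -> float:
    return float(len(response))


def toy_critique(prompt: str, chosen: str, rejected: str) -> str:
    return "add more detail"


def toy_revise(prompt: str, chosen: str, feedback: str) -> str:
    return chosen + " +detail"


best = tpo_refine("q", toy_generate, toy_reward, toy_critique, toy_revise)
print(best)
```

In a real setting the four callables would wrap calls to a language model and a reward model; only the prompt context changes between rounds.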

AI Optimization, Language Models, Human Preferences, Inference Optimization, Textual Feedback, Model Alignment, Natural Language Processing, Iterative Improvement, Scalability, Interpretability
experimental Open Source
Phantom
Phantom by ByteDance

Phantom, developed by ByteDance's Intelligent Creation Team, is a Subject-to-Video (S2V) generation framework that leverages cross-modal alignment technology. It combines text and image prompts to extract subject elements from reference images, generating video content that aligns with the provided text description. Built on existing Text-to-Video (T2V) and Image-to-Video (I2V) architectures, Phantom redesigns the joint text-image injection model, learning cross-modal alignment from text-image-video triplet data. The framework supports single and multi-subject references, emphasizing subject consistency in human generation tasks and enhancing identity-preserving video generation.

Video Generation, Subject-to-Video, Cross-Modal Alignment, AI Video Tools, Creative AI, Identity Preservation, Multi-Subject Generation, Text-to-Video, Image-to-Video, ByteDance
production Open Source
AgentSociety
AgentSociety by Tsinghua University

AgentSociety is a social simulator developed by Tsinghua University that uses large language models (LLMs) to create intelligent agents with human-like minds. These agents are endowed with emotions, needs, and cognitive abilities, enabling them to simulate complex social behaviors in urban environments. The platform features realistic urban social environment simulation, a large-scale social simulation engine, and a toolbox for intelligent social science research. It is used for analyzing social phenomena, policy sandbox testing, crisis warning, and exploring future social forms.

Social Simulation, LLM, AI Agents, Urban Environment, Sociology, Distributed Computing, Behavioral Modeling, Policy Testing, Crisis Simulation, Research Tool
production Open Source
DualPipe
DualPipe by DeepSeek

DualPipe is an open-source bidirectional pipeline parallelism algorithm developed by DeepSeek to improve the training efficiency of large-scale deep learning models. It schedules micro-batches from both ends of the pipeline so that forward and backward computation overlap with each other and with the associated communication. This reduces pipeline bubbles and hides communication overhead in distributed training, which makes it well suited to large-scale model training.

Deep Learning, Pipeline Parallelism, Distributed Training, Model Optimization, AI Development, Machine Learning, Large-scale Models, Computation Overlap, Memory Optimization, Training Efficiency
production Open Source
HippoRAG2
HippoRAG2 by Ohio State University

HippoRAG 2, developed by Ohio State University, is a Retrieval-Augmented Generation (RAG) framework that addresses the limitations of existing RAG systems in simulating human long-term memory. It combines the Personalized PageRank algorithm with deeper passage integration and more effective use of Large Language Models (LLMs) to enhance knowledge retrieval and generation. During the offline phase, it constructs an open knowledge graph (KG) by extracting triples from passages and detecting synonyms. During online retrieval, it links queries with KG triples, filters irrelevant information, and applies Personalized PageRank for context-aware retrieval, providing the most relevant passages for Q&A tasks.
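
The Personalized PageRank step can be illustrated with a small power-iteration sketch. It is not HippoRAG 2's code: the toy graph, the node names, the damping value of 0.5, and the function `personalized_pagerank` are all assumptions chosen for illustration. Random walks restart at the nodes linked to the query (the seeds), so passages close to those nodes receive more score.

```python
from typing import Dict, List


def personalized_pagerank(
    graph: Dict[str, List[str]],
    seeds: Dict[str, float],
    damping: float = 0.5,   # assumed value, for illustration only
    iters: int = 50,
) -> Dict[str, float]:
    """Power iteration for PageRank with restarts biased toward `seeds`.

    `graph` maps each node to its neighbours (undirected edges are listed in
    both directions). Every seed must be a node of `graph`.
    """
    total = sum(seeds.values())
    reset = {node: seeds.get(node, 0.0) / total for node in graph}
    rank = dict(reset)
    for _ in range(iters):
        nxt = {node: (1.0 - damping) * reset[node] for node in graph}
        for node, neighbours in graph.items():
            if not neighbours:
                # Dangling node: send its mass back to the seed distribution.
                for other in graph:
                    nxt[other] += damping * rank[node] * reset[other]
                continue
            share = damping * rank[node] / len(neighbours)
            for other in neighbours:
                nxt[other] += share
        rank = nxt
    return rank


# Toy graph: entity nodes A, B, C and passage nodes p1, p2, p3.
toy_graph = {
    "A": ["p1", "B"],
    "B": ["A", "p2", "C"],
    "C": ["B", "p3"],
    "p1": ["A"],
    "p2": ["B"],
    "p3": ["C"],
}

scores = personalized_pagerank(toy_graph, seeds={"A": 1.0})
passages = sorted(["p1", "p2", "p3"], key=lambda p: scores[p], reverse=True)
print(passages)
```

With the query linked to entity A, the passage attached directly to A ranks first and passages further away in the graph rank lower.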

Retrieval-Augmented Generation, PageRank Algorithm, Knowledge Graph, Large Language Models, Natural Language Processing, AI Framework, Context-Aware Retrieval, Multi-hop Reasoning, Continuous Learning, Q&A Systems
experimental Open Source
xAR
xAR by ByteDance, Johns Hopkins University

xAR is an autoregressive visual generation framework developed by ByteDance and Johns Hopkins University. It addresses common issues in traditional autoregressive models, such as insufficient information density and cumulative errors, through two techniques: Next-X Prediction and Noisy Context Learning. These methods let the framework predict larger entities, such as image patches or entire images, rather than single tokens, improving both the quality and speed of image generation. xAR reports better inference speed and generation quality than existing models on benchmarks such as ImageNet.

Autoregressive Models, Visual Generation, AI Framework, Image Generation, Next-X Prediction, Noisy Context Learning, High-Performance Generation, Flow Matching, Data Augmentation, Art Creation
experimental Open Source
PRefLexOR
PRefLexOR by MIT

PRefLexOR (Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning) is a self-learning AI framework developed by MIT. It integrates preference optimization and reinforcement learning to enhance reasoning through iterative inference. The framework uses a recursive reasoning algorithm, enabling the model to perform multi-step reasoning, review, and improve intermediate steps during training and inference, resulting in more accurate outputs. PRefLexOR employs Odds Ratio Preference Optimization (ORPO) and Direct Preference Optimization (DPO) to align reasoning paths with human preferences and improve reasoning quality.
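
For readers unfamiliar with Direct Preference Optimization, the standard per-example DPO loss can be written in a few lines. This sketch shows the general DPO objective only, not PRefLexOR's training code; the function name `dpo_loss`, the argument names, and the default `beta` are assumptions. Inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,   # assumed default, for illustration only
) -> float:
    """Per-example DPO loss: -log(sigmoid(beta * (chosen gain - rejected gain)))."""
    chosen_gain = policy_chosen_logp - ref_chosen_logp
    rejected_gain = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# The policy prefers the chosen answer more than the reference does: low loss.
good = dpo_loss(-1.0, -3.0, -2.0, -2.0)
# The policy prefers the rejected answer: high loss.
bad = dpo_loss(-3.0, -1.0, -2.0, -2.0)
# The policy matches the reference: loss is log(2).
neutral = dpo_loss(-2.0, -2.0, -2.0, -2.0)
print(good < neutral < bad)
```

The loss falls as the policy raises the likelihood of the chosen reasoning path relative to the rejected one, measured against the reference model.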

AI Framework, Self-Learning, Reasoning Optimization, Reinforcement Learning, Preference Optimization, Recursive Reasoning, Dynamic Knowledge Graphs, Cross-Domain Reasoning, Materials Science, Open-Domain Problem Solving
experimental Open Source
ViDoRAG
ViDoRAG by Alibaba Tongyi Lab, USTC, SJTU

ViDoRAG, developed by Alibaba Tongyi Lab in collaboration with USTC and SJTU, is a visual document retrieval-augmented generation framework. It addresses the limitations of traditional methods in handling complex visual documents by employing a multimodal hybrid retrieval strategy based on Gaussian Mixture Models (GMM). The framework dynamically adjusts the number of retrieval results, optimizing the integration of text and visual information. It includes three agents: Seeker, Inspector, and Answer, which work together to refine answers progressively, improving generation quality and consistency. ViDoRAG significantly outperforms existing methods on the ViDoSeek benchmark dataset, demonstrating its efficiency and superiority in visual document retrieval and reasoning tasks.

Visual Document Retrieval, Multimodal Retrieval, Dynamic Iterative Reasoning, AI Framework, Document Processing, Complex Document Understanding, Generation Consistency, Efficient Generation, Multi-agent Collaboration, Gaussian Mixture Models
production Open Source
Hyper-SD
Hyper-SD by ByteDance

Hyper-SD, developed by ByteDance, is an image synthesis framework that addresses the high computational cost of multi-step inference in diffusion models. It employs Trajectory Segmented Consistency Distillation (TSCD), which performs consistency distillation progressively within pre-defined time-step segments and so helps preserve the original ODE trajectory. The framework also integrates human feedback learning to improve performance in low-step inference and uses score distillation to enhance image quality in single-step inference. Hyper-SD substantially reduces the number of inference steps required while maintaining high image quality, enabling rapid generation of high-resolution images.

Image Synthesis, Diffusion Models, Machine Learning, AI Research, Generative AI, High-Resolution Images, Computational Efficiency, Human Feedback Learning, Score Distillation, Low-Rank Adaptation
production
PodAgent
PodAgent by The Chinese University of Hong Kong, Microsoft, Xiaohongshu

PodAgent is a podcast generation framework developed by The Chinese University of Hong Kong, Microsoft, and Xiaohongshu. It uses a multi-agent collaboration system to simulate real talk show scenarios, automatically generating rich and structured dialogue content. The framework includes a diverse voice library for precise character-voice matching, ensuring natural and immersive audio. It leverages large language model (LLM)-based speech synthesis to produce expressive and emotional speech, enhancing podcast engagement. PodAgent also provides comprehensive evaluation metrics to measure the quality of generated podcasts, ensuring professionalism and diversity in content.

Podcast Generation, AI Collaboration, Speech Synthesis, Multi-Agent Systems, LLM, Content Creation, Media Production, Voice Matching, Natural Language Processing, Audio Generation
production Open Source
