PodAgent

by The Chinese University of Hong Kong, Microsoft, and Xiaohongshu

PodAgent is a podcast generation framework developed by CUHK, Microsoft, and Xiaohongshu that uses multi-agent collaboration to create structured, engaging dialogue content.

What is PodAgent?

PodAgent is a podcast generation framework developed by The Chinese University of Hong Kong, Microsoft, and Xiaohongshu. It simulates a real talk-show setting with a multi-agent collaboration system that automatically generates rich, structured dialogue. A diverse voice library lets the framework match each character to a suitable voice, aiming for natural and immersive audio, and large language model (LLM)-based speech synthesis renders the script as expressive, emotionally appropriate speech to make the podcast more engaging.

Key Features of PodAgent

Generate High-Quality Dialogue Content: Automatically generates rich and diverse dialogue scripts covering various topics.
Voice Role Matching: Dynamically matches the most suitable voice based on the character's personality and content context.
Speech Synthesis and Expressiveness Enhancement: Adjusts the tone, rhythm, and emotion of the speech according to the dialogue content, making the podcast more lively.
Generate Complete Podcast Structure: Supports adding appropriate sound effects and background music to generate a complete podcast structure. It also supports multi-language generation to adapt to different scenarios and audience needs.
Evaluation and Optimization: Provides comprehensive evaluation metrics to measure the quality of generated podcasts, including the richness of dialogue content, the accuracy of voice matching, and the expressiveness of speech.

Technical Principles of PodAgent

Multi-Agent Collaboration System:
Host: Responsible for setting the dialogue outline and guiding the discussion.
Guest: Provides professional insights and perspectives based on the role setting.
Writer: Integrates dialogue content and optimizes the coherence and diversity of the script.
Voice Feature Analysis and Matching: Builds a voice library, analyzes voice features such as timbre, tone, and emotion, and matches the most suitable voice to each role. Voice samples are extracted from open-source datasets such as LibriTTS and AISHELL-3, then deduplicated and filtered to form a diverse voice library.
LLM-Guided Speech Synthesis: Uses LLM-based speech synthesis to convert text into natural, expressive speech. The speaking style predicted by an LLM serves as an instruction that guides the speech synthesis model (such as CosyVoice) to produce speech that matches the emotional context of the content.
Comprehensive Evaluation Metrics: Introduces a set of metrics for measuring the quality of generated podcasts, covering the lexical diversity, semantic richness, and information density of the dialogue content, as well as voice-matching accuracy and speech expressiveness. An LLM is also used as an evaluator to compare and score the generated content.
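To make the idea of voice-role matching concrete, here is a minimal sketch that assumes each voice and each role can be described by a numeric feature vector (the feature axes, the `Voice` class, and the `match_voices` helper are all hypothetical). It substitutes plain cosine similarity with a greedy one-voice-per-role assignment; PodAgent's own matching procedure is described in its paper and may work differently.

```python
import math
from dataclasses import dataclass


@dataclass
class Voice:
    """A voice-library entry. The feature axes are hypothetical."""
    voice_id: str
    features: list[float]  # e.g. [pitch, warmth, energy], normalised to [0, 1]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_voices(roles: dict[str, list[float]], library: list[Voice]) -> dict[str, str]:
    """Greedily give each role (in insertion order) the most similar unused voice."""
    taken: set[str] = set()
    assignment: dict[str, str] = {}
    for role, target in roles.items():
        candidates = [v for v in library if v.voice_id not in taken]
        if not candidates:
            raise ValueError("voice library has fewer voices than roles")
        best = max(candidates, key=lambda v: cosine(target, v.features))
        assignment[role] = best.voice_id
        taken.add(best.voice_id)  # each voice is used by at most one role
    return assignment


# Hypothetical library and role profiles, for illustration only.
library = [
    Voice("calm_low", [0.2, 0.9, 0.3]),
    Voice("bright_high", [0.9, 0.4, 0.9]),
    Voice("neutral_mid", [0.5, 0.5, 0.5]),
]
roles = {"host": [0.8, 0.5, 0.9], "guest": [0.3, 0.8, 0.2]}
print(match_voices(roles, library))
```

A greedy pass is easy to follow but can give a later role a worse voice than a globally optimal assignment (for example, one computed with the Hungarian algorithm) would.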
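The lexical-diversity part of the evaluation can be illustrated with two widely used measures, type-token ratio and Distinct-n. The sketch below is illustrative only: it assumes simple whitespace tokenisation, the helper names are hypothetical, and the exact metric definitions PodAgent uses are given in its paper and may differ.

```python
def tokenize(text: str) -> list[str]:
    """Lower-case whitespace tokenisation with edge punctuation stripped (a simplification)."""
    tokens = (t.strip(".,!?;:\"'()").lower() for t in text.split())
    return [t for t in tokens if t]


def type_token_ratio(tokens: list[str]) -> float:
    """Unique tokens divided by total tokens; higher means a more varied vocabulary."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def distinct_n(tokens: list[str], n: int = 2) -> float:
    """Fraction of n-grams that are unique (the Distinct-n measure)."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


# A tiny hypothetical two-speaker script; speaker labels are kept out of the text.
script = [
    ("host", "Welcome back! Today we talk about podcasts."),
    ("guest", "Thanks. Podcasts are growing fast, and AI tools are growing too."),
]
tokens = tokenize(" ".join(text for _, text in script))
print(f"TTR={type_token_ratio(tokens):.3f}, distinct-2={distinct_n(tokens, 2):.3f}")
```

Raw type-token ratio falls as texts get longer, so comparisons are fairest between scripts of similar length.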

Project Links for PodAgent

GitHub Repository: https://github.com/yujxx/PodAgent
arXiv Technical Paper: https://arxiv.org/pdf/2503.00455

Application Scenarios of PodAgent

Media and Content Creation: Quickly generates high-quality podcast programs covering news, culture, technology, and other topics, saving time and cost in content creation.
Education and Learning: Generates educational podcasts, such as language learning and academic lectures, providing a lively and interesting learning experience.
Corporate Promotion: Produces brand promotion podcasts, sharing product stories or industry insights to enhance brand influence.
Independent Creators and Personal Branding: Helps creators produce podcast content quickly, overcome creative blocks, and make their content more appealing.
Entertainment and Creativity: Generates fictional stories, comedy talk shows, and other entertainment podcasts, providing an immersive auditory experience.

Framework Features

Supported Tasks

Podcast Generation, Dialogue Content Creation, Speech Synthesis, Voice Matching, Content Evaluation

Pricing

Free

Stats

67 GitHub Stars

Similar Frameworks

TPO

Phantom by ByteDance

AgentSociety by Tsinghua University
