PodAgent

by The Chinese University of Hong Kong, Microsoft, Xiaohongshu
PodAgent is a podcast generation framework from CUHK, Microsoft, and Xiaohongshu that uses multi-agent collaboration to create structured, engaging dialogue content.

What is PodAgent?

PodAgent is a podcast generation framework developed by The Chinese University of Hong Kong, Microsoft, and Xiaohongshu. It simulates a real talk show with a multi-agent collaboration system that automatically generates rich, structured dialogue. A diverse voice library enables precise matching of voices to characters, keeping the audio natural and immersive, and large language model (LLM)-based speech synthesis produces expressive, emotional speech that makes the podcast more engaging.
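
The multi-agent idea (a host, guests, and a writer working together) can be pictured as a small pipeline. The sketch below is hypothetical: it assumes a generic `llm(prompt) -> str` callable, and the prompts, the function names `host_outline`, `guest_reply`, `writer_polish`, and `generate_script`, and the fixed round-robin turn order are illustrative, not PodAgent's actual code. `echo_llm` is an offline stand-in so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interface: any function that maps a prompt to generated text.
LLM = Callable[[str], str]


@dataclass
class Turn:
    speaker: str
    text: str


def host_outline(llm: LLM, topic: str, n_points: int = 3) -> List[str]:
    """Host agent (hypothetical): draft a discussion outline, one point per line."""
    reply = llm(f"As a talk-show host, list {n_points} discussion points about: {topic}")
    points = [line.strip("-• ").strip() for line in reply.splitlines() if line.strip()]
    return points[:n_points]


def guest_reply(llm: LLM, persona: str, point: str) -> str:
    """Guest agent (hypothetical): answer one outline point in character."""
    return llm(f"You are {persona}. Give your view on: {point}")


def writer_polish(llm: LLM, turns: List[Turn]) -> List[Turn]:
    """Writer agent (hypothetical): rewrite each turn so it follows the previous one."""
    polished: List[Turn] = []
    previous = ""
    for turn in turns:
        text = llm(f"Previous line: ({previous}). Rewrite for flow: {turn.text}")
        polished.append(Turn(turn.speaker, text))
        previous = text
    return polished


def generate_script(llm: LLM, topic: str, guests: Dict[str, str]) -> List[Turn]:
    """Round-robin draft: the host introduces each point, then every guest responds."""
    draft: List[Turn] = []
    for point in host_outline(llm, topic):
        draft.append(Turn("Host", llm(f"As the host, introduce this point: {point}")))
        for name, persona in guests.items():
            draft.append(Turn(name, guest_reply(llm, persona, point)))
    return writer_polish(llm, draft)


def echo_llm(prompt: str) -> str:
    """Offline stand-in for a real model: echoes the text after the last ': '."""
    if "discussion points" in prompt:
        return "- history\n- current state\n- future"
    return prompt.rsplit(": ", 1)[-1]


script = generate_script(echo_llm, "podcasting", {"Alice": "a historian", "Bob": "an engineer"})
for turn in script[:3]:
    print(f"{turn.speaker}: {turn.text}")
```

To experiment, replace `echo_llm` with a function that calls a real model. The fixed round-robin order keeps the sketch short; a closer simulation of a talk show would let the host react to each guest's answer.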

Key Features of PodAgent

  • Generate High-Quality Dialogue Content: Automatically generates rich and diverse dialogue scripts covering various topics.
  • Voice Role Matching: Dynamically selects the most suitable voice for each character based on its personality and the context of the content.
  • Speech Synthesis and Expressiveness Enhancement: Adjusts the tone, rhythm, and emotion of the speech according to the dialogue content, making the podcast more lively.
  • Generate Complete Podcast Structure: Adds suitable sound effects and background music to produce a complete podcast, and supports generation in multiple languages to suit different scenarios and audiences.
  • Evaluation and Optimization: Provides comprehensive evaluation metrics to measure the quality of generated podcasts, including the richness of dialogue content, the accuracy of voice matching, and the expressiveness of speech.
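
Voice role matching can be illustrated as a nearest-neighbour search over voice features. This is a minimal sketch under stated assumptions: each voice and each role is described by a hand-made feature vector (the three dimensions here, pitch, energy, and warmth, are hypothetical), similarity is cosine similarity, and a greedy pass gives every role a distinct voice. PodAgent's actual features and matching rule may differ.

```python
import math
from typing import Dict, List

# Hypothetical feature vector, e.g. [pitch, energy, warmth] scaled to 0..1.
Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_voices(roles: Dict[str, Vector], library: Dict[str, Vector]) -> Dict[str, str]:
    """Greedily give each role its most similar voice that is still unused."""
    # Score every (role, voice) pair and visit the best-scoring pairs first.
    pairs = sorted(
        (
            (cosine(r_vec, v_vec), role, voice)
            for role, r_vec in roles.items()
            for voice, v_vec in library.items()
        ),
        reverse=True,
    )
    assignment: Dict[str, str] = {}
    used = set()
    for _score, role, voice in pairs:
        if role not in assignment and voice not in used:
            assignment[role] = voice
            used.add(voice)
    return assignment


library = {
    "voice_a": [0.9, 0.8, 0.3],
    "voice_b": [0.2, 0.3, 0.9],
    "voice_c": [0.5, 0.9, 0.6],
}
roles = {"Host": [0.6, 0.9, 0.6], "Guest": [0.2, 0.2, 1.0]}
print(match_voices(roles, library))  # {'Host': 'voice_c', 'Guest': 'voice_b'}
```

The greedy pass is simple but not globally optimal when several roles compete for the same voice; an optimal one-to-one assignment could instead use the Hungarian algorithm, for example SciPy's `scipy.optimize.linear_sum_assignment` on the negated similarity matrix.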

Technical Principles of PodAgent

  • Multi-Agent Collaboration System:
      • Host: Responsible for setting the dialogue outline and guiding the discussion.
      • Guest: Provides professional insights and perspectives based on the role setting.
      • Writer: Integrates dialogue content and optimizes the coherence and diversity of the script.
  • Voice Feature Analysis and Matching: Builds a voice library, analyzes voice features such as timbre, tone, and emotion, and matches the most suitable voice to each role. Voice samples are extracted from open-source datasets such as LibriTTS and AISHELL-3, then deduplicated and filtered to form a diverse library.
  • LLM-Guided Speech Synthesis: Converts the script into natural, expressive speech with LLM-based speech synthesis. An LLM predicts the speaking style of each line, and that prediction is passed as an instruction to the speech synthesis model (such as CosyVoice) so the generated speech matches the emotional context of the content.
  • Comprehensive Evaluation Metrics: Introduces a set of metrics for measuring the quality of generated podcasts, covering the lexical diversity, semantic richness, and information density of the dialogue as well as voice-matching accuracy and speech expressiveness. An LLM also serves as an evaluator that compares and scores the generated content.
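
The description does not give the exact formula behind lexical diversity. As an assumed stand-in, the sketch below computes distinct-n, a common measure in dialogue generation: the number of unique n-grams divided by the total number of n-grams across all turns of a script. Higher values mean a less repetitive script.

```python
from typing import List


def distinct_n(turns: List[str], n: int = 2) -> float:
    """Distinct-n: unique n-grams divided by total n-grams over all turns."""
    total = 0
    unique = set()
    for turn in turns:
        tokens = turn.lower().split()  # whitespace tokenization (suits English only)
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


script = ["The cat sat", "the cat ran"]
print(distinct_n(script, n=1))  # 4 unique unigrams out of 6
print(distinct_n(script, n=2))  # 0.75 (3 unique bigrams out of 4)
```

Whitespace splitting does not work for Chinese scripts, which have no spaces between words; for those, a word segmenter or character-level tokens would be needed before counting n-grams.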

Application Scenarios of PodAgent

  • Media and Content Creation: Quickly generates high-quality podcast programs covering news, culture, technology, and other topics, saving time and cost in content creation.
  • Education and Learning: Generates educational podcasts, such as language-learning episodes and academic lectures, for a lively and engaging learning experience.
  • Corporate Promotion: Produces brand promotion podcasts that share product stories or industry insights to strengthen brand influence.
  • Independent Media and Personal Branding: Helps creators produce podcast content quickly, overcoming creative blocks and making their content more appealing.
  • Entertainment and Creativity: Generates fictional stories, comedy talk shows, and other entertainment podcasts, providing an immersive auditory experience.

Framework Features

Supported Tasks
Podcast Generation, Dialogue Content Creation, Speech Synthesis, Voice Matching, Content Evaluation
Tags
Podcast Generation, AI Collaboration, Speech Synthesis, Multi-Agent Systems, LLM, Content Creation, Media Production, Voice Matching, Natural Language Processing, Audio Generation

Pricing
Free

Stats

67 GitHub Stars

Similar Frameworks

  • TPO
  • Phantom by ByteDance
  • AgentSociety by Tsinghua University