FunAudioLLM

by Alibaba Tongyi Lab

FunAudioLLM is an open-source large speech model project from Alibaba Tongyi Lab. It comprises SenseVoice for multilingual speech recognition and CosyVoice for natural speech generation.

What is FunAudioLLM?

FunAudioLLM is an open-source large speech model project developed by Alibaba Tongyi Lab. It consists of two models, SenseVoice and CosyVoice. SenseVoice handles multilingual speech recognition and emotion detection; it supports over 50 languages and performs particularly well on Chinese and Cantonese. CosyVoice handles natural speech generation with control over tone and emotion, and supports Chinese, English, Japanese, Cantonese, and Korean. FunAudioLLM suits scenarios such as multilingual translation and emotional voice dialogue. The models and code are open-sourced on ModelScope and Hugging Face.

Main Features of FunAudioLLM

SenseVoice Model:
Focuses on high-accuracy multilingual speech recognition.
Supports over 50 languages, with its strongest recognition performance in Chinese and Cantonese.
Detects emotion and identifies a range of audio events that occur in human-computer interaction.
Comes in both a lightweight and a large version to suit different application scenarios.
CosyVoice Model:
Focuses on natural speech generation, with multilingual support and control over tone and emotion.
Can quickly clone a voice from a small amount of reference audio, reproducing its rhythm and emotional detail.
Supports cross-language speech generation and fine-grained emotion control.
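
The SenseVoice features above (language identification, emotion detection, audio event detection) generally reach an application as tags attached to the recognized text. As a rough sketch, assuming the tagged output format seen in SenseVoice's published examples, `<|lang|><|EMOTION|><|Event|><|itn-flag|>text`, the tags can be separated from the transcript as shown here. This page does not describe that format, so treat it as an assumption that may differ between model versions. The function name `parse_tagged_output` and the sample string are hypothetical and used for illustration only.

```python
import re
from typing import NamedTuple

# Matches a single tag of the assumed form <|...|>.
_TAG = re.compile(r"<\|([^|]*)\|>")


class TaggedResult(NamedTuple):
    tags: list[str]  # tag contents in order, e.g. language, emotion, event
    text: str        # transcript with the leading tags removed


def parse_tagged_output(raw: str) -> TaggedResult:
    """Split leading <|...|> tags from the transcript text.

    Assumes tags appear only at the start of the string; anything
    after the first non-tag character is treated as transcript.
    """
    tags = []
    pos = 0
    while True:
        match = _TAG.match(raw, pos)
        if match is None:
            break
        tags.append(match.group(1))
        pos = match.end()
    return TaggedResult(tags=tags, text=raw[pos:].strip())


# Hypothetical sample in the assumed format, not real model output.
sample = "<|en|><|HAPPY|><|Speech|><|withitn|>Nice to meet you."
result = parse_tagged_output(sample)
print(result.tags)
print(result.text)
```

Keeping the tags as an ordered list rather than fixed fields avoids hard-coding how many tags a given model version emits.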

Project Links for FunAudioLLM

Official Project Website: https://fun-audio-llm.github.io/
CosyVoice Online Demo: https://www.modelscope.cn/studios/iic/CosyVoice-300M
SenseVoice Online Demo: https://www.modelscope.cn/studios/iic/SenseVoice
GitHub Repository: https://github.com/FunAudioLLM
arXiv Technical Paper: https://arxiv.org/abs/2407.04051

Application Scenarios of FunAudioLLM

Developers and Researchers: Use FunAudioLLM for research and development in speech recognition, speech synthesis, and emotion analysis.
Enterprise Users: Apply FunAudioLLM in customer service, intelligent assistants, and multilingual translation to improve efficiency and user experience.
Content Creators: Use FunAudioLLM to generate audiobooks or podcasts, diversifying content formats and reaching more listeners.
Education Sector: Use FunAudioLLM for language learning and listening training to make learning more efficient and engaging.
People with Disabilities: Help visually impaired users access information through voice interaction, making everyday tasks more convenient.

Model Capabilities

Model Type

speech

Supported Tasks

Multilingual Speech Recognition, Speech Synthesis, Emotion Detection, Voice Generation, Audio Event Detection

Usage & Integration

Pricing

free

License

Open Source

Similar Models

Ola by Tsinghua University, Tencent Hunyuan Research Team, NUS S-Lab
Zonos by Zyphra
Step-Video-T2V by StepFun
