The Voicevox MCP Server lets you use VOICEVOX-compatible voice synthesis engines (such as AivisSpeech, VOICEVOX, and COEIROINK) through the Model Context Protocol (MCP). It is designed for voice synthesis in agent mode with Claude 3.7, particularly in tools such as Cursor.
Any application that uses MCP can call VOICEVOX voice synthesis through this server, which makes it useful for developers working on AI-driven voice synthesis projects.
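For context, the VOICEVOX ENGINE HTTP API synthesizes speech in two calls: `POST /audio_query` turns text into a query JSON, and `POST /synthesis` turns that query into WAV audio. Compatible engines are expected to mirror this API. The TypeScript sketch below shows the flow. It assumes Node.js 18+ (global `fetch`), and the function names `buildAudioQueryUrl`, `buildSynthesisUrl`, and `synthesize` are hypothetical, not taken from this server's source.

```typescript
// Hypothetical helpers illustrating the VOICEVOX ENGINE two-step synthesis flow.
// Assumes an engine reachable at baseUrl (e.g. http://localhost:50021) and Node.js 18+.

function buildAudioQueryUrl(baseUrl: string, text: string, speaker: number): string {
  const url = new URL("/audio_query", baseUrl);
  url.searchParams.set("text", text); // percent-encoded automatically
  url.searchParams.set("speaker", String(speaker));
  return url.toString();
}

function buildSynthesisUrl(baseUrl: string, speaker: number): string {
  const url = new URL("/synthesis", baseUrl);
  url.searchParams.set("speaker", String(speaker));
  return url.toString();
}

async function synthesize(baseUrl: string, text: string, speaker: number): Promise<Uint8Array> {
  // Step 1: text -> query JSON (phonemes, pitch, speed, etc.).
  const queryRes = await fetch(buildAudioQueryUrl(baseUrl, text, speaker), { method: "POST" });
  if (!queryRes.ok) throw new Error(`audio_query failed: ${queryRes.status}`);
  const query: unknown = await queryRes.json();

  // Step 2: query JSON -> WAV bytes.
  const wavRes = await fetch(buildSynthesisUrl(baseUrl, speaker), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(query),
  });
  if (!wavRes.ok) throw new Error(`synthesis failed: ${wavRes.status}`);
  return new Uint8Array(await wavRes.arrayBuffer());
}
```

Keeping the two steps separate mirrors the engine's design: a client can adjust the query JSON (for example, speed) before requesting audio.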
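Prerequisites:
- A VOICEVOX-compatible engine running and reachable over HTTP (e.g., `http://localhost:50000`)
- `libsdl2-dev`, `pulseaudio-utils`, and `pulseaudio` installed (for audio playback)
- On WSL, access to `/mnt/wslg`, which provides the PulseAudio socket used by the Docker setup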
Clone the repository:
```bash
git clone https://github.com/Dosugamea/voicevox-mcp-server.git
cd voicevox-mcp-server
```
Install dependencies:
```bash
npm install
```
Configure environment variables by copying `.env_example` to `.env` and modifying it as needed:
```env
VOICEVOX_API_URL=http://localhost:50021
VOICEVOX_SPEAKER_ID=1
```
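A server reading these two variables typically falls back to defaults and rejects malformed values at startup. The sketch below is a hypothetical loader (`loadVoicevoxConfig` is not part of this project); it assumes the defaults shown in `.env` above and treats empty values as unset.

```typescript
// Hypothetical config loader; defaults mirror the .env example values.
interface VoicevoxConfig {
  apiUrl: string;
  speakerId: number;
}

function loadVoicevoxConfig(env: Record<string, string | undefined>): VoicevoxConfig {
  // `||` (not `??`) so that an empty string also falls back to the default.
  const apiUrl = env.VOICEVOX_API_URL?.trim() || "http://localhost:50021";
  const rawSpeaker = env.VOICEVOX_SPEAKER_ID?.trim() || "1";

  const speakerId = Number(rawSpeaker);
  if (!Number.isInteger(speakerId) || speakerId < 0) {
    throw new Error(`VOICEVOX_SPEAKER_ID must be a non-negative integer, got "${rawSpeaker}"`);
  }

  // The URL constructor throws on malformed input, surfacing typos early.
  new URL(apiUrl);
  return { apiUrl, speakerId };
}
```

In Node.js this would be called as `loadVoicevoxConfig(process.env)`.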
To start the server on Windows:
```bash
npm run build
npm start
```
When using Docker, the server runs in stdio mode, so no start command is needed: the MCP client launches the container itself.
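In MCP's stdio transport, client and server exchange JSON-RPC 2.0 messages over stdin/stdout, one message per line with no embedded newlines (hence `docker run -i`, which keeps stdin open). The sketch below illustrates that framing. The tool name `speak` and its `text` argument are hypothetical placeholders; the real names can be discovered with a `tools/list` request, and a real session begins with an `initialize` handshake not shown here.

```typescript
// Sketch of newline-delimited JSON-RPC framing used by MCP's stdio transport.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

function encodeMessage(msg: JsonRpcRequest): string {
  // Non-indented JSON.stringify escapes newlines inside strings as "\n",
  // so the only raw newline is the delimiter appended here.
  return JSON.stringify(msg) + "\n";
}

function decodeMessages(chunk: string): unknown[] {
  // Assumes the chunk holds only complete lines; a real reader would
  // buffer a trailing partial line until its newline arrives.
  return chunk
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line));
}

// Hypothetical tool call; "speak" is a placeholder tool name.
const callSpeak: JsonRpcRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: { name: "speak", arguments: { text: "こんにちは" } },
};
```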
When running the server directly with `npm start`, add the following entry to your `mcp.json` file:
```json
"voicevox": {
  "url": "http://localhost:10100/sse"
}
```
When using Docker, add the following to your `mcp.json` file (note: this configuration has not been fully tested in the author's environment):
```json
{
  "mcpServers": {
    "voicevox": {
      "command": "cmd",
      "args": [
        "/c",
        "docker",
        "run",
        "-i",
        "--rm",
        "-v",
        "/mnt/wslg:/mnt/wslg",
        "-e",
        "PULSE_SERVER",
        "-e",
        "SDL_AUDIODRIVER",
        "-e",
        "VOICEVOX_API_URL",
        "-e",
        "VOICEVOX_SPEAKER_ID",
        "your-local-docker-image-name"
      ],
      "env": {
        "PULSE_SERVER": "unix:/mnt/wslg/PulseServer",
        "SDL_AUDIODRIVER": "pulseaudio",
        "VOICEVOX_API_URL": "http://host.docker.internal:50031",
        "VOICEVOX_SPEAKER_ID": "919692871"
      }
    }
  }
}
```
The default speaker ID is `1` (in the standard VOICEVOX ENGINE, "ずんだもん" in the あまあま style). To use a different speaker, change the `VOICEVOX_SPEAKER_ID` environment variable. Speaker IDs differ between engines, so retrieve the list of available speakers and their style IDs from the VOICEVOX ENGINE API:
```bash
curl http://localhost:50021/speakers
```
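The `/speakers` response is an array of speakers, each with a `styles` array, and each style's `id` is the value `VOICEVOX_SPEAKER_ID` expects. The sketch below flattens that response into a sorted ID list. The interfaces cover only the fields used here, and `listStyleIds`/`fetchSpeakers` are hypothetical helpers, assuming Node.js 18+ for `fetch`.

```typescript
// Subset of the VOICEVOX ENGINE GET /speakers response shape.
interface SpeakerStyle { name: string; id: number; }
interface Speaker { name: string; speaker_uuid: string; styles: SpeakerStyle[]; }

// Flatten speakers into "style ID -> label" rows, sorted by ID.
function listStyleIds(speakers: Speaker[]): { id: number; label: string }[] {
  return speakers
    .flatMap((s) => s.styles.map((st) => ({ id: st.id, label: `${s.name} (${st.name})` })))
    .sort((a, b) => a.id - b.id);
}

async function fetchSpeakers(baseUrl: string): Promise<Speaker[]> {
  const res = await fetch(new URL("/speakers", baseUrl));
  if (!res.ok) throw new Error(`GET /speakers failed: ${res.status}`);
  return (await res.json()) as Speaker[];
}

// Usage: fetchSpeakers("http://localhost:50021").then((s) => console.table(listStyleIds(s)));
```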
Contributions are welcome! Please create an issue or submit a pull request for any bugs or feature requests.
This project is licensed under the MIT License.