Text-to-Speech

Text-to-Speech User Guide

Text-to-Speech Feature Introduction

Text-to-Speech (TTS) is an AI technology that converts written text into natural-sounding speech. Our system supports multiple languages and voices and uses neural network models to synthesize high-quality speech.

Supported Languages and Voices

Multi-language support: 80+ languages, including Chinese, English, Japanese, German, French, Spanish, Korean, Arabic, Russian, Dutch, Italian, Polish, Portuguese, and more
Rich voices: 400+ built-in high-quality voices, covering male and female voices across different age groups
Voice cloning: Use custom voices that you have trained yourself (training is a separate step)
Multiple models: A2E, Cartesia, Minimax, ElevenLabs, and other TTS engines

How to Use

Input Text Content

Enter the text you want to convert in the text box. Up to 1000 characters are supported.

Select Language and Voice

Choose the language of your text, then select your preferred voice from that language's voice list.

Adjust Speed (Optional)

Adjust the speech speed as needed. Speeds from 0.5x to 2.0x are supported.

Generate and Download

Click the generate button. The system creates an audio file that you can play and download from your history.
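
The limits mentioned in the steps above (1000 characters, 0.5-2.0x speed) can be checked before a request is submitted. The following Python sketch is illustrative only: the TTSRequest class, its field names, the default speed of 1.0, and counting characters with len() are all assumptions made for this example, not part of the product's API.

```python
from dataclasses import dataclass

MAX_CHARS = 1000                  # character limit stated in the guide
MIN_SPEED, MAX_SPEED = 0.5, 2.0   # speed range stated in the guide


@dataclass
class TTSRequest:
    """Hypothetical container for the inputs described in the steps above."""
    text: str
    language: str
    voice: str
    speed: float = 1.0  # assumed default; the guide does not state one


def validate_request(req: TTSRequest) -> list[str]:
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    if not req.text.strip():
        problems.append("text is empty")
    if len(req.text) > MAX_CHARS:
        problems.append(
            f"text has {len(req.text)} characters, limit is {MAX_CHARS}"
        )
    if not (MIN_SPEED <= req.speed <= MAX_SPEED):
        problems.append(
            f"speed {req.speed} is outside {MIN_SPEED}-{MAX_SPEED}"
        )
    if not req.language or not req.voice:
        problems.append("language and voice must both be selected")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once, which mirrors how a form would highlight several invalid fields together.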

Pricing Information

A2E Model: Free for VIP/MAX users, 1 credit/10s for regular users
Cartesia/Minimax Models: 2 credits/10s for all users
ElevenLabs Model: 3 credits/10s for all users
Voice Clone Training: A2E free, Cartesia 100 credits, Minimax/ElevenLabs 200 credits
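
The per-10-second rates above can be turned into a rough cost estimate. The Python sketch below is a minimal illustration, not the billing logic itself: the model keys and the estimate_credits function are hypothetical names, and it assumes that every started 10-second block is billed in full, which the pricing list does not confirm.

```python
import math

# Credits per 10 seconds of audio, taken from the pricing list above.
# The dictionary keys are hypothetical identifiers chosen for this sketch.
RATE_PER_10S = {
    "a2e": 1,
    "cartesia": 2,
    "minimax": 2,
    "elevenlabs": 3,
}


def estimate_credits(model: str, duration_seconds: float,
                     is_vip: bool = False) -> int:
    """Estimate the credit cost of one audio generation.

    Assumption: each started 10-second block is billed in full.
    """
    if duration_seconds <= 0:
        return 0
    if model == "a2e" and is_vip:
        return 0  # A2E is free for VIP/MAX users
    blocks = math.ceil(duration_seconds / 10)
    return blocks * RATE_PER_10S[model]
```

Under that rounding assumption, estimate_credits("elevenlabs", 25) gives 9 credits (three blocks at 3 credits each), while the same clip on A2E would be estimated at 3 credits for a regular user and 0 for a VIP/MAX user.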

Usage Tips

Text optimization: Use punctuation properly to help produce a more natural speech rhythm
Language matching: Make sure the selected language matches the text content to avoid pronunciation errors
Voice selection: Choose a voice that suits the content type (e.g., a formal voice for news broadcasting)
Long text handling: For very long texts, split the text into segments of reasonable length and process each segment separately
Special characters: Numbers, English abbreviations, and similar tokens are automatically converted to the appropriate pronunciation for the selected language
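
For the long text tip above, one possible approach is to split the text at sentence boundaries so that each segment stays within the 1000-character limit. The Python sketch below is only an illustration: split_text is a hypothetical helper, the sentence-boundary pattern is a simple heuristic, and it assumes the limit is counted the same way as Python's len().

```python
import re

MAX_CHARS = 1000  # per-request character limit stated in the guide

# A "sentence" ends at ., ! or ? followed by whitespace or the end of the
# text, or at CJK sentence punctuation; any trailing text is the last piece.
_SENTENCE = re.compile(r".+?(?:[.!?]+(?:\s+|\Z)|[。！？]+\s*|\Z)", re.S)


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into segments of at most `limit` characters,
    breaking at sentence boundaries where possible."""
    pieces = _SENTENCE.findall(text.strip())
    segments: list[str] = []
    current = ""

    def flush() -> None:
        # Store the segment built so far (if any) and start a new one.
        nonlocal current
        if current.strip():
            segments.append(current.strip())
        current = ""

    for piece in pieces:
        # Hard-wrap a single sentence that is itself longer than the limit.
        while len(piece) > limit:
            flush()
            segments.append(piece[:limit])
            piece = piece[limit:]
        # Start a new segment if adding this sentence would exceed the limit.
        if len((current + piece).rstrip()) > limit:
            flush()
        current += piece
    flush()
    return segments
```

For example, split_text("One. Two. Three.", limit=10) returns ["One. Two.", "Three."]. Breaking at sentence boundaries rather than at a fixed character offset keeps the punctuation that the text optimization tip relies on inside each segment.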