Text-to-Speech User Guide
Text-to-Speech Feature Introduction
Text-to-Speech (TTS) is an AI technology that converts written text into natural-sounding speech. Our system supports multiple languages and voices and uses neural network models to synthesize high-quality speech.
Supported Languages and Voices
- Multi-language support: 80+ languages, including Chinese, English, Japanese, German, French, Spanish, Korean, Arabic, Russian, Dutch, Italian, Polish, Portuguese, and more
- Voice library: 400+ built-in high-quality voices, covering male and female voices across different age groups
- Voice cloning: Use your own custom-trained voices (requires separate training)
- Multiple models: A2E, Cartesia, Minimax, ElevenLabs, and other TTS engines
How to Use
Input Text Content
Enter the text you want to convert in the text box (up to 1000 characters).
Select Language and Voice
Choose the language of your text, then select your preferred voice from that language's voice list.
Adjust Speed (Optional)
Adjust the speech speed as needed, from 0.5x to 2.0x.
Generate and Download
Click the Generate button. The system creates an audio file that you can play and download from your history.
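The limits in the steps above (at most 1000 characters of text, a speed between 0.5x and 2.0x) can be checked locally before generating. The Python sketch below is a hypothetical helper, not the product's API: the `TTSRequest` fields, the default speed of 1.0, and the assumption that the character counter counts Unicode code points (as Python's `len()` does) are all illustrative.

```python
from dataclasses import dataclass

MAX_CHARS = 1000                 # text box limit stated in this guide
MIN_SPEED, MAX_SPEED = 0.5, 2.0  # speed range stated in this guide


@dataclass
class TTSRequest:
    """Hypothetical local representation of one generation request."""
    text: str
    language: str
    voice: str
    speed: float = 1.0  # assumed default (normal speed)


def validate(req: TTSRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits the documented limits."""
    problems = []
    if not req.text.strip():
        problems.append("text is empty")
    # Assumption: the on-page character counter matches len(), i.e. Unicode code points.
    if len(req.text) > MAX_CHARS:
        problems.append(f"text has {len(req.text)} characters; the limit is {MAX_CHARS}")
    if not (MIN_SPEED <= req.speed <= MAX_SPEED):
        problems.append(f"speed {req.speed} is outside {MIN_SPEED}-{MAX_SPEED}")
    return problems
```

For example, `validate(TTSRequest("Hello.", "English", "Aria"))` returns an empty list, while a 1001-character text or a speed of 2.5 each produce one problem message.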
Pricing Information
- A2E Model: Free for VIP/MAX users, 1 credit/10s for regular users
- Cartesia/Minimax Models: 2 credits/10s for all users
- ElevenLabs Model: 3 credits/10s for all users
- Voice Clone Training: A2E free, Cartesia 100 credits, Minimax/ElevenLabs 200 credits
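To estimate credits before generating, the rates above can be turned into a small calculator. This Python sketch assumes that billing rounds up to whole 10-second blocks, which the guide does not state; the function and dictionary names are hypothetical.

```python
import math

# Credits per 10 seconds of generated audio, per the pricing list above.
RATE_PER_10S = {"A2E": 1, "Cartesia": 2, "Minimax": 2, "ElevenLabs": 3}
# Voice clone training cost in credits, per the pricing list above.
CLONE_TRAINING = {"A2E": 0, "Cartesia": 100, "Minimax": 200, "ElevenLabs": 200}


def generation_cost(model: str, seconds: float, vip_or_max: bool = False) -> int:
    """Estimate the credits needed for `seconds` of audio from `model`.

    Assumption: partial blocks are rounded up to a full 10 seconds.
    """
    if model == "A2E" and vip_or_max:
        return 0  # A2E is free for VIP/MAX users
    blocks = math.ceil(seconds / 10)
    return blocks * RATE_PER_10S[model]
```

Under this rounding assumption, 25 seconds of ElevenLabs audio is 3 blocks × 3 credits = 9 credits, while the same clip from A2E costs 3 credits for a regular user and nothing for a VIP/MAX user.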
Usage Tips
- Text optimization: Use punctuation carefully; it helps the engine produce a more natural speech rhythm
- Language matching: Ensure the selected language matches the text content to avoid pronunciation errors
- Voice selection: Choose appropriate voices based on content type (e.g., formal voices for news broadcasting)
- Long text handling: For texts longer than the 1000-character limit, split them into segments of reasonable length and generate each segment separately
- Special characters: Numbers, English abbreviations, and similar tokens are automatically converted to the corresponding pronunciation for the selected language
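For the long-text tip, a script longer than the 1000-character input limit can be split on the client side, and each segment generated separately. The Python sketch below is one possible approach, not a built-in feature: `split_text` is a hypothetical helper that packs whole sentences into segments of at most 1000 characters and hard-splits any single sentence that is longer. Its sentence detection is naive. It treats every `.`, `!`, `?`, `。`, `！`, or `？` as a sentence end, so a decimal such as 3.14 could be split if it happens to fall on a segment boundary.

```python
import re

MAX_CHARS = 1000  # per-request limit stated in this guide

# A "sentence" is a run of text ending in ASCII or CJK terminal punctuation,
# plus any trailing whitespace; a final fragment without punctuation also counts.
SENTENCE = re.compile(r"[^.!?。！？]*[.!?。！？]+\s*|[^.!?。！？]+")


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into segments of at most `limit` characters, breaking at sentence ends when possible."""
    chunks: list[str] = []
    current = ""
    for sentence in SENTENCE.findall(text):
        # Hard-split any single sentence that is longer than the limit on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Append the sentence to the current segment if it still fits; otherwise start a new one.
        if len(current) + len(sentence) <= limit:
            current += sentence
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    # Trim surrounding whitespace and drop segments that were whitespace only.
    return [c.strip() for c in chunks if c.strip()]
```

For example, `split_text("Hello world. How are you? Fine", limit=15)` returns `["Hello world.", "How are you?", "Fine"]`. Sentences are concatenated without inserting separators, so CJK text without spaces is preserved as written within each segment.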