Text-to-Speech


Text-to-Speech User Guide

Text-to-Speech Feature Introduction

Text-to-Speech (TTS) is an AI technology that converts written text into natural-sounding speech. Our system supports many languages and voices and uses neural network models to synthesize high-quality speech.
Supported Languages and Voices
  • Multi-language support: More than 80 languages, including Chinese, English, Japanese, German, French, Spanish, Korean, Arabic, Russian, Dutch, Italian, Polish, and Portuguese
  • Rich voice library: More than 400 built-in high-quality voices, covering male and female speakers of different age groups
  • Voice cloning: Use your own custom voices (each custom voice must be trained separately before use)
  • Multiple models: Supports A2E, Cartesia, Minimax, ElevenLabs, and other TTS engines
How to Use
Input Text Content
Enter the text you want to convert in the text box (up to 1,000 characters).
Select Language and Voice
Choose the language of your text, then pick a voice from that language's voice list.
Adjust Speed (Optional)
Adjust the speaking speed as needed, from 0.5x to 2.0x.
Generate and Download
Click the Generate button. The generated audio file appears in your history, where you can play and download it.
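The steps above impose two hard limits: a 1,000-character text box and a 0.5x to 2.0x speed range. If you prepare requests in a script before pasting them into the tool, a minimal sketch like the one below can catch problems early. The `TTSRequest` class and `validate` function are hypothetical names for illustration, not part of the product. It also assumes the character counter counts Unicode code points the way Python's `len` does, which the guide does not confirm.

```python
from dataclasses import dataclass

MAX_CHARS = 1000                  # text box limit stated in this guide
MIN_SPEED, MAX_SPEED = 0.5, 2.0   # speed range stated in this guide


@dataclass
class TTSRequest:
    """Hypothetical request object mirroring the four steps above."""
    text: str
    language: str
    voice: str
    speed: float = 1.0


def validate(req: TTSRequest) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if not req.text.strip():
        problems.append("text is empty")
    # ASSUMPTION: the on-page counter counts code points, like len().
    if len(req.text) > MAX_CHARS:
        problems.append(f"text has {len(req.text)} characters; limit is {MAX_CHARS}")
    if not req.language:
        problems.append("no language selected")
    if not req.voice:
        problems.append("no voice selected")
    if not (MIN_SPEED <= req.speed <= MAX_SPEED):
        problems.append(f"speed {req.speed} is outside {MIN_SPEED}-{MAX_SPEED}x")
    return problems
```

For example, `validate(TTSRequest("Hello", "English", "my-voice", speed=2.5))` reports only the out-of-range speed.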
Pricing Information
  • A2E model: Free for VIP/MAX users; 1 credit per 10 seconds of audio for regular users
  • Cartesia and Minimax models: 2 credits per 10 seconds of audio for all users
  • ElevenLabs model: 3 credits per 10 seconds of audio for all users
  • Voice clone training: Free with A2E; 100 credits with Cartesia; 200 credits with Minimax or ElevenLabs
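To estimate what a clip will cost, the per-10-second rates above can be turned into a small calculator. This is a sketch, not the billing system itself: the guide does not say how partial 10-second blocks are charged, so the code assumes any partial block is rounded up to a full one. The function name `generation_cost` is hypothetical.

```python
import math

# Credits per 10 seconds of generated audio, from the pricing list.
RATES_PER_10S = {
    "A2E": 1,          # regular users; VIP/MAX users pay 0 (handled below)
    "Cartesia": 2,
    "Minimax": 2,
    "ElevenLabs": 3,
}

# One-off voice clone training fees, from the same list.
CLONE_TRAINING_CREDITS = {"A2E": 0, "Cartesia": 100, "Minimax": 200, "ElevenLabs": 200}


def generation_cost(model: str, seconds: float, vip_or_max: bool = False) -> int:
    """Estimate credits for `seconds` of generated audio.

    ASSUMPTION: a partial 10-second block is billed as a full block
    (rounded up). The guide does not state the rounding rule.
    """
    if model == "A2E" and vip_or_max:
        return 0
    blocks = math.ceil(seconds / 10)
    return blocks * RATES_PER_10S[model]
```

Under that rounding assumption, 25 seconds of ElevenLabs audio counts as three 10-second blocks, so `generation_cost("ElevenLabs", 25)` returns 9.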
Usage Tips
  • Text optimization: Use punctuation correctly; it helps the engine produce a more natural speech rhythm
  • Language matching: Make sure the selected language matches the language of the text to avoid mispronunciations
  • Voice selection: Choose a voice that suits the content (e.g., a formal voice for news broadcasts)
  • Long text handling: For long texts, split the content into segments of reasonable length (each within the 1,000-character limit) and generate them one at a time
  • Special characters: Numbers, English abbreviations, and similar items are automatically converted into spoken form according to the selected language
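The guide does not prescribe how to split long texts. One common approach, sketched below, greedily packs whole sentences into segments no longer than the 1,000-character input limit, so that cuts fall at natural pauses. The set of sentence-ending characters (Latin `.!?` and CJK `。！？`) and the name `split_text` are assumptions for illustration. Segments are concatenated without adding separators, so joining them reproduces the original text exactly.

```python
import re

MAX_CHARS = 1000  # per-request limit stated in this guide

# A "sentence" is text up to one or more terminators plus trailing whitespace,
# or a final run of text with no terminator. ASSUMED terminator set.
_SENTENCE = re.compile(r"[^.!?。！？]*[.!?。！？]+\s*|[^.!?。！？]+\Z")


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack whole sentences into segments of at most max_chars."""
    segments: list[str] = []
    current = ""
    for sentence in _SENTENCE.findall(text):
        # A single sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > max_chars:
            if current:
                segments.append(current)
                current = ""
            segments.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if len(current) + len(sentence) <= max_chars:
            current += sentence
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each returned segment can then be pasted into the text box and generated separately. Note that a period inside a number such as "3.5" is treated as a sentence end, so a cut may occasionally land there; a stricter terminator rule would avoid that at the cost of more complexity.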