For the Speech-To-Text (STT) option, you need to specify the Uniform Resource Identifier (URI) of the audio file, along with its language code (in BCP-47 format) and its extension (mp3, wav, etc.). The STT module then transcribes the audio to text, breaking it into paragraphs according to speech intervals.
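As a rough illustration of the three STT inputs, the sketch below assembles and sanity-checks a request payload in Python. The function name `build_stt_request` and the field names `uri`, `language_code`, and `extension` are assumptions made for this example, not the documented request schema, and the language-code check is a simplified pattern rather than a full BCP-47 validator.

```python
import re
from urllib.parse import urlparse

# Simplified shape check for tags such as "en-US" or "pt-BR".
# This is NOT a complete BCP-47 validator.
_LANG_RE = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*$")


def build_stt_request(audio_uri, language_code, extension):
    """Assemble a hypothetical STT payload (field names are illustrative)."""
    if not _LANG_RE.match(language_code):
        raise ValueError(f"not a BCP-47-style language code: {language_code!r}")

    parsed = urlparse(audio_uri)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"audio URI must be absolute: {audio_uri!r}")

    # Normalize ".MP3" or "MP3" to "mp3".
    ext = extension.lower().lstrip(".")
    if not ext:
        raise ValueError("audio extension must not be empty")

    return {"uri": audio_uri, "language_code": language_code, "extension": ext}
```

For instance, `build_stt_request("https://example.com/audio/call.mp3", "en-US", ".MP3")` would yield a payload whose `extension` is normalized to `mp3`. Validating before sending is a design choice here; the service may perform its own checks.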
The Text-To-Speech (TTS) module takes a language code and a text as inputs and produces mp3 audio, which is returned as a Base64 string. Optionally, you can specify the voice pitch, the speaking rate (slower or faster), and the volume gain in decibels (dB). The TTS module then converts the text into audio using a standard voice. If you have any particular requirements (for example, personalizing the voice of the text-to-audio functionality) or any questions, please contact Sinch. If you are an existing customer, reach out to your Sinch Account Manager.
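Because the TTS audio comes back as a Base64 string, a client has to decode it before saving or playing the mp3. The Python sketch below shows one way to build the inputs and decode the result. The function names and the field names (`text`, `language_code`, `pitch`, `speaking_rate`, `volume_gain_db`) are hypothetical, chosen only for illustration; the actual request and response schema may differ.

```python
import base64


def build_tts_request(text, language_code, pitch=None,
                      speaking_rate=None, volume_gain_db=None):
    """Assemble a hypothetical TTS payload (field names are illustrative)."""
    payload = {"text": text, "language_code": language_code}
    optional = {
        "pitch": pitch,
        "speaking_rate": speaking_rate,
        "volume_gain_db": volume_gain_db,
    }
    # Only include the optional settings the caller actually provided.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


def decode_mp3(b64_audio):
    """Turn the Base64 string returned by TTS into raw mp3 bytes."""
    return base64.b64decode(b64_audio)


def save_mp3(b64_audio, path):
    """Decode the Base64 audio and write it to a file."""
    with open(path, "wb") as fh:
        fh.write(decode_mp3(b64_audio))
```

A caller might build the request with `build_tts_request("Hello", "en-US", speaking_rate=1.25)`, send it however the integration requires, and then pass the returned Base64 string to `save_mp3(audio_b64, "hello.mp3")`.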
For both STT and TTS, you can also include NLU analysis to extract insights from the message content.