
Voice To Text

The openaiaudioToText module transcribes audio files to text using the OpenAI Whisper API. It receives the path of an audio file on the local file system, sends the file to the OpenAI API, and returns the transcription as text. It is a generic transcription module: it can be used directly in a workflow or invoked internally by other modules such as telegramVoiceToText. It accepts the audio formats supported by Whisper, including ogg, mp3, wav, and m4a.
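The request flow described above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the module's actual implementation: the helper names build_multipart and transcribe are hypothetical, and the model name whisper-1 is assumed rather than taken from this page. The endpoint and the file and model form fields are those of the public OpenAI transcription API.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# Public OpenAI transcription endpoint.
API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_multipart(audio_path: str, model: str = "whisper-1") -> tuple[bytes, str]:
    """Encode the model name and the audio file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    The default model name is an assumption made for this sketch.
    """
    path = Path(audio_path)
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + path.read_bytes() + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path: str, api_key: str) -> str:
    """Send the audio file to the API and return the transcribed text."""
    body, content_type = build_multipart(audio_path)
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
    # urlopen raises on HTTP errors, so failures surface as exceptions.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

Keeping the body encoding separate from the network call makes the encoding step easy to check without contacting the API.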

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| credentials_id | credentials | Yes | Credential with the OpenAI apiKey for the Whisper API. |
| audioPath | text | Yes | Absolute path of the audio file to transcribe on the server’s file system. |

A credential with the following field is required:

  • apiKey: OpenAI API Key with access to the Whisper model. Alternatively, the OPENAI_API_KEY environment variable can be configured.
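The lookup order (credential first, then the OPENAI_API_KEY environment variable) could look roughly like the following. The function name resolve_api_key is hypothetical, and raising ValueError when neither source provides a key is an assumption of this sketch, since the page does not say what the module does in that case.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    config: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the API key from the credential, else from the environment."""
    env = os.environ if env is None else env
    # The credential's apiKey takes precedence over the environment variable.
    key = config.get("apiKey") or env.get("OPENAI_API_KEY")
    if not key:
        # Assumed behaviour for this sketch: fail loudly when no key is found.
        raise ValueError("No OpenAI API key in the credential or in OPENAI_API_KEY")
    return key
```

Passing the environment as a parameter keeps the function easy to exercise without modifying the real process environment.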
On success, the module returns:

{
  "nextModule": "next_module",
  "data": {
    "transcript": "This is the text transcribed from the audio",
    "originalPath": "/path/to/file/audio.ogg"
  }
}

Example configuration:

{
  "label": "Voice To Text",
  "credentials_id": "credencial_openai",
  "audioPath": "/temporal/cli_1/archivo.ogg"
}
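To show how the configuration maps to the output, here is a minimal sketch of the module's behaviour. The function run_voice_to_text is hypothetical, the transcriber is injected as a callable, and passing next_module as an argument is an assumption, because this page does not describe how nextModule is chosen.

```python
import os
from typing import Callable


def run_voice_to_text(
    config: dict,
    transcribe: Callable[[str], str],
    next_module: str,
) -> dict:
    """Transcribe config["audioPath"] and build the documented output shape."""
    audio_path = config["audioPath"]
    # The file must already exist on the local file system.
    if not os.path.isfile(audio_path):
        raise FileNotFoundError(f"Audio file not found: {audio_path}")
    # Any exception raised by the transcriber propagates to the caller.
    transcript = transcribe(audio_path)
    return {
        "nextModule": next_module,
        "data": {"transcript": transcript, "originalPath": audio_path},
    }
```

A stub transcriber such as `lambda path: "hello"` is enough to try the output shape without calling the API.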
Notes:

  • The audio file must exist on the server’s file system before the module runs. If it does not exist, an error is thrown.
  • The API key is looked up first in the configured credential (config.apiKey) and, if it is not set there, in the OPENAI_API_KEY environment variable.
  • Audio formats supported by Whisper: mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg.
  • The originalPath field in the output contains the path of the processed file, which is useful for subsequent cleanup.
  • If a transcription error occurs, an exception is thrown; the module does not return a soft error in its output.
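The format list and the cleanup hint above can be sketched from the caller's side as follows. Both helpers are hypothetical and are not part of the module: is_supported_audio checks only the file extension, which is an assumption about how a caller might pre-validate input, and cleanup_audio deletes the file named by originalPath.

```python
import os
from pathlib import Path

# Extensions taken from the list of formats supported by Whisper.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm", ".ogg"}


def is_supported_audio(path: str) -> bool:
    """Check the file extension against the supported formats (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def cleanup_audio(output: dict) -> bool:
    """Delete the processed file named by originalPath; return True if removed."""
    original = output.get("data", {}).get("originalPath")
    if original and os.path.isfile(original):
        os.remove(original)
        return True
    return False
```

Because the module throws on failure instead of returning an error value, a caller would typically wrap the invocation in try/except and run the cleanup only once the transcript has been consumed.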
Related modules:

  • telegramVoiceToText - Module that calls this module internally to transcribe Telegram audio
  • telegramReceive - Trigger that provides audio files