Voice To Text

Description

The openaiaudioToText module transcribes audio files to text using the OpenAI Whisper API. It takes the path of an audio file on the local file system, sends the file to the OpenAI API, and returns the transcription as text. It is a general-purpose transcription module: it can be used directly in a workflow or called internally by other modules such as telegramVoiceToText. It accepts the audio formats supported by Whisper (ogg, mp3, wav, m4a, etc.; see Notes for the full list).
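The flow described above (check the path, read the file, transcribe it, return the text) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the module's actual implementation: the function name `voice_to_text` and the injected `transcribe` callable, which stands in for the real HTTP call to Whisper, are hypothetical and chosen so the example runs without network access.

```python
import os
from typing import Callable, Dict


def voice_to_text(
    audio_path: str, transcribe: Callable[[bytes, str], str]
) -> Dict[str, str]:
    """Hypothetical sketch: read a local audio file, transcribe it, return the result."""
    # The file must already exist on the local file system; otherwise raise.
    if not os.path.isfile(audio_path):
        raise FileNotFoundError(f"Audio file not found: {audio_path}")

    with open(audio_path, "rb") as f:
        audio_bytes = f.read()

    # `transcribe` stands in for the HTTP call to the Whisper API (assumption);
    # it receives the raw bytes and the file name, and returns the text.
    transcript = transcribe(audio_bytes, os.path.basename(audio_path))

    # Keep the original path so the caller can delete the file afterwards.
    return {"transcript": transcript, "originalPath": audio_path}
```

Any exception raised by `transcribe` propagates unchanged, which matches the documented behavior of throwing on transcription errors rather than returning a soft error.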

Configuration

Parameter	Type	Required	Description
credentials_id	credentials	Yes	Credential containing the OpenAI apiKey used to call the Whisper API.
audioPath	text	Yes	Absolute path, on the server's file system, of the audio file to transcribe.

Credentials

A credential with the following field is required:

apiKey: OpenAI API key with access to the Whisper model. Alternatively, the key can be supplied through the OPENAI_API_KEY environment variable.
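The key lookup order stated in the Notes (credential first, then the OPENAI_API_KEY environment variable) can be expressed as a small helper. This is an illustrative sketch only: the name `resolve_api_key`, the treatment of an empty apiKey as missing, and the `ValueError` raised when neither source is set are assumptions, not documented behavior of the module.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    config: Mapping[str, str], env: Optional[Mapping[str, str]] = None
) -> str:
    """Hypothetical helper: credential apiKey first, then OPENAI_API_KEY."""
    if env is None:
        env = os.environ
    # 1. Prefer the apiKey stored in the configured credential (empty counts as unset).
    key = config.get("apiKey")
    if key:
        return key
    # 2. Fall back to the OPENAI_API_KEY environment variable.
    key = env.get("OPENAI_API_KEY")
    if key:
        return key
    # Assumed failure mode when neither source provides a key.
    raise ValueError("No OpenAI API key: set apiKey in the credential or OPENAI_API_KEY")
```

Passing `env` explicitly keeps the helper testable without touching the real process environment.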

Output

{
  "nextModule": "next_module",
  "data": {
    "transcript": "This is the text transcribed from the audio",
    "originalPath": "/path/to/audio/file.ogg"
  }
}

Usage Example

Basic case

{
  "label": "Voice To Text",
  "credentials_id": "openai_credential",
  "audioPath": "/temporal/cli_1/archivo.ogg"
}

API Used

OpenAI Whisper API: POST https://api.openai.com/v1/audio/transcriptions
Model: whisper-1
Format: multipart/form-data with the audio file
Documentation: https://platform.openai.com/docs/api-reference/audio/createTranscription
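The request described above (a multipart/form-data POST carrying the model name and the audio file) can be built with the Python standard library alone. This is a sketch, not the module's code: the function name `build_transcription_request` is hypothetical, and sending the file part as `application/octet-stream` is a simplification.

```python
import os
import urllib.request
import uuid

TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(
    audio_path: str, api_key: str, model: str = "whisper-1"
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for the Whisper endpoint."""
    boundary = uuid.uuid4().hex
    with open(audio_path, "rb") as f:
        audio_bytes = f.read()
    filename = os.path.basename(audio_path)

    # Two form fields: "model" (text) and "file" (the raw audio bytes).
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")

    return urllib.request.Request(
        TRANSCRIPTIONS_URL,
        data=head + audio_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would send it; with the default response format, the JSON response carries the transcription in its `text` field.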

Notes

The audio file must exist on the server’s file system before executing the module. If it does not exist, an error is thrown.
The API key is looked up first in the configured credential (config.apiKey) and then in the OPENAI_API_KEY environment variable.
Audio formats supported by Whisper: mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg.
The originalPath field in the output contains the path of the processed file, useful for subsequent cleanup.
If a transcription error occurs, an exception is thrown (no soft error is returned).
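The notes on supported formats and on originalPath suggest a pattern for callers: check the format before invoking the module, then delete the audio file once the transcript has been used. The sketch below is hypothetical. The helper names are illustrative, the extension check inspects only the file name (not its contents), and the module itself is documented only to throw when the file is missing or transcription fails, not to validate formats.

```python
import os

# Formats listed in the Notes as supported by Whisper.
WHISPER_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm", "ogg"}


def is_supported_format(audio_path: str) -> bool:
    """Hypothetical pre-check based on the file extension only (not the file contents)."""
    ext = os.path.splitext(audio_path)[1].lstrip(".").lower()
    return ext in WHISPER_EXTENSIONS


def cleanup_processed_audio(output: dict) -> bool:
    """Delete the file named by data.originalPath; return True if a file was removed."""
    path = output["data"]["originalPath"]
    if os.path.isfile(path):
        os.remove(path)
        return True
    return False
```

Returning a boolean from the cleanup helper makes a second call harmless, which is convenient when several downstream steps may try to clean up the same file.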

Related Modules

telegramVoiceToText - Module that uses openaiaudioToText internally to transcribe Telegram audio
telegramReceive - Trigger that provides audio files