Voice To Text
Description
The openaiaudioToText module transcribes audio files to text using the OpenAI Whisper API. It receives the path of an audio file on the local file system, sends it to the OpenAI API, and returns the transcription as text. It is a generic transcription module that can be used directly in a workflow or invoked internally by other modules such as telegramVoiceToText. It supports multiple Whisper-compatible audio formats (ogg, mp3, wav, m4a, etc.).
Configuration
| Parameter | Type | Required | Description |
|---|---|---|---|
| credentials_id | credentials | Yes | Credential with OpenAI apiKey for the Whisper API. |
| audioPath | text | Yes | Absolute path of the audio file to transcribe on the server’s file system. |
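
As a rough illustration of the table above, the following Python sketch checks a configuration object before the module runs. The function name `validate_config` and the error messages are hypothetical; only the two parameter names and the absolute-path requirement come from the table, and the module's real validation may differ.

```python
import os.path

# Parameter names taken from the configuration table.
REQUIRED_PARAMS = ("credentials_id", "audioPath")


def validate_config(config: dict) -> dict:
    """Hypothetical pre-check: both parameters present, audioPath absolute."""
    missing = [name for name in REQUIRED_PARAMS if not config.get(name)]
    if missing:
        raise ValueError(f"missing required parameter(s): {', '.join(missing)}")
    # The table asks for an absolute path on the server's file system.
    if not os.path.isabs(config["audioPath"]):
        raise ValueError("audioPath must be an absolute path")
    return config
```

A check like this fails early with a readable message instead of letting a relative or missing path surface later as a file error.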
Credentials
A credential with the following field is required:
- apiKey: OpenAI API key with access to the Whisper model. Alternatively, the OPENAI_API_KEY environment variable can be configured.
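
The key lookup could be sketched in Python as follows. The helper name `resolve_api_key` and the decision to raise when no key is found are assumptions, not taken from the module's source. The order (credential first, then environment variable) follows the notes on this page.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(credential: Optional[Mapping[str, str]],
                    env: Mapping[str, str] = os.environ) -> str:
    """Return the credential's apiKey, else OPENAI_API_KEY from the environment."""
    key = (credential or {}).get("apiKey") or env.get("OPENAI_API_KEY")
    if not key:
        # Assumption: a missing key is treated as a hard error.
        raise RuntimeError("no OpenAI API key in the credential or in OPENAI_API_KEY")
    return key
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.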
Output

    {
      "nextModule": "siguiente_modulo",
      "data": {
        "transcript": "This is the text transcribed from the audio",
        "originalPath": "/ruta/al/archivo/audio.ogg"
      }
    }

Usage Example
Basic case

    {
      "label": "Voice To Text",
      "credentials_id": "credencial_openai",
      "audioPath": "/temporal/cli_1/archivo.ogg"
    }

API Used
- OpenAI Whisper API: POST https://api.openai.com/v1/audio/transcriptions
- Model: whisper-1
- Format: multipart/form-data with the audio file
- Documentation: https://platform.openai.com/docs/api-reference/audio/createTranscription
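
To make the request shape concrete, here is a minimal Python sketch that builds the same kind of request with only the standard library. It is not the module's implementation: `build_request` and `transcribe` are hypothetical names, the `application/octet-stream` part type is an assumption, and the response is assumed to be the API's default JSON body with a `text` field.

```python
import json
import os
import urllib.request
import uuid

WHISPER_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_request(audio_path: str, audio_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for one audio file."""
    boundary = uuid.uuid4().hex
    filename = os.path.basename(audio_path)
    # First form part: the model name.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n'
        "\r\n"
        "whisper-1\r\n"
    ).encode("utf-8")
    # Second form part: the audio file itself.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        WHISPER_URL,
        data=model_part + file_header + audio_bytes + closing,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(audio_path: str, api_key: str) -> dict:
    """Send the file and return the `data` part of the module's output shape."""
    with open(audio_path, "rb") as fh:
        request = build_request(audio_path, fh.read(), api_key)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response:
        text = json.load(response)["text"]
    return {"transcript": text, "originalPath": audio_path}
```

Only `transcribe` touches the network. `build_request` can be inspected offline, which is handy for checking the headers and form fields before spending API credits.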
- The audio file must exist on the server’s file system before executing the module. If it does not exist, an error is thrown.
- The API Key is first searched in the configured credentials (
config.apiKey) and then in theOPENAI_API_KEYenvironment variable. - Audio formats supported by Whisper: mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg.
- The
originalPathfield in the output contains the path of the processed file, useful for subsequent cleanup. - If a transcription error occurs, an exception is thrown (no soft error is returned).
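
The file-related notes above can be sketched as two small Python helpers. Both names (`check_audio_file`, `cleanup`) are hypothetical. Rejecting unsupported extensions locally is an assumption, since the page only lists the formats Whisper accepts and does not say the module checks them itself.

```python
import os

# Formats listed in the notes as supported by Whisper.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm", "ogg"}


def check_audio_file(audio_path: str) -> None:
    """Raise an exception rather than return a soft error."""
    if not os.path.isfile(audio_path):
        raise FileNotFoundError(f"audio file not found: {audio_path}")
    extension = os.path.splitext(audio_path)[1].lstrip(".").lower()
    if extension not in SUPPORTED_EXTENSIONS:
        # Assumption: unsupported formats are rejected before any API call.
        raise ValueError(f"unsupported audio format: {extension or '(none)'}")


def cleanup(output: dict) -> None:
    """Delete the processed file named by data.originalPath, if it still exists."""
    path = output["data"]["originalPath"]
    if os.path.exists(path):
        os.remove(path)
```

A downstream step could call `cleanup` on this module's output once the transcript has been stored, so temporary audio files do not accumulate on the server.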
Related Nodes
- telegramVoiceToText - Module that uses this module internally for Telegram audio
- telegramReceive - Trigger that provides audio files