Telegram Voice to Text

Description

The telegramVoiceToText module receives a voice message from Telegram (usually from the telegramReceive node), downloads the audio file from Telegram's servers, transcribes it with the OpenAI Whisper API, and sends the transcription back to the user's Telegram chat. It also rewrites the output data: type changes from voice to text and content becomes the transcription, so subsequent nodes can process the message as ordinary text.
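The flow described above can be sketched as a small Python function. This is an illustration under stated assumptions, not the module's actual source: the name voice_to_text and the four injected callables (get_file_path, download, transcribe, send_message) are hypothetical stand-ins for the Telegram and OpenAI calls, used here so the order of the steps and the shape of the result can be read without any network code.

```python
from typing import Any, Callable, Dict


def voice_to_text(
    data: Dict[str, Any],
    get_file_path: Callable[[str, str], str],
    download: Callable[[str, str], bytes],
    transcribe: Callable[[bytes], str],
    send_message: Callable[[str, int, str], None],
) -> Dict[str, Any]:
    """Hypothetical sketch of the steps listed in the description."""
    token = data["botToken"]
    file_id = data["content"]  # for a voice message, content holds the file_id

    file_path = get_file_path(token, file_id)  # 1. resolve the file path
    audio = download(token, file_path)         # 2. download the audio
    text = transcribe(audio)                   # 3. transcribe it
    send_message(token, data["chatId"], text)  # 4. reply in the same chat

    # 5. Rewrite the message so later nodes see ordinary text.
    return {**data, "type": "text", "content": text, "transcript": text}
```

Passing the network steps in as arguments is only a convenience for the sketch; it keeps the example runnable with fake functions and says nothing about how the real module is organized.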

Configuration

Parameter	Type	Required	Description
credentials_id	credentials	Yes	Credential with OpenAI apiKey for converting audio to text.

Credentials

A credential with the following field is required:

apiKey: OpenAI API Key with access to the Whisper model for audio transcription.

Additionally, the node expects the input data to contain botToken (Telegram bot token, usually propagated from telegramReceive).
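A check along these lines could run before any network call. It is a minimal sketch: the field names apiKey, botToken, and content come from this page (content is listed under Notes), while the helper name missing_inputs and the labels it returns are assumptions made for illustration.

```python
from typing import Any, Dict, List


def missing_inputs(credential: Dict[str, Any], data: Dict[str, Any]) -> List[str]:
    """Return the names of required values that are absent or empty."""
    missing = []
    if not credential.get("apiKey"):
        missing.append("credential.apiKey")
    for field in ("botToken", "content"):
        if not data.get(field):
            missing.append(f"data.{field}")
    return missing
```

An empty list means the node has everything it needs; otherwise the list names what to fix, for example a botToken that was not propagated from telegramReceive.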

Output

{
  "nextModule": "next_module",
  "data": {
    "type": "text",
    "chatId": 123456789,
    "from": { "id": 123456789, "first_name": "Juan" },
    "botToken": "123456:ABCdefGHI",
    "content": "This is the text transcribed from the audio",
    "transcript": "This is the text transcribed from the audio",
    "metadata": { "duration": 15, "mime_type": "audio/ogg" }
  }
}

Usage Example

Basic case

{
  "label": "Telegram VoiceToText",
  "credentials_id": "openai_credential"
}

API Used

Telegram Bot API: GET https://api.telegram.org/bot{token}/getFile to get the file path, and POST https://api.telegram.org/bot{token}/sendMessage to send the transcription.
OpenAI Whisper API: POST https://api.openai.com/v1/audio/transcriptions to transcribe the audio.
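The endpoints above can be assembled as follows. This is a sketch with hypothetical helper names. Two details are not stated on this page and are labeled as assumptions in the code: the file download URL, which follows the pattern given in the Telegram Bot API documentation, and the model name whisper-1. Whether the module uses exactly these values is not confirmed here.

```python
from typing import Any, Dict
from urllib.parse import urlencode

TELEGRAM = "https://api.telegram.org"
OPENAI_TRANSCRIPTIONS = "https://api.openai.com/v1/audio/transcriptions"


def get_file_url(token: str, file_id: str) -> str:
    """GET URL whose JSON response carries result.file_path for the audio."""
    return f"{TELEGRAM}/bot{token}/getFile?{urlencode({'file_id': file_id})}"


def download_url(token: str, file_path: str) -> str:
    """Download URL (assumption: pattern from the Telegram docs, not this page)."""
    return f"{TELEGRAM}/file/bot{token}/{file_path}"


def send_message_request(token: str, chat_id: int, text: str) -> Dict[str, Any]:
    """URL and JSON body for sendMessage with Markdown parsing enabled."""
    return {
        "url": f"{TELEGRAM}/bot{token}/sendMessage",
        "json": {"chat_id": chat_id, "text": text, "parse_mode": "Markdown"},
    }


def whisper_request(api_key: str) -> Dict[str, Any]:
    """URL, headers, and form fields for the transcription call (file part omitted)."""
    return {
        "url": OPENAI_TRANSCRIPTIONS,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "data": {"model": "whisper-1"},  # assumed model name
    }
```

The helpers only build request descriptions; sending them is left to whatever HTTP client the surrounding system uses.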

Notes

Requires input data to contain content (audio file_id) and botToken (Telegram bot token).
The audio file is temporarily downloaded to the temporal/cli_{client_id}/ folder on the server.
The transcription is automatically sent to the user on Telegram in Markdown format.
The data type is changed from voice to text in the output, facilitating subsequent processing.
The transcript field is added to the existing data and holds the same text as content.
Internally uses the openaiaudioToText module for transcription.
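The temporary download mentioned in the notes could be handled roughly like this. The folder pattern temporal/cli_{client_id}/ is taken from this page; the helper names, the .oga extension, the file naming by file_id, and the deletion of the file after use are all assumptions made for the sketch.

```python
from pathlib import Path
from typing import Any, Callable


def temp_audio_path(base: Path, client_id: str, file_id: str) -> Path:
    """Build temporal/cli_{client_id}/<file_id>.oga under base, creating the folder."""
    folder = base / "temporal" / f"cli_{client_id}"
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{file_id}.oga"  # assumed file name and extension


def with_temp_audio(path: Path, audio: bytes, use: Callable[[Path], Any]) -> Any:
    """Write the audio, hand the path to `use`, and always delete the file."""
    path.write_bytes(audio)
    try:
        return use(path)
    finally:
        path.unlink(missing_ok=True)  # assumed cleanup; requires Python 3.8+
```

Deleting in a finally block keeps the folder from accumulating audio when transcription fails partway through.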

Related Modules

telegramReceive (Telegram message trigger, provides input data)
openaiaudioToText (transcription module used internally)
decision (to evaluate the transcribed content)