OCR Extractor

Description

The OCR Extractor module uses the Tesseract.js library to extract text from images. It supports multiple languages and works with common image formats such as JPG and PNG. It is suited to automating the reading of invoices, delivery notes, scanned documents, or any other image that contains text. The image path can come from the node configuration or from the workflow input data.

Configuration

Parameter	Type	Required	Description
imagePath	text	Yes	Local path of the image to analyze (e.g. uploads/photo.jpg). May be left empty to take the path from the input data
lang	text	No	Tesseract language code for the text in the image, e.g. 'spa' for Spanish or 'eng' for English (default: spa)
persistent	boolean	No	Propagates the input data along with the result

Output

{
  "nextModule": "next_module",
  "data": {
    "content": "Text extracted from the image via OCR..."
  }
}
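
As a rough illustration of how the recognition result might map onto the content field, here is a minimal sketch. It is not the module's actual source: extractContent and fakeRecognize are hypothetical names, and the recognizer is passed in as an argument so the sketch runs without Tesseract.js installed. The only thing assumed about the recognizer is that it resolves to an object shaped like { data: { text } }, which is the shape Tesseract.js returns from recognize.

```javascript
// Hypothetical wrapper around the recognition step (not the module's real code).
// "recognize" is injected and is expected to resolve to { data: { text } }.
async function extractContent(recognize, imagePath, lang = "spa") {
  const result = await recognize(imagePath, lang);
  // Only the trimmed text is kept. nextModule is omitted because the
  // documentation does not say how it is chosen.
  return { data: { content: result.data.text.trim() } };
}

// Stand-in recognizer used for illustration only.
const fakeRecognize = async () => ({ data: { text: "  Invoice 001\n" } });

extractContent(fakeRecognize, "uploads/photo.jpg").then((out) => {
  console.log(JSON.stringify(out)); // prints {"data":{"content":"Invoice 001"}}
});
```

With Tesseract.js installed, the injected function could be something like `(img, lang) => Tesseract.recognize(img, lang)`.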

Usage Example

Basic case

{
  "imagePath": "/uploads/factura_001.jpg",
  "lang": "spa"
}

Using path from data

{
  "imagePath": "",
  "lang": "eng",
  "persistent": true
}

Because imagePath is empty in the configuration, the path is taken from data.imagePath or, failing that, from data.filePath.
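
The fallback can be sketched as follows. This is an illustration only: the function name resolveImagePath is hypothetical, and treating empty or whitespace-only strings as "not set" is an assumption based on the example above.

```javascript
// Hypothetical helper: pick the first usable path, checking
// config.imagePath, then data.imagePath, then data.filePath.
function resolveImagePath(config = {}, data = {}) {
  const candidates = [config.imagePath, data.imagePath, data.filePath];
  // Assumption: an empty or whitespace-only string counts as "not set".
  const found = candidates.find(
    (p) => typeof p === "string" && p.trim() !== ""
  );
  return found ?? null; // null when no source provides a path
}

// The node configuration leaves imagePath empty, so the workflow data is used.
console.log(resolveImagePath({ imagePath: "" }, { filePath: "uploads/scan.png" }));
// prints uploads/scan.png
```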

Notes

The image path is resolved in this order: config.imagePath, then data.imagePath, then data.filePath.
If the file does not exist at the resolved path, the module returns an error.
Supported languages include spa (Spanish), eng (English), fra (French), and deu (German), among others (see the Tesseract documentation).
The extracted text is returned in the content field with leading and trailing whitespace trimmed.
If persistent is true, the input data is preserved along with the result.
OCR accuracy depends on the image quality.
The module does not require credentials.
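
Two of the behaviors described in these notes, the missing-file error and the persistent option, can be sketched as below. The helper names, the error message, and the exact merge shape are assumptions: the documentation only says that the input data is kept "along with the result", so spreading the input fields next to content is one plausible reading, not a confirmed one.

```javascript
const fs = require("node:fs");

// Hypothetical helper: fail early when the image file is missing.
// The error message is illustrative, not the module's real message.
function assertImageExists(imagePath) {
  if (!fs.existsSync(imagePath)) {
    throw new Error(`Image not found: ${imagePath}`);
  }
}

// Hypothetical helper: build the "data" object of the output.
// Assumption: with persistent enabled, the input fields are kept and
// "content" is added alongside them; otherwise only "content" is returned.
function mergeResult(content, inputData = {}, persistent = false) {
  return persistent ? { ...inputData, content } : { content };
}

console.log(mergeResult("Invoice 001", { filePath: "uploads/scan.png" }, true));
```

Under this reading, a content field already present in the input data would be overwritten by the OCR result.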

PDF Extractor - Extract text from PDFs
Read Excel - Read data from Excel files