OCR Extractor

The OCR Extractor module uses the Tesseract.js library to extract text from images. It supports multiple languages and works with common image formats such as JPG and PNG. Typical uses include automating the reading of invoices, delivery notes, scanned documents, or any other image that contains text. The image path can come from the node configuration or from the workflow input data.

Parameter  | Type    | Required | Description
imagePath  | text    | Yes      | Local path of the image to analyze (e.g. uploads/photo.jpg)
lang       | text    | No       | Language of the text in the image, e.g. 'spa' for Spanish or 'eng' for English (default: spa)
persistent | boolean | No       | Propagates the input data along with the result
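
The parameter table can be read as a small configuration type. The sketch below is illustrative only: the names `OcrExtractorConfig` and `withDefaults` are hypothetical and do not come from the module. The `spa` default is taken from the table, while treating an omitted `persistent` as `false` is an assumption.

```typescript
// Hypothetical type mirroring the parameter table above;
// the module's real internal types are not shown in this document.
interface OcrExtractorConfig {
  imagePath: string;     // required, although it may be an empty string
  lang?: string;         // optional Tesseract language code
  persistent?: boolean;  // optional
}

// Fills in defaults for the optional parameters.
// "spa" is the documented default; persistent = false is an assumption.
function withDefaults(config: OcrExtractorConfig): Required<OcrExtractorConfig> {
  return {
    imagePath: config.imagePath,
    lang: config.lang ?? "spa",
    persistent: config.persistent ?? false,
  };
}
```

For example, `withDefaults({ imagePath: "uploads/photo.jpg" })` would yield a configuration with `lang` set to `"spa"`.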
Output example:

{
  "nextModule": "siguiente_modulo",
  "data": {
    "content": "Texto extraido de la imagen mediante OCR..."
  }
}

Configuration example:

{
  "imagePath": "/uploads/factura_001.jpg",
  "lang": "spa"
}

Configuration example with an empty imagePath:

{
  "imagePath": "",
  "lang": "eng",
  "persistent": true
}

In this case imagePath is empty in the configuration, so the path is taken from data.imagePath or, if that is missing, from data.filePath.
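
The fallback just described can be sketched as a small lookup function. This is not the module's actual code: `resolveImagePath` and `WorkflowData` are hypothetical names, and treating empty or whitespace-only strings as "not provided" is an assumption based on the empty `imagePath` in the example.

```typescript
// Hypothetical shape of the workflow input data; only the two
// path fields mentioned in this document are modelled explicitly.
interface WorkflowData {
  imagePath?: string;
  filePath?: string;
  [key: string]: unknown;
}

// Returns the first usable path, checking the configuration value first,
// then data.imagePath, then data.filePath.
// Assumption: an empty or whitespace-only string counts as missing.
function resolveImagePath(
  configPath: string | undefined,
  data: WorkflowData
): string | undefined {
  for (const candidate of [configPath, data.imagePath, data.filePath]) {
    if (typeof candidate === "string" && candidate.trim() !== "") {
      return candidate;
    }
  }
  return undefined; // no path available; the caller decides how to report this
}
```

With the configuration above and input data containing only `filePath`, the function would return that `filePath` value.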

  • The image path is resolved in this order: config.imagePath, data.imagePath, data.filePath
  • If the file does not exist at the resolved path, the module returns an error
  • Supported languages: spa (Spanish), eng (English), fra (French), deu (German), among others (see the Tesseract documentation)
  • The extracted text is returned in the content field with leading and trailing whitespace trimmed
  • If persistent is true, the input data is preserved along with the result
  • OCR accuracy depends on the quality of the image
  • The module does not require credentials
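
The notes on trimming and on persistent can be illustrated with a sketch of how the output object might be assembled. Everything here is an assumption for illustration: `buildOutput` and `OcrData` are hypothetical names, and the document does not specify how the input data is combined with the result, so merging it into `data` next to `content` is only one plausible layout.

```typescript
// Hypothetical output payload: always has content, and may carry
// the input fields when persistent is enabled.
type OcrData = { content: string; [key: string]: unknown };

interface ModuleOutput {
  nextModule: string;
  data: OcrData;
}

// Builds the module output from the raw OCR text.
// Assumption: with persistent = true the input data is spread into `data`;
// content is written last so it replaces any input field of the same name.
function buildOutput(
  rawText: string,
  inputData: Record<string, unknown>,
  persistent: boolean,
  nextModule: string
): ModuleOutput {
  const content = rawText.trim(); // strip leading/trailing whitespace
  const data: OcrData = persistent ? { ...inputData, content } : { content };
  return { nextModule, data };
}
```

Under these assumptions, calling `buildOutput` with `persistent` set to `false` returns a `data` object that contains only the trimmed `content`.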