OCR Extractor
Description
The OCR Extractor module uses the Tesseract.js library to extract text from images. It supports multiple languages and works with common image formats such as JPG and PNG. It is well suited to automating the reading of invoices, delivery notes, scanned documents, or any other image that contains text. The image path can come from the node configuration or from the workflow input data.
Configuration
| Parameter | Type | Required | Description |
|---|---|---|---|
| imagePath | text | Yes | Local path of the image to analyze (e.g. `uploads/photo.jpg`). May be left empty to take the path from the input data |
| lang | text | No | Language of the text in the image, e.g. `spa` for Spanish or `eng` for English (default: `spa`) |
| persistent | boolean | No | When enabled, passes the input data through along with the result |
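As a rough illustration, the parameters in the table could be modeled with a type like the one below. The names `OcrExtractorConfig` and `withDefaults` are hypothetical, and the `false` default for `persistent` is an assumption; only the `spa` default for `lang` comes from the table.

```typescript
// Hypothetical type mirroring the configuration table; not the module's actual source.
interface OcrExtractorConfig {
  imagePath: string;    // local path of the image to analyze
  lang?: string;        // Tesseract language code, e.g. "spa" or "eng"
  persistent?: boolean; // pass the input data through with the result
}

// Fills in optional parameters. The "spa" default is documented;
// the false default for persistent is an assumption.
function withDefaults(config: OcrExtractorConfig): Required<OcrExtractorConfig> {
  return {
    imagePath: config.imagePath,
    lang: config.lang ?? "spa",
    persistent: config.persistent ?? false,
  };
}
```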
Output
```json
{
  "nextModule": "siguiente_modulo",
  "data": {
    "content": "Text extracted from the image via OCR..."
  }
}
```
Usage Example
Basic case
```json
{
  "imagePath": "/uploads/factura_001.jpg",
  "lang": "spa"
}
```
Using path from data
```json
{
  "imagePath": "",
  "lang": "eng",
  "persistent": true
}
```
In this case, the path is taken from `data.imagePath` or `data.filePath`.
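The fallback to the input data can be sketched as a small lookup that tries `config.imagePath` first, then `data.imagePath`, then `data.filePath`. This is only an illustration: the function names are hypothetical, and treating an empty string as "not set" and throwing when no path is found are assumptions, not confirmed module behavior.

```typescript
// Hypothetical helpers; illustrative only, not the module's actual source.
type InputData = Record<string, unknown>;

// Returns the first candidate that is a non-empty string, if any.
function firstNonEmptyString(...candidates: unknown[]): string | undefined {
  for (const candidate of candidates) {
    if (typeof candidate === "string" && candidate.trim() !== "") {
      return candidate;
    }
  }
  return undefined;
}

// Lookup order: config.imagePath, then data.imagePath, then data.filePath.
function resolveImagePath(config: { imagePath?: string }, data: InputData): string {
  const path = firstNonEmptyString(
    config.imagePath,
    data["imagePath"],
    data["filePath"],
  );
  if (path === undefined) {
    // Assumption: a missing path is reported as an error.
    throw new Error("No image path in config.imagePath, data.imagePath, or data.filePath");
  }
  return path;
}
```

With the configuration above (`"imagePath": ""`), a call such as `resolveImagePath({ imagePath: "" }, { filePath: "uploads/scan.png" })` would fall through to `data.filePath`.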
- The image path is looked up in this order: `config.imagePath`, `data.imagePath`, `data.filePath`
- If the file does not exist at the specified path, the module returns an error
- Supported languages: `spa` (Spanish), `eng` (English), `fra` (French), `deu` (German), among others (see the Tesseract documentation)
- The extracted text is returned trimmed in the `content` field
- If `persistent` is enabled, the input data is preserved along with the result
- OCR accuracy depends on the image quality
- Does not require credentials
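The trimming and `persistent` behavior noted above could look roughly like the sketch below. The names `OcrOutput` and `buildOutput` are hypothetical, and the way the input data is merged (spread at the top level of `data`, with `content` added last) is an assumption about how "preserved along with the result" works.

```typescript
// Hypothetical output shape, based on the documented output example.
interface OcrOutput {
  nextModule: string;
  data: Record<string, unknown>;
}

// Builds the node output from the raw OCR text.
// Assumption: with persistent enabled, input fields are copied into data
// and content is written last, so it overrides any input field named content.
function buildOutput(
  rawText: string,
  input: Record<string, unknown>,
  persistent: boolean,
  nextModule: string,
): OcrOutput {
  const content = rawText.trim(); // the text is returned trimmed
  const data: Record<string, unknown> = persistent
    ? { ...input, content }
    : { content };
  return { nextModule, data };
}
```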
Related Nodes
- PDF Extractor - Extract text from PDFs
- Read Excel - Read data from Excel files