---
name: image-ocr
displayName: Image OCR
description: Extract and convert text from images (.png, .jpg, .tiff) using OCR.
  Use when you need to read, copy, or export text from scanned documents,
  screenshots, or photos.
tags:
  - ocr
  - image
  - text-extraction
  - png
  - jpg
  - tiff
  - converter
capabilities:
  - ReadImageText
  - ExtractTextFromImage
  - ConvertImageToJson
  - ConvertImageToCsv
representativeQueries:
  - Read text from a PNG file
  - Extract text from a scanned TIFF document
  - Convert image text to JSON or CSV
  - OCR a JPG photo and get the text out
  - What does this image say
version: 0.1.0
tier: curated
---

# Image OCR

Extract human-readable text from image files (.png, .jpg, .tiff) using OCR (Optical Character Recognition). The converter script wraps Tesseract via pytesseract and outputs plain text, JSON, or CSV to stdout.

## When to use

- Reading text from scanned documents, receipts, or photos
- Converting a screenshot or image containing a table into structured data
- Pulling text out of a .tiff exported from a scanner or fax
- Any workflow where the source is an image rather than a text file

## Steps

1. **Locate the source image.** Accept a file path to a .png, .jpg/.jpeg, or .tiff file.
2. **Open and pre-process.** Load with Pillow and convert to grayscale. This step is always applied and improves OCR accuracy on colour images. Additional steps such as upscaling to at least 300 DPI or thresholding can be added in `preprocess()` for low-quality scans.
3. **Run OCR.** Call `pytesseract.image_to_data()` on the image; its structured output contains word-level bounding boxes, confidence scores, and block/paragraph/line indices, which are used for all three output modes.
4. **Structure the output.** Emit plain text by default; pass `--format json` or `--format csv` to get word-level structured output with confidence scores.
5. **Handle multi-page TIFF.** If the file is a TIFF with multiple frames, iterate over the frames and either concatenate the text or label it by page number (the `page` field in structured output).
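
Steps 2 through 5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of `scripts/ocr_image.py`: the helper names `rows_to_records` and `ocr_image`, the `preprocess()` signature, and the rule for skipping empty rows are all hypothetical. Pillow and pytesseract are imported lazily so the pure conversion helper can be exercised without Tesseract installed.

```python
from __future__ import annotations

from typing import Any


def rows_to_records(data: dict[str, list[Any]], page: int) -> list[dict[str, Any]]:
    """Turn one page of pytesseract Output.DICT data into word records."""
    records = []
    for i, text in enumerate(data["text"]):
        word = str(text).strip()
        conf = float(data["conf"][i])  # may arrive as str, int, or float
        # Assumption: rows with empty text or conf -1 are layout rows, not words.
        if not word or conf < 0:
            continue
        records.append({
            "page": page,
            "word": word,
            "conf": round(conf),
            "left": int(data["left"][i]),
            "top": int(data["top"][i]),
            "width": int(data["width"][i]),
            "height": int(data["height"][i]),
        })
    return records


def preprocess(frame):
    """Grayscale only; upscaling or thresholding could be added here."""
    return frame.convert("L")


def ocr_image(path: str, lang: str = "eng") -> list[dict[str, Any]]:
    """OCR every frame of an image; a multi-page TIFF yields several frames."""
    # Imported here so the module still loads when the OCR stack is absent.
    from PIL import Image, ImageSequence
    import pytesseract

    records = []
    with Image.open(path) as img:
        for page, frame in enumerate(ImageSequence.Iterator(img), start=1):
            data = pytesseract.image_to_data(
                preprocess(frame), lang=lang, output_type=pytesseract.Output.DICT
            )
            records.extend(rows_to_records(data, page))
    return records
```

Single-frame formats such as PNG and JPEG pass through the same loop with one iteration, so every record carries `page` 1.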

## Output

- **plain** (default): raw extracted text printed to stdout.
- **json**: list of word records `[{"page": 1, "word": "...", "conf": 95, "left": 10, "top": 20, "width": 50, "height": 15}]`
- **csv**: same fields as CSV with a header row.
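
The three output modes could be rendered from the same list of word records along these lines. This is a sketch using only the standard library; the `render` function and `FIELDS` constant are hypothetical names, and the plain mode here simply joins words with spaces, which is an assumption and loses line breaks that the real script may preserve.

```python
import csv
import io
import json

# Field order taken from the JSON record shape described above.
FIELDS = ["page", "word", "conf", "left", "top", "width", "height"]


def render(records, fmt="plain"):
    """Return the records as plain text, a JSON list, or CSV with a header."""
    if fmt == "plain":
        return " ".join(r["word"] for r in records)
    if fmt == "json":
        return json.dumps(records, ensure_ascii=False)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    raise ValueError(f"unknown format: {fmt!r}")
```

Using `csv.DictWriter` rather than manual string joining means words that contain commas or quotes are escaped correctly.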

Exit code 0 on success, non-zero on error (missing file, missing library, Tesseract not installed).

## Notes

- Install both the system Tesseract binary and the Python wrapper before use; the script prints a one-line `pip install` hint if pytesseract is missing.
- For non-English documents, install the matching Tesseract language pack and pass `--lang <code>` (e.g. `--lang fra`).
- Image quality has the largest effect on accuracy: low-DPI or blurry scans produce poor results regardless of Tesseract settings.
- See `scripts/ocr_image.py --help` for all options.
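
Before running the script, a quick preflight check for the two dependencies mentioned above might look like this. It is a sketch only: `missing_dependencies` is a hypothetical helper, and the assumption that the binary is named `tesseract` and sits on `PATH` may not hold on every system.

```python
import importlib.util
import shutil


def missing_dependencies(binary="tesseract", modules=("pytesseract", "PIL")):
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    # shutil.which returns None when the executable is not found on PATH.
    if shutil.which(binary) is None:
        problems.append(f"{binary} binary not found on PATH")
    for name in modules:
        # find_spec returns None for a top-level module that is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python module {name!r} is not installed")
    return problems


if __name__ == "__main__":
    issues = missing_dependencies()
    for issue in issues:
        print(issue)
    # Non-zero exit when anything is missing, mirroring the script's convention.
    raise SystemExit(1 if issues else 0)
```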

<!-- runner-fallback -->
## Remote runner (MCP)
Can't run this locally (no setup, missing dependency)? The StealthStack runner exposes the **same code** as server-side MCP tools — no local install needed: `image-ocr_extract_text`, `image-ocr_extract_to_json`, `image-ocr_extract_to_csv`. Call the `application/mcp` catalog twin of this skill (its `runnerTwin`).
