---
name: epub
displayName: EPUB Reader & Converter
description: Reads and converts EPUB e-books to plain text, JSON, or CSV. Use
  when opening .epub files, extracting book content, or converting e-books to
  structured data.
tags:
  - epub
  - ebook
  - document
  - converter
  - text-extraction
capabilities:
  - ReadEpub
  - ConvertToJson
  - ConvertToCsv
  - ConvertToText
  - ExtractMetadata
representativeQueries:
  - Read a .epub file and show me the content
  - Convert an epub to JSON or CSV
  - Extract text from an epub e-book
  - Get the chapter list from an epub
  - What metadata is in this epub file?
version: 0.1.0
tier: curated
---

# EPUB Reader & Converter

Reads `.epub` e-book files and converts their content to plain text, JSON (metadata plus chapters), or CSV (one row per chapter). An EPUB is a ZIP archive containing XHTML chapter files, an OPF package document (metadata, manifest, and spine), and a table of contents (an NCX file in EPUB 2; an XHTML nav document in EPUB 3, which may also ship an NCX for backward compatibility). This skill extracts prose in spine (reading) order and surfaces title, author, and chapter structure.
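To make that structure concrete, here is a minimal sketch (not the bundled script's code) of the first hop every EPUB reader takes: reading `META-INF/container.xml` to find where the OPF package document lives. The name `find_opf_path` is hypothetical, and the sketch assumes the first `<rootfile>` entry is the one to use.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by META-INF/container.xml in EPUB archives.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(zf: zipfile.ZipFile) -> str:
    """Return the archive path of the OPF package document.

    Hypothetical helper: reads META-INF/container.xml and returns the
    full-path attribute of the first <rootfile> element.
    """
    root = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no <rootfile> entry")
    return rootfile.attrib["full-path"]


if __name__ == "__main__":
    # Build a tiny in-memory archive so the example runs without a real book.
    container = (
        '<?xml version="1.0"?>'
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles>'
        "</container>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("META-INF/container.xml", container)
    with zipfile.ZipFile(buf) as zf:
        print(find_opf_path(zf))  # OEBPS/content.opf
```

Everything else (manifest, spine, chapter files) is then located relative to the returned OPF path.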

## When to use

- Opening or previewing an `.epub` file in a readable format.
- Extracting structured chapter data for downstream processing (search, summarization, translation).
- Pulling metadata (title, author, language, publisher) from an e-book.
- Converting an e-book into a format Claude can reason about (JSON or plain text).

## Steps

1. **Detect format** — confirm the file has a `.epub` extension or is a ZIP containing `META-INF/container.xml`.
2. **Route by task** — choose output mode: `text` (default, plain prose), `json` (metadata + chapters array), or `csv` (chapter index, title, word count, text).
3. **Run the bundled script** — `python3 scripts/epub_converter.py <file.epub> [--format text|json|csv]`. The script handles parsing and prints to stdout.
4. **Present output** — for `text`, display or summarize the prose; for `json`/`csv`, pass the structured data to the next step in the workflow.
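Step 1's check can be sketched with the standard library as follows. `looks_like_epub` is a hypothetical helper, not part of `scripts/epub_converter.py`; like the step it mirrors, it treats the `.epub` extension alone as sufficient, so a misnamed non-EPUB file would still pass.

```python
import zipfile
from pathlib import Path


def looks_like_epub(path: str) -> bool:
    """Heuristic mirroring step 1: a .epub extension, or a ZIP archive
    that contains META-INF/container.xml."""
    if Path(path).suffix.lower() == ".epub":
        return True
    # is_zipfile returns False (rather than raising) for missing files.
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        return "META-INF/container.xml" in zf.namelist()
```

Checking for `container.xml` catches EPUBs that were renamed to `.zip` (or given no extension) during download.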

## Operations

| Capability | CRUD | Resource | Tool |
|---|---|---|---|
| `ReadEpub` | READ | EPUB file (prose + structure) | `scripts/epub_converter.py` |
| `ConvertToText` | READ | Plain-text chapter content | `scripts/epub_converter.py --format text` |
| `ConvertToJson` | READ | JSON with metadata + chapters | `scripts/epub_converter.py --format json` |
| `ConvertToCsv` | READ | CSV chapter rows | `scripts/epub_converter.py --format csv` |
| `ExtractMetadata` | READ | Dublin Core metadata fields | `scripts/epub_converter.py --format json` (metadata block) |
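The `ExtractMetadata` row reads Dublin Core fields from the OPF `<metadata>` element. A minimal sketch of that lookup follows. `read_metadata` is a hypothetical name, and taking only the first element for each field (ignoring multiple creators and EPUB 3 `refines` metadata) is an assumption, not necessarily what the script does.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# The metadata keys listed in the JSON output description.
FIELDS = ("title", "creator", "language", "publisher", "date", "identifier")


def read_metadata(opf_xml: bytes) -> dict:
    """Return the first value of each Dublin Core field, or None if absent."""
    root = ET.fromstring(opf_xml)
    meta = root.find("opf:metadata", NS)
    result = {}
    for field in FIELDS:
        el = meta.find(f"dc:{field}", NS) if meta is not None else None
        # Treat missing elements and empty text the same way.
        result[field] = el.text.strip() if el is not None and el.text else None
    return result
```

Returning `None` for absent fields keeps the key set stable, which makes the metadata block easy to consume downstream.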

## Output

- **text** — Plain prose concatenated across chapters in spine order, separated by `\n\n--- <Chapter Title> ---\n\n` headers.
- **json** — `{ "metadata": { title, creator, language, publisher, date, identifier }, "chapters": [ { "index", "title", "text" } ] }`.
- **csv** — Columns: `index`, `title`, `word_count`, `text` (one row per chapter).
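A sketch of the three output shapes, assuming each chapter is a dict with `index`, `title`, and `text` keys. Two details are assumptions rather than documented behavior: `word_count` is computed by splitting on whitespace, and the text header is emitted before every chapter (including the first) with leading newlines trimmed. The `render_*` names are hypothetical.

```python
import csv
import io
import json


def render_text(chapters: list[dict]) -> str:
    """Concatenate chapters with '--- <title> ---' separator headers."""
    parts = [f"\n\n--- {ch['title']} ---\n\n{ch['text']}" for ch in chapters]
    return "".join(parts).lstrip("\n")


def render_json(metadata: dict, chapters: list[dict]) -> str:
    """Serialize to the {"metadata": ..., "chapters": [...]} shape."""
    payload = {
        "metadata": metadata,
        "chapters": [
            {"index": ch["index"], "title": ch["title"], "text": ch["text"]}
            for ch in chapters
        ],
    }
    return json.dumps(payload, ensure_ascii=False, indent=2)


def render_csv(chapters: list[dict]) -> str:
    """One row per chapter: index, title, word_count, text."""
    buf = io.StringIO()
    writer = csv.writer(buf)  # quotes fields containing commas/newlines
    writer.writerow(["index", "title", "word_count", "text"])
    for ch in chapters:
        writer.writerow([ch["index"], ch["title"], len(ch["text"].split()), ch["text"]])
    return buf.getvalue()
```

Letting `csv.writer` handle quoting matters here: chapter text routinely contains commas, quotes, and newlines.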

## Notes

- Uses only the Python standard library (`zipfile`, `xml.etree.ElementTree`, `html.parser`) — no pip installs required.
- DRM-encrypted EPUBs will yield empty chapter text; the script emits a warning to stderr when `META-INF/encryption.xml` is present (heuristic only — some DRM schemes omit this file).
- EPUB 2 and EPUB 3 are both supported. EPUB 3 fixed-layout (comics/picture books) will have little or no prose.
- Spine order is authoritative — do not sort chapters by filename.
- Non-linear spine items (cover pages, footnote sections, embedded TOC documents) are excluded from output; only `linear="yes"` (the default) items are extracted.
- Chapter titles are inferred from the first visible `h1`/`h2`/`h3` heading in each XHTML file, not from the NCX or nav document. The NCX/nav TOC structure is present in the EPUB but not parsed by this script.
- Manifest hrefs containing `..` (valid per the EPUB spec when the OPF is in a subdirectory) are resolved with `posixpath.normpath` before archive lookup.
- Output for very large EPUBs can be large; filter the JSON with `jq` (for example, to select a single chapter) or process chapters one at a time instead of loading the whole book at once.
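The spine rules in these notes (spine order wins, `linear="no"` items are skipped, hrefs are resolved relative to the OPF and normalized with `posixpath.normpath`) can be sketched as below. `linear_spine_paths` is a hypothetical helper, not the script's function; it assumes hrefs are not percent-encoded and carry no `#fragment`, both of which a production parser would need to handle.

```python
import posixpath
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def linear_spine_paths(opf_xml: bytes, opf_path: str) -> list[str]:
    """Return archive paths of linear spine items, in spine order.

    - itemrefs with linear="no" are skipped (linear defaults to "yes");
    - manifest hrefs are relative to the OPF file's directory, so they are
      joined with that directory and normalized to collapse any "..".
    """
    root = ET.fromstring(opf_xml)
    manifest = {
        item.get("id"): item.get("href")
        for item in root.iterfind("opf:manifest/opf:item", OPF_NS)
    }
    base = posixpath.dirname(opf_path)
    paths = []
    for itemref in root.iterfind("opf:spine/opf:itemref", OPF_NS):
        if itemref.get("linear", "yes") == "no":
            continue
        href = manifest.get(itemref.get("idref"))
        if href is None:
            continue  # dangling idref: nothing to extract
        paths.append(posixpath.normpath(posixpath.join(base, href)))
    return paths
```

Because the result follows `<itemref>` order rather than filenames, a book whose spine lists `ch2.xhtml` before `ch1.xhtml` is read in that order.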

<!-- runner-fallback -->
## Remote runner (MCP)
Can't run this locally (no setup, missing dependency)? The StealthStack runner exposes the **same code** as server-side MCP tools — no local install needed: `epub_summary`, `epub_to_text`, `epub_to_json`, `epub_to_csv`. Call the `application/mcp` catalog twin of this skill (its `runnerTwin`).
