---
name: orc
displayName: ORC Reader & Converter
description: Read, inspect, and convert Apache ORC columnar files to CSV or
  JSON. Use when opening .orc files, extracting schema, or exporting data.
tags:
  - data
  - orc
  - columnar
  - csv
  - conversion
  - extractor
capabilities:
  - ReadOrc
  - InspectSchema
  - ConvertToCsv
  - ConvertToJson
  - SampleRows
representativeQueries:
  - Read a .orc file and show me the data
  - Convert ORC to CSV
  - What is the schema of this ORC file?
  - Extract rows from an ORC file to JSON
  - Sample the first 10 rows of an ORC file
version: 0.1.0
tier: curated
---

# ORC Reader & Converter

Read, inspect, and convert Apache ORC (`.orc`) binary columnar files using `pyarrow.orc`. Extracts schema, samples rows, and exports to CSV or JSON without requiring Hive, Spark, or a database.

## When to use

Use this skill when you need to:
- Open a `.orc` file and display its contents
- Inspect the column schema and ORC data types
- Convert ORC data to CSV or JSON for downstream use
- Sample a subset of rows for quick inspection

## Steps

1. **Locate the file.** Confirm the `.orc` path exists and is readable.
2. **Inspect schema.** Run `scripts/orc_convert.py <file> --schema` to print column names and types before loading all data.
3. **Sample rows.** Run `scripts/orc_convert.py <file> --sample <N>` for a quick preview as JSON (N defaults to 10).
4. **Convert or export.** Run `scripts/orc_convert.py <file> --csv` for CSV output or `--json` for JSON output. Add `--rows N` to limit the number of output rows. Redirect stdout to a file as needed.
5. **Handle nested columns.** If the schema shows struct/list/map types, add `--flatten` to cast nested columns to strings before CSV export.
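
For orientation, the schema and sampling steps above could look roughly like this when written directly against `pyarrow.orc`. This is a minimal sketch, not the contents of `scripts/orc_convert.py`: the function names `read_schema` and `sample_rows` are hypothetical, and the type strings are Arrow type names rather than ORC's own.

```python
def read_schema(path):
    """Return (column_name, arrow_type) pairs without loading row data."""
    # Lazy import (an assumption of this sketch): a missing pyarrow
    # fails here rather than when the module is loaded.
    from pyarrow import orc

    return [(field.name, str(field.type)) for field in orc.ORCFile(path).schema]


def sample_rows(path, n=10):
    """Return up to n leading rows as dicts, reading one stripe at a time."""
    from pyarrow import orc

    reader = orc.ORCFile(path)
    rows = []
    for stripe_index in range(reader.nstripes):
        if len(rows) >= n:
            break
        # Each stripe comes back as a RecordBatch; keep only what is still needed.
        batch = reader.read_stripe(stripe_index)
        rows.extend(batch.slice(0, n - len(rows)).to_pylist())
    return rows
```

Reading stripe by stripe means a small sample does not require loading the whole file, which matters for large inputs.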

## Operations

| Capability | CRUD | Resource | Tool |
|---|---|---|---|
| `ReadOrc` | READ | .orc file | `scripts/orc_convert.py` |
| `InspectSchema` | READ | column schema | `scripts/orc_convert.py --schema` |
| `SampleRows` | READ | row subset | `scripts/orc_convert.py --sample N` |
| `ConvertToCsv` | READ | tabular export | `scripts/orc_convert.py --csv [--rows N]` |
| `ConvertToJson` | READ | JSON export | `scripts/orc_convert.py --json [--rows N]` |

## Output

- `--schema`: newline-delimited `column_name: type` pairs to stdout
- `--sample N`: JSON array of the first N rows to stdout
- `--csv`: CSV with header row to stdout (LF line endings); all rows unless limited with `--rows N`
- `--json`: JSON array of all rows to stdout, or of at most N rows with `--rows N`

Redirect to a file: `python3 scripts/orc_convert.py data.orc --csv > output.csv`
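
The CSV and JSON shapes described above can be reproduced with the standard library once rows are available as dicts. The sketch below is illustrative only: `rows_to_csv`, `rows_to_json`, and the `limit` parameter are hypothetical names standing in for the script's behavior, and it assumes every value is already a plain scalar.

```python
import csv
import io
import json
from itertools import islice


def rows_to_csv(rows, columns, limit=None):
    """Render rows as CSV text with a header row and LF line endings."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(islice(rows, limit))  # limit=None keeps every row
    return out.getvalue()


def rows_to_json(rows, limit=None):
    """Render rows as a single JSON array."""
    return json.dumps(list(islice(rows, limit)))


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b,c"}]
print(rows_to_csv(rows, ["id", "name"]), end="")
print(rows_to_json(rows, limit=1))
```

Setting `lineterminator="\n"` is what yields LF endings; the `csv` module otherwise writes CRLF.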

## Context

- ORC is a binary columnar format; plain-text editors cannot read it.
- The embedded schema is always available; inspect it first when the column layout is unknown.
- Nested fields (struct, list, map, uniontype) must be flattened or cast to strings before CSV export; use `--flatten`.
- Timestamps are returned as ISO-8601 strings in JSON/CSV output.
- All ORC compression codecs (ZLIB, SNAPPY, LZO, LZ4, ZSTD) are handled transparently by `pyarrow`.
- ACID transactional ORC files contain hidden system columns; use `--schema` to identify them before exporting.
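
To make the flattening and timestamp notes concrete, here is one way nested and temporal values could be turned into strings before CSV export. This is a sketch under assumptions: the source only says nested columns are cast to strings, so the JSON encoding used here and the names `to_text_safe` and `flatten_row` are hypothetical, not the script's documented behavior.

```python
import json
from datetime import date, datetime


def _json_default(value):
    """Fallback for values json cannot encode natively."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    return str(value)


def to_text_safe(value):
    """Turn nested or temporal values into strings; leave scalars untouched."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()  # ISO-8601 text
    if isinstance(value, (dict, list, tuple)):
        # Assumption: nested values are JSON-encoded into a single cell.
        return json.dumps(value, default=_json_default)
    return value


def flatten_row(row):
    """Apply to_text_safe to every cell of one row dict."""
    return {name: to_text_safe(value) for name, value in row.items()}


row = {"id": 1, "tags": ["x", "y"], "meta": {"k": 1},
       "ts": datetime(2024, 1, 2, 3, 4, 5)}
print(flatten_row(row))
```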

## Notes

- Requires `pyarrow` (`pip install pyarrow`). The script prints a one-line install hint and exits non-zero if the library is missing.
- The script writes only to stdout; the caller controls destination files.
- For very large ORC files, use `--sample` to validate schema and format before a full conversion.
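
The missing-dependency behavior in the first note could be implemented along these lines. This is a hedged sketch: the helper name `require` is hypothetical, and sending the hint to stderr is an assumption made here so that stdout stays reserved for data.

```python
import importlib
import sys


def require(module_name, install_hint):
    """Import a module, or print a one-line hint and exit non-zero."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        print(install_hint, file=sys.stderr)  # assumption: hint goes to stderr
        raise SystemExit(1)
```

A caller might use it as `orc = require("pyarrow.orc", "Missing dependency: pip install pyarrow")`.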

<!-- runner-fallback -->
## Remote runner (MCP)
If you can't run this locally (no setup, or a missing dependency), the StealthStack runner exposes the **same code** as server-side MCP tools, with no local install needed: `orc_schema`, `orc_sample`, `orc_to_csv`, `orc_to_json`. Call the `application/mcp` catalog twin of this skill (its `runnerTwin`).
