---
name: avro
displayName: Avro Reader & Converter
description: Read, inspect, and convert Apache Avro (.avro) files to CSV, JSON,
  or JSONL. Use when opening .avro files, extracting embedded schema, or
  exporting records.
tags:
  - data
  - avro
  - schema
  - csv
  - json
  - conversion
  - extractor
capabilities:
  - ReadAvro
  - InspectSchema
  - ConvertToCsv
  - ConvertToJson
  - ConvertToJsonl
  - SampleRows
representativeQueries:
  - Read a .avro file and show me the data
  - Convert avro to CSV
  - Extract the schema from an avro file
  - Convert avro to JSON or JSONL
  - Sample the first rows of an avro file
version: 0.1.0
tier: curated
---

# Avro Reader & Converter

Read, inspect, and convert Apache Avro (`.avro`) binary container files using `fastavro`. Extracts the embedded writer schema, samples records, and exports to CSV, JSON, or JSONL without any external schema registry or Spark cluster.

## When to use

Use this skill when you need to:
- Open a `.avro` file and display its contents
- Inspect the embedded Avro schema (field names, types, logical types)
- Convert Avro data to CSV, JSON, or JSONL for downstream use
- Sample a subset of records for quick inspection
- Handle nullable union types and logical types (date, decimal, timestamp)

## Steps

1. **Locate the file.** Confirm the `.avro` path exists and is readable.
2. **Inspect schema.** Run `scripts/avro_convert.py <file> --schema` to print the embedded writer schema as formatted JSON before loading all records.
3. **Sample records.** Run `scripts/avro_convert.py <file> --sample <N>` for a quick preview, printed as a JSON array (`N` defaults to 10).
4. **Convert or export.** Run `scripts/avro_convert.py <file> --csv` for CSV output, `--json` for a JSON array, or `--jsonl` for newline-delimited JSON (one record per line).
5. **Limit rows.** Add `--rows N` to any full-export mode to stop after N records (useful for very large files).
6. **Project fields (schema evolution).** To read only a subset of fields or promote types (for example `int` to `long`), pass `--reader-schema <schema.json>` with a JSON file containing the desired reader schema. Fields in the writer schema but absent from the reader schema are dropped; fields that appear only in the reader schema must declare a `default`, which is used as their value.
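
Steps 2, 3, and 5 can be sketched directly against `fastavro`. This is a minimal illustration under stated assumptions, not the contents of `scripts/avro_convert.py`: the helper name `peek_avro` is hypothetical, while `fastavro.reader` and its `writer_schema` attribute come from the library.

```python
import io
import itertools

try:
    import fastavro  # third-party: pip install fastavro
except ImportError:  # keep the sketch importable without the dependency
    fastavro = None


def peek_avro(fo, n=10):
    """Return (writer_schema, first n records) from a binary file object.

    `peek_avro` is a hypothetical helper used for illustration only.
    """
    if fastavro is None:
        raise RuntimeError("fastavro is required: pip install fastavro")
    avro_reader = fastavro.reader(fo)
    # The writer schema is parsed from the file header before any record.
    schema = avro_reader.writer_schema
    # islice stops after n records, so the rest of the file is not decoded.
    records = list(itertools.islice(avro_reader, n))
    return schema, records
```

For a file on disk, open it in binary mode first, for example `with open("data.avro", "rb") as fo: schema, rows = peek_avro(fo, 5)`.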

## Operations

| Capability | CRUD | Resource | Tool |
|---|---|---|---|
| `ReadAvro` | READ | .avro container file | `scripts/avro_convert.py` |
| `InspectSchema` | READ | embedded writer schema | `scripts/avro_convert.py --schema` |
| `SampleRows` | READ | record subset | `scripts/avro_convert.py --sample N` |
| `ConvertToCsv` | READ | tabular CSV export | `scripts/avro_convert.py --csv` |
| `ConvertToJson` | READ | JSON array export | `scripts/avro_convert.py --json` |
| `ConvertToJsonl` | READ | JSONL export | `scripts/avro_convert.py --jsonl` |

## Output

- `--schema`: the embedded Avro writer schema as pretty-printed JSON to stdout
- `--sample N`: JSON array of the first N records to stdout
- `--csv`: RFC 4180 CSV with header row to stdout (nested fields serialised as JSON strings)
- `--json`: JSON array of all records to stdout
- `--jsonl`: newline-delimited JSON, one record per line, to stdout

Redirect to a file: `python3 scripts/avro_convert.py data.avro --csv > output.csv`
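
The CSV behaviour described above (a header row, with nested fields serialised as JSON strings) could look roughly like the following. It is a standard-library sketch, not the script's actual code; `to_cell` and `records_to_csv` are hypothetical names, and writing `None` as an empty cell is an assumption.

```python
import csv
import io
import json


def to_cell(value):
    """Turn nested values into JSON strings; pass scalars through unchanged."""
    if isinstance(value, (dict, list)):
        return json.dumps(value, sort_keys=True)
    # Assumption: a null field becomes an empty CSV cell.
    return "" if value is None else value


def records_to_csv(records, fieldnames, out):
    """Write a header row plus one row per record to the text stream `out`."""
    # csv.DictWriter terminates rows with CRLF by default, as RFC 4180 expects.
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for rec in records:
        writer.writerow({name: to_cell(rec.get(name)) for name in fieldnames})


if __name__ == "__main__":
    demo = [{"id": 1, "tags": ["a", "b"], "addr": {"city": "Oslo"}}]
    buf = io.StringIO()
    records_to_csv(demo, ["id", "tags", "addr"], buf)
    print(buf.getvalue(), end="")
```

Because the JSON strings contain commas and double quotes, the `csv` module quotes and escapes those cells, so a standard CSV parser recovers each nested value intact.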

## Context

- Avro is a binary format; plain-text editors cannot read it.
- The writer schema is always embedded in the header of an Avro container file, so inspect it first when the field layout is unknown.
- Nested records, arrays, and maps are serialised as JSON strings in CSV output to preserve structure.
- Union fields (e.g. `["null", "string"]`) are resolved to `None` or the concrete value automatically.
- `decimal` logical-type values are returned as Python `Decimal`; the script converts them to strings for JSON safety.
- Timestamps and dates are converted to ISO-8601 strings in all output modes.
- Records are iterated lazily — memory footprint is low even for very large files.
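
The `Decimal` and date/time handling noted above can be illustrated with a `default` hook for `json.dumps`. This is a sketch assuming records have already been decoded into Python objects; `json_default` and `to_jsonl` are hypothetical names, and the script's own conversion may differ in detail.

```python
import datetime
import decimal
import json


def json_default(value):
    """Fallback for json.dumps: convert types JSON cannot represent natively."""
    if isinstance(value, decimal.Decimal):
        # A string keeps the exact digits; a float could lose precision.
        return str(value)
    if isinstance(value, (datetime.datetime, datetime.date, datetime.time)):
        return value.isoformat()  # ISO-8601 text
    raise TypeError("not JSON serialisable: %r" % type(value))


def to_jsonl(records):
    """Yield one JSON document per record, so records are handled one at a time."""
    for rec in records:
        yield json.dumps(rec, default=json_default)


if __name__ == "__main__":
    row = {"price": decimal.Decimal("12.50"), "day": datetime.date(2024, 1, 31)}
    for line in to_jsonl([row]):
        print(line)
```

Yielding lines from a generator keeps the JSONL path streaming: each record is serialised and released before the next one is read.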

## Notes

- Requires `fastavro` (`pip install fastavro`). The script prints a one-line install hint and exits non-zero if the library is missing.
- The script writes only to stdout; the caller controls destination files.
- For schema evolution (projecting a subset of fields), use the `--reader-schema` flag with a path to a JSON schema file.
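
As an illustration of what a reader schema does, the sketch below drops one writer field and adds a new field with a default. The schemas, the field names, and the `read_with_projection` helper are all hypothetical; only `fastavro.reader(..., reader_schema=...)` and `fastavro.writer` are library calls.

```python
import io

try:
    import fastavro  # third-party: pip install fastavro
except ImportError:  # keep the sketch importable without the dependency
    fastavro = None

# Hypothetical schema the file was written with.
WRITER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": ["null", "string"], "default": None},
        {"name": "internal_note", "type": "string"},
    ],
}

# Hypothetical reader schema: omits internal_note, adds country with a default.
READER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": ["null", "string"], "default": None},
        {"name": "country", "type": "string", "default": "unknown"},
    ],
}


def read_with_projection(fo, reader_schema):
    """Decode every record from `fo`, resolved against `reader_schema`."""
    if fastavro is None:
        raise RuntimeError("fastavro is required: pip install fastavro")
    return list(fastavro.reader(fo, reader_schema=reader_schema))
```

Saving a dictionary like `READER_SCHEMA` with `json.dump` yields a schema file of the kind `--reader-schema` expects.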

<!-- runner-fallback -->
## Remote runner (MCP)
Can't run this locally (no setup, missing dependency)? The StealthStack runner exposes the **same code** as server-side MCP tools — no local install needed: `avro_schema`, `avro_sample`, `avro_to_csv`, `avro_to_json`, `avro_to_jsonl`. Call the `application/mcp` catalog twin of this skill (its `runnerTwin`).
