---
name: parquet
displayName: Parquet Reader & Converter
description: Read, inspect, and convert Apache Parquet columnar files to CSV or
  JSON. Use when opening .parquet files, extracting schema, or exporting data.
tags:
  - data
  - parquet
  - columnar
  - csv
  - conversion
  - extractor
capabilities:
  - ReadParquet
  - InspectSchema
  - ConvertToCsv
  - ConvertToJson
  - SampleRows
representativeQueries:
  - Read a .parquet file and show me the data
  - Convert parquet to CSV
  - What is the schema of this parquet file?
  - Extract rows from a parquet file to JSON
  - Sample the first 10 rows of a parquet file
version: 0.1.0
tier: curated
---

# Parquet Reader & Converter

Read, inspect, and convert Apache Parquet (`.parquet`) binary columnar files using `pyarrow`. Extracts schema, samples rows, and exports to CSV or JSON without requiring a database or Spark cluster.

## When to use

Use this skill when you need to:
- Open a `.parquet` file and display its contents
- Inspect the column schema and data types
- Convert Parquet data to CSV or JSON for downstream use
- Sample a subset of rows for quick inspection

## Steps

1. **Locate the file.** Confirm the `.parquet` path exists and is readable.
2. **Inspect schema.** Run `scripts/parquet_convert.py <file> --schema` to print column names and types before loading all data.
3. **Sample rows.** Run `scripts/parquet_convert.py <file> --sample <N>` to preview the first N rows as JSON (N defaults to 10).
4. **Convert or export.** Run `scripts/parquet_convert.py <file> --csv` for CSV output or `--json` for JSON output. Redirect stdout to a file as needed.
5. **Handle nested columns.** If the schema shows struct/list types, add `--flatten` to cast nested columns to strings before CSV export.

## Operations

| Capability | CRUD | Resource | Tool |
|---|---|---|---|
| `ReadParquet` | READ | .parquet file | `scripts/parquet_convert.py` |
| `InspectSchema` | READ | column schema | `scripts/parquet_convert.py --schema` |
| `SampleRows` | READ | row subset | `scripts/parquet_convert.py --sample N` |
| `ConvertToCsv` | READ | tabular export | `scripts/parquet_convert.py --csv` |
| `ConvertToJson` | READ | JSON export | `scripts/parquet_convert.py --json` |

## Output

- `--schema`: newline-delimited `column_name: type` pairs to stdout
- `--sample N`: JSON array of the first N rows to stdout
- `--csv`: RFC 4180 CSV with header row to stdout
- `--json`: JSON array of all rows to stdout

Redirect to a file: `python3 scripts/parquet_convert.py data.parquet --csv > output.csv`

## Context

- Parquet is a binary columnar format; plain-text editors cannot read it.
- The embedded schema is always available — inspect it first when column layout is unknown.
- Nested fields (structs, lists, maps) must be flattened or cast to strings before CSV export; use `--flatten`.
- Timestamps are returned as ISO-8601 strings in JSON/CSV output.
- Compression (Snappy, GZIP, ZSTD, LZ4) is handled transparently by `pyarrow`.
- For very wide or very large files, stream the data in chunks, or run `--sample` first to validate the format before a full conversion.

## Notes

- Requires `pyarrow` (`pip install pyarrow`). The script prints a one-line install hint and exits non-zero if the library is missing.
- For multi-file Parquet datasets (Spark-style directories of `part-*.parquet` files), pass the directory path; `pyarrow.parquet.read_table` accepts either a single file or a directory.
- The script writes only to stdout; the caller controls destination files.
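
The missing-dependency behaviour described above could be implemented along these lines. This is a sketch under assumptions: `require_module` is a hypothetical helper, the hint text is illustrative, and sending the hint to stderr (so it never mixes with redirected data) is a choice made here rather than documented script behaviour.

```python
import importlib
import sys


def require_module(name, install_hint):
    """Import a module by name, or print a one-line hint and exit non-zero."""
    try:
        return importlib.import_module(name)
    except ImportError:
        print(install_hint, file=sys.stderr)
        raise SystemExit(1)
```

A caller might then write `pq = require_module("pyarrow.parquet", "pyarrow is required: pip install pyarrow")` before doing any file work.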

<!-- runner-fallback -->
## Remote runner (MCP)
Can't run this locally (no setup, missing dependency)? The StealthStack runner exposes the **same code** as server-side MCP tools — no local install needed: `parquet_schema`, `parquet_sample`, `parquet_to_csv`, `parquet_to_json`. Call the `application/mcp` catalog twin of this skill (its `runnerTwin`).
