---
name: hdf5
displayName: HDF5 Reader & Converter
description: Read and convert HDF5 (.h5, .hdf5) files to JSON or CSV. Use when
  inspecting scientific or ML datasets stored in hierarchical binary format.
tags:
  - data
  - hdf5
  - h5
  - converter
  - extractor
  - scientific-data
capabilities:
  - ReadHdf5
  - InspectStructure
  - ConvertToCsv
  - ConvertToJson
  - SliceDataset
representativeQueries:
  - Read a .h5 file and show me its contents
  - Convert HDF5 to CSV
  - What datasets are inside this .hdf5 file?
  - Extract a specific group or dataset from an HDF5 file
  - Convert HDF5 to JSON for downstream processing
version: 0.1.0
tier: curated
---

# HDF5 Reader & Converter

Read, inspect, and convert HDF5 (`.h5` / `.hdf5`) files — the binary hierarchical format used widely in scientific computing, machine learning checkpoints, and large numerical datasets. The skill surfaces group/dataset structure as readable text, and converts datasets to JSON or CSV via `scripts/convert_hdf5.py`.

## When to use

- You have a `.h5` or `.hdf5` file and want to know what is inside it.
- You need to export one or more datasets to CSV or JSON for use in a spreadsheet, database, or downstream pipeline.
- You are debugging an ML training checkpoint or a scientific data file and need to inspect shapes, dtypes, and sample values.

## Steps

1. **Locate the file.** Confirm the path to the `.h5` or `.hdf5` file on disk.
2. **Inspect structure.** Run `python3 scripts/convert_hdf5.py <file>` with no extra flags to print the full group/dataset tree (names, shapes, dtypes, and a sample of values).
3. **Target a dataset.** Identify the dataset path inside the file (e.g. `/data/signal` or `/model/weights`).
4. **Convert.** Pass `--format csv` or `--format json` and optionally `--dataset <path>` to export. Pipe or redirect stdout to a file.
5. **Slice if large.** Use `--rows N` to limit output to N rows/elements. Use `--start M --rows N` to extract a window starting at row M (e.g. rows 1000–2000: `--start 1000 --rows 1000`).

## Output

- **Default (no flags):** human-readable tree of all groups and datasets with shape, dtype, and up to 5 sample values.
- **`--format json`:** With `--dataset`: a JSON array of row-objects. Without `--dataset`: a JSON object keyed by dataset path, each value being an array of row-objects.
- **`--format csv`:** CSV with a header row; compound datasets use field names as columns.

## Notes

- Requires `h5py`. If missing, the script prints `pip install h5py` and exits non-zero.
- Open files with `mode='r'` (the script default) to prevent accidental writes; for true SWMR read-while-write scenarios h5py also requires `swmr=True`.
- Multidimensional (3-D+) arrays are flattened per row; use `--start`/`--rows` to window-slice large tensors without loading the full dataset.
- Some HDF5 files use external or virtual links that reference paths on the original machine; those entries will show an error annotation but the rest of the file remains readable.

<!-- runner-fallback -->
## Remote runner (MCP)
Can't run this locally (no setup, missing dependency)? The StealthStack runner exposes the **same code** as server-side MCP tools — no local install needed: `hdf5_tree`, `hdf5_to_json`, `hdf5_to_csv`. Call the `application/mcp` catalog twin of this skill (its `runnerTwin`).
