---
name: archives
displayName: Archive Container Reader & Converter
description: Read, list, and extract archive containers (.zip, .tar, .gz, .7z)
  to JSON or text. Use when inspecting or unpacking compressed file bundles.
tags:
  - archive
  - zip
  - tar
  - gzip
  - 7z
  - extraction
  - compression
  - conversion
capabilities:
  - ListContents
  - ExtractFiles
  - ReadArchive
  - ConvertToJson
  - InspectMetadata
representativeQueries:
  - Read a .zip file and show me what's inside
  - Extract files from a tar.gz archive
  - List contents of a .7z archive
  - Convert archive contents to JSON
  - Inspect files in a compressed archive
version: 0.1.0
tier: curated
---

# Archive Container Reader & Converter

Reads, lists, and extracts files from compressed archive containers (`.zip`, `.tar`, `.tar.gz`, `.tar.bz2`, `.tar.xz`, `.tgz`, `.gz`, `.7z`). ZIP, tar, and gzip formats are handled via Python's standard library. `.7z` requires the third-party `py7zr` package or a system `7z`/`7za` binary. Outputs a file listing as text or JSON, or extracts member contents to stdout or disk. This is an `extractor` primitive: it locates the archive source, parses headers, and structures the output.

## When to use

Use this skill when:
- You need to see what files are inside an archive without fully extracting it.
- You want to extract one or more members from a `.zip`, `.tar.*`, or `.gz` archive.
- A pipeline needs archive contents as JSON (file names, sizes, timestamps).
- You are inspecting an unknown compressed bundle to understand its structure.

Do NOT use for:
- `.docx`, `.xlsx`, `.pptx`, `.epub`, `.jar` — these are ZIPs with domain-specific schemas; use their dedicated skills.
- Password-protected ZIPs: the script has no `--password` argument and cannot decrypt them.
- `.7z` archives when neither `py7zr` nor a system `7z`/`7za` binary is available.
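
Whether a `.7z` backend is present can be checked before handing the archive to this skill. The following is a minimal sketch using only the standard library; the function name `find_7z_backend` is hypothetical and is not taken from `scripts/archive_reader.py`, whose own detection logic may differ.

```python
import importlib.util
import shutil
from typing import Optional


def find_7z_backend() -> Optional[str]:
    """Return "py7zr", the path of a 7z/7za binary, or None if neither exists.

    Hypothetical helper: the real script may probe in a different order.
    """
    # find_spec returns None (rather than raising) for a missing top-level package.
    if importlib.util.find_spec("py7zr") is not None:
        return "py7zr"
    # Fall back to a system binary found on PATH.
    for binary in ("7z", "7za"):
        path = shutil.which(binary)
        if path:
            return path
    return None
```

If the function returns `None`, the `.7z` archive falls under the exclusion above.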

## Steps

1. **Detect format.** Identify the archive type by extension (`.zip`, `.tar`, `.tar.gz`, `.tgz`, `.tar.bz2`, `.tar.xz`, `.gz`, `.7z`) or by magic bytes. The script auto-detects.
2. **List contents (default).** Run `python3 scripts/archive_reader.py <file>` to print a table of members (name, size, compressed size, modified time).
3. **Convert to JSON.** Run `python3 scripts/archive_reader.py <file> --format json` to emit a JSON array of member objects.
4. **Extract a member.** Run `python3 scripts/archive_reader.py <file> --extract <member-path>` to decompress and print a single member's content to stdout.
5. **Extract all to disk.** Run `python3 scripts/archive_reader.py <file> --extract-all <dest-dir>` to unpack the archive into a destination directory.
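6. **Inspect output.** Every mode writes its result to stdout; redirect it to a file if needed (for example, append `> listing.json` to the command in step 3).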
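
The magic-byte detection mentioned in step 1 can be sketched as follows. This is an illustration under stated assumptions, not the script's actual code: the function name `sniff_format` and the signature table are hypothetical, although the byte signatures themselves are the well-known ones for each format.

```python
# Leading signatures for common containers (illustrative table, an assumption).
MAGIC = [
    (b"PK\x03\x04", "zip"),
    (b"PK\x05\x06", "zip"),            # empty ZIP: end-of-central-directory only
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
]


def sniff_format(head: bytes) -> str:
    """Guess the container type from the first 512 bytes of a file."""
    for magic, name in MAGIC:
        if head.startswith(magic):
            return name
    # POSIX/GNU tar has no leading magic; "ustar" sits at offset 257 of the header.
    if head[257:262] == b"ustar":
        return "tar"
    return "unknown"
```

A caller would pass in something like `open(path, "rb").read(512)`. Note that a `.tar.gz` and a bare `.gz` both report `gzip` here; telling them apart requires decompressing the first block and sniffing again.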

## Operations

| Capability | CRUD | Resource | Tool |
|---|---|---|---|
| `ListContents` | READ | archive member index | `scripts/archive_reader.py` |
| `ExtractFiles` | READ | member data | `scripts/archive_reader.py --extract` |
| `ReadArchive` | READ | archive container | `scripts/archive_reader.py` |
| `ConvertToJson` | READ | member metadata | `scripts/archive_reader.py --format json` |
| `InspectMetadata` | READ | member headers | `scripts/archive_reader.py --format json` |

## Output

- **Text mode (default):** tab-separated table with columns `Name`, `Size`, `Compressed`, `Modified`, printed to stdout. One row per archive member.
- **JSON mode (`--format json`):** JSON array of objects: `[{"name": "...", "size": <int>, "compressed": <int>, "modified": "ISO-8601", "is_dir": <bool>}]`.
- **Extract (`--extract <member>`):** raw decompressed bytes of the named member to stdout.
- **Extract all (`--extract-all <dir>`):** members written to `<dir>/`; prints extracted file paths to stdout. Note: for `.7z` archives, only the destination directory path is printed (not individual file paths) due to how the py7zr and system-binary extraction APIs report results.
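
For ZIP input, the JSON shape described above can be produced with the standard library alone. The sketch below is an assumption about how such a listing might be built, not the script's implementation; `zip_members_as_json` is a hypothetical name.

```python
import io
import json
import zipfile
from datetime import datetime


def zip_members_as_json(data: bytes) -> str:
    """Return a JSON array describing each member of an in-memory ZIP."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            rows.append({
                "name": info.filename,
                "size": info.file_size,
                "compressed": info.compress_size,
                # ZIP stores a naive local timestamp as a 6-tuple, no timezone.
                "modified": datetime(*info.date_time).isoformat(),
                "is_dir": info.is_dir(),
            })
    return json.dumps(rows, indent=2)
```

ZIP timestamps carry no timezone, so the `modified` value is a naive ISO-8601 string in this sketch; a malformed date field in the archive would make `datetime(...)` raise and would need separate handling.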

## Notes

- `.7z` requires either `py7zr` (`pip install py7zr`) or a system `7z`/`7za` binary. The script will detect which is available and report clearly if neither is found.
- `.gz` files (bare gzip, not tar) wrap a single stream. There is no member list — the script decompresses and streams the content directly.
- For safety, `--extract-all` sanitizes paths: leading `/` and `..` components are stripped from member names so all output lands inside the destination directory. Members whose entire path reduces to nothing after sanitization (e.g. a member named `..`) are skipped with a warning to stderr.
- Tar path traversal: the script reads each member with `extractfile()` rather than calling `tarfile.extractall()`, so tarfile's `filter` parameter never comes into play. Path safety is enforced instead by the `safe_member_path()` sanitiser, which strips leading `/` and `..` components before writing. Symlinks and device files are skipped entirely.
- OOXML trick: `.docx`, `.xlsx`, and `.pptx` files are valid ZIPs. You can use this script to list their XML parts, but interpreting them requires the dedicated skill for that format.
- Encoding: ZIP member names default to CP437; Python's `zipfile` module handles the UTF-8 flag (bit 11 of the general-purpose bit flag) automatically and decodes accordingly. The script relies on this stdlib behaviour rather than performing explicit flag detection.
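
The sanitization and per-member tar extraction described in the notes above might look roughly like this. The sketch reuses the name `safe_member_path()` from the notes, but its body is an assumption rather than the script's code; in particular it drops `..` and `.` components anywhere in the path, not only leading ones. `extract_tar_members` is a hypothetical helper.

```python
import io
import shutil
import sys
import tarfile
from pathlib import Path, PurePosixPath
from typing import List, Optional


def safe_member_path(name: str) -> Optional[PurePosixPath]:
    """Drop empty, '.', and '..' components; return None if nothing is left."""
    parts = [p for p in name.split("/") if p not in ("", ".", "..")]
    return PurePosixPath(*parts) if parts else None


def extract_tar_members(data: bytes, dest: Path) -> List[Path]:
    """Write regular-file members under dest, one extractfile() call each."""
    written = []
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:*") as tf:
        for member in tf:
            if not member.isfile():
                continue  # skips directories, symlinks, hard links, devices
            rel = safe_member_path(member.name)
            if rel is None:
                print(f"skipping unsafe member {member.name!r}", file=sys.stderr)
                continue
            target = dest.joinpath(*rel.parts)
            target.parent.mkdir(parents=True, exist_ok=True)
            with tf.extractfile(member) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            written.append(target)
    return written
```

With this approach a member named `../evil.txt` lands at `<dest>/evil.txt` rather than outside the destination, and a symlink member is never materialised on disk.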

<!-- runner-fallback -->
## Remote runner (MCP)
Can't run this locally (no setup, missing dependency)? The StealthStack runner exposes the **same code** as server-side MCP tools — no local install needed: `archives_list`, `archives_extract_member`, `archives_extract_all`. Call the `application/mcp` catalog twin of this skill (its `runnerTwin`).
