
v0.8.0 Release Notes

This release publishes the April 2026 extraction work: stronger pure-Python rich extraction in light mode, corrected print-area defaults across public entrypoints, and LibreOffice / OOXML resilience hardening.

Highlights

  • light now acts as the pure-Python OOXML-rich baseline for .xlsx / .xlsm, so non-COM environments can emit best-effort results for:
      • shapes
      • connectors / arrows
      • charts
  • light now keeps print_areas by default across:
      • extract(...)
      • process_excel(...)
      • ExStructEngine
      • CLI extraction and --print-areas-dir
  • libreoffice now seeds the same OOXML baseline first and then applies UNO enrichment when available, so fallback paths preserve already recovered rich artifacts where safe.
  • LibreOffice workbook lifecycle handling is more robust for custom session_factory integrations via typed workbook handles and session-owned close semantics.
  • OOXML drawing parsing is more resilient and more efficient:
      • malformed or corrupt drawing parts now fail per sheet instead of dropping healthy workbook siblings
      • worksheet metrics are read with streaming XML parsing
      • row/column offset lookups now use cached cumulative offsets
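The last two drawing-parsing items can be illustrated with a minimal sketch. It assumes a single worksheet part whose row heights are stored as `ht` attributes in points, with a `defaultRowHeight` on `<sheetFormatPr>`; `read_row_metrics` and `RowOffsets` are hypothetical names, not this project's API, and columns (which need a character-width-to-point conversion) are omitted.

```python
import bisect
import xml.etree.ElementTree as ET
from itertools import accumulate

# SpreadsheetML main namespace, as it appears in worksheet parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def read_row_metrics(source):
    """Stream a worksheet part and return (default_height, {row0: height_pt}).

    Only <sheetFormatPr> and <row ht="..."> are inspected; each <row> is
    cleared as soon as it closes, and parsing stops at </sheetData>.
    """
    default_height = 15.0  # assumed fallback when sheetFormatPr is absent
    heights = {}
    next_row = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == NS + "sheetFormatPr":
            default_height = float(elem.get("defaultRowHeight", default_height))
        elif elem.tag == NS + "row":
            r = elem.get("r")  # 1-based row number; optional in the schema
            idx = int(r) - 1 if r is not None else next_row
            if elem.get("ht") is not None:
                heights[idx] = float(elem.get("ht"))
            next_row = idx + 1
            elem.clear()  # free the row's cells; only its attributes were needed
        elif elem.tag == NS + "sheetData":
            break  # row metrics end here; skip the rest of the part
    return default_height, heights


class RowOffsets:
    """Cumulative row offsets: O(1) top-edge lookup, O(log n) point-to-row."""

    def __init__(self, heights, default_height, n_rows):
        sizes = [heights.get(i, default_height) for i in range(n_rows)]
        # offsets[i] is the top edge (points) of 0-based row i;
        # offsets[n_rows] is the bottom edge of the last row.
        self.offsets = [0.0, *accumulate(sizes)]
        self.n_rows = n_rows

    def top_of(self, row):
        return self.offsets[row]

    def row_at(self, y):
        # The first offset strictly greater than y, minus one, is the row containing y.
        i = bisect.bisect_right(self.offsets, y) - 1
        return max(0, min(i, self.n_rows - 1))
```

The cumulative list is what makes anchor conversion cheap: a drawing anchor gives a row index plus an offset, so its absolute top is `top_of(row) + offset` instead of a fresh sum over every preceding row.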

Compatibility Notes

  • No new extraction CLI commands were added in v0.8.0.
  • light mode behavior changed intentionally:
      • previous releases treated light as cells + table candidates only
      • v0.8.0 adds best-effort OOXML shapes / connectors / charts for OOXML workbooks and keeps print areas by default
  • .xls remains outside the new OOXML-rich baseline; the new non-COM rich path applies to .xlsx / .xlsm.
  • Serialized backend metadata may now report python_ooxml provenance when backend metadata output is enabled.
  • MCP tool names and payload shapes remain compatible; the release changes the extraction content available behind existing interfaces rather than adding a new transport contract.
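Because light now keeps print areas by default and the rich path is limited to .xlsx / .xlsm, it may help to see where a pure-Python reader can find print areas at all. The sketch below is not this project's implementation: `read_print_areas` is a hypothetical name, and it assumes the workbook part sits at the conventional `xl/workbook.xml` (a strict reader would resolve it through the package relationships). In OOXML, a sheet's print area is a defined name `_xlnm.Print_Area` whose `localSheetId` is the 0-based index of the sheet in `<sheets>`.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import PurePath

NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
OOXML_SUFFIXES = {".xlsx", ".xlsm"}  # legacy .xls (BIFF) is not a zip package


def read_print_areas(path):
    """Return {sheet_name: print_area_formula} for an .xlsx / .xlsm workbook."""
    if PurePath(path).suffix.lower() not in OOXML_SUFFIXES:
        raise ValueError(f"not an OOXML workbook: {path}")
    with zipfile.ZipFile(path) as zf:
        # Assumption: conventional part location, not resolved via _rels/.rels.
        root = ET.fromstring(zf.read("xl/workbook.xml"))
    # localSheetId indexes into the <sheet> elements in document order.
    sheet_names = [s.get("name") for s in root.iter(NS + "sheet")]
    areas = {}
    for dn in root.iter(NS + "definedName"):
        if dn.get("name") != "_xlnm.Print_Area":
            continue  # ordinary workbook names are not print areas
        local_id = dn.get("localSheetId")
        if local_id is None:
            continue  # print areas are expected to be sheet-scoped
        # Kept as the raw formula text, e.g. "Sheet1!$A$1:$D$20".
        areas[sheet_names[int(local_id)]] = (dn.text or "").strip()
    return areas
```

Returning the raw formula text avoids naive comma splitting, which would break on quoted sheet names that contain commas.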

Notes

  • The repository docs/build path still has a pre-existing mkdocstrings failure in docs/api.md; the failure was already reproducible before the v0.8.0 extraction work and was not introduced by this release.
  • Review-driven hardening after the initial implementation also restored process_excel() auto-filter behavior, corrected stale README / architecture wording, and prevented OOXML baseline seeding failures from crashing the LibreOffice pipeline.
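The resilience items in these notes (per-sheet drawing failures, guarded baseline seeding, and UNO enrichment layered on top of the OOXML baseline) share one pattern. The following is a minimal sketch under stated assumptions, not this project's code: `seed_baseline`, `run_pipeline`, `parse_drawing`, `enrich`, and the "non-empty enrichment replaces the baseline" merge rule are all hypothetical stand-ins for whatever "where safe" means in practice.

```python
import logging
from typing import Callable, Dict, List, Optional

log = logging.getLogger("pipeline_sketch")

# Hypothetical shape: sheet name -> rich artifacts recovered for that sheet.
Artifacts = Dict[str, List[str]]


def seed_baseline(
    drawings: Dict[str, bytes], parse_drawing: Callable[[bytes], List[str]]
) -> Artifacts:
    """Parse each sheet's drawing part on its own, so a corrupt part empties
    only that sheet instead of discarding its healthy siblings."""
    baseline: Artifacts = {}
    for sheet, part in drawings.items():
        try:
            baseline[sheet] = parse_drawing(part)
        except Exception as exc:  # e.g. malformed XML in this sheet's part
            log.warning("drawing parse failed for sheet %r: %s", sheet, exc)
            baseline[sheet] = []
    return baseline


def run_pipeline(
    drawings: Dict[str, bytes],
    parse_drawing: Callable[[bytes], List[str]],
    enrich: Optional[Callable[[Artifacts], Artifacts]] = None,
) -> Artifacts:
    try:
        baseline = seed_baseline(drawings, parse_drawing)
    except Exception as exc:
        # Seeding failures degrade to "no baseline" rather than crashing.
        log.warning("baseline seeding failed: %s", exc)
        baseline = {}
    if enrich is None:  # e.g. no UNO runtime available
        return baseline
    try:
        enriched = enrich(baseline)
    except Exception as exc:
        log.warning("enrichment failed, keeping baseline: %s", exc)
        return baseline
    merged = dict(baseline)
    for sheet, items in enriched.items():
        if items:  # hypothetical rule: only non-empty enrichment replaces a sheet
            merged[sheet] = items
    return merged
```

Catching at the sheet level keeps one bad drawing part from erasing the rest of the workbook, while the outer guards ensure the enrichment stage can only add to what the baseline already recovered.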