v0.8.0 Release Notes
This release publishes the April 2026 extraction work: stronger pure-Python rich
extraction in light mode, corrected print-area defaults across public
entrypoints, and LibreOffice / OOXML resilience hardening.
Highlights
- `light` now acts as the pure-Python OOXML-rich baseline for `.xlsx` / `.xlsm`, so non-COM environments can emit best-effort:
  - shapes
  - connectors / arrows
  - charts
- `light` now keeps `print_areas` by default across:
  - `extract(...)`
  - `process_excel(...)`
  - `ExStructEngine`
  - CLI extraction and `--print-areas-dir`
- `libreoffice` now seeds the same OOXML baseline first and then applies UNO enrichment when available, so fallback paths preserve already recovered rich artifacts where safe.
- LibreOffice workbook lifecycle handling is more robust for custom `session_factory` integrations via typed workbook handles and session-owned close semantics.
- OOXML drawing parsing is more resilient and more efficient:
  - malformed or corrupt drawing parts now fail per sheet instead of dropping healthy workbook siblings
  - worksheet metrics are read with streaming XML parsing
  - row/column offset lookups now use cached cumulative offsets
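
The streaming-parse and cached-offset items can be illustrated with a small standard-library sketch. It is not the project's implementation: the function names (`read_row_heights`, `build_offsets`, `row_at_offset`), the default row height, and the sample XML are all assumptions made for illustration. It streams `<row>` elements with `xml.etree.ElementTree.iterparse`, then builds the cumulative offsets once so that each later lookup is a binary search rather than a fresh summation.

```python
import io
from bisect import bisect_right
from itertools import accumulate
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
DEFAULT_ROW_HEIGHT = 15.0  # assumed default in points, for illustration only


def read_row_heights(xml_stream, max_row):
    """Stream a worksheet part and return heights for rows 1..max_row."""
    heights = [DEFAULT_ROW_HEIGHT] * max_row
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == NS + "row":
            index = int(elem.get("r", "0"))
            height = elem.get("ht")
            if height is not None and 1 <= index <= max_row:
                heights[index - 1] = float(height)
            elem.clear()  # drop the row's child cells once it has been read
    return heights


def build_offsets(sizes):
    """Return cumulative offsets: offsets[i] is where item i (0-based) starts."""
    return [0.0, *accumulate(sizes)]


def row_at_offset(offsets, y):
    """Return the 1-based row containing vertical offset y, via binary search."""
    last_row = len(offsets) - 1
    return min(max(bisect_right(offsets, y), 1), last_row)


# Hypothetical worksheet fragment: rows 1 and 3 carry custom heights.
SAMPLE = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<sheetData>"
    '<row r="1" ht="20" customHeight="1"><c r="A1"><v>1</v></c></row>'
    '<row r="3" ht="30" customHeight="1"/>'
    "</sheetData></worksheet>"
)

heights = read_row_heights(io.BytesIO(SAMPLE.encode("utf-8")), max_row=4)
offsets = build_offsets(heights)  # computed once, reused for every lookup
print(row_at_offset(offsets, 40))
```

With the sample above, an anchor at a vertical offset of 40 falls in row 3, because rows 1 and 2 together span the first 35 points. Column offsets could be handled the same way from `<col>` width entries.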
Compatibility Notes
- No new extraction CLI commands were added in `v0.8.0`.
- `light` mode behavior changed intentionally:
  - previous releases treated `light` as cells + table candidates only
  - `v0.8.0` adds best-effort OOXML shapes / connectors / charts for OOXML workbooks and keeps print areas by default
- `.xls` remains outside the new OOXML-rich baseline; the new non-COM rich path applies to `.xlsx` / `.xlsm`.
- Serialized backend metadata may now report `python_ooxml` provenance when backend metadata output is enabled.
- MCP tool names and payload shapes remain compatible; the release changes the extraction content available behind existing interfaces rather than adding a new transport contract.
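
For callers who want to anticipate which workbooks fall under the new rich path, the format boundary described above can be expressed as a suffix check. This is a sketch only: `uses_ooxml_rich_baseline` is a hypothetical helper, not part of the library's API, and the library itself may decide eligibility differently (for example by inspecting file contents).

```python
from pathlib import Path

# Extensions these notes name as covered by the pure-Python OOXML-rich path.
OOXML_RICH_SUFFIXES = {".xlsx", ".xlsm"}


def uses_ooxml_rich_baseline(path):
    """Hypothetical helper: True when a workbook path has an OOXML suffix."""
    return Path(path).suffix.lower() in OOXML_RICH_SUFFIXES


for name in ("report.xlsx", "Macro.XLSM", "legacy.xls"):
    print(name, uses_ooxml_rich_baseline(name))
```

Under this assumption, `legacy.xls` is reported as outside the baseline, matching the note that `.xls` extraction is unchanged in this respect.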
Notes
- The repository docs/build path still has a pre-existing `mkdocstrings` failure in `docs/api.md`; this issue was already reproducible before the `v0.8.0` extraction work and is not introduced by this release.
- Review-driven hardening after the initial implementation also restored `process_excel()` auto-filter behavior, corrected stale README / architecture wording, and prevented OOXML baseline seeding failures from crashing the LibreOffice pipeline.
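
The last hardening item, keeping a failed OOXML baseline seed from taking down the LibreOffice pipeline, follows a common best-effort pattern. The sketch below shows that pattern in general terms; every name in it (`run_pipeline`, `seed`, `enrich`) is hypothetical and not taken from the project's code.

```python
import logging

logger = logging.getLogger("sketch")


def run_pipeline(path, seed, enrich):
    """Seed best-effort artifacts, then enrich; a seeding failure is not fatal."""
    try:
        artifacts = seed(path)
    except Exception:  # broad on purpose: seeding is optional in this sketch
        logger.warning("baseline seeding failed for %s; continuing", path, exc_info=True)
        artifacts = {}
    return enrich(path, artifacts)


def failing_seed(path):
    # Stand-in for a seeding step that hits a corrupt drawing part.
    raise ValueError("corrupt drawing part")


def enrich(path, artifacts):
    # Stand-in for the enrichment step; it keeps whatever was seeded.
    return {**artifacts, "enriched": True}


result = run_pipeline("book.xlsx", failing_seed, enrich)
```

The design choice here is that seeding only ever adds artifacts, so losing it degrades the output instead of aborting the run; the failure is still logged so it does not pass silently.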