v0.6.0 Release Notes

This release adds a new best-effort libreoffice extraction mode for non-COM environments and extends shape/chart metadata with provenance fields.

Highlights

  • Added mode="libreoffice" across the Python API, CLI, and MCP server.
  • Added early validation for .xls + mode="libreoffice" with a clear error.
  • Added extraction-only validation for mode="libreoffice":
      • rejects PDF/PNG rendering
      • rejects auto page-break export
  • Added FallbackReason.LIBREOFFICE_UNAVAILABLE and FallbackReason.LIBREOFFICE_PIPELINE_FAILED.
  • Added backend metadata to shapes/charts:
      • provenance
      • approximation_level
      • confidence
      • serialized output now keeps these fields opt-in via include_backend_metadata
  • Added OOXML-based best-effort reconstruction for:
      • shapes
      • connectors
      • charts
  • Added a LibreOffice runtime helper so server/Linux/macOS environments can opt into rich extraction without Excel COM.
  • Added bundled bridge compatibility probing for LibreOffice Python runtime selection, including fail-fast handling for incompatible EXSTRUCT_LIBREOFFICE_PYTHON_PATH overrides.
  • Added a required Linux GitHub Actions smoke job that installs LibreOffice and python3-uno, then runs the pytest.mark.libreoffice sample smoke test.
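
The opt-in behavior of include_backend_metadata can be pictured with a minimal sketch. Everything here except the three field names and the include_backend_metadata flag is an assumption made for illustration: ShapeSketch, serialize, and the sample values are hypothetical and are not the library's actual models or API.

```python
from dataclasses import dataclass, asdict
from typing import Any, Dict, Optional

# The three backend metadata fields named in these release notes.
BACKEND_FIELDS = ("provenance", "approximation_level", "confidence")


@dataclass
class ShapeSketch:
    """Hypothetical stand-in for a shape model; not the real class."""
    id: int
    text: str
    provenance: Optional[str] = None
    approximation_level: Optional[str] = None
    confidence: Optional[float] = None


def serialize(shape: ShapeSketch, include_backend_metadata: bool = False) -> Dict[str, Any]:
    """Return a dict, dropping backend metadata unless explicitly requested."""
    data = asdict(shape)
    if not include_backend_metadata:
        for name in BACKEND_FIELDS:
            data.pop(name, None)
    return data


# Sample values are illustrative assumptions, not documented enum members.
shape = ShapeSketch(
    id=1,
    text="Start",
    provenance="libreoffice",
    approximation_level="partial",
    confidence=0.6,
)

default_output = serialize(shape)
full_output = serialize(shape, include_backend_metadata=True)
```

With the default call, only id and text remain in the output; passing include_backend_metadata=True keeps all five keys.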

Notes

  • libreoffice is available for .xlsx/.xlsm only.
  • libreoffice is best-effort and not a strict subset of COM output.
  • v1 does not add LibreOffice PDF/PNG rendering or auto page-break extraction.
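
The constraints above amount to a small set of up-front checks. The following sketch shows one way such early validation might look; validate_libreoffice_request, its parameters, and the error messages are hypothetical names chosen for illustration, and the library's real validation and exception types may differ.

```python
from pathlib import Path

# File types the notes list as supported by the libreoffice mode.
SUPPORTED_SUFFIXES = {".xlsx", ".xlsm"}


def validate_libreoffice_request(
    path: str,
    *,
    render_pdf: bool = False,
    render_png: bool = False,
    auto_page_breaks: bool = False,
) -> None:
    """Hypothetical pre-flight check; raises ValueError before any work starts."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_SUFFIXES:
        raise ValueError(
            f"mode='libreoffice' supports .xlsx/.xlsm only, got {suffix or 'no extension'!r}"
        )
    if render_pdf or render_png:
        raise ValueError("mode='libreoffice' is extraction-only: PDF/PNG rendering is rejected")
    if auto_page_breaks:
        raise ValueError("mode='libreoffice' does not support auto page-break export")


def is_rejected(*args, **kwargs) -> bool:
    """Small helper for the usage below: True when validation raises ValueError."""
    try:
        validate_libreoffice_request(*args, **kwargs)
    except ValueError:
        return True
    return False


xlsx_rejected = is_rejected("book.xlsx")
xls_rejected = is_rejected("legacy.xls")
pdf_rejected = is_rejected("book.xlsm", render_pdf=True)
page_break_rejected = is_rejected("book.xlsx", auto_page_breaks=True)
```

Failing before the LibreOffice runtime is launched keeps the error message specific to the unsupported option rather than surfacing as a pipeline failure later.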