API Reference¶
This page shows the primary APIs, minimal runnable examples, expected outputs,
and the dependencies required for optional features. Hyperlinks are included
when include_cell_links=True (or when using mode="verbose"). Auto page-break
areas are COM-only: they appear only when auto page-break extraction/output is
enabled, and the CLI exposes the option only when COM is available.
TOC¶
- API Reference
- TOC
- Quick Examples
- Editing API
- Dependencies
- Auto-generated API mkdocstrings
- Models
- Error Handling
- Tuning Examples
Quick Examples¶
from exstruct import extract, export
wb = extract("sample.xlsx", mode="standard")
export(wb, "out.json") # compact JSON by default
Expected JSON snippet (links appear when enabled):
{
  "book_name": "sample.xlsx",
  "sheets": {
    "Sheet1": {
      "rows": [{ "r": 1, "c": { "0": "Name", "1": "Age" }, "links": null }],
      "shapes": [
        {
          "text": "note",
          "l": 10,
          "t": 20,
          "w": 80,
          "h": 24,
          "type": "TextBox"
        }
      ],
      "charts": [],
      "table_candidates": ["A1:B5"]
    }
  }
}
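As a reading aid, the snippet above can be consumed with the standard library alone. The sketch below assumes the compact JSON shape shown in this example (0-based column indices stored as string keys under "c"); the helper name row_values is hypothetical and not part of ExStruct.

```python
import json

# A trimmed copy of the expected output shown above.
SAMPLE = """
{
  "book_name": "sample.xlsx",
  "sheets": {
    "Sheet1": {
      "rows": [{"r": 1, "c": {"0": "Name", "1": "Age"}, "links": null}],
      "shapes": [],
      "charts": [],
      "table_candidates": ["A1:B5"]
    }
  }
}
"""


def row_values(row: dict) -> list:
    """Return a row's cell values ordered by their 0-based column index."""
    cells = row["c"]
    # Keys are numeric strings, so sort numerically ("10" must follow "9").
    return [cells[key] for key in sorted(cells, key=int)]


book = json.loads(SAMPLE)
for sheet_name, sheet in book["sheets"].items():
    for row in sheet["rows"]:
        print(sheet_name, row["r"], row_values(row))  # Sheet1 1 ['Name', 'Age']
```

The numeric sort only applies to the default keys; with alpha_col=True the keys are letters and would need a different ordering rule.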
CLI-equivalent flow via Python:
from pathlib import Path
from exstruct import process_excel
process_excel(
    file_path=Path("input.xlsx"),
    output_path=None,  # default: stdout (redirect if you want a file)
    sheets_dir=Path("out_sheets"),  # optional per-sheet outputs
    out_fmt="json",
    include_backend_metadata=True,
    image=True,
    pdf=True,
    mode="standard",
    pretty=True,
)
# Same as: exstruct input.xlsx --format json --include-backend-metadata --pdf --image --mode standard --pretty --sheets-dir out_sheets > out.json
Editing API¶
ExStruct also exposes workbook editing under exstruct.edit, but this is a
secondary surface. If you are writing Python code to edit Excel directly,
openpyxl / xlwings are usually simpler choices. Reach for exstruct.edit
when you specifically want the same patch contract used by ExStruct's CLI and
MCP integration layer.
from pathlib import Path
from exstruct.edit import PatchOp, PatchRequest, patch_workbook
result = patch_workbook(
    PatchRequest(
        xlsx_path=Path("book.xlsx"),
        ops=[PatchOp(op="set_value", sheet="Sheet1", cell="A1", value="updated")],
        backend="openpyxl",
    )
)
print(result.out_path)
print(result.engine)
Key points:
- exstruct.edit does not require MCP PathPolicy.
- PatchOp, PatchRequest, MakeRequest, and PatchResult keep the existing MCP patch contract in Phase 1.
- Use list_patch_op_schemas() / get_patch_op_schema() to inspect the public operation schema programmatically.
- The matching operational CLI is exstruct patch, exstruct make, exstruct ops, and exstruct validate.
Backend capability guide:
| Backend | Use it for | Notes |
|---|---|---|
| openpyxl | Basic cell/style/layout edits, plus dry_run, return_inverse_ops, and preflight_formula_check flows | Pure Python path. Not valid for .xls, and not for COM-only ops such as create_chart. |
| com | Highest-fidelity workbook editing, .xls, and COM-only ops such as create_chart | Requires Excel COM. Rejects dry_run, return_inverse_ops, and preflight_formula_check. |
| auto | Default mixed workflow | Resolves to the best supported backend for the request. dry_run, return_inverse_ops, and preflight_formula_check force the openpyxl path even on COM-capable hosts, so inspect PatchResult.engine before assuming the same engine will run the real apply. |
Known editing limits:
- create_chart requires the COM-backed path.
- .xls editing requires COM.
- exstruct.edit does not own PathPolicy, artifact mirroring, or host approval flows.
- Existing MCP compatibility imports remain valid.
For local shell or AI-agent edit workflows, prefer the CLI so you can do
dry_run -> inspect PatchResult -> apply with an explicit backend. Use
backend="openpyxl" when you want the dry run and the real apply to exercise
the same engine. With backend="auto", dry runs resolve to openpyxl while the
real apply may switch to COM on Windows/Excel hosts. For restricted hosts, use
the MCP server, which wraps the same core and adds host policy.
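The capability rules above can be summarized as a small decision function. This is only an illustrative sketch, not ExStruct's implementation: pick_backend, its parameters, the ValueError stand-in, and the assumption that "auto" prefers COM whenever it is available are all hypothetical.

```python
from pathlib import Path

# Flags that, per the capability table, only the openpyxl path supports.
OPENPYXL_ONLY_FLAGS = ("dry_run", "return_inverse_ops", "preflight_formula_check")
# Operations that, per the capability table, only the COM path supports.
COM_ONLY_OPS = {"create_chart"}


def pick_backend(path, ops, *, com_available, backend="auto", **flags):
    """Illustrative only: mirrors the documented rules, not ExStruct's code."""
    if backend not in {"auto", "openpyxl", "com"}:
        raise ValueError(f"unknown backend: {backend!r}")
    needs_openpyxl = any(flags.get(name) for name in OPENPYXL_ONLY_FLAGS)
    needs_com = Path(path).suffix.lower() == ".xls" or bool(COM_ONLY_OPS & set(ops))

    if needs_openpyxl and needs_com:
        raise ValueError("request mixes openpyxl-only flags with COM-only features")
    if backend == "openpyxl":
        if needs_com:
            raise ValueError("openpyxl cannot handle .xls or COM-only ops")
        return "openpyxl"
    if backend == "com":
        if needs_openpyxl:
            raise ValueError("com rejects dry_run/return_inverse_ops/preflight_formula_check")
        if not com_available:
            raise ValueError("com backend requested but Excel COM is unavailable")
        return "com"
    # backend == "auto": openpyxl-only flags win, then COM-only needs.
    if needs_openpyxl:
        return "openpyxl"
    if needs_com:
        if not com_available:
            raise ValueError("request needs Excel COM, which is unavailable")
        return "com"
    # Assumption: "best supported backend" means COM when the host has it.
    return "com" if com_available else "openpyxl"


# Same request, same host: the dry run and the real apply may use different engines.
print(pick_backend("book.xlsx", ["set_value"], com_available=True, dry_run=True))
print(pick_backend("book.xlsx", ["set_value"], com_available=True))
```

Under these assumptions the two calls resolve to different engines, which is why pinning backend="openpyxl" keeps a dry run and its apply on the same path.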
Dependencies¶
- Core extraction: pandas, openpyxl (installed with the package).
- YAML export: pyyaml (lazy import; a missing module raises MissingDependencyError).
- TOON export: python-toon (lazy import; a missing module raises MissingDependencyError).
- Auto page-break extraction/export: Excel + COM required. mode="libreoffice" rejects auto page-break requests with ConfigError.
- Rendering (PDF/PNG): Excel + COM + pypdfium2 are mandatory. Missing Excel/COM or pypdfium2 surfaces as RenderError / MissingDependencyError, and mode="libreoffice" rejects PDF/PNG requests with ConfigError.
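The lazy-import behavior described above follows a common pattern: import the optional module only when the feature is used, and convert a missing module into an actionable error. The following is a minimal sketch of that pattern, not ExStruct's code; lazy_import is a hypothetical helper, and the MissingDependencyError class here is a local stand-in whose base class is an assumption.

```python
import importlib


class MissingDependencyError(RuntimeError):
    """Stand-in for illustration; ExStruct defines its own exception class."""


def lazy_import(module_name: str, pip_name: str):
    """Import an optional module on first use, with an actionable error if absent."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        raise MissingDependencyError(
            f"{module_name!r} is required for this feature; try: pip install {pip_name}"
        ) from exc


# A module that exists is returned as usual.
json_module = lazy_import("json", "json")
print(json_module.dumps({"ok": True}))

# A module that does not exist surfaces as the dedicated error type.
try:
    lazy_import("no_such_module_for_this_example", "example-package")
except MissingDependencyError as exc:
    print(exc)
```

Deferring the import keeps the core install small while still telling the user exactly which package to add.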
Auto-generated API (mkdocstrings)¶
For the latest Python API details, see the auto-generated section below (kept in sync with the docstrings).
Core functions¶
exstruct.extract ¶
extract(file_path: str | Path, mode: ExtractionMode = 'standard', *, alpha_col: bool = False) -> WorkbookData
Extracts an Excel workbook into a WorkbookData structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | str \| Path | Path to the workbook file (.xlsx, .xlsm, .xls). | required |
| mode | ExtractionMode | Extraction detail level. "light" includes cells and table detection only (no COM, shapes/charts empty; print areas via openpyxl). "libreoffice" is a best-effort non-COM mode that adds merged cells, shapes, connectors, and charts when the LibreOffice backend is available. "standard" includes texted shapes, arrows, charts (COM if available) and print areas. "verbose" also includes shape/chart sizes, cell link map, colors map, and formulas map. | 'standard' |
| alpha_col | bool | When True, convert CellRow column keys to Excel-style ABC names (A, B, ..., Z, AA, ...) instead of 0-based numeric strings. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| WorkbookData | WorkbookData | Parsed workbook representation containing sheets, rows, shapes, charts, and print areas. |
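To make the alpha_col key conversion concrete, here is a minimal sketch of mapping 0-based numeric column keys to Excel-style letters. The function names column_letters and to_alpha_keys are hypothetical; this illustrates the naming scheme only and is not ExStruct's internal conversion.

```python
def column_letters(index: int) -> str:
    """Convert a 0-based column index to Excel-style letters (0 -> A, 26 -> AA)."""
    if index < 0:
        raise ValueError("column index must be non-negative")
    letters = ""
    n = index + 1  # switch to 1-based, bijective base-26
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def to_alpha_keys(cells: dict) -> dict:
    """Re-key a row's cells from "0", "1", ... to "A", "B", ..."""
    return {column_letters(int(key)): value for key, value in cells.items()}


print(to_alpha_keys({"0": "Name", "1": "Age"}))  # {'A': 'Name', 'B': 'Age'}
print(column_letters(25), column_letters(26))  # Z AA
```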
exstruct.export ¶
export(data: WorkbookData, path: str | Path, fmt: Literal['json', 'yaml', 'yml', 'toon'] | None = None, *, pretty: bool = False, indent: int | None = None, include_backend_metadata: bool = False) -> None
Save WorkbookData to a file (format inferred from extension).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | WorkbookData | WorkbookData from | required |
| path | str \| Path | destination path; extension is used to infer format | required |
| fmt | Literal['json', 'yaml', 'yml', 'toon'] \| None | explicitly set format if desired (json/yaml/yml/toon) | None |
| pretty | bool | pretty-print JSON | False |
| indent | int \| None | JSON indent width (defaults to 2 when pretty=True and indent is None) | None |
Raises:
| Type | Description |
|---|---|
| ValueError | If the format is unsupported. |
Examples:
Write pretty JSON and YAML (requires pyyaml):
>>> from exstruct import export, extract
>>> wb = extract("input.xlsx")
>>> export(wb, "out.json", pretty=True)
>>> export(wb, "out.yaml", fmt="yaml")
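The two rules documented for export (format inferred from the extension unless fmt is given, and indent defaulting to 2 when pretty=True) can be sketched with the standard library. This is an illustration under assumptions, not ExStruct's code: resolve_format and dump_json are hypothetical names, the compact separators are a guess at what "compact" means, and honoring an explicit indent when pretty is False is also an assumption.

```python
import json
from pathlib import Path

SUPPORTED = {"json", "yaml", "yml", "toon"}


def resolve_format(path, fmt=None):
    """Use the explicit fmt if given, otherwise the file suffix without its dot."""
    chosen = (fmt or Path(path).suffix.lstrip(".")).lower()
    if chosen not in SUPPORTED:
        raise ValueError(f"unsupported format: {chosen!r}")
    return chosen


def dump_json(payload, *, pretty=False, indent=None):
    """Compact by default; indent falls back to 2 when pretty=True and indent is None."""
    if indent is None and pretty:
        indent = 2
    if indent is None:
        return json.dumps(payload, ensure_ascii=False, separators=(",", ":"))
    return json.dumps(payload, ensure_ascii=False, indent=indent)


print(resolve_format("out.json"))  # json
print(resolve_format("out.txt", fmt="yaml"))  # yaml
print(dump_json({"book_name": "sample.xlsx"}))  # {"book_name":"sample.xlsx"}
print(dump_json({"book_name": "sample.xlsx"}, pretty=True))
```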
exstruct.export_sheets ¶
export_sheets(data: WorkbookData, dir_path: str | Path, *, include_backend_metadata: bool = False) -> dict[str, Path]
Export each sheet as an individual JSON file.
- Payload: {book_name, sheet_name, sheet: SheetData}
- Returns: {sheet_name: Path}
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | WorkbookData | WorkbookData to split by sheet. | required |
| dir_path | str \| Path | Output directory. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, Path] | Mapping from sheet name to written JSON path. |
Examples:
>>> from exstruct import export_sheets, extract
>>> wb = extract("input.xlsx")
>>> paths = export_sheets(wb, "out_sheets")
>>> "Sheet1" in paths
True
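The per-sheet payload shape described above ({book_name, sheet_name, sheet}) can be pictured with plain dictionaries. The sketch below is a stand-in that works on already-serialized data; write_sheet_files is a hypothetical helper, and the "<sheet name>.json" file naming is an assumption rather than ExStruct's actual naming.

```python
import json
import tempfile
from pathlib import Path


def write_sheet_files(book: dict, dir_path) -> dict:
    """Write one JSON file per sheet and return {sheet_name: path}."""
    out_dir = Path(dir_path)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for sheet_name, sheet in book["sheets"].items():
        payload = {
            "book_name": book["book_name"],
            "sheet_name": sheet_name,
            "sheet": sheet,
        }
        # File naming is an assumption; real sheet names may need sanitizing.
        target = out_dir / f"{sheet_name}.json"
        target.write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")
        written[sheet_name] = target
    return written


book = {
    "book_name": "sample.xlsx",
    "sheets": {"Sheet1": {"rows": []}, "Sheet2": {"rows": []}},
}
with tempfile.TemporaryDirectory() as tmp:
    paths = write_sheet_files(book, tmp)
    print(sorted(paths))  # ['Sheet1', 'Sheet2']
    first = json.loads(paths["Sheet1"].read_text(encoding="utf-8"))
    print(first["sheet_name"])  # Sheet1
```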
exstruct.export_sheets_as ¶
export_sheets_as(data: WorkbookData, dir_path: str | Path, fmt: Literal['json', 'yaml', 'yml', 'toon'] = 'json', *, pretty: bool = False, indent: int | None = None, include_backend_metadata: bool = False) -> dict[str, Path]
Export each sheet in the given format (json/yaml/toon); returns sheet name to path map.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | WorkbookData | WorkbookData to split by sheet. | required |
| dir_path | str \| Path | Output directory. | required |
| fmt | Literal['json', 'yaml', 'yml', 'toon'] | Output format; defaults to json. | 'json' |
| pretty | bool | Pretty-print JSON. | False |
| indent | int \| None | JSON indent width (defaults to 2 when pretty=True and indent is None). | None |
Returns:
| Type | Description |
|---|---|
| dict[str, Path] | Mapping from sheet name to written file path. |
Raises:
| Type | Description |
|---|---|
| ValueError | If an unsupported format is passed. |
Examples:
Export per sheet as YAML (requires pyyaml):
>>> from exstruct import export_sheets_as, extract
>>> wb = extract("input.xlsx")
>>> _ = export_sheets_as(wb, "out_yaml", fmt="yaml")
exstruct.export_print_areas_as ¶
export_print_areas_as(data: WorkbookData, dir_path: str | Path, fmt: Literal['json', 'yaml', 'yml', 'toon'] = 'json', *, pretty: bool = False, indent: int | None = None, normalize: bool = False, include_backend_metadata: bool = False) -> dict[str, Path]
Export each print area as a PrintAreaView.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | WorkbookData | WorkbookData that contains print areas | required |
| dir_path | str \| Path | output directory | required |
| fmt | Literal['json', 'yaml', 'yml', 'toon'] | json/yaml/yml/toon | 'json' |
| pretty | bool | Pretty-print JSON output. | False |
| indent | int \| None | JSON indent width (defaults to 2 when pretty is True and indent is None). | None |
| normalize | bool | rebase row/col indices to the print-area origin when True | False |
Returns:
| Type | Description |
|---|---|
| dict[str, Path] | dict mapping area key to path (e.g., "Sheet1#1": /.../Sheet1_area1_...json) |
Examples:
Export print areas when present:
>>> from exstruct import export_print_areas_as, extract
>>> wb = extract("input.xlsx", mode="standard")
>>> paths = export_print_areas_as(wb, "areas")
>>> isinstance(paths, dict)
True
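To picture what normalize=True means by "rebase row/col indices to the print-area origin", here is a small sketch on plain row dictionaries. It is an assumption-heavy illustration: rebase_rows and its parameters are hypothetical, the sample data is invented, and whether ExStruct's rebased rows start at 0 or 1 is not stated on this page (the sketch uses 0 for both axes).

```python
def rebase_rows(rows, origin_row, origin_col):
    """Shift row numbers and column keys so the area's top-left cell becomes (0, 0)."""
    rebased = []
    for row in rows:
        cells = {str(int(col) - origin_col): value for col, value in row["c"].items()}
        rebased.append({"r": row["r"] - origin_row, "c": cells})
    return rebased


# A print area whose top-left cell sits at row 5, column 2 (hypothetical data).
area_rows = [
    {"r": 5, "c": {"2": "Item", "3": "Qty"}},
    {"r": 6, "c": {"2": "Bolt", "3": 40}},
]
for row in rebase_rows(area_rows, origin_row=5, origin_col=2):
    print(row)
```

Rebased indices make each exported area self-contained, at the cost of losing its absolute position on the sheet.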
exstruct.export_auto_page_breaks ¶
export_auto_page_breaks(data: WorkbookData, dir_path: str | Path, fmt: Literal['json', 'yaml', 'yml', 'toon'] = 'json', *, pretty: bool = False, indent: int | None = None, normalize: bool = False, include_backend_metadata: bool = False) -> dict[str, Path]
Export auto page-break areas (COM-computed) as PrintAreaView files.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | WorkbookData | WorkbookData containing auto_print_areas (COM extraction with auto breaks enabled) | required |
| dir_path | str \| Path | output directory | required |
| fmt | Literal['json', 'yaml', 'yml', 'toon'] | json/yaml/yml/toon | 'json' |
| pretty | bool | Pretty-print JSON output. | False |
| indent | int \| None | JSON indent width (defaults to 2 when pretty is True and indent is None). | None |
| normalize | bool | rebase row/col indices to the area origin when True | False |
Returns:
| Type | Description |
|---|---|
| dict[str, Path] | dict mapping area key to path (e.g., "Sheet1#1": /.../Sheet1_auto_page1_...json) |
Raises:
| Type | Description |
|---|---|
| PrintAreaError | If no auto page-break areas are present. |
Examples:
>>> from exstruct import export_auto_page_breaks, extract
>>> wb = extract("input.xlsx", mode="standard")
>>> try:
...     export_auto_page_breaks(wb, "auto_areas")
... except ValueError:  # PrintAreaError is ValueError-compatible
...     pass
exstruct.export_pdf ¶
export_pdf(excel_path: str | Path, output_pdf: str | Path) -> list[str]
Export an Excel workbook to PDF via Excel COM and return sheet names in order.
exstruct.export_sheet_images ¶
export_sheet_images(excel_path: str | Path, output_dir: str | Path, dpi: int = 144, *, sheet: str | None = None, a1_range: str | None = None) -> list[Path]
Export each worksheet in the given Excel workbook to PNG files and return the image paths in workbook order.
Returns:
| Name | Type | Description |
|---|---|---|
| paths | list[Path] | Paths to the generated PNG files, ordered by the corresponding worksheets. |
Raises:
| Type | Description |
|---|---|
| RenderError | If export or rendering fails. |
exstruct.process_excel ¶
process_excel(file_path: str | Path, output_path: str | Path | None = None, out_fmt: str = 'json', image: bool = False, pdf: bool = False, dpi: int = 72, mode: ExtractionMode = 'standard', pretty: bool = False, indent: int | None = None, sheets_dir: str | Path | None = None, print_areas_dir: str | Path | None = None, auto_page_breaks_dir: str | Path | None = None, stream: TextIO | None = None, *, alpha_col: bool = False, include_backend_metadata: bool = False) -> None
Convenience wrapper: extract -> serialize (file or stdout) -> optional PDF/PNG.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | str \| Path | Input Excel workbook (path string or Path). | required |
| output_path | str \| Path \| None | None for stdout; otherwise, write to file (string or Path). | None |
| out_fmt | str | json/yaml/yml/toon. | 'json' |
| image | bool | True to also output PNGs (requires Excel + COM + pypdfium2 and is not supported in | False |
| pdf | bool | True to also output PDF (requires Excel + COM + pypdfium2 and is not supported in | False |
| dpi | int | DPI for image output. | 72 |
| mode | ExtractionMode | light/libreoffice/standard/verbose (same meaning as | 'standard' |
| pretty | bool | Pretty-print JSON. | False |
| indent | int \| None | JSON indent width. | None |
| sheets_dir | str \| Path \| None | Directory to write per-sheet files (string or Path). | None |
| print_areas_dir | str \| Path \| None | Directory to write per-print-area files (string or Path). | None |
| auto_page_breaks_dir | str \| Path \| None | Directory to write per-auto-page-break files (COM only and not supported in | None |
| stream | TextIO \| None | IO override when output_path is None. | None |
| alpha_col | bool | When True, convert CellRow column keys to Excel-style ABC names (A, B, ...) instead of 0-based numeric strings. | False |
| include_backend_metadata | bool | When True, include shape/chart backend metadata fields ( | False |
Raises:
| Type | Description |
|---|---|
| ConfigError | If |
| ValueError | If an unsupported format or mode is given. |
| PrintAreaError | When exporting auto page breaks without available data. |
| RenderError | When rendering fails (Excel/COM/pypdfium2 issues). |
Examples:
Extract and write JSON to stdout, plus per-sheet files:
>>> from pathlib import Path
>>> from exstruct import process_excel
>>> process_excel(Path("input.xlsx"), output_path=None, sheets_dir=Path("sheets"))
Write JSON to a file and also render a PDF (COM + Excel required):
>>> process_excel(Path("input.xlsx"), output_path=Path("out.json"), pdf=True)
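The interplay of output_path and stream (file when a path is given, otherwise stdout unless a stream override is supplied) can be sketched as follows. This is a generic illustration of that precedence, assuming nothing about ExStruct's internals; write_output is a hypothetical helper.

```python
import io
import sys
from pathlib import Path


def write_output(text, output_path=None, stream=None):
    """Write to a file when a path is given; otherwise to the stream or stdout."""
    if output_path is not None:
        Path(output_path).write_text(text, encoding="utf-8")
        return
    target = stream if stream is not None else sys.stdout
    target.write(text)


# Capture what would otherwise go to stdout, e.g. in a test.
buffer = io.StringIO()
write_output('{"book_name":"sample.xlsx"}', stream=buffer)
print(buffer.getvalue())  # {"book_name":"sample.xlsx"}
```

Passing an in-memory stream like this is a convenient way to capture serialized output without touching the filesystem.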
Editing functions¶
exstruct.edit.patch_workbook ¶
patch_workbook(request: PatchRequest) -> PatchResult
Edit an existing workbook without MCP path policy enforcement.
exstruct.edit.make_workbook ¶
make_workbook(request: MakeRequest) -> PatchResult
Create a new workbook and apply initial patch operations.
Engine and options¶
exstruct.engine.ExStructEngine ¶
Configurable engine for ExStruct extraction and export.
Instances are immutable; override options per call if needed.
Key behaviors
- StructOptions: extraction mode and optional table detection params.
- OutputOptions: serialization format/pretty-print, include/exclude filters, per-sheet/per-print-area output dirs, etc.
- Main methods:
  - extract(path, mode=None) -> WorkbookData
    - Modes: light/libreoffice/standard/verbose
    - light: COM-free; cells + tables + print areas only (shapes/charts empty)
  - serialize(workbook, ...) -> str
    - Applies include_* filters, then serializes
  - export(workbook, ...)
    - Writes to file/stdout; optionally per-sheet and per-print-area files
  - process(file_path, ...)
    - One-shot extract->export (CLI equivalent), with optional PDF/PNG
__init__ ¶
__init__(options: StructOptions | None = None, output: OutputOptions | None = None) -> None
Initialize the engine with optional struct/output options.
from_defaults staticmethod ¶
from_defaults() -> ExStructEngine
Factory to create an engine with default options.
extract ¶
extract(file_path: str | Path, *, mode: ExtractionMode | None = None, _auto_page_breaks_dir_override: Path | None | object = _AUTO_PAGE_BREAKS_DIR_UNSET) -> WorkbookData
Produce a normalized WorkbookData extracted from the given workbook file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | str \| Path | Path to the .xlsx/.xlsm/.xls file to extract. | required |
| mode | ExtractionMode \| None | Extraction mode to use; if None the engine's configured mode is used. Modes: "light", "libreoffice", "standard", "verbose". | None |
Returns:
| Name | Type | Description |
|---|---|---|
| WorkbookData | WorkbookData | Normalized workbook data extracted from the file. |
serialize ¶
serialize(data: WorkbookData, *, fmt: Literal['json', 'yaml', 'yml', 'toon'] | None = None, pretty: bool | None = None, indent: int | None = None) -> str
Serialize a workbook after applying include/exclude filters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | WorkbookData | Workbook to serialize after filtering. | required |
| fmt | Literal['json', 'yaml', 'yml', 'toon'] \| None | Serialization format; defaults to OutputOptions.format.fmt. | None |
| pretty | bool \| None | Whether to pretty-print JSON output. | None |
| indent | int \| None | Indentation to use when pretty-printing JSON. | None |
export ¶
export(data: WorkbookData, output_path: str | Path | None = None, *, fmt: Literal['json', 'yaml', 'yml', 'toon'] | None = None, pretty: bool | None = None, indent: int | None = None, sheets_dir: str | Path | None = None, print_areas_dir: str | Path | None = None, auto_page_breaks_dir: str | Path | None = None, stream: TextIO | None = None) -> None
Write filtered workbook data to a file or stream.
Includes optional per-sheet and per-print-area outputs when destinations are provided.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | WorkbookData | Workbook to serialize and write. | required |
| output_path | str \| Path \| None | Target file path (str or Path); writes to stdout when None. | None |
| fmt | Literal['json', 'yaml', 'yml', 'toon'] \| None | Serialization format; defaults to OutputOptions.format.fmt. | None |
| pretty | bool \| None | Whether to pretty-print JSON output. | None |
| indent | int \| None | Indentation to use when pretty-printing JSON. | None |
| sheets_dir | str \| Path \| None | Directory for per-sheet outputs when provided (str or Path). | None |
| print_areas_dir | str \| Path \| None | Directory for per-print-area outputs when provided (str or Path). | None |
| auto_page_breaks_dir | str \| Path \| None | Directory for auto page-break outputs (str or Path; COM environments only). | None |
| stream | TextIO \| None | Stream override when output_path is None. | None |
process ¶
process(file_path: str | Path, output_path: str | Path | None = None, *, out_fmt: str | None = None, image: bool = False, pdf: bool = False, dpi: int = 72, mode: ExtractionMode | None = None, pretty: bool | None = None, indent: int | None = None, sheets_dir: str | Path | None = None, print_areas_dir: str | Path | None = None, auto_page_breaks_dir: str | Path | None = None, stream: TextIO | None = None) -> None
One-shot extract->export wrapper (CLI equivalent) with optional PDF/PNG output.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file_path | str \| Path | Input Excel workbook path (str or Path). | required |
| output_path | str \| Path \| None | Target file path (str or Path); writes to stdout when None. | None |
| out_fmt | str \| None | Serialization format for structured output. | None |
| image | bool | Whether to export PNGs alongside structured output. Requires Excel COM and is not supported in | False |
| pdf | bool | Whether to export a PDF snapshot alongside structured output. Requires Excel COM and is not supported in | False |
| dpi | int | DPI to use when rendering images. | 72 |
| mode | ExtractionMode \| None | Extraction mode; defaults to the engine's StructOptions.mode. | None |
| pretty | bool \| None | Whether to pretty-print JSON output. | None |
| indent | int \| None | Indentation to use when pretty-printing JSON. | None |
| sheets_dir | str \| Path \| None | Directory for per-sheet structured outputs (str or Path). | None |
| print_areas_dir | str \| Path \| None | Directory for per-print-area structured outputs (str or Path). | None |
| auto_page_breaks_dir | str \| Path \| None | Directory for auto page-break outputs (str or Path). Requires Excel COM and is not supported in | None |
| stream | TextIO \| None | Stream override when writing to stdout. | None |
Raises:
| Type | Description |
|---|---|
| ConfigError | If |
exstruct.engine.StructOptions dataclass ¶
Extraction-time options for ExStructEngine.
Attributes:
| Name | Type | Description |
|---|---|---|
| mode | ExtractionMode | Extraction mode. One of "light", "libreoffice", "standard", "verbose". light: cells + table candidates only (no COM, shapes/charts empty); libreoffice: best-effort non-COM mode using the LibreOffice backend; standard: texted shapes + arrows + charts (if COM available); verbose: all shapes (width/height), charts, table candidates. |
| table_params | TableParams \| None | Optional dict passed to |
| include_colors_map | bool \| None | Whether to extract background color maps. |
| include_formulas_map | bool \| None | Whether to extract formulas map. |
| include_merged_cells | bool \| None | Whether to extract merged cell ranges. |
| include_merged_values_in_rows | bool | Whether to keep merged values in rows. |
| colors | ColorsOptions | Color extraction options. |
| alpha_col | bool | When True, convert CellRow column keys to Excel-style ABC names (A, B, ..., Z, AA, ...) instead of 0-based numeric strings. |
exstruct.engine.OutputOptions ¶
Bases: BaseModel
Output-time options for ExStructEngine.
- format: serialization format/indent.
- filters: include/exclude flags (rows/shapes/charts/tables/print_areas, size flags).
- destinations: side outputs (per-sheet, per-print-area, stream override).
exstruct.engine.FormatOptions ¶
Bases: BaseModel
Formatting options for serialization.
exstruct.engine.FilterOptions ¶
Bases: BaseModel
Include/exclude filters for output.
exstruct.engine.DestinationOptions ¶
Bases: BaseModel
Destinations for optional side outputs.
exstruct.engine.ColorsOptions ¶
Bases: BaseModel
Color extraction options.
Examples:
>>> ColorsOptions(
... include_default_background=False,
... ignore_colors=["#FFFFFF", "AD3815", "theme:1:0.2", "indexed:64", "auto"],
... )
ignore_colors_set ¶
ignore_colors_set() -> set[str]
Return ignore_colors as a set of normalized strings.
Returns:
| Type | Description |
|---|---|
| set[str] | Set of color keys to ignore. |
Models¶
See generated/models.md for the detailed model fields (run python scripts/gen_model_docs.py to refresh).
Model helpers for SheetData and WorkbookData¶
- to_json(pretty=False, indent=None, include_backend_metadata=False) → JSON string (pretty when requested)
- to_yaml(include_backend_metadata=False) → YAML string (requires pyyaml)
- to_toon(include_backend_metadata=False) → TOON string (requires python-toon)
- save(path, pretty=False, indent=None, include_backend_metadata=False) → infers format from suffix (.json/.yaml/.yml/.toon)
- WorkbookData.__getitem__(name) → get a SheetData by name
- WorkbookData.__iter__() → yields (sheet_name, SheetData) in order
Serialized output omits shape/chart backend metadata (provenance, approximation_level, confidence) by default to reduce token usage. Set include_backend_metadata=True when you need those fields.
Example:
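from exstruct import extract

wb = extract("input.xlsx")
first = wb["Sheet1"]  # look up a SheetData by name
for name, sheet in wb:  # iterates (sheet_name, SheetData) in order
    print(name, len(sheet.rows))
wb.save("out.json", pretty=True)
first.save("sheet.yaml")  # requires pyyaml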
Error Handling¶
- Exception types:
  - SerializationError: Unsupported format requested (serialize_workbook, export APIs).
  - MissingDependencyError: Optional dependency (pyyaml/python-toon/pypdfium2) is missing; message includes install instructions.
  - ConfigError: Invalid option combinations such as mode="libreoffice" with PDF/PNG rendering or auto page-break export.
  - RenderError: Excel/COM is unavailable or PDF/PNG rendering fails.
  - PrintAreaError (ValueError-compatible): export_auto_page_breaks invoked when no auto_print_areas are available.
  - OutputError: Writing output to disk/stream failed (original exception kept in __cause__).
  - ValueError: Invalid inputs such as an unsupported mode.
- Excel COM unavailable: extraction falls back to cells + table_candidates; shapes/charts are empty, warning is logged.
- No print areas: export_print_areas_as writes nothing and returns {}; this is not an error.
- Auto page-break export: export_auto_page_breaks raises PrintAreaError if no auto page-break areas are present (enable them via DestinationOptions.auto_page_breaks_dir).
- CLI mirrors these behaviors: exits non-zero on failures, prints messages in English.
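Because PrintAreaError is described as ValueError-compatible, callers that already catch ValueError also cover the missing-areas case. The sketch below demonstrates this with local stand-ins: the PrintAreaError class and export_auto_page_breaks_stub are hypothetical substitutes written for this example, not imports from ExStruct.

```python
class PrintAreaError(ValueError):
    """Stand-in: this page describes PrintAreaError as ValueError-compatible."""


def export_auto_page_breaks_stub(auto_print_areas):
    """Hypothetical stand-in that mimics the documented failure mode."""
    if not auto_print_areas:
        raise PrintAreaError("no auto page-break areas are present")
    return {f"Sheet1#{i}": area for i, area in enumerate(auto_print_areas, start=1)}


def safe_export(auto_print_areas):
    """Treat the missing-areas case as 'nothing to export' instead of a crash."""
    try:
        return export_auto_page_breaks_stub(auto_print_areas)
    except ValueError as exc:  # also catches PrintAreaError
        print(f"skipped auto page-break export: {exc}")
        return {}


print(safe_export([]))  # {}
print(safe_export(["A1:B2"]))  # {'Sheet1#1': 'A1:B2'}
```

Catching the narrower PrintAreaError instead is preferable when other ValueError cases (such as an unsupported mode) should still propagate.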
Tuning Examples¶
- Reduce false positives (layout frames):
set_table_detection_params(table_score_threshold=0.4, coverage_min=0.25)
- Recover missed tiny tables:
set_table_detection_params(density_min=0.03, min_nonempty_cells=2)