Data Models

CellRow

A single row of cells with optional hyperlinks.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| c | dict[str, int \| float \| str] | Yes | - | Column index (string) to cell value map. |
| links | dict[str, str] \| None | No | None | Optional hyperlinks per column index. |
| r | int | Yes | - | Row index (1-based). |
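Because the keys of `c` and `links` are column indices stored as strings, a plain sort orders them lexicographically ("10" before "2"). The sketch below uses a plain dict as a stand-in for a serialized CellRow; the sample values and the `ordered_cells` helper are illustrative assumptions, not part of the library.

```python
from __future__ import annotations

# Hypothetical CellRow payload as a plain dict; field names follow the table above.
row = {
    "r": 3,
    "c": {"10": "total", "2": 41.5, "0": "id-7"},
    "links": {"0": "https://example.com/item/7"},
}


def ordered_cells(row: dict) -> list[tuple[int, object, str | None]]:
    """Return (column index, value, link) tuples sorted by numeric column index."""
    links = row.get("links") or {}  # links may be None or missing
    return [
        (int(col), value, links.get(col))
        for col, value in sorted(row["c"].items(), key=lambda kv: int(kv[0]))
    ]


cells = ordered_cells(row)
```

Converting each key with `int()` before sorting keeps the columns in sheet order regardless of how many digits the index has.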

Chart

Chart metadata including series and layout.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| chart_type | str | Yes | - | Chart type (e.g., Column, Line). |
| error | str \| None | No | None | Extraction error detail if any. |
| h | int \| None | No | None | Chart height (None if unknown). |
| l | int | Yes | - | Left offset (Excel units). |
| name | str | Yes | - | Chart name. |
| series | list[ChartSeries] | Yes | - | Series included in the chart. |
| t | int | Yes | - | Top offset (Excel units). |
| title | str \| None | No | None | Chart title. |
| w | int \| None | No | None | Chart width (None if unknown). |
| y_axis_range | list[float] | No |  | Y-axis range [min, max] when available. |
| y_axis_title | str | Yes | - | Y-axis title. |

ChartSeries

Series metadata for a chart.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | str | Yes | - | Series display name. |
| name_range | str \| None | No | None | Range reference for the series name. |
| x_range | str \| None | No | None | Range reference for X axis values. |
| y_range | str \| None | No | None | Range reference for Y axis values. |

DestinationOptions

Destinations for optional side outputs.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| auto_page_breaks_dir | str \| Path \| None | No | None | Directory to write auto page-break files. |
| print_areas_dir | str \| Path \| None | No | None | Directory to write per-print-area files. |
| sheets_dir | str \| Path \| None | No | None | Directory to write per-sheet files. |
| stream | TextIO \| None | No | None | Stream override for primary output (stdout/file). |

FilterOptions

Include/exclude filters for output.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| include_auto_print_areas | bool | No | False | Include COM-computed auto page-break areas. |
| include_chart_size | bool \| None | No | None | Include chart size; None -> auto (verbose=True, others=False). |
| include_charts | bool | No | True | Include charts. |
| include_print_areas | bool \| None | No | None | Include print areas; None -> auto (light=False, others=True). |
| include_rows | bool | No | True | Include cell rows. |
| include_shape_size | bool \| None | No | None | Include shape size; None -> auto (verbose=True, others=False). |
| include_shapes | bool | No | True | Include shapes. |
| include_tables | bool | No | True | Include table candidate ranges. |
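The three `bool | None` flags are tri-state: an explicit True or False wins, and None defers to a mode-dependent default. The sketch below shows one way that resolution could look. The function name `resolve_auto_flags` and the mode name "standard" (standing in for "others") are assumptions made for illustration; only "light" and "verbose" appear in the table.

```python
from __future__ import annotations


def _pick(explicit: bool | None, auto: bool) -> bool:
    """Use the explicit flag when given, otherwise the mode-derived default."""
    return auto if explicit is None else explicit


def resolve_auto_flags(
    mode: str,
    include_print_areas: bool | None = None,
    include_chart_size: bool | None = None,
    include_shape_size: bool | None = None,
) -> dict[str, bool]:
    """Resolve None flags following the rules stated in the table above."""
    return {
        # None -> auto (light=False, others=True)
        "include_print_areas": _pick(include_print_areas, mode != "light"),
        # None -> auto (verbose=True, others=False)
        "include_chart_size": _pick(include_chart_size, mode == "verbose"),
        "include_shape_size": _pick(include_shape_size, mode == "verbose"),
    }


light = resolve_auto_flags("light")
verbose = resolve_auto_flags("verbose")
forced = resolve_auto_flags("light", include_print_areas=True)
```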

FormatOptions

Formatting options for serialization.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| fmt | Literal['json', 'yaml', 'yml', 'toon'] | No | 'json' | Serialization format. |
| indent | int \| None | No | None | Indent width for JSON (defaults to 2 when pretty is True). |
| pretty | bool | No | False | Pretty-print JSON output. |
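The interaction between `pretty` and `indent` for JSON can be sketched with the standard library. This is a minimal illustration, not the library's serializer: `dump_json` is a hypothetical helper, and it assumes that an explicit `indent` is honored even when `pretty` is False, which the table does not specify.

```python
from __future__ import annotations

import json


def dump_json(data: object, pretty: bool = False, indent: int | None = None) -> str:
    """Serialize to JSON, defaulting indent to 2 when pretty is requested."""
    if indent is None and pretty:
        indent = 2  # documented default when pretty is True
    # Assumption: an explicit indent is used as given, even if pretty is False.
    return json.dumps(data, indent=indent, ensure_ascii=False)


compact = dump_json({"a": 1})
pretty_default = dump_json({"a": 1}, pretty=True)
pretty_wide = dump_json({"a": 1}, pretty=True, indent=4)
```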

OutputOptions

Output-time options for ExStructEngine.

  • format: serialization format/indent.
  • filters: include/exclude flags (rows/shapes/charts/tables/print_areas, size flags).
  • destinations: side outputs (per-sheet, per-print-area, stream override).

Legacy flat fields (fmt, pretty, indent, include_*, sheets_dir, print_areas_dir, stream) are still accepted and normalized into the nested structures.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| destinations | DestinationOptions | No |  | Side output destinations. |
| filters | FilterOptions | No |  | Include/exclude flags. |
| format | FormatOptions | No |  | Formatting options. |
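To make the legacy-field normalization concrete, the sketch below routes flat keyword arguments into the three nested groups. It is only an illustration of the idea using plain dicts: `normalize_options` is a hypothetical function, and rejecting unknown keys with TypeError is an assumption about behavior the text does not describe.

```python
# Legacy flat field names listed in the paragraph above.
FORMAT_KEYS = {"fmt", "pretty", "indent"}
DESTINATION_KEYS = {"sheets_dir", "print_areas_dir", "stream"}


def normalize_options(**flat: object) -> dict[str, dict[str, object]]:
    """Group legacy flat fields under format / filters / destinations."""
    nested: dict[str, dict[str, object]] = {
        "format": {},
        "filters": {},
        "destinations": {},
    }
    for key, value in flat.items():
        if key in FORMAT_KEYS:
            nested["format"][key] = value
        elif key.startswith("include_"):
            nested["filters"][key] = value
        elif key in DESTINATION_KEYS:
            nested["destinations"][key] = value
        else:
            # Assumption: unknown names are rejected rather than silently dropped.
            raise TypeError(f"unknown option: {key}")
    return nested


options = normalize_options(fmt="yaml", include_shapes=False, sheets_dir="out")
```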

PrintArea

Cell coordinate bounds for a print area.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| c1 | int | Yes | - | Start column (1-based). |
| c2 | int | Yes | - | End column (1-based, inclusive). |
| r1 | int | Yes | - | Start row (1-based). |
| r2 | int | Yes | - | End row (1-based, inclusive). |
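Since both ends of each range are inclusive, membership tests use `<=` on both sides and the cell count needs a `+ 1` per axis. The `Area` dataclass below is a hypothetical stand-in that mirrors the four fields; it is not the library's PrintArea class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """Stand-in for PrintArea: 1-based bounds, inclusive at both ends."""

    r1: int
    c1: int
    r2: int
    c2: int

    def contains(self, row: int, col: int) -> bool:
        """True when the 1-based (row, col) cell lies inside the bounds."""
        return self.r1 <= row <= self.r2 and self.c1 <= col <= self.c2

    def cell_count(self) -> int:
        """Number of cells covered; +1 on each axis because ends are inclusive."""
        return (self.r2 - self.r1 + 1) * (self.c2 - self.c1 + 1)


area = Area(r1=1, c1=1, r2=10, c2=4)
```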

PrintAreaView

Slice of a sheet restricted to a print area (manual or auto).

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| area | PrintArea | Yes | - | Print area bounds. |
| book_name | str | Yes | - | Workbook name owning the area. |
| charts | list[Chart] | No |  | Charts overlapping the area. |
| rows | list[CellRow] | No |  | Rows within the area bounds. |
| shapes | list[Shape] | No |  | Shapes overlapping the area. |
| sheet_name | str | Yes | - | Sheet name owning the area. |
| table_candidates | list[str] | No |  | Table candidates intersecting the area. |

Shape

Shape metadata (position, size, text, and styling).

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| begin_arrow_style | int \| None | No | None | Arrow style enum for the start of a connector. |
| direction | Literal['E', 'SE', 'S', 'SW', 'W', 'NW', 'N', 'NE'] \| None | No | None | Connector direction (compass heading). |
| end_arrow_style | int \| None | No | None | Arrow style enum for the end of a connector. |
| h | int \| None | No | None | Shape height (None if unknown). |
| l | int | Yes | - | Left offset (Excel units). |
| rotation | float \| None | No | None | Rotation angle in degrees. |
| t | int | Yes | - | Top offset (Excel units). |
| text | str | Yes | - | Visible text content of the shape. |
| type | str \| None | No | None | Excel shape type name. |
| w | int \| None | No | None | Shape width (None if unknown). |

SheetData

Structured data for a single sheet.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| auto_print_areas | list[PrintArea] | No |  | COM-computed auto page-break areas. |
| charts | list[Chart] | No |  | Charts detected on the sheet. |
| print_areas | list[PrintArea] | No |  | User-defined print areas. |
| rows | list[CellRow] | No |  | Extracted rows with cell values and links. |
| shapes | list[Shape] | No |  | Shapes detected on the sheet. |
| table_candidates | list[str] | No |  | Cell ranges likely representing tables. |

WorkbookData

Workbook-level container with per-sheet data.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| book_name | str | Yes | - | Workbook file name (no path). |
| sheets | dict[str, SheetData] | Yes | - | Mapping of sheet name to SheetData. |
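As a rough picture of how these models nest, the sketch below builds a workbook-shaped dict by hand and walks it sheet by sheet. The sample data and the `rows_per_sheet` helper are made up for illustration; the key names follow the tables above, and optional list fields are read with a fallback because they may be absent from serialized output.

```python
# Hand-written stand-in for serialized WorkbookData; all values are invented.
workbook = {
    "book_name": "sales.xlsx",
    "sheets": {
        "Summary": {
            "rows": [
                {"r": 1, "c": {"0": "Region", "1": "Total"}},
                {"r": 2, "c": {"0": "East", "1": 120}},
            ],
            "shapes": [{"l": 40, "t": 12, "text": "Draft"}],
        },
        "Notes": {},  # optional lists omitted entirely
    },
}


def rows_per_sheet(workbook: dict) -> dict[str, int]:
    """Map each sheet name to the number of extracted rows it holds."""
    return {
        name: len(sheet.get("rows") or [])
        for name, sheet in workbook["sheets"].items()
    }


counts = rows_per_sheet(workbook)
```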