Data Models

Arrow

Connector shape metadata.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| begin_arrow_style | int \| None | No | None | Arrow style enum for the start of a connector. |
| begin_id | int \| None | No | None | Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape). |
| direction | Literal['E', 'SE', 'S', 'SW', 'W', 'NW', 'N', 'NE'] \| None | No | None | Connector direction (compass heading). |
| end_arrow_style | int \| None | No | None | Arrow style enum for the end of a connector. |
| end_id | int \| None | No | None | Shape id at the end of a connector (ConnectorFormat.EndConnectedShape). |
| h | int \| None | No | None | Shape height (None if unknown). |
| id | int \| None | No | None | Sequential shape id within the sheet (if applicable). |
| kind | Literal['arrow'] | No | 'arrow' | Shape kind. |
| l | int | Yes | - | Left offset (Excel units). |
| rotation | float \| None | No | None | Rotation angle in degrees. |
| t | int | Yes | - | Top offset (Excel units). |
| text | str | Yes | - | Visible text content of the shape. |
| w | int \| None | No | None | Shape width (None if unknown). |
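
To illustrate how `begin_id` and `end_id` tie a connector to the shapes it joins, here is a minimal sketch that builds an edge list from plain dicts mirroring the documented fields. The `connector_edges` helper and the sample data are hypothetical stand-ins, not part of the library.

```python
def connector_edges(shapes):
    """Return (begin_text, end_text) pairs for arrows connected at both ends."""
    # Index non-arrow shapes by their sequential id (skipping shapes without one).
    by_id = {
        s["id"]: s
        for s in shapes
        if s.get("kind") != "arrow" and s.get("id") is not None
    }
    edges = []
    for s in shapes:
        if s.get("kind") != "arrow":
            continue
        begin, end = s.get("begin_id"), s.get("end_id")
        # Keep only connectors whose two endpoints resolve to known shapes.
        if begin in by_id and end in by_id:
            edges.append((by_id[begin]["text"], by_id[end]["text"]))
    return edges


# Hypothetical sample data shaped like the Shape and Arrow models.
shapes = [
    {"kind": "shape", "id": 1, "text": "Start", "l": 10, "t": 10},
    {"kind": "shape", "id": 2, "text": "End", "l": 200, "t": 10},
    {"kind": "arrow", "text": "", "l": 60, "t": 20,
     "begin_id": 1, "end_id": 2, "direction": "E"},
    {"kind": "arrow", "text": "", "l": 60, "t": 80,
     "begin_id": None, "end_id": 2},  # dangling start: skipped
]
edges = connector_edges(shapes)
```

Connectors with a missing endpoint are dropped here; a real consumer might prefer to keep them and fall back on `direction`.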

BaseShape

Common shape metadata (position, size, text, and styling).

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| h | int \| None | No | None | Shape height (None if unknown). |
| id | int \| None | No | None | Sequential shape id within the sheet (if applicable). |
| l | int | Yes | - | Left offset (Excel units). |
| rotation | float \| None | No | None | Rotation angle in degrees. |
| t | int | Yes | - | Top offset (Excel units). |
| text | str | Yes | - | Visible text content of the shape. |
| w | int \| None | No | None | Shape width (None if unknown). |

CellRow

A single row of cells with optional hyperlinks.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| c | dict[str, int \| float \| str] | Yes | - | Column index (string) to cell value map. |
| links | dict[str, str] \| None | No | None | Optional hyperlinks per column index. |
| r | int | Yes | - | Row index (1-based). |
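
Because the keys of `c` are strings, sorting them directly gives lexical order (`"10"` before `"2"`). The sketch below converts them to integers first. It assumes the keys are decimal integer strings; the `ordered_cells` helper and the sample row are hypothetical.

```python
def ordered_cells(row):
    """Return (column_index, value) pairs sorted numerically by column index."""
    # Assumption: every key in row["c"] parses as a decimal integer.
    return sorted(
        ((int(col), val) for col, val in row["c"].items()),
        key=lambda pair: pair[0],
    )


# Hypothetical row shaped like the CellRow model.
row = {
    "r": 3,
    "c": {"10": "total", "2": 41.5, "0": "Q1"},
    "links": {"10": "https://example.com"},
}
cells = ordered_cells(row)
```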

Chart

Chart metadata including series and layout.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| chart_type | str | Yes | - | Chart type (e.g., Column, Line). |
| error | str \| None | No | None | Extraction error detail, if any. |
| h | int \| None | No | None | Chart height (None if unknown). |
| l | int | Yes | - | Left offset (Excel units). |
| name | str | Yes | - | Chart name. |
| series | list[ChartSeries] | Yes | - | Series included in the chart. |
| t | int | Yes | - | Top offset (Excel units). |
| title | str \| None | No | None | Chart title. |
| w | int \| None | No | None | Chart width (None if unknown). |
| y_axis_range | list[float] | No | | Y-axis range [min, max] when available. |
| y_axis_title | str | Yes | - | Y-axis title. |

ChartSeries

Series metadata for a chart.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | str | Yes | - | Series display name. |
| name_range | str \| None | No | None | Range reference for the series name. |
| x_range | str \| None | No | None | Range reference for X axis values. |
| y_range | str \| None | No | None | Range reference for Y axis values. |

ColorsOptions

Color extraction options.

Examples:

    >>> ColorsOptions(
    ...     include_default_background=False,
    ...     ignore_colors=["#FFFFFF", "AD3815", "theme:1:0.2", "indexed:64", "auto"],
    ... )

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| ignore_colors | list[str] | No | | List of color keys to ignore. |
| include_default_background | bool | No | False | Include default (white) backgrounds. |

DestinationOptions

Destinations for optional side outputs.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| auto_page_breaks_dir | str \| Path \| None | No | None | Directory to write auto page-break files. |
| print_areas_dir | str \| Path \| None | No | None | Directory to write per-print-area files. |
| sheets_dir | str \| Path \| None | No | None | Directory to write per-sheet files. |
| stream | TextIO \| None | No | None | Stream override for primary output (stdout/file). |

FilterOptions

Include/exclude filters for output.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| include_auto_print_areas | bool | No | False | Include COM-computed auto page-break areas. |
| include_chart_size | bool \| None | No | None | Include chart size; None -> auto (verbose=True, others=False). |
| include_charts | bool | No | True | Include charts. |
| include_merged_cells | bool | No | True | Include merged cell ranges. |
| include_print_areas | bool \| None | No | None | Include print areas; None -> auto (light=False, others=True). |
| include_rows | bool | No | True | Include cell rows. |
| include_shape_size | bool \| None | No | None | Include shape size; None -> auto (verbose=True, others=False). |
| include_shapes | bool | No | True | Include shapes. |
| include_tables | bool | No | True | Include table candidate ranges. |
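
The three `bool | None` flags treat `None` as "decide from the extraction mode". A minimal sketch of that resolution rule follows. The function names are hypothetical, and so is passing the mode as a plain string; only the `light` and `verbose` mode names come from the table, while `"standard"` is used here as an assumed example of any other mode.

```python
def resolve_size_flag(value, mode):
    """include_shape_size / include_chart_size: None means on only in verbose mode."""
    return (mode == "verbose") if value is None else value


def resolve_print_areas_flag(value, mode):
    """include_print_areas: None means off only in light mode."""
    return (mode != "light") if value is None else value


# Explicit True/False always wins; None defers to the mode.
size_in_verbose = resolve_size_flag(None, "verbose")
size_in_standard = resolve_size_flag(None, "standard")  # "standard" is an assumed mode name
areas_in_light = resolve_print_areas_flag(None, "light")
areas_forced_on = resolve_print_areas_flag(True, "light")
```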

FormatOptions

Formatting options for serialization.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| fmt | Literal['json', 'yaml', 'yml', 'toon'] | No | 'json' | Serialization format. |
| indent | int \| None | No | None | Indent width for JSON (defaults to 2 when pretty is True). |
| pretty | bool | No | False | Pretty-print JSON output. |
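
The interaction between `pretty` and `indent` can be sketched with the standard library's `json.dumps`. This is an illustration of the documented default only; `dump_json` is a hypothetical helper, and passing an explicit `indent` through when `pretty` is False is an assumption.

```python
import json


def dump_json(data, pretty=False, indent=None):
    """Serialize to JSON, defaulting indent to 2 when pretty is requested."""
    if indent is None and pretty:
        indent = 2
    # Assumption: an explicit indent is honored even when pretty is False.
    return json.dumps(data, indent=indent, ensure_ascii=False)


compact = dump_json({"a": 1})
pretty = dump_json({"a": 1}, pretty=True)
```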

MergedCells

Compressed merged cell ranges using schema + items.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| items | list[tuple[int, int, int, int, str]] | No | | Merged cell items as (r1, c1, r2, c2, v) tuples where rows are 1-based and columns are 0-based. |
| schema_ | list[Literal['r1', 'c1', 'r2', 'c2', 'v']] | No | | Ordered field names for each item. |
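
The schema + items layout stores the field names once instead of repeating them for every range. One way to expand it back into per-range records is to zip each item with the schema, as in this sketch. The `expand_merged` helper and the sample data are hypothetical, and the dict key `schema_` simply mirrors the field name above (the serialized key may differ).

```python
def expand_merged(merged):
    """Turn compressed (schema_, items) data into a list of dicts."""
    schema = merged["schema_"]
    return [dict(zip(schema, item)) for item in merged["items"]]


# Hypothetical data: rows are 1-based, columns are 0-based.
merged = {
    "schema_": ["r1", "c1", "r2", "c2", "v"],
    "items": [(1, 0, 1, 3, "Title"), (5, 2, 6, 2, "")],
}
records = expand_merged(merged)
```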

OutputOptions

Output-time options for ExStructEngine.

- format: serialization format/indent.
- filters: include/exclude flags (rows/shapes/charts/tables/print_areas, size flags).
- destinations: side outputs (per-sheet, per-print-area, stream override).

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| destinations | DestinationOptions | No | | Side output destinations. |
| filters | FilterOptions | No | | Include/exclude flags. |
| format | FormatOptions | No | | Formatting options. |

PrintArea

Cell coordinate bounds for a print area.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| c1 | int | Yes | - | Start column (0-based). |
| c2 | int | Yes | - | End column (0-based, inclusive). |
| r1 | int | Yes | - | Start row (1-based). |
| r2 | int | Yes | - | End row (1-based, inclusive). |
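
Rows are 1-based while columns are 0-based, which is easy to get wrong when converting to Excel's A1 notation. The sketch below shows one way to do the conversion and an inclusive containment check. All three helpers are hypothetical and operate on a plain dict mirroring the fields above.

```python
def col_letters(c):
    """Convert a 0-based column index to Excel letters (0 -> A, 26 -> AA)."""
    letters = ""
    n = c + 1  # switch to 1-based for the bijective base-26 conversion
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def to_a1(area):
    """Format print area bounds as an A1-style range string."""
    return (
        f"{col_letters(area['c1'])}{area['r1']}:"
        f"{col_letters(area['c2'])}{area['r2']}"
    )


def contains(area, row, col):
    """True if (row, col) lies inside the area; both ends are inclusive."""
    return area["r1"] <= row <= area["r2"] and area["c1"] <= col <= area["c2"]


area = {"r1": 1, "c1": 0, "r2": 20, "c2": 27}  # hypothetical bounds
a1 = to_a1(area)
```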

PrintAreaView

Slice of a sheet restricted to a print area (manual or auto).

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| area | PrintArea | Yes | - | Print area bounds. |
| book_name | str | Yes | - | Workbook name owning the area. |
| charts | list[Chart] | No | | Charts overlapping the area. |
| rows | list[CellRow] | No | | Rows within the area bounds. |
| shapes | list[Shape \| Arrow \| SmartArt] | No | | Shapes overlapping the area. |
| sheet_name | str | Yes | - | Sheet name owning the area. |
| table_candidates | list[str] | No | | Table candidates intersecting the area. |

Shape

Normal shape metadata.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| h | int \| None | No | None | Shape height (None if unknown). |
| id | int \| None | No | None | Sequential shape id within the sheet (if applicable). |
| kind | Literal['shape'] | No | 'shape' | Shape kind. |
| l | int | Yes | - | Left offset (Excel units). |
| rotation | float \| None | No | None | Rotation angle in degrees. |
| t | int | Yes | - | Top offset (Excel units). |
| text | str | Yes | - | Visible text content of the shape. |
| type | str \| None | No | None | Excel shape type name. |
| w | int \| None | No | None | Shape width (None if unknown). |

SheetData

Structured data for a single sheet.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| auto_print_areas | list[PrintArea] | No | | COM-computed auto page-break areas. |
| charts | list[Chart] | No | | Charts detected on the sheet. |
| colors_map | dict[str, list[tuple[int, int]]] | No | | Mapping of hex color codes to lists of (row, column) tuples where row is 1-based and column is 0-based. |
| formulas_map | dict[str, list[tuple[int, int]]] | No | | Mapping of formula strings to lists of (row, column) tuples where row is 1-based and column is 0-based. |
| merged_cells | MergedCells \| None | No | None | Merged cell ranges on the sheet. |
| print_areas | list[PrintArea] | No | | User-defined print areas. |
| rows | list[CellRow] | No | | Extracted rows with cell values and links. |
| shapes | list[Shape \| Arrow \| SmartArt] | No | | Shapes detected on the sheet. |
| table_candidates | list[str] | No | | Cell ranges likely representing tables. |
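
`colors_map` and `formulas_map` are keyed by value (color or formula) rather than by cell. When a per-cell lookup is more convenient, the mapping can be inverted as sketched here. The `invert_cell_map` helper and the sample data are hypothetical.

```python
def invert_cell_map(value_to_cells):
    """Invert {value: [(row, col), ...]} into {(row, col): value}."""
    by_cell = {}
    for value, cells in value_to_cells.items():
        for row, col in cells:
            # Rows are 1-based and columns 0-based, as in the source mapping.
            by_cell[(row, col)] = value
    return by_cell


# Hypothetical colors_map for a small sheet.
colors_map = {
    "FFFF00": [(1, 0), (1, 1)],
    "AD3815": [(4, 2)],
}
color_at = invert_cell_map(colors_map)
```

The same helper works for `formulas_map`, since both share the `dict[str, list[tuple[int, int]]]` shape.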

SmartArt

SmartArt shape metadata with nested nodes.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| h | int \| None | No | None | Shape height (None if unknown). |
| id | int \| None | No | None | Sequential shape id within the sheet (if applicable). |
| kind | Literal['smartart'] | No | 'smartart' | Shape kind. |
| l | int | Yes | - | Left offset (Excel units). |
| layout | str | Yes | - | SmartArt layout name. |
| nodes | list[SmartArtNode] | No | | Root nodes of SmartArt tree. |
| rotation | float \| None | No | None | Rotation angle in degrees. |
| t | int | Yes | - | Top offset (Excel units). |
| text | str | Yes | - | Visible text content of the shape. |
| w | int \| None | No | None | Shape width (None if unknown). |

SmartArtNode

Node of SmartArt hierarchy.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| kids | list[SmartArtNode] | No | | Child nodes. |
| text | str | Yes | - | Visible text for the node. |
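
Since `kids` nests `SmartArtNode` inside itself, a SmartArt tree is naturally walked with recursion. The sketch below flattens a tree into (depth, text) pairs in depth-first order. The `walk` generator and the sample tree are hypothetical and use plain dicts in place of the model.

```python
def walk(nodes, depth=0):
    """Yield (depth, text) for every node, parents before their children."""
    for node in nodes:
        yield depth, node["text"]
        # A missing "kids" key is treated as a leaf.
        yield from walk(node.get("kids", []), depth + 1)


# Hypothetical tree shaped like SmartArt.nodes.
nodes = [
    {
        "text": "CEO",
        "kids": [
            {"text": "CTO", "kids": [{"text": "Dev", "kids": []}]},
            {"text": "CFO"},
        ],
    }
]
outline = list(walk(nodes))
```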

WorkbookData

Workbook-level container with per-sheet data.

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| book_name | str | Yes | - | Workbook file name (no path). |
| sheets | dict[str, SheetData] | Yes | - | Mapping of sheet name to SheetData. |