Data Models
Arrow

Connector shape metadata.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| begin_arrow_style | int \| None | No | None | Arrow style enum for the start of a connector. |
| begin_id | int \| None | No | None | Shape id at the start of a connector (ConnectorFormat.BeginConnectedShape). |
| direction | Literal['E', 'SE', 'S', 'SW', 'W', 'NW', 'N', 'NE'] \| None | No | None | Connector direction (compass heading). |
| end_arrow_style | int \| None | No | None | Arrow style enum for the end of a connector. |
| end_id | int \| None | No | None | Shape id at the end of a connector (ConnectorFormat.EndConnectedShape). |
| h | int \| None | No | None | Shape height (None if unknown). |
| id | int \| None | No | None | Sequential shape id within the sheet (if applicable). |
| kind | Literal['arrow'] | No | 'arrow' | Shape kind. |
| l | int | Yes | - | Left offset (Excel units). |
| rotation | float \| None | No | None | Rotation angle in degrees. |
| t | int | Yes | - | Top offset (Excel units). |
| text | str | Yes | - | Visible text content of the shape. |
| w | int \| None | No | None | Shape width (None if unknown). |

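
The `begin_id` and `end_id` fields tie a connector to the shapes it joins. As a rough illustration, the sketch below turns a list of shape records into (source, target) pairs. It works on plain dictionaries that mirror the documented field names; the sample data and the `connector_edges` helper are hypothetical and not part of the library.

```python
from typing import Any


def connector_edges(shapes: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Return (source text, target text) pairs for arrows attached at both ends."""
    # Index every record that carries a sequential id.
    by_id = {s["id"]: s for s in shapes if s.get("id") is not None}
    edges = []
    for s in shapes:
        if s.get("kind") != "arrow":
            continue
        begin, end = s.get("begin_id"), s.get("end_id")
        # Arrows with a missing or unknown endpoint are skipped.
        if begin in by_id and end in by_id:
            edges.append((by_id[begin]["text"], by_id[end]["text"]))
    return edges


# Hypothetical sample data using the documented field names.
shapes = [
    {"kind": "shape", "id": 1, "text": "Start", "l": 10, "t": 10},
    {"kind": "shape", "id": 2, "text": "End", "l": 200, "t": 10},
    {"kind": "arrow", "text": "", "l": 60, "t": 20,
     "begin_id": 1, "end_id": 2, "direction": "E"},
    {"kind": "arrow", "text": "", "l": 60, "t": 80,
     "begin_id": None, "end_id": 2},
]
edges = connector_edges(shapes)
```

Here only the first arrow contributes an edge, because the second has no `begin_id`.
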
BaseShape

Common shape metadata (position, size, text, and styling).

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| h | int \| None | No | None | Shape height (None if unknown). |
| id | int \| None | No | None | Sequential shape id within the sheet (if applicable). |
| l | int | Yes | - | Left offset (Excel units). |
| rotation | float \| None | No | None | Rotation angle in degrees. |
| t | int | Yes | - | Top offset (Excel units). |
| text | str | Yes | - | Visible text content of the shape. |
| w | int \| None | No | None | Shape width (None if unknown). |

CellRow

A single row of cells with optional hyperlinks.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| c | dict[str, int \| float \| str] | Yes | - | Column index (string) to cell value map. |
| links | dict[str, str] \| None | No | None | Optional hyperlinks per column index. |
| r | int | Yes | - | Row index (1-based). |

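
Because `c` and `links` share the same string column keys, a value and its hyperlink can be paired with a simple lookup. The following is a minimal sketch on a plain dictionary shaped like a `CellRow`; the `iter_cells` helper and the sample row are hypothetical, and the sketch assumes the column keys are decimal integer strings.

```python
from typing import Any, Iterator, Optional, Union

CellValue = Union[int, float, str]


def iter_cells(row: dict[str, Any]) -> Iterator[tuple[int, int, CellValue, Optional[str]]]:
    """Yield (row index, column index, value, link) in column order."""
    links = row.get("links") or {}  # links may be None or absent
    # Sort numerically so that "10" comes after "2".
    for col in sorted(row["c"], key=int):
        yield row["r"], int(col), row["c"][col], links.get(col)


# Hypothetical sample row.
row = {
    "r": 3,
    "c": {"0": "Name", "2": 42, "1": "Docs"},
    "links": {"1": "https://example.com"},
}
cells = list(iter_cells(row))
```
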
Chart

Chart metadata including series and layout.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| chart_type | str | Yes | - | Chart type (e.g., Column, Line). |
| error | str \| None | No | None | Extraction error detail if any. |
| h | int \| None | No | None | Chart height (None if unknown). |
| l | int | Yes | - | Left offset (Excel units). |
| name | str | Yes | - | Chart name. |
| series | list[ChartSeries] | Yes | - | Series included in the chart. |
| t | int | Yes | - | Top offset (Excel units). |
| title | str \| None | No | None | Chart title. |
| w | int \| None | No | None | Chart width (None if unknown). |
| y_axis_range | list[float] | No | | Y-axis range [min, max] when available. |
| y_axis_title | str | Yes | - | Y-axis title. |

ChartSeries

Series metadata for a chart.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | str | Yes | - | Series display name. |
| name_range | str \| None | No | None | Range reference for the series name. |
| x_range | str \| None | No | None | Range reference for X axis values. |
| y_range | str \| None | No | None | Range reference for Y axis values. |

ColorsOptions

Color extraction options.

Examples:

    >>> ColorsOptions(
    ...     include_default_background=False,
    ...     ignore_colors=["#FFFFFF", "AD3815", "theme:1:0.2", "indexed:64", "auto"],
    ... )

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| ignore_colors | list[str] | No | | List of color keys to ignore. |
| include_default_background | bool | No | False | Include default (white) backgrounds. |

DestinationOptions

Destinations for optional side outputs.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| auto_page_breaks_dir | str \| Path \| None | No | None | Directory to write auto page-break files. |
| print_areas_dir | str \| Path \| None | No | None | Directory to write per-print-area files. |
| sheets_dir | str \| Path \| None | No | None | Directory to write per-sheet files. |
| stream | TextIO \| None | No | None | Stream override for primary output (stdout/file). |

FilterOptions

Include/exclude filters for output.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| include_auto_print_areas | bool | No | False | Include COM-computed auto page-break areas. |
| include_chart_size | bool \| None | No | None | Include chart size; None -> auto (verbose=True, others=False). |
| include_charts | bool | No | True | Include charts. |
| include_merged_cells | bool | No | True | Include merged cell ranges. |
| include_print_areas | bool \| None | No | None | Include print areas; None -> auto (light=False, others=True). |
| include_rows | bool | No | True | Include cell rows. |
| include_shape_size | bool \| None | No | None | Include shape size; None -> auto (verbose=True, others=False). |
| include_shapes | bool | No | True | Include shapes. |
| include_tables | bool | No | True | Include table candidate ranges. |

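
The three `bool | None` flags use `None` to mean "decide from the extraction mode". A minimal sketch of that resolution rule, as described in the table, might look like the following. The function names are hypothetical, and `"standard"` is an assumed name for a mode that is neither `light` nor `verbose`; the library's actual implementation may differ.

```python
from typing import Optional


def resolve_size_flag(value: Optional[bool], mode: str) -> bool:
    """include_chart_size / include_shape_size: None means True only in verbose mode."""
    return (mode == "verbose") if value is None else value


def resolve_print_areas_flag(value: Optional[bool], mode: str) -> bool:
    """include_print_areas: None means False only in light mode."""
    return (mode != "light") if value is None else value


# An explicit True or False always wins over the mode-based default.
size_in_verbose = resolve_size_flag(None, "verbose")
size_in_light = resolve_size_flag(None, "light")
areas_in_light = resolve_print_areas_flag(None, "light")
areas_forced = resolve_print_areas_flag(True, "light")
```
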
FormatOptions

Formatting options for serialization.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| fmt | Literal['json', 'yaml', 'yml', 'toon'] | No | 'json' | Serialization format. |
| indent | int \| None | No | None | Indent width for JSON (defaults to 2 when pretty is True). |
| pretty | bool | No | False | Pretty-print JSON output. |

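
The interaction between `pretty` and `indent` for JSON can be sketched with the standard `json` module. The `dump_json` helper below is hypothetical and only illustrates the documented default (indent 2 when `pretty` is True and no indent is given). How an explicit `indent` behaves when `pretty` is False is not stated here, so this sketch simply passes it through.

```python
import json
from typing import Any, Optional


def dump_json(data: Any, pretty: bool = False, indent: Optional[int] = None) -> str:
    """Serialize data, applying the documented default indent for pretty output."""
    if pretty and indent is None:
        indent = 2  # documented default when pretty is True
    return json.dumps(data, ensure_ascii=False, indent=indent)


compact = dump_json({"a": 1})
pretty_default = dump_json({"a": 1}, pretty=True)
pretty_wide = dump_json({"a": 1}, pretty=True, indent=4)
```
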
MergedCells

Compressed merged cell ranges using schema + items.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| items | list[tuple[int, int, int, int, str]] | No | | Merged cell items as (r1, c1, r2, c2, v) tuples where rows are 1-based and columns are 0-based. |
| schema_ | list[Literal['r1', 'c1', 'r2', 'c2', 'v']] | No | | Ordered field names for each item. |

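
The schema + items layout stores the field names once and each merged range as a bare tuple. Expanding it back into labeled records is a matter of zipping the two, as in this sketch. The sample data is hypothetical, and the dictionary key `schema_` follows the field name shown above; the key used in serialized output may differ.

```python
# Hypothetical MergedCells-shaped data: rows are 1-based, columns are 0-based.
merged = {
    "schema_": ["r1", "c1", "r2", "c2", "v"],
    "items": [
        (1, 0, 1, 3, "Title"),  # row 1, columns 0 through 3
        (2, 0, 4, 0, "Group"),  # rows 2 through 4, column 0
    ],
}

# Pair each field name with the value at the same position.
records = [dict(zip(merged["schema_"], item)) for item in merged["items"]]
```
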
OutputOptions

Output-time options for ExStructEngine.

- format: serialization format/indent.
- filters: include/exclude flags (rows/shapes/charts/tables/print_areas, size flags).
- destinations: side outputs (per-sheet, per-print-area, stream override).

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| destinations | DestinationOptions | No | | Side output destinations. |
| filters | FilterOptions | No | | Include/exclude flags. |
| format | FormatOptions | No | | Formatting options. |

PrintArea

Cell coordinate bounds for a print area.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| c1 | int | Yes | - | Start column (0-based). |
| c2 | int | Yes | - | End column (0-based, inclusive). |
| r1 | int | Yes | - | Start row (1-based). |
| r2 | int | Yes | - | End row (1-based, inclusive). |

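
Since rows are 1-based and columns are 0-based, converting the bounds to an Excel-style A1 range needs an offset on the column side only. The helpers below (`col_letter`, `to_a1`) are hypothetical and operate on a plain dictionary with the documented keys.

```python
def col_letter(c0: int) -> str:
    """Convert a 0-based column index to Excel letters (0 -> A, 26 -> AA)."""
    n = c0 + 1  # switch to 1-based for the base-26 conversion
    letters = ""
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def to_a1(area: dict[str, int]) -> str:
    """Render PrintArea-shaped bounds as an A1 range; both ends are inclusive."""
    start = f"{col_letter(area['c1'])}{area['r1']}"
    end = f"{col_letter(area['c2'])}{area['r2']}"
    return f"{start}:{end}"


a1 = to_a1({"r1": 1, "c1": 0, "r2": 10, "c2": 27})
```
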
PrintAreaView

Slice of a sheet restricted to a print area (manual or auto).

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| area | PrintArea | Yes | - | Print area bounds. |
| book_name | str | Yes | - | Workbook name owning the area. |
| charts | list[Chart] | No | | Charts overlapping the area. |
| rows | list[CellRow] | No | | Rows within the area bounds. |
| shapes | list[Shape \| Arrow \| SmartArt] | No | | Shapes overlapping the area. |
| sheet_name | str | Yes | - | Sheet name owning the area. |
| table_candidates | list[str] | No | | Table candidates intersecting the area. |

Shape

Normal shape metadata.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| h | int \| None | No | None | Shape height (None if unknown). |
| id | int \| None | No | None | Sequential shape id within the sheet (if applicable). |
| kind | Literal['shape'] | No | 'shape' | Shape kind. |
| l | int | Yes | - | Left offset (Excel units). |
| rotation | float \| None | No | None | Rotation angle in degrees. |
| t | int | Yes | - | Top offset (Excel units). |
| text | str | Yes | - | Visible text content of the shape. |
| type | str \| None | No | None | Excel shape type name. |
| w | int \| None | No | None | Shape width (None if unknown). |

SheetData

Structured data for a single sheet.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| auto_print_areas | list[PrintArea] | No | | COM-computed auto page-break areas. |
| charts | list[Chart] | No | | Charts detected on the sheet. |
| colors_map | dict[str, list[tuple[int, int]]] | No | | Mapping of hex color codes to lists of (row, column) tuples where row is 1-based and column is 0-based. |
| formulas_map | dict[str, list[tuple[int, int]]] | No | | Mapping of formula strings to lists of (row, column) tuples where row is 1-based and column is 0-based. |
| merged_cells | MergedCells \| None | No | None | Merged cell ranges on the sheet. |
| print_areas | list[PrintArea] | No | | User-defined print areas. |
| rows | list[CellRow] | No | | Extracted rows with cell values and links. |
| shapes | list[Shape \| Arrow \| SmartArt] | No | | Shapes detected on the sheet. |
| table_candidates | list[str] | No | | Cell ranges likely representing tables. |

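
`colors_map` and `formulas_map` are keyed by value (color or formula) rather than by cell, which keeps repeated values compact. When a per-cell lookup is more convenient, the mapping can be inverted as in this sketch. The sample data is hypothetical; positions are wrapped in `tuple()` because data loaded from JSON would arrive as lists.

```python
# Hypothetical colors_map: color key -> list of (row, column) positions,
# with 1-based rows and 0-based columns.
colors_map = {
    "FFFF00": [(1, 0), (1, 1)],
    "FF0000": [[2, 0]],  # a list, as it would look after a JSON round trip
}

# Invert to (row, column) -> color for direct cell lookups.
cell_colors = {
    tuple(cell): color
    for color, cells in colors_map.items()
    for cell in cells
}
```
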
SmartArt

SmartArt shape metadata with nested nodes.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| h | int \| None | No | None | Shape height (None if unknown). |
| id | int \| None | No | None | Sequential shape id within the sheet (if applicable). |
| kind | Literal['smartart'] | No | 'smartart' | Shape kind. |
| l | int | Yes | - | Left offset (Excel units). |
| layout | str | Yes | - | SmartArt layout name. |
| nodes | list[SmartArtNode] | No | | Root nodes of SmartArt tree. |
| rotation | float \| None | No | None | Rotation angle in degrees. |
| t | int | Yes | - | Top offset (Excel units). |
| text | str | Yes | - | Visible text content of the shape. |
| w | int \| None | No | None | Shape width (None if unknown). |

SmartArtNode

Node of SmartArt hierarchy.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| kids | list[SmartArtNode] | No | | Child nodes. |
| text | str | Yes | - | Visible text for the node. |

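
Because each node holds its children in `kids`, the hierarchy can be flattened with a short recursive walk. The sketch below uses plain dictionaries with the documented keys; the `walk` helper and the sample tree are hypothetical.

```python
from typing import Any, Iterator


def walk(nodes: list[dict[str, Any]], depth: int = 0) -> Iterator[tuple[int, str]]:
    """Yield (depth, text) for every node, parents before children."""
    for node in nodes:
        yield depth, node["text"]
        # kids may be absent or empty for leaf nodes.
        yield from walk(node.get("kids") or [], depth + 1)


# Hypothetical SmartArt tree.
nodes = [
    {"text": "CEO", "kids": [
        {"text": "CTO", "kids": [{"text": "Dev"}]},
        {"text": "CFO", "kids": []},
    ]},
]
outline = ["  " * depth + text for depth, text in walk(nodes)]
```
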
WorkbookData

Workbook-level container with per-sheet data.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| book_name | str | Yes | - | Workbook file name (no path). |
| sheets | dict[str, SheetData] | Yes | - | Mapping of sheet name to SheetData. |
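
As a rough illustration of the container layout, the sketch below counts rows, shapes, and charts per sheet from a plain dictionary shaped like a serialized `WorkbookData`. The sample workbook is hypothetical, and the exact keys present in real output depend on the filter options in use.

```python
# Hypothetical WorkbookData-shaped dictionary.
workbook = {
    "book_name": "sample.xlsx",
    "sheets": {
        "Sheet1": {
            "rows": [{"r": 1, "c": {"0": "a"}}],
            "shapes": [],
            "charts": [],
        },
        "Notes": {
            "rows": [],
            "shapes": [{"kind": "shape", "text": "memo", "l": 0, "t": 0}],
        },
    },
}

# Count items per sheet, treating a missing list as empty.
summary = {
    name: {key: len(sheet.get(key, [])) for key in ("rows", "shapes", "charts")}
    for name, sheet in workbook["sheets"].items()
}
```
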