Data Models
CellRow
A single row of cells with optional hyperlinks.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| c | dict[str, int \| float \| str] | Yes | - | Column index (string) to cell value map. |
| links | dict[str, str] \| None | No | None | Optional hyperlinks per column index. |
| r | int | Yes | - | Row index (1-based). |
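Because the keys of `c` and `links` are strings, integer column numbers must be converted before lookup. The sketch below uses a plain dict shaped like a serialized CellRow; `cell_at` and `link_at` are hypothetical helpers, not part of the library, and the sample column keys are illustrative only (this table does not state whether column indices start at 0 or 1).

```python
from typing import Optional, Union

CellValue = Union[int, float, str]

# Hypothetical dict mirroring a serialized CellRow (field names from the table above).
row = {
    "r": 3,                                      # 1-based row index
    "c": {"0": "Item", "2": 120.5},              # column index (string) -> cell value
    "links": {"0": "https://example.com/item"},  # optional hyperlinks per column index
}


def cell_at(row: dict, col: int) -> Optional[CellValue]:
    """Return the value at integer column `col`, or None if the key is absent."""
    # Keys of `c` are strings, so convert the integer index before lookup.
    return row["c"].get(str(col))


def link_at(row: dict, col: int) -> Optional[str]:
    """Return the hyperlink for column `col`; tolerates a missing or None `links`."""
    links = row.get("links") or {}
    return links.get(str(col))
```

Treating a missing `links` key and `links: None` the same way keeps callers simple, since the field is optional.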
Chart
Chart metadata including series and layout.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| chart_type | str | Yes | - | Chart type (e.g., Column, Line). |
| error | str \| None | No | None | Extraction error detail, if any. |
| h | int \| None | No | None | Chart height (None if unknown). |
| l | int | Yes | - | Left offset (Excel units). |
| name | str | Yes | - | Chart name. |
| series | list[ChartSeries] | Yes | - | Series included in the chart. |
| t | int | Yes | - | Top offset (Excel units). |
| title | str \| None | No | None | Chart title. |
| w | int \| None | No | None | Chart width (None if unknown). |
| y_axis_range | list[float] | No | | Y-axis range [min, max] when available. |
| y_axis_title | str | Yes | - | Y-axis title. |
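Since `w` and `h` may be None, any code that derives a chart's bounding box has to handle the unknown-size case. A minimal sketch, assuming a plain dict shaped like a serialized Chart (the sample values and the `chart_bounds` helper are hypothetical; series ranges are left as None to avoid guessing their reference format):

```python
from typing import Optional, Tuple

# Hypothetical dict mirroring a serialized Chart with one ChartSeries.
chart = {
    "name": "Chart 1",
    "chart_type": "Column",
    "title": "Monthly sales",
    "y_axis_title": "Amount",
    "y_axis_range": [0.0, 500.0],
    "l": 100, "t": 40, "w": 480, "h": 288,  # offsets and size in Excel units
    "series": [
        {"name": "Sales", "name_range": None, "x_range": None, "y_range": None},
    ],
    "error": None,
}


def chart_bounds(chart: dict) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom), or None when the size is unknown."""
    # A missing key and an explicit None are both treated as "size unknown".
    w, h = chart.get("w"), chart.get("h")
    if w is None or h is None:
        return None
    return (chart["l"], chart["t"], chart["l"] + w, chart["t"] + h)
```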
ChartSeries
Series metadata for a chart.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | str | Yes | - | Series display name. |
| name_range | str \| None | No | None | Range reference for the series name. |
| x_range | str \| None | No | None | Range reference for X-axis values. |
| y_range | str \| None | No | None | Range reference for Y-axis values. |
DestinationOptions
Destinations for optional side outputs.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| auto_page_breaks_dir | str \| Path \| None | No | None | Directory to write auto page-break files. |
| print_areas_dir | str \| Path \| None | No | None | Directory to write per-print-area files. |
| sheets_dir | str \| Path \| None | No | None | Directory to write per-sheet files. |
| stream | TextIO \| None | No | None | Stream override for primary output (stdout/file). |
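The directory fields accept `str`, `Path`, or None, and `stream` accepts any text stream. The sketch below shows one way a consumer might normalize those inputs; `resolve_dir` and `write_primary` are hypothetical helpers, and falling back to `sys.stdout` when `stream` is None is an assumption, not confirmed library behavior.

```python
import io
import sys
from pathlib import Path
from typing import Optional, TextIO, Union


def resolve_dir(value: Union[str, Path, None]) -> Optional[Path]:
    """Normalize a str | Path | None destination to a Path, or None when disabled."""
    return None if value is None else Path(value)


def write_primary(text: str, stream: Optional[TextIO] = None) -> None:
    """Write primary output to the override stream (assumed fallback: stdout)."""
    target = stream if stream is not None else sys.stdout
    target.write(text)


# Usage: capture primary output in memory instead of printing it.
buffer = io.StringIO()
write_primary('{"book_name": "sample.xlsx"}', stream=buffer)
```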
FilterOptions
Include/exclude filters for output.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| include_auto_print_areas | bool | No | False | Include COM-computed auto page-break areas. |
| include_chart_size | bool \| None | No | None | Include chart size; None means auto (True in verbose mode, False otherwise). |
| include_charts | bool | No | True | Include charts. |
| include_print_areas | bool \| None | No | None | Include print areas; None means auto (False in light mode, True otherwise). |
| include_rows | bool | No | True | Include cell rows. |
| include_shape_size | bool \| None | No | None | Include shape size; None means auto (True in verbose mode, False otherwise). |
| include_shapes | bool | No | True | Include shapes. |
| include_tables | bool | No | True | Include table candidate ranges. |
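The three tri-state flags follow one rule: an explicit True or False wins, and None falls back to a mode-dependent default. A minimal sketch of that rule as described in the table; `resolve_auto_flags` and `ResolvedFlags` are hypothetical names, and the mode is passed as a plain string ("light", "verbose", or any other value).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResolvedFlags:  # hypothetical helper type, not part of the documented API
    include_chart_size: bool
    include_shape_size: bool
    include_print_areas: bool


def resolve_auto_flags(
    mode: str,
    include_chart_size: Optional[bool] = None,
    include_shape_size: Optional[bool] = None,
    include_print_areas: Optional[bool] = None,
) -> ResolvedFlags:
    """Apply the 'None means auto' rules from the FilterOptions table."""

    def pick(explicit: Optional[bool], auto: bool) -> bool:
        # An explicit True/False always wins; None falls back to the mode default.
        return auto if explicit is None else explicit

    return ResolvedFlags(
        include_chart_size=pick(include_chart_size, mode == "verbose"),
        include_shape_size=pick(include_shape_size, mode == "verbose"),
        include_print_areas=pick(include_print_areas, mode != "light"),
    )
```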
FormatOptions
Formatting options for serialization.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| fmt | Literal['json', 'yaml', 'yml', 'toon'] | No | 'json' | Serialization format. |
| indent | int \| None | No | None | Indent width for JSON (defaults to 2 when pretty is True). |
| pretty | bool | No | False | Pretty-print JSON output. |
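The `indent` default depends on `pretty`. A minimal sketch of that rule using the standard `json` module; `dump_json` is a hypothetical helper, and honoring an explicit `indent` even when `pretty` is False is an assumption the table does not settle.

```python
import json
from typing import Any, Optional


def dump_json(data: Any, pretty: bool = False, indent: Optional[int] = None) -> str:
    """Serialize `data` as JSON following the FormatOptions indent rule."""
    if indent is None and pretty:
        indent = 2  # documented default width when pretty=True
    # Assumption: an explicit indent is used as-is regardless of `pretty`.
    return json.dumps(data, indent=indent, ensure_ascii=False)
```

With `pretty=False` and no `indent`, `json.dumps` emits compact single-line output.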
OutputOptions
Output-time options for ExStructEngine.
- format: serialization format/indent.
- filters: include/exclude flags (rows/shapes/charts/tables/print_areas, size flags).
- destinations: side outputs (per-sheet, per-print-area, stream override).
Legacy flat fields (fmt, pretty, indent, include_*, sheets_dir, print_areas_dir, stream) are still accepted and normalized into the nested structures.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| destinations | DestinationOptions | No | | Side output destinations. |
| filters | FilterOptions | No | | Include/exclude flags. |
| format | FormatOptions | No | | Formatting options. |
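The legacy-field normalization described above amounts to routing each flat key into one of the three nested groups. The sketch below illustrates that routing on plain dicts; `normalize_legacy` is hypothetical, the key sets come from the legacy field list in the OutputOptions description, and rejecting unknown keys with `ValueError` is an assumption about behavior.

```python
from typing import Any, Dict

# Legacy flat keys, grouped as listed in the OutputOptions description.
FORMAT_KEYS = {"fmt", "pretty", "indent"}
DESTINATION_KEYS = {"sheets_dir", "print_areas_dir", "stream"}


def normalize_legacy(flat: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Route legacy flat options into format / filters / destinations groups."""
    nested: Dict[str, Dict[str, Any]] = {"format": {}, "filters": {}, "destinations": {}}
    for key, value in flat.items():
        if key in FORMAT_KEYS:
            nested["format"][key] = value
        elif key.startswith("include_"):
            # All include_* flags belong to FilterOptions.
            nested["filters"][key] = value
        elif key in DESTINATION_KEYS:
            nested["destinations"][key] = value
        else:
            raise ValueError(f"unknown output option: {key}")
    return nested
```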
PrintArea
Cell coordinate bounds for a print area.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| c1 | int | Yes | - | Start column (1-based). |
| c2 | int | Yes | - | End column (1-based, inclusive). |
| r1 | int | Yes | - | Start row (1-based). |
| r2 | int | Yes | - | End row (1-based, inclusive). |
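Because all four bounds are 1-based and inclusive, a span's size is `end - start + 1`, and membership tests use `<=` on both sides. A small sketch using a hypothetical dataclass mirror of PrintArea (the helper methods are illustrative, not library API):

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PrintArea:
    """Mirror of the PrintArea fields; all bounds are 1-based and inclusive."""
    r1: int
    c1: int
    r2: int
    c2: int

    def contains(self, row: int, col: int) -> bool:
        # Inclusive on both ends.
        return self.r1 <= row <= self.r2 and self.c1 <= col <= self.c2

    def size(self) -> Tuple[int, int]:
        # Inclusive bounds: add 1 to each span to get (rows, columns).
        return (self.r2 - self.r1 + 1, self.c2 - self.c1 + 1)


area = PrintArea(r1=1, c1=1, r2=20, c2=8)  # covers A1:H20
```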
PrintAreaView
Slice of a sheet restricted to a print area (manual or auto).
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| area | PrintArea | Yes | - | Print area bounds. |
| book_name | str | Yes | - | Workbook name owning the area. |
| charts | list[Chart] | No | | Charts overlapping the area. |
| rows | list[CellRow] | No | | Rows within the area bounds. |
| shapes | list[Shape] | No | | Shapes overlapping the area. |
| sheet_name | str | Yes | - | Sheet name owning the area. |
| table_candidates | list[str] | No | | Table candidates intersecting the area. |
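Restricting `rows` to an area means clipping both the row index and the column keys of each CellRow. A minimal sketch on plain dicts; `rows_in_area` is hypothetical, and because this reference does not state the base of CellRow column keys, the `col_base` parameter (also hypothetical) says whether keys start at 0 or 1. Dropping rows left with no cells is a design choice of the sketch.

```python
from typing import Any, Dict, List


def rows_in_area(
    rows: List[Dict[str, Any]], area: Dict[str, int], col_base: int = 1
) -> List[Dict[str, Any]]:
    """Keep rows inside [r1, r2]; within each, keep only cells inside [c1, c2]."""
    result = []
    for row in rows:
        if not area["r1"] <= row["r"] <= area["r2"]:
            continue
        cells = {
            key: value
            for key, value in row["c"].items()
            # Convert the string key to a 1-based column before comparing.
            if area["c1"] <= int(key) - col_base + 1 <= area["c2"]
        }
        if not cells:
            continue  # sketch choice: skip rows with no cells inside the area
        clipped: Dict[str, Any] = {"r": row["r"], "c": cells}
        links = {k: v for k, v in (row.get("links") or {}).items() if k in cells}
        if links:
            clipped["links"] = links
        result.append(clipped)
    return result
```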
Shape
Shape metadata (position, size, text, and styling).
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| begin_arrow_style | int \| None | No | None | Arrow style enum for the start of a connector. |
| direction | Literal['E', 'SE', 'S', 'SW', 'W', 'NW', 'N', 'NE'] \| None | No | None | Connector direction (compass heading). |
| end_arrow_style | int \| None | No | None | Arrow style enum for the end of a connector. |
| h | int \| None | No | None | Shape height (None if unknown). |
| l | int | Yes | - | Left offset (Excel units). |
| rotation | float \| None | No | None | Rotation angle in degrees. |
| t | int | Yes | - | Top offset (Excel units). |
| text | str | Yes | - | Visible text content of the shape. |
| type | str \| None | No | None | Excel shape type name. |
| w | int \| None | No | None | Shape width (None if unknown). |
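`direction` uses eight compass labels. To illustrate what those labels mean, the sketch below maps a connector vector to the nearest heading in 45-degree sectors. It is a hypothetical helper, not the library's own computation; it assumes the vertical offset grows downward (as `t` does on a sheet), so `dy` is negated to get a conventional counter-clockwise angle.

```python
import math
from typing import Optional

# Labels in counter-clockwise order starting from east (0 degrees).
COMPASS = ["E", "NE", "N", "NW", "W", "SW", "S", "SE"]


def compass_direction(dx: float, dy: float) -> Optional[str]:
    """Map a connector vector (dx right, dy down) to an 8-point compass label."""
    if dx == 0 and dy == 0:
        return None  # zero-length connector: no meaningful heading
    # Negate dy because sheet coordinates grow downward.
    angle = math.degrees(math.atan2(-dy, dx)) % 360
    # Each label covers a 45-degree sector centred on its heading.
    return COMPASS[int((angle + 22.5) // 45) % 8]
```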
SheetData
Structured data for a single sheet.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| auto_print_areas | list[PrintArea] | No | | COM-computed auto page-break areas. |
| charts | list[Chart] | No | | Charts detected on the sheet. |
| print_areas | list[PrintArea] | No | | User-defined print areas. |
| rows | list[CellRow] | No | | Extracted rows with cell values and links. |
| shapes | list[Shape] | No | | Shapes detected on the sheet. |
| table_candidates | list[str] | No | | Cell ranges likely representing tables. |
WorkbookData
Workbook-level container with per-sheet data.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| book_name | str | Yes | - | Workbook file name (no path). |
| sheets | dict[str, SheetData] | Yes | - | Mapping of sheet name to SheetData. |
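A WorkbookData result is a mapping from sheet name to SheetData, so per-sheet summaries reduce to iterating `sheets`. A minimal sketch on a hypothetical plain-dict payload; `summarize` is illustrative, the A1-style table range is an assumed format, and missing list fields are treated as empty since they are optional.

```python
from typing import Any, Dict

# Hypothetical dict mirroring a serialized WorkbookData.
workbook = {
    "book_name": "sample.xlsx",
    "sheets": {
        "Sheet1": {
            "rows": [
                {"r": 1, "c": {"0": "Name", "1": "Qty"}},
                {"r": 2, "c": {"0": "Bolt", "1": 12}},
            ],
            "shapes": [],
            "charts": [],
            "table_candidates": ["A1:B2"],  # assumed A1-style range string
        },
        "Notes": {"rows": [], "shapes": [], "charts": []},
    },
}


def summarize(book: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Count rows, shapes, charts, and table candidates per sheet."""
    summary = {}
    for sheet_name, sheet in book["sheets"].items():
        # Optional list fields may be absent (e.g. filtered out), so default to [].
        summary[sheet_name] = {
            key: len(sheet.get(key, []))
            for key in ("rows", "shapes", "charts", "table_candidates")
        }
    return summary
```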