caosadvancedtools.table_json_conversion.xlsx_utils module#
General utilities to work with XLSX files with (hidden) column and row annotations and typing.
The most prominent functions are:
p2s: Path to string:["some", "path"] -> "some.path"read_or_dict: Load JSON object from path, file or dict.
This module also defines these enums:
ColumnType
RowType
- class caosadvancedtools.table_json_conversion.xlsx_utils.ColumnType(*values)#
Bases:
Enumcolumn types enum
- FOREIGN = 3#
- IGNORE = 5#
- LIST = 2#
- MULTIPLE_CHOICE = 4#
- SCALAR = 1#
- class caosadvancedtools.table_json_conversion.xlsx_utils.RowType(*values)#
Bases:
Enumrow types enum
- COL_TYPE = 1#
- IGNORE = 3#
- PATH = 2#
- caosadvancedtools.table_json_conversion.xlsx_utils.array_schema_from_model_schema(model_schema: dict) dict#
Convert a data model schema to a data array schema.
Practically, this means that the top level properties are converted into lists. In a simplified notation, this can be expressed as:
array_schema = { elem: [elem typed data...] for elem in model_schema }- Parameters:
model_schema (dict) – The schema description of the data model. Must be a json schema object, with a number of object typed properties.
- Returns:
array_schema – A corresponding json schema, where the properties are arrays with the types of the input’s top-level properties.
- Return type:
dict
- caosadvancedtools.table_json_conversion.xlsx_utils.get_column_type_row_index(sheet: Worksheet)#
Return the row index (0-indexed) of the row which defines the column types.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_data_columns(sheet: Worksheet) dict[str, SimpleNamespace]#
Return the data paths of the worksheet.
- Returns:
out – The keys are the stringified paths. The values are SimpleNamespace objects with
index,pathandcolumnattributes.- Return type:
dict[str, SimpleNamespace]
- caosadvancedtools.table_json_conversion.xlsx_utils.get_defining_paths(workbook: Workbook) dict[str, list[list[str]]]#
For all sheets in
workbook, list the paths which they define.A sheet is said to define a path, if it has data columns for properties inside that path. For example, consider the following worksheet:
COL_TYPE
SCALAR
SCALAR
LIST
SCALAR
PATH
Training
Training
Training
Training
PATH
url
date
subjects
supervisor
PATH
email
example.com/mp
2024-02-27
Math;Physics
example.com/m
2024-02-27
Math
This worksheet defines properties for the paths [“Training”] and [“Training”, “supervisor”], and thus these two path lists would be returned for the key with this sheet’s sheetname.
- Parameters:
workbook (Workbook) – The workbook to analyze.
- Returns:
out – A dict with worksheet names as keys and lists of paths (represented as string lists) as values.
- Return type:
dict[str, list[list[str]]
- caosadvancedtools.table_json_conversion.xlsx_utils.get_foreign_key_columns(sheet: Worksheet) dict[str, SimpleNamespace]#
Return the foreign keys of the worksheet.
- Returns:
out – The keys are the stringified paths. The values are SimpleNamespace objects with
index,pathandcolumnattributes.- Return type:
dict[str, SimpleNamespace]
- caosadvancedtools.table_json_conversion.xlsx_utils.get_path_position(sheet: Worksheet) tuple[list[str], str]#
Return a path which represents the parent element, and the sheet’s “proper name”.
For top-level sheets / entries (those without foreign columns), the path is an empty list.
A sheet’s “proper name” is detected from the data column paths: it is the first component after the parent components.
- Returns:
parent (list[str]) – Path to the parent element. Note that there may be list elements on the path which are not represented in this return value.
proper_name (str) – The “proper name” of this sheet. This defines an array where all the data lives, relative to the parent path.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_path_rows(sheet: Worksheet)#
Return the 0-based indices of the rows which represent paths.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_row_type_column_index(sheet: Worksheet)#
Return the column index (0-indexed) of the column which defines the row types.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_subschema(path: list[str], schema: dict) dict#
Return the sub schema at
path.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_worksheet_for_path(path: list[str], defining_path_index: dict[str, list[list[str]]]) str#
Find the sheet name which corresponds to the given path.
- caosadvancedtools.table_json_conversion.xlsx_utils.is_exploded_sheet(sheet: Worksheet) bool#
Return True if this is a an “exploded” sheet.
An exploded sheet is a sheet whose data entries are LIST valued properties of entries in another sheet. A sheet is detected as exploded if and only if it has FOREIGN columns.
- caosadvancedtools.table_json_conversion.xlsx_utils.is_recursively_none(obj: list | dict | None = None)#
Test if
objis None or recursively consists only of None-like objects.- Parameters:
obj (Union[list, dict, None]) – The object to be tested, a json like dict or array, or None.
- caosadvancedtools.table_json_conversion.xlsx_utils.next_row_index(sheet: Worksheet) int#
Return the index for the next data row.
This is defined as the first row without any content.
- caosadvancedtools.table_json_conversion.xlsx_utils.p2s(path: list[str]) str#
Path to string: dot-separated.
Special treatment for jumping levels#
Starting form the last element, if any element starts with one or more dots “.”, one preceding element will be removed for each dot. For example,
["A", "B1.B2", ".C", "..D.E"]will result in"A.D.E"(the elementsB1.B2and.Care removed by the two leading dots in..D.E).
- caosadvancedtools.table_json_conversion.xlsx_utils.parse_multiple_choice(value: Any) bool#
Interpret
valueas a multiple choice input.Truthy values are: - The boolean
True. - The number “1”. - The (case-insensitive) stringstrue,wahr,x,√,yes,ja,y,j.Falsy values are: - The boolean
False. -None, empty strings, lists, dicts. - The number “0”. - The (case-insensitive) stringsfalse,falsch,-,no,nein,n. - Everything else.- Returns:
out – The interpretation result of
value.- Return type:
bool
- caosadvancedtools.table_json_conversion.xlsx_utils.prune_none_objects(obj: list | dict) list | dict#
Recursively replace elements of
obj, that are empty or recursively None, by None.- Parameters:
obj (Union[list, dict]) – The object to be tested, a json like dict or array.
- Returns:
out (Union[list, dict]) – A deep copy, with the None objects and empty sequences replaced by None.
Detailed explanation
——————–
- Elements of lists that are recursively None, shall be removed from the list.
- Elements that are empty lists / dicts are replaced by None.
- caosadvancedtools.table_json_conversion.xlsx_utils.read_or_dict(data: dict | str | TextIO) dict#
If data is a json file name or input stream, read data from there. If it is a dict already, just return it.