caosadvancedtools.table_json_conversion.xlsx_utils module#

General utilities to work with XLSX files with (hidden) column and row annotations and typing.

The most prominent functions are:

  • p2s: Path to string: ["some", "path"] -> "some.path"

  • read_or_dict: Load JSON object from path, file or dict.

This module also defines these enums:

  • ColumnType

  • RowType

class caosadvancedtools.table_json_conversion.xlsx_utils.ColumnType(*values)#

Bases: Enum

column types enum

FOREIGN = 3#
IGNORE = 5#
LIST = 2#
MULTIPLE_CHOICE = 4#
SCALAR = 1#
class caosadvancedtools.table_json_conversion.xlsx_utils.RowType(*values)#

Bases: Enum

row types enum

COL_TYPE = 1#
IGNORE = 3#
PATH = 2#
caosadvancedtools.table_json_conversion.xlsx_utils.array_schema_from_model_schema(model_schema: dict) dict#

Convert a data model schema to a data array schema.

Practically, this means that the top level properties are converted into lists. In a simplified notation, this can be expressed as:

array_schema = { elem: [elem typed data...] for elem in model_schema }

Parameters:

model_schema (dict) – The schema description of the data model. Must be a json schema object, with a number of object typed properties.

Returns:

array_schema – A corresponding json schema, where the properties are arrays with the types of the input’s top-level properties.

Return type:

dict

caosadvancedtools.table_json_conversion.xlsx_utils.get_column_type_row_index(sheet: Worksheet)#

Return the row index (0-indexed) of the row which defines the column types.

caosadvancedtools.table_json_conversion.xlsx_utils.get_data_columns(sheet: Worksheet) dict[str, SimpleNamespace]#

Return the data paths of the worksheet.

Returns:

out – The keys are the stringified paths. The values are SimpleNamespace objects with index, path and column attributes.

Return type:

dict[str, SimpleNamespace]

caosadvancedtools.table_json_conversion.xlsx_utils.get_defining_paths(workbook: Workbook) dict[str, list[list[str]]]#

For all sheets in workbook, list the paths which they define.

A sheet is said to define a path, if it has data columns for properties inside that path. For example, consider the following worksheet:

COL_TYPE

SCALAR

SCALAR

LIST

SCALAR

PATH

Training

Training

Training

Training

PATH

url

date

subjects

supervisor

PATH

email

example.com/mp

2024-02-27

Math;Physics

steve@example.com

example.com/m

2024-02-27

Math

stella@example.com

This worksheet defines properties for the paths [“Training”] and [“Training”, “supervisor”], and thus these two path lists would be returned for the key with this sheet’s sheetname.

Parameters:

workbook (Workbook) – The workbook to analyze.

Returns:

out – A dict with worksheet names as keys and lists of paths (represented as string lists) as values.

Return type:

dict[str, list[list[str]]

caosadvancedtools.table_json_conversion.xlsx_utils.get_foreign_key_columns(sheet: Worksheet) dict[str, SimpleNamespace]#

Return the foreign keys of the worksheet.

Returns:

out – The keys are the stringified paths. The values are SimpleNamespace objects with index, path and column attributes.

Return type:

dict[str, SimpleNamespace]

caosadvancedtools.table_json_conversion.xlsx_utils.get_path_position(sheet: Worksheet) tuple[list[str], str]#

Return a path which represents the parent element, and the sheet’s “proper name”.

For top-level sheets / entries (those without foreign columns), the path is an empty list.

A sheet’s “proper name” is detected from the data column paths: it is the first component after the parent components.

Returns:

  • parent (list[str]) – Path to the parent element. Note that there may be list elements on the path which are not represented in this return value.

  • proper_name (str) – The “proper name” of this sheet. This defines an array where all the data lives, relative to the parent path.

caosadvancedtools.table_json_conversion.xlsx_utils.get_path_rows(sheet: Worksheet)#

Return the 0-based indices of the rows which represent paths.

caosadvancedtools.table_json_conversion.xlsx_utils.get_row_type_column_index(sheet: Worksheet)#

Return the column index (0-indexed) of the column which defines the row types.

caosadvancedtools.table_json_conversion.xlsx_utils.get_subschema(path: list[str], schema: dict) dict#

Return the sub schema at path.

caosadvancedtools.table_json_conversion.xlsx_utils.get_worksheet_for_path(path: list[str], defining_path_index: dict[str, list[list[str]]]) str#

Find the sheet name which corresponds to the given path.

caosadvancedtools.table_json_conversion.xlsx_utils.is_exploded_sheet(sheet: Worksheet) bool#

Return True if this is a an “exploded” sheet.

An exploded sheet is a sheet whose data entries are LIST valued properties of entries in another sheet. A sheet is detected as exploded if and only if it has FOREIGN columns.

caosadvancedtools.table_json_conversion.xlsx_utils.is_recursively_none(obj: list | dict | None = None)#

Test if obj is None or recursively consists only of None-like objects.

Parameters:

obj (Union[list, dict, None]) – The object to be tested, a json like dict or array, or None.

caosadvancedtools.table_json_conversion.xlsx_utils.next_row_index(sheet: Worksheet) int#

Return the index for the next data row.

This is defined as the first row without any content.

caosadvancedtools.table_json_conversion.xlsx_utils.p2s(path: list[str]) str#

Path to string: dot-separated.

Special treatment for jumping levels#

Starting form the last element, if any element starts with one or more dots “.”, one preceding element will be removed for each dot. For example, ["A", "B1.B2", ".C", "..D.E"] will result in "A.D.E" (the elements B1.B2 and .C are removed by the two leading dots in ..D.E).

caosadvancedtools.table_json_conversion.xlsx_utils.parse_multiple_choice(value: Any) bool#

Interpret value as a multiple choice input.

Truthy values are: - The boolean True. - The number “1”. - The (case-insensitive) strings true, wahr, x, , yes, ja, y, j.

Falsy values are: - The boolean False. - None, empty strings, lists, dicts. - The number “0”. - The (case-insensitive) strings false, falsch, -, no, nein, n. - Everything else.

Returns:

out – The interpretation result of value.

Return type:

bool

caosadvancedtools.table_json_conversion.xlsx_utils.prune_none_objects(obj: list | dict) list | dict#

Recursively replace elements of obj, that are empty or recursively None, by None.

Parameters:

obj (Union[list, dict]) – The object to be tested, a json like dict or array.

Returns:

  • out (Union[list, dict]) – A deep copy, with the None objects and empty sequences replaced by None.

  • Detailed explanation

  • ——————–

  • - Elements of lists that are recursively None, shall be removed from the list.

  • - Elements that are empty lists / dicts are replaced by None.

caosadvancedtools.table_json_conversion.xlsx_utils.read_or_dict(data: dict | str | TextIO) dict#

If data is a json file name or input stream, read data from there. If it is a dict already, just return it.