caosadvancedtools.table_importer module

This module reads table files such as TSV and XLS. The files are converted to a Pandas DataFrame and checked for compliance with the provided rules. For example, a list of column names that must exist can be provided.

This module also implements some converters that can be applied to cell entries.

Those converters can also be used to apply checks on the entries.
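As a rough illustration of how a converter can double as a check, the sketch below maps column names to callables that either return the converted cell value or raise an error. The column names and the helper `positive_int` are hypothetical and not part of this module.

```python
def positive_int(val):
    """Convert a cell entry to int and reject non-positive values."""
    number = int(val)
    if number <= 0:
        raise ValueError(f"Expected a positive integer, got {val!r}")
    return number


# Hypothetical mapping from column name to converter callable.
converters = {
    "sample_count": positive_int,
    "comment": str,
}

# Apply each converter to the matching cell of one example row.
row = {"sample_count": "3", "comment": 42}
converted = {col: converters[col](cell) for col, cell in row.items()}
```

A mapping of this shape is what the `converters` argument of the importer classes below is assumed to expect.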

class caosadvancedtools.table_importer.CSVImporter(converters, obligatory_columns=None, unique_keys=None, datatypes=None, existing_columns=None, convert_int_to_nullable_int=True)

Bases: TableImporter

read_file(filename, sep=',', **kwargs)

class caosadvancedtools.table_importer.TSVImporter(converters, obligatory_columns=None, unique_keys=None, datatypes=None, existing_columns=None, convert_int_to_nullable_int=True)

Bases: CSVImporter

read_file(filename, **kwargs)

class caosadvancedtools.table_importer.TableImporter(converters, obligatory_columns=None, unique_keys=None, datatypes=None, existing_columns=None, convert_int_to_nullable_int=True)

Bases: object

Abstract base class for importing data from tables.

check_columns(df, filename=None)

Check whether all required columns exist.

Required columns are columns for which converters are defined.

Raises:

DataInconsistencyError

check_dataframe(df, filename=None, strict=False)

Check if the dataframe conforms to the restrictions.

Checked restrictions are: Columns, data types, uniqueness requirements.

Parameters:
  • df (pandas.DataFrame) – The dataframe to be checked.

  • filename (string, optional) – The file name, only used for output in case of problems.

  • strict (boolean, optional) – If False (the default), try to convert columns, otherwise raise an error.

check_datatype(df, filename=None, strict=False)

Check for each column whether non-null fields have the correct datatype.

Note

If a column holds integers but should hold floats, this method converts it in place. The same applies to columns that should hold strings but hold numeric values.

Parameters:
  • strict (boolean, optional) – If False (the default), try to convert columns, otherwise raise an error.

check_missing(df, filename=None)

Check in each row whether obligatory fields are empty or null.

Rows that have missing values are removed.

Returns:

out – The input DataFrame with incomplete rows removed.

Return type:

pandas.DataFrame
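The row filtering described for check_missing can be pictured with a plain-Python sketch. It works on a list of dicts instead of a DataFrame, treats only `None` and the empty string as missing, and the helper name `drop_incomplete_rows` is hypothetical; the actual method may use different rules (for example for NaN values).

```python
def drop_incomplete_rows(rows, obligatory_columns):
    """Return only the rows whose obligatory fields are neither None nor empty."""
    complete = []
    for row in rows:
        # A row is kept only if every obligatory field carries a value.
        if all(row.get(col) not in (None, "") for col in obligatory_columns):
            complete.append(row)
    return complete


rows = [
    {"name": "A", "date": "2020-01-01"},
    {"name": "", "date": "2020-01-02"},   # empty obligatory field
    {"name": "C", "date": None},          # null obligatory field
]
kept = drop_incomplete_rows(rows, ["name", "date"])
```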

check_unique(df, filename=None)

Check whether value combinations that shall be unique for each row are unique.

If a second row is found that uses the same combination of values as a previous one, the second one is removed.
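A minimal sketch of this "keep the first occurrence" rule, again on a list of dicts rather than a DataFrame. The helper `drop_repeated_rows` is hypothetical, and it assumes a single combination of key columns; how the importer's `unique_keys` argument is structured is not shown here.

```python
def drop_repeated_rows(rows, key_columns):
    """Keep the first row for each combination of values in key_columns."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key in seen:
            # A previous row already used this combination: drop this one.
            continue
        seen.add(key)
        kept.append(row)
    return kept


rows = [
    {"first": "Ada", "last": "Lovelace", "age": 36},
    {"first": "Ada", "last": "Lovelace", "age": 37},  # same key combination
    {"first": "Alan", "last": "Turing", "age": 41},
]
unique_rows = drop_repeated_rows(rows, ["first", "last"])
```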

read_file(filename, **kwargs)

class caosadvancedtools.table_importer.XLSImporter(converters, obligatory_columns=None, unique_keys=None, datatypes=None, existing_columns=None, convert_int_to_nullable_int=True)

Bases: TableImporter

read_file(filename, **kwargs)

read_xls(filename, **kwargs)

Convert an xls file into a Pandas DataFrame.

The converters of the XLSImporter object are used.

Raises: DataInconsistencyError

caosadvancedtools.table_importer.assure_name_format(name)

Checks whether a string can be interpreted as ‘LastName, FirstName’.

caosadvancedtools.table_importer.check_reference_field(ent_id, recordtype)

caosadvancedtools.table_importer.date_converter(val, fmt='%Y-%m-%d')

If the value is already a datetime, it is returned; otherwise it is converted using the format string.

caosadvancedtools.table_importer.datetime_converter(val, fmt='%Y-%m-%d %H:%M:%S')

If the value is already a datetime, it is returned; otherwise it is converted using the format string.
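The behavior described for the two converters above can be sketched with the standard library. The helper `to_datetime` is hypothetical and only mirrors the description (pass datetimes through, parse everything else with the format string); it is not the module's implementation.

```python
from datetime import datetime


def to_datetime(val, fmt="%Y-%m-%d %H:%M:%S"):
    """Return val unchanged if it is a datetime, else parse it with fmt."""
    if isinstance(val, datetime):
        return val
    # strptime raises ValueError if val does not match fmt.
    return datetime.strptime(val, fmt)


parsed = to_datetime("2021-03-04 05:06:07")
date_only = to_datetime("2021-03-04", fmt="%Y-%m-%d")
```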

caosadvancedtools.table_importer.incomplete_date_converter(val, fmts=None)

If the value is already a datetime, it is returned; otherwise it is converted using the given format strings.

Parameters:
  • val (str) – Candidate value for one of the possible date formats.

  • fmts (dict, optional) – Dictionary of the possible (incomplete) date formats: each key is a format into which the input value is converted, and each value is the corresponding accepted input format.
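To make the role of the fmts dictionary concrete, here is a sketch under stated assumptions: the helper `normalize_incomplete_date`, the default formats in `ASSUMED_FMTS`, and the choice of `ValueError` for unmatched input are all hypothetical. Each input format (dict value) is tried in turn, and the first one that parses is rendered with its output format (dict key).

```python
from datetime import datetime

# Assumed defaults for illustration only: output format -> input format.
ASSUMED_FMTS = {"%Y-%m-%d": "%Y-%m-%d", "%Y-%m": "%Y-%m", "%Y": "%Y"}


def normalize_incomplete_date(val, fmts=None):
    """Try each input format and return val rendered in the matching output format."""
    if fmts is None:
        fmts = ASSUMED_FMTS
    for out_fmt, in_fmt in fmts.items():
        try:
            parsed = datetime.strptime(val, in_fmt)
        except ValueError:
            continue  # this input format does not match, try the next one
        return parsed.strftime(out_fmt)
    raise ValueError(f"{val!r} matches none of the given date formats")


month_only = normalize_incomplete_date("2020-05")
reordered = normalize_incomplete_date("05/2020", fmts={"%Y-%m": "%m/%Y"})
```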

caosadvancedtools.table_importer.string_in_list(val, options, ignore_case=True)

Return the given value if it is contained in options, raise an error otherwise.

Parameters:
  • val (str) – String value to be checked.

  • options (list<str>) – List of possible values that val may take.

  • ignore_case (bool, optional) – Specify whether the comparison of val and the possible options should ignore capitalization. Default is True.

Returns:

val – The original value if it is contained in options

Return type:

str

Raises:

ValueError – If val is not contained in options.
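The documented contract (return the original value on a match, raise ValueError otherwise) can be sketched as follows. The helper `pick_from_options` is hypothetical, and comparing lower-cased strings is only one assumed way to ignore capitalization.

```python
def pick_from_options(val, options, ignore_case=True):
    """Return val if it is one of options, raise ValueError otherwise."""
    if ignore_case:
        found = val.lower() in [opt.lower() for opt in options]
    else:
        found = val in options
    if not found:
        raise ValueError(f"{val!r} is not one of {options!r}")
    # The original spelling of val is returned, not the matching option.
    return val


choice = pick_from_options("RED", ["red", "green", "blue"])
```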

caosadvancedtools.table_importer.win_path_converter(val)

Checks whether the value looks like a Windows path and converts it to a POSIX path.
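The conversion step alone can be illustrated with `pathlib`. This sketch assumes that the conversion amounts to `PureWindowsPath.as_posix()`; the helper `windows_to_posix` is hypothetical, and the check of whether a value "looks like" a Windows path is deliberately left out because its rules are not documented here.

```python
from pathlib import PureWindowsPath


def windows_to_posix(val):
    """Rewrite a Windows-style path string with forward slashes."""
    return PureWindowsPath(val).as_posix()


posix_path = windows_to_posix(r"data\run1\results.csv")
```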

caosadvancedtools.table_importer.win_path_list_converter(val)

Checks whether the value looks like a list of Windows paths and converts it to POSIX paths.

caosadvancedtools.table_importer.yes_no_converter(val)

Converts a string to True or False if possible.

Allowed field values are yes and no.
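A converter of this kind might look like the sketch below. The helper `parse_yes_no` is hypothetical; ignoring case and surrounding whitespace, and raising `ValueError` for any other input, are assumptions rather than documented behavior.

```python
def parse_yes_no(val):
    """Map 'yes' to True and 'no' to False, reject everything else."""
    text = str(val).strip().lower()  # assumed: case and whitespace are ignored
    if text == "yes":
        return True
    if text == "no":
        return False
    raise ValueError(f"Expected 'yes' or 'no', got {val!r}")


flags = [parse_yes_no(cell) for cell in ["yes", "No"]]
```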