caosadvancedtools.cache module#

class caosadvancedtools.cache.AbstractCache(db_file=None, force_creation=False)#

Bases: ABC

check_cache()#

Check whether the cache in db file self.db_file exists and conforms to the latest database schema.

If it does not exist, it will be created using the newest database schema.

If it exists, but the schema is outdated, an exception will be raised.
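The three cases above can be sketched with the standard `sqlite3` module. This is a minimal illustration, not the library's implementation: the free functions, the `SCHEMA_VERSION` constant, the choice of `RuntimeError`, and the decision to treat any version mismatch as outdated are all assumptions made for the example.

```python
import os
import sqlite3
import tempfile

SCHEMA_VERSION = 2  # hypothetical "latest" schema version for this sketch


def create_cache(db_file):
    """Create a new cache file whose version table holds SCHEMA_VERSION."""
    conn = sqlite3.connect(db_file)
    try:
        conn.execute("CREATE TABLE version (schema INTEGER)")
        conn.execute("INSERT INTO version VALUES (?)", (SCHEMA_VERSION,))
        conn.commit()
    finally:
        conn.close()


def get_cache_version(db_file):
    """Return the single entry in column schema of table version."""
    conn = sqlite3.connect(db_file)
    try:
        return conn.execute("SELECT schema FROM version").fetchone()[0]
    finally:
        conn.close()


def check_cache(db_file):
    """Create a missing cache; raise if an existing one does not match."""
    if not os.path.exists(db_file):
        create_cache(db_file)
        return
    found = get_cache_version(db_file)
    if found != SCHEMA_VERSION:  # assumption: any mismatch counts as outdated
        raise RuntimeError(
            f"Cache schema version {found} does not match {SCHEMA_VERSION}."
        )


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache.db")
    check_cache(path)  # file is missing, so it is created
    created_version = get_cache_version(path)
    check_cache(path)  # file exists and is current, so nothing happens
```

Raising instead of migrating keeps the sketch in line with the description above: an outdated file is reported to the caller rather than silently rewritten.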

abstractmethod create_cache()#

Override this method to create the cache using the most recent schema version.

abstractmethod get_cache_schema_version()#

Override this method to define the version of the SQLite database schema. The schema version is stored in column schema of table version.

Increase this version whenever the cache tables are changed.

get_cache_version()#

Return the version of the cache stored in self.db_file. The version is stored as the only entry in column schema of table version.

abstractmethod get_default_file_name()#

Supply a default file name for the cache here.

run_sql_commands(commands, fetchall: bool = False)#

Run a list of SQL commands on self.db_file.

Parameters:
  • commands – List of SQL commands (tuples) to execute.

  • fetchall (bool, optional) – When True, run fetchall as last command and return the results. Otherwise nothing is returned.
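A rough sketch of how such a command runner might look with the standard `sqlite3` module follows. It is written as a free function taking the database file as its first argument, which is an assumption for the example, as is the idea that each tuple is passed to `cursor.execute` as `(sql,)` or `(sql, parameters)`.

```python
import sqlite3


def run_sql_commands(db_file, commands, fetchall=False):
    """Execute each command tuple in order on one connection."""
    conn = sqlite3.connect(db_file)
    try:
        cur = conn.cursor()
        for command in commands:
            # Each tuple is unpacked into execute(sql[, parameters]).
            cur.execute(*command)
        # Only the result set of the last command is fetched.
        results = cur.fetchall() if fetchall else None
        conn.commit()
    finally:
        conn.close()
    return results


rows = run_sql_commands(
    ":memory:",
    [
        ("CREATE TABLE demo (id INTEGER, name TEXT)",),
        ("INSERT INTO demo VALUES (?, ?)", (1, "alpha")),
        ("SELECT id, name FROM demo",),
    ],
    fetchall=True,
)
```

Passing parameters separately from the SQL string, as in the second tuple, lets SQLite handle quoting instead of relying on string formatting.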

class caosadvancedtools.cache.Cache(*args, **kwargs)#

Bases: IdentifiableCache

class caosadvancedtools.cache.IdentifiableCache(db_file=None, force_creation=False)#

Bases: AbstractCache

Stores identifiables (as a hash of XML) and their respective IDs.

This makes it possible to retrieve the Record corresponding to an identifiable without querying.

check_existing(ent_hash)#

Check the cache for a hash.

ent_hash: The hash to search for.

Return the ID and the version ID of the hashed entity. Return None if no entity with that hash is in the cache.

create_cache()#

Create a new SQLITE cache file in self.db_file.

Two tables will be created:

  • identifiables is the actual cache.

  • version is a table with version information about the cache.

get_cache_schema_version()#

Override this method to define the version of the SQLite database schema. The schema version is stored in column schema of table version.

Increase this version whenever the cache tables are changed.

get_default_file_name()#

Supply a default file name for the cache here.

static hash_entity(ent)#

Format an entity as “pretty” XML and return the SHA256 hash.
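The idea of hashing a pretty-printed XML representation can be illustrated with the standard library alone. The sketch below works on a plain XML string rather than an entity object, and `hash_xml` is a hypothetical helper; the pretty-printing used by the library itself may differ, so the digests produced here are not expected to match those of `hash_entity`.

```python
import hashlib
from xml.dom import minidom


def hash_xml(xml_text):
    """Pretty-print an XML string and return its SHA256 hex digest."""
    pretty = minidom.parseString(xml_text).toprettyxml(indent="  ")
    return hashlib.sha256(pretty.encode("utf-8")).hexdigest()


digest_a = hash_xml("<Record name='sample'><Property name='mass'>1</Property></Record>")
digest_b = hash_xml("<Record name='sample'><Property name='mass'>2</Property></Record>")
```

Because the digest depends on every character of the serialized text, any change to a property value yields a different hash, which is what makes the hash usable as a cache key.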

insert(ent_hash, ent_id, ent_version)#

Insert a new cache entry.

ent_hash: Hash of the entity. Should be generated with Cache.hash_entity.

ent_id: ID of the entity.

ent_version: Version string of the entity.
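To make the relation between inserting an entry and looking it up by hash concrete, here is a small stand-alone sketch using an in-memory SQLite database. The table layout and column names (`digest`, `caosdb_id`, `caosdb_version`) are assumptions for illustration, and the functions take an explicit connection instead of being methods.

```python
import sqlite3


def make_cache():
    """Create an in-memory table that maps a hash to an ID and a version."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE identifiables "
        "(digest TEXT PRIMARY KEY, caosdb_id INTEGER, caosdb_version TEXT)"
    )
    return conn


def insert(conn, ent_hash, ent_id, ent_version):
    """Store one cache entry."""
    conn.execute(
        "INSERT INTO identifiables VALUES (?, ?, ?)",
        (ent_hash, ent_id, ent_version),
    )
    conn.commit()


def check_existing(conn, ent_hash):
    """Return (ID, version) for the hash, or None if it is not cached."""
    return conn.execute(
        "SELECT caosdb_id, caosdb_version FROM identifiables WHERE digest = ?",
        (ent_hash,),
    ).fetchone()


cache = make_cache()
insert(cache, "abc123", 1001, "v1")
hit = check_existing(cache, "abc123")
miss = check_existing(cache, "unknown")
```

Using the hash as the primary key means a lookup is a single indexed query, and a second insert with the same hash fails instead of creating an ambiguous entry.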

insert_list(hashes, entities)#

Insert the IDs of entities into the cache.

The hashes must correspond to the entities in the list.

update_ids_from_cache(entities)#

Set the IDs of those entities that are in the cache.

A list of hashes corresponding to the entities is returned.

validate_cache(entities=None)#

Runs through all entities stored in the cache and checks whether the version still matches the most recent version. Non-matching entities will be removed from the cache.

entities: When set to a db.Container or a list of Entities, the IDs from the cache will not be retrieved from the CaosDB database; instead, the versions from the cache will be checked against the versions contained in that collection. Only entries in the cache that have a corresponding version in the collection will be checked; all others will be ignored. Useful for testing.

Return a list of invalidated entries or an empty list if no elements have been invalidated.
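The behaviour when a collection is supplied can be sketched with plain dictionaries. In this illustration the cache is a dict from hash to `(ID, version)` and the collection is reduced to a dict from ID to version; both structures, and the choice to return the IDs of removed entries, are assumptions made for the example.

```python
def validate_cache(cache, current_versions):
    """Drop cached entries whose version differs from the supplied one.

    cache: dict mapping hash -> (entity ID, cached version)
    current_versions: dict mapping entity ID -> current version
    """
    stale = [
        digest
        for digest, (ent_id, version) in cache.items()
        # Entries without a counterpart in the collection are ignored.
        if ent_id in current_versions and current_versions[ent_id] != version
    ]
    invalidated = [cache.pop(digest)[0] for digest in stale]
    return invalidated


cache = {
    "hash-a": (1, "v1"),  # matches the collection and is kept
    "hash-b": (2, "v1"),  # differs from the collection and is removed
    "hash-c": (3, "v1"),  # not in the collection, so it is ignored
}
removed = validate_cache(cache, {1: "v1", 2: "v2"})
```

After the call, `cache` still holds `hash-a` and `hash-c`, and `removed` lists only the ID of the mismatching entry.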

class caosadvancedtools.cache.UpdateCache(db_file=None, force_creation=False)#

Bases: AbstractCache

Stores unauthorized inserts and updates.

If the Guard is set to a mode that does not allow an insert or update, the insert or update can be stored in this cache such that it can be authorized and performed later.

create_cache()#

Initialize the cache.

get(run_id, querystring)#

Returns the pending updates for a given run ID.

Parameters:

run_id: the ID of the crawler run

querystring: the SQL query

get_cache_schema_version()#

Override this method to define the version of the SQLite database schema. The schema version is stored in column schema of table version.

Increase this version whenever the cache tables are changed.

get_default_file_name()#

Supply a default file name for the cache here.

get_inserts(run_id)#

Returns the pending inserts for a given run ID.

Parameters:

run_id: the ID of the crawler run

static get_previous_version(cont)#

Retrieve the current, unchanged version of the entities that shall be updated, i.e. the version before the update.

get_updates(run_id)#

Returns the pending updates for a given run ID.

Parameters:

run_id: the ID of the crawler run

insert(cont, run_id, insert=False)#

Insert a pending, unauthorized insert or update.

Parameters:
  • cont (Container) – Container with the records to be inserted or updated, containing the desired version, i.e. the state after the update.

  • run_id (int) – The id of the crawler run

  • insert (bool) – Whether the entities in the container shall be inserted or updated.
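The general pattern of parking changes per crawler run and fetching them later can be sketched as follows. Everything in this example is hypothetical: the `pending` table, its columns, the string payloads standing in for serialized containers, and the helper names do not reflect the actual schema or API of the library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pending (run_id INTEGER, is_insert INTEGER, payload TEXT)"
)


def store_pending(payload, run_id, insert=False):
    """Record a change that could not be performed yet."""
    conn.execute(
        "INSERT INTO pending VALUES (?, ?, ?)", (run_id, int(insert), payload)
    )
    conn.commit()


def get_pending(run_id, insert):
    """Return the stored payloads of one kind for a given run, in order."""
    rows = conn.execute(
        "SELECT payload FROM pending "
        "WHERE run_id = ? AND is_insert = ? ORDER BY rowid",
        (run_id, int(insert)),
    ).fetchall()
    return [row[0] for row in rows]


store_pending("<Record name='new'/>", run_id=7, insert=True)
store_pending("<Record id='12' name='changed'/>", run_id=7)
store_pending("<Record id='13' name='other'/>", run_id=8)

inserts_run_7 = get_pending(7, insert=True)
updates_run_7 = get_pending(7, insert=False)
```

Keying every row by the run ID is what allows the changes of a single crawler run to be reviewed and authorized together.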

caosadvancedtools.cache.cleanXML(xml)#
caosadvancedtools.cache.get_pretty_xml(cont)#
caosadvancedtools.cache.put_in_container(stuff)#