caosadvancedtools.cache module#
- class caosadvancedtools.cache.AbstractCache(db_file=None, force_creation=False)#
Bases:
ABC
- check_cache()#
Check whether the cache in db file self.db_file exists and conforms to the latest database schema.
If it does not exist, it will be created using the newest database schema.
If it exists, but the schema is outdated, an exception will be raised.
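The three cases above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the constant `SCHEMA_VERSION`, the free functions, and the choice of `RuntimeError` are assumptions made for the example; only the `version` table with its `schema` column comes from the description on this page.

```python
import os
import sqlite3

SCHEMA_VERSION = 2  # hypothetical "newest" schema version for this sketch


def create_cache(db_file):
    """Create the cache file with a `version` table holding one `schema` entry."""
    conn = sqlite3.connect(db_file)
    try:
        conn.execute("CREATE TABLE version (schema INTEGER)")
        conn.execute("INSERT INTO version VALUES (?)", (SCHEMA_VERSION,))
        conn.commit()
    finally:
        conn.close()


def get_cache_version(db_file):
    """Read the single entry in column `schema` of table `version`."""
    conn = sqlite3.connect(db_file)
    try:
        return conn.execute("SELECT schema FROM version").fetchone()[0]
    finally:
        conn.close()


def check_cache(db_file):
    """Create a missing cache; raise if an existing one has another schema version."""
    if not os.path.exists(db_file):
        create_cache(db_file)
        return
    if get_cache_version(db_file) != SCHEMA_VERSION:
        # Assumption: the exception type is chosen for illustration only.
        raise RuntimeError("Cache schema is outdated.")
```

Calling `check_cache` twice on the same path is harmless in this sketch: the first call creates the file, the second only compares versions.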
- abstractmethod create_cache()#
Provide an overloaded function here that creates the cache in the most recent version.
- abstractmethod get_cache_schema_version()#
Must be overridden to provide the version of the SQLite database schema. The schema version is saved in column schema of table version.
Increase this value whenever the cache tables change.
- get_cache_version()#
Return the version of the cache stored in self.db_file. The version is stored as the only entry in column schema of table version.
- abstractmethod get_default_file_name()#
Supply a default file name for the cache here.
- run_sql_commands(commands, fetchall: bool = False)#
Run a list of SQL commands on self.db_file.
- Parameters:
commands – List of SQL commands (tuples) to execute.
fetchall (bool, optional) – When True, run fetchall as last command and return the results. Otherwise nothing is returned.
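A hedged sketch of how such a helper might behave, using only `sqlite3`. It assumes that each command is a tuple whose first element is the SQL string and whose optional second element holds the bound parameters; the standalone function taking `db_file` as an argument is also an assumption made for the example.

```python
import sqlite3


def run_sql_commands(db_file, commands, fetchall=False):
    """Execute each command tuple in order on db_file.

    Each tuple is unpacked into cursor.execute(), e.g.
    ("INSERT INTO t VALUES (?)", (1,)) or ("SELECT * FROM t",).
    """
    conn = sqlite3.connect(db_file)
    try:
        cursor = conn.cursor()
        for command in commands:
            cursor.execute(*command)
        # Only the result of the last command is fetched, and only on request.
        results = cursor.fetchall() if fetchall else None
        conn.commit()
        return results
    finally:
        conn.close()
```

Passing parameters separately from the SQL string lets SQLite do the quoting, which is why the commands are tuples rather than plain strings in this sketch.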
- class caosadvancedtools.cache.Cache(*args, **kwargs)#
Bases:
IdentifiableCache
- class caosadvancedtools.cache.IdentifiableCache(db_file=None, force_creation=False)#
Bases:
AbstractCache
Stores identifiables (as a hash of their XML) and their respective IDs.
This allows retrieving the Record corresponding to an identifiable without querying.
- check_existing(ent_hash)#
Check the cache for a hash.
ent_hash: The hash to search for.
Return the ID and the version ID of the hashed entity. Return None if no entity with that hash is in the cache.
- create_cache()#
Create a new SQLITE cache file in self.db_file.
Two tables will be created:
- identifiables is the actual cache.
- version is a table with version information about the cache.
- get_cache_schema_version()#
Must be overridden to provide the version of the SQLite database schema. The schema version is saved in column schema of table version.
Increase this value whenever the cache tables change.
- get_default_file_name()#
Supply a default file name for the cache here.
- static hash_entity(ent)#
Format an entity as “pretty” XML and return the SHA256 hash.
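The hashing step can be illustrated as follows, assuming the entity has already been serialized to an XML string. The helper name `hash_xml` is hypothetical, and the serialization itself is not shown.

```python
from hashlib import sha256


def hash_xml(xml_text: str) -> str:
    """Return the hexadecimal SHA256 digest of an XML string."""
    return sha256(xml_text.encode("utf-8")).hexdigest()
```

Because the digest is computed over the serialized text, two serializations that differ only in whitespace produce different hashes. This is presumably why a fixed "pretty" format is applied before hashing.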
- insert(ent_hash, ent_id, ent_version)#
Insert a new cache entry.
ent_hash: Hash of the entity. Should be generated with Cache.hash_entity.
ent_id: ID of the entity.
ent_version: Version string of the entity.
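The relationship between insert and check_existing can be sketched with an in-memory SQLite table. The column names (`digest`, `entity_id`, `entity_version`) and the free functions taking a connection are assumptions made for this example; only the table name identifiables comes from this page.

```python
import sqlite3


def open_cache():
    """Open an in-memory database with a table mapping hashes to IDs and versions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE identifiables ("
        "digest TEXT PRIMARY KEY, entity_id INTEGER, entity_version TEXT)"
    )
    return conn


def insert(conn, ent_hash, ent_id, ent_version):
    """Store one cache entry."""
    conn.execute(
        "INSERT INTO identifiables VALUES (?, ?, ?)",
        (ent_hash, ent_id, ent_version),
    )
    conn.commit()


def check_existing(conn, ent_hash):
    """Return (entity_id, entity_version) for the hash, or None if it is unknown."""
    return conn.execute(
        "SELECT entity_id, entity_version FROM identifiables WHERE digest = ?",
        (ent_hash,),
    ).fetchone()
```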
- insert_list(hashes, entities)#
Insert the IDs of entities into the cache.
The hashes must correspond to the entities in the list.
- update_ids_from_cache(entities)#
Set the IDs of those entities that are in the cache.
Return a list of hashes corresponding to the entities.
- validate_cache(entities=None)#
Runs through all entities stored in the cache and checks whether the version still matches the most recent version. Non-matching entities will be removed from the cache.
- entities: When set to a db.Container or a list of Entities, the IDs from the cache will not be retrieved from the CaosDB database; instead, the versions from the cache will be checked against the versions contained in that collection. Only entries in the cache that have a corresponding version in the collection will be checked; all others will be ignored. Useful for testing.
Return a list of invalidated entries or an empty list if no elements have been invalidated.
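The comparison logic for the case where a collection is supplied can be sketched with plain dictionaries. This is a simplified model under stated assumptions: the cache is represented as a dict from hash to `(id, version)`, the collection as a dict from ID to version, and the function returns the invalidated hashes. The real data structures and return values may differ.

```python
def validate_entries(cache, current_versions):
    """Remove stale entries from `cache` and return their hashes.

    cache: dict mapping hash -> (entity_id, cached_version)
    current_versions: dict mapping entity_id -> current version
    """
    invalidated = []
    # Iterate over a copy because entries are deleted during the loop.
    for digest, (ent_id, cached_version) in list(cache.items()):
        if ent_id not in current_versions:
            continue  # no counterpart in the collection: entry is ignored
        if current_versions[ent_id] != cached_version:
            del cache[digest]
            invalidated.append(digest)
    return invalidated
```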
- class caosadvancedtools.cache.UpdateCache(db_file=None, force_creation=False)#
Bases:
AbstractCache
Stores unauthorized inserts and updates.
If the Guard is set to a mode that does not allow an insert or update, the insert or update can be stored in this cache such that it can be authorized and performed later.
- create_cache()#
Initialize the cache.
- get(run_id, querystring)#
Return the pending updates for a given run ID.
- Parameters:
run_id: The ID of the crawler run.
querystring: The SQL query.
- get_cache_schema_version()#
Must be overridden to provide the version of the SQLite database schema. The schema version is saved in column schema of table version.
Increase this value whenever the cache tables change.
- get_default_file_name()#
Supply a default file name for the cache here.
- get_inserts(run_id)#
Return the pending inserts for a given run ID.
- Parameters:
run_id: The ID of the crawler run.
- static get_previous_version(cont)#
Retrieve the current, unchanged version of the entities that shall be updated, i.e. the version before the update.
- get_updates(run_id)#
Return the pending updates for a given run ID.
- Parameters:
run_id: The ID of the crawler run.
- insert(cont, run_id, insert=False)#
Insert a pending, unauthorized insert or update
- Parameters:
cont (Container) – Container with the records to be inserted or updated, containing the desired version, i.e. the state after the update.
run_id (int) – The id of the crawler run
insert (bool) – Whether the entities in the container shall be inserted or updated.
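The idea of storing pending changes per crawler run and reading inserts and updates back separately can be sketched as follows. The table layout (`pending` with columns `run_id`, `is_insert`, `new_state`) and all function names are assumptions made for this example, and the desired state is stored as a plain string instead of a serialized container.

```python
import sqlite3


def open_update_cache():
    """Open an in-memory database for pending, unauthorized changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pending (run_id INTEGER, is_insert INTEGER, new_state TEXT)"
    )
    return conn


def store(conn, new_state, run_id, insert=False):
    """Record one pending change; `insert` distinguishes inserts from updates."""
    conn.execute(
        "INSERT INTO pending VALUES (?, ?, ?)",
        (run_id, int(insert), new_state),
    )
    conn.commit()


def get_pending(conn, run_id, insert):
    """Return the stored states for one run, either the inserts or the updates."""
    rows = conn.execute(
        "SELECT new_state FROM pending "
        "WHERE run_id = ? AND is_insert = ? ORDER BY rowid",
        (run_id, int(insert)),
    ).fetchall()
    return [row[0] for row in rows]
```

Keeping the run ID with every entry lets a later authorization step apply exactly the changes that belong to one crawler run.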
- caosadvancedtools.cache.cleanXML(xml)#
- caosadvancedtools.cache.get_pretty_xml(cont)#
- caosadvancedtools.cache.put_in_container(stuff)#