Tags: reference, crawler

CFood-Specification

Note

This page has been migrated from the old documentation and has not yet been fully revised. There might be inconsistencies or errors when it is used with current LinkAhead versions.

CFoods are defined using a YAML file that has to abide by the following specification. The specification is defined using a JSON schema (see src/caoscrawler/cfood-schema.yml). A CFood is essentially composed of converter definitions. A converter definition must have the following structure:

properties

  • type

Type of this converter node.

enum

Directory, File, DictTextElement, TextElement, SimpleFile, YamlFileCaosDBRecord, MarkdownFile, DictListElement, ListElement, DictDictElement, DictElement, DictFloatElement, FloatElement, DictIntegerElement, IntegerElement, DictBooleanElement, BooleanElement, Definitions, Dict, Date, Datetime, JSONFile, YAMLFile, CSVTableConverter, XLSXTableConverter, SPSSFile, H5File, H5Dataset, H5Group, H5Ndarray, XMLFile, XMLTag, XMLTextNode, XMLAttributeNode, PropertiesFromDictElement

  • match

Typically a regexp that is matched against the name of a structure element.

type

string
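As a minimal sketch of how type and match combine, the following fragment defines a single converter node. The node name DataDir and the directory name are hypothetical; only the keys type and match are taken from the specification above.

```yaml
# Hypothetical converter node; "DataDir" is an arbitrary name chosen for illustration.
DataDir:
  type: Directory        # one of the values listed in the enum above
  match: ^experiments$   # regexp matched against the name of the structure element
```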

  • match_name

A regexp that is matched against the key of a key-value pair.

type

string

  • match_value

A regexp that is matched against the value of a key-value pair.

type

string
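For key-value pairs, match_name and match_value can be used together. The sketch below assumes a structure element such as author: "Jane Doe"; the node name AuthorElement and both regular expressions are hypothetical.

```yaml
# Hypothetical node for a key-value pair like   author: "Jane Doe"
AuthorElement:
  type: TextElement
  match_name: ^author$   # regexp applied to the key of the pair
  match_value: ^.+$      # regexp applied to the value of the pair
```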

  • match_newer_than_file

Only relevant for Directory. A path to a file containing an ISO-formatted datetime. Only match if the contents of the Directory have been modified after that datetime.

type

string
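A Directory converter using match_newer_than_file might look like the following sketch. The node name and the timestamp file path are hypothetical; the referenced file is assumed to contain a single ISO-formatted datetime such as 2024-01-31T12:00:00.

```yaml
# Hypothetical Directory node that only matches if its contents changed
# after the datetime stored in the referenced file.
DataDir:
  type: Directory
  match: ^data$
  match_newer_than_file: /tmp/last_crawl_timestamp.txt   # illustrative path
```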

  • record_from_dict

Only relevant for PropertiesFromDictElement. Specify the root record which is generated from the contained dictionary.

type

object

properties

  • variable_name

Name of the record by which it can be accessed in the cfood definition. Can also be the name of an existing record, in which case that record will be updated by the PropertiesFromDictConverter.

type

string

  • properties_blacklist

List of keys to be ignored in the automatic treatment. They will be ignored on all levels of the dictionary.

type

array

items

type

string

  • references

List of keys that will be transformed into named reference properties.

type

object

additionalProperties

type

object

properties

  • parents

ref:

#/$defs/parents

  • name

Name of this record. If none is given, variable_name is used.

type

string

  • parents

ref:

#/$defs/parents
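The record_from_dict keys described above could be combined as in the following sketch. All names (MetadataDict, experiment_record, Experiment, Person, the blacklisted keys) are hypothetical, and parents is assumed to be a list of names, since #/$defs/parents is not reproduced in this section.

```yaml
# Hypothetical PropertiesFromDictElement node; every name is illustrative.
MetadataDict:
  type: PropertiesFromDictElement
  match_name: ^metadata$
  record_from_dict:
    variable_name: experiment_record   # name used to access the record in the cfood
    name: Experiment 42                # optional; variable_name is used if omitted
    parents:                           # assumed to be a list of parent names
      - Experiment
    properties_blacklist:              # keys ignored on all levels of the dictionary
      - internal_id
      - checksum
    references:                        # keys turned into named reference properties
      responsible:
        parents:
          - Person
```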

  • records

This field is used to define new records or to modify records which have been defined on a higher level.

type

object

properties

  • parents

ref:

#/$defs/parents

  • additionalProperties

oneOf

type

object

properties

  • value

Dictionary notation for variable values. Values can be given by a variable which is indicated by an initial “$”. Use “$$” for setting values actually starting with a dollar sign.

type

string

  • unit

The unit of this property. Units can be given by a variable which is indicated by an initial “$”. Use “$$” for setting values actually starting with a dollar sign.

type

string

  • collection_mode

The collection mode defines whether the resulting property will be a single property or whether the values of multiple structure elements will be collected either into a list or a multiproperty.

enum

single, list, multiproperty

additionalProperties

False
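The dictionary notation can be sketched as follows. The fragment is meant to sit inside a converter definition; the record name, property names, and the variable temperature are hypothetical, and the variable is assumed to be defined elsewhere in the cfood.

```yaml
# Hypothetical records block using the dictionary notation for property values.
records:
  Measurement:
    parents:
      - Measurement
    temperature:
      value: $temperature       # initial "$": take the value from a variable
      unit: K                   # unit of the property
      collection_mode: single   # one of: single, list, multiproperty
    price_label:
      value: $$9.99             # "$$": value that actually starts with a dollar sign
```

Because additionalProperties is False for this notation, only value, unit, and collection_mode are allowed as keys of such a property dictionary.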

The short notation for values. Values can be given by a variable which is indicated by an initial “$”. Use “$$” for setting values actually starting with a dollar sign. Multiproperties can be set using an initial “*” and list properties using an initial “+”.

type

string
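The short notation can be illustrated with the sketch below. Record, property, and variable names are hypothetical, and placing "+" or "*" directly in front of a "$" variable is one reading of the description above rather than something this section states explicitly. Note that a YAML scalar starting with "*" must be quoted, because an unquoted leading "*" denotes a YAML alias.

```yaml
# Hypothetical records block using the short (string) notation.
records:
  Experiment:
    parents:
      - Experiment
    date: $date               # initial "$": value taken from the variable "date"
    participants: +$person    # initial "+": collect values into a list property
    keywords: "*$keyword"     # initial "*": multiproperty (quoted to avoid a YAML alias)
    budget: $$1000            # "$$": value that actually starts with a dollar sign
```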

  • subtree

type

object

additionalProperties

ref:

#/$defs/converter
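Since every entry of subtree is itself a converter definition, converters can be nested to mirror the structure of the data. The following sketch assumes a hypothetical layout data/session_<number>/notes.txt; all node names and regular expressions are illustrative.

```yaml
# Hypothetical nesting of converters via subtree.
DataDir:
  type: Directory
  match: ^data$
  subtree:
    SessionDir:               # child converter, applied to the contents of DataDir
      type: Directory
      match: ^session_\d+$
      subtree:
        NotesFile:            # grandchild converter, applied to the contents of SessionDir
          type: SimpleFile
          match: ^notes\.txt$
```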