---
last_review: "2025-01-01"
last_reviewer: "-"
documented_code: []
---

```{tags} tutorial
```

# CFood-Definition

:::{note}
This page has been migrated from the old documentation, and has not yet been fully revised. There might be inconsistencies or errors when using with current LinkAhead versions.
:::

% TODO: Issue: https://gitlab.indiscale.com/caosdb/src/linkahead-docs/-/issues/83
% TODO: Merge cfood.md and cfood_definition.md and split into tutorial / explanation, Archive
% TODO: documentation if for old crawler.

[CFoods](/explanation/crawler/index.md#cfoods) specify how data from a file hierarchy is mapped to LinkAhead {term}`Records <Record>`. In the simplest case, the {term}`CFood` is just one YAML file with a single document that includes at least a converter tree specification, as shown in {ref}`example 1<cfood-tutorial-example-1>`. If metadata and macro definitions are provided, there **must** be a second document with these definitions preceding the converter tree specification. Both documents may be placed in the same YAML file, separated by the `---` syntax.

It is highly recommended to specify the version of the LinkAhead {term}`crawler` for which the cfood is written in the metadata section, see {ref}`below<cfood-tutorial-example-4>`.

For historical reasons, some examples include the custom converter definition in the converter tree document, see {ref}`example 2<cfood-tutorial-example-2>`. This feature is deprecated, and the converter definition should instead be included in the metadata and [macro](./macros) document (see [below](cfood-tutorial-example-4)).

## Examples

A single document with a converter tree specification:

(cfood-tutorial-example-1)=

```yaml
extroot:
  type: Directory
  match: ^extroot$
  subtree:
    DataAnalysis:
      type: Directory
      match: DataAnalysis
      # (...)
```

A single document with a converter tree specification and a custom converters section:

(cfood-tutorial-example-2)=

```yaml
Converters:
  CustomConverter_1:
    package: mypackage.converters
    converter: CustomConverter1
  CustomConverter_2:
    package: mypackage.converters
    converter: CustomConverter2

extroot:
  type: Directory
  match: ^extroot$
  subtree:
    DataAnalysis:
      type: Directory
      match: DataAnalysis
      # (...)
```

A YAML multi-document that defines metadata and some macros in the first document and declares two custom converters in the second document. Using this syntax is not recommended; the preferred syntax is shown in {ref}`Example 4<cfood-tutorial-example-4>`.

(cfood-tutorial-example-3)=

```yaml
---
metadata:
  name: Datascience CFood
  description: CFood for data from the local data science work group
  crawler-version: 0.2.1
  macros:
  - !defmacro
    name: SimulationDatasetFile
    params:
      match: null
      recordtype: null
      nodename: null
    definition:
      # (...)
---
Converters:
  CustomConverter_1:
    package: mypackage.converters
    converter: CustomConverter1
  CustomConverter_2:
    package: mypackage.converters
    converter: CustomConverter2

extroot:
  type: Directory
  match: ^extroot$
  subtree:
    DataAnalysis:
      type: Directory
      match: DataAnalysis
      # (...)
```

The **recommended way** of defining metadata, custom converters, macros and the main cfood specification is shown in the following code example:

(cfood-tutorial-example-4)=

```yaml
---
metadata:
  name: Datascience CFood
  description: CFood for data from the local data science work group
  crawler-version: 0.2.1
  macros:
  - !defmacro
    name: SimulationDatasetFile
    params:
      match: null
      recordtype: null
      nodename: null
    definition:
      # (...)
  Converters:
    CustomConverter_1:
      package: mypackage.converters
      converter: CustomConverter1
    CustomConverter_2:
      package: mypackage.converters
      converter: CustomConverter2
---
extroot:
  type: Directory
  match: ^extroot$
  subtree:
    DataAnalysis:
      type: Directory
      match: DataAnalysis
      # (...)
```

### List Mode

When specifying values of {term}`properties <Property>`, two special prefix characters can be used to automatically create lists or multi properties instead of single values:

```yaml
Experiment1:
  # Use exactly one of the following alternatives:
  Measurement: +Measurement      # Element in List (list is cleared before run)
  # Measurement: "*Measurement"  # Multi Property (properties are removed before run)
  # Measurement: Measurement     # Overwrite
```

### Values and units

Property values can be specified as simple strings (as above) or as dictionaries that may also specify the [collection mode](#list-mode). Strings starting with a "\$" will be replaced by the corresponding variable, if there is one. See the [tutorials chapter](/tutorial/crawler/index.md) of this documentation for more elaborate examples of how exactly the variable replacement works. A simple example could look like the following.

```yaml
ValueElt:
  type: TextElement
  match_name: ^my_prop$
  match_value: "(?P<value>.*)"  # Anything in here is stored in the variable "value"
  records:
    MyRecord:
      MyProp: $value  # will be replaced by whatever is stored in the "value" variable set above.
```

If not given explicitly, the collection mode is determined from the first character of the property value, as explained above. This means the following three definitions are all equivalent:

```yaml
MyProp: +$value
```

```yaml
MyProp:
  value: +$value
```

and

```yaml
MyProp:
  value: $value
  collection_mode: list
```

Units of numeric values can be set by providing the property value as a mapping with the two keys `value` and `unit`, as shown in this example:

```yaml
ValueWithUnitElt:
  type: TextElement
  match_name: ^my_prop$
  # Extract value and unit from a string which has a number followed by
  # at least one whitespace character followed by a unit.
  match_value: "^(?P<number>\\d+\\.?\\d*)\\s+(?P<unit>.+)"
  records:
    MyRecord:
      MyProp:
        value: $number
        unit: $unit
```

### File Entities

In order to use File {term}`Entities <Entity>`, you must set the appropriate `role: File`.
Additionally, the `path` and `file` keys have to be given, with values that set the remote and local paths, respectively. You can use the variable `<converter name>_path`, which is automatically created by converters dealing with file system related {term}`StructureElements <StructureElement>`. The file object itself is stored in a variable with the same name, as is the case for other Records.

```yaml
somefile:
  type: SimpleFile
  match: ^params.*$  # match any file that starts with "params"
  records:
    fileEntity:
      role: File           # necessary to create a File Entity
      path: somefile.path  # defines the path in LinkAhead
      file: somefile.path  # path where the file is found locally
    SomeRecord:
      ParameterFile: $fileEntity  # creates a reference to the file
```

### Transform Functions

You can use transform functions to alter variable values that the crawler consumes (e.g. a string that was matched with a regular expression). For more information, refer to the [Converter](./standard_converters) and [Transform Functions](./transform_functions) tutorials.

You can define your own transform functions by adding them the same way you add custom converters:

```yaml
Transformers:
  transform_foo:
    package: some.package
    function: some_foo
```

## Automatically generated keys

Some variable names are automatically generated and can be used with the `$` syntax. These include:

- `<converter name>`: access the path of converter names to the current converter
- `<converter name>.path`: defined only for file system related converters, contains the file system path to the structure element. You need curly brackets to use them: `${<converter name>.path}`
- `<record key>`: all entities created in the `records` section are available under the same key as used in that section
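
The keys listed above could be combined as in the following minimal sketch. It is an illustration under assumptions, not a tested cfood: the converter name `somefile` and the record key `SomeRecord` are reused from the File Entities example, while the record `OtherRecord` and the property names `source_path` and `based_on` are hypothetical and must exist in your own data model.

```yaml
somefile:
  type: SimpleFile
  match: ^params.*$
  records:
    SomeRecord:
      # Hypothetical property: stores the file system path of the matched file,
      # using the curly bracket form described in the list above.
      source_path: ${somefile.path}
    OtherRecord:
      # Hypothetical property: references the record created under the
      # key "SomeRecord" in this records section.
      based_on: $SomeRecord
```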