---
last_review: "2025-01-01"
last_reviewer: "-"
documented_code: []
---

```{tags} tutorial, crawler, advanced-user
```

# Transform Functions

:::{note}
This page has been migrated from the old documentation, and has not yet been fully revised. There might be inconsistencies or errors when using with current LinkAhead versions.
:::

% TODO: Issue: https://gitlab.indiscale.com/caosdb/src/linkahead-docs/-/issues/83
% TODO: Archive documentation if for old crawler, Split into tutorial + other documents

Sometimes a value cannot be used as it is found and needs to be post-processed first: an integer might need an offset added, or a string might need to be split into a list of substrings. For such simple conversions, a converter definition can contain transform functions that modify variable values. Each transform entry names the input variable to read, the output variable to write the result to, and the functions to apply:

```yaml
<NodeName>:
  type: <converter type>
  match: ".*"
  transform:
    <TransformNodeName>:
      in: $<input variable name>
      out: $<output variable name>
      functions:
      - <function name>:               # name of the function to be applied
          <parameter 1>: <value 1>     # key value pairs that are passed as parameters
          <parameter 2>: <value 2>
          # ...
```

The following example splits the variable `a` and puts the generated list in `b`:

```yaml
Experiment:
  type: Dict
  match: ".*"
  transform:
    param_split:
      in: $a
      out: $b
      functions:
      - split:            # split is a function that is defined by default
          marker: "|"     # its only parameter is the marker that is used to split the string
  records:
    Report:
      tags: $b
```

In this example, the transformer splits the string in `$a` and stores the resulting list in `$b`, which is then added to the Report {term}`Record` as a list-valued {term}`property`.

Note that from LinkAhead {term}`Crawler` 0.11.0 onwards, the value of the `marker` parameter in the above example can also be read from a variable using the usual `$` notation:

```yaml
# ... variable ``separator`` is defined somewhere above this part, e.g.,
# by reading a config file.
Experiment:
  type: Dict
  match: ".*"
  transform:
    param_split:
      in: $a
      out: $b
      functions:
      - split:
          marker: $separator  # Now the separator is read in from a
                              # variable, so we can, e.g., change from
                              # '|' to ';' without changing the cfood
                              # definition.
  records:
    Report:
      tags: $b
```

A number of transform functions are defined by the crawler itself and are therefore available by default (see `src/caoscrawler/default_transformers.yml`). You can define custom transform functions by adding them to the [cfood definition](./cfood).

## Custom Transformers

Custom transformers are implemented as Python functions that adhere to the transformer function signature. They need to be registered in the cfood definition in order to be available during the scanning process.

Let's assume we want to implement a transformer that replaces each occurrence of certain single letters in the value of a variable with a different letter. Passing "abc" as `in_letters` and "xyz" as `out_letters` would then transform the string "scan started" into "szxn stxrted". We could implement this in Python using the following code:

```python
from typing import Any


def replace_letters(in_value: Any, in_parameters: dict) -> Any:
    """Replace letters in the value of a variable."""

    # The arguments to the transformer (as given by the definition in the cfood)
    # are contained in `in_parameters`. Both parameters are required, so we
    # raise an error if one of them is missing:
    if "in_letters" not in in_parameters:
        raise RuntimeError("Parameter `in_letters` missing.")

    if "out_letters" not in in_parameters:
        raise RuntimeError("Parameter `out_letters` missing.")

    l_in = in_parameters["in_letters"]
    l_out = in_parameters["out_letters"]

    if len(l_in) != len(l_out):
        raise RuntimeError("`in_letters` and `out_letters` must have the same length.")

    # Replacements are applied one after another, so a letter produced by an
    # earlier pair can be replaced again by a later pair.
    for l1, l2 in zip(l_in, l_out):
        in_value = in_value.replace(l1, l2)

    return in_value
```

This code needs to be put into a module that can be found by the crawler at runtime.
One possibility is to install the package into the same virtual environment that is used to run the crawler.

Then, the transformer needs to be registered in the cfood. In this example, the function `replace_letters` is in a file called `replace_letters.py`, which is stored in a package called `utilities`.

```yaml
---
metadata:
  crawler-version: 0.10.2
  macros:
---
Converters:  # put custom converters here
Transformers:
  replace_letters:  # This name will be made available in the cfood
    function: replace_letters
    package: utilities.replace_letters
```

The transformer can then be used in a converter:

```yaml
Experiment:
  type: Dict
  match: ".*"
  transform:
    replace_letters:
      in: $a
      out: $b
      functions:
      - replace_letters:  # This is the name of our custom transformer
          in_letters: "abc"
          out_letters: "xyz"
  records:
    Report:
      tags: $b
```
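
To check a custom transformer before wiring it into a cfood, it can help to call it directly and to imitate what a `transform` entry does with its `in`, `out` and `functions` keys. The following is only a rough sketch and not the crawler's actual implementation: the `TRANSFORMERS` registry, the `apply_transform` helper and the stand-in `split` function are hypothetical names introduced for illustration, and it assumes that the functions listed under `functions` are applied one after another in the given order.

```python
from typing import Any


def replace_letters(in_value: Any, in_parameters: dict) -> Any:
    """Same logic as the custom transformer shown above (checks shortened)."""
    l_in = in_parameters["in_letters"]
    l_out = in_parameters["out_letters"]
    if len(l_in) != len(l_out):
        raise RuntimeError("`in_letters` and `out_letters` must have the same length.")
    for l1, l2 in zip(l_in, l_out):
        in_value = in_value.replace(l1, l2)
    return in_value


def split(in_value: Any, in_parameters: dict) -> Any:
    """Hypothetical stand-in for the default `split` transformer."""
    return in_value.split(in_parameters["marker"])


# Hypothetical registry: maps the names used in the cfood to Python functions.
TRANSFORMERS = {"replace_letters": replace_letters, "split": split}


def apply_transform(variables: dict, in_var: str, out_var: str, functions: list) -> None:
    """Read `in_var`, apply each listed function in order, store in `out_var`."""
    value = variables[in_var]
    for entry in functions:
        # Each list entry is a single-key mapping: {function name: parameters}
        name, params = next(iter(entry.items()))
        value = TRANSFORMERS[name](value, params or {})
    variables[out_var] = value


# Direct call, using the example values from the text:
print(replace_letters("scan started", {"in_letters": "abc", "out_letters": "xyz"}))

# Imitating a transform entry with `in: $a`, `out: $b` and two functions:
variables = {"a": "scan started|cab"}
apply_transform(
    variables,
    "a",
    "b",
    [
        {"replace_letters": {"in_letters": "abc", "out_letters": "xyz"}},
        {"split": {"marker": "|"}},
    ],
)
print(variables["b"])
```

Both transformers share the signature `(in_value, in_parameters)`, which is what allows them to be looked up by name and called uniformly in this sketch.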