---
last_review: "2025-01-01"
last_reviewer: "-"
documented_code: []
---

```{tags} tutorial, crawler, advanced-user, administrator
```

# Crawler Tutorial: Parameter File

:::{note}
This page has been migrated from the old documentation, and has not yet been fully revised.
There might be inconsistencies or errors when using with current LinkAhead versions.
:::
% TODO: Issue: https://gitlab.indiscale.com/caosdb/src/linkahead-docs/-/issues/83
% TODO: Archive documentation if for old crawler

## Our data

In the "HelloWorld" Example, the {term}`Record` which was synchronized with the server was created
manually using the Python client. Now, we want to have a look at how the {term}`Crawler` can be used
to automate the creation of Records from data.

The Crawler needs instructions on what kind of Records it should create given the data that it sees.
These instructions are provided in so-called "{term}`CFood`" YAML files.

Let’s once again start with something simple. A common scenario is that we want to insert the
contents of a parameter file. The parameter file may be named `params_2022-02-02.json` and look like
the following:

```{code-block} json
:caption: params_2022-02-02.json

{
  "frequency": 0.5,
  "resolution": 0.01
}
```

This data describes the two known {term}`Properties <Property>` of our Experiment, and the date in
the file name is the date on which the experiment was conducted. The data model could therefore be
described in a `model.yml` like this:

```{code-block} yaml
:caption: model.yml
Experiment:
  recommended_properties:
    frequency:
      datatype: DOUBLE
    resolution:
      datatype: DOUBLE
    date:
      datatype: DATETIME
```

We assume that there will be at most one experiment per day, and that we can identify experiments
using only the date, so the `identifiable.yml` is:

```{code-block} yaml
:caption: identifiable.yml

Experiment:
  - date
```
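
The role of the identifiable can be pictured with a small plain-Python sketch. This is only an
illustration of the idea, not how the Crawler is implemented: the `sync` helper and the in-memory
`store` are hypothetical names invented here, standing in for the synchronization with the server.

```python
# Illustration only: `store` and `sync` are hypothetical stand-ins for the
# server and for the Crawler's synchronization step.
store = {}  # maps the identifying value (the date) to the stored Experiment


def sync(experiment):
    """Insert the experiment, or update the one that has the same date."""
    key = experiment["date"]  # `date` is the only identifying Property
    action = "update" if key in store else "insert"
    store.setdefault(key, {}).update(experiment)
    return action


first = sync({"date": "2022-02-02", "frequency": 0.5})
second = sync({"date": "2022-02-02", "resolution": 0.01})
print(first, second)  # insert update
print(len(store))     # 1
```

Because both calls carry the same date, the second one is treated as an update of the existing
Experiment rather than as a new Record.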

## Getting started with the CFood

CFoods (Crawler configurations) can be stored in YAML files. The following section in a `cfood.yml`
tells the Crawler that the key-value pair `frequency: 0.5` shall be used to set the Property
"frequency" of an "Experiment" Record:

```yaml
...
my_frequency:  # just the name of this section
  type: FloatElement  # it is a float value
  match_name: ^frequency$  # regular expression: Match the 'frequency' key from the data json
  match_value: ^(?P<freq_value>.*)$  # regular expression: We match any value of that key
  records:
    Experiment:
      frequency: $freq_value
...
```

The first part of this section defines which kind of data element will be handled. In this example,
this is a key-value pair with the key "frequency" and a float value. We then use this to set the
"frequency" Property.

To explain in some more detail, let's look at what the regular expressions do:

- `^frequency$` ensures that the key is exactly "frequency". "^" matches the beginning of the string
  and "\$" matches the end.
- `^(?P<freq_value>.*)$` creates a *named match group* with the name "freq_value". The pattern
  within this group is ".*": The dot matches any character and the star indicates that the preceding
  character can occur any number of times, which means that this expression matches any string and
  assigns it to the group with the name `freq_value`.

We can then use the values assigned to a group as a variable. In the above example, we use
`frequency: $freq_value` to assign the extracted frequency value to the frequency Property of our
new Experiment.
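
The behaviour of the two patterns can be tried out with Python's standard `re` module. This is a
minimal sketch of the regular expressions only; how the Crawler applies them internally may differ,
and writing the float value as the string `"0.5"` is an assumption made for the illustration.

```python
import re

# The two patterns from the CFood section above.
name_pattern = re.compile(r"^frequency$")
value_pattern = re.compile(r"^(?P<freq_value>.*)$")

# The key must be exactly "frequency"; longer keys do not match.
print(name_pattern.match("frequency") is not None)  # True
print(name_pattern.match("frequency_max") is None)  # True

# Regular expressions work on text, so the value is written as a string here.
match = value_pattern.match("0.5")
freq_value = match.group("freq_value")  # content of the named group
print(freq_value)  # 0.5
```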

:::{note}
For more information on the ``cfood.yml`` specification, read on in the chapter [CFoods](./cfood).
:::

## A fully grown CFood

To show how the section that extracts the experiment's frequency fits into a complete CFood that
creates the full Experiment Record, the full CFood file `cfood.yml` for this example might look like
the following:

```{code-block} yaml
:caption: cfood.yml

---
metadata:
  crawler-version: 0.5.0
---
directory: # corresponds to the directory given to the crawler
  type: Directory
  match: .* # we do not care how it is named here
  subtree:
    parameterfile:  # corresponds to our parameter file
      type: JSONFile
      match: params_(?P<date>\d+-\d+-\d+)\.json # extract the date from the parameter file
      records:
        Experiment: # one Experiment is associated with the file
          date: $date # the date is taken from the file name
      subtree:
        dict:  # the JSON contains a dictionary
          type: Dict
          match: .* # the dictionary does not have a meaningful name
          subtree:
            my_frequency: # here we parse the frequency...
              type: FloatElement
              match_name: frequency
              match_value: (?P<val>.*)
              records:
                Experiment:
                  frequency: $val
            resolution: # ... and here the resolution
              type: FloatElement
              match_name: resolution
              match_value: (?P<val>.*)
              records:
                Experiment:
                  resolution: $val
```
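
Read from top to bottom, the CFood above describes roughly the following steps, sketched here in
plain Python. The sketch does not use the Crawler and does not talk to a server; the helper
`experiment_from_file` is a hypothetical name, and the resulting dictionary merely stands in for
the Experiment Record.

```python
import json
import re
import tempfile
from pathlib import Path

# Same pattern as in the `parameterfile` section of the CFood.
FILE_PATTERN = re.compile(r"params_(?P<date>\d+-\d+-\d+)\.json")


def experiment_from_file(path):
    """Hypothetical helper: collect the Experiment properties for one file."""
    match = FILE_PATTERN.match(path.name)
    if match is None:
        return None  # not a parameter file, nothing to create
    experiment = {"date": match.group("date")}  # date from the file name
    content = json.loads(path.read_text())  # the JSON contains a dictionary
    for key in ("frequency", "resolution"):  # one CFood section per key
        if key in content:
            experiment[key] = float(content[key])
    return experiment


# Recreate the example parameter file in a temporary directory and read it.
with tempfile.TemporaryDirectory() as tmp:
    param_file = Path(tmp) / "params_2022-02-02.json"
    param_file.write_text('{"frequency": 0.5, "resolution": 0.01}')
    experiment = experiment_from_file(param_file)

print(experiment)
# {'date': '2022-02-02', 'frequency': 0.5, 'resolution': 0.01}
```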

You do not need to understand every aspect of this CFood yet; a detailed tutorial on creating a
full CFood follows in the [next section](./cfood.md). For now, we want to see it running!

The Crawler can now be run with the following command (assuming that the CFood file and the
identifiable definition are in the current working directory):

```sh
caosdb-crawler -s update -i identifiable.yml cfood.yml .
```

:::{note}
`caosdb-crawler` currently only works with CFoods which have a directory as top-level element.
:::