---
last_review: "2025-01-01"
last_reviewer: "-"
documented_code: []
---

```{tags} tutorial, crawler
```

# Crawler Tutorial: Hello World

:::{note}
This page has been migrated from the old documentation, and has not yet been fully revised.
There might be inconsistencies or errors when using it with current LinkAhead versions.
:::
% TODO: Issue: https://gitlab.indiscale.com/caosdb/src/linkahead-docs/-/issues/84
% TODO: Archive documentation if for old crawler, Rework to be easier to follow (e.g. make model.yml
% TODO: downloadable)

This tutorial demonstrates a basic usage of the LinkAhead {term}`Crawler` as part of a Python
script.

## Setting up the data model ##

For this example, we need a very simple data model. To add it to your LinkAhead instance, save the
following to a file called `model.yml`

```yaml
HelloWorld:
  recommended_properties:
    time:
      datatype: DATETIME
    note:
      datatype: TEXT
```

and insert the model using

```sh
python -m caosadvancedtools.models.parser model.yml --sync
```

Let's first look at how the LinkAhead Crawler synchronizes {term}`Records <Record>` that are created
locally with those that might already exist on the LinkAhead server.

For this, you need a file called `identifiables.yml` that tells the Crawler how the Records of a
RecordType are identified. Give it this content:

```yaml
HelloWorld:
  - name
```

## Synchronizing data ##

You can now run the following code interactively in an IPython shell. However, we recommend copying
it into a script and executing that, to save yourself some typing.

```python
import linkahead as db
from datetime import datetime
from caoscrawler import Crawler, SecurityMode
from caoscrawler.identifiable_adapters import CaosDBIdentifiableAdapter


# Create a Record that will be synced
hello_rec = db.Record(name="My first Record")
hello_rec.add_parent("HelloWorld")
hello_rec.add_property(name="time", value=datetime.now().isoformat())

# Create a Crawler instance that we will use for synchronization
crawler = Crawler(securityMode=SecurityMode.UPDATE)
# This defines how Records on the server are identified with the ones we have locally
identifiables_definition_file = "identifiables.yml"
ident = CaosDBIdentifiableAdapter()
ident.load_from_yaml_definition(identifiables_definition_file)
crawler.identifiableAdapter = ident

# Here we synchronize the Record
inserts, updates = crawler.synchronize(commit_changes=True, unique_names=True,
                                       crawled_data=[hello_rec])
print(f"Inserted {len(inserts)} Records")
print(f"Updated {len(updates)} Records")
```

Now, start by executing the code. What happens? The output reports that one {term}`entity <Entity>`
was inserted. Go to the web interface of your instance and have a look, for example with the query
`FIND HelloWorld`. You should see a brand-new Record with a current timestamp.

So, how did this happen? In our script, we created a "HelloWorld" Record and gave it to the Crawler.
The Crawler then looked up how "HelloWorld" Records are identified. With our `identifiables.yml`, we
told the Crawler that Records of this RecordType are identified by name, so the Crawler checked
whether a "HelloWorld" Record with the name "My first Record" exists on the server. As this was not
the case, the Record that we provided was inserted on the server.

## Running the synchronization again ##

Now, run the script again. What happens? There is an update! Because our Record "My first Record"
was inserted during the previous run, a Record with the required name already existed this time.
Therefore, the "time" {term}`Property` of the existing Record was updated.
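
The decision made in these two runs can be pictured with a small, self-contained sketch. It is a toy
model under stated assumptions, not the Crawler's implementation: a plain Python list of
dictionaries stands in for the server, and `IDENTIFIABLES`, `find_existing`, and `toy_sync` are
hypothetical names that only mirror the content of `identifiables.yml`.

```python
# Toy model only: plain dictionaries stand in for Records and for the server.
# None of these names are part of the LinkAhead or caoscrawler API.

# Mirrors identifiables.yml: HelloWorld Records are identified by their name.
IDENTIFIABLES = {"HelloWorld": ["name"]}


def find_existing(server, record):
    """Return the stored record that matches on all identifying fields, or None."""
    fields = IDENTIFIABLES[record["parent"]]
    for stored in server:
        if stored["parent"] == record["parent"] and all(
            stored.get(field) == record.get(field) for field in fields
        ):
            return stored
    return None


def toy_sync(server, record):
    """Insert the record if no match exists, otherwise update the match."""
    existing = find_existing(server, record)
    if existing is None:
        server.append(dict(record))
        return "insert"
    existing.update(record)
    return "update"


server = []
first_run = toy_sync(server, {"parent": "HelloWorld", "name": "My first Record",
                              "time": "2025-01-01T10:00:00"})
second_run = toy_sync(server, {"parent": "HelloWorld", "name": "My first Record",
                               "time": "2025-01-01T10:05:00"})
print(first_run, second_run, len(server))  # insert update 1
```

The first call finds no match and inserts. The second call finds the stored record by its name and
only refreshes the time, so the toy server still holds a single record.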

The Crawler does not change Properties that are not present in the local data. If you add a "note"
Property to the Record on the server, for example with the edit mode in the web interface, and run
the script again, this Property is left unchanged. You can therefore extend Records that were
created by the Crawler through any other means of interfacing with LinkAhead.

Note that if you change the name of the "HelloWorld" Record in the script and run it again, the
Crawler inserts a new Record. This is because in `identifiables.yml` we told the Crawler to use the
*name* to check whether a "HelloWorld" Record already exists on the server, so it cannot match the
renamed Record to the one created before.
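
Both observations, that Properties missing from the local data survive and that a changed name
leads to a new Record, can be reproduced in a toy model. The sketch below assumes a dictionary keyed
by name as a stand-in for the server. `toy_merge` is a hypothetical helper for illustration, not a
Crawler function, and the real synchronization may differ in detail.

```python
# Toy model only: a dict keyed by record name stands in for the server.
# toy_merge is a hypothetical helper, not part of caoscrawler.


def toy_merge(server, name, properties):
    """Merge local properties into the record identified by name."""
    stored = server.setdefault(name, {})
    # Only the keys present locally are overwritten; all others are kept.
    stored.update(properties)


server = {}
toy_merge(server, "My first Record", {"time": "2025-01-01T10:00:00"})

# Someone adds a note directly on the server, e.g. via the web interface.
server["My first Record"]["note"] = "added by hand"

# The next run only knows about "time", so the note is left untouched.
toy_merge(server, "My first Record", {"time": "2025-01-01T10:05:00"})

# A different name cannot be matched, so a second record appears.
toy_merge(server, "My renamed Record", {"time": "2025-01-01T10:10:00"})

print(sorted(server))
print(server["My first Record"])
```

After the three calls, the toy server contains two records, and the first one still carries the
manually added note next to its refreshed time.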

So far, you saw how the Crawler handles synchronization in a very simple scenario. In the following
tutorials, you will learn what this looks like if there are multiple connected Records involved,
which may have to be identified with more complex combinations of properties. Also, we created the
Record manually in this example, while the typical use case is to create it automatically from files
or directories. How this is done will also be shown in the following chapters.