Crawler Tutorial: Hello World

Note

This page has been migrated from the old documentation and has not yet been fully revised. There may be inconsistencies or errors when using it with current LinkAhead versions.

This tutorial demonstrates a basic usage of the LinkAhead Crawler as part of a Python script.

Setting up the data model

For this example, we need a very simple data model. You can insert it into your LinkAhead instance by saving the following to a file called model.yml:

HelloWorld:
  recommended_properties:
    time:
      datatype: DATETIME
    note:
      datatype: TEXT

and insert the model using

python -m caosadvancedtools.models.parser model.yml --sync

First, let’s look at how the LinkAhead Crawler synchronizes Records created locally with those that may already exist on the LinkAhead server.

For this, you need a file called identifiables.yml with the following content, which tells the Crawler that “HelloWorld” Records are identified by their name:

HelloWorld:
  - name
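Conceptually, an identifiable is the part of a Record that the definition above marks as identifying. The following sketch is a plain-Python model of that idea, not the Crawler's implementation: Records are represented as simple dictionaries with a "parent" key, and the names `IDENTIFIABLES` and `identifiable_of` are hypothetical.

```python
# Toy model of an identifiable; NOT the caoscrawler API.
# Hypothetical mirror of identifiables.yml: RecordType -> identifying properties.
IDENTIFIABLES = {"HelloWorld": ["name"]}


def identifiable_of(record: dict) -> tuple:
    """Return the RecordType plus the values of its identifying properties."""
    parent = record["parent"]
    identifying = IDENTIFIABLES[parent]
    return (parent,) + tuple(record.get(prop) for prop in identifying)


rec_a = {"parent": "HelloWorld", "name": "My first Record",
         "time": "2024-01-01T10:00:00"}
rec_b = {"parent": "HelloWorld", "name": "My first Record",
         "time": "2024-06-01T12:00:00"}

# Only "name" is identifying, so the differing "time" values do not matter.
print(identifiable_of(rec_a) == identifiable_of(rec_b))  # True
```

In this model, two Records with the same name count as the same Record, however their other Properties differ.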

Synchronizing data

You can run the following code interactively in an IPython shell, but we recommend copying it into a script and executing that to save yourself some typing.

import linkahead as db
from datetime import datetime
from caoscrawler import Crawler, SecurityMode
from caoscrawler.identifiable_adapters import CaosDBIdentifiableAdapter


# Create a Record that will be synced
hello_rec = db.Record(name="My first Record")
hello_rec.add_parent("HelloWorld")
hello_rec.add_property(name="time", value=datetime.now().isoformat())

# Create a Crawler instance that we will use for synchronization
crawler = Crawler(securityMode=SecurityMode.UPDATE)
# This defines how Records on the server are identified with the ones we have locally
identifiables_definition_file = "identifiables.yml"
ident = CaosDBIdentifiableAdapter()
ident.load_from_yaml_definition(identifiables_definition_file)
crawler.identifiableAdapter = ident

# Here we synchronize the Record
inserts, updates = crawler.synchronize(commit_changes=True, unique_names=True,
                                       crawled_data=[hello_rec])
print(f"Inserted {len(inserts)} Records")
print(f"Updated {len(updates)} Records")

Now, run the code. What happens? The output reports that one Record was inserted. Go to the web interface of your instance and have a look; you can use the query FIND HelloWorld. You should see a brand-new Record with a current timestamp.

So, how did this happen? In our script, we created a “HelloWorld” Record and passed it to the Crawler. The Crawler then checks how “HelloWorld” Records are identified. In identifiables.yml we told the Crawler that Records of this RecordType are identified by name, so it checked whether a “HelloWorld” Record named “My first Record” already exists on the server. As this was not the case, the Record we provided was inserted into the server.

Running the synchronization again

Now, run the script again. What happens? This time there is an update! Because our Record “My first Record” was inserted during the previous run, a Record with the required name now already exists. Therefore, the Crawler updated the “time” Property of the existing Record.
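The insert-or-update decision from the last two runs can be modeled in a few lines of plain Python. This is a simplified sketch under stated assumptions, not how caoscrawler is implemented: the "server" is an in-memory dictionary keyed by (RecordType, name), `synchronize_one` is a hypothetical helper, and it assumes that a Record identical to its server copy is simply skipped.

```python
# Toy model of the insert-or-update decision; NOT the caoscrawler API.
# Hypothetical in-memory "server": (RecordType, name) -> stored properties.
server: dict[tuple[str, str], dict] = {}


def synchronize_one(record: dict) -> str:
    """Insert, update, or skip `record` depending on whether a match exists."""
    key = (record["parent"], record["name"])  # identifiable: RecordType + name
    existing = server.get(key)
    if existing is None:
        server[key] = dict(record)  # no match on the server: insert a copy
        return "insert"
    if all(existing.get(prop) == value for prop, value in record.items()):
        return "unchanged"  # assumption: identical Records are not written again
    existing.update(record)  # match found: overwrite the locally known properties
    return "update"


first_run = {"parent": "HelloWorld", "name": "My first Record",
             "time": "2024-01-01T10:00:00"}
# A later run produces a newer time stamp for the same name.
second_run = dict(first_run, time="2024-01-01T10:05:00")

print(synchronize_one(first_run))   # insert
print(synchronize_one(second_run))  # update
```

The second call finds the key created by the first one, which is why the real script reports an update on its second execution.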

The Crawler does not change Properties that are not present in the local data. For example, if you add a “note” Property to the Record on the server (e.g. using the edit mode of the web interface) and run the script again, this Property is kept unchanged. This means that you can extend Records created by the Crawler through other ways of interacting with LinkAhead.
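The rule that Properties missing from the local data are left alone amounts to a one-sided merge. The sketch below models it with plain dictionaries; `merge_update` is a hypothetical helper, and real Records carry more information (datatypes, units, parents) than this toy representation.

```python
# Toy model of how an update treats server-only Properties; NOT the caoscrawler API.
def merge_update(server_record: dict, local_record: dict) -> dict:
    """Return the server Record after an update: local values win,
    Properties that only exist on the server are kept."""
    merged = dict(server_record)  # start from the server copy
    merged.update(local_record)   # overwrite only what the script provides
    return merged


# The server copy was extended by hand (e.g. in the web interface) with a "note".
on_server = {"name": "My first Record", "time": "2024-01-01T10:00:00",
             "note": "added in the web interface"}
# The script only knows about "name" and "time".
local = {"name": "My first Record", "time": "2024-01-01T10:05:00"}

print(merge_update(on_server, local))
# {'name': 'My first Record', 'time': '2024-01-01T10:05:00', 'note': 'added in the web interface'}
```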

Note that if you change the name of the “HelloWorld” Record in the script and run it again, the Crawler inserts a new Record. This is because identifiables.yml tells the Crawler to use the name to check whether a “HelloWorld” Record already exists on the server, so it cannot match the renamed Record with the one created before.
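Because matching relies only on the identifying Properties, a locally renamed Record produces a key the server has never seen. The following plain-Python sketch illustrates this matching rule with the server modeled as a dictionary keyed by (RecordType, name); it is not caoscrawler code, and `sync` is a hypothetical name.

```python
# Toy model of name-based matching; NOT the caoscrawler API.
# Hypothetical in-memory "server": (RecordType, name) -> stored properties.
server: dict[tuple[str, str], dict] = {}


def sync(record: dict) -> str:
    """Insert or update `record`, matching only on its (RecordType, name) pair."""
    key = (record["parent"], record["name"])
    action = "update" if key in server else "insert"
    server.setdefault(key, {}).update(record)
    return action


print(sync({"parent": "HelloWorld", "name": "My first Record"}))    # insert
print(sync({"parent": "HelloWorld", "name": "My renamed Record"}))  # insert, not update
print(len(server))  # 2: the old Record stays next to the new one
```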

So far, you have seen how the Crawler handles synchronization in a very simple scenario. In the following tutorials, you will learn what this looks like when multiple connected Records are involved, which may need to be identified by more complex combinations of Properties. Also, in this example we created the Record manually, whereas the typical use case is to create Records automatically from files or directories. The following chapters show how this is done.