---
last_review: "2025-01-01"
last_reviewer: "-"
documented_code: []
---

```{tags} tutorial, crawler
```

# Crawler Tutorial: Hello World

:::{note}
This page has been migrated from the old documentation, and has not yet been fully revised. There
might be inconsistencies or errors when using it with current LinkAhead versions.
:::

% TODO: Issue: https://gitlab.indiscale.com/caosdb/src/linkahead-docs/-/issues/84
% TODO: Archive documentation if for old crawler, Rework to be easier to follow (e.g. make model.yml
% TODO: downloadable)

This tutorial demonstrates a basic usage of the LinkAhead {term}`Crawler` as part of a Python
script.

## Setting up the data model ##

For this example, we need a very simple data model. Save the following to a file called
`model.yml`:

```yaml
HelloWorld:
  recommended_properties:
    time:
      datatype: DATETIME
    note:
      datatype: TEXT
```

and insert it into your LinkAhead instance using

```sh
python -m caosadvancedtools.models.parser model.yml --sync
```

Let's look first at how the LinkAhead Crawler synchronizes {term}`Records ` that are created
locally with those that might already exist on the LinkAhead server. For this you need a file
called `identifiables.yml` with this content:

```yaml
HelloWorld:
  - name
```

## Synchronizing data ##

You can run the following interactively in an IPython shell, but we recommend copying the code
into a script and executing it to save yourself some typing.
```python
import linkahead as db
from datetime import datetime
from caoscrawler import Crawler, SecurityMode
from caoscrawler.identifiable_adapters import CaosDBIdentifiableAdapter

# Create a Record that will be synced
hello_rec = db.Record(name="My first Record")
hello_rec.add_parent("HelloWorld")
hello_rec.add_property(name="time", value=datetime.now().isoformat())

# Create a Crawler instance that we will use for synchronization
crawler = Crawler(securityMode=SecurityMode.UPDATE)

# This defines how Records on the server are identified with the ones we have locally
identifiables_definition_file = "identifiables.yml"
ident = CaosDBIdentifiableAdapter()
ident.load_from_yaml_definition(identifiables_definition_file)
crawler.identifiableAdapter = ident

# Here we synchronize the Record
inserts, updates = crawler.synchronize(commit_changes=True, unique_names=True,
                                       crawled_data=[hello_rec])
print(f"Inserted {len(inserts)} Records")
print(f"Updated {len(updates)} Records")
```

Now, start by executing the code. What happens? The output suggests that one {term}`entity ` was
inserted. Please go to the web interface of your instance and have a look. You can use the query
`FIND HelloWorld`. You should see a brand-new Record with a current timestamp.

So, how did this happen? In our script, we created a "HelloWorld" Record and gave it to the
Crawler. The Crawler checks how "HelloWorld" Records are identified. We told the Crawler with our
`identifiables.yml` that Records with this RecordType are identified by name, so the Crawler
checked whether a "HelloWorld" Record with the name "My first Record" exists on the server. As
this was not the case, the Record that we provided was inserted on the server.

## Running the synchronization again ##

Now, run the script again. What happens? There is an update! As our Record "My first Record" was
inserted in the last script execution, this time a Record with the required name already existed.
Therefore, the "time" {term}`Property` of the existing Record was updated.

The Crawler does not change Properties that are not present in the local data. This means that if
you add a "note" Property to the Record on the server, for example with the edit mode in the web
interface, and run the script again, this Property is kept unchanged. You can therefore extend
Records that were created by the Crawler using other methods of interfacing with LinkAhead.

Note that if you change the name of the "HelloWorld" Record in the script and run it again, a new
Record is inserted by the Crawler. This is because in the `identifiables.yml` we told the Crawler
to use the *name* to check whether a "HelloWorld" Record already exists on the server, so it
cannot match the renamed Record with the one created before.

So far, you saw how the Crawler handles synchronization in a very simple scenario. In the
following tutorials, you will learn what this looks like if there are multiple connected Records
involved, which may have to be identified with more complex combinations of properties. Also, we
created the Record manually in this example, while the typical use case is to create it
automatically from files or directories. How this is done will also be shown in the following
chapters.
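
To make the insert-or-update behaviour described in this tutorial easier to reason about, here is
a minimal sketch that imitates it with plain Python dictionaries. It is not the Crawler's actual
implementation and does not talk to a LinkAhead server: the function `synchronize`, the list
`server` and the dictionary layout are hypothetical names chosen for illustration only. The sketch
also assumes that an unchanged Record is not counted as an update.

```python
def synchronize(server, crawled, identifying_property="name"):
    """Toy stand-in for the synchronization step, working on plain dicts.

    `server` is a list of dicts that plays the role of the remote database.
    Returns the local records that led to an insert and to an update.
    """
    inserts, updates = [], []
    for local in crawled:
        # Identify: look for a stored record with the same identifying value.
        key = local[identifying_property]
        match = next(
            (rec for rec in server if rec.get(identifying_property) == key), None
        )
        if match is None:
            # No identified record exists yet, so the local one is inserted.
            server.append(dict(local))
            inserts.append(local)
            continue
        # Only properties present in the local data are compared and overwritten;
        # anything else stored on the "server" is left untouched.
        changed = {k: v for k, v in local.items() if match.get(k) != v}
        if changed:
            match.update(changed)
            updates.append(local)
    return inserts, updates


server = []

# First run: nothing is stored yet, so the record is inserted.
first_inserts, first_updates = synchronize(
    server, [{"name": "My first Record", "time": "2025-01-01T10:00:00"}]
)

# Someone adds a "note" by other means, e.g. in the web interface.
server[0]["note"] = "added in the web interface"

# Second run: same name, new time, so the stored record is updated.
second_inserts, second_updates = synchronize(
    server, [{"name": "My first Record", "time": "2025-01-01T11:00:00"}]
)

# Third run with a changed name: no match is found, so a new record is inserted.
third_inserts, third_updates = synchronize(
    server, [{"name": "My second Record", "time": "2025-01-01T12:00:00"}]
)

print(f"Records on the toy server: {len(server)}")
```

The second run overwrites only "time" and leaves the manually added "note" in place, while the
third run cannot be matched by name and therefore creates a second record.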