Tags: tutorial, developer

Scripting with LinkAhead

In this tutorial, you will learn how to write a script to work with and update data from a LinkAhead Server.

Prerequisites

  • You should know the basics of how to use the LinkAhead Python client.

  • You need a configured connection to a LinkAhead Server.

    • Download the Setup Script and run it against this connection to ensure that all necessary RecordTypes, Properties, and example Records exist on your server.

Scenario

The setup script linked in the prerequisites populates LinkAhead with some example data, including scientists and experiments. Before you continue with the tutorial, take a look at the new Entities.

The Data CSV contains information on faculties and research groups. The goal of this tutorial is to add this information to the existing data. To achieve this, you will create a new Record for each research group and update each experiment Record to reference the research group in charge of it.

Retrieving the relevant data

The data CSV contains one row per research group, giving the faculty it belongs to and a semicolon-separated list of the people working in it. As a first step, create mappings from person to research group and from research group to faculty:

import csv
from pathlib import Path

# Assumption: the CSV sits in the current working directory; adjust as needed.
data_dir = Path(".")

# Extract data from CSV
group_by_person, faculty_by_group = {}, {}
with open(data_dir / "scripting_tutorial.csv", newline="") as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        # For each research group, extract its faculty
        faculty_by_group[line["research group"]] = line["faculty"]
        # For each person, save their research group
        for person in line["persons"].split(";"):
            group_by_person[person] = line["research group"]
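
To see what the two mappings look like without touching the file system, the same loop can be run on an in-memory CSV. This is only a minimal sketch: the sample rows and the helper name build_mappings are made up for illustration, and only the column names are taken from the code above.

```python
import csv
import io

# Hypothetical sample rows; the real scripting_tutorial.csv may differ.
SAMPLE_CSV = """research group,faculty,persons
Group A,Physics,Alice Example;Bob Example
Group B,Chemistry,Carol Example
"""


def build_mappings(csv_file):
    """Return (group_by_person, faculty_by_group) for an open CSV file."""
    group_by_person, faculty_by_group = {}, {}
    for row in csv.DictReader(csv_file):
        group = row["research group"]
        # One faculty per research group
        faculty_by_group[group] = row["faculty"]
        # One research group per person
        for person in row["persons"].split(";"):
            group_by_person[person] = group
    return group_by_person, faculty_by_group


group_by_person, faculty_by_group = build_mappings(io.StringIO(SAMPLE_CSV))
print(group_by_person)
print(faculty_by_group)
```

Note that a person listed in more than one row ends up mapped to the group from the last such row, because later dictionary assignments overwrite earlier ones.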

Updating the schema and data

Now that you have all the needed information, you need to update the server's schema so that it can be added to your Records. First, create a ResearchGroup RecordType with a faculty property for your new research group Records:

import linkahead as db
# Set up research group with a faculty property
faculty_prop = db.Property(name="faculty", datatype=db.TEXT)
faculty_prop.insert()
research_group_rt = db.RecordType(name="ResearchGroup")
research_group_rt.add_property(faculty_prop, importance=db.RECOMMENDED)
research_group_rt.insert()

Then, update the Experiment RecordType to add a responsible_group property, which you will use to link the research group responsible for this experiment:

# Update experiment recordtype to have an associated research group
responsible_group_prop = db.Property(name="responsible_group",
                                     datatype=research_group_rt)
responsible_group_prop.insert()
experiment_rt = db.RecordType(name="Experiment").retrieve()
experiment_rt.add_property(responsible_group_prop)
experiment_rt.update()

Updating the experiments

Before research groups can be assigned to the experiments, they must exist as Records in LinkAhead. For each entry in the CSV, create a Record of the new ResearchGroup RecordType with the correct name and faculty property, and insert it:

# Create Research Groups from CSV data
for research_group_name, faculty in faculty_by_group.items():
    research_group = db.Record(name=research_group_name)
    research_group.add_parent(research_group_rt)
    research_group.add_property(property=faculty_prop, value=faculty)
    research_group.insert()

Now you can update your experiments. To do this, first retrieve all experiments that have a responsible_scientist from the server. Then match each retrieved Record to its research group based on the responsible scientist, and add the group to the experiment's responsible_group property.

# Retrieve all experiments which have an associated scientist
experiments = db.execute_query("FIND Experiment with responsible_scientist")
# Update each experiment
experiment_updates = db.Container()
for experiment in experiments:
    # Get responsible scientist
    scientist_id = experiment.get_property("responsible_scientist").value
    scientist = db.Record(id=scientist_id).retrieve().name
    # Add the research group the scientist belongs to
    if scientist in group_by_person:
        group = db.Record(name=group_by_person[scientist])
        group.add_parent(research_group_rt)
        experiment.add_property(property=responsible_group_prop, value=group)
        experiment_updates.append(experiment)
# Update the server
experiment_updates.update()
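
Before sending an update like the one above to the server, it can help to dry-run the matching step on plain data. The following is a minimal sketch under stated assumptions: the helper name match_groups and the sample dictionaries are hypothetical, and only Experiment_5 and Erwin Ehrlich are taken from this tutorial. It mirrors the `if scientist in group_by_person` check of the loop above without needing a server connection.

```python
def match_groups(scientist_by_experiment, group_by_person):
    """Split experiments into those with a known group and those without.

    Both arguments are plain dicts, so no server connection is needed.
    """
    assigned, unmatched = {}, []
    for experiment, scientist in scientist_by_experiment.items():
        if scientist in group_by_person:
            assigned[experiment] = group_by_person[scientist]
        else:
            unmatched.append(experiment)
    return assigned, unmatched


# Hypothetical data; only Experiment_5 and Erwin Ehrlich come from the tutorial.
group_by_person = {"Alice Example": "Group A", "Carol Example": "Group B"}
scientist_by_experiment = {
    "Experiment_1": "Alice Example",
    "Experiment_2": "Carol Example",
    "Experiment_5": "Erwin Ehrlich",
}

assigned, unmatched = match_groups(scientist_by_experiment, group_by_person)
print("Would be updated:", assigned)
print("Would be skipped:", unmatched)
```

Experiments that land in the skipped list are the ones whose responsible scientist does not appear in the CSV, so they keep no responsible_group after the update.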

Validating your results

Lastly, you can check whether there are any experiments that were not updated:

# Check for data inconsistency
query = "FIND Experiment WHICH DOES NOT HAVE responsible_group"
incomplete_experiments = [rec.name for rec in db.execute_query(query)]
print(f"Experiments missing a group: {incomplete_experiments}")

As you can see, the only experiment without a responsible group is Experiment_5. It is assigned to Erwin Ehrlich, who does not appear in the list of scientists in the data CSV. This might mean that the CSV needs to be updated to include more research groups, that Experiment_5 needs a different responsible scientist, or that Experiment_5 is simply not associated with a research group.

Further Reading

Synchronisation of complex structured data can be done automatically using the LinkAhead Crawler. To learn more, continue with the crawler tutorials.

For further information about the Python client and its features, read the Features of Interest document.