Tags: tutorial, advanced-user, schema

Define a LinkAhead Schema with YAML#

Note

This page has been migrated from the old documentation and has not yet been fully revised. There might be inconsistencies or errors when using it with current LinkAhead versions.

The caosadvancedtools library can create and update LinkAhead Schemas from a YAML file.

Let’s start with an example taken from schema.yml in the library sources.

Project:
   obligatory_properties:
      projectId:
         datatype: INTEGER
         description: 'UID of this project'
Person:
   recommended_properties:
      firstName:
         datatype: TEXT
         description: 'first name'
      lastName:
         datatype: TEXT
         description: 'last name'
LabbookEntry:
   recommended_properties:
      Project:
      entryId:
         datatype: INTEGER
         description: 'UID of this entry'
      responsible:
         datatype: Person
         description: 'the person responsible for these notes'
      textElement:
         datatype: TEXT
         description: 'a text element of a labbook recording'
      associatedFile:
         datatype: FILE
         description: 'A file associated with this recording'
      table:
         datatype: FILE
         description: 'A table document associated with this recording'

This example defines 3 RecordTypes:

  • A Project with one obligatory property, projectId

  • A Person with a firstName and a lastName (as recommended properties)

  • A LabbookEntry with multiple recommended properties of different data types

One major advantage of using this interface (in contrast to the standard Python interface) is that properties can be defined and added to RecordTypes “on-the-fly”. For example, the three lines for firstName as sub-entries of Person have two effects on LinkAhead:

  • A new property with name firstName, datatype TEXT and description first name is inserted (or updated, if already present) into LinkAhead.

  • The new property is added as a recommended property to RecordType Person.

Any further occurrences of firstName in the yaml file will reuse the definition provided for Person.

Note the difference between the following three property declarations:

  • Project: This RecordType is added directly as a property of LabbookEntry. Therefore, it does not specify any further attributes. Compare to the original declaration of RecordType Project.

  • responsible: This defines and adds a property with name “responsible” to LabbookEntry, which has a datatype Person. Person is defined above.

  • firstName: This defines and adds a property with the standard data type TEXT to RecordType Person.
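
The three cases can be told apart mechanically. The following is a minimal sketch, not the library's parser: it assumes the YAML file has already been loaded into nested dictionaries, and both the helper name classify_declaration and the BUILTIN_DATATYPES set are made up for illustration (the set lists only a few LinkAhead datatypes).

```python
# Hypothetical helper for illustration; this is NOT the caosadvancedtools parser.
# A few built-in LinkAhead datatypes; the set is not meant to be complete.
BUILTIN_DATATYPES = {"TEXT", "INTEGER", "DOUBLE", "BOOLEAN", "DATETIME", "FILE"}


def classify_declaration(name, attributes, record_types):
    """Classify one property entry found below a RecordType.

    name         -- key of the entry, e.g. "responsible"
    attributes   -- dict of attributes, or None if the entry is empty
    record_types -- names of all RecordTypes defined in the model
    """
    if not attributes:
        if name in record_types:
            return "RecordType used directly as a property"
        return "reuse of a property defined elsewhere"
    datatype = attributes.get("datatype")
    if datatype in record_types:
        return "new reference property"
    if datatype in BUILTIN_DATATYPES:
        return "new property with a standard datatype"
    return "unknown datatype"


record_types = {"Project", "Person", "LabbookEntry"}
print(classify_declaration("Project", None, record_types))
print(classify_declaration("responsible", {"datatype": "Person"}, record_types))
print(classify_declaration("firstName", {"datatype": "TEXT"}, record_types))
```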

If the Schema depends on RecordTypes or Properties that already exist in LinkAhead, declare them with the extern keyword, which takes a list of the names of previously defined Properties and/or RecordTypes. Note that if you use an existing REFERENCE property whose datatype is an existing RecordType, you also need to add that RecordType’s name to the extern list, e.g.,

extern:
  # Let's assume the following is a reference property with datatype Person
  - Author
  # We need Person (since it's the datatype of Author) even though we might
  # not use it explicitly
  - Person

Dataset:
  recommended_properties:
    Author:
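
The rule about the datatype of an external reference property can be pictured as a small consistency check. This is only a sketch under stated assumptions: the function missing_extern_entries is hypothetical, and the mapping from existing reference properties to their RecordType (here Author to Person) is supplied by hand, whereas in practice that information lives in the LinkAhead instance.

```python
# Hypothetical check for illustration; not part of caosadvancedtools.
def missing_extern_entries(extern, known_reference_datatypes):
    """Return RecordType names that still need to be added to `extern`.

    extern                    -- list of names from the extern section
    known_reference_datatypes -- assumed mapping from already existing
                                 reference properties to their RecordType
    """
    missing = []
    for name in extern:
        datatype = known_reference_datatypes.get(name)
        if datatype is None:
            continue  # not a known reference property, nothing to check
        if datatype not in extern and datatype not in missing:
            missing.append(datatype)
    return missing


server_side = {"Author": "Person"}  # assumption about the existing instance
print(missing_extern_entries(["Author"], server_side))            # ['Person']
print(missing_extern_entries(["Author", "Person"], server_side))  # []
```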

Reusing Properties#

Properties defined once (either as a property of a RecordType or as a separate Property) can be reused later in the YAML file. After its first occurrence, a property must be listed without attributes; otherwise, the reuse would conflict with the original definition.

Example#

Project:
  obligatory_properties:
    projectId:
      datatype: INTEGER
      description: 'UID of this project'
    date:
      datatype: DATETIME
      description: Date of a project or an experiment

Experiment:
  obligatory_properties:
    experimentId:
      datatype: INTEGER
      description: 'UID of this experiment'
    date:  # no further attributes here, since property was defined above in 'Project'!

The above example defines two RecordTypes: Project and Experiment. The property date is defined upon its first occurrence as a property of Project. Later, the same property is also added to Experiment, where no additional attributes may be specified.
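
To make the reuse rule concrete, here is a rough sketch of a check that flags a property which is given attributes again after its first definition. It assumes the YAML file has already been loaded into nested dictionaries; find_conflicting_reuse is a hypothetical helper, and the way the actual parser reports such conflicts may differ.

```python
# Hypothetical validation helper; this is NOT how caosadvancedtools is implemented.
PROPERTY_SECTIONS = (
    "obligatory_properties",
    "recommended_properties",
    "suggested_properties",
)


def find_conflicting_reuse(model):
    """Return (record_type, property) pairs that redefine an existing property."""
    defined = set()
    conflicts = []
    for rt_name, rt_body in model.items():
        if not isinstance(rt_body, dict):
            continue  # e.g. the extern list
        for section in PROPERTY_SECTIONS:
            properties = rt_body.get(section) or {}
            for prop_name, attributes in properties.items():
                if prop_name in defined and attributes:
                    conflicts.append((rt_name, prop_name))
                elif attributes:
                    defined.add(prop_name)  # first occurrence defines it
    return conflicts


model = {
    "Project": {"obligatory_properties": {"date": {"datatype": "DATETIME"}}},
    "Experiment": {"obligatory_properties": {"date": None}},  # empty on reuse
}
print(find_conflicting_reuse(model))  # []
```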

Datatypes#

You can use any data type understood by LinkAhead as the datatype attribute in the Schema YAML.

List datatypes use a special syntax:

datatype: LIST<DOUBLE>

declares a list datatype of DOUBLE elements.

datatype: LIST<Project>

declares a list of elements with datatype Project.
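
The LIST<...> notation is easy to take apart when inspecting a schema file. As a small illustration (split_datatype is a hypothetical helper, not a caosadvancedtools function), a regular expression can separate the list wrapper from the element datatype:

```python
import re

# Matches "LIST<ELEMENT>" and captures the element datatype.
LIST_PATTERN = re.compile(r"^LIST<(?P<element>[^<>]+)>$")


def split_datatype(datatype):
    """Return (is_list, element_type) for a datatype string."""
    text = datatype.strip()
    match = LIST_PATTERN.match(text)
    if match:
        return True, match.group("element").strip()
    return False, text


print(split_datatype("LIST<DOUBLE>"))   # (True, 'DOUBLE')
print(split_datatype("LIST<Project>"))  # (True, 'Project')
print(split_datatype("TEXT"))           # (False, 'TEXT')
```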

Keywords#

  • importance: Importance of this entity. Possible values: “recommended”, “obligatory”, “suggested”

  • datatype: The datatype of this property, e.g. TEXT, INTEGER or Project.

  • unit: The unit of the property, e.g. “m/s”.

  • description: A description for this entity.

  • enum-names: List of possible values which a RecordType, which is used for enumeration only, can have. These are created as Records without any properties, and the Records’ names set to the values. The values (and thus enum names) do not have to be unique across RecordTypes: for example there may be Other enum values for different RecordTypes.

  • recommended_properties: Add properties to this entity with importance “recommended”.

  • obligatory_properties: Add properties to this entity with importance “obligatory”.

  • suggested_properties: Add properties to this entity with importance “suggested”.

  • inherit_from_XXX: This keyword accepts a list of other RecordTypes. Those RecordTypes are added as parents, and all Properties with at least the importance XXX are inherited. For example, inherit_from_recommended will inherit all Properties of importance recommended and obligatory, but not suggested.
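
The “at least the importance XXX” rule of inherit_from_XXX amounts to an ordering of the three importance levels. The following sketch illustrates that ordering only; inherited_properties, IMPORTANCE_RANK and the property name comment are assumptions made for this example and not part of the library.

```python
# Hypothetical illustration of the importance threshold; not library code.
IMPORTANCE_RANK = {"suggested": 0, "recommended": 1, "obligatory": 2}


def inherited_properties(parent_properties, minimum_importance):
    """Return the names of properties inherited at the given threshold.

    parent_properties  -- mapping of property name to its importance in the parent
    minimum_importance -- the XXX part of inherit_from_XXX
    """
    threshold = IMPORTANCE_RANK[minimum_importance]
    return [
        name
        for name, importance in parent_properties.items()
        if IMPORTANCE_RANK[importance] >= threshold
    ]


parent = {"projectId": "obligatory", "date": "recommended", "comment": "suggested"}
print(inherited_properties(parent, "recommended"))  # ['projectId', 'date']
```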

Usage#

You can use the YAML parser directly in Python as follows:

from caosadvancedtools.models import parser
schema = parser.parse_model_from_yaml("model.yml")

This creates a DataModel object containing all entities defined in the YAML file.

If the parsed Schema should be appended to a pre-existing Schema, the optional existing_model argument can be used:

new_schema = parser.parse_model_from_yaml("schema.yml", existing_model=old_schema)

You can now use the functions from DataModel to synchronize the Schema with a LinkAhead instance:

schema.sync_data_model()