---
last_review: "2026-05-05"
last_reviewer: "-"
documented_code: []
---

```{tags} tutorial, user
```

# Data Model and Schema Tutorial

:::{admonition} Learnings
:class: tip
Gain a basic understanding of LinkAhead's Data Model, and design a simple Schema for your data.
:::

## Introduction

Let's assume that you have an *empty* LinkAhead Server running somewhere. Before you can insert any
data, you need a {term}`Schema`. This is a specification of the structure of the data you want to
work with.

LinkAhead does not come with any pre-installed or default Schema. Instead, you design a Schema that
fits your data -- that is how LinkAhead works. The Schema specifies constraints on the data that
will be worked with: it defines how data must be structured before it can be inserted.
Additionally, the Schema determines which search questions LinkAhead can answer.

:::{note}
The users of LinkAhead create their own Schema based on their needs.
:::

To design your Schema, you must be sufficiently familiar with LinkAhead's features, most
importantly LinkAhead's {term}`Data Model`. This tutorial will introduce the relevant concepts, and
teach you how to design a Schema from scratch.

After you have designed the Schema, you will need to store it in the LinkAhead Server before it can
be used. As there are several ways to insert a Schema, this is covered in other parts of the
documentation.

## A Minimal Example

In the example used for this tutorial, LinkAhead will be used to track soil samples. Other
potential examples could be bio samples, water samples, tissue samples or something completely
different: You could work with data from experiments, observations, surveys, statistical analyses,
modelling activities, numerical simulations, or your holiday photos. You name it.

In the chosen scenario, you may want to know how many soil samples are in your storage, or generate
a list of all samples. Similar tasks could be imagined for any other application area.

To start designing a Schema for LinkAhead, you first need to specify the *target* structure of your
data. Let's start simple: Ask yourself how you would describe what a soil sample is. Write down the
description.

The very core of a Schema is a {term}`Vocabulary`. In this context, this is a list of defined
terms, mostly the names of objects, properties and relations between the objects. For our soil
sample case, we have a very simple Vocabulary already:

:::{list-table} Minimal Soil Sample Vocabulary (V1)
:widths: 10 30
:header-rows: 1

* - Term
  - Description
* - Soil Sample
  - A small, representative portion of soil collected from a specific location to analyze its
    physical, chemical, or biological properties.
:::

Now that we have a minimal Vocabulary, we need to decide how to use LinkAhead's Data Model to best
represent the soil samples. There are three central concepts in the LinkAhead Data Model:

:::{list-table} Records, RecordTypes, and Properties.
:widths: 10 30

* - Record
  - Represents an individual thing: a single person named "Alice", or a particular sample with bar
    code "xyz".
* - RecordType
  - Represents a type of thing: These are universals, meaning classes, concepts, or categories.
    Every Record has a RecordType. "Alice"'s record has the RecordType "Person", the record of the
    soil sample "xyz" has the RecordType "SoilSample".
* - Property
  - Represents the properties of both individuals and universals. For example, "hair color" could
    be a property of the RecordType "Person", and "Alice"'s record may have the hair color "brown".
:::

To fit our example into these terms, every individual soil sample would be represented as a Record,
and all soil sample Records would have a shared RecordType **SoilSample**:

:::{card} RecordType
^^^
:name: SoilSample
:description: A small, representative portion of soil collected from a specific location to analyze its physical, chemical, or biological properties.
:::

:::{figure} /.assets/images/tutorials/introduction/datamodel_and_schema/very_simple_data_model.png
:alt: The simplest data model consists of just one RecordType

A very simple data model, consisting of only one RecordType
:::

% TODO: The figures do not fit the text. Create new figures, ideally in a consistent style across
% TODO: all tutorials

This minimal Schema can already be inserted into a LinkAhead server.

:::::{tab-set}

::::{tab-item} Python Library
```python
from linkahead import RecordType

# Create the RecordType and insert it into the server the client is configured for.
RecordType(name="SoilSample", description="A small, representative portion of soil...").insert()
```
::::

::::{tab-item} Web Interface
1. Browse to
2. Log in
3. Click "Edit Mode" in the top panel. A box shows up: "Edit Mode Toolbox"

   :::{figure} /.assets/images/tutorials/introduction/datamodel_and_schema/edit_mode_button.png
   :alt: The Edit Mode button appears when a user is logged in.

   A screenshot of the menu, with the Edit Mode button highlighted
   :::
4. Click "Create RecordType" in the Edit Mode Toolbox. A form appears with a small green "RT" label
   in the upper left corner.
5. Enter the name and description.
6. Click "save".
::::

:::::

% TODO: Add links to the full tutorials for Python and WebUI data insertion

## Extending the minimal Schema with Properties

In this section you will add Properties to your Schema which describe

* The *date* on which a sample was collected.
* The unique *sample number* assigned to it via a bar code label on the sample's bag or box.
* The *weight* of the sample in grams.

While the minimal Schema is enough to create a Record for each soil sample, the Records don't carry
any useful data about the samples. This is the point where Properties enter the picture.

% TODO: Write section

## Adding Relations Between RecordTypes

In this section you will add a new RecordType "Person" and add a property "collected_by" to each
sample Record, specifying which Person collected the sample.

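
The idea behind such a reference can be tried out without a server. The following is a plain-Python
sketch, not the LinkAhead API: the classes `Person` and `SoilSample`, the field names
`collected_on` and `weight_g`, and all values are hypothetical stand-ins chosen for illustration.
It only shows that "collected_by" points to another Record instead of holding a plain text value.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    """Stand-in for a Record of the RecordType "Person" (illustration only)."""
    name: str


@dataclass(frozen=True)
class SoilSample:
    """Stand-in for a Record of the RecordType "SoilSample" (illustration only)."""
    sample_number: str
    collected_on: date
    weight_g: float
    collected_by: Person  # a reference to another Record, not a text value


# Made-up example data: two samples collected by the same person.
alice = Person(name="Alice")
samples = [
    SoilSample("xyz", date(2024, 5, 2), 120.5, collected_by=alice),
    SoilSample("abc", date(2024, 5, 3), 98.0, collected_by=alice),
]

# Because the collector is a reference, both samples point to the same Person,
# so "all samples collected by Alice" can be answered by following that reference.
collected_by_alice = [s.sample_number for s in samples if s.collected_by is alice]
print(collected_by_alice)  # prints ['xyz', 'abc']
```

If the collector were stored as free text instead, two spellings of the same name would count as two
different people; a reference avoids that ambiguity.
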
:::{figure} /.assets/images/tutorials/introduction/datamodel_and_schema/simple_data_model.png
:alt: The minimal Schema extended by inheritance and references

A simple Schema, with four RecordTypes
:::

% TODO: Write section

## Adding Taxonomies

In this section you will add a generic "Sample" RecordType and a "WaterSample" RecordType. Both
"WaterSample" and "SoilSample" are specializations of the generic "Sample" RecordType.

% TODO: Write section

## Relation Between Schema and Data

The Schema describes the generic structure of the data, but what does this mean for your actual
data? When you want to store data in LinkAhead, you typically look for a matching *RecordType* and
then create *Record*s of that type. This means, for example, that a *Record* of type “Cell Culture”
should have the *Properties* that the *RecordType* provides: experimenters, lab notes, number of
dishes, time of experiment start and the cell lines used.

:::{figure} /.assets/images/tutorials/introduction/datamodel_and_schema/data_model_and_data.png
:alt: Records are always based upon (at least) one RecordType; this defines the relationship between Schema and data

A diagram showing the relationship between RecordTypes (which are part of the Schema) and Records
(the actual data)
:::

## What if I need to change my Schema?

% TODO Move to explanation

Science moves fast, and your infrastructure should be able to follow just as swiftly. The Schema in
LinkAhead was designed to be very flexible, so it can adapt to your needs. There are multiple ways
to modify and enhance your Schema: for example, you could use the web application or program the
changes with the Python client. This document describes how to change the Schema with the LinkAhead
web application.

## What happens to my data if the Schema changes?

% TODO Move to explanation

With most systems, changing the Schema is not possible without migrating the existing data.
With LinkAhead, however, your “legacy” data can simply stay where it is; it will not be modified.
New data that you enter after changing the Schema will follow your changes and adhere to your “new”
Schema. Sometimes it is desirable to migrate old data, e.g., when an author property changes from
datatype `TEXT` to a reference to a "Person" Record: You might want to create new Person Records
with names stemming from the old `TEXT` values.

LinkAhead provides the flexibility to change Schemas in the running system without necessarily
having to migrate old data (see below in case you do need to migrate after all). Nevertheless,
especially for substantial changes, it is crucial to make backups and, if possible, test changes on
a development instance first.

Small changes can be performed using the WebUI's edit mode. For larger changes, or to make the
changes reproducible and documented, a programmatic approach is usually preferable, either via the
above Schema specifications or a custom script.

## Summary

### Start Small

When initially creating a new Schema, it is important to keep in mind that designing a Schema is an
iterative process. In case of uncertainties, it is usually a good idea to just start with a generic
model and refine it while already using it to manage data in LinkAhead -- after all, that's what
LinkAhead's flexibility is for.

:::{figure} /.assets/images/tutorials/introduction/datamodel_and_schema/datamodel_sketch.jpg
:align: center
:alt: A photo of a simple sketch of a Schema, hand-drawn on paper.
:width: 80%

A simple sketch of a Schema may be sufficient for the beginning.
:::

### Define Your Goals

Before defining a Schema, there are a few questions that you should ask yourself, your work group,
the members of your institution, and other people who will be using LinkAhead.

- What (meta-)data is recorded? What might be added in the future?
  There is no need to over-engineer a Schema to cover cases that will never come to pass.
- What queries might users want results for? What properties of the data might be used to filter
  results? The Schema needs to cover all types and properties necessary for these queries.
- How will different entries be related? While it doesn't need to be complete, drawing (a sketch
  of) an entity relationship diagram might be helpful to see which RecordTypes will be needed, how
  they should reference each other, and how they should inherit from each other.
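
Such a sketch does not have to be a drawing. As a rough illustration, the outline below writes the
RecordTypes of this tutorial down as a plain Python dictionary and checks that every parent and
every referenced RecordType is actually defined. This is not LinkAhead's Schema format or API: the
dictionary layout, the helper names `find_problems` and `all_properties`, and the datatype labels
are assumptions made for this example, as is the simplification that a RecordType takes over all
properties of its parent.

```python
# Hypothetical outline: RecordType name -> parent (or None) and its own properties.
# A property maps its name to a datatype label; a label that is itself a
# RecordType name stands for a reference to Records of that type.
SCHEMA_SKETCH = {
    "Person": {"parent": None, "properties": {"name": "TEXT"}},
    "Sample": {
        "parent": None,
        "properties": {
            "date": "DATETIME",
            "sample_number": "TEXT",
            "weight": "DOUBLE",
            "collected_by": "Person",  # reference to a Person Record
        },
    },
    "SoilSample": {"parent": "Sample", "properties": {}},
    "WaterSample": {"parent": "Sample", "properties": {}},
}

# Labels treated as non-reference datatypes in this sketch (an assumption).
PRIMITIVE_DATATYPES = {"TEXT", "DOUBLE", "INTEGER", "DATETIME", "BOOLEAN"}


def find_problems(schema):
    """Return human-readable descriptions of undefined parents and datatypes."""
    problems = []
    for name, spec in schema.items():
        parent = spec["parent"]
        if parent is not None and parent not in schema:
            problems.append(f"{name}: unknown parent {parent}")
        for prop, datatype in spec["properties"].items():
            if datatype not in PRIMITIVE_DATATYPES and datatype not in schema:
                problems.append(f"{name}.{prop}: unknown datatype {datatype}")
    return problems


def all_properties(schema, name):
    """Collect the properties of a RecordType, including those of its ancestors.

    Assumes the parent chain contains no cycles.
    """
    spec = schema[name]
    inherited = all_properties(schema, spec["parent"]) if spec["parent"] else {}
    return {**inherited, **spec["properties"]}


print(find_problems(SCHEMA_SKETCH))  # prints []
print(sorted(all_properties(SCHEMA_SKETCH, "SoilSample")))
```

Walking through the planned queries against such an outline (for example, "all soil samples
collected by a given person") is a cheap way to notice a missing Property or reference before
anything is inserted into the server.
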