Tags: how-to

The eLab Crawler#

Note

This new documentation page has not yet been fully reviewed and may be incomplete.

Now that you have an eLab instance to retrieve data from and a LinkAhead instance to save it to, you can test the eLab crawler. The commands in this guide are written for the previously described testbed setup. If you are using pre-existing instances, you may have to adapt them to fit your setup.

Warning

Currently, the information crawled from eLabFTW does not include access control. This means that all eLab data visible to the user who created the eLab token will be synced to LinkAhead, where it can potentially be accessed by all users (depending on the permission settings).

Prerequisites#

As in the previous tutorial, git is needed to retrieve the eLab crawler repository, unless you wish to download the repository archive from GitLab manually. Additionally, Python must be available to execute the crawler scripts.

Existing eLab and LinkAhead instances#

If you are not using the previously described testbed setup but differently configured eLab or LinkAhead instances, there are some additional prerequisites that need to be fulfilled:

  • The directory eLab saves uploaded files to must be configured as “elab” in LinkAhead's extroot.

  • For the default location /var/elabftw/web, this means the following must be present in the LinkAhead profile.yml:

      # Paths to be mounted into Docker, all entries are optional.
      paths:
        # extroot: From where files are copied/symlinked.  This is a
        # list of `NAME: PATH` pairs or a single path.
        extroot:
          "elab": "/var/elabftw/web"
    

    Adjust accordingly if your eLab instance is configured to use a different directory.

  • Additionally, LinkAhead must use the same user to access the files as eLab uses to save them. By default, eLabFTW uses the user with the id 101; the corresponding profile.yml entry would be:

      # Paths to be mounted into Docker, all entries are optional.
      conf:
        # User/Group of the server, either numeric or names.
        user_group: 101:101
    

    As above, if your eLab instance is configured differently, the user and group must be adapted to match.
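
Whether the upload directory is owned by the user and group configured in profile.yml can be checked before starting LinkAhead. The following is a minimal sketch using only the Python standard library. The helper name owner_matches is hypothetical, and the check compares only the numeric owner and group of the directory itself, not of the files below it.

```python
import os


def owner_matches(path: str, uid: int = 101, gid: int = 101) -> bool:
    """Return True if `path` is owned by the given numeric user and group.

    The defaults mirror the 101:101 example above; pass other ids if your
    eLab instance is configured differently.
    """
    info = os.stat(path)
    return info.st_uid == uid and info.st_gid == gid


if __name__ == "__main__":
    # /var/elabftw/web is the default upload location named above.
    upload_dir = "/var/elabftw/web"
    if os.path.isdir(upload_dir):
        print(upload_dir, "owned by 101:101:", owner_matches(upload_dir))
    else:
        print(upload_dir, "does not exist on this machine")
```

If the check reports a mismatch, adjust either the user_group entry or the ownership of the directory so that both agree.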

Initial setup#

The eLab crawler configuration can be found in the ELabFTW Cfood repository. In this example, we will clone the crawler configuration into the same directory as the linkahead-control repository above. If you already have a LinkAhead instance or if you wish to clone the eLab crawler elsewhere, you may need to adjust some of the paths below.

Cloning the repository#

Assuming your terminal is still in the linkahead-control folder, clone elabftw-cfood into the parent directory with

cd ..
git clone https://gitlab.com/linkahead/crawler-extensions/elabftw-cfood.git

Starting and configuring LinkAhead#

If your LinkAhead instance is still running, you may want to stop it first. Afterwards, start LinkAhead with the eLab crawler profile using

./linkahead-control/linkahead -p ./elabftw-cfood/profile/profile.yml start

Then install the eLab crawler module, optionally into a new Python virtual environment, with

cd elabftw-cfood/
# Optional: Create and activate venv
python -m venv .venv
source .venv/bin/activate
# Install module
pip install .
pip install -r requirements.txt

and add the data model needed to mirror the eLab object structure in LinkAhead by running

python insert_model.py

The data model should then be available on the LinkAhead entity overview.

Crawling eLab#

If your eLab instance is not running, please start it now.

Configuration#

To crawl eLab, you need to give the crawler access to the eLab API by supplying an API token. In the elabftw-cfood directory, create a new file named .env with the content export ELAB_CRAWLER_TOKEN=your-token. Then create an API token on this page and replace your-token in that file with the token. If you wish to synchronize eLabFTW and LinkAhead bi-directionally (see the section on synchronizing data from LinkAhead to eLab below), the token must have write permissions. If you are not using a local eLabFTW, you can instead create the token by manually navigating to the admin panel of your instance, or by replacing the host in the link above. To make the content of the file available to the crawler, export the token using

source .env
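
Before sourcing the file, it can help to confirm that it has the expected shape. The sketch below parses export NAME=value lines with the standard library only. parse_env_text is a hypothetical helper, not part of elabftw-cfood, and it handles only the simple single-line form used in this guide rather than full shell syntax.

```python
import shlex


def parse_env_text(text: str) -> dict[str, str]:
    """Parse lines of the form `export NAME=value` into a dict."""
    values = {}
    for line in text.splitlines():
        # shlex.split drops shell-style comments and handles simple quoting.
        tokens = shlex.split(line, comments=True)
        if tokens and tokens[0] == "export":
            tokens = tokens[1:]
        for token in tokens:
            name, sep, value = token.partition("=")
            if sep and name:
                values[name] = value
    return values


if __name__ == "__main__":
    example = "export ELAB_CRAWLER_TOKEN=your-token\n"
    settings = parse_env_text(example)
    if settings.get("ELAB_CRAWLER_TOKEN") in (None, "", "your-token"):
        print("ELAB_CRAWLER_TOKEN is missing or still the placeholder")
```

To check your own file, pass the content of .env (for example, read with pathlib.Path(".env").read_text()) instead of the example string.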

The eLab crawl script settings default to values that fit the test configurations of eLab and LinkAhead described above. If your configuration differs, you can change the script's behaviour using environment variables:

  • ELAB_CRAWLER_TOKEN specifies the token to use to connect to the eLab API

  • ELAB_CRAWLER_HOST specifies the URL eLab can be reached at, defaults to “https://localhost/”

  • ELAB_CRAWLER_TEAM_ID selects which eLab teams' data is crawled, defaults to “[0]”

If you wish to override any of these defaults, add another line to the .env file created above, in the same form as the ELAB_CRAWLER_TOKEN line but with the new variable name and your desired value. You must then source the file again.
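
To see which values a script started from the current shell would pick up, the documented defaults can be reproduced in a few lines. This is a sketch, not the crawler's actual code: read_crawler_settings is a hypothetical helper, and interpreting ELAB_CRAWLER_TEAM_ID as a JSON list is an assumption based on the default value “[0]”.

```python
import json
import os


def read_crawler_settings(environ=None) -> dict:
    """Collect the three documented variables, applying the documented defaults."""
    env = os.environ if environ is None else environ
    return {
        # No default is documented for the token, so None marks it as missing.
        "token": env.get("ELAB_CRAWLER_TOKEN"),
        "host": env.get("ELAB_CRAWLER_HOST", "https://localhost/"),
        # Assumption: the value is a JSON list of team ids, as "[0]" suggests.
        "team_ids": json.loads(env.get("ELAB_CRAWLER_TEAM_ID", "[0]")),
    }


if __name__ == "__main__":
    settings = read_crawler_settings()
    if settings["token"] is None:
        print("ELAB_CRAWLER_TOKEN is not set; did you source .env?")
    print("host:", settings["host"], "team ids:", settings["team_ids"])
```

Running this in the same shell in which you sourced .env shows whether your overrides are visible to Python processes.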

The crawl script#

If you have not done so already, create some test data in eLab now. You can create experiments from the dashboard, as well as resources once you have created a resource category. Other elements, such as new status options, tags, categories and users, can be created in the admin panel. Once you have created enough test data, run

python crawl.py

and records for users, files, experiments and other content in eLabFTW will be automatically created in LinkAhead. You can check the results on the records page in LinkAhead.

Synchronizing data from LinkAhead to eLab#

It is also possible to synchronize new items or changes from LinkAhead back to eLabFTW.

Prerequisites#

The synchronization script utilizes the same environment variables as the crawl script. As it writes data retrieved from LinkAhead to eLab, the ELAB_CRAWLER_TOKEN variable must contain a token with write permission.

Usage#

To synchronize new data from LinkAhead to eLab, the sync_to_elab.py script is used. It can be called from the command line to sync a single LinkAhead record with sync_to_elab.py none <linkahead_id>, or the contained sync_to_elab function can be called directly to synchronize a list of ids.
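
For more than one record, the command-line form can also be driven from a small wrapper. The sketch below only builds and, optionally, runs the command line quoted above. build_sync_command and sync_records are hypothetical helpers, the literal none argument is copied verbatim from that usage line without assuming what it means, and the record ids are placeholders. Calling the sync_to_elab function directly may be the better option, but its signature is not described here.

```python
import subprocess
import sys


def build_sync_command(linkahead_id: int) -> list[str]:
    """Build the argv for syncing one LinkAhead record back to eLab."""
    # Mirrors `sync_to_elab.py none <linkahead_id>` from the text above.
    return [sys.executable, "sync_to_elab.py", "none", str(linkahead_id)]


def sync_records(linkahead_ids, dry_run: bool = True) -> list[list[str]]:
    """Run (or, with dry_run, only collect) one sync command per id."""
    commands = [build_sync_command(i) for i in linkahead_ids]
    if not dry_run:
        for command in commands:
            # check=True stops at the first failing record.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Placeholder ids; replace them with ids from your LinkAhead instance.
    for command in sync_records([101, 102]):
        print(" ".join(command))
```

With dry_run left at its default, nothing is written to eLab, which makes it easy to review the commands first. An actual run would need to start from the elabftw-cfood directory with the environment variables from .env exported.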

Please note that there are some limitations:

  • Only a subset of the properties that can be imported to LinkAhead from eLab can be synchronized back. Changes to list-typed properties cannot currently be imported back into eLab.

  • The sync_to_elab script references some LinkAhead Property and RecordType names directly. Any Properties or RecordTypes changed in the data model may also have to be updated in the script.

  • The crawl script may remove certain HTML tag types (e.g. script tags) from main text imported from eLab if not configured differently. If the record is then synced back to eLab, these changes are also applied there.

  • LinkAhead records are mapped to eLab objects based on eLabInstance and externalID. Should these be set or changed manually, this may lead to unexpected or unintended changes in eLab or LinkAhead.

Continued Use#

Re-syncing eLab and LinkAhead#

If you then make further changes in eLabFTW, you can sync those changes from the elabftw-cfood directory with

source .env
python crawl.py

Stopping and starting the testbed#

Once you are done with the testbed, you can stop both LinkAhead and eLab with

cd ..
docker compose -f compose_elab.yml down
./linkahead-control/linkahead -p ./elabftw-cfood/profile/profile.yml stop

Please remember that your LinkAhead data will be wiped by a restart unless configured differently, while eLab data is persistent.

To re-start the testbed, run

docker compose -f compose_elab.yml up -d
./linkahead-control/linkahead -p ./elabftw-cfood/profile/profile.yml start
cd elabftw-cfood/
source .venv/bin/activate
python insert_model.py

and then you can re-sync your data from eLab to the LinkAhead test instance using crawl.py, as described under re-syncing above.