---
last_review: "2025-01-01"
last_reviewer: "-"
documented_code: [ ]
---

```{tags} how-to
```

# The eLab Crawler

:::{note}
This new documentation page has not yet been fully reviewed and may be incomplete.
:::

% TODO: Issue: https://gitlab.indiscale.com/caosdb/src/linkahead-docs/-/issues/81

Now that you have an eLab instance to retrieve data from and a LinkAhead instance to save it to, you can test the eLab {term}`crawler`. The commands in this guide are written for the previously described testbed setup and may have to be modified if you have pre-existing instances.

:::{warning}
Currently, the information crawled from eLabFTW does not include access control. All eLab data that is visible to the user who created the eLab token will be synced to LinkAhead, where it can potentially be accessed by all users (depending on the rights settings).
:::

## Prerequisites

As in the previous tutorial, git is needed to retrieve the eLab crawler repository, unless you wish to download the repository archive from GitLab manually. Additionally, Python must be available to execute the crawler scripts.

### Existing eLab and LinkAhead instances

If you are not using the previously described testbed setup, but differently configured eLab or LinkAhead instances, there are some additional prerequisites that need to be fulfilled:

- The directory eLab saves uploaded files to must be configured as "elab" in LinkAhead's extroot.
- For the default location /var/elabftw/web, this means the following must be present in the LinkAhead profile.yml:
  ```yaml
  # Paths to be mounted into Docker, all entries are optional.
  paths:
    # extroot: From where files are copied/symlinked. This is a
    # list of `NAME: PATH` pairs or a single path.
    extroot:
      "elab": "/var/elabftw/web"
  ```
  Adjust accordingly if your eLab instance is configured to use a different directory.
- Additionally, LinkAhead must use the same user to access the files as eLab uses to save them.
  By default, eLabFTW uses the user with the id `101`. The corresponding profile.yml entry would be:
  ```yaml
  # Paths to be mounted into Docker, all entries are optional.
  conf:
    # User/Group of the server, either numeric or names.
    user_group: 101:101
  ```
  As above, if your eLab instance is configured differently, the user and group must be adapted to match.

## Initial setup

The eLab crawler configuration can be found in the [eLabFTW cfood repository](https://gitlab.com/linkahead/crawler-extensions/elabftw-cfood). In this example, we will clone the crawler configuration into the same directory that contains the linkahead-control repository. If you already have a LinkAhead instance or if you wish to clone the eLab crawler elsewhere, you may need to adjust some of the paths below.

### Cloning the repository

Assuming your terminal is still in the linkahead-control folder, clone elabftw-cfood into the parent directory with

```bash
cd ..
git clone https://gitlab.com/linkahead/crawler-extensions/elabftw-cfood.git
```

### Starting and configuring LinkAhead

If your LinkAhead instance is still running, you may want to stop it. Afterward, start LinkAhead with the eLab crawler profile using

```bash
./linkahead-control/linkahead -p ./elabftw-cfood/profile/profile.yml start
```

Then install the eLab crawler module, optionally into a new Python virtual environment, with

```bash
cd elabftw-cfood/
# Optional: Create and activate venv
python -m venv .venv
source .venv/bin/activate
# Install module
pip install .
pip install -r requirements.txt
```

and add the data model needed to mirror the eLab object structure in LinkAhead by running

```bash
python insert_model.py
```

The data model should then be available on the [LinkAhead entity overview](https://localhost:10443/Entity/?query=find%20Entity&P=20L10).

## Crawling eLab

If your eLab instance is not running, please start it now.


### Configuration

To crawl eLab, you need to allow the crawler access to the eLab {term}`API` by supplying an API token. In the `elabftw-cfood` directory, create a new file named `.env` with the content `export ELAB_CRAWLER_TOKEN=your-token`. Then create an API token on [this page](https://localhost/ucp.php?tab=3) and replace `your-token` in the file with that token. If you wish to synchronize eLabFTW and LinkAhead bi-directionally (see the section on synchronizing data from LinkAhead to eLab), the created token must have write permissions.

If you are not using a local eLabFTW, you can instead create the token by manually navigating to the admin panel of your instance, or by replacing the host in the link above.

To make the content of the file available to the crawler, export the token using

```bash
source .env
```

The eLab crawl script settings default to values that fit the test configurations of eLab and LinkAhead described above. If your configuration differs, you can change the script's behaviour using environment variables:

- `ELAB_CRAWLER_TOKEN` specifies the token used to connect to the eLab API
- `ELAB_CRAWLER_HOST` specifies the URL at which eLab can be reached, defaults to "https://localhost/"
- `ELAB_CRAWLER_TEAM_ID` selects which eLab teams' data is crawled, defaults to "\[0\]"

If you wish to override any of these defaults, add another line to the `.env` file created above in the same format, using the new variable name in place of `ELAB_CRAWLER_TOKEN` and your desired value. You must then `source` the file again.

### The crawl script

If you have not done so before, you now need to create some test data in eLab. You can create experiments from the [dashboard](https://localhost/dashboard.php), as well as resources once you have created a resource category. Other elements, such as new status options, tags, categories and users, can be created in the [admin panel](https://localhost/admin.php?tab=4#itemsCategoriesAnchor).

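
Before running the crawl, it can help to confirm that the environment variables listed under Configuration resolve to the values you expect. The following is a minimal sketch under stated assumptions, not code taken from elabftw-cfood: the helper name `load_crawler_settings` is hypothetical, and parsing `ELAB_CRAWLER_TEAM_ID` as a JSON array is an assumption based only on the documented default "[0]".

```python
import json
import os


def load_crawler_settings(env=None):
    """Collect the crawler settings from environment variables.

    Hypothetical helper for illustration; it is not part of elabftw-cfood.
    """
    env = os.environ if env is None else env

    token = env.get("ELAB_CRAWLER_TOKEN")
    if not token:
        # No default exists for the token, so fail early with a clear hint.
        raise RuntimeError("ELAB_CRAWLER_TOKEN is not set; run `source .env` first.")

    return {
        "token": token,
        # Documented default host of the testbed eLab instance.
        "host": env.get("ELAB_CRAWLER_HOST", "https://localhost/"),
        # Assumption: the team list is written as a JSON array, e.g. "[0]".
        "team_ids": json.loads(env.get("ELAB_CRAWLER_TEAM_ID", "[0]")),
    }
```

Calling `load_crawler_settings()` without arguments reads the real environment of the current shell, so it only sees the token after `source .env` has been run in that shell. Passing a plain dictionary instead makes it easy to try out different values without touching the `.env` file.
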

Once you have created enough test data, run

```bash
python crawl.py
```

and {term}`records` for users, files, experiments and other content in eLabFTW will be created automatically in LinkAhead. You can check the results on the [records page in LinkAhead](https://localhost:10443/Entity/?query=find%20Record&P=0L100).

## Synchronizing data from LinkAhead to eLab

It is also possible to synchronize new items or changes from LinkAhead back to eLabFTW.

### Prerequisites

The synchronization script uses the same environment variables as the crawl script. As it writes data retrieved from LinkAhead to eLab, the `ELAB_CRAWLER_TOKEN` variable must contain a token with write permission.

### Usage

To synchronize new data from LinkAhead to eLab, use the sync_to_elab.py script. It can be called from the command line to sync a single LinkAhead record with `sync_to_elab.py none `, or the contained `sync_to_elab` function can be called directly to synchronize a list of ids.

Please note that there are some limitations:

- Only a subset of the {term}`properties` that can be imported to LinkAhead from eLab can be synchronized back. Changes to list-typed properties cannot currently be imported back into eLab.
- The sync_to_elab script references some LinkAhead Property and {term}`RecordType` names directly. If Properties or RecordTypes are changed in the data model, the script may have to be updated as well.
- The crawl script may remove certain HTML tag types (e.g. script tags) from main text imported from eLab unless it is configured differently. If the record is then synced back to eLab, these changes are also applied there.
- LinkAhead records are mapped to eLab objects based on eLabInstance and externalID. Setting or changing these manually may lead to unexpected or unintended changes in eLab or LinkAhead.
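
To illustrate the limitation about removed HTML tags, the following minimal sketch shows why a crawl followed by a sync back can change the text stored in eLab. It is not the crawler's actual implementation: the function name `strip_script_tags` and the regular-expression approach are assumptions made for illustration, and a regular expression like this is not a robust HTML sanitizer.

```python
import re

# Matches a <script> element including its content, case-insensitively.
_SCRIPT_TAG = re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_script_tags(html):
    """Hypothetical stand-in for the crawler's tag removal."""
    return _SCRIPT_TAG.sub("", html)


# Main text as it might be stored in eLab before the crawl.
body_in_elab = "<p>Result: 42</p><script>alert('x')</script>"

# Text as it would arrive in LinkAhead if script tags are removed on import.
body_in_linkahead = strip_script_tags(body_in_elab)

# A sync back would write body_in_linkahead to eLab, not the original text.
round_trip_changed = body_in_linkahead != body_in_elab
```

If the original markup matters, it is therefore worth checking the crawler's configuration before syncing such records back.
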

## Continued Use

### Re-syncing eLab and LinkAhead

If you make further changes in eLabFTW, you can sync those changes from the elabftw-cfood directory with

```bash
source .env
python crawl.py
```

### Stopping and starting the testbed

Once you are done with the testbed, you can stop both LinkAhead and eLab with

```bash
cd ..
docker compose -f compose_elab.yml down
./linkahead-control/linkahead -p ./elabftw-cfood/profile/profile.yml stop
```

Please remember that your LinkAhead data will be wiped by a restart unless configured differently, while eLab data is persistent. To restart the testbed, run

```bash
docker compose -f compose_elab.yml up -d
./linkahead-control/linkahead -p ./elabftw-cfood/profile/profile.yml start
cd elabftw-cfood/
source .venv/bin/activate
python insert_model.py
```

You can then re-sync your data from eLab to the LinkAhead test instance using `crawl.py`.