The Docker Compose installation gives ShowVoc and VocBench each their own dedicated Semantic Turkey and GraphDB. This means that the two systems have separate projects. By contrast, in the deployment we previously had (made and managed by Andrea), VocBench and ShowVoc used the same Semantic Turkey and the same GraphDB, so publishing a project only required adding the user showvoc-public (although one still needed to create/update the various indexes on the project dashboard).

...


Overview

The picture below summarises the current data flow (read the graphic starting from VocBench, in the middle).

Attached file: DataFlow.pptx

Backup

Backups of the dockerised caliper system can be made by simply copying the data folders. Relative to the root of the caliper system folder (the caliper folder of the git repository), these data folders are:

  • volumes - the data folder associated with Fuseki
  • showvoc-docker/volumes - the data folder containing the data for both ShowVoc and VocBench

Backups can be done by running the backup_volumes.sh script in the scripts folder of the repository. The script is well documented, so it is easy to edit. You only have to update the following parts:

  1. The Google Cloud Storage (GCS) bucket that you have access to and want to use (Line 23). In the case of the test instance, this value is fao-datalab-backups.
  2. The full system directory where the caliper instance is installed (Line 20). In the case of the datalab caliper internal test instance, this is /opt/caliper_internal_test/caliper.
  3. The GCS storage directory (Line 51). In this case, …/caliper/internal/test

The backup script archives both data directories and uploads the archives to Google Cloud Storage. The archives follow these conventions:

  1. Fuseki - fuseki_volume_<BACKUP_TIMESTAMP>.tar.gz
  2. ShowVoc/VocBench - vocs_volume_<BACKUP_TIMESTAMP>.tar.gz
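
The archiving step and naming convention above can be illustrated with a short Python sketch. This is not the content of backup_volumes.sh (a shell script); it only shows the same idea under stated assumptions. The function names, the dictionary of folders and the timestamp format are hypothetical, and the upload to GCS is left out.

```python
import tarfile
from pathlib import Path

# Data folders relative to the caliper root, keyed by the archive prefix
# used in the naming convention described above.
DATA_FOLDERS = {
    "fuseki_volume": "volumes",
    "vocs_volume": "showvoc-docker/volumes",
}


def archive_name(prefix: str, timestamp: str) -> str:
    """Build e.g. fuseki_volume_<BACKUP_TIMESTAMP>.tar.gz."""
    return f"{prefix}_{timestamp}.tar.gz"


def archive_volumes(caliper_root: Path, out_dir: Path, timestamp: str) -> list[Path]:
    """Create one .tar.gz archive per data folder and return the archive paths.

    The timestamp format is up to the caller; the format used by the real
    backup script is not documented here, so any value passed in is an assumption.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for prefix, relative_folder in DATA_FOLDERS.items():
        source = caliper_root / relative_folder
        target = out_dir / archive_name(prefix, timestamp)
        with tarfile.open(target, "w:gz") as tar:
            # Store paths relative to the caliper root inside the archive.
            tar.add(source, arcname=relative_folder)
        archives.append(target)
    return archives
```

For example, `archive_volumes(Path("/opt/caliper_internal_test/caliper"), Path("/tmp/backups"), "20240101_120000")` would write the two archives to /tmp/backups; the output folder and timestamp here are illustrative only.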

Restore

Restores can be done by running the restore_volumes.sh script in the scripts folder of the git repository. The script is well documented, so it is easy to edit. Before running it, ensure that the docker services are not currently running, then update the following parts:

  1. The caliper root folder (Line 19). In the case of the test instance, it’s /opt/caliper_internal_test/caliper
  2. The Google Cloud Storage (GCS) bucket (Line 20). In the case of the test instance, this value is fao-datalab-backups
  3. The backup timestamp (Line 25). This is the timestamp of the backup you want to use from the GCS bucket.
  4. The GCS storage directory (Lines 34 and 62). In this case it’s …/caliper/internal/test


The restore_volumes.sh script deletes the existing data folders, if any, downloads the backup files from GCS and extracts them to the relevant data folder paths.

Getting data in VocBench

So far, this is done by CC via the GUI.

Bulk load. Not implemented, but useful in case of recovery or to populate a new instance (e.g., FAO prod, instances at BC3). To be done.

Folder with all files (all classifications, all formats), used by CC and CM: https://console.cloud.google.com/storage/browser/fao-datalab-caliper;tab=objects?authuser=1&prefix=&forceOnObjectsSortingFiltering=false


Getting data in ShowVoc

Currently, CC does it manually from the GUI (it consists of "deploying" VB projects into SV; see the section below on this page). Craig suggests that it could be done via script too. To be done later.


From VB/GraphDB to Fuseki:

So far, this is done by CM via the command line (one project at a time). A script for automatic export from VB and import into Fuseki was passed on by Andrea UNITOV and adapted by CM (scheduled at 12:00 and 19:00). Data is now loaded into Fuseki, but in a somewhat problematic way: as things stand, the queries developed so far need to be adjusted in order to keep working. CC contacted UNITOV for advice on how to fix this. Craig will implement the fix.


From VocBench to ShowVoc

To make a VB project visible in ShowVoc, it must be “deployed”. To do that,

...

Important. When you deploy to an existing project, you still need to go to the ShowVoc admin dashboard and update indexes and recompute the metadata.




How to simplify the process described above. [Instructions not tested yet]

In VocBench, it is possible to create a super user account and save its credentials as the default. In this way, when you submit a new dataset from VocBench to ShowVoc, you will no longer be prompted to enter your credentials. The same solution can be applied to the configuration of the triple store when deploying new projects. This can be done in the Administration section of VocBench.

...

Unfortunately, the name of the project needs to be inserted every time. In order to avoid mistakes, it is advisable to use the same project name for VocBench and for ShowVoc.

 

From VocBench to Fuseki


Data can be moved from VocBench to Fuseki programmatically using the scripts in the scripts/data-flow folder of the https://bitbucket.org/cioapps/fao-datalab-caliper-docker repository. Once you have cloned the repository, run the following command with the appropriate parameters to move data from VocBench to Fuseki.

...

<Craig>.


The arguments for the vocbench_to_fuseki.py script are as follows:

  • --vocbench_url (type=str), default="http://localhost:8080/caliper/edit/vocbench3/" - This is the VocBench3 URL associated with your instance.
  • --email (type=str) - This is the VocBench login email address.
  • --password (type=str) - This is the VocBench login password.
  • --project_names (type=str), default="all" - This is the comma-separated list of names of the projects to export. Project names are case sensitive. Use "all" to export and import all projects excluding those with "test" or "staging" in their name.
  • --remove_rdfs (type=bool), default=False - Whether to remove the exported RDF files after import.
  • --fuseki_bin (type=str), default="fuseki-bin" - This is the /bin folder for the Fuseki ruby scripts.
  • --fuseki_url (type=str), default="http://localhost:3030/ds" - This is the Fuseki URL.


In most cases you can leave the parameters as default if running from the same machine as the Caliper instance.
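
As an illustration of the argument list above, here is a minimal sketch of how these options might be declared with Python's argparse. It is reconstructed from the documented names and defaults only and is not the actual source of vocbench_to_fuseki.py. The helpers str_to_bool and select_projects are hypothetical, and the substring match used for "all" is assumed to be case sensitive. One caveat applies if the real script does declare --remove_rdfs with type=bool: argparse calls bool() on the raw string, so any non-empty value, including "False", is parsed as True.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Hypothetical helper: parse 'true'/'false' style strings explicitly."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser(bool_type=bool) -> argparse.ArgumentParser:
    """Declare the documented arguments; bool_type lets us compare behaviours."""
    parser = argparse.ArgumentParser(description="Sketch of the documented arguments")
    parser.add_argument("--vocbench_url", type=str,
                        default="http://localhost:8080/caliper/edit/vocbench3/")
    parser.add_argument("--email", type=str)
    parser.add_argument("--password", type=str)
    parser.add_argument("--project_names", type=str, default="all")
    parser.add_argument("--remove_rdfs", type=bool_type, default=False)
    parser.add_argument("--fuseki_bin", type=str, default="fuseki-bin")
    parser.add_argument("--fuseki_url", type=str, default="http://localhost:3030/ds")
    return parser


def select_projects(requested: str, available: list[str]) -> list[str]:
    """Resolve --project_names against the list of existing project names.

    Assumption: with "all", names containing "test" or "staging" are skipped
    using a case-sensitive substring match.
    """
    if requested == "all":
        return [n for n in available if "test" not in n and "staging" not in n]
    return [name.strip() for name in requested.split(",") if name.strip()]
```

With the plain `bool` type, `build_parser().parse_args(["--remove_rdfs", "False"]).remove_rdfs` evaluates to `True`, whereas `build_parser(str_to_bool)` yields `False` for the same input. If the script behaves like the first variant, omitting the flag is the reliable way to keep the exported files.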


# From the data-flow folder, run the following command:
python3 vocbench_to_fuseki.py --email '<VOCBENCH_EMAIL>' --password '<VOCBENCH_PASSWORD>' --project_names 'all' --remove_rdfs True

# When running from a bash terminal, enclose the VOCBENCH_PASSWORD in single quotes:
# special characters in the password (for example & or $) are otherwise interpreted by the shell and can make the command fail.
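
When the command is assembled by another program (for example a scheduled job) rather than typed by hand, the quoting advice above can be applied automatically. The following is a minimal sketch using shlex.quote from the Python standard library; the build_command function and the sample credentials are hypothetical and only serve to show the quoting.

```python
import shlex


def build_command(email: str, password: str, project_names: str = "all") -> str:
    """Return a shell-safe command line for vocbench_to_fuseki.py.

    Every argument is passed through shlex.quote, so characters such as
    & or $ in the password are not interpreted by the shell.
    """
    args = [
        "python3", "vocbench_to_fuseki.py",
        "--email", email,
        "--password", password,
        "--project_names", project_names,
    ]
    return " ".join(shlex.quote(arg) for arg in args)


# Illustrative values only.
command = build_command("user@example.org", "p@ss&word$1")
print(command)
```

Splitting the result again with `shlex.split(command)` returns the original arguments unchanged, which is a quick way to check that the quoting is correct before handing the string to a shell.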