Run an Analysis

Requirements

Software requirements

  • Python (3.10 or later)
  • Poetry (optional, 1.8.0 recommended)

Infrastructure requirements

  • Fully deployed Five Safes TES stack
  • At least one TRE
  • OMOP CDM database connected to each TRE (the default demo expects an OMOP database)

Information requirements

  • Project name
  • TES endpoint
  • Minio endpoint
  • Database endpoint and credentials for each TRE
  • TRE access
  • Submission layer endpoint and credentials

This guide assumes the full Five Safes TES stack is already deployed and that you have all the required information to hand. You will also need the TREs set up and to know how to obtain an access token, so it is strongly recommended that you work through the environment verification first.

Setup

Clone the repository

Clone the repository from here: https://github.com/Health-Informatics-UoN/Five-Safes-TES-Analytics

Install dependencies

Using Poetry is recommended; run poetry install. Alternatively, you can use pip with the requirements file: pip install -r requirements.txt

Edit the env.example

Edit the env.example file to set the environment variables. The example file contains placeholders for all the relevant details; all of them are required.

This demo requires a SQL Docker image. In the env.example file, set TES_DOCKER_IMAGE: TES_DOCKER_IMAGE=harbor.ukserp.ac.uk/dare-trefx/control-tre-sqlpg@sha256:18a8d3b056fd573ec199523fc333c691cd4e7e90ff1c43e59be8314066e1313c
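A filled-in file might look like the following sketch. Only TES_DOCKER_IMAGE is taken from this guide; the other variable names and values are illustrative placeholders, not the real names used in env.example — use the placeholders actually present in the file.

```shell
# Illustrative .env contents. Variable names other than TES_DOCKER_IMAGE
# are hypothetical placeholders standing in for the information listed
# under "Information requirements" above.
PROJECT_NAME=my-project
TES_ENDPOINT=https://tes.example.org
MINIO_ENDPOINT=https://minio.example.org
SUBMISSION_TOKEN=<paste token here>
TES_DOCKER_IMAGE=harbor.ukserp.ac.uk/dare-trefx/control-tre-sqlpg@sha256:18a8d3b056fd573ec199523fc333c691cd4e7e90ff1c43e59be8314066e1313c
```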

Rename the env.example to .env

Log in to the submission layer GUI.

Retrieve a token

The token has a limited lifetime and will expire. Click ‘API Access Token’ at the top, then renew the token and copy it to the clipboard.

Put the token into the ‘.env’ file.

Paste the token into the .env file and save it.

Run an analysis

This runs the basic default demo, which calculates the mean of measurement values with a particular concept ID (3037532).

Run analysis_engine.py.

Using poetry, the command is: poetry run python analysis_engine.py

Review submission details.

The terminal will show submission status updates from the Python script. The submission GUI gives more detail under the Submissions tab.

Wait.

Check the status

When the processing is complete, the status in the submission layer will change to waiting for egress. This means that the analysis has been processed and needs to be approved before the results can leave the TREs.

Approve/deny egress requests

Acting as the TREs, log in to the egress control(s) and approve (or deny) the egress. The default behaviour is to complete the analysis with the results given, even if one or more TREs don’t provide results. Once the requests have been approved, the status in both the submission layer GUI and the terminal will be updated the next time it polls.

Fetch partial results

The partial results from each TRE will be fetched automatically.

Aggregate results

The partial results will be aggregated, and the final result returned to the terminal.
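Conceptually, a federated mean can be computed from per-TRE partial results without any row-level data leaving the TREs. The following is a minimal sketch, assuming each TRE returns its local (sum, count) pair; the engine's actual internal format may differ.

```python
def aggregate_mean(partials):
    """Combine per-TRE (sum, count) pairs into a single overall mean.

    Each TRE only discloses its local sum and row count, so the
    aggregator never sees individual measurement values.
    """
    total_sum = sum(s for s, _ in partials)
    total_count = sum(n for _, n in partials)
    return total_sum / total_count

# e.g. two TREs reporting (sum, count) over their local rows
partials = [(150.0, 10), (90.0, 5)]
print(aggregate_mean(partials))  # 240.0 / 15 = 16.0
```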

Next steps

The next step is to run a different analysis on a different subset of data.

In general, the tool is intended to be used from a Python environment rather than run from the terminal.

Data selection is done with an SQL query, which simply selects the subset of data the analysis runs on. Change the user query to select the data you want to analyse.
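For example, a query for a different measurement concept might look like the following. The concept ID here is a placeholder, not a real vocabulary entry — substitute the OMOP concept you actually want to analyse.

```python
# Hypothetical example: the same table as the demo, but a different
# concept. 1234567 is a placeholder concept ID -- replace it with the
# OMOP concept you need.
user_query = """SELECT value_as_number FROM public.measurement
WHERE measurement_concept_id = 1234567
AND value_as_number IS NOT NULL"""
```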

Supported analysis types are currently “mean”, “variance”, “PMCC”, “chi_squared_scipy” and “chi_squared_manual”.

Once the analysis completes, the aggregated data is stored in the engine, so the analysis (or related analyses) can be repeated without further queries to the TREs.

import analysis_engine

engine = analysis_engine.AnalysisEngine()

# Example
user_query = """SELECT value_as_number FROM public.measurement
WHERE measurement_concept_id = 3037532
AND value_as_number IS NOT NULL"""

print("Running mean analysis...")

mean_result = engine.run_analysis(
    analysis_type="mean",
    task_name="DEMO: mean analysis test",
    user_query=user_query,
    tres=["Nottingham", "Nottingham 2"],
)

print(f"Mean analysis result: {mean_result['result']}")
# Show what aggregated data we have stored
print(f"Stored aggregated data: {engine.aggregated_data}")