Run an Analysis
Requirements
Software requirements
- Python (3.10 or later)
- Poetry (optional, 1.8.0 recommended)
Infrastructure requirements
- Fully deployed Five Safes TES stack
- At least one TRE
- OMOP CDM database connected to each TRE (the default demo expects an OMOP database)
Information requirements
- Project name
- TES endpoint
- Minio endpoint
- Database endpoint and credentials for each TRE
- TRE access
- Submission layer endpoint and credentials
It is assumed that the full Five Safes TES stack is already deployed and that you have all the required information to hand. You will also need the TREs set up and know how to obtain an access token, so it’s strongly recommended that you go through the environment verification first.
Setup
Clone the repository
Clone the repository from here: https://github.com/Health-Informatics-UoN/Five-Safes-TES-Analytics
Install dependencies
Using Poetry is recommended. Run the command:
poetry install
Alternatively, you can use pip with the requirements file:
pip install -r requirements.txt
Edit the env.example
Edit the env.example file to set the environment variables. The example file has placeholders for all the relevant details; all of them are required.
This demo requires an SQL Docker image. In the env.example file, set TES_DOCKER_IMAGE:
TES_DOCKER_IMAGE=harbor.ukserp.ac.uk/dare-trefx/control-tre-sqlpg@sha256:18a8d3b056fd573ec199523fc333c691cd4e7e90ff1c43e59be8314066e1313c
Rename env.example to .env
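As an illustration, a filled-in .env might look like the sketch below. The variable names other than TES_DOCKER_IMAGE are hypothetical placeholders here — use the exact names that already appear in your env.example, and substitute your own endpoints and credentials.

```shell
# Illustrative .env sketch — variable names (except TES_DOCKER_IMAGE)
# are assumed; match them to the placeholders in your env.example.
PROJECT_NAME=my-demo-project
TES_ENDPOINT=https://tes.example.org
MINIO_ENDPOINT=https://minio.example.org
SUBMISSION_ENDPOINT=https://submission.example.org
SUBMISSION_TOKEN=paste-your-api-access-token-here
TES_DOCKER_IMAGE=harbor.ukserp.ac.uk/dare-trefx/control-tre-sqlpg@sha256:18a8d3b056fd573ec199523fc333c691cd4e7e90ff1c43e59be8314066e1313c
```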
Log in to the submission layer GUI.
Retrieve a token
The token has a limited lifetime, so it will expire. Click ‘API Access Token’ at the top, then renew the token and copy it to the clipboard.
Paste the token into the .env file and save the file.
Run an analysis
This runs the basic default demo, which will calculate the mean of measurement values with a particular concept id (3037532).
Run analysis_engine.py.
Using poetry, the command is:
poetry run python analysis_engine.py
Review submission details.
The terminal will show submission status updates from the Python script. The submission GUI gives more detail under the Submissions tab.
Wait.
Check the status
When the processing is complete, the status in the submission layer will change to ‘waiting for egress’. This means that the analysis has been processed and must be approved before the results can leave the TREs.
Approve/deny egress requests
Acting as the TREs, log in to the egress control(s) and approve (or deny) the egress. The default behaviour is to complete the analysis with whatever results are given, even if one or more TREs don’t provide results. Once the requests have been approved, the status in both the submission layer GUI and the terminal will be updated the next time the script polls.
Fetch partial results
The partial results from each TRE will be fetched automatically.
Aggregate results
The partial results will be aggregated and the final result returned to the terminal.
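To make the aggregation step concrete, here is a minimal sketch of how a federated mean can be combined from per-TRE partial results, assuming each TRE returns its local row count and local mean. This illustrates the maths only; it is not the engine's actual implementation.

```python
# Sketch: combining per-TRE (count, mean) pairs into an overall mean.
# Assumes each TRE's partial result is a (row_count, local_mean) tuple.

def aggregate_mean(partials):
    """Return the pooled mean of per-TRE (count, mean) pairs, or None if empty."""
    total_n = sum(n for n, _ in partials)
    if total_n == 0:
        return None
    # Weight each local mean by its row count before pooling.
    return sum(n * m for n, m in partials) / total_n

# e.g. TRE A: 100 rows, mean 4.0; TRE B: 300 rows, mean 8.0
print(aggregate_mean([(100, 4.0), (300, 8.0)]))  # 7.0
```

Weighting by row count is what makes the pooled mean identical to running the query over all TREs' data at once, without any row-level data leaving a TRE.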
Next steps
The next step is to run a different analysis on a different subset of data.
The general way to use the tool is from a Python environment, rather than running the script from the terminal.
The data selection is done with an SQL query, which simply selects the subset of data the analysis runs on. Change the user query to select the data you want to analyse.
Supported analysis types are currently “mean”, “variance”, “PMCC”, “chi_squared_scipy” and “chi_squared_manual”.
Once the analysis is completed, the aggregated data is stored in the engine, and the analysis, or related analyses, can be repeated without further queries to the TREs.
import analysis_engine

engine = analysis_engine.AnalysisEngine()

# Example: mean of measurement values for a given concept id
user_query = """SELECT value_as_number FROM public.measurement
WHERE measurement_concept_id = 3037532
AND value_as_number IS NOT NULL"""

print("Running mean analysis...")
mean_result = engine.run_analysis(
    analysis_type="mean",
    task_name="DEMO: mean analysis test",
    user_query=user_query,
    tres=["Nottingham", "Nottingham 2"],
)
print(f"Mean analysis result: {mean_result['result']}")

# Show what aggregated data we have stored
print(f"Stored aggregated data: {engine.aggregated_data}")