Run an Analysis
This guide walks you through the complete process of running a federated analysis using the Five Safes TES weave. It covers setting up the environment, configuring connections to TREs (Trusted Research Environments), submitting analysis jobs, and retrieving aggregated results.
It demonstrates running a basic statistical analysis (a mean calculation) on measurement values for a single OMOP Concept across multiple TREs.
Requirements
Software Prerequisites
- Python (3.10 or later)
- Poetry (optional, 1.8.0 recommended)
Infrastructure Prerequisites
- A deployed Five Safes TES, or a Submission Layer and at least one TRE Agent with Funnel connected to the agent.
- An OMOP CDM database connected to each TRE. You can use omop-lite for a quick-start synthetic OMOP CDM.
Information Prerequisites
- Submission Layer endpoint and API Token
- Submission Layer MinIO endpoint
- Database host and credentials for each TRE
It is assumed that the complete Five Safes TES stack is deployed, and you have all the required information.
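Optionally, you can sanity-check the Submission Layer endpoint and token before going further. The sketch below is not part of the repository; it assumes the Submission Layer exposes a GA4GH TES-style task list at /v1/tasks and accepts the token as a Bearer credential, so adjust the path if your deployment differs.
import requests

# Not part of the repo: a quick probe of the Submission Layer.
# Assumes a GA4GH TES-style task list at /v1/tasks with Bearer auth;
# adjust the path if your deployment differs.
SUBMISSION_LAYER = "http://your-tes-endpoint:5034"  # placeholder
API_TOKEN = "your_jwt_token_here"                   # placeholder

resp = requests.get(
    f"{SUBMISSION_LAYER}/v1/tasks",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=10,
)
print(resp.status_code)  # 200 confirms the endpoint and token work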
Setup
Clone the repository
Clone the repository from here: https://github.com/Health-Informatics-UoN/Five-Safes-TES-Analytics
git clone https://github.com/Health-Informatics-UoN/Five-Safes-TES-Analytics.git
Install dependencies
Using Poetry is recommended. Run the command:
poetry install
Alternatively, you can use pip with the requirements file:
pip install -r requirements.txt
Edit the env.example
Edit the env.example file to update the environment variables. The example file has placeholders for all the relevant details; all of them are required.
# TRE-FX Analytics Environment Configuration
# Copy this file to .env and update with your actual values
# ALL VARIABLES BELOW ARE REQUIRED - the application will fail to start without them
# Authentication
TRE_FX_TOKEN=your_jwt_token_here
TRE_FX_PROJECT=your_project_name
# TES (Task Execution Service) Configuration
TES_BASE_URL=http://your-tes-endpoint:5034/ # Host and Port of the Submission Layer API
TES_DOCKER_IMAGE=harbor.ukserp.ac.uk/dare-trefx/control-tre-sqlpg@sha256:18a8d3b056fd573ec199523fc333c691cd4e7e90ff1c43e59be8314066e1313c
# Database Configuration
DB_HOST=your-database-host
DB_PORT=5432
DB_USERNAME=your-database-username
DB_PASSWORD=your-database-password
DB_NAME=your-database-name
# MinIO Configuration
MINIO_STS_ENDPOINT=http://your-minio-endpoint:9000/sts
MINIO_ENDPOINT=your-minio-endpoint:9000
MINIO_OUTPUT_BUCKET=your-output-bucket-name
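Because every variable is required, a quick check can save a failed start. The snippet below is not part of the repository; it is a minimal sketch that parses your .env file and reports any variable that is missing or left empty.
# Not part of the repo: report missing or empty required variables
# from the .env file described in this guide.
import pathlib

REQUIRED = [
    "TRE_FX_TOKEN", "TRE_FX_PROJECT", "TES_BASE_URL", "TES_DOCKER_IMAGE",
    "DB_HOST", "DB_PORT", "DB_USERNAME", "DB_PASSWORD", "DB_NAME",
    "MINIO_STS_ENDPOINT", "MINIO_ENDPOINT", "MINIO_OUTPUT_BUCKET",
]

env = {}
for line in pathlib.Path(".env").read_text().splitlines():
    line = line.strip()
    if line and not line.startswith("#") and "=" in line:
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()

missing = [k for k in REQUIRED if not env.get(k)]
print("Missing or empty:", missing or "none")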
This demo requires an SQL Docker container to run the analysis. The required container image is already set in the env.example file, at the TES_DOCKER_IMAGE variable.
There is a known issue running this image with Docker on certain machines/configurations (e.g. ARM64). We are currently working on a fix for this.
Rename the env.example to .env
Put the Access Token into the .env file
Paste the Access Token into the .env file under TRE_FX_TOKEN and save the file.
Run an analysis
This runs the basic default demo, which will calculate means of measurement values with a particular OMOP Concept: "Airway resistance" (21490742).
Run analysis_engine.py.
Using poetry, the command is:
poetry run python analysis_engine.py
Review submission details.
The terminal will give updates on the submission status from the Python script; the Submission GUI gives more details under the Submissions tab.
Wait.
Check the status
When the processing is complete, the status in the Submission Layer will change to "waiting for egress". This means the analysis has been processed and must be approved before the results can leave the TREs.
Approve/deny egress requests
Acting as the TREs, access the Egress control(s) and approve (or deny) the egress.
The default behaviour is to complete the analysis with the results given, even if one or more TREs don't provide results. Once the requests have been approved, the status in both the Submission Layer GUI and the terminal will update the next time the script polls.
Fetch partial results
The partial results from each TRE will be fetched automatically.
Aggregate results
The partial results will be aggregated and the final result returned to the terminal.
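To make the aggregation step concrete, the sketch below shows one way per-TRE partial results for a mean can be combined into a single value. The payload shape used here is illustrative only; the engine's actual partial-result format may differ.
# Illustrative only: combine per-TRE partial results for a mean.
# Assumes each TRE returns its local row count n and local mean;
# the engine's real payload format may differ.
partials = [
    {"n": 120, "mean": 4.2},  # e.g. TRE 1
    {"n": 80, "mean": 3.9},   # e.g. TRE 2
]
total_n = sum(p["n"] for p in partials)
overall_mean = sum(p["n"] * p["mean"] for p in partials) / total_n
print(overall_mean)  # count-weighted mean across all TREs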
Next steps
The next step is to run a different analysis on a different subset of data.
The general way to use the tool is from within a Python environment, rather than running the script from the terminal.
Data selection is done with an SQL query, which simply selects the subset of data to run the analysis on. Change the user query to select the data you want, as in the example below.
Supported analysis types are currently mean, variance, PMCC, chi_squared_scipy and chi_squared_manual.
Once the analysis is completed, the aggregated data is stored in the engine, and the analysis, or related analyses, can be repeated without further queries to the TREs.
import analysis_engine

engine = analysis_engine.AnalysisEngine()

# Example: select non-null measurement values for one OMOP concept
user_query = """SELECT value_as_number FROM public.measurement
WHERE measurement_concept_id = 3037532
AND value_as_number IS NOT NULL"""

print("Running mean analysis...")
mean_result = engine.run_analysis(
    analysis_type="mean",
    task_name="DEMO: mean analysis test",
    user_query=user_query,
    tres=["Nottingham", "Nottingham 2"],
)
print(f"Mean analysis result: {mean_result['result']}")

# Show what aggregated data we have stored
print(f"Stored aggregated data: {engine.aggregated_data}")