Glossary of Terms

This is a glossary of terms relating to the technology stack and to the general research environment in which the stack will be used.

Actors

  • User Manager: This is the manager of the Keycloak server. They deal with user accounts for all services.
  • TRE Manager: This is the manager of a TRE server. They deal with project and user approvals, as well as connecting to the Egress server.
  • Submission Manager: This is the manager of the submission layer. They create projects and people, and assign people to projects.
  • Egress Manager: This is the manager of the egress layer. They deal with approving data release from TREs.
  • Researcher: The person using the service to carry out analyses. Researchers are registered to use the Submission Layer, may be associated with one or more projects, and submit tasks to be run in the TREs.

TRE-FX Services

  • Egress Layer: The egress layer serves as a temporary holding space for data returned from the TRE. Once the Egress Manager has authorised it for release, the data is sent to the Submission Layer.
  • HUTCH: An open-source toolset designed to facilitate federated activities, such as analytics, data discovery and machine learning, within secure environments.
  • Submission Layer: The submission layer serves as the initial point of contact for incoming analytical tasks, responsible for receiving user-submitted tasks, performing preliminary validation and queuing the tasks for further processing.
  • TRE Agent Layer: This is the outer layer of the TRE, responsible for task scheduling, data coordination and messaging between the different layers of the TRE.
  • TRE-UI: This is the user interface that offers an integrated view of ongoing tasks, logs and system metrics.
  • TRINO: This is a distributed SQL query engine designed to query large data sets distributed over one or more heterogeneous data sources.
  • Wizard UI: A user interface that guides users through a sequence of steps, breaking a complex process down into smaller, manageable steps that together achieve a goal.
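The Egress Layer entry describes a hold-then-release step: results returned from a TRE are held until the Egress Manager authorises them, and only then are they sent to the Submission Layer. The sketch below models that behaviour in memory. It is an illustration under stated assumptions, not the TRE-FX implementation: the class names, the `task_id` field and the rejection outcome are all hypothetical (the glossary only describes the approval path).

```python
from dataclasses import dataclass
from enum import Enum


class ReleaseState(Enum):
    HELD = "held"
    RELEASED = "released"
    REJECTED = "rejected"  # assumption: the glossary only describes approval


@dataclass
class EgressItem:
    task_id: str  # hypothetical identifier for the task that produced the results
    files: list
    state: ReleaseState = ReleaseState.HELD


class EgressLayer:
    """Hypothetical in-memory model of the Egress Layer's holding behaviour."""

    def __init__(self):
        self._held = {}
        self.released_to_submission = []  # stands in for "sent to the Submission Layer"

    def receive_from_tre(self, task_id, files):
        # Results returned from a TRE are held, not forwarded straight away.
        self._held[task_id] = EgressItem(task_id, list(files))

    def review(self, task_id, approved):
        # The Egress Manager's decision; each item may be reviewed only once.
        item = self._held[task_id]
        if item.state is not ReleaseState.HELD:
            raise ValueError(f"task {task_id!r} has already been reviewed")
        if approved:
            item.state = ReleaseState.RELEASED
            self.released_to_submission.append(item)
        else:
            item.state = ReleaseState.REJECTED
        return item.state
```

For example, after `egress.receive_from_tre("task-001", ["results.csv"])`, nothing reaches `released_to_submission` until `egress.review("task-001", approved=True)` is called. Keeping release as an explicit, single-use review step mirrors the idea that data leaves the holding area only after a human decision.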

Trusted Research Environment / Health Research Terms

  • Common Data Model (CDM): A CDM provides a standardised data model, enabling data and information exchange between different applications.
  • Data Anonymisation: This is the removal or protection of any identifiable personal information from datasets.
  • Data Provenance: A record of where data came from, how it was created and processed, who was involved, and when it was created and modified.
  • Dataset: A collection of data, often structured in columns and rows (tabular form). Datasets range in size from a few records to billions of data points.
  • Egress: This refers to data leaving an environment or a network system.
  • FAIR: A set of principles aimed at improving the management, sharing, and reuse of data. FAIR aims to ensure that data and metadata are Findable, Accessible, Interoperable, and Reusable, for humans and machines. FAIR data can be open, but it can also have controlled or restricted access.
  • Federated Activities (Trusted Research Environment): A network of TRE institutions that collaborate with one another without compromising data privacy and security.
  • Federated Query: Querying multiple databases as if they were a single database.
  • Ingress: This refers to data entering a system or a network.
  • Metadata: Information that describes a dataset, such as its origin, version, authors and source data.
  • Observational Medical Outcomes Partnership (OMOP): A CDM that is an open community data standard, designed to standardise the structure and content of observational data.
  • Payload: In the context of TRE-FX, the payload is the data sent between client and server in the body of API requests and responses.
  • Project: An approved piece of work submitted for analysis in a Trusted Research Environment. Approval requires the project to go through the required governance process.
  • Project Bucket: A project bucket is the list of all the tasks that have been submitted towards a project on TRE-FX.
  • Research Data: Information that is collected, generated or analysed during a research project or scientific investigation. Research data can come from sources such as interviews, surveys, clinical trials or chemical analyses.
  • Research Object Crate (RO-Crate): A transport format that bundles the metadata for a submission with the required input parameters and everything the implementation needs to know in order to perform the requested analysis. More generally, it is a method of packaging research data together with its associated metadata, components and files.
  • Secure Data Environment (SDE): A platform in which sensitive health and care data is stored securely and can be used for research and analysis.
  • Task: A request, submitted by a Submission Layer user as part of a project, for some analysis or computation within one or more Trusted Research Environments.
  • Trusted Research Environment (TRE): A secure digital environment holding de-identified health and care data that can only be accessed by approved researchers.
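To make the RO-Crate entry concrete, the sketch below builds a minimal `ro-crate-metadata.json` document following the general RO-Crate 1.1 layout: a JSON-LD `@graph` containing a metadata descriptor, a root `Dataset` and one entity per bundled file. It is an assumption-laden illustration, not the crate format TRE-FX actually uses: it omits any TRE-FX-specific profile, and the function name, file names and field values are hypothetical.

```python
import json


def build_ro_crate_metadata(name, description, date_published, files):
    """Build a minimal RO-Crate 1.1 metadata document (ro-crate-metadata.json) as a dict."""
    graph = [
        # Metadata descriptor: identifies this file and the crate it describes.
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        # Root data entity: the crate as a whole, listing the files it contains.
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "hasPart": [{"@id": path} for path in files],
        },
    ]
    # One File entity per file bundled in the crate.
    graph.extend({"@id": path, "@type": "File"} for path in files)
    return {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}


# Hypothetical example values, for illustration only.
crate = build_ro_crate_metadata(
    name="Example task",
    description="Hypothetical analysis request",
    date_published="2024-01-01",
    files=["analysis.py", "parameters.json"],
)
print(json.dumps(crate, indent=2))
```

Because the metadata lives in a single JSON-LD file alongside the payload files, a receiving service can read what the crate contains and how to run it without first opening the files themselves.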

Technology

  • Application Programming Interface (API): A set of rules and protocols that enable software applications to interact with each other, by specifying the techniques and data structures used to request and share information.
  • Application Stack: A collection of software technologies and tools that work together to build and run an application.
  • Database: This is a collection of data that is organised, stored and managed electronically.
  • Database Credentials: The pieces of information that allow an individual or an application to access a database. These usually comprise a username (the identifier for the user of the application) and a password (a secret known to the user or users associated with that username, used to authenticate access).
  • Docker Compose: A tool for defining and running multi-container applications. It allows users to launch an entire application stack from a single YAML configuration file.
  • Graphical User Interface (GUI): A user interface in which interaction with the device or computer takes place through visual elements such as icons, windows, menus and buttons.
  • Kubernetes (K8s): An open-source platform that automates the deployment, scaling, and management of containerised applications.
  • MinIO: A high-performance, software-defined object storage system compatible with Amazon S3.
  • Open source: Software with source code that can be accessed, modified, and improved through open collaboration. Open-source software is typically released under licenses that define usage, distribution, and modification rights.
  • PostgreSQL: A relational database that stores data in tables (rows and columns). Here it is used for storing essential data, e.g. which users have been authorised to run analyses against which projects, and task statuses.
  • Repository: A storage location where project assets such as requirements, code and files are managed and stored. It helps maintain a structured development process.
  • Representational State Transfer API (REST API): An architectural style for designing networked applications. REST APIs use HTTP requests to perform standard operations such as creating, reading, updating, or deleting (CRUD) resources. An example is a service requesting data from another web service.
  • Workflow Agnostic: Describes a tool or technology designed to work with a range of processes and workflows, rather than being restricted to a single method or approach.
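The REST API and Payload entries describe CRUD operations carried out with HTTP requests, with data sent in the request body. The sketch below shows the conventional mapping of CRUD operations to HTTP methods using Python's standard `urllib.request.Request`; it only builds the requests and never sends them. The endpoint URL, the `build_request` helper and the JSON fields are hypothetical and are not part of any TRE-FX API.

```python
import json
from urllib.request import Request

# Hypothetical endpoint, for illustration only; not a real TRE-FX URL.
BASE_URL = "https://submission.example.org/api/tasks"

# Conventional CRUD-to-HTTP mapping (some APIs use PATCH rather than PUT for updates).
CRUD_TO_HTTP = {"create": "POST", "read": "GET", "update": "PUT", "delete": "DELETE"}


def build_request(operation, resource_id=None, payload=None):
    """Build (but do not send) a REST request for a CRUD operation."""
    method = CRUD_TO_HTTP[operation]
    url = BASE_URL if resource_id is None else f"{BASE_URL}/{resource_id}"
    # The payload, if any, is serialised as JSON into the request body.
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


for req in (
    build_request("create", payload={"project": "demo"}),
    build_request("read", resource_id="42"),
    build_request("delete", resource_id="42"),
):
    print(req.get_method(), req.full_url)
```

Passing a built `Request` to `urllib.request.urlopen` would actually send it; keeping construction separate makes the CRUD mapping easy to inspect without network access.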