Processing sensitive data #30
Open: svedziok wants to merge 4 commits into `main` from `processing_sensitive_data`

@@ -0,0 +1,60 @@
# Processing Sensitive Data

Processing sensitive human data is fundamental to biomedical research, enabling breakthroughs in disease understanding, biomarker detection, and treatment development.
Rapid and secure access to such data accelerates research but also introduces significant responsibilities for data protection and privacy.
Cloud-based services are increasingly used in biomedical research to connect researchers, data, and tools throughout the data lifecycle.
This page summarizes scenarios and requirements for handling sensitive data within the ELIXIR-on-Cloud framework.

## Legal Frameworks

Sensitive data processing in research is governed by several legal frameworks, most notably the General Data Protection Regulation (GDPR) and the European Health Data Space (EHDS):

* **GDPR**: Allows the use of sensitive personal data, such as health or genetic data, for research when appropriate safeguards are in place. Processing is permitted for scientific research or in the public interest, provided measures such as data minimization, pseudonymization, and strict access controls are implemented. Explicit informed consent is often required, and a Data Protection Impact Assessment (DPIA) is strongly recommended; it is mandatory whenever processing is likely to result in a high risk, for example large-scale processing of health or genetic data.
* **EHDS**: Builds on GDPR by establishing a unified framework for the secure sharing and secondary use of electronic health data across the EU. Under EHDS, sensitive health data (e.g., genetic data or clinical records) can be reused for research, innovation, and policy-making if it is anonymized or pseudonymized and accessed through secure processing environments (SPEs).

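
To make data minimization and pseudonymization more concrete, the Python sketch below keeps only an allowlisted set of fields and replaces the subject identifier with a keyed hash (HMAC-SHA256). The field names, the allowlist, and the key handling are hypothetical placeholders, not part of any ELIXIR-on-Cloud service. Because whoever holds the key can re-link pseudonyms to subjects, the output is still personal data under GDPR and must be protected accordingly.

```python
import hashlib
import hmac

# Hypothetical allowlist: only the fields needed for the analysis are kept.
ALLOWED_FIELDS = ("age_group", "diagnosis_code", "variant")


def pseudonymize_id(subject_id: str, secret_key: bytes) -> str:
    """Replace a subject identifier with a keyed hash (HMAC-SHA256).

    Whoever holds secret_key can recompute the mapping, so the key must be
    stored separately from the pseudonymized data.
    """
    return hmac.new(secret_key, subject_id.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize_record(record: dict, secret_key: bytes) -> dict:
    """Drop every field not on the allowlist and pseudonymize the subject ID."""
    minimized = {"pseudonym": pseudonymize_id(record["subject_id"], secret_key)}
    for field in ALLOWED_FIELDS:
        if field in record:
            minimized[field] = record[field]
    return minimized


if __name__ == "__main__":
    # Hypothetical key; in practice it would come from a managed secret store.
    key = b"example-key-held-by-the-data-controller"
    raw = {
        "subject_id": "P-0001",
        "name": "Jane Doe",
        "email": "jane@example.org",
        "age_group": "40-49",
        "diagnosis_code": "E11",
    }
    print(minimize_record(raw, key))  # name and email are gone, subject_id is replaced
```

An allowlist is used rather than a blocklist so that any new, unexpected field is dropped by default, which matches the data-minimization principle.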
## Environments

* A Trusted Execution Environment (TEE) is a secure, isolated area within a processor or computer system that ensures the confidentiality and integrity of code and data during execution. It aims to protect sensitive computations and data from threats such as malware or unauthorized access.
* A Secure Processing Environment (SPE) is a controlled environment designed for secure data processing and analysis while maintaining confidentiality, integrity, and privacy. It focuses on secure processing techniques, often including encryption, secure computation, or secure enclaves, to protect data during computation.
* A Trusted Research Environment (TRE) is a secure, controlled environment tailored specifically to research, providing secure data access, analysis, collaboration, and compliance with legal and ethical requirements. TREs emphasize privacy preservation, data governance, collaboration, and knowledge generation while protecting sensitive data.

### Similarities

* **Isolation**: Operates separately from the main platform it runs on.
* **Security**: Provides a secure environment for computation and data storage, including cryptographic key management and protection against malware.
* **Integrity**: Ensures the integrity of data and code within the environment.
* **Confidentiality**: Keeps sensitive information confidential even if the surrounding system is compromised.
* **Controlled Access and Authentication**: Authenticates code and data before execution so that only trusted, verified code runs.
* **Collaboration and Analysis**: Offers tools and infrastructure that enable researchers to analyze data and collaborate within a secure environment, so that datasets can be shared and combined while maintaining data privacy.

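
The "authenticate before execution" property can be illustrated with a simplified measure-and-compare check: the environment computes a SHA-256 digest of a workload and admits it only if that digest is on an allowlist of approved workloads. This Python sketch is an analogy only; real TEEs rely on hardware-backed measurement and remote attestation with signed reports, and the `ExecutionGate` class and workload bytes here are hypothetical.

```python
import hashlib


def measure(workload: bytes) -> str:
    """Return the SHA-256 digest ("measurement") of a workload."""
    return hashlib.sha256(workload).hexdigest()


class ExecutionGate:
    """Hypothetical gate that only admits workloads with an approved measurement."""

    def __init__(self, approved_digests: set[str]):
        self.approved = set(approved_digests)

    def is_trusted(self, workload: bytes) -> bool:
        return measure(workload) in self.approved

    def run(self, workload: bytes) -> str:
        if not self.is_trusted(workload):
            raise PermissionError("workload measurement not on the allowlist")
        # A real environment would now execute the verified code in isolation;
        # this sketch only reports that the workload was admitted.
        return f"admitted {measure(workload)[:12]}"


if __name__ == "__main__":
    approved_script = b"print('analysis v1.0')"
    gate = ExecutionGate({measure(approved_script)})
    print(gate.run(approved_script))
    try:
        gate.run(b"print('tampered')")
    except PermissionError as err:
        print("rejected:", err)
```

Any change to the workload, even a single byte, changes its digest, so a modified script is rejected unless its new measurement is explicitly approved.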
### Differences

* **Focus**:
  * TEE: Secures startup, code, and data during execution.
  * SPE: Ensures secure data processing and computation.
  * TRE: Provides a comprehensive, secure environment for research activities, including data access and compliance.
* **Data Handling**:
  * TEE: Focuses on securing the execution of code and the data it processes.
  * SPE: Involves secure data processing and temporary storage for processing purposes.
  * TRE: Covers secure data storage, access controls, and privacy-preserving methods.
* **Application Context**:
  * TEE: Used in secure mobile device environments and secure cloud computing.
  * SPE: Applied in secure data analytics and cryptographic computations.
  * TRE: Tailored for research involving sensitive datasets, such as healthcare research.

## Use Cases

Researchers may require access to sensitive data in different scenarios.
We distinguish four use cases along two dimensions: where the data is stored and where it is processed.
Research data can be stored in a single location (central data) or distributed across multiple locations and institutions (federated data).
It can be processed either in the cloud or in the researcher's own local environment.

| Storage / Processing | Local processing   | Cloud processing     |
| -------------------- | ------------------ | -------------------- |
| **Central data**     | Data repository    | Cloud platform       |
| **Federated data**   | Federated database | Federated processing |

* **Data repository**: Data is stored in a single database. Researchers request access and, once authorized, transfer the encrypted data to their own secure environment for analysis.
* **Federated database**: Data is distributed across multiple nodes. Metadata is accessible via APIs and a central portal. Researchers request access to datasets at individual nodes, which then provide the data for transfer or processing.
* **Cloud platform**: Centralized sensitive data is hosted on a cloud platform. Authorized users log in and analyze the data directly within an SPE, using workflows or interactive tools.
* **Federated processing**: Sensitive data remains on separate nodes, and its transfer is restricted. Analysis is performed via APIs in an SPE, often combining results from multiple sources. A special case is federated learning, where a model is trained over several iterations: each node updates the model on its own dataset, and only the model updates, not the data, are shared and aggregated.
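
Federated learning is often illustrated with federated averaging (FedAvg), one common aggregation scheme; the text above does not prescribe a particular one. The Python sketch below trains a one-parameter linear model `y = w * x`: each hypothetical node runs a few local gradient-descent steps on its own data, and a coordinator averages the resulting weights, weighted by each node's sample count. Only model parameters and counts leave the nodes. The node data, learning rate, and round count are assumptions, and real deployments would add secure aggregation and other privacy protections that are not modeled here.

```python
def local_update(w: float, data: list[tuple[float, float]], lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few epochs of gradient descent on mean squared error, on one node only."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w: float, nodes: list[list[tuple[float, float]]]) -> float:
    """One FedAvg round: every node updates the shared weight, the coordinator averages."""
    # Each node returns only its updated weight and its sample count, never raw data.
    updates = [(local_update(w, data), len(data)) for data in nodes]
    total = sum(n for _, n in updates)
    # Weighted average: nodes with more samples contribute more.
    return sum(w_k * n for w_k, n in updates) / total


if __name__ == "__main__":
    # Three hypothetical nodes whose data roughly follow y = 3x.
    nodes = [
        [(1.0, 3.1), (2.0, 5.9)],
        [(3.0, 9.2), (4.0, 11.8), (5.0, 15.1)],
        [(1.5, 4.4)],
    ]
    w = 0.0
    for _ in range(20):
        w = federated_round(w, nodes)
    print(f"global weight after 20 rounds: {w:.3f}")
```

The coordinator never sees individual `(x, y)` pairs; it only receives one weight and one count per node per round.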

@@ -0,0 +1,7 @@
# ELIXIR-on-Cloud documentation

The ELIXIR-on-Cloud project is an initiative of the ELIXIR Compute Platform.
Our goal is to support scientists across Europe in using cloud environments for their research activities.
We support the use of ELIXIR services as well as open-source software, and the project has close connections with various academic cloud providers.
One of our key focuses is developing and providing software that implements the specifications defined by the Global Alliance for Genomics and Health (GA4GH) for federated processing of workloads ([GA4GH Cloud Work Stream](https://www.ga4gh.org/work_stream/cloud/)).
This documentation offers guidance and best practices on how to use our services, develop them further, and deploy services within the ELIXIR-on-Cloud framework.
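
As an illustration of the GA4GH Cloud Work Stream specifications, the Python sketch below prepares a request for the Task Execution Service (TES) API, one of the specifications in that work stream. A TES task bundles input and output locations, compute resources, and one or more containerized executors, and is submitted with `POST /tasks`. The endpoint URL, container image, task name, and storage locations are hypothetical, and the snippet only builds the request without sending it; check the TES specification and the documentation of a concrete deployment before relying on it.

```python
import json
import urllib.request

# Hypothetical TES endpoint; replace with the URL of a real TES deployment.
TES_BASE_URL = "https://tes.example.org/ga4gh/tes/v1"


def build_tes_task(image: str, command: list[str], input_url: str, output_url: str) -> dict:
    """Assemble a minimal TES task: one input file, one container step, one output file."""
    return {
        "name": "md5sum-example",  # hypothetical task name
        "inputs": [{"url": input_url, "path": "/data/input.txt", "type": "FILE"}],
        "outputs": [{"url": output_url, "path": "/data/checksum.txt", "type": "FILE"}],
        "resources": {"cpu_cores": 1, "ram_gb": 1.0},
        "executors": [
            {
                "image": image,
                "command": command,
                # Executor stdout is written to this path inside the container.
                "stdout": "/data/checksum.txt",
            }
        ],
    }


def make_create_request(task: dict, base_url: str = TES_BASE_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST /tasks request for a TES server."""
    return urllib.request.Request(
        f"{base_url}/tasks",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    task = build_tes_task(
        image="alpine:3.19",
        command=["md5sum", "/data/input.txt"],
        input_url="s3://example-bucket/input.txt",  # hypothetical locations
        output_url="s3://example-bucket/checksum.txt",
    )
    req = make_create_request(task)
    print(req.get_method(), req.full_url)
    # Sending would be urllib.request.urlopen(req); a TES server replies with the new task ID.
```

Keeping task construction separate from submission makes it easy to inspect or validate a task description before it reaches a compute node that may hold sensitive data.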