This issue presents a demo of the Attested-TLS (aTLS) protocol in action, utilising a Large Language Model (LLM) chat application as a practical use case. The primary goal is to demonstrate the functionalities and benefits of aTLS within a Confidential Computing environment, where hardware-based encryption ensures confidentiality and integrity protection for data and software.
This demo is intended for all ELIXIR BioHackathon participants to try, review, and gain insights into aTLS, which may later be leveraged through a loadable extension for the GA4GH-SDK/CLI. The ability to load custom extensions depends on the implementation of feat: add Extensions Manager.
aTLS Trust Model
The protocol is designed to allow independent client-side verification of:
Checksums of the entire software stack on the remote Virtual Machine (VM) including the VM firmware, virtual TPM (Trusted Platform Module), root file system, kernel and modules, Linux security policies (SELinux, IMA, EVM) and the application-level software services (e.g., ELIXIR TESK, OLLAMA inference engine, or any other software). Having a reference set of hash-sums (or 'checksums') of the trustworthy privacy-preserving software, this verification leads to a user's confidence that the data will be processed in the expected way, ensuring the user's privacy policy and purpose-limitation for their data.
Complete isolation of the VM's memory through hardware-based encryption. This ensures that any data a user might send to the VM will not be visible even to host administrators with physical access.
Upon successful verification, a user can infer that the system will remain untampered, provided the received hashes confirm the presence and enforcement of SELinux, IMA, and EVM policies that restrict malicious behaviour on the system. In brief, these modules allow writing policies that control actions on the system at the kernel level.
Please refer to the paper draft Universal Toolbox for Trust-Enhancing Technologies to learn more about the aTLS verification chain and trust model it delivers. You are welcome to leave comments or suggestions.
Objectives
Showcase aTLS: While the LLM chat serves as a practical use case, the primary objective is to demonstrate aTLS in action and explain the key verification steps it involves.
Enable Understanding for GA4GH-SDK/CLI Integration: This demo provides participants with insights into aTLS, preparing them for its potential integration into the GA4GH-SDK/CLI via an optional loadable extension.
Instructions:
Clone the Confidential AI Example repository on your machine (Mac/Linux, any CPU architecture):
Observe the initialisation of the connection with the provided nonce, a random value that ensures freshness and protects the protocol from replay attacks:
pablo-mbp:confidential-ai-example pavelnikonorov$ make chat
docker run --platform linux/amd64 -it confidential-ai-example python3 confai-promter.py
[2024-11-05T13:51:01Z INFO confido::atls] fn aTLS_connect: "http://api.genxt.ai:9000"
[2024-11-05T13:51:01Z DEBUG confido::atls] host: "http://api.genxt.ai:9000"
[2024-11-05T13:51:01Z INFO confido::atls] Connecting to http://api.genxt.ai:9000 to gather evidence, nonce: "2sV85XlR6RarfWjnYHvcRfTliflaQqHYCLmMOIchRsI="
You can see confido in the logs. It's a confidential computing middleware – a library that implements the aTLS protocol and is being developed by GENXT.
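To illustrate what the nonce does, here is a minimal Python sketch. It is not confido's actual code. It assumes a 32-byte random nonce encoded as base64 (consistent with the 44-character value in the log above), and the function names `new_nonce` and `nonce_is_fresh` are hypothetical. The client accepts evidence only if it echoes the exact nonce issued for this connection, so evidence recorded during an earlier session fails the check.

```python
import base64
import hmac
import secrets


def new_nonce(num_bytes: int = 32) -> str:
    """Return a fresh random nonce, base64-encoded (assumed wire format)."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def nonce_is_fresh(sent_nonce: str, nonce_in_evidence: str) -> bool:
    """Accept evidence only if it echoes the nonce sent for this connection."""
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sent_nonce.encode("ascii"),
                               nonce_in_evidence.encode("ascii"))
```

A mismatch between the nonce that was sent and the nonce found in the evidence is the sign of stale or replayed evidence; there is no single "correct" nonce value, only the one generated for the current connection.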
Observe the aTLS Evidence message, which consists of:
TEE Report – CPU-signed measurements of the VM's boot-time state that can only be generated by software running within a TEE-based VM. The report includes details such as the CPU model, firmware version, and its unique hardware ID, as well as the measurements of the VM firmware, and additional data provided by the in-VM attestation agent, which, in this case, includes the vTPM Public Endorsement Key, while the private key is securely stored within the VM at VMPL0 (Virtual Machine Privilege Level, essentially a TEE within the TEE). This setup makes the VM's TPM an entirely independent software agent. Refer to OpenHCL to learn more about it.
TPM Quote – TPM-signed measurements of the run-time VM state extended to Platform Configuration Registers (PCRs) according to IMA security policies. In this example, the PCRs track only kernel modules, though a full log may contain up to twenty thousand hashes.
The TPM Quote is signed by the TPM over a client nonce concatenated with the TLS certificate. This combined nonce ensures that the aTLS certificate is securely linked to the aTLS Evidence message.
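The binding between the certificate and the evidence can be sketched as follows. This is an illustration under assumptions, not confido's implementation: the combined nonce is assumed to be the SHA-256 digest of the client nonce followed by the DER-encoded certificate, and `combined_nonce` and `quote_binds_certificate` are hypothetical names. Verifying the TPM's signature over the Quote is a separate step that is not shown.

```python
import hashlib
import hmac


def combined_nonce(client_nonce: bytes, tls_cert_der: bytes) -> bytes:
    """Derive the value the TPM is asked to include in the signed Quote.

    Assumption: SHA-256 over nonce || certificate; the real encoding may differ.
    """
    return hashlib.sha256(client_nonce + tls_cert_der).digest()


def quote_binds_certificate(quote_extra_data: bytes,
                            client_nonce: bytes,
                            tls_cert_der: bytes) -> bool:
    """Check that the Quote's signed extra data matches our nonce and the presented certificate."""
    expected = combined_nonce(client_nonce, tls_cert_der)
    return hmac.compare_digest(quote_extra_data, expected)
```

If an attacker swaps in a different TLS certificate, the recomputed value no longer matches the data covered by the TPM signature, so the check fails.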
Observe how a VCEK (Versioned Chip Endorsement Key) certificate is obtained from the CPU vendor's website by providing the CPU hardware ID (taken from the TEE Report):
[2024-11-05T13:51:05Z DEBUG ureq::pool] adding stream to pool: https|kdsintf.amd.com|443 -> Stream(RustlsStream)
[2024-11-05T13:51:05Z DEBUG ureq::unit] response 200 to GET https://kdsintf.amd.com/vcek/v1/Genoa/2ba5395d906e01b34155c4dcfa440a744ec18afde6948810230fc05e20521d58935d3895ca2a52d87159daa06a178fdbbbb67eb8150627cfe4cd80e05df2e885?blSPL=08&teeSPL=00&snpSPL=16&ucodeSPL=68
The VCEK certificate is used to verify the genuineness of the TEE Report, and thereby to establish trust in the remote CPU hardware and in the fact that this aTLS Evidence was generated by software running in isolation.
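The request in the log above follows a recognisable pattern: product name, hardware ID, then the TCB version components as query parameters. The sketch below only rebuilds that URL. The helper name `vcek_url` is hypothetical, and the two-digit zero padding is an assumption taken from the logged request. It performs no network access and does not validate the returned certificate.

```python
KDS_BASE = "https://kdsintf.amd.com/vcek/v1"


def vcek_url(product: str, hw_id_hex: str,
             bl_spl: int, tee_spl: int, snp_spl: int, ucode_spl: int) -> str:
    """Build the VCEK request URL from values found in the TEE Report."""
    bytes.fromhex(hw_id_hex)  # raises ValueError if the ID is not valid hex
    query = (f"blSPL={bl_spl:02d}&teeSPL={tee_spl:02d}"
             f"&snpSPL={snp_spl:02d}&ucodeSPL={ucode_spl:02d}")
    return f"{KDS_BASE}/{product}/{hw_id_hex}?{query}"
```

The client does not compare the VCEK certificate with another certificate for equality. Instead, the public key in the VCEK certificate is used to check the signature on the TEE Report, and the certificate itself is expected to chain up to the vendor's root of trust.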
The boot_aggregate entry in the IMA log is a hash computed over the values of PCRs 0-9, which are responsible for specific measurements:
PCR 0 – Core Root of Trust for Measurement (CRTM) and BIOS/UEFI firmware.
PCR 1 – Platform configuration and firmware extensions.
PCR 2 – Option ROMs (e.g., firmware on GPUs or network cards).
PCR 3 – Bootloader.
PCR 4 – Kernel and OS components.
PCR 5 – Additional OS configuration files or modules.
PCR 6 – Initial RAM disk (initrd/initramfs).
PCR 7 – System security policies and secure boot.
Note that PCRs can be configured to use SHA-256 as well.
Observe the TPM PCR10 verification log against the obtained reference values. Here is an abridged example of the output log:
[2024-11-05T13:51:06Z INFO confido::trs] Calculating PCR10 value from IMA measurements
[2024-11-05T13:51:06Z DEBUG confido::trs] boot_aggregate boot_aggregate Match
[2024-11-05T13:51:06Z DEBUG confido::trs] autofs4.ko autofs4.ko Match
[2024-11-05T13:51:06Z DEBUG confido::trs] cryptd.ko cryptd.ko Match
[2024-11-05T13:51:06Z DEBUG confido::trs] crypto_simd.ko crypto_simd.ko Match
[2024-11-05T13:51:06Z DEBUG confido::trs] aesni-intel.ko aesni-intel.ko Match
[2024-11-05T13:51:06Z DEBUG confido::trs] ima-ng: nf_conntrack_netlink.ko, hash_value: 6ea446257cd809cf42bb759af00d24a8bb4b0dc4
[2024-11-05T13:51:06Z DEBUG confido::trs] ima-ng: nf_nat.ko, hash_value: eb37210983d620f3d8af9151eb5b7aa60120a4a7
[2024-11-05T13:51:06Z DEBUG confido::trs] ima-ng: xt_MASQUERADE.ko, hash_value: 9a7a9235231e25f37b27117763917580fc100976
[2024-11-05T13:51:06Z DEBUG confido::trs] ima-ng: nft_chain_nat.ko, hash_value: 9865f5fdd1e48faa633f092f420da618e56a325b
[2024-11-05T13:51:06Z DEBUG confido::trs] ima-ng: veth.ko, hash_value: efd5b1d9cbf646106d2a72ac63b89230e8ddce94
[2024-11-05T13:51:06Z DEBUG confido::trs] ima-ng: xt_nat.ko, hash_value: e0c70cd63480cb79c56a5e41bb9d4e0f4c34dcca
[2024-11-05T13:51:06Z DEBUG confido::evidence] Expected PCR 10: sha1: 0x71185871a0e3f31dc55c729c7426cb922f57fd39
[2024-11-05T13:51:06Z DEBUG confido::evidence] Server's PCR 10: sha1: 0x71185871a0e3f31dc55c729c7426cb922f57fd39
[2024-11-05T13:51:06Z INFO confido::evidence] PCR 10 matches
[2024-11-05T13:51:06Z INFO confido::evidence] Evidence successfully verified
[2024-11-05T13:51:06Z INFO confido::atls] aTLS certificate verification successful
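The "Calculating PCR10 value from IMA measurements" step can be sketched as a replay of the TPM extend operation over the measurement log. This is a simplified illustration with hypothetical function names, not confido's code. It assumes a SHA-1 PCR bank, as in the log above, and it treats each input as the digest that was actually extended into the PCR. In real IMA logs that digest is the per-entry template hash rather than the file hash shown in the `hash_value` fields.

```python
import hashlib
import hmac
from typing import Iterable


def extend(pcr: bytes, measurement: bytes, algorithm: str = "sha1") -> bytes:
    """TPM extend: new PCR value = H(old PCR value || measurement)."""
    return hashlib.new(algorithm, pcr + measurement).digest()


def replay_pcr(measurements: Iterable[bytes], algorithm: str = "sha1") -> bytes:
    """Recompute a PCR from an ordered measurement log, starting from all zeros."""
    pcr = bytes(hashlib.new(algorithm).digest_size)
    for measurement in measurements:
        pcr = extend(pcr, measurement, algorithm)
    return pcr


def pcr_matches(measurements: Iterable[bytes], quoted_pcr: bytes) -> bool:
    """Compare the replayed value with the PCR value taken from the signed TPM Quote."""
    return hmac.compare_digest(replay_pcr(measurements), quoted_pcr)
```

Since each extend step depends on the previous value, the final PCR matches only if every entry in the log is present, unmodified, and in the original order.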
Try the LLM chat by asking any question! Here is an example conversation:
[2024-11-05T13:51:06Z INFO confido::atls] aTLS certificate verification successful
Welcome to the Interactive Chat!
Type 'exit' or 'quit' to end the conversation
Type 'clear' to start a new conversation
--------------------------------------------------
You > do you think GA4GH TRS fits the role of a repository that stores reference hashes for Attested-TLS protocol?
Assistant: Yes, I understand that GA4GH TRS can potentially serve as a database of reference hashing for Attested-TLS protocol. This could be useful in securing the communication between different parties in various genome data sharing scenarios.
Provide Feedback: Share any insights, suggestions, or questions here to help refine the aTLS demo and its future integration as a GA4GH-SDK/CLI extension.
Expected Outcome:
This demo should help participants:
Gain a clear understanding of what aTLS is and how it can secure interactions in confidential environments.
See the potential value of integrating the aTLS client library as an optional extension to the GA4GH-SDK/CLI, enabling researchers to establish secure, verifiable connections across various GA4GH API services in federated environments.
Contribute to discussions on the benefits and usability of aTLS, guiding its development, use case discovery, and potential adoption.
The "Evidence successfully verified" message is good feedback. It would be nice to have more explanation where the software does not say what the expected result should be. Some examples below:
"Observe the initialisation of the connection with the provided nonce, a random value that ensures freshness and protects the protocol from replay attacks:"
For a beginner, it would be nice to explain what a correct nonce looks like and what would indicate a replay attack.
confido: could the "library that implements the Attested-TLS" be reviewed by a user?
"TEE Report – CPU-signed measurements of the VM's boot-time state that can only be generated by software running within a TEE-based VM." Here, an image or screenshot with marked areas showing where to look, with an explanation of what values to expect, would be helpful.
Compare the VCEK certificate of the VM in use to the manufacturer's ID-related cert? More detailed steps would help. Should the certs be equal?
Thanks for explaining the PCR values.
pavelnikonorov changed the title from "demo: LLM inference service with Attested-TLS (Confidential Computing)" to "demo: Private LLM Chat through Attested-TLS (Confidential Computing)" on Nov 5, 2024
From my understanding, the demo process involves obtaining the aTLS certificate from http://api.genxt.ai:9000 and using it like an SSL context to send HTTP requests.
Having a reference set of hash-sums (or 'checksums') of the trustworthy privacy-preserving software, this verification leads to a user's confidence that the data will be processed in the expected way, ensuring the user's privacy policy and purpose-limitation for their data.
I believe this means that the server provides proof at the CPU or VM level (such as the TEE Report, TPM Quote, etc.) to confirm a secure environment, and the client trusts these attestations.
Does this imply that we can trust the system all the way down to the software level where the actual processing occurs? For example, if a Python script within a workflow maliciously sends data to an external source, would this setup be able to prevent it?
Additionally, there is no server-side implementation in this demo. How challenging would such an implementation be in practice?
Description:
Confidential Computing is an umbrella term for various CPU/GPU-based implementations of Trusted Execution Environments (TEEs). To learn more about it, please refer to the 5-minute explainer video and aTLS presentation on the GA4GH Connect 2024 Session "Towards GA4GH-Powered SPEs/TREs".
Instructions:
cd confidential-ai-example
make build
make chat
See how the reference values are currently stored.
Observe how aTLS is leveraged in this Python chat app. Please note that in this example, confido is compiled with a preconfigured TRS registry, while in a version for the GA4GH-SDK ExtensionsManager it will be possible to configure it manually.
Additional Resources