demo: Private LLM Chat through Attested-TLS (Confidential Computing) #17

Open
pavelnikonorov opened this issue Nov 5, 2024 · 2 comments

Comments


pavelnikonorov commented Nov 5, 2024

Description:

This issue presents a demo of the Attested-TLS (aTLS) protocol in action, utilising a Large Language Model (LLM) chat application as a practical use case. The primary goal is to demonstrate the functionalities and benefits of aTLS within a Confidential Computing environment, where hardware-based encryption ensures confidentiality and integrity protection for data and software.

Confidential Computing is an umbrella term for various CPU/GPU-based implementations of Trusted Execution Environments (TEEs). To learn more about it, please refer to the 5-minute explainer video and aTLS presentation on the GA4GH Connect 2024 Session "Towards GA4GH-Powered SPEs/TREs".

Purpose of the Demo:

This demo is intended for all ELIXIR BioHackathon participants to try, review, and gain insights into aTLS, which may later be leveraged through a loadable extension for the GA4GH-SDK/CLI. The ability to load custom extensions depends on the implementation of feat: add Extensions Manager.

aTLS Trust Model

The protocol is designed to allow independent client-side verification of:

  • Checksums of the entire software stack on the remote Virtual Machine (VM) including the VM firmware, virtual TPM (Trusted Platform Module), root file system, kernel and modules, Linux security policies (SELinux, IMA, EVM) and the application-level software services (e.g., ELIXIR TESK, OLLAMA inference engine, or any other software). Having a reference set of hash-sums (or 'checksums') of the trustworthy privacy-preserving software, this verification leads to a user's confidence that the data will be processed in the expected way, ensuring the user's privacy policy and purpose-limitation for their data.

  • Complete isolation of the VM's memory through hardware-based encryption. This ensures that any data a user might send to the VM will not be visible even to host administrators with physical access.

Upon successful verification, a user can infer that the system will remain untampered, provided the received hashes confirm the presence and enforcement of SELinux, IMA, and EVM policies, which restrict malicious behaviour on the system. In brief, these modules allow writing policies that control every possible action on the system at the kernel level.

Please refer to the paper draft Universal Toolbox for Trust-Enhancing Technologies to learn more about the aTLS verification chain and trust model it delivers. You are welcome to leave comments or suggestions.

Objectives

  • Showcase aTLS: While the LLM chat serves as a practical use case, the primary objective is to demonstrate aTLS in action and explain the key verification steps it involves.
  • Enable Understanding for GA4GH-SDK/CLI Integration: This demo provides participants with insights into aTLS, preparing them for its potential integration into the GA4GH-SDK/CLI via an optional loadable extension.

Instructions:

  1. Clone the Confidential AI Example repository on your machine (Mac/Linux, any CPU architecture):
git clone https://github.com/genxnetwork/confidential-ai-example.git
  2. Build the app container and run it:
cd confidential-ai-example
make build
make chat
  3. Observe the initialisation of the connection with the provided nonce, a random value that ensures freshness and protects the protocol from replay attacks:
pablo-mbp:confidential-ai-example pavelnikonorov$ make chat
docker run --platform linux/amd64 -it confidential-ai-example python3 confai-promter.py
[2024-11-05T13:51:01Z INFO  confido::atls] fn aTLS_connect: "http://api.genxt.ai:9000"
[2024-11-05T13:51:01Z DEBUG confido::atls] host: "http://api.genxt.ai:9000"
[2024-11-05T13:51:01Z INFO  confido::atls] Connecting to http://api.genxt.ai:9000 to gather evidence, nonce: "2sV85XlR6RarfWjnYHvcRfTliflaQqHYCLmMOIchRsI="

You can see confido in the logs. It's a confidential computing middleware – a library that implements the aTLS protocol and is being developed by GENXT.
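
The nonce in the log above is a Base64 string that decodes to 32 bytes. Here is a minimal sketch of how a client might produce such a value, assuming a cryptographically secure random source. The function name `new_nonce` is hypothetical and is not taken from confido:

```python
import base64
import secrets


def new_nonce(num_bytes: int = 32) -> str:
    """Return a fresh random nonce, Base64-encoded like the one in the log."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


# Each connection gets its own nonce, so evidence recorded for an earlier
# nonce does not verify for a client that checks for its own value.
nonce = new_nonce()
print(nonce)
```

Because the client picks the value, evidence that verifies against it must have been produced after the nonce was sent, which is what rules out a replay of older evidence.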

  4. Observe the aTLS Evidence message, which comprises:
  • TEE Report – CPU-signed measurements of the VM's boot-time state that can only be generated by software running within a TEE-based VM. The report includes details such as the CPU model, firmware version, and its unique hardware ID, as well as the measurements of the VM firmware, and additional data provided by the in-VM attestation agent, which, in this case, includes the vTPM Public Endorsement Key, while the private key is securely stored within the VM at VMPL0 (Virtual Machine Privilege Level, essentially a TEE within the TEE). This setup makes the VM's TPM an entirely independent software agent. Refer to OpenHCL to learn more about it.

  • TPM Quote – TPM-signed measurements of the run-time VM state extended to Platform Configuration Registers (PCRs) according to IMA security policies. In this example, the PCRs track only kernel modules, though a full log may contain up to twenty thousand hashes.

[2024-11-05T13:51:04Z DEBUG confido::atls] aTLS response: Evidence {
        report: "48434c410200000084.....00000000000000000000",
        quote: Quote {
            signature: "2ea14a.....f28519b4",
            message: "ff544.....fddd45b7a1",
            pcr_banks: [
                {
                    algo: Sha1,
                    pcr_values: [
                        "733a683619d3dd46dbf8d78a268076b2fbececd8",
                        "f124a1e2bd2229a347c90475db881922da6fd719",
                        "b2a83b0ebf2f8374299a5b2bdfc31ea955ad7236",
                        "b2a83b0ebf2f8374299a5b2bdfc31ea955ad7236",
                        "ae7939e21358bf8cd0ac5b24397a9265c48d9f56",
                        "e38330d740b4640ad84c34d50ad1bffea7bf20c0",
                        "6436981b752f8e08a10c0f2ae6b5cbe65e04da91",
                        "5028147c7baa7967821bd1868cd057706da8e304",
                        "0000000000000000000000000000000000000000",
                        "70d88a5df6072098e0421bf9dcd02a83eab28e7a",
                        "71185871a0e3f31dc55c729c7426cb922f57fd39",
                        "0000000000000000000000000000000000000000",
                        "2a6d6d4124b1ec83a4d5a69111fb23711e36170f",
                        "0000000000000000000000000000000000000000",
                        "157e07e56e25148f3e4d306a73f82664ec2f80db",
                        "0000000000000000000000000000000000000000",
                        "0000000000000000000000000000000000000000",
                        "ffffffffffffffffffffffffffffffffffffffff",
                        "ffffffffffffffffffffffffffffffffffffffff",
                        "ffffffffffffffffffffffffffffffffffffffff",
                        "ffffffffffffffffffffffffffffffffffffffff",
                        "ffffffffffffffffffffffffffffffffffffffff",
                        "ffffffffffffffffffffffffffffffffffffffff",
                        "0000000000000000000000000000000000000000",
                    ]
                },
            ],
        },
    }
[2024-11-05T13:51:04Z INFO  confido::atls] aTLS response successfully received
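
To make the layout of the Evidence message easier to follow, here is a small sketch that mirrors the fields printed in the debug output above (`report`, `quote`, `signature`, `message`, `pcr_banks`, `algo`, `pcr_values`) and checks only the shape of the PCR banks. The class names, the `check_shape` helper, and the `Sha256` entry are assumptions made for illustration; this does not verify any signature and is not confido's implementation:

```python
from dataclasses import dataclass
from typing import List

# Hex characters per digest. "Sha1" appears in the log; "Sha256" is assumed.
DIGEST_HEX_LEN = {"Sha1": 40, "Sha256": 64}
NUM_PCRS = 24  # the bank in the log above lists 24 values


@dataclass
class PcrBank:
    algo: str
    pcr_values: List[str]


@dataclass
class Quote:
    signature: str
    message: str
    pcr_banks: List[PcrBank]


@dataclass
class Evidence:
    report: str  # hex-encoded TEE Report
    quote: Quote


def check_shape(evidence: Evidence) -> None:
    """Raise ValueError if a PCR bank is malformed. No signatures are checked."""
    for bank in evidence.quote.pcr_banks:
        expected = DIGEST_HEX_LEN[bank.algo]
        if len(bank.pcr_values) != NUM_PCRS:
            raise ValueError(f"{bank.algo}: expected {NUM_PCRS} PCR values")
        for index, value in enumerate(bank.pcr_values):
            bytes.fromhex(value)  # raises ValueError on non-hex input
            if len(value) != expected:
                raise ValueError(f"{bank.algo} PCR {index}: wrong digest length")
```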

The TPM Quote is signed by the TPM using a client nonce concatenated with the TLS certificate. This combined nonce ensures that the aTLS certificate is securely linked to the aTLS Evidence message.
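
The binding between the nonce and the certificate can be sketched as follows. This is only an illustration: the use of SHA-256, the concatenation order, and the names `binding_value` and `is_bound` are assumptions, since the exact encoding used by confido is not described here:

```python
import hashlib
import hmac


def binding_value(client_nonce: bytes, tls_cert_der: bytes) -> bytes:
    """Digest of nonce || certificate (assumed encoding, for illustration)."""
    return hashlib.sha256(client_nonce + tls_cert_der).digest()


def is_bound(quoted_value: bytes, client_nonce: bytes, tls_cert_der: bytes) -> bool:
    """Recompute the value from local inputs and compare in constant time."""
    expected = binding_value(client_nonce, tls_cert_der)
    return hmac.compare_digest(quoted_value, expected)
```

The client recomputes the value from the nonce it generated and the certificate it actually received in the TLS handshake. If either differs from what the TPM signed, the comparison fails, so a valid quote cannot be reused with another certificate or another session.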

  5. Observe obtaining a VCEK (Versioned CPU Endorsement Key) certificate from the CPU vendor website by providing a CPU HW ID (taken from the TEE Report):
[2024-11-05T13:51:05Z DEBUG ureq::pool] adding stream to pool: https|kdsintf.amd.com|443 -> Stream(RustlsStream)
[2024-11-05T13:51:05Z DEBUG ureq::unit] response 200 to GET https://kdsintf.amd.com/vcek/v1/Genoa/2ba5395d906e01b34155c4dcfa440a744ec18afde6948810230fc05e20521d58935d3895ca2a52d87159daa06a178fdbbbb67eb8150627cfe4cd80e05df2e885?blSPL=08&teeSPL=00&snpSPL=16&ucodeSPL=68

The VCEK certificate is used to verify the genuineness of the TEE Report, and thereby to establish trust in the remote CPU hardware and in the fact that this aTLS Evidence was generated by software running in isolation.
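
The request in the log above can be reconstructed from values in the TEE Report. The sketch below only builds the URL and sends nothing over the network. The host, path, product name, and query parameter names are copied from the log line; the function name `vcek_url` and its argument names are hypothetical:

```python
from urllib.parse import urlencode

KDS_BASE = "https://kdsintf.amd.com/vcek/v1"

# CPU hardware ID as it appears in the log above.
CHIP_ID = (
    "2ba5395d906e01b34155c4dcfa440a744ec18afde6948810230fc05e20521d58"
    "935d3895ca2a52d87159daa06a178fdbbbb67eb8150627cfe4cd80e05df2e885"
)


def vcek_url(product: str, chip_id_hex: str, bl: int, tee: int, snp: int, ucode: int) -> str:
    """Build the VCEK lookup URL for a chip ID and its firmware version numbers."""
    query = urlencode({
        "blSPL": f"{bl:02d}",       # two-digit values, as in the log
        "teeSPL": f"{tee:02d}",
        "snpSPL": f"{snp:02d}",
        "ucodeSPL": f"{ucode:02d}",
    })
    return f"{KDS_BASE}/{product}/{chip_id_hex}?{query}"


print(vcek_url("Genoa", CHIP_ID, bl=8, tee=0, snp=16, ucode=68))
```

Because the version numbers are part of the request, the returned certificate corresponds to that specific chip at that specific firmware level, which is what "versioned" in VCEK refers to.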

  6. Observe obtaining reference hashes from TRS (here, Trusted Repository Service). Notably, the GA4GH Tool Repository Service (TRS) API standard might be used, as it also supports checksum storage.
[2024-11-05T13:51:05Z DEBUG confido::trs] TRS host configured: api.genxt.ai
[2024-11-05T13:51:05Z INFO  confido::trs] Fetching TRS for 172.177.22.106
[2024-11-05T13:51:05Z DEBUG reqwest::connect] starting new connection: https://api.github.com/
[2024-11-05T13:51:06Z DEBUG reqwest::async_impl::client] redirecting 'https://api.github.com/repos/genxnetwork/confido-private/contents/bb-attest-172.177.22.106.json' to 'https://api.github.com/repositories/816818055/contents/bb-attest-172.177.22.106.json'

See how the reference values are currently stored.

  7. Observe the verification of the boot_aggregate IMA log entry:
[2024-11-05T13:51:06Z INFO  confido::evidence] calculating boot_aggregate sha1 hash using the vTPM Quote PCR values...
[2024-11-05T13:51:06Z DEBUG confido::evidence] PCR 0: 733a683619d3dd46dbf8d78a268076b2fbececd8
[2024-11-05T13:51:06Z DEBUG confido::evidence] PCR 1: f124a1e2bd2229a347c90475db881922da6fd719
[2024-11-05T13:51:06Z DEBUG confido::evidence] PCR 2: b2a83b0ebf2f8374299a5b2bdfc31ea955ad7236
[2024-11-05T13:51:06Z DEBUG confido::evidence] PCR 3: b2a83b0ebf2f8374299a5b2bdfc31ea955ad7236
[2024-11-05T13:51:06Z DEBUG confido::evidence] PCR 4: ae7939e21358bf8cd0ac5b24397a9265c48d9f56
[2024-11-05T13:51:06Z DEBUG confido::evidence] PCR 5: e38330d740b4640ad84c34d50ad1bffea7bf20c0
[2024-11-05T13:51:06Z DEBUG confido::evidence] PCR 6: 6436981b752f8e08a10c0f2ae6b5cbe65e04da91
[2024-11-05T13:51:06Z DEBUG confido::evidence] PCR 7: 5028147c7baa7967821bd1868cd057706da8e304
[2024-11-05T13:51:06Z INFO  confido::evidence] calculated boot_aggregate sha1 hash: 690106058d5aa385bc44cc09f6a0ad54086298ca
[2024-11-05T13:51:06Z INFO  confido::evidence] Boot aggregate hash matches

The boot_aggregate hash cryptographically covers the values of PCRs 0-7 shown in the log above, each responsible for specific measurements:

  • PCR 0 – Core Root of Trust for Measurement (CRTM) and BIOS/UEFI firmware.
  • PCR 1 – Platform configuration and firmware extensions.
  • PCR 2 – Option ROMs (e.g., firmware on GPUs or network cards).
  • PCR 3 – Bootloader.
  • PCR 4 – Kernel and OS components.
  • PCR 5 – Additional OS configuration files or modules.
  • PCR 6 – Initial RAM disk (initrd/initramfs).
  • PCR 7 – System security policies and secure boot.

Note that PCRs can be configured to work with SHA256 as well.
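
The boot_aggregate calculation can be sketched as below, assuming the Linux IMA convention for the SHA-1 bank, where the value is SHA-1 over the concatenated raw contents of PCRs 0 to 7. Whether confido computes it in exactly this way is an assumption, so the printed result should be compared against the "calculated boot_aggregate sha1 hash" line in the log rather than taken for granted:

```python
import hashlib
from typing import Iterable

# PCR 0..7 values copied from the log above.
PCRS_0_TO_7 = [
    "733a683619d3dd46dbf8d78a268076b2fbececd8",
    "f124a1e2bd2229a347c90475db881922da6fd719",
    "b2a83b0ebf2f8374299a5b2bdfc31ea955ad7236",
    "b2a83b0ebf2f8374299a5b2bdfc31ea955ad7236",
    "ae7939e21358bf8cd0ac5b24397a9265c48d9f56",
    "e38330d740b4640ad84c34d50ad1bffea7bf20c0",
    "6436981b752f8e08a10c0f2ae6b5cbe65e04da91",
    "5028147c7baa7967821bd1868cd057706da8e304",
]


def boot_aggregate_sha1(pcr_hex_values: Iterable[str]) -> str:
    """SHA-1 over the concatenation of the raw (binary) PCR values."""
    digest = hashlib.sha1()
    for value in pcr_hex_values:
        digest.update(bytes.fromhex(value))
    return digest.hexdigest()


print(boot_aggregate_sha1(PCRS_0_TO_7))
```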

  8. Observe the TPM PCR10 verification log against the obtained reference values. Here is the reduced example output log:
[2024-11-05T13:51:06Z INFO  confido::trs] Calculating PCR10 value from IMA measurements
[2024-11-05T13:51:06Z DEBUG confido::trs] boot_aggregate                                     boot_aggregate                                     Match
[2024-11-05T13:51:06Z DEBUG confido::trs] autofs4.ko                                         autofs4.ko                                         Match
[2024-11-05T13:51:06Z DEBUG confido::trs] cryptd.ko                                          cryptd.ko                                          Match
[2024-11-05T13:51:06Z DEBUG confido::trs] crypto_simd.ko                                     crypto_simd.ko                                     Match
[2024-11-05T13:51:06Z DEBUG confido::trs] aesni-intel.ko                                     aesni-intel.ko                                     Match
[2024-11-05T13:51:06Z DEBUG confido::trs] ima-ng: nf_conntrack_netlink.ko, hash_value: 6ea446257cd809cf42bb759af00d24a8bb4b0dc4
[2024-11-05T13:51:06Z DEBUG confido::trs] ima-ng: nf_nat.ko, hash_value: eb37210983d620f3d8af9151eb5b7aa60120a4a7
[2024-11-05T13:51:06Z DEBUG confido::trs] ima-ng: xt_MASQUERADE.ko, hash_value: 9a7a9235231e25f37b27117763917580fc100976
[2024-11-05T13:51:06Z DEBUG confido::trs] ima-ng: nft_chain_nat.ko, hash_value: 9865f5fdd1e48faa633f092f420da618e56a325b
[2024-11-05T13:51:06Z DEBUG confido::trs] ima-ng: veth.ko, hash_value: efd5b1d9cbf646106d2a72ac63b89230e8ddce94
[2024-11-05T13:51:06Z DEBUG confido::trs] ima-ng: xt_nat.ko, hash_value: e0c70cd63480cb79c56a5e41bb9d4e0f4c34dcca
[2024-11-05T13:51:06Z DEBUG confido::evidence] Expected PCR 10: sha1: 0x71185871a0e3f31dc55c729c7426cb922f57fd39
[2024-11-05T13:51:06Z DEBUG confido::evidence] Server's PCR 10: sha1: 0x71185871a0e3f31dc55c729c7426cb922f57fd39
[2024-11-05T13:51:06Z INFO  confido::evidence] PCR 10 matches
[2024-11-05T13:51:06Z INFO  confido::evidence] Evidence successfully verified
[2024-11-05T13:51:06Z INFO  confido::atls] aTLS certificate verification successful
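
The "Calculating PCR10 value from IMA measurements" step relies on the TPM extend operation, in which the new register value is the hash of the old value concatenated with the new measurement. A minimal sketch of replaying a measurement list for the SHA-1 bank follows. The function names are hypothetical, and the two sample inputs are `hash_value` fields copied from the log purely as placeholders: a real replay extends the per-entry template hashes from the IMA log, which may differ from these values:

```python
import hashlib
from typing import Iterable


def extend(pcr: bytes, digest: bytes) -> bytes:
    """TPM extend for the SHA-1 bank: new = SHA1(old || digest)."""
    return hashlib.sha1(pcr + digest).digest()


def replay(entry_hashes_hex: Iterable[str]) -> str:
    """Replay measurements in order, starting from an all-zero PCR."""
    pcr = bytes(20)
    for entry in entry_hashes_hex:
        pcr = extend(pcr, bytes.fromhex(entry))
    return pcr.hex()


# Placeholder inputs; see the note above about template hashes.
sample_entries = [
    "6ea446257cd809cf42bb759af00d24a8bb4b0dc4",
    "eb37210983d620f3d8af9151eb5b7aa60120a4a7",
]
print(replay(sample_entries))
```

Since every entry is folded into the running value, the final PCR 10 matches the expected one only if the same measurements were made in the same order. That is why a single comparison of "Expected PCR 10" and "Server's PCR 10" covers the whole list.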
  9. Try the LLM chat by asking any question! Here is an example conversation:
[2024-11-05T13:51:06Z INFO  confido::atls] aTLS certificate verification successful

Welcome to the Interactive Chat!
Type 'exit' or 'quit' to end the conversation
Type 'clear' to start a new conversation
--------------------------------------------------

You > do you think GA4GH TRS fits the role of a repository that stores reference hashes for Attested-TLS protocol?

Assistant: Yes, I understand that GA4GH TRS can potentially serve as a database of reference hashing for Attested-TLS protocol. This could be useful in securing the communication between different parties in various genome data sharing scenarios.
  10. Observe how aTLS is leveraged in this Python chat app. Please note that in this example confido is compiled with a preconfigured TRS registry, while in the version for the GA4GH-SDK ExtensionsManager it will be possible to configure the registry manually.

  11. Provide Feedback: Share any insights, suggestions, or questions here to help refine the aTLS demo and its future integration as a GA4GH-SDK/CLI extension.

Expected Outcome:

This demo should help participants:

  • Gain a clear understanding of what aTLS is and how it can secure interactions in confidential environments.
  • See the potential value of integrating the aTLS client library as an optional extension to the GA4GH-SDK/CLI, enabling researchers to establish secure, verifiable connections across various GA4GH API services in federated environments.
  • Contribute to discussions on the benefits and usability of aTLS, guiding its development, use case discovery, and potential adoption.

Additional Resources


lilachic commented Nov 5, 2024

Looks good to me.

Maybe give the acronym at the beginning, e.g. Attested-TLS (aTLS).
https://docs.google.com/file/d/1yaldURlbJ9kdeQZYUIyb5bDr21PzI8f3 asks for signing in?

The "Evidence successfully verified" message is good feedback. It would be nice to have more explanation in the places where the software does not say what the expected result is. Some examples below:

"Observe the initialisation of the connection with the provided nonce, a random value that ensures freshness and protects the protocol from replay attacks:"
For a beginner, it would be nice to explain what a correct nonce looks like and what would indicate a replay attack.

confido: Could the "library that implements the Attested-TLS" be reviewed by a user?

"TEE Report – CPU-signed measurements of the VM's boot-time state that can only be generated by software running within a TEE-based VM. " Here an image/screenshot with marked areas where to look and what values to expect explained would be helpful?

Should the VCEK certificate of the VM in use be compared with the manufacturer's ID-related certificate? More detailed steps would help. Should the certs be equal?

Thanks for explaining the PCR values.

@pavelnikonorov pavelnikonorov changed the title demo: LLM inference service with Attested-TLS (Confidential Computing) demo: Private LLM Chat through Attested-TLS (Confidential Computing) Nov 5, 2024

suecharo commented Nov 6, 2024

From my understanding, the demo process involves obtaining the aTLS certificate from http://api.genxt.ai:9000 and using it like an SSL context to send HTTP requests.

Having a reference set of hash-sums (or 'checksums') of the trustworthy privacy-preserving software, this verification leads to a user's confidence that the data will be processed in the expected way, ensuring the user's privacy policy and purpose-limitation for their data.

I believe this means that the server provides proof at the CPU or VM (such as the TEE Report, TPM Quote, etc.) to confirm a secure environment, and the client trusts these attestations.
Does this imply that we can trust the system all the way down to the software level where the actual processing occurs? For example, if a Python script within a workflow maliciously sends data to an external source, would this setup be able to prevent it?

Additionally, there is no server-side implementation in this demo. How challenging would such an implementation be in practice?

Projects
Status: In review

5 participants