@salmanyam

@salmanyam salmanyam commented Nov 19, 2024

This plugin generates credentials (keys and certificates) for both the API proxy server (required for kata-containers/kata-containers#9159 and kata-containers/kata-containers#9752) and the workload owner. This plugin also delivers the credentials to a sandbox (i.e., a confidential pod or VM), specifically to the kata agent to initiate the SplitAPI proxy server so that a workload owner can communicate with the proxy server using a secure tunnel.

The IPv4 address, name, and the ID of the sandbox (i.e., pod) must be provided in the query string to obtain the credentials from the KBS.
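
A minimal sketch of how such a query string could be parsed and validated, assuming plain `key=value` pairs without percent-encoding. `SandboxParams` mirrors the struct discussed in this PR, while `parse_sandbox_params` and its error messages are hypothetical names used only for illustration, not the plugin's actual code.

```rust
use std::net::Ipv4Addr;

/// Sandbox parameters expected in the query string (field names follow the PR text).
#[derive(Debug, PartialEq)]
pub struct SandboxParams {
    pub id: String,
    pub ip: Ipv4Addr,
    pub name: String,
}

/// Hypothetical helper: parse "id=..&ip=..&name=.." into SandboxParams.
/// No percent-decoding is done; this is a sketch, not the plugin's parser.
pub fn parse_sandbox_params(query: &str) -> Result<SandboxParams, String> {
    let (mut id, mut ip, mut name) = (None, None, None);
    for pair in query.split('&').filter(|p| !p.is_empty()) {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| format!("malformed pair: {}", pair))?;
        match key {
            "id" => id = Some(value.to_string()),
            "ip" => {
                // Reject anything that is not a well-formed IPv4 address.
                let addr = value
                    .parse::<Ipv4Addr>()
                    .map_err(|e| format!("invalid IPv4 address {}: {}", value, e))?;
                ip = Some(addr);
            }
            "name" => name = Some(value.to_string()),
            other => return Err(format!("unexpected parameter: {}", other)),
        }
    }
    // All three parameters are mandatory.
    Ok(SandboxParams {
        id: id.ok_or("missing id")?,
        ip: ip.ok_or("missing ip")?,
        name: name.ok_or("missing name")?,
    })
}
```

Calling `parse_sandbox_params("id=3367353&ip=60.11.12.48&name=pod33")` yields a `SandboxParams` whose `ip` is a validated `Ipv4Addr`; a missing or malformed field produces an `Err`.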

After receiving the credential request, the splitapi plugin creates one key pair for the server and another for the client, and issues a certificate for each, signed by a self-signed CA. The generated credentials are stored in a hashmap with a unique key for each sandbox based on its name, ID, and IP address. The plugin responds to a request from the sandbox by sending the server-specific credentials (key, cert) along with the CA certificate. A request from the workload owner gets the client-specific credentials (key, cert) and the CA certificate.

The splitapi plugin itself is not built or initialized by default. To build and initialize it, follow the steps below.

How to build

  1. Enable splitapi-plugin feature while building kbs.
  2. Update the kbs/config/kbs-config.toml to enable the splitapi plugin.
$ make background-check-kbs POLICY_ENGINE=opa SPLITAPI_PLUGIN=true
$ cat >> kbs/config/kbs-config.toml << EOF
[[plugins]]
name = "pki_vault"
#plugin_dir = "/opt/confidential-containers/kbs/plugin/splitapi"
#cred_filename = "certificates.json"
[plugins.pkivault_cert_details]
country = "AA"
#state = "Default State"
#locality = "Default City"
organization = "Default Organization"
org_unit = "Default Unit"

[plugins.pkivault_cert_details.ca]
#common_name = "grpc-tls CA"
validity_days = 3650

[plugins.pkivault_cert_details.server]
#common_name = "server"
#validity_days = 180

[plugins.pkivault_cert_details.client]
common_name = "client"
validity_days = 180
EOF

Note: Only name is mandatory for the plugin configuration. The remaining fields get default values if they are not set in kbs-config.toml.

How to test

This plugin can be tested similarly to the nebula-ca plugin (#539). We need the kbs-client patch available in the branch https://github.com/cclaudio/trustee/tree/nebula-ca-plugin-test to build the kbs-client and test the plugin.

Once the kbs-client is built, use it to request that KBS generate and return the server-specific credentials.

$ cd kbs && make cli ATTESTER=snp-attester
$ sudo make install-cli
$ kbs-client --url http://127.0.0.1:8080 get-resource --plugin-name "pki_vault" --resource-path "credential?id=3367353&ip=60.11.12.48&name=pod33" | base64 -d

KBS should return the server-specific credentials: the server key, the server certificate, and the CA certificate.

@salmanyam salmanyam requested a review from a team as a code owner November 19, 2024 19:44
@bpradipt
Member

This plugin generates credentials (keys and certificates) for both the API proxy server (required for kata-containers/kata-containers#9159 and kata-containers/kata-containers#9752) and the workload owner. This plugin also delivers the credentials to a sandbox (i.e., confidential PODs or VMs), specifically to the kata agent to initiate the SplitAPI proxy server so that a workload owner can communicate with the proxy server using a secure tunnel.

The IPv4 address, name, and the ID of the sandbox must be provided in the query string to obtain the credential resources from the kbs.

After receiving the credential request, the splitapi plugin will create a key pair for the server and client and sign them using the self-signed CA. The generated ca.crt, server.crt, and server.key are stored in a directory specific to the sandbox (the caller) and returned to the caller. In addition, ca.key, client.key, and client.crt are also generated and stored to that particular directory specific to the sandbox (i.e., caller).

During the credential generation, a sandbox directory mapper creates a unique directory specific to the sandbox (i.e., the caller). The mapper creates the unique directory using the sandbox parameters passed in the query string. A mapping file is also maintained to store the mapping between the sandbox name and the unique directory created for the sandbox.

The splitapi plugin itself is not initialized by default. To initialize it, you need to add 'splitapi' in the kbs-config.toml.

A generic question. Would it make sense to have a generic plugin to create key-pair for a sandbox and not club it with splitapi?
There are other use cases for per sandbox (pod or VM) keys especially in the peer-pods case.
cc @yoheiueda @davidhadas

@fitzthum
Member

fitzthum commented Nov 20, 2024

A generic question. Would it make sense to have a generic plugin to create key-pair for a sandbox and not club it with splitapi?

Some kind of PKI plugin would be very useful.

@salmanyam
Author

salmanyam commented Nov 20, 2024

A generic question. Would it make sense to have a generic plugin to create key-pair for a sandbox and not club it with splitapi?
There are other use cases for per sandbox (pod or VM) keys especially in the peer-pods case.

We are good. I think it would be a good idea to provide key generation as a basic service that other plugins can use. We can definitely consider refactoring it into such a service, which splitapi would then use.

@fitzthum
Member

@salmanyam a potential advantage of a generic PKI approach is that it wouldn't depend on any external patches in kata or wherever to be usable. I haven't looked very closely at this PR yet but that's something to consider.

@yoheiueda
Member

yoheiueda commented Nov 21, 2024

A generic PKI approach will be very helpful for our peer pod use cases.

If I understand correctly, the current proposal is to run a key generation service in Trustee, but a PKI approach sounds to me like a signing service in Trustee.

Trustee has a private key for the CA and accepts key signing requests from clients. A client in a sandbox generates a key pair and sends a key signing request with the generated public key. The CA in Trustee issues a certificate by signing the received public key with the CA private key, and distributes the issued certificate. I think this is how the current PKI for TLS certificates works, and the sandbox does not need to reveal its private key to Trustee.

I think this will work if the purpose of the certificate generation is TLS secure tunneling.

This is just out of curiosity, and a key generation service will still be very helpful for our peer pod use case.

Another point is that the current proposal generates RSA keys. To support other encryption mechanisms such as WireGuard and SSH, it is helpful to support different types of keys such as Ed25519. WireGuard uses Curve25519 keys, and Ed25519 is also preferable for SSH these days.

Member

@fitzthum fitzthum left a comment

Made a few comments. Please run cargo fmt and cargo clippy to fix the CI.


match self.generate_private_key(&self.ca_key, self.key_size) {
Ok(_) => println!("CA key generation succeeded and saved to {}.", self.ca_key.display()),
Err(e) => eprintln!("CA key generation failed: {}", e),
Member

Please use info! and error! instead of printing.

I don't really like using a match here either. Are these errors fatal? Currently you continue with the function even if it fails. It might make more sense to call each generate function and pass the error up with a question mark. If you need to log something in the success case, you can just add that afterwards.
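
A minimal sketch of the `?`-based flow suggested above, assuming every generation step is fatal. `GenError`, `generate_cert`, and `generate_all` are hypothetical stand-ins, and the key and certificate "generation" here only builds strings so the control flow is easy to follow; it is not the PR's code.

```rust
use std::fmt;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum GenError {
    Key(String),
    Cert(String),
}

impl fmt::Display for GenError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            GenError::Key(m) => write!(f, "key generation failed: {}", m),
            GenError::Cert(m) => write!(f, "certificate generation failed: {}", m),
        }
    }
}

impl std::error::Error for GenError {}

/// Stand-in for the real key generation; fails when `key_size` is too small.
fn generate_private_key(label: &str, key_size: u32) -> Result<String, GenError> {
    if key_size < 2048 {
        return Err(GenError::Key(format!("{}: key size {} too small", label, key_size)));
    }
    Ok(format!("{}-key-{}", label, key_size))
}

/// Stand-in for certificate generation; fails on an empty key.
fn generate_cert(label: &str, key: &str) -> Result<String, GenError> {
    if key.is_empty() {
        return Err(GenError::Cert(format!("{}: empty key", label)));
    }
    Ok(format!("{}-cert({})", label, key))
}

/// Each step uses `?`, so the first failure aborts the whole sequence
/// and is returned to the caller instead of being printed and ignored.
pub fn generate_all(key_size: u32) -> Result<Vec<String>, GenError> {
    let ca_key = generate_private_key("ca", key_size)?;
    let ca_cert = generate_cert("ca", &ca_key)?;
    let server_key = generate_private_key("server", key_size)?;
    let server_cert = generate_cert("server", &server_key)?;
    // Success-path logging (e.g. `info!`) would go here, after all steps passed.
    Ok(vec![ca_cert, server_cert])
}
```

With this shape the caller decides how to report the failure, for example by logging it once with `error!` at the request boundary.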

let state = "Default State";
let locality = "Default City";
let organization = "Default Organization";
let org_unit = "Default Unit";
Member

These should be updated presumably with something that says that it's coming from a Trustee plugin or you could have this be configurable.

impl SplitAPIBackend for CertManager {
async fn get_server_credential(&self, params: &SandboxParams) -> Result<Vec<u8>> {
// Try locking the sandbox directory mapper
let mut mapper_guard = self.mapper.lock().map_err(|e| {
Member

Having a separate variable for the mapper_guard is a bit c-like. I think you can just use the mapper itself to hold the lock.

sandbox_dir_info = existing_dir.clone();

//TODO: check if the credentails are already in there
// send the existing credentials if they are not expired
Member

When you create the certs, just store the expiration time alongside them so you can easily see if they are expired.
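
One way this could look, as a sketch only: the expiry is computed once when the credentials are created and kept next to them. `StoredCredentials` and its fields are hypothetical names, and the validity is assumed to be given in days as in the plugin configuration.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical record: PEM blobs plus the instant the certificate stops being valid.
#[derive(Clone, Debug)]
pub struct StoredCredentials {
    pub key_pem: Vec<u8>,
    pub cert_pem: Vec<u8>,
    pub not_after: SystemTime,
}

impl StoredCredentials {
    /// Record the expiry at creation time, derived from the configured validity.
    pub fn new(
        key_pem: Vec<u8>,
        cert_pem: Vec<u8>,
        issued_at: SystemTime,
        validity_days: u64,
    ) -> Self {
        let not_after = issued_at + Duration::from_secs(validity_days * 24 * 60 * 60);
        Self { key_pem, cert_pem, not_after }
    }

    /// True once `now` has reached the recorded expiry.
    pub fn is_expired(&self, now: SystemTime) -> bool {
        now >= self.not_after
    }
}
```

A lookup can then call `is_expired(SystemTime::now())` and regenerate only when it returns true, without parsing the certificate again.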

Member

I think it's a mistake to store all this stuff on the filesystem.

Here's what I would do. Create a struct that contains all the state that represents a keypair. OpenSSL has structs for all the things you need. Store this struct in a dictionary keyed by the connection id. Wrap the struct in a RwLock. If you need some kind of persistence you can serialize/deserialize the dictionary as needed.

I don't see any reason to be writing things to the filesystem and I think this entire mapper concept can be replaced by something from std, which decreases your complexity significantly.
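
A rough sketch of that idea using only std types. `KeyPairState` and `CredentialStore` are hypothetical names; raw byte vectors stand in for the OpenSSL key and certificate structs, and this variant puts the lock around the whole map rather than around each entry.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical in-memory record for one sandbox; real code would hold
/// OpenSSL key and certificate types rather than raw bytes.
#[derive(Clone, Debug, PartialEq)]
pub struct KeyPairState {
    pub key_pem: Vec<u8>,
    pub cert_pem: Vec<u8>,
}

/// Store keyed by a connection/sandbox identifier, guarded by a RwLock.
#[derive(Default)]
pub struct CredentialStore {
    inner: RwLock<HashMap<String, KeyPairState>>,
}

impl CredentialStore {
    /// Read path: many readers can hold the lock at once.
    /// A poisoned lock is treated as "nothing found" in this sketch.
    pub fn get(&self, id: &str) -> Option<KeyPairState> {
        self.inner.read().ok()?.get(id).cloned()
    }

    /// Return the existing entry for `id`, or create one with `make` and keep it.
    pub fn get_or_create<F>(&self, id: &str, make: F) -> KeyPairState
    where
        F: FnOnce() -> KeyPairState,
    {
        // Recover the guard even if another thread panicked while holding it.
        let mut map = self.inner.write().unwrap_or_else(|e| e.into_inner());
        map.entry(id.to_string()).or_insert_with(make).clone()
    }
}
```

No directory mapper or mapping file is needed here; persistence, if wanted, would serialize the map as a whole.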

if let Some(mapper) = mapper_guard.as_mut() {
let sandbox_dir_info: SandboxDirectoryInfo;

if let Some(existing_dir) = mapper.get_directory(&params.name) {
Member

These params are sent over the network in plaintext. Would it be a problem if someone tampered with the name value in flight?

More fundamentally, why does this need to be stateful at all?

Comment on lines 28 to 33
pub enum SplitAPIConfig {
CertManager(manager::SplitAPIRepoDesc),
}
Member

If we use an enum here, we are hinting that we could potentially have other types of SplitAPIConfig besides CertManager. Do we? If not, I suggest changing SplitAPIConfig into a struct and moving the members of SplitAPIRepoDesc directly into SplitAPIConfig.

Comment on lines 25 to 27
lazy_static! {
static ref SANDBOX_DIRECTORY_MAPPER: Arc<Mutex<Option<SandboxDirectoryMapper>>> = Arc::new(Mutex::new(None));
}
Member

Could we move this static variable directly into CertManager as a member? For example:

pub struct CertManager {
    pub plugin_dir: String,
    pub mapping_filename: String,
    mapper: Arc<Mutex<SandboxDirectoryMapper>>,
}

Also, could we use tokio::sync::Mutex rather than std::sync::Mutex? The async version would promote CPU efficiency.

// Try locking the sandbox directory mapper
let mut mapper_guard = self.mapper.lock().map_err(|e| {
anyhow!("Failed to lock sandbox directory mapper: {}", e)
})?;
Member

anyhow provides an easier way for map_err(|e| anyhow!(..., e)) like things

xxx.context("Failed to lock sandbox directory mapper")?;

@salmanyam
Author

Hi @fitzthum and @Xynnn007, I have addressed your comments in the latest commit. Could you please take a look? Thank you!

Member

@fitzthum fitzthum left a comment

Thanks for the updates. I think this is a big improvement. At some point you should squash your commits together. I made some comments.

pub struct SandboxParams {
/// Required: ID of a sandbox or pod
pub id: String,
// Required: IP of a sandbox or pod
Member

nit: probably want triple slashes (///) here and on the next field.

}

impl TryFrom<SplitAPIConfig> for SplitAPI {
type Error = anyhow::Error;
Member

Do you need this line?

impl TryFrom<SplitAPIConfig> for SplitAPI {
type Error = anyhow::Error;

fn try_from(config: SplitAPIConfig) -> anyhow::Result<Self> {
Member

you are importing anyhow::Result so you can just use Result here


pub struct SplitAPI {
pub backend: Arc<dyn SplitAPIBackend>,
}
Member

Do we really need this struct or could we just deal with the SplitAPIBackend directly? I guess this is supposed to be pluggable, but you only have one backend. Are you planning to add more?

// Return the server credential if the credential presents in the hashmap
let key = format!("{}_{}_{}", &params.name, &params.ip, &params.id);
if let Some(credentials) = self.load_credentials(&key).await {
log::info!("Returning already existed credentials!");
Member

nit: "Returning existing credentials"

use super::generator::CertificateDetails;
use super::manager;

pub const CREDENTIALS_BLOB_FILE: &str = "certificates.json";
Member

You're using the term credential throughout the PR, but also the term certificates. It might be clearer to just use one especially on lines like this one.


log::info!("Credentials are generated!");

// Aquire the write lock and write the credential into the hashmap
Member

It's a little odd to be writing state in a method called get...


#[async_trait::async_trait]
impl SplitAPIBackend for CertManager {
async fn get_server_credential(&self, params: &SandboxParams) -> Result<Vec<u8>> {
Member

What is the server here? These credentials are for the client, right? Maybe you're referring to some other server, like the SplitAPI server in the guest, but this is pretty confusing.

})
}

async fn load_credentials(&self, key: &str) -> Option<Credentials> {
Member

nit: you are alternating between credentials (plural) and credential singular in your method names. Probably best to pick one.

Also, this function should probably return Result<Option<Credentials>> given that you could have a failure to load the credentials (i.e. if the file doesn't exist or something) or you could legitimately return no credentials.
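
To make the distinction concrete, here is a small synchronous sketch. The one-entry-per-line `key=value` file format and the `parse_store` helper are assumptions for illustration only, and a plain `String` stands in for the `Credentials` type.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical on-disk format for this sketch: one "key=value" entry per line.
fn parse_store(text: &str) -> HashMap<String, String> {
    text.lines()
        .filter_map(|line| line.split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

/// Err(_)      -> the store could not be read at all (I/O failure)
/// Ok(None)    -> the store was read, but holds nothing for `key`
/// Ok(Some(_)) -> credentials found
pub fn load_credentials(path: &Path, key: &str) -> io::Result<Option<String>> {
    let text = fs::read_to_string(path)?;
    Ok(parse_store(&text).get(key).cloned())
}
```

The caller can then treat `Ok(None)` as "generate new credentials" while surfacing `Err` as a real failure.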


async fn load_credentials(&self, key: &str) -> Option<Credentials> {
// Check if the credential is not loaded. If not, load them
if !self.credential_loaded_from_file.load(Ordering::SeqCst) {
Member

Can we just do this at initialization time and avoid keeping track of whether they've been loaded with every request?

Author

@fitzthum, Thank you very much for providing feedback.

The reason for loading this way is that the plugin initialization code (in plugin_manager.rs) is synchronous, but the load_credentials function is async. Since an async function cannot be awaited inside a sync function, I cannot directly call load_credentials when the splitapi plugin is initialized. There might be two ways to solve it:

  1. changing the load_credentials function to be synchronous, or
  2. making the plugin initialization code to be asynchronous

I am not sure if there's a better way to call an async function inside a sync function.

Member

Either one seems fine. Probably easier to make this sync. You're just reading from a file so that should be doable. I think you can make a new tokio::sync::arc without async.
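
A sketch of the synchronous option, assuming the credential file is small enough to read with blocking I/O at startup. This `CertManager` is a hypothetical, simplified version, the `key=value` line format is an assumption, and std's `RwLock` is used here in place of a tokio lock.

```rust
use std::collections::HashMap;
use std::fs;
use std::io::{self, ErrorKind};
use std::path::Path;
use std::sync::RwLock;

/// Hypothetical manager that reads its credential file once, in a plain
/// (non-async) constructor, so request handlers never check a "loaded" flag.
pub struct CertManager {
    store: RwLock<HashMap<String, String>>,
}

impl CertManager {
    /// Synchronous initialization: a missing file means "start empty";
    /// any other I/O error is reported to the caller.
    pub fn new(path: &Path) -> io::Result<Self> {
        let text = match fs::read_to_string(path) {
            Ok(text) => text,
            Err(e) if e.kind() == ErrorKind::NotFound => String::new(),
            Err(e) => return Err(e),
        };
        // Assumed format for the sketch: one "key=value" entry per line.
        let map = text
            .lines()
            .filter_map(|l| l.split_once('='))
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect();
        Ok(Self { store: RwLock::new(map) })
    }

    /// Number of entries loaded at startup or added since.
    pub fn len(&self) -> usize {
        self.store.read().map(|m| m.len()).unwrap_or(0)
    }
}
```

Because `new` is an ordinary function, it can be called from the synchronous plugin initialization path, and the per-request atomic flag is no longer needed.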

@bpradipt
Member

A generic question. Would it make sense to have a generic plugin to create key-pair for a sandbox and not club it with splitapi?

Some kind of PKI plugin would be very useful.

@salmanyam any plans to revisit this PR from the pov of creating a generic PKI plugin ?

@salmanyam
Author

A generic question. Would it make sense to have a generic plugin to create key-pair for a sandbox and not club it with splitapi?

Some kind of PKI plugin would be very useful.

@salmanyam any plans to revisit this PR from the pov of creating a generic PKI plugin ?

Hi @bpradipt, yes, I have almost finished working on that aspect of the PR. The redesigned PR will not be tied to splitapi. I will update the PR within a couple of days.

@salmanyam
Author

@fitzthum, I have addressed your comments and refactored and cleaned the code. I have renamed the plugin name to pki-vault. I am also working to add the admin APIs which I will update soon. I would greatly appreciate your feedback.

Hi @bpradipt, The plugin name is now pki-vault. Currently, it generates all the keys [CA, server (sandbox), client (owner or admin)] and then generates the certificates for server and client signed by the CA. I am working to support functionality where the sandbox or client (owner or admin) will generate their own key-pair and send the certificate signing request (CSR) to the plugin to sign it by the same CA. One key aspect of the current design is mutual authentication, i.e., both the sandbox and owner confirm each other.
BTW, can you please tell me your use case? That will help me greatly for generalizing the design.

@salmanyam salmanyam changed the title Add splitapi resource plugin Add pki-vault plugin Mar 7, 2025
@portersrc
Member

Hey @salmanyam, cool work! One comment related to the cred_store:

Do you really need to maintain state with this plugin? You do it in two places. One is a hashmap called cred_store; and then you also persist that hashmap to file. The key to that hashmap is this IP-name-podID that you're using. Does a client ever need to request its credentials more than once? (Even if the pod dies and needs to get credentials again, won't it start up with a new podID, in which case it would need new keys anyway?)

@salmanyam
Author

salmanyam commented Mar 7, 2025

Hey @salmanyam, cool work! One comment related to the cred_store:

Do you really need to maintain state with this plugin? You do it in two places. One is a hashmap called cred_store; and then you also persist that hashmap to file. The key to that hashmap is this IP-name-podID that you're using. Does a client ever need to request its credentials more than once? (Even if the pod dies and needs to get credentials again, won't it start up with a new podID, in which case it would need new keys anyway?)

Thanks, @portersrc! You raise a valid point. We also discussed this in our internal meeting. You are right: the client (or pod or sandbox) requests its credentials only once. But we need the persistent state for the following reason. If Trustee restarts after the pod has initialized, the pod (client or sandbox) already has its credentials, but the workload owner loses the CA that signed the pod's credentials. As a result, the workload owner cannot perform mutual authentication with the pod, because both sides need the same CA.

One possible solution is to use Trustee as the CA. But if Trustee's key changes for some reason, all the existing credentials would be invalidated. I hope that clarifies it.

@portersrc
Member

I see. Is this PR as-is supporting this, though, or do you have more changes planned? I noticed when a new set of credentials have to be created, a new ca_cert is created. (That is, you don't sign each new server_cert with a common ca_cert; rather, you generate a new ca_cert for every new server_cert.)

Not sure if the following is related: The client credentials, though they get stored in a hashmap and persisted, aren't externally accessible. Maybe the client credentials need to have an API endpoint, as well, if you want any mutual authentication with the ca_cert that's common between the 1 server and 1 client?

@salmanyam
Author

I see. Is this PR as-is supporting this, though, or do you have more changes planned? I noticed when a new set of credentials have to be created, a new ca_cert is created. (That is, you don't sign each new server_cert with a common ca_cert; rather, you generate a new ca_cert for every new server_cert.)

Not sure if the following is related: The client credentials, though they get stored in a hashmap and persisted, aren't externally accessible. Maybe the client credentials need to have an API endpoint, as well, if you want any mutual authentication with the ca_cert that's common between the 1 server and 1 client?

The PR does not yet fully support what I described in the previous comment. I am working on the admin (workload owner) APIs. The admin API endpoints will allow the workload owner to fetch the client credentials for mutual authentication. Yes, to simplify the design, we decided to use a separate CA for each pod. The pod credentials can be obtained through the same API (get_resource).

@bpradipt
Member

bpradipt commented Mar 8, 2025

@fitzthum, I have addressed your comments and refactored and cleaned the code. I have renamed the plugin name to pki-vault. I am also working to add the admin APIs which I will update soon. I would greatly appreciate your feedback.

Hi @bpradipt, The plugin name is now pki-vault. Currently, it generates all the keys [CA, server (sandbox), client (owner or admin)] and then generates the certificates for server and client signed by the CA. I am working to support functionality where the sandbox or client (owner or admin) will generate their own key-pair and send the certificate signing request (CSR) to the plugin to sign it by the same CA. One key aspect of the current design is mutual authentication, i.e., both the sandbox and owner confirm each other. BTW, can you please tell me your use case? That will help me greatly for generalizing the design.

Thanks for working on this @salmanyam .. Following are my use cases:

  1. Mutual authentication between the user (owner) and the CoCo pod
  2. Mutual authentication between two CoCo pods
  3. Mutual authentication between a workload in the trusted env and CoCo pod (similar to 1)
  4. A server in the trusted env (having the CA cert from Trustee) verifying the client certs sent from CoCo pods

Ability to send a CSR to the plugin from the pod as well as an API to retrieve the ca cert will be important imho. The features can be iteratively implemented.

Can you add a design doc or readme to the PR. It'll help to review.

@fitzthum
Member

fitzthum commented Mar 10, 2025

I agree with most of the things mentioned here. It seems like it would be simpler to not maintain any state. I was picturing there being one root cert, corresponding to Trustee itself. It seems like in the current implementation different pods would have different root certs, which is interesting.

Also, I'm not exactly sure about the data that is being provided from the client. The pod id and such. This seems a little arbitrary and it doesn't seem like this information actually makes it into the cert anyway. Do we want the cert to be endorsing something about the guest? One possible approach would be to allow clients to specify any values they want as part of a query string. This query string would pass through the policy engine, so Trustee admins could add their own logic about what is allowed. Then maybe these claims would be added to the cert as an extension or something. This is sort of a generalization on what is currently here.

Ultimately I'm not sure exactly what PKI for Trustee should look like, so if you have a vision here, feel free to go that direction. I think a design doc would help understand the sort of use cases that you are looking at. The code is looking pretty clean tho. Thanks for the new version.

@salmanyam
Author

@fitzthum, I have addressed your comments and refactored and cleaned the code. I have renamed the plugin name to pki-vault. I am also working to add the admin APIs which I will update soon. I would greatly appreciate your feedback.
Hi @bpradipt, The plugin name is now pki-vault. Currently, it generates all the keys [CA, server (sandbox), client (owner or admin)] and then generates the certificates for server and client signed by the CA. I am working to support functionality where the sandbox or client (owner or admin) will generate their own key-pair and send the certificate signing request (CSR) to the plugin to sign it by the same CA. One key aspect of the current design is mutual authentication, i.e., both the sandbox and owner confirm each other. BTW, can you please tell me your use case? That will help me greatly for generalizing the design.

Thanks for working on this @salmanyam .. Following are my use cases:

  1. Mutual authentication between the user (owner) and the CoCo pod
  2. Mutual authentication between two CoCo pods
  3. Mutual authentication between a workload in the trusted env and CoCo pod (similar to 1)
  4. A server in the trusted env (having the CA cert from Trustee) verifying the client certs sent from CoCo pods

Ability to send a CSR to the plugin from the pod as well as an API to retrieve the ca cert will be important imho. The features can be iteratively implemented.

Can you add a design doc or readme to the PR. It'll help to review.

Hi @bpradipt, thank you for providing the use cases. It will help me greatly to design the admin (workload owner) APIs. I am working on the owner side APIs and design doc. I will update soon.

@zvonkok
Member

zvonkok commented Apr 9, 2025

/cc @zvonkok

@bpradipt
Member

@salmanyam any updates on this ?

@salmanyam
Author

@salmanyam any updates on this ?

@bpradipt Yes, I have made progress (both in code and documentation). I will update what I have so far by today.

@salmanyam
Author

salmanyam commented Jun 25, 2025

Hi @bpradipt and @fitzthum, I have made progress on this work; it was sidelined for a few weeks due to other work. I have also started the documentation and client-side code. Could you please take a look? Thank you!

For @bpradipt, the client-side code is available in the following repo, in case you need it. I have mentioned this information in the documentation.
https://github.com/salmanyam/trustee/commits/pki-vault-test/

@bpradipt
Member

Hi @bpradipt and @fitzthum, I have made progress on this work; it was sidelined for a few weeks due to other work. I have also started the documentation and client-side code. Could you please take a look? Thank you!

For @bpradipt, the client-side code is available in the following repo, in case you need it. I have mentioned this information in the documentation. https://github.com/salmanyam/trustee/commits/pki-vault-test/

Thanks @salmanyam. I started with the doc to understand the design. One thing I feel needs to be enabled is a pluggable persistence store. The current design will not work with multiple KBS replicas. So at least laying the groundwork to handle multiple replicas will be a requirement imho.
I'll give it a try as well.

Member

@fitzthum fitzthum left a comment

Nice update. Mainly commented on the doc. Code looks tighter. An integration test would be nice. I think we are definitely moving in the right direction here.

This plugin currently generates credentials (keys and certificates) for a server running inside the confidential VM (aka sandbox) and for the workload owner (who acts as a client for the server). The current design of the plugin prioritizes the mutual authentication between the server and the client. Such design is necessary for the SplitAPI (kata-containers/kata-containers#9159 and
kata-containers/kata-containers#9752) and peer-pods.

This plugin also delivers the server-specific credentials to the sandbox (i.e., confidential PODs or VMs), specifically to the kata agent to initiate the server. The workload owner can communicate with the server using a secure tunnel.
Member

Probably no need to mention the Kata Agent here. That's not really in scope of Trustee.


The server-specific credentials can be obtained throught the `get-resource` APIs by specifying the `plugin-name`. Currently, the plugin requires that the sandbox or `kata-agent` sends the IPv4 address, name, and the ID of the sandbox (i.e., pod) as part of the query string to obtain the credentials from the KBS.

After receiving the credential request, the `pki_vault` plugin will create a CA, a key pair for the server and another key pair for client, and sign them using the self-signed CA. Currently, the generated credentials are stored in a hashmap with a unique key for each sandbox based on its name, ID, and IP address. But we expect this design will be changed in future. PKI Vault plugin responds to a request from the sandbox by sending the server specific credentials (key, cert) along with the CA certificate. A request from workload owner gets the client specific credentials (key, cert) and CA's certificate.
Member

I'm still a little unclear on why the credentials are stored. Why not just generate a new certificate? What do you expect to change about the design in the future?

[[plugins]]
name = "pki_vault"
#plugin_dir = "/opt/confidential-containers/kbs/plugin/splitapi"
#cred_filename = "certificates.json"
Member

Why these commented-out lines?


Request the credentials for initiating a server inside a pod or sandbox.

Only `GET` requests are supported, e.g. `GET /kbs/v0/pki_vault/credentials?id=3367&ip=60.11.12.89&name=pod51`. Current the `GET` takes `id`, `ip`, and `name` parameters, but expect these parameters to change in a future design that supports more generic use cases.
Member

nit: Current -> Currently.

Yeah it's a little unclear why id, ip, and name are the required fields. You could just allow any query string to be given and create a cert with the values in various extensions. The plugin requests pass through the policy, so it might be best to have the policy, which is highly configurable, be the place where we have logic about checking the parameters.
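
To make the query-string contract under discussion concrete, here is a minimal sketch of parsing and validating the three parameters from the example request. The names `SandboxParams`, `required`, and `parse_sandbox_query` are hypothetical, the parser does no percent-decoding, and nothing here is taken from the plugin's implementation.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// Hypothetical holder for the three query parameters named in the docs.
#[derive(Debug, PartialEq)]
struct SandboxParams {
    id: String,
    ip: Ipv4Addr,
    name: String,
}

/// Looks up a required, non-empty parameter.
fn required<'a>(pairs: &HashMap<&'a str, &'a str>, key: &str) -> Result<&'a str, String> {
    pairs
        .get(key)
        .copied()
        .filter(|value| !value.is_empty())
        .ok_or_else(|| format!("missing query parameter `{}`", key))
}

/// Parses a raw query string such as `id=3367&ip=60.11.12.89&name=pod51`.
/// Assumption: values are plain ASCII, so percent-decoding is skipped.
fn parse_sandbox_query(query: &str) -> Result<SandboxParams, String> {
    let pairs: HashMap<&str, &str> = query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .collect();

    let ip = required(&pairs, "ip")?
        .parse::<Ipv4Addr>()
        .map_err(|err| format!("invalid `ip`: {}", err))?;

    Ok(SandboxParams {
        id: required(&pairs, "id")?.to_string(),
        ip,
        name: required(&pairs, "name")?.to_string(),
    })
}

fn main() {
    match parse_sandbox_query("id=3367&ip=60.11.12.89&name=pod51") {
        Ok(params) => println!("{:?}", params),
        Err(err) => eprintln!("rejected: {}", err),
    }
}
```

If the parameter checks move into the policy as suggested above, a function like this would shrink to collecting arbitrary key/value pairs and leaving the validation to the policy engine.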

}
```


Member

nit: no need for these empty lines


The request is processed only if it is authenticated; otherwise an error is returned. In the current design, the client credentials already exist in the KBS because they were created as part of the response to the server credential request, so the plugin simply returns the existing client credentials.

Once the request is processed, the following structure is returned in JSON format.
Member

It might be better to give these examples as JSON since that is the interface users will see.

# How to test (client or owner-side code)
To test the `pki-vault` plugin, we need to enable plugin support in the `kbs_protocol` inside the `attestation-agent` of the `guest-components` repository. Plugin support has been enabled in the `pki-vault-test` branch of the following `guest-components` repository.

`https://github.com/salmanyam/guest-components/tree/pki-vault-test`
Member

This sort of thing probably shouldn't go into the actual docs. Ideally we should just have an integration test in the PR. The docs are aimed a little more at users than developers.

We do have a gap in guest components. We will need plugin support there. I think @cclaudio and @portersrc have also run into this problem.


Ok(pods)
}
"get_client_credentials" => {
Member

I think having get_ here is redundant and not very restful.

.context("accessed path is illegal, should start with `/`")?;

if method.as_str() == "GET" {
if sub_path != "credentials" {
Member

It seems odd to have the same endpoint require both auth and encryption (next function). I might have this backwards, but don't you want get_client_credentials to be the one that requires auth?


/// Whether the body needs to be encrypted via TEE key pair.
/// If returns `Ok(true)`, the KBS server will encrypt the whole body
/// with TEE key pair and use KBS protocol's Response format.
Member

Probably should change this from the sample comment to a comment that explains why we are encrypting various endpoints.
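
One way to make the auth-versus-encryption question from the two comments above explicit is a single table that maps each endpoint to the protection it needs. The sketch below assumes the split the reviewer suggests: the sandbox-facing `credentials` endpoint is encrypted with the TEE key, while the owner-facing `get_client_credentials` and `list_pods` endpoints require authentication. The `Endpoint` and `Protection` types and both function names are hypothetical and are not part of the KBS plugin interface.

```rust
/// Endpoints mentioned in this PR (sub-path strings taken from the diff and
/// commit messages; the enum itself is an illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Endpoint {
    Credentials,
    ClientCredentials,
    ListPods,
}

/// Hypothetical per-endpoint protection flags.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Protection {
    /// Caller must present valid authentication.
    requires_auth: bool,
    /// Response body is encrypted with the TEE key pair.
    encrypt_to_tee: bool,
}

/// Maps an HTTP method and sub-path to a known endpoint.
fn parse_endpoint(method: &str, sub_path: &str) -> Result<Endpoint, String> {
    if method != "GET" {
        return Err(format!("unsupported method `{}`", method));
    }
    match sub_path {
        "credentials" => Ok(Endpoint::Credentials),
        "get_client_credentials" => Ok(Endpoint::ClientCredentials),
        "list_pods" => Ok(Endpoint::ListPods),
        other => Err(format!("unknown sub-path `{}`", other)),
    }
}

/// Assumed split: attested sandboxes get encrypted responses, while
/// owner-facing endpoints are gated by authentication instead.
fn protection_for(endpoint: Endpoint) -> Protection {
    match endpoint {
        Endpoint::Credentials => Protection {
            requires_auth: false,
            encrypt_to_tee: true,
        },
        Endpoint::ClientCredentials | Endpoint::ListPods => Protection {
            requires_auth: true,
            encrypt_to_tee: false,
        },
    }
}

fn main() {
    for sub_path in ["credentials", "get_client_credentials", "list_pods"] {
        let endpoint = parse_endpoint("GET", sub_path).expect("known endpoint");
        println!("{:?} -> {:?}", endpoint, protection_for(endpoint));
    }
}
```

Keeping both decisions in one function also gives the doc comment a natural place to explain why each endpoint is or is not encrypted.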

This plugin generates credentials (keys and certificates) for
both the API proxy server (required for
kata-containers/kata-containers#9159 and
kata-containers/kata-containers#9752) and the workload owner.
This plugin also delivers the credentials to a sandbox (i.e.,
confidential PODs or VMs), specifically to the kata agent to
initiate the SplitAPI proxy server so that a workload owner
can communicate with the proxy server using a secure tunnel.

The IPv4 address, name, and the ID of the sandbox must be
provided in the query string to obtain the credential
resources from the kbs.

After receiving the credential request, the splitapi plugin
will create a key pair for the server and client and sign them
using the self-signed CA. The generated ca.crt, server.crt, and
server.key are stored in a directory specific to the sandbox
(the caller) and returned to the caller. In addition, ca.key,
client.key, and client.crt are also generated and stored to that
particular directory specific to the sandbox (i.e., caller).

During the credential generation, a sandbox directory mapper
creates a unique directory specific to the sandbox (i.e., the
caller). The mapper creates the unique directory using the
sandbox parameters passed in the query string. A mapping file is
also maintained to store the mapping between the sandbox name
and the unique directory created for the sandbox.

The splitapi plugin itself is not initialized by default. To
initialize it, you need to add 'splitapi' in the kbs-config.toml.

Signed-off-by: Salman Ahmed <[email protected]>
Updates include the following changes:
- removing the storage of credentials on the filesystem,
- serializing/deserializing the credentials dictionary as needed.
- configurable credentials/certificate details
- other coding related issues

Signed-off-by: Salman Ahmed <[email protected]>
Updates include the following changes:
- code refactor and clean up,
- generic PKI-type plugin,
- keys generated using Ed25519

Signed-off-by: Salman Ahmed <[email protected]>
Updates include the following changes:
- adding two endpoints (list_pods and get_client_credentials)
- documentation of the pki_vault plugin (kbs/docs/pki_vault.md)
- restructuring the pki_vault plugin code

Signed-off-by: Salman Ahmed <[email protected]>
Changes include the following functionalities:
- refactor the code for on-demand generation of credentials
- remove the stateful credentials support from the initial version
- update the readme documentation
- fix comments

Signed-off-by: Salman Ahmed <[email protected]>
```toml
[[plugins]]
name = "pki_vault"
plugin_dir = "/opt/confidential-containers/kbs/plugin/splitapi"
Member

nit: the dir name splitapi doesn't align with the intended functionality here.

# Design choices
There are three key design choices to consider when configuring the PKI Vault plugin. As of now, PKI Vault supports only the separate-CA-per-Pod approach with non-persistent credential storage.

1. **Separate CA per Pod VM**:
Member

With this model, mTLS between two CoCo pods can't be used, right?
