Skip to content

Conversation

@amd-aliem
Copy link
Contributor

@amd-aliem amd-aliem commented Jan 5, 2026

Implement a filesystem based cache option that enables specifying a directory which may be preloaded with certs in http-cache-reqwest format for offline trustee deployments.

Benefits of enabling the cache:

  • Allows normal function of trustee even if KDS is unreachable. This applies for completely air-gapped situations, as well as transient KDS or external network outages.
  • Improves timing performance of KDS requests, as the bottleneck is generally reaching out to KDS to fetch a certificate or verify that the certificate is still valid.

Drawbacks of enabling the cache:

  • There is no checking of certificate revocation lists. Certificate revocation events are generally very rare and certificates normally are valid for years - but be aware that revoked certificates must be manually ejected by the administrator.

Notes:

  • AMD Offline Certificate Cache is the main doc describing the steps to get an air-gapped deployment with the appropriate vceks
  • Tested KDS + filesystem store priorities by blocking traffic to my trustee container and/or removing certificates from filesystem. I saw it correctly falling back to secondary option if it fails in both cases.

@amd-aliem amd-aliem requested a review from a team as a code owner January 5, 2026 20:02
@amd-aliem amd-aliem force-pushed the amd-offline-cache branch 2 times, most recently from 3a94360 to dbec7ea Compare January 5, 2026 20:07
Copy link
Member

@fitzthum fitzthum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok. A few comments on the code.

Also tbh I don't really like the docs. They can be a lot more concise.

Copy link
Contributor

@mkulke mkulke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit puzzled why we want to mix a set of offline-available certificates with the concept of a cache, which in computers I usually associate with a place where we store things temporarily (so we can access them faster).

we are exposing the implementation details of a crate that we use for caching in our http client library to the user. if a) the crate authors decide to change their folder layout or b) we decide to use a different http client library with a different caching impl, we'd have to support/migrate/deprecate this functionality.

Can we have a configurable folder that is used for offline certificats and maybe a config options that prescribes how we want to deal with that "vcek_sources: [Offline, KDS]". The user would then be responsible to put their files into that folder.

If that's deemed too lo-fi and user-unfriendly then maybe this could be built on top of the storage abstraction that @Xynnn007 is currently refining. there could be an admin CRUD endpoint that handles certificates.

(this is not a hard veto, if others don't share my concern)

@amd-aliem
Copy link
Contributor Author

amd-aliem commented Jan 6, 2026

I'm still a bit puzzled why we want to mix a set of offline-available certificates with the concept of a cache, which in computers I usually associate with a place where we store things temporarily (so we can access them faster).

we are exposing the implementation details of a crate that we use for caching in our http client library to the user. if a) the crate authors decide to change their folder layout or b) we decide to use a different http client library with a different caching impl, we'd have to support/migrate/deprecate this functionality.

Can we have a configurable folder that is used for offline certificats and maybe a config options that prescribes how we want to deal with that "vcek_sources: [Offline, KDS]". The user would then be responsible to put their files into that folder.

If that's deemed too lo-fi and user-unfriendly then maybe this could be built on top of the storage abstraction that @Xynnn007 is currently refining. there could be an admin CRUD endpoint that handles certificates.

(this is not a hard veto, if others don't share my concern)

I think in the previous PR I thought it would be better shared since one solution could cover a wide variety of situations (temporary outages, performance, and offline situations). i.e. if KDS went down for a few hours (as it did a month ago), we'd get some resilience for free without having to actually load anything manually - it'd automatically fall back to the cached value.

However with how the current implementation shook out with hidden options I'm ok with implementing a logically separate store. I do like the idea of having a management API that will let users upload their own certificates. I would like to get this implemented soon though, so I'll follow this format first https://github.com/confidential-containers/trustee/blob/main/deps/verifier/src/csv/mod.rs#L261-L266 (which @Xynnn007 reference to me before) and add instructions on how to manually add them.

Regarding your config suggestion: "vcek_sources: [Offline, KDS]"
Envisioning what it will look like in the future if we add a proxy option:

        "snp_verifier": {
            "vcek_sources": {
               {
                   "type": "Offline",
                   "path": "/opt/confidential-containers/vcek-cache"
               },
               {
                   "type": "Proxy",
                   "url": "https://proxy-url"
               },
               {
                    "type": "KDS"
               }
            }
        }

or

        "snp_verifier": {
            "vcek_sources": {
               {
                   "type": "Offline",
                   "path": "/opt/confidential-containers/vcek-cache"
               },
               {
                   "type": "KDS",
                   "url": "https://proxy-url" // defaults to the KDS but can be set to proxy?
               }
            }
        }

Maybe it would be more straightforward as like an extensible struct:

        "snp_verifier": {
            "vcek_sources": {
               {
                   "kds-url": <defaults to the real KDS>?
                   "local-fs-path: <defaults to nothing, in which case skip checking?>
               }
            }
        }

@mkulke thanks!

@mkulke
Copy link
Contributor

mkulke commented Jan 6, 2026

@amd-aliem I think the 2nd option for configuration looks best. An array makes more sense than a map (option 3), because it can encode an order (priority), order in maps is not guaranteed.

@fitzthum
Copy link
Member

fitzthum commented Jan 7, 2026

As I mentioned on the other PR(s) I don't think the points against caching make much sense. That said, the vcek sources config looks fine and it does allow us to have multiple fallbacks. This is more complex, but the design seems fine. The AMD KDS should probably be a default option so that no extra configuration is required in basic cases.

@amd-aliem
Copy link
Contributor Author

@mkulke @fitzthum ready for re-review! Check Links is failing but I think it's a transient issue (I went to the link and it seems fine?), and not a link added by my PR.

Copy link
Contributor

@mkulke mkulke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. some nits about handling a default vcek directory

@amd-aliem amd-aliem requested a review from mkulke January 8, 2026 23:44
Copy link
Contributor

@mythi mythi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the current implementation matches with my earlier feedback too so +1. a couple of suggestions.

Comment on lines 290 to 302
#[derive(Clone, Debug, Default, Deserialize, PartialEq)]
pub struct SnpVerifierConfig {
#[serde(default)]
pub vcek_sources: Vec<VCEKSource>,
}

#[derive(Clone, Debug, Deserialize, PartialEq)]
#[serde(tag = "type")]
pub enum VCEKSource {
OfflineStore { path: Option<String> },
KDS { base_url: Option<String> },
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are rather generic configs that could serve other TEEs too (e.g., TDX we could map "enable cache", "collateral url" to what TDX uses). The question is do we want to keep the config snp-verifier specific for now or try to have a global config entry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps in the future the same struct definitions can be used within the implementation-specific configs, so they can reject unsupported options as needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe can move the VCEK enums to deps/verifier/src/lib.rs

Copy link
Contributor

@mkulke mkulke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for considering the suggestions, IMO it's fine to merge the change as is, we can internally refine the solution later to use a storage backend and consolidate it with other certs stores and collateral caches without breaking the config/user surface.

Copy link
Member

@fitzthum fitzthum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. A few small notes.

//{
// "type": "KDS",
//}
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this is now a default option, so maybe this comment should change a little bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworked wording about it - trying to convey the ordering rules (add KDS if you want a fallback, otherwise KDS will never be attempted).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless nothing is specified, then the KDS will be used, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct

const KDS_VCEK: &str = "/vcek/v1";

// KDS Offline Store
const KDS_OFFLINE_STORE_PATH: &str = "/opt/confidential-containers";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure this is the best path. maybe something a little more specific would be good. like opt/confidential-containers/attestation-service/kds-store. Also note that in other places we use a shared work-dir variable to keep all the paths together. We should use that here too. Also potentially could make this configurable but not really necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I can update the default to /opt/confidential-containers/attestation-service/kds-store. About using work_dir, do you mean accessing the attestation config's work_dir and using it as the base path for kds-store?

cat kbs/config/as-config.json
{
    "work_dir": "/opt/confidential-containers/attestation-service",
    "policy_engine": "opa",
    ....
    "verifier_config": {
        "snp_verifier": {
            	"vcek_sources": [
				{
            	    "type": "OfflineStore"
				}
            ]
        }
    }
}

I would maybe have to pass it into the snp_verifier when creating it 🤔

pub const DEFAULT_WORK_DIR: &str = "/opt/confidential-containers/attestation-service";
#[derive(Clone, Debug, Deserialize, PartialEq)]
pub struct Config {
/// The location for Attestation Service to store data.
#[serde(default = "default_work_dir")]
pub work_dir: PathBuf,

The OfflineStore type does allow a configurable path rn, which I think we'd expect to be an absolute path that doesn't necessarily point to something in the work_dir:

pub enum VCEKSource {
    OfflineStore { path: Option<String> },
    KDS { base_url: Option<String> },
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess relatively easy way to handle is to import the default work dir variable into this file and concat a path to the end of that. Not perfect but that's what we do in some other places. It's fine for the configurable path to be absolute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also does it make sense for kds-store to be under attestation-service 🤔
Full path is now:

/opt/confidential-containers/attestation-service/kds-store/vcek/{hwid}/vcek.crt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the verifiers are generally part of the attestation service. Although they're in a separate crate so you probably won't actually be able to import from the AS (without causing a circular dependency).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah having trouble importing. I remember someone had linked this: #992 that could help in the future. Perhaps I can just hardcode it right now until something like that is implemented? I see that's being done in some other RVPS code:

const FILE_PATH: &str = "/opt/confidential-containers/attestation-service/reference_values";

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah not ideal but not worth fixing in this PR I guess.

@amd-aliem amd-aliem force-pushed the amd-offline-cache branch 3 times, most recently from f679424 to fc10dcf Compare January 9, 2026 18:37
Copy link
Member

@fitzthum fitzthum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks @amd-aliem

@amd-aliem
Copy link
Contributor Author

I have a follow-up PR planned for detailing how to handle FW updates (additional section in doc + code to support multiple loaded certs per server), but yeah I'm good to merge if there are no outstanding comments - thanks for the reviews everyone! Really appreciate the patience & time spent :)

@amd-aliem amd-aliem marked this pull request as draft January 12, 2026 05:16
@amd-aliem amd-aliem marked this pull request as ready for review January 12, 2026 06:40
Add the verifier config for SNP to parse the vcek_sources option. Implement a
filesystem based cache option that enables specifying a directory that may be
preloaded with certs for offline trustee deployments.

Signed-off-by: Amanda Liem <[email protected]>

Update deps/verifier/src/snp/mod.rs
Apply suggestion from @mkulke

Co-authored-by: Magnus Kulke <[email protected]>
Signed-off-by: Amanda Liem <[email protected]>
@fitzthum fitzthum merged commit df6d2d6 into confidential-containers:main Jan 12, 2026
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants