@portersrc
Member

@portersrc portersrc commented Oct 23, 2024

RFC issue is here

This PR holds the main guest functionality for overlay network support with Nebula. The main additions are in confidential-data-hub/overlay-network.

Related PRs:

Additional support:

  • possible kata-agent support (no PR opened yet): passes the pod name to the CDH when initializing the VPN

@portersrc portersrc force-pushed the feature/encrypted-mesh branch 3 times, most recently from 00e1f81 to 6fb1d22 Compare October 28, 2024 19:59
@portersrc portersrc force-pushed the feature/encrypted-mesh branch 9 times, most recently from 60d552e to 8cdf322 Compare November 11, 2024 14:25
@portersrc portersrc force-pushed the feature/encrypted-mesh branch 4 times, most recently from 8062c00 to 2163ae5 Compare November 18, 2024 15:00
@portersrc portersrc marked this pull request as ready for review November 18, 2024 15:41
@portersrc portersrc requested a review from a team as a code owner November 18, 2024 15:41
Member

@fitzthum fitzthum left a comment

A few comments on the first half of the PR

@portersrc portersrc force-pushed the feature/encrypted-mesh branch 4 times, most recently from 0f1fdb4 to f425b00 Compare January 8, 2025 06:40
@portersrc portersrc force-pushed the feature/encrypted-mesh branch from f425b00 to 39ee471 Compare February 3, 2025 08:14
@portersrc
Member Author

portersrc commented Feb 3, 2025

Do we want a compile-time feature flag for the overlay-network AND a CdhConfig setting to enable it? We should definitely have the config setting. I'm wondering if the compile-time feature confuses the matter (though it has its merits).

@fitzthum
Member

fitzthum commented Feb 3, 2025

> Do we want a compile-time feature flag for the overlay-network AND a CdhConfig setting to enable it? We should definitely have the config setting. I'm wondering if the compile-time feature confuses the matter (though it has its merits).

We probably should have a feature so the code can be completely removed for trypophobes (or people who just want a really small guest TCB).
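A minimal sketch of how the two switches discussed above could coexist, assuming a cargo feature named `overlay_network` and a boolean `enable` field in the config (both names are illustrative assumptions, not taken from the PR). The `#[cfg]` attributes remove the overlay code path from the binary entirely when the feature is off; the config flag only matters when the feature is compiled in.

```rust
/// Runtime switch as it might appear in the CDH config (hypothetical shape).
#[derive(Debug, Default, Clone, PartialEq)]
pub struct OverlayNetworkConfig {
    pub enable: bool,
}

#[derive(Debug, PartialEq)]
pub enum OverlayDecision {
    /// Feature compiled out: the overlay code is not in the binary at all.
    NotCompiled,
    /// Feature compiled in, but the config leaves it off.
    Disabled,
    /// Feature compiled in and enabled in the config.
    Start,
}

/// Built only when the (assumed) `overlay_network` feature is enabled.
#[cfg(feature = "overlay_network")]
pub fn overlay_decision(config: &OverlayNetworkConfig) -> OverlayDecision {
    if config.enable {
        OverlayDecision::Start
    } else {
        OverlayDecision::Disabled
    }
}

/// Fallback built when the feature is disabled: the config value is ignored.
#[cfg(not(feature = "overlay_network"))]
pub fn overlay_decision(_config: &OverlayNetworkConfig) -> OverlayDecision {
    OverlayDecision::NotCompiled
}
```

With this layout, a small-TCB build simply omits the feature, while a full build can still leave the overlay network off at runtime through the config.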

@portersrc portersrc force-pushed the feature/encrypted-mesh branch from 39ee471 to 15daed7 Compare February 4, 2025 00:38
@portersrc portersrc force-pushed the feature/encrypted-mesh branch 2 times, most recently from eae3ed2 to 10013de Compare February 28, 2025 21:17
@portersrc portersrc force-pushed the feature/encrypted-mesh branch from 10013de to b576ee3 Compare February 28, 2025 23:08
@portersrc portersrc force-pushed the feature/encrypted-mesh branch from b576ee3 to 6c9dca2 Compare March 11, 2025 19:20
@portersrc portersrc force-pushed the feature/encrypted-mesh branch from 6c9dca2 to 53844ac Compare March 19, 2025 18:48
@portersrc portersrc force-pushed the feature/encrypted-mesh branch 6 times, most recently from 29ff8bd to 8ed80ac Compare April 11, 2025 20:17
@portersrc portersrc force-pushed the feature/encrypted-mesh branch 3 times, most recently from 3b925a1 to dc3ce4e Compare May 12, 2025 12:21
@portersrc portersrc force-pushed the feature/encrypted-mesh branch from dc3ce4e to f9f33de Compare May 22, 2025 08:54
pub image: ImageConfig,

#[serde(default)]
pub overlay_network: OverlayNetworkConfig,
Member

If you have this as pub overlay_network: Option<OverlayNetworkConfig> and the OverlayNetworkConfig struct as below, I think you would not need the enable flag, as Rust would require nebula to be defined whenever the section is present. Later, if someone adds support for another overlay network, we could put each one under its own cargo feature flag.

pub struct OverlayNetworkConfig {
    pub nebula: NebulaConfig,
}

With this I think you should be able to clean up multiple other places in this PR. E.g. you may not need the validate function and the overlay_network cargo feature flag.
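To illustrate the suggestion, here is a rough sketch in which the presence of the config section is itself the switch. The `lighthouse_uri` field and the `nebula_to_start` helper are hypothetical names added for illustration; the real `NebulaConfig` and `CdhConfig` in the PR have other fields and derive serde traits.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct NebulaConfig {
    /// Hypothetical field, for illustration only.
    pub lighthouse_uri: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct OverlayNetworkConfig {
    pub nebula: NebulaConfig,
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct CdhConfig {
    /// `None` means no overlay network was configured, so nothing is started.
    pub overlay_network: Option<OverlayNetworkConfig>,
}

/// Returns the Nebula settings to start with, or `None` when the section is absent.
pub fn nebula_to_start(config: &CdhConfig) -> Option<&NebulaConfig> {
    config.overlay_network.as_ref().map(|overlay| &overlay.nebula)
}
```

Because `nebula` is not optional inside `OverlayNetworkConfig`, a config that contains the section without Nebula settings cannot be constructed, which is what would make a separate `enable` flag and validation step redundant.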


const NEBULA_BIN: &str = "/opt/overlay-network/nebula";
const LIGHTHOUSE_PORT: u32 = 4242; // sync with WORKER_CONFIG_TEMPLATE
const CA_CERT_PATH: &str = "/tmp/nebula/ca.crt";
Member

I don't know if CDH is multi-threaded or if we will have support for multiple nebula VPNs, but if so it might be better to use tempdir_in rather than a fixed path under /tmp/nebula.

Member Author

Maybe we handle this if/when we try to tackle multi-VPN?
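For reference, the concern in this thread is that a fixed path such as /tmp/nebula/ca.crt would be shared by every Nebula instance. The sketch below shows the general idea with only the standard library: each instance gets its own directory under a common base. It is a stand-in, not the `tempfile` crate's `tempdir_in` (which also removes the directory on drop); `create_instance_dir` and the `nebula-<pid>-<n>` naming are assumptions made up for this example.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process;
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-wide counter so concurrent callers never pick the same name.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Creates a fresh directory under `base` and returns its path.
/// Unlike `tempfile::TempDir`, the caller is responsible for cleanup.
pub fn create_instance_dir(base: &Path) -> io::Result<PathBuf> {
    fs::create_dir_all(base)?;
    loop {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let dir = base.join(format!("nebula-{}-{}", process::id(), id));
        match fs::create_dir(&dir) {
            Ok(()) => return Ok(dir),
            // Left over from an earlier run with the same pid: try the next id.
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => continue,
            Err(e) => return Err(e),
        }
    }
}
```

Paths like the CA certificate would then be derived from the returned directory (for example `dir.join("ca.crt")`) instead of from a constant.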

let mesh_ip = self.generate_mesh_ip()?;
let overlay_netmask: Ipv4Addr = self.config.overlay_netmask.parse()?;
let prefix_len: u32 = self.netmask_to_prefix_len(overlay_netmask);
let neb_cred_uri: String = format!(
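The excerpt above converts the configured netmask into a prefix length. A minimal standalone sketch of such a conversion follows; it is not the PR's `netmask_to_prefix_len` (which is a method returning `u32`), and returning `Option<u32>` to reject non-contiguous masks is a choice made for this example.

```rust
use std::net::Ipv4Addr;

/// Converts a dotted-quad netmask into a CIDR prefix length.
/// Returns `None` for masks whose one-bits are not contiguous from the top,
/// such as 255.0.255.0.
pub fn netmask_to_prefix_len(netmask: Ipv4Addr) -> Option<u32> {
    let bits = u32::from(netmask);
    let prefix_len = bits.leading_ones();
    // A valid mask has all of its one-bits in the leading run.
    if bits.count_ones() == prefix_len {
        Some(prefix_len)
    } else {
        None
    }
}
```

For example, 255.255.255.0 maps to a prefix length of 24, while 255.0.255.0 is rejected.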
Member

@cclaudio cclaudio Jun 13, 2025

IP and pod-name are required, but the plugin also has some nice optional fields. It would be great if we could support them as well:
https://github.com/confidential-containers/trustee/blob/main/kbs/src/plugins/implementations/nebula_ca.rs#L55

Member

That makes me wonder whether we should add the nebula version somewhere so we can check for compatibility in the future.

Member Author

I remember you added these fields. I'm seeing:

    /// Optional: how long the cert should be valid for.
    /// The default is 1 second before the signing cert expires.
    /// Valid time units are seconds: "s", minutes: "m", hours: "h".
    duration: Option<String>,
    /// Optional: comma separated list of groups.
    groups: Option<String>,
    /// Optional: comma separated list of ipv4 address and network in CIDR notation.
    /// Subnets this cert can serve for
    subnets: Option<String>,

I'm wondering now if duration should even be something that the worker node requests. Should it instead be set up on the trustee side? For groups and subnets, I'm tempted to punt to a future PR. For example, I'm not clear on the constraints for the subnets option (and I haven't tested either of these features yet).
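If the optional fields are supported later, one way to forward them is to emit only the ones that are actually set. The sketch below assumes a hypothetical `CredentialOptions` struct on the CDH side that mirrors the three fields quoted above; how the resulting pairs are encoded into the credential request is left open, since that depends on the request format the plugin expects.

```rust
/// Hypothetical client-side mirror of the optional plugin fields quoted above.
#[derive(Debug, Default, Clone)]
pub struct CredentialOptions {
    pub duration: Option<String>,
    pub groups: Option<String>,
    pub subnets: Option<String>,
}

/// Collects the optional fields that are set, in a fixed order.
/// Fields left as `None` are omitted so the server-side defaults apply.
pub fn optional_params(opts: &CredentialOptions) -> Vec<(&'static str, String)> {
    let mut params = Vec::new();
    if let Some(duration) = &opts.duration {
        params.push(("duration", duration.clone()));
    }
    if let Some(groups) = &opts.groups {
        params.push(("groups", groups.clone()));
    }
    if let Some(subnets) = &opts.subnets {
        params.push(("subnets", subnets.clone()));
    }
    params
}
```

Leaving `duration` unset in such a struct would also fit the idea of letting the trustee side decide certificate lifetime.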

Member Author

@portersrc portersrc Jun 24, 2025

> That makes me wonder whether we should add the nebula version somewhere so we can check for compatibility in the future.

I also wonder if nebula does this as part of its own protocol.

message InitOverlayNetworkResponse {
    int32 return_code = 1;
}

Member

I think we probably want to enable this entirely via the cdh config / init-data rather than through the kata agent. You may still want to get the pod name from the agent, though.

Member Author

@portersrc portersrc Jun 24, 2025

What should actually kick off the initialization of the overlay network now, though? The CDH is a driver that registers services to listen for requests. For example, it might receive a request to pull an image. Or, in the case of the overlay network, it would receive a request from the kata-agent to start. If the kata-agent isn't doing that, what should? In other words, after we register the service here, who actually makes this API call?

@portersrc portersrc force-pushed the feature/encrypted-mesh branch from f9f33de to 2c8bb5f Compare June 24, 2025 02:28
@portersrc portersrc force-pushed the feature/encrypted-mesh branch from 2c8bb5f to c2550fb Compare June 24, 2025 02:32
3 participants