Moves nodes to private subnets. #3004

dcmcand · 2025-03-28T14:06:48Z

Reference Issues or PRs

What does this implement/fix?

Moves nodes to private subnets and removes the autoassign public IP option.

Currently, our nodes are placed in public subnets with a public IP assigned by default. This is a security vulnerability that gives us no benefit whatsoever. The new setup places all nodes in a private subnet while keeping load balancers in public subnets. This still allows public access to Nebari, but the nodes themselves will no longer be reachable over the public internet.

The following illustration is from the AWS documentation (https://docs.aws.amazon.com/eks/latest/best-practices/subnets.html) and shows the new setup. Note that this is the recommended setup for EKS on AWS.

Put an x in the boxes that apply

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds a feature)
Breaking change (fix or feature that would cause existing features not to work as expected)
Documentation Update
Code style update (formatting, renaming)
Refactoring (no functional changes, no API changes)
Build related changes
Other (please describe):

Testing

Did you test the pull request locally?
Did you add new tests?

How to test this PR?

Deploy Nebari to AWS and, in the AWS console, validate that the nodes are located in private subnets. Then go through the testing checklist to validate that all functionality is unchanged.
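As a complement to the console check, the public/private distinction can be sketched in code. A subnet is generally treated as public when its route table has a route to an internet gateway. The Python sketch below classifies route tables shaped like the EC2 DescribeRouteTables response; the function names, subnet IDs, and sample data are hypothetical and not part of this PR. It also only looks at explicit subnet associations, so subnets that fall back to the VPC's main route table are not covered.

```python
from typing import Any

RouteTable = dict[str, Any]


def has_internet_gateway_route(route_table: RouteTable) -> bool:
    """Return True if any route in the table targets an internet gateway (igw-...)."""
    return any(
        route.get("GatewayId", "").startswith("igw-")
        for route in route_table.get("Routes", [])
    )


def public_subnet_ids(route_tables: list[RouteTable]) -> set[str]:
    """Collect subnets explicitly associated with a route table that routes to an IGW."""
    public: set[str] = set()
    for table in route_tables:
        if has_internet_gateway_route(table):
            for association in table.get("Associations", []):
                if "SubnetId" in association:
                    public.add(association["SubnetId"])
    return public


# Hypothetical sample data mimicking the shape of a DescribeRouteTables response.
SAMPLE_ROUTE_TABLES = [
    {
        "Routes": [
            {"DestinationCidrBlock": "10.10.0.0/16", "GatewayId": "local"},
            {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-0example"},
        ],
        "Associations": [{"SubnetId": "subnet-public-a"}],
    },
    {
        "Routes": [
            {"DestinationCidrBlock": "10.10.0.0/16", "GatewayId": "local"},
            {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0example"},
        ],
        "Associations": [{"SubnetId": "subnet-private-a"}],
    },
]

# Hypothetical: the subnets the node groups were found in.
node_subnets = {"subnet-private-a"}
nodes_in_public = node_subnets & public_subnet_ids(SAMPLE_ROUTE_TABLES)
print("node subnets that are public:", sorted(nodes_in_public))
```

An empty result means none of the node subnets route directly to an internet gateway, which is the outcome this PR is aiming for.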

Any other comments?

NOTE: This will likely result in issues with the general node not restarting if it ends up in a different AZ from its EBS volume. This is a known issue and needs to be addressed by changing our storage setup.

marcelovilla · 2025-03-31T16:39:48Z

NOTE: This will likely result in issues with the general node not restarting if it ends up in a different AZ from its EBS volume. This is a known issue and needs to be addressed by changing our storage setup.

This issue is outlined in #3008. @dcmcand, @viniciusdc, and I had a discussion regarding this limitation and decided that we'll try to first address #3008 before merging this PR.

dcmcand · 2025-04-01T11:14:54Z

Do not merge until #3008 is fixed as this will cause difficulties with upgrades.

satra · 2025-07-04T18:12:46Z

src/_nebari/stages/infrastructure/template/aws/modules/network/variables.tf

  description = "VPC cidr number of bits to support 2^N subnets"
  type        = number
-  default     = 2
+  default     = 2 # allows 4 /18 subnets with 16382 addresses each


Suggested change

-  default     = 2 # allows 4 /18 subnets with 16382 addresses each
+  default     = 3 # allows 8 /19 subnets with 8190 addresses each

I needed this for my use case with 3 subnets specified.
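To make the subnet arithmetic concrete, here is a small Python sketch using the standard-library ipaddress module. It assumes the VPC CIDR is a /16 (10.10.0.0/16 is used purely for illustration; the "4 /18 subnets" comment implies a /16 base) and that the variable is the number of bits added to the VPC prefix. The carve_subnets helper is hypothetical. Each extra bit doubles the subnet count and halves the subnet size.

```python
import ipaddress


def carve_subnets(vpc_cidr: str, newbits: int) -> list[ipaddress.IPv4Network]:
    """Split a VPC CIDR into 2**newbits equally sized subnets."""
    network = ipaddress.ip_network(vpc_cidr)
    return list(network.subnets(prefixlen_diff=newbits))


VPC_CIDR = "10.10.0.0/16"  # assumption: illustrative /16 VPC block

for newbits in (2, 3):
    subnets = carve_subnets(VPC_CIDR, newbits)
    first = subnets[0]
    # Subtract network and broadcast addresses, as the code comment does.
    # AWS itself reserves five addresses per subnet, so the real figure is lower.
    usable = first.num_addresses - 2
    print(f"newbits={newbits}: {len(subnets)} subnets of /{first.prefixlen}, "
          f"{usable} addresses each, first is {first}")
```

Under these assumptions, 2 bits yields four /18 subnets and 3 bits yields eight /19 subnets, which is why a deployment needing more than four subnets requires the larger value.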


viniciusdc · 2025-10-02T15:14:53Z

We need the upgrade path in the next release (a follow-up release); this is the last remaining bit to get this going.


asmacdo · 2025-10-06T20:00:19Z

I don't think I can help much with providing an upgrade path, but I did want to give my feedback after using this for a while in production. We eventually dropped this change from our deployment due to the high cost of NAT Gateway usage for moving data around.

When we moved back from private to public subnets, our upgrade path was a little bit awkward. It's the reverse of what an upgrade path for this PR would be, so just in case it's helpful, here's how I managed to move from private to public subnets.

Nebari hangs on the deletion of a public subnet, so I manually deleted the Elastic Load Balancer and Nebari proceeded.
Similarly, Nebari tries to delete the private subnets, but these depend on the EKS cluster, which also had to be terminated manually.
The first run of the deployment eventually fails with a 404 for GET /auth/admin/realms/nebari/default-groups (IIUC, this is caused by the removal of the EKS cluster invalidating the Keycloak state, but tofu doesn't know that).
In the Keycloak configuration stage, run tofu state rm keycloak_default_groups.default and then redeploy.

This upgrade path does lose state, though: I had to restore the Keycloak and conda-store state from backups.
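The state-removal step above can be sketched as a small Python wrapper. Only the tofu state rm keycloak_default_groups.default command comes from the steps above; the helper names and the stage_dir argument are hypothetical, and the directory holding the Keycloak configuration stage's state is an assumption you would need to fill in for your deployment. By default the sketch only returns the command line and runs nothing.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Optional


def state_rm_command(address: str) -> list[str]:
    """Build the OpenTofu command that drops one resource address from state."""
    return ["tofu", "state", "rm", address]


def forget_resource(address: str, stage_dir: Optional[Path] = None) -> str:
    """Return the command line; execute it only when a stage directory is given.

    stage_dir is hypothetical: it should point at the working directory of the
    Keycloak configuration stage, wherever that lives in your deployment.
    """
    command = state_rm_command(address)
    if stage_dir is not None:
        # check=True raises CalledProcessError if tofu exits non-zero.
        subprocess.run(command, cwd=stage_dir, check=True)
    return shlex.join(command)


# Dry run: show what would be executed without touching any state.
print(forget_resource("keycloak_default_groups.default"))
```

Removing a resource from state does not delete anything remotely; it only makes tofu forget the resource so the next deploy can recreate it, which matches the redeploy step described above.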

add private subnets

ae94451

github-project-automation bot added this to 🪴 Nebari Project Management Mar 28, 2025

github-project-automation bot moved this to New 🚦 in 🪴 Nebari Project Management Mar 28, 2025

dcmcand added the provider: AWS, impact: high 🟥, area: security 🔐, and area: networking labels Mar 28, 2025

dcmcand moved this from New 🚦 to In review/QA 👀 in 🪴 Nebari Project Management Mar 28, 2025

dcmcand added 2 commits March 28, 2025 17:15

add moved blocks to keep from recreating subnets

c5cbd84

fix tests

d1f3595

dcmcand added the needs: review 👀, DO-NOT-MERGE, and status: in review 👀 labels Apr 1, 2025

dcmcand marked this pull request as ready for review April 1, 2025 11:14

dcmcand requested a review from a team as a code owner April 1, 2025 11:14

dcmcand requested review from marcelovilla and viniciusdc and removed request for a team April 1, 2025 11:14

satra reviewed Jul 4, 2025

satra mentioned this pull request Jul 5, 2025

404 on reaching self-registration page nebari-dev/nebari-self-registration#25

Open

asmacdo reviewed Jul 18, 2025

src/_nebari/stages/infrastructure/template/aws/modules/network/main.tf (outdated)

asmacdo mentioned this pull request Jul 31, 2025

Dandihub requirements: meta issue #3107

Open

asmacdo mentioned this pull request Aug 12, 2025

Set eks_endpoint_access to immutable on upgrade to allow merge of #3004 #3114

Open

Update src/_nebari/stages/infrastructure/template/aws/modules/network/main.tf

Co-authored-by: Austin Macdonald <[email protected]>

bc95660

yarikoptic pushed a commit to asmacdo/nebari that referenced this pull request Oct 2, 2025

fix: allow for more than 4 subnets

e3597ed

Annotation for our "manual patch queue": commit done manually based on @satra's comment at nebari-dev#3004 (comment)
