Add DVT "selections" endpoints to VC #4248
Comments
Hey guys, love this in-depth issue and happy to add a bit of colour in a few places. The only major point to flag is:
The full, combined signature, not the partial, should be used as the selection_proof. Other than that, I would like to give a bit of history of the original spec, and why I think an easily replaceable, read-only middleware based approach is both safer and better for Ethereum's neutrality.
So this is true, and I acknowledge there are others in the space who would like to muddy the waters around whether something is canonical or not. I would like to add a bit of color as to where this design came from, why I think it is important for client teams, and why it does not unfairly favor any one DV implementation over others.
Collin and Mara have been working with the EF on DVs since before Devcon 5. They formed the original trustless validators working group, and later, when Obol was started, we co-funded a grant with the EF to develop the original spec with Consensys. The main issue with it, in my eyes, was that we assumed changing the base protocol was unviable/out of scope. That excluded a middleware approach, as it was not possible for a middleware without private key signing power to determine if it was an aggregator. I was uncomfortable with this restriction, as it would have meant that Obol either 1) needed to build a custom validator client, which we would have to convince operators was superior to existing VCs; or 2) would have to convince every client team to support a single DV protocol, and would have to get the protocol near perfect first try, or face a very slow iteration cycle if we had to get full client team consensus for every breaking change.
To expand on 1): I was reluctant to go down the path of making a custom VC. I think the separation of concerns between coming to consensus on what to sign (DVT) and the code that checks slashing conditions and controls the private keys is important. We saw other DV teams that did not think this was important slash 20+ testnet validators due to a bug in their VC. Worse, if their supply chain were compromised and malicious code were in charge, every validator could be slashed, not just a subset.
Designing your architecture assuming you will be compromised is important, and that's why I was so insistent on a read-only middleware, which, if compromised, will pose a liveness risk, not a safety risk, to its validators. (And those liveness risks we intend to further mitigate with multiple interoperable middleware implementations.) This decision, I believe, has made Charon and Obol more trustable for node operators to consider running versus a version requiring private key access.
And to expand on 2): this middleware-based approach to distributed validators gives new optionality to client teams. If Obol becomes a bad actor, or stops maintaining its client, or simply gets out-competed, everyone can swap in a new/faster/better middleware without any significant changes to their codebase. If we went the route of a canonical single spec that all client teams build into their VCs directly, a new, better protocol might never get enough support/adoption. Or, if it does, we might fragment the client teams into supporting different variations and end up in an xkcd standards problem.
I raised this with Danny and Ben E last June, and asked if they saw this problem space the same way I did, and whether they had any ideas for how we could minimally change the base spec to enable middleware DVs. They were both very supportive of enabling DVs to run as replaceable middlewares, and with their help, we first made this proposal on ethresear.ch, refined it, proved it works by building it ourselves with our own (unreleased, dev-only) VC, and ultimately got it approved and included in the spec. (59:40 for Danny calling this out explicitly) So to conclude, this was longer than I intended, but I thought it would help provide color on how and why we ended up enabling middleware DVs, why they are a good idea, and why this does not overly favour a single team/solution. :) |
Thanks for this! I fixed this error in the issue description 🙏 |
Personally I'm generally not a fan of middleware for two reasons:
Originally mev-boost was going to be a BN/EL middleware, but we successfully avoided that by making it a "sidecar" instead. This protocol has many different properties though, so I don't think a straight analogue between the two is fair or useful. However, just because I generally don't like middleware, it doesn't mean that it's always bad. It could very well be a good solution in this case, especially considering how long it might take to specify and implement an alternative. Although I don't love middleware, I'm still of the mind that this change is small and safe enough to implement. @OisinKyne if there are going to be several implementations of these endpoints in VCs, it might be worth providing some sort of "mock" middleware that client teams can use for testing. If client teams don't need to implement the server side of these endpoints, then each team is going to have to build its own type of testing/mocking to test them. Deduplicating work is always nice, plus it might give you some sort of conformance testing framework for deciding which VCs you can recommend. |
Hi everyone, I wanted to jump into the discussion to give my 2 Satoshi. My team and I developed the first proof-of-concept DVT as middleware in 2020 together with the EF, the same one that @OisinKyne later joined in testing in its second iteration, after we invited outsiders to test it out :). Full disclosure: I'm not against this PR, competition is good for DVT. Having said that, I do feel the backstory of why the remote signer approach was adopted is important. Also, the framing of this being a "need" for DVT is a bit misleading, as it's unclear that it is. For now, this is an Obol-specific PR for Charon, which currently has a commercial use license. Even if others wanted to use the middleware approach, they would have to develop it from scratch (as they can't use the Charon code), which might result in a whole different set of API requirements (and additional PRs asking clients for changes). Middleware was abandoned in favor of a remote signer approach for several reasons:
Thinking long term here is crucial as DVT gets into its mainnet phase. |
Feedback received, thanks Paul! We will see what we can do on this front, potentially we can PR something into Hive in the near term. And to refute some points and address some inaccuracies in Alon's statement about 'abandoning' a middleware because I can't not:
This model is running just fine on mainnet and at scale on testnet. Of course every team can tweak and optimize some performance parameters, but I would like to hear you expand on what other unilateral change is needed, because I do not believe there is one.
Yes that is the point of separating concerns into coordination software and signing software. Charon is the coordination software and does not need to be near the private keys.
There are more validator implementations out there than remote signers. Also, I believe all validator clients support one or more remote signers; this is a false either/or. Also, of the three main remote signer implementations out there, one recently caused an outage, and another caused a slashing.
Here are some reasons it's valuable:
VCs and BNs communicate over a succinct and standardised API. There are three+ implementations of key manager API interfaces.
Ethereum validators have been intended to be MPC friendly since the earliest of serenity designs, with work from V, Dankrad, Justin Drake, Carl Beek and more, on how to design a validator interface that is simple and MPC-compatible. This is not a foot in the door, this is "hey we've gone and built what you planned for and the only snag is this piece left, can you help us finish it?".
Yes this can happen, but as above, the plan since adopting BLS signatures has been for Ethereum validators to be MPC-friendly and communicate over a simple API. This also happens at the remote signer layer, they have to pass pre-signatures amongst themselves, to assemble an aggregate, to conclude what to sign next. Pre-signatures present complexity at any layer, and determinism and avoiding multi-round complexity where possible leads to simpler designs for client teams of all types to implement.
Adopting simple, standard APIs; separating 'what to sign' from 'power to sign it', and favoring optionality instead of replacement software is how we stay out of an engineering nightmare in future. An SSV client extracts Lighthouse from the most important piece of the entire puzzle, and puts an internet-connected piece of software in there instead. If the 'distributed' part of 'distributed validators' is not something that every validator can easily opt into (by using the api they already support), then we will either get stuck on unanimous support of a single protocol that cannot mature/iterate, or we fragment into different VCs speaking different protocols. |
Still claiming this approach is overly complicated, creates a dependency between 5 implementation teams and 1 DVT team, will require more and more maintenance and changes down the line (for a commercially licensed project), and really doesn't help with the above points, since it "tells" the VC what it wants and it can cause issues if it grows. |
I agree with all the points mentioned here. This issue is relatively old and there's no clear path forward for the general case. For some immediate changes, I've made #4867. |
Overview
In ethereum/beacon-APIs#224, there were two new endpoints added to the Beacon APIs:
/eth/v1/validator/beacon_committee_selections
/eth/v1/validator/sync_committee_selections
These two endpoints are not expected to be implemented by client BNs. Rather, these endpoints are expected to be implemented by "DVT (distributed validator technology) middleware" that sits between the BN and VC. Whilst no one is asking us to implement this for the LH BN, Obol are asking that we implement it in the LH VC.
For clarity, here's a simple illustration:
These endpoints solve the issue of aggregation selection for attestations and sync aggregates for distributed validators. As you know, a validator is selected to be an aggregator based on the output of their signature of the current slot. The issue with DVT is that distributed validators only have a cryptographic partial of the full keypair that is known to the Beacon Chain. Unfortunately, DVT validators cannot independently sign with the full keypair (only the partial keypair) and therefore cannot determine if their validator is an aggregator or not.
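To make the dependency on the full signature concrete, here is a minimal sketch of the attestation aggregator check, modelled on the `is_aggregator` helper in the consensus specs. It assumes the mainnet value `TARGET_AGGREGATORS_PER_COMMITTEE = 16` and takes the committee length directly instead of reading it from beacon state; the sync committee variant uses a different modulo and is not shown.

```python
import hashlib

# Assumed mainnet preset value from the consensus specs.
TARGET_AGGREGATORS_PER_COMMITTEE = 16


def is_aggregator(committee_len: int, slot_signature: bytes) -> bool:
    """Return True if the signature over the slot selects this validator.

    `slot_signature` must be the signature made with the full validator key.
    A partial signature from a DVT key share hashes to an unrelated value,
    so evaluating this on a share gives a meaningless answer.
    """
    modulo = max(1, committee_len // TARGET_AGGREGATORS_PER_COMMITTEE)
    digest = hashlib.sha256(slot_signature).digest()
    # First 8 bytes of the hash, read as a little-endian integer.
    return int.from_bytes(digest[:8], "little") % modulo == 0
```

Because the result is a function of the hash of the signature bytes, there is no way to derive the answer for the combined signature from a single share.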
The new endpoints solve this by giving the VC a new endpoint to call which will resolve its partial signature over the slot into a full signature over the slot.
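As a rough illustration of the exchange, the sketch below builds the JSON bodies a VC might POST and parses the reply. The field names (`validator_index`, `slot`, `subcommittee_index`, `selection_proof`) and the string-encoded integers follow my reading of ethereum/beacon-APIs#224 and should be checked against the spec; the helper names are hypothetical and no HTTP call is made.

```python
import json
from typing import Dict, List, Tuple

BEACON_COMMITTEE_SELECTIONS = "/eth/v1/validator/beacon_committee_selections"
SYNC_COMMITTEE_SELECTIONS = "/eth/v1/validator/sync_committee_selections"


def beacon_selection(validator_index: int, slot: int, partial_proof: str) -> Dict[str, str]:
    """One request entry: a partial selection proof (0x-hex) for a slot."""
    return {
        "validator_index": str(validator_index),
        "slot": str(slot),
        "selection_proof": partial_proof,
    }


def sync_selection(validator_index: int, slot: int, subcommittee_index: int,
                   partial_proof: str) -> Dict[str, str]:
    """Sync committee entries also name the subcommittee being signed for."""
    entry = beacon_selection(validator_index, slot, partial_proof)
    entry["subcommittee_index"] = str(subcommittee_index)
    return entry


def request_body(entries: List[Dict[str, str]]) -> str:
    """The request is assumed to be a bare JSON array of entries."""
    return json.dumps(entries)


def parse_beacon_response(body: str) -> Dict[Tuple[int, int], str]:
    """Map (validator_index, slot) to the combined selection proof."""
    data = json.loads(body)["data"]
    return {(int(e["validator_index"]), int(e["slot"])): e["selection_proof"]
            for e in data}
```

The reply is assumed to echo the same entries inside a `data` array, with each `selection_proof` replaced by the combined signature.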
There's more info in this Google doc from Obol.
Implementation
To implement this in Lighthouse, we'd have to make some modifications to the VC. Firstly, we'd need a flag to enable/disable this behavior (perhaps something like --use-dvt-selections-endpoints or a more concise alternative). Then, we'd need to add additional logic when we're determining aggregate duties to ensure that:
selection_proof field.
Anyone implementing this should give the Obol Google doc a read. Obol has clearly put effort into thinking about what the implementation looks like for client teams.
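One possible shape for the flag-gated logic is sketched below. It is not Lighthouse code (Lighthouse is written in Rust) and not taken from the Obol doc: `Duty`, `finalise_selection_proofs`, and the `exchange` callback are hypothetical names, and `exchange` stands in for the POST to the selections endpoint. Skipping duties that the middleware does not answer for is also an assumption.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, int]  # (validator_index, slot)


@dataclass(frozen=True)
class Duty:
    validator_index: int
    slot: int
    selection_proof: bytes  # partial proof when running behind DVT middleware


def finalise_selection_proofs(
    duties: List[Duty],
    use_dvt_selections: bool,
    exchange: Callable[[Dict[Key, bytes]], Dict[Key, bytes]],
) -> List[Duty]:
    """Return duties whose selection_proof is safe to use for aggregation."""
    if not use_dvt_selections:
        # Ordinary validator: the locally produced proof is already the full one.
        return list(duties)
    partials = {(d.validator_index, d.slot): d.selection_proof for d in duties}
    combined = exchange(partials)
    finalised = []
    for duty in duties:
        key = (duty.validator_index, duty.slot)
        if key in combined:
            # Use the combined signature, never the partial, as selection_proof.
            finalised.append(replace(duty, selection_proof=combined[key]))
        # Assumption: duties without a combined proof are skipped.
    return finalised
```

The combined proof is then what would be fed into the aggregator check and placed in the `selection_proof` field, in line with the correction raised in the comments.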
Additional Considerations
As far as I'm aware, this functionality is exclusive to Obol. I'm aware that we need to be careful about favoring any particular entity. However I'm also aware that sometimes a single entity will "pave roads" and I don't think we want to get in the way of that.
In the Obol Google doc there are two suggestions that I'd personally avoid implementing:
A --distributed flag to enable/disable the selections endpoints. Although it's deliciously simple, I fear it is too generic and doesn't give room for competing systems.
Auto-enabling based on the /eth/v1/node/version output. I'd be open to a more generic method of auto-identification, however I'm not super keen on maintaining a special list of software that is allowed to have auto-enabling.
I've seen a comment that this solution goes against the EF's "latest DVT solution". I've also heard similar sentiment from SSV Network, a competitor of Obol who have been pioneering in this space for quite some time. That being said, it looks like ethereum/distributed-validator-specs hasn't had a change to the primary branch since October 2022 (7 months ago). Given that (a) the EF DVT solution smells a little stale and (b) adding the selections endpoints seems safe enough and doesn't exclude an EF/coordinated DVT solution, I'm tempted to give little weight to the argument.
I'm going to share this issue with a few people so that we can get some feedback. Given that DVT generally aligns with our ethos, I think there's a fairly strong case as to why we should implement this. However, I'm particularly interested in the arguments as to why we should not implement this.