-
Notifications
You must be signed in to change notification settings - Fork 76
Add the responsible program's account index and inner instruction index to each InstructionError
#74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This is a bit of a spitball pr. Things I'm thinking about:
|
270d3c8
to
d7fbd88
Compare
transaction-error/src/lib.rs
Outdated
/// third element indicates the address of the program that raised the error, if applicable; the | ||
/// error could after all have been raised during a cross-program invocation (ie. in an inner | ||
/// instruction). | ||
InstructionError(u8, InstructionError, Option<Pubkey>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We won't have a program address in cases where the error is InstructionError::UnsupportedProgramId
, so this is made an Option
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate what cases UnsupportedProgramId
is for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, for sure. It's thrown, for instance, here:
ie. if you tried to issue an instruction without providing the address of a program.
The larger point here is that there exist InstructionErrors
that are more like ‘there was a structural problem with this instruction’ rather than ‘a program died and here's why.’ The latter has a program address associated with it while the former may not.
btw this is a breaking change, but we're already due a bump to 3.0 because of #46 |
transaction-error/src/lib.rs
Outdated
/// third element indicates the address of the program that raised the error, if applicable; the | ||
/// error could after all have been raised during a cross-program invocation (ie. in an inner | ||
/// instruction). | ||
InstructionError(u8, InstructionError, Option<Pubkey>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding Option<Pubkey>
here means TransactionError
is now 64 bytes instead of 32 bytes. I'm not sure how often we move/clone these on the agave side, but this could have some small perf implications. I'm aware of at least a few places we clone these values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ this is not a comment on why we shouldn't do this - just a note that we should be aware of this size change, and be more aware of where we're cloning these in our processing pipeline so we can avoid performance hits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels bad. I should probably endeavour to store just the index of the program account in the accounts array, and then make the downstream thing that really wants to know what the program was look it up.
I'll rifle through the Agave code and see if that's tractable (ie. the instruction will need to be available wherever something needs know what program address was involved).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the PR! This is now implemented as a u8 that points to the transaction-level account index of the program responsible for the error.
So, here's what I'm thinking. There are three ways to approach this. Option 1 – Add program address to
|
So good news, I went through all of the To go with that, the I'm down to go with the option 1, but using a It'll be a bit of a slog to get through all the changes, but it should be mostly straightforward. Happy to hear other ideas though! [1] if I'm incorrect, then we should fix the usage of that particular error variant rather than torpedo the design |
I started then scrapped a PR that did option 1 and I seem to remember it being really hard to obtain the program address in places. Here's I can try again, but a few on spec:
The big problem is here, because it takes in a dynamic error after long having forgotten what program is responsible for the error: (link) So basically you have to follow |
That one looks at the owner of the first program account in the transaction context, so we could probably use that, right? https://github.com/anza-xyz/agave/blob/ccbf3f25f332cf4bfa4f1a9bf27db8d3333b3064/program-runtime/src/invoke_context.rs#L523
If they do, then that's an issue with their usage, which should be fixable.
That plumbing does look a bit involved, but should end up similar to any big refactor. |
@joncinque @steveluscher Hey guys, sorry to come in here late! Another major issue I see with obtaining the program ID for an inner instruction from the transaction accounts is that CPI callees will eventually no longer be in that list at all. If a program decides to hard-code Rather than plumbing callee IDs all the way from SVM, what about just enabling inner instructions recording on Bank all of the time, and walking that payload to grab the program ID? The array of inner instructions comes back empty when there's an error, but we can just change that behavior. |
Oh, I like this. Also, we wouldn't have to enable CPI logging all the time; I think we could lazily walk the Something like… let status = status.map(|_| ()).map_err(|e| {
// Walk the `TransactionContext` to figure out which program was responsible for `e`.
}); |
d7fbd88
to
f47932c
Compare
f47932c
to
4d9c1e8
Compare
transaction-error/src/lib.rs
Outdated
/// the index of the outer instruction in which the error occurred, and the third the account | ||
/// index of the program responsible for the error (ie. the error may have originated from an | ||
/// inner instruction). The account index of the responsible program may be `None` for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this approach actually clears up much ambiguity. There could be multiple inner instructions that invoke the same program. In order to pinpoint the failing inner instruction you would still need to rely on the inner ix metadata that we already provide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Though the original goal was to be able to correlate what program Custom(###)
pertains to so that you can properly decode and handle ###
, we could also add the path to the actual inner instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I don't really follow your response. Is the "original goal" still the current goal? When you say "we could also add the path to the actual inner instruction" is that rhetorical? If this is a "good catch," does this warrant any change with your current approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it correct to say that you're just trying to map a custom program error code to the actual program that returned it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're just mapping custom error codes, do we need to add the tx account index for the failing program to all instruction error variants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're just mapping custom error codes…
We're not. That was the original impetus for this change, but the exploration led to the insight that most InstructionErrors
would benefit from knowing what program caused them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is a "good catch," does this warrant any change with your current approach?
Yep! It warrants me adding the inner instruction index so that you can – for instance – tell that a InsufficientFundsForRent
came from instruction 1.4 as opposed to 1.2, even though 1.2 and 1.4 have the same program address.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
821245c
to
d2fd373
Compare
I think I'm a bit more up to speed on the evolution of the problem we're trying to solve here. I think it's a very good idea to add an indication that a given instruction error was actually caused by an inner instruction if that was the case. I'm not super excited about making a breaking change to the Also, I think we're now trying to solve more than just the problem of knowing whether an error came from an inner instruction. We're trying to know which inner program produced the error and also where in the call tree the error was produced (partly my fault we went down that path due to this thread: #74 (comment)) I think that it's ok to force users to debug errors by first fetching tx metadata for now. They need metadata to resolve ALT's, to have full context of the call tree leading to the failure, as well as any pertinent logs. I think later we could separately add a better way to get rich error context from the SVM which includes the full call stack in errors so that local development doesn't require metadata fetching. For differentiating top level vs inner instruction errors, I think I'm most in favor of option 3 where we add a new transaction error variant. As discussed in the PR, we can just use the program account index instead of including the full pubkey, so something like: enum TransactionError {
..
InnerInstructionError(InstructionError),
} And then have the user figure out where in the call tree this happened from the tx metadata. I acknowledge this might be too minimalist, adding top level tx index and the account index of the program invoked in the inner instruction could be nice too. It might also be worth exploring whether adding a new enum InstructionError {
..
InnerInstructionFailure(InnerInstructionError)
}
enum InnerInstructionError {
// subset of InstructionError variants?
} |
transaction-error/src/lib.rs
Outdated
/// (ie. the error may have originated from an inner instruction). The inner instruction index | ||
/// may be `None` if the error originated from the top-level program call. The account index of | ||
/// the responsible program may be `None` for transactions created before it was introduced. | ||
InstructionError(u8, InstructionError, Option<u8>, Option<u8>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to make a backwards incompatible change, might as well start naming these fields. Pretty easy to misuse this tuple as proposed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I had to look back at this PR a few times while reviewing the other one to remember which field was which
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not end up doing this, because we'd have to write a custom serde
serializer to serialize the new struct variant with named fields in the old format the RPC requires.
What I did do, however, is to make type aliases, which gives you a bit of guidance when you're constructing one of these things.

There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not end up doing this, because we'd have to write a custom
serde
serializer to serialize the new struct variant with named fields in the old format the RPC requires.
this shouldn't be too bad to do, just a bit verbose.
transaction-error/src/lib.rs
Outdated
/// An error occurred while processing an instruction. The first element of the tuple indicates | ||
/// the index of the outer instruction in which the error occurred, the third the index of the | ||
/// inner instruction if the program responsible for the error was called by cross-program | ||
/// invocation (CPI), and the fourth the account index of the program responsible for the error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The program invoked by the inner instruction might be loaded from an ALT so downstream users may still need to fetch metadata to get the offending program id fyi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh. This really sucks. I'll have to think about this for a hot second. Ideally we'd just store the program address, but that would increase the size of stored errors quite a bit, which we should try to avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we were just dealing in indices all over the place? Why do we need the ALT if we know the index? Are you guys talking about the eventual end-user on the client side having to resolve a program ID from the index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah when I said downstream users I meant the the client side end user experience
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of comments, looks good otherwise
transaction-error/src/lib.rs
Outdated
/// (ie. the error may have originated from an inner instruction). The inner instruction index | ||
/// may be `None` if the error originated from the top-level program call. The account index of | ||
/// the responsible program may be `None` for transactions created before it was introduced. | ||
InstructionError(u8, InstructionError, Option<u8>, Option<u8>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I had to look back at this PR a few times while reviewing the other one to remember which field was which
transaction-error/src/lib.rs
Outdated
// NOTE: We intentionally do not augment the error message in the event that the error | ||
// carries the index of the inner instruction or the account index of the responsible | ||
// program. While it would add value to the log, to do so now would also break any log | ||
// parser that presumed the log format to be immutable for all time (eg. | ||
// https://tinyurl.com/3uuczr68). | ||
=> write!(f, "Error processing Instruction {index}: {err}"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's fine for now, people can also use the Debug
formatting to get every piece of data.
We could consider creating a new enum variant, but then the UMI parser would be totally useless, which is even worse in my opinion.
d2fd373
to
4e4a04a
Compare
InstructionError
InstructionError
Thanks for the detailed thoughts, @jstarry.
tx metadata is, unfortunately, insufficient. Consider these three transactions.
The inner instruction trace will look the same in all of these cases, making it impossible to discern where the error actually came from.
Guh, this sucks. In some cases you'll have the ALT information locally (eg. you used
This is underway in https://github.com/anza-xyz/agave/pull/6083/commits |
Ah yeah thanks for pointing that out, definitely insufficient to just look at metadata as it is right now, you would need to know the stack height of the last instruction which was running. You could probably parse the logs for this but that's not a great solution. I guess my suggestion with going with option 3 that you listed earlier would be contingent on adding the stack height to the new error variant (something like |
transaction-error/src/lib.rs
Outdated
InstructionError( | ||
OuterInstructionIndex, | ||
InstructionError, | ||
Option<ResponsibleProgramAccountIndex>, | ||
Option<InnerInstructionIndex>, | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The aliasing helps read this file, but anywhere you access these values in the runtime it's still going to be a tuple-index access, which means maintainers would have to go back to this file to double-check which index is what.
Personally I would rather see a struct here.
transaction-error/src/lib.rs
Outdated
// NOTE: We intentionally do not augment the error message in the event that the error | ||
// carries the account index of the responsible program. While it would add value to | ||
// the log, but to do so at this point would also break any log parser that presumes a | ||
// stable log format (eg. https://tinyurl.com/3uuczr68). | ||
=> write!(f, "Error processing Instruction {idx}: {err}"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense! I'm pretty sure if we did change this, it would break consensus.
4e4a04a
to
ec46e8b
Compare
ec46e8b
to
6a11e9b
Compare
…tructionError` and the inner instruction index that points to its location in the outer instruction This will help app developers correlate an error apparent to the program from which it originated in cases where the instruction index alone is insufficient to do so (eg. when the program that caused the error is in an inner instruction / CPI) Addresses: anza-xyz/agave#5152
6a11e9b
to
7040146
Compare
OK, this puppy is ready to land, and is the prerequisite/companion to anza-xyz/agave#6083.
|
Problem
Consider a transaction error that originates from a cross-program invocation (ie. an inner instruction). Currently
TransactionError
returns you the index of the outer instruction, which in no way helps you to correlate the content of the error message with the actual program from whence it came.Summary of changes
Added the transaction-level account index of the erroring program to the
TransactionError::InstructionError
variant, which will help consumers correlate instruction errors with their actual source.The account index of the program is designed to be from the perspective of the transaction and not the perspective of the instruction to make this change safe from SIMD-163.
Addresses anza-xyz/agave#5152 and anza-xyz/kit#149.
Blocks anza-xyz/agave#6083.