Add the responsible program's account index and inner instruction index to each `InstructionError` #74

steveluscher · 2025-03-06T21:44:36Z

Problem

Consider a transaction error that originates from a cross-program invocation (ie. an inner instruction). Currently, TransactionError gives you only the index of the outer instruction, which does nothing to help you correlate the content of the error message with the program it actually came from.

Summary of changes

Added the inner instruction index and the transaction-level account index of the erroring program to the TransactionError::InstructionError variant, which will help consumers correlate instruction errors with their actual source.

The program's account index is taken from the perspective of the transaction rather than that of the instruction, which makes this change safe from SIMD-163.

Addresses anza-xyz/agave#5152 and anza-xyz/kit#149.
Blocks anza-xyz/agave#6083.

steveluscher · 2025-03-06T21:47:30Z

This is a bit of a spitball PR. Things I'm thinking about:

This will require a major version bump, because Rust. It will probably require a major version bump in a lot of downstream packages too. Perhaps all software ever written.
I imagine that this will increase the storage cost of transactions by the 32 bytes of the pubkey. Maybe this is fine, but perhaps there is an alternative, like storing the index of the program in the accounts array and then reconstructing the program address later (ie. in the RPC method that loads and vends the InstructionError)?
We'll have to teach the program runtime to include the program address when it throws InstructionErrors.

steveluscher · 2025-03-06T22:42:28Z

transaction-error/src/lib.rs

+    /// third element indicates the address of the program that raised the error, if applicable; the
+    /// error could after all have been raised during a cross-program invocation (ie. in an inner
+    /// instruction).
+    InstructionError(u8, InstructionError, Option<Pubkey>),


We won't have a program address in cases where the error is InstructionError::UnsupportedProgramId, so this is made an Option.

Can you elaborate what cases UnsupportedProgramId is for?

Yeah, for sure. It's thrown, for instance, here:

https://github.com/anza-xyz/agave/blob/1bcf743f445270407bccf55681f74e06ef8b9a48/program-runtime/src/invoke_context.rs#L250-L252

ie. if you tried to issue an instruction without providing the address of a program.

The larger point here is that there exist InstructionErrors that are more like ‘there was a structural problem with this instruction’ rather than ‘a program died and here's why.’ The latter has a program address associated with it while the former may not.

kevinheavey · 2025-03-07T10:51:06Z

btw this is a breaking change, but we're already due a bump to 3.0 because of #46

apfitzge · 2025-03-07T14:20:21Z

transaction-error/src/lib.rs

+    /// third element indicates the address of the program that raised the error, if applicable; the
+    /// error could after all have been raised during a cross-program invocation (ie. in an inner
+    /// instruction).
+    InstructionError(u8, InstructionError, Option<Pubkey>),


adding Option<Pubkey> here means TransactionError is now 64 bytes instead of 32 bytes. I'm not sure how often we move/clone these on the agave side, but this could have some small perf implications. I'm aware of at least a few places we clone these values.

^ this is not a comment on why we shouldn't do this - just a note that we should be aware of this size change, and be more aware of where we're cloning these in our processing pipeline so we can avoid performance hits

Feels bad. I should probably endeavour to store just the index of the program account in the accounts array, and then make the downstream thing that really wants to know what the program was look it up.

I'll rifle through the Agave code and see if that's tractable (ie. the instruction will need to be available wherever something needs to know what program address was involved).

Updated the PR! This is now implemented as a u8 that points to the transaction-level account index of the program responsible for the error.
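To put rough numbers on the size trade-off raised above, here is a sketch that measures simplified stand-in enums with `std::mem::size_of`. `StubInstructionError`, `WithAddress`, `WithIndex`, and `layout_sizes` are hypothetical and much smaller than the real `TransactionError`, and Rust does not guarantee enum layouts, so the measured sizes are illustrative only. The point is that an `Option<u8>` index costs about 2 bytes where an `Option<Pubkey>` costs 33 or more.

```rust
use std::mem::size_of;

// Hypothetical stand-ins; these are NOT the real solana-transaction-error types.
type Pubkey = [u8; 32];

#[derive(Clone, Debug)]
enum StubInstructionError {
    GenericError,
    Custom(u32),
}

// Variant carrying the full address of the responsible program.
#[derive(Clone, Debug)]
enum WithAddress {
    InstructionError(u8, StubInstructionError, Option<Pubkey>),
    AccountInUse,
}

// Variant carrying only the program's transaction-level account index.
#[derive(Clone, Debug)]
enum WithIndex {
    InstructionError(u8, StubInstructionError, Option<u8>),
    AccountInUse,
}

/// Returns (label, size in bytes) for each layout on the current target.
fn layout_sizes() -> Vec<(&'static str, usize)> {
    vec![
        ("Option<Pubkey>", size_of::<Option<Pubkey>>()),
        ("Option<u8>", size_of::<Option<u8>>()),
        ("WithAddress", size_of::<WithAddress>()),
        ("WithIndex", size_of::<WithIndex>()),
    ]
}
```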

steveluscher · 2025-03-07T22:21:11Z

So, here's what I'm thinking. There are three ways to approach this.

Option 1 – Add program address to `TransactionError::InstructionError`

- TransactionError::InstructionError(u8, InstructionError)
+ TransactionError::InstructionError(u8, InstructionError, Option<Pubkey>)

the address (or address index) will have to be Option, because not all InstructionErrors are related to a program
breaks people who construct TransactionError::InstructionError (ie. the SVM)
does not break people who construct InstructionError

Option 2 – Add program address to every `InstructionError` that is related to a program

  ArithmeticOverflow,  // Leave as is; not related to a program
- InstructionError::CallDepth,
+ InstructionError::CallDepth(Pubkey)

does not break people who construct TransactionError::InstructionError (ie. the SVM)
breaks people who construct InstructionError

Note

There still might exist some InstructionErrors that sometimes have programs related to them and other times do not. These would, unfortunately for complexity, still need to be Option<Pubkey>.

Option 3 – Create a `TransactionError::InstructionErrorCausedByProgram` type

  InstructionError(u8, InstructionError),
+
+ InstructionErrorCausedByProgram(u8, InstructionError, Pubkey)

no need for making the address (or address index) Option; just return the relevant TransactionError::InstructionError
would need to make a separate enum to represent the union of both, or use Either (ie. functions would have to have return types like Result<(), Either<InstructionError, InstructionErrorCausedByProgram>>)
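As a sketch of Option 3, a dedicated union enum can stand in for `Either`: lower layers report whether a program was responsible, and the outer instruction index is attached only at the transaction level. Apart from the two `TransactionError` variants listed above, every name here (`ProcessingError`, `into_transaction_error`) is hypothetical, and `InstructionError` is a two-variant stand-in.

```rust
// Hypothetical two-variant stand-in for the real InstructionError.
#[derive(Debug, Clone, PartialEq)]
enum InstructionError {
    UnsupportedProgramId,
    Custom(u32),
}

type Pubkey = [u8; 32];

// Option 3: a separate variant when a program is known to be responsible,
// so the address never needs to be wrapped in an Option.
#[derive(Debug, Clone, PartialEq)]
enum TransactionError {
    InstructionError(u8, InstructionError),
    InstructionErrorCausedByProgram(u8, InstructionError, Pubkey),
}

// Hypothetical "union of both", returned by lower layers instead of Either.
#[derive(Debug, Clone, PartialEq)]
enum ProcessingError {
    Structural(InstructionError),
    CausedByProgram(InstructionError, Pubkey),
}

impl ProcessingError {
    // Attach the outer instruction index once we are back at the transaction level.
    fn into_transaction_error(self, outer_index: u8) -> TransactionError {
        match self {
            ProcessingError::Structural(err) => {
                TransactionError::InstructionError(outer_index, err)
            }
            ProcessingError::CausedByProgram(err, program) => {
                TransactionError::InstructionErrorCausedByProgram(outer_index, err, program)
            }
        }
    }
}
```

The cost of this design is visible in the `match`: every consumer that cares about instruction errors now has two variants to handle instead of one.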

joncinque · 2025-03-17T23:37:42Z

So good news, I went through all of the InstructionError variants, and they pretty much all pertain to a certain program, or preparing the execution of a certain program. [1]

To go with that, the TransactionError::InstructionError variant uses a u8 for the index of the top-level instruction in the transaction, which means that there's already some notion of "this error requires some greater context to truly understand".

I'm down to go with the option 1, but using a u8 for the index of the account in the transaction, as mentioned earlier. Looking at the storage protos for transaction errors, I think we should just be able to add a new field for the originating account: https://github.com/anza-xyz/agave/blob/bc09ffa335d9773fd6c4b354e61c44b8fc36724a/storage-proto/proto/transaction_by_addr.proto#L21

It'll be a bit of a slog to get through all the changes, but it should be mostly straightforward. Happy to hear other ideas though!

[1] if I'm incorrect, then we should fix the usage of that particular error variant rather than torpedo the design

steveluscher · 2025-03-18T18:45:20Z

…pretty much all pertain to a certain program, or preparing the execution of a certain program

I started, then scrapped, a PR that did option 1, and I seem to remember it being really hard to obtain the program address in places. Here's the cargo check output after adding a u8 to TransactionError::InstructionError (gist).

I can try again, but here are a few speculative concerns:

UnsupportedProgramId doesn't, by its very nature. (link)
Do MissingAccount and NotEnoughAccountKeys ever happen before we know what the program address is? (link, link)

The big problem is here, because it takes in a dynamic error after long having forgotten what program is responsible for the error: (link)

So basically you have to follow process_precompile and process_instruction all the way down, and make sure that they throw the responsible program address up through the Result, which means now you're not throwing an InstructionError there, you're throwing either an InstructionErrorWithResponsibleProgram or a (InstructionError, u8). I found this to get really hairy. (link)
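The plumbing described above boils down to attaching the program's index at the one layer that still knows which program it invoked, then carrying the pair upward through every `Result`. A minimal sketch; `run_program`, `dispatch_instruction`, and `ErrorWithResponsibleProgram` are hypothetical stand-ins, not Agave's `process_instruction` or `process_precompile`.

```rust
// Hypothetical stand-in for the real InstructionError.
#[derive(Debug, Clone, PartialEq)]
enum InstructionError {
    MissingAccount,
    Custom(u32),
}

/// An instruction error tagged with the transaction-level account index of
/// the program that was executing when it was raised (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct ErrorWithResponsibleProgram {
    err: InstructionError,
    program_account_index: u8,
}

// Stand-in for a program entrypoint: it returns a bare InstructionError and
// knows nothing about its own account index.
fn run_program(data: &[u8]) -> Result<(), InstructionError> {
    match data.first() {
        Some(0) => Ok(()),
        Some(code) => Err(InstructionError::Custom(u32::from(*code))),
        None => Err(InstructionError::MissingAccount),
    }
}

// Stand-in for the dispatch layer: it still knows which program it invoked,
// so it tags the error here, before the information is lost further up.
fn dispatch_instruction(
    program_account_index: u8,
    data: &[u8],
) -> Result<(), ErrorWithResponsibleProgram> {
    run_program(data).map_err(|err| ErrorWithResponsibleProgram {
        err,
        program_account_index,
    })
}
```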

joncinque · 2025-03-18T19:10:29Z

UnsupportedProgramId doesn't, by its very nature. (link)

That one looks at the owner of the first program account in the transaction context, so we could probably use that, right? https://github.com/anza-xyz/agave/blob/ccbf3f25f332cf4bfa4f1a9bf27db8d3333b3064/program-runtime/src/invoke_context.rs#L523

Do MissingAccount and NotEnoughAccountKeys ever happen before we know what the program address is? (link, link)

If they do, then that's an issue with their usage, which should be fixable.

The big problem is here, because it takes in a dynamic error after long having forgotten what program is responsible for the error: (link)

That plumbing does look a bit involved, but should end up similar to any big refactor.

buffalojoec · 2025-04-07T15:10:25Z

@joncinque @steveluscher Hey guys, sorry to come in here late!

Another major issue I see with obtaining the program ID for an inner instruction from the transaction accounts is that CPI callees will eventually no longer be in that list at all.
https://github.com/solana-foundation/solana-improvement-documents/blob/main/proposals/0163-lift-cpi-caller-restriction.md

If a program decides to hard-code TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA and CPI to it, the RPC isn't going to know about it without parsing either the transaction logs or the inner instructions payload.

Rather than plumbing callee IDs all the way from SVM, what about just enabling inner instructions recording on Bank all of the time, and walking that payload to grab the program ID? The array of inner instructions comes back empty when there's an error, but we can just change that behavior.

steveluscher · 2025-04-29T16:32:12Z

Rather than plumbing callee IDs all the way from SVM, what about just enabling inner instructions recording on Bank all of the time, and walking that payload to grab the program ID?

Oh, I like this. Also, we wouldn't have to enable CPI logging all the time; I think we could lazily walk the TransactionContext to figure out what the last instruction in the trace was before it died. I think all that's involved is to see where the logs ended, to know in what program the error originated. I'll give this a shot.

Something like…

let status = status.map(|_| ()).map_err(|e| {
    // Walk the `TransactionContext` to figure out which program was responsible for `e`.
});

jstarry · 2025-05-05T01:38:32Z

transaction-error/src/lib.rs

+    /// the index of the outer instruction in which the error occurred, and the third the account
+    /// index of the program responsible for the error (ie. the error may have originated from an
+    /// inner instruction). The account index of the responsible program may be `None` for


I'm not sure this approach actually clears up much ambiguity. There could be multiple inner instructions that invoke the same program. In order to pinpoint the failing inner instruction you would still need to rely on the inner ix metadata that we already provide.

Good catch! Though the original goal was to be able to correlate what program Custom(###) pertains to so that you can properly decode and handle ###, we could also add the path to the actual inner instruction.

Sorry I don't really follow your response. Is the "original goal" still the current goal? When you say "we could also add the path to the actual inner instruction" is that rhetorical? If this is a "good catch," does this warrant any change with your current approach?

Is it correct to say that you're just trying to map a custom program error code to the actual program that returned it?

If we're just mapping custom error codes, do we need to add the tx account index for the failing program to all instruction error variants?

If we're just mapping custom error codes…

We're not. That was the original impetus for this change, but the exploration led to the insight that most InstructionErrors would benefit from knowing what program caused them.

If this is a "good catch," does this warrant any change with your current approach?

Yep! It warrants me adding the inner instruction index so that you can – for instance – tell that an InsufficientFundsForRent came from instruction 1.4 as opposed to 1.2, even though 1.2 and 1.4 have the same program address.

I added the index of the inner instruction here in the type, and here in the implementation, @jstarry!
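Assuming the inner instruction index is a zero-based position within the inner instructions recorded under the outer instruction (an assumption for this sketch), a client could render the error location as "outer.inner" and look up the program behind it. `InnerInstruction`, `InnerInstructionsForOuter`, `error_path`, and `inner_program_index` are hypothetical simplifications of transaction metadata. Two inner instructions can resolve to the same program index while still having distinct paths.

```rust
/// Hypothetical view of one recorded inner instruction: only the program's
/// account index is kept.
#[derive(Debug, Clone, Copy, PartialEq)]
struct InnerInstruction {
    program_id_index: u8,
}

/// Hypothetical: the inner instructions recorded under one outer instruction.
struct InnerInstructionsForOuter {
    outer_index: u8,
    instructions: Vec<InnerInstruction>,
}

/// Renders an error location as "outer" or "outer.inner".
fn error_path(outer: u8, inner: Option<u8>) -> String {
    match inner {
        Some(inner) => format!("{outer}.{inner}"),
        None => outer.to_string(),
    }
}

/// Looks up the program account index of the inner instruction at position
/// `inner` (assumed zero-based) under outer instruction `outer`.
fn inner_program_index(
    recorded: &[InnerInstructionsForOuter],
    outer: u8,
    inner: u8,
) -> Option<u8> {
    recorded
        .iter()
        .find(|set| set.outer_index == outer)?
        .instructions
        .get(usize::from(inner))
        .map(|ix| ix.program_id_index)
}
```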

jstarry · 2025-05-07T06:12:31Z

I think I'm a bit more up to speed on the evolution of the problem we're trying to solve here. I think it's a very good idea to add an indication that a given instruction error was actually caused by an inner instruction if that was the case. I'm not super excited about making a breaking change to the InstructionError variant to solve that problem.

Also, I think we're now trying to solve more than just the problem of knowing whether an error came from an inner instruction. We're trying to know which inner program produced the error and also where in the call tree the error was produced (partly my fault we went down that path due to this thread: #74 (comment))

I think that it's ok to force users to debug errors by first fetching tx metadata for now. They need metadata to resolve ALT's, to have full context of the call tree leading to the failure, as well as any pertinent logs. I think later we could separately add a better way to get rich error context from the SVM which includes the full call stack in errors so that local development doesn't require metadata fetching.

For differentiating top-level vs inner instruction errors, I think I'm most in favor of option 3 where we add a new transaction error variant. As discussed in the PR, we can just use the program account index instead of including the full pubkey, so something like:

enum TransactionError {
  // ...existing variants...
  InnerInstructionError(InstructionError),
}

And then have the user figure out where in the call tree this happened from the tx metadata. I acknowledge this might be too minimalist, adding top level tx index and the account index of the program invoked in the inner instruction could be nice too.

It might also be worth exploring whether adding a new InstructionError variant would be less disruptive to users. We wouldn't want crazy recursive nesting but something like this could work:

enum InstructionError {
  // ...existing variants...
  InnerInstructionFailure(InnerInstructionError)
}

enum InnerInstructionError {
  // subset of InstructionError variants?
}

jstarry · 2025-05-07T05:11:17Z

transaction-error/src/lib.rs

+    /// (ie. the error may have originated from an inner instruction). The inner instruction index
+    /// may be `None` if the error originated from the top-level program call. The account index of
+    /// the  responsible program may be `None` for transactions created before it was introduced.
+    InstructionError(u8, InstructionError, Option<u8>, Option<u8>),


If we're going to make a backwards-incompatible change, we might as well start naming these fields. It's pretty easy to misuse this tuple as proposed.

Agreed, I had to look back at this PR a few times while reviewing the other one to remember which field was which

I did not end up doing this, because we'd have to write a custom serde serializer to serialize the new struct variant with named fields in the old format the RPC requires.

What I did do, however, is to make type aliases, which gives you a bit of guidance when you're constructing one of these things.

I did not end up doing this, because we'd have to write a custom serde serializer to serialize the new struct variant with named fields in the old format the RPC requires.

this shouldn't be too bad to do, just a bit verbose.
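For illustration, here is a sketch of a named-field variant that still maps onto a positional tuple, which is one shape a hand-written serde `Serialize`/`Deserialize` impl could emit and accept. The serde wiring is omitted to keep the block dependency-free; `to_positional`, `from_positional`, and the field names are hypothetical, and the tuple order follows the aliased revision quoted later in this thread (outer index, error, responsible program account index, inner instruction index).

```rust
// Hypothetical stand-in for the real InstructionError.
#[derive(Debug, Clone, PartialEq)]
enum InstructionError {
    Custom(u32),
}

// Hypothetical named-field version of the variant under review.
#[derive(Debug, Clone, PartialEq)]
enum TransactionError {
    InstructionError {
        outer_instruction_index: u8,
        err: InstructionError,
        responsible_program_account_index: Option<u8>,
        inner_instruction_index: Option<u8>,
    },
}

/// Positional shape: (outer index, error, program account index, inner index).
type PositionalInstructionError = (u8, InstructionError, Option<u8>, Option<u8>);

impl TransactionError {
    /// What a custom serializer could write, so the wire shape stays positional.
    fn to_positional(&self) -> PositionalInstructionError {
        match self {
            TransactionError::InstructionError {
                outer_instruction_index,
                err,
                responsible_program_account_index,
                inner_instruction_index,
            } => (
                *outer_instruction_index,
                err.clone(),
                *responsible_program_account_index,
                *inner_instruction_index,
            ),
        }
    }

    /// What a custom deserializer could build from the positional shape.
    fn from_positional((outer, err, program, inner): PositionalInstructionError) -> Self {
        TransactionError::InstructionError {
            outer_instruction_index: outer,
            err,
            responsible_program_account_index: program,
            inner_instruction_index: inner,
        }
    }
}
```

Inside the runtime, code would then read `responsible_program_account_index` by name rather than `.2`, while the conversion functions are the only place that needs to know the field order.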

jstarry · 2025-05-07T05:12:53Z

transaction-error/src/lib.rs

+    /// An error occurred while processing an instruction. The first element of the tuple indicates
+    /// the index of the outer instruction in which the error occurred, the third the index of the
+    /// inner instruction if the program responsible for the error was called by cross-program
+    /// invocation (CPI), and the fourth the account index of the program responsible for the error


The program invoked by the inner instruction might be loaded from an ALT so downstream users may still need to fetch metadata to get the offending program id fyi

Ugh. This really sucks. I'll have to think about this for a hot second. Ideally we'd just store the program address, but that would increase the size of stored errors quite a bit, which we should try to avoid.

I thought we were just dealing in indices all over the place? Why do we need the ALT if we know the index? Are you guys talking about the eventual end-user on the client side having to resolve a program ID from the index?

Yeah, when I said downstream users I meant the client-side end-user experience.
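For client-side consumers, turning the responsible program's account index back into an address means rebuilding the transaction's full account list. For a v0 transaction that list is the static account keys followed by the addresses loaded from lookup tables (writable first, then read-only), and the loaded addresses are only available from transaction metadata or from whoever built the transaction. A sketch, with a hypothetical `ResolvedAccounts` type and `Pubkey` modeled as raw bytes:

```rust
type Pubkey = [u8; 32];

/// Hypothetical client-side view of a v0 transaction's accounts.
struct ResolvedAccounts<'a> {
    static_keys: &'a [Pubkey],
    loaded_writable: &'a [Pubkey],
    loaded_readonly: &'a [Pubkey],
}

impl<'a> ResolvedAccounts<'a> {
    /// Resolves a transaction-level account index: static keys first, then
    /// ALT-loaded writable addresses, then ALT-loaded read-only addresses.
    fn get(&self, index: u8) -> Option<&'a Pubkey> {
        self.static_keys
            .iter()
            .chain(self.loaded_writable.iter())
            .chain(self.loaded_readonly.iter())
            .nth(usize::from(index))
    }
}
```

Without the loaded addresses, an index past the static keys simply cannot be resolved, which is the gap described above.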

joncinque

Just a couple of comments, looks good otherwise

joncinque · 2025-05-08T15:29:25Z

transaction-error/src/lib.rs

+    /// (ie. the error may have originated from an inner instruction). The inner instruction index
+    /// may be `None` if the error originated from the top-level program call. The account index of
+    /// the  responsible program may be `None` for transactions created before it was introduced.
+    InstructionError(u8, InstructionError, Option<u8>, Option<u8>),


Agreed, I had to look back at this PR a few times while reviewing the other one to remember which field was which

joncinque · 2025-05-08T15:32:43Z

transaction-error/src/lib.rs

+             // NOTE: We intentionally do not augment the error message in the event that the error
+             // carries the index of the inner instruction or the account index of the responsible
+             // program. While it would add value to the log, to do so now would also break any log
+             // parser that presumed the log format to be immutable for all time (eg.
+             // https://tinyurl.com/3uuczr68).
+             =>  write!(f, "Error processing Instruction {index}: {err}"),


I think that's fine for now, people can also use the Debug formatting to get every piece of data.

We could consider creating a new enum variant, but then the UMI parser would be totally useless, which is even worse in my opinion.
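A sketch of the split described here: `Display` keeps the established "Error processing Instruction {index}: {err}" line and ignores the new fields, while the derived `Debug` output exposes everything. `InstructionError` is a one-variant stand-in and its message text is illustrative, not copied from the real crate.

```rust
use std::fmt;

// Hypothetical one-variant stand-in for the real InstructionError.
#[derive(Debug)]
enum InstructionError {
    Custom(u32),
}

impl fmt::Display for InstructionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            InstructionError::Custom(code) => write!(f, "custom program error: {code:#x}"),
        }
    }
}

// Stand-in with two trailing optional indices, as in the variant under review.
#[derive(Debug)]
enum TransactionError {
    InstructionError(u8, InstructionError, Option<u8>, Option<u8>),
}

impl fmt::Display for TransactionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // The extra indices are deliberately left out so existing log
            // parsers keep matching this line.
            TransactionError::InstructionError(index, err, _, _) => {
                write!(f, "Error processing Instruction {index}: {err}")
            }
        }
    }
}
```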

steveluscher · 2025-05-16T05:54:26Z

Thanks for the detailed thoughts, @jstarry.

I think that it's ok to force users to debug errors by first fetching tx metadata for now.

tx metadata is, unfortunately, insufficient.

Consider these three transactions.

-> Program A
- -> Program B
  - -> Program C
  - <- Program C (ERROR)
- <- Program B
<- Program A

-> Program A
- -> Program B
  - -> Program C
  - <- Program C
- <- Program B (ERROR)
<- Program A

-> Program A
- -> Program B
  - -> Program C
  - <- Program C
- <- Program B
<- Program A (ERROR)

The inner instruction trace will look the same in all of these cases, making it impossible to discern where the error actually came from.

They need metadata to resolve ALT's

Guh, this sucks. In some cases you'll have the ALT information locally (eg. you used @solana/kit to create the transaction, and the error happened in simulation) but not in all cases for sure. I'll have to consult with @steviez to see if we can stand another 32 bytes of context data in stored errors or not, because the ideal thing would be to just bake the address itself into the error, rather than the index of the program account.

I think later we could separately add a better way to get rich error context from the SVM…

This is underway in https://github.com/anza-xyz/agave/pull/6083/commits

jstarry · 2025-05-16T11:14:16Z

The inner instruction trace will look the same in all of these cases, making it impossible to discern where the error actually came from.

Ah yeah thanks for pointing that out, definitely insufficient to just look at metadata as it is right now, you would need to know the stack height of the last instruction which was running. You could probably parse the logs for this but that's not a great solution.

I guess my suggestion with going with option 3 that you listed earlier would be contingent on adding the stack height to the new error variant (something like InnerInstructionError { stack_height: u8, err: InstructionError}). If we had that, then we could actually be ok with having users fetch metadata for now.
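Here is a rough sketch of why a stack height resolves the ambiguity in the three traces above. The trace records A (height 1), B (height 2) and C (height 3) identically in every case; only the height at which the error was raised differs. Given that height, the failing instruction is the most recent trace entry at that height, because any later entry at the same height could only have started after the failing frame returned. `TraceEntry` and `failing_instruction` are hypothetical and do not reflect Agave's actual trace types.

```rust
/// Hypothetical, simplified instruction-trace entry, recorded in invocation order.
#[derive(Debug, Clone, Copy, PartialEq)]
struct TraceEntry {
    program: char,
    stack_height: u8, // 1 = top-level instruction
}

/// Returns the instruction that raised the error: the most recent trace
/// entry recorded at the stack height where the error occurred.
fn failing_instruction(trace: &[TraceEntry], error_stack_height: u8) -> Option<TraceEntry> {
    trace
        .iter()
        .rev()
        .find(|entry| entry.stack_height == error_stack_height)
        .copied()
}
```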

buffalojoec · 2025-05-17T04:14:03Z

transaction-error/src/lib.rs

+    InstructionError(
+        OuterInstructionIndex,
+        InstructionError,
+        Option<ResponsibleProgramAccountIndex>,
+        Option<InnerInstructionIndex>,
+    ),


The aliasing helps read this file, but anywhere you access these values in the runtime it's still going to be a tuple-index access, which means maintainers would have to go back to this file to double-check which index is what.

Personally I would rather see a struct here.

buffalojoec · 2025-05-17T04:15:05Z

transaction-error/src/lib.rs

+             // NOTE: We intentionally do not augment the error message in the event that the error
+             // carries the account index of the responsible program. While it would add value to
+             // the log, but to do so at this point would also break any log parser that presumes a
+             // stable log format (eg. https://tinyurl.com/3uuczr68).
+             =>  write!(f, "Error processing Instruction {idx}: {err}"),


Makes sense! I'm pretty sure if we did change this, it would break consensus.

…tructionError` and the inner instruction index that points to its location in the outer instruction

This will help app developers correlate an error apparent to the program from which it originated in cases where the instruction index alone is insufficient to do so (eg. when the program that caused the error is in an inner instruction / CPI)

Addresses: anza-xyz/agave#5152

steveluscher · 2025-05-29T05:59:20Z

OK, this puppy is ready to land, and is the prerequisite/companion to anza-xyz/agave#6083.

We're going with the strategy of breaking the enum type and bumping the major version of solana-transaction-error.
Backward compatibility with old data and old clients has been achieved by airgapping the TransactionError::InstructionError type from all of the places in Agave where the old serialization and storage formats must be maintained. See the introduction of StoredTransactionError and UiTransactionError in anza-xyz/agave#6083 ("Add the inner instruction index and transaction-level account index of erroring programs to TransactionError::InstructionError"), the custom serializer/deserializer implemented on each, and the extensive tests.

steveluscher requested a review from a team as a code owner March 6, 2025 21:44

steveluscher force-pushed the add_program_address_to_instruction_error branch 4 times, most recently from 270d3c8 to d7fbd88 March 6, 2025 22:42

steveluscher commented Mar 6, 2025

apfitzge reviewed Mar 7, 2025

steveluscher mentioned this pull request Mar 7, 2025

Add program_address to the TransactionError type anza-xyz/agave#5152 (open)

joncinque mentioned this pull request Mar 24, 2025

Tracking issue for upcoming breaking changes #84 (open)

joncinque added the breaking (PR contains breaking changes) label Mar 31, 2025

steveluscher force-pushed the add_program_address_to_instruction_error branch from d7fbd88 to f47932c May 3, 2025 00:36

steveluscher mentioned this pull request May 3, 2025

Add the inner instruction index and transaction-level account index of erroring programs to TransactionError::InstructionError anza-xyz/agave#6083 (draft)

steveluscher requested a review from apfitzge May 3, 2025 01:04

steveluscher force-pushed the add_program_address_to_instruction_error branch from f47932c to 4d9c1e8 May 3, 2025 01:05

jstarry reviewed May 5, 2025

steveluscher force-pushed the add_program_address_to_instruction_error branch from 821245c to d2fd373 May 5, 2025 22:31

joncinque self-requested a review May 6, 2025 20:11

jstarry reviewed May 7, 2025

joncinque reviewed May 8, 2025

steveluscher force-pushed the add_program_address_to_instruction_error branch from d2fd373 to 4e4a04a May 16, 2025 05:44

steveluscher changed the title from "Add the affected program address to each InstructionError" to "Add the responsible program's account index and inner instruction index to each InstructionError" May 16, 2025

buffalojoec reviewed May 17, 2025

steveluscher force-pushed the add_program_address_to_instruction_error branch from 4e4a04a to ec46e8b May 28, 2025 20:22

steveluscher marked this pull request as draft May 28, 2025 20:22

steveluscher force-pushed the add_program_address_to_instruction_error branch from ec46e8b to 6a11e9b May 29, 2025 05:52

steveluscher force-pushed the add_program_address_to_instruction_error branch from 6a11e9b to 7040146 May 29, 2025 05:55

steveluscher marked this pull request as ready for review May 29, 2025 05:59
