
feat: increase default and max FVM concurrency #448

Closed

Conversation

Stebalien
Member

We used to allocate 1024 instances per "concurrency" but we now only allocate 1024 plus 20 per "concurrency". So, I'm increasing the default and maximum concurrency factors to take advantage of this. Importantly:

  1. Max concurrency 128 -> 4096. `1024 + 4096*20 < 128 * 1024`.
  2. Default concurrency 4 -> 150. `1024 + 150*20 < 4 * 1024`. (Arithmetic sketched below.)

Motivation: I'm hoping that the default (150) is sufficient (more than the previous max...) so we can just get rid of this knob.

fixes filecoin-project/lotus#11817

Alternatively, we could just remove the knob now? The default is more than the previous max... On the other hand, the ability to decrease the max may be nice...
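
For concreteness, a minimal Rust sketch of that arithmetic. The constants and function names are illustrative only, mirroring the 1024 and 20 figures quoted above; this is not the actual Lotus/ref-fvm code:

```rust
// Illustrative only: compares the old and new instance-pool sizing.
const MAX_CALL_DEPTH: u64 = 1024; // instances needed for one full call stack
const EXTRA_PER_CONCURRENCY: u64 = 20; // extra instances per unit of concurrency

// Pre-3.5.0 sizing: a full call stack's worth of instances per unit of concurrency.
fn old_pool_size(concurrency: u64) -> u64 {
    MAX_CALL_DEPTH * concurrency
}

// Post-3.5.0 sizing: one full call stack, plus a small reserve per unit of concurrency.
fn new_pool_size(concurrency: u64) -> u64 {
    MAX_CALL_DEPTH + EXTRA_PER_CONCURRENCY * concurrency
}

fn main() {
    // The new max (4096) still pre-allocates fewer instances than the old max (128).
    assert!(new_pool_size(4096) < old_pool_size(128)); // 82_944 < 131_072
    // The new default (150) still pre-allocates fewer instances than the old default (4).
    assert!(new_pool_size(150) < old_pool_size(4)); // 4_024 < 4_096
}
```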

@rvagg
Member

rvagg commented Apr 4, 2024

OK, so I think this is related to this: https://github.com/filecoin-project/ref-fvm/blob/464bba53ad01391452db605002c9b5cca189fe39/fvm/src/engine/mod.rs#L63

That comes from filecoin-project/ref-fvm@b77d152, which looks like it's been in the FVM since v3.5.0. So does that mean we've actually been running with `1024 + 4*20` instances from that point until now? And it hasn't shown up as a problem?

To be honest, I don't fully understand the impact of this. Why does the pool size relate to the call stack depth? Is "call stack" here defined as calls from one actor to another? So we use a separate VM each time we execute an actor's method?

@Stebalien
Member Author

Some background:

We use this concurrency option for two things:

  1. To limit the maximum number of FVM messages we can execute in parallel.
  2. More importantly, to determine how many wasmtime "vm instances" we need to allocate up front.

In the worst case, processing a single message will require 1024 wasmtime instances, because the call stack might reach 1024 calls deep (we need an instance per actor in the call stack). That's why, prior to 3.5.0, we allocated `1024 * concurrency` wasmtime instances (1024 * 4, by default). This was rather wasteful, but very simple to implement.

In 3.5.0 we changed our logic to oversubscribe (underallocate) instances. However, this was trickier to implement because we needed to avoid a deadlock where multiple concurrent FVM invocations need an instance to progress but no more are available. We solved this by ensuring that one of the concurrent FVM invocations always has the full 1024 instances to work with, so that at least one invocation is guaranteed to be able to make progress at any time.
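
A rough sketch of that invariant (hypothetical names; the real pool lives in ref-fvm's engine code): size the shared pool so that even when every other concurrent execution is holding its small per-execution reserve, a full 1024-instance call stack is still available, so at least one execution can always run to completion and release its instances.

```rust
// Rough sketch of the oversubscribed-pool invariant; names are hypothetical.
const MAX_CALL_DEPTH: usize = 1024; // worst-case instances for one message's call stack
const RESERVE_PER_EXEC: usize = 20; // small per-execution reserve

// Total instances pre-allocated for `concurrency` parallel executions.
fn pool_size(concurrency: usize) -> usize {
    MAX_CALL_DEPTH + RESERVE_PER_EXEC * concurrency
}

fn main() {
    let concurrency = 150; // the new default proposed in this PR
    let total = pool_size(concurrency);
    // Even if the other executions each hold their full reserve, what remains
    // still covers one complete 1024-deep call stack, so at least one
    // execution can always make progress and eventually return its instances.
    let held_by_others = RESERVE_PER_EXEC * (concurrency - 1);
    assert!(total - held_by_others >= MAX_CALL_DEPTH);
}
```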

Answer:

Since 3.5.0, we've been allocating significantly fewer instances. This patch takes advantage of that additional headroom by enabling additional concurrency.

Member

@rvagg left a comment


OK, I grok at least some of the InstancePool stuff, though not quite how the engines are allocated and kept around; this seems reasonable to me now, I think.

@Stebalien
Member Author

Hm. So... unfortunately, we have a thread-pool limit in the FVM itself. We probably need to make this thread-pool flexible.

@Stebalien
Member Author

Replacing this PR with #449.

@Stebalien closed this Apr 6, 2024