
feat: increase default and max FVM concurrency #448

Closed

Conversation

Stebalien
Member

We used to allocate 1024 instances per "concurrency" but we now only allocate 1024 plus 20 per "concurrency". So, I'm increasing the default and maximum concurrency factors to take advantage of this. Importantly:

  1. Max concurrency 128 -> 4096. `1024 + 4096*20 < 128 * 1024`.
  2. Default concurrency 4 -> 150. `1024 + 150*20 < 4 * 1024`. (Arithmetic sketched below.)

Motivation: I'm hoping that the default (150) is sufficient (more than the previous max...) so we can just get rid of this knob.

fixes filecoin-project/lotus#11817

Alternatively, we could just remove the knob now? The default is more than the previous max... On the other hand, the ability to decrease the max may be nice...
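
For concreteness, a minimal Rust sketch of that arithmetic. The constants and function names are illustrative only, mirroring the 1024 and 20 figures quoted above; this is not the actual Lotus/ref-fvm code:

```rust
// Illustrative only: compares the old and new instance-pool sizing.
const MAX_CALL_DEPTH: u64 = 1024; // instances needed for one full call stack
const EXTRA_PER_CONCURRENCY: u64 = 20; // extra instances per unit of concurrency

// Pre-3.5.0 sizing: a full call stack's worth of instances per unit of concurrency.
fn old_pool_size(concurrency: u64) -> u64 {
    MAX_CALL_DEPTH * concurrency
}

// Post-3.5.0 sizing: one full call stack, plus a small reserve per unit of concurrency.
fn new_pool_size(concurrency: u64) -> u64 {
    MAX_CALL_DEPTH + EXTRA_PER_CONCURRENCY * concurrency
}

fn main() {
    // The new max (4096) still pre-allocates fewer instances than the old max (128).
    assert!(new_pool_size(4096) < old_pool_size(128)); // 82_944 < 131_072
    // The new default (150) still pre-allocates fewer instances than the old default (4).
    assert!(new_pool_size(150) < old_pool_size(4)); // 4_024 < 4_096
}
```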

@rvagg
Member

rvagg commented Apr 4, 2024

OK, so I think this is related to this: https://github.com/filecoin-project/ref-fvm/blob/464bba53ad01391452db605002c9b5cca189fe39/fvm/src/engine/mod.rs#L63

That comes from filecoin-project/ref-fvm@b77d152, which looks like it's been in the FVM since v3.5.0. So does that mean we've actually been running with `1024 + 4*20` instances from that point until now? And it hasn't shown up as a problem?

To be honest, I don't fully understand the impact of this. Why does the pool size relate to the call stack depth? Is "call stack" here defined as calls from one actor to another? So we use a separate VM each time we execute an actor's method?

@Stebalien
Member Author

Some background:

We use this concurrency option for two things:

  1. To limit the maximum number of FVM messages we can execute in parallel.
  2. More importantly, to determine how many wasmtime "vm instances" we need to allocate up front.

In the worst case, processing a single message will require 1024 wasmtime instances, because the call stack might reach 1024 calls deep (we need an instance per actor in the call stack). That's why, prior to 3.5.0, we allocated `1024 * concurrency` wasmtime instances (1024 * 4, by default). This was rather wasteful, but very simple to implement.

In 3.5.0 we changed our logic to oversubscribe (underallocate) instances. However, this was trickier to implement because we needed to avoid a deadlock where multiple concurrent FVM invocations need an instance to progress but no more are available. We solved this by ensuring that one of the concurrent FVM invocations always has the full 1024 instances to work with, so that at least one invocation is guaranteed to be able to make progress at any time.
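
A rough sketch of that invariant (hypothetical names; the real pool lives in ref-fvm's engine code): size the shared pool so that even when every other concurrent execution is holding its small per-execution reserve, a full 1024-instance call stack is still available, so at least one execution can always run to completion and release its instances.

```rust
// Rough sketch of the oversubscribed-pool invariant; names are hypothetical.
const MAX_CALL_DEPTH: usize = 1024; // worst-case instances for one message's call stack
const RESERVE_PER_EXEC: usize = 20; // small per-execution reserve

// Total instances pre-allocated for `concurrency` parallel executions.
fn pool_size(concurrency: usize) -> usize {
    MAX_CALL_DEPTH + RESERVE_PER_EXEC * concurrency
}

fn main() {
    let concurrency = 150; // the new default proposed in this PR
    let total = pool_size(concurrency);
    // Even if the other executions each hold their full reserve, what remains
    // still covers one complete 1024-deep call stack, so at least one
    // execution can always make progress and eventually return its instances.
    let held_by_others = RESERVE_PER_EXEC * (concurrency - 1);
    assert!(total - held_by_others >= MAX_CALL_DEPTH);
}
```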

Answer:

Since 3.5.0, we've been allocating significantly fewer instances. This patch takes advantage of that additional headroom by enabling additional concurrency.

Member

@rvagg left a comment


OK, I grok at least some of the InstancePool stuff, though not quite how the engines are allocated and kept around; this seems reasonable to me now, I think.

@Stebalien
Member Author

Hm. So... unfortunately, we have a thread-pool limit in the FVM itself. We probably need to make this thread-pool flexible.

@Stebalien
Member Author

Replacing this PR with #449.

@Stebalien closed this Apr 6, 2024