Skip to content

Fix signal handler race condition on metrics.write() #2893

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 14, 2022

Conversation

alindima
Copy link
Contributor

@alindima alindima commented Feb 10, 2022

Reason for This PR

FIxes the root cause of intermittent failure of test_custom_seccomp.py::test_failing_fiter
View commit messages.

Description of Changes

View commit messages.

  • This functionality can be added in rust-vmm.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license.

PR Checklist

[Author TODO: Meet these criteria.]
[Reviewer TODO: Verify that these criteria are met. Request changes if not]

  • All commits in this PR are signed (git commit -s).
  • The issue which led to this PR has a clear conclusion.
  • This PR follows the solution outlined in the related issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • Any newly added unsafe code is properly documented.
  • Any API changes follow the Runbook for Firecracker API changes.
  • Any user-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.

@alindima alindima self-assigned this Feb 10, 2022
@raduiliescu
Copy link
Contributor

For the second commit I doubt this is the best approach. I am hardly familiar with RUST, but adding acquire/release a lock, on operations done on the hotpath is not OK. Maybe you can take the second commit out, and we fix just the issue with intermittent test.
One other approach with metrics on multiple threads is to have that metric split per thread, and when dumping just do an add.

@alindima
Copy link
Contributor Author

alindima commented Feb 10, 2022

I am hardly familiar with RUST, but adding acquire/release a lock, on operations done on the hotpath is not OK.

There are no locks added. These are just atomic instruction flags that prevent the CPU from reordering loads of a value before stores on the same atomic.
On x86 this should have no effect because of the stronger memory model. On ARM it should just prevent the CPU from reordering the atomic instructions in ways that violate the load-after-store rule across different threads.

More details about this: https://doc.rust-lang.org/nomicon/atomics.html#hardware-reordering

Comment on lines 280 to 284
let snapshot = self.count();
let res = serializer.serialize_u64(snapshot as u64 - self.1.load(Ordering::Acquire) as u64);

if res.is_ok() {
self.1.store(snapshot, Ordering::Relaxed);
self.1.store(snapshot, Ordering::Release);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is correct. If we can serialize from multiple threads, I think snapshot can be different on each one and the value that ends up being assigned to self.1 in the end (self.1.store(snapshot, Ordering::Release);) won't necessarily be the most recent one. So we can end up in the situation where self.0 < self.1.

A solution would be to have another atomic bool that is marking whether there is a serialization in progress, and if there is, don't do anything. Although this wouldn't guarantee the serialization of the latest values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.0 is incremented atomically from any thread. If we have Relaxed semantics, it may have different values on multiple threads when serialising.
If using Acquire semantics, like we do in self.count(), we can be sure that any writes to self.0 with Release semantics (like we do in add()) are going to be executed and visible before the load.

I believe that the scenario you are describing is possible prior to this PR, due to the Relaxed ordering.

A solution would be to have another atomic bool that is marking whether there is a serialization in progress, and if there is, don't do anything. Although this wouldn't guarantee the serialization of the latest values.

This is essentially simulating a mutex

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anyway, I believe that the usage of the metrics system in signal handlers needs a redesign with thread-safety in mind.
Since this second commit is pretty controversial and may be introducing some performance degradation on ARM, I will remove it from this PR to unblock it and I will open a new issue for it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would something like the following be possible ?

+-----------------------------------+-----------------------------------+
| Thread 1                          | Thread 2                          |
+-----------------------------------+-----------------------------------+
| let snapshot = self.count(); -> 5 | ...                               |
+-----------------------------------+-----------------------------------+
| ...                               | inc metric -> 6                   |
+-----------------------------------+-----------------------------------+
| ...                               | let snapshot = self.count(); -> 6 |
+-----------------------------------+-----------------------------------+
| ...                               | store snapshot in self.1 -> 6     |
+-----------------------------------+-----------------------------------+
| store snapshot in self.1 -> 5     |                                   |
+-----------------------------------+-----------------------------------+

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is not possible since we only write the metrics from the VMM thread or the signal handler.
In the signal handler we now only use StoreMetrics so this would not be possible

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the type is not thread safe than Sync should not be implemented for it. Otherwise we're breaking the Rust safety principles and Rust cannot protect and benefit of the "fearless concurrency" anymore. The problem seems to be that we're doing more things than we should be doing in a context of the signal handler, so maybe a solution would be to simplify the signal handler. This will come at the expense of maybe metrics being missed, but it is arguably better than opening potentially other safety problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have discussed removing logging and metrics systems from the signal handlers.
One idea is to use specialised files for dumping abrupt exit information from signal and panic handlers (much like a core dump).

This is a separate feature and will be worked on in the future. It requires careful thought and more investment.

This PR addresses the issue surgically, since it is creating much trouble for us with intermittent test failures.
We are not introducing problems with this PR, we are applying a patch over the known occurrence of this problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a patch I think it's ok. Do we have an issue for doing a proper fix ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created an issue for it: #2899

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem sufficient to me because if I had to add a new metric I can imagine that it would be very easy to miss these comments. I wouldn't check the comments for the Metrics.write() or serialize() methods. It would be the same even if the serialize() method would be unsafe.

I realised that the signal handler generation macro is hardcoded to use the store function, which is only defined for StoreMetrics. It would require a separate function in order to add an IncMetric to a signal handler, like it is for SIGPIPE. This should be even more evident in code reviews

The METRICS.write() method operates on the false assumption
that it is only being called from one thread (VMM).
This is not true since it is called from the signal handlers
as well (which may be executed on any thread).

The race condition appears due to the fact that metrics state is
mutated during serialization for SharedIncMetrics (old value gets
assigned the new value after the difference is serialized).
The METRICS.write() method only holds the lock when writing to the
file, but there is no lock held while serialising the metrics to JSON.

VMM THREAD (1/MINUTE)   VCPU THREAD (SIGNAL HANDLER)
                        new_val=1
dif=new_val-old_val(=1)
old_val=new_val (=1)
                        dif=new_val-old_val (=0)
                        write(dif) (=0)
                        exit()
Because of the exit(), the VMM thread no longer gets a chance to
write the metric.

We now use SharedStoreMetric for deadly signal metrics,
which does not have this type of race condition because it is not
mutated on serialization.

It also does not make sense to have a SharedIncMetric for deadly
signal metrics, which can only ever be 0 or 1.

This was the root-cause of the intermittent failure in
test_custom_seccomp.py::test_failing_fiter

Signed-off-by: alindima <[email protected]>
@alindima alindima merged commit 8f9ec61 into firecracker-microvm:main Feb 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants