Commit 42cdc44

Merge branch 'firecracker-microvm:main' into clippy_cast_lossless
2 parents 05127c5 + 1644b3c commit 42cdc44

11 files changed

+221
-229
lines changed

.buildkite/pipeline_pr.py

+2 -1
@@ -55,7 +55,8 @@ def group(group_name, command, agent_tags=None, priority=0, timeout=30):
 step_style = {
     "command": "./tools/devtool -y test -- ../tests/integration_tests/style/",
     "label": "🪶 Style",
-    # no agent tags, it doesn't matter where this runs
+    # we only install the required dependencies in x86_64
+    "agents": ["platform=x86_64.metal"]
 }
 
 build_grp = group(

docs/snapshotting/snapshot-support.md

+36 -20
@@ -8,7 +8,8 @@
 - [Overview](#overview)
 - [Snapshot files management](#snapshot-files-management)
 - [Performance](#performance)
-- [Known issues](#known-issues)
+- [Developer preview status](#developer-preview-status)
+- [Limitations](#limitations)
 - [Firecracker Snapshotting characteristics](#firecracker-snapshotting-characteristics)
 - [Snapshot versioning](#snapshot-versioning)
 - [Snapshot API](#snapshot-api)
@@ -38,6 +39,7 @@ guest workload at that particular point in time.
 
 The Firecracker snapshot feature is in [developer preview](../RELEASE_POLICY.md)
 on all CPU micro-architectures listed in [README](../../README.md#supported-platforms).
+See [this section](#developer-preview-status) for more info.
 
 ### Overview
 
@@ -82,8 +84,6 @@ resumed microVM.
 
 The Firecracker snapshot design offers a very simple interface to interact with
 snapshots but provides no functionality to package or manage them on the host.
-Using snapshots in production is currently not recommended as there are open
-[Known issues](#known-issues).
 
 The [threat containment model](../design.md#threat-containment) states
 that the host, host/API communication and snapshot files are trusted by Firecracker.
@@ -93,33 +93,49 @@ snapshot files by implementing authentication and encryption schemes while
 managing their lifecycle or moving them across the trust boundary, like for
 example when provisioning them from a respository to a host over the network.
 
-Firecracker is optimized for fast load/resume and it's designed to do some very basic
-sanity checks only on the vm state file. It only verifies integrity using a 64
-bit CRC value embedded in the vm state file, but this is only as a partial
-measure to protect against accidental corruption, as the disk files and memory
-file need to be secured as well. It is important to note that CRC computation
-is validated before trying to load the snapshot. Should it encounter failure,
-an error will be shown to the user and the Firecracker process will be terminated.
+Firecracker is optimized for fast load/resume, and it's designed to do some
+very basic sanity checks only on the vm state file. It only verifies integrity
+using a 64-bit CRC value embedded in the vm state file, but this is only
+a partial measure to protect against accidental corruption, as the disk
+files and memory file need to be secured as well. It is important to note that
+CRC computation is validated before trying to load the snapshot. Should it
+encounter failure, an error will be shown to the user and the Firecracker
+process will be terminated.
 
 ### Performance
 
 The Firecracker snapshot create/resume performance depends on the memory size,
-vCPU count and emulated devices count. The Firecracker CI runs snapshots tests
-on AWS **m5d.metal** instances for Intel and on AWS **m6g.metal** for ARM.
-The baseline for snapshot resume latency target on Intel is under **8ms** with
-5ms p90, and on ARM is under **3ms** for a microVM with the following specs:
-2vCPU/512MB/1 block/1 net device.
+vCPU count and emulated devices count.
+The Firecracker CI runs snapshot tests on:
 
-### Known issues
+- AWS **m5d.metal** and **m6i.metal** instances for Intel
+- AWS **m6g.metal** for ARM
+- AWS **m6a.metal** for AMD
 
-- High snapshot latency on 5.4+ host kernels - [#2129](https://github.com/firecracker-microvm/firecracker/issues/2129)
+We are running nightly performance tests for all the enumerated platforms on
+all supported kernel versions.
+The baselines can be found in their [respective config file](../../tests/integration_tests/performance/configs/).
+
+### Developer preview status
+
+The snapshot functionality is still in developer preview due to the following:
+
+- Poor entropy and replayable randomness when resuming multiple microvms from
+the same snapshot. We do not recommend to use snapshotting in production if
+there is no mechanism to guarantee proper secrecy and uniqueness between
+guests.
+Please see [Snapshot security and uniqueness](#snapshot-security-and-uniqueness).
+
+### Limitations
+
+- High snapshot latency on 5.4+ host kernels due to cgroups V1. We
+strongly recommend to deploy snapshots on cgroups V2 enabled hosts for the
+implied kernel versions - [related issue](https://github.com/firecracker-microvm/firecracker/issues/2129).
 - Guest network connectivity is not guaranteed to be preserved after resume.
 For recommendations related to guest network connectivity for clones please
 see [Network connectivity for clones](network-for-clones.md).
 - Vsock device does not have full snapshotting support.
 Please see [Vsock device limitation](#vsock-device-limitation).
-- Poor entropy and replayable randomness when resuming multiple microvms which
-deal with cryptographic secrets. Please see [Snapshot security and uniqueness](#snapshot-security-and-uniqueness).
 - Snapshotting on arm64 works for both GICv2 and GICv3 enabled guests.
 However, restoring between different GIC version is not possible.
 
@@ -542,7 +558,7 @@ Boot microVM A -> ... -> Create snapshot S -> Resume -> ...
 -> Load S in microVM B -> Resume -> ...
 ```
 
-Here, both microVM A and B do work staring from the state stored in snapshot S.
+Here, both microVM A and B do work starting from the state stored in snapshot S.
 Unique identifiers, random numbers, and cryptographic tokens that are meant to
 be used once may be used twice. It doesn't matter if microVM A is terminated
 before microVM B resumes execution from snapshot S or not. In this example, we
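
Reviewer note: the "replayable randomness" concern named in the developer preview section can be illustrated with a small Python sketch. This is not Firecracker code. It uses the standard library's `random.Random` as a stand-in for guest PRNG state, and the names `resume_from` and `next_token` are hypothetical helpers introduced only for this illustration.

```python
import random


def resume_from(snapshot_state):
    """Return a PRNG restored from a saved state (stand-in for loading snapshot S)."""
    rng = random.Random()
    rng.setstate(snapshot_state)
    return rng


def next_token(rng):
    """Draw a 64-bit value that the guest assumes is unique."""
    return rng.getrandbits(64)


# "microVM A" boots, does some work, and is then snapshotted.
vm_a = random.Random(1234)
for _ in range(10):
    next_token(vm_a)
snapshot_s = vm_a.getstate()

# A keeps running, while "microVM B" is loaded from the same snapshot S.
vm_b = resume_from(snapshot_s)
tokens_a = [next_token(vm_a) for _ in range(3)]
tokens_b = [next_token(vm_b) for _ in range(3)]

# Both guests emit the same sequence unless fresh entropy is mixed in after resume.
print(tokens_a == tokens_b)
```

The two lists are identical because both generators continue from the same saved state, which is the situation the documentation warns about for identifiers and tokens meant to be used once.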

resources/tests/setup_rootfs.sh

+8 -1
@@ -11,7 +11,14 @@ prepare_fc_rootfs() {
     SSH_DIR="$BUILD_DIR/ssh"
     RESOURCE_DIR="$2"
 
-    packages="udev systemd-sysv openssh-server iproute2 msr-tools"
+    packages="udev systemd-sysv openssh-server iproute2"
+
+    # msr-tools is only supported on x86-64.
+    arch=$(uname -m)
+    if [ "${arch}" == "x86_64" ]; then
+        packages="$packages msr-tools"
+    fi
+
     apt-get update
     apt-get install -y --no-install-recommends $packages
 
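
Reviewer note: the same architecture-conditional package selection can be sketched in Python, the language of the test suite. This is only an illustration of the logic in the hunk above. `BASE_PACKAGES` and `packages_for` are hypothetical names, and `platform.machine()` is assumed to report the same string as `uname -m` on the build host.

```python
import platform

# Packages installed on every architecture, as listed in the hunk above.
BASE_PACKAGES = ["udev", "systemd-sysv", "openssh-server", "iproute2"]


def packages_for(arch):
    """Return the package list for `arch`, adding msr-tools only on x86_64."""
    packages = list(BASE_PACKAGES)
    if arch == "x86_64":
        packages.append("msr-tools")
    return packages


if __name__ == "__main__":
    # Print the list for the machine this script runs on.
    print(" ".join(packages_for(platform.machine())))
```

Taking the architecture as a parameter keeps the selection testable on any host, for example `packages_for("aarch64")` returns the base list without `msr-tools`.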

src/cpuid/src/common.rs

+2 -2
@@ -31,8 +31,8 @@ pub enum Error {
 /// Extract entry from the cpuid.
 #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
 pub fn get_cpuid(function: u32, count: u32) -> Result<CpuidResult, Error> {
-    // TODO: replace with validation based on `has_cpuid()` when it becomes stable:
-    // https://doc.rust-lang.org/core/arch/x86/fn.has_cpuid.html
+    // TODO: Use `core::arch::x86_64::has_cpuid`
+    // (https://github.com/firecracker-microvm/firecracker/issues/3271)
     #[cfg(target_env = "sgx")]
     {
         return Err(Error::NotSupported);

tests/integration_tests/build/test_pylint.py

+1 -1
@@ -20,7 +20,7 @@ def test_python_pylint():
         '--variable-rgx="[a-z_][a-z0-9_]{1,30}$" --disable='
         "fixme,too-many-instance-attributes,import-error,"
         "too-many-locals,too-many-arguments,consider-using-f-string,"
-        "consider-using-with,implicit-str-concat"
+        "consider-using-with,implicit-str-concat,line-too-long"
     )
 
     # Get all *.py files from the project

tests/integration_tests/functional/test_balloon.py

+14 -7
@@ -45,22 +45,29 @@ def get_rss_from_pmap():
 
 def make_guest_dirty_memory(ssh_connection, should_oom=False, amount=8192):
     """Tell the guest, over ssh, to dirty `amount` pages of memory."""
+    logger = logging.getLogger("make_guest_dirty_memory")
+
     amount_in_mbytes = amount / MB_TO_PAGES
 
-    exit_code, _, _ = ssh_connection.execute_command(
-        "/sbin/fillmem {}".format(amount_in_mbytes)
-    )
+    cmd = f"/sbin/fillmem {amount_in_mbytes}"
+    exit_code, stdout, stderr = ssh_connection.execute_command(cmd)
+    # add something to the logs for troubleshooting
+    if exit_code != 0:
+        logger.error("while running: %s", cmd)
+        logger.error("stdout: %s", stdout.read())
+        logger.error("stderr: %s", stderr.read())
 
     cmd = "cat /tmp/fillmem_output.txt"
     _, stdout, _ = ssh_connection.execute_command(cmd)
     if should_oom:
         assert (
-            exit_code == 0
-            and ("OOM Killer stopped the program with " "signal 9, exit code 0")
-            in stdout.read()
+            "OOM Killer stopped the program with "
+            "signal 9, exit code 0" in stdout.read()
         )
     else:
-        assert exit_code == 0 and ("Memory filling was " "successful") in stdout.read()
+        assert exit_code == 0, stderr.read()
+        stdout_txt = stdout.read()
+        assert "Memory filling was successful" in stdout_txt, stdout_txt
 
 
 def build_test_matrix(network_config, bin_cloner_path, logger):
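
Reviewer note: the log-on-failure pattern introduced in this hunk can be sketched as a standalone helper. The shape of `execute_command` (returning an exit code plus file-like stdout and stderr) is taken from the diff. `run_and_log` and `FakeConnection` are hypothetical names used only for this illustration, not part of the test framework.

```python
import io
import logging


def run_and_log(ssh_connection, cmd, logger):
    """Run `cmd`, capture both streams once, and log them if the command failed."""
    exit_code, stdout, stderr = ssh_connection.execute_command(cmd)
    # A stream can only be consumed once, so read it into a string up front
    # and reuse that string for both logging and later assertions.
    out, err = stdout.read(), stderr.read()
    if exit_code != 0:
        logger.error("while running: %s", cmd)
        logger.error("stdout: %s", out)
        logger.error("stderr: %s", err)
    return exit_code, out, err


class FakeConnection:
    """Hypothetical stand-in for the SSH connection, used only for this demo."""

    def __init__(self, exit_code, out, err):
        self.exit_code, self.out, self.err = exit_code, out, err

    def execute_command(self, cmd):
        return self.exit_code, io.StringIO(self.out), io.StringIO(self.err)


logging.basicConfig(level=logging.ERROR)
demo_logger = logging.getLogger("run_and_log_demo")
code, out, err = run_and_log(FakeConnection(1, "", "fillmem: killed"), "/sbin/fillmem 32", demo_logger)
assert code == 1, "the fake command is expected to fail"
```

Capturing the output as strings means an assertion message can still show stderr after it has been logged, which a second `read()` on the same stream would not.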
