@heywji
Contributor

@heywji heywji commented Oct 29, 2025

Increase dump timeout from 90s to 1800s (30 minutes) to handle large vmcore files (~132GB). The original timeout was too short for VMs with 126GB memory, causing truncated vmcore files.

With a 30-minute timeout, a 132 GB dump should complete successfully even at a disk I/O speed of 75 MB/s.

ID: 4239

Signed-off-by: Wenkang Ji <wji@redhat.com>

Summary by CodeRabbit

  • Chores
    • Updated memory configuration for guest memory dump testing.

@coderabbitai

coderabbitai bot commented Oct 29, 2025

Walkthrough

A single configuration parameter (mem = 8192) is added to the dump_guest_memory.cfg file. This introduces a memory allocation specification to the existing test configuration without modifying or removing any other parameters.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name: Title check
Status: ⚠️ Warning
Explanation: The title claims to increase timeout, but the actual change only adds a memory configuration parameter (mem = 8192) with no timeout modification visible in the changeset.
Resolution: Update the title to accurately reflect the actual change, such as 'dump_guest_memory: Set memory configuration to 8192 MB', or verify the timeout change is included in the changeset.
✅ Passed checks (2 passed)
Check name: Description Check
Status: ✅ Passed
Explanation: Check skipped - CodeRabbit’s high-level summary is enabled.

Check name: Docstring Coverage
Status: ✅ Passed
Explanation: Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.

@heywji
Contributor Author

heywji commented Oct 29, 2025

Test Results: PASS

FROM:

(01/10) repeat1.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (01/10) repeat1.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (399.91 s)                                                                    
 (02/10) repeat2.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (02/10) repeat2.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (402.98 s)                                                                    
 (03/10) repeat3.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (03/10) repeat3.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (404.10 s)                                                                    
 (04/10) repeat4.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (04/10) repeat4.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (400.25 s)                                                                    
 (05/10) repeat5.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (05/10) repeat5.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (402.54 s)                                                                    
 (06/10) repeat6.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (06/10) repeat6.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (403.41 s)                                                                    
 (07/10) repeat7.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED                                                                                             
 (07/10) repeat7.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (401.73 s)
 (08/10) repeat8.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED              
 (08/10) repeat8.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (401.28 s)
 (09/10) repeat9.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED  
 (09/10) repeat9.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (401.53 s)
 (10/10) repeat10.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED             
 (10/10) repeat10.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  FAIL: Vmcore corrupt (404.02 s)
RESULTS    : PASS 1 | ERROR 0 | FAIL 9 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0                                 

TO:

 (01/10) repeat1.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (01/10) repeat1.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (423.51 s)
 (02/10) repeat2.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (02/10) repeat2.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (414.12 s)
 (03/10) repeat3.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (03/10) repeat3.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (425.09 s)
 (04/10) repeat4.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (04/10) repeat4.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (412.60 s)
 (05/10) repeat5.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (05/10) repeat5.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (433.74 s)
 (06/10) repeat6.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (06/10) repeat6.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (410.03 s)
 (07/10) repeat7.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (07/10) repeat7.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (416.82 s)
 (08/10) repeat8.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (08/10) repeat8.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (420.95 s)
 (09/10) repeat9.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (09/10) repeat9.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (420.31 s)
 (10/10) repeat10.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35: STARTED
 (10/10) repeat10.Host_RHEL.m10.u1.qcow2.virtio_scsi.up.virtio_net.Guest.RHEL.10.1.x86_64.io-github-autotest-qemu.dump_guest_core.on.q35:  PASS (437.95 s)
RESULTS    : PASS 10 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
qemu/tests/dump_guest_core.py (1)

65-65: Inconsistent timeout: crash command should also be increased.

The crash command timeout remains at 60 seconds, but dump_guest_memory.py line 160 increased it to 1800 seconds for the same operation. Large vmcore files (132 GB) will likely require more than 60 seconds for crash to analyze. This timeout should be increased to 1800 for consistency.

Apply this diff to fix the inconsistency:

-        output = process.getoutput(crash_cmd, timeout=60)
+        output = process.getoutput(crash_cmd, timeout=1800)
🧹 Nitpick comments (1)
qemu/tests/cfg/dump_guest_memory.cfg (1)

7-7: Consider a larger safety margin for the timeout value.

The timeout increase to 1800 seconds is appropriate for large vmcore dumps. However, based on your PR description (132 GB at 75 MB/s ≈ 1802 seconds, taking 1 GB as 1024 MB), this leaves essentially no buffer: the estimate already slightly exceeds the timeout. Consider increasing it to 2400 or 3600 seconds to accommodate slower I/O speeds, system load, or compression overhead.
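
The estimate quoted above depends on the unit convention. The following is a minimal sketch of the arithmetic, assuming a constant sequential write rate and no compression; the function names are illustrative and are not part of the test code.

```python
TIMEOUT_S = 1800  # timeout proposed in this PR, in seconds


def dump_seconds_binary(size_gib, rate_mib_s):
    """Seconds to write size_gib GiB at rate_mib_s MiB/s (1 GiB = 1024 MiB)."""
    return size_gib * 1024 / rate_mib_s


def dump_seconds_decimal(size_gb, rate_mb_s):
    """Seconds to write size_gb GB at rate_mb_s MB/s (1 GB = 1000 MB)."""
    return size_gb * 1000 / rate_mb_s


# Hypothetical worst case taken from the PR description: 132 GB at 75 MB/s.
binary_estimate = dump_seconds_binary(132, 75)
decimal_estimate = dump_seconds_decimal(132, 75)

for label, seconds in (("binary", binary_estimate), ("decimal", decimal_estimate)):
    # A negative margin means the estimate exceeds the timeout.
    print(f"{label}: {seconds:.2f} s, margin {TIMEOUT_S - seconds:+.2f} s")
```

Under the binary convention the estimate is about 1802 s, slightly above the 1800 s timeout; under the decimal convention it is 1760 s, which leaves a 40 s margin.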

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1eb1d9a and b4f6cbd.

📒 Files selected for processing (3)
  • qemu/tests/cfg/dump_guest_memory.cfg (1 hunks)
  • qemu/tests/dump_guest_core.py (2 hunks)
  • qemu/tests/dump_guest_memory.py (1 hunks)
🔇 Additional comments (3)
qemu/tests/dump_guest_core.py (2)

48-48: LGTM! Appropriate timeout increase for gdb dump generation.

The timeout increase from 360 to 1800 seconds is appropriate for generating large vmcore files via gdb.


116-116: LGTM! Defensive timeout increase.

The timeout increase to 1800 seconds is safe and consistent. Note that this wait might be redundant since check_core_file() (line 113) should have already created the vmcore_file synchronously via the gdb command at line 48. However, the defensive check with adequate timeout is reasonable.

qemu/tests/dump_guest_memory.py (1)

160-160: LGTM! Good addition of timeout to prevent indefinite hangs.

Adding the 1800-second timeout to the crash verification command is appropriate and prevents potential indefinite hangs when analyzing large vmcore files.

@heywji
Contributor Author

heywji commented Oct 30, 2025

Depends on: #4390

@heywji heywji force-pushed the fix_dump_guest_memory_increase_timeout branch 2 times, most recently from 9ad7310 to d91978b Compare November 4, 2025 06:40

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
qemu/tests/dump_guest_core.py (1)

48-116: Consider making timeouts configurable.

The 1800s timeout is now hardcoded in multiple locations. For large values like 30 minutes, consider making timeouts configurable via test parameters to provide flexibility for different hardware configurations or memory sizes.

For example, you could add to the params section:

gdb_timeout = int(params.get("gdb_timeout", 1800))
crash_timeout = int(params.get("crash_timeout", 1800))
vmcore_wait_timeout = int(params.get("vmcore_wait_timeout", 1800))

Then use these variables in the respective locations instead of hardcoded values.
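
As a rough illustration of that suggestion, the sketch below reads each timeout from a dict-like object and falls back to a default. It assumes the test's params object supports `.get(key, default)` as in the snippet above; the plain dict, the `get_timeout` helper, and the `DEFAULT_TIMEOUT` name are hypothetical stand-ins, not existing project code.

```python
DEFAULT_TIMEOUT = 1800  # seconds; the value hardcoded in this PR


def get_timeout(params, key, default=DEFAULT_TIMEOUT):
    """Return a positive timeout in seconds read from a dict-like params object."""
    value = int(params.get(key, default))
    if value <= 0:
        raise ValueError(f"{key} must be a positive number of seconds, got {value}")
    return value


# Hypothetical stand-in for the test parameters; values arrive as strings.
params = {"gdb_timeout": "2400"}

gdb_timeout = get_timeout(params, "gdb_timeout")                  # overridden in params
crash_timeout = get_timeout(params, "crash_timeout")              # falls back to default
vmcore_wait_timeout = get_timeout(params, "vmcore_wait_timeout")  # falls back to default

print(gdb_timeout, crash_timeout, vmcore_wait_timeout)
```

Validating the value in one helper keeps a misconfigured zero or negative timeout from silently turning into an immediate failure at each call site.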

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ad7310 and d91978b.

📒 Files selected for processing (3)
  • qemu/tests/cfg/dump_guest_memory.cfg (1 hunks)
  • qemu/tests/dump_guest_core.py (4 hunks)
  • qemu/tests/dump_guest_memory.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • qemu/tests/dump_guest_memory.py
  • qemu/tests/cfg/dump_guest_memory.cfg
🔇 Additional comments (3)
qemu/tests/dump_guest_core.py (3)

48-48: Timeout increase is appropriate for large memory dumps.

The increase from 360s to 1800s is well-justified by the test results showing operations take 400-438 seconds for 126 GB memory VMs. This aligns with the PR objective to handle ~132 GB vmcore files.


65-65: Timeout increase is consistent and necessary.

The increase from 60s to 1800s is necessary for analyzing large vmcore files (~132 GB) and maintains consistency with the other timeout changes in this test.


116-116: Timeout increase is appropriate and consistent.

The increase from 60s to 1800s for waiting for the vmcore file to exist is necessary for large vmcore files and maintains consistency with the other timeout changes throughout this test.

@heywji heywji force-pushed the fix_dump_guest_memory_increase_timeout branch from d91978b to d4d5e01 Compare November 4, 2025 06:47
@leidwang
Contributor

Hi @yiqianwei, I know you will run some test cases on a big machine. I just want to confirm whether you will run this test case. If not, we may limit the memory in the cfg file. Thanks.

@leidwang
Contributor

Hi @heywji, please involve the arm64 feature owner to review this PR, as it will also affect them. Thanks.

@heywji
Contributor Author

heywji commented Nov 17, 2025

@fbq815 Hi Biaoqiao, please help review this patch for the multi-arch platform. Thanks!

@yiqianwei
Contributor

Hi @yiqianwei, I know you will run some test cases on a big machine. I just want to confirm whether you will run this test case. If not, we may limit the memory in the cfg file. Thanks.

@leidwang, I haven't run this test case on a large machine

@leidwang
Contributor

Hi @yiqianwei, I know you will run some test cases on a big machine. I just want to confirm whether you will run this test case. If not, we may limit the memory in the cfg file. Thanks.

@leidwang, I haven't run this test case on a large machine

Okay, thanks @yiqianwei

@heywji I think we can limit the memory directly instead of enlarging the timeout. What do you think?

@heywji
Contributor Author

heywji commented Nov 18, 2025

@leidwang OK for me.

Contributor

@fbq815 fbq815 left a comment

LGTM.
There is a known gdb issue on s390x that accounts for the failed case.

JOB ID : 8e3c54ac251c1a4e198e9d77dc9e7bfd2aa12615
JOB LOG : /root/avocado/job-results/job-2025-12-05T02.45-8e3c54a/job.log
(01/13) dump_guest_memory.with_detach_params.s390-virtio: PASS (68.85 s)
(02/13) dump_guest_memory.verify_diff_format_dump_file.default.s390-virtio: PASS (38.91 s)
(03/13) dump_guest_memory.verify_diff_format_dump_file.query_dump_guest_memory_capability.s390-virtio: PASS (25.25 s)
(04/13) dump_guest_memory.verify_diff_format_dump_file.quux_format_dump.s390-virtio: PASS (25.43 s)
(05/13) dump_guest_memory.verify_diff_format_dump_file.elf_formate_dump.s390-virtio: PASS (38.35 s)
(06/13) dump_guest_memory.verify_diff_format_dump_file.zlib_formate_dump.s390-virtio: PASS (45.41 s)
(07/13) dump_guest_memory.verify_diff_format_dump_file.lzo_formate_dump.s390-virtio: PASS (32.43 s)
(08/13) dump_guest_memory.verify_diff_format_dump_file.snappy_formate_dump.s390-virtio: PASS (32.00 s)
(09/13) dump_guest_memory.verify_diff_format_dump_file.paging_true_snappy_format_dump.s390-virtio: PASS (25.52 s)
(10/13) dump_guest_memory.verify_diff_format_dump_file.paging_true_zlib_format_dump.s390-virtio: PASS (25.58 s)
(11/13) dump_guest_memory.verify_diff_format_dump_file.paging_true_lzo_format_dump.s390-virtio: PASS (25.56 s)
(12/13) dump_guest_core.on.s390-virtio: ERROR: [Errno 2] No such file or directory: '/home/vmcore' (3622.14 s)
(13/13) dump_guest_core.off.s390-virtio: PASS (1835.13 s)
RESULTS : PASS 12 | ERROR 1 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB HTML : /root/avocado/job-results/job-2025-12-05T02.45-8e3c54a/results.html
JOB TIME : 5847.01 s

@leidwang
Contributor

@leidwang OK for me.

Do not forget to update the code.

Remove process.TimeoutError from exception handling, as it doesn't
exist in avocado.utils.process module.

Signed-off-by: wji <wji@redhat.com>
@heywji heywji force-pushed the fix_dump_guest_memory_increase_timeout branch from d4d5e01 to b13085e Compare January 20, 2026 08:21

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@qemu/tests/dump_guest_core.py`:
- Around line 85-86: Replace the incorrect built-in TimeoutError in the
exception tuple with subprocess.TimeoutExpired in the except block that
currently reads catching process.CmdError and TimeoutError (the block handling
install of debuginfo packages), and add an import for the subprocess module at
the top of the file so subprocess.TimeoutExpired is available; update the except
to use (process.CmdError, subprocess.TimeoutExpired) and ensure the new import
is present.
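
For reference, the sketch below shows the exception the comment above names, using only the standard-library `subprocess` module. It does not establish which exception `avocado.utils.process` raises on timeout; this thread does not confirm that. The `run_with_timeout` helper and the sample commands are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Run cmd and return 'ok', 'failed', or 'timeout' depending on the outcome."""
    try:
        subprocess.run(cmd, capture_output=True, text=True, timeout=timeout, check=True)
    except subprocess.TimeoutExpired:
        # Raised by subprocess.run when the child outlives `timeout` seconds.
        return "timeout"
    except subprocess.CalledProcessError:
        # Raised because check=True and the child exited with a non-zero status.
        return "failed"
    return "ok"


# Hypothetical commands: one that sleeps past the timeout, one that exits at once.
slow_cmd = [sys.executable, "-c", "import time; time.sleep(5)"]
quick_cmd = [sys.executable, "-c", "pass"]

print(run_with_timeout(slow_cmd, timeout=0.5))
print(run_with_timeout(quick_cmd, timeout=30))
```

Note that `subprocess.TimeoutExpired` is a distinct class from the built-in `TimeoutError`, so an `except TimeoutError` clause would not catch it.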

The dump_guest_memory test was failing on VMs with large memory
(e.g., 128GB) because the vmcore dump would exceed the original 90s
timeout, resulting in truncated files and test failure.

While simply increasing the timeout could work around the issue for a
specific memory size, a more robust solution is to set a fixed memory
size for the test. This commit sets the VM memory to 8GB, making the
dump time predictable and preventing future timeouts, regardless of
changes to the default VM memory allocation.

Signed-off-by: Wenkang Ji <wji@redhat.com>
@heywji heywji force-pushed the fix_dump_guest_memory_increase_timeout branch from b13085e to 66b222b Compare January 20, 2026 08:26
@heywji
Contributor Author

heywji commented Jan 20, 2026

@leidwang Hi Leidong, could you please review this patch?
