BSOD when loading the driver: SYSTEM_THREAD_EXCEPTION_NOT_HANDLED #10

Closed
w568w opened this issue Aug 5, 2024 · 6 comments

@w568w

w568w commented Aug 5, 2024

1. Problem

The guest kernel BSODs when loading mvisor-win-vgpu-driver, with the error "Attempt to read from address 0000000000000008". The bug check code is SYSTEM_THREAD_EXCEPTION_NOT_HANDLED.

2. Steps to reproduce

  1. Have a computer running Arch Linux x86_64;
  2. Create an empty .qcow2 image with qemu-img: qemu-img create -f qcow2 win.qcow2 80G;
  3. git clone https://github.com/tenclass/mvisor and build build/mvisor following the instructions in the README;
  4. Get the Windows 10 22H2 ISO (full name: Windows 10 (consumer editions), version 22H2 (updated July 2024) (x64) - DVD (Chinese-Simplified)) from MSDN I Tell You.
    • Link: magnet:?xt=urn:btih:04c08aeaf5f6849b30cead6f722138d7ce1460c6&dn=zh-cn_windows_10_consumer_editions_version_22h2_updated_july_2024_x64_dvd_3245b006.iso&xl=7133401088
  5. Fill config/sample.yaml with the following content:
config/sample.yaml

name: Default configuration
base: q35.yaml

machine:
  memory: 4G
  vcpu: 4
  # Set vcpu thread priority value [-20, 19]
  # A higher value means a lower priority
  priority: 1
  # Turn on BIOS output and performance measurement
  debug: No
  # Turn on hypervisor to lower CPU usage (Hyper-V is used for Windows)
  hypervisor: Yes

objects:
  - name: cmos
    # gmtime for linux, localtime for windows
    rtc: localtime

  - class: qxl
  - class: spice-agent
  - class: usb-tablet

  - class: virtio-network
    mac: 00:50:00:11:22:33
    map: tcp:0.0.0.0:8022-:22

  - class: ata-cdrom 
    image: /home/w568w/Downloads/win10.iso
  
  - class: ata-cdrom
    image: /home/w568w/Downloads/virtio-win-0.1.240.iso

  - class: virtio-block
    image: /home/w568w/win.qcow2
    snapshot: No

  - class: virtio-vgpu
    memory: 1G
    staging: No
    blob: No
    node: /dev/dri/renderD129

  6. Run ./build/mvisor -c config/sample.yaml -vnc 5900 and install Windows normally;
  7. Download the release from https://github.com/tenclass/mvisor-win-vgpu-driver/releases/tag/v1.0.0 and extract it in the guest Windows;
  8. Run install.bat with administrator permission. While the kernel driver is installing, the screen immediately freezes, the guest BSODs, and then it restarts. A .dmp file is written to C:\Windows\Minidump.

3. Additional Information

I debugged a little with WinDbg and Ghidra, and I believe that the error is due to a broken Idrs[0].FreeIdList.

The error is NULL_CLASS_PTR_DEREFERENCE, and it seems that the instruction at vgpu.sys+0x3753 tried to read through a null pointer. That instruction is most likely in:

VOID UnInitializeIdr()
{
    for (size_t i = 0; i < ARRAYSIZE(Idrs); i++)
    {
        ASSERT(Idrs[i].Initilaized);
        while (!IsListEmpty(&Idrs[i].FreeIdList)) {
            PLIST_ENTRY item = RemoveHeadList(&Idrs[i].FreeIdList);
            PFREEID freeId = CONTAINING_RECORD(item, FREEID, Entry);
            ExFreeToLookasideListEx(&Idrs[i].LookAsideList, freeId);
        }
        ExDeleteLookasideListEx(&Idrs[i].LookAsideList);
        Idrs[i].Initilaized = FALSE;
    }
}
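
For illustration only, here is a minimal user-mode sketch of a teardown that skips entries that were never initialized, instead of relying on an assertion. It is not the driver's code and not the fix this thread ended up with: `NODE`, `IDR_SIM`, `idr_init`, `idr_push`, and `idr_uninit` are hypothetical stand-ins that only mirror the shape of the kernel list and the `Idrs` entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's doubly linked LIST_ENTRY. */
typedef struct NODE { struct NODE *Flink, *Blink; } NODE;

/* Hypothetical stand-in for one Idrs[] element. */
typedef struct IDR_SIM {
    NODE FreeIdList;
    bool Initialized;
} IDR_SIM;

/* Make the list head point at itself, which is what "empty" means. */
static void idr_init(IDR_SIM *idr) {
    idr->FreeIdList.Flink = idr->FreeIdList.Blink = &idr->FreeIdList;
    idr->Initialized = true;
}

/* Append a node at the tail of the free list. */
static void idr_push(IDR_SIM *idr, NODE *n) {
    NODE *head = &idr->FreeIdList;
    n->Flink = head;
    n->Blink = head->Blink;
    head->Blink->Flink = n;
    head->Blink = n;
}

/* Drain the list. Returns the number of entries removed,
 * or -1 if the entry was never initialized and was skipped. */
static int idr_uninit(IDR_SIM *idr) {
    if (!idr->Initialized) {
        return -1; /* runtime guard: never walk a zero-filled list head */
    }
    int drained = 0;
    NODE *head = &idr->FreeIdList;
    while (head->Flink != head) {
        NODE *item = head->Flink;
        head->Flink = item->Flink;
        item->Flink->Blink = head;
        drained++;
    }
    idr->Initialized = false;
    return drained;
}
```

With this shape, calling `idr_uninit` on a zero-filled `IDR_SIM` returns -1 without touching the list, while an initialized entry is drained as before.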

According to the stack trace in the dump, this function is called by VirtioVgpuDeviceReleaseHardware in vgpu.c.

I checked the disassembly of UnInitializeIdr:

  1. At +3748, LEA RBX, [0x14000c1b0] loads the address of the static Idrs[0].FreeIdList into RBX (i.e. RBX = &Idrs[0].FreeIdList), then jumps to +37af;
  2. At +37af, MOV RAX, qword ptr [RBX] reads the first pointer-sized field of Idrs[0].FreeIdList into RAX, i.e. RAX = Idrs[0].FreeIdList.Flink;
  3. The code compares Idrs[0].FreeIdList.Flink with &Idrs[0].FreeIdList (0x0 and 0x14000c1b0 respectively); since they differ, it jumps to +3753;
  4. At +3753, CMP qword ptr [RAX + 0x8], RBX reads from RAX + 0x8, i.e. the Blink field of the entry that Flink points to. With Flink = 0x0 that address is 0x0000000000000008, and the read raises the exception.

The pseudocode is:

RBX = &Idrs[0].FreeIdList; // 0x14000c1b0
RAX = *RBX; // Idrs[0].FreeIdList.Flink, 0x0 (!!!)
if (RAX != RBX) { // check if Idrs[0].FreeIdList is not empty
    *(RAX + 8); // read Idrs[0].FreeIdList.Flink->Blink (to verify the list's consistency?), exception occurred!
}
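
The same behaviour can be reproduced safely in user mode. The sketch below is an illustration under assumptions, not the driver's code: `LIST_ENTRY_SIM`, `list_init`, `list_is_empty`, and `blink_read_address` are hypothetical names that mirror the layout of the kernel's `LIST_ENTRY` and the comparison made by `IsListEmpty`. It shows that a zero-filled list head does not test as empty, and computes (without dereferencing anything) the address a read of `Flink->Blink` would touch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-mode stand-in for the kernel's LIST_ENTRY. */
typedef struct LIST_ENTRY_SIM {
    struct LIST_ENTRY_SIM *Flink;
    struct LIST_ENTRY_SIM *Blink;
} LIST_ENTRY_SIM;

/* A properly initialized empty list points back at its own head. */
static void list_init(LIST_ENTRY_SIM *head) {
    head->Flink = head->Blink = head;
}

/* Same comparison as the emptiness check: is Flink the head itself? */
static int list_is_empty(const LIST_ENTRY_SIM *head) {
    return head->Flink == head;
}

/* Address that reading head->Flink->Blink would access.
 * Computed arithmetically so the sketch never dereferences a null pointer. */
static uintptr_t blink_read_address(const LIST_ENTRY_SIM *head) {
    return (uintptr_t)head->Flink + offsetof(LIST_ENTRY_SIM, Blink);
}
```

For a head cleared with `memset(&head, 0, sizeof head)`, `list_is_empty` returns 0 and `blink_read_address` returns `sizeof(void *)`, which is 0x8 on a 64-bit build and matches the faulting address in the dump. After `list_init`, the same head tests as empty and the loop body would never run.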

4. Logs and dumps

Windows minidump: 080524-4140-01.dmp

System Information:

Kernel: Linux 6.10.3-x64v3-xanmod1
DE: KDE Plasma 6.1.3
WM: KWin (Wayland)
GPU 1: AMD Radeon Vega Series / Radeon Vega Mobile Series [Integrated]
GPU 2: NVIDIA GeForce GTX 1650 Mobile / Max-Q [Discrete]

@nooodles2023
Collaborator

Strange, VirtioVgpuDeviceReleaseHardware is only called when the driver is unloading.
I guess you have not switched your Windows 10 to test-signing mode, so Windows unloaded the driver automatically before the Idrs had been initialized!
Driver v1.0.0 is a release build, so "ASSERT(Idrs[i].Initilaized);" is compiled out and did not fire.
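
As a rough illustration of that last point, the sketch below models an assertion macro that expands to nothing in a release build. `SIM_DBG`, `SIM_ASSERT`, `g_assert_failures`, and `teardown_continues` are hypothetical names chosen for this example; `SIM_DBG` stands in for the build-type symbol a real driver build would use, and the sketch only counts failures where a real debug assertion would break into the debugger.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical build-type switch: 0 models a release build. */
#ifndef SIM_DBG
#define SIM_DBG 0
#endif

/* Number of assertion failures observed (debug model only). */
static int g_assert_failures = 0;

#if SIM_DBG
/* Debug model: record the failed condition. */
#define SIM_ASSERT(cond) ((cond) ? (void)0 : (void)++g_assert_failures)
#else
/* Release model: the check disappears entirely. */
#define SIM_ASSERT(cond) ((void)0)
#endif

/* Returns true when execution continues past the assertion into teardown. */
static bool teardown_continues(bool initialized) {
    (void)initialized;        /* unused when the assertion is compiled out */
    SIM_ASSERT(initialized);  /* no effect in the release model */
    return true;              /* the list walk would start here */
}

/* Small self-check for the release model. */
static void release_build_demo(void) {
    assert(teardown_continues(false));
#if !SIM_DBG
    assert(g_assert_failures == 0);
#endif
}
```

In the release model, `teardown_continues(false)` still returns true and nothing is recorded, which is why an uninitialized entry can reach the list walk unnoticed.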

@w568w
Author

w568w commented Aug 6, 2024

> I guess you have not switched your Windows 10 to test-signing mode

That can't be it. I did enable it with bcdedit.exe /set testsigning on.

If I hadn't, the driver would not even run! It would be blocked during installation, and nothing would happen.

> Driver v1.0.0 is a release build, so "ASSERT(Idrs[i].Initilaized);" did not fire.

Do you mean that I should compile a debug build of the driver myself?

@nooodles2023
Collaborator

The system may unload the driver because of insufficient memory. Try allocating more memory to the VM.

@w568w
Author

w568w commented Aug 6, 2024

> The system may unload the driver because of insufficient memory. Try allocating more memory to the VM.

I increased it to 8 GB of RAM. No luck. :(


I found that my issue is similar to #5, and I managed to compile the driver myself.

Both situations are listed here:

#5:

  1. Install released driver with install.bat directly: BSOD
  2. Install released driver in QEMU: Code 39 (Driver Entry Point Not Found)
  3. Compile the driver himself: Unable to compile

Mine:

  1. Install released driver with install.bat directly: BSOD (SYSTEM_THREAD_EXCEPTION_NOT_HANDLED)
  2. Install released driver in QEMU: Nothing happens. Seems the driver is not loaded at all
  3. Compile the driver myself: Install normally, but not working

@nooodles2023
Collaborator

The driver only works with MVisor!
I guess the root cause is in the loading process: the Windows kernel decided something was wrong with the driver, so it unloaded it, and that caused the BSOD.
I will test it on 22H2 tonight.

@w568w
Author

w568w commented Aug 6, 2024

Update: I suspected that Code 39 was caused by targeting too high a version of the NT kernel, so I lowered _NT_TARGET_VERSION to 19041 and recompiled the driver.

Now the kernel driver installs and seems to load successfully, and the device's status also becomes "Operating normally". But now I encounter #2 too, with the same [5412] IOCTL_VIRTIO_VGPU_GET_CAPS failed=31 error.


I will close this issue, since the problem described here has been fixed. Further discussion will continue in #2. Thanks a lot! 👍

Solution: rebuild the kernel driver with a lower _NT_TARGET_VERSION (19041, which is the only choice in Visual Studio 2022) and reinstall it. I hit the BSOD again during installation, but after a reboot the device started working normally.

w568w closed this as completed Aug 6, 2024