Discussion:
Infinite IRQ injection loop in QEMU
John Snow
2014-10-22 15:33:41 UTC
Hello all;

I've been working on improving the AHCI device emulation for QEMU but
have recently run into an issue where Windows 8 guests -- upon trying to
resume from hibernation -- manage to trigger an infinite IRQ injection
loop where it seems that the IRQ never quite properly gets cleared.

I am still working on troubleshooting it further, but I wanted to see if
anyone had advice or experience with this type of issue.

In a nutshell:
- Windows 8 boots up inside of QEMU/KVM
- Windows 8 is suspended to disk either via "shut down" or explicit
hibernate. QEMU exits.
- Windows 8 is resumed
- Windows 8 resets the AHCI device and begins re-initializing it
- Once the active AHCI port is reset, it issues an interrupt to indicate
it has a pending message (set of register values) ready for the host to
synchronize state with the HBA. This interrupt appears to be legacy PCI
and not MSI.
- This triggers an infinite injection loop.
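
The last two steps can be sketched with a toy model of one level-triggered pin. This is only an illustration of the suspected mechanism, not QEMU or KVM code: the class, its fields, and `run_guest` are hypothetical names, and the model assumes the pin is re-injected after every EOI for as long as the device keeps the line asserted.

```python
class LevelTriggeredPin:
    """Toy model of one level-triggered interrupt pin (hypothetical, not QEMU/KVM code)."""

    def __init__(self):
        self.line_asserted = False  # level currently driven by the device
        self.remote_irr = False     # set while an injection is waiting for EOI
        self.injections = 0

    def service(self):
        """Inject if the line is high and no injection is outstanding."""
        if self.line_asserted and not self.remote_irr:
            self.remote_irr = True
            self.injections += 1
            return True
        return False

    def eoi(self):
        """Guest signals end of interrupt, so the pin may fire again."""
        self.remote_irr = False


def run_guest(pin, handler_clears_device, max_rounds=1000):
    """Return how many injections happen before the line goes quiet.

    max_rounds only bounds the simulation; the real loop has no such limit.
    """
    pin.line_asserted = True
    for _ in range(max_rounds):
        if not pin.service():
            break
        if handler_clears_device:
            pin.line_asserted = False  # some driver acknowledged the device
        pin.eoi()
    return pin.injections
```

With `handler_clears_device=True` the model injects once and stops. With `False` it injects on every round until `max_rounds` runs out, which is the pattern the traces below appear to show.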

Here are some characteristic traces from perf record, grabbing
kvm-related entries with user space traces.

Here's where the interrupt first appears to become stuck, showing when
it is set: http://pastebin.com/KPevxCw2

It looks like pin #16, vec=177. All activity in the guest and QEMU now
apparently ceases, and then the perf script shows many, many loops which
look like the following: http://pastebin.com/qYh9035y

which repeats over and over. It does not appear that QEMU is re-setting
the IRQ, and there are no further calls from the guest into ICH9 or AHCI
related code to set/unset any device registers.

In talking with Stefan, we think that the IRR bit is possibly not
getting cleared (or getting set again?) after the EOI (see the first
paste) -- does anyone have experience with debugging this type of issue,
or have some hints about what may be happening?

Thanks in advance for any advice.
--John S.

(As a post-script: the kernel I am using is the version provided by
David Airlie for MST [Multi-Stream Transport] support in Linux, which is
still experimental. Sorry for the non-stock kernel!
http://airlied.livejournal.com/79657.html)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Paolo Bonzini
2014-10-22 18:01:31 UTC
Post by John Snow
I've been working on improving the AHCI device emulation for QEMU but
have recently run into an issue where Windows 8 guests -- upon trying to
resume from hibernation -- manage to trigger an infinite IRQ injection
loop where it seems that the IRQ never quite properly gets cleared.
I am still working on troubleshooting it further, but I wanted to see if
anyone had advice or experience with this type of issue.
- Windows 8 boots up inside of QEMU/KVM
- Windows 8 is suspended to disk either via "shut down" or explicit
hibernate. QEMU exits.
- Windows 8 is resumed
- Windows 8 resets the AHCI device and begins re-initializing it
- Once the active AHCI port is reset, it issues an interrupt to indicate
it has a pending message (set of register values) ready for the host to
synchronize state with the HBA. This interrupt appears to be legacy PCI
and not MSI.
- This triggers an infinite injection loop.
This usually means that the interrupt was not properly cleared in the
AHCI controller. Since legacy PCI interrupts are shared, it probably
means that the guest was not expecting the AHCI interrupt and is just
not asking the driver to handle it. Perhaps the BIOS is leaving the
driver with INTX enabled, or something like that?
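
A minimal sketch of that scenario, assuming a heavily simplified AHCI controller: the register names (IS, PxIS, PxIE, GHC.IE) follow the AHCI specification, where IS and PxIS are write-1-to-clear, but the classes and helpers here are hypothetical, and the choice of the D2H Register FIS status bit is an assumption made for illustration.

```python
PXIS_DHRS = 1 << 0  # PxIS bit 0: D2H Register FIS interrupt (assumed event)


class ToyAhciHba:
    """Simplified AHCI interrupt state (hypothetical model, not QEMU code)."""

    def __init__(self, num_ports=1):
        self.global_ie = True                 # GHC.IE
        self.is_reg = 0                       # IS: one bit per port
        self.pxis = [0] * num_ports           # per-port interrupt status
        self.pxie = [PXIS_DHRS] * num_ports   # per-port interrupt enable

    def port_event(self, port, bits):
        """Device side: latch status bits and flag the port in IS if enabled."""
        self.pxis[port] |= bits
        if self.pxis[port] & self.pxie[port]:
            self.is_reg |= 1 << port

    def write_pxis(self, port, value):
        """Guest write to PxIS: writing 1 clears the corresponding bit."""
        self.pxis[port] &= ~value

    def write_is(self, value):
        """Guest write to IS: writing 1 clears the corresponding bit."""
        self.is_reg &= ~value

    def intx_asserted(self):
        return self.global_ie and self.is_reg != 0


class QuietDevice:
    """Another device sharing the same legacy line, with nothing pending."""

    def intx_asserted(self):
        return False


def shared_line(devices):
    """A shared legacy line is high while any device on it asserts INTx."""
    return any(dev.intx_asserted() for dev in devices)
```

If only the other device's handler runs, nothing ever calls `write_pxis` or `write_is`, so `shared_line` stays true after every EOI. The line drops only once the AHCI status bits themselves are cleared.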

Paolo
Stefan Hajnoczi
2014-10-23 10:17:09 UTC
Post by Paolo Bonzini
This usually means that the interrupt was not properly cleared in the
AHCI controller. Since legacy PCI interrupts are shared, it probably
means that the guest was not expecting the AHCI interrupt and is just
not asking the driver to handle it. Perhaps the BIOS is leaving the
driver with INTX enabled, or something like that?
John: you could investigate that by looking at writes to the PCI Command
register in Configuration Space for the ICH9 AHCI device.
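
As a rough aid for reading such writes, here is a small decoder for the 16-bit Command register at configuration offset 0x04. The bit positions are the standard PCI ones. The function names and the `(offset, value)` trace format are hypothetical, and how the writes get captured (for example, a debug print in the device's config-write path) is left open.

```python
PCI_COMMAND = 0x04                 # offset of the Command register
PCI_COMMAND_IO = 0x0001            # bit 0: I/O space enable
PCI_COMMAND_MEMORY = 0x0002        # bit 1: memory space enable
PCI_COMMAND_MASTER = 0x0004        # bit 2: bus master enable
PCI_COMMAND_INTX_DISABLE = 0x0400  # bit 10: INTx disable


def decode_pci_command(value):
    """Decode the Command register bits relevant to this problem."""
    return {
        "io_space": bool(value & PCI_COMMAND_IO),
        "memory_space": bool(value & PCI_COMMAND_MEMORY),
        "bus_master": bool(value & PCI_COMMAND_MASTER),
        # The bit is a *disable*, so INTx is enabled when it is clear.
        "intx_enabled": not (value & PCI_COMMAND_INTX_DISABLE),
    }


def command_writes(trace):
    """Pick Command register writes out of (offset, value) pairs.

    The trace format is an assumption made for this sketch.
    """
    return [decode_pci_command(value)
            for offset, value in trace
            if offset == PCI_COMMAND]
```

For example, a write of `0x0007` leaves INTx enabled, while `0x0407` sets the disable bit. Checking which of these the firmware and the guest write, and in what order, would show whether the device still has INTx enabled when the guest is not expecting it.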

Stefan
