Discussion:
qemu-kvm-1.1.0 crashing with kernel 3.5.0-rc6
Chris Clayton
2012-07-09 10:57:48 UTC
Hi,

When I run WinXP SP3 through qemu-kvm-1.1.0 on linux kernel 3.5.0-rc6, I
get a segmentation fault within 3 or 4 minutes maximum. In dmesg I see:

qemu-kvm: sending ioctl 5326 to a partition!
qemu-kvm: sending ioctl 801c0204 to a partition!
qemu-kvm: sending ioctl 5326 to a partition!
qemu-kvm: sending ioctl 801c0204 to a partition!
qemu-kvm: sending ioctl 5326 to a partition!
qemu-kvm: sending ioctl 801c0204 to a partition!
qemu-kvm: sending ioctl 5326 to a partition!
qemu-kvm: sending ioctl 801c0204 to a partition!
qemu-kvm[860] general protection ip:b6abad77 sp:b52ff09c error:0 in
libc-2.16.so[b697d000+1b4000]

The crash does not occur with qemu-kvm-1.0.1 on rc6. Nor does it occur
with qemu-kvm-1.0.1 or qemu-kvm-1.1.0 on kernel 3.4.4. All three
combinations survive for 15 minutes or more.

When I try to get a backtrace with gdb, the screen on which konsole and
qemu are running locks up until I kill qemu in another console.
Consequently I can't get a full BT, but, although probably not very
helpful, what I did get is:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6946b40 (LWP 506)]
0xb7705d77 in __strcmp_sse4_2 () from /lib/libc.so.6
(gdb) bt
#0 0xb7705d77 in __strcmp_sse4_2 () from /lib/libc.so.6
#1 0xb7e8d6e3 in g_str_equal () from /usr/lib/libglib-2.0.so.0
#2 0xb7e8c94c in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
Cannot access memory at address 0xb694610c
(gdb)

Note that the gdb and dmesg outputs above are not from the same crash
instance.
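
A note on reading the dmesg line above: the bracketed pair after the library name is the start address and size of the mapping that contains ip, so subtracting the start from ip gives the offset of the faulting instruction inside that mapping. The sketch below is only an illustration. It assumes the two wrapped dmesg lines are joined with a single space, and the names FAULT_RE and fault_offset are made up for this example. It also compares the low 12 bits of the dmesg ip with the gdb address; since mappings are page aligned, matching page offsets hint (but do not prove) that both crashes hit the same instruction in libc.

```python
import re

# Pattern for the tail of a kernel fault line such as:
#   "... ip:b6abad77 sp:b52ff09c error:0 in libc-2.16.so[b697d000+1b4000]"
FAULT_RE = re.compile(
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<err>[0-9a-f]+) in "
    r"(?P<lib>\S+)\[(?P<start>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (mapping name, offset of ip inside that mapping)."""
    m = FAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised fault line")
    ip = int(m.group("ip"), 16)
    start = int(m.group("start"), 16)
    size = int(m.group("size"), 16)
    offset = ip - start
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    return m.group("lib"), offset


# The two dmesg lines from the report, joined with a space (assumption).
dmesg_line = (
    "qemu-kvm[860] general protection ip:b6abad77 sp:b52ff09c error:0 in "
    "libc-2.16.so[b697d000+1b4000]"
)
lib, offset = fault_offset(dmesg_line)

# Program counter reported by gdb for the other crash instance.
gdb_pc = 0xB7705D77
same_page_offset = (offset & 0xFFF) == (gdb_pc & 0xFFF)

print(lib, hex(offset), same_page_offset)
```

The offset can then be looked up in a disassembly of the same libc build, provided the mapping starts at file offset 0, which is assumed here rather than verified.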

I'm not subscribed, so please cc me on any reply.

Happy to provide any additional diagnostics (but may need help on how to
get them) or test patches, etc.

Thanks

Chris Clayton
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Chris Clayton
2012-07-11 07:09:42 UTC
Ping.

Have I committed a bug-reporting sin in the mail below or is everyone
simply too busy to look at this kvm-related crash?
Gleb Natapov
2012-07-11 07:12:00 UTC
Post by Chris Clayton
Ping.
Have I committed a bug-reporting sin in the mail below or is
everyone simply too busy to look at this kvm-related crash?
Since you have good and bad points can you bisect the problem?
--
Gleb.
Chris Clayton
2012-07-11 07:18:17 UTC
Post by Gleb Natapov
Post by Chris Clayton
Ping.
Have I committed a bug-reporting sin in the mail below or is
everyone simply too busy to look at this kvm-related crash?
Since you have good and bad points can you bisect the problem?
Yes, I can bisect, but since the crash occurs with only one
combination of qemu-kvm (1.1.0) and kernel (3.5.0-rc6), I'm not sure
which of those I should bisect. Any ideas on how I could narrow that down?

Thanks.
Gleb Natapov
2012-07-11 07:22:55 UTC
Post by Chris Clayton
Post by Gleb Natapov
Post by Chris Clayton
Ping.
Have I committed a bug-reporting sin in the mail below or is
everyone simply too busy to look at this kvm-related crash?
Since you have good and bad points can you bisect the problem?
Yes, I can bisect, but since the crash occurs with only one
combination of qemu-kvm (1.1.0) and kernel (3.5.0-rc6), I'm not sure
which of those I should bisect. Any ideas on how I could narrow that down?
Bisect qemu between qemu-kvm-1.0.1 & qemu-kvm-1.1.0.
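
Bisecting between a known-good and a known-bad version is a binary search over the revisions in between, which in a git tree is normally driven with git bisect. The sketch below only illustrates the search itself. The revision names, the position of the culprit, and the is_bad predicate are all hypothetical; in practice is_bad would mean "build this revision, run the guest, and see whether it crashes".

```python
def first_bad(revisions, is_bad):
    """Binary search for the first bad revision.

    revisions is ordered oldest to newest; revisions[0] is assumed good
    and revisions[-1] is assumed bad. Returns (first bad revision, number
    of revisions that had to be tested).
    """
    good, bad = 0, len(revisions) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1
        if is_bad(revisions[mid]):
            bad = mid    # the regression is at mid or earlier
        else:
            good = mid   # the regression is after mid
    return revisions[bad], steps


# Hypothetical history of 1000 revisions with the regression at "r617".
revs = ["r%d" % i for i in range(1000)]
found, steps = first_bad(revs, lambda rev: int(rev[1:]) >= 617)
print(found, steps)
```

The number of builds to test grows with the logarithm of the number of revisions, which is why bisecting is practical even across a whole release cycle. It does rely on each test giving a trustworthy good or bad answer.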
--
Gleb.
Chris Clayton
2012-07-15 19:52:37 UTC
Post by Gleb Natapov
Bisect qemu between qemu-kvm-1.0.1 & qemu-kvm-1.1.0.
Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact, crash
on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
times more invocations before the crash occurs with 1.0.1 and I haven't
used qemu-kvm much in the past few weeks.

I'm now checking whether I can get crashes (with 1.0.1 and/or 1.1.0) on
linux-3.4.4. I'll report back in a day or two.
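
Because the crash is intermittent, a combination can only be called good after enough clean runs. As a rough sketch, assume each run crashes independently with the same probability p. Then n clean runs happen by chance with probability (1 - p)^n, and n has to be large enough to push that below an acceptable risk. The probabilities used below are invented for illustration and were not measured in this thread; clean_runs_needed is a made-up helper name.

```python
import math


def clean_runs_needed(p_crash, max_miss=0.05):
    """Smallest n with (1 - p_crash) ** n <= max_miss.

    p_crash:  assumed per-run crash probability on a bad build.
    max_miss: accepted chance of wrongly calling a bad build good.
    """
    if not 0.0 < p_crash < 1.0:
        raise ValueError("p_crash must be strictly between 0 and 1")
    if not 0.0 < max_miss < 1.0:
        raise ValueError("max_miss must be strictly between 0 and 1")
    return math.ceil(math.log(max_miss) / math.log(1.0 - p_crash))


# Illustrative values only: a build that crashes in half of its runs
# versus one that crashes in one run out of twenty.
for p in (0.5, 0.05):
    print(p, clean_runs_needed(p))
```

Under these assumptions a rarely crashing build needs an order of magnitude more clean runs before it can be trusted, which fits the observation that 1.0.1 looked stable at first.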
Chris Clayton
2012-07-19 12:14:20 UTC
Post by Chris Clayton
Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact, crash
on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
times more invocations before the crash occurs with 1.0.1 and I haven't
used qemu-kvm much in the past few weeks.
I'm now checking whether I can get crashes (with 1.0.1 and/or 1.1.0) on
linux-3.4.4. I'll report back in a day or two.
I've started up qemu-kvm on kernel 3.4.4 many times and not seen a crash.
That would indicate that the problem is in the kernel. However, I pulled
the latest and greatest from Linus yesterday evening and I now can't get
the crash there either, so whatever it was seems to have been fixed. If
I check out and build 3.5.0-rc[1..7], I can get the crash pretty quickly,
so it's been fixed in the last few days.

Thanks
Avi Kivity
2012-07-19 12:17:26 UTC
Post by Chris Clayton
I've started up qemu-kvm on kernel 3.4.4 many times and not seen a crash.
That would indicate that the problem is in the kernel. However, I pulled
the latest and greatest from Linus yesterday evening and I now can't get
the crash there either, so whatever it was seems to have been fixed. If
I check out and build 3.5.0-rc[1..7], I can get the crash pretty quickly,
so it's been fixed in the last few days.
There were no kvm changes post-rc7.
--
error compiling committee.c: too many arguments to function


Chris Clayton
2012-07-19 18:23:41 UTC
Post by Avi Kivity
There were no kvm changes post-rc7.
Yes, I'm aware of that, Avi. This thread started because I was getting a
crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
out that the problem was also present in v1.0.1, but much harder to hit.
However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
version of qemu-kvm, was stable. So then it seemed that the problem was
in the kernel (but not necessarily in the kvm code).

Something that's changed since rc7 has either fixed the problem or made
it much harder to hit. With rc7 and earlier I can recreate the crash
quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
rc7+, I haven't been able to get a crash at all.

I'm not inclined to bisect to find out which patch provided the fix, but
this mail should at least close the mail thread down tidily.

Chris
Chris Clayton
2012-07-26 09:52:41 UTC
Post by Chris Clayton
Something that's changed since rc7 has either fixed the problem or made
it much harder to hit. With rc7 and earlier I can recreate the crash
quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
rc7+, I haven't been able to get a crash at all.
Well, I'm getting the crash again, but this time I've managed to get a
backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb60ffb40 (LWP 9405)]
0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
(gdb) bt
#0 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
#1 0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
#2 0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
#3 0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at
qom/object.c:94
#4 type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at
qom/object.c:149
#5 0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818,
typename=typename@entry=0x802b0c50 "apic-common")
at qom/object.c:416
#6 0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818,
typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
#7 0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#8 0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60,
run=run@entry=0xb6239000)
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#9 0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
#12 0xb77bbbbe in clone () from /lib/libc.so.6

This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built
against 3.4.4 kernel headers. The glibc, the kernel headers and the
kernel are vanilla and the only change to the qemu-kvm sources is:

--- qemu-kvm-1.1.0/configure~ 2012-07-15 22:38:39.000000000 +0100
+++ qemu-kvm-1.1.0/configure 2012-07-15 22:39:09.000000000 +0100
@@ -2783,7 +2783,7 @@ int main(int argc, char **argv)
}
EOF
if ! compile_prog "" "" ; then
- CFLAGS+="-march=i486"
+ CFLAGS+="-march=i686"
fi
fi

Please let me know of anything I can do to help track this down.

Thanks

Chris
Avi Kivity
2012-07-26 10:01:10 UTC
Post by Chris Clayton
Well, I'm getting the crash again, but this time I've managed to get a
backtrace:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb60ffb40 (LWP 9405)]
0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
(gdb) bt
#0 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
#1 0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
#2 0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
#3 0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at
qom/object.c:94
#4 type_get_by_name (name=name@entry=0x802b0c50 "apic-common") at
qom/object.c:149
#5 0x8014e933 in object_dynamic_cast (obj=obj@entry=0x80a5d818,
typename=typename@entry=0x802b0c50 "apic-common")
at qom/object.c:416
#6 0x8014e8b9 in object_dynamic_cast_assert (obj=obj@entry=0x80a5d818,
typename=typename@entry=0x802b0c50 "apic-common") at qom/object.c:478
#7 0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#8 0x801d0560 in kvm_arch_post_run (env=env@entry=0x80a55a60,
run=run@entry=0xb6239000)
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#9 0x801cb05f in kvm_cpu_exec (env=env@entry=0x80a55a60) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
#12 0xb77bbbbe in clone () from /lib/libc.so.6
This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built
It looks like general memory corruption. Is this repeatable? What's
the guest uptime when it happens (i.e. is it immediate?)

Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
--
error compiling committee.c: too many arguments to function


Jan Kiszka
2012-07-26 10:29:15 UTC
Post by Avi Kivity
It looks like general memory corruption. Is this repeatable? What's
the guest uptime when it happens (i.e. is it immediate?)
Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
To sync the userspace state with what the kernel maintains. Will end up
in kvm_apic_set_tpr which does precisely this. We always did, just the
QOM modeling is new.

Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Avi Kivity
2012-07-26 10:45:57 UTC
Post by Jan Kiszka
Post by Avi Kivity
It looks like general memory corruption. Is this repeatable? What's
the guest uptime when it happens (i.e. is it immediate?)
Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
To sync the userspace state with what the kernel maintains. Will end up
in kvm_apic_set_tpr which does precisely this. We always did, just the
QOM modeling is new.
We should move it to the general register synchronization code, there is
no reason to do this every exit (though the cost is likely minimal).
--
error compiling committee.c: too many arguments to function


Jan Kiszka
2012-07-26 10:49:46 UTC
Post by Avi Kivity
Post by Jan Kiszka
Post by Avi Kivity
It looks like general memory corruption. Is this repeatable? What's
the guest uptime when it happens (i.e. is it immediate?)
Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
To sync the userspace state with what the kernel maintains. Will end up
in kvm_apic_set_tpr which does precisely this. We always did, just the
QOM modeling is new.
We should move it to the general register synchronization code, there is
no reason to do this every exit (though the cost is likely minimal).
The cost is, well, was close to nothing. But I'm not sure about that QOM
type casting magic (and also its locking requirements, long-term).
However, if that is a problem, it's likely a much bigger one anyway.

Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Jan Kiszka
2012-07-26 11:04:59 UTC
Post by Jan Kiszka
Post by Avi Kivity
Post by Jan Kiszka
Post by Avi Kivity
It looks like general memory corruption. Is this repeatable? What's
the guest uptime when it happens (i.e. is it immediate?)
Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
To sync the userspace state with what the kernel maintains. Will end up
in kvm_apic_set_tpr which does precisely this. We always did, just the
QOM modeling is new.
We should move it to the general register synchronization code, there is
no reason to do this every exit (though the cost is likely minimal).
The cost is, well, was close to nothing. But I'm not sure about that QOM
type casting magic (and also its locking requirements, long-term).
However, if that is a problem, it's likely a much bigger one anyway.
But, independent of this, we can likely move the whole kvm_arch_post_run
out of the exit path for kvm_irqchip_in_kernel() == true. The price is
that we create more deviation between both, but that should be
controllable. I will play with a patch.

Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Chris Clayton
2012-07-26 11:58:07 UTC
Post by Avi Kivity
Post by Chris Clayton
Post by Chris Clayton
Post by Avi Kivity
Post by Chris Clayton
Post by Chris Clayton
Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact, crash
on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
times more invocations before the crash occurs with 1.0.1 and I haven't
used qemu-kvm much in the past few weeks.
I'm now checking whether I can get crashes (with 1.0.1 and/or 1.1.0) on
linux-3.4.4. I'll report back in a day or two.
I've started up qemu-kvm on kernel 3.4.4 many times and not seen a crash.
That would indicate that the problem is in the kernel. However, I pulled
the latest and greatest from Linus yesterday evening and I now can't get
the crash there either, so whatever it was seems to have been fixed. If
I checkout and build 3.5.0-rc[1..7], I can get the crash pretty quickly,
so it's been fixed in the last few days.
There were no kvm changes post-rc7.
Yes, I'm aware of that, Avi. This thread started because I was getting a
crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
out that the problem was also present in v1.0.1, but much harder to hit.
However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
version of qemu-kvm, was stable. So then it seemed that the problem was
in the kernel, (but not necessarily in the kvm code).
Something that's changed since rc7 has either fixed the problem or made
it much harder to hit. With rc7 and earlier I can recreate the crash
quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
rc7+, I haven't been able to get a crash at all.
Well, I'm getting the crash again, but this time I've managed to get a
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb60ffb40 (LWP 9405)]
0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
(gdb) bt
#0 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
#1 0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
#2 0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
#3 0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at
qom/object.c:94
qom/object.c:149
at qom/object.c:416
#7 0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
#12 0xb77bbbbe in clone () from /lib/libc.so.6
This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built
It looks like general memory corruption. Is this repeatable? What's
the guest uptime when it happens (i.e. is it immediate?)
I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed
early as XP was starting up - well before the desktop would have
appeared. The other two crashed as XP was closing down, having been
running for a few minutes (but not doing much).

The error messages seen through dmesg are:

qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in
libc-2.16.so[b6b06000+1b4000]
qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in
libc-2.16.so[b6ab9000+1b4000]
qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in
libc-2.16.so[b6b96000+1b4000]
qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in
libc-2.16.so[b6b54000+1b4000]
qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in
libc-2.16.so[b6b1e000+1b4000]

The other 5 were OK, although I only did a bit of web browsing for a few
minutes with IE.
Post by Avi Kivity
Jan, why are we calling cpu_set_apic_tpr() with kvm_irqchip_in_kernel?
Avi Kivity
2012-07-26 12:07:07 UTC
Post by Chris Clayton
Post by Avi Kivity
It looks like general memory corruption. Is this repeatable? What's
the guest uptime when it happens (i.e. is it immediate?)
I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed
early as XP was starting up - well before the desktop would have
appeared. The other two crashed as XP was closing down, having been
running for a few minutes (but not doing much).
qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in
libc-2.16.so[b6b06000+1b4000]
qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in
libc-2.16.so[b6ab9000+1b4000]
qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in
libc-2.16.so[b6b96000+1b4000]
qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in
libc-2.16.so[b6b54000+1b4000]
qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in
libc-2.16.so[b6b1e000+1b4000]
The other 5 were OK, although I only did a bit of web browsing for a few
minutes with IE.
Failures always in the same place (I'm guessing the variations are due to
PIE -- please configure with --disable-pie for future tests).
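
One way to check the "same place" observation is to subtract the libc mapping base from each faulting ip in the dmesg lines quoted above. The following is only a sketch of that arithmetic: the function and array names are made up for illustration, and only the hex values are taken from the log.

```c
#include <stdint.h>

/* Offset of a faulting instruction pointer inside a mapped library,
 * given the mapping base printed in the same dmesg line. */
uint32_t fault_offset(uint32_t ip, uint32_t base)
{
    return ip - base;
}

/* ip and libc base values copied from the five dmesg lines above. */
const uint32_t crash_ip[5] = {
    0xb6c43d77, 0xb6bf6d77, 0xb6cd3d77, 0xb6c91d77, 0xb6c5bd77
};
const uint32_t crash_base[5] = {
    0xb6b06000, 0xb6ab9000, 0xb6b96000, 0xb6b54000, 0xb6b1e000
};

/* Returns 1 if every crash faulted at the same offset into libc, else 0. */
int all_same_offset(void)
{
    uint32_t first = fault_offset(crash_ip[0], crash_base[0]);
    for (int i = 1; i < 5; i++) {
        if (fault_offset(crash_ip[i], crash_base[i]) != first) {
            return 0;
        }
    }
    return 1;
}
```

All five lines work out to offset 0x13dd77 into libc-2.16.so, so only the load address differs between runs.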

Please generate a core and look around, esp. in frame 3
(type_table_lookup). Also try to dissect type_table (you may need to
install the glib debug symbols for this).
--
error compiling committee.c: too many arguments to function


Chris Clayton
2012-07-26 23:22:11 UTC
Post by Avi Kivity
Post by Chris Clayton
Post by Avi Kivity
It looks like general memory corruption. Is this repeatable? What's
the guest uptime when it happens (i.e. is it immediate?)
I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed
early as XP was starting up - well before the desktop would have
appeared. The other two crashed as XP was closing down, having been
running for a few minutes (but not doing much).
qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in
libc-2.16.so[b6b06000+1b4000]
qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in
libc-2.16.so[b6ab9000+1b4000]
qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in
libc-2.16.so[b6b96000+1b4000]
qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in
libc-2.16.so[b6b54000+1b4000]
qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in
libc-2.16.so[b6b1e000+1b4000]
The other 5 were OK, although I only did a bit of web browsing for a few
minutes with IE.
Failures always in the same place (I'm guessing the variations are due to
PIE -- please configure with --disable-pie for future tests).
Please generate a core and look around, esp. in frame 3
(type_table_lookup). Also try to dissect type_table (you may need to
install the glib debug symbols for this).
Mmm, I'm sailing out of my comfort zone here, but I've built a debug
version of glib and trapped another crash. The backtrace is:

(gdb) bt
#0 0xb7822d77 in __strcmp_sse4_2 () from /lib/libc.so.6
#1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
#2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
key=0x8319b82, hash_return=0xb60ff178)
at ghash.c:422
#3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
key=***@entry=0x8319b82) at ghash.c:1074
#4 0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at
qom/object.c:94
#5 type_get_by_name (name=***@entry=0x8319b82 "apic-common") at
qom/object.c:149
#6 0x0815cf93 in object_dynamic_cast (obj=***@entry=0x8a44818,
typename=***@entry=0x8319b82 "apic-common")
at qom/object.c:416
#7 0x0815cf2d in object_dynamic_cast_assert (obj=***@entry=0x8a44818,
typename=***@entry=0x8319b82 "apic-common") at qom/object.c:478
#8 0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=8 '\b')
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#9 0x081cb86c in kvm_arch_post_run (env=***@entry=0x8a3ca60,
run=***@entry=0xb6258000)
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#10 0x081c686f in kvm_cpu_exec (env=***@entry=0x8a3ca60) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
#13 0xb77dabbe in clone () from /lib/libc.so.6

Inspecting the args passed into g_str_equal shows:

(gdb) print (gchar *) 0x8a0cd58
$12 = (gchar *) 0x8a0cd58 "apic-common"
(gdb) print (gchar *) 0x8319b82
$13 = (gchar *) 0x8319b82 "apic-common"

So it seems odd that glibc's implementation of strcmp should crash with
two equal strings. As I say, however, I'm a bit out of my comfort zone
here, so I may be missing something.

I wouldn't know how to go about dissecting type_table, which I assume is
the hash_table arg passed into g_hash_table_lookup, so advice on how to
do that and what I am looking for (NULL pointer?) would be helpful.
Chris Clayton
2012-07-27 10:46:50 UTC
Post by Avi Kivity
Post by Chris Clayton
Post by Avi Kivity
It looks like general memory corruption. Is this repeatable? What's
the guest uptime when it happens (i.e. is it immediate?)
I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed
early as XP was starting up - well before the desktop would have
appeared. The other two crashed as XP was closing down, having been
running for a few minutes (but not doing much).
qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in
libc-2.16.so[b6b06000+1b4000]
qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in
libc-2.16.so[b6ab9000+1b4000]
qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in
libc-2.16.so[b6b96000+1b4000]
qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in
libc-2.16.so[b6b54000+1b4000]
qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in
libc-2.16.so[b6b1e000+1b4000]
The other 5 were OK, although I only did a bit of web browsing for a few
minutes with IE.
Failures always in the same place (I'm guessing the variations are due to
PIE -- please configure with --disable-pie for future tests).
Please generate a core and look around, esp. in frame 3
(type_table_lookup). Also try to dissect type_table (you may need to
install the glib debug symbols for this).
<snip>
Here's another backtrace and source listing of the failing function,
following build and installation of libc (2.16) with debugging turned
on. I'm afraid it's beyond my current knowledge to know what this might
be telling us.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb60ffb40 (LWP 6515)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217 movdqu (%edx), %xmm2
(gdb) generate-core-file
Saved corefile core.6509
(gdb) bt
#0 __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
#2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
key=0x8319b82, hash_return=0xb60ff178)
at ghash.c:422
#3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
key=***@entry=0x8319b82) at ghash.c:1074
#4 0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at
qom/object.c:94
#5 type_get_by_name (name=***@entry=0x8319b82 "apic-common") at
qom/object.c:149
#6 0x0815cf93 in object_dynamic_cast (obj=***@entry=0x8a44818,
typename=***@entry=0x8319b82 "apic-common")
at qom/object.c:416
#7 0x0815cf2d in object_dynamic_cast_assert (obj=***@entry=0x8a44818,
typename=***@entry=0x8319b82 "apic-common") at qom/object.c:478
#8 0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=7 '\a')
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#9 0x081cb86c in kvm_arch_post_run (env=***@entry=0x8a3ca60,
run=***@entry=0xb6271000)
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#10 0x081c686f in kvm_cpu_exec (env=***@entry=0x8a3ca60) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e45ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
(gdb) print *(0x8a0cd58)
$1 = 1667854433
(gdb) print (char*) 0x8a0cd58
$2 = 0x8a0cd58 "apic-common"
(gdb) list __strcmp_sse4_2
201 PUSH (REM)
202 #endif
203 #if defined USE_AS_STRCASECMP_L || defined USE_AS_STRNCASECMP_L
204 PUSH (%edi)
205 #endif
206 mov STR1(%esp), %edx
207 mov STR2(%esp), %eax
208 #if defined USE_AS_STRNCMP || defined USE_AS_STRNCASECMP_L
209 movl CNT(%esp), REM
210 test REM, REM
(gdb) list
211 je L(eq)
212 #endif
213 mov %dx, %cx
214 and $0xfff, %cx
215 cmp $0xff0, %cx
216 ja L(first4bytes)
217 movdqu (%edx), %xmm2
218 mov %eax, %ecx
219 and $0xfff, %ecx
220 cmp $0xff0, %ecx
(gdb) list
221 ja L(first4bytes)
222 #if defined USE_AS_STRCASECMP_L || defined USE_AS_STRNCASECMP_L
223 # define TOLOWER(reg1, reg2) \
224 movdqa reg1, %xmm3; \
225 movdqa UCHIGH_reg, %xmm4; \
226 movdqa reg2, %xmm5; \
227 movdqa UCHIGH_reg, %xmm6; \
228 pcmpgtb UCLOW_reg, %xmm3; \
229 pcmpgtb reg1, %xmm4; \
230 pcmpgtb UCLOW_reg, %xmm5; \
(gdb)
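
For readers following the listing, lines 213-216 appear to be a guard against a 16-byte load running off the end of a 4 KiB page: the low 12 bits of the string address are compared with 0xff0, and only addresses above that take the L(first4bytes) path. A rough C rendering of that check is sketched below. The function and macro names are invented for illustration, and it assumes that v1 from frame 1 (0x8a0cd58) is the value that ends up in %edx.

```c
#include <stdint.h>

#define PAGE_OFFSET_MASK 0xfffu  /* low 12 bits: offset within a 4 KiB page */
#define LAST_SAFE_OFFSET 0xff0u  /* a 16-byte load at or below this stays in the page */

/* Mirrors listing lines 213-216: returns 1 when the address is close
 * enough to the end of its page that the code branches to
 * L(first4bytes), and 0 when it falls through to the movdqu on line 217. */
int near_page_end(uint32_t addr)
{
    return (addr & PAGE_OFFSET_MASK) > LAST_SAFE_OFFSET;
}
```

For 0x8a0cd58 the page offset is 0xd58, so near_page_end() returns 0 and execution falls through to the movdqu at line 217. Under that assumption the 16-byte load stays inside the same page as the string, which gdb can evidently read.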

I'll stop sending backtraces etc. for now, in the hope that someone will
advise me on how I might better direct my efforts.

Thanks for your help so far.
Chris Clayton
2012-07-27 19:04:57 UTC
Could you include the output of "info registers" at the point where it
crashed?
Here you go:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6a78b40 (LWP 13249)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217 movdqu (%edx), %xmm2
(gdb) bt
#0 __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
#2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
key=0x8319b82, hash_return=0xb6a78178)
at ghash.c:422
#3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
key=***@entry=0x8319b82) at ghash.c:1074
#4 0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at
qom/object.c:94
#5 type_get_by_name (name=***@entry=0x8319b82 "apic-common") at
qom/object.c:149
#6 0x0815cf93 in object_dynamic_cast (obj=***@entry=0x8a313e0,
typename=***@entry=0x8319b82 "apic-common")
at qom/object.c:416
#7 0x0815cf2d in object_dynamic_cast_assert (obj=***@entry=0x8a313e0,
typename=***@entry=0x8319b82 "apic-common") at qom/object.c:478
#8 0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r')
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#9 0x081cb86c in kvm_arch_post_run (env=***@entry=0x8a29370,
run=***@entry=0xb6274000)
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#10 0x081c686f in kvm_cpu_exec (env=***@entry=0x8a29370) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e45ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
(gdb) info registers
eax 0x8319b82 137468802
ecx 0xd58 3416
edx 0x8a0cd58 144756056
ebx 0xb7f7f2c4 -1208487228
esp 0xb6a780ec 0xb6a780ec
ebp 0xb6a78118 0xb6a78118
esi 0x8a313e0 144905184
edi 0xc513 50451
eip 0xb7824f77 0xb7824f77 <__strcmp_sse4_2+23>
eflags 0x10283 [ CF SF IF RF ]
cs 0x73 115
ss 0x7b 123
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x33 51


Avi Kivity
2012-07-29 12:42:48 UTC
Post by Chris Clayton
Could you include the output of "info registers" at the point where it
crashed?
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6a78b40 (LWP 13249)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217 movdqu (%edx), %xmm2
(gdb) bt
#0 __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
#2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
key=0x8319b82, hash_return=0xb6a78178)
at ghash.c:422
#3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
#4 0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at
qom/object.c:94
qom/object.c:149
at qom/object.c:416
#8 0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r')
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e45ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
(gdb) info registers
eax 0x8319b82 137468802
ecx 0xd58 3416
edx 0x8a0cd58 144756056
ebx 0xb7f7f2c4 -1208487228
esp 0xb6a780ec 0xb6a780ec
ebp 0xb6a78118 0xb6a78118
esi 0x8a313e0 144905184
edi 0xc513 50451
eip 0xb7824f77 0xb7824f77 <__strcmp_sse4_2+23>
eflags 0x10283 [ CF SF IF RF ]
cs 0x73 115
ss 0x7b 123
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x33 51
ds shouldn't be zero for a 32-bit process.

But that should have crashed *much* earlier, ds is accessed all the time.
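
To make the register dump easier to read: an x86 segment selector packs a descriptor-table index in bits 3-15, a table indicator (0 = GDT, 1 = LDT) in bit 2, and the requested privilege level in bits 0-1. The sketch below decodes the values from the dump; the struct and function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Decoded fields of an x86 segment selector. */
struct selector {
    unsigned index; /* descriptor table index (bits 3-15) */
    unsigned ti;    /* table indicator: 0 = GDT, 1 = LDT (bit 2) */
    unsigned rpl;   /* requested privilege level (bits 0-1) */
};

struct selector decode_selector(uint16_t sel)
{
    struct selector s;
    s.index = sel >> 3;
    s.ti = (sel >> 2) & 1u;
    s.rpl = sel & 3u;
    return s;
}

/* A selector with index 0 in the GDT is the null selector,
 * whatever its RPL bits are. */
int is_null_selector(uint16_t sel)
{
    return (sel & 0xfffcu) == 0;
}
```

With the dump above, cs 0x73 decodes to GDT index 14 at RPL 3, ss 0x7b to GDT index 15 at RPL 3, and gs 0x33 to GDT index 6 at RPL 3, while ds and es hold the null selector. A data access through a segment register holding the null selector raises a general-protection fault, which would be consistent with the "general protection ... error:0" lines in dmesg.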

Please add the following snippet to the beginning of kvm_arch_post_run():

{
    /* Read the current %ds selector; 0 is the null selector. */
    unsigned short ds;
    asm("mov %%ds, %0" : "=rm"(ds));
    assert(ds != 0);
}

If the assert triggers, then kvm corrupted the segment registers. If
not, corruption happens somewhere above.
--
error compiling committee.c: too many arguments to function


Chris Clayton
2012-07-29 14:03:00 UTC
Post by Avi Kivity
Post by Chris Clayton
Could you include the output of "info registers" at the point where it
crashed?
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6a78b40 (LWP 13249)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217 movdqu (%edx), %xmm2
(gdb) bt
#0 __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
#2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
key=0x8319b82, hash_return=0xb6a78178)
at ghash.c:422
#3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
#4 0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at
qom/object.c:94
qom/object.c:149
at qom/object.c:416
#8 0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r')
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e45ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
(gdb) info registers
eax 0x8319b82 137468802
ecx 0xd58 3416
edx 0x8a0cd58 144756056
ebx 0xb7f7f2c4 -1208487228
esp 0xb6a780ec 0xb6a780ec
ebp 0xb6a78118 0xb6a78118
esi 0x8a313e0 144905184
edi 0xc513 50451
eip 0xb7824f77 0xb7824f77 <__strcmp_sse4_2+23>
eflags 0x10283 [ CF SF IF RF ]
cs 0x73 115
ss 0x7b 123
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x33 51
ds shouldn't be zero for a 32-bit process.
But that should have crashed *much* earlier, ds is accessed all the time.
{
unsigned short ds;
asm("mov %%ds, %0" : "=rm"(ds));
assert(ds != 0);
}
if the assert triggers, then kvm corrupted the segment registers. If
not, corruption happens somewhere above.
Thanks, Avi.

The assert didn't trigger - I got:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb60ffb40 (LWP 2134)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217 movdqu (%edx), %xmm2
(gdb) info registers
eax 0x8319ba2 137468834
ecx 0xd58 3416
edx 0x8a0cd58 144756056
ebx 0xb7f7f2c4 -1208487228
esp 0xb60ff0ec 0xb60ff0ec
ebp 0xb60ff118 0xb60ff118
esi 0x8a44818 144984088
edi 0xc513 50451
eip 0xb7820f77 0xb7820f77 <__strcmp_sse4_2+23>
eflags 0x10283 [ CF SF IF RF ]
cs 0x73 115
ss 0x7b 123
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x33 51
(gdb) list
212 #endif
213 mov %dx, %cx
214 and $0xfff, %cx
215 cmp $0xff0, %cx
216 ja L(first4bytes)
217 movdqu (%edx), %xmm2
218 mov %eax, %ecx
219 and $0xfff, %ecx
220 cmp $0xff0, %ecx
221 ja L(first4bytes)
(gdb) bt
#0 __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319ba2) at ghash.c:1704
#2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
key=0x8319ba2, hash_return=0xb60ff178)
at ghash.c:422
#3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
key=***@entry=0x8319ba2) at ghash.c:1074
#4 0x0815c9cb in type_table_lookup (name=0x8319ba2 "apic-common") at
qom/object.c:94
#5 type_get_by_name (name=***@entry=0x8319ba2 "apic-common") at
qom/object.c:149
#6 0x0815cf93 in object_dynamic_cast (obj=***@entry=0x8a44818,
typename=***@entry=0x8319ba2 "apic-common")
at qom/object.c:416
#7 0x0815cf2d in object_dynamic_cast_assert (obj=***@entry=0x8a44818,
typename=***@entry=0x8319ba2 "apic-common") at qom/object.c:478
#8 0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=7 '\a')
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
#9 0x081cb874 in kvm_arch_post_run (env=***@entry=0x8a3ca60,
run=***@entry=0xb626d000)
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1702
#10 0x081c686f in kvm_cpu_exec (env=***@entry=0x8a3ca60) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ad9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e05ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132

I think you are saying that the problem isn't in kvm, so where would you
recommend I continue investigations? I'm not seeing a crash with any
other applications.

Thanks again.
Avi Kivity
2012-07-29 14:18:26 UTC
Post by Chris Clayton
Post by Avi Kivity
Post by Chris Clayton
Could you include the output of "info registers" at the point where it
crashed?
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6a78b40 (LWP 13249)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217 movdqu (%edx), %xmm2
(gdb) bt
#0 __strcmp_sse4_2 () at
../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
#2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
key=0x8319b82, hash_return=0xb6a78178)
at ghash.c:422
#3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
#4 0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at
qom/object.c:94
qom/object.c:149
at qom/object.c:416
qom/object.c:478
#8 0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r')
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e45ee in clone () at
../sysdeps/unix/sysv/linux/i386/clone.S:132
(gdb) info registers
eax 0x8319b82 137468802
ecx 0xd58 3416
edx 0x8a0cd58 144756056
ebx 0xb7f7f2c4 -1208487228
esp 0xb6a780ec 0xb6a780ec
ebp 0xb6a78118 0xb6a78118
esi 0x8a313e0 144905184
edi 0xc513 50451
eip 0xb7824f77 0xb7824f77 <__strcmp_sse4_2+23>
eflags 0x10283 [ CF SF IF RF ]
cs 0x73 115
ss 0x7b 123
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x33 51
ds shouldn't be zero for a 32-bit process.
But that should have crashed *much* earlier, ds is accessed all the time.
{
unsigned short ds;
asm("mov %%ds, %0" : "=rm"(ds));
assert(ds != 0);
}
if the assert triggers, then kvm corrupted the segment registers. If
not, corruption happens somewhere above.
Thanks, Avi.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb60ffb40 (LWP 2134)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217 movdqu (%edx), %xmm2
(gdb) info registers
eax 0x8319ba2 137468834
ecx 0xd58 3416
edx 0x8a0cd58 144756056
ebx 0xb7f7f2c4 -1208487228
esp 0xb60ff0ec 0xb60ff0ec
ebp 0xb60ff118 0xb60ff118
esi 0x8a44818 144984088
edi 0xc513 50451
eip 0xb7820f77 0xb7820f77 <__strcmp_sse4_2+23>
eflags 0x10283 [ CF SF IF RF ]
cs 0x73 115
ss 0x7b 123
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x33 51
(gdb) list
212 #endif
213 mov %dx, %cx
214 and $0xfff, %cx
215 cmp $0xff0, %cx
216 ja L(first4bytes)
217 movdqu (%edx), %xmm2
218 mov %eax, %ecx
219 and $0xfff, %ecx
220 cmp $0xff0, %ecx
221 ja L(first4bytes)
(gdb) bt
#0 __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319ba2) at ghash.c:1704
#2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
key=0x8319ba2, hash_return=0xb60ff178)
at ghash.c:422
#3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
#4 0x0815c9cb in type_table_lookup (name=0x8319ba2 "apic-common") at
qom/object.c:94
qom/object.c:149
at qom/object.c:416
#8 0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=7 '\a')
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1702
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ad9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e05ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
I think you are saying that the problem isn't in kvm, so where would you
recommend I continue investigations. I'm not seeing a crash with any
other applications.
What might have happened is that the movdqu instruction faulted (as it's
an fpu instruction), and on the way back from the fault, ds and es
didn't get restored correctly.

You can test this by writing a trivial version of g_str_equal()
somewhere in the qemu source code and rebuilding it.
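
A trivial replacement along those lines might look like the sketch below. It compares one byte at a time, so the lookup no longer goes through glibc's SSE4.2 strcmp. The name trivial_str_equal is made up here, and where exactly it would be wired in (presumably wherever qom/object.c creates its type hash table) is an assumption rather than something shown in this thread.

```c
/* Byte-by-byte string equality with the same contract as g_str_equal():
 * returns nonzero when the two NUL-terminated strings are identical.
 * The parameters are untyped pointers so the function can be used as a
 * hash table key-equality callback. */
int trivial_str_equal(const void *v1, const void *v2)
{
    const unsigned char *a = v1;
    const unsigned char *b = v2;

    /* Advance while both strings agree and neither has ended. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

If the crash disappears or moves with this in place, that would point toward the fault handling around movdqu rather than the hash table contents. It is probably worth checking the disassembly to confirm the compiler did not turn the loop back into a strcmp call.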
--
error compiling committee.c: too many arguments to function


Avi Kivity
2012-07-29 14:48:00 UTC
Post by Avi Kivity
Post by Chris Clayton
I think you are saying that the problem isn't in kvm, so where would you
recommend I continue investigations. I'm not seeing a crash with any
other applications.
What might have happened is that the movdqu instruction faulted (as it's
an fpu instruction), and on the way back from the fault, ds and es
didn't get restored correctly.
You can test this by writing a trivial version of g_str_equal()
somewhere in the qemu source code and rebuilding it.
You're running a 32-bit kernel, yes? Please confirm.
--
error compiling committee.c: too many arguments to function


Chris Clayton
2012-07-29 15:21:55 UTC
Post by Avi Kivity
Post by Avi Kivity
Post by Chris Clayton
I think you are saying that the problem isn't in kvm, so where would you
recommend I continue investigations. I'm not seeing a crash with any
other applications.
What might have happened is that the movdqu instruction faulted (as it's
an fpu instruction), and on the way back from the fault, ds and es
didn't get restored correctly.
You can test this by writing a trivial version of g_str_equal()
somewhere in the qemu source code and rebuilding it.
You're running a 32-bit kernel, yes? Please confirm.
Yes, I am running a 32-bit kernel and userland.
Avi Kivity
2012-07-29 15:47:00 UTC
Post by Avi Kivity
Post by Chris Clayton
Post by Avi Kivity
Post by Chris Clayton
Could you include the output of "info registers" at the point where it
crashed?
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb6a78b40 (LWP 13249)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217 movdqu (%edx), %xmm2
(gdb) bt
#0 __strcmp_sse4_2 () at
../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319b82) at ghash.c:1704
#2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
key=0x8319b82, hash_return=0xb6a78178)
at ghash.c:422
#3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
#4 0x0815c9cb in type_table_lookup (name=0x8319b82 "apic-common") at
qom/object.c:94
qom/object.c:149
at qom/object.c:416
qom/object.c:478
#8 0x08192c1b in cpu_set_apic_tpr (d=0x8a313e0, val=13 '\r')
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a29370) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ed9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e45ee in clone () at
../sysdeps/unix/sysv/linux/i386/clone.S:132
(gdb) info registers
eax 0x8319b82 137468802
ecx 0xd58 3416
edx 0x8a0cd58 144756056
ebx 0xb7f7f2c4 -1208487228
esp 0xb6a780ec 0xb6a780ec
ebp 0xb6a78118 0xb6a78118
esi 0x8a313e0 144905184
edi 0xc513 50451
eip 0xb7824f77 0xb7824f77 <__strcmp_sse4_2+23>
eflags 0x10283 [ CF SF IF RF ]
cs 0x73 115
ss 0x7b 123
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x33 51
ds shouldn't be zero for a 32-bit process.
But that should have crashed *much* earlier, ds is accessed all the time.
    {
        unsigned short ds;
        asm("mov %%ds, %0" : "=rm"(ds));
        assert(ds != 0);
    }
If the assert triggers, then kvm corrupted the segment registers. If
not, corruption happens somewhere above.
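As a reading aid for the register dump, a segment selector can be split into its index, table indicator and privilege fields with a few lines of C. This is only an illustrative sketch: the struct and function names are invented for the example and do not come from qemu, glibc or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a 16-bit x86 segment selector:
 *   bits 15..3  index into the descriptor table
 *   bit  2      table indicator (0 = GDT, 1 = LDT)
 *   bits 1..0   requested privilege level (RPL)
 * The names below are hypothetical, chosen for this example. */
struct selector_fields {
    unsigned index;
    unsigned uses_ldt;
    unsigned rpl;
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.uses_ldt = (sel >> 2) & 1u;
    f.rpl = sel & 3u;
    return f;
}

/* A selector with index 0 in the GDT is a null selector, whatever its RPL. */
int is_null_selector(uint16_t sel)
{
    return (sel & ~3u) == 0;
}

/* Checks the values from the dump: ss = 0x7b is a ring-3 GDT selector,
 * while ds = 0 is null, so a memory access through ds faults. */
void check_dump_values(void)
{
    assert(decode_selector(0x7b).index == 15);
    assert(decode_selector(0x7b).rpl == 3);
    assert(!is_null_selector(0x7b));
    assert(is_null_selector(0x0));
}
```

Read this way, cs, ss and gs in the dump all carry RPL 3 as expected for user space, and only ds, es and fs hold the null selector.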
Thanks, Avi.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb60ffb40 (LWP 2134)]
__strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
217 movdqu (%edx), %xmm2
(gdb) info registers
eax 0x8319ba2 137468834
ecx 0xd58 3416
edx 0x8a0cd58 144756056
ebx 0xb7f7f2c4 -1208487228
esp 0xb60ff0ec 0xb60ff0ec
ebp 0xb60ff118 0xb60ff118
esi 0x8a44818 144984088
edi 0xc513 50451
eip 0xb7820f77 0xb7820f77 <__strcmp_sse4_2+23>
eflags 0x10283 [ CF SF IF RF ]
cs 0x73 115
ss 0x7b 123
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x33 51
(gdb) list
212 #endif
213 mov %dx, %cx
214 and $0xfff, %cx
215 cmp $0xff0, %cx
216 ja L(first4bytes)
217 movdqu (%edx), %xmm2
218 mov %eax, %ecx
219 and $0xfff, %ecx
220 cmp $0xff0, %ecx
221 ja L(first4bytes)
(gdb) bt
#0 __strcmp_sse4_2 () at ../sysdeps/i386/i686/multiarch/strcmp-sse4.S:217
#1 0xb7e82cb4 in g_str_equal (v1=0x8a0cd58, v2=0x8319ba2) at ghash.c:1704
#2 0xb7e8137a in g_hash_table_lookup_node (hash_table=0x89fe800,
key=0x8319ba2, hash_return=0xb60ff178)
at ghash.c:422
#3 0xb7e821e5 in g_hash_table_lookup (hash_table=0x89fe800,
#4 0x0815c9cb in type_table_lookup (name=0x8319ba2 "apic-common") at
qom/object.c:94
qom/object.c:149
at qom/object.c:416
#8 0x08192c1b in cpu_set_apic_tpr (d=0x8a44818, val=7 '\a')
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1702
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#11 0x08198c72 in qemu_kvm_cpu_thread_fn (arg=0x8a3ca60) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#12 0xb7a3ad9e in start_thread () from /lib/libpthread.so.0
#13 0xb77e05ee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:132
I think you are saying that the problem isn't in kvm, so where would you
recommend I continue investigations? I'm not seeing a crash with any
other applications.
What might have happened is that the movdqu instruction faulted (as it's
an fpu instruction), and on the way back from the fault, ds and es
didn't get restored correctly.
You can test this by writing a trivial version of g_str_equal()
somewhere in the qemu source code and rebuilding it.
from entry_32.S:

.macro RESTORE_REGS pop=0
RESTORE_INT_REGS
1: popl_cfi %ds
/*CFI_RESTORE ds;*/
2: popl_cfi %es
/*CFI_RESTORE es;*/
3: popl_cfi %fs
/*CFI_RESTORE fs;*/
POP_GS \pop
.pushsection .fixup, "ax"
4: movl $0, (%esp)
jmp 1b
5: movl $0, (%esp)
jmp 2b
6: movl $0, (%esp)
jmp 3b
.popsection

this piece of code tries to restore %ds, and if it fails, zeros it,
which is consistent with the core dump.

This could happen if kvm is failing to restore GDT correctly.
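The pop-and-fixup behaviour of the macro can be pictured with a toy model in C. It is a sketch under stated assumptions, not kernel code: `restore_segment`, `loadable_fn` and `example_loadable` are hypothetical names, and the "loadable" predicate merely stands in for whatever checks the CPU performs when a selector is loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPU's checks when a selector is loaded
 * into a data segment register; not a kernel interface. */
typedef bool (*loadable_fn)(uint16_t selector);

/* Example predicate: pretend only the null selector and 0x7b (the user
 * data selector seen as ss in the register dump) can be loaded. */
bool example_loadable(uint16_t selector)
{
    return selector == 0 || selector == 0x7b;
}

/* Toy model of "1: popl_cfi %ds" together with its fixup entry
 * "4: movl $0, (%esp); jmp 1b". *slot plays the role of the saved
 * selector at the top of the stack. Returns the value that ends up
 * in the segment register. */
uint16_t restore_segment(uint16_t *slot, loadable_fn loadable)
{
    assert(slot != NULL && loadable != NULL);

    if (!loadable(*slot)) {
        /* The pop faulted: the fixup overwrites the saved value with 0 ... */
        *slot = 0;
    }
    /* ... and the pop is retried. Loading a null selector into ds does
     * not fault, so the register ends up holding 0. */
    return *slot;
}
```

In this model a saved selector that cannot be loaded silently becomes 0, which matches the ds and es values in the dump.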
--
error compiling committee.c: too many arguments to function


Avi Kivity
2012-07-29 16:34:50 UTC
Permalink
Post by Avi Kivity
Post by Avi Kivity
What might have happened is that the movdqu instruction faulted (as it's
an fpu instruction), and on the way back from the fault, ds and es
didn't get restored correctly.
You can test this by writing a trivial version of g_str_equal()
somewhere in the qemu source code and rebuilding it.
.macro RESTORE_REGS pop=0
RESTORE_INT_REGS
1: popl_cfi %ds
/*CFI_RESTORE ds;*/
2: popl_cfi %es
/*CFI_RESTORE es;*/
3: popl_cfi %fs
/*CFI_RESTORE fs;*/
POP_GS \pop
.pushsection .fixup, "ax"
4: movl $0, (%esp)
jmp 1b
5: movl $0, (%esp)
jmp 2b
6: movl $0, (%esp)
jmp 3b
.popsection
this piece of code tries to restore %ds, and if it fails, zeros it,
which is consistent with the core dump.
This could happen if kvm is failing to restore GDT correctly.
Possible culprit: b2da15ac26a0c00.
--
error compiling committee.c: too many arguments to function


Chris Clayton
2012-07-29 17:50:09 UTC
Permalink
Post by Avi Kivity
Post by Avi Kivity
Post by Avi Kivity
What might have happened is that the movdqu instruction faulted (as it's
an fpu instruction), and on the way back from the fault, ds and es
didn't get restored correctly.
You can test this by writing a trivial version of g_str_equal()
somewhere in the qemu source code and rebuilding it.
.macro RESTORE_REGS pop=0
RESTORE_INT_REGS
1: popl_cfi %ds
/*CFI_RESTORE ds;*/
2: popl_cfi %es
/*CFI_RESTORE es;*/
3: popl_cfi %fs
/*CFI_RESTORE fs;*/
POP_GS \pop
.pushsection .fixup, "ax"
4: movl $0, (%esp)
jmp 1b
5: movl $0, (%esp)
jmp 2b
6: movl $0, (%esp)
jmp 3b
.popsection
this piece of code tries to restore %ds, and if it fails, zeros it,
which is consistent with the core dump.
This could happen if kvm is failing to restore GDT correctly.
Possible culprit: b2da15ac26a0c00.
That commit isn't in qemu-kvm-1.1.1.

I'm testing a build with g_str_equal implemented in kvm.c and so far I
haven't had a crash in 6 invocations. That hasn't been possible with
vanilla qemu-kvm-1.1.{0,1}, but I'll do a few more, just to be sure.
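For reference, a trivial replacement of this kind could look like the sketch below. The function name `local_str_equal` is hypothetical and this is not the code that was actually tested; it only mirrors the `g_str_equal()` contract (two untyped pointers in, nonzero on equality) while comparing one byte at a time, assuming the compiler does not turn the loop into a call to the library `strcmp`.

```c
#include <assert.h>
#include <stddef.h>

/* Byte-at-a-time string equality with the same contract as g_str_equal():
 * returns nonzero when both NUL-terminated strings match.
 * Hypothetical name; avoids the SSE4.2 strcmp path in libc. */
int local_str_equal(const void *v1, const void *v2)
{
    const unsigned char *a = v1;
    const unsigned char *b = v2;

    assert(a != NULL && b != NULL);

    /* Advance while the bytes match and the first string has not ended. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}
```

Since the loop never executes movdqu, a crash that disappears with such a replacement points at the handling of the fault taken by that instruction rather than at the string data itself.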

Thanks for your help, Avi.

Gleb Natapov
2012-07-29 17:54:53 UTC
Permalink
Post by Chris Clayton
Post by Avi Kivity
Post by Avi Kivity
Post by Avi Kivity
What might have happened is that the movdqu instruction faulted (as it's
an fpu instruction), and on the way back from the fault, ds and es
didn't get restored correctly.
You can test this by writing a trivial version of g_str_equal()
somewhere in the qemu source code and rebuilding it.
.macro RESTORE_REGS pop=0
RESTORE_INT_REGS
1: popl_cfi %ds
/*CFI_RESTORE ds;*/
2: popl_cfi %es
/*CFI_RESTORE es;*/
3: popl_cfi %fs
/*CFI_RESTORE fs;*/
POP_GS \pop
.pushsection .fixup, "ax"
4: movl $0, (%esp)
jmp 1b
5: movl $0, (%esp)
jmp 2b
6: movl $0, (%esp)
jmp 3b
.popsection
this piece of code tries to restore %ds, and if it fails, zeros it,
which is consistent with the core dump.
This could happen if kvm is failing to restore GDT correctly.
Possible culprit: b2da15ac26a0c00.
That commit isn't in qemu-kvm-1.1.1.
It is in kernel.
Post by Chris Clayton
I'm testing a build with g_str_equal implemented in kvm.c and so far
I haven't had a crash in 6 invocations. That hasn't been possible
with vanilla qemu-kvm-1.1.{0,1}, but I'll do a few more, just to be
sure.
Thanks for your help, Avi.
--
Gleb.
Chris Clayton
2012-07-29 19:10:15 UTC
Permalink
Post by Gleb Natapov
Post by Chris Clayton
Post by Avi Kivity
Post by Avi Kivity
Post by Avi Kivity
What might have happened is that the movdqu instruction faulted (as it's
an fpu instruction), and on the way back from the fault, ds and es
didn't get restored correctly.
You can test this by writing a trivial version of g_str_equal()
somewhere in the qemu source code and rebuilding it.
.macro RESTORE_REGS pop=0
RESTORE_INT_REGS
1: popl_cfi %ds
/*CFI_RESTORE ds;*/
2: popl_cfi %es
/*CFI_RESTORE es;*/
3: popl_cfi %fs
/*CFI_RESTORE fs;*/
POP_GS \pop
.pushsection .fixup, "ax"
4: movl $0, (%esp)
jmp 1b
5: movl $0, (%esp)
jmp 2b
6: movl $0, (%esp)
jmp 3b
.popsection
this piece of code tries to restore %ds, and if it fails, zeros it,
which is consistent with the core dump.
This could happen if kvm is failing to restore GDT correctly.
Possible culprit: b2da15ac26a0c00.
That commit isn't in qemu-kvm-1.1.1.
It is in kernel.
Sorry, so it is.

With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
to be the problem.
Post by Gleb Natapov
Post by Chris Clayton
I'm testing a build with g_str_equal implemented in kvm.c and so far
I haven't had a crash in 6 invocations. That hasn't been possible
with vanilla qemu-kvm-1.1.{0,1}, but I'll do a few more, just to be
sure.
Similarly, with my "local" implementation of g_str_equal, I've had 15
clean invocations on vanilla kernel 3.5.0.

I'm more than happy to test patches to fix this regression, but it will
be tomorrow before I will be able to do so.
Post by Gleb Natapov
Post by Chris Clayton
Thanks for your help, Avi.
--
Gleb.
Chris Clayton
2012-07-30 14:00:41 UTC
Permalink
Post by Chris Clayton
Post by Gleb Natapov
Post by Chris Clayton
Post by Avi Kivity
Possible culprit: b2da15ac26a0c00.
That commit isn't in qemu-kvm-1.1.1.
It is in kernel.
Sorry, so it is.
With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
to be the problem.
Just to be sure, I've run some more tests today. No crashes occurred in
20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
reverted.

Thanks.
Avi Kivity
2012-07-30 14:03:13 UTC
Permalink
Post by Chris Clayton
Post by Chris Clayton
Post by Gleb Natapov
Post by Chris Clayton
Post by Avi Kivity
Possible culprit: b2da15ac26a0c00.
That commit isn't in qemu-kvm-1.1.1.
It is in kernel.
Sorry, so it is.
With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
to be the problem.
Just to be sure, I've run some more tests today. No crashes occurred in
20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
reverted.
Ok. I'm trying to reproduce it here on a nested-virt setup, since the
code looks correct.

What's your preemption settings?
--
error compiling committee.c: too many arguments to function


Chris Clayton
2012-07-30 14:07:17 UTC
Permalink
Post by Avi Kivity
Post by Chris Clayton
Post by Chris Clayton
Post by Gleb Natapov
Post by Chris Clayton
Post by Avi Kivity
Possible culprit: b2da15ac26a0c00.
That commit isn't in qemu-kvm-1.1.1.
It is in kernel.
Sorry, so it is.
With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
to be the problem.
Just to be sure, I've run some more tests today. No crashes occurred in
20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
reverted.
Ok. I'm trying to reproduce it here on a nested-virt setup, since the
code looks correct.
What's your preemption settings?
[chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y

Avi Kivity
2012-07-30 16:39:31 UTC
Permalink
Post by Chris Clayton
Post by Avi Kivity
Post by Chris Clayton
Post by Chris Clayton
With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
to be the problem.
Just to be sure, I've run some more tests today. No crashes occurred in
20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
reverted.
Ok. I'm trying to reproduce it here on a nested-virt setup, since the
code looks correct.
What's your preemption settings?
[chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
Here's what I think is happening:

vcpu_load
...
vmx_save_host_state
vmx_vcpu_run
(ds.cpl, es.cpl cleared by hardware)

interrupt
push ds, es # pushes bad ds, es
schedule
vmx_vcpu_put
vmx_load_host_state
reload ds, es
pop ds, es # of other thread's stack
iret
# other thread runs
interrupt
schedule # back in vcpu thread
interrupt return: pop ds, es # <-- problem
iret

...
vcpu_put

# bad ds, es, but !vmx->host_state.loaded

Marcelo, did I miss something here?

Unfortunately, my reproducer has ceased to reproduce. But the fix is
easy if the analysis above is right.
--
error compiling committee.c: too many arguments to function


Marcelo Tosatti
2012-07-30 23:36:16 UTC
Permalink
Post by Avi Kivity
Post by Chris Clayton
Post by Avi Kivity
Post by Chris Clayton
Post by Chris Clayton
With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
to be the problem.
Just to be sure, I've run some more tests today. No crashes occurred in
20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
reverted.
Ok. I'm trying to reproduce it here on a nested-virt setup, since the
code looks correct.
What's your preemption settings?
[chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
Here's what I think is happening:
vcpu_load
...
vmx_save_host_state
vmx_vcpu_run
(ds.cpl, es.cpl cleared by hardware)
interrupt
push ds, es # pushes bad ds, es
schedule
vmx_vcpu_put
vmx_load_host_state
reload ds, es
pop ds, es # of other thread's stack
iret
# other thread runs
interrupt
schedule # back in vcpu thread
interrupt return: pop ds, es # <-- problem
iret
...
vcpu_put
# bad ds, es, but !vmx->host_state.loaded
Marcelo, did I miss something here?
Don't think so.
Post by Avi Kivity
Unfortunately, my reproducer has ceased to reproduce. But the fix is
easy if the analysis above is right.
--
error compiling committee.c: too many arguments to function
Avi Kivity
2012-07-31 09:11:13 UTC
Permalink
Post by Marcelo Tosatti
Post by Avi Kivity
Post by Chris Clayton
Post by Avi Kivity
Post by Chris Clayton
Post by Chris Clayton
With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
to be the problem.
Just to be sure, I've run some more tests today. No crashes occurred in
20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
reverted.
Ok. I'm trying to reproduce it here on a nested-virt setup, since the
code looks correct.
What's your preemption settings?
[chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
Here's what I think is happening:
vcpu_load
...
vmx_save_host_state
vmx_vcpu_run
(ds.cpl, es.cpl cleared by hardware)
interrupt
push ds, es # pushes bad ds, es
schedule
vmx_vcpu_put
vmx_load_host_state
reload ds, es
pop ds, es # of other thread's stack
iret
# other thread runs
interrupt
schedule # back in vcpu thread
interrupt return: pop ds, es # <-- problem
iret
...
vcpu_put
# bad ds, es, but !vmx->host_state.loaded
Marcelo, did I miss something here?
Don't think so.
So the same problem should happen with %fs and %gs, no?

x86_64 is safe, since entry_64.S never saves/restores segment registers.
--
error compiling committee.c: too many arguments to function


Avi Kivity
2012-07-31 16:46:38 UTC
Permalink
Post by Avi Kivity
So the same problem should happen with %fs and %gs, no?
depends on CONFIG_X86_32_LAZY_GS for GS, unconditional for FS.
The fs/gs handling was already in there; I wonder how it wasn't broken before.
Something's fishy here.
Post by Avi Kivity
x86_64 is safe, since entry_64.S never saves/restores segment registers.
Is the comment
/*
* The sysexit path does not restore ds/es, so we must set them to
* a reasonable value ourselves.
*/
Correct?
syscall_exit -> syscall_exit_work -> resume_userspace ->
restore_all -> RESTORE_REGS
That's the non-sysexit path (could have arrived here by sysenter). Look
at sysenter_exit.
--
error compiling committee.c: too many arguments to function


Marcelo Tosatti
2012-07-31 16:29:40 UTC
Permalink
Post by Avi Kivity
Post by Marcelo Tosatti
Post by Avi Kivity
Post by Chris Clayton
Post by Avi Kivity
Post by Chris Clayton
Post by Chris Clayton
With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
to be the problem.
Just to be sure, I've run some more tests today. No crashes occurred in
20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
reverted.
Ok. I'm trying to reproduce it here on a nested-virt setup, since the
code looks correct.
What's your preemption settings?
[chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
Here's what I think is happening:
vcpu_load
...
vmx_save_host_state
vmx_vcpu_run
(ds.cpl, es.cpl cleared by hardware)
interrupt
push ds, es # pushes bad ds, es
schedule
vmx_vcpu_put
vmx_load_host_state
reload ds, es
pop ds, es # of other thread's stack
iret
# other thread runs
interrupt
schedule # back in vcpu thread
interrupt return: pop ds, es # <-- problem
iret
...
vcpu_put
# bad ds, es, but !vmx->host_state.loaded
Marcelo, did I miss something here?
Don't think so.
So the same problem should happen with %fs and %gs, no?
AFAICS:

depends on CONFIG_X86_32_LAZY_GS for GS, unconditional for FS.
Post by Avi Kivity
x86_64 is safe, since entry_64.S never saves/restores segment registers.
Is the comment

/*
* The sysexit path does not restore ds/es, so we must set them to
* a reasonable value ourselves.
*/

Correct?

syscall_exit -> syscall_exit_work -> resume_userspace ->
restore_all -> RESTORE_REGS
Avi Kivity
2012-08-01 13:11:49 UTC
Permalink
Post by Avi Kivity
Post by Chris Clayton
Post by Avi Kivity
Post by Chris Clayton
Post by Chris Clayton
With kernel 3.5.0 with b2da15ac26a0c00 reverted, I have just had 15
clean invocations of vanilla qemu-kvm-1.1.1. So that commit would seem
to be the problem.
Just to be sure, I've run some more tests today. No crashes occurred in
20 runs of vanilla qemu-kvm-1.1.1 on kernel 3.5.0 with b2da15ac26a0c00
reverted.
Ok. I'm trying to reproduce it here on a nested-virt setup, since the
code looks correct.
What's your preemption settings?
[chris:~/kernel/linux-3.5.0]$ grep PREEMPT .config
CONFIG_TREE_PREEMPT_RCU=y
CONFIG_PREEMPT_RCU=y
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
Here's what I think is happening:
vcpu_load
...
vmx_save_host_state
vmx_vcpu_run
(ds.cpl, es.cpl cleared by hardware)
interrupt
push ds, es # pushes bad ds, es
schedule
vmx_vcpu_put
vmx_load_host_state
reload ds, es
pop ds, es # of other thread's stack
iret
# other thread runs
interrupt
schedule # back in vcpu thread
interrupt return: pop ds, es # <-- problem
In fact, those are fine.
Post by Avi Kivity
iret
But IRET-to-outer-privilege-level clears segment registers with the
wrong RPL. Think how secure OSes would be if they used the hardware
fully. Credit to Gleb for pinpointing this.
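The IRET behaviour described here can be sketched as a small model. It rests on two assumptions that should be checked against the architecture manuals and the KVM code: that on a return to a less privileged level the CPU nulls any of ds, es, fs, gs whose cached descriptor privilege level is below the new CPL, and that ds/es come back from guest mode with privilege level 0, as the trace suggests. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified view of a data segment register:
 * the visible selector plus the DPL held in its hidden descriptor cache. */
struct seg_reg {
    uint16_t selector;
    unsigned dpl;
    bool conforming_code;
};

/* Models the check assumed to apply to ds, es, fs and gs on a return to
 * a less privileged level: a register whose cached DPL is below the new
 * CPL (and which is not a conforming code segment) is replaced by null. */
void iret_to_outer_level(struct seg_reg *seg, unsigned new_cpl)
{
    assert(seg != NULL && new_cpl <= 3);

    if (seg->dpl < new_cpl && !seg->conforming_code) {
        seg->selector = 0;
    }
}
```

Under these assumptions a ds left at privilege level 0 is zeroed on the return to ring 3, whereas a normal user-level ds (DPL 3) passes through unchanged, which would explain why only the vcpu thread sees ds = 0.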
Post by Avi Kivity
...
vcpu_put
# bad ds, es, but !vmx->host_state.loaded
--
error compiling committee.c: too many arguments to function
Jan Kiszka
2012-07-26 12:09:19 UTC
Permalink
Post by Chris Clayton
Post by Avi Kivity
Post by Chris Clayton
Post by Chris Clayton
Post by Avi Kivity
Post by Chris Clayton
Post by Chris Clayton
Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact, crash
on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
times more invocations before the crash occurs with 1.0.1 and I haven't
used qemu-kvm much in the past few weeks.
I'm now checking whether I can get crashes (with 1.0.1 and/or 1.1.0) on
linux-3.4.4. I'll report back in a day or two.
I've started up qemu-kvm on kernel 3.4.4 many times and not seen a crash.
That would indicate that the problem is in the kernel. However, I pulled
the latest and greatest from Linus yesterday evening and I now can't get
the crash there either, so whatever it was seems to have been fixed. If
I checkout and build 3.5.0-rc[1..7], I can get the crash pretty quickly,
so it's been fixed in the last few days.
There were no kvm changes post-rc7.
Yes, I'm aware of that, Avi. This thread started because I was getting a
crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
out that the problem was also present in v1.0.1, but much harder to hit.
However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
version of qemu-kvm, was stable. So then it seemed that the problem was
in the kernel, (but not necessarily in the kvm code).
Something that's changed since rc7 has either fixed the problem or made
it much harder to hit. With rc7 and earlier I can recreate the crash
quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
rc7+, I haven't been able to get a crash at all.
Well, I'm getting the crash again, but this time I've managed to get a
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb60ffb40 (LWP 9405)]
0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
(gdb) bt
#0 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
#1 0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
#2 0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
#3 0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at
qom/object.c:94
qom/object.c:149
at qom/object.c:416
#7 0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/kvm-all.c:1269
#10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at
/home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
#12 0xb77bbbbe in clone () from /lib/libc.so.6
This is with kernel 3.5.0 and qemu-kvm-1.1.1. glibc is 2.16.0 built
It looks like general memory corruption. Is this repeatable? What's
the guest uptime when it happens (i.e. is it immediate?)
I've just done 10 runs of WinXP SP3 and 5 of them crashed. Three crashed
Hmm, I'm running various XP SP3 here against qemu.git (now widely
equivalent to qemu-kvm), and I saw no crashes at all.
Post by Chris Clayton
early as XP was starting up - well before the desktop would have
appeared. The other two crashed as XP was closing down, having been
running for a few minutes (but not doing much).
qemu-kvm[12778] general protection ip:b6c43d77 sp:b5e800fc error:0 in
libc-2.16.so[b6b06000+1b4000]
qemu-kvm[12813] general protection ip:b6bf6d77 sp:b54ff0fc error:0 in
libc-2.16.so[b6ab9000+1b4000]
qemu-kvm[12986] general protection ip:b6cd3d77 sp:b55ff0fc error:0 in
libc-2.16.so[b6b96000+1b4000]
qemu-kvm[13045] general protection ip:b6c91d77 sp:b54ff0fc error:0 in
libc-2.16.so[b6b54000+1b4000]
qemu-kvm[13225] general protection ip:b6c5bd77 sp:b54ff0fc error:0 in
libc-2.16.so[b6b1e000+1b4000]
Oh, you are running 32-bit userland? Also 32-bit kernel? Most of us do
64-on-64.

Jan
--
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Xiao Guangrong
2012-07-26 11:10:31 UTC
Permalink
Hi Chris,

Could you please try this patch?
http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=ccebf448daf7964ee2aff7947c0bbe4c7962d059
Post by Chris Clayton
Post by Chris Clayton
Post by Avi Kivity
Post by Chris Clayton
Post by Chris Clayton
Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact, crash
on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
times more invocations before the crash occurs with 1.0.1 and I haven't
used qemu-kvm much in the past few weeks.
I'm now checking whether I can get crashes (with 1.0.1 and/or 1.1.0) on
linux-3.4.4. I'll report back in a day or two.
I've started up qemu-kvm on kernel 3.4.4 many times and not seen a crash.
That would indicate that the problem is in the kernel. However, I pulled
the latest and greatest from Linus yesterday evening and I now can't get
the crash there either, so whatever it was seems to have been fixed. If
I checkout and build 3.5.0-rc[1..7], I can get the crash pretty quickly,
so it's been fixed in the last few days.
There were no kvm changes post-rc7.
Yes, I'm aware of that, Avi. This thread started because I was getting a
crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
out that the problem was also present in v1.0.1, but much harder to hit.
However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
version of qemu-kvm, was stable. So then it seemed that the problem was
in the kernel, (but not necessarily in the kvm code).
Something that's changed since rc7 has either fixed the problem or made
it much harder to hit. With rc7 and earlier I can recreate the crash
quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
rc7+, I haven't been able to get a crash at all.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb60ffb40 (LWP 9405)]
0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
(gdb) bt
#0 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
#1 0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
#2 0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
#3 0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at qom/object.c:94
at qom/object.c:416
#7 0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
#12 0xb77bbbbe in clone () from /lib/libc.so.6
--- qemu-kvm-1.1.0/configure~ 2012-07-15 22:38:39.000000000 +0100
+++ qemu-kvm-1.1.0/configure 2012-07-15 22:39:09.000000000 +0100
@@ -2783,7 +2783,7 @@ int main(int argc, char **argv)
}
EOF
if ! compile_prog "" "" ; then
- CFLAGS+="-march=i486"
+ CFLAGS+="-march=i686"
fi
fi
Please let me know of anything I can do to help track this down.
Thanks
Chris
Post by Chris Clayton
I'm not inclined to bisect to find out which patch provided the fix, but
this mail should at least close the mail thread down tidily.
Chris
Chris Clayton
2012-07-26 13:49:18 UTC
Permalink
Post by Xiao Guangrong
Hi Chris,
Could you please try this patch?
http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=ccebf448daf7964ee2aff7947c0bbe4c7962d059
Sorry, that patch does not fix the crashes.
Post by Xiao Guangrong
Post by Chris Clayton
Post by Chris Clayton
Post by Avi Kivity
Post by Chris Clayton
Post by Chris Clayton
Change of diagnostics, unfortunately. qemu-kvm-1.0.1 can, in fact, crash
on 3.5.0-rc6 (and rc7). I didn't get it earlier because it takes many
times more invocations before the crash occurs with 1.0.1 and I haven't
used qemu-kvm much in the past few weeks.
I'm now checking whether I can get crashes (with 1.0.1 and/or 1.1.0) on
linux-3.4.4. I'll report back in a day or two.
I've started up qemu-kvm on kernel 3.4.4 many times and not seen a crash.
That would indicate that the problem is in the kernel. However, I pulled
the latest and greatest from Linus yesterday evening and I now can't get
the crash there either, so whatever it was seems to have been fixed. If
I checkout and build 3.5.0-rc[1..7], I can get the crash pretty quickly,
so it's been fixed in the last few days.
There were no kvm changes post-rc7.
Yes, I'm aware of that, Avi. This thread started because I was getting a
crash in qemu-kvm, which I thought was only in v1.1.0. Later it turned
out that the problem was also present in v1.0.1, but much harder to hit.
However, it only ever happened with 3.5.0 kernels. 3.4.4, with either
version of qemu-kvm, was stable. So then it seemed that the problem was
in the kernel, (but not necessarily in the kvm code).
Something that's changed since rc7 has either fixed the problem or made
it much harder to hit. With rc7 and earlier I can recreate the crash
quite easily with qemu-kvm-1.1.0 and with enough runs of 1.0.1. With
rc7+, I haven't been able to get a crash at all.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb60ffb40 (LWP 9405)]
0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
(gdb) bt
#0 0xb7803d77 in __strcmp_sse4_2 () from /lib/libc.so.6
#1 0xb7e65333 in g_str_equal () from /usr/lib/libglib-2.0.so.0
#2 0xb7e6458d in g_hash_table_lookup () from /usr/lib/libglib-2.0.so.0
#3 0x8014e2cf in type_table_lookup (name=0x802b0c50 "apic-common") at qom/object.c:94
at qom/object.c:416
#7 0x80193462 in cpu_set_apic_tpr (d=0x80a5d818, val=8 '\b')
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/hw/apic_common.c:60
at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/target-i386/kvm.c:1695
#10 0x80199d1e in qemu_kvm_cpu_thread_fn (arg=0x80a55a60) at /home/chris/rpm/BUILD/qemu-kvm-1.1.1/cpus.c:752
#11 0xb7a1fd9e in start_thread () from /lib/libpthread.so.0
#12 0xb77bbbbe in clone () from /lib/libc.so.6
--- qemu-kvm-1.1.0/configure~ 2012-07-15 22:38:39.000000000 +0100
+++ qemu-kvm-1.1.0/configure 2012-07-15 22:39:09.000000000 +0100
@@ -2783,7 +2783,7 @@ int main(int argc, char **argv)
}
EOF
if ! compile_prog "" "" ; then
- CFLAGS+="-march=i486"
+ CFLAGS+="-march=i686"
fi
fi
Please let me know of anything I can do to help track this down.
Thanks
Chris
Post by Chris Clayton
I'm not inclined to bisect to find out which patch provided the fix, but
this mail should at least close the mail thread down tidily.
Chris