Discussion:
[PATCH 0/3] KVM: VMX: Support hosted VMM coexistence.
Xu, Dongxiao
2010-03-18 09:49:43 UTC
VMX: Support for coexistence of KVM and other hosted VMMs.

The following NOTE is taken from the Intel SDM Volume 3B, Section 27.3,
MANAGING VMCS REGIONS AND POINTERS.

----------------------
NOTE
As noted in Section 21.1, the processor may optimize VMX operation
by maintaining the state of an active VMCS (one for which VMPTRLD
has been executed) on the processor. Before relinquishing control to
other system software that may, without informing the VMM, remove
power from the processor (e.g., for transitions to S3 or S4) or leave
VMX operation, a VMM must VMCLEAR all active VMCSs. This ensures
that all VMCS data cached by the processor are flushed to memory
and that no other software can corrupt the current VMM's VMCS data.
It is also recommended that the VMM execute VMXOFF after such
executions of VMCLEAR.
----------------------

Currently, VMCLEAR is called at VCPU migration. To support hosted
VMM coexistence, this patch changes how VMCLEAR/VMPTRLD and
VMXON/VMXOFF are used. VMCLEAR is called when a VCPU is
scheduled out of a physical CPU, while VMPTRLD is called when a
VCPU is scheduled onto a physical CPU. This approach also
eliminates the IPI mechanism needed for the original VMCLEAR. As
suggested by the SDM, VMXOFF is called after VMCLEAR, and VMXON is
called before VMPTRLD.
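
For illustration only, here is a minimal sketch of the pairing described
above. All names in it (vmx_hw_on/vmx_hw_off for VMXON/VMXOFF,
vmcs_activate/vmcs_flush for VMPTRLD/VMCLEAR, struct demo_vcpu) are
hypothetical stand-ins, not functions from the actual patches:

	extern void vmx_hw_on(int cpu);        /* stand-in for VMXON on this CPU  */
	extern void vmx_hw_off(int cpu);       /* stand-in for VMXOFF on this CPU */
	extern void vmcs_activate(void *vmcs); /* stand-in for VMPTRLD            */
	extern void vmcs_flush(void *vmcs);    /* stand-in for VMCLEAR            */

	struct demo_vcpu {
		void *vmcs;        /* this VCPU's VMCS region                     */
		int   loaded_cpu;  /* physical CPU the VMCS is active on, or -1   */
	};

	/* Called when the VCPU thread is scheduled onto a physical CPU. */
	static void demo_vcpu_sched_in(struct demo_vcpu *v, int cpu)
	{
		vmx_hw_on(cpu);          /* VMXON: enter VMX root mode on this CPU     */
		vmcs_activate(v->vmcs);  /* VMPTRLD: make this VMCS current and active */
		v->loaded_cpu = cpu;
	}

	/* Called when the VCPU thread is scheduled off a physical CPU. */
	static void demo_vcpu_sched_out(struct demo_vcpu *v, int cpu)
	{
		vmcs_flush(v->vmcs);     /* VMCLEAR: write cached VMCS state to memory */
		v->loaded_cpu = -1;
		vmx_hw_off(cpu);         /* VMXOFF: leave VMX root mode so another
					  * hosted VMM can use VMX on this CPU        */
	}

Because the VMCS is always VMCLEARed on the CPU it was last loaded on
before the VCPU thread can run anywhere else, the cross-CPU IPI that is
currently needed to clear a VMCS left active on another processor goes
away, which is where the "eliminates the IPI mechanism" point above
comes from.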

With this patchset, KVM and VMware Workstation 7 can launch
separate guests, and they work well alongside each other. I also
measured the performance of this patch; there is no visible
performance loss according to the test results.

The following performance results were obtained on a host with 8 cores.

1. vConsolidate benchmarks on KVM

Test Round        WebBench   SPECjbb     SysBench   LoadSim   GEOMEAN
1   W/O patch     2,614.72   28,053.09   1,108.41   16.30     1,072.95
    W/  patch     2,691.55   28,145.71   1,128.41   16.47     1,089.28
2   W/O patch     2,642.39   28,104.79   1,096.99   17.79     1,097.19
    W/  patch     2,699.25   28,092.62   1,116.10   15.54     1,070.98
3   W/O patch     2,571.58   28,131.17   1,108.43   16.39     1,070.70
    W/  patch     2,627.89   28,090.19   1,110.94   17.00     1,086.57

Avg W/O patch     2,609.56   28,096.35   1,104.61   16.83     1,080.28
    W/  patch     2,672.90   28,109.51   1,118.48   16.34     1,082.28

2. CPU overcommitment tests for KVM

A) Run 8 while(1) loops in the host, each pinned to one of the 8 cores
   (see the pinning sketch below).
B) Launch 6 guests, each with 8 VCPUs; pin each VCPU to one core.
C) Of the 6 guests, 5 run 8 while(1) loops each.
D) The remaining guest runs a kernel build ("make -j9") on a ramdisk.

In this case, the overcommitment ratio for each core is 7:1.
The VCPU schedule frequency across all cores totals about 15k/sec.
I record the kernel build time.
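
For reference, the per-core while(1) host load in step A) can be
produced with a small pinned busy loop like the sketch below (my own
illustration, not part of the patchset or the original test scripts;
the program name "busyloop" is made up):

	#define _GNU_SOURCE
	#include <sched.h>
	#include <stdio.h>
	#include <stdlib.h>

	int main(int argc, char **argv)
	{
		cpu_set_t set;
		int cpu = (argc > 1) ? atoi(argv[1]) : 0;

		CPU_ZERO(&set);
		CPU_SET(cpu, &set);
		/* Pin this process to the requested core. */
		if (sched_setaffinity(0, sizeof(set), &set) != 0) {
			perror("sched_setaffinity");
			return 1;
		}

		for (;;)
			;	/* while(1) load on the chosen core */
	}

Running one instance per core (e.g. ./busyloop 0 through ./busyloop 7)
gives the 8-way host load described in step A).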

When computing the average, the first round is treated as a warm-up
and is not counted in the final result.

Kernel Build Time (seconds)
Round w/o patch w/ patch
1 541 501
2 488 490
3 488 492
4 492 493
5 489 491
6 494 487
7 497 494
8 492 492
9 493 496
10 492 495
11 490 496
12 489 494
13 489 490
14 490 491
15 494 497
16 495 496
17 496 496
18 493 492
19 493 500
20 490 499

Average 491.79 493.74
--
Alexander Graf
2010-03-18 10:36:36 UTC
Post by Xu, Dongxiao
VMX: Support for coexistence of KVM and other hosted VMMs.
The following NOTE is picked up from Intel SDM 3B 27.3 chapter,
MANAGING VMCS REGIONS AND POINTERS.
----------------------
NOTE
As noted in Section 21.1, the processor may optimize VMX operation
by maintaining the state of an active VMCS (one for which VMPTRLD
has been executed) on the processor. Before relinquishing control to
other system software that may, without informing the VMM, remove
power from the processor (e.g., for transitions to S3 or S4) or leave
VMX operation, a VMM must VMCLEAR all active VMCSs. This ensures
that all VMCS data cached by the processor are flushed to memory
and that no other software can corrupt the current VMM's VMCS data.
It is also recommended that the VMM execute VMXOFF after such
executions of VMCLEAR.
----------------------
Currently, VMCLEAR is called at VCPU migration. To support hosted
VMM coexistence, this patch modifies the VMCLEAR/VMPTRLD and
VMXON/VMXOFF usages. VMCLEAR will be called when VCPU is
scheduled out of a physical CPU, while VMPTRLD is called when VCPU
is scheduled onto a physical CPU. Also this approach could eliminate
the IPI mechanism for original VMCLEAR. As suggested by SDM,
VMXOFF will be called after VMCLEAR, and VMXON will be called
before VMPTRLD.
With this patchset, KVM and VMware Workstation 7 could launch
separate guests and they can work well with each other. Besides, I
measured the performance for this patch; there is no visible
performance loss according to the test results.
The following performance results were obtained on a host with 8 cores.
1. vConsolidate benchmarks on KVM
Test Round WebBench SPECjbb SysBench LoadSim GEOMEAN
1 W/O patch 2,614.72 28,053.09 1,108.41 16.30 1,072.95
W/ patch 2,691.55 28,145.71 1,128.41 16.47 1,089.28
2 W/O patch 2,642.39 28,104.79 1,096.99 17.79 1,097.19
W/ patch 2,699.25 28,092.62 1,116.10 15.54 1,070.98
3 W/O patch 2,571.58 28,131.17 1,108.43 16.39 1,070.70
W/ patch 2,627.89 28,090.19 1,110.94 17.00 1,086.57
Average
W/O patch 2,609.56 28,096.35 1,104.61 16.83 1,080.28
W/ patch 2,672.90 28,109.51 1,118.48 16.34 1,082.28
2. CPU overcommitment tests for KVM
A) Run 8 while(1) loops in the host, each pinned to one of the 8 cores.
B) Launch 6 guests, each with 8 VCPUs; pin each VCPU to one core.
C) Of the 6 guests, 5 run 8 while(1) loops each.
D) The remaining guest runs a kernel build ("make -j9") on a ramdisk.
In this case, the overcommitment ratio for each core is 7:1.
The VCPU schedule frequency on all cores is totally ~15k/sec.
I record the kernel build time.
While doing the average, the first round data is treated as invalid,
which isn't counted into the final average result.
Kernel Build Time (second)
Round w/o patch w/ patch
1 541 501
2 488 490
3 488 492
4 492 493
5 489 491
6 494 487
7 497 494
8 492 492
9 493 496
10 492 495
11 490 496
12 489 494
13 489 490
14 490 491
15 494 497
16 495 496
17 496 496
18 493 492
19 493 500
20 490 499
Average 491.79 493.74
So the general message here is:

It does get slower, but not by much.


I think this should be a module option. By default we can probably go
with the non-coexist behavior. If users really want to run two VMMs on
the same host, they can always flip the module parameter.
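
For example, the knob could be an ordinary kvm-intel module parameter;
the sketch below is hypothetical (the name "vmm_coexistence" is made up
here, and the actual patches may wire it up differently):

	#include <linux/module.h>
	#include <linux/moduleparam.h>

	/* Off by default: keep today's "VMCS stays loaded" behaviour. */
	static bool vmm_coexistence;
	module_param(vmm_coexistence, bool, 0444);
	MODULE_PARM_DESC(vmm_coexistence,
			 "VMCLEAR/VMXOFF on every VCPU sched-out so other hosted "
			 "VMMs can use VMX on the same CPUs (slower)");

Users who really need to run two VMMs side by side would then load
kvm-intel with vmm_coexistence=1; everyone else keeps the current fast
path.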


Alex

--
Avi Kivity
2010-03-18 12:55:19 UTC
Post by Xu, Dongxiao
VMX: Support for coexistence of KVM and other hosted VMMs.
The following NOTE is picked up from Intel SDM 3B 27.3 chapter,
MANAGING VMCS REGIONS AND POINTERS.
----------------------
NOTE
As noted in Section 21.1, the processor may optimize VMX operation
by maintaining the state of an active VMCS (one for which VMPTRLD
has been executed) on the processor. Before relinquishing control to
other system software that may, without informing the VMM, remove
power from the processor (e.g., for transitions to S3 or S4) or leave
VMX operation, a VMM must VMCLEAR all active VMCSs. This ensures
that all VMCS data cached by the processor are flushed to memory
and that no other software can corrupt the current VMM's VMCS data.
It is also recommended that the VMM execute VMXOFF after such
executions of VMCLEAR.
----------------------
Currently, VMCLEAR is called at VCPU migration. To support hosted
VMM coexistence, this patch modifies the VMCLEAR/VMPTRLD and
VMXON/VMXOFF usages. VMCLEAR will be called when VCPU is
scheduled out of a physical CPU, while VMPTRLD is called when VCPU
is scheduled onto a physical CPU. Also this approach could eliminate
the IPI mechanism for original VMCLEAR. As suggested by SDM,
VMXOFF will be called after VMCLEAR, and VMXON will be called
before VMPTRLD.
My worry is that newer processors will cache more and more VMCS contents
on-chip, so the VMCLEAR/VMXOFF will cause a greater loss with newer
processors.
Post by Xu, Dongxiao
With this patchset, KVM and VMware Workstation 7 could launch
separate guests and they can work well with each other. Besides, I
measured the performance for this patch; there is no visible
performance loss according to the test results.
Is that the only motivation? It seems like an odd use-case. If there
was no performance impact (current or future), I wouldn't mind, but the
design of VMPTRLD/VMCLEAR/VMXON/VMXOFF seems to indicate that we want to
keep a VMCS loaded as much as possible on the processor.
--
error compiling committee.c: too many arguments to function

Xu, Dongxiao
2010-03-23 04:01:54 UTC
Post by Avi Kivity
Post by Xu, Dongxiao
VMX: Support for coexistence of KVM and other hosted VMMs.
The following NOTE is picked up from Intel SDM 3B 27.3 chapter,
MANAGING VMCS REGIONS AND POINTERS.
----------------------
NOTE
As noted in Section 21.1, the processor may optimize VMX operation
by maintaining the state of an active VMCS (one for which VMPTRLD
has been executed) on the processor. Before relinquishing control to
other system software that may, without informing the VMM, remove
power from the processor (e.g., for transitions to S3 or S4) or leave
VMX operation, a VMM must VMCLEAR all active VMCSs. This ensures
that all VMCS data cached by the processor are flushed to memory
and that no other software can corrupt the current VMM's VMCS data.
It is also recommended that the VMM execute VMXOFF after such
executions of VMCLEAR.
----------------------
Currently, VMCLEAR is called at VCPU migration. To support hosted
VMM coexistence, this patch modifies the VMCLEAR/VMPTRLD and
VMXON/VMXOFF usages. VMCLEAR will be called when VCPU is
scheduled out of a physical CPU, while VMPTRLD is called when VCPU
is scheduled onto a physical CPU. Also this approach could eliminate
the IPI mechanism for original VMCLEAR. As suggested by SDM,
VMXOFF will be called after VMCLEAR, and VMXON will be called
before VMPTRLD.
My worry is that newer processors will cache more and more VMCS
contents on-chip, so the VMCLEAR/VMXOFF will cause a greater loss
with newer processors.
Based on our internal testing, we saw less than a 1% performance
difference even on such processors.
Post by Avi Kivity
Post by Xu, Dongxiao
With this patchset, KVM and VMware Workstation 7 could launch
separate guests and they can work well with each other. Besides, I
measured the performance for this patch; there is no visible
performance loss according to the test results.
Is that the only motivation? It seems like an odd use-case. If there
was no performance impact (current or future), I wouldn't mind, but
the design of VMPTRLD/VMCLEAR/VMXON/VMXOFF seems to indicate that we
want to keep a VMCS loaded as much as possible on the processor.
I just used KVM and VMware Workstation 7 for testing this patchset.

Through this new usage of VMPTRLD/VMCLEAR/VMXON/VMXOFF,
we can make hosted VMMs work independently without impacting each
other.

Thanks!
Dongxiao
--
Avi Kivity
2010-03-23 07:39:01 UTC
Post by Xu, Dongxiao
Post by Avi Kivity
Post by Xu, Dongxiao
VMX: Support for coexistence of KVM and other hosted VMMs.
The following NOTE is picked up from Intel SDM 3B 27.3 chapter,
MANAGING VMCS REGIONS AND POINTERS.
----------------------
NOTE
As noted in Section 21.1, the processor may optimize VMX operation
by maintaining the state of an active VMCS (one for which VMPTRLD
has been executed) on the processor. Before relinquishing control to
other system software that may, without informing the VMM, remove
power from the processor (e.g., for transitions to S3 or S4) or leave
VMX operation, a VMM must VMCLEAR all active VMCSs. This ensures
that all VMCS data cached by the processor are flushed to memory
and that no other software can corrupt the current VMM's VMCS data.
It is also recommended that the VMM execute VMXOFF after such
executions of VMCLEAR.
----------------------
Currently, VMCLEAR is called at VCPU migration. To support hosted
VMM coexistence, this patch modifies the VMCLEAR/VMPTRLD and
VMXON/VMXOFF usages. VMCLEAR will be called when VCPU is
scheduled out of a physical CPU, while VMPTRLD is called when VCPU
is scheduled onto a physical CPU. Also this approach could eliminate
the IPI mechanism for original VMCLEAR. As suggested by SDM,
VMXOFF will be called after VMCLEAR, and VMXON will be called
before VMPTRLD.
My worry is that newer processors will cache more and more VMCS
contents on-chip, so the VMCLEAR/VMXOFF will cause a greater loss
with newer processors.
Based on our internal testing, we saw less than 1% of performance
differences even on such processors.
Did you measure workloads that exit to userspace very often?

Also, what about future processors? My understanding is that the manual
recommends keeping things cached; the above description is for sleep states.
Post by Xu, Dongxiao
Post by Avi Kivity
Post by Xu, Dongxiao
With this patchset, KVM and VMware Workstation 7 could launch
separate guests and they can work well with each other. Besides, I
measured the performance for this patch; there is no visible
performance loss according to the test results.
Is that the only motivation? It seems like an odd use-case. If there
was no performance impact (current or future), I wouldn't mind, but
the design of VMPTRLD/VMCLEAR/VMXON/VMXOFF seems to indicate that we
want to keep a VMCS loaded as much as possible on the processor.
I just used KVM and VMware Workstation 7 for testing this patchset.
Through this new usage of VMPTRLD/VMCLEAR/VMXON/VMXOFF,
we could make hosted VMMs work separately without impacting each
other.
What I am questioning is whether a significant number of users want to
run kvm in parallel with another hypervisor.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

Xu, Dongxiao
2010-03-23 08:33:25 UTC
Post by Avi Kivity
Post by Xu, Dongxiao
Post by Avi Kivity
Post by Xu, Dongxiao
VMX: Support for coexistence of KVM and other hosted VMMs.
The following NOTE is picked up from Intel SDM 3B 27.3 chapter,
MANAGING VMCS REGIONS AND POINTERS.
----------------------
NOTE
As noted in Section 21.1, the processor may optimize VMX operation
by maintaining the state of an active VMCS (one for which VMPTRLD
has been executed) on the processor. Before relinquishing control
to other system software that may, without informing the VMM,
remove power from the processor (e.g., for transitions to S3 or
S4) or leave VMX operation, a VMM must VMCLEAR all active VMCSs.
This ensures that all VMCS data cached by the processor are
flushed to memory
and that no other software can corrupt the current VMM's VMCS data.
It is also recommended that the VMM execute VMXOFF after such
executions of VMCLEAR.
----------------------
Currently, VMCLEAR is called at VCPU migration. To support hosted
VMM coexistence, this patch modifies the VMCLEAR/VMPTRLD and
VMXON/VMXOFF usages. VMCLEAR will be called when VCPU is
scheduled out of a physical CPU, while VMPTRLD is called when VCPU
is scheduled onto a physical CPU. Also this approach could eliminate
the IPI mechanism for original VMCLEAR. As suggested by SDM,
VMXOFF will be called after VMCLEAR, and VMXON will be called
before VMPTRLD.
My worry is that newer processors will cache more and more VMCS
contents on-chip, so the VMCLEAR/VMXOFF will cause a greater loss
with newer processors.
Based on our internal testing, we saw less than 1% of performance
differences even on such processors.
Did you measure workloads that exit to userspace very often?
Also, what about future processors? My understanding is that the
manual recommends keeping things cached, the above description is for
sleep states.
I measured the performance using a kernel build in a guest. I launched 6
guests; 5 of them plus the host are running while(1) loops, and the
remaining guest is doing a kernel build. The CPU overcommitment is 7:1,
and the VCPU schedule frequency is about 15k/sec. I tested this with the
newest Intel processors available to me, and the performance difference
is small.
Post by Avi Kivity
Post by Xu, Dongxiao
Post by Avi Kivity
Post by Xu, Dongxiao
With this patchset, KVM and VMware Workstation 7 could launch
separate guests and they can work well with each other. Besides, I
measured the performance for this patch; there is no visible
performance loss according to the test results.
Is that the only motivation? It seems like an odd use-case. If
there was no performance impact (current or future), I wouldn't
mind, but the design of VMPTRLD/VMCLEAR/VMXON/VMXOFF seems to
indicate that we want to keep a VMCS loaded as much as possible on
the processor.
I just used KVM and VMware Workstation 7 for testing this patchset.
Through this new usage of VMPTRLD/VMCLEAR/VMXON/VMXOFF,
we could make hosted VMMs work separately without impacting each
other.
What I am questioning is whether a significant number of users want to
run kvm in parallel with another hypervisor.
At least this approach gives users the option to run VMMs in parallel
without significant performance loss. Think of this scenario: a server
has already deployed VMware software, but some new customers want to
use KVM; this patch could help them meet that requirement.

I have tested this case: if a KVM guest is already running and the user
launches a VMware guest, the KVM guest will die while the VMware guest
still runs well. This patchset solves that problem.

Thanks!
Dongxiao
--
Avi Kivity
2010-03-23 08:58:26 UTC
Post by Xu, Dongxiao
Post by Avi Kivity
Did you measure workloads that exit to userspace very often?
Also, what about future processors? My understanding is that the
manual recommends keeping things cached, the above description is for
sleep states.
I measured the performance by using kernel build in guest. I launched 6
guests, 5 of them and the host are doing while(1) loop, and the left guest
is doing kernel build. The CPU overcommitment is 7:1, and vcpu schedule
frequency is about 15k/sec. I tested this with Intel new processors on
my hand, and the performance difference is little.
The 15k/sec context switches are distributed among 7 entities, so we
have about 2k/sec for the guest you are measuring. If the cost is 1
microsecond, then the impact would be 0.2% on the kernel build. But 1
microsecond is way too high for some workloads.

Can you measure the impact directly? kvm/user/test/x86/vmexit.c has a
test called inl_pmtimer that measures exit to userspace costs. Please
run it with and without the patch.
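
As a rough idea of what such a measurement does: time a batch of ACPI
PM timer port reads from inside the guest, since each read has to be
completed by the userspace VMM. The standalone sketch below is my own
illustration, not the actual kvm/user/test/x86/vmexit.c test; the
default port 0xb008 is an assumption about QEMU's PIIX4 PM timer and
may differ on other setups.

	#include <stdio.h>
	#include <stdint.h>
	#include <stdlib.h>
	#include <sys/io.h>

	static inline uint64_t rdtsc(void)
	{
		uint32_t lo, hi;
		asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
		return ((uint64_t)hi << 32) | lo;
	}

	int main(int argc, char **argv)
	{
		/* ACPI PM timer I/O port; 0xb008 is an assumption (QEMU PIIX4). */
		unsigned long port = (argc > 1) ? strtoul(argv[1], NULL, 0) : 0xb008;
		const int iters = 100000;

		if (iopl(3) != 0) {	/* raw port I/O needs root inside the guest */
			perror("iopl");
			return 1;
		}

		uint64_t t0 = rdtsc();
		for (int i = 0; i < iters; i++)
			inl(port);	/* each PIO read exits to the userspace VMM */
		uint64_t t1 = rdtsc();

		printf("%.1f cycles per exit\n", (double)(t1 - t0) / iters);
		return 0;
	}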

btw, what about VPID? That's a global resource. How do you ensure no
VPID conflicts?
Post by Xu, Dongxiao
Post by Avi Kivity
Post by Xu, Dongxiao
Post by Avi Kivity
Is that the only motivation? It seems like an odd use-case. If
there was no performance impact (current or future), I wouldn't
mind, but the design of VMPTRLD/VMCLEAR/VMXON/VMXOFF seems to
indicate that we want to keep a VMCS loaded as much as possible on
the processor.
I just used KVM and VMware Workstation 7 for testing this patchset.
Through this new usage of VMPTRLD/VMCLEAR/VMXON/VMXOFF,
we could make hosted VMMs work separately without impacting each
other.
What I am questioning is whether a significant number of users want to
run kvm in parallel with another hypervisor.
At least this approach gives users an option to run VMMs in parallel without
significant performance loss. Think of this scenario: if a server has already
deployed VMware software, but some new customers want to use KVM,
this patch could help them to meet their requirements.
For server workloads vmware users will run esx, on which you can't run
kvm. If someone wants to evaluate kvm or vmware on a workstation, they
can shut down the other product. I simply don't see a scenario where
you want to run both concurrently that would be worth even a small
performance loss.
--
error compiling committee.c: too many arguments to function

Alexander Graf
2010-03-23 09:12:18 UTC
Post by Xu, Dongxiao
Post by Avi Kivity
Did you measure workloads that exit to userspace very often?
Also, what about future processors? My understanding is that the
manual recommends keeping things cached, the above description is for
sleep states.
I measured the performance by using kernel build in guest. I launched 6
guests, 5 of them and the host are doing while(1) loop, and the left guest
is doing kernel build. The CPU overcommitment is 7:1, and vcpu schedule
frequency is about 15k/sec. I tested this with Intel new processors on
my hand, and the performance difference is little.
The 15k/sec context switches are distributed among 7 entities, so we have about 2k/sec for the guest you are measuring. If the cost is 1 microsecond, then the impact would be 0.2% on the kernel build. But 1 microsecond is way too high for some workloads.
Can you measure the impact directly? kvm/user/test/x86/vmexit.c has a test called inl_pmtimer that measures exit to userspace costs. Please run it with and without the patch.
btw, what about VPID? That's a global resource. How do you ensure no VPID conflicts?
Post by Xu, Dongxiao
Post by Avi Kivity
Post by Xu, Dongxiao
Post by Avi Kivity
Is that the only motivation? It seems like an odd use-case. If
there was no performance impact (current or future), I wouldn't
mind, but the design of VMPTRLD/VMCLEAR/VMXON/VMXOFF seems to
indicate that we want to keep a VMCS loaded as much as possible on
the processor.
I just used KVM and VMware Workstation 7 for testing this patchset.
Through this new usage of VMPTRLD/VMCLEAR/VMXON/VMXOFF,
we could make hosted VMMs work separately without impacting each
other.
What I am questioning is whether a significant number of users want to
run kvm in parallel with another hypervisor.
At least this approach gives users an option to run VMMs in parallel without
significant performance loss. Think of this scenario: if a server has already
deployed VMware software, but some new customers want to use KVM,
this patch could help them to meet their requirements.
For server workloads vmware users will run esx, on which you can't run kvm. If someone wants to evaluate kvm or vmware on a workstation, they can shut down the other product. I simply don't see a scenario where you want to run both concurrently that would be worth even a small performance loss.
I can certainly see value for some people. I just don't think we should burden every user with the performance penalty. Hence my suggestion to default this behavior to off.

1% might not sound like a lot, but people have worked pretty hard optimizing stuff for less :-).


Alex
Avi Kivity
2010-03-18 13:51:01 UTC
Post by Xu, Dongxiao
VMX: Support for coexistence of KVM and other hosted VMMs.
The following NOTE is picked up from Intel SDM 3B 27.3 chapter,
MANAGING VMCS REGIONS AND POINTERS.
Note: the actual patches didn't make it to the list.
--
error compiling committee.c: too many arguments to function

Avi Kivity
2010-03-18 14:27:21 UTC
Post by Avi Kivity
Post by Xu, Dongxiao
VMX: Support for coexistence of KVM and other hosted VMMs.
The following NOTE is picked up from Intel SDM 3B 27.3 chapter,
MANAGING VMCS REGIONS AND POINTERS.
Note: the actual patches didn't make it to the list.
Sorry - I see them now. Please send patches as replies to the first
message in the future (git send-email does that).
--
error compiling committee.c: too many arguments to function
