[PATCH v3 0/6] KVM: nVMX: Enable unrestricted guest mode and fix some nEPT issues

Discussion:

[PATCH v3 0/6] KVM: nVMX: Enable unrestricted guest mode and fix some nEPT issues

Jan Kiszka

2013-08-08 14:26:27 UTC

These patches apply on top of kvm.git queue.

Changes in v3:
- rebased over queue
- added "Do not set identity page map for L2"
- dropped "Fix guest CR3 read-back on VM-exit"

Jan Kiszka (6):
KVM: nVMX: Replace kvm_set_cr0 with vmx_set_cr0 in
load_vmcs12_host_state
KVM: nVMX: Do not set identity page map for L2
KVM: nVMX: Load nEPT state after EFER
KVM: nVMX: Implement support for EFER saving on VM-exit
KVM: nVMX: Update mmu.base_role.nxe after EFER loading on
VM-entry/exit
KVM: nVMX: Enable unrestricted guest mode support

arch/x86/kvm/vmx.c | 44 +++++++++++++++++++++++++++++++-------------
1 files changed, 31 insertions(+), 13 deletions(-)

--
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Jan Kiszka

2013-08-08 14:26:29 UTC

Fiddling with CR3 for L2 is L1's job. It may set its own, different
identity map or simple leave it alone if unrestricted guest mode is
enabled. This also fixes reading back the current CR3 on L2 exits for
reporting it to L1.

Signed-off-by: Jan Kiszka <***@siemens.com>
---
arch/x86/kvm/vmx.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d001b019..6c42518 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3376,8 +3376,10 @@ static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
if (enable_ept) {
eptp = construct_eptp(cr3);
vmcs_write64(EPT_POINTER, eptp);
- guest_cr3 = is_paging(vcpu) ? kvm_read_cr3(vcpu) :
- vcpu->kvm->arch.ept_identity_map_addr;
+ if (is_paging(vcpu) || is_guest_mode(vcpu))
+ guest_cr3 = kvm_read_cr3(vcpu);
+ else
+ guest_cr3 = vcpu->kvm->arch.ept_identity_map_addr;
ept_load_pdptrs(vcpu);
}

--
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Jan Kiszka

2013-08-08 14:26:31 UTC

Implement and advertise VM_EXIT_SAVE_IA32_EFER. L0 traps EFER writes
unconditionally, so we always find the current L2 value in the
architectural state.

Signed-off-by: Jan Kiszka <***@siemens.com>
---
arch/x86/kvm/vmx.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 363fe19..9b0510b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2206,7 +2206,7 @@ static __init void nested_vmx_setup_ctls_msrs(void)
#endif
VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
- VM_EXIT_LOAD_IA32_EFER);
+ VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER);

/* entry controls */
rdmsr(MSR_IA32_VMX_ENTRY_CTLS,
@@ -8116,6 +8116,8 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vmcs12->guest_ia32_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_IA32_PAT)
vmcs12->guest_ia32_pat = vmcs_read64(GUEST_IA32_PAT);
+ if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_IA32_EFER)
+ vmcs12->guest_ia32_efer = vcpu->arch.efer;
vmcs12->guest_sysenter_cs = vmcs_read32(GUEST_SYSENTER_CS);
vmcs12->guest_sysenter_esp = vmcs_readl(GUEST_SYSENTER_ESP);
vmcs12->guest_sysenter_eip = vmcs_readl(GUEST_SYSENTER_EIP);

--
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Jan Kiszka

2013-08-08 14:26:33 UTC

Now that we provide EPT support, there is no reason to torture our
guests by hiding the relieving unrestricted guest mode feature. We just
need to relax CR0 checks for always-on bits as PE and PG can now be
switched off.

Signed-off-by: Jan Kiszka <***@siemens.com>
---
arch/x86/kvm/vmx.c | 18 ++++++++++++++----
1 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7cc208a..a9ce214 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2252,6 +2252,7 @@ static __init void nested_vmx_setup_ctls_msrs(void)
nested_vmx_secondary_ctls_low = 0;
nested_vmx_secondary_ctls_high &=
SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
+ SECONDARY_EXEC_UNRESTRICTED_GUEST |
SECONDARY_EXEC_WBINVD_EXITING;

if (enable_ept) {
@@ -4877,6 +4878,17 @@ vmx_patch_hypercall(struct kvm_vcpu *vcpu, unsigned char *hypercall)
hypercall[2] = 0xc1;
}

+static bool nested_cr0_valid(struct vmcs12 *vmcs12, unsigned long val)
+{
+ unsigned long always_on = VMXON_CR0_ALWAYSON;
+
+ if (nested_vmx_secondary_ctls_high &
+ SECONDARY_EXEC_UNRESTRICTED_GUEST &&
+ nested_cpu_has2(vmcs12, SECONDARY_EXEC_UNRESTRICTED_GUEST))
+ always_on &= ~(X86_CR0_PE | X86_CR0_PG);
+ return (val & always_on) == always_on;
+}
+
/* called to set cr0 as appropriate for a mov-to-cr0 exit. */
static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
{
@@ -4895,9 +4907,7 @@ static int handle_set_cr0(struct kvm_vcpu *vcpu, unsigned long val)
val = (val & ~vmcs12->cr0_guest_host_mask) |
(vmcs12->guest_cr0 & vmcs12->cr0_guest_host_mask);

- /* TODO: will have to take unrestricted guest mode into
- * account */
- if ((val & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON)
+ if (!nested_cr0_valid(vmcs12, val))
return 1;

if (kvm_set_cr0(vcpu, val))
@@ -7864,7 +7874,7 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch)
return 1;
}

- if (((vmcs12->guest_cr0 & VMXON_CR0_ALWAYSON) != VMXON_CR0_ALWAYSON) ||
+ if (!nested_cr0_valid(vmcs12, vmcs12->guest_cr0) ||
((vmcs12->guest_cr4 & VMXON_CR4_ALWAYSON) != VMXON_CR4_ALWAYSON)) {
nested_vmx_entry_failure(vcpu, vmcs12,
EXIT_REASON_INVALID_STATE, ENTRY_FAIL_DEFAULT);

--
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Jan Kiszka

2013-08-08 14:26:28 UTC

Likely a typo, but a fatal one as kvm_set_cr0 performs checks on the
state transition that may prevent loading L1's cr0.

Signed-off-by: Jan Kiszka <***@siemens.com>
---
arch/x86/kvm/vmx.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 57b4e12..d001b019 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8185,7 +8185,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
* fpu_active (which may have changed).
* Note that vmx_set_cr0 refers to efer set above.
*/
- kvm_set_cr0(vcpu, vmcs12->host_cr0);
+ vmx_set_cr0(vcpu, vmcs12->host_cr0);
/*
* If we did fpu_activate()/fpu_deactivate() during L2's run, we need
* to apply the same changes to L1's vmcs. We just set cr0 correctly,

--
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Gleb Natapov

2013-09-02 08:21:36 UTC

Post by Jan Kiszka
Likely a typo, but a fatal one as kvm_set_cr0 performs checks on the

Not a typo :) That what Avi asked for do during initial nested VMX
review: http://markmail.org/message/hhidqyhbo2mrgxxc

But there is at least one transition check that kvm_set_cr0() does that
should not be done during vmexit emulation, namely CS.L bit check, so I
tend to agree that kvm_set_cr0() is not appropriate here, at lest not as
it is. But can we skip other checks kvm_set_cr0() does? For instance
what prevents us from loading CR0.PG = 1 EFER.LME = 1 and CR4.PAE = 0
during nested vmexit? What _should_ prevent it is vmentry check from
26.2.4

If the "host address-space size" VM-exit control is 1, the following
must hold:
- Bit 5 of the CR4 field (corresponding to CR4.PAE) is 1.

But I do not see that we do that check on vmentry.

What about NW/CD bit checks, or reserved bits checks? 27.5.1 says:
The following bits are not modified:
For CR0, ET, CD, NW; bits 63:32 (on processors that support Intel 64
architecture), 28:19, 17, and 15:6; and any bits that are fixed in
VMX operation (see Section 23.8).

But again current vmexit code does not emulate this properly and just
sets everything from host_cr0. vmentry should also preserve all those
bit but it looks like it doesn't too.

Post by Jan Kiszka
state transition that may prevent loading L1's cr0.
---
arch/x86/kvm/vmx.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 57b4e12..d001b019 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8185,7 +8185,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
* fpu_active (which may have changed).
* Note that vmx_set_cr0 refers to efer set above.
*/
- kvm_set_cr0(vcpu, vmcs12->host_cr0);
+ vmx_set_cr0(vcpu, vmcs12->host_cr0);
/*
* If we did fpu_activate()/fpu_deactivate() during L2's run, we need
* to apply the same changes to L1's vmcs. We just set cr0 correctly,
--
1.7.3.4

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Jan Kiszka

2013-09-02 09:06:53 UTC

Post by Gleb Natapov

Post by Jan Kiszka
Likely a typo, but a fatal one as kvm_set_cr0 performs checks on the

Not a typo :) That what Avi asked for do during initial nested VMX
review: http://markmail.org/message/hhidqyhbo2mrgxxc

Yeah, should rephrase this.

Post by Gleb Natapov
But there is at least one transition check that kvm_set_cr0() does that
should not be done during vmexit emulation, namely CS.L bit check, so I
tend to agree that kvm_set_cr0() is not appropriate here, at lest not as
it is.

kvm_set_cr0() is for emulating explicit guest changes. It is not the
proper interface for implicit, vendor-dependent changes like this one.

Post by Gleb Natapov
But can we skip other checks kvm_set_cr0() does? For instance
what prevents us from loading CR0.PG = 1 EFER.LME = 1 and CR4.PAE = 0
during nested vmexit? What _should_ prevent it is vmentry check from
26.2.4
If the "host address-space size" VM-exit control is 1, the following
- Bit 5 of the CR4 field (corresponding to CR4.PAE) is 1.
But I do not see that we do that check on vmentry.
For CR0, ET, CD, NW; bits 63:32 (on processors that support Intel 64
architecture), 28:19, 17, and 15:6; and any bits that are fixed in
VMX operation (see Section 23.8).
But again current vmexit code does not emulate this properly and just
sets everything from host_cr0. vmentry should also preserve all those
bit but it looks like it doesn't too.

Yes, there is surely more to improve. Do you think the lacking checks
can cause troubles for L0, or is this just imprecise emulation that can
be addressed separately?

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Gleb Natapov

2013-09-02 09:36:28 UTC

Post by Jan Kiszka

Post by Gleb Natapov

Post by Jan Kiszka
Likely a typo, but a fatal one as kvm_set_cr0 performs checks on the

Not a typo :) That what Avi asked for do during initial nested VMX
review: http://markmail.org/message/hhidqyhbo2mrgxxc

Yeah, should rephrase this.

Post by Gleb Natapov
But there is at least one transition check that kvm_set_cr0() does that
should not be done during vmexit emulation, namely CS.L bit check, so I
tend to agree that kvm_set_cr0() is not appropriate here, at lest not as
it is.

kvm_set_cr0() is for emulating explicit guest changes. It is not the
proper interface for implicit, vendor-dependent changes like this one.

Agree, the problem is that we do not have proper interface for implicit
changes like this one (do not see why it is vendor-dependent, SVM also
restores host state in a similar way).

Post by Jan Kiszka

Post by Gleb Natapov
But can we skip other checks kvm_set_cr0() does? For instance
what prevents us from loading CR0.PG = 1 EFER.LME = 1 and CR4.PAE = 0
during nested vmexit? What _should_ prevent it is vmentry check from
26.2.4
If the "host address-space size" VM-exit control is 1, the following
- Bit 5 of the CR4 field (corresponding to CR4.PAE) is 1.
But I do not see that we do that check on vmentry.
For CR0, ET, CD, NW; bits 63:32 (on processors that support Intel 64
architecture), 28:19, 17, and 15:6; and any bits that are fixed in
VMX operation (see Section 23.8).
But again current vmexit code does not emulate this properly and just
sets everything from host_cr0. vmentry should also preserve all those
bit but it looks like it doesn't too.

Yes, there is surely more to improve. Do you think the lacking checks
can cause troubles for L0, or is this just imprecise emulation that can
be addressed separately?

The lacking checks may cause L0 to fail guest entry which will trigger
internal error. If it is exploitable by L0 userspace it is a serious
problem, if only L0 kernel can trigger it then less so. I remember Avi
was concerned that KVM code may depend on all registers to be consistent
otherwise it can be exploited, I cannot prove or disprove this theory
:), but if it is the case then event L0 kernel case is problematic.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Jan Kiszka

2013-09-03 17:44:41 UTC

Post by Gleb Natapov

Post by Jan Kiszka

Post by Gleb Natapov

Post by Jan Kiszka
Likely a typo, but a fatal one as kvm_set_cr0 performs checks on the

Not a typo :) That what Avi asked for do during initial nested VMX
review: http://markmail.org/message/hhidqyhbo2mrgxxc

Yeah, should rephrase this.

Post by Gleb Natapov
But there is at least one transition check that kvm_set_cr0() does that
should not be done during vmexit emulation, namely CS.L bit check, so I
tend to agree that kvm_set_cr0() is not appropriate here, at lest not as
it is.

kvm_set_cr0() is for emulating explicit guest changes. It is not the
proper interface for implicit, vendor-dependent changes like this one.

Agree, the problem is that we do not have proper interface for implicit
changes like this one (do not see why it is vendor-dependent, SVM also
restores host state in a similar way).

Post by Jan Kiszka

Post by Gleb Natapov
But can we skip other checks kvm_set_cr0() does? For instance
what prevents us from loading CR0.PG = 1 EFER.LME = 1 and CR4.PAE = 0
during nested vmexit? What _should_ prevent it is vmentry check from
26.2.4
If the "host address-space size" VM-exit control is 1, the following
- Bit 5 of the CR4 field (corresponding to CR4.PAE) is 1.
But I do not see that we do that check on vmentry.
For CR0, ET, CD, NW; bits 63:32 (on processors that support Intel 64
architecture), 28:19, 17, and 15:6; and any bits that are fixed in
VMX operation (see Section 23.8).
But again current vmexit code does not emulate this properly and just
sets everything from host_cr0. vmentry should also preserve all those
bit but it looks like it doesn't too.

Yes, there is surely more to improve. Do you think the lacking checks
can cause troubles for L0, or is this just imprecise emulation that can
be addressed separately?

The lacking checks may cause L0 to fail guest entry which will trigger
internal error. If it is exploitable by L0 userspace it is a serious
problem, if only L0 kernel can trigger it then less so. I remember Avi
was concerned that KVM code may depend on all registers to be consistent
otherwise it can be exploited, I cannot prove or disprove this theory
:), but if it is the case then event L0 kernel case is problematic.

So how to proceed with this?

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Gleb Natapov

2013-09-03 17:55:44 UTC

Post by Jan Kiszka

Post by Gleb Natapov

Post by Jan Kiszka

Post by Gleb Natapov

Post by Jan Kiszka
Likely a typo, but a fatal one as kvm_set_cr0 performs checks on the

Not a typo :) That what Avi asked for do during initial nested VMX
review: http://markmail.org/message/hhidqyhbo2mrgxxc

Yeah, should rephrase this.

Post by Gleb Natapov
But there is at least one transition check that kvm_set_cr0() does that
should not be done during vmexit emulation, namely CS.L bit check, so I
tend to agree that kvm_set_cr0() is not appropriate here, at lest not as
it is.

kvm_set_cr0() is for emulating explicit guest changes. It is not the
proper interface for implicit, vendor-dependent changes like this one.

Agree, the problem is that we do not have proper interface for implicit
changes like this one (do not see why it is vendor-dependent, SVM also
restores host state in a similar way).

Post by Jan Kiszka

Post by Gleb Natapov
But can we skip other checks kvm_set_cr0() does? For instance
what prevents us from loading CR0.PG = 1 EFER.LME = 1 and CR4.PAE = 0
during nested vmexit? What _should_ prevent it is vmentry check from
26.2.4
If the "host address-space size" VM-exit control is 1, the following
- Bit 5 of the CR4 field (corresponding to CR4.PAE) is 1.
But I do not see that we do that check on vmentry.
For CR0, ET, CD, NW; bits 63:32 (on processors that support Intel 64
architecture), 28:19, 17, and 15:6; and any bits that are fixed in
VMX operation (see Section 23.8).
But again current vmexit code does not emulate this properly and just
sets everything from host_cr0. vmentry should also preserve all those
bit but it looks like it doesn't too.

Yes, there is surely more to improve. Do you think the lacking checks
can cause troubles for L0, or is this just imprecise emulation that can
be addressed separately?

The lacking checks may cause L0 to fail guest entry which will trigger
internal error. If it is exploitable by L0 userspace it is a serious
problem, if only L0 kernel can trigger it then less so. I remember Avi
was concerned that KVM code may depend on all registers to be consistent
otherwise it can be exploited, I cannot prove or disprove this theory
:), but if it is the case then event L0 kernel case is problematic.

So how to proceed with this?

Looking at the set_sreg code it looks like we already can create non
consistent state there, so I will apply 1,2,4,6 of this series and hope
that CR0 loading bugs I listed above will be eventually fixed on top :)
Can you rephrase commit message for patch 1?

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Jan Kiszka

2013-09-03 19:11:45 UTC

kvm_set_cr0 performs checks on the state transition that may prevent
loading L1's cr0. For now we rely on the hardware to catch invalid
states loaded by L1 into its VMCS. Still, consistency checks on the host
state part of the VMCS on guest entry will have to be improved later on.

Signed-off-by: Jan Kiszka <***@siemens.com>
---
arch/x86/kvm/vmx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1f1da43..b43d1f8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8186,7 +8186,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
* fpu_active (which may have changed).
* Note that vmx_set_cr0 refers to efer set above.
*/
- kvm_set_cr0(vcpu, vmcs12->host_cr0);
+ vmx_set_cr0(vcpu, vmcs12->host_cr0);
/*
* If we did fpu_activate()/fpu_deactivate() during L2's run, we need
* to apply the same changes to L1's vmcs. We just set cr0 correctly,

--
1.8.1.1.298.ge7eed54
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Gleb Natapov

2013-09-08 08:57:41 UTC

Post by Jan Kiszka
kvm_set_cr0 performs checks on the state transition that may prevent
loading L1's cr0. For now we rely on the hardware to catch invalid
states loaded by L1 into its VMCS. Still, consistency checks on the host
state part of the VMCS on guest entry will have to be improved later on.
---
arch/x86/kvm/vmx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 1f1da43..b43d1f8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8186,7 +8186,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
* fpu_active (which may have changed).
* Note that vmx_set_cr0 refers to efer set above.
*/
- kvm_set_cr0(vcpu, vmcs12->host_cr0);
+ vmx_set_cr0(vcpu, vmcs12->host_cr0);
/*
* If we did fpu_activate()/fpu_deactivate() during L2's run, we need
* to apply the same changes to L1's vmcs. We just set cr0 correctly,

For this one and 2,4,6 of the series:

Reviewed-by: Gleb Natapov <***@redhat.com>

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Arthur Chunqi Li

2013-09-10 13:14:14 UTC

Post by Gleb Natapov

Post by Jan Kiszka
Likely a typo, but a fatal one as kvm_set_cr0 performs checks on the

Not a typo :) That what Avi asked for do during initial nested VMX
review: http://markmail.org/message/hhidqyhbo2mrgxxc
But there is at least one transition check that kvm_set_cr0() does that
should not be done during vmexit emulation, namely CS.L bit check, so I
tend to agree that kvm_set_cr0() is not appropriate here, at lest not as
it is. But can we skip other checks kvm_set_cr0() does? For instance
what prevents us from loading CR0.PG = 1 EFER.LME = 1 and CR4.PAE = 0
during nested vmexit? What _should_ prevent it is vmentry check from
26.2.4
If the "host address-space size" VM-exit control is 1, the following
- Bit 5 of the CR4 field (corresponding to CR4.PAE) is 1.

Hi Jan and Gleb,
Our nested VMX testing framework may not support such testing modes.
Here we need to catch the failed result (ZF flag) close to vmresume,
but vmresume/vmlaunch is well encapsulated in our framework. If we
simply write a vmresume inline function, the VMX will act unexpectedly
when it doesn't cause "vmresume fail".

Do you have any ideas about this?

Arthur

Post by Gleb Natapov
But I do not see that we do that check on vmentry.
For CR0, ET, CD, NW; bits 63:32 (on processors that support Intel 64
architecture), 28:19, 17, and 15:6; and any bits that are fixed in
VMX operation (see Section 23.8).
But again current vmexit code does not emulate this properly and just
sets everything from host_cr0. vmentry should also preserve all those
bit but it looks like it doesn't too.

Post by Jan Kiszka
state transition that may prevent loading L1's cr0.
---
arch/x86/kvm/vmx.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 57b4e12..d001b019 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8185,7 +8185,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
* fpu_active (which may have changed).
* Note that vmx_set_cr0 refers to efer set above.
*/
- kvm_set_cr0(vcpu, vmcs12->host_cr0);
+ vmx_set_cr0(vcpu, vmcs12->host_cr0);
/*
* If we did fpu_activate()/fpu_deactivate() during L2's run, we need
* to apply the same changes to L1's vmcs. We just set cr0 correctly,
--
1.7.3.4

--
Gleb.

--
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Paolo Bonzini

2013-09-10 13:26:16 UTC

Post by Arthur Chunqi Li

Post by Gleb Natapov

Post by Jan Kiszka
Likely a typo, but a fatal one as kvm_set_cr0 performs checks on the

Not a typo :) That what Avi asked for do during initial nested VMX
review: http://markmail.org/message/hhidqyhbo2mrgxxc
But there is at least one transition check that kvm_set_cr0() does that
should not be done during vmexit emulation, namely CS.L bit check, so I
tend to agree that kvm_set_cr0() is not appropriate here, at lest not as
it is. But can we skip other checks kvm_set_cr0() does? For instance
what prevents us from loading CR0.PG = 1 EFER.LME = 1 and CR4.PAE = 0
during nested vmexit? What _should_ prevent it is vmentry check from
26.2.4
If the "host address-space size" VM-exit control is 1, the following
- Bit 5 of the CR4 field (corresponding to CR4.PAE) is 1.

Hi Jan and Gleb,
Our nested VMX testing framework may not support such testing modes.
Here we need to catch the failed result (ZF flag) close to vmresume,
but vmresume/vmlaunch is well encapsulated in our framework. If we
simply write a vmresume inline function, the VMX will act unexpectedly
when it doesn't cause "vmresume fail".
Do you have any ideas about this?

You could have a "failed entry" handler that by default would be simply

return launched ? VMX_TEST_RESUME_ERR :
VMX_TEST_LAUNCH_ERR

For this test, it would return VMX_TEST_RESUME or VMX_TEST_VMEXIT if it
fails, while the "regular" exit_handler would return VMX_TEST_EXIT.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Gleb Natapov

2013-09-15 11:01:04 UTC

Post by Arthur Chunqi Li

Post by Gleb Natapov

Post by Jan Kiszka
Likely a typo, but a fatal one as kvm_set_cr0 performs checks on the

Not a typo :) That what Avi asked for do during initial nested VMX
review: http://markmail.org/message/hhidqyhbo2mrgxxc
But there is at least one transition check that kvm_set_cr0() does that
should not be done during vmexit emulation, namely CS.L bit check, so I
tend to agree that kvm_set_cr0() is not appropriate here, at lest not as
it is. But can we skip other checks kvm_set_cr0() does? For instance
what prevents us from loading CR0.PG = 1 EFER.LME = 1 and CR4.PAE = 0
during nested vmexit? What _should_ prevent it is vmentry check from
26.2.4
If the "host address-space size" VM-exit control is 1, the following
- Bit 5 of the CR4 field (corresponding to CR4.PAE) is 1.

Hi Jan and Gleb,
Our nested VMX testing framework may not support such testing modes.
Here we need to catch the failed result (ZF flag) close to vmresume,
but vmresume/vmlaunch is well encapsulated in our framework. If we
simply write a vmresume inline function, the VMX will act unexpectedly
when it doesn't cause "vmresume fail".
Do you have any ideas about this?

I am not sure what you mean. The framework does capture failed vmentry
flags, but it handles the failure internally in vmx_run(). If you want
framework to be able to provide vmentry failure handler do what Paolo
suggests.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Jan Kiszka

2013-08-08 14:26:32 UTC

This job is normally performed by the architectural EFER set service
which we cannot use as it prevents transitions that are valid when
switching between L1 and L2. So open-code the update of base_role.nxe
after changing EFER on VM-entry and exit.

Signed-off-by: Jan Kiszka <***@siemens.com>
---
arch/x86/kvm/vmx.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9b0510b..7cc208a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7735,6 +7735,8 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
/* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
vmx_set_efer(vcpu, vcpu->arch.efer);
+ vcpu->arch.mmu.base_role.nxe =
+ (vcpu->arch.efer & EFER_NX) && !enable_ept;

if (nested_cpu_has_ept(vmcs12)) {
kvm_mmu_unload(vcpu);
@@ -8179,6 +8181,8 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
else
vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
vmx_set_efer(vcpu, vcpu->arch.efer);
+ vcpu->arch.mmu.base_role.nxe =
+ (vcpu->arch.efer & EFER_NX) && !enable_ept;

kvm_register_write(vcpu, VCPU_REGS_RSP, vmcs12->host_rsp);
kvm_register_write(vcpu, VCPU_REGS_RIP, vmcs12->host_rip);

--
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Gleb Natapov

2013-09-03 08:39:39 UTC

Post by Jan Kiszka
This job is normally performed by the architectural EFER set service
which we cannot use as it prevents transitions that are valid when
switching between L1 and L2. So open-code the update of base_role.nxe
after changing EFER on VM-entry and exit.
---
arch/x86/kvm/vmx.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9b0510b..7cc208a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7735,6 +7735,8 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
/* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
vmx_set_efer(vcpu, vcpu->arch.efer);
+ vcpu->arch.mmu.base_role.nxe =
+ (vcpu->arch.efer & EFER_NX) && !enable_ept;

kvm_mmu_reset_context() called later in this function does that.

Post by Jan Kiszka
if (nested_cpu_has_ept(vmcs12)) {
kvm_mmu_unload(vcpu);
@@ -8179,6 +8181,8 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
else
vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
vmx_set_efer(vcpu, vcpu->arch.efer);
+ vcpu->arch.mmu.base_role.nxe =
+ (vcpu->arch.efer & EFER_NX) && !enable_ept;

Same here.

Post by Jan Kiszka
kvm_register_write(vcpu, VCPU_REGS_RSP, vmcs12->host_rsp);
kvm_register_write(vcpu, VCPU_REGS_RIP, vmcs12->host_rip);
--
1.7.3.4

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Jan Kiszka

2013-09-03 08:51:31 UTC

Post by Gleb Natapov

Post by Jan Kiszka
This job is normally performed by the architectural EFER set service
which we cannot use as it prevents transitions that are valid when
switching between L1 and L2. So open-code the update of base_role.nxe
after changing EFER on VM-entry and exit.
---
arch/x86/kvm/vmx.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9b0510b..7cc208a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7735,6 +7735,8 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
/* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
vmx_set_efer(vcpu, vcpu->arch.efer);
+ vcpu->arch.mmu.base_role.nxe =
+ (vcpu->arch.efer & EFER_NX) && !enable_ept;

kvm_mmu_reset_context() called later in this function does that.

For all relevant scenarios? The code is pretty convoluted, so this is
non-obvious for me: kvm_mmu_reset_context calls init_kvm_mmu which may
call init_kvm_softmmu that does kvm_init_shadow_mmu with nxe
initialization. But is that path always taken?

Jan

Post by Gleb Natapov

Post by Jan Kiszka
if (nested_cpu_has_ept(vmcs12)) {
kvm_mmu_unload(vcpu);
@@ -8179,6 +8181,8 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
else
vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
vmx_set_efer(vcpu, vcpu->arch.efer);
+ vcpu->arch.mmu.base_role.nxe =
+ (vcpu->arch.efer & EFER_NX) && !enable_ept;

Same here.

Post by Jan Kiszka
kvm_register_write(vcpu, VCPU_REGS_RSP, vmcs12->host_rsp);
kvm_register_write(vcpu, VCPU_REGS_RIP, vmcs12->host_rip);
--
1.7.3.4

--
Gleb.

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Gleb Natapov

2013-09-03 09:04:11 UTC

Post by Jan Kiszka

Post by Gleb Natapov

Post by Jan Kiszka
This job is normally performed by the architectural EFER set service
which we cannot use as it prevents transitions that are valid when
switching between L1 and L2. So open-code the update of base_role.nxe
after changing EFER on VM-entry and exit.
---
arch/x86/kvm/vmx.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9b0510b..7cc208a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7735,6 +7735,8 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
/* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
vmx_set_efer(vcpu, vcpu->arch.efer);
+ vcpu->arch.mmu.base_role.nxe =
+ (vcpu->arch.efer & EFER_NX) && !enable_ept;

kvm_mmu_reset_context() called later in this function does that.

For all relevant scenarios? The code is pretty convoluted, so this is
non-obvious for me: kvm_mmu_reset_context calls init_kvm_mmu which may
call init_kvm_softmmu that does kvm_init_shadow_mmu with nxe
initialization. But is that path always taken?

What scenarios are relevant here? kvm_mmu_reset_context() is what
kvm_set_efer() would have called anyway. Since you use !enable_ept
here I assume that relevant scenario is shadow on shadow and in this
case it is easy to see that kvm_mmu_reset_context() will call
kvm_init_shadow_mmu() eventually. For all other scenarios guest nx
setting is irrelevant.

Post by Jan Kiszka
Jan

Post by Gleb Natapov

Post by Jan Kiszka
if (nested_cpu_has_ept(vmcs12)) {
kvm_mmu_unload(vcpu);
@@ -8179,6 +8181,8 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
else
vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
vmx_set_efer(vcpu, vcpu->arch.efer);
+ vcpu->arch.mmu.base_role.nxe =
+ (vcpu->arch.efer & EFER_NX) && !enable_ept;

Same here.

Post by Jan Kiszka
kvm_register_write(vcpu, VCPU_REGS_RSP, vmcs12->host_rsp);
kvm_register_write(vcpu, VCPU_REGS_RIP, vmcs12->host_rip);
--
1.7.3.4

--
Gleb.

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Jan Kiszka

2013-09-03 09:32:10 UTC

Post by Gleb Natapov

Post by Jan Kiszka

Post by Gleb Natapov

Post by Jan Kiszka
This job is normally performed by the architectural EFER set service
which we cannot use as it prevents transitions that are valid when
switching between L1 and L2. So open-code the update of base_role.nxe
after changing EFER on VM-entry and exit.
---
arch/x86/kvm/vmx.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9b0510b..7cc208a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7735,6 +7735,8 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vcpu->arch.efer &= ~(EFER_LMA | EFER_LME);
/* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
vmx_set_efer(vcpu, vcpu->arch.efer);
+ vcpu->arch.mmu.base_role.nxe =
+ (vcpu->arch.efer & EFER_NX) && !enable_ept;

kvm_mmu_reset_context() called later in this function does that.

For all relevant scenarios? The code is pretty convoluted, so this is
non-obvious for me: kvm_mmu_reset_context calls init_kvm_mmu which may
call init_kvm_softmmu that does kvm_init_shadow_mmu with nxe
initialization. But is that path always taken?

What scenarios are relevant here? kvm_mmu_reset_context() is what
kvm_set_efer() would have called anyway. Since you use !enable_ept
here I assume that relevant scenario is shadow on shadow and in this
case it is easy to see that kvm_mmu_reset_context() will call
kvm_init_shadow_mmu() eventually. For all other scenarios guest nx
setting is irrelevant.

Indeed... Ah! I missed that 2c9afa52 changed the picture. OK, another
obsolete one.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Jan Kiszka

2013-08-08 14:26:30 UTC

We need to update EFER.NX before building the nEPT state via
nested_ept_init_mmu_context. Otherwise, we risk to create an MMU context
that claims to have NX disabled while the guest EPT used NX. This will
cause spurious faults for L2.

Signed-off-by: Jan Kiszka <***@siemens.com>
---
arch/x86/kvm/vmx.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6c42518..363fe19 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7727,11 +7727,6 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vmx_flush_tlb(vcpu);
}

- if (nested_cpu_has_ept(vmcs12)) {
- kvm_mmu_unload(vcpu);
- nested_ept_init_mmu_context(vcpu);
- }
-
if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)
vcpu->arch.efer = vmcs12->guest_ia32_efer;
else if (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE)
@@ -7741,6 +7736,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
/* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
vmx_set_efer(vcpu, vcpu->arch.efer);

+ if (nested_cpu_has_ept(vmcs12)) {
+ kvm_mmu_unload(vcpu);
+ nested_ept_init_mmu_context(vcpu);
+ }
+
/*
* This sets GUEST_CR0 to vmcs12->guest_cr0, with possibly a modified
* TS bit (for lazy fpu) and bits which we consider mandatory enabled.

--
1.7.3.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Gleb Natapov

2013-09-02 13:16:12 UTC

Post by Jan Kiszka
We need to update EFER.NX before building the nEPT state via
nested_ept_init_mmu_context. Otherwise, we risk to create an MMU context
that claims to have NX disabled while the guest EPT used NX. This will
cause spurious faults for L2.

Hmm, I do not see how nested ept mmu depends on guests EFER.NX setting.
It just sets mmu->nx to true.

Post by Jan Kiszka
---
arch/x86/kvm/vmx.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6c42518..363fe19 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7727,11 +7727,6 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vmx_flush_tlb(vcpu);
}
- if (nested_cpu_has_ept(vmcs12)) {
- kvm_mmu_unload(vcpu);
- nested_ept_init_mmu_context(vcpu);
- }
-
if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)
vcpu->arch.efer = vmcs12->guest_ia32_efer;
else if (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE)
@@ -7741,6 +7736,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
/* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
vmx_set_efer(vcpu, vcpu->arch.efer);
+ if (nested_cpu_has_ept(vmcs12)) {
+ kvm_mmu_unload(vcpu);
+ nested_ept_init_mmu_context(vcpu);
+ }
+
/*
* This sets GUEST_CR0 to vmcs12->guest_cr0, with possibly a modified
* TS bit (for lazy fpu) and bits which we consider mandatory enabled.
--
1.7.3.4

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Jan Kiszka

2013-09-02 17:58:30 UTC

Post by Gleb Natapov

Post by Jan Kiszka
We need to update EFER.NX before building the nEPT state via
nested_ept_init_mmu_context. Otherwise, we risk to create an MMU context
that claims to have NX disabled while the guest EPT used NX. This will
cause spurious faults for L2.

Hmm, I do not see how nested ept mmu depends on guests EFER.NX setting.
It just sets mmu->nx to true.

Don't ask me for the details behind this, but update_permission_bitmask
called by kvm_init_shadow_ept_mmu is using it e.g. And the
"before-after" effect was clearly visible when L2 and L1 were using
different NX settings. Maybe Arthur can write a test for this.

Jan

Post by Gleb Natapov

Post by Jan Kiszka
---
arch/x86/kvm/vmx.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6c42518..363fe19 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7727,11 +7727,6 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vmx_flush_tlb(vcpu);
}
- if (nested_cpu_has_ept(vmcs12)) {
- kvm_mmu_unload(vcpu);
- nested_ept_init_mmu_context(vcpu);
- }
-
if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)
vcpu->arch.efer = vmcs12->guest_ia32_efer;
else if (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE)
@@ -7741,6 +7736,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
/* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
vmx_set_efer(vcpu, vcpu->arch.efer);
+ if (nested_cpu_has_ept(vmcs12)) {
+ kvm_mmu_unload(vcpu);
+ nested_ept_init_mmu_context(vcpu);
+ }
+
/*
* This sets GUEST_CR0 to vmcs12->guest_cr0, with possibly a modified
* TS bit (for lazy fpu) and bits which we consider mandatory enabled.
--
1.7.3.4

--
Gleb.

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Gleb Natapov

2013-09-02 18:09:35 UTC

Post by Jan Kiszka

Post by Gleb Natapov

Post by Jan Kiszka
We need to update EFER.NX before building the nEPT state via
nested_ept_init_mmu_context. Otherwise, we risk to create an MMU context
that claims to have NX disabled while the guest EPT used NX. This will
cause spurious faults for L2.

Hmm, I do not see how nested ept mmu depends on guests EFER.NX setting.
It just sets mmu->nx to true.

Don't ask me for the details behind this, but update_permission_bitmask
called by kvm_init_shadow_ept_mmu is using it e.g. And the

It uses it only in !ept case though and never looks at a guest setting
as far as I can tell. Is it possible that this was an artifact of all
nEPT code and the latest one does not need this patch?

Post by Jan Kiszka
"before-after" effect was clearly visible when L2 and L1 were using
different NX settings. Maybe Arthur can write a test for this.
Jan

Post by Gleb Natapov

Post by Jan Kiszka
---
arch/x86/kvm/vmx.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6c42518..363fe19 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7727,11 +7727,6 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
vmx_flush_tlb(vcpu);
}
- if (nested_cpu_has_ept(vmcs12)) {
- kvm_mmu_unload(vcpu);
- nested_ept_init_mmu_context(vcpu);
- }
-
if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)
vcpu->arch.efer = vmcs12->guest_ia32_efer;
else if (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE)
@@ -7741,6 +7736,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
/* Note: modifies VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
vmx_set_efer(vcpu, vcpu->arch.efer);
+ if (nested_cpu_has_ept(vmcs12)) {
+ kvm_mmu_unload(vcpu);
+ nested_ept_init_mmu_context(vcpu);
+ }
+
/*
* This sets GUEST_CR0 to vmcs12->guest_cr0, with possibly a modified
* TS bit (for lazy fpu) and bits which we consider mandatory enabled.
--
1.7.3.4

--
Gleb.

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Jan Kiszka

2013-09-02 18:20:18 UTC

Post by Gleb Natapov

Post by Jan Kiszka

Post by Gleb Natapov

Post by Jan Kiszka
We need to update EFER.NX before building the nEPT state via
nested_ept_init_mmu_context. Otherwise, we risk to create an MMU context
that claims to have NX disabled while the guest EPT used NX. This will
cause spurious faults for L2.

Hmm, I do not see how nested ept mmu depends on guests EFER.NX setting.
It just sets mmu->nx to true.

Don't ask me for the details behind this, but update_permission_bitmask
called by kvm_init_shadow_ept_mmu is using it e.g. And the

It uses it only in !ept case though and never looks at a guest setting
as far as I can tell. Is it possible that this was an artifact of all
nEPT code and the latest one does not need this patch?

Hmm, possibly. Let me recheck, hope I can find the reproduction pattern
again...

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Jan Kiszka

2013-09-02 18:38:11 UTC

Post by Jan Kiszka

Post by Gleb Natapov

Post by Jan Kiszka

Post by Gleb Natapov

Post by Jan Kiszka
We need to update EFER.NX before building the nEPT state via
nested_ept_init_mmu_context. Otherwise, we risk to create an MMU context
that claims to have NX disabled while the guest EPT used NX. This will
cause spurious faults for L2.

Hmm, I do not see how nested ept mmu depends on guests EFER.NX setting.
It just sets mmu->nx to true.

Don't ask me for the details behind this, but update_permission_bitmask
called by kvm_init_shadow_ept_mmu is using it e.g. And the

It uses it only in !ept case though and never looks at a guest setting
as far as I can tell. Is it possible that this was an artifact of all
nEPT code and the latest one does not need this patch?

Hmm, possibly. Let me recheck, hope I can find the reproduction pattern
again...

Yeah, drop it. Things work fine without it, and I just found an old
version of nEPT where kvm_init_shadow_EPT_mmu did this:

context->nx = is_nx(vcpu); /* TODO: ? */

Obviously, this patch was a workaround for that issue.

Jan

--
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Jan Kiszka

2013-08-25 06:46:53 UTC

Post by Jan Kiszka
These patches apply on top of kvm.git queue.
- rebased over queue
- added "Do not set identity page map for L2"
- dropped "Fix guest CR3 read-back on VM-exit"
KVM: nVMX: Replace kvm_set_cr0 with vmx_set_cr0 in
load_vmcs12_host_state
KVM: nVMX: Do not set identity page map for L2
KVM: nVMX: Load nEPT state after EFER
KVM: nVMX: Implement support for EFER saving on VM-exit
KVM: nVMX: Update mmu.base_role.nxe after EFER loading on
VM-entry/exit
KVM: nVMX: Enable unrestricted guest mode support
arch/x86/kvm/vmx.c | 44 +++++++++++++++++++++++++++++++-------------
1 files changed, 31 insertions(+), 13 deletions(-)

Ping for this series. It still applies, now to next, without conflicts.

Jan

Paolo Bonzini

2013-08-25 10:01:37 UTC

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Post by Jan Kiszka

Post by Jan Kiszka
These patches apply on top of kvm.git queue.
Changes in v3: - rebased over queue - added "Do not set identity
page map for L2" - dropped "Fix guest CR3 read-back on VM-exit"
Jan Kiszka (6): KVM: nVMX: Replace kvm_set_cr0 with vmx_set_cr0
in load_vmcs12_host_state KVM: nVMX: Do not set identity page map
for L2 KVM: nVMX: Load nEPT state after EFER KVM: nVMX: Implement
support for EFER saving on VM-exit KVM: nVMX: Update
Enable unrestricted guest mode support
arch/x86/kvm/vmx.c | 44
+++++++++++++++++++++++++++++++------------- 1 files changed, 31
insertions(+), 13 deletions(-)

Ping for this series. It still applies, now to next, without
conflicts.

I tested it and it works fine. It also looks good to me. However, I
wanted to leave time to Gleb so that he could also review it.

Paolo
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJSGdYBAAoJEBvWZb6bTYbyQgsP/ihhQga53141nXf9U70lLbbP
rv4hJCsqwHSOmjlEww0qUJOAE1nwlSYwowkucLKSVeuKy2/MXl+qwfxdLkFVqwgs
CvSadewf69BrYDkIs9gaxZ++CDKOWGppoKU2EyWLTibiloUti9kATXcxNGyQNHgY
7SrNrwzayUyp4KCm2iXcmag9U3kRscTHH2zxKFGjKtrvsf3yNjDHQJlXFIV0r1IB
HJCjm5kb8k1r9MGJye1nR4ZCyzqNtqyxGLaJV6W4cz5kJcNIStJuL18SGGzf77jb
I4DpDredzzuI22CsnJEicucNpq5i9C7tIxqo9q5zeNsg9gya+SdkHZEhOGompEP6
tDrd0Apn3Bbz48kLuGF6VvX3g4iLNop5qGXsZ66NxnQdtT9Yfx1jOQA2yqZGfcPi
hSNTHQBHlcvsXTO3n7lkCHd0inyxe+4zxF1hQBzYtT9gPp3KQRt4nd2ZqcAjVZrb
iEqTyXaiTi+5FDJXMziWNompQwWwJ8muMKNCd+Z7371TtVATmx7CEmGsKoB7buGb
sZiah2uQ0FBkD6v4wSL9VY9iTb+IRBEgAB/9viBiOcyxAngiAcTT8WB1snxlMUmj
5M31qPIWHN1qbi/JBE+/K3yFgAHJPjkfTsavCVqjXl4Su5bMmUOoCHE30Z6/CIQV
VWVaAJAymWyY+o+ON8HQ
=8y8m
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Gleb Natapov

2013-08-27 11:18:50 UTC

Post by Paolo Bonzini
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Post by Jan Kiszka

Post by Jan Kiszka
These patches apply on top of kvm.git queue.
Changes in v3: - rebased over queue - added "Do not set identity
page map for L2" - dropped "Fix guest CR3 read-back on VM-exit"
Jan Kiszka (6): KVM: nVMX: Replace kvm_set_cr0 with vmx_set_cr0
in load_vmcs12_host_state KVM: nVMX: Do not set identity page map
for L2 KVM: nVMX: Load nEPT state after EFER KVM: nVMX: Implement
support for EFER saving on VM-exit KVM: nVMX: Update
Enable unrestricted guest mode support
arch/x86/kvm/vmx.c | 44
+++++++++++++++++++++++++++++++------------- 1 files changed, 31
insertions(+), 13 deletions(-)

Ping for this series. It still applies, now to next, without
conflicts.

I tested it and it works fine. It also looks good to me. However, I
wanted to leave time to Gleb so that he could also review it.

Yes please, give me a little more time to get back on track.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Paolo Bonzini

2013-09-12 16:34:19 UTC

Post by Jan Kiszka
These patches apply on top of kvm.git queue.
- rebased over queue
- added "Do not set identity page map for L2"
- dropped "Fix guest CR3 read-back on VM-exit"
KVM: nVMX: Replace kvm_set_cr0 with vmx_set_cr0 in
load_vmcs12_host_state
KVM: nVMX: Do not set identity page map for L2
KVM: nVMX: Load nEPT state after EFER
KVM: nVMX: Implement support for EFER saving on VM-exit
KVM: nVMX: Update mmu.base_role.nxe after EFER loading on
VM-entry/exit
KVM: nVMX: Enable unrestricted guest mode support
arch/x86/kvm/vmx.c | 44 +++++++++++++++++++++++++++++++-------------
1 files changed, 31 insertions(+), 13 deletions(-)

Applied 1 (v4), 2, 4, 6 to kvm/queue for 3.13, thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

29 Replies
2 Views
Permalink to this page
Disable enhanced parsing

Thread Navigation

Jan Kiszka 2013-08-08 14:26:27 UTC

Jan Kiszka 2013-08-08 14:26:29 UTC

Jan Kiszka 2013-08-08 14:26:31 UTC

Jan Kiszka 2013-08-08 14:26:33 UTC

Jan Kiszka 2013-08-08 14:26:28 UTC

Gleb Natapov 2013-09-02 08:21:36 UTC

Jan Kiszka 2013-09-02 09:06:53 UTC

Gleb Natapov 2013-09-02 09:36:28 UTC

Jan Kiszka 2013-09-03 17:44:41 UTC

Gleb Natapov 2013-09-03 17:55:44 UTC

Jan Kiszka 2013-09-03 19:11:45 UTC

Gleb Natapov 2013-09-08 08:57:41 UTC

Arthur Chunqi Li 2013-09-10 13:14:14 UTC

Paolo Bonzini 2013-09-10 13:26:16 UTC

Gleb Natapov 2013-09-15 11:01:04 UTC

Jan Kiszka 2013-08-08 14:26:32 UTC

Gleb Natapov 2013-09-03 08:39:39 UTC

Jan Kiszka 2013-09-03 08:51:31 UTC

Gleb Natapov 2013-09-03 09:04:11 UTC

Jan Kiszka 2013-09-03 09:32:10 UTC

Jan Kiszka 2013-08-08 14:26:30 UTC

Gleb Natapov 2013-09-02 13:16:12 UTC

Jan Kiszka 2013-09-02 17:58:30 UTC

Gleb Natapov 2013-09-02 18:09:35 UTC

Jan Kiszka 2013-09-02 18:20:18 UTC

Jan Kiszka 2013-09-02 18:38:11 UTC

Jan Kiszka 2013-08-25 06:46:53 UTC

Paolo Bonzini 2013-08-25 10:01:37 UTC

Gleb Natapov 2013-08-27 11:18:50 UTC

Paolo Bonzini 2013-09-12 16:34:19 UTC

about - legalese

Loading...