Discussion:
[PATCH v12 0/6] arm/KVM: dirty page logging support for ARMv7 (3.17.0-rc1)
Mario Smarduch
2014-10-22 22:34:05 UTC
This patch series introduces dirty page logging for ARMv7 and adds some degree
of generic dirty logging support shared by x86 and ARMv7, with ARMv8 to follow.

Following Alex's suggestion, after he took a look at the patches at KVM Forum,
the generic/arch split has been simplified, leaving mips, powerpc, s390, and
ia64 (although broken) unchanged. x86 and ARMv7 now share some dirty logging
code. ARMv8 dirty log patches have been posted and tested, but for the time
being ARMv8 is non-generic as well.

I briefly spoke to most of you at KVM Forum, and this is the patch series
I was referring to. The implementation has changed from the previous version
(patches 1 & 2); those who acked the previous revision, please review again.

The last 4 patches (ARM) have been rebased to a newer kernel, with no
significant changes.

Testing:
- Generally, live migration plus checksumming of source/destination memory
regions is used to validate correctness.
- qemu machvirt, VExpress - Exynos 5440, FastModels - lmbench plus dirty
guest memory cycling.
- ARMv8 Foundation Model/kvmtool - due to a slight overlap in the 2nd stage
handlers, a basic bring-up was done using qemu.
- x86_64 qemu default machine model - tested migration on an HP Z620, tested
convergence for several dirty page rates.

See https://github.com/mjsmar/arm-dirtylog-tests
- Dirtlogtest-setup.pdf for ARMv7
- https://github.com/mjsmar/arm-dirtylog-tests/tree/master/v7 - README

The patch series affects armv7, armv8, mips, ia64, powerpc, s390, and x86_64,
and has been compile-tested for the affected architectures:

- x86_64 - defconfig
- ia64 - ia64-linux-gcc4.6.3 - defconfig; the ia64 Kconfig defines BROKEN,
which was worked around to make sure the new changes don't break the build.
The build eventually breaks for unrelated reasons.
- mips - mips64-linux-gcc4.6.3 - malta_kvm_defconfig
- ppc - powerpc64-linux-gcc4.6.3 - pseries_defconfig
- s390 - s390x-linux-gcc4.6.3 - defconfig
- armv8 - aarch64-linux-gnu-gcc4.8.1 - defconfig

ARMv7 dirty page logging implementation overview:
- Initially write protects the VM RAM memory region in the 2nd stage page
tables.
- Adds support to read the dirty page log and write protect the dirty pages
again in the 2nd stage page tables for the next pass (see the userspace
sketch after this list).
- 2nd stage huge pages are dissolved into small page tables to keep track of
dirty pages at page granularity. Tracking at huge page granularity limits
migration to an almost idle system; small page size logging supports higher
memory dirty rates.
- In the event migration is canceled, normal behavior is resumed and huge
pages are rebuilt over time.
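
For reference, here is a minimal userspace sketch (not part of this series)
of the dirty log read cycle described above; the slot id, page count and
vm_fd are hypothetical placeholders for values the VMM already tracks:

    #include <linux/kvm.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/ioctl.h>

    #define MEM_SLOT 0              /* slot registered with KVM_SET_USER_MEMORY_REGION */
    #define NPAGES   (1UL << 18)    /* pages in the slot (example value) */

    static void dirty_log_pass(int vm_fd)
    {
            struct kvm_dirty_log log;
            /* one bit per page, rounded up to 64-bit words (safe on 32-bit too) */
            size_t n = ((NPAGES + 63) / 64) * 8;
            unsigned long *bitmap = calloc(1, n);

            memset(&log, 0, sizeof(log));
            log.slot = MEM_SLOT;
            log.dirty_bitmap = bitmap;

            /*
             * Returns the pages dirtied since the last call; the kernel
             * write protects them again for the next pass.
             */
            if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) == 0) {
                    /* walk bitmap, resend the dirty pages ... */
            }

            free(bitmap);
    }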

Changes since v11:
- Implemented Alex's comments to simplify the generic layer.

Changes since v10:
- addressed wanghaibin's comments
- addressed Christoffer's comments

Changes since v9:
- Split patches into generic and architecture-specific variants for TLB
flushing and dirty log read (patches 1,2 & 3,4,5,6)
- rebased to 3.16.0-rc1
- Applied Christoffer's comments.

Mario Smarduch (6):
KVM: Add architecture-defined TLB flush support
KVM: Add generic support for dirty page logging
arm: KVM: Add ARMv7 API to flush TLBs
arm: KVM: Add initial dirty page locking infrastructure
arm: KVM: dirty log read write protect support
arm: KVM: ARMv7 dirty page logging 2nd stage page fault

arch/arm/include/asm/kvm_asm.h | 1 +
arch/arm/include/asm/kvm_host.h | 14 +++
arch/arm/include/asm/kvm_mmu.h | 20 ++++
arch/arm/include/asm/pgtable-3level.h | 1 +
arch/arm/kvm/Kconfig | 2 +
arch/arm/kvm/Makefile | 1 +
arch/arm/kvm/arm.c | 2 +
arch/arm/kvm/interrupts.S | 11 ++
arch/arm/kvm/mmu.c | 209 +++++++++++++++++++++++++++++++--
arch/x86/include/asm/kvm_host.h | 3 -
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/Makefile | 1 +
arch/x86/kvm/x86.c | 86 --------------
include/linux/kvm_host.h | 4 +
virt/kvm/Kconfig | 6 +
virt/kvm/dirtylog.c | 112 ++++++++++++++++++
virt/kvm/kvm_main.c | 2 +
17 files changed, 380 insertions(+), 96 deletions(-)
create mode 100644 virt/kvm/dirtylog.c
--
1.7.9.5
Mario Smarduch
2014-10-22 22:34:06 UTC
This patch adds support for an architecture-implemented VM TLB flush, guarded
by HAVE_KVM_ARCH_TLB_FLUSH_ALL. Architectures that do not select it are
unaffected and keep using the generic version. In a subsequent patch ARMv7
selects HAVE_KVM_ARCH_TLB_FLUSH_ALL and provides its own TLB flush interface.
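
For an architecture opting in, the pattern looks like the one ARMv7 follows
later in this series (a sketch, not part of this patch): select the symbol
and provide kvm_flush_remote_tlbs() itself, which the new #ifndef in
kvm_main.c then removes the generic definition for.

    # arch/<arch>/kvm/Kconfig
    config KVM
            ...
            select HAVE_KVM_ARCH_TLB_FLUSH_ALL

    /* arch/<arch>/include/asm/kvm_host.h */
    static inline void kvm_flush_remote_tlbs(struct kvm *kvm)
    {
            /* architecture-specific flush of all TLB entries for this VM */
    }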

Signed-off-by: Mario Smarduch <***@samsung.com>
---
virt/kvm/Kconfig | 3 +++
virt/kvm/kvm_main.c | 2 ++
2 files changed, 5 insertions(+)

diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index fc0c5e6..3796a21 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -37,3 +37,6 @@ config HAVE_KVM_CPU_RELAX_INTERCEPT

config KVM_VFIO
bool
+
+config HAVE_KVM_ARCH_TLB_FLUSH_ALL
+ bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..887df87 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -184,6 +184,7 @@ static bool make_all_cpus_request(struct kvm *kvm, unsigned int req)
return called;
}

+#ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
void kvm_flush_remote_tlbs(struct kvm *kvm)
{
long dirty_count = kvm->tlbs_dirty;
@@ -194,6 +195,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
}
EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
+#endif

void kvm_reload_remote_mmus(struct kvm *kvm)
{
--
1.7.9.5

Mario Smarduch
2014-10-22 22:34:07 UTC
This patch defines KVM_GENERIC_DIRTYLOG and moves the dirty log read function
to its own file, virt/kvm/dirtylog.c. x86 is updated to use the generic dirty
log interface, selecting KVM_GENERIC_DIRTYLOG in its Kconfig and adding the
object to its Makefile. No other architectures are affected; each keeps using
its own version. This differs from the previous patch revision, where
non-generic architectures were modified.

In a subsequent patch ARMv7 does the same thing. All other architectures
continue to use their architecture-defined version.
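
For an architecture opting in, the pattern mirrors the x86 hunks below (a
sketch, not part of this patch); the only hook the generic code requires is
an arch implementation of kvm_mmu_write_protect_pt_masked():

    # arch/<arch>/kvm/Kconfig
    config KVM
            ...
            select KVM_GENERIC_DIRTYLOG

    # arch/<arch>/kvm/Makefile
    kvm-$(CONFIG_KVM_GENERIC_DIRTYLOG) += $(KVM)/dirtylog.o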


Signed-off-by: Mario Smarduch <***@samsung.com>
---
arch/x86/include/asm/kvm_host.h | 3 --
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/Makefile | 1 +
arch/x86/kvm/x86.c | 86 ------------------------------
include/linux/kvm_host.h | 4 ++
virt/kvm/Kconfig | 3 ++
virt/kvm/dirtylog.c | 112 +++++++++++++++++++++++++++++++++++++++
7 files changed, 121 insertions(+), 89 deletions(-)
create mode 100644 virt/kvm/dirtylog.c

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7c492ed..934dc24 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -805,9 +805,6 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,

void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
-void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
- struct kvm_memory_slot *slot,
- gfn_t gfn_offset, unsigned long mask);
void kvm_mmu_zap_all(struct kvm *kvm);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index f9d16ff..dca6fc7 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -40,6 +40,7 @@ config KVM
select HAVE_KVM_MSI
select HAVE_KVM_CPU_RELAX_INTERCEPT
select KVM_VFIO
+ select KVM_GENERIC_DIRTYLOG
---help---
Support hosting fully virtualized guest machines using hardware
virtualization extensions. You will need a fairly recent
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 25d22b2..2536195 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -12,6 +12,7 @@ kvm-y += $(KVM)/kvm_main.o $(KVM)/ioapic.o \
$(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += $(KVM)/assigned-dev.o $(KVM)/iommu.o
kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
+kvm-$(CONFIG_KVM_GENERIC_DIRTYLOG) +=$(KVM)/dirtylog.o

kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
i8254.o cpuid.o pmu.o
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8f1e22d..1467fa4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3595,92 +3595,6 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
return 0;
}

-/**
- * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
- * @kvm: kvm instance
- * @log: slot id and address to which we copy the log
- *
- * We need to keep it in mind that VCPU threads can write to the bitmap
- * concurrently. So, to avoid losing data, we keep the following order for
- * each bit:
- *
- * 1. Take a snapshot of the bit and clear it if needed.
- * 2. Write protect the corresponding page.
- * 3. Flush TLB's if needed.
- * 4. Copy the snapshot to the userspace.
- *
- * Between 2 and 3, the guest may write to the page using the remaining TLB
- * entry. This is not a problem because the page will be reported dirty at
- * step 4 using the snapshot taken before and step 3 ensures that successive
- * writes will be logged for the next call.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
-{
- int r;
- struct kvm_memory_slot *memslot;
- unsigned long n, i;
- unsigned long *dirty_bitmap;
- unsigned long *dirty_bitmap_buffer;
- bool is_dirty = false;
-
- mutex_lock(&kvm->slots_lock);
-
- r = -EINVAL;
- if (log->slot >= KVM_USER_MEM_SLOTS)
- goto out;
-
- memslot = id_to_memslot(kvm->memslots, log->slot);
-
- dirty_bitmap = memslot->dirty_bitmap;
- r = -ENOENT;
- if (!dirty_bitmap)
- goto out;
-
- n = kvm_dirty_bitmap_bytes(memslot);
-
- dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
- memset(dirty_bitmap_buffer, 0, n);
-
- spin_lock(&kvm->mmu_lock);
-
- for (i = 0; i < n / sizeof(long); i++) {
- unsigned long mask;
- gfn_t offset;
-
- if (!dirty_bitmap[i])
- continue;
-
- is_dirty = true;
-
- mask = xchg(&dirty_bitmap[i], 0);
- dirty_bitmap_buffer[i] = mask;
-
- offset = i * BITS_PER_LONG;
- kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
- }
-
- spin_unlock(&kvm->mmu_lock);
-
- /* See the comments in kvm_mmu_slot_remove_write_access(). */
- lockdep_assert_held(&kvm->slots_lock);
-
- /*
- * All the TLBs can be flushed out of mmu lock, see the comments in
- * kvm_mmu_slot_remove_write_access().
- */
- if (is_dirty)
- kvm_flush_remote_tlbs(kvm);
-
- r = -EFAULT;
- if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
- goto out;
-
- r = 0;
-out:
- mutex_unlock(&kvm->slots_lock);
- return r;
-}
-
int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
bool line_status)
{
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a4c33b3..05ebcbe 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -592,6 +592,10 @@ int kvm_get_dirty_log(struct kvm *kvm,
struct kvm_dirty_log *log, int *is_dirty);
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
struct kvm_dirty_log *log);
+void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
+ gfn_t gfn_offset,
+ unsigned long mask);

int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
bool line_status);
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 3796a21..368fc4a 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -40,3 +40,6 @@ config KVM_VFIO

config HAVE_KVM_ARCH_TLB_FLUSH_ALL
bool
+
+config KVM_GENERIC_DIRTYLOG
+ bool
diff --git a/virt/kvm/dirtylog.c b/virt/kvm/dirtylog.c
new file mode 100644
index 0000000..67ffffa
--- /dev/null
+++ b/virt/kvm/dirtylog.c
@@ -0,0 +1,112 @@
+/*
+ * kvm generic dirty logging support, used by architectures that share
+ * common dirty page logging implementation.
+ *
+ * Copyright (C) 2006 Qumranet, Inc.
+ * Copyright 2010 Red Hat, Inc. and/or its affiliates.
+ *
+ * Authors:
+ * Avi Kivity <***@qumranet.com>
+ * Yaniv Kamay <***@qumranet.com>
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kvm_host.h>
+
+#include <asm/processor.h>
+#include <linux/io.h>
+#include <linux/uaccess.h>
+
+/**
+ * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
+ * @kvm: kvm instance
+ * @log: slot id and address to which we copy the log
+ *
+ * We need to keep it in mind that VCPU threads can write to the bitmap
+ * concurrently. So, to avoid losing data, we keep the following order for
+ * each bit:
+ *
+ * 1. Take a snapshot of the bit and clear it if needed.
+ * 2. Write protect the corresponding page.
+ * 3. Flush TLB's if needed.
+ * 4. Copy the snapshot to the userspace.
+ *
+ * Between 2 and 3, the guest may write to the page using the remaining TLB
+ * entry. This is not a problem because the page will be reported dirty at
+ * step 4 using the snapshot taken before and step 3 ensures that successive
+ * writes will be logged for the next call.
+ */
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+{
+ int r;
+ struct kvm_memory_slot *memslot;
+ unsigned long n, i;
+ unsigned long *dirty_bitmap;
+ unsigned long *dirty_bitmap_buffer;
+ bool is_dirty = false;
+
+ mutex_lock(&kvm->slots_lock);
+
+ r = -EINVAL;
+ if (log->slot >= KVM_USER_MEM_SLOTS)
+ goto out;
+
+ memslot = id_to_memslot(kvm->memslots, log->slot);
+
+ dirty_bitmap = memslot->dirty_bitmap;
+ r = -ENOENT;
+ if (!dirty_bitmap)
+ goto out;
+
+ n = kvm_dirty_bitmap_bytes(memslot);
+
+ dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
+ memset(dirty_bitmap_buffer, 0, n);
+
+ spin_lock(&kvm->mmu_lock);
+
+ for (i = 0; i < n / sizeof(long); i++) {
+ unsigned long mask;
+ gfn_t offset;
+
+ if (!dirty_bitmap[i])
+ continue;
+
+ is_dirty = true;
+
+ mask = xchg(&dirty_bitmap[i], 0);
+ dirty_bitmap_buffer[i] = mask;
+
+ offset = i * BITS_PER_LONG;
+ kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
+ }
+
+ spin_unlock(&kvm->mmu_lock);
+
+ /* See the comments in kvm_mmu_slot_remove_write_access(). */
+ lockdep_assert_held(&kvm->slots_lock);
+
+ /*
+ * All the TLBs can be flushed out of mmu lock, see the comments in
+ * kvm_mmu_slot_remove_write_access().
+ */
+ if (is_dirty)
+ kvm_flush_remote_tlbs(kvm);
+
+ r = -EFAULT;
+ if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
+ goto out;
+
+ r = 0;
+out:
+ mutex_unlock(&kvm->slots_lock);
+ return r;
+}
--
1.7.9.5
Mario Smarduch
2014-10-22 22:34:08 UTC
This patch adds the ARMv7 architecture TLB flush function.

Acked-by: Christoffer Dall <christoffer.dall at linaro.org>
Signed-off-by: Mario Smarduch <***@samsung.com>
---
arch/arm/include/asm/kvm_asm.h | 1 +
arch/arm/include/asm/kvm_host.h | 12 ++++++++++++
arch/arm/kvm/Kconfig | 1 +
arch/arm/kvm/interrupts.S | 11 +++++++++++
4 files changed, 25 insertions(+)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 3a67bec..25410b2 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -96,6 +96,7 @@ extern char __kvm_hyp_code_end[];

extern void __kvm_flush_vm_context(void);
extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);

extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
#endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 6dfb404..fad598c 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -220,6 +220,18 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
}

+/**
+ * kvm_flush_remote_tlbs() - flush all VM TLB entries
+ * @kvm: pointer to kvm structure.
+ *
+ * Interface to HYP function to flush all VM TLB entries without address
+ * parameter.
+ */
+static inline void kvm_flush_remote_tlbs(struct kvm *kvm)
+{
+ kvm_call_hyp(__kvm_tlb_flush_vmid, kvm);
+}
+
static inline int kvm_arch_dev_ioctl_check_extension(long ext)
{
return 0;
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 466bd29..a099df4 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -23,6 +23,7 @@ config KVM
select HAVE_KVM_CPU_RELAX_INTERCEPT
select KVM_MMIO
select KVM_ARM_HOST
+ select HAVE_KVM_ARCH_TLB_FLUSH_ALL
depends on ARM_VIRT_EXT && ARM_LPAE
---help---
Support hosting virtualized guest machines. You will also
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 01dcb0e..79caf79 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -66,6 +66,17 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
bx lr
ENDPROC(__kvm_tlb_flush_vmid_ipa)

+/**
+ * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs
+ *
+ * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address
+ * parameter
+ */
+
+ENTRY(__kvm_tlb_flush_vmid)
+ b __kvm_tlb_flush_vmid_ipa
+ENDPROC(__kvm_tlb_flush_vmid)
+
/********************************************************************
* Flush TLBs and instruction caches of all CPUs inside the inner-shareable
* domain, for all VMIDs
--
1.7.9.5
Mario Smarduch
2014-10-22 22:34:09 UTC
This patch adds support for initial write protection of a VM memslot. The
patch series assumes that huge PUDs will not be used in the 2nd stage tables,
which is always valid on ARMv7.
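
For context, the write protection itself is a two-bit update of the stage 2
descriptor's HAP[2:1] access permission field (descriptor bits 7:6), which
the new helpers below encapsulate:

    /*
     * L_PTE_S2_RDONLY = 1 << 6   HAP[1], read-only
     * L_PTE_S2_RDWR   = 3 << 6   HAP[2:1], read/write
     *
     * Write protecting a page clears the write permission while
     * keeping the page readable:
     */
    pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;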


Signed-off-by: Mario Smarduch <***@samsung.com>
---
arch/arm/include/asm/kvm_host.h | 2 +
arch/arm/include/asm/kvm_mmu.h | 20 +++++
arch/arm/include/asm/pgtable-3level.h | 1 +
arch/arm/kvm/mmu.c | 140 +++++++++++++++++++++++++++++++++
4 files changed, 163 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index fad598c..72510ca 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -245,4 +245,6 @@ static inline void vgic_arch_setup(const struct vgic_params *vgic)
int kvm_perf_init(void);
int kvm_perf_teardown(void);

+void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
+
#endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 5cc0b0f..08ab5e8 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -114,6 +114,26 @@ static inline void kvm_set_s2pmd_writable(pmd_t *pmd)
pmd_val(*pmd) |= L_PMD_S2_RDWR;
}

+static inline void kvm_set_s2pte_readonly(pte_t *pte)
+{
+ pte_val(*pte) = (pte_val(*pte) & ~L_PTE_S2_RDWR) | L_PTE_S2_RDONLY;
+}
+
+static inline bool kvm_s2pte_readonly(pte_t *pte)
+{
+ return (pte_val(*pte) & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY;
+}
+
+static inline void kvm_set_s2pmd_readonly(pmd_t *pmd)
+{
+ pmd_val(*pmd) = (pmd_val(*pmd) & ~L_PMD_S2_RDWR) | L_PMD_S2_RDONLY;
+}
+
+static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
+{
+ return (pmd_val(*pmd) & L_PMD_S2_RDWR) == L_PMD_S2_RDONLY;
+}
+
/* Open coded p*d_addr_end that can deal with 64bit addresses */
#define kvm_pgd_addr_end(addr, end) \
({ u64 __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK; \
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 06e0bc0..d29c880 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -130,6 +130,7 @@
#define L_PTE_S2_RDONLY (_AT(pteval_t, 1) << 6) /* HAP[1] */
#define L_PTE_S2_RDWR (_AT(pteval_t, 3) << 6) /* HAP[2:1] */

+#define L_PMD_S2_RDONLY (_AT(pmdval_t, 1) << 6) /* HAP[1] */
#define L_PMD_S2_RDWR (_AT(pmdval_t, 3) << 6) /* HAP[2:1] */

/*
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 16e7994..3b86522 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -45,6 +45,7 @@ static phys_addr_t hyp_idmap_vector;
#define pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))

#define kvm_pmd_huge(_x) (pmd_huge(_x) || pmd_trans_huge(_x))
+#define kvm_pud_huge(_x) pud_huge(_x)

static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
{
@@ -746,6 +747,133 @@ static bool transparent_hugepage_adjust(pfn_t *pfnp, phys_addr_t *ipap)
return false;
}

+#ifdef CONFIG_ARM
+/**
+ * stage2_wp_ptes - write protect PMD range
+ * @pmd: pointer to pmd entry
+ * @addr: range start address
+ * @end: range end address
+ */
+static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
+{
+ pte_t *pte;
+
+ pte = pte_offset_kernel(pmd, addr);
+ do {
+ if (!pte_none(*pte)) {
+ if (!kvm_s2pte_readonly(pte))
+ kvm_set_s2pte_readonly(pte);
+ }
+ } while (pte++, addr += PAGE_SIZE, addr != end);
+}
+
+/**
+ * stage2_wp_pmds - write protect PUD range
+ * @pud: pointer to pud entry
+ * @addr: range start address
+ * @end: range end address
+ */
+static void stage2_wp_pmds(pud_t *pud, phys_addr_t addr, phys_addr_t end)
+{
+ pmd_t *pmd;
+ phys_addr_t next;
+
+ pmd = pmd_offset(pud, addr);
+
+ do {
+ next = kvm_pmd_addr_end(addr, end);
+ if (!pmd_none(*pmd)) {
+ if (kvm_pmd_huge(*pmd)) {
+ if (!kvm_s2pmd_readonly(pmd))
+ kvm_set_s2pmd_readonly(pmd);
+ } else {
+ stage2_wp_ptes(pmd, addr, next);
+ }
+ }
+ } while (pmd++, addr = next, addr != end);
+}
+
+/**
+ * stage2_wp_puds - write protect PGD range
+ * @kvm: pointer to kvm structure
+ * @pgd: pointer to pgd entry
+ * @addr: range start address
+ * @end: range end address
+ *
+ * While walking the PUD range huge PUD pages are ignored.
+ */
+static void stage2_wp_puds(struct kvm *kvm, pgd_t *pgd,
+ phys_addr_t addr, phys_addr_t end)
+{
+ pud_t *pud;
+ phys_addr_t next;
+
+ pud = pud_offset(pgd, addr);
+ do {
+ next = kvm_pud_addr_end(addr, end);
+ if (!pud_none(*pud)) {
+ /* TODO:PUD not supported, revisit later if supported */
+ BUG_ON(kvm_pud_huge(*pud));
+ stage2_wp_pmds(pud, addr, next);
+ }
+ } while (pud++, addr = next, addr != end);
+}
+
+/**
+ * stage2_wp_range() - write protect stage2 memory region range
+ * @kvm: The KVM pointer
+ * @addr: Start address of range
+ * @end: End address of range
+ */
+static void stage2_wp_range(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
+{
+ pgd_t *pgd;
+ phys_addr_t next;
+
+ pgd = kvm->arch.pgd + pgd_index(addr);
+ do {
+ /*
+ * Release kvm_mmu_lock periodically if the memory region is
+ * large. Otherwise, we may see kernel panics with
+ * CONFIG_DETECT_HUNG_TASK, CONFIG_LOCKUP_DETECTOR,
+ * CONFIG_LOCKDEP. Additionally, holding the lock too long
+ * will also starve other vCPUs.
+ */
+ if (need_resched() || spin_needbreak(&kvm->mmu_lock))
+ cond_resched_lock(&kvm->mmu_lock);
+
+ next = kvm_pgd_addr_end(addr, end);
+ if (pgd_present(*pgd))
+ stage2_wp_puds(kvm, pgd, addr, next);
+ } while (pgd++, addr = next, addr != end);
+}
+
+/**
+ * kvm_mmu_wp_memory_region() - write protect stage 2 entries for memory slot
+ * @kvm: The KVM pointer
+ * @slot: The memory slot to write protect
+ *
+ * Called to start logging dirty pages after memory region
+ * KVM_MEM_LOG_DIRTY_PAGES operation is called. After this function returns
+ * all present PMD and PTEs are write protected in the memory region.
+ * Afterwards read of dirty page log can be called.
+ *
+ * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
+ * serializing operations for VM memory regions.
+ */
+void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
+{
+ struct kvm_memory_slot *memslot = id_to_memslot(kvm->memslots, slot);
+ phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
+ phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
+
+ spin_lock(&kvm->mmu_lock);
+ stage2_wp_range(kvm, start, end);
+ spin_unlock(&kvm->mmu_lock);
+ kvm_flush_remote_tlbs(kvm);
+}
+#endif
+
static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
struct kvm_memory_slot *memslot,
unsigned long fault_status)
@@ -1129,6 +1257,18 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
unmap_stage2_range(kvm, gpa, size);
spin_unlock(&kvm->mmu_lock);
}
+
+#ifdef CONFIG_ARM
+ /*
+ * At this point memslot has been committed and there is an
+ * allocated dirty_bitmap[], dirty pages will be tracked while the
+ * memory slot is write protected.
+ */
+ if ((change != KVM_MR_DELETE) && change != KVM_MR_MOVE &&
+ (mem->flags & KVM_MEM_LOG_DIRTY_PAGES))
+ kvm_mmu_wp_memory_region(kvm, mem->slot);
+#endif
+
}

int kvm_arch_prepare_memory_region(struct kvm *kvm,
--
1.7.9.5