[PATCH v2 0/3] fix stuck in accessing hwrng attributes

Discussion:

Amos Kong

2014-09-15 16:02:26 UTC

If we read hwrng with a long-running dd process, it takes too much CPU
time and holds the mutex lock almost all the time. When we then check
the hwrng attributes from sysfs with cat, cat gets stuck waiting for the
lock to be released. The problem can only be reproduced with a non-SMP
guest with a slow backend.

This patchset resolves the issue by changing rng_dev_read() to always
schedule out for 10 jiffies after releasing the mutex lock, so that the
cat process has a chance to take the lock and execute the protected code
without getting stuck.

Thanks.

V2: updated the commit log to describe PATCH 2, and split the second patch.

Amos Kong (3):
virtio-rng cleanup: move some code out of mutex protection
hw_random: fix stuck in catting hwrng attributes
hw_random: increase schedule timeout in rng_dev_read()

drivers/char/hw_random/core.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

--
1.9.3

Amos Kong

2014-09-15 16:02:27 UTC

It doesn't save as much CPU time as expected; it's just a cleanup.

Signed-off-by: Amos Kong <***@redhat.com>
---
drivers/char/hw_random/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index aa30a25..c591d7e 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -270,8 +270,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
return -ERESTARTSYS;
if (current_rng)
name = current_rng->name;
- ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
mutex_unlock(&rng_mutex);
+ ret = snprintf(buf, PAGE_SIZE, "%s\n", name);

return ret;
}
@@ -284,19 +284,19 @@ static ssize_t hwrng_attr_available_show(struct device *dev,
ssize_t ret = 0;
struct hwrng *rng;

+ buf[0] = '\0';
err = mutex_lock_interruptible(&rng_mutex);
if (err)
return -ERESTARTSYS;
- buf[0] = '\0';
list_for_each_entry(rng, &rng_list, list) {
strncat(buf, rng->name, PAGE_SIZE - ret - 1);
ret += strlen(rng->name);
strncat(buf, " ", PAGE_SIZE - ret - 1);
ret++;
}
+ mutex_unlock(&rng_mutex);
strncat(buf, "\n", PAGE_SIZE - ret - 1);
ret++;
- mutex_unlock(&rng_mutex);

return ret;
}

--
1.9.3

Michael Büsch

2014-09-15 16:13:20 UTC

On Tue, 16 Sep 2014 00:02:27 +0800

Post by Amos Kong
It doesn't save as much CPU time as expected; it's just a cleanup.
---
drivers/char/hw_random/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index aa30a25..c591d7e 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -270,8 +270,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
return -ERESTARTSYS;
if (current_rng)
name = current_rng->name;
- ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
mutex_unlock(&rng_mutex);
+ ret = snprintf(buf, PAGE_SIZE, "%s\n", name);

I'm not sure this is safe.
Name is just a pointer.
What if the hwrng gets unregistered after unlock and just before the snprintf?
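A minimal userspace sketch of why the original ordering is the safe one, assuming a pthread mutex stands in for rng_mutex and a hypothetical `struct fake_rng` for struct hwrng: the name is formatted into the caller's buffer while the lock is still held, so the string is never read after an unregister path could have freed it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for struct hwrng, rng_mutex and current_rng. */
struct fake_rng {
	const char *name;
};

static pthread_mutex_t fake_rng_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct fake_rng *fake_current_rng;

/*
 * snprintf() runs while the mutex is held, so an unregister path that
 * also takes the mutex cannot free the rng (and its name) mid-copy.
 * Moving snprintf() below the unlock would read through `name` with no
 * guarantee that it still points at live memory.
 */
static int show_current(char *buf, size_t size)
{
	const char *name = "none";
	int ret;

	assert(buf != NULL && size > 0);
	pthread_mutex_lock(&fake_rng_mutex);
	if (fake_current_rng)
		name = fake_current_rng->name;
	ret = snprintf(buf, size, "%s\n", name);
	pthread_mutex_unlock(&fake_rng_mutex);

	return ret;
}
```

Copying only the pointer under the lock is not enough here; what must be protected is every dereference of it.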

Post by Amos Kong
return ret;
}
@@ -284,19 +284,19 @@ static ssize_t hwrng_attr_available_show(struct device *dev,
ssize_t ret = 0;
struct hwrng *rng;
+ buf[0] = '\0';
err = mutex_lock_interruptible(&rng_mutex);
if (err)
return -ERESTARTSYS;
- buf[0] = '\0';
list_for_each_entry(rng, &rng_list, list) {
strncat(buf, rng->name, PAGE_SIZE - ret - 1);
ret += strlen(rng->name);
strncat(buf, " ", PAGE_SIZE - ret - 1);
ret++;
}
+ mutex_unlock(&rng_mutex);
strncat(buf, "\n", PAGE_SIZE - ret - 1);
ret++;
- mutex_unlock(&rng_mutex);
return ret;
}

This looks ok.

--
Michael

Amos Kong

2014-09-16 00:30:31 UTC

Post by Michael Büsch
On Tue, 16 Sep 2014 00:02:27 +0800

Post by Amos Kong
It doesn't save as much CPU time as expected; it's just a cleanup.
---
drivers/char/hw_random/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index aa30a25..c591d7e 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -270,8 +270,8 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
return -ERESTARTSYS;
if (current_rng)
name = current_rng->name;
- ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
mutex_unlock(&rng_mutex);
+ ret = snprintf(buf, PAGE_SIZE, "%s\n", name);

I'm not sure this is safe.
Name is just a pointer.
What if the hwrng gets unregistered after unlock and just before the snprintf?

Oh, it points to the protected current_rng->name; I will drop this
cleanup. Thanks.

Post by Michael Büsch

Post by Amos Kong
return ret;
}
@@ -284,19 +284,19 @@ static ssize_t hwrng_attr_available_show(struct device *dev,
ssize_t ret = 0;
struct hwrng *rng;
+ buf[0] = '\0';
err = mutex_lock_interruptible(&rng_mutex);
if (err)
return -ERESTARTSYS;
- buf[0] = '\0';
list_for_each_entry(rng, &rng_list, list) {
strncat(buf, rng->name, PAGE_SIZE - ret - 1);
ret += strlen(rng->name);
strncat(buf, " ", PAGE_SIZE - ret - 1);
ret++;
}
+ mutex_unlock(&rng_mutex);
strncat(buf, "\n", PAGE_SIZE - ret - 1);
ret++;
- mutex_unlock(&rng_mutex);
return ret;
}

This looks ok.
--
Michael

--
Amos.

Amos Kong

2014-09-15 16:02:28 UTC

I started a QEMU (non-SMP) guest with one virtio-rng device and read
random data from /dev/hwrng with dd:

# dd if=/dev/hwrng of=/dev/null &

At the same time, I check the hwrng attributes from sysfs with cat:

# cat /sys/class/misc/hw_random/rng_*

With a slow backend (5 k/s), the cat process always gets stuck; with a
fast backend (1.2 M/s), the cat process takes 1 to 2 minutes. The hang
does not occur in an SMP guest.

The read syscall enters the kernel and calls rng_dev_read(), which runs
in user context. We used need_resched() to check whether other tasks
need to run, but it almost always returns false, so the reader takes
the mutex lock again right away. The process accessing the attributes
never manages to get the lock, so cat gets stuck.

User context doesn't allow other user contexts to run on that CPU
unless the kernel code sleeps for some reason. This is why
need_resched() always returns false here.

This patch removes the need_resched() check and always schedules, so
that other tasks get a chance to take the lock and execute the
protected code.

Signed-off-by: Amos Kong <***@redhat.com>
---
drivers/char/hw_random/core.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index c591d7e..263a370 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,

mutex_unlock(&rng_mutex);

- if (need_resched())
- schedule_timeout_interruptible(1);
+ schedule_timeout_interruptible(1);

if (signal_pending(current)) {
err = -ERESTARTSYS;

--
1.9.3

Rusty Russell

2014-09-18 02:43:08 UTC

Post by Amos Kong
I started a QEMU (non-SMP) guest with one virtio-rng device and read
# dd if=/dev/hwrng of=/dev/null &
# cat /sys/class/misc/hw_random/rng_*
With a slow backend (5 k/s), the cat process always gets stuck; with a
fast backend (1.2 M/s), the cat process takes 1 to 2 minutes. The hang
does not occur in an SMP guest.
The read syscall enters the kernel and calls rng_dev_read(), which runs
in user context. We used need_resched() to check whether other tasks
need to run, but it almost always returns false, so the reader takes
the mutex lock again right away. The process accessing the attributes
never manages to get the lock, so cat gets stuck.
User context doesn't allow other user contexts to run on that CPU
unless the kernel code sleeps for some reason. This is why
need_resched() always returns false here.
This patch removes the need_resched() check and always schedules, so
that other tasks get a chance to take the lock and execute the
protected code.

OK, this is going to be a rant.

Your explanation doesn't make sense at all. Worse, your solution breaks
the advice of Kernighan & Plauger: "Don't patch bad code - rewrite
it."

But worst of all, this detailed explanation might have convinced me you
understood the problem better than I did, and applied your patch.

I did some tests. For me, as expected, the process spends its time
inside the virtio rng read function, holding the mutex and thus blocking
sysfs access; it's not a failure of this code at all.

Your schedule_timeout() "fix" probably just helps by letting the host
refresh entropy, so we spend less time waiting in the read fn.

I will post a series, which unfortunately is only lightly tested, then
I'm going to have some beer to begin my holiday. That may help me
forget my disappointment at seeing respected fellow developers
monkey-patching random code they don't understand.

Grrr....
Rusty.

Post by Amos Kong
---
drivers/char/hw_random/core.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index c591d7e..263a370 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
mutex_unlock(&rng_mutex);
- if (need_resched())
- schedule_timeout_interruptible(1);
+ schedule_timeout_interruptible(1);
if (signal_pending(current)) {
err = -ERESTARTSYS;
--
1.9.3

Rusty Russell

2014-09-18 02:48:24 UTC

The previous patch added one potential problem: we can still be
reading from a hwrng when it's unregistered. Add a wait for the
reference count to drop to zero in the hwrng_unregister() path.

Signed-off-by: Rusty Russell <***@rustcorp.com.au>
---
drivers/char/hw_random/core.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index dc9092a1075d..b4a21e9521cf 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -60,6 +60,7 @@ static DEFINE_MUTEX(rng_mutex);
static DEFINE_MUTEX(reading_mutex);
static int data_avail;
static u8 *rng_buffer, *rng_fillbuf;
+static DECLARE_WAIT_QUEUE_HEAD(rng_done);
static unsigned short current_quality;
static unsigned short default_quality; /* = 0; default to "off" */

@@ -98,6 +99,7 @@ static inline void cleanup_rng(struct kref *kref)

if (rng->cleanup)
rng->cleanup(rng);
+ wake_up_all(&rng_done);
}

static void set_current_rng(struct hwrng *rng)
@@ -529,6 +531,9 @@ void hwrng_unregister(struct hwrng *rng)
}

mutex_unlock(&rng_mutex);
+
+ /* Just in case rng is reading right now, wait. */
+ wait_event(rng_done, atomic_read(&rng->ref.refcount) == 0);
}
EXPORT_SYMBOL_GPL(hwrng_unregister);

--
1.9.1

Herbert Xu

2014-10-21 14:15:23 UTC

Post by Rusty Russell
The previous patch added one potential problem: we can still be
reading from a hwrng when it's unregistered. Add a wait for the
reference count to drop to zero in the hwrng_unregister() path.
---
drivers/char/hw_random/core.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index dc9092a1075d..b4a21e9521cf 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -60,6 +60,7 @@ static DEFINE_MUTEX(rng_mutex);
static DEFINE_MUTEX(reading_mutex);
static int data_avail;
static u8 *rng_buffer, *rng_fillbuf;
+static DECLARE_WAIT_QUEUE_HEAD(rng_done);
static unsigned short current_quality;
static unsigned short default_quality; /* = 0; default to "off" */
@@ -98,6 +99,7 @@ static inline void cleanup_rng(struct kref *kref)
if (rng->cleanup)
rng->cleanup(rng);
+ wake_up_all(&rng_done);
}
static void set_current_rng(struct hwrng *rng)
@@ -529,6 +531,9 @@ void hwrng_unregister(struct hwrng *rng)
}
mutex_unlock(&rng_mutex);
+
+ /* Just in case rng is reading right now, wait. */
+ wait_event(rng_done, atomic_read(&rng->ref.refcount) == 0);

While it's obviously better than what we have now, I don't believe
this is 100% safe as the cleanup function might still be running
even after the ref count hits zero. Once we return from this function
the module may be unloaded so we need to ensure that nothing is
running at this point.
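One way to close the window described above is to wait for a flag that the release path sets only after the cleanup callback has returned, rather than waiting for the refcount to read zero. The userspace sketch below models that with a pthread condition variable (roughly the role a kernel struct completion plays). All names are hypothetical, and this is not necessarily the fix that was eventually merged.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a refcounted hwrng; not the kernel API. */
struct fake_rng {
	pthread_mutex_t lock;
	pthread_cond_t done_cond;
	int refcount;
	bool cleanup_done;	/* set only after cleanup has returned */
	int cleanup_calls;	/* how many times cleanup ran */
};

static void fake_rng_init(struct fake_rng *rng)
{
	pthread_mutex_init(&rng->lock, NULL);
	pthread_cond_init(&rng->done_cond, NULL);
	rng->refcount = 1;
	rng->cleanup_done = false;
	rng->cleanup_calls = 0;
}

static void fake_rng_get(struct fake_rng *rng)
{
	pthread_mutex_lock(&rng->lock);
	rng->refcount++;
	pthread_mutex_unlock(&rng->lock);
}

static void fake_rng_put(struct fake_rng *rng)
{
	bool last;

	pthread_mutex_lock(&rng->lock);
	assert(rng->refcount > 0);
	last = (--rng->refcount == 0);
	pthread_mutex_unlock(&rng->lock);
	if (!last)
		return;

	/* The driver's cleanup callback would run here; refcount is already 0. */
	rng->cleanup_calls++;

	/* Signal completion only now that cleanup has finished. */
	pthread_mutex_lock(&rng->lock);
	rng->cleanup_done = true;
	pthread_cond_broadcast(&rng->done_cond);
	pthread_mutex_unlock(&rng->lock);
}

/* Unregister side: returns only after cleanup has completed. */
static void fake_rng_wait_cleanup(struct fake_rng *rng)
{
	pthread_mutex_lock(&rng->lock);
	while (!rng->cleanup_done)
		pthread_cond_wait(&rng->done_cond, &rng->lock);
	pthread_mutex_unlock(&rng->lock);
}
```

A waiter that only checks `refcount == 0` can return while the last putter is still inside cleanup; waiting on `cleanup_done` cannot, because the flag is set after cleanup returns.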

Cheers,

--
Email: Herbert Xu <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Rusty Russell

2014-09-18 02:48:26 UTC

Another interesting anti-pattern.
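The anti-pattern is initializing a list node that is about to be added: list_add_tail() writes both of the new entry's pointers without reading them, so a preceding INIT_LIST_HEAD() on that entry is dead work. The sketch below uses simplified userspace copies of the two helpers (hypothetical lowercase names, no CONFIG_DEBUG_LIST checks) to show that an uninitialized and a pre-initialized entry end up identical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace copies of the kernel list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* Insert `entry` just before `head`, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	struct list_head *prev = head->prev;

	assert(entry != head);
	/* Both of entry's pointers are written; their old values are never read. */
	entry->next = head;
	entry->prev = prev;
	prev->next = entry;
	head->prev = entry;
}
```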

Signed-off-by: Rusty Russell <***@rustcorp.com.au>
---
drivers/char/hw_random/core.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 6a34feca6b43..96fa06716e95 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -485,7 +485,6 @@ int hwrng_register(struct hwrng *rng)
goto out_unlock;
}
}
- INIT_LIST_HEAD(&rng->list);
list_add_tail(&rng->list, &rng_list);

if (old_rng && !rng->init) {

--
1.9.1

Rusty Russell

2014-09-18 02:48:23 UTC

current_rng holds one reference, and we bump it every time we want
to do a read from it.

This means we only hold the rng_mutex to grab or drop a reference,
so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't
block on read of /dev/hwrng.

Using a kref is overkill (we're always under the rng_mutex), but
a standard pattern.

This also solves the problem that the hwrng_fillfn thread was
accessing current_rng without a lock, which could change (eg. to NULL)
underneath it.
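A minimal model of this scheme, assuming a plain int guarded by a pthread mutex in place of struct kref (the message itself notes the kref is overkill since every access is under rng_mutex). Names are hypothetical. What it shows is that a reader's reference keeps the rng alive after it stops being current; the release function runs only when the last reference is dropped, and the mutex is held only long enough to grab or drop a reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct hwrng, rng_mutex and current_rng. */
struct fake_rng {
	int ref;		/* plays the role of struct kref */
	int cleaned_up;		/* set by the release function */
};

static pthread_mutex_t fake_rng_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct fake_rng *fake_current_rng;

static void fake_release(struct fake_rng *rng)
{
	rng->cleaned_up = 1;	/* rng->cleanup(rng) would run here */
}

/* Caller holds fake_rng_mutex. "current" owns one reference. */
static void set_current(struct fake_rng *rng)
{
	assert(rng != NULL);
	rng->ref++;
	fake_current_rng = rng;
}

/* Caller holds fake_rng_mutex. Drops the reference owned by "current". */
static void drop_current(void)
{
	if (!fake_current_rng)
		return;
	if (--fake_current_rng->ref == 0)
		fake_release(fake_current_rng);
	fake_current_rng = NULL;
}

/* Takes the mutex only long enough to grab a reference. */
static struct fake_rng *get_current(void)
{
	struct fake_rng *rng;

	pthread_mutex_lock(&fake_rng_mutex);
	rng = fake_current_rng;
	if (rng)
		rng->ref++;
	pthread_mutex_unlock(&fake_rng_mutex);
	return rng;
}

static void put_rng(struct fake_rng *rng)
{
	pthread_mutex_lock(&fake_rng_mutex);
	if (rng && --rng->ref == 0)
		fake_release(rng);
	pthread_mutex_unlock(&fake_rng_mutex);
}
```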

Signed-off-by: Rusty Russell <***@rustcorp.com.au>
---
drivers/char/hw_random/core.c | 135 ++++++++++++++++++++++++++++--------------
include/linux/hw_random.h | 2 +
2 files changed, 94 insertions(+), 43 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index b1b6042ad85c..dc9092a1075d 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -42,6 +42,7 @@
#include <linux/delay.h>
#include <linux/slab.h>
#include <linux/random.h>
+#include <linux/err.h>
#include <asm/uaccess.h>

@@ -91,6 +92,59 @@ static void add_early_randomness(struct hwrng *rng)
add_device_randomness(bytes, bytes_read);
}

+static inline void cleanup_rng(struct kref *kref)
+{
+ struct hwrng *rng = container_of(kref, struct hwrng, ref);
+
+ if (rng->cleanup)
+ rng->cleanup(rng);
+}
+
+static void set_current_rng(struct hwrng *rng)
+{
+ BUG_ON(!mutex_is_locked(&rng_mutex));
+ kref_get(&rng->ref);
+ current_rng = rng;
+}
+
+static void drop_current_rng(void)
+{
+ BUG_ON(!mutex_is_locked(&rng_mutex));
+ if (!current_rng)
+ return;
+
+ kref_put(&current_rng->ref, cleanup_rng);
+ current_rng = NULL;
+}
+
+/* Returns ERR_PTR(), NULL or refcounted hwrng */
+static struct hwrng *get_current_rng(void)
+{
+ struct hwrng *rng;
+
+ if (mutex_lock_interruptible(&rng_mutex))
+ return ERR_PTR(-ERESTARTSYS);
+
+ rng = current_rng;
+ if (rng)
+ kref_get(&rng->ref);
+
+ mutex_unlock(&rng_mutex);
+ return rng;
+}
+
+static void put_rng(struct hwrng *rng)
+{
+ /*
+ * Hold rng_mutex here so we serialize in case they set_current_rng
+ * on rng again immediately.
+ */
+ mutex_lock(&rng_mutex);
+ if (rng)
+ kref_put(&rng->ref, cleanup_rng);
+ mutex_unlock(&rng_mutex);
+}
+
static inline int hwrng_init(struct hwrng *rng)
{
if (rng->init) {
@@ -113,12 +167,6 @@ static inline int hwrng_init(struct hwrng *rng)
return 0;
}

-static inline void hwrng_cleanup(struct hwrng *rng)
-{
- if (rng && rng->cleanup)
- rng->cleanup(rng);
-}
-
static int rng_dev_open(struct inode *inode, struct file *filp)
{
/* enforce read-only access to this chrdev */
@@ -154,21 +202,22 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
ssize_t ret = 0;
int err = 0;
int bytes_read, len;
+ struct hwrng *rng;

while (size) {
- if (mutex_lock_interruptible(&rng_mutex)) {
- err = -ERESTARTSYS;
+ rng = get_current_rng();
+ if (IS_ERR(rng)) {
+ err = PTR_ERR(rng);
goto out;
}
-
- if (!current_rng) {
+ if (!rng) {
err = -ENODEV;
- goto out_unlock;
+ goto out;
}

mutex_lock(&reading_mutex);
if (!data_avail) {
- bytes_read = rng_get_data(current_rng, rng_buffer,
+ bytes_read = rng_get_data(rng, rng_buffer,
rng_buffer_size(),
!(filp->f_flags & O_NONBLOCK));
if (bytes_read < 0) {
@@ -200,7 +249,6 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
ret += len;
}

- mutex_unlock(&rng_mutex);
mutex_unlock(&reading_mutex);

if (need_resched())
@@ -210,15 +258,16 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
err = -ERESTARTSYS;
goto out;
}
+
+ put_rng(rng);
}
out:
return ret ? : err;
-out_unlock:
- mutex_unlock(&rng_mutex);
- goto out;
+
out_unlock_reading:
mutex_unlock(&reading_mutex);
- goto out_unlock;
+ put_rng(rng);
+ goto out;
}

@@ -257,8 +306,8 @@ static ssize_t hwrng_attr_current_store(struct device *dev,
err = hwrng_init(rng);
if (err)
break;
- hwrng_cleanup(current_rng);
- current_rng = rng;
+ drop_current_rng();
+ set_current_rng(rng);
err = 0;
break;
}
@@ -272,17 +321,15 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
- int err;
ssize_t ret;
- const char *name = "none";
+ struct hwrng *rng;

- err = mutex_lock_interruptible(&rng_mutex);
- if (err)
- return -ERESTARTSYS;
- if (current_rng)
- name = current_rng->name;
- ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
- mutex_unlock(&rng_mutex);
+ rng = get_current_rng();
+ if (IS_ERR(rng))
+ return PTR_ERR(rng);
+
+ ret = snprintf(buf, PAGE_SIZE, "%s\n", rng ? rng->name : "none");
+ put_rng(rng);

return ret;
}
@@ -357,12 +404,16 @@ static int hwrng_fillfn(void *unused)
long rc;

while (!kthread_should_stop()) {
- if (!current_rng)
+ struct hwrng *rng;
+
+ rng = get_current_rng();
+ if (IS_ERR(rng) || !rng)
break;
mutex_lock(&reading_mutex);
- rc = rng_get_data(current_rng, rng_fillbuf,
+ rc = rng_get_data(rng, rng_fillbuf,
rng_buffer_size(), 1);
mutex_unlock(&reading_mutex);
+ put_rng(rng);
if (rc <= 0) {
pr_warn("hwrng: no data available\n");
msleep_interruptible(10000);
@@ -423,14 +474,13 @@ int hwrng_register(struct hwrng *rng)
err = hwrng_init(rng);
if (err)
goto out_unlock;
- current_rng = rng;
+ set_current_rng(rng);
}
err = 0;
if (!old_rng) {
err = register_miscdev();
if (err) {
- hwrng_cleanup(rng);
- current_rng = NULL;
+ drop_current_rng();
goto out_unlock;
}
}
@@ -457,22 +507,21 @@ EXPORT_SYMBOL_GPL(hwrng_register);

void hwrng_unregister(struct hwrng *rng)
{
- int err;
-
mutex_lock(&rng_mutex);

list_del(&rng->list);
if (current_rng == rng) {
- hwrng_cleanup(rng);
- if (list_empty(&rng_list)) {
- current_rng = NULL;
- } else {
- current_rng = list_entry(rng_list.prev, struct hwrng, list);
- err = hwrng_init(current_rng);
- if (err)
- current_rng = NULL;
+ drop_current_rng();
+ if (!list_empty(&rng_list)) {
+ struct hwrng *tail;
+
+ tail = list_entry(rng_list.prev, struct hwrng, list);
+
+ if (hwrng_init(tail) == 0)
+ set_current_rng(tail);
}
}
+
if (list_empty(&rng_list)) {
unregister_miscdev();
if (hwrng_fill)
diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h
index 914bb08cd738..c212e71ea886 100644
--- a/include/linux/hw_random.h
+++ b/include/linux/hw_random.h
@@ -14,6 +14,7 @@

#include <linux/types.h>
#include <linux/list.h>
+#include <linux/kref.h>

/**
* struct hwrng - Hardware Random Number Generator driver
@@ -44,6 +45,7 @@ struct hwrng {

/* internal. */
struct list_head list;
+ struct kref ref;
};

/** Register a new Hardware Random Number Generator driver. */

--
1.9.1

Amos Kong

2014-09-18 12:22:26 UTC

Post by Rusty Russell
current_rng holds one reference, and we bump it every time we want
to do a read from it.
This means we only hold the rng_mutex to grab or drop a reference,
so accessing /sys/devices/virtual/misc/hw_random/rng_current doesn't
block on read of /dev/hwrng.
Using a kref is overkill (we're always under the rng_mutex), but
a standard pattern.
This also solves the problem that the hwrng_fillfn thread was
accessing current_rng without a lock, which could change (eg. to NULL)
underneath it.

Hi Rusty,

Post by Rusty Russell
---
drivers/char/hw_random/core.c | 135 ++++++++++++++++++++++++++++--------------
include/linux/hw_random.h | 2 +
2 files changed, 94 insertions(+), 43 deletions(-)

...

Post by Rusty Russell
static int rng_dev_open(struct inode *inode, struct file *filp)
{
/* enforce read-only access to this chrdev */
@@ -154,21 +202,22 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
ssize_t ret = 0;
int err = 0;
int bytes_read, len;
+ struct hwrng *rng;
while (size) {
- if (mutex_lock_interruptible(&rng_mutex)) {
- err = -ERESTARTSYS;
+ rng = get_current_rng();
+ if (IS_ERR(rng)) {
+ err = PTR_ERR(rng);
goto out;
}
-
- if (!current_rng) {
+ if (!rng) {
err = -ENODEV;
- goto out_unlock;
+ goto out;
}
mutex_lock(&reading_mutex);
if (!data_avail) {
- bytes_read = rng_get_data(current_rng, rng_buffer,
+ bytes_read = rng_get_data(rng, rng_buffer,
rng_buffer_size(),
!(filp->f_flags & O_NONBLOCK));
if (bytes_read < 0) {
@@ -200,7 +249,6 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
ret += len;
}
- mutex_unlock(&rng_mutex);
mutex_unlock(&reading_mutex);
if (need_resched())
@@ -210,15 +258,16 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
err = -ERESTARTSYS;

We need put_rng() in this error path. Otherwise, hot-unplug will hang
at the end of hwrng_unregister():

| /* Just in case rng is reading right now, wait. */
| wait_event(rng_done, atomic_read(&rng->ref.refcount) == 0);

Steps to reproduce the hang:
guest) # dd if=/dev/hwrng of=/dev/null
cancel dd process after 10 seconds
guest) # dd if=/dev/hwrng of=/dev/null &
hotunplug rng device from qemu monitor
result: the device can't be removed (it can still be seen in the QEMU monitor)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 96fa067..4e22d70 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -258,6 +258,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,

if (signal_pending(current)) {
err = -ERESTARTSYS;
+ put_rng(rng);
goto out;
}
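To see why the missing put_rng() leaks a reference, here is a hypothetical userspace model of the read loop: a plain counter stands in for the kref, and a `signal_at` parameter simulates signal_pending() becoming true. Every way out of an iteration must drop the reference taken at its start; without the put on the early-exit path the count never returns to its baseline, which is what keeps the refcount wait in hwrng_unregister() from finishing.

```c
#include <assert.h>

/* Hypothetical model: a plain counter stands in for the kref. */
struct fake_rng {
	int ref;
};

static void get_ref(struct fake_rng *rng)
{
	rng->ref++;			/* like get_current_rng() */
}

static void put_ref(struct fake_rng *rng)
{
	assert(rng->ref > 0);
	rng->ref--;			/* like put_rng() */
}

/*
 * Copies up to `chunks` chunks, taking and dropping one reference per
 * iteration. A "signal" is simulated right after chunk number
 * `signal_at` (pass -1 for none). Returns the number of chunks copied.
 */
static int read_chunks(struct fake_rng *rng, int chunks, int signal_at)
{
	int done = 0;

	while (done < chunks) {
		get_ref(rng);
		/* ... rng_get_data() and copy_to_user() would go here ... */
		done++;

		if (done == signal_at) {	/* like signal_pending(current) */
			put_ref(rng);		/* the fix: don't leak the ref */
			break;
		}

		put_ref(rng);
	}
	return done;
}
```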

Post by Rusty Russell
goto out;
}
+
+ put_rng(rng);
}
return ret ? : err;
- mutex_unlock(&rng_mutex);
- goto out;
+
mutex_unlock(&reading_mutex);
- goto out_unlock;
+ put_rng(rng);
+ goto out;
}
@@ -257,8 +306,8 @@ static ssize_t hwrng_attr_current_store(struct device *dev,
err = hwrng_init(rng);
if (err)
break;
- hwrng_cleanup(current_rng);
- current_rng = rng;
+ drop_current_rng();
+ set_current_rng(rng);
err = 0;
break;
}
@@ -272,17 +321,15 @@ static ssize_t hwrng_attr_current_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
- int err;
ssize_t ret;
- const char *name = "none";
+ struct hwrng *rng;
- err = mutex_lock_interruptible(&rng_mutex);
- if (err)
- return -ERESTARTSYS;
- if (current_rng)
- name = current_rng->name;
- ret = snprintf(buf, PAGE_SIZE, "%s\n", name);
- mutex_unlock(&rng_mutex);
+ rng = get_current_rng();
+ if (IS_ERR(rng))
+ return PTR_ERR(rng);
+
+ ret = snprintf(buf, PAGE_SIZE, "%s\n", rng ? rng->name : "none");
+ put_rng(rng);
return ret;
}
@@ -357,12 +404,16 @@ static int hwrng_fillfn(void *unused)
long rc;
while (!kthread_should_stop()) {
- if (!current_rng)
+ struct hwrng *rng;
+
+ rng = get_current_rng();
+ if (IS_ERR(rng) || !rng)
break;
mutex_lock(&reading_mutex);
- rc = rng_get_data(current_rng, rng_fillbuf,
+ rc = rng_get_data(rng, rng_fillbuf,
rng_buffer_size(), 1);
mutex_unlock(&reading_mutex);
+ put_rng(rng);

^^^
This put_rng() causes a deadlock; I describe it at the bottom.

Post by Rusty Russell
if (rc <= 0) {
pr_warn("hwrng: no data available\n");
msleep_interruptible(10000);
@@ -423,14 +474,13 @@ int hwrng_register(struct hwrng *rng)
err = hwrng_init(rng);
if (err)
goto out_unlock;
- current_rng = rng;
+ set_current_rng(rng);
}
err = 0;
if (!old_rng) {
err = register_miscdev();
if (err) {
- hwrng_cleanup(rng);
- current_rng = NULL;
+ drop_current_rng();
goto out_unlock;
}
}
@@ -457,22 +507,21 @@ EXPORT_SYMBOL_GPL(hwrng_register);
void hwrng_unregister(struct hwrng *rng)
{
- int err;
-
mutex_lock(&rng_mutex);
list_del(&rng->list);
if (current_rng == rng) {
- hwrng_cleanup(rng);
- if (list_empty(&rng_list)) {
- current_rng = NULL;
- } else {
- current_rng = list_entry(rng_list.prev, struct hwrng, list);
- err = hwrng_init(current_rng);
- if (err)
- current_rng = NULL;
+ drop_current_rng();
+ if (!list_empty(&rng_list)) {
+ struct hwrng *tail;
+
+ tail = list_entry(rng_list.prev, struct hwrng, list);
+
+ if (hwrng_init(tail) == 0)
+ set_current_rng(tail);
}
}
+
if (list_empty(&rng_list)) {
unregister_miscdev();
if (hwrng_fill)

Both hwrng_unregister() and put_rng() take rng_mutex. If
hwrng_unregister() holds the lock, hwrng_fillfn() blocks in put_rng()
waiting for it.

Right now, kthread_stop() is called with the lock held, and it wakes up
the fillfn thread and waits for it to exit:

| wake_up_process(k);
| wait_for_completion(&kthread->exited);

The fillfn thread can't exit while it is waiting for the lock, so the
two deadlock. The solution is to move kthread_stop() outside the lock.
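The same ordering rule can be sketched in userspace, assuming a pthread worker stands in for the hwrng_fillfn kthread, an atomic flag plus pthread_join() stand in for kthread_stop(), and a pthread mutex for rng_mutex (all names hypothetical). Shared state is updated under the mutex, the mutex is released, and only then is the worker stopped and joined.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: the worker (like hwrng_fillfn) takes the shared
   mutex (like rng_mutex via put_rng()) on every loop iteration. */
static pthread_mutex_t fake_rng_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_bool stop_requested;
static int iterations;		/* protected by fake_rng_mutex */

static void *fill_worker(void *unused)
{
	(void)unused;
	while (!atomic_load(&stop_requested)) {
		pthread_mutex_lock(&fake_rng_mutex);	/* like put_rng() */
		iterations++;
		pthread_mutex_unlock(&fake_rng_mutex);
	}
	return NULL;
}

/*
 * Like hwrng_unregister() with the fix applied. Joining while still
 * holding the mutex could deadlock: the worker may be blocked on that
 * mutex and would never get back to checking the stop request.
 */
static void unregister_and_stop(pthread_t worker)
{
	int err;

	pthread_mutex_lock(&fake_rng_mutex);
	/* ... list_del(), unregister_miscdev(), etc. would go here ... */
	pthread_mutex_unlock(&fake_rng_mutex);

	atomic_store(&stop_requested, true);	/* like kthread_stop() */
	err = pthread_join(worker, NULL);
	assert(err == 0);
	(void)err;
}
```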

@@ -524,11 +525,11 @@ void hwrng_unregister(struct hwrng *rng)

if (list_empty(&rng_list)) {
unregister_miscdev();
+ mutex_unlock(&rng_mutex);
if (hwrng_fill)
kthread_stop(hwrng_fill);
- }
-
- mutex_unlock(&rng_mutex);
+ } else
+ mutex_unlock(&rng_mutex);

/* Just in case rng is reading right now, wait. */
wait_event(rng_done, atomic_read(&rng->ref.refcount) == 0);

================
After applying my two additional fixes, both the cat hang and the
hot-unplug issue were resolved.

| test 0:
| hotunplug rng device from qemu monitor
|
| test 1:
| guest) # dd if=/dev/hwrng of=/dev/null &
| hotunplug rng device from qemu monitor
|
| test 2:
| guest) # dd if=/dev/random of=/dev/null &
| hotunplug rng device from qemu monitor
|
| test 4:
| guest) # dd if=/dev/hwrng of=/dev/null &
| cat /sys/devices/virtual/misc/hw_random/rng_*
|
| test 5:
| guest) # dd if=/dev/hwrng of=/dev/null
| cancel dd process after 10 seconds
| guest) # dd if=/dev/hwrng of=/dev/null &
| hotunplug rng device from qemu monitor
|
| test 6:
| use a FIFO as the rng backend and run tests 0 ~ 5 with no input to the FIFO

All tests passed :-)

I know you are about to start (or have already started) your holiday,
so I will post a v2 with my additional patches.

Thanks, Amos

Post by Rusty Russell
diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h
index 914bb08cd738..c212e71ea886 100644
--- a/include/linux/hw_random.h
+++ b/include/linux/hw_random.h
@@ -14,6 +14,7 @@
#include <linux/types.h>
#include <linux/list.h>
+#include <linux/kref.h>
/**
* struct hwrng - Hardware Random Number Generator driver
@@ -44,6 +45,7 @@ struct hwrng {
/* internal. */
struct list_head list;
+ struct kref ref;
};
/** Register a new Hardware Random Number Generator driver. */
--
1.9.1

Rusty Russell

2014-09-18 02:48:25 UTC

Interesting anti-pattern.

Signed-off-by: Rusty Russell <***@rustcorp.com.au>
---
drivers/char/hw_random/core.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index b4a21e9521cf..6a34feca6b43 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -472,14 +472,13 @@ int hwrng_register(struct hwrng *rng)
}

old_rng = current_rng;
+ err = 0;
if (!old_rng) {
err = hwrng_init(rng);
if (err)
goto out_unlock;
set_current_rng(rng);
- }
- err = 0;
- if (!old_rng) {
+
err = register_miscdev();
if (err) {
drop_current_rng();

--
1.9.1

Rusty Russell

2014-09-18 02:48:22 UTC

There's currently a big lock around everything, and it means that we
can't query sysfs (eg /sys/devices/virtual/misc/hw_random/rng_current)
while the rng is reading. This is a real problem when the rng is slow,
or blocked (e.g. virtio_rng with QEMU's default /dev/random backend).

This doesn't help by itself (it leaves the current lock untouched); it
just adds a lock to protect the read function and the static buffers,
in preparation for the transition.
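A rough sketch of where this split is heading, assuming pthread mutexes for rng_mutex and reading_mutex and hypothetical helper names: the slow device read holds only the reading lock, so a sysfs-style query that needs only the list lock is not blocked by it. (With this patch alone rng_mutex is still held across the read; the sketch shows the end state the split prepares for.)

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the two locks in core.c. */
static pthread_mutex_t fake_rng_mutex = PTHREAD_MUTEX_INITIALIZER;	/* list, current rng */
static pthread_mutex_t fake_reading_mutex = PTHREAD_MUTEX_INITIALIZER;	/* read path, buffers */

static const char *fake_current_name = "virtio_rng.0";
static unsigned char fake_buffer[32];

/* Slow path: only the reading lock is held while the "device" is read. */
static size_t fake_read(void)
{
	size_t i;

	pthread_mutex_lock(&fake_reading_mutex);
	for (i = 0; i < sizeof(fake_buffer); i++)
		fake_buffer[i] = (unsigned char)i;	/* pretend device data */
	pthread_mutex_unlock(&fake_reading_mutex);
	return sizeof(fake_buffer);
}

/* sysfs-style query: needs only the list lock, never the reading lock. */
static int fake_show_current(char *buf, size_t size)
{
	int ret;

	assert(buf != NULL && size > 0);
	pthread_mutex_lock(&fake_rng_mutex);
	ret = snprintf(buf, size, "%s\n", fake_current_name);
	pthread_mutex_unlock(&fake_rng_mutex);
	return ret;
}
```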

Signed-off-by: Rusty Russell <***@rustcorp.com.au>
---
drivers/char/hw_random/core.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index aa30a25c8d49..b1b6042ad85c 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -53,7 +53,10 @@
static struct hwrng *current_rng;
static struct task_struct *hwrng_fill;
static LIST_HEAD(rng_list);
+/* Protects rng_list and current_rng */
static DEFINE_MUTEX(rng_mutex);
+/* Protects rng read functions, data_avail, rng_buffer and rng_fillbuf */
+static DEFINE_MUTEX(reading_mutex);
static int data_avail;
static u8 *rng_buffer, *rng_fillbuf;
static unsigned short current_quality;
@@ -81,7 +84,9 @@ static void add_early_randomness(struct hwrng *rng)
unsigned char bytes[16];
int bytes_read;

+ mutex_lock(&reading_mutex);
bytes_read = rng_get_data(rng, bytes, sizeof(bytes), 1);
+ mutex_unlock(&reading_mutex);
if (bytes_read > 0)
add_device_randomness(bytes, bytes_read);
}
@@ -128,6 +133,7 @@ static inline int rng_get_data(struct hwrng *rng, u8 *buffer, size_t size,
int wait) {
int present;

+ BUG_ON(!mutex_is_locked(&reading_mutex));
if (rng->read)
return rng->read(rng, (void *)buffer, size, wait);

@@ -160,13 +166,14 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
goto out_unlock;
}

+ mutex_lock(&reading_mutex);
if (!data_avail) {
bytes_read = rng_get_data(current_rng, rng_buffer,
rng_buffer_size(),
!(filp->f_flags & O_NONBLOCK));
if (bytes_read < 0) {
err = bytes_read;
- goto out_unlock;
+ goto out_unlock_reading;
}
data_avail = bytes_read;
}
@@ -174,7 +181,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
if (!data_avail) {
if (filp->f_flags & O_NONBLOCK) {
err = -EAGAIN;
- goto out_unlock;
+ goto out_unlock_reading;
}
} else {
len = data_avail;
@@ -186,7 +193,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
if (copy_to_user(buf + ret, rng_buffer + data_avail,
len)) {
err = -EFAULT;
- goto out_unlock;
+ goto out_unlock_reading;
}

size -= len;
@@ -194,6 +201,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
}

mutex_unlock(&rng_mutex);
+ mutex_unlock(&reading_mutex);

if (need_resched())
schedule_timeout_interruptible(1);
@@ -208,6 +216,9 @@ out:
out_unlock:
mutex_unlock(&rng_mutex);
goto out;
+out_unlock_reading:
+ mutex_unlock(&reading_mutex);
+ goto out_unlock;
}

@@ -348,13 +359,16 @@ static int hwrng_fillfn(void *unused)
while (!kthread_should_stop()) {
if (!current_rng)
break;
+ mutex_lock(&reading_mutex);
rc = rng_get_data(current_rng, rng_fillbuf,
rng_buffer_size(), 1);
+ mutex_unlock(&reading_mutex);
if (rc <= 0) {
pr_warn("hwrng: no data available\n");
msleep_interruptible(10000);
continue;
}
+ /* Outside lock, sure, but y'know: randomness. */
add_hwgenerator_randomness((void *)rng_fillbuf, rc,
rc * current_quality * 8 >> 10);
}

--
1.9.1

Amos Kong

2014-09-18 12:47:45 UTC

Post by Rusty Russell

Post by Amos Kong
I started a QEMU (non-SMP) guest with one virtio-rng device and read
# dd if=/dev/hwrng of=/dev/null &
# cat /sys/class/misc/hw_random/rng_*
With a slow backend (5 k/s), the cat process always gets stuck; with a
fast backend (1.2 M/s), the cat process takes 1 to 2 minutes. The hang
does not occur in an SMP guest.
The read syscall enters the kernel and calls rng_dev_read(), which runs
in user context. We used need_resched() to check whether other tasks
need to run, but it almost always returns false, so the reader takes
the mutex lock again right away. The process accessing the attributes
never manages to get the lock, so cat gets stuck.
User context doesn't allow other user contexts to run on that CPU
unless the kernel code sleeps for some reason. This is why
need_resched() always returns false here.
This patch removes the need_resched() check and always schedules, so
that other tasks get a chance to take the lock and execute the
protected code.

Hi Rusty,

Post by Rusty Russell
OK, this is going to be a rant.
Your explanation doesn't make sense at all. Worse, your solution breaks
the advice of Kernighan & Plauger: "Don't patch bad code - rewrite
it."
But worst of all, this detailed explanation might have convinced me you
understood the problem better than I did, and applied your patch.

I'm sorry for the misleading explanation.

Post by Rusty Russell
I did some tests. For me, as expected, the process spends its time
inside the virtio rng read function, holding the mutex and thus blocking
sysfs access; it's not a failure of this code at all.

Got it now.

I found the cat hang while trying to fix the hot-unplug issue, which
can't be reproduced when I debug it with gdb or printk. So I forgot to
debug the cat hang ... and spent time misunderstanding the scheduler
code instead :(

Post by Rusty Russell
Your schedule_timeout() "fix" probably just helps by letting the host
refresh entropy, so we spend less time waiting in the read fn.
I will post a series, which unfortunately is only lightly tested, then
I'm going to have some beer to begin my holiday. That may help me
forget my disappointment at seeing respected fellow developers
monkey-patching random code they don't understand.

I just posted a V2 with two additional fixes; hot-unplug works well now :)

Post by Rusty Russell
Grrr....

Enjoy your holiday!
Amos

Post by Rusty Russell
Rusty.

Post by Amos Kong
---
drivers/char/hw_random/core.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index c591d7e..263a370 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -195,8 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
mutex_unlock(&rng_mutex);
- if (need_resched())
- schedule_timeout_interruptible(1);
+ schedule_timeout_interruptible(1);
if (signal_pending(current)) {
err = -ERESTARTSYS;
--
1.9.3

--
Amos.

Amos Kong

2014-09-15 16:02:29 UTC

This patch increases the schedule timeout to 10 jiffies, which is more
appropriate: other tasks can then take the mutex more easily.

Signed-off-by: Amos Kong <***@redhat.com>
---
drivers/char/hw_random/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 263a370..b5d1b6f 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -195,7 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,

mutex_unlock(&rng_mutex);

- schedule_timeout_interruptible(1);
+ schedule_timeout_interruptible(10);

if (signal_pending(current)) {
err = -ERESTARTSYS;

--
1.9.3

Michael Büsch

2014-09-15 16:13:31 UTC

On Tue, 16 Sep 2014 00:02:29 +0800

Post by Amos Kong
This patch increases the schedule timeout to 10 jiffies, which is more
appropriate: other tasks can then take the mutex more easily.
---
drivers/char/hw_random/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 263a370..b5d1b6f 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -195,7 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
mutex_unlock(&rng_mutex);
- schedule_timeout_interruptible(1);
+ schedule_timeout_interruptible(10);
if (signal_pending(current)) {
err = -ERESTARTSYS;

Does a schedule of 1 ms or 10 ms decrease the throughput?
I think we need some benchmarks.
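The timeout passed to `schedule_timeout_interruptible()` is counted in jiffies, not milliseconds. One jiffy is 1/HZ seconds, and HZ is a kernel build-time choice, so "1 ms or 10 ms" holds only at HZ=1000. In-kernel code would use `jiffies_to_msecs()` / `msecs_to_jiffies()`. The helper below is a hypothetical user-space illustration, and it assumes HZ values that divide 1000 evenly.

```c
#include <stdio.h>

/* Hypothetical helper: one jiffy lasts 1000 / HZ milliseconds.
 * Exact only when HZ divides 1000 (e.g. 100, 250, 1000). */
static unsigned int jiffies_to_ms(unsigned int jiffies, unsigned int hz)
{
	return jiffies * 1000u / hz;
}

static void print_timeouts(void)
{
	const unsigned int hz_values[] = { 100, 250, 1000 };

	for (size_t i = 0; i < sizeof(hz_values) / sizeof(hz_values[0]); i++)
		printf("HZ=%4u: 1 jiffy = %3u ms, 10 jiffies = %4u ms\n",
		       hz_values[i], jiffies_to_ms(1, hz_values[i]),
		       jiffies_to_ms(10, hz_values[i]));
}
```

At HZ=250, for example, the 10-jiffy timeout from this patch is a 40 ms sleep per call.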

--
Michael

Amos Kong

2014-09-16 00:27:40 UTC

Post by Michael Büsch
On Tue, 16 Sep 2014 00:02:29 +0800

Post by Amos Kong
This patch increases the schedule timeout to 10 jiffies, which is more
appropriate: other tasks can then take the mutex more easily.
---
drivers/char/hw_random/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 263a370..b5d1b6f 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -195,7 +195,7 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
mutex_unlock(&rng_mutex);
- schedule_timeout_interruptible(1);
+ schedule_timeout_interruptible(10);
if (signal_pending(current)) {
err = -ERESTARTSYS;

Does a schedule of 1 ms or 10 ms decrease the throughput?

In my test environment, 1 jiffy always works (100% of the time); as
suggested by Amit, 10 jiffies is more appropriate.

After applying the current 3 patches, there is a throughput regression:

1.2 M/s -> 6 K/s

We could schedule only at the end of the loop (size == 0), and only for
a non-SMP guest, so SMP guests won't be affected.

| if (!size && num_online_cpus() == 1)
| schedule_timeout_interruptible(timeout);
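Under that rule, a single read() call sleeps at most once, after its last chunk, and never on an SMP guest. The sketch below is a user-space model of the loop. It assumes each iteration takes the lock, handles one chunk and drops the lock. `model_read()` and its parameters are hypothetical, and the real fetch/copy work is reduced to a counter.

```c
#include <stddef.h>

/* Hypothetical model of the rng_dev_read() loop. Each iteration stands
 * for "lock, fetch/copy up to `chunk` bytes, unlock". Returns how many
 * times the reader would sleep under the proposed rule
 *     if (!size && num_online_cpus() == 1)
 *             schedule_timeout_interruptible(timeout);
 * and stores the number of lock/unlock cycles in *lock_cycles. */
static unsigned int model_read(size_t size, size_t chunk,
			       unsigned int online_cpus,
			       unsigned int *lock_cycles)
{
	unsigned int sleeps = 0;

	*lock_cycles = 0;
	while (size) {
		size_t n = size < chunk ? size : chunk;

		(*lock_cycles)++; /* one locked fetch/copy of n bytes */
		size -= n;

		/* Proposed rule: sleep only after the last chunk, only on UP. */
		if (!size && online_cpus == 1)
			sleeps++;
	}
	return sleeps;
}
```

With the unconditional `schedule_timeout_interruptible()` from PATCH 2/3, the reader sleeps once per loop iteration instead, which would be `lock_cycles` sleeps per read() call in this model.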

Set timeout to 1:
non-smp guest with quick backend (1.2M/s) -> about 49K/s

Set timeout to 10:
non-smp guest with quick backend (1.2M/s) -> about 490K/s

We might need other benchmarks to measure the performance, but we can
clearly see that the change causes a regression.

As we discussed in the other thread, need_resched() should work in this
case, so these patches might be the wrong fix.

Post by Michael Büsch
I think we need some benchmarks.
--
Michael

--
Amos.

Michael Büsch

2014-09-16 15:01:16 UTC

On Tue, 16 Sep 2014 08:27:40 +0800

Post by Amos Kong
non-smp guest with quick backend (1.2M/s) -> about 490K/s

That sounds like an awful lot. This is a 60% loss in throughput.
I don't think we can live with that.
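The loss figures follow directly from the numbers reported in this thread. The quick check below assumes decimal units (1.2 M/s = 1200 K/s); the function names are illustrative only.

```c
#include <stdio.h>

/* Fraction of throughput lost going from `before` to `after`;
 * both figures must use the same unit (here K/s). */
static double throughput_loss(double before, double after)
{
	return 1.0 - after / before;
}

/* Assumes 1.2 M/s == 1200 K/s (decimal units). */
static void print_losses(void)
{
	const double baseline = 1200.0;
	const double results[] = { 6.0, 49.0, 490.0 }; /* K/s, from the thread */

	for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++)
		printf("%6.1f K/s -> %.1f%% loss\n", results[i],
		       100.0 * throughput_loss(baseline, results[i]));
}
```

This gives losses of roughly 99.5%, 95.9% and 59.2%. With binary units (1.2 M/s = 1228.8 K/s), the 490 K/s case comes to about 60.1%, matching the 60% quoted above.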

--
Michael

Herbert Xu

2014-09-17 09:30:04 UTC

Post by Amos Kong
If we read hwrng with a long-running dd process, it takes too much CPU
time and holds the mutex almost all the time. When we then read the
hwrng attributes from sysfs with cat, it gets stuck waiting for the lock
to be released. The problem can only be reproduced in a non-SMP guest
with a slow backend.
This patchset resolves the issue by changing rng_dev_read() to always
schedule for 10 jiffies after releasing the mutex, so the cat process
has a chance to get the lock and execute the protected code without
getting stuck.

Sorry I'm not going to accept your fix which simply papers over
the problem.

Please bite the bullet and convert this over to RCU.
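One possible reading of this suggestion: sysfs readers such as `hwrng_attr_current_show()` would look up `current_rng` under `rcu_read_lock()` / `rcu_dereference()` instead of `rng_mutex`. Writers would publish changes with `rcu_assign_pointer()` and free old data only after `synchronize_rcu()` or `call_rcu()`. The kernel RCU API is not available in user space, so the sketch below models only the publish/subscribe half, using C11 atomics (an acquire load stands in for `rcu_dereference()`). It deliberately omits the grace-period reclamation that real RCU provides, and every name in it is hypothetical.

```c
#include <stdatomic.h>
#include <stdio.h>

struct rng_info {
	const char *name;
};

/* Hypothetical stand-in for an RCU-protected `current_rng` pointer. */
static _Atomic(struct rng_info *) current_info;

/* Writer side: the caller fully initializes new_info before publishing
 * it (the role rcu_assign_pointer() plays in the kernel). The previous
 * object is returned and must NOT be freed while readers may still use
 * it; real RCU waits for a grace period, which this model omits. */
static struct rng_info *publish_info(struct rng_info *new_info)
{
	return atomic_exchange_explicit(&current_info, new_info,
					memory_order_acq_rel);
}

/* Reader side: takes no lock, so a long-running reader of /dev/hwrng
 * cannot block it. Acquire ordering stands in for rcu_dereference(). */
static int show_current(char *buf, size_t len)
{
	struct rng_info *info =
		atomic_load_explicit(&current_info, memory_order_acquire);

	return snprintf(buf, len, "%s\n", info ? info->name : "none");
}
```

The hard part that this model leaves out is deciding when the old `struct rng_info` can be freed. Handling that is the reason to use the kernel's RCU primitives rather than plain atomics.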

Cheers,

--
Email: Herbert Xu <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Thread Navigation

Amos Kong 2014-09-15 16:02:26 UTC

Amos Kong 2014-09-15 16:02:27 UTC

Michael Büsch 2014-09-15 16:13:20 UTC

Amos Kong 2014-09-16 00:30:31 UTC

Amos Kong 2014-09-15 16:02:28 UTC

Rusty Russell 2014-09-18 02:43:08 UTC

Rusty Russell 2014-09-18 02:48:24 UTC

Herbert Xu 2014-10-21 14:15:23 UTC

Rusty Russell 2014-09-18 02:48:26 UTC

Rusty Russell 2014-09-18 02:48:23 UTC

Amos Kong 2014-09-18 12:22:26 UTC

Rusty Russell 2014-09-18 02:48:25 UTC

Rusty Russell 2014-09-18 02:48:22 UTC

Amos Kong 2014-09-18 12:47:45 UTC

Amos Kong 2014-09-15 16:02:29 UTC

Michael Büsch 2014-09-15 16:13:31 UTC

Amos Kong 2014-09-16 00:27:40 UTC

Michael Büsch 2014-09-16 15:01:16 UTC

Herbert Xu 2014-09-17 09:30:04 UTC
