author | Neil Horman <nhorman@openssl.org> | 2025-01-19 22:43:37 +0100 |
---|---|---|
committer | Tomas Mraz <tomas@openssl.org> | 2025-01-22 20:16:11 +0100 |
commit | fbd34c03e3ca94d3805e97a01defdf8b6037f61c (patch) | |
tree | 7d729270ddb85bb57c4444be53bb6760e1935367 /fuzz/corpora/client/77b4eb674ab814acdcdf74d15981358545997e1d | |
parent | byteorder.h: Fix MSVC compiler error C2371 (diff) | |
ppc64le occasionally still fails the threadstest on __rcu_torture
From several days of debugging, I think I've landed on the problem.
Occasionally, under high load, I observe the following pattern:
CPU0                                              CPU1
update_qp                                         get_hold_current_qp
  atomic_and_fetch(qp->users, ID_MASK, RELEASE)
                                                  atomic_add_fetch(qp->users, 1, RELEASE)
  atomic_or_fetch(qp->users, ID_VAL++, RELEASE)
When this pattern occurs, the atomic OR operation fails to see the value
published by CPU1, and when the OR-ed value is written back to RAM, the
increment made in get_hold_current_qp is overwritten. The hold that the reader
placed on the RCU lock is therefore lost, which allows the writer to complete
early and free memory before the reader is done reading it.
Why this is only observed on ppc64le I'm not sure, but it seems like a pretty
clear problem.
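The interleaving described above can be replayed deterministically in a single
thread. The sketch below is only an illustration of the symptom: it models the
OR as a separate read and write-back, which is an assumption made to expose the
lost update, not a statement about what the ppc64le hardware or compiler does.
The bit layout (DEMO_COUNT_MASK, DEMO_ID) and the function name are
hypothetical and are not taken from the OpenSSL sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: low 32 bits = reader count, high 32 bits = id. */
#define DEMO_COUNT_MASK UINT64_C(0x00000000ffffffff)
#define DEMO_ID(x)      ((uint64_t)(x) << 32)

/*
 * Replays the CPU0/CPU1 interleaving step by step on a plain variable.
 * Returns the reader count left in `users` once CPU0 has finished.
 */
static uint32_t demo_replay_lost_hold(void)
{
    uint64_t users = DEMO_ID(1);    /* id 1, no readers holding */
    uint64_t stale;

    /* CPU0: clear the id field, keep the reader count */
    users &= DEMO_COUNT_MASK;

    /* CPU0: first half of the OR, read the current value */
    stale = users;

    /* CPU1: the reader takes its hold */
    users += 1;

    /* CPU0: second half of the OR, write back a value built from `stale` */
    users = stale | DEMO_ID(2);

    return (uint32_t)(users & DEMO_COUNT_MASK);
}
```

In this replay the reader's hold is gone from the final value: the count is
back at zero even though CPU1 never released it.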
Fix it by implementing ATOMIC_COMPARE_EXCHANGE_N so that, on the write side in
update_qp, updates are only applied if the read side hasn't changed anything in
the meantime. If it has, retry the operation.
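A compare-and-exchange retry loop of the kind described here might look roughly
like the following. This is a minimal sketch using C11 <stdatomic.h> rather
than the ATOMIC_COMPARE_EXCHANGE_N macro itself. The bit layout
(DEMO_COUNT_MASK, DEMO_ID) and the helpers demo_set_id and demo_take_hold are
hypothetical names chosen for illustration, and the memory orderings are one
plausible choice, not necessarily those used in the actual patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: low 32 bits = reader count, high 32 bits = id. */
#define DEMO_COUNT_MASK UINT64_C(0x00000000ffffffff)
#define DEMO_ID(x)      ((uint64_t)(x) << 32)

/*
 * Writer side: replace the id field while preserving the reader count.
 * The store only succeeds if *users still holds the value we based our
 * computation on; otherwise `seen` is refreshed and we recompute.
 * Returns the value that was finally stored.
 */
static uint64_t demo_set_id(_Atomic uint64_t *users, uint32_t new_id)
{
    uint64_t seen = atomic_load_explicit(users, memory_order_relaxed);
    uint64_t want;

    do {
        want = (seen & DEMO_COUNT_MASK) | DEMO_ID(new_id);
        /* On failure the call writes the current value into `seen`. */
    } while (!atomic_compare_exchange_weak_explicit(users, &seen, want,
                                                    memory_order_release,
                                                    memory_order_relaxed));
    return want;
}

/* Reader side: take a hold by incrementing the reader count. */
static void demo_take_hold(_Atomic uint64_t *users)
{
    atomic_fetch_add_explicit(users, 1, memory_order_acquire);
}
```

Because the new value is always derived from the value the exchange actually
compared against, a reader increment that lands between the load and the store
makes the exchange fail and be retried instead of being overwritten.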
With this fix, I'm able to run the threads test overnight (4000 iterations and
counting) without failure.
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26478)