Hardware Synchronization
What Is Synchronization?
Synchronization means providing thread-safe access to shared data in a multi-threaded system.
Why Is Hardware Support Needed?
Software-only solutions are not sufficient!
// BROKEN! Race condition
int counter = 0;
// Thread 1 və Thread 2
counter++; // Read-Modify-Write - NOT atomic!
Atomic Operations
Atomic means indivisible: the operation completes entirely or not at all.
1. Atomic Read/Write
; x86
mov rax, [address] ; Aligned read - atomic
mov [address], rax ; Aligned write - atomic
Alignment matters:
- Aligned: atomic
- Unaligned: NOT atomic (may span two cache lines)
2. Atomic Increment/Decrement
; x86 - LOCK prefix
lock inc [address] ; Atomic increment
lock dec [address] ; Atomic decrement
lock add [address], 5 ; Atomic add
3. Atomic Exchange
; x86
xchg rax, [address] ; Swap rax <-> memory (atomic)
// C11/C++11
int old = atomic_exchange(&var, new_value);
Compare-And-Swap (CAS)
CAS is the most powerful atomic primitive.
Pseudo-code
bool CAS(int* address, int expected, int new_value) {
atomic {
if (*address == expected) {
*address = new_value;
return true;
} else {
return false;
}
}
}
x86 Implementation
; CMPXCHG instruction
mov eax, expected
mov ebx, new_value
lock cmpxchg [address], ebx
; If *address == eax: *address = ebx, ZF=1
; Else: eax = *address, ZF=0
C11/C++11
std::atomic<int> var(10);
int expected = 10;
int desired = 20;
// Compare expected with var, if equal set to desired
bool success = var.compare_exchange_strong(expected, desired);
if (success) {
// var is now 20
} else {
// var was not 10, expected now contains actual value
}
CAS Loop Pattern
void atomic_increment(std::atomic<int>& var) {
int old_val = var.load();
int new_val;
do {
new_val = old_val + 1;
} while (!var.compare_exchange_weak(old_val, new_val));
}
ABA Problem
Between a thread's read and its CAS, the value may change from A to B and back to A; the CAS then succeeds even though the state was modified in between.
Solution: a version counter or pointer tag that changes on every update
Test-And-Set
Used to implement spin locks.
bool test_and_set(bool* lock) {
atomic {
bool old = *lock;
*lock = true;
return old;
}
}
x86 Implementation
; Using XCHG
mov al, 1
xchg al, [lock]
; al now contains old value
Spin Lock
volatile bool lock = false;
void acquire_lock() {
while (test_and_set(&lock)) {
// Spin (busy wait)
}
}
void release_lock() {
    lock = false; // real code needs a release barrier/store here
}
Spin Lock Optimization
// Better: Test-and-test-and-set
void acquire_lock(volatile bool* lock) {
while (true) {
// First test (no bus lock)
while (*lock);
// Then test-and-set (bus lock)
if (!test_and_set(lock))
break;
}
}
Advantage: less bus traffic, since waiting threads spin on their local cache copy instead of issuing locked operations.
Load-Link / Store-Conditional (LL/SC)
Used on ARM, PowerPC, and RISC-V instead of CAS.
ARM64 Example
; Load-Exclusive / Store-Exclusive
retry:
ldxr w1, [x0] ; Load-exclusive from address x0
add w1, w1, #1 ; Increment
stxr w2, w1, [x0] ; Store-exclusive (w2=0 if success)
cbnz w2, retry ; If failed, retry
Semantics
int load_link(int* address) {
// Mark address as "linked"
return *address;
}
bool store_conditional(int* address, int value) {
if (link_still_valid()) {
*address = value;
return true;
}
return false;
}
Link breaks if:
- Another CPU modifies that cache line
- A context switch occurs
- An exception occurs
LL/SC vs CAS
| Feature | LL/SC | CAS |
|---|---|---|
| ABA problem | Immune | Susceptible |
| Hardware implementation | Simple | Complex |
| Flexibility | High | Low |
| Architecture | ARM, PowerPC, RISC-V | x86 |
Fetch-And-Add (FAA)
Atomically adds a value and returns the old value.
; x86
mov eax, 5
lock xadd [address], eax
; [address] += eax, eax = old value
// C11/C++11
int old = atomic_fetch_add(&counter, 5);
Lock-Free Counter
std::atomic<int> counter(0);
void increment() {
counter.fetch_add(1, std::memory_order_relaxed);
}
Memory Barriers
Memory barriers enforce the memory ordering that synchronization requires.
Types
1. Full Barrier
std::atomic_thread_fence(std::memory_order_seq_cst);
2. Acquire Barrier
std::atomic_thread_fence(std::memory_order_acquire);
3. Release Barrier
std::atomic_thread_fence(std::memory_order_release);
Producer-Consumer with Barriers
// Producer
data = compute();
std::atomic_thread_fence(std::memory_order_release);
ready.store(true, std::memory_order_relaxed);
// Consumer
while (!ready.load(std::memory_order_relaxed));
std::atomic_thread_fence(std::memory_order_acquire);
process(data);
Lock-Free Data Structures
Lock-Free Stack
template<typename T>
class LockFreeStack {
struct Node {
T data;
Node* next;
};
std::atomic<Node*> head;
public:
void push(T value) {
Node* new_node = new Node{value, nullptr};
new_node->next = head.load();
while (!head.compare_exchange_weak(new_node->next, new_node));
}
bool pop(T& result) {
Node* old_head = head.load();
while (old_head &&
!head.compare_exchange_weak(old_head, old_head->next));
if (old_head) {
result = old_head->data;
delete old_head; // unsafe without safe reclamation: ABA / use-after-free!
return true;
}
return false;
}
};
Lock-Free Queue
Michael-Scott Queue (MS-Queue):
template<typename T>
class LockFreeQueue {
struct Node {
T data;
std::atomic<Node*> next;
};
std::atomic<Node*> head;
std::atomic<Node*> tail;
public:
LockFreeQueue() {
Node* dummy = new Node{};
head.store(dummy);
tail.store(dummy);
}
void enqueue(T value) {
Node* new_node = new Node{value, nullptr};
while (true) {
Node* last = tail.load();
Node* next = last->next.load();
if (last == tail.load()) {
if (next == nullptr) {
if (last->next.compare_exchange_weak(next, new_node)) {
tail.compare_exchange_weak(last, new_node);
return;
}
} else {
tail.compare_exchange_weak(last, next);
}
}
}
}
bool dequeue(T& result) {
while (true) {
Node* first = head.load();
Node* last = tail.load();
Node* next = first->next.load();
if (first == head.load()) {
if (first == last) {
if (next == nullptr) return false;
tail.compare_exchange_weak(last, next);
} else {
result = next->data;
if (head.compare_exchange_weak(first, next)) {
delete first;
return true;
}
}
}
}
}
};
Hardware Transactional Memory (HTM)
HTM executes critical sections as atomic transactions in hardware.
Intel TSX (Transactional Synchronization Extensions)
// Hardware Lock Elision (HLE) - GCC __atomic builtins with HLE hints
while (__atomic_test_and_set(&lock, __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE)) {
_mm_pause();
}
// Critical section
__atomic_clear(&lock, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
// Restricted Transactional Memory (RTM)
unsigned status = _xbegin();
if (status == _XBEGIN_STARTED) {
// Transaction
if (abort_condition) {
_xabort(1);
}
_xend();
} else {
// Fallback to lock
pthread_mutex_lock(&mutex);
// Critical section
pthread_mutex_unlock(&mutex);
}
Advantages:
- Optimistic concurrency
- Higher throughput
Disadvantages:
- Transaction size limit
- Unpredictable aborts
- A non-transactional fallback path is required
Wait-Free vs Lock-Free
Lock-free: at least one thread always makes progress, but an individual thread may retry indefinitely. Wait-free: every thread completes its operation in a bounded number of steps.
Practical Examples
1. Atomic Flag (Spin Lock)
std::atomic_flag lock = ATOMIC_FLAG_INIT;
void acquire() {
while (lock.test_and_set(std::memory_order_acquire));
}
void release() {
lock.clear(std::memory_order_release);
}
2. Reference Counting
class RefCounted {
std::atomic<int> ref_count{1};
public:
void addRef() {
ref_count.fetch_add(1, std::memory_order_relaxed);
}
void release() {
if (ref_count.fetch_sub(1, std::memory_order_acq_rel) == 1) {
delete this;
}
}
};
3. Seqlock (Linux Kernel)
struct seqlock {
atomic_t sequence;
spinlock_t lock;
};
// Writer
void write_seqlock(seqlock_t* sl) {
spin_lock(&sl->lock);
sl->sequence++; // Odd = write in progress
smp_wmb();
}
void write_sequnlock(seqlock_t* sl) {
smp_wmb();
sl->sequence++; // Even = write complete
spin_unlock(&sl->lock);
}
// Reader (lock-free!)
do {
seq = sl->sequence;
smp_rmb();
// Read data
smp_rmb();
} while (seq & 1 || seq != sl->sequence);
Performance Considerations
1. Cache Line Contention
// BAD: False sharing
struct {
atomic<int> counter1;
atomic<int> counter2;
} counters; // Same cache line!
// GOOD: Padding
struct alignas(64) {
atomic<int> counter;
} counter1, counter2;
2. Memory Order Selection
// Relaxed: fastest, guarantees atomicity only (no ordering)
counter.fetch_add(1, memory_order_relaxed);
// Acquire/Release: Moderate
flag.store(true, memory_order_release);
while (!flag.load(memory_order_acquire));
// Seq_cst: Slowest, full synchronization
counter.fetch_add(1, memory_order_seq_cst);
3. Exponential Backoff
void spin_with_backoff(atomic_flag& lock) {
int backoff = 1;
while (lock.test_and_set(memory_order_acquire)) {
for (int i = 0; i < backoff; i++) {
_mm_pause(); // x86 pause instruction
}
backoff = min(backoff * 2, MAX_BACKOFF);
}
}
Debugging Synchronization
ThreadSanitizer (TSan)
# Compile with TSan
g++ -fsanitize=thread -g program.cpp
# Run
./a.out
# Reports data races
Helgrind (Valgrind)
valgrind --tool=helgrind ./program
Common Bugs
- Data race: Unsynchronized access
- Deadlock: Circular wait
- Livelock: Threads yield to each other
- ABA problem: CAS false positive
- Memory ordering bugs: Missing barriers
Best Practices
1. Use the standard library
std::mutex, std::atomic, std::lock_guard
2. Minimize critical sections
// Do work outside the lock
auto result = compute();
{
std::lock_guard<std::mutex> lock(mutex);
shared_data = result;
}
3. Prefer lock-free for simple cases
atomic<int> counter; // Instead of mutex + int
4. Document memory order
// Release-acquire pair for producer-consumer
flag.store(true, memory_order_release);
5. Test thoroughly
- Stress testing
- Thread sanitizers
- Multiple architectures
Related Topics
- Memory Ordering: Memory consistency models
- Cache Memory: Cache coherence
- Multiprocessor Systems: Shared memory
- Parallelism: Lock-free parallelism
- Performance: Synchronization overhead