Implementing Distributed Systems Concepts

Implementing Distributed Systems Concepts

Distributed systems are complex, but understanding key concepts and patterns makes them manageable. This guide covers practical implementation of consensus algorithms, distributed coordination, and fault tolerance.

Core Concepts

Consensus Algorithms

Consensus ensures multiple nodes agree on a single value, even in the presence of failures.

Raft Algorithm Implementation:

class RaftNode:
    def __init__(self, node_id):
        self.node_id = node_id
        self.state = "follower"
        self.current_term = 0
        self.voted_for = None
        self.log = []

    def start_election(self):
        self.state = "candidate"
        self.current_term += 1
        self.voted_for = self.node_id
        # Request votes from other nodes

Distributed Locks

Implement distributed locking using Redis:

import redis
import time
import uuid

class DistributedLock:
    def __init__(self, redis_client, key, timeout=10):
        self.redis = redis_client
        self.key = key
        self.timeout = timeout
        self.identifier = str(uuid.uuid4())

    def acquire(self):
        end = time.time() + self.timeout
        while time.time() < end:
            if self.redis.set(self.key, self.identifier, nx=True, ex=self.timeout):
                return True
            time.sleep(0.001)
        return False

    def release(self):
        pipe = self.redis.pipeline(True)
        while True:
            try:
                pipe.watch(self.key)
                if pipe.get(self.key) == self.identifier:
                    pipe.multi()
                    pipe.delete(self.key)
                    pipe.execute()
                    return True
                pipe.unwatch()
                break
            except redis.WatchError:
                pass
        return False

Fault Tolerance Patterns

Circuit Breaker

Prevent cascading failures:

import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failure_count = 0
        self.last_failure_time = None
        self.state = CircuitState.CLOSED

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN")

        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except Exception as e:
            self.on_failure()
            raise e

    def on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

Best Practices

  1. Design for Partial Failures: Always assume some components will be unavailable
  2. Implement Idempotency: Operations should be safe to retry
  3. Use Timeouts: Every network call should have a timeout
  4. Monitor and Alert: Track system health and performance metrics

Distributed systems require careful design and implementation, but these patterns provide a solid foundation for building resilient, scalable systems.