NodeSync: a Distributed Key-Value Store

Introduction

Distributed systems are defined less by what they do and more by what can go wrong. NodeSync is a distributed key-value store built to explore this reality firsthand. Instead of abstract algorithms, NodeSync focuses on concrete behaviors: node crashes, network delays, and leadership changes.

This article focuses on design decisions, trade-offs, and implementation insights gained while building NodeSync.

🔗 Project: https://github.com/MANISH-K-07/NodeSync

📘 Docs: https://manish-k-07.github.io/NodeSync/

Design Goals

NodeSync was designed with the following principles:

Minimal external dependencies
Explicit network communication
Clear separation of responsibilities
Emphasis on correctness over performance

The project intentionally avoids:

Frameworks that hide complexity
External coordination services
Advanced optimizations

This ensures that every behavior is visible and understandable.

Node Responsibilities

Each node in NodeSync acts independently.

Responsibilities include:

Managing local key-value state
Handling client commands
Monitoring peer health
Participating in leader election
Replicating writes when required

There is no special bootstrap node or controller.

3 nodes (example)

python nodes/node.py 5000 127.0.0.1:5001 127.0.0.1:5002
python nodes/node.py 5001 127.0.0.1:5000 127.0.0.1:5002
python nodes/node.py 5002 127.0.0.1:5000 127.0.0.1:5001

The node with the highest port becomes leader.

16:55:49 [NodeSync  ] Node 5000 running on 127.0.0.1:5000
16:55:49 [NodeSync  ] Peers: [('127.0.0.1', 5001), ('127.0.0.1', 5002)]
16:55:49 [ELECTION  ] New leader elected: 5002

Each node runs as an independent process.

The first argument is the node's port (used as its ID)
Remaining arguments are peer addresses

Client Interaction Model

Clients communicate with nodes over TCP using simple text-based commands.

Supported commands include:

SET key value
GET key
LEADER
CONSISTENCY strong | eventual

Clients are unaware of which node is leader. This transparency simplifies client logic but increases system responsibility.

Client Interaction (PowerShell Example)

$client = New-Object System.Net.Sockets.TcpClient("127.0.0.1", 5002)
$stream = $client.GetStream()
$writer = New-Object System.IO.StreamWriter($stream)
$reader = New-Object System.IO.StreamReader($stream)
$writer.AutoFlush = $true

$writer.WriteLine("SET hello world")
$reader.ReadLine()

$writer.WriteLine("GET hello")
$reader.ReadLine()

Forwarding Writes to the Leader

When a client sends a SET command to a follower:

The follower forwards the command to the leader
The leader executes the write
The response is relayed back to the client

This forwarding mechanism ensures:

Centralized write ordering
Reduced conflict risk
Predictable behavior under concurrency

Replication and Quorum

Replication is explicit and acknowledgment-based.

Key concepts:

Leader counts itself as one acknowledgment
Followers respond with ACK after applying updates
Strong consistency requires quorum
Eventual consistency does not

This design makes quorum mechanics observable rather than implicit.

Handling Failures

Failures are treated as first-class events.

NodeSync handles:

Peer crashes
Temporary unavailability
Recovery after failure

Behavior includes:

Marking peers DOWN on heartbeat failure
Logging recovery events
Re-electing leaders dynamically

These behaviors expose the instability that real systems must tolerate.

Example Failure Scenario

Below is a typical sequence when a leader node fails:

16:55:32 [NodeSync  ] Node 5001 running on 127.0.0.1:5001
16:55:32 [NodeSync  ] Peers: [('127.0.0.1', 5000), ('127.0.0.1', 5002)]
16:55:34 [ELECTION  ] New leader elected: 5002
16:56:03 [FAILURE   ] Peer ('127.0.0.1', 5002) is DOWN
16:56:03 [ELECTION  ] New leader elected: 5001

This sequence demonstrates:

Failure detection via heartbeat timeout
Automatic leader re-election
Continued system operation without manual intervention

Consistency as a Runtime Choice

Allowing consistency to be changed at runtime provides a powerful learning tool.

It demonstrates:

How availability changes under failures
How write guarantees affect success rates
Why system designers must choose carefully

Few features made the trade-offs as tangible as this one.

Available Modes

Eventual consistency (default)
- Writes are replicated asynchronously
- Lower latency
- Temporary inconsistencies may occur
Strong consistency (quorum-based)
- Leader waits for acknowledgements from a majority of nodes
- Higher latency
- Linearizable writes as long as quorum is available

Switching Consistency at Runtime

Consistency can be changed dynamically using a client command:

CONSISTENCY eventual
CONSISTENCY strong

The consistency mode applies to the node receiving the command (typically the leader).

If quorum cannot be reached in strong mode, the write fails with:

FAIL: quorum not reached

This enables direct comparison of consistency-performance trade-offs.

Observability as a Design Requirement

Logging was not added as an afterthought.

NodeSync logs are:

Structured
Timestamped
Human-readable
Event-driven

This made debugging distributed behavior feasible.

Log Format

Each log entry follows the format:

HH:MM:SS [TAG ] message

Example:

17:09:47 [NodeSync  ] Node 5000 running on 127.0.0.1:5000

What This Project Taught Me

Key takeaways include:

Distributed systems are dominated by edge cases
Failure handling outweighs happy-path logic
Simplicity improves correctness
Design trade-offs are unavoidable

Most importantly, building NodeSync changed how I read research papers and system designs. Abstract ideas now map to real behaviors.

Closing Thoughts

NodeSync is a learning system, not a production system. But its value lies precisely in that limitation. By stripping away abstraction layers, it exposes the true nature of distributed systems.

For anyone studying systems engineering or distributed computing, building a project like NodeSync is an invaluable experience.

👉https://github.com/MANISH-K-07/NodeSync

👨‍💻About the Author

I’m Manish Krishna Kandrakota — a computer science undergraduate exploring distributed systems, backend infrastructure, and systems-level engineering. I build projects from scratch to understand how real-world systems handle failures, coordination, and consistency. I plan to pursue graduate studies in computer science with a focus on research-driven software engineering and security.

My recent work includes distributed storage systems, static analysis tools, and security-focused software projects. I write to document my learning and share practical insights from building complex systems hands-on. I have built multiple static analysis and security-focused tools, including CodeChecker, PyScope, and SecureFlow, with an emphasis on clarity of design, explicit trade-offs, and academic relevance.

🔗 GitHub: MANISH-K-07
🔗 LinkedIn: manish-k-kandrakota

Designing NodeSync: A Distributed Key-Value Store with Replication, Consensus, and Failure Handling

Introduction

Design Goals

Node Responsibilities

3 nodes (example)

Client Interaction Model

Client Interaction (PowerShell Example)

Forwarding Writes to the Leader

Replication and Quorum

Handling Failures

Example Failure Scenario

Consistency as a Runtime Choice

Available Modes

Switching Consistency at Runtime

Observability as a Design Requirement

Log Format

What This Project Taught Me

Closing Thoughts

👉https://github.com/MANISH-K-07/NodeSync

👨‍💻About the Author

Comments

More from this blog

ModelTrace: A Research-Grade Neural Network Inspection Framework in PyTorch

Idea behind Py2C: Designing a Python-to-C Compiler with IR, Optimizations, and Clean Code Generation

Project SecureFlow: Designing a Lightweight Static Taint Analyzer Using JavaParser

Inside PyScope: Designing a Python Profiler with Built-In Regression Detection

Command Palette

Introduction

Design Goals

Node Responsibilities

3 nodes (example)

Client Interaction Model

Client Interaction (PowerShell Example)

Forwarding Writes to the Leader

Replication and Quorum

Handling Failures

Example Failure Scenario

Consistency as a Runtime Choice

Available Modes

Switching Consistency at Runtime

Observability as a Design Requirement

Log Format

What This Project Taught Me

Closing Thoughts

👉https://github.com/MANISH-K-07/NodeSync

👨‍💻About the Author

Comments

More from this blog