Scale Up

Distributing Traffic with Load Balancers

Handling Traffic Like a Pro

As web applications grow, a single server can't handle the load. Load balancing distributes incoming network traffic across multiple servers, ensuring scalability, high availability, and better performance.

Understanding Load Balancing Concepts

What is Load Balancing & Why Use It?

Topic: Fundamentals

Arrows splitting from one path into multiple paths, representing traffic distribution

A load balancer acts as a reverse proxy, sitting in front of your application servers and distributing client requests across all servers capable of fulfilling those requests. Instead of clients connecting directly to a specific backend server, they connect to the load balancer's IP address.

Key Benefits:

  • Scalability: Allows you to handle more traffic by adding more backend servers (horizontal scaling) without changing the entry point for clients.
  • High Availability / Fault Tolerance: If one backend server fails, the load balancer detects this (via health checks) and redirects traffic to the remaining healthy servers, minimizing downtime.
  • Improved Performance: Distributes the load, preventing any single server from becoming overwhelmed, leading to faster response times.
  • Increased Reliability: Provides redundancy for your application.
  • Flexibility: Backend servers can be temporarily taken out of the pool for maintenance without interrupting service.

How Load Balancers Work: The Basics

Topic: Mechanism

Abstract network diagram showing connections between nodes, representing data flow through a load balancer

Conceptually, a load balancer manages a pool of backend servers. Here's a simplified flow:

  1. A client sends a request to the load balancer's public IP address (often called a Virtual IP or VIP).
  2. The load balancer receives the request.
  3. It determines which backend server in its pool should handle the request based on a configured algorithm and server health.
  4. The load balancer forwards the request to the chosen healthy backend server.
  5. The backend server processes the request and sends the response back to the load balancer.
  6. The load balancer relays the response back to the original client.

To the client, it appears they are communicating with a single, highly responsive server. The load balancer transparently handles the complexity of distributing requests across the backend pool.
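To make the flow concrete, below is a minimal sketch of steps 1-6 as a tiny round-robin HTTP forwarder in Python. The backend addresses, the listening port, and the use of round robin are assumptions for illustration; a real load balancer also handles health checks, connection reuse, TLS termination, and methods other than GET.

```python
# Minimal round-robin HTTP forwarding sketch, illustrating the flow above.
# Backend addresses are placeholders; error handling and health checks are omitted.
import itertools
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKENDS = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]  # assumed pool
pool = itertools.cycle(BACKENDS)  # step 3: pick a server (round robin here)

class LoadBalancerHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = next(pool)                      # choose a healthy backend
        upstream = backend + self.path
        with urllib.request.urlopen(upstream) as resp:  # step 4: forward the request
            body = resp.read()                    # step 5: backend responds
            status = resp.status
        self.send_response(status)                # step 6: relay to the client
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Clients connect only to this address (the "VIP" in step 1).
    HTTPServer(("0.0.0.0", 8080), LoadBalancerHandler).serve_forever()
```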

Common Load Balancing Algorithms

Topic: Distribution Methods

Abstract geometric shapes suggesting different distribution patterns or algorithms

Load balancers use algorithms to decide where to send the next request. Common ones include:

  • Round Robin: Distributes requests sequentially across the list of available servers. Simple and predictable, but doesn't account for server load or varying capacity.
  • Least Connections: Directs traffic to the server with the fewest active connections. This is often a better choice for balancing load when request processing times vary significantly.
  • Least Response Time: Sends requests to the server that is currently responding the fastest (requires monitoring response times).
  • IP Hash: Calculates a hash based on the client's IP address (and sometimes port) to determine the target server. This ensures that requests from the same client consistently go to the same server (useful for sticky sessions, but can lead to uneven load).
  • Weighted Variations: Allows assigning different weights to servers, sending more traffic to more powerful machines (e.g., Weighted Round Robin, Weighted Least Connections).

The best algorithm depends on the application's traffic patterns and server capabilities.
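The selection logic itself is usually only a few lines. Below is a hedged sketch of how Round Robin, Least Connections, IP Hash, and Weighted Round Robin picks might look; the server names and weights are placeholders, and a production balancer would also track connection counts and health state per server.

```python
# Illustrative selection strategies over an assumed list of backends.
import hashlib
import itertools

servers = ["app-1", "app-2", "app-3"]          # hypothetical backend names

# Round Robin: walk the list in order, wrapping around.
round_robin = itertools.cycle(servers)
def pick_round_robin():
    return next(round_robin)

# Least Connections: pick the server with the fewest active connections.
active_connections = {s: 0 for s in servers}   # updated as requests start/finish
def pick_least_connections():
    return min(active_connections, key=active_connections.get)

# IP Hash: hash the client IP so the same client maps to the same server.
def pick_ip_hash(client_ip: str):
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# Weighted Round Robin: repeat each server in proportion to its weight.
weights = {"app-1": 3, "app-2": 1, "app-3": 1}  # app-1 is the more powerful machine
weighted = itertools.cycle([s for s, w in weights.items() for _ in range(w)])
def pick_weighted_round_robin():
    return next(weighted)
```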

Health Checks: Ensuring Server Availability

Topic: Reliability

Code on screen with green indicators, representing successful health checks

Load balancers constantly monitor the health of the backend servers in their pool. This prevents traffic from being sent to unresponsive or failing servers.

How Health Checks Work:

  • The load balancer periodically sends a request (e.g., a simple TCP connection attempt, an HTTP HEAD request to a specific endpoint) to each server.
  • It waits for a valid response within a defined timeout period.
  • If a server fails a configurable number of consecutive health checks, the load balancer marks it as "unhealthy" and temporarily removes it from the pool of active servers.
  • Once a server starts passing health checks again, it's added back into rotation.

Properly configured health checks are critical for achieving high availability. Without them, the load balancer might continue sending traffic to dead servers, causing errors for users.
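A minimal active health-check loop might look like the sketch below. The /health endpoint, the three-failure threshold, and the probe interval are illustrative assumptions, not any particular product's defaults.

```python
# Sketch of an active health checker: probe each backend on a timer,
# mark it unhealthy after consecutive failures, restore it when it recovers.
import time
import urllib.request
from urllib.error import URLError

BACKENDS = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]
FAIL_THRESHOLD = 3        # consecutive failures before marking unhealthy (assumed)
CHECK_INTERVAL = 5        # seconds between probes (assumed)
TIMEOUT = 2               # seconds to wait for a valid response (assumed)

failures = {b: 0 for b in BACKENDS}
healthy = {b: True for b in BACKENDS}

def probe(backend: str) -> bool:
    """Send one probe; a timeout or non-2xx response counts as a failure."""
    try:
        with urllib.request.urlopen(backend + "/health", timeout=TIMEOUT) as resp:
            return 200 <= resp.status < 300
    except (URLError, OSError):
        return False

def run_health_checks():
    while True:
        for backend in BACKENDS:
            if probe(backend):
                failures[backend] = 0
                healthy[backend] = True            # back in rotation
            else:
                failures[backend] += 1
                if failures[backend] >= FAIL_THRESHOLD:
                    healthy[backend] = False       # removed from the active pool
        time.sleep(CHECK_INTERVAL)
```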

Sticky Sessions (Session Persistence)

Topic: State Management

Data charts showing consistent connections, representing sticky sessions

Some applications require that all requests from a specific user during a session are sent to the same backend server. This is called session persistence or "sticky sessions." This might be needed if the server stores session-specific data locally (e.g., items in a shopping cart before modern distributed session management).

Methods:

  • Source IP Affinity: Uses the client's IP address (as in the IP Hash algorithm) to route them to the same server consistently. Can be problematic if multiple users share an IP (NAT) or if a user's IP changes.
  • Cookie-Based Affinity: The load balancer inserts a special cookie into the first response to the client. Subsequent requests from that client include the cookie, allowing the load balancer to route them back to the original server. This is generally more reliable than IP-based affinity.

Trade-offs: While sometimes necessary, sticky sessions can interfere with even load distribution (one server might get overloaded with sticky users) and complicate scaling or server maintenance. Modern applications often strive for stateless backend servers, storing session data externally (e.g., in Redis or a database) to avoid the need for stickiness.
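For illustration, here is a small sketch of cookie-based affinity: if the request carries a routing cookie, the balancer honours it; otherwise it picks a server and pins the session by setting the cookie on the response. The cookie name lb_affinity and the server list are hypothetical.

```python
# Sketch of cookie-based session affinity at the load balancer.
import random
from http.cookies import SimpleCookie
from typing import Optional, Tuple

SERVERS = ["app-1", "app-2", "app-3"]   # hypothetical backend pool

def choose_server(request_headers: dict) -> Tuple[str, Optional[str]]:
    """Return (server, Set-Cookie header or None) for one incoming request."""
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    if "lb_affinity" in cookie and cookie["lb_affinity"].value in SERVERS:
        # Sticky hit: route back to the server recorded in the cookie.
        return cookie["lb_affinity"].value, None
    # First request of the session: pick any server and pin it via a cookie.
    server = random.choice(SERVERS)
    set_cookie = f"lb_affinity={server}; Path=/; HttpOnly"
    return server, set_cookie
```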

Load Balancers in the Cloud (e.g., AWS ELB)

Topic: Cloud Services

Laptop showing cloud infrastructure diagram with AWS or similar icons

Cloud providers offer managed load balancing services that simplify setup, management, and scaling. AWS Elastic Load Balancing (ELB) is a prime example:

  • Application Load Balancer (ALB): Operates at Layer 7 (HTTP/HTTPS). Ideal for web applications and microservices. Offers advanced routing rules (path-based, host-based), target groups, integrated health checks, and security features.
  • Network Load Balancer (NLB): Operates at Layer 4 (TCP/UDP/TLS). Designed for extreme performance, high throughput, and low latency. Handles millions of requests per second. Preserves source IP address. Good for TCP-based applications, gaming, IoT.
  • Gateway Load Balancer (GWLB): Operates at Layer 3 (IP). Used to deploy, scale, and manage third-party virtual network appliances (like firewalls and intrusion detection systems).
  • Classic Load Balancer (CLB): Older generation, operates at Layer 4 or 7. AWS generally recommends ALB or NLB for new applications.

Managed services handle scaling the load balancer itself, integrating with other cloud services (like auto-scaling groups), and provide monitoring and logging, significantly reducing operational overhead compared to self-managing load balancers.
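As a rough illustration of how little plumbing a managed service requires, the sketch below uses boto3 to create a target group with health checks, register two instances, create an ALB, and attach an HTTP listener. The VPC, subnet, and instance IDs are placeholders, and a real setup would also attach security groups and, typically, an HTTPS listener with a certificate.

```python
# Hedged boto3 sketch: wire up an Application Load Balancer in front of two instances.
import boto3

elbv2 = boto3.client("elbv2")

# Target group: the pool of backends plus its health-check settings.
target_group = elbv2.create_target_group(
    Name="web-targets",
    Protocol="HTTP",
    Port=80,
    VpcId="vpc-0123456789abcdef0",            # placeholder VPC ID
    HealthCheckPath="/health",                # assumed health endpoint
    HealthyThresholdCount=2,
    UnhealthyThresholdCount=3,
)
tg_arn = target_group["TargetGroups"][0]["TargetGroupArn"]

# Register two backend instances with the target group (placeholder IDs).
elbv2.register_targets(
    TargetGroupArn=tg_arn,
    Targets=[{"Id": "i-0aaaaaaaaaaaaaaaa"}, {"Id": "i-0bbbbbbbbbbbbbbbb"}],
)

# The load balancer itself, spanning two subnets for availability.
lb = elbv2.create_load_balancer(
    Name="web-alb",
    Subnets=["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"],
    Type="application",
    Scheme="internet-facing",
)
lb_arn = lb["LoadBalancers"][0]["LoadBalancerArn"]

# Listener: forward HTTP traffic on port 80 to the target group.
elbv2.create_listener(
    LoadBalancerArn=lb_arn,
    Protocol="HTTP",
    Port=80,
    DefaultActions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
)
```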