Mission Critical Linux High-Availability Cluster Overview 9/12/2000 Tim Burke burke@missioncriticallinux.com

Agenda

Types of clusters
Linux cluster landscape
Mission Critical Linux clusters
Demo description
Demo

Types of Clusters

HPTC (High-Performance Technical Computing)
- Parallel decomposition of compute-intensive programs (e.g. weather modeling, seismic analysis, mathematical computations)
- Attributes:
  - Application modified to fit parallel computing paradigm (e.g. MPI, communication protocols, recovery)
  - Weak data integrity semantics
- Beowulf - 100s of nodes

Load-Balancing Clusters

Receive incoming client requests
Determine appropriate server (round robin, number of connections, system utilization metrics, static bindings)
Allow servers to migrate in/out of pool
Linux Virtual Server - (LVS)
- Highly effective for static web content. Deficient for dynamic content.
- TurboLinux Cluster - LVS derivative
- Red Hat High Availability Server - (formerly Piranha) Administrative GUI & packaging of LVS
- VA Linux UltraMonkey - packaging & docs on LVS
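
The server-selection policies listed above can be sketched in a few lines of Python. This is an illustration only, not LVS code: the names `Pool`, `round_robin`, `least_connections` and `dispatch` are hypothetical, and only two of the policies (round robin and number of connections) are modeled.

```python
from itertools import count


class Pool:
    """Hypothetical pool of real servers behind a load balancer."""

    def __init__(self, servers):
        # Map each server name to its current number of active connections.
        self.active = {name: 0 for name in servers}
        self._turn = count()

    def add(self, name):
        """Migrate a server into the pool."""
        self.active.setdefault(name, 0)

    def remove(self, name):
        """Migrate a server out of the pool."""
        self.active.pop(name, None)

    def round_robin(self):
        """Pick servers in rotation, regardless of load."""
        names = sorted(self.active)
        return names[next(self._turn) % len(names)]

    def least_connections(self):
        """Pick the server with the fewest active connections."""
        return min(sorted(self.active), key=lambda n: self.active[n])

    def dispatch(self, policy):
        """Choose a server with the given policy and count the connection."""
        name = policy()
        self.active[name] += 1
        return name
```

Calling `pool.dispatch(pool.round_robin)` rotates through the pool, while `pool.dispatch(pool.least_connections)` favors the least busy server. A real balancer would also decrement the count when a connection closes.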

Linux Virtual Server

High Availability (HA) Clusters

Goal: Combine multiple systems and peripherals to appear as a single system that remains operational in the event of component failure.
NSPF - No Single Point of Failure
History: Implemented by proprietary vendors, typically requiring custom hardware (e.g. VMS, TruCluster, SGI Failsafe, IBM Phoenix, NT Wolfpack).

Typical HA Cluster Lifecycle

Failover Clusters
- A single instance of the application runs on one cluster member at any point in time.
- Cluster members monitor each other’s health and take over running the application in the event of failure.
Parallel HA Clusters
- Application runs simultaneously on all cluster members for performance boost; requires application customization.
Single System Image
- Appears to be a single system (unified PID space, filesystem namespace). Typically requires no application customization.
The lean and nimble takeover.

Disaster Tolerant HA Clusters

Variant of High Availability clusters spanning geographical distance
- Campus outage (FibreChannel)
- Dedicated long-line links (e.g. T1)
Pros: survive site outage
Cons: high cost ($$$)
- distance
- bandwidth
- latency

Linux Cluster Landscape

Open Source Projects
LVS - Linux Virtual Server - load balancer typically used for web traffic dispatching.
Beowulf - High Performance Technical Computing
Linux-HA effort
- Collection of parts in varying states of completion
- Aspirations to cover failover & single system image
- Refocusing on porting SGI Failsafe to Linux

Linux Cluster Landscape (cont.)

High-Availability Cluster Products on the market
- MCLX – Convolo Cluster
- SteelEye – LifeKeeper (currently not safe for filesystems & databases)
Porting in Progress
- SGI Failsafe - will be open source
- HP MC/ServiceGuard
Also shipping: a number of weak products, susceptible to data corruption.

Role of the Failover Cluster

To ensure that a single instance of the application is only ever running on one cluster member at a time.
Why is “running on one member” crucial?
- Allows you to run “off the shelf” applications.
- Filesystems can only be mounted by one system.
- Databases typically run on one system at a time.
What happens when “run on one member” fails?
- Application runs on none of the cluster members
- Application concurrently runs on multiple cluster members -> data corruption ensues (weak data integrity guarantees).

Typical Failover Cluster Operation

Cluster members monitor each other’s health by heartbeating over multiple communication channels (network, serial, proprietary).
Start cluster services when the other member is down.
The hard part -- knowing when the other member is down. A credible commercial cluster offering must address:
- True system failure - system died, crashed, lost power
- Planned maintenance, clean shutdown
- Communications partition (e.g. network outage)
- System hangs (with subsequent resurrection)

Mission Critical Linux Cluster Attributes

The first credible Linux cluster
- Correct behavior in the face of all failure scenarios
- Provides strong data integrity guarantees in the event of failure
- Utilizes commodity hardware
- Distribution independent
- First to market with productized solution that ensures data integrity in the face of multiple points of failure

Mission Critical Linux Kimberlite Cluster Technology

Open Source (6/2000)
Complete high-availability failover infrastructure
Comprehensive documentation
Design specification

Mission Critical Linux Convolo Cluster

Fully supported product
Based on Kimberlite core
Binary RPM & Debian installers
GUI for configuration & monitoring
Boxed set (CD, docs)
$995 per node
90-day support

Mission Critical Linux Cluster

Mission Critical Linux Cluster Attributes

Strong membership
- Quorum disk-based algorithm
- Heartbeat channels
Strong data integrity
- Remote power switch
- Quorum disk-based shared state
Generic service infrastructure
System management GUI & CLI

Heartbeat Mechanism

Periodic polling
- network (Ethernet LAN & point-to-point)
- serial (point-to-point; not PPP)
Heartbeat node status
- based on full set of channels
- used as policy input to Quorum membership algorithm
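
A minimal sketch of deriving one node status from the full set of heartbeat channels follows. It is not Kimberlite code: the class `HeartbeatMonitor`, the channel names, and the status labels `UP`, `PARTIAL` and `SILENT` are assumptions made for illustration. The result is only a hint to be passed on to the membership algorithm, never a verdict that the peer is down.

```python
import time


class HeartbeatMonitor:
    """Tracks when a peer was last heard on each channel (hypothetical)."""

    def __init__(self, channels, timeout, clock=time.monotonic):
        self.timeout = timeout          # seconds of silence tolerated
        self.clock = clock              # injectable for testing
        self.last_seen = {ch: None for ch in channels}

    def record(self, channel):
        """Note that a heartbeat just arrived on this channel."""
        self.last_seen[channel] = self.clock()

    def channel_alive(self, channel):
        """A channel is alive if it was heard from within the timeout."""
        seen = self.last_seen[channel]
        return seen is not None and self.clock() - seen <= self.timeout

    def peer_status(self):
        """Summarize all channels into a single status for the peer."""
        alive = [ch for ch in self.last_seen if self.channel_alive(ch)]
        if len(alive) == len(self.last_seen):
            return "UP"
        if alive:
            return "PARTIAL"    # some channel failed, the peer is still heard
        return "SILENT"         # peer may be down, hung, or partitioned
```

A `SILENT` peer cannot be told apart from a communications partition by heartbeats alone, which is why the status feeds a further decision step rather than triggering failover directly.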

Quorum

Foundation of cluster membership
Uses 2 partitions on shared storage (shadowed)
Crucial 3rd “vote” in cluster membership decisions
Each member periodically updates its own state information and monitors the state of the other cluster members
Member cleanly marks DOWN state on shutdown
Disk access failure -> member removed from cluster
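
The quorum behavior described above can be sketched as follows. This is a simplified model under stated assumptions, not the Kimberlite on-disk format: the shared partition is replaced by an in-memory `QuorumDisk`, the shadow copy is not modeled, and the names `Member`, `view_of` and the status labels are hypothetical.

```python
class QuorumDisk:
    """In-memory stand-in for the shared quorum partition (hypothetical)."""

    def __init__(self):
        self.blocks = {}
        self.failed = False     # set True to simulate a disk access failure

    def write(self, member, record):
        if self.failed:
            raise OSError("quorum disk unreachable")
        self.blocks[member] = dict(record)

    def read(self, member):
        if self.failed:
            raise OSError("quorum disk unreachable")
        return self.blocks.get(member)


class Member:
    def __init__(self, name, disk, stale_after):
        self.name = name
        self.disk = disk
        self.stale_after = stale_after  # seconds before a stamp is stale
        self.in_cluster = True

    def update(self, now, state="UP"):
        """Periodically publish this member's own state."""
        try:
            self.disk.write(self.name, {"state": state, "stamp": now})
        except OSError:
            self.in_cluster = False     # disk access failure: leave cluster

    def shutdown(self, now):
        """Cleanly mark DOWN so the partner need not guess."""
        self.update(now, state="DOWN")
        self.in_cluster = False

    def view_of(self, peer, now):
        """Classify a peer from its last published record."""
        try:
            record = self.disk.read(peer)
        except OSError:
            self.in_cluster = False
            return "UNKNOWN"
        if record is None:
            return "UNKNOWN"
        if record["state"] == "DOWN":
            return "DOWN"
        if now - record["stamp"] > self.stale_after:
            return "SUSPECT"    # stopped updating: crashed or hung
        return "UP"
```

A peer that stops updating is only `SUSPECT` here, because a hung system may later resume and write again.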

Strong Data Integrity

Before performing service failover:
- Verify service was cleanly stopped
- Verify failed node is truly down
Remote Power Switch
- Serial connection to partner’s power switch
- Power cycle partner on failure
- Forms “I/O Barrier”
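
The two checks made before failover can be sketched as a single decision function. This is a hedged illustration, not the product's logic: `take_over`, `FakePowerSwitch`, the `partner_state` labels and the returned strings are all hypothetical, and a real switch would be driven over its serial connection.

```python
class FakePowerSwitch:
    """Stand-in for a serially attached remote power switch (hypothetical)."""

    def __init__(self, reachable=True):
        self.reachable = reachable
        self.cycles = 0

    def power_cycle(self):
        """Return True only if the partner was verifiably power cycled."""
        if not self.reachable:
            return False
        self.cycles += 1
        return True


def take_over(service, partner_state, power_switch, start_service):
    """Start `service` locally only when the partner cannot be running it."""
    if partner_state == "DOWN":
        # Partner recorded a clean shutdown, so the service was stopped.
        start_service(service)
        return "started after clean shutdown"
    # Partner may be dead, hung, or merely unreachable. Power cycling it
    # forms the I/O barrier: it cannot write to shared storage afterwards.
    if not power_switch.power_cycle():
        # Without the barrier, starting the service risks two instances.
        return "refused: partner state unverified"
    start_service(service)
    return "started after power cycle"
```

Refusing to start when the power switch cannot be reached trades availability for data integrity: the service runs on no member rather than risking concurrent writers.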

Shared JBOD Storage

Shared RAID Storage

Service Infrastructure

Service - application & associated data to be made highly available.
Service Resources - IP addresses, filesystems, disks
Service Properties - failover policy, preferred server
Service script - used to start & stop service
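
One way to picture a service definition is as a small record holding its resources, properties and script. The sketch below is an assumption for illustration, not the Kimberlite configuration schema: the field names, the `command` and `placement` helpers, and the convention that the script takes `start` or `stop` as its argument are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Service:
    name: str
    script: str                                  # start/stop script path
    ip_addresses: List[str] = field(default_factory=list)
    filesystems: List[str] = field(default_factory=list)
    preferred_member: Optional[str] = None       # failover policy input

    def command(self, action):
        """Build the argv used to start or stop the service."""
        if action not in ("start", "stop"):
            raise ValueError(f"unsupported action: {action}")
        return [self.script, action]

    def placement(self, members_up):
        """Pick where to run: the preferred member if up, else any other."""
        if not members_up:
            return None
        if self.preferred_member in members_up:
            return self.preferred_member
        return sorted(members_up)[0]
```

Giving two services different preferred members yields an active-active layout, while pointing both at the same member leaves the other as a hot standby.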

Example Services

Oracle, MySQL
NFS
Apache
User defined service - e.g. Panasonic Jukebox controller demo
Upcoming - Mail, Print, Samba, other databases

Multiple Concurrent Services

Separate NFS exports
Separate Oracle DB instances
Active-Active configuration
Hot-standby configuration

Cluster Configuration

Installation scripts prompt for initial parameters.
Web-based GUI used to define services and to monitor status.
Command line utility provides access to all configuration settings.
Defined subsystem configuration APIs.

Configuration GUI

References

MCLX Kimberlite - oss.missioncriticallinux.com
MCLX Convolo - www.missioncriticallinux.com/products/convolo
Beowulf clusters - www.beowulf.org
Linux-HA Project - www.linux-ha.org
Linux Virtual Server - www.linuxvirtualserver.org
TurboLinux Cluster - www.turbolinux.com
Red Hat High Availability Server - www.redhat.com/support/wpapers/piranha/x32.html

References (cont)

VA Linux UltraMonkey - ultramonkey.sourceforge.net
SGI Failsafe - www.linux-ha.org/LinuxFailSafe
SteelEye - www.steeleye.com

Demo Description

Complete e-commerce site
Front end LVS load balancing of HTTP traffic
Back-end Convolo Cluster for high availability Oracle database
Attributes
- Scalable
- No single point of failure