Mission Critical Linux
High-Availability Cluster Overview
9/12/2000
Tim Burke
burke@missioncriticallinux.com
Agenda
- Types of clusters
- Linux cluster landscape
- Mission Critical Linux clusters
- Demo description
- Demo
Types of Clusters
- HPTC (High-Performance Technical Computing)
- Parallel decomposition of compute-intensive programs (e.g. weather modeling, seismic analysis, mathematical computations)
- Attributes:
- Application modified to fit parallel computing paradigm (e.g. MPI, communication protocols, recovery)
- Weak data integrity semantics
- Beowulf - 100s of nodes
Load-Balancing Clusters
- Receive incoming client requests
- Determine appropriate server (round robin, number of connections, system utilization metrics, static bindings)
- Allow servers to migrate in/out of pool
- Linux Virtual Server - (LVS)
- Highly effective for static web content. Deficient for dynamic content.
- TurboLinux Cluster - LVS derivative
- Red Hat High Availability Server - (formerly Piranha) Administrative GUI & packaging of LVS
- VA Linux UltraMonkey - packaging & docs on LVS
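Two of the server-selection policies named above (round robin and number of connections) can be sketched in a few lines. This is an illustrative sketch only, not how LVS implements scheduling; the server names and the function names are hypothetical.

```python
from itertools import cycle


def round_robin(servers):
    """Return an endless iterator that hands out servers in fixed rotation."""
    return cycle(servers)


def least_connections(active):
    """Pick the server with the fewest active connections.

    `active` maps a server name to its current connection count.
    Names are sorted first so that ties are broken deterministically.
    """
    return min(sorted(active), key=lambda name: active[name])


pool = ["web1", "web2", "web3"]  # hypothetical server pool
rr = round_robin(pool)
first_four = [next(rr) for _ in range(4)]

connections = {"web1": 12, "web2": 3, "web3": 7}  # made-up load figures
chosen = least_connections(connections)
```

Round robin needs no feedback from the servers, while the connection-count policy needs the dispatcher to track per-server state.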
Linux Virtual Server
High Availability (HA) Clusters
- Goal: Combine multiple systems and peripherals to appear as a single system that remains operational in the event of component failure.
- NSPF - No Single Point of Failure
- History: Implemented by proprietary vendors, typically requiring custom hardware (e.g. VMS, TruCluster, SGI Failsafe, IBM Phoenix, NT Wolfpack).
Typical HA Cluster Lifecycle
- Failover Clusters
- A single instance of the application runs on one cluster member at any point in time.
- Cluster members monitor each other's health and take over running the application in the event of failure.
- Parallel HA Clusters
- Application runs simultaneously on all cluster members for performance boost; requires application customization.
- Single System Image
- Appears to be a single system. (Unified PID space, filesystem namespace). Typically requires no application customization.
- The lean and nimble takeover.
Disaster Tolerant HA Clusters
- Variant of High Availability clusters spanning geographical distance
- Campus outage (FibreChannel)
- Dedicated long-line links (e.g. T1)
- Pros: survive site outage
- Cons: $$$$$$$$$$$$$
- distance
- bandwidth
- latency
Linux Cluster Landscape
- Open Source Projects
- LVS - Linux Virtual Server - load balancer typically used for web traffic dispatching.
- Beowulf - High Performance Technical Computing
- Linux-HA effort
- Collection of parts in varying states of completion
- Aspirations to cover failover & single system image
- Refocusing on porting SGI Failsafe to Linux
Linux Cluster Landscape (cont.)
- High-Availability Cluster Products on the market
- MCLX Convolo Cluster
- SteelEye LifeKeeper (currently not safe for filesystems & databases)
- Porting in Progress
- SGI Failsafe - will be open source
- HP MC/ServiceGuard
- Also shipping: a number of weak products, susceptible to data corruption.
Role of the Failover Cluster
- To ensure that a single instance of the application is only ever running on one cluster member at a time.
- Why is running on one member crucial?
- Allows you to run off the shelf applications.
- Filesystems can only be mounted by one system.
- Databases typically run on one system at a time.
- What happens when the run-on-one-member guarantee fails?
- Application runs on none of the cluster members
- Application concurrently runs on multiple cluster members -> data corruption ensues (weak data integrity guarantees).
Typical Failover Cluster Operation
- Cluster members monitor each other's health by heartbeating over multiple communication channels (network, serial, proprietary).
- Start cluster services when the other member is down.
- The hard part -- knowing when the other member is down. A credible commercial cluster offering must address:
- True system failure - system died, crashed, lost power
- Planned maintenance, clean shutdown
- Communications partition (e.g. network outage)
- System hangs (with subsequent resurrection)
Mission Critical Linux
Cluster Attributes
- The first credible Linux cluster
- Correct behavior in the face of all failure scenarios
- Provides strong data integrity guarantees in the event of failure
- Utilizes commodity hardware
- Distribution independent
- First to market with productized solution that ensures data integrity in the face of multiple points of failure
Mission Critical Linux Kimberlite Cluster Technology
- Open Source (6/2000)
- Complete high-availability failover infrastructure
- Comprehensive documentation
- Design specification
Mission Critical Linux
Convolo Cluster
- Fully supported product
- Based on Kimberlite core
- Binary RPM & Debian installers
- GUI for configuration & monitoring
- Boxed set (CD, docs)
- $995 per node
- 90-day support
Mission Critical Linux Cluster
Mission Critical Linux
Cluster Attributes
- Strong membership
- Quorum disk-based algorithm
- Heartbeat channels
- Strong data integrity
- Remote power switch
- Quorum disk-based shared state
- Generic service infrastructure
- System management GUI & CLI
Heartbeat Mechanism
- Periodic polling
- network (Ethernet LAN & point-to-point)
- serial (point-to-point; not PPP)
- Heartbeat node status
- based on full set of channels
- used as policy input to Quorum membership algorithm
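A minimal sketch of deriving one partner status from the full set of heartbeat channels, assuming each channel records the time of the last heartbeat it received. The class, the channel names, and the status labels (UP, DEGRADED, SUSPECT) are hypothetical and are not taken from Kimberlite.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat seen from the partner on each channel."""

    def __init__(self, channels, timeout):
        self.timeout = timeout
        self.last_seen = {name: None for name in channels}

    def record(self, channel, now=None):
        """Note that a heartbeat arrived on `channel`."""
        self.last_seen[channel] = time.monotonic() if now is None else now

    def channel_up(self, channel, now):
        seen = self.last_seen[channel]
        return seen is not None and (now - seen) <= self.timeout

    def partner_status(self, now=None):
        """Combine all channels into one status for the membership layer."""
        now = time.monotonic() if now is None else now
        up = [c for c in self.last_seen if self.channel_up(c, now)]
        if len(up) == len(self.last_seen):
            return "UP"
        if up:
            return "DEGRADED"  # at least one channel still works
        return "SUSPECT"       # silent everywhere, but not proven down


mon = HeartbeatMonitor(["eth0", "serial0"], timeout=2.0)
mon.record("eth0", now=10.0)
mon.record("serial0", now=10.0)
```

Silence on every channel yields only SUSPECT here, since heartbeat status is an input to the membership decision rather than the decision itself.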
Quorum
- Foundation of cluster membership
- Uses 2 partitions on shared storage (shadowed)
- Crucial 3rd vote in cluster membership decisions
- Member periodically updates its own state information and monitors state of other cluster members
- Member cleanly marks DOWN state on shutdown
- Disk access failure -> member removed from cluster
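The periodic state update and monitoring described above might look roughly like the following, with an in-memory dictionary standing in for the shared quorum partition. The record layout, the state labels, and the staleness threshold are assumptions made for illustration; the actual on-disk format is not described here.

```python
from dataclasses import dataclass


@dataclass
class MemberRecord:
    state: str        # "UP" or "DOWN", written by the owning member
    timestamp: float  # last time the member refreshed its slot


class QuorumDisk:
    """In-memory stand-in for the shared quorum partition (hypothetical)."""

    def __init__(self):
        self.slots = {}

    def update(self, member, state, now):
        """Called periodically by each member for its own slot."""
        self.slots[member] = MemberRecord(state, now)

    def view_of(self, member, now, stale_after):
        """How another member interprets `member`'s slot."""
        rec = self.slots.get(member)
        if rec is None:
            return "UNKNOWN"
        if rec.state == "DOWN":
            return "CLEAN_DOWN"  # member marked itself down on shutdown
        if now - rec.timestamp > stale_after:
            return "STALE"       # member stopped updating its slot
        return "UP"


disk = QuorumDisk()
disk.update("node1", "UP", now=100.0)
```

A clean shutdown and a stale slot are kept distinct because only the former proves that the member has stopped touching shared data.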
Strong Data Integrity
- Before performing service failover:
- Verify service was cleanly stopped
- Verify failed node is truly down
- Remote Power Switch
- Serial connection to partner's power switch
- Power cycle partner on failure
- Forms I/O Barrier
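The pre-failover checks above can be read as a small decision function. This is a sketch under stated assumptions: the partner state labels and the `power_cycle_partner` callable are hypothetical stand-ins for the quorum state and the remote power switch driver.

```python
def safe_to_take_over(partner_state, power_cycle_partner):
    """Return True only when the partner can no longer touch shared storage.

    partner_state: "CLEAN_DOWN", "UP", or "STALE" (hypothetical labels).
    power_cycle_partner: callable that power cycles the partner through
        the remote power switch and returns True on success.
    """
    if partner_state == "CLEAN_DOWN":
        # Service was cleanly stopped; no fencing needed.
        return True
    if partner_state == "UP":
        # Partner is alive and may be running the service.
        return False
    # Partner is silent: it may be dead, hung, or partitioned.
    # Power cycling it forms the I/O barrier before takeover.
    return bool(power_cycle_partner())


switch_calls = []


def fake_switch():
    """Test double for the power switch; records each invocation."""
    switch_calls.append("cycle")
    return True
```

If the power cycle cannot be confirmed, the function refuses takeover, because starting the service while the partner might still be writing risks data corruption.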
Shared JBOD Storage
Shared RAID Storage
Service Infrastructure
- Service - application & associated data to be made highly available.
- Service Resources - IP addresses, filesystems, disks
- Service Properties - failover policy, preferred server
- Service script - used to start & stop service
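As a rough illustration, a service definition with the parts listed above could be modeled as follows. The field names, paths, and addresses are invented for this sketch and do not reflect the actual Kimberlite or Convolo configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    script: str                                       # start/stop script
    ip_addresses: list = field(default_factory=list)  # service resources
    filesystems: list = field(default_factory=list)
    preferred_member: str = ""                        # service property

    def start_command(self):
        """Command line used to start the service on a member."""
        return [self.script, "start"]

    def stop_command(self):
        """Command line used to stop the service on a member."""
        return [self.script, "stop"]


# Hypothetical example values.
nfs = Service(
    name="nfs_home",
    script="/etc/cluster/nfs_home.sh",
    ip_addresses=["10.0.0.50"],
    filesystems=["/dev/sdb1:/export/home"],
    preferred_member="node1",
)
```

Bundling the IP address and filesystems with the script is what lets the whole service move to another member as a unit.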
Example Services
- Oracle, MySQL
- NFS
- Apache
- User defined service - e.g. Panasonic Jukebox controller demo
- Upcoming - Mail, Print, Samba, other databases
Multiple Concurrent Services
- Separate NFS exports
- Separate Oracle DB instances
- Active-Active configuration
- Hot-standby configuration
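The difference between active-active and hot-standby comes down to where each service prefers to run. The sketch below assumes a simple policy (use the preferred member if it is up, otherwise the first surviving member); the function and the node names are hypothetical.

```python
def place_services(preferred, members_up):
    """Map each service to a running member.

    preferred: dict of service name -> preferred member.
    members_up: list of members currently up.
    Services are left unplaced when no member is up.
    """
    placement = {}
    for service, member in preferred.items():
        if member in members_up:
            placement[service] = member
        elif members_up:
            placement[service] = members_up[0]  # fail over to a survivor
    return placement


# Active-active: each member normally runs its own service.
active_active = {"nfs": "node1", "oracle": "node2"}

# Hot-standby: node2 is idle until node1 fails.
hot_standby = {"nfs": "node1", "oracle": "node1"}
```

In both layouts a single surviving member ends up running every service, so each member needs enough capacity for the combined load.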
Cluster Configuration
- Installation scripts prompt for initial parameters.
- Web-based GUI used to define services and to monitor status.
- Command line utility provides access to all configuration settings.
- Defined subsystem configuration APIs.
Configuration GUI
References
- MCLX Kimberlite - oss.missioncriticallinux.com
- MCLX Convolo - www.missioncriticallinux.com/products/convolo
- Beowulf clusters - www.beowulf.org
- Linux-HA Project - www.linux-ha.org
- Linux Virtual Server - www.linuxvirtualserver.org
- TurboLinux Cluster - www.turbolinux.com
- Red Hat High Availability Server - www.redhat.com/support/wpapers/piranha/x32.html
References (cont.)
- VA Linux UltraMonkey - ultramonkey.sourceforge.net
- SGI Failsafe - www.linux-ha.org/LinuxFailSafe
- SteelEye - www.steeleye.com
Demo Description
- Complete e-commerce site
- Front end LVS load balancing of HTTP traffic
- Back-end Convolo Cluster for high availability Oracle database
- Attributes
- Scalable
- No single point of failure