Set up a highly available cluster

The cluster module enables two stand-alone HSMX gateways to form a highly redundant cluster. One HSMX is set as the active node and regulates all traffic; its twin HSMX is the passive node and remains on stand-by. The machines communicate constantly with each other to operate as one.

In a healthy cluster, when the active node becomes unreachable, the passive node performs a smooth fail-over: the passive HSMX takes over the active node role and starts processing traffic. When the passive node fails, the active node logs this event but takes no further action. HSMX only fails back onto the original active node if that node comes back online and is healthier than its twin.

During the role switch, most subscriber-generated network traffic experiences little to no connection interruption, and most services used by subscribers transition without the user noticing.
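The fail-over and fail-back rules described above can be sketched as a small decision function. This is only an illustration and not the HSMX implementation: the `NodeView` class, the numeric health scores and the `next_role` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class NodeView:
    """What one node knows when it evaluates its role (hypothetical model)."""
    role: str             # "active" or "passive"
    peer_reachable: bool  # can this node still reach its twin?
    own_health: int       # assumed health score, higher is healthier
    peer_health: int      # assumed health score reported by the twin


def next_role(view: NodeView) -> str:
    """Return the role this node should hold after evaluating its twin."""
    if not view.peer_reachable:
        # Passive node: the active node is gone, so take over (fail-over).
        # Active node: a failed passive node is only logged, the role is kept.
        return "active"
    if view.role == "active" and view.peer_health > view.own_health:
        # Fail-back: hand over only to a reachable twin that is healthier.
        return "passive"
    if view.role == "passive" and view.own_health > view.peer_health:
        # Counterpart of the rule above, seen from the returning node.
        return "active"
    return view.role
```

With equal health scores neither node changes role in this sketch, which avoids needless role switches.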

Architecture

Shared IP

A virtual IP is an IP that will always point to the active node. This comes in handy when you want to use the HSMX API or if you simply want to connect to the active gateway.

The HSMX cluster nodes need to communicate with each other to behave as a single entity. Two IP-addresses can be configured for this communication. It is recommended to dedicate one network interface solely to cluster communication and to use the WAN-interface as the fallback cluster interface.

Tip: The cluster link may be routed, although this is not best practice.

Subscriber Networks

Both nodes should be connected to the same subscriber network (segment/subnet). The HSMX only activates the subscriber network IP-address on the active node and, upon a node switch, issues a gratuitous ARP (GARP) notification announcing that the new machine is active.
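To illustrate what such a GARP notification looks like on the wire, the sketch below builds a gratuitous ARP request as raw bytes. It does not reproduce how HSMX sends the frame; the MAC address and IP address are hypothetical example values, and `build_garp` is a name chosen for this example.

```python
import socket
import struct


def build_garp(mac: str, ip: str) -> bytes:
    """Build a gratuitous ARP request announcing that `ip` lives at `mac`."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    ethernet = broadcast + hw + struct.pack("!H", 0x0806)
    # ARP header: hardware type Ethernet (1), protocol IPv4 (0x0800),
    # address lengths 6 and 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Gratuitous: the sender IP and the target IP are the same address.
    arp += hw + addr + b"\x00" * 6 + addr
    return ethernet + arp


# Hypothetical MAC and subscriber network address, for illustration only.
frame = build_garp("02:00:00:00:00:01", "10.0.0.1")
print(len(frame))  # 42 bytes: 14 (Ethernet) + 28 (ARP)
```

Switches and clients that receive this broadcast update their tables so that the shared address maps to the MAC address of the newly active node.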

Configuration

Before setting up a cluster, make sure both machines are in a healthy state. Network configuration and firewall settings should be set on both machines. The cluster settings only have to be applied on one of the nodes. Two clustered HSMX nodes share the entire configuration except for the following items:

  • HSMX Languages
  • Licenses
  • Performance
  • SSL Certificates
  • Firewall

Basic network

Throughout this recipe the same network configuration is assumed. WAN-upstream connectivity runs through 172.20.0.1/22 on Port WAN (rename-interfaces) of both machines. The cluster link is set up on Port Cluster Node <N>, and the high-available subscriber network should be configured on both machines!

Firewall

The default firewall configuration is a good start for a single-node HSMX setup. For a cluster, we adjust the firewall settings to allow cluster-link communication. Two clustered HSMX machines have the following minimum cluster-link network requirements.

Port      Service                       Protocol
80/tcp    Web-management                HTTP
873/tcp   Cluster File Synchronization  RSYNC
5432/tcp  Cluster database              PSQL
5555/udp  Cluster communication         HEARTBEAT

Browse to Security → Firewall Settings → Add a new port and implement firewall rules to enable cluster communication. Once all rules are added, click Apply to restart the firewall.

  • Fill out a description for this port
  • Choose the correct Ethernet interface (e.g. Cluster Node <N>) or All
  • Direction: Incoming packet
  • Protocol: UDP or TCP depending on the rule (or All)
  • State: All for a stateless firewall configuration.
  • Port: 80, 873, 5432, 5555 or Any port

Tip: We recommend filling in the Source-IP. With the direction set to Incoming packet, this is the cluster-link IP of the neighbouring HSMX node.
Tip: Predefined rules are available when you add a port; they preset Direction, Protocol, Action, State and the correct Port for you, which speeds up the process.
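Once the rules are applied, the TCP ports from the requirements table can be probed from the other node. The following is a minimal sketch using only the Python standard library; it is not an HSMX tool, and the function names are chosen for this example. The 5555/udp heartbeat port is left out because a UDP port cannot be verified with a simple connect.

```python
import socket
from typing import Dict

# TCP ports taken from the cluster-link requirements table.
CLUSTER_TCP_PORTS = {
    80: "Web-management",
    873: "Cluster File Synchronization",
    5432: "Cluster database",
}


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_cluster_link(host: str) -> Dict[int, bool]:
    """Probe every required TCP port on the neighbouring node."""
    return {port: tcp_port_open(host, port) for port in CLUSTER_TCP_PORTS}
```

For example, calling `check_cluster_link("192.168.81.2")` from the node that holds 192.168.81.1 returns a mapping of port to reachability; any `False` entry points at a missing or misdirected firewall rule on the neighbouring node.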

Cluster settings

Now we are ready to set up our cluster network. This configuration only takes place on the node that is to become active. Browse to Network → Cluster Settings.

  • Virtual IPs
    This is the virtual IP-address shared on the WAN-interface of the HSMX nodes using VRRP. Configure an IP-address that's unused and within the network range of your WAN-interface.
    In this guide 172.20.0.155/24 is employed.
  • Synchronization IPs
    These addresses are used by the HSMX to communicate and synchronize state between the two gateways. They are the addresses configured in Network → Network configuration for the cluster-link (on a dedicated interface or shared with the regular WAN-interface). Optionally, a secondary backup interface can be configured so that a single network failure does not cause communication loss between the cluster participants.
    In this guide we only configure the cluster-link IP addresses 192.168.81.1 and 192.168.81.2.
  • Network Interfaces
    These are the IP-addresses that are only active on a single node at any moment. Typically little configuration is needed, as these addresses can be filled in for you by using the icon to retrieve the data of the other gateway.

Configure the IP-addresses and test the cluster status by clicking the Test connection button. The cluster status shows whether the two participating gateways can communicate properly using the configured communication IP addresses. When all is set and the connection is successful, you can activate the cluster.

Tip: You can simply press the icon on the right to retrieve the data of the other gateway automatically.
Note: PPPoE interfaces cannot be used for cluster configuration.
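Before entering a virtual IP, it can help to verify that the address lies within the WAN network and is not already taken. The sketch below does this with Python's `ipaddress` module. It is not part of HSMX; `check_virtual_ip` is a name chosen for this example, and the node addresses 172.20.0.151 and 172.20.0.152 are hypothetical.

```python
import ipaddress
from typing import List


def check_virtual_ip(virtual_ip: str, wan_interface: str,
                     used_ips: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the address looks usable."""
    problems = []
    vip = ipaddress.ip_address(virtual_ip)
    network = ipaddress.ip_interface(wan_interface).network
    if vip not in network:
        problems.append(f"{vip} is outside the WAN network {network}")
    elif vip in (network.network_address, network.broadcast_address):
        problems.append(f"{vip} is the network or broadcast address of {network}")
    if vip in {ipaddress.ip_address(ip) for ip in used_ips}:
        problems.append(f"{vip} is already in use")
    return problems


# 172.20.0.1/22 and 172.20.0.155 come from this guide; the two node
# addresses are hypothetical placeholders.
print(check_virtual_ip("172.20.0.155", "172.20.0.1/22",
                       ["172.20.0.1", "172.20.0.151", "172.20.0.152"]))  # []
```

The list of used addresses has to be supplied by hand in this sketch; it does not scan the network for conflicts.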

Cluster advanced settings

The cluster-link employs a four-state cluster status based on contact with the other node and its health status. These settings govern when automatic fail-over is triggered. Setting these values too low can result in an unstable cluster; setting them too high delays fail-over upon failure and thus causes down-time.

  • Ping interval
    The interval in which the ping commands are sent (in seconds).
  • Ping timeout
    The timeout before a ping command is marked as failed (in seconds).
  • Max failed pings
    Number of failed pings before the passive node (slave) becomes the active node (primary).
  • Ping pongs
    Number of pings before the system performs a health check of the other gateway.
  • Sleep after health check
    How long the scripts sleep after a health check (in seconds).
  • Health timeout
    The timeout before a health message command is marked as failed (in seconds).
  • Max failed healths
    Number of failed health checks before the passive node (slave) becomes the active node (primary).

Note: all time-related variables are in seconds; use a comma as the decimal separator to specify up to microsecond precision.
Note: make sure to keep these settings identical on both nodes.
Note: advanced cluster settings are applied near-immediately when the cluster is activated.
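To get a feeling for the trade-off between stability and fail-over delay, the ping settings can be combined into a rough estimate of how long an outage goes undetected. The sketch below assumes that every failed ping costs one interval plus one timeout; this is an assumption made for illustration, not the documented HSMX algorithm, and the example values are hypothetical.

```python
def parse_seconds(value: str) -> float:
    """Parse a setting such as "0,5", where a comma is the decimal separator."""
    return float(value.strip().replace(",", "."))


def worst_case_detection(ping_interval: str, ping_timeout: str,
                         max_failed_pings: int) -> float:
    """Estimate the seconds that pass before fail-over is triggered.

    Assumption: each failed ping costs one interval plus one timeout.
    """
    per_ping = parse_seconds(ping_interval) + parse_seconds(ping_timeout)
    return max_failed_pings * per_ping


# Hypothetical values: 1 s interval, 0,5 s timeout, 3 failed pings allowed.
print(worst_case_detection("1", "0,5", 3))  # 4.5
```

Lowering any of the three values shortens this window, at the price of reacting to short network hiccups that would not justify a role switch.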

What is synchronized

Everything is shared between two clustered HSMX machines except:

  • IP-addresses of the WAN-interfaces (all other interfaces are synchronized, as is the shared IP on the WAN-interfaces). Note: the default route is synced as well, so make sure to use the same default gateway on both nodes
  • Language
  • License
  • Performance
  • SSL Certificates
    • To renew SSL certificates while your cluster is running, install the certificate on the passive node first and reboot that machine. Once the machine has accepted its SSL certificate (after the reboot), make your cluster fail over. You can now install the certificate on the node that has become passive.
  • Firewall
  • Backups and their settings

Test

The cluster employs two methods to synchronize the system state. Database synchronization can easily be tested by creating a subscriber voucher on the active node and checking that it appears on the passive node. File synchronization can be tested by changing the admin user-account password in System → Access Control → Users. If you can successfully sign in on the passive node using the new credentials and the database is synchronized, your cluster is up and running.

Troubleshooting

Connection fails

  • Something may be wrong with your firewall settings:
    • Check network interface Port
    • Check the Source/Destination IP addresses
    • Check directionality of firewall rule (Incoming packet or Outgoing packet).
    • Check the State
    • Restart firewalls on both gateways

Subscriber cannot get online

  • Check if the virtual IP is filled in correctly (it must not be an IP-address that is already in use)
  • Check if the subnet of the virtual IP is filled in correctly
  • Check if the NAT settings are correct

Health: Synchronization

recipe/setup-a-cluster.txt · Last modified: 2021/06/03 14:40 (external edit)