best-practices

Architecture

The following should be read as guidelines or rules of thumb based on our collective experience. They are not hard limits or guarantees; they are what we think will perform well in terms of subscriber experience. The main design goal is to keep a balance between subscriber counts, subscriber networks and CPU cores. There is no theoretical maximum number of subscribers per network, but in our experience keeping it below 2500 per network ensures smooth operation, and around a thousand subscribers is preferable. The number of subscriber networks has a performance relationship with the number of CPU cores available: a subscriber network doesn't 'reserve' a CPU core, but we tend to provision between 0.5 and 2 CPU cores per network. Reducing the overbooking of the cores/network ratio (that is, providing more cores per network) gives you more headroom in subscriber count per network or allows you to employ subscriber-'heavy' features (e.g. non-URL-based content filtering). Idle subscriber networks, those with no connected subscribers, consume little performance and don't need to be accounted for.

Examples

  • HSMX-5000
    • 2x 2500
    • 4x 1250
    • 8x 625
    • 16x 313
  • HSMX-1000
    • 1x 1000
    • 2x 500
    • 4x 250
  • HSMX-100
    • 1x 100
    • 2x 50
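
The arithmetic behind these examples can be sketched in a few lines of Python. This is only an illustration of the rules of thumb above, not a tool that ships with the appliance: the function name `plan_networks` and the core count used in the usage line are assumptions, while the 2500-subscriber ceiling and the 0.5 to 2 cores-per-network range come from the text.

```python
import math

MAX_SUBSCRIBERS_PER_NETWORK = 2500  # guideline from the text, not a hard limit
MIN_CORES_PER_NETWORK = 0.5         # below this the cores are heavily overbooked
TYPICAL_MAX_CORES_PER_NETWORK = 2.0 # above this there is extra headroom


def plan_networks(total_subscribers: int, networks: int, cpu_cores: int) -> dict:
    """Split subscribers evenly over networks and compare with the guidelines."""
    if total_subscribers < 0 or networks < 1 or cpu_cores < 1:
        raise ValueError("need subscribers >= 0, networks >= 1 and cpu_cores >= 1")
    per_network = math.ceil(total_subscribers / networks)
    cores_per_network = cpu_cores / networks
    return {
        "subscribers_per_network": per_network,
        "cores_per_network": cores_per_network,
        "within_subscriber_guideline": per_network <= MAX_SUBSCRIBERS_PER_NETWORK,
        "overbooked": cores_per_network < MIN_CORES_PER_NETWORK,
        "extra_headroom": cores_per_network > TYPICAL_MAX_CORES_PER_NETWORK,
    }


if __name__ == "__main__":
    # cpu_cores=8 is an assumed value for illustration only.
    for count in (2, 4, 8, 16):
        print(count, plan_networks(5000, count, cpu_cores=8))
```

A plan flagged as `overbooked` is not forbidden; it simply leaves less room for high subscriber counts or 'heavy' features on each network.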

On smaller installations we typically see higher core/network ratios because there are only one or two subscriber networks. This allows them to deviate from the guideline and connect more subscribers to a single network or a few networks, since there are more CPU cores working the instances. Larger appliances such as the 5000 and upwards typically deploy between four and sixteen networks because they need to spread high subscriber counts across multiple subscriber networks and thus across the physical CPU cores.

Clusters

During a fail-over the failed machine is tasked with synchronizing its user data and bringing its network interfaces down. This is a sequential process in which the total number of subscriber networks largely dominates fail-over convergence time. At the same time the newly active machine brings its interfaces back online and signs in all active subscribers. The second main factor is therefore the number of active subscribers the system has to bring back online. The performance cost of bringing a single subscriber back online is determined by the presence of WAN load balancing, a network policy, a content filter, a QoS policy and one-to-one NAT policies. Currently we don't recommend deploying more than twenty-four subscriber networks on a clustered machine, to make sure a configuration change doesn't trigger a cluster fail-over.
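
As a rough aid when reviewing a clustered design, the two factors above can be written down as a small check. This is a sketch under assumptions: the function name `failover_review` and the feature labels are made up for illustration and are not HSMX identifiers. Only the limit of twenty-four networks and the list of cost-adding features come from the text.

```python
from __future__ import annotations

MAX_CLUSTER_NETWORKS = 24  # recommendation from the text for clustered machines

# Features the text names as adding per-subscriber sign-in cost.
# The label strings are hypothetical, chosen for this example only.
SIGN_IN_COST_FEATURES = {
    "wan-load-balancing",
    "network-policy",
    "content-filter",
    "qos-policy",
    "one-to-one-nat",
}


def failover_review(subscriber_networks: int, enabled_features: set[str]) -> list[str]:
    """Return notes about settings that lengthen fail-over convergence."""
    notes = []
    if subscriber_networks > MAX_CLUSTER_NETWORKS:
        notes.append(
            f"{subscriber_networks} subscriber networks exceeds the "
            f"recommended maximum of {MAX_CLUSTER_NETWORKS} for a cluster"
        )
    costly = sorted(enabled_features & SIGN_IN_COST_FEATURES)
    if costly:
        notes.append("features adding per-subscriber sign-in cost: " + ", ".join(costly))
    return notes
```

The check deliberately does not estimate a convergence time, since the text gives no figures for that; it only lists what contributes to it.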

Configuration

  • Do not use domain names that end in .local: that suffix is reserved for mDNS, and Apple devices will not get online.
  • Don't lower the DHCP lease time to less than 12 hours; it will make the database grow exponentially and lead to performance problems in the long run.
  • Always use connectivity-monitor on WAN interfaces: this allows for a finer-grained upstream connectivity detection.
  • Regularly create a backup or use the built-in remote backup storage facility.
  • Delete the 10.10.10.1/30 alias from the WAN interface unless you use it; it only adds unnecessary load.
  • Database performance settings guideline (please don't deviate from this, as doing so will lead to degraded performance over time):
    • on systems with 2 GB RAM: allocate 256 MB to the database, or 768 MB at most.
    • with 4 GB RAM the same rule applies, because these systems are sized for more users, which puts more stress on other components.
    • with 8 GB and 16 GB RAM you can allocate up to 2 GB to the database.
      • with 8 GB and no content filter, or with 16 GB, you can go as high as 4 GB for database allocation.
  • Don't apply a backup from a newer version to a system running an older version.
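
Several of the rules above are mechanical enough to check in a script. The following is a minimal sketch, assuming you have the relevant settings available as plain values; the function and parameter names are hypothetical and do not correspond to any HSMX API. The thresholds are taken from the list above.

```python
from __future__ import annotations


def max_database_allocation_mb(ram_gb: int, content_filter: bool) -> int:
    """Upper bound for database allocation, following the guideline list."""
    if ram_gb in (2, 4):
        return 768
    if ram_gb == 8:
        return 2048 if content_filter else 4096
    if ram_gb == 16:
        return 4096
    raise ValueError(f"{ram_gb} GB RAM is not covered by the guideline")


def lint_config(
    domain: str,
    dhcp_lease_hours: float,
    ram_gb: int,
    content_filter: bool,
    database_mb: int,
) -> list[str]:
    """Return a list of guideline violations (empty when none are found)."""
    problems = []
    # A trailing dot (fully qualified form) should not hide a .local suffix.
    if domain.lower().rstrip(".").endswith(".local"):
        problems.append("domain name ends in .local (conflicts with mDNS)")
    if dhcp_lease_hours < 12:
        problems.append("DHCP lease time is below 12 hours")
    limit = max_database_allocation_mb(ram_gb, content_filter)
    if database_mb > limit:
        problems.append(f"database allocation {database_mb} MB exceeds {limit} MB")
    return problems
```

For instance, `lint_config("guest.example.com", 24, 8, False, 4096)` reports nothing, while a `.local` domain with an 8-hour lease and 1024 MB of database memory on a 4 GB system yields three findings.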

Upgrades

  • Create a backup and download it to your local machine.
  • Check the size of the log files under Tools → Download Log. If they are large (multiple GB), download and erase them (or contact us).
  • Run Database optimize full under System → Task Manager (5.1 and higher).
  • Make sure there is at least 1 GB of free space on the system partition.
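
The size-related items of this checklist can be expressed as a small pre-upgrade check, for example in an external monitoring script. This is a sketch under assumptions: the appliance may not offer a way to run Python at all, the path passed to `free_bytes_on` is a placeholder, and the 2 GiB threshold for "large" logs is our own reading of "multiple GB". `shutil.disk_usage` is a standard-library call.

```python
from __future__ import annotations

import shutil

MIN_FREE_BYTES = 1024 ** 3       # at least 1 GB free, from the checklist
LARGE_LOG_BYTES = 2 * 1024 ** 3  # assumed threshold for "multiple GB" of logs


def free_bytes_on(path: str) -> int:
    """Free space on the filesystem containing path."""
    return shutil.disk_usage(path).free


def upgrade_blockers(free_bytes: int, log_bytes: int) -> list[str]:
    """Return reasons to postpone an upgrade (empty when none are found)."""
    blockers = []
    if free_bytes < MIN_FREE_BYTES:
        blockers.append("less than 1 GB free on the system partition")
    if log_bytes >= LARGE_LOG_BYTES:
        blockers.append("log files are large; download and erase them first")
    return blockers
```

Keeping the measurement (`free_bytes_on`) separate from the decision (`upgrade_blockers`) means the thresholds can be checked against values read from the web interface as well.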

Clusters

Important: Do not run mixed HSMX versions in a cluster.

  • Disable the cluster
  • Update both machines
  • Re-enable the cluster
best-practices.txt · Last modified: 2021/06/03 14:40 (external edit)