As part of our Resin 4.0 work, we’re going through the various Resin services and improving the testing and quality of each one. In Resin 4.0.9, we revamped the heartbeat service, which is at the center of Resin’s clustering configuration.
In each cluster, Resin assigns the first three servers to be the triad servers, the triple-redundant hub for the clustering model. Each of the triad servers can act as a backup for any of the others.
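To make the triad rule concrete, here is a minimal sketch in Java. It assumes the cluster is an ordered list of server ids; the class and method names (`TriadSketch`, `triadOf`, `backupsFor`) are hypothetical and are not part of Resin's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not Resin's implementation.
final class TriadSketch {
    static final int TRIAD_SIZE = 3;

    // The triad is the first three servers in cluster order
    // (or all of them if the cluster has fewer than three).
    static List<String> triadOf(List<String> servers) {
        int end = Math.min(TRIAD_SIZE, servers.size());
        return new ArrayList<>(servers.subList(0, end));
    }

    // Any triad member can be backed up by the other triad members.
    static List<String> backupsFor(String triadServer, List<String> servers) {
        List<String> backups = triadOf(servers);
        backups.remove(triadServer);
        return backups;
    }

    public static void main(String[] args) {
        List<String> cluster = List.of("app-0", "app-1", "app-2", "app-3", "app-4");
        System.out.println("triad:   " + triadOf(cluster));
        System.out.println("backups: " + backupsFor("app-1", cluster));
    }
}
```

In this sketch, servers past the third never appear in the triad, which matches the description above; how Resin actually orders servers is not covered here.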
For the heartbeat service, every Resin server in the cluster connects to each triad server and sends a heartbeat every 60 seconds. If the connection drops or a heartbeat is missing, the triad knows immediately that the server has failed and can take appropriate measures to deal with the failure. For example, if a server is down, Resin won’t waste effort trying to send messages to the dead server. The heartbeat, however, retries the downed server every 60 seconds, so Resin learns that the server is back within about a minute of it coming up.
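The bookkeeping described above might look roughly like the following Java sketch. It is an illustration under stated assumptions, not Resin's code: the class `HeartbeatMonitorSketch`, its methods, and the `timeoutMs` grace period are all hypothetical. Only the 60-second interval comes from the text. Time is passed in explicitly so the logic can be exercised without waiting.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of triad-side heartbeat tracking; not Resin's implementation.
final class HeartbeatMonitorSketch {
    // From the text: one heartbeat, and one retry of a downed server, per 60 seconds.
    static final long INTERVAL_MS = 60_000L;

    // Assumption: how long to wait before a heartbeat counts as missing.
    private final long timeoutMs;
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, Long> lastRetry = new HashMap<>();

    HeartbeatMonitorSketch(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Record a heartbeat; this also clears any retry state for the server.
    void onHeartbeat(String server, long nowMs) {
        lastHeartbeat.put(server, nowMs);
        lastRetry.remove(server);
    }

    // A dropped connection marks the server down at once, without waiting for a timeout.
    void onDisconnect(String server) {
        lastHeartbeat.remove(server);
    }

    // A server is up if its last recorded heartbeat is no older than the timeout.
    boolean isUp(String server, long nowMs) {
        Long seen = lastHeartbeat.get(server);
        return seen != null && nowMs - seen <= timeoutMs;
    }

    // For a downed server, allow at most one reconnect attempt per interval.
    boolean shouldRetry(String server, long nowMs) {
        if (isUp(server, nowMs)) {
            return false;
        }
        Long last = lastRetry.get(server);
        if (last != null && nowMs - last < INTERVAL_MS) {
            return false;
        }
        lastRetry.put(server, nowMs);
        return true;
    }
}
```

A caller would check `isUp` before sending a message to a server and skip the send when it returns false, which is the "won't waste effort" behavior described above. The grace period is a design choice of this sketch: a timeout somewhat longer than one interval tolerates a slightly late heartbeat without declaring a failure.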