Resin Clustered Deployment

When you deploy an application using eclipse or ant, Resin propagates the .war updates to the cluster using the triad as the reliable store. The deployment is both incremental and transactional. Only changed files need to be sent, and the updates become visible only when they are complete and verified. A partial or incomplete deployment does not update the site. Deployment proceeds in four stages:
- eclipse/ant sends the .war to triad server A
- Triad server A replicates the .war to triad servers B and C
- The triad updates the rest of the cluster with the new .war file
- Once all servers have the new version complete and verified, they can restart the web-app with the new data
Stage 1: eclipse/ant to triad server
Updating a triad server with the new .war is the first step in deployment, and the only step where a developer or administrator needs to intervene. Every other step occurs automatically, with the possible exception of restarts if the restart mode is manual.
Like all the steps, the eclipse/ant upload is transactional: the updates either complete or are rolled back. The deployment system is designed so that partial updates are not possible, because you don’t want only part of an application updated, or only some of the servers updated.
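The all-or-nothing behavior described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory store (`WebAppStore`) and SHA-256 checksums; neither is taken from Resin, whose actual repository format and verification method are not described here.

```python
import hashlib


def sha256(data: bytes) -> str:
    """Hex digest used to verify that a file arrived intact (assumed algorithm)."""
    return hashlib.sha256(data).hexdigest()


class WebAppStore:
    """Hypothetical in-memory store; not Resin's actual repository."""

    def __init__(self) -> None:
        self.live: dict[str, bytes] = {}  # files currently visible to the site

    def deploy(self, files: dict[str, bytes], manifest: dict[str, str]) -> bool:
        """Apply an update only if every file named in the manifest verifies."""
        staged = {**self.live, **files}  # work on a copy, never on live data
        for path, expected in manifest.items():
            if path not in staged or sha256(staged[path]) != expected:
                return False  # roll back: the staged copy is simply discarded
        self.live = staged  # commit: a single reference swap
        return True
```

Because the live files are replaced in one step after verification, a corrupted or interrupted upload leaves the previously deployed version untouched.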
The communication between eclipse/ant and Resin uses HMTP/BAM over the normal HTTP interface, and requires authentication for security. For efficiency, Resin only updates files that have changed. So if you update one .jsp or .php in your web-app, the eclipse plugin will only send that single file.
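One common way to decide which files have changed is to compare content digests on both sides. The sketch below illustrates that idea only; the function names and the choice of SHA-256 are assumptions, not a description of how the eclipse plugin or Resin actually detects changes.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content digest of one file (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def changed_files(
    local: dict[str, bytes], remote_digests: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Return (paths to send, paths to delete) for an incremental update.

    `local` maps path -> file content in the new .war;
    `remote_digests` maps path -> digest of what the server already holds.
    """
    to_send = sorted(
        path for path, data in local.items()
        if remote_digests.get(path) != digest(data)
    )
    to_delete = sorted(path for path in remote_digests if path not in local)
    return to_send, to_delete
```

With this approach, editing a single .jsp yields a one-element send list, no matter how many files the web-app contains.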
Once eclipse/ant has uploaded the entire .war and the triad server has verified the upload, Resin can move to stage 2: updating the triad.
Stage 2: triad server A replicates to full triad
Once triad server A has a complete and verified update, it can replicate the files to the other two servers in the triad. Having three servers control the cluster pod gives a number of benefits:
- reliability: with all data triply redundant, the deployment is secure from server failures
- administration: by focusing on three servers out of a cluster, an administrator can concentrate on a failure plan centered on three servers, and treat the other servers as less important
- scalability: because the servers in a cluster pod (up to 64) use three servers as the reliable store, the update load is distributed across the triad

Two servers are not sufficient because of normal, scheduled maintenance. In a two-server configuration, bringing down one server makes the second one a single point of failure, and also doubles the load on the remaining server. And if the second server happens to fail, the site no longer has a reliable store.
With three servers, normal maintenance brings down only a third of the capacity and leaves two servers running to back each other up. If you use virtual servers, the maintenance or repair of a triad server can be very fast. One of our larger customers can bring up a replacement virtual server in 15 minutes. With the triple redundancy of the triad, their users will never notice.
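The comparison between two and three servers comes down to simple arithmetic. The sketch below assumes the load is shared evenly among whichever servers are up, which is a simplification made for illustration; the helper name is hypothetical.

```python
from fractions import Fraction


def during_maintenance(servers: int, down: int = 1) -> tuple[int, Fraction]:
    """Return (servers still up, load multiplier on each surviving server).

    Assumes the total load stays constant and is split evenly.
    """
    up = servers - down
    if up <= 0:
        raise ValueError("no servers left to carry the load")
    return up, Fraction(servers, up)
```

For two servers, `during_maintenance(2)` gives one server left carrying twice its usual load. For three servers, `during_maintenance(3)` gives two servers left, each carrying 3/2 of its usual load.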
Once all three servers in the triad have the updated application and have verified the contents, they can update the rest of the cluster. Because all three have identical data, the other servers can load-balance across all three triad servers.
Stage 3: triad updates cluster pod
Because stage 3 starts with a reliable, triplicate copy of the deployment in the triad, it can support dynamic additions and removals of servers. In the update case, all current servers get a new copy of the .war, and in a dynamic server addition, the new server gets the current copy of all the .war files from the triad.

The three triad servers share the update load for the cluster pod. Because each cluster pod is limited to a maximum of 64 servers, large sites will create multiple identical pods to maintain scalability.
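As a rough illustration of the sizing rule and of load sharing, the sketch below computes how many pods a site needs and spreads the non-triad servers across the three triad servers. The round-robin policy and all names are assumptions made for the example; the text states only that the triad servers share the load.

```python
import math

POD_LIMIT = 64            # maximum servers per cluster pod, from the text
TRIAD = ("A", "B", "C")   # the three triad servers of one pod


def pods_needed(total_servers: int) -> int:
    """Number of pods required to host the given number of servers."""
    return math.ceil(total_servers / POD_LIMIT)


def assign_sources(server_ids: list[str]) -> dict[str, str]:
    """Map each non-triad server to the triad server it fetches updates from.

    Round-robin is an assumed policy, chosen only to show an even split.
    """
    return {sid: TRIAD[i % len(TRIAD)] for i, sid in enumerate(server_ids)}
```

For example, a site with 200 servers would need four pods under this rule, since three pods hold at most 192 servers.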
Like stages 1 and 2, the updates are incremental and transactional. Only the changed files are sent to the cluster's servers, which reduces network traffic and improves deployment time. For transactional purposes, the servers only complete their updates and move to stage 4 when the entire update is complete and verified.
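The gate between stage 3 and stage 4 can be pictured as a simple barrier: no server proceeds until every server has reported a verified update. The tracker below is a hypothetical illustration of that rule, not Resin's internal mechanism.

```python
class UpdateTracker:
    """Hypothetical coordinator recording which servers verified the update."""

    def __init__(self, servers: list[str]) -> None:
        # Every server starts out unverified.
        self.verified = {server: False for server in servers}

    def report(self, server: str, ok: bool) -> None:
        """Record the verification result sent by one server."""
        if server not in self.verified:
            raise KeyError(f"unknown server: {server}")
        self.verified[server] = ok

    def ready_for_restart(self) -> bool:
        """True only when there is at least one server and all have verified."""
        return bool(self.verified) and all(self.verified.values())
```

A single missing or failed report keeps `ready_for_restart()` false, so the web-app keeps running the previous version everywhere.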
Stage 4: servers start new web-apps
Finally, each server has the update and can be restarted with the new version of the application. The restart can either be automatic using the dependency-check-interval, or manual using JMX to restart each server under an administrator’s control.
