Resin 4.0 clustering
A major part of the work for Resin 4.0 is refactoring our clustering architecture and code to better support dynamic and virtual servers, now that ISPs are more flexible about adding and removing servers to handle load spikes. As a major side benefit of this work, we expect to simplify deployment and management of single-server sites, and also to offer distributed and single-server caching services like JCache. This added flexibility requires a redesign of our clustered sessions and load balancing.

Our goals for Resin 4.0 architecture include the following:
- Keep single servers simple to configure and manage
- Allow dynamic servers to be added and removed at runtime, at short notice, for load and power management
- Improve scalability and performance of distributed caches and sessions
- Improve reliability with triplicate backup on the triad servers
- Focus administration effort on the triad servers; other servers can come and go
- Add remote deployment for both single server and clustered server configurations
The architecture we’ve chosen is a replicated hub-and-spoke model. The first three servers form a fully-connected and replicated hub (the “triad”). Additional servers connect to all three triad servers, but do not normally connect to each other, which avoids the n×n scalability problem of a full mesh. Single- and dual-server configurations form a mini-hub, i.e. they are fully-connected and replicated, and are configured and used exactly the same as larger configurations.
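To make the scalability argument concrete, the following minimal sketch counts server-to-server links in a hub-and-spoke domain and compares them with a full mesh. The class and method names are hypothetical and are not part of Resin; the sketch only assumes the topology described above (a fully-connected hub of up to three servers, with every other server linked to each hub member).

```java
// Illustrative only: TriadTopology is a hypothetical helper, not a Resin class.
final class TriadTopology {
    static final int TRIAD_SIZE = 3;

    // Links in a full mesh: every server connects to every other server.
    static int fullMeshLinks(int servers) {
        return servers * (servers - 1) / 2;
    }

    // Links in the hub-and-spoke model: the hub (up to three servers) is
    // fully connected, and each spoke connects to every hub server.
    static int hubAndSpokeLinks(int servers) {
        int hub = Math.min(servers, TRIAD_SIZE);
        int spokes = servers - hub;
        return hub * (hub - 1) / 2 + spokes * hub;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 3, 16, 64}) {
            System.out.println(n + " servers: hub-and-spoke="
                + hubAndSpokeLinks(n) + ", full mesh=" + fullMeshLinks(n));
        }
    }
}
```

With 64 servers the hub-and-spoke layout needs 186 links, against 2016 for a full mesh, and the link count grows linearly rather than quadratically as spokes are added. For one or two servers the two counts coincide, which matches the mini-hub case.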
Each “triad domain” consists of a triad of servers as the hub and additional servers as spokes, up to a total of 64 servers. Sites with more than 64 servers will partition into multiple triad domains. Most communication, such as caching, will remain within the triad domain, while shared information is communicated between the hubs.
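As a rough illustration of the partitioning, the sketch below splits an ordered list of servers into triad domains of at most 64, treating the first three servers of each domain as its hub. The class name and the fill-in-order assignment policy are assumptions made for the example; the text above does not specify how servers are assigned to domains.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: TriadDomains is a hypothetical helper, not a Resin class.
final class TriadDomains {
    static final int MAX_DOMAIN_SIZE = 64; // hub plus spokes, per the text
    static final int TRIAD_SIZE = 3;

    // One triad domain: up to three hub servers plus the remaining spokes.
    static final class Domain {
        final List<String> triad;
        final List<String> spokes;

        Domain(List<String> triad, List<String> spokes) {
            this.triad = triad;
            this.spokes = spokes;
        }
    }

    // Assumed policy: fill each domain in list order before starting the next.
    static List<Domain> partition(List<String> servers) {
        List<Domain> domains = new ArrayList<>();
        for (int start = 0; start < servers.size(); start += MAX_DOMAIN_SIZE) {
            int end = Math.min(start + MAX_DOMAIN_SIZE, servers.size());
            int hubEnd = Math.min(start + TRIAD_SIZE, end);
            domains.add(new Domain(
                new ArrayList<>(servers.subList(start, hubEnd)),
                new ArrayList<>(servers.subList(hubEnd, end))));
        }
        return domains;
    }
}
```

Under this policy, 130 servers would yield two full domains (3 hub servers and 61 spokes each) and a third domain holding the remaining 2 servers as a mini-hub.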
Because all the critical data is stored on the triad-hub, the spoke-servers can be taken down for maintenance or disappear without affecting the reliability of the system. Because the triad-hub servers are critical, we recommend using a virtual server system so you can quickly replace a triad server during maintenance or system failure.
