Resin Cloud, Clustering 3.0
Resin Cloud Support Overview
Cloud support is Resin’s third-generation clustering. It is based on years of perfecting ways to set up and manage a cluster, and it is the culmination of decisions about the best way to do things. Caucho has been doing clustering with Resin longer than many other companies have existed, and longer than many of the big players have been in the Java market. Our clustering predates most, if not all, other solutions. And our clustering support is not bolted on as an afterthought. We didn’t buy it. Caucho is an engineering company. We built it. We perfected it. It is not just built in, it’s ingrained. Resin is designed from the ground up to support clustering and cloud computing.
Over the years, we have learned many things about clustering, and you will see these lessons in our clustering support. We learned how to minimize the replication and copying of values to the bare essentials. We learned how to arrange a cluster’s topology to get maximum failover support while keeping flexibility and elasticity. Most importantly, we have what few cluster architectures provide: operational predictability. There is no magic in our cluster. It is a simple, robust, hub-and-spoke architecture. Simple in its design, yet elegant and powerful.
Our third-generation clustering, our true Java EE cloud support, is designed to be an elastic cluster. Resin has the ability to add and remove servers from a live cluster. This ability includes centralization and replication of cache data, application version management, JMS queues, and session replication.
Resin clustering uses a Triad set of servers. A Triad is a triple-redundant hub: it forms the hub in our hub-and-spoke network. The first three static servers make up the Triad. The Triad servers are responsible for load-balancing clustered services as well as servicing regular requests. The clustered services the Triad performs include load balancing, caching, JMS, and clustered deployment. Non-Triad servers (spoke servers) that are added enroll automatically in clustered services. The spoke servers can be static or dynamic/elastic.
A cluster can have as few as one server in it. A small shop might have three Triad members and four static servers, with the ability to add dynamic servers if needed. A large shop might have a master pod with 64 servers plus nine other pods of 64 (640 servers in all). To scale from three servers to 64 servers takes no additional configuration at all. Beyond 64 servers there is some additional configuration, but then you can add thousands of servers and they are all still part of the same cluster. This ability to scale from few to large with no config changes, and from large to very large without much change in configuration, is the hallmark of how Resin works. Resin is about minimal configuration with maximum leverage.
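The pod arithmetic above can be sketched in a few lines of Python. This is only an illustration of the numbers in the text: the names `POD_SIZE`, `pods_needed` and `needs_pod_config` are made up for this example and are not part of Resin, and the pod size of 64 is simply taken from the figures above.

```python
POD_SIZE = 64  # assumption: servers per pod, taken from the figures in the text


def pods_needed(total_servers: int, pod_size: int = POD_SIZE) -> int:
    """Return how many pods are needed to hold total_servers servers."""
    if total_servers < 1:
        raise ValueError("a cluster has at least one server")
    # Integer ceiling division: a partly filled pod still counts as a pod.
    return (total_servers + pod_size - 1) // pod_size


def needs_pod_config(total_servers: int, pod_size: int = POD_SIZE) -> bool:
    """True once the cluster outgrows a single pod."""
    return pods_needed(total_servers, pod_size) > 1
```

Under these assumptions, anything up to 64 servers fits in one pod, and the ten-pod example in the text corresponds to 640 servers.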
With Resin, every server in the cluster serves up requests for the same set of applications. Resin does not divide servers up to serve different applications.
Resin includes a fast, lightweight load balancer. You can also plug Resin into Apache or have it work with a hardware load balancer.
One advantage of using Resin’s built-in load balancer is that it is cloud/cluster aware: if you add a new server to the cluster, the load balancer automatically knows about it and starts directing new requests to it. The Resin cloud load balancer can add and remove servers at runtime (dynamically) to distribute the load, and it knows each server’s status.
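To make "cluster aware" concrete, here is a minimal Python sketch of a round-robin balancer whose server list and health status can change while it is running. It is not Resin's load balancer or its API; the class `ClusterAwareBalancer` and its method names are hypothetical and only illustrate the behavior described above.

```python
class ClusterAwareBalancer:
    """Toy round-robin balancer whose server list can change at runtime."""

    def __init__(self):
        self._servers = []  # server ids in join order
        self._up = {}       # server id -> last known health status
        self._next = 0      # index of the next server to try

    def add_server(self, server_id):
        """A server joins the cluster and becomes eligible for requests."""
        if server_id not in self._up:
            self._servers.append(server_id)
        self._up[server_id] = True

    def remove_server(self, server_id):
        """A server leaves the cluster."""
        if server_id in self._up:
            self._servers.remove(server_id)
            del self._up[server_id]

    def mark(self, server_id, up):
        """Record a health-check result for a known server."""
        if server_id in self._up:
            self._up[server_id] = up

    def pick(self):
        """Return the next healthy server, skipping servers marked down."""
        n = len(self._servers)
        for _ in range(n):
            candidate = self._servers[self._next % n]
            self._next = (self._next + 1) % n
            if self._up[candidate]:
                return candidate
        raise RuntimeError("no healthy servers in the cluster")
```

A server added with `add_server` starts receiving requests on the next rotation, and a server marked down is skipped until it is marked up again.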
All servers in the cluster share queue, session, and object cache data. This cluster data is replicated for fault tolerance.
Simple, Operational Predictability, Rock Solid, and Clustering
One key differentiator between Resin’s cloud support and other clustering solutions is that Resin offers straightforward elastic-server cloud support.
You can use Resin in a public cloud because its hub-and-spoke architecture fits public IaaS offerings well, which is not the case with other offerings.
With Resin, servers can join and leave with no extra configuration.
No Magic, No Overkill, just Rock Solid, Spoke and Hub
Resin prides itself on operational predictability and simplicity. Resin takes a “No Magic” approach to clustering, which is designed to avoid a lot of the headaches associated with other clustering solutions. There are no massive data-shuffling problems when you add or lose a server. There are no group splits. It is simple and reliable. Resin does not have overly chatty conversations because Resin does not try to massively replicate data, which only increases complexity.
Resin is very easy to understand and configure, so you don’t need an army of overpriced consultants. There is no 400-page cluster configuration guide. It is not needed. Resin clustering is simply easy to use and it just works. The simplicity of using it does not mean that we don’t sweat over every byte, because we do. Resin clustering is a finely tuned work of engineering. Simplicity is by design, and the design is hardened steel.
To support elastic-server cloud, Resin provides the following:
- data redundancy within the triad hub;
- application deployment through a clustered transactional repository;
- distributed caching and storage for servlet sessions, elastic JMS message queues, and JCache caching;
- and load balancing to the elastic spoke servers.
Resin clustering also provides distributed management, sensing, a watchdog system, and health checks. These watchdog and health features go from nice-to-have to essential once you are dealing with multiple servers. If you want a cloud solution that can recover from DoS attacks and usage spikes, then Resin is your server.
Resin cluster features
In Resin, clustering is always enabled. Even if you only have a single server, that server belongs to its own cluster. As you add more servers, the servers are added to the cluster, and automatically gain benefits like clustered health monitoring, heartbeats, distributed management, triple redundancy, and distributed deployment.
With Resin clustering you get:
- HTTP Load balancing and failover
- Elastic servers: adding and removing servers dynamically
- Distributed deployment and versioning of applications
- A triple-redundant triad as the reliable cluster hub
- Watchdog system: Heartbeat monitoring of all servers in the cluster
- Health sensors, metering and statistics across the cluster
- Clustered JMS queues and topics
- Distributed JMX management
- Clustered caches and distributed sessions
Scaling Resin from One to Many
Most web applications start with a single server, at least during development. Later, servers get added for increased performance and/or reliability. Developers are familiar with single-server deployment. As web application usage grows, a single server can hit hardware limits, especially if more than one app is hosted on the app server. These limits show up as chronically high CPU and memory usage, and other resources can be adversely impacted as well.
Server load balancing solves the scaling problem by letting you deploy a single web app to more than one physical machine. The machines share the web application traffic, which reduces the total workload on any single machine and provides better performance. Load balancing is achieved by redirecting network traffic across multiple machines via a hardware or software load balancer. Resin supports its software load balancer as well as hardware load balancers. Resin’s load balancer is cluster aware, so as you spin up new nodes, they automatically pick up some of the load without extra configuration.
Load balancing also increases reliability/up-time. If one or more servers go down for maintenance or due to failure, other servers can still continue to handle traffic. With a single-server application, any down-time is directly visible to the user, drastically decreasing reliability.
Clustered Deployment
Resin masters cloud deployment. You have to master deployment to really enjoy the benefits of Java EE in the cloud. You can’t have multiple versions of a war file running haphazardly. You need consistency. You need operational predictability. You need versions to be managed even for dynamic servers. Servers need to be elastic and elastic servers need support to make sure they have the right version of the application.
Resin Pro deployment goes from simple to a powerful transactional cloud deploy with no real change in complexity. You can simply copy a *.war file to the webapps folder, or you can use the command line to deploy. Poof, you are done. This is the model most people are familiar with.
However, once you use the command line to deploy to a server, you automatically deploy to the cloud. Recall that a single server is just a cluster of one, and our cloud support is our third-generation clustering support. Using the command line distributes a new web application to all servers in the cloud, using a transactional store (Git based) to ensure consistency. The cloud can be 1, 10, 20, 50, 500, 4000 or more servers. You can easily roll back, promote, or stage a web application across the entire cloud.
If a server restarts, or a new server dynamically joins, the servers always make sure they have the correct version of the applications running. This provides a level of operational predictability that is required for cloud Java EE support and missing from other offerings.
Graceful version upgrade
Resin supports a graceful version upgrade feature that is unparalleled. The old version of the web app continues to receive requests from old sessions. The new version gets the new requests. Users see a consistent version as the web site update occurs, with no downtime required. The whole cloud of servers can be upgraded this way. No downtime. No performance degradation. It is also possible to deploy a web application to a live server without interruption by using the load balancers and backup servers. Instantly every server is upgraded to the new version.
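The routing rule behind a graceful upgrade (old sessions stay on the old version, new sessions go to the new one) can be sketched as follows. This is a simplified Python illustration, not Resin's implementation; `VersionedRouter` and its methods are hypothetical names.

```python
class VersionedRouter:
    """Toy router that pins each session to the app version it started on."""

    def __init__(self, initial_version):
        self.current = initial_version
        self._session_version = {}  # session id -> version serving it

    def deploy(self, new_version):
        """Make new_version the target for all sessions created from now on."""
        self.current = new_version

    def route(self, session_id):
        """Return the version that should handle this session's request."""
        # Existing sessions keep their version; new sessions get the current one.
        return self._session_version.setdefault(session_id, self.current)

    def end_session(self, session_id):
        """Forget a finished or expired session."""
        self._session_version.pop(session_id, None)

    def active_versions(self):
        """Versions that still have to stay loaded."""
        return set(self._session_version.values()) | {self.current}
```

In this sketch the old version can be retired once `active_versions()` no longer contains it, which is when its last session has ended.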
Choose Resin for the cloud because Resin gets cloud deployment.
Stateful Web Applications
If web applications were stateless, load balancing alone would do the trick for scalability. Many web applications are heavily stateful. Even very simple web applications use the HTTP session to keep track of the current login, shopping-cart-like functionality and so on. This has become even more the case with frameworks like JSF, CDI, Spring MVC, Spring Webflow and Seam, which store conversational state in the session. When stateful web applications are load-balanced, HTTP sessions must be shared across application servers.
You can use a sticky session to pin a session to a server (server affinity). Session state still needs to be shared (replicated) among multiple servers for failover so users don’t lose their session data. On failover of a stateful application, the session has to be moved to the new server picked by the load balancer, and thus the session must be replicated in the Triad. Without failover, the user would lose a session when a load-balanced server experiences down-time. Resin provides a sticky session load balancer as well as clustered replicated sessions.
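The interplay of sticky sessions and replicated session state can be sketched like this. It is a toy Python model under stated assumptions, not Resin's code: the class `StickyBalancer`, the `triad_store` dictionary standing in for state replicated on the hub, and the CRC-based server choice are all invented for illustration.

```python
import zlib


class StickyBalancer:
    """Toy sticky-session balancer with a replicated session store."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.down = set()      # servers currently out of service
        self.affinity = {}     # session id -> server it is pinned to
        self.triad_store = {}  # stand-in for session state replicated on the hub

    def save(self, session_id, data):
        """Replicate a copy of the session state to the hub store."""
        self.triad_store[session_id] = dict(data)

    def route(self, session_id):
        """Return (server, session_data), re-pinning the session on failover."""
        server = self.affinity.get(session_id)
        if server is None or server in self.down:
            live = [s for s in self.servers if s not in self.down]
            if not live:
                raise RuntimeError("no servers available")
            # Deterministic choice among the servers that are still up.
            server = live[zlib.crc32(session_id.encode()) % len(live)]
            self.affinity[session_id] = server
        # The (possibly new) server loads state from the replicated copy.
        return server, self.triad_store.get(session_id, {})
```

Because the state lives in the replicated store rather than only on the pinned server, a session that is re-pinned after a failure still sees its data.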
Triad Spoke and Hub details
The Triad Spoke and Hub model is based on lessons learned from 13 years of clustering support. The Triad is the hub. It provides triple redundancy, so you can perform maintenance on one server box while two remain up for fault tolerance. With this approach, the load on each remaining server increases by 50%, instead of doubling (rising to 200%) as it would if you were just using a backup model. The most important persistent data is stored and replicated on the Triad servers, so it is easy to pay special attention to these servers. All other servers can be cycled in and out as needed (they can be elastic).
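The load figures can be checked with a short calculation. The sketch below assumes load is spread evenly across all servers in the group before and after the failure; the function name `load_increase_percent` is made up for this example.

```python
from fractions import Fraction


def load_increase_percent(total_servers, servers_down=1):
    """Percent increase in per-server load after servers_down servers leave.

    Assumes the load is shared evenly before and after the failure.
    """
    remaining = total_servers - servers_down
    if remaining < 1:
        raise ValueError("at least one server must remain")
    before = Fraction(1, total_servers)  # each server's share beforehand
    after = Fraction(1, remaining)       # each survivor's share afterwards
    return (after / before - 1) * 100
```

With three servers, losing one moves each survivor from one third to one half of the traffic, a 50% increase. With a two-server pair, the survivor goes from one half to all of the traffic, a 100% increase, which is double (200% of) its former load.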
Problems with complex clustering solutions
There are cluster solutions with varying degrees of features, complexity and magic. They range from very expensive to free. There are clustering solutions that rely on custom classloaders, automatic discovery, and self-forming group clustering. Then there are clustering solutions that strive to be configurable beyond imagination, with 10,000 different options. Understanding and implementing these clustering solutions can be challenging. Problems such as group splitting can be very hard to diagnose, and sometimes it is hard even to realize that there is a problem. The more complex and full of “features” the clustering solution is, the harder it is to implement correctly.
If your clustering solution has a four-hundred-page document, and the company that supports or produces it has a high-priced team of consultants, then you have picked a clustering solution that is hard to implement. It does not matter whether it is Open Source or commercial either, because if you have to pay for consultants and support to implement it, the actual license cost could be mere dust on the scale. Also, if you decide on public cloud solutions, there is a good chance that these clustering solutions will not work without a lot of special configuration.
Even if you don’t hire an army of their consultants, more moving parts means more time figuring out which levers to pull and which features to include. Mistakes can be costly, as complicated clusters have clustering problems that are notoriously hard to diagnose. A simpler solution that just works means fewer moving parts and more ROI. A simpler solution means you get operational predictability.
Resin’s approach to cloud and clustering
Resin clustering takes a different approach. It is built based on the features you need, and nothing else. There is no magic. There is no complex, configurable auto-discovery and there are no self-forming groups. It is straightforward. It is easy to learn too.
There is an XML file that dictates the topology of your cluster. You can configure a hub, spoke servers and a load balancer. You can configure as few as two servers, and the rest of the servers can be dynamic with no further XML configuration needed. Adding additional servers is easy because they just point to the hub, and the hub manages the clustering bits. Since our cluster is TCP/IP based, if you can ping a hub server from a new server machine, then you can add that new server machine to the cluster with a command line switch. You don’t need teams of network engineers working with specialized cluster setup consultants to set up a special network just to be able to connect our servers in a cluster. This also makes our clustering solution a perfect fit for public IaaS providers. It just works.
Just because it is simple does not mean there is not a lot of engineering rigor involved. We agonized over each byte that gets sent out, to reduce the messaging to the bare minimum and the data transfer to as small as it can be and no smaller. We push data out to spoke servers so they do not always have to get data from the hub. The data is versioned with hash keys to verify whether it has changed since it was last retrieved. This focus on simplicity makes our clustering model easier to understand and implement. The focus of Resin clustering is simplicity of implementation, with performance and scalability amped up to the max.
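Versioning data with hash keys can be illustrated with a small Python sketch. The text does not say which hash function or wire protocol Resin uses, so SHA-256 and the function names `version_key` and `fetch_if_changed` are assumptions made purely for this example.

```python
import hashlib


def version_key(payload: bytes) -> str:
    """Derive a version key from the content itself (SHA-256 is an assumption)."""
    return hashlib.sha256(payload).hexdigest()


def fetch_if_changed(known_key, hub_payload: bytes):
    """Return (key, payload); payload is None when the caller is up to date.

    known_key is the key the spoke already holds, or None if it holds nothing.
    """
    current_key = version_key(hub_payload)
    if current_key == known_key:
        return current_key, None  # unchanged: only the short key is compared
    return current_key, hub_payload  # changed: send the new bytes
```

The point of the design is that an unchanged value costs only a key comparison, so the full payload crosses the network only when its content has actually changed.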
The focus on producing the simplest thing that would work, instead of adding every buzzword and neat-sounding concept, is key to our philosophy at Caucho. Less is more.
Caucho is an engineering company, not a company led by champagne-induced architectural visions of clustering grandeur. This philosophy produces the easiest to configure, easiest to support, and a very performant clustering solution. Our clustering solution has excellent ROI.
Triad Architecture: Hub
After many years of developing Resin’s distributed clustering for a wide variety of user configurations, we refined our clustering network to a hub-and-spoke model using a triple-redundant triad of servers as the cluster hub. The triad model provides interlocking benefits for the dynamically scaling server configurations people are using.
When you bring down one server for maintenance, the triple redundancy continues to maintain reliability. You can add and remove dynamic servers safely without affecting the redundant state or causing tons of data replication. Your network processing will be load-balanced across three servers, eliminating a single point of failure. Your servers will scale evenly across the cluster because each new server only needs to speak to the triad, not to all servers.
Your large site can split the cluster into independent “pods” of fewer than 64 servers to maintain scalability. Each pod has its own triad, and the first triad of the first pod is the master triad. In this way, your cluster can scale to thousands of servers.
Static and Elastic Spoke Servers
Non-Triad servers are spokes in the hub-and-spoke model. They can be configured in the resin XML or added at will. The “spoke” servers are not needed for reliability. They don’t store primary cache values, the primary application deployment repository, or queue storage (JMS). The replication of cache data is lazy to reduce network traffic. The cache can serve local data, but it checks with the triad to see whether the data has changed. When you add and remove spoke servers, it does not affect the primary system. With other network configurations, removing a server forces a shuffling around of backup data to the remaining servers, but Resin’s hub-and-spoke model is optimized to reduce excessive data shuffling if a server is removed, because only the Triad has the master copies of cluster data.
There are two types of spoke servers: dynamic servers (servers which are elastic) and static servers. Dynamic servers do not need a fixed IP. Dynamic servers are easy to bring up and down (elastic). Static servers are configured by id and IP address in the resin.xml. The first three static servers configured are always Triad servers. Triad servers, static spoke servers, and dynamic spoke servers all handle normal requests.
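The role rule described above (the first three static servers form the Triad, later static servers and all dynamic servers are spokes) can be written down as a short Python sketch. The function `assign_roles` and the role labels are hypothetical and only mirror the description; they are not Resin configuration or API.

```python
def assign_roles(static_ids, dynamic_ids=()):
    """Map each server id to 'triad', 'static-spoke' or 'dynamic-spoke'.

    static_ids must be in configuration order, since the first three
    static servers are the ones that form the Triad.
    """
    roles = {}
    for index, server_id in enumerate(static_ids):
        roles[server_id] = "triad" if index < 3 else "static-spoke"
    for server_id in dynamic_ids:
        roles[server_id] = "dynamic-spoke"
    return roles
```

A single-server cluster simply yields one triad member, and adding or removing dynamic ids never changes which servers hold the triad role.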
Triad Servers
The triad also gives a simpler user model for understanding where things are stored: the important data that needs to be replicated is always stored in the Triad. This includes master queue data, master cache data, and master deployment files. The other servers just use the Triad. This model is easy to understand. There is no complex data sharing and copying scheme. Non-Triad servers, spoke servers, are more expendable. Understanding the triad means you can feel confident about removing non-triad servers, and also know that the three triad servers are worth additional reliability attention.
