Health Check System Enhancements
Now that Resin is passing all the required TCKs and is ready for Java EE 6 Web Profile certification, we plan to shift focus to quality and stability for the next few months. Updates to the HealthCheck system, which are already underway, are an important part of that effort.
Resin 4.0.16 and later includes “health.xml”, a new configuration file dedicated to health checks, actions, and meters. This is a big step forward from Resin’s <ping> or <resin:PingThread> that some users may be familiar with. While external application monitoring is important to your overall system architecture, the real power of Resin’s health monitoring is the ability to easily trigger things like restarts and thread dumps that would otherwise require custom external hooks.
Following Resin 4 conventions, health.xml contains stateless, tag-based XML configuration that describes objects to be created by CanDI. Provided you’re running Resin Pro, it’s imported out of the box into resin.xml using a <resin:import>. If you’re upgrading from an older version and don’t have the import, there’s no need to worry. Resin will detect a missing health.xml and set up all the standard health checks and remediation actions internally. For those who want greater control, take a look at health.xml and the following examples. They should give you a good idea of the variety of ways you now have to monitor your applications from within Resin and trigger actions if desired.
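For readers upgrading an older resin.xml, the import mentioned above might look roughly like the sketch below. The `path` value and the `optional` attribute are assumptions based on how other Resin 4 imports are commonly written, so compare against the resin.xml shipped with your release before copying it.

```xml
<!-- resin.xml (sketch): pull in health.xml from the same conf directory. -->
<!-- Assumption: optional="true" lets Resin start even if the file is absent, -->
<!-- in which case the built-in default health checks described above apply. -->
<resin:import path="${__DIR__}/health.xml" optional="true"/>
```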
Following are just a few examples of what you can do, with more documentation forthcoming. Health configuration is intentionally similar to Resin’s rewrite rules. All these snippets belong in the health.xml file.
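To show where the snippets sit, here is a minimal sketch of a health.xml skeleton. The root element and the namespace URIs for the `health:` and `ee:` prefixes are assumptions and are not stated in this post; check them against the health.xml that ships with your Resin version.

```xml
<!-- health.xml (sketch): the namespace URIs below are assumed, not confirmed here. -->
<cluster xmlns="http://caucho.com/ns/resin"
         xmlns:resin="urn:java:com.caucho.resin"
         xmlns:health="urn:java:com.caucho.health"
         xmlns:ee="urn:java:ee">

  <!-- Health checks, conditions, and actions from the examples go here. -->

</cluster>
```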
Restart Resin every 6 hours.
<health:Restart>
  <health:IfUptime limit="6h"/>
</health:Restart>
Monitor CPU usage, print a warning at 95% usage, and dump threads at 99% usage after being rechecked.
<health:CpuHealthCheck>
  <warning-threshold>95</warning-threshold>
  <critical-threshold>99</critical-threshold>
</health:CpuHealthCheck>

<health:DumpThreads>
  <health:IfHealthCritical healthCheck="${cpuHealthCheck}"/>
  <health:IfRechecked/>
</health:DumpThreads>
Monitor the tenured/heap memory pool, dump the heap if memory is low, and restart if it continues to be low unless it’s between the hours of 7 and 11.
<health:MemoryTenuredHealthCheck>
  <memory-free-min>1m</memory-free-min>
</health:MemoryTenuredHealthCheck>

<health:DumpHeap>
  <health:IfHealthCritical healthCheck="${memoryTenuredHealthCheck}"/>
</health:DumpHeap>

<health:Restart>
  <health:IfCriticalRechecked healthCheck="${memoryTenuredHealthCheck}"/>
  <health:Not>
    <health:IfCron>
      <enable-at>0 7 * * *</enable-at>
      <disable-at>0 11 * * *</disable-at>
    </health:IfCron>
  </health:Not>
</health:Restart>
Email an administrator if the number of JVM threads exceeds a limit. The recent addition of a handy “mbean” function enables JMX access from EL.
<health:ExprHealthCheck>
  <critical-test>${mbean('java.lang:type=Threading').ThreadCount > 100}</critical-test>
</health:ExprHealthCheck>

<mail name="healthMailer">
  <from>resin@yourdomain.com</from>
  <smtp-host>mail.yourdomain.com</smtp-host>
  <smtp-port>25</smtp-port>
</mail>

<health:SendMail mail="${healthMailer}">
  <to>admin@yourdomain.com</to>
  <to>another_admin@yourdomain.com</to>
  <health:IfHealthCritical healthCheck="${exprHealthCheck}"/>
</health:SendMail>
Execute an external shell script if a custom JSP page returns a non-success HTTP response after being rechecked and its status is not flapping. Execute a different script upon recovery. (Flapping occurs when health status changes too frequently; excluding it is a convenient way to filter out nuisance alerts.)
<health:HttpStatusHealthCheck ee:Named="myPingCheck">
  <url>http://localhost:8080/pingtest.jsp</url>
  <socket-timeout>10s</socket-timeout>
  <regexp>^2</regexp>
</health:HttpStatusHealthCheck>

<health:ExecCommand>
  <command>/opt/scripts/alarm.sh</command>
  <health:IfHealthCritical healthCheck="${myPingCheck}"/>
  <health:IfRechecked/>
  <health:Not>
    <health:IfFlapping healthCheck="${myPingCheck}"/>
  </health:Not>
</health:ExecCommand>

<health:ExecCommand>
  <command>/opt/scripts/fixed.sh</command>
  <health:IfRecovered healthCheck="${myPingCheck}"/>
</health:ExecCommand>
