Health Check System Enhancements

Now that Resin is passing all the required TCKs and is ready for Java EE 6 Web Profile certification, we plan to shift focus to quality and stability for the next few months.  Updates to the HealthCheck system, which are already underway, are an important part of that effort.

Resin 4.0.16 and later include “health.xml”, a new configuration file dedicated to health checks, actions, and meters.  This is a big step forward from Resin’s <ping> or <resin:PingThread>, which some users may be familiar with.  While external application monitoring is important to your overall system architecture, the real power of Resin’s health monitoring is the ability to easily trigger things like restarts and thread dumps that would otherwise require custom external hooks.

Following Resin 4 conventions, health.xml contains stateless, tag-based XML configuration that describes objects to be created by CanDI.  Provided you’re running Resin Pro, it’s imported out of the box into resin.xml using a <resin:import>.  If you’re upgrading from an older version and don’t have the import, there’s no need to worry.  Resin will detect a missing health.xml and internally set up all the standard health checks and remediation actions.  For those who want greater control, take a look at health.xml and the following examples.  They should give you a good idea of the variety of ways you now have to monitor your applications from within Resin and trigger actions if desired.

The following are just a few examples of what you can do, with more documentation forthcoming.  Health configuration is intentionally similar to Resin’s rewrite rules: each action tag contains the conditions that must hold for it to fire.  All these snippets belong in the health.xml file.

Restart Resin every 6 hours.

<health:Restart>
  <health:IfUptime limit="6h"/>
</health:Restart>

Monitor CPU usage, print a warning at 95% usage, and dump threads if usage is still at 99% or higher after being rechecked.

<health:CpuHealthCheck>
  <warning-threshold>95</warning-threshold>
  <critical-threshold>99</critical-threshold>
</health:CpuHealthCheck>

<health:DumpThreads>
  <health:IfHealthCritical healthCheck="${cpuHealthCheck}"/>
  <health:IfRechecked/>
</health:DumpThreads>

Monitor the tenured/heap memory pool, dump the heap if free memory is low, and restart if it is still low after a recheck, unless the time is between 7 am and 11 am.

<health:MemoryTenuredHealthCheck>
  <memory-free-min>1m</memory-free-min>
</health:MemoryTenuredHealthCheck>

<health:DumpHeap>
  <health:IfHealthCritical healthCheck="${memoryTenuredHealthCheck}"/>
</health:DumpHeap>

<health:Restart>
  <health:IfCriticalRechecked healthCheck="${memoryTenuredHealthCheck}"/>
  <health:Not>
    <health:IfCron>
      <enable-at>0 7 * * *</enable-at>
      <disable-at>0 11 * * *</disable-at>
    </health:IfCron>
  </health:Not>
</health:Restart>
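
The way the restart condition combines a critical recheck with a negated time window can be easier to follow as plain boolean logic. The sketch below is only an illustration in Java, not Resin's implementation: the class `RestartWindow` and its method names are hypothetical, and treating the window as starting at 07:00 inclusive and ending at 11:00 exclusive is an assumption about how the two cron expressions behave.

```java
import java.time.LocalTime;

// Hypothetical illustration of the Restart predicates above; not Resin code.
class RestartWindow {
    // Assumed boundaries, taken from the enable-at / disable-at cron entries.
    static final LocalTime BLOCK_START = LocalTime.of(7, 0);
    static final LocalTime BLOCK_END = LocalTime.of(11, 0);

    // Plays the role of IfCron: true while the blocked window is active.
    static boolean inBlockedWindow(LocalTime now) {
        return !now.isBefore(BLOCK_START) && now.isBefore(BLOCK_END);
    }

    // Plays the role of IfCriticalRechecked AND Not(IfCron).
    static boolean shouldRestart(boolean criticalAfterRecheck, LocalTime now) {
        return criticalAfterRecheck && !inBlockedWindow(now);
    }

    public static void main(String[] args) {
        System.out.println(shouldRestart(true, LocalTime.of(9, 0)));
        System.out.println(shouldRestart(true, LocalTime.of(12, 30)));
    }
}
```

All predicates nested inside an action must hold together, which is why the negated window vetoes the restart even when the memory check stays critical.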

Email an administrator if the number of JVM threads exceeds a limit.  The recent addition of a handy “mbean” function enables JMX access from EL.

<health:ExprHealthCheck>
  <critical-test>${mbean('java.lang:type=Threading').ThreadCount > 100}</critical-test>
</health:ExprHealthCheck>

<mail name="healthMailer">
  <from>resin@yourdomain.com</from>
  <smtp-host>mail.yourdomain.com</smtp-host>
  <smtp-port>25</smtp-port>
</mail>

<health:SendMail mail="${healthMailer}">
  <to>admin@yourdomain.com</to>
  <to>another_admin@yourdomain.com</to>
  <health:IfHealthCritical healthCheck="${exprHealthCheck}"/>
</health:SendMail>
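
The EL expression above reads the `ThreadCount` attribute of the standard `java.lang:type=Threading` MBean. If you want to check what that attribute returns on your own JVM before picking a limit, a rough Java equivalent using the platform MBean server might look like the sketch below. The class `ThreadCountProbe` and its methods are hypothetical names used for illustration; only the MBean name, the attribute name, and the limit of 100 come from the example.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical helper; it mirrors the EL test, it is not part of Resin.
class ThreadCountProbe {
    // Limit borrowed from the critical-test expression above.
    static final int CRITICAL_THREADS = 100;

    // Reads ThreadCount from the java.lang:type=Threading MBean.
    static int threadCount() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName threading = new ObjectName("java.lang:type=Threading");
            return (Integer) server.getAttribute(threading, "ThreadCount");
        } catch (Exception e) {
            throw new IllegalStateException("Could not read ThreadCount", e);
        }
    }

    // Same comparison as the EL expression: strictly greater than the limit.
    static boolean isCritical(int count) {
        return count > CRITICAL_THREADS;
    }

    public static void main(String[] args) {
        int count = threadCount();
        System.out.println("threads=" + count + " critical=" + isCritical(count));
    }
}
```

Running this inside the application's own JVM, for example from a test page, gives a baseline for choosing a realistic threshold.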

Execute an external shell script if a custom JSP page returns a non-success HTTP response after being rechecked and its status is not flapping.  Execute a different script upon recovery.  (Flapping occurs when the health status changes too frequently; excluding it is a convenient way to filter out nuisance alerts.)

<health:HttpStatusHealthCheck ee:Named="myPingCheck">
  <url>http://localhost:8080/pingtest.jsp</url>
  <socket-timeout>10s</socket-timeout>
  <regexp>^2</regexp>
</health:HttpStatusHealthCheck>

<health:ExecCommand>
  <command>/opt/scripts/alarm.sh</command>
  <health:IfHealthCritical healthCheck="${myPingCheck}"/>
  <health:IfRechecked/>
  <health:Not>
    <health:IfFlapping healthCheck="${myPingCheck}"/>
  </health:Not>
</health:ExecCommand>

<health:ExecCommand>
  <command>/opt/scripts/fixed.sh</command>
  <health:IfRecovered healthCheck="${myPingCheck}"/>
</health:ExecCommand>
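
To make the idea of flapping concrete, here is one simple way such a detector could work: keep the last few check results and count how often the status changed. This is a generic sketch in Java and should not be read as how Resin's `IfFlapping` is implemented. The class `FlapDetector`, the window size, and the transition limit are all assumptions chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sliding-window flap detector; not Resin's algorithm.
class FlapDetector {
    private final int windowSize;      // assumed: number of recent results kept
    private final int maxTransitions;  // assumed: status changes tolerated in the window
    private final Deque<Boolean> recent = new ArrayDeque<>();

    FlapDetector(int windowSize, int maxTransitions) {
        this.windowSize = windowSize;
        this.maxTransitions = maxTransitions;
    }

    // Records one health check result and drops the oldest beyond the window.
    void record(boolean healthy) {
        recent.addLast(healthy);
        if (recent.size() > windowSize) {
            recent.removeFirst();
        }
    }

    // Flapping means the status changed more often than the limit allows.
    boolean isFlapping() {
        int transitions = 0;
        Boolean previous = null;
        for (Boolean current : recent) {
            if (previous != null && !previous.equals(current)) {
                transitions++;
            }
            previous = current;
        }
        return transitions > maxTransitions;
    }

    public static void main(String[] args) {
        FlapDetector detector = new FlapDetector(6, 2);
        boolean[] results = {true, false, true, false, true, false};
        for (boolean r : results) {
            detector.record(r);
        }
        System.out.println("flapping=" + detector.isFlapping());
    }
}
```

A check that stays critical produces no transitions and so still raises the alarm, while one that bounces between states is suppressed.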

Tags: cpu, dump heap, dump threads, flapping, health, memory, recover, restart, thread count

This entry was posted on Friday, April 22nd, 2011 at 12:02 pm and is filed under Uncategorized.
