
Using Google App Engine’s Datastore and Task Queues in PHP with Quercus

I gave a talk Wednesday at the Silicon Valley Google Technology Users’ Group on using Quercus in the App Engine. One of my examples used the low-level Datastore API from PHP and scheduled PHP “tasks” with Task Queues. I’ll walk through the source of that demo here to show how Quercus makes it easy to mesh a Java platform with PHP code. At the end, I’ll also outline the next steps for taking these techniques into a real application or framework.

The demo application grabs items from an RSS feed on a regular basis, stores them persistently, and has a page that displays the stored items. To implement this functionality, we have two PHP scripts:

  • index.php which displays the stored items
  • rss-task.php which fetches and stores the items

Behind the scenes, both scripts call Java objects and methods to access the App Engine’s Datastore and task queues. rss-task.php is a worker script that runs entirely as a background “process” in the App Engine task queues. The picture below shows an overview of the architecture:

[Figure: architecture overview (php-gae-tasks)]

Let’s look at rss-task.php first:

<?php

// Java imports

import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.KeyFactory;

import com.google.appengine.api.labs.taskqueue.QueueFactory;

// Grab the raw RSS data from a URL
function fetch_rss($url)
{
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_HEADER, 0);
  $data = curl_exec($ch);
  curl_close($ch);

  return $data;
}

// Transform the XML RSS items into Datastore entities and store them
function store_items($items)
{
  $service = DatastoreServiceFactory::getDatastoreService();

  foreach ($items as $item) {
    // Use the item's GUID, converted to a string, as the entity's key name
    $key = KeyFactory::createKey("item", strval($item->guid));
    $entity = new Entity($key);

    foreach ($item->children() as $child) {
      $name = $child->getName();

      if ($name == "pubDate") {
        $entity->setProperty($name, strtotime($child));
      }
      else {
        $entity->setProperty($name, strval($child));
      }
    }

    $service->put($entity);
  }
}

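// Schedule this task to run again in 10 minutes
function reschedule()
{
  $queue = QueueFactory::getDefaultQueue();

  // Single quotes stop PHP from interpolating "$Builder" in the name of
  // the nested Java class TaskOptions$Builder
  $builder =
    java_class('com.google.appengine.api.labs.taskqueue.TaskOptions$Builder');

  // Re-add this script's own URL as a task, delayed 600000 ms (10 minutes)
  $queue->add($builder->url($_SERVER["SCRIPT_NAME"])
                      ->countdownMillis(600000));
}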

//
// Main code
//

// Emil’s Twitter feed
$url = "http://twitter.com/statuses/user_timeline/26025897.rss";

$data = fetch_rss($url);
$rss = new SimpleXmlElement($data, LIBXML_NOCDATA);

store_items($rss->channel->item);
reschedule();

?>

If you look at the main code at the end of the script, you can see the workflow:

  1. Fetch an RSS feed over HTTP

    The fetch_rss() function implements this. Isn’t PHP great for these types of functions? :-)

  2. Store the RSS items in the Datastore

    The store_items() function uses Quercus’s native PHP-Java interface to create the Java objects, such as Keys and Entities, that the Java Datastore API needs. It’s pretty straightforward: we create a key from the RSS item’s GUID, then store the data of each of the item’s child tags as a property of the entity. Finally, we put the entity into the Datastore under the key we created. One thing to note is that we convert the publish date (the pubDate child tag) to an epoch timestamp, a long integer of seconds, with strtotime(). That makes it easy to sort the items when we query them later.

  3. Reschedule this script to run again (in 10 minutes)

    reschedule() implements this, again using the PHP-Java interface, this time to reach the task queue Java API. This API fits PHP scripts well because a task is simply a URL to call: here the URL is the script’s own path ($_SERVER["SCRIPT_NAME"]), delayed by 600000 ms.
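The property mapping in store_items() does not depend on the Datastore itself, so it can be tried in a plain PHP CLI. Below is a minimal sketch, assuming standard PHP with the SimpleXML extension; the helper item_to_properties(), the constant RESCHEDULE_DELAY_MS, and the inline sample feed are hypothetical illustrations, not part of the demo. Where store_items() calls setProperty() on an Entity, this sketch fills a PHP array instead.

```php
<?php
// Hypothetical helper mirroring the property mapping in store_items():
// it returns a plain PHP array instead of filling a Datastore Entity,
// so it runs under standard PHP with SimpleXML (no Quercus or App Engine).
function item_to_properties(SimpleXMLElement $item)
{
    $props = [];

    foreach ($item->children() as $child) {
        $name = $child->getName();

        // pubDate becomes seconds since the Unix epoch (an integer), which
        // sorts chronologically; every other child tag is kept as a string.
        if ($name === "pubDate") {
            $props[$name] = strtotime((string) $child);
        } else {
            $props[$name] = (string) $child;
        }
    }

    return $props;
}

// Hypothetical name for the delay passed to countdownMillis():
// 10 minutes * 60 seconds * 1000 milliseconds.
const RESCHEDULE_DELAY_MS = 600000;

// Tiny inline feed standing in for the fetched RSS data.
$data = <<<XML
<rss version="2.0"><channel>
  <item>
    <title>Hello from Quercus</title>
    <link>http://example.com/1</link>
    <guid>http://example.com/1</guid>
    <pubDate>Fri, 05 Feb 2010 18:50:00 GMT</pubDate>
  </item>
</channel></rss>
XML;

$rss = new SimpleXMLElement($data, LIBXML_NOCDATA);

foreach ($rss->channel->item as $item) {
    // store_items() uses the GUID as the Datastore key name.
    $keyName = (string) $item->guid;
    print_r([$keyName => item_to_properties($item)]);
}
```

In the real script, each array entry corresponds to one $entity->setProperty() call, and the GUID is the key name passed to KeyFactory::createKey().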

Now let’s look at index.php:

<?php

// Java imports
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.DatastoreServiceFactory;

// Load the items from the Datastore
function load_items()
{
  $service = DatastoreServiceFactory::getDatastoreService();

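  // Build the query in two steps: chaining a call directly onto
  // "new Query(...)" without parentheses is a parse error before PHP 8.4
  $query = new Query("item");
  $query->addSort("pubDate");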
  $prepared = $service->prepare($query);

  return $prepared->asIterable();
}

// Format and display the items
function display_items($items)
{
  echo "<ul>\n";

  foreach ($items as $item) {
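    // Escape the stored feed data before writing it into the HTML
    $url   = htmlspecialchars($item->getProperty("link"));
    $title = htmlspecialchars($item->getProperty("title"));

    echo "<li><a href=\"{$url}\">{$title}</a></li>\n";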
  }

  echo "</ul>\n";
}

//
// Main code
//

$items = load_items();
display_items($items);

?>

The index.php file is the other side of the application. It loads and displays the items that rss-task.php stored:

  1. Load the items

    The load_items() function prepares a query for all the items, sorted by the “pubDate” property (in ascending order, the default, so oldest first). This is why we converted the dates to epoch time in rss-task.php. Notice that we return the items as a Java Iterable; we’ll loop over it quite naturally in display_items() below.

  2. Display the items

    The display_items() function simply loops over the $items passed in (which, again, is a Java Iterable, not a PHP array) and displays their data in a list.
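The query’s sort order is the reason for the epoch conversion in rss-task.php. Here is a minimal sketch in standard PHP with no Datastore involved: hypothetical arrays stand in for the entities, usort() plays the role of addSort("pubDate"), and the hypothetical render_items() produces display_items()’s markup with htmlspecialchars() escaping. It illustrates the ordering and the output only; it is not how the Datastore executes queries.

```php
<?php
// Hypothetical stand-ins for the Datastore entities: plain arrays with the
// properties rss-task.php stores, pubDate already in epoch seconds.
$items = [
    ["title" => "Second <post>", "link" => "http://example.com/2", "pubDate" => 1265395800],
    ["title" => "First post",    "link" => "http://example.com/1", "pubDate" => 1265309400],
];

// Emulates addSort("pubDate"): ascending order, oldest first. Integer
// timestamps compare chronologically, whereas raw RSS date strings such as
// "Fri, 05 Feb 2010 ..." would compare alphabetically, weekday name first.
usort($items, function (array $a, array $b) {
    return $a["pubDate"] - $b["pubDate"];
});

// Builds the same markup as display_items(), escaping each value for HTML.
function render_items(array $items)
{
    $html = "<ul>\n";

    foreach ($items as $item) {
        $url   = htmlspecialchars($item["link"], ENT_QUOTES);
        $title = htmlspecialchars($item["title"], ENT_QUOTES);
        $html .= "<li><a href=\"{$url}\">{$title}</a></li>\n";
    }

    return $html . "</ul>\n";
}

echo render_items($items);
```

Escaping matters here because the titles come from an external feed, so a stray "<" in a tweet would otherwise end up as raw markup in the page.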

Conclusion

This application shows how to use two App Engine facilities that are really convenient for PHP developers: a very nice key-value datastore and a task queue. In a real application or framework, you’d want to abstract many of these features further, but you can see the underlying operations here. We’d love to see framework developers incorporate these features, using the App Engine as an alternative store. We’ve used some extended syntax (specifically the Java import) to make the example cleaner and emphasize the use of Java, but you can also access Java objects and classes with completely standard PHP syntax via the java() and java_class() functions, which keeps the code parseable by any PHP interpreter. If any framework developers are interested in trying this out, please let us know!
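As a sketch of that standard syntax, here is the per-item Datastore work from store_items() written with java_class() and java() instead of import. It assumes that java() takes a class name followed by constructor arguments, and that static methods can be called with -> on the object java_class() returns (as reschedule() does with TaskOptions$Builder). The function name store_item_portable() is hypothetical. The file parses under any PHP interpreter, but calling the function requires Quercus running on the App Engine.

```php
<?php
// store_item_portable() is a hypothetical name. java_class() and java() are
// Quercus built-ins, so this parses under standard PHP but only runs inside
// Quercus on the App Engine (the Java Datastore classes must be available).
function store_item_portable($guid, array $properties)
{
    // Class objects whose static methods can be called with ->
    $factory = java_class('com.google.appengine.api.datastore.DatastoreServiceFactory');
    $keys    = java_class('com.google.appengine.api.datastore.KeyFactory');

    $service = $factory->getDatastoreService();
    $key     = $keys->createKey("item", $guid);

    // java() instantiates a Java class: class name, then constructor arguments
    $entity = java('com.google.appengine.api.datastore.Entity', $key);

    foreach ($properties as $name => $value) {
        $entity->setProperty($name, $value);
    }

    // put() stores the entity and returns its Key
    return $service->put($entity);
}
```

The trade-off is verbosity: every Java class is named by its full string, in exchange for a file that other PHP tools can parse and lint.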

P.S.
If you need a review of how to set up a Quercus application in the App Engine, check out this earlier blog post.

Tags: google app engine, google datastore, php, quercus

This entry was posted on Friday, February 5th, 2010 at 6:50 pm and is filed under Engineering.

5 Responses to “Using Google App Engine’s Datastore and Task Queues in PHP with Quercus”

  1. sinojelly Says:
    February 7th, 2010 at 4:28 am

    I’m very interested in WordPress on GAE. Can you help me find wordpress-on-gae-quercus.zip? Thank you very much!
    my email: sinojelly@163.com

  2. Emil Says:
    February 8th, 2010 at 11:52 am

    Hi sinojelly,

    The link for the WordPress zip is in this blog post: http://blog.caucho.com/?p=196

    Best,
    Emil

  3. imaffett Says:
    March 1st, 2010 at 6:30 pm

    Hey Emil,

    I’m really digging the Quercus solution on GAE. I’m looking into using jiql, although writing a code generator for the Java classes wouldn’t be hard (I already do CRUD for PHP). Any advice on looking into that, or do you have something else in the works?

    Additionally, I’m new to this and am wondering what commands you run to set up the tables on your local dev server. Is it just executing the create table commands?

    thanks,
    Ian

  4. dd14 Says:
    April 10th, 2010 at 2:05 am

    I’m trying to use Quercus 4.0.4 with GAE to avoid the Proxy.net disabled error, but now I’m getting this error:

    java.lang.ClassNotFoundException: com.caucho.quercus.servlet.QuercusServlet
    at com.google.appengine.runtime.Request.process-0e96e3ad93a74868(Request.java)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at org.mortbay.util.Loader.loadClass(Loader.java:91)
    at org.mortbay.util.Loader.loadClass(Loader.java:71)
    at org.mortbay.jetty.servlet.Holder.doStart(Holder.java:73)
    at org.mortbay.jetty.servlet.ServletHolder.doStart(ServletHolder.java:242)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:685)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
    at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:191)
    at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:168)
    at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:123)
    at com.google.apphosting.runtime.JavaRuntime.handleRequest(JavaRuntime.java:243)
    at com.google.apphosting.base.RuntimePb$EvaluationRuntime$6.handleBlockingRequest(RuntimePb.java:5485)
    at com.google.apphosting.base.RuntimePb$EvaluationRuntime$6.handleBlockingRequest(RuntimePb.java:5483)
    at com.google.net.rpc.impl.BlockingApplicationHandler.handleRequest(BlockingApplicationHandler.java:24)
    at com.google.net.rpc.impl.RpcUtil.runRpcInApplication(RpcUtil.java:398)
    at com.google.net.rpc.impl.Server$2.run(Server.java:852)
    at com.google.tracing.LocalTraceSpanRunnable.run(LocalTraceSpanRunnable.java:56)
    at com.google.tracing.LocalTraceSpanBuilder.internalContinueSpan(LocalTraceSpanBuilder.java:536)
    at com.google.net.rpc.impl.Server.startRpc(Server.java:807)
    at com.google.net.rpc.impl.Server.processRequest(Server.java:369)
    at com.google.net.rpc.impl.ServerConnection.messageReceived(ServerConnection.java:442)
    at com.google.net.rpc.impl.RpcConnection.parseMessages(RpcConnection.java:319)
    at com.google.net.rpc.impl.RpcConnection.dataReceived(RpcConnection.java:290)
    at com.google.net.async.Connection.handleReadEvent(Connection.java:474)
    at com.google.net.async.EventDispatcher.processNetworkEvents(EventDispatcher.java:831)
    at com.google.net.async.EventDispatcher.internalLoop(EventDispatcher.java:207)
    at com.google.net.async.EventDispatcher.loop(EventDispatcher.java:103)
    at com.google.net.rpc.RpcService.runUntilServerShutdown(RpcService.java:251)
    at com.google.apphosting.runtime.JavaRuntime$RpcRunnable.run(JavaRuntime.java:404)
    at java.lang.Thread.run(Unknown Source)

    I know very little about Java, so I have no idea.
    Any one have any ideas?

  5. yesme Says:
    October 4th, 2010 at 12:31 pm

    Hi Emil,

    Thanks for your posting! I’m planning to try Quercus in my own project to:
    1) gain a possible performance boost (the JVM is pretty resource-hungry, though…)
    2) make use of a Java library (Protocol Buffers), since it doesn’t exist in PHP.

    However, judging from this recent post: http://ocportal.com/site/news/view/chris_grahams_blog/hiphop_php_–_some.htm, the Quercus project seems to have been pretty quiet lately, especially since Facebook launched its HipHop project. The forum (http://forum.caucho.com/forumdisplay.php?f=5) is pretty empty, too…

    Could you kindly share an update on Quercus’s engineering progress and its roadmap? It would be very helpful to outsiders like me who have a strong interest in this project.

    Many thanks,
    Jacky (jacky.chao.wang#gmail)

Copyright (c) 1998-2012 Caucho Technology, Inc. All rights reserved.
caucho®, resin® and quercus® are registered trademarks of Caucho Technology, Inc.
resin® is a cloud optimized, java® application server that supports the java ee webprofile ®