Using Google App Engine’s Datastore and Task Queues in PHP with Quercus
I gave a talk Wednesday at the Silicon Valley Google Technology Users’ Group on using Quercus in the App Engine. One of the examples I gave was using the low-level data API from PHP and scheduling PHP “tasks” using Task Queues. I’ll walk through the source of that demo here to give you an idea of how Quercus makes it easy to mesh a Java platform with PHP code. At the end, I’ll also give you an idea of what the next steps would be to take this demo and use the techniques in a real application or framework.
The demo application grabs items from an RSS feed on a regular basis, stores them persistently, and has a page that displays the stored items. To implement this functionality, we have two PHP scripts:
- index.php which displays the stored items
- rss-task.php which fetches and stores the items
Behind the scenes, both scripts call Java objects and methods to reach the App Engine’s Datastore and task queues. rss-task.php is a worker script that runs entirely as a background “process” in the App Engine task queues. The picture below shows an overview of the architecture:

Let’s look at rss-task.php first:
<?php
// Java imports (Quercus extended syntax)
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.labs.taskqueue.QueueFactory;

// Grab the raw RSS data from a URL
function fetch_rss($url)
{
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
  curl_setopt($ch, CURLOPT_HEADER, 0);            // leave the HTTP headers out of the result
  $data = curl_exec($ch);
  curl_close($ch);
  return $data;
}

// Transform the XML RSS items into Datastore entities and store them
function store_items($items)
{
  $service = DatastoreServiceFactory::getDatastoreService();
  foreach ($items as $item) {
    // Key each entity by the item's GUID so a re-fetched item overwrites itself
    $key = KeyFactory::createKey("item", $item->guid);
    $entity = new Entity($key);
    foreach ($item->children() as $child) {
      $name = $child->getName();
      if ($name == "pubDate") {
        // Store the publish date as epoch seconds so it sorts numerically
        $entity->setProperty($name, strtotime($child));
      }
      else {
        $entity->setProperty($name, strval($child));
      }
    }
    $service->put($entity);
  }
}

// Schedule this task to run again in 10 minutes
function reschedule()
{
  $queue = QueueFactory::getDefaultQueue();
  // TaskOptions.Builder is a nested class, hence the '$' in its name
  $builder =
    java_class('com.google.appengine.api.labs.taskqueue.TaskOptions$Builder');
  // 600000 ms = 10 minutes
  $queue->add($builder->url($_SERVER["SCRIPT_NAME"])
                      ->countdownMillis(600000));
}

//
// Main code
//
// Emil’s Twitter feed
$url = "http://twitter.com/statuses/user_timeline/26025897.rss";
$data = fetch_rss($url);
$rss = new SimpleXmlElement($data, LIBXML_NOCDATA);
store_items($rss->channel->item);
reschedule();
?>
If you look at the main code at the end of the script, you can see the workflow:
- Fetch an RSS feed over HTTP
The fetch_rss() function implements this. Isn’t PHP great for these types of functions?
- Store the RSS items in the Datastore
The store_items() function uses the PHP-Java interface native to Quercus to create the Java objects, such as Keys and Entities, needed to interact with the Java Datastore API. It’s pretty straightforward: we create a key from the RSS item’s GUID, then store the data from each of the item’s child tags as properties on the entity. At the end, we store the entity under the key that we created. One thing to note is that we specially convert the publish date (the pubDate child tag) to an epoch long. That will make it easier to sort the items when we query them later.
- Reschedule this script to run again (in 10 minutes)
reschedule() implements this, again using the PHP-Java interface to get access to the task queue Java API. This API fits PHP scripts well because it is based on calling URLs: the task is simply a request to this script’s own URL.
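To see why the epoch conversion matters, here is a minimal sketch in plain PHP that runs without Quercus or the App Engine. The inline feed, its item values, and the items_to_rows() helper are made up for illustration; only the strtotime() conversion mirrors what store_items() does.

```php
<?php
// Hypothetical two-item feed standing in for the real RSS response.
// Like most feeds, it lists the newest item first.
$xml = <<<XML
<rss version="2.0">
  <channel>
    <item>
      <guid>b-2</guid>
      <title>Newer post</title>
      <pubDate>Fri, 05 Feb 2010 09:00:00 +0000</pubDate>
    </item>
    <item>
      <guid>a-1</guid>
      <title>Older post</title>
      <pubDate>Mon, 01 Feb 2010 09:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>
XML;

// Convert each <item> into a plain array, turning pubDate into epoch
// seconds with strtotime() and every other child tag into a string.
function items_to_rows($rss)
{
    $rows = array();
    foreach ($rss->channel->item as $item) {
        $row = array();
        foreach ($item->children() as $child) {
            $name = $child->getName();
            $row[$name] = ($name == "pubDate")
                ? strtotime((string) $child)
                : (string) $child;
        }
        $rows[] = $row;
    }
    return $rows;
}

$rows = items_to_rows(new SimpleXMLElement($xml, LIBXML_NOCDATA));

// Sorting the raw date strings would put "Fri, 05 Feb" before "Mon, 01 Feb".
// Sorting the integers gives chronological order.
usort($rows, function ($a, $b) { return $a["pubDate"] - $b["pubDate"]; });

foreach ($rows as $row) {
    echo $row["pubDate"], " ", $row["title"], "\n";
}
```

In the demo the Datastore does this ordering for us, which is why the value has to be stored as a number rather than as the original date string.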
Now for index.php:
<?php
// Java imports (Quercus extended syntax)
import com.google.appengine.api.datastore.Query;
import com.google.appengine.api.datastore.DatastoreServiceFactory;

// Load the items from the Datastore
function load_items()
{
  $service = DatastoreServiceFactory::getDatastoreService();
  // Build the query in two statements: "new Query(...)->addSort(...)"
  // is not valid syntax in standard PHP
  $query = new Query("item");
  $query->addSort("pubDate");
  $prepared = $service->prepare($query);
  return $prepared->asIterable();
}

// Format and display the items
function display_items($items)
{
  echo "<ul>\n";
  foreach ($items as $item) {
    $url = $item->getProperty("link");
    $title = $item->getProperty("title");
    echo "<li><a href=\"{$url}\">{$title}</a></li>";
  }
  echo "</ul>\n";
}

//
// Main code
//
$items = load_items();
display_items($items);
?>
The index.php file is the other side of the application - it loads and displays the items that we stored in the rss-task.php script:
- Load the items
The load_items() function prepares a query for all the items, sorted by the “pubDate” property. This is why we converted the dates to epoch time in rss-task.php. Notice that we return the items as a Java Iterable. We’ll loop over this very naturally in display_items() below.
- Display the items
The display_items() function just loops over the $items passed in (which is, again, a Java Iterable, not a PHP array) and displays their data in a list.
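The loop in display_items() only needs something that foreach can walk. As a rough sketch in plain PHP, the variation below uses an ArrayIterator as a stand-in for the Java Iterable and plain arrays as stand-ins for the Entity objects. The render_items() name and the sample item are hypothetical. It also escapes the values with htmlspecialchars(), on the assumption that feed titles and links may contain characters such as & or <.

```php
<?php
// Build the list markup for any collection that works with foreach.
// For this sketch each item is an array with "link" and "title" keys,
// not a Datastore Entity.
function render_items($items)
{
    $html = "<ul>\n";
    foreach ($items as $item) {
        // Escape the values so markup characters in the feed cannot break the page
        $url = htmlspecialchars($item["link"], ENT_QUOTES, "UTF-8");
        $title = htmlspecialchars($item["title"], ENT_QUOTES, "UTF-8");
        $html .= "<li><a href=\"{$url}\">{$title}</a></li>\n";
    }
    return $html . "</ul>\n";
}

// ArrayIterator stands in for the Java Iterable returned by load_items().
$items = new ArrayIterator(array(
    array(
        "link" => "http://example.com/?a=1&b=2",
        "title" => "Tom & Jerry <live>",
    ),
));

echo render_items($items);
```

Returning the markup as a string rather than echoing inside the loop is just a convenience here, since it makes the function easier to check in isolation.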
Conclusion
This application shows how to use some of the App Engine’s facilities that are really convenient for PHP developers: a really nice key-value datastore and a task queue. In a real application or framework, you’d want to abstract a lot of these features further, but you can see the underlying operation here. We’d love to see some framework developers incorporate these features, using the App Engine as an alternative store. We’ve used some extended syntax (specifically the Java import) to make the example cleaner and to emphasize the use of Java. You can also access Java objects and classes using completely valid PHP syntax via the java() and java_class() functions, which would make the code portable. If any framework developers are interested in trying this out, please let us know!
P.S.
If you need a review of how to set up a Quercus application in the App Engine, check out this earlier blog post.
Tags: google app engine, google datastore, php, quercus

February 7th, 2010 at 4:28 am
I’m very interested in WordPress on GAE. Can you help me find wordpress-on-gae-quercus.zip? Thank you very much!
my email: sinojelly@163.com
February 8th, 2010 at 11:52 am
Hi sinojelly,
The link for the WordPress zip is in this blog post: http://blog.caucho.com/?p=196
Best,
Emil
March 1st, 2010 at 6:30 pm
Hey Emil,
I’m really digging the Quercus solution on GAE. I’m looking into using jiql, although writing a code generator for the Java classes wouldn’t be hard (I already do CRUD for PHP). Any advice on looking into that, or do you have something else in the works?
Additionally, I’m new to this and am wondering what commands you run to set up the tables on your local dev server. Is it just executing the create table commands?
thanks,
Ian
April 10th, 2010 at 2:05 am
I’m trying to use Quercus 4.0.4 with GAE to avoid the Proxy.net disabled error, but now I’m getting this error:
java.lang.ClassNotFoundException: com.caucho.quercus.servlet.QuercusServlet
at com.google.appengine.runtime.Request.process-0e96e3ad93a74868(Request.java)
at java.lang.ClassLoader.loadClass(Unknown Source)
at org.mortbay.util.Loader.loadClass(Loader.java:91)
at org.mortbay.util.Loader.loadClass(Loader.java:71)
at org.mortbay.jetty.servlet.Holder.doStart(Holder.java:73)
at org.mortbay.jetty.servlet.ServletHolder.doStart(ServletHolder.java:242)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:685)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.createHandler(AppVersionHandlerMap.java:191)
at com.google.apphosting.runtime.jetty.AppVersionHandlerMap.getHandler(AppVersionHandlerMap.java:168)
at com.google.apphosting.runtime.jetty.JettyServletEngineAdapter.serviceRequest(JettyServletEngineAdapter.java:123)
at com.google.apphosting.runtime.JavaRuntime.handleRequest(JavaRuntime.java:243)
at com.google.apphosting.base.RuntimePb$EvaluationRuntime$6.handleBlockingRequest(RuntimePb.java:5485)
at com.google.apphosting.base.RuntimePb$EvaluationRuntime$6.handleBlockingRequest(RuntimePb.java:5483)
at com.google.net.rpc.impl.BlockingApplicationHandler.handleRequest(BlockingApplicationHandler.java:24)
at com.google.net.rpc.impl.RpcUtil.runRpcInApplication(RpcUtil.java:398)
at com.google.net.rpc.impl.Server$2.run(Server.java:852)
at com.google.tracing.LocalTraceSpanRunnable.run(LocalTraceSpanRunnable.java:56)
at com.google.tracing.LocalTraceSpanBuilder.internalContinueSpan(LocalTraceSpanBuilder.java:536)
at com.google.net.rpc.impl.Server.startRpc(Server.java:807)
at com.google.net.rpc.impl.Server.processRequest(Server.java:369)
at com.google.net.rpc.impl.ServerConnection.messageReceived(ServerConnection.java:442)
at com.google.net.rpc.impl.RpcConnection.parseMessages(RpcConnection.java:319)
at com.google.net.rpc.impl.RpcConnection.dataReceived(RpcConnection.java:290)
at com.google.net.async.Connection.handleReadEvent(Connection.java:474)
at com.google.net.async.EventDispatcher.processNetworkEvents(EventDispatcher.java:831)
at com.google.net.async.EventDispatcher.internalLoop(EventDispatcher.java:207)
at com.google.net.async.EventDispatcher.loop(EventDispatcher.java:103)
at com.google.net.rpc.RpcService.runUntilServerShutdown(RpcService.java:251)
at com.google.apphosting.runtime.JavaRuntime$RpcRunnable.run(JavaRuntime.java:404)
at java.lang.Thread.run(Unknown Source)
I know very little about Java, so I have no idea.
Does anyone have any ideas?
October 4th, 2010 at 12:31 pm
Hi Emil,
Thanks for your posting! I’m planning to try Quercus in my own project to:
1) gain a possible performance boost. (The JVM is pretty resource-hungry, though…)
2) make use of the Java lib (protocol buffer), given that it doesn’t exist in PHP.
However, as I’ve seen from this recent post: http://ocportal.com/site/news/view/chris_grahams_blog/hiphop_php_–_some.htm, the Quercus project seems to be pretty quiet these days, especially after Facebook launched their HipHop project. The forum (http://forum.caucho.com/forumdisplay.php?f=5) is pretty empty, too…
Could you kindly share an update on Quercus’s engineering progress and reveal its roadmap? It would be very helpful to outsiders like me who have a big interest in this project.
Many thanks,
Jacky (jacky.chao.wang#gmail)