For the Resin 4.0 remote deployment, I wanted some key features: incremental updates, transactional updates, reliable replication, manageability, and solid performance. It took several months of experimenting with alternative architectures before the current design came together. At the core of the deployment architecture is the .git source code repository, which provides:
- atomic updates
- a consistent file store: no broken updates
- isolation: multiple writers and readers never see partial updates
- durable storage to the filesystem (and replication)
- incremental updates: only changed files are sent
The key insight in the .git design is using a hash to identify versioned files. The hash design solves the transactional problem by treating files as immortal, immutable, written-in-stone documents.
Hashing as an Immutable Store
In Jorge Luis Borges’s story "The Library of Babel", the Library contains every possible book of a fixed length, 410 pages: roughly 64^(410 * 1024) = 2^(6 * 410 * 1024) ≈ 2^2,500,000 books. Because the Library is an immutable store of every book, there are no versions, updates or editions, just different books. (Finding the right book is a bit tricky, because it’s a little beyond the capabilities of the Dewey decimal system.) In other words, it’s the perfect model for a versioning repository.
Our task is easier because our collection is much smaller: every version of every file in every .war. That is fewer than 1G (about a billion) files in total, which is trivial compared to the Library. As with the Library, the key to the repository is the indexing system. Fortunately, we can use cryptographic hashing to solve our problem.
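How safe is it to let a hash stand in for a file? A rough birthday-bound estimate gives an answer. The estimate makes three assumptions. First, the hash behaves like a random function, which covers accidental collisions but not deliberately constructed ones. Second, it is 160 bits wide, as SHA-1 is. Third, "1G files" means n ≈ 2^30. The chance that any two files share a hash is then about n^2 / 2^161. The class name below is illustrative.

```java
// Birthday-bound estimate: with n files and a b-bit hash that behaves like a
// random function, P(any two files share a hash) is about n^2 / 2^(b + 1).
public class CollisionEstimate {

    /** Returns log2 of the approximate collision probability. */
    static double log2CollisionProbability(double log2Files, int hashBits) {
        // log2(n^2 / 2^(b+1)) = 2*log2(n) - b - 1
        return 2 * log2Files - hashBits - 1;
    }

    public static void main(String[] args) {
        // Assumption: "less than 1G files" taken as n = 2^30; SHA-1 produces 160 bits.
        double log2p = log2CollisionProbability(30, 160);
        System.out.printf("collision probability ~ 2^%.0f = %.1e%n", log2p, Math.pow(2, log2p));
    }
}
```

With n = 2^30 and a 160-bit hash, the estimate comes out at about 2^-101. That is far below any practical concern for accidental collisions.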
Our Library index is the SHA-1 hash of each file and directory (160 bits = 20 bytes). For example, the hash of today’s version of InjectManager.java (version 5149), H(InjectManager.java,5149), is 68d9ec24fc479c7f565455fb1a353f0d3c23ae0d. If I change InjectManager.java, H(InjectManager.java,5150) is 8236d2f4f6b154067b59ccfb19181d9e866881b7. Using the hash, Resin can identify a versioned instance of a file uniquely and use the 20-byte hash as a shorthand for the file.
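Git does not hash the raw file on its own. It hashes a short header, "blob <length>" followed by a NUL byte, and then the contents. Below is a minimal sketch of computing a git blob id in Java. It assumes Resin follows git's object format, which the repository being a real .git repository suggests. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class GitBlobId {
    /** git-style blob id: SHA-1 of "blob <length>\0" followed by the raw file bytes. */
    static String blobId(byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            sha1.update(content);
            return toHex(sha1.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] v1 = "hello\n".getBytes(StandardCharsets.UTF_8);
        byte[] v2 = "hello, world\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(blobId(v1)); // the same bytes always give the same id
        System.out.println(blobId(v2)); // an edit gives a different id
    }
}
```

Two files with identical bytes always get the same id, and any edit, even a single byte, gives a different one. The rest of the design relies on that property.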
Resin stores each file of an updated .war in the .git repository under its 20-byte SHA-1 hash. The file’s name is not stored in the file itself but in its directory, which maps the name to the file’s hash. In other words, the 160-bit hash is a filesystem pointer to the file, just as a Java reference is a 32-bit or 64-bit pointer to memory. Directories are maps from names to 160-bit hashes. Each directory has its own hash, such as H(com/caucho/server/inject,5149), so it can be stored in the repository and linked from other directories, just like an operating system directory structure.
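A directory can be hashed the same way. In git’s format, a tree is a list of entries sorted by name. Each entry is a mode, a name, a NUL byte, and the child’s raw 20-byte id; a subdirectory sorts as if its name ended in '/'. The tree id is the SHA-1 of "tree <length>", a NUL byte, and those entries. The sketch below assumes only two entry kinds: regular files (mode 100644) and subdirectories (mode 40000). The Entry record and method names are illustrative, not Resin’s API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GitTreeId {
    /** One directory entry: a name mapped to a child's 40-hex-digit id. */
    record Entry(String name, String id, boolean isTree) {}

    static String treeId(List<Entry> entries) {
        List<Entry> sorted = new ArrayList<>(entries);
        // git sorts entries by name bytes, comparing a subdirectory as if named "name/"
        sorted.sort((a, b) -> Arrays.compareUnsigned(sortKey(a), sortKey(b)));

        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (Entry e : sorted) {
            String mode = e.isTree() ? "40000" : "100644"; // directory or regular file
            body.writeBytes((mode + " " + e.name() + "\0").getBytes(StandardCharsets.UTF_8));
            body.writeBytes(fromHex(e.id())); // the raw 20-byte "pointer" to the child
        }
        byte[] bytes = body.toByteArray();
        byte[] header = ("tree " + bytes.length + "\0").getBytes(StandardCharsets.US_ASCII);
        return sha1Hex(header, bytes);
    }

    static byte[] sortKey(Entry e) {
        return (e.isTree() ? e.name() + "/" : e.name()).getBytes(StandardCharsets.UTF_8);
    }

    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static String sha1Hex(byte[]... parts) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            for (byte[] part : parts) {
                sha1.update(part);
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : sha1.digest()) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical WEB-INF directory holding one empty web.xml (git's empty-blob id).
        String emptyBlob = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391";
        System.out.println(treeId(List.of(new Entry("web.xml", emptyBlob, false))));
    }
}
```

A tree stores its children’s ids rather than their contents. So changing one file changes the id of that file and of every directory on its path to the root, and nothing else.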
Because the directories form a tree, Resin eventually produces a single 160-bit hash that represents an entire versioned filesystem, i.e. the hash of the whole .war. Resin can then use that hash to identify the whole .war in 20 bytes.
Example: .war versioning
I’ll make these concepts concrete with an example: I upload test.war version 1 to Resin, and then upload test.war version 2. The only difference between the versions is a change to test.jsp. The contents of the .war file might look like the following:
META-INF/MANIFEST.MF
WEB-INF/web.xml
test.jsp
Resin needs to convert the .war into .git files, which means compressing the individual files, creating directories, and hashing everything. When it’s done, my upload has turned into 6 files: 3 blobs containing the file contents and 3 trees containing the directories. The hash of the root directory (3213..5f3c) identifies the versioned .war file:
test.war version 1 .git repository
58e3..003d = BLOB test.jsp
f823..acb8 = BLOB web.xml
9943..4ccd = BLOB MANIFEST.MF
8c10..ad3f = TREE META-INF
2177..57b3 = TREE WEB-INF
3213..5f3c = TREE root
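The conversion can be sketched as a recursive walk. It splits the .war’s paths by their first directory, stores each file as a blob and each directory as a tree, and returns the root tree’s id. This minimal sketch makes three simplifications. It assumes ASCII file names. It keeps objects uncompressed in a map, whereas git zlib-compresses them on disk. The names writeTree and store, and the sample file contents, are hypothetical, so the ids it produces will not match the listing above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WarToGit {
    record TreeEntry(String sortKey, String mode, String name, String id) {}

    /** Stores "type length\0body" under its SHA-1 (uncompressed, for brevity) and returns the id. */
    static String store(String type, byte[] body, Map<String, byte[]> objects) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        raw.writeBytes((type + " " + body.length + "\0").getBytes(StandardCharsets.US_ASCII));
        raw.writeBytes(body);
        byte[] bytes = raw.toByteArray();
        String id = sha1Hex(bytes);
        objects.putIfAbsent(id, bytes); // identical content => identical id, stored once
        return id;
    }

    /** Stores blobs and trees for one directory (paths relative to it) and returns its tree id. */
    static String writeTree(Map<String, byte[]> files, Map<String, byte[]> objects) {
        Map<String, byte[]> blobs = new HashMap<>();
        Map<String, Map<String, byte[]>> dirs = new HashMap<>();
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            String path = e.getKey();
            int slash = path.indexOf('/');
            if (slash < 0) {
                blobs.put(path, e.getValue());
            } else {
                dirs.computeIfAbsent(path.substring(0, slash), k -> new HashMap<>())
                    .put(path.substring(slash + 1), e.getValue());
            }
        }
        // Children are stored before their parent, so a tree never points at a missing object.
        List<TreeEntry> entries = new ArrayList<>();
        blobs.forEach((name, bytes) ->
            entries.add(new TreeEntry(name, "100644", name, store("blob", bytes, objects))));
        dirs.forEach((name, sub) ->
            entries.add(new TreeEntry(name + "/", "40000", name, writeTree(sub, objects))));
        // git order: by name, with a subdirectory compared as "name/" (exact for ASCII names)
        entries.sort((a, b) -> a.sortKey().compareTo(b.sortKey()));

        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (TreeEntry t : entries) {
            body.writeBytes((t.mode() + " " + t.name() + "\0").getBytes(StandardCharsets.UTF_8));
            body.writeBytes(fromHex(t.id())); // raw 20-byte pointer to the child
        }
        return store("tree", body.toByteArray(), objects);
    }

    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    static String sha1Hex(byte[] bytes) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-1").digest(bytes)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> war = new HashMap<>();
        war.put("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n".getBytes(StandardCharsets.UTF_8));
        war.put("WEB-INF/web.xml", "<web-app/>\n".getBytes(StandardCharsets.UTF_8));
        war.put("test.jsp", "version 1\n".getBytes(StandardCharsets.UTF_8));

        Map<String, byte[]> objects = new HashMap<>();
        String root = writeTree(war, objects);
        System.out.println(objects.size() + " objects, root tree " + root);
    }
}
```

With the three sample files, the map ends up holding 3 blobs and 3 trees, matching the shape of the listing. Running writeTree again after changing only test.jsp adds just 2 objects: the new blob and the new root tree.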
When I change test.jsp, two files change. The first is the blob holding test.jsp’s contents. The second is the root tree, because it holds the hash of test.jsp; the META-INF and WEB-INF trees keep their hashes. When uploading the .war, Resin only needs to send the two changed files, not the entire .war. With the changes, the hash identity of test.war becomes a0d4..6cc2.
test.war version 2 .git repository
eca3..a9b3 = BLOB test.jsp (**)
f823..acb8 = BLOB web.xml
9943..4ccd = BLOB MANIFEST.MF
8c10..ad3f = TREE META-INF
2177..57b3 = TREE WEB-INF
a0d4..6cc2 = TREE root (**)
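A tree’s id covers everything beneath it, so finding what to upload does not require comparing every file. A client can walk the new version from its root and skip any subtree whose id the server already has. This is a minimal sketch under two assumptions: the client knows each tree’s child ids and the set of ids the server holds, and children are always stored before their parents. How Resin exchanges that information is not described here. The abbreviated ids are the ones from the listings, and the class name is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class IncrementalUpload {
    /**
     * Collects the objects under root that the server lacks. If the server already has a
     * tree, its whole subtree is skipped: the same id means the same contents all the way down.
     */
    static Set<String> missing(String root, Map<String, List<String>> children, Set<String> onServer) {
        Set<String> toSend = new LinkedHashSet<>();
        collect(root, children, onServer, toSend);
        return toSend;
    }

    private static void collect(String id, Map<String, List<String>> children,
                                Set<String> onServer, Set<String> toSend) {
        if (onServer.contains(id) || !toSend.add(id)) {
            return; // the server has it, or it is already queued
        }
        for (String child : children.getOrDefault(id, List.of())) {
            collect(child, children, onServer, toSend);
        }
    }

    public static void main(String[] args) {
        // Abbreviated ids from the test.war listings; the children map describes version 2.
        Map<String, List<String>> v2 = Map.of(
            "a0d4..6cc2", List.of("8c10..ad3f", "2177..57b3", "eca3..a9b3"), // root
            "8c10..ad3f", List.of("9943..4ccd"),                             // META-INF
            "2177..57b3", List.of("f823..acb8"));                            // WEB-INF
        Set<String> server = Set.of("58e3..003d", "f823..acb8", "9943..4ccd",
                                    "8c10..ad3f", "2177..57b3", "3213..5f3c"); // version 1
        System.out.println(missing("a0d4..6cc2", v2, server)); // only the new root and test.jsp
    }
}
```

Re-running the upload once the server has everything returns an empty set. That is the "duplicate uploads change nothing" behavior.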
The beauty of the system is that the repository can hold all 8 files, the original 6 plus the 2 changed files, at the same time. As long as Resin uses the old .war root hash, the new files are invisible to the rest of the system. The upload can take as long as necessary, and a duplicate upload just sends the same file twice, changing nothing. If the upload is interrupted, the repository is still safe, because each file’s hash validates its contents. At validation time, Resin compares the SHA-1 of the actual content with the file’s SHA-1 name. If they differ, Resin throws the broken file away and asks a Triad server for a clean copy.
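Self-validation amounts to recomputing the id on receipt and comparing it with the name the object arrived under. The sketch below uses git’s blob-id format and models "ask a Triad server for a clean copy" as a hypothetical fetchCleanCopy function. Resin’s actual recovery path may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.function.Function;

public class SelfValidation {
    /** git-style blob id: SHA-1 of "blob <length>\0" followed by the content. */
    static String blobId(byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            sha1.update(content);
            StringBuilder sb = new StringBuilder();
            for (byte b : sha1.digest()) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    /** A received object is trusted only if its content hashes to the name it arrived under. */
    static boolean isValid(String id, byte[] content) {
        return blobId(content).equals(id);
    }

    /**
     * Returns a verified copy: the received bytes if they check out, otherwise one re-fetch
     * through the hypothetical fetchCleanCopy hook (standing in for asking a Triad server).
     */
    static byte[] accept(String id, byte[] received, Function<String, byte[]> fetchCleanCopy) {
        if (isValid(id, received)) {
            return received;
        }
        byte[] retry = fetchCleanCopy.apply(id); // the broken copy is simply dropped
        if (retry != null && isValid(id, retry)) {
            return retry;
        }
        throw new IllegalStateException("no valid copy of " + id);
    }

    public static void main(String[] args) {
        byte[] full = "hello\n".getBytes(StandardCharsets.UTF_8);
        String id = blobId(full);
        byte[] truncated = "hel".getBytes(StandardCharsets.UTF_8); // e.g. an interrupted upload
        byte[] ok = accept(id, truncated, wanted -> full);
        System.out.println(new String(ok, StandardCharsets.UTF_8).equals("hello\n"));
    }
}
```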
Using the .git repository as a write-only store also solves my transactional requirements, as I’ll detail below. Transactions are concerned with changes, and the write-only store eliminates most changes, so the transactional requirements become simple. I’ll go through the ACID requirements one at a time.
Atomic updates matter at two levels: updating an individual file and updating a .war version. For an individual file, Resin can write a temporary file during the upload and then use rename() as an atomic operation on Unix. On Windows, rename doesn’t offer the same atomicity guarantee, so Resin relies on the self-validation of the file instead. Still, a rename is faster than an upload. The atomicity requirements are also less strict, because the file isn’t used until its update is complete.
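In Java, the temporary-file-plus-rename step maps onto Files.move with StandardCopyOption.ATOMIC_MOVE. That call throws AtomicMoveNotSupportedException where the filesystem cannot rename atomically, and the Javadoc leaves it implementation-specific whether an existing target is replaced. Below is a minimal sketch under those assumptions. The method name and the "skip if already present" shortcut are illustrative, not Resin’s code; the shortcut is safe only because file names are content hashes. The sketch does not fsync, so it shows atomicity, not durability.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicObjectWrite {
    /**
     * Writes an object file so other readers see either no file or the complete file:
     * write to a temporary file in the same directory, then rename it into place.
     */
    static Path writeAtomically(Path dir, String id, byte[] bytes) throws IOException {
        Path target = dir.resolve(id);
        if (Files.exists(target)) {
            return target; // content-addressed: a completed file with this name has these bytes
        }
        Path tmp = Files.createTempFile(dir, "upload-", ".tmp"); // same filesystem as target
        try {
            Files.write(tmp, bytes);
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (FileAlreadyExistsException e) {
            // A concurrent writer finished first; its bytes are identical, so keep them.
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("objects");
        Path p = writeAtomically(dir, "example-id", "hello\n".getBytes());
        writeAtomically(dir, "example-id", "hello\n".getBytes()); // duplicate upload: no-op
        System.out.print(Files.readString(p));
    }
}
```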
Updating a .war version means replacing the 20-byte hash of the old version with the 20-byte hash of the new version, a much smaller task than replacing the whole .war file. The replacement only occurs after the new version is uploaded and completely validated. The version switch itself goes through Resin 4.0’s new distributed cache, so part of the version update is deferred to another system. Either way, the .git repository simplifies the .war update by reducing it from replacing a copy of the .war to a 20-byte update.
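The version switch can be sketched as a compare-and-swap of a single id. This is a single-JVM illustration under stated assumptions. The article says Resin propagates the switch through its distributed cache, which is not modeled here. The isComplete check is a hypothetical stand-in for "uploaded and completely validated". A synchronized method is used instead of AtomicReference.compareAndSet, because the latter compares String references rather than contents.

```java
import java.util.Set;
import java.util.function.Predicate;

public class VersionSwitch {
    private volatile String currentRoot; // id of the live .war's root tree

    VersionSwitch(String initialRoot) {
        this.currentRoot = initialRoot;
    }

    /** Readers only ever see a root id that was complete when it was published. */
    String current() {
        return currentRoot;
    }

    /**
     * Switches from expectedRoot to newRoot, but only after newRoot passes the (hypothetical)
     * completeness check, and only if no other deploy switched the version in the meantime.
     */
    synchronized boolean publish(String expectedRoot, String newRoot, Predicate<String> isComplete) {
        if (!isComplete.test(newRoot)) {
            throw new IllegalStateException("version " + newRoot + " is not fully uploaded");
        }
        if (!currentRoot.equals(expectedRoot)) {
            return false; // lost a race with another deploy; the caller can re-read and retry
        }
        currentRoot = newRoot; // the whole version update is this one small write
        return true;
    }

    public static void main(String[] args) {
        VersionSwitch war = new VersionSwitch("3213..5f3c");
        Set<String> stored = Set.of("3213..5f3c", "a0d4..6cc2"); // abbreviated ids from the listings
        System.out.println(war.publish(war.current(), "a0d4..6cc2", stored::contains));
        System.out.println(war.current());
    }
}
```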
The consistency of the .git repository is maintained by the self-validation of the blobs and trees against their SHA-1 hashes, and by the write-only storage. It’s always okay to write an extra blob, even a useless or out-of-date file, because a file has no effect unless some tree links to its SHA-1 hash. In other words, Resin never needs to roll back: it can send obsolete files to the store and leave them there. As long as they’re not linked, they only take up space, and disk space is cheap.
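One way to see why extra files are harmless: only objects reachable from the live root are part of any version. The sketch below computes that set and lists what falls outside it. It is illustrative only. The article does not say whether or how Resin cleans up unlinked files, and "dead..beef" is a made-up stray object.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class Reachability {
    /** Every object reachable from root by following tree links. */
    static Set<String> reachable(String root, Map<String, List<String>> children) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            String id = stack.pop();
            if (seen.add(id)) {
                for (String child : children.getOrDefault(id, List.of())) {
                    stack.push(child);
                }
            }
        }
        return seen;
    }

    /** Stored objects the live version never links to: harmless, they only occupy disk. */
    static Set<String> unlinked(Set<String> stored, String liveRoot, Map<String, List<String>> children) {
        Set<String> extra = new TreeSet<>(stored);
        extra.removeAll(reachable(liveRoot, children));
        return extra;
    }

    public static void main(String[] args) {
        Map<String, List<String>> children = Map.of(
            "3213..5f3c", List.of("8c10..ad3f", "2177..57b3", "58e3..003d"), // version 1 root
            "a0d4..6cc2", List.of("8c10..ad3f", "2177..57b3", "eca3..a9b3"), // version 2 root
            "8c10..ad3f", List.of("9943..4ccd"),
            "2177..57b3", List.of("f823..acb8"));
        Set<String> stored = Set.of("58e3..003d", "eca3..a9b3", "f823..acb8", "9943..4ccd",
            "8c10..ad3f", "2177..57b3", "3213..5f3c", "a0d4..6cc2",
            "dead..beef"); // hypothetical stray blob from an abandoned upload
        System.out.println(unlinked(stored, "a0d4..6cc2", children));
    }
}
```

With version 2 live, the old root, the old test.jsp blob and the stray object are all unlinked. Nothing that reads from the live root can ever reach them.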
The write-only, self-identifying nature of the files simplifies the isolation problem, because a blob or tree with a given hash always has the same bytes and the same name. Duplicate updates are safe. Early updates are safe. A reader or a second writer doesn’t even see the changes until the final root hash changes. Until then, each writer is just writing unlinked files to the disk, affecting no one else. Even a simultaneous update of the same file is safe, because both writers write identical bytes.
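Idempotent, content-addressed writes can be sketched with a concurrent map keyed by hash. putIfAbsent lets any number of writers store the same bytes, and the outcome is one object either way. For brevity, this hashes the raw bytes with plain SHA-1 rather than git’s "blob <length>" header format. The class is hypothetical, not Resin’s store.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class IdempotentStore {
    private final ConcurrentMap<String, byte[]> objects = new ConcurrentHashMap<>();

    /** Stores content under its own hash. Repeating the call, from any thread, changes nothing. */
    String put(byte[] content) {
        String id = sha1Hex(content);
        objects.putIfAbsent(id, content.clone()); // same id => same bytes, so the first writer wins harmlessly
        return id;
    }

    int size() {
        return objects.size();
    }

    /** Runs several writers that upload the same bytes at the same time. */
    static IdempotentStore writeConcurrently(byte[] content, int writers) throws InterruptedException {
        IdempotentStore store = new IdempotentStore();
        Thread[] threads = new Thread[writers];
        for (int i = 0; i < writers; i++) {
            threads[i] = new Thread(() -> store.put(content));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return store;
    }

    static String sha1Hex(byte[] bytes) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : MessageDigest.getInstance("SHA-1").digest(bytes)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        IdempotentStore store = writeConcurrently("same test.jsp bytes".getBytes(), 8);
        System.out.println(store.size() + " object stored by 8 writers");
    }
}
```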
For durability, the child blobs and trees in the .git repository are saved to disk and validated before their parent trees are written. The entire .war is saved and validated before the root hash changes. In a cluster, the entire .war is replicated to the three Triad servers before the final version switch.
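The ordering rule is a post-order walk of the tree: children first, root last. The replication step is modeled with hypothetical predicates standing in for the three Triad servers, since the article does not describe Resin’s replication protocol. In this sketch, the version switch is allowed only when every replica has acknowledged every object. The abbreviated ids are from the version 1 listing.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

public class WriteOrder {
    /** Post-order walk: every object is listed after all of its children, the root last. */
    static List<String> writeOrder(String root, Map<String, List<String>> children) {
        List<String> order = new ArrayList<>();
        visit(root, children, new HashSet<>(), order);
        return order;
    }

    private static void visit(String id, Map<String, List<String>> children,
                              Set<String> seen, List<String> order) {
        if (!seen.add(id)) {
            return;
        }
        for (String child : children.getOrDefault(id, List.of())) {
            visit(child, children, seen, order);
        }
        order.add(id); // a parent is written only after its children
    }

    /**
     * Hypothetical replication step: each Triad replica is modeled as a predicate that
     * returns true once it has stored and validated an object. The switch is allowed only
     * if all replicas acknowledged every object.
     */
    static boolean replicateAll(List<String> order, List<Predicate<String>> triad) {
        for (String id : order) {
            for (Predicate<String> replica : triad) {
                if (!replica.test(id)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> v1 = Map.of(
            "3213..5f3c", List.of("8c10..ad3f", "2177..57b3", "58e3..003d"),
            "8c10..ad3f", List.of("9943..4ccd"),
            "2177..57b3", List.of("f823..acb8"));
        List<String> order = writeOrder("3213..5f3c", v1);
        System.out.println(order);
        List<Predicate<String>> triad = List.of(id -> true, id -> true, id -> true);
        System.out.println(replicateAll(order, triad) ? "switch to 3213..5f3c" : "keep old version");
    }
}
```

The root id is written last, so an interruption part-way through leaves only unlinked files behind. It never leaves a tree pointing at a missing child.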
The current Resin 4.0 snapshot implements the .git repository as the underlying deployment store for both single-server and clustered deployments. When you use Emil’s Eclipse plugin or the Ant/Maven tasks, the .war updates deploy to the internal .git repository following the architecture I’ve described. So you automatically gain the benefits of the reliable, transactional store.
Because the deployment repository is an actual .git repository, you can explore it in the resin-data/.git directory. You will need to use git tools, because the SHA-1 file names make the repository impenetrable to ordinary filesystem browsing.