Real-Time Web Frame Encoding
For the real-time web, the binary frames need to encode four pieces of data: the frame length, whether the frame is final or non-final, whether it is a control or data frame, and the binary payload itself.
As a review, the entire framing protocol looks like the following:
stream ::= message*
message ::= (non-final-frame)* final-frame
When we split frames into control and data frames, the full protocol looks like:
stream ::= message*
message ::= data-message
message ::= control-message
data-message ::= (non-final-data-frame)* final-data-frame
control-message ::= (non-final-control-frame)* final-control-frame
The control/data interleaving needs to be revisited. It might be that control frames must always be a single frame, but they can appear anywhere. So an alternative would be:
stream ::= message*
message ::= (non-final-data-frame control-frame*)* final-data-frame
message ::= control-frame
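To make the alternative grammar concrete, here is a minimal Python sketch of how a receiver might group frames into messages. The `Frame` tuple and `assemble_messages` function are hypothetical names invented for this illustration, and delivering interleaved control frames immediately (rather than holding them until the data message completes) is an assumption, not something the grammar requires.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Frame(NamedTuple):
    is_control: bool   # True for a control frame, False for a data frame
    more: bool         # True for a non-final frame, False for a final frame
    payload: bytes


def assemble_messages(frames: Iterable[Frame]) -> Iterator[Tuple[str, bytes]]:
    """Yield ("control", payload) or ("data", payload) messages.

    Follows the alternative grammar: a control frame is always a single
    frame and may appear between the frames of a data message.
    """
    pending = bytearray()      # data payload collected so far
    in_data_message = False

    for frame in frames:
        if frame.is_control:
            if frame.more:
                raise ValueError("control frames must be a single final frame")
            # Assumption: hand control frames to the caller right away.
            yield ("control", frame.payload)
            continue

        pending += frame.payload
        in_data_message = True
        if not frame.more:
            yield ("data", bytes(pending))
            pending.clear()
            in_data_message = False

    if in_data_message:
        raise ValueError("stream ended inside an unfinished data message")
```

With this sketch, a keepalive that arrives in the middle of a large data message is seen by the application before the data message finishes, which is the main reason to allow control frames anywhere.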
For the length, there are two basic strategies: fixed length and variable length. The RTW has a wide range of message lengths: from large video chunks, which could be around a megabyte, to small twitch commands for a fast-paced game like Quake, which would be 32-byte or 64-byte messages. So a variable length is the better choice, if slightly more complicated.
A clean variable-length encoding is used by UTF-8, one of my favorite small protocols. The initial bits of the first byte tell the length of the encoded character. So let’s start with that. If we use 4 bits for the count, we can have 0-15 length bytes, which is more than enough, since we’d probably want to restrict the max length to 31 or 32 bits anyway, and that fits in 4 length bytes.
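As a reminder of how the UTF-8 trick works, the following Python sketch reads the sequence length from the leading bits of the first byte. The function name `utf8_sequence_length` is made up for this example, and it only inspects the bit pattern; it does not reject lead bytes that are otherwise invalid in UTF-8.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return the total byte length of a UTF-8 sequence from its first byte."""
    if first_byte & 0b1000_0000 == 0b0000_0000:   # 0xxxxxxx: single byte (ASCII)
        return 1
    if first_byte & 0b1110_0000 == 0b1100_0000:   # 110xxxxx: two bytes
        return 2
    if first_byte & 0b1111_0000 == 0b1110_0000:   # 1110xxxx: three bytes
        return 3
    if first_byte & 0b1111_1000 == 0b1111_0000:   # 11110xxx: four bytes
        return 4
    # 10xxxxxx is a continuation byte, never the start of a sequence.
    raise ValueError(f"not a UTF-8 leading byte: {first_byte:#04x}")


# The first byte alone predicts the encoded length of each character.
for ch in ("A", "é", "€"):
    encoded = ch.encode("utf-8")
    assert utf8_sequence_length(encoded[0]) == len(encoded)
```

The frame format borrows the same idea: a few bits at the front say how many bytes follow before the real content starts.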
Encoding
Our encoding, then, looks like the following:
frame ::= control-byte length{0,15} binary-data
The control-byte looks like the following:
+-----+------+---+---+---+---+---+---+
| ctl | more | reser |  meta-length  |
+-----+------+---+---+---+---+---+---+
ctl identifies the frame as a control frame or a data frame. If it’s 1, it’s a control frame; if it’s 0, it’s a data frame.
more is the continuation bit. If it’s 1, the frame is a non-final frame; if it’s 0, it’s a final frame.
reser is a pair of reserved bits, currently unused.
meta-length is the number of bytes in the frame length, allowing for 0-15 length bytes.
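The field layout can be sketched as a pair of pack/unpack helpers in Python. This assumes the diagram reads most-significant bit first (so ctl is bit 7 and meta-length is the low nibble), which agrees with the x42 and x81 bytes used in the examples. The constant and function names are hypothetical, and writing the reserved bits as zero is also an assumption.

```python
CTL_BIT = 0x80           # bit 7: 1 = control frame, 0 = data frame
MORE_BIT = 0x40          # bit 6: 1 = non-final frame, 0 = final frame
RESERVED_MASK = 0x30     # bits 5-4: reserved, written as zero here
META_LENGTH_MASK = 0x0F  # bits 3-0: number of length bytes (0-15)


def pack_control_byte(is_control: bool, more: bool, meta_length: int) -> int:
    """Combine the three fields into one control byte."""
    if not 0 <= meta_length <= 15:
        raise ValueError("meta-length must fit in 4 bits")
    value = meta_length
    if is_control:
        value |= CTL_BIT
    if more:
        value |= MORE_BIT
    return value


def unpack_control_byte(value: int) -> tuple:
    """Return (is_control, more, meta_length); reserved bits are ignored."""
    return (
        bool(value & CTL_BIT),
        bool(value & MORE_BIT),
        value & META_LENGTH_MASK,
    )


# A non-final data frame with two length bytes, and a final control
# frame with one length byte.
assert pack_control_byte(False, True, 2) == 0x42
assert pack_control_byte(True, False, 1) == 0x81
```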
Examples
Single, 64-byte data packet that an interactive game like Quake would use:
x01 x40 <64 bytes of data>
Large 256k final video chunk using sendfile/splice for fast kernel-level data transfer:
x03 x04 x00 x00 <256k bytes of data>
Non-final chunk for Java serialized data which exceeds an 8k server buffer:
x42 x20 x00 <8k bytes of partial serialization>
Control “NOP” message for heartbeat/keepalive:
x81 x01 x00
Control “PING-REQUEST” for keepalive:
x81 x01 x01
Control “PING-RESPONSE” for keepalive:
x81 x01 x02
Control “CLOSE” to gracefully close the connection:
x81 x01 x7f
Control “NEGOTIATE” for parameter negotiation:
x81 x15 x05 "Keepalive-Period: 60"
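Putting the pieces together, a minimal Python sketch of an encoder and decoder for one frame might look like this. It assumes the length bytes are big-endian (as the x20 x00 encoding of 8k suggests), that the encoder picks the fewest length bytes that fit, and that a meta-length of 0 means an empty payload. The function names `encode_frame` and `decode_frame` are hypothetical.

```python
def encode_frame(payload: bytes, is_control: bool = False, more: bool = False) -> bytes:
    """Build control-byte + big-endian length bytes + payload."""
    # Fewest bytes that can hold len(payload); zero bytes for an empty payload.
    meta_length = (len(payload).bit_length() + 7) // 8
    control = meta_length | (0x80 if is_control else 0) | (0x40 if more else 0)
    return bytes([control]) + len(payload).to_bytes(meta_length, "big") + payload


def decode_frame(buffer: bytes) -> tuple:
    """Decode one frame from the start of buffer.

    Returns (is_control, more, payload, bytes_consumed).
    """
    if not buffer:
        raise ValueError("empty buffer")
    control = buffer[0]
    meta_length = control & 0x0F
    header_end = 1 + meta_length
    if len(buffer) < header_end:
        raise ValueError("truncated length field")
    length = int.from_bytes(buffer[1:header_end], "big")
    frame_end = header_end + length
    if len(buffer) < frame_end:
        raise ValueError("truncated payload")
    payload = buffer[header_end:frame_end]
    return bool(control & 0x80), bool(control & 0x40), payload, frame_end


# The 8k non-final data chunk and the control NOP from the examples.
chunk = bytes(8192)
assert encode_frame(chunk, more=True)[:3] == bytes([0x42, 0x20, 0x00])
assert encode_frame(b"\x00", is_control=True) == bytes([0x81, 0x01, 0x00])
assert decode_frame(bytes([0x81, 0x01, 0x02])) == (True, False, b"\x02", 3)
```

Returning the number of bytes consumed lets a caller decode back-to-back frames from one receive buffer by slicing off each frame as it is parsed.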
