I’ve put together draft-ferg-hybi-websockets-latest as a proposal for the HyBi baseline draft and as a counterproposal to the hixie draft.
The WebSockets protocol needs the concept of a sub-protocol to make sure the client and server are sending messages they both understand. A Quake client, for example, can only talk to another Quake client, not a chat client, and a quake/3.1 client might not be able to talk to a quake/5.3 client. To make sure the clients and servers are talking the same protocol, WebSockets introduces sub-protocol validation.
Although “sub-protocol” might sound somewhat complicated, it’s just recognition that applications will define simple protocols on top of WebSockets, the same way they define formats and schemas on top of XML syntax, or objects to pass back and forth on top of JSON. Like XML and JSON, WebSockets is a layer that applications build on.
Some examples that WebSockets applications will create are JSON packets over WebSockets, XML over WebSockets, XMPP over WebSockets, and Hessian packets over WebSockets, as well as custom protocols like Quake or a tic-tac-toe game.
The client and server will validate the protocol to make sure a Quake/2.0 client won’t get confused talking to a Quake/1.0 server. At the start of a WebSockets connection, the client’s HTTP handshake sends a Sec-WebSocket-Protocol header with the sub-protocol name, like quake.idsoftware.com/1.0. If the server understands that version, it will respond with a Sec-WebSocket-Protocol of quake.idsoftware.com/1.0. If not, it will close the connection.
Although the protocol string is arbitrary, it’s a good idea to use unique names like “quake.idsoftware.com” with a version “/1.0”.
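To make the validation step concrete, here is a minimal Python sketch of the server side of that check. The function names (parse_headers, negotiate_subprotocol) and the SUPPORTED set are hypothetical, invented for illustration, and the sketch only covers the Sec-WebSocket-Protocol comparison described above, not the rest of the handshake.

```python
def parse_headers(raw: str) -> dict:
    """Parse 'Name: value' header lines into a dict with lower-cased names."""
    headers = {}
    for line in raw.split("\r\n"):
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def negotiate_subprotocol(request_headers: dict, supported: set):
    """Return the response header to echo back, or None if the server
    does not understand the requested sub-protocol and should close."""
    requested = request_headers.get("sec-websocket-protocol")
    if requested in supported:
        return {"Sec-WebSocket-Protocol": requested}
    return None


# Hypothetical server that only understands version 1.0 of the Quake sub-protocol.
SUPPORTED = {"quake.idsoftware.com/1.0"}

request = parse_headers(
    "Upgrade: WebSocket\r\n"
    "Sec-WebSocket-Protocol: quake.idsoftware.com/1.0\r\n"
)
accepted = negotiate_subprotocol(request, SUPPORTED)
rejected = negotiate_subprotocol(
    {"sec-websocket-protocol": "quake.idsoftware.com/2.0"}, SUPPORTED
)
```

Here accepted holds the header the server would echo back, and rejected is None, which is the signal to close the connection.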
I’ve also seen the term “real-time web” (RTW), which I like as well, but here I want to dive a bit into the underlying bytes that go back and forth to make the real-time web happen.
The binary “message” at the heart of the real-time web is a sequence of bytes controlled by the application: JMS-style messages, XMPP (Jabber) frames, a JSON object, serialized Java in Hessian, the packets for a Quake game, stock ticker updates, iPhone app messages, toll booth status control panel, on-demand music streaming, auto-manufacturing overview consoles.
Because the messages can vary in length, from tiny, fast Quake messages where response time is critical to larger payloads like music and video streams, the underlying protocol must handle that range but still be memory-efficient. It would be absurd to force a server to buffer an entire video before sending it, or even to fully serialize an XML message just to find its length.
So a sane protocol needs binary-length encoded chunks (called “frames”) combined into messages. “Messages” are understood by the application, “frames” are invisible to the application but are used by clients, servers, and intermediaries to manage the messages.
Bringing those requirements together, the minimal protocol looks like the following:
stream ::= message*
message ::= (non-final-frame)* final-frame
final-frame ::= final-flag length <bytes>
non-final-frame ::= non-final-flag length <bytes>
At the moment, that’s a bit abstract since I haven’t defined the encodings for the length or the final-flag or allowed any kind of control messages, but it’s the heart of the protocol.
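To show how little machinery the grammar needs, here is a rough Python reader for it. Since the encodings are left undefined here, the sketch has to invent them: a one-byte flag (0x00 for non-final, 0x01 for final) followed by a two-byte big-endian length. Those choices, and the names read_exact and read_message, are assumptions for illustration only and not the encoding from any draft.

```python
import io
import struct

FINAL = 0x01      # assumed encoding of final-flag
NON_FINAL = 0x00  # assumed encoding of non-final-flag


def read_exact(stream, n):
    """Read exactly n bytes or fail if the stream ends early."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended inside a frame")
    return data


def read_message(stream):
    """Read one message: (non-final-frame)* final-frame.

    Returns the joined payload, or None at a clean end of stream.
    """
    payload = bytearray()
    first_frame = True
    while True:
        header = stream.read(1)
        if not header:
            if first_frame:
                return None  # no more messages
            raise EOFError("stream ended before the final frame")
        first_frame = False
        flag = header[0]
        if flag not in (FINAL, NON_FINAL):
            raise ValueError(f"unknown frame flag: {flag:#x}")
        # Assumed length encoding: 16-bit unsigned, big-endian.
        (length,) = struct.unpack(">H", read_exact(stream, 2))
        payload += read_exact(stream, length)
        if flag == FINAL:
            return bytes(payload)


# Two messages: "abc" + "de" split across two frames, then an empty message.
wire = bytes([NON_FINAL, 0, 3]) + b"abc" + bytes([FINAL, 0, 2]) + b"de"
wire += bytes([FINAL, 0, 0])
incoming = io.BytesIO(wire)
first = read_message(incoming)
second = read_message(incoming)
end = read_message(incoming)
```

The reader joins frames into one bytes object only to keep the example short. A server that cares about memory could hand each frame's bytes to the application as they arrive instead.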
The key is sending chunks of binary data, where servers can use their own fixed buffering (like 8k buffers) to send arbitrary-length binary data.
Any text data is easily encoded as UTF-8 over the binary payload.
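The sending side can be sketched in Python along the same lines. The flag and length encodings below (a one-byte flag, 0x00 non-final and 0x01 final, then a two-byte big-endian length) are assumptions made up for this example, as is the write_message name. The point is only that the sender never needs to know the total message length up front.

```python
import io
import struct

FINAL = 0x01      # assumed encoding of final-flag
NON_FINAL = 0x00  # assumed encoding of non-final-flag
BUFFER_SIZE = 8 * 1024  # fixed 8k buffer, as in the text


def write_message(source, sink, buffer_size=BUFFER_SIZE):
    """Copy everything in source to sink as one message.

    Holds at most two buffers at a time: the chunk being sent and one
    chunk of lookahead, which is how it knows which frame is final.
    """
    if not 0 < buffer_size <= 0xFFFF:
        raise ValueError("buffer_size must fit the assumed 16-bit length")
    chunk = source.read(buffer_size)
    while True:
        next_chunk = source.read(buffer_size)
        flag = NON_FINAL if next_chunk else FINAL
        sink.write(struct.pack(">BH", flag, len(chunk)) + chunk)
        if not next_chunk:
            return
        chunk = next_chunk


# Text rides on the binary payload as UTF-8. A tiny buffer forces two frames.
text = "h\u00e9llo"  # 6 bytes in UTF-8
outgoing = io.BytesIO()
write_message(io.BytesIO(text.encode("utf-8")), outgoing, buffer_size=4)
framed = outgoing.getvalue()
```

One design note: a frame boundary can land in the middle of a multi-byte UTF-8 character, so a receiver should reassemble the message bytes before decoding them as text.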