The OpenTok Platform Collaborative Editor

We always want to share as much as possible with our community, so today we’re sharing a description of how we developed the opentok-editor collaborative editor using ot.js and CodeMirror. You can see the editor in action, and you can see how to use it for yourself on the opentok-editor GitHub page. We love to see people using our open source projects, so please feel free to file issues and contribute pull requests to this project on GitHub.


opentok-editor uses CodeMirror, which is a really nice open source editor written for the browser in JavaScript. It’s intended for editing code and has a bunch of language modes and different editor bindings (e.g. sublime, emacs, vi). ot.js already had a CodeMirrorAdapter implemented to use CodeMirror with ot.js. However, if you wanted to use a different editor, it would just be a matter of writing your own adapter.

Operational Transformation

opentok-editor uses ot.js, which implements an algorithm called Operational Transformation. This is the same technology used in Etherpad, Google Wave and Google Docs to enable collaborative editing of documents. ot.js is actually the same library used by Firebase to build their collaborative editor, Firepad.

The way operational transformation works is actually quite straightforward: it breaks down edits to a document into a series of operations. We assume that a simple text document is just a sequence of characters with a 0-based index, a String. The only operations you can perform are insertions and deletions. Simplifying a little, ot.js represents each operation as an array whose first element is the index and whose second element is either the characters inserted, if it’s an insert, or the negative number of characters to delete, if it’s a deletion. For example:

  • Start with the String “I do not like eggs and ham”.

  • Participant Joe inserts “ green” at position 13 ([13, " green"]), which gives him “I do not like green eggs and ham”.

  • At the same time participant Sam deletes the 7 characters before position 8, “ do not” ([8, -7]), which gives him “I like eggs and ham”.

Each of these operations is then sent to each of the other participants along with a revision number. When a participant receives an operation, they transform it against the operations they have made since that revision. So for example:

  • Participant Sam has the string “I like eggs and ham”:

    • he receives Joe’s operation, [13, " green"]

    • he transforms this against his own change, [8, -7], moving the insert index back by the 7 characters he deleted, which gives him [6, " green"].

    • performing the operation [6, " green"] on his string, “I like eggs and ham”, gives him “I like green eggs and ham”.

  • Participant Joe has the string “I do not like green eggs and ham”:

    • he receives Sam’s operation, [8, -7]

    • he transforms this against his own change, [13, " green"], and doesn’t need to adjust anything: his insert comes after the deleted text in the document, so the deletion’s index is still correct.

    • performing [8, -7] on his string, “I do not like green eggs and ham”, gives him “I like green eggs and ham”.

Both participants now have the same result, “I like green eggs and ham”. The great thing about operational transformation is that it doesn’t matter what the latency of the connection is or how many operations there are; everyone should come to the same result once they are at the same revision number.
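To make the walkthrough above concrete, here is a minimal sketch of the same exchange in JavaScript. It does not use ot.js: the operation objects and the `applyOp` and `transform` helpers are hypothetical, written only for this example, and they handle just the non-overlapping insert-versus-delete cases. Sam’s deletion is written with its start index, 1, which removes the same seven characters, “ do not”.

```javascript
// Simplified single-edit operations (hypothetical shapes, not ot.js's own format):
//   { type: 'insert', pos, text }    insert `text` before index `pos`
//   { type: 'delete', pos, count }   delete `count` characters starting at `pos`

function applyOp(doc, op) {
  if (op.type === 'insert') {
    return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
  }
  return doc.slice(0, op.pos) + doc.slice(op.pos + op.count);
}

// Rewrite `op` so it can be applied to a document that already contains the
// concurrent edit `other`. Only the insert-versus-delete cases from the example
// are covered; overlapping edits throw instead of guessing.
function transform(op, other) {
  if (op.type === 'insert' && other.type === 'delete') {
    if (op.pos <= other.pos) return op; // insert is before the deleted text
    if (op.pos >= other.pos + other.count) {
      return { ...op, pos: op.pos - other.count }; // shift left past the deletion
    }
  }
  if (op.type === 'delete' && other.type === 'insert') {
    if (other.pos >= op.pos + op.count) return op; // insert is after the deleted text
    if (other.pos <= op.pos) {
      return { ...op, pos: op.pos + other.text.length }; // shift right past the insert
    }
  }
  throw new Error('case not covered by this sketch');
}

const base = 'I do not like eggs and ham';
const joeOp = { type: 'insert', pos: 13, text: ' green' };
const samOp = { type: 'delete', pos: 1, count: 7 };

// Each participant applies his own edit first...
const joeDoc = applyOp(base, joeOp); // "I do not like green eggs and ham"
const samDoc = applyOp(base, samOp); // "I like eggs and ham"

// ...then applies the other's edit after transforming it against his own.
const samFinal = applyOp(samDoc, transform(joeOp, samOp));
const joeFinal = applyOp(joeDoc, transform(samOp, joeOp));

console.log(samFinal); // "I like green eggs and ham"
console.log(joeFinal); // "I like green eggs and ham"
```

A full implementation also has to deal with overlapping deletions and inserts at the same position, which is exactly the work ot.js takes care of.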


Making Operational Transformation Peer-to-peer using OpenTok Signals

The great thing about ot.js is that it handles all of the operational transformations for us. ot.js already has adapters for Socket.IO and for using Ajax to send messages between clients, so all we had to do was create a new OpenTokAdapter that sends the operations using OpenTok signals.
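As a rough illustration of what such an adapter involves, here is a minimal sketch; it is not the opentok-editor source. It assumes a `session` object with the OpenTok.js `on` and `signal` methods and a `connection.connectionId` property. The signal type `opentok-editor-operation`, the class name `SignalAdapter`, and the `ack` and `operation` callback names (modeled on the other ot.js adapters) are assumptions made for this example.

```javascript
const OPERATION_SIGNAL = 'opentok-editor-operation'; // hypothetical signal type

class SignalAdapter {
  constructor(session) {
    this.session = session;
    this.callbacks = {};
    session.on('signal:' + OPERATION_SIGNAL, (event) => this.onSignal(event));
  }

  // The editor client registers the functions it wants called on incoming events.
  registerCallbacks(callbacks) {
    this.callbacks = callbacks;
  }

  // Signal data travels as a string, so the operation is serialized to JSON.
  sendOperation(revision, operation) {
    this.session.signal({
      type: OPERATION_SIGNAL,
      data: JSON.stringify({ revision, operation }),
    });
  }

  onSignal(event) {
    const own = this.session.connection && event.from &&
      event.from.connectionId === this.session.connection.connectionId;
    if (own) {
      // Our own signal came back to us: treat it as the acknowledgement.
      this.trigger('ack');
      return;
    }
    const { revision, operation } = JSON.parse(event.data);
    this.trigger('operation', revision, operation);
  }

  trigger(name, ...args) {
    const callback = this.callbacks[name];
    if (callback) callback(...args);
  }
}
```

Passing the revision along with each incoming operation is a choice made for this sketch, since every peer has to transform operations itself rather than rely on a server.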

One caveat of using OpenTok signals to send these messages is that there is no server-side authority on what the state of the document is when you first load the page. To get around this, we rely on our peers in the room to tell us what the state is. When we first join the session and load the opentok-editor, we request the current state of the document by sending out an opentok-editor-request-doc signal to everyone in the room. We then wait up to 10 seconds for someone to respond with an opentok-editor-doc signal that contains the whole contents of the document along with the revision number. If no one responds, we assume that the document is empty and initialise it at revision 0 with no contents. We then send out an opentok-editor-doc signal to anyone else in the room that might be waiting for the document to be initialised.
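The request-and-wait handshake described above could be sketched roughly like this. The two signal types come from the description; everything else is an assumption for illustration: the `requestDocument` function, the `{ revision, str }` payload shape, and the injectable timer functions (there only so the timeout can be exercised without waiting 10 seconds). `session` is expected to offer the OpenTok.js `on`, `off` and `signal` methods.

```javascript
const REQUEST_SIGNAL = 'opentok-editor-request-doc';
const DOC_SIGNAL = 'opentok-editor-doc';

// Ask the peers for the current document and call onReady(doc) exactly once.
// The payload shape { revision, str } is hypothetical.
function requestDocument(session, onReady, options = {}) {
  const {
    timeoutMs = 10000,
    setTimer = setTimeout,
    clearTimer = clearTimeout,
  } = options;
  let done = false;

  function finish(doc, fromPeer) {
    if (done) return;
    done = true;
    clearTimer(timer);
    session.off('signal:' + DOC_SIGNAL, onDoc);
    if (!fromPeer) {
      // Nobody answered: announce the empty document to anyone else waiting.
      session.signal({ type: DOC_SIGNAL, data: JSON.stringify(doc) });
    }
    onReady(doc);
  }

  function onDoc(event) {
    finish(JSON.parse(event.data), true);
  }

  session.on('signal:' + DOC_SIGNAL, onDoc);
  // If no peer responds in time, start from an empty document at revision 0.
  const timer = setTimer(() => finish({ revision: 0, str: '' }, false), timeoutMs);
  session.signal({ type: REQUEST_SIGNAL });
}
```

The first answer wins and later ones are ignored, which keeps the sketch simple; peers that already hold the document would answer the request signal with their own copy.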

The other adapters for ot.js (SocketIOAdapter and AjaxAdapter) assume there is a central authority, a server, that is handling all of the transformations. The clients just receive one operation at a time, and they are all in order because they have already been transformed by the server. This is not the case for our application, because we will be receiving operations from all of the different participants at the same time. As a result, every peer-to-peer client needs to act like a server, taking the revision numbers into account and transforming the operations from different participants. It turned out that doing this was quite easy: we just needed to include the ot.Server class from ot.js and call the receiveOperation method for every operation.
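To show the idea behind acting like a server, here is a simplified stand-in for that kind of class. It is a sketch, not ot.js code: `PeerLog`, `insertAt` and `shiftInsert` are hypothetical names, and the apply and transform functions are passed in so the example stays self-contained. An operation made at a given revision is transformed against every operation logged since that revision before it is applied.

```javascript
class PeerLog {
  constructor(doc, applyOp, transform) {
    this.doc = doc;
    this.applyOp = applyOp;     // (doc, op) => new doc
    this.transform = transform; // (op, earlierOp) => op adjusted for earlierOp
    this.operations = [];       // operations[i] moved the document from revision i to i + 1
  }

  // `revision` is the revision the sender's document was at when the edit was made.
  receiveOperation(revision, operation) {
    if (revision < 0 || revision > this.operations.length) {
      throw new RangeError('revision ' + revision + ' is not in the history');
    }
    let transformed = operation;
    // Everything logged since `revision` is concurrent with the incoming edit.
    for (const earlier of this.operations.slice(revision)) {
      transformed = this.transform(transformed, earlier);
    }
    this.doc = this.applyOp(this.doc, transformed);
    this.operations.push(transformed);
    return transformed;
  }
}

// Usage with insert-only operations of the (hypothetical) form { pos, text }.
const insertAt = (doc, op) => doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
const shiftInsert = (op, earlier) =>
  earlier.pos <= op.pos ? { ...op, pos: op.pos + earlier.text.length } : op;

const peer = new PeerLog('abc', insertAt, shiftInsert);
peer.receiveOperation(0, { pos: 0, text: 'X' });
// The second edit was also made at revision 0, so it is shifted past the first.
const second = peer.receiveOperation(0, { pos: 3, text: 'Y' });
console.log(peer.doc, second.pos); // "XabcY" 4
```

For every peer to end up with the same document, they all need to log operations in the same order.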

This means that we also need to send a history of operations to each new participant. That way, if they receive the history from one participant and then another participant sends them an operation that needs to be transformed against a previous operation, the new participant can handle it. But the new participant doesn’t need the full history of all of the operations that have ever happened; they just need to know the last few. To be safe, we include the last 50 operations. It’s unlikely that we will receive an operation from someone who hasn’t yet processed the last 50 operations that someone else has.
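One way to picture the trimmed history is the sketch below. The limit of 50 comes from the description above; the snapshot shape (`revision`, `str`, `operations`) and the helper names are assumptions for this example, not the opentok-editor wire format. Because a new participant only holds the tail of the log, the sketch tracks the revision at which that tail starts.

```javascript
const HISTORY_LIMIT = 50;

// state: { doc, operations, baseRevision }
// operations[i] moved the document from revision baseRevision + i to the next one.
function makeSnapshot(state) {
  return {
    revision: state.baseRevision + state.operations.length,
    str: state.doc,
    operations: state.operations.slice(-HISTORY_LIMIT), // only the most recent ones
  };
}

// A new participant rebuilds local state from a snapshot.
function restore(snapshot) {
  return {
    doc: snapshot.str,
    operations: snapshot.operations.slice(),
    baseRevision: snapshot.revision - snapshot.operations.length,
  };
}

// Operations that an edit made at `revision` still has to be transformed against.
function concurrentOperations(state, revision) {
  const index = revision - state.baseRevision;
  if (index < 0) {
    throw new RangeError('operation is older than the retained history');
  }
  if (index > state.operations.length) {
    throw new RangeError('operation refers to a revision we have not seen');
  }
  return state.operations.slice(index);
}
```

With this layout, an operation from a participant who is more than 50 revisions behind cannot be transformed by the newcomer, which is the unlikely case the limit accepts.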

The other caveat of using OpenTok signals is that the content of the document is not persisted. If everyone in the room leaves, then no one is maintaining the state of the document, which can be annoying. This is actually the same way that the opentok-whiteboard works. Given the open and transient nature of these rooms, this actually works OK for this use case. Ideally, I think the data would stick around for a little while before clearing out. That way, if you refresh, it will still be there, but if someone picks the same room name next week they won’t see the document you were writing. Perhaps this will have to be a future enhancement once we have server-side signaling.