Several factors deeply affect the experience one has in a group or one-on-one video chat. There is a vast body of literature analyzing the human factors that shape interpersonal communication via a computer, such as affinity, commitment and attention. Human-computer interaction experts, social scientists and computer researchers have studied the various dimensions that make the subjective experience of participating in a conversation compelling.
Audio quality is one such critical component.
While “audio quality” itself has several constituent parts, such as fidelity (how clear is the person’s voice), artifacts (do I hear static or echo), and delay (are there large gaps in the continuity of speech), the ones we hear about most frequently are echo and delay. Echo is disruptive to the video conversation itself. We are all familiar with the feedback loop and howling noise when people get on stage and test a microphone whose volume is way too high. A similar phenomenon can occur in a video chat when the gain (volume) on a microphone is set way too high: a feedback loop forms when another person joins the session.
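To make the feedback loop concrete, here is a minimal sketch of why a high microphone gain leads to howling. It assumes a deliberately simplified model in which each trip around the loop multiplies the signal level by a single number, the loop gain (microphone gain times the fraction of speaker output that leaks back into the microphone). The function name and parameters are illustrative, not part of any real audio API.

```python
def loop_levels(mic_gain, coupling, passes, initial=1.0):
    """Return the signal level after each trip around the feedback loop.

    mic_gain  -- amplification applied by the microphone (hypothetical value)
    coupling  -- fraction of speaker output picked up again by the microphone
    passes    -- number of trips around the loop to simulate
    """
    loop_gain = mic_gain * coupling
    levels = []
    level = initial
    for _ in range(passes):
        level *= loop_gain  # one full trip: speaker -> room -> microphone
        levels.append(level)
    return levels


# Loop gain below 1: the echo dies away.
print(loop_levels(mic_gain=2.0, coupling=0.25, passes=3))
# Loop gain above 1: the level keeps growing, which is the howl.
print(loop_levels(mic_gain=8.0, coupling=0.25, passes=3))
```

In this model, anything that pushes the loop gain below 1 (lower gain, a headset, or cancelling the leaked signal) breaks the runaway growth.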
Echo can be eliminated by using a headset, but here at TokBox we have been taking a very hard look at audio-quality issues and how to solve them headset-free. We wanted the experience to be as seamless as possible for our users, so we recently implemented some pretty neat technology like server-side echo suppression and server-based voice activity detection (which automatically detects who the current active speaker in a conversation is).
Our echo suppressor at the server is basically an automated walkie-talkie mode: only one person can be heard at any time (half-duplex). This cuts the feedback loop and suppresses echo. We even built our own little anechoic chamber to test our system. While this works fine under controlled conditions, several factors, like the acoustic characteristics of the room, the nature of the audio devices used, and the number of people in the call, can impact its effectiveness. Several interesting research projects also point to reduced engagement in half-duplex mode.
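The walkie-talkie idea can be sketched in a few lines. This is an illustration only, not our server code: it assumes each participant's audio arrives as a frame of float samples, picks the active speaker by simple RMS energy, and mutes everyone else. The names, the energy threshold, and the hold margin (which keeps the current speaker unless someone is clearly louder) are all hypothetical choices.

```python
import math


def rms(frame):
    """Root-mean-square level of one audio frame (list of float samples)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def pick_active_speaker(frames, current=None, threshold=0.01, hold_margin=1.5):
    """Choose who may be heard for this frame, or None if everyone is silent.

    frames -- dict mapping participant name to that participant's frame
    current -- the speaker chosen for the previous frame, if any
    """
    levels = {name: rms(frame) for name, frame in frames.items()}
    talking = {name: lvl for name, lvl in levels.items() if lvl >= threshold}
    if not talking:
        return None
    loudest = max(talking, key=talking.get)
    # Keep the current speaker unless someone else is clearly louder,
    # so the gate does not flap between two people at similar levels.
    if current in talking and talking[loudest] < hold_margin * talking[current]:
        return current
    return loudest


def half_duplex_gate(frames, active):
    """Pass the active speaker's frame through and silence all others."""
    return {
        name: list(frame) if name == active else [0.0] * len(frame)
        for name, frame in frames.items()
    }


frames = {
    "alice": [0.5, -0.5, 0.5, -0.5],      # talking
    "bob": [0.05, -0.05, 0.05, -0.05],    # mostly echo of alice
}
active = pick_active_speaker(frames)
print(active, half_duplex_gate(frames, active))
```

Because only one stream is ever forwarded, the echo picked up by the other microphones never makes it back into the call, which is also why a listener cannot interject while someone else holds the floor.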
Acoustic echo cancellation (AEC) on the client is the best way to solve this problem, since it eliminates echo while maintaining a continuous conversation on both sides (full-duplex). Adobe has been actively listening to its community and recently introduced AEC at the client in Flash Player 10.3. The Enhanced Audio API provides support for echo cancellation, noise suppression and voice activity detection. We have built out the OpenTok API to seamlessly use echo cancellation under the covers if we find Flash Player 10.3 and supported hardware. This means all OpenTok applications out there will now leverage the underlying support to provide an echo-free experience. Go ahead, give it a shot and let us know.
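For the curious, here is a rough sketch of how client-side echo cancellation can work in principle. It uses a normalized least-mean-squares (NLMS) adaptive filter, a common textbook approach; it is not a description of what Flash Player does internally, and the function name and parameter values are assumptions made for illustration. The filter learns how the far-end audio (what the speakers play) shows up in the microphone signal and subtracts that estimate, leaving the local voice intact.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the far-end echo from the mic signal.

    far_end -- samples sent to the loudspeaker
    mic     -- samples captured by the microphone (local voice + echo)
    taps    -- length of the modeled echo path, in samples
    mu      -- adaptation step size (0 < mu < 2 for NLMS)
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    cleaned = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate  # what is left after removing the echo
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


# Toy scenario: the microphone hears only an echo of the far end,
# attenuated to 60% and delayed by two samples.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0, 0.0] + [0.6 * s for s in far_end[:-2]]
cleaned = nlms_echo_cancel(far_end, mic)
print(sum(e * e for e in cleaned[-200:]), sum(d * d for d in mic[-200:]))
```

Unlike the half-duplex gate, nothing is muted here: both sides can talk at once, and only the part of the microphone signal that is predictable from the far-end audio is removed. Real echo cancellers also have to cope with long room reverberation, double-talk and changing delays, which this toy example ignores.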