Cameras, Microphones and the Dawn of AI


Some 40,000 years ago, our distant ancestors were recording vivid images of the world around them on the walls of caves on the Indonesian island of Sulawesi and of El Castillo cave in northern Spain.

These paintings are believed to be the earliest works of art made by humans and they point to a long history of our need to record and communicate information in a visual manner – and of our ability to process and take meaning from these visual cues. Our brain has evolved to process these cues quickly and efficiently so that we can react and survive.

Of course, how our brain interprets the world is influenced by our past experiences, the society around us and our reactions to risks and danger. We learn continuously as we pass information from our sensory inputs to be computed and analysed by the brain.

As humans, our understanding of the world would be limited were it not for the input received via our eyes and ears. Much the same is true of computers: as we move into a new era of advanced computing and AI, with exciting developments in the work they can do, cameras, microphones and other inputs such as sensors will play an ever more important role in how we interact with devices.

Powering future applications – Deep Learning & AI

Before we look at some of the applications that are made possible by leveraging these inputs, it’s necessary to consider what kind of computing is going on behind the “eyes and ears” of our devices.  

What deep learning and other AI techniques have in common is that they draw on experience to constantly improve and become more accurate. As humans, we each have only one set of eyes and ears; by combining the inputs and experiences of thousands if not millions of devices, these techniques can spot patterns that would be invisible to any human eye. Advances in deep learning and machine learning in areas like signal processing and computer vision have opened up a whole host of capabilities that were in the realm of science fiction a decade ago. In certain domains, AI algorithms have surpassed human-level performance for the first time.

For the first time, algorithms are becoming more accurate than humans at image recognition on benchmarks such as ImageNet.
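
To get a feel for how accessible this capability has become, here is a minimal sketch that classifies an image in the browser with a pre-trained MobileNet model via TensorFlow.js. This is an illustrative off-the-shelf model, not the actual benchmark systems behind the ImageNet results.

```typescript
import '@tensorflow/tfjs';
import * as mobilenet from '@tensorflow-models/mobilenet';

// Classify an image element and log the top predicted labels.
async function classifyImage(img: HTMLImageElement): Promise<void> {
  const model = await mobilenet.load();          // downloads pre-trained weights
  const predictions = await model.classify(img); // top labels with probabilities
  for (const p of predictions) {
    console.log(`${p.className}: ${(p.probability * 100).toFixed(1)}%`);
  }
}
```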

A further important development for the applications of the future is the establishment of frameworks such as Apple’s ARKit and Google’s ARCore. Libraries in smartphones and other devices are opening up, making functionality readily available for developers to start creating augmented reality applications for the major platforms.
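
ARKit and ARCore are native SDKs, but the web has a counterpart in the WebXR API. The sketch below shows how an AR session is requested; it is a browser-side analogue rather than the native APIs themselves, and support varies by device.

```typescript
// Sketch: requesting an augmented-reality session through the WebXR API,
// a browser-side analogue to native frameworks like ARKit and ARCore.
// WebXR typings may require the @types/webxr package, hence the cast.
async function startAR(): Promise<void> {
  const xr = (navigator as any).xr;
  if (!xr || !(await xr.isSessionSupported('immersive-ar'))) {
    console.warn('Immersive AR is not supported on this device');
    return;
  }
  const session = await xr.requestSession('immersive-ar', {
    requiredFeatures: ['hit-test'], // anchor virtual content to real surfaces
  });
  session.addEventListener('end', () => console.log('AR session ended'));
  // Rendering continues via session.requestAnimationFrame() and a WebGL
  // layer; that plumbing is omitted here.
}
```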

The input devices for these kinds of computing can be cameras, microphones, other sensors or a combination of all of these. So how can you leverage these inputs and combine them with new technologies such as AI and AR to create smart applications that allow us to interact with devices in a new way?

Hyper-Personalization – smartphones, tablets & beyond

A huge number of the devices around us day to day are already equipped with high-quality inputs. Smartphones, tablets and laptops come with a camera and microphone as standard, and phones add motion sensors and accelerometers on top.

The business world is seeing more and more integrations with artificial intelligence, which will change the way we interact with online services and shift some of the cognitive load from human to machine. Companies such as Jargon.ai are already using machine learning to help professionals run more productive meetings: video is analysed to highlight key takeaways, provide analysis of participants’ reactions, and deliver a full transcript at the end of the meeting.
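
For a flavour of what such transcription involves, here is a minimal browser-side sketch using the Web Speech API. It is an illustrative stand-in, not Jargon.ai’s actual pipeline, and the API is vendor-prefixed in Chrome, hence the `webkit` lookup.

```typescript
// Sketch: live meeting transcription in the browser with the Web Speech API.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

const recognizer = new SpeechRecognitionImpl();
recognizer.continuous = true;       // keep listening for the whole meeting
recognizer.interimResults = false;  // only emit finalized phrases

const transcript: string[] = [];
recognizer.onresult = (event: any) => {
  for (let i = event.resultIndex; i < event.results.length; i++) {
    transcript.push(event.results[i][0].transcript);
  }
};
recognizer.start();
```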

One increasingly common use of the camera is authentication. Both Apple, with the recently announced iPhone X, and Samsung, with the S8, have made face detection and biometric authentication first-class citizens of their platforms. This feature can be used not only to unlock the phone, but also to securely access services such as online banking or to authorize transactions. As an example, the Face ID technology in the iPhone X has a false positive rate of 1 in 1,000,000, compared with 1 in 50,000 for Touch ID.
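
For web services that want to lean on this hardware, one route is the WebAuthn credentials API, which delegates verification to the device’s platform biometrics. The sketch below is a minimal illustration: the challenge and credential ID would come from your server, and the server-side signature check is omitted.

```typescript
// Sketch: asking the browser to verify the user with the device's platform
// biometrics (e.g. Face ID or a fingerprint reader) via WebAuthn.
async function biometricLogin(
  challenge: Uint8Array,      // server-issued nonce, signed by the authenticator
  credentialId: Uint8Array,   // ID registered at enrolment
): Promise<PublicKeyCredential> {
  const assertion = await navigator.credentials.get({
    publicKey: {
      challenge,
      allowCredentials: [{ type: 'public-key', id: credentialId }],
      userVerification: 'required', // forces the biometric/PIN check on-device
      timeout: 60_000,
    },
  });
  // Send the assertion back to the server, which verifies the signature
  // against the public key stored at enrolment.
  return assertion as PublicKeyCredential;
}
```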


As manufacturing improves and the price points for new sensors come down, we’ll begin to see new opportunities for interaction models with the smartphones in our pockets. One example already in existence comes from companies like UnifyID, which use the sensors in the phone to carry out gait analysis. This means you can fluidly and seamlessly prove your identity by, among other things, the way you walk. Another example is the SCiO near-infrared spectrometer available in the ChangHong H2 smartphone, which can sense the chemical makeup of materials. This enables a whole host of interesting applications to be built on top of it, in both the personal and professional domains.
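
The raw material for something like gait analysis is just a stream of motion samples. The sketch below shows how a web app might collect accelerometer readings via the DeviceMotion API; the feature extraction and matching model that a company like UnifyID applies on top are proprietary and not shown here.

```typescript
// Sketch: collecting raw accelerometer samples, the kind of signal a
// gait-analysis model would consume downstream.
type Sample = { t: number; x: number; y: number; z: number };
const samples: Sample[] = [];

window.addEventListener('devicemotion', (e: DeviceMotionEvent) => {
  const a = e.accelerationIncludingGravity;
  if (a && a.x !== null && a.y !== null && a.z !== null) {
    samples.push({ t: e.timeStamp, x: a.x, y: a.y, z: a.z });
  }
});
```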

While the sensors and inputs in our smartphones may be the entry point for this kind of interaction, increasingly they will be available in other devices as the trend towards the deconstruction of the smartphone continues. All of this sensor data, coupled with smart algorithms, is paving the way to hyper-personalization: we are entering an era where even application UX can adapt to each user in extremely personalized ways, learning over time how each person interacts with a service.
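
At its simplest, that kind of adaptation can be as basic as reordering an interface around observed behaviour. The toy sketch below ranks actions by usage frequency; a real system would persist per-user state and weight recent behaviour more heavily. It is a minimal illustration, not a production design.

```typescript
// Toy sketch of adaptive UX: rank menu actions by how often this user
// takes them, so the interface reorders itself over time.
class ActionRanker {
  private counts = new Map<string, number>();

  record(action: string): void {
    this.counts.set(action, (this.counts.get(action) ?? 0) + 1);
  }

  // Most-used actions first; unseen actions keep their default order.
  rank(actions: string[]): string[] {
    return [...actions].sort(
      (a, b) => (this.counts.get(b) ?? 0) - (this.counts.get(a) ?? 0),
    );
  }
}
```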

The world as seen from outside the smartphone

Have you ever considered what it will be like to live in a world where all of our devices are connected to the internet? Not just your smartphone, but your oven. This is already happening. The smartphone is slowly being ‘complemented by an ecosystem of devices’ or ‘deconstructed into multiple other devices’, depending on how you look at it, and the same is true for those all-important inputs and sensors.


Wearables like watches now have cellular connectivity and are at the forefront of health and fitness tracking technology, letting your device learn about you constantly. Of course, cellular connectivity is principally about making calls, and that requires a microphone as standard. What kinds of applications will we see leveraging the audio input from wearable devices in the future?

This shift moves beyond just wearables – think about the emergence of the “Smart Home”. 


Your Nest camera or video doorbell can live-stream a view of the courier dropping off a package straight to your smartphone, and you can even use voice commands to get your fridge to bring your La Croix to you. Voice assistants like Amazon’s Alexa can set reminders, let you order your favorite breakfast cereal and much more. As a result, there is a large and growing ecosystem of voice apps.
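
Each of those voice apps is, at heart, a set of intent handlers. Here is a minimal Alexa skill sketch using the ask-sdk-core Node library; the `OrderCerealIntent` name and the ordering logic are hypothetical stand-ins for a real skill’s backend.

```typescript
import {
  SkillBuilders, HandlerInput, RequestHandler,
  getRequestType, getIntentName,
} from 'ask-sdk-core';

// Handles the (hypothetical) intent triggered by "order my usual cereal".
const OrderCerealHandler: RequestHandler = {
  canHandle(input: HandlerInput): boolean {
    return getRequestType(input.requestEnvelope) === 'IntentRequest'
      && getIntentName(input.requestEnvelope) === 'OrderCerealIntent';
  },
  handle(input: HandlerInput) {
    // A real skill would call an ordering API here.
    return input.responseBuilder
      .speak('Okay, your usual cereal is on its way.')
      .getResponse();
  },
};

// Entry point for the skill's AWS Lambda function.
export const handler = SkillBuilders.custom()
  .addRequestHandlers(OrderCerealHandler)
  .lambda();
```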

And beyond the home, the Smart City awaits. With computer vision and machine learning progressing at a blistering pace, facial recognition technology has advanced to the point where it can be used to board trains, and a recent study showed how recognition technology could identify individuals even when their faces were obscured.

Perhaps the best example of just how far these interactions have come is the current pilot of ‘smile to pay’ by China’s Alipay in KFC restaurants. The concept is simple: order your food, then smile to pay for it. A 3D camera scans and recognizes your face and seamlessly authorizes payment. The simplicity belies just what an impressive integration this is: customers use the most natural and human of gestures, a smile, to interact with a machine in an everyday environment. It makes for a magical and intuitive experience, and one which will surely become the norm.


The myriad devices slowly creeping into the homes and cities of the future fundamentally interact with us through audio-visual cues – but this interaction is still in its infancy. Richer integrations into applications and services are going to make technology simply fade into the background: almost invisible, but ever present and always available to assist when needed.

Another reality – using augmented reality

If the previous use cases allow us to interact with the existing world in a new way, augmented reality lets us see a whole new world. There are now companies using the microphone and camera to understand the world and feed back a new vision of it in real time.

For many of the youngest generation, their first experience of AR will come directly from the smartphone, through apps such as Snapchat, which has just announced that 3D Bitmoji can now be layered onto real-world views. At the same time, companies like Ikea are using augmented reality to help customers place 3D overlays of furniture and other products in a room. These applications even take the room’s lighting into account, so shoppers can quickly and realistically design their interior space.

In the workplace, intelligent assistance technologies lift efficiency, raise productivity and improve safety, especially in field services. No longer does a technician venture out alone to assess and repair equipment. With technology such as the Daqri smart helmet, technicians can draw on information from the world around them to enhance their understanding of what they are looking at – they are no longer alone in the field.


Contextual Communications

Originally we had the phone, which was exclusively about communicating. Then along came the smartphone, which combined browsing with communications. As we moved into the era of apps, the uses of our phones became increasingly diverse, from booking travel to managing finances. Now, with communications being embedded into applications and devices, the role of the smartphone as the sole point of interaction has changed.

It’s now possible to make and receive calls from a voice assistant without ever laying hands on the phone, and smart watches are also getting cellular service. WebRTC and third-party platforms like OpenTok allow communications to be embedded into any application, across devices, and open up the possibility of new integrations that let us interact with our devices in a natural, human way. Communications is making the leap to being both universal and contextual.
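
To make that concrete, here is a minimal WebRTC sketch that captures the camera and microphone and prepares an offer. Signalling (how the offer, answer and ICE candidates reach the remote peer) is application-specific and omitted, and the STUN server shown is just a commonly used public example.

```typescript
// Sketch: capture camera and microphone, attach them to a WebRTC peer
// connection, and create an offer to send over your signalling channel.
async function startCall(): Promise<RTCPeerConnection> {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: true,
    audio: true,
  });

  const pc = new RTCPeerConnection({
    iceServers: [{ urls: 'stun:stun.l.google.com:19302' }],
  });
  stream.getTracks().forEach(track => pc.addTrack(track, stream));

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  // ...send the offer to the remote peer via your signalling channel...
  return pc;
}
```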

Conclusions

Advances in cameras and microphones, coupled with low-latency communications and edge computing, are at the vanguard of a new computing revolution. Communications in this new world is going to become ever more pervasive. While there are still social and ethical issues to be worked out, on the whole this trend has the potential to make a massively positive impact on society and to challenge the status quo.