The potential of augmented reality glasses has been a topic of interest on this site, but these devices have never become commonplace for everyday use: they tend to be bulky or offer limited functionality. AI integration, however, could redefine their purpose beyond simply emulating a smartphone screen. Imagine glasses that focus on capturing images and sounds, then deliver information via audio output. Such a design would require frames almost identical to traditional models, equipped with a camera and a microphone and capable of transmitting sound through the temples, all driven by an artificial intelligence engine. Meta and other manufacturers are now pursuing exactly this approach, giving rise to a new generation of smart glasses.
Meta takes the plunge into AI-enabled glasses
When its Ray-Ban Meta smart glasses launched in late 2023, Meta met with a lukewarm response from the market. Glasses equipped with a camera to capture photos and videos were seen more as a novelty for influencers than as a must-have gadget, not to mention the privacy concerns they raised. Yet a crucial change came in December of that year: Meta, the conglomerate behind Facebook, revealed plans to add multimodal artificial intelligence capabilities, mirroring Google's efforts with its Gemini AI. With this advancement, the glasses would go beyond responding to basic commands and capturing media; they were ready to start analyzing images and unlocking previously unimaginable features.
What is multimodal artificial intelligence?
Multimodal artificial intelligence marks a major advance in AI systems: it integrates and processes several types of data, including text, images, audio and video. This approach allows AI to understand and interact with the world in richer, more nuanced ways. Unlike unimodal systems, which focus on a single type of data, multimodal AI can interpret information from multiple sources simultaneously, allowing it to carry out tasks with a level of precision and context that was previously out of reach.
For smart glasses, this development means the user no longer has to describe in words what they are looking at. Instead, the devices can analyze the scene in front of the user and provide insights based on the visual data they collect, opening new possibilities for how we interact with and understand our environment.
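Meta has not published how its assistant works under the hood, but the shape of a multimodal request is easy to illustrate. Below is a minimal sketch in Python using OpenAI's public vision API as a stand-in: a single call carries both an image and a text question, and the model answers based on what it sees. The model name, file name and question are placeholder assumptions, not details of any glasses product.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY environment variable is set

def describe_image(image_path: str, question: str) -> str:
    """Send one image plus a text question in a single multimodal request."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder: any vision-capable model would do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Hypothetical example: a photo of an open refrigerator.
print(describe_image("fridge.jpg", "What could I cook with these ingredients?"))
```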
What does the new generation of glasses allow you to do?
Glasses from Meta and other companies such as Brilliant Labs or Envision typically require a connection to a smartphone, which handles the heavy computing. For now, the models on the market are limited to analyzing photographs. Once the multimodal AI has processed the image, the glasses can, among other things (see the sketch after this list):
- Suggest recipes based on the ingredients available in the refrigerator.
- Detail the nutritional values of a food.
- Indicate where an item of clothing or other product can be purchased.
- Diagnose a household malfunction and suggest possible solutions.
- Identify plants or animals.
- Read and translate texts.
- Interpret for speakers of other languages.
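Seen through that lens, each item on the list is less a separate feature than a different prompt over the same image-analysis call. Here is a rough sketch reusing the describe_image helper from the earlier example; the prompt strings are illustrative guesses, not the prompts any vendor actually uses.

```python
# Each capability boils down to a different question about the same photo.
# These prompts are illustrative assumptions, not a product's real prompts.
FEATURE_PROMPTS = {
    "recipes":   "Suggest a recipe using the ingredients you can see.",
    "nutrition": "Estimate the nutritional values of this food.",
    "shopping":  "Where could I buy the item of clothing in this photo?",
    "repair":    "What appears to be broken here, and how could I fix it?",
    "nature":    "Identify the plant or animal in this image.",
    "reading":   "Read out the text in this image and translate it to English.",
}

# Hypothetical usage: the glasses snap "photo.jpg" and the phone picks a prompt.
print(describe_image("photo.jpg", FEATURE_PROMPTS["repair"]))
```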
Several of these applications, such as recipe suggestions, are already available, while others are still in development and will become a reality over time. The Brilliant Labs model goes a step further: its micro-OLED display enables augmented reality applications, such as previewing a sofa in a different color. That said, there is one area where these devices could be genuinely life-changing.
A leap forward in accessibility
People who are visually impaired or blind were quick to recognize the transformative potential of this technology. They can now ask questions about anything in their field of vision, whether an object, a person or a text, and the glasses will provide a detailed explanation. Beyond AI glasses, developers are also working on innovative wearables that eschew traditional lenses in favor of a camera-equipped headset.
One notable innovation comes from the National University of Singapore, where researchers created a headset with a 13-megapixel camera. The device captures images on the user's command, and its built-in AI then analyzes the size, shape and color of the photographed object. Unlike the glasses described above, this model works standalone, with no need to connect to a smartphone or any other external device.
Although this particular headset, which transmits sound directly through the bones of the skull, is not yet available, other models from Meta and various manufacturers are already on the market. These devices promise to significantly improve the quality of life of people with disabilities, providing them with unprecedented levels of independence and interaction with their environment.