At I/O 2024, Google’s teaser for Project Astra gave us a glimpse of where AI assistants are headed. It’s a multimodal feature that combines the smarts of Gemini with the image recognition abilities of Google Lens, plus powerful natural language responses. But while the promotional video was slick, after trying it out in person, it’s clear there’s a long way to go before something like Astra lands on your phone. Here are three takeaways from our first experience with Google’s next-generation AI.
Sam’s take:
Since most people interact with digital assistants using their voice, Astra’s multimodality (i.e., using sight and sound in addition to text/speech) to communicate with an AI is relatively novel. In theory, it lets computer-based entities watch and listen like real assistants or agents, one of Google’s big buzzwords at the show, rather than acting like robots that simply respond to spoken commands.
In our demo, we had the option of having Astra tell a story based on some objects we placed in front of the camera. It then told a lovely tale about a dinosaur and its trusty baguette trying to escape from an ominous red light. It was fun, the story was cute, and the AI worked about as well as I expected. But at the same time, it was far from the seemingly omniscient assistant we saw in Google’s teaser. And aside from maybe entertaining kids with original bedtime stories, it didn’t feel like Astra was doing as much with the information as you might want.
Next, my colleague Carissa drew an idyllic scene on the touchscreen, at which point Astra correctly identified the flowers and sun she had drawn. But the most fascinating demo came when she went back for a second session with Astra running on a Pixel 8 Pro. This let her point the camera at a collection of objects while Astra tracked and remembered the location of each one. It was even smart enough to recognize where I had stashed my sunglasses, even though they weren’t originally part of the demo.
In some ways, our experience highlighted both the highs and lows of AI’s potential. Just having a digital assistant tell you where you left your keys, or how many apples were in your fruit bowl before you head out to the grocery store, could save you some real time. But after speaking with some of the researchers behind Astra, it’s clear there are still plenty of hurdles to overcome.
Unlike many of Google’s recent AI features, Astra (which Google describes as a “research preview”) still requires help from the cloud rather than running on-device. And while it supports some degree of object permanence, those “memories” only last for a single session, which currently spans just a few minutes. Even if Astra could remember things for longer, there are factors like storage and latency to consider: with every object Astra remembers, you risk slowing down the AI and producing a more stilted experience. So while it’s clear Astra has a lot of potential, my excitement was tempered by the knowledge that it will be a while before more fully featured functionality arrives.
Carissa’s thoughts:
Of all the advances in generative AI, the one that excites me most is multimodal AI. As powerful as the latest models are, it’s hard to get excited about iterative updates to text-based chatbots. But the idea of an AI that can recognize and respond to queries about its surroundings in real time feels like something out of a science fiction movie. It also gives a much clearer sense of how the latest wave of AI advances might make its way into new devices like smart glasses.
Google offered a hint of that with Project Astra, which may one day include a glasses component, but for now is mostly experimental (the video shown during the I/O keynote was clearly a “research prototype”). In person, it didn’t exactly feel like a science fiction movie.
Astra was able to accurately recognize objects placed around the room and answer nuanced questions about them, such as “Which of these toys should a 2-year-old play with?” It could also recognize my doodles and make up stories about the different toys we showed it.
However, most of Astra’s capabilities seemed on par with what Meta already offers in its smart glasses. Meta’s multimodal AI can also recognize your surroundings and do a bit of creative writing on your behalf. And while Meta also bills the feature as experimental, it is at least widely available.
The Astra feature that may set Google’s approach apart is its built-in “memory.” Even after scanning a large number of objects, Astra could still “remember” where specific items were placed. For now, Astra’s memory appears limited to a relatively short window of time, but members of the research team told us it could theoretically be expanded. That would obviously open up even more possibilities for the technology and make Astra feel more like an actual assistant. I don’t need to know where I left my glasses 30 seconds ago, but if Astra could remember where I left them last night, that would actually feel like science fiction come to life.
But as with so much of generative AI, the most exciting possibilities are the ones that haven’t happened yet. Astra may get there eventually, but right now it feels like Google still has a lot of work to do.
Check out all the news from Google I/O 2024 here!