In the future, large language models could let you take a photo just by talking to your phone instead of pressing the camera button. Conversational interfaces like these could one day power much more than phones, perhaps even watches and security cameras.
That's according to product managers at Meta and Arm, which partnered on a pair of compact AI models, unveiled at today's Meta Connect event, that are built to run on phones. Both companies are joining the increasingly competitive effort to get generative AI onto phones -- where it has become the must-have feature -- alongside Galaxy AI on the Samsung Galaxy S24 series, Gemini AI on the Google Pixel 9 Pro and Apple Intelligence, set to launch on the new iPhone 16 series.
Meta's new AI models are smaller than most LLMs, at 1 billion and 3 billion parameters (labeled Llama 3.2 1B and 3B, respectively). They're suitable for use on phones, and potentially other small devices too. They're built for the "edge" -- in other words, computation happens on the device itself rather than in the cloud.
"We think this is a really good opportunity for us to move a lot of the inference to on-device and edge use cases," said Ragavan Srinivasan, vice president of product management for generative AI at Meta.
Smartphones and other devices will be able to use these smaller models for tasks like text summarization -- summing up a batch of emails, for example -- and creating calendar invites, things deeply integrated into mobile workflows, Srinivasan explained.
The 1B and 3B models are purposely small enough to work on phones and can only understand text. Two larger models released in the Llama 3.2 generation, 11B and 90B, are too big to run on phones but are multimodal, meaning you can submit both text and images to get complex answers. They replace the previous-generation 8B and 70B models, which could only understand text.
Meta worked closely with Arm, which designs CPU architectures and other silicon used in chips from companies such as Qualcomm, Apple, Samsung and Google. With over 300 billion Arm-based devices in the world, there's a big footprint of computers and phones that can use these models. Through their partnership, Meta and Arm are invested in helping the roughly 15 million developers building apps for Arm devices write software that supports these Llama 3.2 models.
"What Meta is doing here is truly changing the kind of access to these leading edge models and what the developer community is going to be able to do with it," said Chris Bergey, general manager of the client line of business at Arm.
The partnership is invested in helping developers support the smaller Llama 3.2 models and rapidly integrate them into their apps. They could harness the LLMs to create new user interfaces and ways to interact with devices, Bergey theorizes. Instead of pressing a button to open the camera app, for example, users could have a conversation with their device and explain what they want it to do.
Given the number of devices, and the speed at which developers can deploy a smaller model like the 1B or 3B, Bergey says apps could start supporting them soon. "I think early next year, if not even late this year," he said.
Conventional LLM logic holds that the more parameters a language model has, the more powerful it is. The 1B and 3B, with 1 billion and 3 billion parameters respectively, are far smaller than most LLMs. Although parameter count is a proxy for intelligence, as Srinivasan says, it's not necessarily the same thing. The Llama 3.2 models build on Meta's Llama 3 series of models released earlier this year, including the most powerful model the company has produced, Llama 3.1 405B, which Meta said at the time was the largest publicly available LLM -- and which the company used as a teacher of sorts for the 1B and 3B models.
Developers want to use smaller models for the vast majority of their routine, on-device tasks, Srinivasan said. They want to choose which tasks are complex enough to be sent to the higher-parameter 8B and 70B models (of the Llama 3 generation announced in April), which require computation on larger devices and in the cloud -- but from a user perspective, switching between them should be seamless.
"What it should result in is really snappy responses to prompts that require a quick response, and then a graceful blending of capabilities that then go to the cloud for some of the more higher-capacity models," Srinivasan said.
The benefit of such relatively small models as the 1B and 3B is their comparatively better efficiency -- providing answers with 1 watt of power or within 8 milliseconds, Bergey suggested, compared with the power drain and longer compute times of larger models. That could make them suitable for less powerful platforms, like smartwatches, headphones or other accessories, although providing enough power and memory to run LLMs remains a challenge. For now, smartphones are suitable because they have both.
In the future, smaller-parameter models could be well-suited for devices that don't have traditional user interfaces or rely on external devices to be controlled, like security cameras. "I think this definitely goes well beyond smartphones relative to the applicability, especially as you get into smaller models," Bergey said.