Edge LLMs: Benefits, Challenges, and Solutions
August 21, 2024
PALO ALTO, CALIFORNIA

At Zyphra, we are deeply invested in optimizing the user experience for AI. We believe the future of AI will involve a combination of cloud and edge deployment strategies, with an increasing shift towards on-device inference for many use cases. In particular, we have been looking closely at how to improve the experience on edge devices by carefully designing and crafting hardware-aware models, and by applying personalization techniques. Our Zamba series of models exemplifies our commitment to innovative foundation model R&D with useful applications on the edge.

This blog post discusses the key factors to consider when deploying models on edge devices. We emphasize the significant hardware constraints of these devices and identify techniques for making efficient use of local hardware resources: quantization, low-rank adapters, and real-time parameter offloading from storage.
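To give a flavor of the first of these techniques, here is a minimal sketch of symmetric per-row int8 weight quantization in NumPy. The helper names and the per-row scaling scheme are illustrative assumptions for exposition, not Zyphra's production implementation; real deployments typically use more sophisticated schemes (group-wise scales, 4-bit formats, outlier handling).

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-row int8 quantization of a weight matrix.

    Each output channel (row) gets its own scale so that large values
    in one channel do not destroy precision in the others.
    """
    # The largest absolute value per row sets that row's scale.
    scales = np.abs(weights).max(axis=1, keepdims=True) / 127.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid divide-by-zero
    q = np.clip(np.round(weights / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize_int8(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover an approximate float32 weight matrix."""
    return q.astype(np.float32) * scales

if __name__ == "__main__":
    w = np.random.randn(4096, 4096).astype(np.float32)
    q, s = quantize_int8(w)
    w_hat = dequantize_int8(q, s)
    # int8 storage is 4x smaller than float32, at a small accuracy cost.
    print("max abs error:", np.abs(w - w_hat).max())
    print("bytes: fp32 =", w.nbytes, " int8 =", q.nbytes + s.nbytes)
```

The point of quantization on the edge is exactly this trade: a 4x (int8) or 8x (int4) reduction in the bytes that must fit in, and stream through, device memory, in exchange for a small approximation error.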

We explore two case studies, on memory bandwidth and memory capacity, for the iPhone 15 Pro (Apple) and Jetson Orin (NVIDIA) edge platforms.
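As a preview of why memory bandwidth matters, autoregressive decoding is typically memory-bound: each generated token must stream essentially all model weights from memory once, so throughput is bounded above by bandwidth divided by model size in bytes. The back-of-the-envelope sketch below uses publicly reported peak bandwidth figures (51.2 GB/s for the A17 Pro, 204.8 GB/s for Jetson AGX Orin) and an illustrative 7B-parameter model; real throughput is lower.

```python
# Rough upper bound on decode speed for a memory-bound LLM:
# each token reads all weights once, so tok/s <= bandwidth / model_bytes.
# Bandwidths are reported peak figures; model size is an illustrative assumption.
platforms = {
    "iPhone 15 Pro (A17 Pro)": 51.2e9,   # bytes/s
    "Jetson AGX Orin": 204.8e9,          # bytes/s
}

params = 7e9  # illustrative 7B-parameter model
for name, bandwidth in platforms.items():
    for bits in (16, 8, 4):
        model_bytes = params * bits / 8
        print(f"{name}: {bits}-bit -> <= {bandwidth / model_bytes:.1f} tok/s")
```

This simple arithmetic already shows why the techniques above matter: halving the bits per weight roughly doubles the attainable tokens per second on a bandwidth-limited device.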

Authors
Andrew Greene, Kamil Rocki, Tomas Figliolia, Travis Oliphant, Beren Millidge
Collaborators
Daniel A. Roberts (Sequoia Capital & MIT), Andrey Gromov (Meta FAIR), Kushal Tirumala (Meta FAIR), and Hassan Shapourian (Cisco)