Google has officially introduced Gemma 3n, a new line of AI models built specifically for on-device intelligence. Designed to handle a wide range of inputs — including text, images, audio, and video — the Gemma 3n series brings multimodal functionality to devices with limited resources.
Unveiled after a preview earlier this year, the Gemma 3n models are engineered to run efficiently on phones, tablets, laptops, desktops, and even single-node cloud accelerators. The series comes in two configurations, E2B and E4B. Although their raw parameter counts are roughly 5 billion (E2B) and 8 billion (E4B), architectural efficiencies let them run with memory footprints comparable to traditional 2B and 4B models, requiring only about 2GB and 3GB of memory, respectively.
Released for production use on June 26, the models are now available through platforms like Hugging Face and Kaggle, and developers can experiment with them via Google AI Studio.
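For developers who want a quick local experiment rather than the hosted Google AI Studio route, a minimal run through the Hugging Face Transformers pipeline API might look like the sketch below. The model identifier shown is assumed for illustration; check the official Gemma 3n model cards on Hugging Face for the exact IDs and any license-acceptance steps.

```python
# Minimal sketch: trying a Gemma 3n checkpoint locally with Hugging Face Transformers.
# The model id "google/gemma-3n-E2B-it" is assumed for illustration; confirm the exact
# identifier (and accept the Gemma license) on the Hugging Face model card first.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3n-E2B-it",  # assumed id for the smaller (~2GB-memory) variant
    device_map="auto",               # place weights on GPU if available, else CPU
)

output = generator("Explain on-device AI in one sentence.", max_new_tokens=64)
print(output[0]["generated_text"])
```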
Gemma 3n is built on the same foundation as Google’s Gemini Nano, incorporating technologies such as the MatFormer (Matryoshka Transformer) architecture, which nests smaller sub-models inside a larger one so compute can be scaled to the device; Per-Layer Embeddings (PLE), which reduce the accelerator memory the model requires; and custom modules like LAuReL and AltUp for performance tuning. The models also include upgraded audio and visual encoders tailored for edge scenarios.
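To make the nesting idea behind MatFormer concrete, the sketch below shows a Matryoshka-style feed-forward block in PyTorch, where a prefix slice of the hidden width acts as a smaller sub-model with the same input/output interface. The class name, dimensions, and slicing scheme are hypothetical simplifications for illustration, not Gemma 3n's actual implementation.

```python
# Illustrative sketch of the MatFormer ("Matryoshka Transformer") idea: a feed-forward
# block built so that a prefix of its hidden units is itself a usable, smaller model.
# Names and dimensions are hypothetical; this is not Gemma 3n's actual code.
import torch
import torch.nn.functional as F
from torch import nn


class MatryoshkaFFN(nn.Module):
    """Feed-forward block whose hidden width can be truncated at inference time."""

    def __init__(self, d_model: int = 512, d_hidden: int = 2048):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor, active_hidden: int | None = None) -> torch.Tensor:
        if active_hidden is None:
            # Full-width pass (the "larger" model).
            return self.down(F.gelu(self.up(x)))
        # Nested pass: slice the weight matrices so only the first `active_hidden`
        # hidden units participate, yielding a cheaper sub-model with fewer effective
        # parameters but the same input/output shapes.
        w_up, b_up = self.up.weight[:active_hidden], self.up.bias[:active_hidden]
        w_down = self.down.weight[:, :active_hidden]
        h = F.gelu(F.linear(x, w_up, b_up))
        return F.linear(h, w_down, self.down.bias)


ffn = MatryoshkaFFN()
x = torch.randn(1, 16, 512)
full = ffn(x)                        # full-width forward pass
nested = ffn(x, active_hidden=512)   # smaller nested sub-model, same interface
print(full.shape, nested.shape)      # both: torch.Size([1, 16, 512])
```

This kind of nesting is what lets a single trained model expose both larger and smaller operating points, which is how Google describes the relationship between the E4B and E2B configurations.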
Support for 140 languages is built-in for text processing, with 35 languages enabled for multimodal interpretation. Notably, the E4B model surpassed a score of 1300 on the LMArena benchmark, making it the first AI model under 10B parameters to hit that performance milestone, according to Google.
The broader Gemma family, which debuted in 2024, includes models targeted at specialized tasks, from healthcare solutions to safety tools, as well as community-driven efforts such as enterprise vision systems and models localized for Japanese, the company added.
