Google Releases Gemma 3: The Most Powerful Model At 27B
The performance of Google's new Gemma 3 27B is freaking insane considering its size.
Google released a brand-new model called Gemma 3, with up to 27 billion parameters. Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Gemma 3 is designed for developers building AI apps that can run anywhere, from phones to workstations, with out-of-the-box support for over 35 languages and the ability to handle text, images, and short videos.
According to the company's blog post, it's the "world's best single-accelerator model," outpacing Meta's Llama, DeepSeek, and OpenAI's o1-preview and o3-mini-high while running on a single GPU or TPU host. That's impressive for a model built to stay lean.
Here’s what you need to know about Gemma 3.
What is Gemma 3?
Unlike Google's proprietary Gemini models, which power its consumer-facing tools, Gemma 3 is an open model, meaning anyone can download the weights and build with them. It comes in four sizes: 1 billion, 4 billion, 12 billion, and 27 billion parameters.
The new model introduces several key features:
Image and Text Input: Multimodal capabilities enable you to input both images and text, allowing for deeper analysis and understanding of visual data (see the sketch after this list).
128K Token Context: Offers a context window 16x larger than Gemma 2's 8K tokens, letting you analyze extensive data sets and tackle more complex problems.
Wide Language Support: Operate in your preferred language or extend your AI app’s reach, with support for over 140 languages.
Developer-Friendly Model Sizes: Available in multiple sizes (1B, 4B, 12B, 27B) and precision levels, allowing you to select the best fit for your task and computing resources.
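To make the multimodal piece concrete, here's a minimal sketch of what an image-plus-text prompt could look like with the Google GenAI SDK (covered in more detail below). It assumes Gemma 3 27B is reachable through the API under the model ID gemma-3-27b-it, and chart.png is a stand-in for any local image of your own:
from google import genai
from PIL import Image

# Assumes your API key is set in the GOOGLE_API_KEY environment variable
client = genai.Client()

# "chart.png" is a placeholder; substitute any local image
image = Image.open("chart.png")

response = client.models.generate_content(
    model="gemma-3-27b-it",  # instruction-tuned Gemma 3 27B (assumed model ID)
    contents=[image, "Describe what this image shows."],
)
print(response.text)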
The models are now available for download on Hugging Face.
If you're planning to run it on your local machine, keep in mind the approximate GPU or TPU memory requirements for running inference with each Gemma 3 model size.
Memory consumption also scales with the total number of tokens in your prompt: the more tokens a request uses, the more memory is needed on top of what it takes to load the model itself.
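As a back-of-the-envelope check, weight memory is roughly the parameter count times the bytes per parameter, with the prompt's KV cache added on top. Here's a quick sketch of that arithmetic; the figures are ballpark estimates, not Google's official numbers:
# Rule of thumb: weights alone take ~(params in billions) x (bytes per param) GB;
# the KV cache for your prompt tokens is extra.
def approx_weight_gb(params_b: float, bytes_per_param: float) -> float:
    return params_b * bytes_per_param  # 1e9 params x N bytes / 1e9 bytes-per-GB

for size_b in (1, 4, 12, 27):
    print(f"{size_b}B: ~{approx_weight_gb(size_b, 2):.0f} GB at bf16, "
          f"~{approx_weight_gb(size_b, 0.5):.1f} GB at 4-bit")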
Google describes it as their most advanced open model to date. “These are our most advanced, portable and responsibly developed open models yet,” the company said in its official blog post.
The original Gemma launched a year ago and has since racked up over 100 million downloads. Google says the community has created 60,000 variants, forming what they call the “Gemmaverse.”
You can learn more about the technical details of Gemma 3 here.
How It Compares to Other Models
In blind, side-by-side evaluations conducted by human raters (Chiang et al., 2024), Gemma 3 demonstrated impressive performance against competing models. Using the Elo rating system, a widely recognized approach for assessing relative performance, Gemma-3-27B-IT received preliminary ratings that place it ahead of notable competitors, including Meta's Llama, DeepSeek, and OpenAI's o1-preview.
This positions Gemma 3 as a highly competitive option in the rapidly evolving landscape of open-source multimodal AI models.
Gemma 3 was also evaluated across zero-shot benchmarks, comparing various abilities against previous iterations like Gemma 2, as well as Gemini 1.5 and Gemini 2.0.
These evaluations indicate substantial improvements in capabilities, showcasing Gemma 3’s enhanced ability to generalize and effectively handle diverse tasks without prior specific training.
How to Access Gemma 3
If you just want to try it out, Google AI Studio lets you run it right in your browser — no setup needed. Head over to aistudio.google.com and set the model to “Gemma 3 27B”.
For developers, you can grab an API key from AI Studio and integrate it using the Google GenAI SDK (pip install google-genai). Here's a sample Python call; Gemma 3 is served through the same API under the model ID gemma-3-27b-it.
from google import genai

# The client picks up your API key from the GOOGLE_API_KEY environment variable
client = genai.Client()

response = client.models.generate_content(
    model="gemma-3-27b-it",  # instruction-tuned Gemma 3 27B
    contents="How does AI work?",
)
print(response.text)
# Example response:
# Okay, let's break down how AI works. It's a broad field, so I'll focus on the ...
#
# Here's a simplified overview:
# ...
If you need more control, Gemma 3 is available on Hugging Face, Kaggle, and Ollama, with all four sizes, plus ShieldGemma 2. Fine-tuning is supported out of the box, and you can run it on Google Colab or your own GPU.
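If you'd rather run it locally from the Hugging Face weights, a minimal sketch with the transformers library might look like the following. It assumes a recent transformers release with Gemma 3 support, that you've accepted the model license on Hugging Face, and it uses the small 1B instruction-tuned checkpoint (google/gemma-3-1b-it) so it fits on modest hardware:
from transformers import pipeline

# Requires a recent transformers (and accelerate for device_map="auto"),
# plus a Hugging Face login with the Gemma license accepted.
pipe = pipeline("text-generation", model="google/gemma-3-1b-it", device_map="auto")

messages = [{"role": "user", "content": "Explain attention in one sentence."}]
outputs = pipe(messages, max_new_tokens=64)

# With chat-style input, generated_text holds the whole conversation;
# the last entry is the model's reply.
print(outputs[0]["generated_text"][-1]["content"])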
When it comes to deployment, there are plenty of options. You can scale with Vertex AI, launch quickly with Cloud Run and Ollama, or optimize performance with NVIDIA's API Catalog. The model is tuned for NVIDIA GPUs, Google Cloud TPUs, and AMD GPUs via ROCm, with CPU support through gemma.cpp.
Academic researchers get a nice bonus — Google is offering $10,000 in Cloud credits through the Gemma 3 Academic Program. Applications opened today and will run for four weeks.
Take note of the eligibility:
Only individuals (faculty members, researchers or equivalent) affiliated with a valid academic institution, or academic research organization can apply. Please note that credits will be granted at Google’s discretion.
Final Thoughts
Gemma 3’s performance is seriously impressive, especially considering its size. The fact that a 27B model can keep up with or even outperform much larger models says a lot about how far AI efficiency has come. It raises an interesting question — do we really need massive models for most tasks, or are we just scaling up unnecessarily?
The 128K-token context window is a huge jump from Gemma 2's 8K, but the real game-changers are its multimodal capabilities and optimized inference speed. That said, I don't have many real-world use cases in mind that would take full advantage of all those tokens. Still, having the option is always a plus.
I haven’t done extensive testing yet, but from what I’ve seen so far, the initial reaction from the AI community is overwhelmingly positive. I’ll be experimenting more with Gemma 3 on Ollama soon, and I’m especially curious to see how well the multimodal features perform.
If you’re interested in AI development, I’d say it’s definitely worth checking out. Whether you’re testing it in Google AI Studio, fine-tuning it on Hugging Face, or deploying it through Vertex AI, there are plenty of ways to see how it stacks up against other models. Try it out and see what you think.
Hi there, thanks for making it to the end of this post! My name is Jim, and I’m an AI enthusiast passionate about exploring the latest news, guides, and insights in the world of generative AI. If you’ve enjoyed this content and would like to support my work, consider becoming a paid subscriber. Your support means a lot!