I just found out that you can run Google’s new Gemma 4 models completely offline using the Google AI Edge Gallery app.

It’s surprisingly simple—the app acts as a local sandbox so you don’t have to mess with any code or cloud APIs. Once you download the model files within the app, everything stays on-device. They have the E2B (Effective 2B) and E4B (Effective 4B) versions available, which are specifically optimized to be “small but mighty” for mobile hardware.

This is a great way to experiment with the new models because you get to see how they actually handle reasoning and image analysis on your phone’s own GPU. No server pings, no latency, and it’s a solid way to test the limits of what local AI can do right now.

References: