Unlock the Power of AI on Your Phone: Build Your Own LLM Server on Android
A step-by-step guide to setting up a local LLM server on Android for faster LLM experiments.
Note: This blog post was proofread by AI, and I applied some of its suggestions.
Large language models (LLMs) have become a hot topic recently and are a lot of fun to experiment with.
However, LLMs can be tricky to work with due to limitations in both local and cloud inference. Locally, they can run slowly on CPUs and cause out-of-memory (OOM) errors on GPUs. Cloud inference, while faster, often restricts the types of optimizations we can perform.
I noticed that, unlike laptops, most modern phones (including my Pixel 6) come equipped with an NPU (neural processing unit), typically used for computational photography.
This led me to an idea: could we run a lightweight LLM server on a phone and use a laptop to connect and perform prompt engineering (e.g., with LangChain)?
To achieve this, we need a few things:
- A method to run LLMs on Android, which can be done using MediaPipe and TensorFlow Lite (see the sketch after this list).
- A way to expose the LLMs as a common API service, which can be accomplished by exposing the LLM service as an…
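To make the two requirements above concrete, here is a minimal Kotlin sketch that loads a model with MediaPipe's LLM Inference task and serves completions over plain HTTP. The specific choices here are my assumptions, not necessarily what this post settles on: NanoHTTPD as the embedded server, port 8080, and the `/data/local/tmp/llm/model.bin` model path are all illustrative.

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference
import fi.iki.elonen.NanoHTTPD
import fi.iki.elonen.NanoHTTPD.IHTTPSession
import fi.iki.elonen.NanoHTTPD.Response
import fi.iki.elonen.NanoHTTPD.newFixedLengthResponse

// Minimal sketch: wrap MediaPipe's LLM Inference task in a tiny HTTP server.
// Assumptions (mine, not the article's): NanoHTTPD for serving, a model file
// pushed to /data/local/tmp/llm/ (hypothetical path), and port 8080.
class LlmHttpServer(context: Context, port: Int = 8080) : NanoHTTPD(port) {

    private val llm: LlmInference = LlmInference.createFromOptions(
        context,
        LlmInference.LlmInferenceOptions.builder()
            .setModelPath("/data/local/tmp/llm/model.bin") // hypothetical path
            .setMaxTokens(512)
            .build()
    )

    override fun serve(session: IHTTPSession): Response {
        // NanoHTTPD stores a raw (non-form) POST body under the "postData" key.
        val body = HashMap<String, String>()
        session.parseBody(body)
        val prompt = body["postData"].orEmpty()

        // Blocking, single-shot generation; streaming is omitted for brevity.
        val completion = llm.generateResponse(prompt)
        return newFixedLengthResponse(Response.Status.OK, "text/plain", completion)
    }
}

// Usage, e.g. from an Android foreground Service:
//   LlmHttpServer(applicationContext).start(NanoHTTPD.SOCKET_READ_TIMEOUT, false)
```

With something like this running on the phone, the laptop side could then POST prompts to `http://<phone-ip>:8080` and wrap that call in a custom LangChain LLM class for prompt-engineering experiments.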