OpenAI API Compatible access to Google Gemini and Embedding Models

Posted by theeashman@reddit | LocalLLaMA | View on Reddit | 13 comments

Hey all, I want to share a project I've recently worked on that provides seamless, all-in-one access to Google's commercial Gemini and embedding models using the OpenAI API - completely free of charge! You can check it out here: https://github.com/ekatiyar/gemini-openai-proxy

There's been some discussion here recently about Google's Gemini models, especially regarding their 1 million token context window, which is ideal for use cases like long-form chats, creative writing, and handling large codebases. While Google provides generous usage limits for free users with their API, the lack of native OpenAI API compatibility makes integration challenging, especially since many popular open-source tools and applications only support the OpenAI API. For instance, I use Open Web UI, which only supports the Ollama or OpenAI API protocols.

While services like OpenRouter offer access to Gemini models via an OpenAI-like API, their usage costs can quickly add up, especially when taking advantage of the 1 million token context window. There are some existing projects which implement proxies for communicating with Google's models through the OpenAI API. However, these solutions are incomplete - some only support chat completion, while others focus only on embeddings.

To address this gap, I've forked zhu327's excellent gemini-openai-proxy, and extended it to support Google's embedding model, creating an all-in-one solution to interact with all of Google's commercial models for free using the OpenAI API. This means you can use it both for chats and embedding capabilities in your applications, including those that rely on RAG.

Benefits:

OpenAI-compatible access to Google's models at no cost.
Comprehensive support for chat completion (with streaming) and embedding models.

Features:

Easy deployment via a Docker image.
Minimal configuration required to get started.
Flexible hosting options—run it locally or on cloud servers.

The source code and setup instructions are on github: https://github.com/ekatiyar/gemini-openai-proxy and you can pull the docker image from ghcr.io/ekatiyar/gemini-openai-proxy:latest

Give it a try and let me know what you think! Contributions and feedback are welcome.

[-]

coder543@reddit

Google’s Gemini models (not embedding) are already available through an OpenAI compatibility layer that Google offers: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/call-gemini-using-openai-library