Build an OpenAI-Compatible API for Open-Source LLMs from Scratch

New and powerful open-source AI models enter the market every few months. DeepSeek R1 was released a few weeks ago and excited the AI world. According to DeepSeek, the model was much cheaper to train and outperformed OpenAI's o1 model on several benchmarks.
Open-source models like DeepSeek or Llama 3 offer many benefits for developers. For example, you can fine-tune them on-premise to create large language models (LLMs) tailored to your specific use cases. However, there's also the challenge of efficiently combining the strengths of different models in one application. The solution is to build your own REST API.
With a REST API, you can host various open-source models customized for your applications and draw on the individual strengths of each one. In addition, your own API is more cost-efficient in the long term and ensures data privacy.
Many Python libraries support OpenAI's API schema. For this reason, it makes sense to build an OpenAI-compatible API. To do this, we've followed the official OpenAI API reference.
We’ll discuss the following points:
Benefits of Your Own API
Technical requirements
Create a Chat Completion API
Simple REST API with FastAPI
Q&A Endpoint without Streaming
Q&A Endpoint with Streaming
Tool Use Support
Example: Web Search Tool with AG2
Configuration
Definition of the Tool
Definition of the Agents
Registration of the Agents
Testing the Web Search Tool
Conclusion
Appendix: Full API Code