Jupyternaut with vLLM
vLLM is a fast and easy-to-use library for LLM inference and serving. The vLLM website explains installation and usage.
Note
To use vLLM via OpenRouter as described below, you will need to upgrade to jupyter-ai >= 2.29.1.
Install vLLM by following the installation instructions on the vLLM website that match your hardware setup. It is best to install it in a dedicated Python environment.
Once it is installed, you can start serving a model with the command:
vllm serve <model_name>
As an example, the deployment of the Phi-3-mini-4k-instruct model is shown below, with checks to make sure it is up and running:

vLLM serves the model at the following URL: http://<url>:8000/v1
Therefore, to use a model from a vLLM server, enter the model ID in the Jupyternaut settings and add this URL to the base_api model parameter (in the same way as shown for Ollama above).
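Before entering these values in the Jupyternaut settings, it can help to confirm that the server is reachable and to see the exact model ID it reports. The sketch below is one possible check, not part of Jupyter AI or vLLM: it queries the /models endpoint that vLLM's OpenAI-compatible server exposes under the base URL. The helper names (models_url, parse_model_ids, list_served_models) are hypothetical, and http://localhost:8000/v1 is an assumed address for a server running on the same machine.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Return the model-listing endpoint for an OpenAI-compatible base URL."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: dict) -> list[str]:
    """Extract the model ids from the JSON body of a /models response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(base_url: str, timeout: float = 5.0) -> list[str]:
    """Query the server and return the ids of the models it serves."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


if __name__ == "__main__":
    # Assumption: the vLLM server runs locally on port 8000.
    # Replace this with your own http://<url>:8000/v1 address.
    base_url = "http://localhost:8000/v1"
    try:
        print(list_served_models(base_url))
    except OSError as exc:  # urllib's URLError is a subclass of OSError
        print(f"Could not reach {base_url}: {exc}")
```

If the call succeeds, the printed IDs are the values to use as the model ID in the Jupyternaut settings, and the base URL passed to the script is the one to enter in the base_api parameter.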