Using vLLM in Jupyter AI#

(Return to the Chat Interface page)

vLLM is a fast and easy-to-use library for LLM inference and serving. The vLLM website explains installation and usage.

Note

To use vLLM via OpenRouter as described below you will need to upgrade to jupyter-ai >= 2.29.1.

Depending on your hardware set up you will install vLLM using these instructions. It is best to install it in a dedicated python environment.

Once it is installed you may start serving any model with the command:

vllm serve <model_name>

As an example, the deployment of the Phi-3-mini-4k-instruct model is shown below, with checks to make sure it is up and running:

Screen shot of steps and checks in deploying a model using vllm.

vllm serves up the model at the following URL: http://<url>:8000/v1

Start up Jupyter AI and update the AI Settings as follows (notice that we are using OpenRouter as the provider, which is a unified interface for LLMs based on OpenAI’s API interface):

Screen shot of AI setting for using vllm.

Since vLLM may be addressed using OpenAI’s API, you can test if the model is available using the API call as shown:

Screen shot of using vllm programmatically with its API.

The model may be used in Jupyter AI’s chat interface as shown in the example below:

Screen shot of using vllm in Jupyter AI chat.

(Return to the Chat Interface page)