Jupyternaut with vLLM

(Return to the Chat Interface page)

vLLM is a fast and easy-to-use library for LLM inference and serving. The vLLM website explains installation and usage.

Note

To use vLLM via OpenRouter as described below you will need to upgrade to jupyter-ai >= 2.29.1.

Depending on your hardware set up you will install vLLM using these instructions. It is best to install it in a dedicated python environment.

Once it is installed you may start serving any model with the command:

vllm serve <model_name>

As an example, the deployment of the Phi-3-mini-4k-instruct model is shown below, with checks to make sure it is up and running:

Screen shot of steps and checks in deploying a model using vllm.

vllm serves up the model at the following URL: http://<url>:8000/v1

Therefore, to use a model from a vLLM server, make sure to type in the model id in the Jupyternaut settings and also add the URL into the base_api model parameter (in the same way as shown for Ollama above).

(Return to the Chat Interface page)