# Using vLLM in Jupyter AI [(Return to the Chat Interface page)](index.md#vllm-usage) `vLLM` is a fast and easy-to-use library for LLM inference and serving. The [vLLM website](https://docs.vllm.ai/en/latest/) explains installation and usage. :::{note} To use `vLLM` via `OpenRouter` as described below you will need to upgrade to `jupyter-ai >= 2.29.1`. ::: Depending on your hardware set up you will install `vLLM` using these [instructions](https://docs.vllm.ai/en/latest/getting_started/installation/index.html). It is best to install it in a dedicated python environment. Once it is installed you may start serving any model with the command: ```python vllm serve ``` As an example, the deployment of the `Phi-3-mini-4k-instruct` model is shown below, with checks to make sure it is up and running: Screen shot of steps and checks in deploying a model using vllm.

Screen shot of steps and checks in deploying a model using vllm.

`vllm` serves up the model at the following URL: `http://:8000/v1` Start up Jupyter AI and update the AI Settings as follows (notice that we are using [OpenRouter](openrouter.md) as the provider, which is a unified interface for LLMs based on OpenAI's API interface): Screen shot of AI setting for using vllm.

Screen shot of AI setting for using vllm.

Since vLLM may be addressed using OpenAI's API, you can test if the model is available using the API call as shown: Screen shot of using vllm programmatically with its API.

Screen shot of using vllm programmatically with its API.

The model may be used in Jupyter AI's chat interface as shown in the example below: Screen shot of using vllm in Jupyter AI chat.

[(Return to the Chat Interface page)](index.md#vllm-usage)