How to run LLMs on a local (or remote) machine

End result

  • an Ollama instance with open-source LLM weights running on a local machine (or on any machine on the local network), accessible via open-webui or the Python API from the same machine or from elsewhere on the local network
  • all of it running in GPU-enabled Docker containers

Prerequisites

  • Root access to a VM with Ubuntu 24.04 LTS, or an unprivileged LXC with GPU enabled (using cgroup2 if it’s Proxmox)
  • NVIDIA GPU with at least 5 GB of VRAM

Steps to reproduce

  • (optional) check the time and date on the machine, adjust the timezone if needed
    
    date
    sudo timedatectl set-timezone Asia/Shanghai
    

    or sudo dpkg-reconfigure tzdata
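
    to confirm the change, timedatectl with no arguments prints the current local time and the configured timezone:

    timedatectl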

  • (optional) set up local mirrors
    
    sudo tee /etc/apt/sources.list.d/ubuntu.sources <<- 'EOF'
    Types: deb
    URIs: https://mirrors.tuna.tsinghua.edu.cn/ubuntu
    Suites: noble noble-updates noble-backports
    Components: main restricted universe multiverse
    Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg
    
    Types: deb
    URIs: http://security.ubuntu.com/ubuntu/
    Suites: noble-security
    Components: main restricted universe multiverse
    Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg
    
    EOF
    
  • sudo apt update && sudo apt upgrade -y
  • sudo apt install -y git curl nvtop (nvidia-smi is not a separate package; it ships with the NVIDIA driver installed in the next step)
  • install NVIDIA drivers
    • the easy way
      
      sudo apt install ubuntu-drivers-common
      sudo ubuntu-drivers devices
      sudo apt install nvidia-driver-550
      sudo shutdown -r now
      
    • the recommended way
      
      curl -fSsL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/3bf863cc.pub | gpg --dearmor | sudo tee /usr/share/keyrings/nvidia-drivers.gpg > /dev/null 2>&1
      sudo apt update
      sudo apt install dirmngr ca-certificates software-properties-common apt-transport-https dkms
      echo 'deb [signed-by=/usr/share/keyrings/nvidia-drivers.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/ /' | sudo tee /etc/apt/sources.list.d/nvidia-drivers.list
      sudo apt update
      sudo apt install nvidia-driver-560
      apt list --installed | grep nvidia
      sudo shutdown -r now
      
  • check with nvidia-smi --query-gpu=compute_cap --format=csv
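
    the same query interface can also report the GPU name, driver version, and total VRAM (useful for verifying the 5 GB prerequisite above); the full list of field names is available via nvidia-smi --help-query-gpu:

    nvidia-smi --query-gpu=name,driver_version,memory.total,compute_cap --format=csv
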
  • (if direct access is blocked) set up a proxy. The commands below assume the proxy runs on localhost:12334; change the address and port as needed
    
    export https_proxy=http://localhost:12334 http_proxy=http://localhost:12334
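
    since curl honors these variables, a quick way to confirm the proxy actually forwards traffic is to fetch any reachable HTTPS URL through it, e.g. the Docker GPG key that the next step needs anyway:

    curl -fsS -o /dev/null https://download.docker.com/linux/ubuntu/gpg && echo "proxy OK"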
    
  • install Docker (will be needed to run open-webui)
    
    # Add Docker's official GPG key:
    sudo apt-get update
    sudo apt-get install ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    
    # Add the repository to Apt sources:
    echo \
      "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
      $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
      sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get update
    sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    sudo usermod -aG docker $USER
    exit
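
    the exit above is needed because the usermod group change only applies to new sessions; after logging back in, a minimal sanity check of the installation:

    docker run --rm hello-world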
    
  • (if direct access is blocked) set up a proxy for the Docker daemon so it can pull images (the proxies key in daemon.json requires Docker Engine 23.0 or newer)
    
    sudo tee /etc/docker/daemon.json <<-'EOF'
    {
      "proxies": {
        "http-proxy": "http://localhost:12334",
        "https-proxy": "http://localhost:12334",
        "no-proxy": "localhost,127.0.0.0/8"
      }
    }
    EOF
    sudo systemctl daemon-reload
    sudo systemctl restart docker
    
  • (if direct access is blocked) another way to set up the proxy is a systemd drop-in for the docker service (note the trailing slash):
    
    sudo mkdir -p /etc/systemd/system/docker.service.d
    sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf <<-'EOF'
    [Service]
    Environment="HTTP_PROXY=http://localhost:12334/"
    Environment="HTTPS_PROXY=http://localhost:12334/"
    Environment="NO_PROXY=localhost,127.0.0.0/8"
    EOF
    sudo systemctl daemon-reload
    sudo systemctl restart docker
    
  • (if direct access is blocked) you might also need to add proxy settings for all accounts
    
    sudo tee -a /etc/environment <<-'EOF'
    http_proxy="http://localhost:12334"
    https_proxy="http://localhost:12334"
    no_proxy="localhost,127.0.0.0/8"
    EOF
    source /etc/environment

    note that tee -a appends rather than overwrites, preserving the PATH line already present in /etc/environment; the file is read at login, so new sessions pick up the settings automatically
    
  • check the proxy settings with docker info | grep -i proxy
  • install nvidia-container-toolkit

    
    sudo apt install nvidia-container-toolkit -y
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
    docker info | grep -i runtime
    docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
    
  • (optional, but recommended) install Portainer
    
    docker volume create portainer_data
    docker run -d -p 8000:8000 -p 9443:9443 --name portainer --restart=always -v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer-ce:latest
    docker ps
    
    • navigate to https://localhost:9443 and set up the admin account
    • later update like this:
      
      docker stop portainer
      docker rm portainer
      docker pull portainer/portainer-ce:2.21.4
      docker run -d -p 8000:8000 -p 9443:9443 --name=portainer --restart=always -v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer-ce:2.21.4
      
  • install Ollama
    • start the container
      
      docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --restart unless-stopped --name ollama ollama/ollama
      
    • check if it’s running
      
      sudo ss -ntlp | grep 11434
      curl http://localhost:11434

      (grep for the port rather than the process name: the published port is held by docker-proxy, not an ollama process; curl should reply with “Ollama is running”)
      
  • pull models from the Ollama hub
    
    docker exec -it ollama ollama pull llama3.2
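
    once a model is pulled, it can also be queried directly over Ollama's REST API, without open-webui or the Python client; for example, a non-streaming request:

    curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "prompt": "Why is the sky blue?", "stream": false}'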
    
  • install open-webui
    
    docker run -d --gpus=all --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart unless-stopped ghcr.io/open-webui/open-webui:main

    (use the :main tag when pointing at an existing Ollama instance; the :ollama tag bundles its own Ollama, which would clash with the standalone container on the host network)
    
    • docker container ls
    • navigate to http://localhost:8080 and register the admin account

    the database is located in /var/lib/docker/volumes/open-webui/_data
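
    knowing the volume location makes ad-hoc backups straightforward; a minimal sketch using a throwaway Alpine container (the archive name is arbitrary):

    docker run --rm -v open-webui:/data -v "$PWD":/backup alpine tar czf /backup/open-webui-backup.tar.gz -C /data .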

    • later update with:
      
      docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui
      
  • disable proxy settings with unset https_proxy http_proxy or reopen the terminal
  • (optional) install the Python API. Use venv or conda to manage the environment; do not install it into the system Python directly
    
    pip install ollama
    
    • test
      
      from ollama import Client

      # talk to the local Ollama container
      client = Client(host='http://localhost:11434')
      response = client.chat(model='llama3.2', messages=[
        {
          'role': 'user',
          'content': 'Why is the sky blue?',
        },
      ])
      # the reply text is under message -> content
      print(response['message']['content'])
      

This post is licensed under CC BY 4.0 by the author.