How to run LLMs on a local (or remote) machine
End result
- an Ollama instance serving open-weight LLMs on a local machine (or any machine on the local network, for that matter), accessible via open-webui or the Python API from the same machine or from the local network
- everything running in GPU-enabled Docker containers
Prerequisites
- Root access to a VM with Ubuntu 24.04 LTS, or an unprivileged LXC with GPU enabled (using cgroup2 if it’s Proxmox)
- NVIDIA GPU with at least 5 GB of VRAM
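A quick way to confirm that the guest actually sees the GPU before installing anything (assuming lspci is available in the VM/LXC; it works even without the NVIDIA driver):

```bash
# The GPU should show up as an NVIDIA VGA/3D controller
lspci | grep -i nvidia
```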
Steps to reproduce
- (optional) check the time and date on the machine, adjust the timezone if needed
```bash
date
sudo timedatectl set-timezone Asia/Shanghai
```
or
sudo dpkg-reconfigure tzdata
- (optional) set up local mirrors
```bash
sudo tee /etc/apt/sources.list.d/ubuntu.sources <<- 'EOF'
Types: deb
URIs: https://mirrors.tuna.tsinghua.edu.cn/ubuntu
Suites: noble noble-updates noble-backports
Components: main restricted universe multiverse
Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg

Types: deb
URIs: http://security.ubuntu.com/ubuntu/
Suites: noble-security
Components: main restricted universe multiverse
Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg
EOF
```
sudo apt update && sudo apt upgrade -y
sudo apt install -y git curl nvidia-smi nvtop
- install NVIDIA drivers
- the easy way
```bash
sudo apt install ubuntu-drivers-common
sudo ubuntu-drivers devices
sudo apt install nvidia-driver-550
sudo shutdown -r now
```
- the recommended way
```bash
curl -fSsL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/3bf863cc.pub | gpg --dearmor | sudo tee /usr/share/keyrings/nvidia-drivers.gpg > /dev/null 2>&1
sudo apt update
sudo apt install dirmngr ca-certificates software-properties-common apt-transport-https dkms
echo 'deb [signed-by=/usr/share/keyrings/nvidia-drivers.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/ /' | sudo tee /etc/apt/sources.list.d/nvidia-drivers.list
sudo apt update
sudo apt install nvidia-driver-560
apt list --installed | grep nvidia
sudo shutdown -r now
```
- check with
nvidia-smi --query-gpu=compute_cap --format=csv
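Optionally, the same tool can confirm that the card meets the VRAM requirement from the prerequisites (the fields below are standard nvidia-smi query properties):

```bash
# Report the GPU model, total VRAM, and driver version (expect >= 5 GB of memory)
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv
```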
- (if direct access is blocked) set up a proxy, assuming the proxy is running on localhost:12334; change the address and the port number if needed
export https_proxy=http://localhost:12334 http_proxy=http://localhost:12334
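To quickly verify that the shell session now goes through the proxy (the URL below is just an example; any external site works):

```bash
# curl honours the http_proxy/https_proxy variables exported above
curl -sI https://hub.docker.com | head -n 1
```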
- install Docker (will be needed to run open-webui)
```bash
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Add the repository to Apt sources:
echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

sudo usermod -aG docker $USER
exit
```
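After logging back in so the docker group membership takes effect, a minimal sanity check (this assumes Docker Hub is reachable, either directly or through the proxy configured below):

```bash
# The client and the daemon should both report a version
docker version

# Pull and run a tiny test image, removing the container afterwards
docker run --rm hello-world
```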
- (if direct access is blocked) set up proxy for the docker daemon to pull images
```bash
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "proxies": {
    "http-proxy": "http://localhost:12334",
    "https-proxy": "http://localhost:12334",
    "no-proxy": "localhost,127.0.0.0/8"
  }
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
```
- (if direct access is blocked) another way to set up the proxy is (note the trailing slash):
```bash
sudo mkdir -p /etc/systemd/system/docker.service.d
sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf <<-'EOF'
[Service]
Environment="HTTP_PROXY=http://localhost:12334/"
Environment="HTTPS_PROXY=http://localhost:12334/"
Environment="NO_PROXY=localhost,127.0.0.0/8"
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
```
- (if direct access is blocked) might also need to add proxy settings for all accounts
```bash
sudo tee /etc/environment <<-'EOF'
http_proxy="http://localhost:12334"
https_proxy="http://localhost:12334"
no_proxy="localhost,127.0.0.0/8"
EOF
source /etc/environment
```
- check the proxy settings
docker info | grep -i proxy
- install nvidia-container-toolkit
```bash
sudo apt install nvidia-container-toolkit -y
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
docker info | grep -i runtime
docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
```
- (optional, but recommended) install Portainer
```bash
docker volume create portainer_data
docker run -d -p 8000:8000 -p 9443:9443 --name portainer --restart=always -v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer-ce:latest
docker ps
```
- navigate to https://localhost:9443 and set up the admin account
- later update like this:
```bash
docker stop portainer
docker rm portainer
docker pull portainer/portainer-ce:2.21.4
docker run -d -p 8000:8000 -p 9443:9443 --name=portainer --restart=always -v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer-ce:2.21.4
```
- install Ollama
- start the container
```bash
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --restart unless-stopped --name ollama ollama/ollama
```
- check if it’s running
```bash
sudo ss -ntlp | grep ollama
curl http://localhost:11434
```
- pull models from the Ollama hub
```bash
docker exec -it ollama ollama pull llama3.2
```
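To confirm the pull and try the model straight from the container's CLI (the model name matches the one pulled above; the prompt is arbitrary):

```bash
# List models stored in the ollama volume
docker exec -it ollama ollama list

# One-shot prompt from the command line
docker exec -it ollama ollama run llama3.2 "Say hello in one short sentence."
```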
- install open-webui
```bash
docker run -d --gpus=all --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart unless-stopped ghcr.io/open-webui/open-webui:ollama
```
docker container ls
- navigate to http://localhost:8080 and register the admin account
- the database is located in /var/lib/docker/volumes/open-webui
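Since chats, users, and settings all live in that volume, it may be worth archiving it before updates; a minimal sketch using a throwaway Alpine container (the archive name is arbitrary):

```bash
# Dump the open-webui volume into a tarball in the current directory
docker run --rm -v open-webui:/data -v "$(pwd)":/backup alpine \
  tar czf /backup/open-webui-data.tar.gz -C /data .
```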
- later update with:
```bash
docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui
```
- disable proxy settings with
unset https_proxy http_proxy
or reopen the terminal
- (optional) install the Python API. Use venv or conda to manage the environment; do not install it into the system Python directly.
```bash
pip install ollama
```
- test
```python
from ollama import Client

client = Client(host='http://localhost:11434')
response = client.chat(model='llama3.2', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
print(response['message']['content'])
```
Use
- https://localhost:9443 - Portainer web interface
- http://localhost:8080 - open-webui
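Besides the web UI and the Python client, the Ollama HTTP API can be queried directly; a minimal example against the endpoint used above (the model must already be pulled):

```bash
# Non-streaming completion via the Ollama REST API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```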