How to run LLMs on a local (or remote) machine

End result

  • an Ollama instance with open-source LLM weights running on a local machine (or on any machine on the local network), accessible via open-webui or the Python API from the same machine or from elsewhere on the local network
  • all of it running in GPU-enabled Docker containers

Prerequisites

  • Root access to a VM with Ubuntu 24.04 LTS, or an unprivileged LXC with GPU enabled (using cgroup2 if it’s Proxmox)
  • NVIDIA GPU with at least 5 GB of VRAM

Steps to reproduce

  • (optional) check the time and date on the machine, adjust the timezone if needed
    
    date
    sudo timedatectl set-timezone Asia/Shanghai
    

    or sudo dpkg-reconfigure tzdata
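
    to confirm the change, timedatectl with no arguments prints the current local time and the configured timezone:

    timedatectl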

  • (optional) set up local mirrors
    
    sudo tee /etc/apt/sources.list.d/ubuntu.sources <<- 'EOF'
    Types: deb
    URIs: https://mirrors.tuna.tsinghua.edu.cn/ubuntu
    Suites: noble noble-updates noble-backports
    Components: main restricted universe multiverse
    Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg
    
    Types: deb
    URIs: http://security.ubuntu.com/ubuntu/
    Suites: noble-security
    Components: main restricted universe multiverse
    Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg
    
    EOF
    
  • sudo apt update && sudo apt upgrade -y
  • sudo apt install -y git curl nvtop (nvidia-smi is not a separate package; it ships with the NVIDIA driver installed in the next step)
  • install NVIDIA drivers
    • the easy way
      
      sudo apt install ubuntu-drivers-common
      sudo ubuntu-drivers devices
      sudo apt install nvidia-driver-550
      sudo shutdown -r now
      
    • the recommended way
      
      curl -fSsL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/3bf863cc.pub | gpg --dearmor | sudo tee /usr/share/keyrings/nvidia-drivers.gpg > /dev/null 2>&1
      sudo apt update
      sudo apt install dirmngr ca-certificates software-properties-common apt-transport-https dkms
      echo 'deb [signed-by=/usr/share/keyrings/nvidia-drivers.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/ /' | sudo tee /etc/apt/sources.list.d/nvidia-drivers.list
      sudo apt update
      sudo apt install nvidia-driver-560
      apt list --installed | grep nvidia
      sudo shutdown -r now
      
  • check with nvidia-smi --query-gpu=compute_cap --format=csv
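
    the same query interface can also report the GPU name, driver version, and total VRAM (useful for verifying the 5 GB prerequisite above); the full list of field names is available via nvidia-smi --help-query-gpu:

    nvidia-smi --query-gpu=name,driver_version,memory.total,compute_cap --format=csv
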
  • (if direct access is blocked) set up a proxy. The commands below assume the proxy runs on localhost:12334; change the address and port as needed
    
    export https_proxy=http://localhost:12334 http_proxy=http://localhost:12334
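
    since curl honors these variables, a quick way to confirm the proxy actually forwards traffic is to fetch any reachable HTTPS URL through it, e.g. the Docker GPG key that the next step needs anyway:

    curl -fsS -o /dev/null https://download.docker.com/linux/ubuntu/gpg && echo "proxy OK"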
    
  • install Docker (will be needed to run open-webui)
    
    # Add Docker's official GPG key:
    sudo apt-get update
    sudo apt-get install ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    
    # Add the repository to Apt sources:
    echo \
      "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
      $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
      sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get update
    sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    sudo usermod -aG docker $USER
    exit
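
    the exit above is needed because the usermod group change only applies to new sessions; after logging back in, a minimal sanity check of the installation:

    docker run --rm hello-world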
    
  • (if direct access is blocked) set up a proxy for the Docker daemon so it can pull images (the proxies key in daemon.json requires Docker Engine 23.0 or newer)
    
    sudo tee /etc/docker/daemon.json <<-'EOF'
    {
      "proxies": {
        "http-proxy": "http://localhost:12334",
        "https-proxy": "http://localhost:12334",
        "no-proxy": "localhost,127.0.0.0/8"
      }
    }
    EOF
    sudo systemctl daemon-reload
    sudo systemctl restart docker
    
  • (if direct access is blocked) another way to set up the proxy is a systemd drop-in for the docker service (note the trailing slash):
    
    sudo mkdir -p /etc/systemd/system/docker.service.d
    sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf <<-'EOF'
    [Service]
    Environment="HTTP_PROXY=http://localhost:12334/"
    Environment="HTTPS_PROXY=http://localhost:12334/"
    Environment="NO_PROXY=localhost,127.0.0.0/8"
    EOF
    sudo systemctl daemon-reload
    sudo systemctl restart docker
    
  • (if direct access is blocked) you might also need to add proxy settings for all accounts
    
    sudo tee -a /etc/environment <<-'EOF'
    http_proxy="http://localhost:12334"
    https_proxy="http://localhost:12334"
    no_proxy="localhost,127.0.0.0/8"
    EOF
    source /etc/environment

    note that tee -a appends rather than overwrites, preserving the PATH line already present in /etc/environment; the file is read at login, so new sessions pick up the settings automatically
    
  • check the proxy settings with docker info | grep -i proxy
  • install nvidia-container-toolkit

    
    sudo apt install nvidia-container-toolkit -y
    sudo nvidia-ctk runtime configure --runtime=docker
    sudo systemctl restart docker
    docker info | grep -i runtime
    docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
    
  • (optional, but recommended) install Portainer
    
    docker volume create portainer_data
    docker run -d -p 8000:8000 -p 9443:9443 --name portainer --restart=always -v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer-ce:latest
    docker ps
    
    • navigate to https://localhost:9443 and set up the admin account
    • later update like this:
      
      docker stop portainer
      docker rm portainer
      docker pull portainer/portainer-ce:2.21.4
      docker run -d -p 8000:8000 -p 9443:9443 --name=portainer --restart=always -v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer-ce:2.21.4
      
  • install Ollama
    • start the container
      
      docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --restart unless-stopped --name ollama ollama/ollama
      
    • check if it’s running
      
      sudo ss -ntlp | grep 11434
      curl http://localhost:11434

      (grep for the port rather than the process name: the published port is held by docker-proxy, not an ollama process; curl should reply with “Ollama is running”)
      
  • pull models from the Ollama hub
    
    docker exec -it ollama ollama pull llama3.2
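
    once a model is pulled, it can also be queried directly over Ollama's REST API, without open-webui or the Python client; for example, a non-streaming request:

    curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "prompt": "Why is the sky blue?", "stream": false}'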
    
  • install open-webui
    
    docker run -d --gpus=all --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart unless-stopped ghcr.io/open-webui/open-webui:main

    (use the :main tag when pointing at an existing Ollama instance; the :ollama tag bundles its own Ollama, which would clash with the standalone container on the host network)
    
    • docker container ls
    • navigate to http://localhost:8080 and register the admin account

    the database is located in /var/lib/docker/volumes/open-webui/_data
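
    knowing the volume location makes ad-hoc backups straightforward; a minimal sketch using a throwaway Alpine container (the archive name is arbitrary):

    docker run --rm -v open-webui:/data -v "$PWD":/backup alpine tar czf /backup/open-webui-backup.tar.gz -C /data .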

    • later update with:
      
      docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui
      
  • disable proxy settings with unset https_proxy http_proxy or reopen the terminal
  • (optional) install the Python API. Use venv or conda to manage the environment; do not install it into the system Python directly
    
    pip install ollama
    
    • test
      
      from ollama import Client

      # talk to the local Ollama container
      client = Client(host='http://localhost:11434')
      response = client.chat(model='llama3.2', messages=[
        {
          'role': 'user',
          'content': 'Why is the sky blue?',
        },
      ])
      # the reply text is under message -> content
      print(response['message']['content'])
      

This post is licensed under CC BY 4.0 by the author.