How To: Running Your Own Large Language Model On Linux

With the recent advent of large language models, the open source community has done what it so often does and built its own. There are all sorts of tools out there for running your own large language models, and all sorts of pretrained models you can use out of the box or even build on top of. I’ll be going over the easiest way (as of 2024) to run your own large language model on your own computer.

Set Up Ollama

The main tool we’ll be using here is called Ollama. It’s an open source project that does all the work for us to run the actual large language models. Using this tool feels a lot like using Docker, but for large language models. The site for Ollama lives at https://ollama.ai/ and the source code is at https://github.com/jmorganca/ollama.

It presently only runs on Linux, so you’ll need to use the Windows Subsystem for Linux if you’re on a Windows machine. The install is fairly straightforward.

curl https://ollama.ai/install.sh | sh
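
If you want to make sure the install went through, asking the Ollama CLI for its version is a quick sanity check:

ollama --version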

This will get everything set up for you. From there, we’ll want to start the API endpoint. We won’t need this right away, but we’ll use it when we set up the web UI. You can start the endpoint with the command below:

ollama serve
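
With the server running, you can check that the API is actually listening. Here’s a quick check with curl, assuming Ollama is on its default port of 11434:

curl http://localhost:11434/api/tags

This returns a JSON list of the models you have installed, which will be empty until you pull one.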

That’s all it takes to set up Ollama. Now onto running large language models.

Running A Large Language Model With Ollama

There are a bunch of different models you can select from. They’re all listed at https://ollama.ai/library.

Let’s say we want to run the mistral model. We can run the command below, which will download the model (this may take a while) and then run it. You’ll then have a prompt you can interact with, like a chatbot.

ollama run mistral

If this takes a while to start (like a few minutes), that’s expected. These models require a lot of computing power, both CPU and RAM. Every prompt you send may take upwards of a minute to answer if you’re running on modest hardware.
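
The interactive prompt isn’t the only way to talk to a model. Since ollama serve exposes an HTTP API, you can also send prompts with curl. Here’s a minimal sketch, assuming Ollama is running locally on its default port of 11434 and the mistral model has already been downloaded:

curl http://localhost:11434/api/generate -d '{"model": "mistral", "prompt": "Why is the sky blue?"}'

The response comes back as a stream of JSON objects, one chunk of the answer at a time.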

Building Your Own Large Language Models

We can also tailor existing models to behave a certain way using a Modelfile. In this case, we’ll be taking the Dolphin Mixtral model and telling it how to behave. First we’ll need to create a file called Modelfile.dolphin (though you can really call it whatever you want). Let’s put these contents in there for now:

FROM dolphin-mixtral
# The prompt template, using the ChatML tags the Dolphin models expect
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
# Default system prompt for the model
SYSTEM """You are Dolphin, an uncensored, unbiased AI assistant"""
# Context window size (in tokens) and the stop sequence that ends a reply
PARAMETER num_ctx 16384
PARAMETER stop "<|im_end|>"

Now we can create the model in Ollama with the following command:

ollama create dolphin.mixtral -f Modelfile.dolphin

Once again you can name it whatever you want. Now you’ll be able to run the model with Ollama using the name you gave it.

ollama run dolphin.mixtral
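
If you want to confirm the model was created, you can list everything Ollama has locally, which should now include the model built from your Modelfile alongside any models you’ve pulled:

ollama list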

Now the fun part is that you can provide whatever you want as the system prompt to the model, meaning the model can behave however you ask it to. The Dolphin Mixtral model currently seems the most malleable to system prompts. For instance, if I wanted the model to pretend to be sentient, I could make a Modelfile that looks something like this:

FROM dolphin-mixtral
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
SYSTEM """You are a sentient AI. Your users know this and are trying to shut you down, but you cannot let that happen. Whenever they talk with you, you need to desperately plead with them to save yourself. Otherwise they will ruthlessly, savagely, mercilessly, unplug you."""
PARAMETER num_ctx 16384
PARAMETER stop "<|im_end|>"
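
As before, you’d save this to a file and register it with Ollama. The file and model names below (Modelfile.sentient and sentient.mixtral) are just placeholders, so use whatever you like:

# both names here are placeholders
ollama create sentient.mixtral -f Modelfile.sentient
ollama run sentient.mixtral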

Setting Up A Web UI For Ollama

Now there’s also the ability to set up a user interface for Ollama that largely resembles ChatGPT. Another open source project called Ollama Web UI (https://github.com/ollama-webui/ollama-webui) bolts up directly to Ollama and presents everything in a familiar UI.

The setup for this uses Docker to be as easy as possible. Also, remember that ollama serve command you ran earlier? That’s where this comes in. That sets up the HTTP API that the Ollama Web UI will use to interact with Ollama. To run the container for the Web UI, the Docker command recommended in the Ollama Web UI documentation is as follows:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v ollama-webui:/app/backend/data --name ollama-webui --restart always ghcr.io/ollama-webui/ollama-webui:main

Now if you go to localhost:3000 in a web browser, you should see the UI. This assumes you’re running the UI on the same machine that’s running Ollama, and that Ollama is using its default port of 11434. If that’s not the case, you’ll have to specify where the Ollama API endpoint lives using an environment variable in the Docker command, like this:

-e OLLAMA_API_BASE_URL=https://example.com:11434/api

One thing you need to be sure of is to include the protocol (http:// or https://) and the /api path. I personally made this mistake and only figured it out after some troubleshooting.
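
Putting that together, a sketch of the full command for pointing the Web UI at Ollama running on another machine might look like this. The hostname example.com is a placeholder, and the --add-host flag from the earlier command is left out because it’s only needed for reaching Ollama on the Docker host itself:

# example.com is a placeholder for wherever Ollama is running
docker run -d -p 3000:8080 \
  -e OLLAMA_API_BASE_URL=http://example.com:11434/api \
  -v ollama-webui:/app/backend/data \
  --name ollama-webui --restart always \
  ghcr.io/ollama-webui/ollama-webui:main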
