Serge - LLaMA made easy 🦙

A chat interface based on llama.cpp for running Alpaca models. Entirely self-hosted, with no API keys needed. The smallest models fit in roughly 4GB of RAM and run on the CPU.

  • SvelteKit frontend
  • Redis for storing chat history & parameters
  • FastAPI + langchain for the API, wrapping calls to llama.cpp through its Python bindings

Getting started

Setting up Serge takes a single command:

docker run -d \
    --name serge \
    -v weights:/usr/src/app/weights \
    -v datadb:/data/db/ \
    -p 8008:8008 \
    ghcr.io/serge-chat/serge:latest

Then just go to http://localhost:8008/ and you're good to go!

The API documentation can be found at http://localhost:8008/api/docs
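
Because the container starts in the background (-d), the interface may take a moment to come up. A minimal sketch of a readiness check, assuming curl is installed on the host. The helper name wait_for_serge and its retry defaults are illustrative and not part of Serge; only the URL comes from the text above.

```shell
# Poll a URL until it answers with a successful HTTP status, or give up.
# wait_for_serge is a hypothetical helper, not something Serge ships.
wait_for_serge() {
    local url="${1:-http://localhost:8008/api/docs}"  # URL from the section above
    local attempts="${2:-30}"                          # assumed retry budget
    local delay="${3:-2}"                              # assumed seconds between tries
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: treat HTTP errors as failures, -s: silent, -o: discard the body
        if curl -fs -o /dev/null "$url"; then
            echo "Serge is reachable at $url"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "Serge did not respond at $url after $attempts attempts" >&2
    return 1
}
```

Calling wait_for_serge with no arguments polls the API documentation URL; a different URL, attempt count, and delay can be passed as the first, second, and third arguments.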

Windows

Make sure you have Docker Desktop installed, WSL2 configured, and enough free RAM to run models (see the note on memory usage below).

Kubernetes & docker compose

Instructions for setting up Serge with Kubernetes or docker compose are in the wiki: https://github.com/serge-chat/serge/wiki/Integrating-Serge-in-your-orchestration#kubernetes-example

Models

Currently the following models are supported:

  • Alpaca-LoRA-65B
  • GPT4-Alpaca-LoRA-30B
  • GPT4All-13B
  • Guanaco-7B
  • Guanaco-13B
  • Guanaco-33B
  • Guanaco-65B
  • Koala-7B
  • Koala-13B
  • Lazarus-30B
  • Nous-Hermes-13B
  • OpenAssistant-30B
  • Samantha-7B
  • Samantha-13B
  • Samantha-33B
  • Stable-Vicuna-13B
  • Vicuna-CoT-7B
  • Vicuna-CoT-13B
  • Vicuna-v1.1-7B
  • Vicuna-v1.1-13B
  • Wizard-Mega-13B
  • Wizard-Vicuna-Uncensored-7B
  • Wizard-Vicuna-Uncensored-13B
  • Wizard-Vicuna-Uncensored-30B
  • WizardLM-30B
  • WizardLM-Uncensored-7B
  • WizardLM-Uncensored-13B
  • WizardLM-Uncensored-30B

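If you have existing weights from another project, you can add them to the weights volume using docker cp. With the docker run command above the volume is named weights; docker compose typically prefixes volume names with the project name, which is where the name serge_weights comes from.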
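
As a rough sketch of that step: docker cp copies into a container rather than directly into a volume, so the target is the path where the container mounts its weights volume. The container name serge and the path /usr/src/app/weights come from the docker run command above. The helper name copy_weights and the file name in the usage note are placeholders.

```shell
# Copy a local model file into the weights directory of the Serge container.
# copy_weights is a hypothetical helper; adjust the container name if yours differs.
copy_weights() {
    local src="$1"
    local container="${2:-serge}"       # container name from the docker run example
    local dest="/usr/src/app/weights/"  # mount point of the weights volume
    if [ ! -f "$src" ]; then
        echo "No such file: $src" >&2
        return 1
    fi
    docker cp "$src" "$container:$dest"
}
```

For example, copy_weights ./my-model.bin would copy a file of that (made-up) name into the container named serge.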

⚠️ A note on memory usage

LLaMA will just crash if you don't have enough available memory for your model.

  • 7B requires about 4.5GB of free RAM
  • 13B requires about 12GB free
  • 30B requires about 20GB free
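
To check before loading a model, one option is to compare these figures against the memory the kernel reports as available. The sketch below assumes a Linux host (or the WSL2 VM) where /proc/meminfo exposes MemAvailable. The function names are illustrative, and the thresholds are the approximate values listed above converted to MiB.

```shell
# Approximate free RAM needed per model size, in MiB (from the list above).
required_mib() {
    case "$1" in
        7B)  echo 4608 ;;   # about 4.5GB
        13B) echo 12288 ;;  # about 12GB
        30B) echo 20480 ;;  # about 20GB
        *)   echo "unknown model size: $1" >&2; return 1 ;;
    esac
}

# Available memory in MiB as reported by the Linux kernel.
available_mib() {
    awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo
}

# Succeeds if the available memory (second argument, defaults to the live
# value) covers the requirement for the given model size.
has_enough_memory() {
    local need avail
    need="$(required_mib "$1")" || return 1
    avail="${2:-$(available_mib)}"
    [ "$avail" -ge "$need" ]
}
```

For example, has_enough_memory 13B && echo "enough memory for a 13B model" prints the message only when at least about 12GB is available.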

Support

Feel free to join the discord if you need help with the setup: https://discord.gg/62Hc6FEYQH

Contributing

Serge is always open for contributions! If you catch a bug or have a feature idea, feel free to open an issue or a PR.

If you want to run Serge in development mode (with hot-module reloading for Svelte and autoreload for FastAPI), you can do so like this:

git clone https://github.com/serge-chat/serge.git
cd serge
DOCKER_BUILDKIT=1 docker compose -f docker-compose.dev.yml up -d --build

You can test the production image with:

DOCKER_BUILDKIT=1 docker compose up -d --build