Reduce RAM Usage

If you are deploying Open WebUI in a RAM-constrained environment (such as a Raspberry Pi, small VPS, or shared hosting), there are several strategies to significantly reduce memory consumption.

On a Raspberry Pi 4 (arm64) running v0.3.10, these optimizations reduced idle memory consumption from >1GB to ~200MB (as observed with docker container stats).


Quick Start

Set the following environment variables for immediate RAM savings:

# Use external embedding instead of local SentenceTransformers
RAG_EMBEDDING_ENGINE=ollama

# Use external Speech-to-Text instead of local Whisper
AUDIO_STT_ENGINE=openai
Tip: These settings can also be configured in the Admin Panel > Settings interface. Set the RAG embedding engine to Ollama or OpenAI, and Speech-to-Text to OpenAI or WebAPI.
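For Docker deployments, these variables can be passed with -e flags. A minimal sketch, assuming the standard Open WebUI image and the quick-start port/volume settings (adjust names to your own deployment):

# Start a container with the RAM-saving variables set
docker run -d -p 3000:8080 \
  -e RAG_EMBEDDING_ENGINE=ollama \
  -e AUDIO_STT_ENGINE=openai \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main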


Why Does Open WebUI Use So Much RAM?

Much of the memory consumption comes from locally loaded ML models. Even when using an external LLM (OpenAI or a separate Ollama instance), Open WebUI may load additional models for:

| Feature | Default | RAM Impact | Solution |
| --- | --- | --- | --- |
| RAG Embedding | Local SentenceTransformers | ~500-800MB | Use Ollama or OpenAI embeddings |
| Speech-to-Text | Local Whisper | ~300-500MB | Use OpenAI or WebAPI |
| Reranking | Disabled | ~200-400MB when enabled | Keep disabled or use external |
| Image Generation | Disabled | Variable | Keep disabled if not needed |

⚙️ Environment Variables for RAM Reduction

Offload Embedding to External Service

The biggest RAM saver is using an external embedding engine:

# Option 1: Use Ollama for embeddings (if you have Ollama running separately)
RAG_EMBEDDING_ENGINE=ollama

# Option 2: Use OpenAI for embeddings
RAG_EMBEDDING_ENGINE=openai
OPENAI_API_KEY=your-api-key
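If you take the Ollama route, the Ollama server needs an embedding model pulled ahead of time. A minimal sketch, assuming the nomic-embed-text model (any embedding model Ollama serves will do) and the RAG_EMBEDDING_MODEL variable to select it:

# On the host running Ollama: fetch a lightweight embedding model
ollama pull nomic-embed-text

# In Open WebUI's environment: request that model for embeddings
RAG_EMBEDDING_ENGINE=ollama
RAG_EMBEDDING_MODEL=nomic-embed-text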

Offload Speech-to-Text

Local Whisper models consume significant RAM:

# Use OpenAI's Whisper API
AUDIO_STT_ENGINE=openai

# Or use browser-based WebAPI (no external service needed)
AUDIO_STT_ENGINE=webapi
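When offloading to OpenAI, the speech-to-text requests need credentials. A minimal sketch, assuming the AUDIO_STT_OPENAI_API_BASE_URL and AUDIO_STT_OPENAI_API_KEY variables supported by recent releases (verify against your version's configuration reference):

# Point STT at OpenAI's Whisper API with your credentials
AUDIO_STT_ENGINE=openai
AUDIO_STT_OPENAI_API_BASE_URL=https://api.openai.com/v1
AUDIO_STT_OPENAI_API_KEY=your-api-key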

Disable Unused Features

Disable features you don't need to prevent model loading:

# Disable image generation (prevents loading image models)
ENABLE_IMAGE_GENERATION=False

# Disable code execution (reduces overhead)
ENABLE_CODE_EXECUTION=False

# Disable code interpreter
ENABLE_CODE_INTERPRETER=False

Reduce Background Task Overhead

These settings reduce memory usage from background operations:

# Disable autocomplete (high resource usage)
ENABLE_AUTOCOMPLETE_GENERATION=False

# Disable automatic title generation
ENABLE_TITLE_GENERATION=False

# Disable tag generation
ENABLE_TAGS_GENERATION=False

# Disable follow-up suggestions
ENABLE_FOLLOW_UP_GENERATION=False

Database and Cache Optimization

# Disable real-time chat saving (reduces database overhead)
ENABLE_REALTIME_CHAT_SAVE=False

# Reduce thread pool size for low-resource systems
THREAD_POOL_SIZE=10

Vector Database Multitenancy

If using Milvus or Qdrant, enable multitenancy mode to reduce RAM:

# For Milvus
ENABLE_MILVUS_MULTITENANCY_MODE=True

# For Qdrant
ENABLE_QDRANT_MULTITENANCY_MODE=True
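To use one of these databases, Open WebUI must also be pointed at the running instance. A minimal sketch for Qdrant, assuming the VECTOR_DB and QDRANT_URI variables (the address is illustrative):

# Use an external Qdrant instance in multitenancy mode
VECTOR_DB=qdrant
QDRANT_URI=http://qdrant:6333
ENABLE_QDRANT_MULTITENANCY_MODE=True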

Combined Low-RAM Configuration

For extremely RAM-constrained environments, use this combined configuration:

# Offload ML models to external services
RAG_EMBEDDING_ENGINE=ollama
AUDIO_STT_ENGINE=openai

# Disable all non-essential features
ENABLE_IMAGE_GENERATION=False
ENABLE_CODE_EXECUTION=False
ENABLE_CODE_INTERPRETER=False
ENABLE_AUTOCOMPLETE_GENERATION=False
ENABLE_TITLE_GENERATION=False
ENABLE_TAGS_GENERATION=False
ENABLE_FOLLOW_UP_GENERATION=False

# Reduce worker overhead
THREAD_POOL_SIZE=10
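Since this block already reads like an env file, you can keep it in one place and load it wholesale. A minimal sketch, assuming the block above is saved as low-ram.env (the file name and container settings are illustrative):

# Load the whole low-RAM profile into a fresh container
docker run -d -p 3000:8080 \
  --env-file low-ram.env \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main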

💡 Additional Tips

  • Monitor Memory Usage: Use docker container stats or htop to monitor RAM consumption (see the examples after this list)
  • Restart After Changes: Environment variable changes require a container restart
  • Fresh Deployments: Some environment variables only take effect on fresh deployments without an existing config.json
  • Consider Alternatives: For very constrained systems, consider running Open WebUI on a more capable machine and accessing it remotely
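For the monitoring tip, two handy invocations, assuming the container is named open-webui:

# One-shot snapshot of current memory usage
docker stats open-webui --no-stream

# Continuous view showing just the container name and memory
docker stats open-webui --format "{{.Name}}: {{.MemUsage}}"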