Unlocking Local AI: Build RAG Apps Without Cloud or API Keys

In this tutorial, we will use Chipper, an open-source framework that simplifies building local RAG applications without cloud dependencies or API keys.

By Santhosh Vijayabaskar · Feb. 14, 2025 · Tutorial


Retrieval-augmented generation (RAG) is transforming how we interact with AI models by combining retrieval techniques with generative models. But what if you could build RAG applications locally, without API keys or cloud dependencies?

Let's meet Chipper, an open-source framework that makes building local RAG apps simple. No more struggling with document chunking, vector databases, LLM integration, and UI setups separately. With Chipper, you can set up a self-contained RAG system on your local machine in minutes.

In this tutorial, we'll walk through:

  • How RAG architectures work under the hood
  • How to set up a local RAG application with Chipper
  • Customizing and optimizing Chipper for better performance
  • A real-world example: indexing and querying your own documents

Let’s get started!

Understanding RAG With Chipper

Before diving into the setup, let’s break down the retrieval-augmented generation (RAG) pipeline and how Chipper simplifies it:

Document Ingestion and Chunking

  • Chipper automatically splits documents into meaningful chunks (sentence- or paragraph-based) for better retrieval.
  • You can customize chunk sizes for different use cases (see the sketch after this list).
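
To make this concrete, here is a rough standalone illustration in plain shell; this is not Chipper's own code, just the idea behind paragraph-based chunking: split the text on blank lines before embedding.

Shell
 
# Illustration only (not Chipper's internals): awk's paragraph mode
# (empty RS) treats blank-line-separated paragraphs as records and
# writes each one to its own "chunk" file.
awk -v RS='' '{ print > ("chunk_" NR ".txt") }' research_paper.txt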

Embedding and Vectorization

  • Each document chunk is converted into vector embeddings using a pre-trained model.
  • Chipper uses Facebook AI Similarity Search (FAISS) to store and index these embeddings efficiently (a standalone illustration of the embedding step follows below).
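
To see what the embedding step looks like in isolation, you can call Ollama's embeddings endpoint directly. This is an illustration rather than Chipper's internal code, and the model name is an example you would need to pull first:

Shell
 
# Illustration only: embed one chunk of text via Ollama's HTTP API.
# "nomic-embed-text" is an example embedding model (ollama pull
# nomic-embed-text); the model Chipper uses internally may differ.
curl http://localhost:11434/api/embeddings \
  -d '{"model": "nomic-embed-text", "prompt": "Contents of one document chunk"}'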

Retrieval Mechanism

  • When a user submits a query, Chipper searches the FAISS index for the most similar document chunks.
  • The most relevant pieces are sent to the LLM to generate a response.

LLM Integration and Query Processing

  • Chipper acts as a proxy for Ollama, sending the retrieved document content as context for the LLM’s response (a sketch of this call follows below).
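
Conceptually, the proxying boils down to prepending the retrieved chunks to the user's question and forwarding the result to Ollama. Here is a minimal sketch of that call, made directly against Ollama's /api/generate endpoint; Chipper automates all of this for you:

Shell
 
# Sketch only: retrieved chunks become context for the model.
# The placeholder below stands in for whatever the retriever returned.
curl http://localhost:11434/api/generate -d '{
  "model": "phi4",
  "prompt": "Context:\n<retrieved document chunks>\n\nQuestion: What are the key takeaways?",
  "stream": false
}'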


Step 1: Install and Configure Ollama (LLM Runtime)

Since Chipper requires a local LLM to function, we’ll use Ollama, a lightweight LLM runtime.

1. Install Ollama

On macOS (via Homebrew)

Shell
 
brew install ollama


On Linux

Shell
 
curl -fsSL https://ollama.ai/install.sh | sh


On Windows

Download and install Ollama from Ollama’s official site.

2. Verify Ollama Installation

Check if Ollama is installed:

Shell
 
ollama --version


If Ollama is not running, start it manually:

Shell
 
ollama serve

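Once the server is up, you can optionally confirm that its HTTP API is reachable (Ollama listens on port 11434 by default):

Shell
 
# Lists locally available models; an empty list is fine at this stage.
curl http://localhost:11434/api/tags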


3. Download the Phi-4 Model (or an Alternative)

Chipper uses a local LLM via Ollama. If no model is found, it will automatically download Phi-4.

To manually pull Phi-4, run:

Shell
 
ollama pull phi4


Alternatively, you can use Mistral 7B:

Shell
 
ollama pull mistral
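
Either way, you can confirm which models are available locally:

Shell
 
# Shows every model Ollama has downloaded so far.
ollama list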


4. Configure Ollama to Use a Specific Model

If you want to manually set which model Chipper should use, edit the following file (inside the Chipper repository, which we'll clone in Step 2):

Shell
 
nano services/api/.env


Look for this line and update it to your preferred model:

Shell
 
OLLAMA_MODEL=phi4


Save and exit (CTRL + X, then Y, then Enter).
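
For example, to switch Chipper to Mistral 7B, change that line as follows, then restart Chipper (./run.sh down, then ./run.sh up; see Step 2) so the change takes effect:

Shell
 
OLLAMA_MODEL=mistral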

Step 2: Install Chipper

1. Clone the Chipper Repository

Shell
 
git clone git@github.com:TilmanGriesel/chipper.git
cd chipper


2. Launch Chipper Using Docker

Chipper is packaged into Docker containers, which makes it easy to set up.

Run the following command to start Chipper:

Shell
 
./run.sh up


This will:

  • Download and build all required services
  • Launch Chipper’s processing services
  • Connect to the local LLM (via Ollama)

To stop Chipper:

Shell
 
./run.sh down


Note: This step may take some time as Docker downloads all required dependencies.
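
Once the script finishes, a quick way to confirm the containers came up:

Shell
 
# Lists running containers; Chipper's services should appear here.
docker ps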

Step 3: Index and Query Documents

1. Load Documents into Chipper

Chipper allows you to drop in documents for retrieval.

Shell
 
mkdir my_docs
mv research_paper.pdf my_docs/


Now, index them:

Shell
 
chipper ingest my_docs/


This will:

  • Chunk the documents
  • Generate vector embeddings
  • Store them in FAISS or ChromaDB

2. Run a Query

Once indexed, you can query the documents:

Shell
 
./run.sh cli


Then ask a question at the prompt:

Shell
 
YOU: "What are the key takeaways from the research paper?"


Chipper retrieves the most relevant document chunks and sends them to the local LLM (via Ollama) for response generation.

Step 4: Run Chipper as a Local AI Assistant 

1. Launch Chipper in the Web Browser

Once Chipper is running, you can interact with it via the web browser. Follow these steps:

  1. Open your terminal and ensure Chipper is running:

    Shell
     
    ./run.sh up

    If Chipper is already running, you should see logs indicating it is active.

  2. Open your browser and navigate to:

    Shell
     
    http://localhost:21200

    This will launch the Chipper UI, where you can interact with your RAG application. 



  3. In the web UI, enter your prompt and start querying your indexed documents or testing general AI capabilities.

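If the page does not load, you can check from the terminal whether anything is answering on that port:

Shell
 
# Prints only the response headers; any HTTP response means the UI is up.
curl -I http://localhost:21200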

2. Run Chipper in Command Line

To start an interactive chat session in the terminal:

Shell
 
./run.sh cli



Conclusion

By now, you’ve set up Chipper and Ollama, built a local RAG-powered AI assistant, and explored how to interact with it from both the command line and the browser. Everything runs privately on your own machine, with no cloud dependencies, which gives you privacy, speed, cost efficiency, and full control over your AI workflows.

  • No API keys; runs fully offline
  • Custom model support: Phi-4, Mistral, or Llama 3
  • Supports web scraping and audio transcription
  • Optimized for RAG applications in research, legal, and enterprise use cases

What’s Next?

Now that you've got Chipper up and running, here are some exciting ways to build on what you’ve learned:

  • Experiment with document chunking and vector database configurations.
  • Build a custom local AI assistant.
  • Experiment with different models (ollama pull mistral or ollama pull llama3) to see how responses vary.
  • Try indexing more complex document sets and fine-tuning the retrieval process.
  • Dive into Chipper’s API integrations and explore how it can be embedded into existing applications.
  • Check out the official Chipper documentation.

Opinions expressed by DZone contributors are their own.
