Implementing RAG With Spring AI and Ollama Using Local AI/LLM Models

In this article, learn how to use AI with RAG independent from external AI/LLM services with Ollama-based AI/LLM models.

By Sven Loesekann · Feb. 06, 24 · Tutorial

This article builds on a previous article that describes the AIDocumentLibraryChat project with a RAG-based search service built on the OpenAI Embedding/GPT model services.

The AIDocumentLibraryChat project has been extended with the option to use local AI models with the help of Ollama. The advantage is that the documents never leave the local servers, which makes this a viable solution when transferring documents to an external service is prohibited.

Architecture

With Ollama, the AI model can run on a local server. That changes the architecture to look like this:

Ollama architecture

The architecture can deploy all needed systems in a local deployment environment that is controlled by the local organization. An example would be to deploy the AIDocumentLibraryChat application, the PostgreSQL DB, and the Ollama-based AI model in a local Kubernetes cluster and to provide user access to AIDocumentLibraryChat with an ingress. With this architecture, external parties can access only the results that the AIDocumentLibraryChat application provides.

The system architecture has the UI for the user and the application logic in the AIDocumentLibraryChat application. The application uses Spring AI with the ONNX library functions to create the embeddings of the documents. The embeddings and documents are stored with JDBC in the PostgreSQL database with the vector extension. To create the answers based on the document/paragraph content, the Ollama-based model is called with REST. The AIDocumentLibraryChat application, the PostgreSQL DB, and the Ollama-based model can each be packaged in a Docker image and deployed in a Kubernetes cluster. That makes the system independent of external services. The Ollama models support the needed GPU acceleration on the server.

The shell commands to run the Ollama Docker image are in the runOllama.sh file. The shell commands to run the PostgreSQL DB Docker image with the vector extension are in the runPostgresql.sh file.
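As a rough sketch, such scripts typically boil down to commands like the following (the pgvector image name and the password are assumptions for illustration; the actual scripts in the repository may differ):

```shell
# Start the Ollama server and pull the model used in this article.
docker run -d --name ollama -p 11434:11434 ollama/ollama
docker exec ollama ollama pull stable-beluga:13b

# Start PostgreSQL with the vector extension (ankane/pgvector is one
# publicly available image; the credentials here are placeholders).
docker run -d --name postgres-vector -p 5432:5432 \
  -e POSTGRES_PASSWORD=secret ankane/pgvector
```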

Building the Application for Ollama

The Gradle build of the application has been updated to switch off OpenAI support and switch on Ollama support with the useOllama property:

Groovy
plugins {
 id 'java'
 id 'org.springframework.boot' version '3.2.1'
 id 'io.spring.dependency-management' version '1.1.4'
}

group = 'ch.xxx'
version = '0.0.1-SNAPSHOT'

java {
 sourceCompatibility = '21'
}

repositories {
 mavenCentral()
 maven { url "https://repo.spring.io/snapshot" }
}

dependencies {
 implementation 'org.springframework.boot:spring-boot-starter-actuator'
 implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
 implementation 'org.springframework.boot:spring-boot-starter-security'
 implementation 'org.springframework.boot:spring-boot-starter-web'
 implementation 'org.springframework.ai:spring-ai-tika-document-reader:0.8.0-SNAPSHOT'
 implementation 'org.liquibase:liquibase-core'
 implementation 'net.javacrumbs.shedlock:shedlock-spring:5.2.0'
 implementation 'net.javacrumbs.shedlock:shedlock-provider-jdbc-template:5.2.0'
 implementation 'org.springframework.ai:spring-ai-pgvector-store-spring-boot-starter:0.8.0-SNAPSHOT'
 implementation 'org.springframework.ai:spring-ai-transformers-spring-boot-starter:0.8.0-SNAPSHOT'
 testImplementation 'org.springframework.boot:spring-boot-starter-test'
 testImplementation 'org.springframework.security:spring-security-test'
 testImplementation 'com.tngtech.archunit:archunit-junit5:1.1.0'
 testRuntimeOnly 'org.junit.platform:junit-platform-launcher'

 if(project.hasProperty('useOllama')) {
   implementation 'org.springframework.ai:spring-ai-ollama-spring-boot-starter:0.8.0-SNAPSHOT'
 } else {
   implementation 'org.springframework.ai:spring-ai-openai-spring-boot-starter:0.8.0-SNAPSHOT'
 }
}

bootJar {
 archiveFileName = 'aidocumentlibrarychat.jar'
}

tasks.named('test') {
 useJUnitPlatform()
}


The Gradle build adds the Ollama Spring Starter and the embedding library inside the if(project.hasProperty('useOllama')) branch; otherwise, it adds the OpenAI Spring Starter.
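Since only the presence of the property is checked, the Ollama build can be selected on the command line like this (a sketch using the standard Gradle wrapper tasks):

```shell
# Build with the Ollama starter on the classpath:
./gradlew clean build -PuseOllama=true

# Default build with the OpenAI starter:
./gradlew clean build
```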

Database Setup

The application needs to be started with the Spring Profile 'ollama' to switch on the features needed for Ollama support. The database setup needs a different embedding vector type that is changed with the application-ollama.properties file:

Properties files
 
...
spring.liquibase.change-log=classpath:/dbchangelog/db.changelog-master-ollama.xml
...


The spring.liquibase.change-log property sets the Liquibase script that includes the Ollama initialization. That script includes the db.changelog-1-ollama.xml script with the initialization:
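For illustration, a master changelog of this shape would wire in the Ollama script (a sketch; the include of db.changelog-1.xml for the base schema is an assumption, and the real file may contain more entries):

```xml
<databaseChangeLog
  xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
  http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.8.xsd">
    <include file="db.changelog-1.xml" relativeToChangelogFile="true"/>
    <include file="db.changelog-1-ollama.xml" relativeToChangelogFile="true"/>
</databaseChangeLog>
```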

XML
 
<databaseChangeLog
  xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
  http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-3.8.xsd">
    <changeSet id="8" author="angular2guy">
      <modifyDataType tableName="vector_store" columnName="embedding" 
        newDataType="vector(384)"/> 
    </changeSet>
</databaseChangeLog>


The script changes the column type of the embedding column to vector(384) to support the format that is created by the Spring AI ONNX Embedding library.

Add Ollama Support to the Application

To support Ollama-based models, the application-ollama.properties file has been added:

Properties files
 
spring.ai.ollama.base-url=${OLLAMA-BASE-URL:http://localhost:11434}
spring.ai.ollama.model=stable-beluga:13b
spring.liquibase.change-log=classpath:/dbchangelog/db.changelog-master-ollama.xml
document-token-limit=150


The spring.ai.ollama.base-url property sets the URL to access the Ollama model. The spring.ai.ollama.model property sets the name of the model that is run in Ollama. The document-token-limit property sets the number of tokens that the model gets as context from the document/paragraph.
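Whether the configured base URL points at a running Ollama instance can be checked with a quick REST call; the /api/tags endpoint lists the locally pulled models (assuming a default local install):

```shell
# List the models available in the local Ollama instance.
curl http://localhost:11434/api/tags
```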

The DocumentService has new features to support the Ollama models:

Java
 
private final String systemPrompt = "You're assisting with questions about documents in a catalog.\n"
  + "Use the information from the DOCUMENTS section to provide accurate answers.\n"
  + "If unsure, simply state that you don't know.\n"
  + "\n"
  + "DOCUMENTS:\n"
  + "{documents}";

private final String ollamaPrompt = "You're assisting with questions about documents in a catalog.\n"
  + "Use the information from the DOCUMENTS section to provide accurate answers.\n"
  + "If unsure, simply state that you don't know.\n \n"
  + " {prompt} \n \n"
  + "DOCUMENTS:\n"
  + "{documents}";

@Value("${embedding-token-limit:1000}")
private Integer embeddingTokenLimit;
@Value("${document-token-limit:1000}")
private Integer documentTokenLimit;	
@Value("${spring.profiles.active:}")
private String activeProfile;


Ollama supports only system prompts; that requires a new prompt that includes the user prompt in the {prompt} placeholder. The embeddingTokenLimit and the documentTokenLimit are now set in the application properties and can be adjusted for the different profiles. The activeProfile property gets the comma-separated list of the profiles the application was started with.

Java
 
public Long storeDocument(Document document) {
  ...
  var aiDocuments = tikaDocuments.stream()
    .flatMap(myDocument1 -> this.splitStringToTokenLimit(
      myDocument1.getContent(), embeddingTokenLimit).stream()
    .map(myStr -> new TikaDocumentAndContent(myDocument1, myStr)))
    .map(myTikaRecord -> new org.springframework.ai.document.Document(
      myTikaRecord.content(), myTikaRecord.document().getMetadata()))
    .peek(myDocument1 -> myDocument1.getMetadata().put(ID,      
      myDocument.getId().toString()))
    .peek(myDocument1 -> myDocument1.getMetadata()
      .put(MetaData.DATATYPE, MetaData.DataType.DOCUMENT.toString()))
    .toList();
  ...
}

public AiResult queryDocuments(SearchDto searchDto) {
...
  Message systemMessage = switch (searchDto.getSearchType()) {
    case SearchDto.SearchType.DOCUMENT ->        
      this.getSystemMessage(documentChunks, 
        this.documentTokenLimit, searchDto.getSearchString());
    case SearchDto.SearchType.PARAGRAPH -> 
      this.getSystemMessage(mostSimilar.stream().toList(), 
        this.documentTokenLimit, searchDto.getSearchString());
...
};

private Message getSystemMessage(List<Document> similarDocuments, 
  int tokenLimit, String prompt) {
  String documentStr = this.cutStringToTokenLimit(
    similarDocuments.stream().map(entry -> entry.getContent())
      .filter(myStr -> myStr != null && !myStr.isBlank())
      .collect(Collectors.joining("\n")), tokenLimit);
  SystemPromptTemplate systemPromptTemplate = this.activeProfile
    .contains("ollama")	? new SystemPromptTemplate(this.ollamaPrompt)
      : new SystemPromptTemplate(this.systemPrompt);
  Message systemMessage = systemPromptTemplate.createMessage(
    Map.of("documents", documentStr, "prompt", prompt));
  return systemMessage;
}


The storeDocument(...) method now uses the embeddingTokenLimit of the properties file to limit the text chunks used to create the embeddings. The queryDocuments(...) method now uses the documentTokenLimit of the properties file to limit the text chunk provided to the model for the generation.
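The splitStringToTokenLimit(...) and cutStringToTokenLimit(...) helpers are not shown in the excerpt above. A minimal sketch of what they might look like, assuming tokens are approximated by whitespace-separated words (the actual project may count tokens differently):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the token-limit helpers used by storeDocument(...)
// and getSystemMessage(...). Tokens are approximated here by whitespace-
// separated words; the real implementation may use a proper tokenizer.
public class TokenLimitSketch {

  // Split a long text into chunks of at most tokenLimit tokens each.
  public static List<String> splitStringToTokenLimit(String text, int tokenLimit) {
    List<String> chunks = new ArrayList<>();
    String[] tokens = text.split("\\s+");
    for (int start = 0; start < tokens.length; start += tokenLimit) {
      int end = Math.min(start + tokenLimit, tokens.length);
      chunks.add(String.join(" ", Arrays.copyOfRange(tokens, start, end)));
    }
    return chunks;
  }

  // Truncate a text to at most tokenLimit tokens.
  public static String cutStringToTokenLimit(String text, int tokenLimit) {
    String[] tokens = text.split("\\s+");
    int end = Math.min(tokenLimit, tokens.length);
    return String.join(" ", Arrays.copyOfRange(tokens, 0, end));
  }

  public static void main(String[] args) {
    // Chunks of at most 2 tokens for embedding creation.
    System.out.println(splitStringToTokenLimit("one two three four five", 2));
    // A single chunk cut to 3 tokens for the DOCUMENTS prompt section.
    System.out.println(cutStringToTokenLimit("one two three four five", 3));
  }
}
```

With limits like these, the embedding chunks stay within the embeddingTokenLimit and the DOCUMENTS section of the prompt stays within the documentTokenLimit.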

The getSystemMessage(...) method checks the activeProfile property for the ollama profile and creates the SystemPromptTemplate with the prompt that includes the question. The createMessage(...) method creates the AI Message and replaces the documents and prompt placeholders in the prompt string.

Conclusion

Spring AI works very well with Ollama. The model used in the Ollama Docker container was stable-beluga:13b. The only differences in the implementation were the changed dependencies and the missing user prompt support of the Llama models, but that was a small fix.

Spring AI enables very similar implementations for external AI services like OpenAI and local AI services like Ollama-based models. That decouples the Java code from the AI model interfaces very well. 

The performance of the Ollama models without GPU acceleration required a decrease of the document-token-limit from 2000 for OpenAI to 150 for Ollama. The quality of the AI model answers decreased accordingly. To run an Ollama model with parameters that result in better answer quality at acceptable response times, a server with GPU acceleration is required.

For commercial/production use, a model with an appropriate license is required. That is not the case for the beluga models; the falcon:40b model could be used instead.


Published at DZone with permission of Sven Loesekann. See the original article here.

Opinions expressed by DZone contributors are their own.
