Inspiration
Getting started in a new internship is never easy. You're not only dropped into a completely new physical environment and workplace, you also have to get acquainted with an unfamiliar codebase and tech stack. When we first started our internships, we all had to onboard, download specific packages, and receive permissions on GitHub. We found that many interns, ourselves included, spent their first few weeks learning how to download and navigate these packages and dependencies instead of making meaningful contributions to real work. A key problem was that we were digging through Cake documentation written years upon years in the past. Even when we reached out for help, our mentors and supervisors could not remember what to do, having set up their own workspaces so long ago, which made the onboarding process even harder. As we began working on our code, we received frequent suggestions from GitHub Copilot on how to improve the quality of our work. That was when we started wondering why we couldn't use similar technologies for the tedious onboarding processes that cost us, and our companies, valuable time. This is why we developed NavigAIt: an AI-driven coach that makes onboarding seamless, benefiting employees and employers alike!
What it does
NavigAIt gives users live, customized audio feedback and assistance while they navigate new technologies and codebases, specifically the ones an employee faces during onboarding. By continuously capturing audio and screen input, it analyzes the user's thought process, identifies errors and pain points, and directs them to the next step needed to successfully complete a given task! NavigAIt communicates directly with the employee through its TTS/STT dataflows. It's a tool that gives the user hands-on, 24/7 support for questions, concerns, and issues, even when mentors are busy! Employees thus receive the same training and stay on the same page when working together, building a more competent and coordinated team.
How we built it
NavigAIt's frontend is built with Next.js and React, and its backend is built with a RESTful Node.js and Express.js server integrated with Google Cloud Platform (GCP). We developed a RAG-based LLM system using Gemini and Vertex AI that uses company onboarding guidelines and the user's actions on the screen to direct the user.
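At a high level, the RAG step combines retrieved onboarding guidelines with the user's screen and speech before the model is called. Here is a simplified sketch of that assembly; the function names, the keyword-overlap scoring (standing in for Vertex AI vector retrieval), and the prompt wording are all illustrative, not our exact implementation:

```javascript
// Illustrative sketch of RAG prompt assembly. Real retrieval uses
// Vertex AI; this naive keyword-overlap score stands in for
// embedding similarity so the sketch stays self-contained.
function scoreSnippet(snippet, query) {
  const queryWords = new Set(query.toLowerCase().split(/\s+/));
  return snippet.toLowerCase().split(/\s+/)
    .filter((w) => queryWords.has(w)).length;
}

function buildPrompt(guidelines, screenText, userQuery, topK = 2) {
  // Keep only the topK most relevant guideline snippets.
  const relevant = [...guidelines]
    .sort((a, b) => scoreSnippet(b, userQuery) - scoreSnippet(a, userQuery))
    .slice(0, topK);
  return [
    "You are an onboarding coach. Answer in one or two spoken sentences.",
    "Relevant company guidelines:",
    ...relevant.map((g) => `- ${g}`),
    `What the user currently sees: ${screenText}`,
    `User said: ${userQuery}`,
  ].join("\n");
}
```

The assembled string is what gets sent to Gemini, so only the few guideline snippets that matter for the current question take up context space.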
Challenges we ran into
A monumental challenge for our backend logic was the live, uninterrupted data flow from the user's audio input and the display capture. Normally, an implementation would feature some sort of toggle to turn a data stream on and off (to signify when a sentence or phrase is complete). However, we deemed this unsuitable for a product meant to truly make onboarding a pain-free experience; we wanted our AI assistant to be as close as possible to a real supervisor helping you set up your codebase. After struggling for several hours, we used Google's client libraries and APIs to engineer a continuous audio input stream, using periods of silence to mark the boundaries between utterances.
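Conceptually, the silence-based segmentation works like the sketch below. The real version sits on top of Google's streaming speech client; the energy threshold and chunk counts here are placeholder values:

```javascript
// Conceptual sketch of silence-based utterance segmentation.
// Thresholds are placeholder values, not our tuned settings.
const SILENCE_RMS = 0.01;  // energy below this counts as silence
const SILENCE_CHUNKS = 5;  // this many silent chunks end an utterance

function rms(chunk) {
  // Root-mean-square energy of one chunk of audio samples.
  const sumSq = chunk.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSq / chunk.length);
}

class UtteranceSegmenter {
  constructor() {
    this.buffer = [];
    this.silentRun = 0;
  }

  // Feed one audio chunk; returns a finished utterance (array of
  // chunks) once a long enough silence is detected, otherwise null.
  push(chunk) {
    if (rms(chunk) < SILENCE_RMS) {
      this.silentRun += 1;
      if (this.silentRun >= SILENCE_CHUNKS && this.buffer.length > 0) {
        const utterance = this.buffer;
        this.buffer = [];
        this.silentRun = 0;
        return utterance;
      }
    } else {
      this.silentRun = 0;
      this.buffer.push(chunk);
    }
    return null;
  }
}
```

Because segmentation is driven by the audio itself, the user never has to press a button to say "I'm done talking."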
Our audio stream had an additional issue: the system would transcribe and generate text while still outputting responses, creating overlapping messages and a generally buggy product. Fortunately, with a cleverly placed global variable, we ensured that no interfering inputs or outputs could be processed during the model's TTS sequence, greatly improving the experience for the user.
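The idea boils down to a single "speaking" flag that gates the pipeline. A minimal sketch (in the real pipeline the flag is cleared in the TTS completion callback rather than synchronously):

```javascript
// Sketch of the global flag that gates the pipeline during TTS.
// Transcripts arriving mid-playback are dropped, not queued, so the
// assistant never responds to its own voice or talks over itself.
let speaking = false;

function handleTranscript(text, respond) {
  if (speaking) return false;  // ignore input while the model speaks
  speaking = true;
  try {
    respond(text);             // TTS playback happens here
  } finally {
    speaking = false;          // re-open the "mic" afterwards
  }
  return true;
}
```

Dropping (rather than queueing) mid-playback transcripts is what eliminated the overlapping-message bug.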
We also ran into project-refactoring trouble between the frontend and backend. Setting up the Express.js server was tricky because it had to receive requests from both the Gemini model and the React frontend. Eventually, we set up an API module in the frontend that ensured seamless communication between all components of the project.
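The API module is essentially one place that builds every request the React components send. A simplified sketch; the base URL and endpoint paths below are illustrative placeholders, not our actual routes:

```javascript
// Sketch of the frontend API module. The base URL and endpoint
// paths are illustrative placeholders.
const API_BASE = "http://localhost:3001";

function buildRequest(path, body) {
  return {
    url: `${API_BASE}${path}`,
    options: {
      method: body ? "POST" : "GET",
      headers: { "Content-Type": "application/json" },
      ...(body ? { body: JSON.stringify(body) } : {}),
    },
  };
}

// Thin wrappers the React components call instead of hand-rolling
// fetch() options everywhere (names illustrative):
const api = {
  askGemini: (transcript, screen) =>
    buildRequest("/api/assist", { transcript, screen }),
  health: () => buildRequest("/api/health"),
};
```

Centralizing this meant that when a route or header changed on the Express side, only one frontend file had to change with it.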
When setting up the Google Cloud workspace and API access, we quickly ran into GCP permission issues, which wasted a lot of valuable time in a 24-hour hackathon. Larris dug through Google's documentation and found the environment variables needed to authenticate against Google's AI platforms.
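For anyone hitting the same wall: the standard way to authenticate GCP client libraries locally is a service-account key file pointed to by the `GOOGLE_APPLICATION_CREDENTIALS` environment variable. The key path and project ID below are placeholders:

```shell
# Placeholder values; substitute your own key path and project ID.
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
export GOOGLE_CLOUD_PROJECT="your-project-id"
```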
Accomplishments that we're proud of
We're not only proud of setting up the continuous audio stream, but also of tuning and prompting the model for its specific use case. Large multimodal models such as Gemini are tuned for readable written output, with lists, bullet points, and special text styling. Moreover, the model tends to give lengthy, arduous responses that are not at all suitable for live audio communication. By starting simple and carefully tuning the model, we got it to excel at live responses and at assisting the user based on their screen. We also gave the model a functioning memory. Giving it unlimited memory had a severe drawback, however: the sheer quantity of text often outweighed its original objective, and it lost track of its prompt style and format. On the other hand, no memory at all was no good either, since the model had to understand the context and the user's current step in the codebase. We ended up implementing a queue data structure, giving the model ample context for its assistive processing with a well-defined limit to keep it on track and speed up generation.
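The bounded memory described above can be sketched as a FIFO queue capped at a fixed number of conversation turns; the cap of 8 and the field names here are placeholder choices:

```javascript
// Sketch of the bounded conversation memory: a FIFO queue capped at
// a fixed number of turns, so old context is evicted before it can
// drown out the prompt. MAX_TURNS is a placeholder value.
const MAX_TURNS = 8;

class ConversationMemory {
  constructor(maxTurns = MAX_TURNS) {
    this.maxTurns = maxTurns;
    this.turns = [];
  }

  add(role, text) {
    this.turns.push({ role, text });
    if (this.turns.length > this.maxTurns) {
      this.turns.shift();  // evict the oldest turn
    }
  }

  // Context string prepended to each model request.
  toContext() {
    return this.turns.map((t) => `${t.role}: ${t.text}`).join("\n");
  }
}
```

The cap is the whole trick: enough recent turns for the model to know where the user is in the setup, but never so many that the instructions get buried.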
What we learned
A definite overarching theme of our project was LLM processing/augmentation. Fine-tuning a model or providing it with a suitable wrapper/context is much easier said than done. It's not too hard to develop a mediocre model, but the challenge of tuning an ideal model for its use case is something we will think about for years to come!
Last but not least, the importance of collaboration in programming! We ran into organizational and structural issues throughout the hackathon, and we realized how important it is to stay on the same page, keep the git repository updated, and integrate our ideas into something big!
What's next for NavigAIt
- Data-driven feedback to continuously enhance training modules.
- LLM optimization - memory summarization module
- LLM/RAG optimization - direct model audio input
- Further use cases (general coding assistance, IT help, etc.)
Built With
- express.js
- gcp
- gemini
- html
- javascript
- nextjs
- node.js
- react
- vertexai