Inspiration

First responders often find themselves in unforeseen, dangerous situations where they are unable to call for backup or help. We wanted a system that can detect danger from a variety of inputs, build a full understanding of the threat level and context, and contact authorities if the situation escalates. The application could also assess danger for anyone at the scene and react faster than human instinct.

What it does

The model uses two inputs, video and audio, to assess risk by detecting weaponry, dangerous motions of individuals, and distinctive keywords that indicate a threatening situation. Combining these three detection signals, we rate situations with an "Aggregate Risk Score", a numerical reflection of how dangerous a situation is for an officer. If the situation is deemed dangerous, the software contacts authorities by phone, providing key context such as the nature of the threat and any weaponry present.
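As a sketch of how the three signals might be combined into one score (the weights and signal names here are illustrative assumptions, not our exact production values):

```python
# Illustrative sketch of an aggregate risk score combining three
# normalized detection signals. The weights are hypothetical.
def aggregate_risk(weapon_conf: float, motion_conf: float, keyword_conf: float) -> float:
    """Each input is a model confidence in [0, 1]; returns a score in [0, 100]."""
    weights = {"weapon": 0.5, "motion": 0.3, "keyword": 0.2}
    score = (weights["weapon"] * weapon_conf
             + weights["motion"] * motion_conf
             + weights["keyword"] * keyword_conf)
    return round(100 * score, 1)

# A confident weapon detection alone already yields a substantial score.
print(aggregate_risk(weapon_conf=0.9, motion_conf=0.2, keyword_conf=0.0))  # 51.0
```

In practice the weights and the dangerous-score threshold would be calibrated against real footage rather than hand-picked.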

How we built it

We used the Arduino UNO Q connected to a webcam to capture audio and video for our Python-based classification and identification models. By leveraging the UNO Q and parallel processing of the audio and video streams, we achieved near real-time performance. The UNO Q keeps the module portable, communicating over WiFi to send model inputs and receive results.

On the processing end, we run three separate models across two Python threads. One thread handles the video stream: it takes in the webcam feed, runs pose/stance-detection and object-detection models, and outputs the annotated video stream. The other thread continuously transcribes the conversation, using semantic matching to look for distinctive keywords that indicate a dangerous situation. The outputs of all three models are combined into an aggregate score assessing the danger of the current situation. If this score exceeds a threshold, an emergency dispatch call is made using Twilio; the phone call provides a summary of the situation drawn from both the visual and acoustic cues of the scene. Everything is displayed in our frontend, where the live, annotated video stream appears alongside a live feed of current audio and video identifications, with the danger level boldly shown at all times. The frontend was built with React.js, and a REST architecture gives us seamless communication between the frontend and backend.
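A minimal sketch of the two-thread layout described above, with stand-in detector functions (names like `detect_objects` and `find_keywords` are placeholders for our actual models, and the aggregation is simplified):

```python
import queue
import threading

results = queue.Queue()

def detect_objects(frame):
    # Stand-in for the pose/stance and object-detection models.
    return 0.8 if frame == "frame_with_weapon" else 0.1

def find_keywords(utterance):
    # Stand-in for semantic keyword matching on the live transcript.
    danger_words = {"gun", "help", "drop it"}
    return 1.0 if any(w in utterance for w in danger_words) else 0.0

def video_worker(frames):
    for frame in frames:
        results.put(("video", detect_objects(frame)))

def audio_worker(utterances):
    for utt in utterances:
        results.put(("audio", find_keywords(utt)))

t1 = threading.Thread(target=video_worker, args=(["frame_ok", "frame_with_weapon"],))
t2 = threading.Thread(target=audio_worker, args=(["nice day", "drop it now"],))
t1.start(); t2.start(); t1.join(); t2.join()

signals = [results.get() for _ in range(results.qsize())]
risk = max(score for _, score in signals)  # simplified aggregation for the sketch
print(risk)  # 1.0
```

The real system streams frames and audio continuously and folds the per-thread signals into the Aggregate Risk Score; the queue here stands in for that shared channel.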

Challenges we ran into

The biggest challenge we ran into was the limited processing power of our mobile hardware, the Arduino UNO Q. With only 2 GB of RAM and a limited CPU, the Arduino demanded some engineering innovation on our end. Since most of the classification and identification models we used are computation-heavy, we offloaded the work onto a more powerful device. In doing so, we discovered a more realistic architecture for extrapolating the project to the real world: we were able to properly emulate the relationship between on-field equipment and a more powerful server. Another problem our group ran into was weapon classification. We found that lightweight models would often hallucinate weapons and raise the risk level higher than warranted. To fix this, we fine-tuned the parameters of an open-source YOLOv8 model.
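One simple complement to fine-tuning for suppressing hallucinated detections is to require a detection to persist across several consecutive frames before it is allowed to raise the risk level. A sketch of that idea (the window size and hit count are illustrative assumptions, not our tuned values):

```python
from collections import deque

class DetectionDebouncer:
    """Confirm a weapon detection only if it appears in at least
    `min_hits` of the last `window` frames (illustrative thresholds)."""

    def __init__(self, window: int = 5, min_hits: int = 3):
        self.history = deque(maxlen=window)
        self.min_hits = min_hits

    def update(self, detected_this_frame: bool) -> bool:
        self.history.append(detected_this_frame)
        return sum(self.history) >= self.min_hits

deb = DetectionDebouncer()
frames = [True, False, True, False, False, True, True, True]
confirmed = [deb.update(f) for f in frames]
print(confirmed)  # isolated flickers never confirm; a sustained run does
```

This trades a few frames of latency for far fewer false alarms, which matters when a false positive can trigger an emergency call.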

Accomplishments that we're proud of

We are proud of successfully building an end-to-end multi-modal system that combines video-based pose detection, object detection, and semantic audio transcription into a unified Aggregate Risk Score. By fine-tuning an open-source YOLOv8 model, we significantly reduced false positives in weapon detection and improved contextual accuracy. Despite hardware limitations on the Arduino UNO Q, we engineered a hybrid edge-to-server architecture that enabled near real-time processing. We also implemented automated emergency escalation using Twilio to place dispatch calls with contextual summaries when risk thresholds are exceeded. Finally, our React.js dashboard cleanly visualizes the live video, audio transcripts, detections, and risk levels in an intuitive interface.

What we learned

Through this project, we learned that edge AI systems must carefully balance portability with computational constraints. We discovered that model performance varies significantly depending on context, making fine-tuning and threshold calibration essential for safety-critical applications. We also realized that false positives can be more harmful than missed detections in high-stakes environments, reinforcing the importance of multi-modal validation. Beyond machine learning, we gained experience in multi-threaded system design, REST-based communication, and latency optimization. Most importantly, we developed a deeper appreciation for the ethical responsibility involved in deploying AI systems that influence emergency responses.

What's next for Brie-safe

Next, we plan to optimize our models through pruning and quantization to enable true on-device inference. We aim to incorporate temporal modeling techniques to better understand escalation patterns rather than relying on frame-by-frame analysis. Improving dynamic risk calibration using learned thresholds instead of fixed cutoffs is another priority. We also plan to strengthen our cloud infrastructure with encrypted communication and secure authentication mechanisms. Finally, we hope to collaborate with first responders to conduct controlled field testing and refine the system using real-world feedback.
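As a first step toward temporal modeling, even an exponential moving average over per-frame risk scores distinguishes a sustained escalation from a one-frame spike (the smoothing factor below is an illustrative assumption):

```python
def smooth_risk(scores, alpha=0.3):
    """Exponential moving average over per-frame risk scores.
    alpha is illustrative; lower values smooth more aggressively."""
    ema = scores[0]
    smoothed = [ema]
    for s in scores[1:]:
        ema = alpha * s + (1 - alpha) * ema
        smoothed.append(ema)
    return smoothed

# An isolated spike barely moves the smoothed score, while a
# sustained rise pushes it steadily upward.
print(smooth_risk([0.1, 0.9, 0.1, 0.1]))
print(smooth_risk([0.1, 0.5, 0.7, 0.9]))
```

Learned thresholds would then operate on this smoothed signal (or a richer temporal model) rather than on raw per-frame scores.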
