Inspiration
Every day people work, study, and drive while cognitively fatigued, and most of us notice it too late. By the time stress or fatigue feels obvious, performance, productivity, and safety may already be slipping.
We were inspired by the gap between physical health tracking and cognitive awareness. We track steps, heart rate, and sleep, but we rarely get useful feedback about our real-time cognitive state. We wanted to build an ambient AI system that can surface those signals early enough to matter.
What it does
CognitiveSense is a multimodal AI system that detects fatigue, stress, and distraction using signals such as:
- Facial expressions
- Blink rate
- Posture
- Eye movement
- Speech tone
The system analyzes these signals in real time and gives feedback before a user reaches the point where they are clearly overloaded.
We demonstrate CognitiveSense in two environments:
Desktop Monitor
While someone works at a computer, the system analyzes facial signals, posture, and voice tone using the webcam and microphone. If fatigue indicators rise, it recommends actions such as:
- Taking a break
- Hydrating
- Adjusting posture
Smart Cognitive Mirror
The mirror is really our proof that this can work as an IoT system, not just a desktop demo. A lightweight Raspberry Pi client streams sensor data to the analysis server over a local network and receives the live cognitive state and feedback in return.
That same pattern can extend to vehicle mirrors or other ambient displays where the device stays simple and the harder AI work happens on the server.
To keep the system private and secure, devices can connect over a trusted local network, or through a private VPN such as Tailscale when they need to communicate across different networks.
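The streaming link itself can stay very simple. As an illustration (not the project's actual wire format), a length-prefixed JSON framing like the following is enough for the Pi to send sensor frames and receive state back; the message fields are assumptions:

```python
import json
import struct

def encode_message(msg: dict) -> bytes:
    """Frame one message: 4-byte big-endian length prefix + UTF-8 JSON."""
    payload = json.dumps(msg).encode("utf-8")
    return struct.pack(">I", len(payload)) + payload

def decode_message(buf: bytes) -> tuple[dict, bytes]:
    """Decode one message from the front of buf; return (message, remainder)."""
    if len(buf) < 4:
        raise ValueError("incomplete header")
    (length,) = struct.unpack(">I", buf[:4])
    if len(buf) < 4 + length:
        raise ValueError("incomplete payload")
    payload = buf[4:4 + length]
    return json.loads(payload.decode("utf-8")), buf[4 + length:]

# The mirror streams a sensor frame; the server answers with the live state.
frame = {"type": "frame", "blink_rate": 0.31, "posture": "slumped"}  # assumed fields
msg, rest = decode_message(encode_message(frame))
```

Because the framing is symmetric, the same two functions serve both directions of the link, which keeps the C client on the Pi easy to mirror.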
How we built it
We built CognitiveSense as a modular system with a clear language and build split:
- Python powers the multimodal inference pipeline, state tracking, control API, and LLM-driven feedback
- TypeScript powers the desktop app and frontend surfaces
- C powers the Raspberry Pi mirror streamer and hardware-facing display layer
We also split the tooling by job:
- uv for Python environments and checks
- pnpm for the TypeScript app
- CMake for the native mirror code
- Docker for one-command backend deployment
Key components include:
- MediaPipe Face Mesh for facial landmarks, blink detection, and eye movement analysis
- MediaPipe Pose for posture detection
- Transformer-based audio models for feature extraction in speech tone analysis
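For blink detection from facial landmarks, a common approach that Face Mesh landmarks support is the eye aspect ratio (EAR): the eye's height-to-width ratio collapses when the lids close. A minimal sketch with illustrative 2D landmark coordinates and an assumed threshold, not our calibrated values:

```python
import math

def _dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); it drops sharply on eye closure.
    p1/p4 are the eye corners, p2/p3 the upper lid, p5/p6 the lower lid."""
    return (_dist(p2, p6) + _dist(p3, p5)) / (2.0 * _dist(p1, p4))

EAR_CLOSED = 0.2  # assumed threshold; real use needs per-user calibration

# Illustrative landmark positions (not real Face Mesh indices):
open_eye = [(0, 0), (1, 0.5), (2, 0.5), (3, 0), (2, -0.5), (1, -0.5)]
closed_eye = [(0, 0), (1, 0.05), (2, 0.05), (3, 0), (2, -0.05), (1, -0.05)]
```

Counting frames where EAR stays below the threshold yields both blink events and blink duration, which is the raw material for the blink-rate signal above.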
These signals are processed by a state detection system that monitors changes in cognitive indicators. When a meaningful change is detected, the system sends the current signals to an LLM, which generates contextual feedback and recommendations for the user.
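One way to define a "meaningful change" is a hysteresis band, so a score hovering near a single threshold does not fire a stream of LLM calls. A sketch of the gating idea, with assumed threshold values rather than our tuned ones:

```python
class StateChangeGate:
    """Fire only when the fused fatigue score crosses a hysteresis band,
    so small oscillations around one threshold don't trigger repeated
    LLM calls. The enter/exit values here are illustrative assumptions."""

    def __init__(self, enter: float = 0.7, exit: float = 0.5):
        self.enter, self.exit = enter, exit
        self.fatigued = False

    def update(self, score: float) -> bool:
        """Return True when the state flips and feedback should be generated."""
        if not self.fatigued and score >= self.enter:
            self.fatigued = True
            return True
        if self.fatigued and score <= self.exit:
            self.fatigued = False
            return True
        return False

gate = StateChangeGate()
events = [gate.update(s) for s in [0.4, 0.72, 0.65, 0.68, 0.45]]
# flips fire at 0.72 (crossing enter) and 0.45 (crossing exit), nowhere else
```

The gap between the enter and exit thresholds is what absorbs jitter: a score bouncing between 0.65 and 0.72 produces one event, not many.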
We built two interfaces using the same core pipeline:
- A desktop monitoring interface
- A mirror display interface
The important architectural point is that the core intelligence is shared, while the clients stay lightweight.
Challenges we ran into
One of the biggest challenges was combining multiple weak signals into something meaningful without drowning the system in noise or constant updates.
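A simple way to tame that noise is to smooth each indicator with an exponential moving average before taking a weighted combination. The weights and smoothing factor below are illustrative assumptions, not the values we settled on:

```python
class SignalFuser:
    """Smooth each noisy indicator with an exponential moving average,
    then combine the smoothed values into one score in [0, 1].
    Weights and alpha are illustrative assumptions."""

    def __init__(self, weights: dict, alpha: float = 0.2):
        self.weights = weights      # per-signal importance
        self.alpha = alpha          # EMA smoothing factor
        self.smoothed = {}

    def update(self, readings: dict) -> float:
        for name, value in readings.items():
            prev = self.smoothed.get(name, value)  # seed EMA with first reading
            self.smoothed[name] = self.alpha * value + (1 - self.alpha) * prev
        total = sum(self.weights.values())
        return sum(self.weights[n] * self.smoothed.get(n, 0.0)
                   for n in self.weights) / total
```

A one-frame spike in any single signal moves the fused score only by `alpha` times its weight share, which is what keeps momentary glitches from looking like state changes.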
We also had to balance token cost against accuracy, since running multiple vision and audio steps simultaneously gets expensive quickly.
Another challenge was designing one architecture that could support both a desktop app and a networked IoT mirror without duplicating code. We solved this with shared core intelligence and thin environment-specific clients.
Accomplishments that we're proud of
- Building a working multimodal AI system that combines facial signals, posture, and speech features into a real-time cognitive awareness tool
- Reusing one core system across desktop and IoT form factors instead of building separate demos
- Shipping a full-stack architecture across Python, TypeScript, and C with a matching deployment story
- Keeping the system private and self-hostable through LAN or VPN-based connectivity
What we learned
Through this project we learned how powerful multimodal AI becomes when weak signals are fused together.
We also learned that architecture matters as much as model choice. A strong ambient AI product has to be accurate, deployable, and flexible enough to run across different devices and networks.
What's next for CognitiveSense
Next we want to improve the system with temporal awareness so it can track fatigue trends over longer periods of time and adapt to individual baselines.
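One candidate for per-user baselines is a running mean and variance (Welford's algorithm), scoring each new reading as a z-score against that user's own history. This is a sketch of the direction, not an implemented feature:

```python
class PersonalBaseline:
    """Track a running mean/variance of a user's fatigue score with
    Welford's algorithm, so alerts adapt to individual baselines rather
    than a global threshold. A sketch of the planned direction."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x: float) -> float:
        """Fold in one observation; return its z-score against the
        running baseline (which now includes this observation)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        if self.n < 2:
            return 0.0
        std = (self.m2 / (self.n - 1)) ** 0.5
        return 0.0 if std == 0 else (x - self.mean) / std
```

A reading that is ordinary for one user then stops looking alarming for another: only deviations from a person's own history stand out.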
We also want to explore applications such as:
- Driver safety systems
- Workplace wellness tools
- Student productivity assistants
Our long-term vision is to build secure ambient AI systems that help people understand and protect their cognitive well-being across desktops, mirrors, and other everyday IoT environments.