Inspiration

Every day people work, study, and drive while cognitively fatigued, and most of us notice it too late. By the time stress or fatigue feels obvious, performance, productivity, and safety may already be slipping.

We were inspired by the gap between physical health tracking and cognitive awareness. We track steps, heart rate, and sleep, but we rarely get useful feedback about our real-time cognitive state. We wanted to build an ambient AI system that can surface those signals early enough to matter.


What it does

CognitiveSense is a multimodal AI system that detects fatigue, stress, and distraction using signals such as:

  • Facial expressions
  • Blink rate
  • Posture
  • Eye movement
  • Speech tone

The system analyzes these signals in real time and gives feedback before a user reaches the point where they are clearly overloaded.

We demonstrate CognitiveSense in two environments:

Desktop Monitor

While someone works at a computer, the system analyzes facial signals, posture, and voice tone using the webcam and microphone. If fatigue indicators rise, it recommends actions such as:

  • Taking a break
  • Hydrating
  • Adjusting posture

Smart Cognitive Mirror

The mirror is our proof that this can work as an IoT system, not just a desktop demo. A lightweight Raspberry Pi client streams sensor data to the analysis server over a local network and receives the live cognitive state and feedback for on-device display.

That same pattern can extend to vehicle mirrors or other ambient displays where the device stays simple and the harder AI work happens on the server.
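As a sketch of that thin-client pattern, the exchange between the device and the analysis server can be as small as one JSON payload each way. The field names below are illustrative assumptions, not our actual wire format:

```python
import json

# Hypothetical message shapes for the thin-client pattern: the device sends
# raw signal readings, the server replies with a state and a recommendation.

def encode_sensor_frame(device_id: str, blink_rate: float, posture_score: float) -> bytes:
    """Pack one sensor reading into a JSON payload the server can consume."""
    return json.dumps({
        "device": device_id,
        "signals": {"blink_rate": blink_rate, "posture": posture_score},
    }).encode("utf-8")

def decode_feedback(payload: bytes) -> tuple[str, str]:
    """Unpack the server's (state, recommendation) reply for the display layer."""
    msg = json.loads(payload)
    return msg["state"], msg["recommendation"]
```

Keeping the messages this small is what lets the device stay simple: the mirror only has to serialize readings and render whatever feedback comes back.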

To keep the system private and secure, devices can connect over a trusted local network or through a private VPN such as Tailscale when they need to interoperate across different networks.


How we built it

We built CognitiveSense as a modular system with a clear language and build split:

  • Python powers the multimodal inference pipeline, state tracking, control API, and LLM-driven feedback
  • TypeScript powers the desktop app and frontend surfaces
  • C powers the Raspberry Pi mirror streamer and hardware-facing display layer

We also split the tooling by job:

  • uv for Python environments and checks
  • pnpm for the TypeScript app
  • CMake for the native mirror code
  • Docker for one-command backend deployment

Key components include:

  • MediaPipe Face Mesh for facial landmarks, blink detection, and eye movement analysis
  • MediaPipe Pose for posture detection
  • Audio transformer models for speech feature extraction and tone analysis
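Blink detection on top of Face Mesh landmarks is commonly done with the eye aspect ratio (EAR). A minimal sketch, assuming six eye-contour landmarks (one horizontal corner-to-corner pair, two vertical pairs) and an illustrative threshold:

```python
import math

def _dist(a, b):
    """Euclidean distance between two (x, y) landmark points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); drops toward 0 as the eye closes."""
    return (_dist(p2, p6) + _dist(p3, p5)) / (2.0 * _dist(p1, p4))

def is_blinking(ear, threshold=0.2):
    # The threshold is an illustrative default; real systems calibrate per user.
    return ear < threshold
```

Counting threshold crossings over a time window then yields a blink rate, one of the weak signals fed into the fusion step below.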

These signals are processed by a state detection system that monitors changes in cognitive indicators. When a meaningful change is detected, the system sends the current signals to an LLM, which generates contextual feedback and recommendations for the user.
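A minimal sketch of that gating idea: smooth the raw fatigue score, classify it into bands, and escalate to the LLM only when the band changes. The band boundaries and smoothing factor below are illustrative, not our tuned values:

```python
class StateGate:
    """Gate LLM calls behind meaningful state changes (illustrative thresholds)."""

    def __init__(self, alpha=0.3, bands=(0.4, 0.7)):
        self.alpha = alpha        # EWMA smoothing factor
        self.bands = bands        # boundaries between alert / strained / fatigued
        self.smoothed = None
        self.last_state = None

    def _classify(self, score):
        low, high = self.bands
        if score < low:
            return "alert"
        return "strained" if score < high else "fatigued"

    def update(self, raw_score):
        # Exponentially weighted moving average damps frame-to-frame noise.
        if self.smoothed is None:
            self.smoothed = raw_score
        else:
            self.smoothed = self.alpha * raw_score + (1 - self.alpha) * self.smoothed
        state = self._classify(self.smoothed)
        changed = state != self.last_state
        self.last_state = state
        return state, changed  # escalate to the LLM only when `changed` is True
```

This is also where the cost control happens: noisy frames that stay inside one band never trigger a model call.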

We built two interfaces using the same core pipeline:

  • A desktop monitoring interface
  • A mirror display interface

The important architectural point is that the core intelligence is shared, while the clients stay lightweight.


Challenges we ran into

One of the biggest challenges was combining multiple weak signals into something meaningful without drowning the system in noise or constant updates.

We also had to balance token cost against accuracy, since running multiple vision and audio analysis steps simultaneously gets expensive quickly.

Another challenge was designing one architecture that could support both a desktop app and a networked IoT mirror without duplicating code. We solved this with shared core intelligence and thin environment-specific clients.


Accomplishments that we're proud of

  • Building a working multimodal AI system that combines facial signals, posture, and speech features into a real-time cognitive awareness tool
  • Reusing one core system across desktop and IoT form factors instead of building separate demos
  • Shipping a full-stack architecture across Python, TypeScript, and C with a matching deployment story
  • Keeping the system private and self-hostable through LAN or VPN-based connectivity

What we learned

Through this project we learned how powerful multimodal AI becomes when weak signals are fused together.

We also learned that architecture matters as much as model choice. A strong ambient AI product has to be accurate, deployable, and flexible enough to run across different devices and networks.


What's next for CognitiveSense

Next we want to improve the system with temporal awareness so it can track fatigue trends over longer periods of time and adapt to individual baselines.
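One simple way to adapt to individual baselines is to compare each signal against the user's own rolling history rather than a global cutoff. A sketch with illustrative window size and deviation factor:

```python
from collections import deque

class PersonalBaseline:
    """Rolling per-user baseline for one cognitive signal (e.g. blink rate).
    The window size and deviation factor are illustrative assumptions."""

    def __init__(self, window=100, deviation=1.5):
        self.samples = deque(maxlen=window)  # most recent readings only
        self.deviation = deviation

    def update(self, value):
        self.samples.append(value)

    def is_anomalous(self, value):
        # Compare against the user's own history, not a population-wide norm.
        if len(self.samples) < 10:
            return False  # not enough history to judge yet
        mean = sum(self.samples) / len(self.samples)
        return value > mean * self.deviation
```

A baseline like this would let the same blink rate read as normal for one user and as a fatigue signal for another.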

We also want to explore applications such as:

  • Driver safety systems
  • Workplace wellness tools
  • Student productivity assistants

Our long-term vision is to build secure ambient AI systems that help people understand and protect their cognitive well-being across desktops, mirrors, and other everyday IoT environments.
