ThreadPilot: Cross-Team Workflow Autopilot

What inspired us

The idea for ThreadPilot was born out of the sheer chaos of modern engineering communication. In hardware-software organizations, engineering teams juggle hundreds of Slack messages daily across mechanical, electrical, software, product, and QA channels. We realized that critical information, like cross-team dependencies and sudden blockers, frequently gets buried in the noise. As a result, team leads and executives were wasting 30 to 60 minutes every single day just manually skimming channels to stay informed. We wanted to eliminate this bottleneck and turn unstructured chat noise into actionable, structured intelligence.

What it does

ThreadPilot is an AI-powered autopilot that prevents important decisions from getting lost in long threads. Instead of forcing everyone to read every message, ThreadPilot generates a 2-minute daily digest that is intelligently personalized by persona, such as Executive, Team Lead, PM, or IC.

Going beyond simple summarization, it actively detects cross-team blockers and dependencies. To keep workflows seamless, it takes that analyzed data, automatically creates Jira tickets, and blocks off Google Calendar time for the relevant team members.

How we built it

We designed a serverless, event-driven architecture heavily utilizing AWS and Python.

  • Compute & Orchestration: An AWS EventBridge cron job triggers our 5-stage AWS Lambda digest pipeline daily at 9 AM.
  • AI Layer: We used Anthropic Claude hosted on Amazon Bedrock for our multi-agent setup, specifically the TeamAnalyzer and DependencyLinker agents, to extract decisions and map out complex cross-team dependencies without needing static training data.
  • State & Storage: Persistent memory, such as blocker states and dependency graphs, is stored in Amazon S3, while Amazon DynamoDB handles run states and persona preferences.
  • Delivery: To respect Slack API rate limits when sending direct messages, we decoupled generation from delivery using an Amazon SQS queue.
  • Data Validation: We used Pydantic for strict data model validation and serialization of the AI outputs.
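As a sketch of the validation layer described above, a strict Pydantic model can reject malformed AI output before it reaches the pipeline. The model name, fields, and helper below are illustrative assumptions (Pydantic v2 API), not ThreadPilot's actual schema:

```python
from typing import Literal, Optional
from pydantic import BaseModel, Field, ValidationError

# Hypothetical schema for one extracted blocker; the field names are our
# assumptions, not the project's real data model.
class ExtractedBlocker(BaseModel):
    summary: str
    blocking_team: str
    blocked_team: str
    severity: Literal["low", "medium", "high"]
    confidence: float = Field(ge=0.0, le=1.0)

def parse_model_output(raw_json: str) -> Optional[ExtractedBlocker]:
    """Strictly validate the LLM's JSON output; drop malformed extractions."""
    try:
        return ExtractedBlocker.model_validate_json(raw_json)
    except ValidationError:
        return None  # discard or retry rather than pass bad data downstream

good = parse_model_output(
    '{"summary": "Firmware API frozen", "blocking_team": "electrical", '
    '"blocked_team": "software", "severity": "high", "confidence": 0.9}'
)
bad = parse_model_output('{"summary": "oops"}')  # missing required fields
```

Failing closed here means a single hallucinated or truncated LLM response cannot corrupt the dependency graph stored in S3.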

Challenges we ran into

Our core challenge was converting highly unstructured, nuanced human conversation into structured intelligence. We quickly realized that standard, rule-based logic simply cannot handle the ambiguity of real team chats. For instance, rule-based systems miss implicit decisions and have high false-positive rates when simply looking for the keyword "blocked".

Moving to a Generative AI approach fixed this, but introduced the need for continuous refinement. To combat this, we built a continuous learning feedback system where users can react to the Slack digests with emojis to indicate if the information is accurate, wrong, or missing context. This system automatically patches our prompts and improves quality over time.
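A minimal sketch of that emoji-feedback loop might look like the following; the reaction names and the prompt-patching rules are illustrative assumptions, not the system's actual logic:

```python
from collections import Counter

# Assumed mapping from Slack reaction names to feedback categories.
REACTION_SIGNALS = {
    "white_check_mark": "accurate",
    "x": "wrong",
    "grey_question": "missing_context",
}

def summarize_feedback(reactions: list[str]) -> Counter:
    """Bucket raw Slack reaction names into feedback categories."""
    return Counter(REACTION_SIGNALS[r] for r in reactions if r in REACTION_SIGNALS)

def patch_prompt(base_prompt: str, feedback: Counter) -> str:
    """Append corrective guidance when negative feedback accumulates."""
    patched = base_prompt
    if feedback["missing_context"] > feedback["accurate"] // 2:
        patched += "\nInclude the surrounding thread context for each decision."
    if feedback["wrong"] > 0:
        patched += "\nOnly extract decisions stated explicitly; do not infer."
    return patched

fb = summarize_feedback(["white_check_mark", "x", "white_check_mark"])
prompt = patch_prompt("Extract decisions and blockers from the thread.", fb)
```

The key design point is that feedback mutates the prompt, not the model, so quality improvements ship instantly without any retraining.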

Accomplishments that we're proud of & What we learned

We successfully built a system that cuts the time needed to stay aware of project status by 95%, from nearly an hour down to under 2 minutes. Furthermore, it catches missed blockers 8x faster than manual tracking.

We are incredibly proud of our feedback loop's mathematical performance during our 14-day simulation. By Day 10, the system accumulated user reactions and automatically adjusted its confidence and prompts. We calculated our peak accuracy (\( A \)) based on positive user reactions (\( R_{acc} \)) versus missing context (\( R_{miss} \)) and wrong extractions (\( R_{wrong} \)):

$$A = \frac{R_{acc}}{R_{acc} + R_{miss} + R_{wrong}}$$

Substituting our Day 10 sample data into the model:

$$A = \frac{14}{14 + 1 + 0} \approx 0.933$$
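The same calculation can be reproduced in a few lines; the helper function is ours for illustration, not part of the pipeline:

```python
def digest_accuracy(r_acc: int, r_miss: int, r_wrong: int) -> float:
    """A = R_acc / (R_acc + R_miss + R_wrong), per the formula above."""
    total = r_acc + r_miss + r_wrong
    return r_acc / total if total else 0.0

# Day 10 sample: 14 accurate, 1 missing-context, 0 wrong reactions.
print(round(digest_accuracy(14, 1, 0), 3))  # 0.933
```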

This feedback loop yielded a validated peak accuracy of 93%. Through this project, we learned a massive amount about prompt engineering, zero-shot generalization, and how to architect a scalable, near-zero operational cost AI pipeline using serverless AWS tools.
