Inspiration
Modern software teams rely heavily on platforms like GitHub to collaborate and ship code, but repository analytics are still mostly reactive. Teams typically discover problems only after bugs appear in production. I wanted to explore a different approach: what if repository data could be analyzed with machine learning to predict risk before it causes failures? The idea behind RepoPulse was to turn repositories into living systems that continuously measure their own health. By combining repository metadata, contributor activity, and machine learning models, I aimed to build a platform that helps developers understand not just what happened, but what is likely to happen next.
What it does
RepoPulse is an AI-powered repository analytics dashboard that transforms raw GitHub repository activity into actionable insights. The platform provides:

- **Pull Request Risk Detection:** Machine learning models analyze pull requests and assign each a risk score, helping teams prioritize reviews for changes that are more likely to introduce bugs.
- **Repository Health Scoring:** RepoPulse calculates a comprehensive repository health score from risk levels, code churn, anomaly signals, and development velocity.
- **Predictive File Churn Analysis:** The system predicts which files are likely to become unstable or frequently modified in the future.
- **Contributor Anomaly Detection:** An anomaly detection model identifies unusual contributor behavior, such as sudden spikes in commits or irregular development patterns.
- **Visual Analytics Dashboard:** Interactive dashboards show repository trends, hotspots, contributor networks, and historical performance.
- **Real-Time Updates:** Live WebSocket updates let teams see repository changes and analysis results as they happen.
- **Team Collaboration:** Multi-user workspaces let engineering teams share insights and track repository health collectively.
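As a rough sketch of the contributor anomaly detection idea, an isolation forest can flag days where commit volume breaks sharply from a contributor's usual pattern. The data, contamination rate, and window below are illustrative assumptions, not RepoPulse's production pipeline:

```python
# Sketch: flag anomalous daily commit counts for one contributor.
# The data and parameters here are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest

# Daily commit counts over 20 days; the final day is a sudden spike.
daily_commits = np.array(
    [3, 4, 2, 5, 3, 4, 3, 2, 4, 3, 5, 4, 3, 2, 4, 3, 4, 2, 3, 40]
).reshape(-1, 1)

# contamination is roughly the expected fraction of anomalous days.
detector = IsolationForest(contamination=0.05, random_state=42)
labels = detector.fit_predict(daily_commits)  # -1 marks an anomaly

anomalous_days = np.where(labels == -1)[0].tolist()
```

The same shape of model extends naturally to richer features (files touched per commit, time-of-day patterns) once the data is framed as one row per contributor-day.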
How I built it
RepoPulse was designed with a microservices architecture that separates analytics, machine learning, and frontend interaction.

**Frontend**
- Next.js 14
- React 18
- TailwindCSS
- Recharts for interactive visualizations

**Backend API**
- Node.js with Express
- GitHub API integration via Octokit
- PostgreSQL database for repository analytics storage
- WebSocket server for real-time updates

**Machine Learning Service**
- Python with FastAPI
- scikit-learn models for prediction and anomaly detection
- pandas for repository data processing

The backend collects repository data from GitHub, stores it in PostgreSQL, and sends it to the ML service for analysis. The ML service returns predictions and risk scores, which are displayed through the interactive dashboard.
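To illustrate the ML service's core step, a scikit-learn classifier can map change-size features to a PR risk probability. The toy training data and feature choice below are assumptions standing in for historical PRs labeled by whether they were later linked to bugs:

```python
# Sketch: a classifier that turns change-size features into a PR risk score.
# Training data and features are illustrative, not the production model.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Features per PR: [files_changed, total_lines_changed]; label 1 = later linked to a bug.
X = np.array([[1, 10], [2, 30], [3, 50], [4, 80],
              [20, 900], [15, 600], [30, 1200], [25, 800]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(X, y)

# Risk score = predicted probability of the "buggy" class for a new PR.
risk_score = model.predict_proba(np.array([[22, 1000]]))[0, 1]
```

In the described flow, scores like this come back from the FastAPI service to the Node.js backend, which persists them in PostgreSQL before they reach the dashboard.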
Challenges I ran into
- **Integrating machine learning with real-time analytics:** One of the biggest challenges was designing a system where ML predictions could be generated quickly enough to support interactive dashboards.
- **Designing meaningful repository health metrics:** Combining multiple signals (PR risk, churn, anomalies, and merge velocity) into a single interpretable health score required experimentation and tuning.
- **Handling complex repository data:** GitHub repositories contain large volumes of commit, file, and contributor data; processing and visualizing that information efficiently required careful backend architecture.
- **Coordinating multiple services:** Ensuring smooth communication between the Node.js backend, the Python ML service, and the frontend required robust API design.
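A minimal sketch of the health-score tuning problem, assuming each signal is normalized to [0, 1]. The weights, the signal set, and the clamping are hypothetical choices for illustration; the project arrived at its own formula through experimentation:

```python
# Sketch: combine normalized signals into a 0-100 health score.
# Weights and clamping are hypothetical, not RepoPulse's exact formula.
def health_score(pr_risk: float, churn: float,
                 anomaly_rate: float, merge_velocity: float) -> float:
    """All inputs in [0, 1]; higher merge_velocity is good, the rest are penalties."""
    penalty = 0.4 * pr_risk + 0.3 * churn + 0.2 * anomaly_rate
    bonus = 0.1 * merge_velocity
    return round(max(0.0, min(100.0, 100.0 * (1.0 - penalty + bonus))), 1)

score = health_score(pr_risk=0.2, churn=0.3, anomaly_rate=0.1, merge_velocity=0.8)
```

Keeping the combination linear keeps the score interpretable: each signal's contribution can be shown to the user, which matters more for a dashboard than a marginally better but opaque formula.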
Accomplishments that I am proud of
- Built a full-stack AI analytics platform within the hackathon timeframe.
- Designed and integrated multiple machine learning models for repository intelligence.
- Implemented real-time analytics dashboards with live updates.
- Created a repository health scoring system that combines multiple engineering metrics.
- Built a scalable architecture that separates the frontend, backend, and ML services.
What I learned
This project taught me several important lessons about building intelligent developer tools:

- Machine learning can provide powerful insights when applied to engineering data.
- Predictive analytics can help teams move from reactive debugging to proactive development.
- Designing meaningful metrics requires understanding both data science and developer workflows.
- Building production-ready tools requires balancing performance, usability, and architecture.

I also gained hands-on experience integrating full-stack systems with ML pipelines and real-time event streaming.
What's next for RepoPulse
I see several exciting directions for expanding RepoPulse:

- **Advanced ML models:** Deep learning models for code complexity and bug prediction.
- **Multi-repository intelligence:** Cross-repository analysis for large organizations.
- **IDE integrations:** Integrations with tools like Visual Studio Code so developers can see risk insights directly while coding.
- **Automated code review assistance:** AI-generated suggestions for pull request improvements.
- **DevOps pipeline integrations:** Connecting RepoPulse with CI/CD tools for automated quality gates.

My long-term vision is a predictive engineering intelligence platform that helps development teams understand and improve the health of their software ecosystems.
Built With
- express.js
- fastapi
- javascript
- next.js
- node.js
- octokit
- pandas
- postgresql
- python
- react
- recharts
- scikit-learn
- sql
- tailwindcss
- websockets
