In Q2 2025, I convinced our VP of Product to give my engineering team a full quarter with no new feature work. No new product requirements. No roadmap commitments. Twelve weeks to do nothing but fix things.
It took me four months to get approval, and it took a business case that estimated how much the existing technical debt was costing us in engineering velocity, incident frequency, and onboarding time. I’ll share that business case later in this post. First, I want to tell you what we actually found when we went looking.
The Audit: What We Knew vs. What We Found
We started the quarter with a structured audit. Every engineer on the team spent the first two weeks documenting every instance of debt they encountered in their area of ownership. No fixes yet — just documentation. We built a shared Notion database with a simple schema: description, affected area, estimated blast radius (low/medium/high), and estimated remediation effort.
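For readers who want to reproduce the audit outside Notion, the four-field schema can be sketched as a small record type. This is a minimal sketch, not our actual database: the names DebtItem, Level, and count_by_blast_radius are hypothetical, the effort field is assumed to be free-form text, and the sample entries are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class DebtItem:
    description: str
    affected_area: str
    blast_radius: Level
    remediation_effort: str  # free-form estimate such as "4 days" (format is an assumption)


def count_by_blast_radius(items):
    """Tally audit entries per blast-radius level."""
    return Counter(item.blast_radius for item in items)


# Invented sample entries, loosely modelled on the categories in this post.
items = [
    DebtItem("No complete model of session management", "auth", Level.HIGH, "weeks"),
    DebtItem("Slow seven-table join on every page load", "order history", Level.HIGH, "days"),
    DebtItem("Inconsistent naming conventions", "various", Level.LOW, "unknown"),
]
summary = count_by_blast_radius(items)
```

Keeping blast radius as a fixed enumeration rather than free text is what makes the later prioritisation pass a simple group-and-sort.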
At the end of week two, we had 284 items.
I had expected 80.
Not all 284 were serious. About a third were minor code quality issues — inconsistent naming conventions, functions that had grown too long, test coverage gaps in non-critical paths. These were real but low-priority.
The genuinely alarming items fell into four categories.
Category 1: The Authentication Layer Nobody Owned
Our authentication system had been built in 2019 by a contractor. The contractor left. The engineer who took over the system moved to a different team in 2021. By 2025, nobody on the team had a complete mental model of how session management worked. We had a wiki page that was last updated in 2022 and was known to be partially inaccurate.
During the audit, two engineers independently flagged the auth system as “probably has security issues, but I’m afraid to touch it.” That sentence should never be true of a codebase. An auth system that engineers are afraid to touch is an unmonitored risk with a large blast radius.
We spent three weeks rewriting it from scratch with full documentation, modern JWT practices, and a threat model that was reviewed by an external security consultant. During the rewrite, we found one medium-severity vulnerability (session tokens were not being properly invalidated on logout in certain edge cases involving mobile clients). It had likely existed for three years.
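The general shape of that class of bug is worth illustrating. A signed token stays valid until it expires unless the server also tracks revocation, so logout has to invalidate something on the server side. The sketch below is not our implementation: SessionStore and its methods are hypothetical, the store is in-memory, and the idea that a mobile client holds a second, longer-lived token under the same session is an assumption made for the example.

```python
import secrets
import time


class SessionStore:
    """In-memory stand-in for a server-side session store (hypothetical)."""

    def __init__(self):
        self._tokens = {}      # token id -> (session id, expiry in epoch seconds)
        self._revoked = set()  # session ids that have been logged out

    def issue(self, session_id, ttl_seconds, now=None):
        """Create a token tied to a session and return its id."""
        now = time.time() if now is None else now
        token_id = secrets.token_urlsafe(16)
        self._tokens[token_id] = (session_id, now + ttl_seconds)
        return token_id

    def logout(self, session_id):
        # Revoke the whole session rather than a single token, so every
        # token issued under it stops working at once.
        self._revoked.add(session_id)

    def is_valid(self, token_id, now=None):
        """A token is valid only if it is known, unexpired, and not revoked."""
        now = time.time() if now is None else now
        entry = self._tokens.get(token_id)
        if entry is None:
            return False
        session_id, expires_at = entry
        return now < expires_at and session_id not in self._revoked


store = SessionStore()
access = store.issue("session-1", ttl_seconds=900, now=0)
refresh = store.issue("session-1", ttl_seconds=86_400, now=0)  # assumed mobile refresh token
store.logout("session-1")
```

After the logout call, both tokens fail validation even though neither has expired. Revoking per token instead of per session is one way a longer-lived token can quietly survive a logout.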
Category 2: Seventeen Copies of the Same Logic
Across six years of growth, the same fundamental patterns had been reimplemented independently seventeen times in different parts of the codebase. Pagination logic. Address formatting. Phone number normalisation. Currency display.
None of the seventeen implementations were wrong in isolation. But when we changed our phone number normalisation to support international formats, we had to update seventeen places instead of one. We missed four of them on the first pass, which caused a bug in the supplier onboarding form that was reported by two suppliers before we caught it.
We consolidated these into a @ib/utils package (more on that in the Turborepo migration post) and deleted approximately 3,400 lines of code. Lines of code deleted is my favourite metric in this business.
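To make the consolidation concrete, here is roughly what a single shared normaliser looks like, as opposed to seventeen local ones. It is a simplified sketch in Python rather than the code in our package: normalise_phone is a hypothetical name, the default country code of 91 and the minimum length of 8 digits are assumptions, and only the 15-digit upper bound comes from the E.164 numbering standard. A production version would more likely wrap a dedicated phone-number library.

```python
import re

DEFAULT_COUNTRY_CODE = "91"  # assumption for this sketch
MIN_DIGITS = 8               # assumption: reject obviously truncated input
MAX_DIGITS = 15              # E.164 maximum, country code included


def normalise_phone(raw, default_country_code=DEFAULT_COUNTRY_CODE):
    """Return the number as '+<digits>', or raise ValueError if implausible."""
    text = raw.strip()
    has_plus = text.startswith("+")
    digits = re.sub(r"\D", "", text)
    if text.startswith("00"):
        # "00" is a common international dialling prefix; treat it like "+".
        has_plus, digits = True, digits[2:]
    if not has_plus:
        # National format: drop the trunk "0" and prepend the default country code.
        digits = default_country_code + digits.lstrip("0")
    if not MIN_DIGITS <= len(digits) <= MAX_DIGITS:
        raise ValueError(f"not a plausible phone number: {raw!r}")
    return "+" + digits


local = normalise_phone("098765 43210")
international = normalise_phone("+44 20 7946 0958")
```

With one function like this, supporting a new format is a single change plus a single set of tests, which is the whole point of the consolidation.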
Category 3: The Database Query That Everyone Knew Was Slow
There was one specific PostgreSQL query in the order history service that ran on every page load for logged-in users. It joined seven tables. It had been identified as slow in 2022, added to the backlog, and continuously deprioritised in favour of features.
The query took, on average, 380ms. It ran approximately 200,000 times per day. Added up, our database server was spending roughly 21 hours of execution time per day on this single query.
Fixing it took one engineer four days. The fix was a materialised view and two strategic indexes. The query now runs in 12ms. The database load reduction freed up capacity that let us defer a planned horizontal scaling event by at least six months.
The four-day fix had been deprioritised for three years.
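The load figures follow directly from the latency and call volume quoted above. A quick check, where daily_query_hours is just a helper name for this sketch and the call volume is assumed to be unchanged after the fix:

```python
CALLS_PER_DAY = 200_000
MS_PER_SECOND = 1_000
SECONDS_PER_HOUR = 3_600


def daily_query_hours(avg_latency_ms, calls_per_day=CALLS_PER_DAY):
    """Cumulative execution time per day, in hours, for one query."""
    total_seconds = avg_latency_ms * calls_per_day / MS_PER_SECOND
    return total_seconds / SECONDS_PER_HOUR


before_hours = daily_query_hours(380)  # about 21.1 hours
after_hours = daily_query_hours(12)    # about 0.67 hours, i.e. 40 minutes
```

This is cumulative execution time, which can be spread across parallel connections, so it measures load rather than wall-clock time.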
Category 4: Test Coverage in the Checkout Flow Was 23%
The checkout flow is the most business-critical part of any e-commerce platform. Ours had 23% test coverage. Engineers were afraid to touch it because they couldn’t be confident that their changes hadn’t broken an edge case they hadn’t thought of.
This fear had a concrete cost: the two engineers who primarily owned the checkout code had not added a major new feature to it in eight months. Every feature request that touched checkout got routed around it — added as middleware, implemented as a parallel flow, deferred indefinitely. The checkout code was effectively frozen by the team’s lack of confidence in it.
We wrote 280 new test cases over three weeks. Coverage went from 23% to 84%. In the two months since, the checkout team has shipped four significant features, compared to zero in the prior eight months.
The Business Case
Here is the framework I used to get the quarter approved. I want to share it because “technical debt is bad” is not a business case. “Technical debt is costing us X” is.
Engineering velocity impact: We tracked story points completed per sprint over the prior year and compared them against our team’s historical throughput. We were running at approximately 62% of our historical velocity. My estimate was that debt accounted for 20–25% of the gap, with the rest explained by team growth and a more complex system. At our engineering cost rate, that represented roughly ₹1.8 crore in annual productivity loss.
Incident frequency: We tracked incidents caused by regressions in poorly-understood code over six months. Debt-related incidents accounted for 34% of total incidents. Each incident had an average resolution cost (engineering time, customer impact, post-mortem) of approximately ₹3.5 lakh. Annualised: ₹2.4 crore.
Onboarding time: New engineers were taking an average of 11 weeks to reach full productivity. At our growth rate, we were onboarding 6 engineers per year. Our target was 6 weeks to productivity, which poor documentation and system complexity were preventing. The lost productivity in those extra 5 weeks per engineer, annualised: ₹80 lakh.
Total estimated annual cost of the existing debt: ₹5 crore (₹1.8 crore + ₹2.4 crore + ₹0.8 crore).
Cost of the debt reduction quarter: ₹1.4 crore in engineering time.
The ROI argument was straightforward. Even under conservative estimates, partial debt remediation would pay back the quarter’s investment within 8 months.
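The arithmetic behind that claim can be checked in a few lines. This is a sketch under a simplifying assumption that is mine rather than part of the original business case: recovered savings accrue evenly over the year, starting when the quarter ends. All amounts are in lakh (1 crore = 100 lakh), and the function names are illustrative.

```python
# Annual cost estimates from the three categories above, in lakh.
annual_cost_lakh = {
    "velocity": 180,    # ₹1.8 crore
    "incidents": 240,   # ₹2.4 crore
    "onboarding": 80,   # ₹80 lakh
}
quarter_cost_lakh = 140  # ₹1.4 crore

total_lakh = sum(annual_cost_lakh.values())  # 500 lakh = ₹5 crore


def payback_months(share_recovered):
    """Months to recoup the quarter if this share of the annual cost is eliminated."""
    return quarter_cost_lakh / (total_lakh * share_recovered) * 12


def share_needed(months):
    """Share of the annual cost that must be eliminated to pay back within `months`."""
    return quarter_cost_lakh * 12 / (total_lakh * months)


full_recovery = payback_months(1.0)   # about 3.4 months
eight_month_share = share_needed(8)   # 0.42
```

Under these assumptions, the quarter pays for itself within 8 months if it removes about 42% of the estimated annual cost, and in under 4 months if it removes all of it.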
What Actually Changed
The obvious changes: the auth system is documented and understood, the shared utilities are consolidated, the slow query is fast, the checkout flow has test coverage.
The less obvious changes are more interesting.
Engineers talk about debt differently. Before the quarter, debt was an individual shame — something you mentioned quietly or not at all, because acknowledging it felt like admitting a failure. After spending twelve weeks treating debt as a first-class engineering concern, the culture shifted. Engineers now file debt items proactively. We’ve adopted a 20% rule: 20% of every sprint’s capacity is reserved for debt work. The backlog hasn’t grown back to 284 items.
Onboarding is measurably faster. The average time to first production PR for new engineers dropped from 11 weeks to 4 weeks. The documentation written during the debt quarter is now part of our onboarding curriculum.
The checkout team is shipping again. Four features in two months that couldn’t get past “we’re afraid to touch checkout” for most of last year.
What I Wish I’d Done Differently
Audit first, don’t fix as you find. The team’s instinct was to start fixing immediately. That instinct is wrong. If you fix as you find, you never build a complete picture of what you have. And without a complete picture, you can’t prioritise. We lost a week to premature fixing before I enforced the documentation-first rule.
Involve product earlier in the prioritisation. Some of the debt items we fixed had product implications — fixing them enabled features that product had given up on. If product had been in the prioritisation meeting, they could have helped us sequence the work to maximise business impact, not just technical impact.
Set a “debt budget” for the next two years before the quarter ends. We got stakeholder agreement for the debt quarter but didn’t use that momentum to establish a formal ongoing budget. The 20% sprint reservation came six weeks later and was harder to get approved. Strike while the coalition is assembled.
The Uncomfortable Conclusion
Technical debt is a leadership failure before it’s an engineering failure.
The authentication system nobody understood didn’t become unknown in a day. The seventeen implementations of the same logic didn’t write themselves. These things happen in the space between deadlines, in the sprint retrospective where “we need to document this” gets added to the backlog and never scheduled, in the roadmap meeting where the refactor gets deprioritised one more time.
The engineers on the team knew the debt was there. They knew it was slowing them down. They needed someone in a position of authority to make it a priority — to quantify it, make the business case, and protect the time.
That’s the job.
— Rohit Mishra, Senior Engineering Manager