How We Cut API Latency by 68% Using NestJS Caching Interceptors and Redis

A step-by-step account of profiling, layering cache strategies, and the three mistakes we made before we got it right

Published in IndustryBuying Engineering · 9 min read · Mar 8, 2026

Last October, our on-call engineer got paged at 3am because API response times had spiked above 2 seconds on the product detail endpoint. The root cause turned out to be a database query that had regressed after a schema migration, but what bothered me more than the incident itself was what we found when we pulled up the flame graphs: we had essentially no caching between our NestJS API and PostgreSQL. Every request was a round-trip to the database.

For 90% of our endpoints, that was fine — the queries were fast and traffic was modest. But the product detail endpoint serves 4 million requests a day. At that volume, even a 200ms query becomes a significant cost, and any query regression becomes an instant incident.

We spent the next six weeks building a proper caching architecture. P95 latency on that endpoint went from 340ms to 109ms. Database query load during peak hours dropped by 58%. This is what we built and how.

First: Profile, Don’t Guess

The worst caching mistake is caching the wrong things. Before touching a line of code, I asked the team to instrument every NestJS controller with response time tracking and build a heatmap of where time was actually being spent.

We used a custom NestJS interceptor to record timing per endpoint:

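import {
  CallHandler,
  ExecutionContext,
  Injectable,
  NestInterceptor,
} from '@nestjs/common';
import { Observable } from 'rxjs';
import { finalize } from 'rxjs/operators';
import { MetricsService } from './metrics.service'; // path is illustrative

@Injectable()
export class TimingInterceptor implements NestInterceptor {
  constructor(private readonly metrics: MetricsService) {}

  intercept(context: ExecutionContext, next: CallHandler): Observable<unknown> {
    const start = Date.now();
    // Include the controller name so handlers that share a method name don't collide.
    const endpoint = `${context.getClass().name}.${context.getHandler().name}`;

    return next.handle().pipe(
      // finalize runs once on completion or error, so failed requests are timed too.
      finalize(() => {
        this.metrics.recordLatency(endpoint, Date.now() - start);
      }),
    );
  }
}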

After two days of data collection, the picture was clear:

GET /products/:id — P95 340ms, called 46K times/hour at peak
GET /products/:id/related — P95 580ms, called 18K times/hour at peak
GET /categories/tree — P95 210ms, called 90K times/hour at peak (the category navigation tree, requested on every page load)
GET /suppliers/:id — P95 120ms, called 12K times/hour

Those top three were our targets. Everything else could wait.
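The P95 figures above are derived from the latencies the interceptor records. How MetricsService aggregates them isn’t shown here; as a minimal sketch, assuming it keeps a window of raw samples per endpoint, a nearest-rank percentile could be computed like this (the function name and sample values are hypothetical):

```typescript
// Nearest-rank percentile over a window of recorded latencies (milliseconds).
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error('no samples recorded');
  if (p <= 0 || p > 100) throw new Error('p must be in (0, 100]');
  // Copy before sorting so the caller's buffer is left untouched.
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Hypothetical samples for one endpoint: one slow outlier among fast requests.
const latencySamples = [120, 95, 340, 110, 105, 98, 101, 130, 99, 102];
const p95 = percentile(latencySamples, 95); // 340
const p50 = percentile(latencySamples, 50); // 102
```

The gap between the median and P95 in this made-up sample is the reason to rank endpoints by tail latency rather than by average.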

Layer 1: In-Memory Cache for the Category Tree

The category tree never changes between deployments. It has ~2,000 nodes and is loaded on every page view. Caching it in memory was a no-brainer.

NestJS ships a cache module (@nestjs/cache-manager) built on cache-manager, but for a single value like this a plain class field is enough. For data that’s truly static between deploys, in-memory caching has lower latency than Redis because there’s no network hop:

// category.service.ts
import { Injectable } from '@nestjs/common';

@Injectable()
export class CategoryService {
  // Held per process: each app instance builds and keeps its own copy.
  private categoryTreeCache: CategoryNode[] | null = null;
  private cacheBuiltAt: number | null = null;
  private readonly CACHE_TTL = 60 * 60 * 1000; // 1 hour, in milliseconds

  async getCategoryTree(): Promise<CategoryNode[]> {
    const now = Date.now();

    if (
      this.categoryTreeCache !== null &&
      this.cacheBuiltAt !== null &&
      now - this.cacheBuiltAt < this.CACHE_TTL
    ) {
      return this.categoryTreeCache;
    }

    // buildCategoryTree() is not shown here.
    const tree = await this.buildCategoryTree();
    this.categoryTreeCache = tree;
    this.cacheBuiltAt = Date.now(); // stamp after the build finishes
    return tree;
  }
}

This is deliberately simple. No Redis, no cache invalidation complexity — when the category tree changes, a deployment resets the in-memory cache naturally. The GET /categories/tree endpoint went from 210ms P95 to 3ms P95.

Lesson 1: Don’t reach for Redis when process memory will do. Redis introduces a network hop (~1ms in a co-located setup, up to 15ms cross-AZ). For read-heavy, write-rare data that lives comfortably in memory, local caching wins on latency.
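One gap in a hand-rolled in-memory cache is that concurrent requests arriving while the cache is cold or expired can each trigger their own rebuild. A minimal sketch of one way to close that gap, assuming the rebuild is an async loader function (the SingleFlightCache class is hypothetical, not what we shipped), is to share a single in-flight promise:

```typescript
// In-memory TTL cache that lets only one rebuild run at a time.
class SingleFlightCache<T> {
  private value: T | undefined;
  private hasValue = false;
  private builtAt = 0;
  private inFlight: Promise<T> | null = null;

  constructor(
    private readonly loader: () => Promise<T>,
    private readonly ttlMs: number,
    private readonly now: () => number = () => Date.now(),
  ) {}

  get(): Promise<T> {
    if (this.hasValue && this.now() - this.builtAt < this.ttlMs) {
      return Promise.resolve(this.value as T);
    }
    if (this.inFlight === null) {
      // The async wrapper turns a synchronous throw from the loader into a rejection.
      const pending = (async () => this.loader())().then((fresh) => {
        this.value = fresh;
        this.hasValue = true;
        this.builtAt = this.now();
        return fresh;
      });
      this.inFlight = pending;
      // Clear the slot on success or failure so a later call can retry.
      const clear = () => {
        if (this.inFlight === pending) this.inFlight = null;
      };
      pending.then(clear, clear);
    }
    // Every caller that arrives during the rebuild shares the same promise.
    return this.inFlight;
  }
}
```

Wrapping the tree build as new SingleFlightCache(() => buildTree(), 60 * 60 * 1000) would keep the one-hour TTL while ensuring a burst of page loads triggers at most one rebuild per process.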

Layer 2: Redis for Product Data

Product data is different from category data. It changes frequently (price updates, stock changes), there are millions of records, and we run multiple app server instances — so we need a shared cache.

We built a Redis-backed caching layer as a @Cacheable method decorator. It assumes the decorated class exposes a cacheService property:

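// cache.decorator.ts
import { CacheService } from './cache.service'; // path is illustrative

// Method decorator: caches the resolved value under `<prefix>:<JSON args>` for `ttl` seconds.
// The decorated class must expose a `cacheService` property.
export function Cacheable(ttl: number, keyPrefix?: string) {
  return function (
    target: any,
    propertyKey: string,
    descriptor: PropertyDescriptor,
  ) {
    const originalMethod = descriptor.value;

    descriptor.value = async function (
      this: { cacheService: CacheService },
      ...args: any[]
    ) {
      const cacheKey = `${keyPrefix ?? propertyKey}:${JSON.stringify(args)}`;

      const cached = await this.cacheService.get(cacheKey);
      // Compare against null/undefined so falsy values (0, '', false) still count as hits.
      if (cached !== undefined && cached !== null) return cached;

      const result = await originalMethod.apply(this, args);
      await this.cacheService.set(cacheKey, result, ttl);
      return result;
    };

    return descriptor;
  };
}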
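The decorator derives keys with JSON.stringify(args), which works for a single string id but is sensitive to property order if a method ever takes an object argument. As a sketch of a more defensive key builder, assuming arguments are plain JSON-like values (stableStringify, buildCacheKey, and the product:search prefix are hypothetical names):

```typescript
// Serialize with object keys sorted, so logically equal arguments
// always produce the same string regardless of property order.
function stableStringify(value: unknown): string {
  if (value === undefined) return 'null';
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
      .join(',');
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

function buildCacheKey(prefix: string, args: unknown[]): string {
  return `${prefix}:${stableStringify(args)}`;
}

// A single string id yields the same key shape as JSON.stringify(args).
const detailKey = buildCacheKey('product:detail', ['abc123']);
// detailKey === 'product:detail:["abc123"]'

// Property order no longer splits the cache into two entries.
const searchKeyA = buildCacheKey('product:search', [{ q: 'drill', page: 1 }]);
const searchKeyB = buildCacheKey('product:search', [{ page: 1, q: 'drill' }]);
// searchKeyA === searchKeyB
```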

Usage in the service layer:

// product.service.ts
@Cacheable(300, 'product:detail') // 5-minute TTL
async getProductById(id: string): Promise<Product> {
  return this.productRepository.findOneOrFail({ where: { id } });
}

@Cacheable(120, 'product:related') // 2-minute TTL
async getRelatedProducts(id: string): Promise<Product[]> {
  return this.productRepository.findRelated(id);
}

The TTL values were chosen based on update-frequency data from our catalog team: a typical product’s details update 2–3 times per day, and related products change even less often. For those products, a 5-minute stale window is imperceptible to users and dramatically reduces database load.

Layer 3: Cache Invalidation on Writes

The classic joke is that cache invalidation is one of the two hard problems in computer science. We kept ours simple by being opinionated about where invalidation happens.

Every write to the product catalog goes through a ProductWriteService. That service is the only place that can mutate product data (enforced by visibility rules in our NestJS module structure), so it’s also the only place we need to think about cache invalidation:

// product-write.service.ts
import { Injectable } from '@nestjs/common';

@Injectable()
export class ProductWriteService {
  // Constructor injection of productRepository and cacheService omitted for brevity.

  async updateProduct(
    id: string,
    dto: UpdateProductDto,
  ): Promise<Product> {
    const updated = await this.productRepository.save({ id, ...dto });

    // Build the key suffix the same way @Cacheable does, so the two stay in sync.
    const argsKey = JSON.stringify([id]);

    // Invalidate all cache keys that reference this product
    await Promise.all([
      this.cacheService.del(`product:detail:${argsKey}`),
      this.cacheService.del(`product:related:${argsKey}`),
      // Invalidate category listing caches that include this product
      this.invalidateCategoryListings(updated.categoryId),
    ]);

    return updated;
  }
}

The invalidateCategoryListings method uses Redis’s SCAN to find and delete all keys matching category:listing:${categoryId}:*. This is slower than a direct DEL, but category listings are invalidated infrequently (when products change categories or go in/out of stock), so the latency cost is acceptable.
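The body of invalidateCategoryListings isn’t shown, so here is a minimal sketch of a SCAN-based delete under stated assumptions. ScanCapableClient is a hypothetical thin wrapper around the Redis client, not a real library type; with ioredis, for example, the underlying call would be redis.scan(cursor, 'MATCH', pattern, 'COUNT', count).

```typescript
// Hypothetical slice of a Redis client wrapper. scan() mirrors the SCAN command:
// it takes a cursor and returns [nextCursor, keysInThisBatch].
interface ScanCapableClient {
  scan(cursor: string, pattern: string, count: number): Promise<[string, string[]]>;
  del(keys: string[]): Promise<number>;
}

// Walk the keyspace in batches and delete every key matching the pattern.
// Unlike KEYS, SCAN never blocks the server for the whole keyspace at once.
async function deleteByPattern(
  client: ScanCapableClient,
  pattern: string,
  count = 100,
): Promise<number> {
  let cursor = '0'; // "0" starts a new iteration
  let deleted = 0;
  do {
    const [nextCursor, keys] = await client.scan(cursor, pattern, count);
    // SCAN may return an empty batch before the iteration is finished.
    if (keys.length > 0) {
      deleted += await client.del(keys);
    }
    cursor = nextCursor;
  } while (cursor !== '0'); // a returned cursor of "0" means the scan is complete
  return deleted;
}
```

A call such as deleteByPattern(client, `category:listing:${categoryId}:*`) would then cover every listing page for one category. SCAN can return the same key more than once; that is harmless here because deleting an already-deleted key simply counts as zero.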

The Three Mistakes We Made

Mistake 1: We cached at the controller layer, not the service layer.

Our first implementation put the cache interceptor on the NestJS controller, so the cache key included the full request URL, query parameters and all. A request for /products/abc123?utm_source=email created a separate cache entry from /products/abc123?utm_source=organic: the same product, two cache entries, double the memory, half the hit rate.

Moving the cache to the service layer means the cache key is purely about the data identifier, not the HTTP request context. Hit rate went from 41% to 78% immediately.
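To make the fragmentation concrete, here is a small sketch of how URL-derived keys diverge and what normalizing them would involve. This is an illustration only: we moved caching to the service layer rather than normalizing URLs, and the stripTrackingParams helper and its utm_ prefix rule are hypothetical.

```typescript
// Drop tracking parameters and sort the rest, so equivalent URLs map to one key.
function stripTrackingParams(url: string): string {
  const [path, query = ''] = url.split('?', 2);
  const kept = query
    .split('&')
    .filter((pair) => pair !== '' && !pair.startsWith('utm_'))
    .sort();
  return kept.length > 0 ? `${path}?${kept.join('&')}` : path;
}

const fromEmail = '/products/abc123?utm_source=email';
const fromOrganic = '/products/abc123?utm_source=organic';

// Used as-is, the two URLs are different cache keys for the same product.
const rawKeysDiffer = fromEmail !== fromOrganic; // true

// After normalization both collapse to '/products/abc123'.
const normalizedEmail = stripTrackingParams(fromEmail);
const normalizedOrganic = stripTrackingParams(fromOrganic);
```

Keying on the data identifier at the service layer sidesteps this bookkeeping entirely, which is why we preferred it over maintaining a list of parameters to ignore.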

Mistake 2: We set the same TTL for everything.

Our first pass used a single 300-second TTL across all endpoints. Products that update 20 times a day were cached for 5 minutes and served stale data to users mid-purchase. Products that barely change were evicted from cache unnecessarily.

We spent time analysing the update frequency distribution across our catalog and set TTLs accordingly: 30 seconds for high-velocity SKUs, 5 minutes for standard products, 1 hour for effectively-static catalog items.
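As a sketch of how the three tiers could be selected in code: the TTL values below are the ones we settled on, but the update-frequency thresholds (10 per day and 1 per day) are hypothetical placeholders, as is the function name.

```typescript
// TTLs in seconds for each tier.
const TTL_HIGH_VELOCITY = 30; // 30 seconds
const TTL_STANDARD = 5 * 60; // 5 minutes
const TTL_STATIC = 60 * 60; // 1 hour

// Pick a TTL from how often a product's data changes.
// Thresholds are illustrative assumptions, not measured cut-offs.
function ttlSecondsFor(updatesPerDay: number): number {
  if (updatesPerDay >= 10) return TTL_HIGH_VELOCITY;
  if (updatesPerDay >= 1) return TTL_STANDARD;
  return TTL_STATIC;
}

const fastMover = ttlSecondsFor(20); // 30
const typicalProduct = ttlSecondsFor(3); // 300
const rarelyEdited = ttlSecondsFor(0.1); // 3600
```

Because the tier depends on the product rather than the method, a per-item TTL like this would need to be passed at cache-write time instead of being fixed in the decorator arguments.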

Mistake 3: We forgot about the cache warming problem.

After deploying the Redis cache, every restart caused a thundering herd: the cache was empty, and every request hit the database simultaneously. We saw a 40-second spike to 800ms P99 after each deployment.

The fix was a cache warmer that runs as a NestJS startup hook, pre-loading the top 10,000 most-requested product IDs (sourced from our analytics pipeline) before the service becomes available:

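// cache-warmer.service.ts
import { Injectable, Logger, OnApplicationBootstrap } from '@nestjs/common';
import pMap from 'p-map';
import { AnalyticsService } from './analytics.service'; // paths are illustrative
import { ProductService } from './product.service';

@Injectable()
export class CacheWarmerService implements OnApplicationBootstrap {
  private readonly logger = new Logger(CacheWarmerService.name);

  constructor(
    private readonly analyticsService: AnalyticsService,
    private readonly productService: ProductService,
  ) {}

  async onApplicationBootstrap() {
    const topProducts = await this.analyticsService.getTopProductIds(10000);

    await pMap(topProducts, async (id) => {
      // A product that no longer exists should not abort startup.
      await this.productService.getProductById(id).catch(() => undefined);
    }, { concurrency: 50 });

    this.logger.log(`Cache warming attempted for ${topProducts.length} products`);
  }
}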

The 40-second post-deploy spike is gone. P99 is stable within 2 seconds of deployment.
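The concurrency cap matters as much as the warm-up itself: without it, the warmer would create its own thundering herd against the database. As a dependency-free sketch of the same bounded-concurrency idea (warmWithLimit is a hypothetical helper, not the p-map implementation), a fixed pool of runners can pull ids from a shared queue:

```typescript
// Run `worker` over `items` with at most `concurrency` calls in flight.
// Failures are counted rather than thrown, so one bad id cannot stop the warm-up.
async function warmWithLimit<T>(
  items: T[],
  worker: (item: T) => Promise<void>,
  concurrency: number,
): Promise<{ warmed: number; failed: number }> {
  let nextIndex = 0;
  let warmed = 0;
  let failed = 0;

  // Each runner takes the next unclaimed item until the list is exhausted.
  const runner = async (): Promise<void> => {
    while (nextIndex < items.length) {
      const item = items[nextIndex++];
      try {
        await worker(item);
        warmed++;
      } catch {
        failed++;
      }
    }
  };

  const poolSize = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: poolSize }, runner));
  return { warmed, failed };
}
```

With 10,000 ids and a limit of 50, the database sees at most 50 warm-up queries at a time regardless of how quickly individual queries return.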

Results

Metric	Before	After	Cache hit rate
GET /products/:id (P95)	340ms	109ms	82%
GET /products/:id/related (P95)	580ms	94ms	79%
GET /categories/tree (P95)	210ms	3ms	~100%
Database queries/hour (peak)	186,000	79,000	n/a

The overall API latency improvement (averaging across all endpoints) was 68%. Database load reduction was 58% during peak hours. Both numbers exceeded our target.
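As a quick arithmetic check on the table, each reduction is (before - after) / before. The sketch below covers only the rows shown in the table, not the all-endpoint average; the helper name is ours for illustration.

```typescript
// Percentage reduction from `before` to `after`, rounded to one decimal place.
function percentReduction(before: number, after: number): number {
  if (before <= 0) throw new Error('before must be positive');
  return Math.round(((before - after) / before) * 1000) / 10;
}

const productDetailP95 = percentReduction(340, 109); // 67.9
const relatedProductsP95 = percentReduction(580, 94); // 83.8
const categoryTreeP95 = percentReduction(210, 3); // 98.6
const peakDbQueries = percentReduction(186000, 79000); // 57.5
```

The product detail row works out to 67.9%, and the database row to 57.5%, which rounds to the 58% quoted for peak-hour load.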

What’s Next

We’re currently experimenting with HTTP-level caching using stale-while-revalidate headers on the product API, which would let our CDN serve a stale response immediately while it revalidates in the background, hiding most revalidation latency from users. Early results are promising; that’ll be its own post.
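For readers unfamiliar with the directive, stale-while-revalidate is a Cache-Control extension (RFC 5861) that sits alongside max-age. A minimal sketch of building such a header, with hypothetical durations that are not our production values:

```typescript
interface SwrPolicy {
  maxAgeSeconds: number; // how long a response counts as fresh
  staleWhileRevalidateSeconds: number; // extra window where stale is served during refresh
}

// Build a Cache-Control value for a shared cache such as a CDN.
function cacheControlHeader(policy: SwrPolicy): string {
  const { maxAgeSeconds, staleWhileRevalidateSeconds } = policy;
  for (const seconds of [maxAgeSeconds, staleWhileRevalidateSeconds]) {
    if (!Number.isInteger(seconds) || seconds < 0) {
      throw new Error('durations must be non-negative integers');
    }
  }
  return `public, max-age=${maxAgeSeconds}, stale-while-revalidate=${staleWhileRevalidateSeconds}`;
}

// Hypothetical policy: fresh for 1 minute, then stale-but-served for 5 more.
const productCacheControl = cacheControlHeader({
  maxAgeSeconds: 60,
  staleWhileRevalidateSeconds: 300,
});
// productCacheControl === 'public, max-age=60, stale-while-revalidate=300'
```

Whether a given CDN honors the directive varies by provider, so this is worth confirming before relying on it.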

If you’re running NestJS at scale and haven’t audited your caching strategy recently, start with the timing interceptor. Measure first. The answer to “what should I cache?” is always in the data.

— Rohit Mishra
