Scaling Next.js to 10 Million Product Listings Without Losing Your Mind

How we rearchitected IndustryBuying's storefront to handle a catalog that doubled every eight months — and the ISR, edge caching, and search lessons that made it possible

Published in IndustryBuying Engineering · 11 min read
Our staging environment load test at 2AM. The graph going up is good. Usually.

In 2023, IndustryBuying had roughly 2 million product listings. By early 2026, that number crossed 10 million. Our Next.js storefront, which had been built to handle a reasonable catalog with a reasonable traffic load, started showing its seams around the 4 million mark.

Builds that previously finished in 18 minutes took 4 hours. Product pages returned stale data from the CDN for up to 24 hours after inventory updates. Our Core Web Vitals, which we had worked hard to get into the green, drifted back into the red. Search felt like asking a librarian to find a book in a library where half the shelves hadn’t been catalogued yet.

This is the story of how we fixed it — not all at once, but methodically, over six months. And what we’d do differently if we started today.


The Fundamental Tension: Static Speed vs. Dynamic Reality

B2B e-commerce has a different data freshness requirement than B2C. When a buyer is evaluating industrial safety gloves for a procurement order, they don’t just want to see the product — they need the price (which changes with MOQ tiers), the stock level (which can swing dramatically intraday), and the supplier rating (which updates after each order fulfilment).

Our original architecture made a reasonable assumption: generate pages statically at build time, invalidate them when products update. That assumption worked at 2 million products. At 10 million, with a catalog that receives 40,000 updates per day, it collapsed.

The build itself became the first casualty.

// The build log that started the conversation
info - Generating static pages (0/10,247,836)
info - Generating static pages (2,561,959/10,247,836)
// ... 3h 42m later ...
error - Build exceeded memory limit

We were trying to SSG a catalog the size of a small country’s product database. At some point, you have to accept that not every page can be born at build time.


Decision 1: Stratified Rendering Strategy

The first thing we did was stop thinking about rendering as a binary choice and start thinking about it as a spectrum calibrated to data volatility.

We ended up with four tiers:

Tier 1: Full Static (SSG)
Pages where the content is truly static: brand pages, category landing pages, editorial content. These build at deploy time and live on the CDN indefinitely. No revalidation needed. ~2,000 pages.

Tier 2: ISR with long TTL (24h revalidation)
Sub-category pages. These change when we add new products to a category or run a promotion, but not more than once a day. revalidate: 86400. ~50,000 pages.

Tier 3: ISR with short TTL (5-minute revalidation)
Individual product pages for slow-moving catalog items. The price and stock might update a few times a day. revalidate: 300. ~8 million pages.

Tier 4: On-demand ISR + client-side hydration
High-velocity SKUs: fast-moving industrial consumables, items under active promotion, anything with real-time stock. We render the page shell via ISR and hydrate the price and stock client-side from a lightweight edge API. Users see the page instantly; the live data fills in within 200ms.

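// pages/products/[slug].tsx
import type { GetStaticPaths, GetStaticProps } from 'next';
// getProductBySlug is the data-access helper; import it from wherever it lives in your codebase.

// Assumption: nothing is pre-generated for this route, and fallback 'blocking'
// renders each product page on its first request.
export const getStaticPaths: GetStaticPaths = async () => ({
  paths: [],
  fallback: 'blocking',
});

export const getStaticProps: GetStaticProps = async ({ params }) => {
  const product = await getProductBySlug(params?.slug as string);
  if (!product) {
    return { notFound: true }; // unknown slug: serve the 404 page
  }

  // Tier routing based on product velocity metadata
  const revalidate = product.isHighVelocity
    ? false          // no timer; live price and stock are handled client-side
    : product.updateFrequency === 'daily'
    ? 86400
    : 300;

  return {
    props: {
      product,
      isHighVelocity: product.isHighVelocity,
    },
    revalidate,
  };
};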

This single architectural decision cut our full rebuild requirement from 10 million pages to only the Tier 1 and Tier 2 pages we actually pre-generate at build time, roughly 52,000 by the counts above. Build time went from 4 hours to 6 minutes.
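For Tier 4, the client-side fill-in can be sketched roughly as follows. This is a minimal illustration, not our production component: the `StaticProduct` and `LiveSnapshot` shapes, the `LiveFetcher` type, and the function names are all hypothetical, and the edge API call is injected so the sketch stays independent of any HTTP client.

```typescript
// Hypothetical shapes: the real product and edge API payloads are not shown in this post.
interface StaticProduct {
  slug: string;
  price: number; // value rendered into the ISR page shell
  stock: number;
}

interface LiveSnapshot {
  price: number;
  stock: number;
}

interface DisplayProduct extends StaticProduct {
  isLive: boolean; // false means the cached values are still being shown
}

// Injected so the sketch does not depend on a particular HTTP client.
type LiveFetcher = (slug: string) => Promise<LiveSnapshot>;

function isSane(live: LiveSnapshot): boolean {
  return (
    Number.isFinite(live.price) &&
    live.price >= 0 &&
    Number.isInteger(live.stock) &&
    live.stock >= 0
  );
}

// Pure merge step: live values win when present and sane,
// otherwise the values from the page shell are kept.
function mergeLive(product: StaticProduct, live: LiveSnapshot | null): DisplayProduct {
  if (live === null || !isSane(live)) {
    return { ...product, isLive: false };
  }
  return { ...product, price: live.price, stock: live.stock, isLive: true };
}

// Called after the page has rendered; never throws, so a failing
// edge API degrades to the static values instead of breaking the page.
async function hydrateProduct(
  product: StaticProduct,
  fetchLive: LiveFetcher,
): Promise<DisplayProduct> {
  try {
    return mergeLive(product, await fetchLive(product.slug));
  } catch {
    return mergeLive(product, null);
  }
}
```

Keeping the merge step pure makes the fallback behaviour easy to test, and the `isLive` flag lets the UI mark a price as unconfirmed while the live call is still pending or has failed.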


Decision 2: On-Demand ISR for Catalog Events

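ISR with a TTL is blunt. If a product goes out of stock at 9:03am, a page on a 5-minute TTL can keep serving “In Stock” until 9:08am, or later if nobody requests it in the meantime, because regeneration only starts on the first request after the window expires (and that request still gets the stale page). For most products, that’s acceptable. For a product that a buyer is actively evaluating and comparing, it’s a trust problem.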

We implemented on-demand ISR using Next.js’s res.revalidate() API, triggered from our NestJS backend whenever specific events occur:

// NestJS event handler (a method on an injectable service)
// Requires: import { OnEvent } from '@nestjs/event-emitter';
@OnEvent('product.stock.updated')
async handleStockUpdate(event: StockUpdatedEvent) {
  const path = `/products/${event.productSlug}`;

  if (event.previousStock > 0 && event.newStock === 0) {
    // Product just went out of stock: revalidate immediately
    await this.revalidationQueue.add({ path, priority: 'high' });
  } else if (Math.abs(event.priceDelta) > 0.05) {
    // Price moved more than 5% in either direction: revalidate.
    // The else-if avoids queueing the same path twice for one event.
    await this.revalidationQueue.add({ path, priority: 'medium' });
  }
}
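On the Next.js side, the receiving endpoint could look something like the sketch below. `res.revalidate(path)` is the Pages Router API named above; everything else is an assumption made for illustration: the `x-revalidate-secret` header, the `paths` body field, the batch cap, and the slug pattern. The request and response types are structural stand-ins so the sketch type-checks without the `next` package.

```typescript
// Structural stand-ins for NextApiRequest / NextApiResponse so the sketch
// type-checks on its own; a real API route would import those types from 'next'.
interface RevalidateRequest {
  method?: string;
  headers: Record<string, string | undefined>;
  body: { paths?: unknown };
}

interface RevalidateResponse {
  revalidate(path: string): Promise<void>; // Pages Router on-demand ISR
  status(code: number): RevalidateResponse;
  json(body: unknown): void;
}

const MAX_PATHS_PER_CALL = 50; // assumed cap on batch size

// Only product pages may be revalidated through this endpoint.
function isProductPath(value: unknown): value is string {
  return typeof value === 'string' && /^\/products\/[A-Za-z0-9-]+$/.test(value);
}

async function revalidateHandler(
  req: RevalidateRequest,
  res: RevalidateResponse,
  secret: string,
): Promise<void> {
  if (req.method !== 'POST') {
    res.status(405).json({ error: 'POST only' });
    return;
  }
  // 'x-revalidate-secret' is a hypothetical header name.
  if (req.headers['x-revalidate-secret'] !== secret) {
    res.status(401).json({ error: 'unauthorized' });
    return;
  }

  const raw: unknown[] = Array.isArray(req.body.paths) ? req.body.paths : [];
  const paths = raw.filter(isProductPath).slice(0, MAX_PATHS_PER_CALL);

  const failed: string[] = [];
  for (const path of paths) {
    try {
      await res.revalidate(path);
    } catch {
      failed.push(path); // retry policy is left to the caller
    }
  }
  res.status(failed.length === 0 ? 200 : 500).json({ requested: paths.length, failed });
}
```

Validating paths against an allow-list pattern keeps a leaked secret from being used to regenerate arbitrary routes, and reporting failed paths lets the backend queue decide whether to retry.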

The revalidation queue batches calls to Next.js’s revalidation endpoint, with rate limiting to avoid hammering the app server during bulk catalog updates. We process ~2,000 on-demand revalidations per hour during peak catalog update windows with no meaningful latency impact.
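The batching and rate limiting can be approximated with a small in-memory structure. The sketch below is a simplified stand-in for the real queue, which is not shown in this post: the `RevalidationBatcher` name, the per-tick cap, and the dedupe-by-path behaviour are assumptions chosen to illustrate the idea.

```typescript
type Priority = 'high' | 'medium';

interface RevalidationJob {
  path: string;
  priority: Priority;
}

// In-memory sketch: dedupes by path, keeps the highest priority seen,
// and releases at most `maxPerTick` paths each time drain() is called.
class RevalidationBatcher {
  private readonly pending = new Map<string, Priority>();
  private readonly maxPerTick: number;

  constructor(maxPerTick: number) {
    this.maxPerTick = maxPerTick;
  }

  add(job: RevalidationJob): void {
    // A path already queued as 'high' is never downgraded.
    if (this.pending.get(job.path) !== 'high') {
      this.pending.set(job.path, job.priority);
    }
  }

  // Call on a fixed interval; the interval and maxPerTick together set the rate limit.
  drain(): string[] {
    const rank = (p: Priority): number => (p === 'high' ? 0 : 1);
    const batch = Array.from(this.pending.entries())
      .sort((a, b) => rank(a[1]) - rank(b[1]))
      .slice(0, this.maxPerTick)
      .map((entry) => entry[0]);
    for (const path of batch) {
      this.pending.delete(path);
    }
    return batch;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

Deduplicating by path matters during bulk catalog updates, where one product can emit several events within the same window but only needs a single regeneration.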


Decision 3: The Search Problem Was Actually a Separate Problem

Our search was built on top of a PostgreSQL full-text search index. At 2 million products, the latency was acceptable — P95 around 180ms. At 10 million, P95 degraded to 1.4 seconds on complex queries. Buyers on B2B platforms run complex queries. “3/4 inch stainless steel hex bolt grade 8 minimum order 500” is a real search query we get hundreds of times a day.

We migrated to Elasticsearch with a custom tokenizer tuned for industrial part numbers and specifications. The engineering details of that migration deserve their own post, but the headline numbers: P95 query latency went from 1.4s to 38ms. Our search-to-product-page conversion rate improved by 22%.
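The production analyzer is out of scope here, but as a rough illustration of what “tuned for part numbers” can mean, the index settings below combine Elasticsearch's built-in `whitespace` tokenizer with the `word_delimiter_graph`, `lowercase`, and `flatten_graph` token filters. The analyzer and filter names (`part_number`, `part_number_delimiter`) and the option choices are assumptions for illustration only.

```typescript
// Illustrative index settings only; not the production analyzer.
const partNumberIndexSettings = {
  analysis: {
    filter: {
      part_number_delimiter: {
        type: 'word_delimiter_graph',
        preserve_original: true,     // keep the whole token alongside its parts
        split_on_numerics: true,     // split at letter/digit boundaries
        split_on_case_change: false, // part numbers are often mixed case
      },
    },
    analyzer: {
      part_number: {
        type: 'custom',
        tokenizer: 'whitespace',
        // flatten_graph is needed when word_delimiter_graph is used at index time
        filter: ['part_number_delimiter', 'lowercase', 'flatten_graph'],
      },
    },
  },
};
```

The intent is that a specification token stays searchable both as a whole and by its parts, so a buyer who types only a fragment of a part number can still match it.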

The lesson: don’t try to solve a search problem with a database. They’re different tools for different things, and PostgreSQL full-text search is excellent right up to the point where it isn’t.


Decision 4: Edge Middleware for Personalization Without SSR Overhead

B2B buyers on IndustryBuying have negotiated pricing. A buyer from Company X might see a 12% discount on a product category that we don’t show to anonymous visitors. Rendering personalized pricing server-side meant every product page request had to hit the app server — no CDN caching for logged-in users.

We moved personalization to the edge using Next.js middleware. The middleware reads the buyer’s tier from a JWT claim, sets a cookie, and lets the CDN cache the page normally. The price rendering component reads the cookie and applies the discount calculation client-side using a formula the server passes down in the page props.

// middleware.ts
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

export function middleware(request: NextRequest) {
  const token = request.cookies.get('ib-auth');
  const pricingTier = token
    ? extractPricingTier(token.value)
    : 'public';

  const response = NextResponse.next();
  response.cookies.set('ib-pricing-tier', pricingTier, {
    httpOnly: false, // needs to be readable by client JS
    secure: true,
    sameSite: 'strict',
    maxAge: 3600,
  });

  return response;
}

This let us cache product pages at the CDN for all users while still showing personalised prices. CDN hit rate for product pages went from 34% (because logged-in users bypassed cache) to 91%.
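The client-side half of this can be sketched as two small pure functions. The shape of the “formula” passed in page props is not specified above, so this example assumes a simple table mapping pricing tier to a fractional discount, with prices held in integer minor units; `DiscountTable`, `readCookie`, and `priceForTier` are hypothetical names.

```typescript
// Assumed shape of the discount data passed down in page props:
// pricing tier -> fractional discount (0.12 means 12% off).
type DiscountTable = Record<string, number>;

// Minimal cookie parser; in the browser, pass document.cookie.
function readCookie(cookieString: string, name: string): string | undefined {
  for (const part of cookieString.split(';')) {
    const [key, ...rest] = part.trim().split('=');
    if (key === name) {
      return decodeURIComponent(rest.join('='));
    }
  }
  return undefined;
}

// listPrice is assumed to be in integer minor units (paise), so rounding is safe.
function priceForTier(
  listPrice: number,
  tier: string | undefined,
  discounts: DiscountTable,
): number {
  // Unknown or missing tiers get the public price. The own-property check
  // stops a tampered cookie value such as "constructor" from matching.
  if (tier === undefined || !Object.prototype.hasOwnProperty.call(discounts, tier)) {
    return listPrice;
  }
  const fraction = Math.min(Math.max(discounts[tier], 0), 1);
  return Math.round(listPrice * (1 - fraction));
}
```

In the price component this would be called as `priceForTier(product.price, readCookie(document.cookie, 'ib-pricing-tier'), discounts)`. Because the cookie is readable and editable by client JavaScript, a price computed this way is display-only; the authoritative price still has to be recomputed server-side when an order is placed.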


The Metrics After Six Months

Metric                       Before     After
Full build time              4h 12m     6m 20s
Product page TTFB (P95)      820ms      95ms
CDN hit rate                 34%        91%
Search latency (P95)         1,400ms    38ms
On-demand revalidations/hr   0          ~2,000
LCP (Core Web Vitals)        3.8s       1.2s
INP                          340ms      88ms

The LCP and INP improvements came from a combination of the CDN caching improvement (serving from edge instead of app server) and a parallel effort to defer non-critical JavaScript — but that’s another post.


What I’d Do Differently

Start with tiered rendering from day one. The temptation to SSG everything because “it’s fast” is real, but it only works until your catalog outgrows the build budget. Design your rendering strategy around data volatility, not engineering convenience.

Instrument before you optimise. We spent two weeks working on improvements we thought would matter before we had proper RUM (Real User Monitoring) data. Some of those improvements were irrelevant to real user experience. The Elasticsearch migration and the CDN hit-rate improvement were the two highest-impact changes, and neither would have been obvious without measuring the right things first.

On-demand ISR is underused. The Next.js docs make it look like a niche feature. It isn’t. For any catalog or content-heavy application, event-driven revalidation gives you the best of both worlds: static serving speed and near-real-time freshness. The implementation complexity is low; the impact is high.

The platform is in a much better place today. Whether it stays there as the catalog grows to 25 million is a question I’ll be able to answer in another post, probably at another 2AM.


Questions about the ISR setup or the Elasticsearch migration? The comments are open.

— Rohit Mishra, Senior Engineering Manager
