Crawling & Indexing Product Pages

Google crawling your pages does not guarantee they will appear in search results. Crawling and indexing are two separate processes, and understanding the gap between them is critical for any store dealing with thousands of product URLs.

Crawling vs. Indexing: The Difference That Matters

Crawling means Googlebot visited your page and downloaded its content. Indexing means Google analyzed that content, found it worthy of inclusion, and stored it in its search index. A page can be crawled but not indexed, which happens more often than most store owners realize.

Think of crawling as Google walking through every aisle in your physical store. Indexing is Google deciding which products are worth putting on the shelf for shoppers to find. If a product page has thin content, duplicates another page, or has technical issues, Google may crawl it and then decide it does not deserve a spot in the index.

For a typical online store with 20,000 product pages, we commonly see 30% to 50% of those pages go unindexed. That means thousands of products are invisible in search results. The gap between crawled and indexed pages is where most ecommerce SEO opportunities hide.

Crawled: Googlebot visited and downloaded the page content
Indexed: Google analyzed and stored the page in its search database
Crawled but not indexed: Google saw the page but chose not to include it
Not crawled: Google has not visited the page yet or skipped it intentionally

Why Google Skips Indexing Product Pages

The most common reason Google refuses to index a product page is duplicate or near-duplicate content. When 500 products from the same manufacturer share identical descriptions differing only in the product name, Google sees little reason to index all 500 versions. It picks a few and ignores the rest.
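To find clusters of near-identical descriptions in your own catalog, a rough similarity check is often enough. The sketch below uses Python's standard difflib module; the SKUs, sample descriptions, and the 0.9 threshold are assumptions for illustration, not values Google publishes.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(descriptions, threshold=0.9):
    """Return (sku_a, sku_b, ratio) for description pairs at or above threshold."""
    pairs = []
    for (sku_a, text_a), (sku_b, text_b) in combinations(descriptions.items(), 2):
        ratio = SequenceMatcher(None, text_a.lower(), text_b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((sku_a, sku_b, round(ratio, 2)))
    return pairs


# Hypothetical catalog data for illustration only.
catalog = {
    "tee-blue": "The Acme Blue Tee is made from soft organic cotton and fits true to size.",
    "tee-red": "The Acme Red Tee is made from soft organic cotton and fits true to size.",
    "mug": "A 12 oz ceramic mug with a matte glaze, safe for dishwasher and microwave.",
}

for sku_a, sku_b, ratio in near_duplicates(catalog):
    print(sku_a, sku_b, ratio)
```

Comparing every pair gets slow on large catalogs, so in practice you would likely run this per manufacturer or per category rather than across all products at once.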

Thin content is the second biggest culprit. A product page with a 20-word description, a price, and a buy button provides almost no information for Google to evaluate. Compare that to a competitor whose product page includes a 300-word unique description, customer reviews, specification tables, and usage instructions. Google is far more likely to index the richer page and skip the thin one.

Technical signals can also prevent indexing. A noindex directive keeps a page out of the index regardless of its content quality, and pages that load slowly, return soft 404 errors, or have conflicting canonical tags are far less likely to be indexed.

Page quality signals matter too. If your site has a high ratio of low-quality pages, Google may reduce the crawl rate for your entire domain, making it harder for even your good pages to get indexed promptly.

Duplicate or near-duplicate descriptions across product pages
Thin content with fewer than 50 words of unique text
Slow page load times exceeding 5 seconds
Conflicting or incorrect canonical tags
Noindex tags applied accidentally by plugins or theme settings

Tip

Run a crawl with Screaming Frog or Sitebulb and filter for pages with fewer than 100 words of body text. Those thin pages are your top candidates for content improvement or consolidation.
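If you already have the HTML of your pages saved locally, the same thin-page filter can be approximated with a short script. This is a minimal sketch using only Python's standard library; the function names and the pages mapping are hypothetical, and because it counts all visible text (navigation and footer included), treat the result as a rough upper bound on real body copy.

```python
from html.parser import HTMLParser


class BodyTextCounter(HTMLParser):
    """Counts words in visible text, skipping <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.words = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def count_words(html):
    parser = BodyTextCounter()
    parser.feed(html)
    parser.close()
    return parser.words


def thin_pages(pages, min_words=100):
    """pages maps URL -> HTML string; returns URLs below the word threshold."""
    return [url for url, html in pages.items() if count_words(html) < min_words]
```

Calling thin_pages on a mapping of URL to HTML returns the URLs that fall under the 100-word threshold from the tip above.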

Canonical Tags and Duplicate Content in Ecommerce

Canonical tags tell Google which version of a page you consider the original when multiple URLs display similar or identical content. Google treats the tag as a strong hint rather than a command, but for ecommerce sites canonicalization is not optional. Without it, Google must guess which URL to index, and it often guesses wrong.

Product variants create the most common canonical scenario. A blue t-shirt at /products/cotton-tee?color=blue and a red version at /products/cotton-tee?color=red may share 90% of their page content. If these are truly the same product with a color selector, both URLs should canonicalize to the main product page at /products/cotton-tee. If the color variants have meaningfully different search demand (people search specifically for "blue cotton tee"), they may warrant separate indexed pages.

Faceted navigation generates even more canonical complexity. A URL like /shoes?size=10&color=black&brand=nike&sort=price-low is one of potentially millions of filter combinations. These filtered views should either canonicalize back to the main category page or be blocked from indexing entirely. The choice depends on whether that specific filter combination has genuine search demand.

We see stores make two common canonical mistakes. The first is a circular canonical, where page A canonicalizes to page B and page B canonicalizes back to page A. The second is canonicalizing all product variants to a single parent when each variant has independent search volume, which hides rankable pages from Google.
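Circular canonicals are easy to miss by eye but simple to detect once you have a crawl export. The sketch below assumes you have built a mapping of each URL to the canonical URL it declares; the function name and sample data are hypothetical.

```python
def find_canonical_loops(canonicals):
    """canonicals maps URL -> declared canonical URL. Returns the loops found."""
    loops = []
    seen_in_loop = set()
    for start in canonicals:
        path = []
        url = start
        # Follow the chain until it ends, self-references, or revisits a URL.
        while url in canonicals and canonicals[url] != url and url not in path:
            path.append(url)
            url = canonicals[url]
        if url in path and url not in seen_in_loop:
            loop = path[path.index(url):]
            loops.append(loop)
            seen_in_loop.update(loop)
    return loops


# Hypothetical crawl data: /a and /b point at each other, /c is self-canonical.
declared = {"/a": "/b", "/b": "/a", "/c": "/c", "/d": "/c"}
print(find_canonical_loops(declared))  # [['/a', '/b']]
```

A self-referencing canonical such as /c is correct and is not reported; only chains that never settle on a final URL are flagged.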

Managing Index Bloat From Filters and Facets

Index bloat occurs when Google indexes thousands of low-value URLs that dilute your site's overall quality signals. For ecommerce, the primary source of index bloat is faceted navigation that generates filterable URLs.

Consider a furniture store with 200 products in the "sofas" category. If shoppers can filter by color (10 options), material (8 options), price range (5 brackets), and seating capacity (4 options), the possible URL combinations reach 1,600 when each filter is set to a single value, before accounting for multi-select filters. Most of these filtered views show small, overlapping subsets of the same 200 products.
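The arithmetic behind that figure is worth checking for your own filters. The sketch below reproduces the 1,600 count and also shows the total when shoppers may leave any filter unset; it ignores sort parameters and parameter ordering, which would push the number higher.

```python
from math import prod

# Option counts from the furniture store example.
options = {"color": 10, "material": 8, "price_range": 5, "seats": 4}

# Every filter set to exactly one value: 10 * 8 * 5 * 4
all_filters_set = prod(options.values())

# Each filter either unset or set to one value: 11 * 9 * 6 * 5,
# minus 1 for the unfiltered category page itself.
any_subset = prod(n + 1 for n in options.values()) - 1

print(all_filters_set)  # 1600
print(any_subset)       # 2969
```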

The standard approach to controlling index bloat involves three layers, each applied to a different set of URLs. First, use robots.txt to block Googlebot from crawling the most obvious low-value filter patterns. Second, apply noindex tags to filtered pages that remain crawlable; Google has to crawl a page to see its noindex tag, so the tag has no effect on URLs blocked in robots.txt. Third, use canonical tags to point the remaining filtered views back to the main category page.

A more surgical approach is to selectively allow indexing on filter combinations that match real search queries. If people search for "leather sofas" in meaningful numbers, the /sofas?material=leather URL might be worth indexing. But /sofas?material=leather&color=brown&seats=3 almost certainly is not.
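One way to implement this selective approach is an allowlist of filter combinations that have earned their own indexed URL, with everything else canonicalizing to the category. The sketch below is an assumption about how such a rule might look; the allowlist contents and function name are hypothetical, and your platform or filter app may expose this logic differently.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical allowlist: filter combinations judged to have real search demand.
INDEXABLE_FILTERS = {
    ("/sofas", frozenset({("material", "leather")})),
}


def canonical_target(url):
    """Return the URL a category or filtered page should declare as canonical."""
    parts = urlsplit(url)
    filters = frozenset(parse_qsl(parts.query))
    if filters and (parts.path, filters) in INDEXABLE_FILTERS:
        # Allowlisted combination: it stands as its own indexable page.
        return f"{parts.path}?{parts.query}"
    # Unfiltered pages and all other combinations point at the category.
    return parts.path


print(canonical_target("/sofas?material=leather"))
print(canonical_target("/sofas?material=leather&color=brown&seats=3"))
```

The first call returns the filtered URL itself, while the second falls back to /sofas. Keeping the allowlist explicit means a new filter option never becomes indexable by accident.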

Shopify stores handle this differently from WooCommerce or Magento because filter URLs there typically come from the theme's storefront filtering or from third-party filter apps like Smart Product Filter, and each handles canonical tags and indexation controls differently. Always verify how your theme or filter app manages these technical details.

Audit your indexed URL count in GSC and compare to your intended indexable pages
Block low-value filter patterns in robots.txt as the first line of defense
Apply noindex to filtered pages that remain crawlable, since Google cannot see the tag on URLs blocked in robots.txt
Selectively index high-value filter combinations with proven search demand
Review third-party filter app settings for canonical and indexation handling

Checking Indexation Status in Google Search Console

Google Search Console provides two primary tools for monitoring indexation. The Pages report (formerly Coverage report) shows how many of your pages are indexed and why the rest were excluded. The URL Inspection tool lets you check the status of individual pages.

In the Pages report, focus on the "Not indexed" tab. Google groups excluded pages by reason: "Crawled - currently not indexed", "Discovered - currently not indexed", "Duplicate without user-selected canonical", "Excluded by noindex tag", and several others. Each reason requires a different fix.

"Crawled - currently not indexed" means Google visited the page but chose not to add it to the index. This usually signals a content quality issue. Improving the page's content, adding unique descriptions, or enhancing it with reviews and structured data can help.

"Discovered - currently not indexed" means Google knows the URL exists but has not bothered to crawl it yet. This indicates low crawl priority, often caused by weak internal linking or the page being too deep in the site hierarchy.

The URL Inspection tool shows you exactly what Google sees when it crawls a specific page. Use it to verify that your canonical tags are being respected, that your page is rendering correctly, and that no accidental noindex tags are blocking indexation. We recommend inspecting 10 to 20 representative product pages monthly to catch issues early.
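Between manual inspections, a quick scripted check of the raw HTML can catch accidental noindex tags and wrong canonicals. This is a minimal sketch using Python's standard library; the class and function names are hypothetical, and it only reads the HTML source, so it will not see an X-Robots-Tag HTTP header or tags injected by JavaScript. URL Inspection remains the authoritative view of what Google saw.

```python
from html.parser import HTMLParser


class IndexSignalParser(HTMLParser):
    """Collects the robots meta directive and canonical link from a page."""

    def __init__(self):
        super().__init__()
        self.robots = None
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots = (attrs.get("content") or "").lower()
        elif tag == "link" and "canonical" in (attrs.get("rel") or "").lower().split():
            self.canonical = attrs.get("href")


def index_signals(html, expected_canonical):
    """Report whether the page is noindexed and whether its canonical matches."""
    parser = IndexSignalParser()
    parser.feed(html)
    parser.close()
    return {
        "noindex": parser.robots is not None and "noindex" in parser.robots,
        "canonical_ok": parser.canonical == expected_canonical,
    }
```

Running index_signals over the 10 to 20 representative product pages gives a quick pass/fail list to follow up on in GSC.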

Tip

Export the "Not indexed" data from GSC as a spreadsheet and categorize pages by type (product, category, filter, blog). This reveals whether your indexation problems are concentrated in a specific page type, making the fix more targeted.
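The categorization step can be scripted once the export is saved as CSV. The sketch below assumes the export has a column named "URL" and that your store uses path prefixes like /products/ and /collections/; both are assumptions, so adjust the column name and patterns to match your own export and URL structure.

```python
import csv
import io
from collections import Counter
from urllib.parse import urlsplit


def page_type(url):
    """Classify a URL by path pattern. The patterns are assumptions; adapt them."""
    parts = urlsplit(url)
    if parts.query:
        return "filter"
    if parts.path.startswith("/products/"):
        return "product"
    if parts.path.startswith(("/collections/", "/category/")):
        return "category"
    if parts.path.startswith("/blog"):
        return "blog"
    return "other"


def count_by_type(csv_file, url_column="URL"):
    """Count rows per page type from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(page_type(row[url_column]) for row in reader)


# Hypothetical export content for illustration only.
sample = io.StringIO(
    "URL,Last crawled\n"
    "https://example.com/products/cotton-tee,2024-01-01\n"
    "https://example.com/shoes?size=10,2024-01-02\n"
    "https://example.com/blog/care-guide,2024-01-03\n"
)
print(count_by_type(sample))
```

Note that any URL with a query string is counted as a filter page here, including product variant URLs such as ?color=blue, which may or may not suit your store.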

Practical Steps to Improve Product Page Indexation

Start by auditing which product pages are currently indexed. Use the site: operator in Google (site:yourstore.com/products/) to get a rough count, then cross-reference with GSC data for accuracy, since site: results are only an estimate. If fewer than 70% of your product pages are indexed, you have work to do.
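The audit comes down to comparing two lists: the URLs you intend to have indexed and the URLs GSC reports as indexed. A minimal sketch, with hypothetical sample data standing in for your sitemap and GSC export:

```python
def indexation_report(intended_urls, indexed_urls):
    """Return (percent_indexed, missing) for the URLs you intend to have indexed."""
    intended = set(intended_urls)
    indexed = intended & set(indexed_urls)
    percent = 100 * len(indexed) / len(intended) if intended else 0.0
    return percent, sorted(intended - indexed)


# Hypothetical data: intended URLs from your sitemap, indexed URLs from a GSC export.
intended = ["/products/a", "/products/b", "/products/c", "/products/d"]
indexed = ["/products/a", "/products/c", "/sofas?sort=price-low"]

percent, missing = indexation_report(intended, indexed)
print(f"{percent:.0f}% indexed")  # 50% indexed
print(missing)                    # ['/products/b', '/products/d']
```

Indexed URLs that are not on the intended list, like the sorted filter URL above, are ignored by the percentage but are worth reviewing separately as possible index bloat.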

Write unique product descriptions for your top-selling and highest-margin products first. These pages have the most revenue potential from organic search. Even adding 150 to 200 words of unique, descriptive content per product page can make the difference between indexed and ignored.

Consolidate pages that serve no independent purpose. If you have 30 color variants of the same product and none of those color-specific terms have search volume, consolidate them under a single product page with a color selector. One strong page will almost always outperform 30 thin ones.

Strengthen internal linking to product pages you want indexed. Link from related blog posts, from the homepage's featured products section, and from other product pages via "customers also bought" or "related products" widgets. Each additional internal link signals to Google that the page matters.
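To find product pages that need more internal links, count inbound links from a crawl export. The sketch below assumes a list of (source, target) link pairs; the function name, sample data, and the minimum of three inbound links are hypothetical choices for illustration, not a threshold Google publishes.

```python
from collections import Counter


def weakly_linked(links, target_urls, min_inbound=3):
    """links is an iterable of (source_url, target_url) internal links."""
    # Deduplicate repeated links and ignore pages linking to themselves.
    inbound = Counter(target for source, target in set(links) if source != target)
    return sorted(url for url in target_urls if inbound[url] < min_inbound)


# Hypothetical crawl data for illustration only.
links = [
    ("/", "/products/a"),
    ("/blog/x", "/products/a"),
    ("/products/b", "/products/a"),
    ("/", "/products/b"),
]
print(weakly_linked(links, ["/products/a", "/products/b", "/products/c"]))
```

Here /products/b and /products/c are flagged, with /products/c having no inbound links at all.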

Finally, keep your sitemap clean. Remove URLs that return 404 errors, that are set to noindex, or that you have decided to consolidate. A lean sitemap that only contains pages you genuinely want indexed gives Google a clearer picture of your site's structure.
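Pruning a sitemap can be scripted once you have a list of URLs to drop. This is a minimal sketch using Python's standard XML library; it assumes a single standard sitemap file (not a sitemap index) and a set of excluded URLs that you have compiled yourself from crawl and GSC data.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def clean_sitemap(sitemap_xml, excluded_urls):
    """Return sitemap XML without <url> entries whose <loc> is in excluded_urls."""
    ET.register_namespace("", NS)  # keep the default namespace in the output
    root = ET.fromstring(sitemap_xml)
    for url_el in list(root.findall(f"{{{NS}}}url")):
        loc = url_el.findtext(f"{{{NS}}}loc", default="").strip()
        if loc in excluded_urls:
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sitemap and exclusion list for illustration only.
sitemap = (
    f'<urlset xmlns="{NS}">'
    "<url><loc>https://example.com/products/cotton-tee</loc></url>"
    "<url><loc>https://example.com/products/old</loc></url>"
    "</urlset>"
)
print(clean_sitemap(sitemap, {"https://example.com/products/old"}))
```

Most platforms generate the sitemap for you, so in practice the exclusion list is more often applied through a plugin or app setting than by editing the XML directly.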

Audit current indexation rates using GSC and the site: operator
Write unique descriptions for top-selling products first
Consolidate thin variant pages under single strong product pages
Build internal links from blog posts, homepage, and related products
Clean your sitemap to include only genuinely indexable URLs

Crawling & Indexing Product Pages - EcomSEO Academy | EcomSEO