PrestaShop faceted search creating thousands of URLs, Plesk blocking Bunny CDN IPs, and Google deindexing pages

I recently ran into a messy PrestaShop problem that started as a simple crawler issue and turned into a bigger indexing problem. The store itself had around 2,000 real pages: products, categories, CMS pages, and a few other normal shop URLs. But the PrestaShop faceted search/filter module was creating hundreds of thousands of possible filter combinations.

This happened on the latest PrestaShop 8 version available for this store at the time. I have not checked yet whether PrestaShop 9 / the current ps_facetedsearch version behaves differently, so I am not claiming this is confirmed in v9. The issue and workaround below are based on what I saw in PrestaShop 8.

This is not only happening on this store. There is already an open PrestaShop GitHub issue about large CPU/database spikes caused by faceted search URLs with many attributes. In that report, the store owner described MariaDB CPU jumping from low usage to very high usage, 524/500 errors, and suspected Googlebot crawling possible faceted subpages. The issue is still open and labeled as a ps_facetedsearch bug.

A normal category URL looked like this:

/category-name

But the filter module also generated URLs like this:

/category-name?q=Color-Red
/category-name?q=Color-Red-Size-M
/category-name?q=Color-Red-Size-M-Material-Cotton
/category-name?q=Brand-X-Color-Red-Size-M-Material-Cotton

Multiply this by all categories, brands, colors, sizes, materials, availability filters, sorting options, and result limits, and suddenly a small shop can expose 500,000+ crawlable URLs. That would already be bad for SEO and crawl budget, but in this case, it got worse.
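To show how quickly this multiplies, here is a rough back-of-the-envelope sketch. All numbers in it (facet sizes, sort options, page sizes, category count) are made-up assumptions, not figures from this store, and it only counts the simple case where each facet is either unset or set to a single value. Multi-select filters would make the total even larger.

```python
from math import prod

# Hypothetical facet sizes for one category (assumed, not measured).
facet_values = {"Brand": 10, "Color": 8, "Size": 6, "Material": 5}


def filter_combinations(facets: dict[str, int]) -> int:
    """Count filtered URLs where each facet is unset or set to one value.

    Each facet contributes (values + 1) choices; subtract 1 to exclude
    the unfiltered category page itself.
    """
    return prod(n + 1 for n in facets.values()) - 1


per_category = filter_combinations(facet_values)

sort_orders = 4   # assumed number of ?order= options
page_sizes = 3    # assumed number of ?resultsPerPage= options
categories = 50   # assumed number of categories

total = per_category * sort_orders * page_sizes * categories
print(per_category, total)  # prints: 4157 2494200
```

Even with these modest assumed numbers, four facets on 50 categories already give millions of distinct URLs, which is why a shop with about 2,000 real pages can look enormous to a crawler.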

The store was running on a Plesk server behind Bunny CDN. Bots were crawling all these generated filter URLs, the requests were going through Bunny, and Plesk / Fail2Ban started blocking Bunny CDN IPs. Once that happened, Googlebot started getting 500 errors when trying to crawl the site.

The result was ugly: Google started dropping/deindexing URLs because it could not reliably access them anymore. We are still working on getting all of those URLs reindexed.

So this was not just a small SEO cleanup. It became a crawl, hosting, CDN, server protection, and indexation problem at the same time. The goal was not to block Google, Bing, ChatGPT, Claude, Facebook, or other useful crawlers from the whole site. The goal was to let them access the real pages, but stop them from wasting resources on useless PrestaShop filter combinations.

The solution

The fix was layered:

  1. Stop exposing faceted search URLs as normal crawlable links.
  2. Keep filters working for real users.
  3. Add robots.txt rules for filter parameters.
  4. Return 410 Gone for bot requests to useless filter URLs.
  5. Make sure Plesk / Fail2Ban does not block Bunny CDN IPs.
  6. Keep clean product, category, CMS, and blog URLs crawlable.
  7. Submit clean sitemaps and wait for Google to reindex the affected URLs.

Clean URLs should stay crawlable: /category-name, /product-name, /cms-page, /blog-post

Filter URLs should not be crawled: /category-name?q=Color-Red-Size-M, /category-name?order=price.asc, /category-name?resultsPerPage=72

The important lesson is: Do not block the whole bot. Block the useless URLs. Do not let Plesk block your CDN.

1. Remove crawlable filter links from PrestaShop faceted search

The biggest problem was that the filter sidebar was outputting normal links:

<a href="/category-name?q=Color-Red-Size-M">Red size M</a>

That is a crawlable link. Even if the link has rel="nofollow", the URL is still in href, so crawlers can discover it. The better setup is to keep the filter working for users, but stop putting the filter URL in a normal href.

Instead of this:

<a href="/category-name?q=Color-Red-Size-M">Red size M</a>

we want something like this:

<span>Red size M</span>

while the actual filter URL is kept only in an HTML data attribute that JavaScript reads:

<input data-search-url="/category-name?q=Color-Red-Size-M" type="checkbox">

This way, PrestaShop JavaScript can still update the product listing when a real user clicks the filter, but crawlers no longer see thousands of normal <a href="..."> filter links.

2. Override the PrestaShop faceted search templates

Do not edit the module files directly. Instead, create a theme override:

/themes/YOUR_THEME/modules/ps_facetedsearch/views/templates/front/catalog/

The two files I changed were facets.tpl and active-filters.tpl.

If the folder does not exist, create it. The original faceted search template usually outputs something like this:

<a href="{$filter.nextEncodedFacetsURL}" class="_gray-darker search-link js-search-link" rel="nofollow" >{$filter.label}</a>

That href is the problem. The safe version keeps the original PrestaShop structure as much as possible, but removes the crawlable href. For checkbox and radio filters, the visible filter text can be changed to a non-link element:

<span class="_gray-darker search-link">{$filter.label}</span>

The input still keeps the real filter URL:

<input data-search-url="{$filter.nextEncodedFacetsURL|escape:'html':'UTF-8'}" type="checkbox" {if $filter.active}checked{/if} >

This worked because PrestaShop’s existing faceted search JavaScript already knows how to use input[data-search-url]. The first version I tried changed too much markup and broke the filter sidebar. The working solution was the minimal-change one: keep the original template structure and only remove the crawlable filter URLs from normal anchors.

3. Check the page source

After uploading the template overrides, clear the PrestaShop cache:

Advanced Parameters → Performance → Clear cache

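If needed, also remove the cached files from the var/cache/prod/ and var/cache/dev/ folders inside the PrestaShop root (for example /httpdocs/var/cache/prod/, not the system-level /var/cache).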

Then open a category page and view the source. Search for: ?q=

It is okay if you see filter URLs inside: data-search-url="..."

But you should not see filter URLs inside: href="..."

That is the important part. Users can still filter products, but crawlers are no longer being shown thousands of normal faceted search links.
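If you would rather not check the source by eye, the same check can be sketched as a small script. This is only an illustration using Python's standard html.parser; the class name FilterLinkAudit, the audit helper, and the sample HTML are my own placeholders, and in practice you would feed it the saved source of a real category page.

```python
from html.parser import HTMLParser


class FilterLinkAudit(HTMLParser):
    """Collect filter URLs found in <a href> vs. data-search-url attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.crawlable: list[str] = []  # filter URLs exposed in <a href>
        self.data_only: list[str] = []  # filter URLs kept in data-search-url

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Only look at attribute values that carry the q= filter parameter.
            if not value or ("?q=" not in value and "&q=" not in value):
                continue
            if tag == "a" and name == "href":
                self.crawlable.append(value)
            elif name == "data-search-url":
                self.data_only.append(value)


def audit(html: str) -> FilterLinkAudit:
    parser = FilterLinkAudit()
    parser.feed(html)
    return parser


# Sample markup for illustration only.
sample = """
<a href="/category-name">Category</a>
<a href="/category-name?q=Color-Red" rel="nofollow">Red</a>
<input data-search-url="/category-name?q=Color-Red-Size-M" type="checkbox">
"""

result = audit(sample)
print(result.crawlable)  # filter URLs that are still crawlable links
print(result.data_only)  # filter URLs that only exist for JavaScript
```

After the template override is in place, the crawlable list should be empty for a real category page, while data_only can contain as many entries as there are filters.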

4. Add robots.txt rules for filter parameters

The template change stops advertising the filter URLs as normal links. The next step is to tell compliant crawlers not to crawl filter parameters.

Edit: /httpdocs/robots.txt

Add:

User-agent: *
Disallow: /*?q=
Disallow: /*&q=
Disallow: /*?selected_filters=
Disallow: /*&selected_filters=
Disallow: /*?filter=
Disallow: /*&filter=
Disallow: /*?order=
Disallow: /*&order=
Disallow: /*?resultsPerPage=
Disallow: /*&resultsPerPage=
Disallow: /*?n=
Disallow: /*&n=

Keep your sitemap line at the bottom:

Sitemap: https://example.com/sitemap.xml

I would not block every URL with a query string unless you are completely sure you do not need any parameter URLs. For example, this can be too broad: Disallow: /*?*

For this case, I only wanted to block PrestaShop filter, sort, and page-size parameters.
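Before deploying rules like these, it can help to check which URLs they would actually catch. The sketch below is my own minimal matcher, assuming the common wildcard convention where * matches any run of characters and a trailing $ anchors the end of the URL. It is not a full robots.txt parser, and the helper names rule_to_regex and is_disallowed are hypothetical.

```python
import re

# The Disallow patterns from the robots.txt rules above.
DISALLOW = [
    "/*?q=", "/*&q=",
    "/*?selected_filters=", "/*&selected_filters=",
    "/*?filter=", "/*&filter=",
    "/*?order=", "/*&order=",
    "/*?resultsPerPage=", "/*&resultsPerPage=",
    "/*?n=", "/*&n=",
]


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one path rule: '*' is a wildcard, a trailing '$' anchors
    the end, and everything else is matched literally from the start."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = ".*".join(re.escape(part) for part in body.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))


RULES = [rule_to_regex(rule) for rule in DISALLOW]


def is_disallowed(path_and_query: str) -> bool:
    """True if any Disallow pattern matches the path plus query string."""
    return any(rule.match(path_and_query) for rule in RULES)


for url in [
    "/category-name",
    "/category-name?q=Color-Red",
    "/category-name?page=2",
    "/category-name?page=2&order=product.price.asc",
]:
    print(url, is_disallowed(url))
```

In this sketch the clean category URL and the plain ?page=2 URL stay allowed, while the q= and order= URLs are disallowed, which matches the intent of blocking only filter, sort, and page-size parameters.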

5. Return 410 Gone for bot filter requests

Robots.txt is useful, but it is not enforced. Respectful crawlers will follow it. Some bots, scrapers, preview fetchers, SEO tools, and AI crawlers may still request the URLs, so I also added a server-level rule.

In PrestaShop, edit: /httpdocs/.htaccess

Put this near the top, before the main PrestaShop rewrite rules:

# Block search/social/AI bots from PrestaShop faceted/sort URLs
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (Googlebot|bingbot|CCBot|ChatGPT|GPTBot|OAI-SearchBot|anthropic-ai|ClaudeBot|Claude-SearchBot|Claude-User|Google-CloudVertexBot|Omgilibot|Omgili|FacebookBot|Meta-ExternalAgent|Meta-ExternalFetcher|Diffbot|DuckAssistBot|AI2Bot|Bytespider|PerplexityBot|ImagesiftBot|Kangaroo-Bot|cohere-ai|cohere-training-data-crawler|PanguBot|Timpibot|Webzio-Extended|YouBot|Amazonbot) [NC]
RewriteCond %{QUERY_STRING} (^|&)(q|selected_filters|filter|order|resultsPerPage|n)= [NC]
RewriteRule ^ - [G,L]
</IfModule>

[G] returns: 410 Gone

So if one of those bots requests this: /category-name?q=Color-Red-Size-M

Apache returns 410. But clean pages still return 200:

/category-name, /product-name, /cms-page, /blog-post

This is exactly what I wanted: I do not want Googlebot wasting time on filter combinations, but I do want Googlebot crawling real product and category pages.
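To sanity-check the two conditions before touching a live .htaccess, they can be mirrored in a few lines of Python. This is only a sketch: the bot list is shortened for readability, the function name would_return_410 is hypothetical, and it assumes the query string is passed without the leading "?", which is how Apache exposes QUERY_STRING. It does not replace testing against the real server.

```python
import re

# Shortened bot list for illustration; the .htaccess rule lists many more.
BOT_UA = re.compile(
    r"(Googlebot|bingbot|GPTBot|ClaudeBot|PerplexityBot)", re.IGNORECASE
)

# Same pattern as the QUERY_STRING condition; IGNORECASE mirrors [NC].
FILTER_QS = re.compile(
    r"(^|&)(q|selected_filters|filter|order|resultsPerPage|n)=", re.IGNORECASE
)


def would_return_410(user_agent: str, query_string: str) -> bool:
    """True only when both conditions match, as with two stacked RewriteConds."""
    return bool(BOT_UA.search(user_agent)) and bool(FILTER_QS.search(query_string))


checks = [
    ("GPTBot", "q=Color-Red"),                                  # bot + filter
    ("GPTBot", ""),                                             # bot, clean URL
    ("GPTBot", "page=2"),                                       # bot, pagination
    ("Mozilla/5.0 (Windows NT 10.0) Firefox/125.0", "q=Color-Red"),  # human
]
for ua, qs in checks:
    print(repr(qs), would_return_410(ua, qs))
```

Only the first case is answered with 410. The (^|&) anchor matters here: without it, a parameter such as page= or reorder= could be caught by accident because it ends in a letter sequence that looks like a filter name.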

6. Optional: block earlier in Bunny or Cloudflare

In this case, the site was behind Bunny CDN, but the same principle applies to Cloudflare or any other CDN/WAF setup. The best place to stop heavy bot traffic is before it reaches the origin server.

If your CDN/WAF supports custom rules, block or challenge requests where the query string contains filter parameters like:

q=, selected_filters=, filter=, order=, resultsPerPage=, n=

For example, the logic is:

If the request query contains q= or selected_filters= or filter= or order= or resultsPerPage= or n=
and the user agent is a known bot
then block

A CDN/WAF block rule usually returns 403, not 410, but that is fine. The point is to reduce the load before the request reaches PrestaShop. I still keep the Apache rule as a backup.

7. Fix the Plesk / Bunny CDN problem

This was the part that turned the issue from annoying into serious. Because the site was behind Bunny CDN, the origin server saw many requests coming from Bunny IPs. When crawlers started hitting thousands of generated filter URLs, Plesk / Fail2Ban eventually blocked Bunny CDN IPs.

That meant real requests coming through Bunny could not reliably reach the origin anymore, and that meant Googlebot started seeing 500 errors. If Google repeatedly sees server errors, it can start dropping URLs from the index. That is what happened here.

So, besides fixing the filter URLs, you also need to make sure Plesk does not block your CDN. In Plesk / Fail2Ban, check:

Tools & Settings → IP Address Banning / Fail2Ban

Look at banned IPs and trusted IPs. If Bunny CDN IPs are banned, unban them. Then add Bunny CDN IP ranges to trusted IPs if your setup requires it.

The safer long-term setup is:

  1. Only allow origin traffic from Bunny CDN.
  2. Make Plesk trust Bunny as the proxy.
  3. Restore the real visitor IP in logs if possible.
  4. Do not let Fail2Ban ban Bunny CDN IPs.
  5. Block abusive traffic at the CDN/WAF layer instead of the origin.

The important part is that the CDN must not get banned by the origin server. If the origin blocks the CDN, Googlebot and real users can both get errors.
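A quick way to see whether any banned address actually belongs to the CDN is to compare the ban list against the CDN's IP ranges. The sketch below uses Python's standard ipaddress module. The ranges and banned IPs are placeholders from reserved documentation ranges, not real Bunny CDN addresses, and the function name banned_cdn_ips is my own; in practice you would paste in the banned IPs shown in Plesk and the addresses your CDN provider publishes.

```python
import ipaddress


def banned_cdn_ips(banned: list[str], cdn_ranges: list[str]) -> list[str]:
    """Return the banned IPs that fall inside any of the CDN ranges."""
    networks = [ipaddress.ip_network(cidr) for cidr in cdn_ranges]
    hits = []
    for raw in banned:
        ip = ipaddress.ip_address(raw)
        # An IPv4 address is never "in" an IPv6 network and vice versa,
        # so mixed lists are safe to compare.
        if any(ip in net for net in networks):
            hits.append(raw)
    return hits


# Placeholder values (documentation ranges), NOT real Bunny CDN addresses.
cdn_ranges = ["203.0.113.0/24", "2001:db8::/32"]
banned = ["203.0.113.17", "198.51.100.4", "2001:db8::1"]

print(banned_cdn_ips(banned, cdn_ranges))  # prints: ['203.0.113.17', '2001:db8::1']
```

Any address this returns is a CDN edge that the origin is currently refusing, so it should be unbanned and added to the trusted list.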

8. Test the server response

Test a bad filter URL with a fake bot user agent:

curl -I -A "GPTBot" "https://example.com/category-name?q=Color-Red"

You should see: HTTP/2 410

or, if your CDN blocks it first: HTTP/2 403

Then test a clean category URL:

curl -I -A "GPTBot" "https://example.com/category-name"

That should return: HTTP/2 200

Also test Googlebot:

curl -I -A "Googlebot" "https://example.com/category-name"

Clean URLs must return 200. Filter URLs can return 410. That is the whole point.

9. Submit clean sitemaps again

After the server errors are fixed, submit a clean sitemap in Google Search Console. The sitemap should contain only real, useful URLs: product pages, category pages, CMS pages, blog posts, and important landing pages.

It should not contain:

?q=, ?order=, ?resultsPerPage=, ?selected_filters=, ?filter=

Do not submit faceted search URLs in the sitemap. Also, use URL Inspection in Google Search Console for important pages that were dropped. If the page now returns 200 and is indexable, request indexing again.

Recovery is not instant. If Google dropped many URLs because of repeated 500 errors, it can take time for them to come back.
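Before resubmitting, it is worth confirming that the sitemap really is clean. The following is a minimal sketch that parses a standard sitemap and flags any URL whose query string uses one of the filter parameters. The function name filter_urls_in_sitemap and the sample XML are assumptions for illustration; for a real check you would load your own sitemap file instead of the inline sample.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

FILTER_PARAMS = {"q", "selected_filters", "filter", "order", "resultsPerPage", "n"}
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def filter_urls_in_sitemap(xml_text: str) -> list[str]:
    """Return sitemap <loc> URLs whose query string uses a filter parameter."""
    root = ET.fromstring(xml_text)
    flagged = []
    for loc in root.findall(".//sm:loc", NS):
        url = (loc.text or "").strip()
        params = parse_qs(urlsplit(url).query, keep_blank_values=True)
        if not FILTER_PARAMS.isdisjoint(params):
            flagged.append(url)
    return flagged


# Inline sample for illustration only.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/category-name</loc></url>
  <url><loc>https://example.com/category-name?q=Color-Red</loc></url>
  <url><loc>https://example.com/category-name?order=product.price.asc</loc></url>
</urlset>"""

for url in filter_urls_in_sitemap(sample):
    print(url)
```

An empty result means no filter, sort, or page-size URLs are being submitted. Anything it prints should be removed from the sitemap before you resubmit it.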

Why canonical tags were not enough

Canonical tags are useful, but they were not enough here. A filtered page can have:

<link rel="canonical" href="https://example.com/category-name">

That tells crawlers which page should be treated as the main version. But the crawler still has to request the filtered page to see the canonical tag.

When you have hundreds of thousands of generated URLs, that is already too late. The server still has to handle the request. PrestaShop still has to generate the page. The CDN still has to pass traffic. Plesk / Fail2Ban can still get triggered.

Canonical tags help with duplicate content, but they do not solve the server-load problem.

Why nofollow was not enough

rel="nofollow" was also not enough. This is still a discoverable URL:

<a href="/category-name?q=Color-Red" rel="nofollow"> Red </a>

The URL is still in href. For this case, I did not want faceted search URLs in href at all. Removing the crawlable link was the important fix.

Why robots.txt was not enough

Robots.txt is a request, not an enforcement mechanism. Good crawlers may follow it. Other bots may ignore it. Some tools may request old URLs they already discovered before the robots.txt update.

So I used robots.txt, but I did not rely on it alone. The final setup uses several layers:

template override → stop exposing filter URLs as crawlable links
robots.txt → tell crawlers not to crawl filter parameters
.htaccess → return 410 if bots request filter URLs anyway
CDN/WAF → block heavy requests before they hit the origin
Plesk/Fail2Ban → do not block Bunny CDN IPs
Google Search Console → resubmit clean URLs for reindexing

Related PrestaShop GitHub issue

There is already an open PrestaShop issue for huge CPU/database usage spikes with faceted search and many attributes. The report describes random category URLs with several faceted filters causing MariaDB CPU spikes and 524/500 errors, with the author suspecting Googlebot crawling possible faceted subpages.

That issue is focused mainly on performance. My case was similar, but the impact went further: the crawl storm went through Bunny CDN, Plesk/Fail2Ban blocked Bunny IPs, Googlebot started seeing 500 errors, and real URLs were deindexed.

This happened on PrestaShop 8. I have not tested the same case in PrestaShop 9 yet, so it is possible the behavior is different there. But for PrestaShop 8 stores using ps_facetedsearch, this is something I would definitely check.

So I think this should be treated as more than a slow-query issue. It is also an SEO, crawl-budget, and server-stability issue.

Related issue: https://github.com/PrestaShop/PrestaShop/issues/34415

Final setup

The final setup is:

  1. Products, categories, CMS pages, and blog posts stay crawlable.
  2. Filter URLs are removed from normal href links.
  3. Filter functionality still works for users.
  4. robots.txt disallows filter parameters.
  5. Apache returns 410 for bot requests to filter URLs.
  6. CDN/WAF blocks heavy bot filter requests before they hit the server.
  7. Plesk / Fail2Ban does not block Bunny CDN IPs.
  8. Googlebot gets 200 responses again on real pages.
  9. Clean sitemaps are submitted for reindexing.

Ideally, PrestaShop should provide this as a core/module option in ps_facetedsearch. Merchants should not have to override templates manually to prevent crawl traps.

A good built-in option would be:

  • render filters as buttons/spans instead of crawlable href links
  • keep filter URLs in data-search-url for AJAX
  • allow only selected facets to be crawlable
  • prevent multi-filter combinations from being exposed as links
  • provide built-in canonical/noindex controls for filtered pages

To recap

Do not block the whole bot. Block the useless URLs. Do not let Plesk block your CDN IPs. Make sure Googlebot gets 200 responses on real pages again.

This keeps the real shop pages available for Google, Bing, ChatGPT, Claude, Facebook previews, and other useful crawlers, while preventing PrestaShop faceted search from creating a giant crawl trap.

Now the remaining job is recovery: keeping the server stable, making sure Googlebot no longer sees 500 errors, submitting clean sitemaps, and waiting for Google to crawl and reindex the affected URLs again.