Cloudflare Accuses Perplexity of Using Stealth Crawlers to Bypass Content Access Rules

TL;DR: Cloudflare, an internet infrastructure provider, accuses Perplexity of questionable indexing practices: using bots that masquerade as Google Chrome to reach content that sites had placed off-limits to its declared crawlers. According to Cloudflare, Perplexity still obtained detailed information about newly created test sites despite the blocking measures. In response, Cloudflare strengthened its protections and removed Perplexity from its list of verified bots.

Cloudflare, an internet infrastructure provider, claims to have identified questionable indexing practices by Perplexity to fuel what it calls "its conversational response engine." According to a report published on its official blog, the startup allegedly uses bots masquerading as Google Chrome on macOS to access content explicitly forbidden to its declared crawlers.
Cloudflare says it received complaints from customers who had specifically blocked Perplexity's crawlers, either through robots.txt files or through web application firewall (WAF) rules, yet found that the company still had access to their content.
To investigate, Cloudflare ran a series of tests: it created new sites and applied the same access restrictions against Perplexity's declared bots. The domains were newly registered and had not been indexed by any search engine. Despite this, Perplexity was able to provide detailed information about the hosted content.
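To illustrate why a robots.txt block depends on a crawler identifying itself honestly, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `is_allowed` helper, the example URL, and the Chrome user-agent string are all assumptions made for illustration; they are not taken from Cloudflare's test sites.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks Perplexity's two declared crawlers
# and allows everyone else. Not the file Cloudflare actually used.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative generic browser string (an assumption, not an observed value).
CHROME_ON_MACOS = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

if __name__ == "__main__":
    page = "https://example.com/private-page"
    for agent in ("PerplexityBot", "Perplexity-User", CHROME_ON_MACOS):
        print(f"{agent[:30]!r}: allowed={is_allowed(agent, page)}")
```

The rules match on the name a client declares, so a request that presents itself as an ordinary browser falls under the catch-all `*` entry. robots.txt is also purely advisory: it tells a well-behaved crawler what to skip but does not enforce anything by itself.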
Cloudflare reports that when PerplexityBot and Perplexity-User were blocked, the platform adapted its methods to circumvent the blocks: it modified the user-agent (the identification string a client sends to a website to indicate who it is), rotated IP addresses, and switched between ASNs (autonomous system numbers).
The company notes that the IP addresses used were not within the range officially communicated by Perplexity, adding that "this activity was observed across tens of thousands of domains and millions of requests per day."
Emphasizing that the web operates on trust, Cloudflare removed Perplexity from its list of verified bots and strengthened its protections to block stealth crawlers.
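
The two signals discussed above, the declared user-agent and the officially published IP range, can be combined in a simple check. The following is only a sketch under stated assumptions: the `PUBLISHED_RANGES` value is a placeholder documentation network rather than Perplexity's real range, and `classify_request` is a hypothetical helper, not Cloudflare's detection logic.

```python
import ipaddress

# Crawler names as given in the article.
DECLARED_AGENTS = ("PerplexityBot", "Perplexity-User")

# Placeholder range (RFC 5737 documentation network), NOT a real published range.
PUBLISHED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def in_published_range(ip: str) -> bool:
    """Return True if ip falls inside any of the published networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PUBLISHED_RANGES)


def classify_request(user_agent: str, ip: str) -> str:
    """Label a request by whether its user-agent and source IP match the declared crawler."""
    declares = any(name.lower() in user_agent.lower() for name in DECLARED_AGENTS)
    listed = in_published_range(ip)
    if declares and listed:
        return "declared crawler"
    if declares:
        return "declared agent, unlisted IP"
    if listed:
        return "listed IP, undeclared agent"
    return "unattributed"


if __name__ == "__main__":
    print(classify_request("PerplexityBot/1.0", "192.0.2.10"))
    print(classify_request("Mozilla/5.0 Chrome/124.0.0.0", "203.0.113.5"))
```

A request with a generic browser user-agent from an address outside the published range lands in the "unattributed" bucket, which is why these two signals alone cannot tie such traffic to a particular operator.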

Perplexity's Defense

Perplexity denies the accusations of stealth collection and of bypassing robots.txt. It claims that, unlike traditional crawlers, its agents operate only at a user's request and do not index or store data. According to Perplexity, Cloudflare's analysis rests on a technical confusion between its different services and on a profound misunderstanding of how AI agents work, which in Perplexity's view calls into question Cloudflare's ability to judge what counts as legitimate traffic.