How to block bots from OpenAI GPTBot

Learn how to use the Vercel WAF to block, rate limit, or challenge traffic from OpenAI GPTBot.
Last updated on February 25, 2025

OpenAI operates a web crawler, known as “GPTBot”, that fetches website content, which may then be used to train models such as the GPT series and the o series. According to our research, OpenAI's crawler generates hundreds of millions of requests per month, making it the most active AI crawler on the web.

Bots broadly fall into two groups:

  • Good Bots: Examples include Googlebot, Applebot, and Bingbot. They’re typically transparent about their intentions, respect robots.txt, and help improve search engine visibility.
  • Bad Bots: These bots may scrape content without permission, inflate server usage, or perform malicious activities. Even certain “legitimate” AI crawlers can become unwanted if they exceed fair-use limits or repeatedly crawl error pages (e.g., excessive 404 requests).
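To judge whether a crawler is repeatedly hitting error pages, it can help to tally requests and 404 responses per user agent. The following is a minimal sketch, assuming you have exported request logs into objects with `userAgent` and `status` fields. The `CrawlLogEntry` and `AgentStats` types and the `summarizeByAgent` function are hypothetical names for illustration, not part of any Vercel API.

```typescript
// Hypothetical log shape; adapt the field names to your own log export.
interface CrawlLogEntry {
  userAgent: string;
  status: number;
}

interface AgentStats {
  total: number;
  notFound: number;
}

// Counts total requests and 404 responses for each distinct user agent.
function summarizeByAgent(entries: CrawlLogEntry[]): Map<string, AgentStats> {
  const stats = new Map<string, AgentStats>();
  for (const entry of entries) {
    const current = stats.get(entry.userAgent) ?? { total: 0, notFound: 0 };
    current.total += 1;
    if (entry.status === 404) {
      current.notFound += 1;
    }
    stats.set(entry.userAgent, current);
  }
  return stats;
}
```

A user agent whose `notFound` count is a large share of its `total` is a reasonable candidate for a rate limit or block rule.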

We'll focus on the OpenAI GPTBot here.

Vercel Firewall (WAF) is a Web Application Firewall service that lets you:

  • Log or block requests that match certain criteria (IP address, user agent, request path, geolocation, etc.).
  • Challenge suspicious traffic with an automated check (e.g., requiring the visitor’s browser to pass a JavaScript challenge).
  • Rate limit excessive or malicious requests.

All configuration changes are applied globally within ~300ms, and can be rolled back instantly.

GPTBot identifies itself in the User-Agent header. Check your Firewall traffic for entries like Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot).

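If you see repeated requests with that User-Agent, the traffic is most likely from OpenAI, although the header can be set by any client. Note that AI crawlers occasionally change their user agent strings over time, so match on the “GPTBot” token rather than on the full string.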
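If you want to flag these requests in your own log tooling, a small matcher on the “GPTBot” token is usually enough. This is a minimal sketch: `isGptBot` and `GPTBOT_PATTERN` are hypothetical names, and the assumption that the token may be followed by a `/<version>` suffix is based only on the example string above.

```typescript
// Hypothetical helper, not part of any Vercel or OpenAI API.
// Matches the "GPTBot" token as a whole word, optionally followed by "/<version>".
const GPTBOT_PATTERN = /\bGPTBot(?:\/\d+(?:\.\d+)*)?\b/i;

function isGptBot(userAgent: string | null | undefined): boolean {
  if (!userAgent) {
    return false; // a missing or empty header cannot be identified as GPTBot
  }
  return GPTBOT_PATTERN.test(userAgent);
}
```

Matching on the token instead of the full header keeps the check working if the version number or the surrounding text changes.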

Vercel Firewall supports three approaches: rate limiting, challenging, and blocking.

If you don’t mind occasional indexing by OpenAI models, but want to cap how frequently it crawls your site, you can rate limit requests from the “GPTBot” user agent. This helps prevent resource overload while still allowing some AI-based traffic.
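To make the idea of a cap concrete, here is a minimal sketch of a fixed-window rate limiter applied only to requests whose user agent contains “GPTBot”. It illustrates the concept and is not how Vercel Firewall implements rate limiting. `FixedWindowLimiter` and `shouldServe` are hypothetical names, and the limit and window values are up to you.

```typescript
// Illustrative fixed-window limiter; NOT Vercel's implementation.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private windowStart = Number.NEGATIVE_INFINITY;
  private count = 0;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if a request arriving at `nowMs` fits in the current window.
  allow(nowMs: number): boolean {
    if (nowMs - this.windowStart >= this.windowMs) {
      // The previous window has expired, so start a new one.
      this.windowStart = nowMs;
      this.count = 0;
    }
    if (this.count >= this.limit) {
      return false;
    }
    this.count += 1;
    return true;
  }
}

// Applies the limiter only to requests whose User-Agent contains "GPTBot".
function shouldServe(
  userAgent: string,
  nowMs: number,
  limiter: FixedWindowLimiter,
): boolean {
  if (!userAgent.includes("GPTBot")) {
    return true; // other traffic is not limited in this sketch
  }
  return limiter.allow(nowMs);
}
```

For example, `new FixedWindowLimiter(2, 60_000)` would let two GPTBot requests through per minute and reject the rest until the next window starts.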

If you want to slow down less sophisticated bots, you can challenge requests from certain user agents. The challenge forces a short browser-based security check. Most legitimate human traffic (with real browsers) will pass automatically, but a bot that can’t complete the challenge will be blocked.

If you want to fully prevent GPTBot from crawling your site, and avoid incurring data transfer or function usage for these requests, you can persistently block it. Requests that match your block rule won’t reach your Vercel Functions or static pages, so you won’t be charged for them.

Vercel provides several Firewall Templates you can clone or learn from:

  1. Rate Limit API Requests
  2. Block OFAC-Sanctioned Countries
  3. Block WordPress URLs
  4. Block Bad Bots
  5. Block AI Bots

To block OpenAI GPTBot specifically, you can start with the “Block AI Bots” Firewall Rule template and modify it to match only the GPTBot user agent.

  1. Navigate to “Firewall” in your Vercel project dashboard.
  2. Click “Add Rule” (or “Create Rule” if using templates).
  3. Select the “Block AI Bots” template (or a blank custom rule).
  4. Match Condition: Set “User Agent” contains “GPTBot”.
  5. Action: Choose “Deny” (to block), “Challenge” (to verify), or “Rate Limit” (to limit).
  6. Review & Publish changes.
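Conceptually, the rule configured in the steps above pairs a match condition with an action. The sketch below models that logic so you can reason about which requests a rule would catch. The `FirewallRule`, `FirewallAction`, and `IncomingRequest` types and the `evaluate` function are hypothetical and do not reflect Vercel's actual rule schema. The case-insensitive comparison is also an assumption.

```typescript
// Hypothetical, simplified rule model; not Vercel's actual schema.
type FirewallAction = "deny" | "challenge" | "rate_limit" | "allow";

interface FirewallRule {
  field: "user_agent" | "path";
  contains: string;
  action: FirewallAction;
}

interface IncomingRequest {
  userAgent: string;
  path: string;
}

// Returns the action of the first matching rule, or "allow" if none match.
function evaluate(rules: FirewallRule[], req: IncomingRequest): FirewallAction {
  for (const rule of rules) {
    const value = rule.field === "user_agent" ? req.userAgent : req.path;
    // Assumption: "contains" is treated as case-insensitive in this sketch.
    if (value.toLowerCase().includes(rule.contains.toLowerCase())) {
      return rule.action;
    }
  }
  return "allow";
}

// A rule equivalent to steps 4 and 5 with the "Deny" action selected.
const blockGptBot: FirewallRule = {
  field: "user_agent",
  contains: "GPTBot",
  action: "deny",
};
```

Swapping `action` to `"challenge"` or `"rate_limit"` corresponds to the other two choices in step 5.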

Note: Once published, changes take effect globally in ~300ms. You can always roll back if you block or challenge traffic unintentionally.

