Adding User-Agent restrictions to Cloudflare Pages

1: Introduction
A User-Agent is a short piece of text that a client typically sends with each HTTP request to identify itself to the server. You’ll find it in the User-Agent header, and it typically describes:

* What type of client is making the request (browser, bot, script, mobile app, CLI tool)
* The client’s version (e.g., Chrome 121, curl 8.0, PostmanRuntime/7.x)
* The operating system or environment (Windows, macOS, Linux, Android, iOS)

Restricting access to Cloudflare Pages by User-Agent is a simple way to control how clients interact with your site. With Cloudflare Functions built directly into Pages, you can inspect incoming requests and decide exactly which User-Agents to allow, block, or redirect, all before the request ever reaches your content. Because the header is set by the client and can be spoofed, this is not a replacement for strong authentication. It is still very useful for lightweight protection, staging environments, internal tooling, and bot filtering, especially when you want something fast that runs entirely at the edge. It also matters more and more in the world of AI, where some bots no longer respect robots.txt and a server-side check is the only rule they cannot simply ignore.
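To make the three parts of a User-Agent more concrete, here is a minimal sketch that sorts a few sample strings into rough categories. The classifyUserAgent helper and its patterns are illustrative assumptions, not an official or complete list, and real-world strings vary a lot.

```javascript
// Hypothetical helper: roughly classify a User-Agent string.
// The patterns below are illustrative assumptions, not a complete list.
function classifyUserAgent(ua) {
  if (!ua) return "unknown";
  // Check bots first: many crawlers also start with "Mozilla/5.0".
  if (/bot|crawler|spider/i.test(ua)) return "bot";
  // Command-line and API tools usually lead with "name/version".
  if (/^(curl|wget|PostmanRuntime|python-requests)\//i.test(ua)) return "tool";
  // Mainstream browsers start with "Mozilla/5.0".
  if (/^Mozilla\/5\.0/.test(ua)) return "browser";
  return "unknown";
}

const samples = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36",
  "curl/8.0.1",
  "PostmanRuntime/7.36.0",
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
];

for (const ua of samples) {
  console.log(classifyUserAgent(ua), "<-", ua);
}
```

Keyword matching like this is only a heuristic, since any client can send any string it likes.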

2: Common use cases (bots, internal tools, whitelisting apps)
So, you’re probably thinking: why would I ever need that? If I want to restrict who can access my content, I could just set up a server with SSR and restrict access there. That is possible, but in some scenarios it is far more than you need. Here are some use cases I came up with:

* Blocking scrapers, spam bots, and automated scanners that identify themselves with predictable User-Agents.
* Allowing only specific internal tools, scripts, or automation systems to access parts of your Cloudflare Pages site.
* Restricting staging or preview deployments so only approved tools (e.g., CI pipelines, QA tools) can fetch them.
* Filtering out browsers entirely and allowing only programmatic User-Agents to hit endpoints designed strictly for automation.
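As one illustration, the staging/preview use case could be sketched roughly like this. It assumes preview deployments are served from a subdomain of your project's pages.dev hostname, and the tool names in approvedTools are hypothetical placeholders, so treat it as a starting point rather than a finished implementation.

```javascript
// functions/_middleware.js (sketch)

// Hypothetical User-Agents for approved tools; replace with your own.
const approvedTools = [/my-ci-runner/i, /qa-checker/i];

// Assumption: preview deployments live on a subdomain of <project>.pages.dev,
// so their hostname has at least four dot-separated parts.
function isPreviewHost(hostname) {
  return hostname.endsWith(".pages.dev") && hostname.split(".").length > 3;
}

function isApprovedTool(ua) {
  return approvedTools.some((p) => p.test(ua));
}

export async function onRequest(context) {
  const { hostname } = new URL(context.request.url);
  const ua = context.request.headers.get("User-Agent") || "";

  // Only preview deployments are restricted; production is left untouched.
  if (isPreviewHost(hostname) && !isApprovedTool(ua)) {
    return new Response("Access blocked.", { status: 403 });
  }
  return context.next();
}
```

Keeping the hostname and User-Agent checks in small helper functions makes each rule easy to adjust on its own.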

3: How Cloudflare Pages Handles Requests
Cloudflare Pages is built around serving static assets at the edge. To extend that behavior, Cloudflare includes Functions, which act as serverless logic that runs before your static content is served.

4: Overview of Cloudflare Pages architecture

* Your static files (HTML, CSS, JS, images, etc.) are deployed globally across Cloudflare’s edge network.
* Each request is routed to the nearest data center and served from there.

Where Cloudflare Functions sit in the request flow
If you add Functions, Cloudflare uses them to intercept and handle requests before passing control to the Pages content.
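A minimal pass-through middleware shows that ordering in practice: the Function sees the request first, calls context.next() to let Pages serve the content, and can then touch the response on its way out. The X-Handled-By header is a made-up name used purely for illustration.

```javascript
// functions/_middleware.js (sketch)
export async function onRequest(context) {
  // 1. The Function runs first and can inspect the incoming request.
  const ua = context.request.headers.get("User-Agent") || "(none)";
  console.log(`Incoming request from: ${ua}`);

  // 2. context.next() passes control onward, ultimately to the static content.
  const response = await context.next();

  // 3. Copy the response so its headers are mutable, then tag it.
  const tagged = new Response(response.body, response);
  tagged.headers.set("X-Handled-By", "pages-middleware"); // hypothetical header
  return tagged;
}
```

Returning a Response without ever calling context.next() is what lets a middleware block a request before any content is served.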

5: Setting Up User-Agent Restrictions
This setup relies on Cloudflare Functions, which let an otherwise static, serverless site run middleware. To set it up, create a /functions folder in the root of your repository; Cloudflare picks it up automatically on deploy. Inside that folder, create a _middleware.js file, which applies its code to the entire website (don’t worry, we will add specific paths later!). Then add this code to your _middleware.js file:

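export async function onRequest(context) {
  // Read the User-Agent header; fall back to "" if the client sent none.
  const ua = context.request.headers.get("User-Agent") || "";

  // Allowlist: only User-Agents matching one of these patterns get through.
  const allowedPatterns = [
    /okhttp/i,
    /reactnative/i
  ];

  const isAllowed = allowedPatterns.some(p => p.test(ua));

  // Block any non-empty User-Agent that is not on the allowlist.
  // Note: requests with no User-Agent at all still pass; drop the `ua &&`
  // check if those should be blocked too.
  if (ua && !isAllowed) {
    return new Response("Access blocked.", { status: 403 });
  }

  // Otherwise hand the request on to the static site.
  return context.next();
}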
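If you only want to protect part of the site, one option is to check the request path before applying the User-Agent rule. The sketch below assumes two hypothetical prefixes, /api/ and /internal/, and reuses the same allowlist patterns. Unlike the middleware above, it also blocks requests that send no User-Agent at all.

```javascript
// functions/_middleware.js (sketch)

// Hypothetical path prefixes to protect; everything else stays open.
const protectedPrefixes = ["/api/", "/internal/"];

// Same allowlist patterns as the site-wide middleware.
const allowedPatterns = [/okhttp/i, /reactnative/i];

function isProtectedPath(pathname) {
  return protectedPrefixes.some((prefix) => pathname.startsWith(prefix));
}

export async function onRequest(context) {
  const { pathname } = new URL(context.request.url);

  // Unprotected paths skip the User-Agent check entirely.
  if (!isProtectedPath(pathname)) {
    return context.next();
  }

  const ua = context.request.headers.get("User-Agent") || "";
  const isAllowed = allowedPatterns.some((p) => p.test(ua));

  // An empty User-Agent matches no pattern, so it is blocked here too.
  if (!isAllowed) {
    return new Response("Access blocked.", { status: 403 });
  }
  return context.next();
}
```

As far as I know, placing a _middleware.js inside a subfolder of /functions scopes it to the routes under that folder, which may be simpler than checking paths by hand.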

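6: How to add a custom User-Agent block
The allowedPatterns array holds the User-Agent patterns as regular expressions, so each one is matched by searching for a keyword; you can expand or shrink the array as your needs change. Keep in mind that the middleware above is an allowlist: it blocks every User-Agent that does not match. To ban specific User-Agents instead, swap the array for something like const blockedPatterns = [/gpt/i, /bot/i]; and return the 403 response when a pattern does match rather than when none do. With those two patterns, any GPT scraper is banned.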
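A blocklist version of the middleware might look like the sketch below. The blockedPatterns name and the isBlocked helper are my own choices for illustration; only the two regular expressions come from the example above.

```javascript
// functions/_middleware.js (sketch)

// User-Agent keywords to block; extend or trim this list as needed.
const blockedPatterns = [/gpt/i, /bot/i];

function isBlocked(ua) {
  return blockedPatterns.some((p) => p.test(ua));
}

export async function onRequest(context) {
  const ua = context.request.headers.get("User-Agent") || "";

  // Reject the request as soon as any blocked keyword appears.
  if (isBlocked(ua)) {
    return new Response("Access blocked.", { status: 403 });
  }

  // Everything else, including an empty User-Agent, passes through.
  return context.next();
}
```

Note that /bot/i is broad: it also matches search engine crawlers such as Googlebot, so narrow the pattern if you still want to be indexed.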