Blogroll: CloudFlare

I read blogs, as well as write one. The 'blogroll' on this site reproduces some posts from some of the people I enjoy reading. There are currently 19 posts from the blog 'CloudFlare.'

Disclaimer: Reproducing an article here need not necessarily imply agreement or endorsement!

Subscribe to CloudFlare feed CloudFlare
Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet.
Updated: 1 year 3 weeks ago

Reduce origin load, save on cloud egress fees, and maximize cache hits with Cache Reserve

Tue, 15/11/2022 - 14:00
Reduce origin load, save on cloud egress fees, and maximize cache hits with Cache ReserveReduce origin load, save on cloud egress fees, and maximize cache hits with Cache Reserve

Earlier this year, we introduced Cache Reserve. Cache Reserve helps users serve content from Cloudflare’s cache for longer by using R2’s persistent data storage. Serving content from Cloudflare’s cache benefits website operators by reducing their bills for egress fees from origins, while also benefiting website visitors by having content load faster.

Cache Reserve has been in closed beta for a few months while we’ve collected feedback from our initial users and continued to develop the product. After several rounds of iterating on this feedback, today we’re extremely excited to announce that Cache Reserve is graduating to open beta – users will now be able to test it and integrate it into their content delivery strategy without any additional waiting.

If you want to see the benefits of Cache Reserve for yourself and give us some feedback– you can go to the Cloudflare dashboard, navigate to the Caching section and enable Cache Reserve by pushing one button.

How does Cache Reserve fit into the larger picture?

Content served from Cloudflare’s cache begins its journey at an origin server, where the content is hosted. When a request reaches the origin, the origin compiles the content needed for the response and sends it back to the visitor.

The distance between the visitor and the origin can affect the performance of the asset as it may travel a long distance for the response. This is also where the user is charged a fee to move the content from where it’s stored on the origin to the visitor requesting the content. These fees, known as “bandwidth” or “egress” fees, are familiar monthly line items on the invoices for users that host their content on cloud providers.

Reduce origin load, save on cloud egress fees, and maximize cache hits with Cache Reserve

Cloudflare’s CDN sits between the origin and visitor and evaluates the origin’s response to see if it can be cached. If it can be added to Cloudflare’s cache, then the next time a request comes in for that content, Cloudflare can respond with the cached asset, which means there's no need to send the request to the origin– reducing egress fees for our customers. We also cache content in data centers close to the visitor to improve the performance and cut down on the transit time for a response.

To help assets remain cached for longer, a few years ago we introduced Tiered Cache which organizes all of our 250+ global data centers into a hierarchy of lower-tiers (generally closer to visitors) and upper-tiers (generally closer to origins). When a request for content cannot be served from a lower-tier’s cache, the upper-tier is checked before going to the origin for a fresh copy of the content. Organizing our data centers into tiers helps us cache content in the right places for longer by putting multiple caches between the visitor’s request and the origin.

Why do cache misses occur?
Misses occur when Cloudflare cannot serve the content from cache and must go back to the origin to retrieve a fresh copy. This can happen when a customer sets the cache-control time to signify when the content is out of date (stale) and needs to be revalidated. The other element at play – how long the network wants content to remain cached – is a bit more complicated and can fluctuate depending on eviction criteria.

CDNs must consider whether they need to evict content early to optimize storage of other assets when cache space is full. At Cloudflare, we prioritize eviction based on how recently a piece of cached content was requested by using an algorithm called “least recently used” or LRU. This means that even if cache-control signifies that a piece of content should be cached for many days, we may still need to evict it earlier (if it is least-requested in that cache) to cache more popular content.

This works well for most customers and website visitors, but is often a point of confusion for people wondering why content is unexpectedly displaying a miss. If eviction did not happen then content would need to be cached in data centers that were further away from visitors requesting that data, harming the performance of the asset and injecting inefficiencies into how Cloudflare’s network operates.

Some customers, however, have large libraries of content that may not be requested for long periods of time. Using the traditional cache, these assets would likely be evicted and, if requested again, served from the origin. Keeping assets in cache requires that they remain popular on the Internet which is hard given what’s popular or current is constantly changing. Evicting content that becomes cold means additional origin egress for the customer if that content needs to be pulled repeatedly from the origin.

Reduce origin load, save on cloud egress fees, and maximize cache hits with Cache Reserve

Enter Cache Reserve
This is where Cache Reserve shines. Cache Reserve serves as the ultimate upper-tier data center for content that might otherwise be evicted from cache. Once admitted to Cache Reserve, content can be stored for a much longer period of time– 30 days by default. If another request comes in during that period, it can be extended for another 30 days (and so on) or until cache-control signifies that we should no longer serve that content from cache. Cache Reserve serves as a safety net to backstop all cacheable content, so customers don't have to worry about unwanted cache eviction and origin egress fees.

How does Cache Reserve save egress?

The promise of Cache Reserve is that hit ratios will increase and egress fees from origins will decrease for long tail content that is rarely requested and may be evicted from cache.

However, there are additional egress savings built into the product. For example, objects are written to Cache Reserve on misses. This means that when fetching the content from the origin on a cache miss, we both use that to respond to a request while also writing the asset to Cache Reserve, so customers won’t experience egress from serving that asset for a long time.

Cache Reserve is designed to be used with tiered cache enabled for maximum origin shielding. When there is a cache miss in both the lower and upper tiers, Cache Reserve is checked and if there is a hit, the response will be cached in both the lower and upper tier on its way back to the visitor without the origin needing to see the request or serve any additional data.

Cache Reserve accomplishes these origin egress savings for a low price, based on R2 costs. For more information on Cache Reserve prices and operations, please see the documentation here.

Scaling Cache Reserve on Cloudflare’s developer platform

When we first announced Cache Reserve, the response was overwhelming. Over 20,000 users wanted access to the beta, and we quickly made several interesting discoveries about how people wanted to use Cache Reserve.

The first big challenge we found was that users hated egress fees as much as we do and wanted to make sure that as much content as possible was in Cache Reserve. During the closed beta we saw usage above 8,000 PUT operations per second sustained, and objects served at a rate of over 3,000 GETs per second. We were also caching around 600Tb for some of our large customers. We knew that we wanted to open the product up to anyone that wanted to use it and in order to scale to meet this demand, we needed to make several changes quickly. So we turned to Cloudflare’s developer platform.

Cache Reserve stores data on R2 using its S3-compatible API. Under the hood, R2 handles all the complexity of an object storage system using our performant and scalable developer primitives: Workers and Durable Objects. We decided to use developer platform tools because it would allow us to implement different scaling strategies quickly. The advantage of building on the Cloudflare developer platform is that Cache Reserve was easily able to experiment to see how we could best distribute the high load we were seeing, all while shielding the complexity of how Cache Reserve works from users.  

With the single press of a button, Cache Reserve performs these functions:

  • On a cache miss, Pingora (our new L7 proxy) reaches out to the origin for the content and writes the response to R2. This happens while the content continues its trip back to the visitor (thereby avoiding needless latency).
  • Inside R2, a Worker writes the content to R2’s persistent data storage while also keeping track of the important metadata that Pingora sends about the object (like origin headers, freshness values, and retention information) using Durable Objects storage.
  • When the content is next requested, Pingora looks up where the data is stored in R2 by computing the cache key. The cache key’s hash determines both the object name in R2 and which bucket it was written to, as each zone’s assets are sharded across multiple buckets to distribute load.
  • Once found, Pingora attaches the relevant metadata and sends the content from R2 to the nearest upper-tier to be cached, then to the lower-tier and finally back to the visitor.
Reduce origin load, save on cloud egress fees, and maximize cache hits with Cache Reserve

This is magic! None of the above needs to be managed by the user. By bringing together R2, Workers, Durable Objects, Pingora, and Tiered Cache we were able to quickly build and make changes to Cache Reserve to scale as needed…

What’s next for Cache Reserve

In addition to the work we’ve done to scale Cache Reserve, opening the product up also opens the door to more features and integrations across Cloudflare. We plan on putting additional analytics and metrics in the hands of Cache Reserve users, so they know precisely what’s in Cache Reserve and how much egress it’s saving them. We also plan on building out more complex integrations with R2 so if customers want to begin managing their storage, they are able to easily make that transition. Finally, we’re going to be looking into providing more options for customers to control precisely what is eligible for Cache Reserve. These features represent just the beginning for how customers will control and customize their cache on Cloudflare.

What’s some of the feedback been so far?As a long time Cloudflare customer, we were eager to deploy Cache Reserve to provide cost savings and improved performance for our end users. Ensuring our application always performs optimally for our global partners and delivery riders is a primary focus of Delivery Hero. With Cache Reserve our cache hit ratio improved by 5% enabling us to scale back our infrastructure and simplify what is needed to operate our global site and provide additional cost savings.
Wai Hang Tang, Director of Engineering at Delivery HeroAnthology uses Cloudflare's global cache to drastically improve the performance of content for our end users at schools and universities. By pushing a single button to enable Cache Reserve, we were able to provide a great experience for teachers and students and reduce two-thirds of our daily egress traffic.
Paul Pearcy, Senior Staff Engineer at AnthologyAt Enjoei we’re always looking for ways to help make our end-user sites faster and more efficient. By using Cloudflare Cache Reserve, we were able to drastically improve our cache hit ratio by more than 10% which reduced our origin egress costs. Cache Reserve also improved the performance for many of our merchants’ sites in South America, which improved their SEO and discoverability across the Internet (Google, Criteo, Facebook, Tiktok)– and it took no time to set it up.
Elomar Correia, Head of DevOps SRE | Enterprise Solutions Architect at EnjoeiIn the live events industry, the size and demand for our cacheable content can be extremely volatile, which causes unpredictable swings in our egress fees. Additionally, keeping data as close to our users as possible is critical for customer experience in the high traffic and low bandwidth scenarios our products are used in, such as conventions and music festivals. Cache Reserve helps us mitigate both of these problems with minimal impact on our engineering teams, giving us more predictable costs and lower latency than existing solutions.
Jarrett Hawrylak, VP of Engineering | Enterprise Ticketing at Patron TechnologyHow can I use it today?

As of today, Cache Reserve is in open beta, meaning that it’s available to anyone who wants to use it.

To use the Cache Reserve:

  • Simply go to the Caching tile in the dashboard.
  • Navigate to the Cache Reserve page and push the enable data sync button (or purchase button).

Enterprise Customers can work with their Cloudflare Account team to access Cache Reserve.

Customers can ensure Cache Reserve is working by looking at the baseline metrics regarding how much data is cached and how many operations we’ve seen in the Cache Reserve section of the dashboard. Specific requests served by Cache Reserve are available by using Logpush v2 and finding HTTP requests with the field “CacheReserveUsed.”

We will continue to make sure that we are quickly triaging the feedback you give us and making improvements to help ensure Cache Reserve is easy to use, massively beneficial, and your choice for reducing egress fees for cached content.

Reduce origin load, save on cloud egress fees, and maximize cache hits with Cache ReserveTry it out

We’ve been so excited to get Cache Reserve in more people’s hands. There will be more exciting developments to Cache Reserve as we continue to invest in giving you all the tools you need to build your perfect cache.

Try Cache Reserve today and let us know what you think.

Categories: Technology

Indexing millions of HTTP requests using Durable Objects

Tue, 15/11/2022 - 14:00
Indexing millions of HTTP requests using Durable ObjectsIndexing millions of HTTP requests using Durable Objects

Our customers rely on their Cloudflare logs to troubleshoot problems and debug issues. One of the biggest challenges with logs is the cost of managing them, so earlier this year, we launched the ability to store and retrieve Cloudflare logs using R2.

In this post, I’ll explain how we built the R2 Log Retrieval API using Cloudflare Workers with a focus on Durable Objects and the Streams API. Using these, allows a customer to index and query millions of their Cloudflare logs stored in batches on R2.

Before we dive into the internals you might be wondering why one doesn't just use a traditional database to index these logs? After all, databases are a well proven technology. Well, the reason is that individual developers or companies, both large and small, often don't have the resources necessary to maintain such a database and the surrounding infrastructure to create this kind of setup.

Our approach instead relies on Durable Objects to maintain indexes of the data stored in R2, removing the complexity of managing and maintaining your own database. It was also super easy to add Durable Objects to our existing Workers code with just a few lines of config and some code. And R2 is very economical.


Indexing data is often used to reduce the lookup time for a query by first pre-processing the data and computing an index – usually a file (or set of files) that have a known structure which can be used to perform lookups on the underlying data. This approach makes lookups quick as indexes typically contain the answer for a given query, or at the very least, tells you how to find it. For this project we are going to index records by the unique identifier called a RayID which our customers use to identify an HTTP request in their logs, but this solution can be modified to index any many other types of data.

When indexing RayIDs for logs stored in R2, we choose an index structure that is fairly straightforward and is commonly known as a forward-index. This type of index consists of a key-value mapping between a document's name and a list of words contained in that document. The terms "document" and "words" are meant to be generic, and you get to define what a document and a word is.

In our case, a document is a batch of logs stored in R2 and the words are RayIDs contained within that document. For example, our index currently looks like this:

Indexing millions of HTTP requests using Durable ObjectsBuilding an index using Durable Objects and the Streams API

In order to maintain the state of our index, we chose to use Durable Objects. Each Durable Object has its own transactional key-value storage. Therefore, to store our index, we assign each key to be the document (batch) name and the value being a JSON-array of RayIDs within that document. When extracting the RayIDs for a given batch, updating the index becomes as simple as storage.put(batchName, rayIds). Likewise, getting all the RayIDs for a document is just a call to storage.get(batchName).

When performing indexing, since the batches are stored using compression and often with a 30-100x compression ratio, reading an entire batch into memory can lead to out-of-memory (OOM) errors in our Worker. To get around this, we use the Streams API to avoid the OOM errors by processing the data in smaller chunks. There are two types of streams available: byte-oriented and value-oriented. Byte-oriented streams operate at the byte-level for things such as compressing and decompressing data, while the value-orientated streams work on first class values in JavaScript. Numbers, strings, undefined, null, objects, you name it. If it's a valid JavaScript value, you can process it using a value-oriented stream. The Streams API also allows us to define our own JavaScript transformations for both byte- and value-oriented streams.

So, when our API receives a request to index a batch, our Worker streams the contents from R2 into a pipeline of TransformationStreams to decompress the data, decode the bytes into strings, split the data into records based on the newlines, and finally collect all the RayIDs. Once we've collected all the RayIDs, the data is then persisted in the index by making calls to the Durable Object, which in turn calls the aforementioned storage.put method to persist the data to the index. To illustrate what I mean, I include some code detailing the steps described above.

async function index(r2, bucket, key, storage) { const obj = await getObject(r2, bucket, key); const rawStream = obj.Body as ReadableStream; const index: Record<string, string[]> = {}; const rayIdIndexingStream = new TransformStream({ transform(chunk: string) { for (const match of chunk.matchAll(RAYID_FIELD_REGEX)) { const { rayid } = match.groups!; if (key in index) { index[key].push(rayid); } else { index[key] = [rayid]; } } } }); await collectStream(rawStream.pipeThrough(new DecompressionStream('gzip')).pipeThrough(textDecoderStream()).pipeThrough(readlineStream()).pipeThrough(rayIdIndexingStream)); storage.put(index); } Searching for a RayID

Once a batch has been indexed, and the index is persisted to our Durable Object, we can then query it using a combination of the RayID and a batch name to check the presence of a RayID in that given batch. Assuming that a zone is producing a batch of logs at the rate of one batch per second this means that over 86,400 batches would be produced in a day! Over the course of a week, there would be far too many keys in our index for us to be able to iterate through them all in a timely manner. This is where the encoding of a RayID and the naming of each batch comes into play.

A RayID is currently, and I emphasize currently because this can change and has over time, a 16-byte hex encoded value where the first 36-bits encode a timestamp, and it looks something like: 764660d8ec5c2ab1. Note that the format of the RayID is likely to evolve in the near future, but for now we can use it to optimize retrieval.

Each batch produced by Logpush also happens to encode the time the batch was started and completed. Last but not least, upon analysis of our logging pipeline we found that 95% of RayIDs can be found in a batch produced within five minutes of the request completing. (Note that the encoded time sets a lower bound of the batch time we need to search).

Indexing millions of HTTP requests using Durable ObjectsEncoding of a RayIDIndexing millions of HTTP requests using Durable ObjectsEncoding of a batch name

For example: say we have a request that was made on November 3 at 16:00:00 UTC. We only need to check the batches under the prefix 20221103 and those batches that contain the time range of 16:00:00 to 16:05:00 UTC.

Indexing millions of HTTP requests using Durable Objects

By reducing the number of batches to just a small handful of possibilities for a given RayID, we can simply ask our index if any of those batches contains the RayID by iterating through all the batch names (keys).

async function lookup(r2, bucket, prefix, start, end, rayid, storage) { const keys = await listObjects(r2, bucket, prefix, start, end); for (const key of keys) { const haystack: string[] | undefined = await storage.get(key); if (haystack && haystack.includes(rayid)) { return key } } return undefined }

If the RayID is found in a batch, we then stream the corresponding batch back from R2 using another TransformationStream pipeline to filter out any non-matching records from the final response. If no result was found, we return an error message saying we were unable to find the RayID.


To recap, we showed how we can use Durable Objects and their underlying storage to create and manage forward-indexes for performing efficient lookup on the RayID for potentially millions of records. All without needing to manage your own database or logging infrastructure or doing a full-scan of the data.

While this is just one possible use case for Durable Objects, we are just getting started. If you haven't read it before, checkout How we built Instant Logs to see another application of Durable Objects to stream millions of logs in real-time to your Cloudflare Dashboard!

What’s next

We currently offer RayID lookups for the HTTP Requests, Firewall Events, and soon support for Workers Trace Events. This is just the beginning for our Log Retrieval API, and we are already looking to add the ability to index and query on more types of fields such as status codes and host names. We also plan to integrate this into the dashboard so that developers can quickly retrieve the logs they need without needing to craft the necessary API calls by hand.

Last but not least, we want to make our retrieval functionality even more powerful and are looking at adding more complex types of filters and queries that you can run against your logs.

As always, stay tuned to the blog for more updates to our developer documentation for instructions on how to get started with log retrieval. If you’re interested in joining the beta, please email

Categories: Technology

Store and process your Cloudflare Logs... with Cloudflare

Tue, 15/11/2022 - 14:00
Store and process your Cloudflare Logs... with CloudflareStore and process your Cloudflare Logs... with Cloudflare

Millions of customers trust Cloudflare to accelerate their website, protect their network, or as a platform to build their own applications. But, once you’re running in production, how do you know what’s going on with your application? You need logs from Cloudflare – a record of what happened on our network when your customers interacted with your product that uses Cloudflare.

Cloudflare Logs are an indispensable tool for debugging applications, identifying security vulnerabilities, or just understanding how users are interacting with your product. However, our customers generate petabytes of logs, and store them for months or years at a time. Log data is tantalizing: all those answers, just waiting to be revealed with the right query! But until now, it’s been too hard for customers to actually store, search, and understand their logs without expensive and cumbersome third party tools.

Today we’re announcing Cloudflare Logs Engine: a new product to enable any kind of investigation with Cloudflare Logs — all within Cloudflare.

Starting today, Cloudflare customers who push their logs to R2 can retrieve them by time range and unique identifier. Over the coming months we want to enable customers to:

  • Store logs for any Cloudflare dataset, for as long as you want, with a few clicks
  • Access logs no matter what plan you use, without relying on third party tools
  • Write queries that include multiple datasets
  • Quickly identify the logs you need and take action based on what you find
Why Cloudflare Logs?

When it comes to visibility into your traffic, most customers start with analytics. Cloudflare dashboard is full of analytics about all of our products, which give a high-level overview of what’s happening: for example, number of requests served, the ratio of cache hits, or the amount of CPU time used.

But sometimes, more detail is needed. Developers especially need to be able to read individual log lines to debug applications. For example, suppose you notice a problem where your application throws an error in an unexpected way – you need to know the cause of that error and see every request with that pattern.

Cloudflare offers tools like Instant Logs and wrangler tail which excel at real-time debugging. These are incredibly helpful if you’re making changes on the fly, or if the problem occurs frequently enough that it will appear during your debugging session.

In other cases, you need to find that needle in a haystack — the one rare event that causes everything to go wrong. Or you might have identified a security issue and want to make sure you’ve identified every time that issue could have been exploited in your application’s history.

When this happens, you need logs. In particular, you need forensics: the ability to search the entire history of your logs.

A brief overview of log analysis

Before we take a look at Logs Engine itself, I want to briefly talk about alternatives – how have our customers been dealing with their logs so far?

Cloudflare has long offered Logpull and Logpush. Logpull enables enterprise customers to store their HTTP logs on Cloudflare for up to seven days, and retrieve them by either time or RayID. Logpush can send your Cloudflare logs just about anywhere on the Internet, quickly and reliably. While Logpush provides more flexibility, it’s been up to customers to actually store and analyze those logs.

Cloudflare has a number of partnerships with SIEMs and data warehouses/data lakes. Many of these tools even have pre-built Cloudflare dashboards for easy visibility. And third party tools have a big advantage in that you can store and search across many log sources, not just Cloudflare.

That said, we’ve heard from customers that they have some challenges with these solutions.

First, third party log tooling can be expensive! Most tools require that you pay not just for storage, but for indexing all of that data when it’s ingested. While that enables powerful search functionality later on, Cloudflare (by its nature) is often one of the largest emitters of logs that a developer will have. If you were to store and index every log line we generate, it can cost more money to analyze the logs than to deliver the actual service.

Second, these tools can be hard to use. Logs are often used to track down an issue that customers discover via analytics in the Cloudflare dashboard. After finding what you need in logs, it can be hard to get back to the right part of the Cloudflare dashboard to make the appropriate configuration changes.

Finally, Logpush was previously limited to Enterprise plans. Soon, we will start offering these services to customers at any scale, regardless of plan type or how they choose to pay.

Why Logs Engine?

With Logs Engine, we wanted to solve these problems. We wanted to build something affordable, easy to use, and accessible to any Cloudflare customer. And we wanted it to work for any Cloudflare logs dataset, for any span of time.

Our first insight was that to make logs affordable, we need to separate storage and compute. The cost of Storage is actually quite low! Thanks to R2, there’s no reason many of our customers can’t store all of their logs for long periods of time. At the same time, we want to separate out the analysis of logs so that customers only pay for the compute of logs they analyze – not every line ingested. While we’re still developing our query pricing, our aim is to be predictable, transparent and upfront. You should never be surprised by the cost of a query (or land a huge bill by accident).

It’s great to separate storage and compute. But, if you need to scan all of your logs anyway to answer the first question you have, you haven’t gained any benefits to this separation. In order to realize cost savings, it’s critical to narrow down your search before executing a query. That’s where our next big idea came in: a tight integration with analytics.

Most of the time, when analyzing logs, you don’t know what you’re looking for. For example, if you’re trying to find the cause of a specific origin status code, you may need to spend some time understanding which origins are impacted, which clients are sending them, and the time range in which these errors happened. Thanks to our ABR analytics, we can provide a good summary of the data very quickly – but not the exact details of what happened. By integrating with analytics, we can help customers narrow down their queries, then switch to Logs Engine once you know exactly what you’re looking for.

Finally, we wanted to make logs accessible to anyone. That means all plan types – not just Enterprise.

Additionally, we want to make it easy to both set up log storage and analysis, and also to take action on logs once you find problems. With Logs Engine, it will be possible to search logs right from the dashboard, and to immediately create rules based on the patterns you find there.

What’s available today and our roadmap

Today, Enterprise customers can store logs in R2 and retrieve them via time range. Currently in beta, we also allow customers to retrieve logs by RayID (see our companion blog post) — to join the beta, please email

Coming soon, we will enable customers on all plan types — not just Enterprise — to ingest logs into Logs Engine. Details on pricing will follow soon.

We also plan to build more powerful querying capability, beyond time range and RayID lookup. For example, we plan to support arbitrary filtering on any column, plus more expressive queries that can look across datasets or aggregate data.

But why stop at logs? This foundation lays the groundwork to support other types of data sources and queries one day. We are just getting started. Over the long term, we’re also exploring the ability to ingest data sources outside of Cloudflare and query them. Paired with Analytics Engine this is a formidable way to explore any data set in a cost-effective way!

Categories: Technology

Migrate from S3 easily with the R2 Super Slurper

Tue, 15/11/2022 - 14:00
Migrate from S3 easily with the R2 Super SlurperMigrate from S3 easily with the R2 Super Slurper

R2 is an S3-compatible, globally distributed object storage, allowing developers to store large amounts of unstructured data without the costly egress bandwidth fees you commonly find with other providers.

To enjoy this egress freedom, you’ll have to start planning to send all that data you have somewhere else into R2. You might want to do it all at once, moving as much data as quickly as possible while ensuring data consistency. Or do you prefer moving the data to R2 slowly and gradually shifting your reads from your old provider to R2? And only then decide whether to cut off your old storage or keep it as a backup for new objects in R2?

There are multiple options for architecture and implementations for this movement, but taking terabytes of data from one cloud storage provider to another is always problematic, always involves planning, and likely requires staffing.

And that was hard. But not anymore.

Today we're announcing the R2 Super Slurper, the feature that will enable you to move all your data to R2 in one giant slurp or sip by sip — all in a friendly, intuitive UI and API.

Migrate from S3 easily with the R2 Super SlurperThe first step: R2 Super Slurper Private BetaOne giant slurp

The very first iteration of the R2 Super Slurper allows you to target an S3 bucket and import the objects you have stored there into your R2 bucket. It's a simple, one-time import that covers the most common scenarios. Point to your existing S3 source, grant the R2 Super Slurper permissions to read the objects you want to migrate, and an asynchronous job will take care of the rest.

Migrate from S3 easily with the R2 Super Slurper

You'll also be able to save the definitions and credentials to access your source bucket, so you can migrate different folders from within the bucket, in new operations, without having to define URLs and credentials all over again. This operation alone will save you from scripting your way through buckets with many paths you’d like to validate for consistency.  During the beta stages — with your feedback — we will evolve the R2 Super Slurper to the point where anyone can achieve an entirely consistent, super slurp, all with the click of just a few buttons.

Automatic sip by sip migration

Other future development includes automatic sip by sip migration, which provides a way to incrementally copy objects to R2 as they get requested from an end-user. It allows you to start serving objects from R2 as they migrate, saving you money immediately.

Migrate from S3 easily with the R2 Super Slurper

The flow of the requests and object migration will look like this:

  • Check for Object — A request arrives at Cloudflare (1), and we check the R2 bucket for the requested object (2). If the object exists, R2 serves it (3).
  • Copy the Object — If the object does not exist in R2, a request for the object flows to the origin bucket (2a). Once there's an answer with an object, we serve it and copy it into R2 (2b).
  • Serve the Object — R2 serves all future requests for the object (3).

With this capability you can copy your objects, previously scattered through one or even multiple buckets from other vendors, while ensuring that everything requested from the end-user side gets served from R2. And because you will only need to use the R2 Super Slurper to sip the object from elsewhere on the first request, you will start saving on those egress fees for any subsequent ones.

We are currently targeting S3-compatible buckets for now, but you can expect other sources to become available during 2023.

Join the waitlist for the R2 Super Slurper private beta

To access the R2 Super Slurper, you must be an R2 user first and sign up for the R2 Super Slurper waitlist here.

We will collaborate closely with many early users in the private beta stage to refine and test the service . Soon, we'll announce an open beta where users can sign up for the service.

Make sure to join our Discord server and get in touch with a fantastic community of users and Cloudflare staff for all R2-related topics!

Categories: Technology

Get started with Cloudflare Workers with ready-made templates

Tue, 15/11/2022 - 14:00
Get started with Cloudflare Workers with ready-made templatesGet started with Cloudflare Workers with ready-made templates

One of the things we prioritize at Cloudflare is enabling developers to build their applications on our developer platform with ease. We’re excited to share a collection of ready-made templates that’ll help you start building your next application on Workers. We want developers to get started as quickly as possible, so that they can focus on building and innovating and avoid spending so much time configuring and setting up their projects.

Introducing Cloudflare Workers Templates

Cloudflare Workers enables you to build applications with exceptional performance, reliability, and scale. We are excited to share a collection of templates that helps you get started quickly and give you an idea of what is possible to build on our developer platform.

We have made available a set of starter templates highlighting different use cases of Workers. We understand that you have different ideas you will love to build on top of Workers and you may have questions or wonder if it is possible. These sets of templates go beyond the convention ‘Hello, World’ starter. They’ll help shape your idea of what kind of applications you can build with Workers as well as other products in the Cloudflare Developer Ecosystem.

We are excited to introduce a new collection of starter templates This shortcut will serve you a collection of templates that you can immediately start using to build your next application.

Get started with Cloudflare Workers with ready-made templatesCloudflare Workers Template CollectionHow to use Cloudflare Workers templates

We created this collection of templates to support you in getting started with Workers. Some examples listed showcase a use-case with a combination of multiple Cloudflare developer platform products to build applications similar to ones you might use every day. We know that Workers are being used today for different use-cases, we want to showcase some of them to you.

We have templates for building an image sharing website with Pages Functions, direct creator upload to Cloudflare Stream, Durable Object-powered request scheduler and many more.

One example to highlight is a template that lets you accept payment for video content. It is powered by Pages Functions, Cloudflare Stream and Stripe Checkout.  This app shows how you can use Cloudflare Workers with external payment and authentication systems to create a logged-in experience, without ever having to manage a database or persist any state yourself.

Once a user has paid, Stripe Checkout redirects to a URL that verifies a token from Stripe, and generates a signed URL using Cloudflare Stream to view the video. This signed URL is only valid for a specified amount of time, and access can be restricted by country or IP address.

To use the template, you need to either click the deploy with Workers button or open the template with Stackblitz.

Get started with Cloudflare Workers with ready-made templates

The Deploy with Workers button will redirect you to a page where you can authorize GitHub to deploy a fork of the repository using GitHub actions while opening with StackBlitz creates a new fork of the template for you to start working with.

Get started with Cloudflare Workers with ready-made templatesDeploy to Workers Demo

These templates also comes bundled with additional features I would like to share with you:

Integrated Deploy with Workers Button

We added a deploy button to all templates listed in the templates repository and the collection website, so you can quickly get up to speed with your code. The deploy with Workers button lets you deploy your code in under five minutes, it uses a GitHub action powered with Cloudflare Workers to do this.

Get started with Cloudflare Workers with ready-made templatesGitHub Repository with Deploy to Workers ButtonSupport for Test Driven Development

As developers, we need to write tests for our code to ensure we’re shipping quality production grade software and also ensure that our code is bug-free. We support writing tests in Workers using the Wrangler unstable_dev API to write integration and end-to-end tests. We want to enable not just the developer experience but also nudge developers to follow best practices and prioritize TDD in development. We configured a number of templates to support integration tests against a local server, this will serve as a template to help you set up tests in your projects.

Here’s an example using Wrangler’s unstable_dev API and Vitest test framework to test the code written in an example ‘Hello worker’ starter template:

import { unstable_dev } from 'wrangler'; import { describe, expect, it, beforeAll, afterAll } from 'vitest'; describe('Worker', () => { let worker; beforeAll(async () => { worker = await unstable_dev('index.js', {}, { disableExperimentalWarning: true }); }); afterAll(async () => { await worker.stop(); }); it('should return Hello worker!', async () => { const resp = await worker.fetch(); if (resp) { const text = await resp.text(); expect(text).toMatchInlineSnapshot(`"Hello worker!"`); } }); }); Online IDE Integration with StackBlitz

We announced StackBlitz’s partnership with Cloudflare Workers during Platform Week early this year. We believe a developer’s experience should be of utmost priority because we want them to build with ease on our developer platform.

StackBlitz is an online development platform for building web applications. It is powered by WebContainers, the first WebAssembly-based operating system which boots Node.js environments in milliseconds, securely within your browser tab.

We made it even easier to get started with Workers with an integrated Open with StackBlitz button for each starter template, making it easier to create a fork of a template and the great thing is you only need a web browser to build your application.

Everything we’ve highlighted in this post, all leads to one thing - How can we create a better experience for developers getting started with Workers. We introduce these ready-made templates to make you more efficient, bring you a wholesome developer experience and help improve your time to deployment. We want you to spend less time getting started with building on the Workers developer platform, so what are you waiting for?

Next Steps

You can start building your own Worker today using the available templates provided in the templates collection to help you get started. If you would like to contribute your own templates to the collection, be sure to send in a pull request we’re more than happy to review and add to the growing collection. Share what you have built with us in the #builtwith channel on our Discord community. Make sure to follow us on Twitter or join our Discord Developers Community server.

Categories: Technology

Welcome to the Supercloud (and Developer Week 2022)

Mon, 14/11/2022 - 14:01
Welcome to the Supercloud (and Developer Week 2022)

This post is also available in Deutsch and Español.

Welcome to the Supercloud (and Developer Week 2022)

In Cloudflare’s S-1 document there’s a section that begins: “The Internet was not built for what it has become”.

That sentence expresses the idea that the Internet, which started as an experiment, has blossomed into something we all need to rely upon for our daily lives and work. And that more is needed than just the Internet as was designed; it needed security and performance and privacy.

Something similar can be said about the cloud: the cloud was not designed for what it must become.

The introduction of services like Amazon EC2 was undoubtedly a huge improvement on the old way of buying and installing racks and racks of servers and storage systems, and then maintaining them.

But by its nature the cloud was a virtualization of the older real world infrastructure and not a radical rethink of what computing should look like to meet the demands of Internet-scale businesses. It’s as if steam locomotives were replaced with efficient electric engines but still required a chimney on top and stopped to take on water every two hundred miles.

Welcome to the Supercloud (and Developer Week 2022)

The cloud replaced the rituals of buying servers and installing operating systems with new and now familiar rituals of choosing regions, and provisioning virtual machines, and keeping code artificially warm.

But along the way glimpses of light are seen through the cloud in the form of lambdas, or edges, or functions, or serverless. All are trying to give a name to a model of cloud computing that promises to make developers highly productive at scaling from one to Internet-scale. It’s a model that rather than virtualizing machines or disks or wrapping things in containers says: “write code, we’ll run it, don’t sweat the details like scaling or location”.

We’re calling that the Supercloud.

The foundations of the Supercloud are compute and data services that make running any size application efficient and infinitely scalable without the baggage of the cloud as it exists today.

The foundations of the Supercloud

Some years ago a movement called NoSQL developed new ways of storing and processing data that didn’t rely on databases. Key-value stores and document stores flourished because rather than thinking about data at the granularity of databases or tables or even rows, they made a direct connection between code and data at a simple level.

You can think of NoSQL as a drive towards granularity. And it worked. NoSQL stores, KVs, object stores (like R2) abound. The rise of MapReduce for processing data is also about granularity; by breaking data processing into easily scaled pieces (the map and the reduce) it was possible to handle huge amounts of data efficiently and scale up and down as needed.

The same thing is happening for cloud code. Just as programmers didn’t always want to think in database-sized chunks, they shouldn’t have to think about VM- or container-sized chunks. It’s inefficient and has nothing to do with the actual job of writing code to create a service. It’s unnecessary work that distracts from the real value of programming something into existence.

In distributed programming theory, granularity has been around for a long time. The CSP model is of tiny processes performing tasks and passing data (it helped inspire the Go language); the Actor model has messages passed between multitudes of actors changing internal state; even the lambda calculus is about discrete functions acting on data.

Object-oriented programming has developers reasoning about objects (not virtual machines or disks). And in CORBA, and similar systems, there’s the concept of an object request broker allowing objects to run and be accessed remotely in a distributed system without knowing details of where or how the object executes.

The theory of computing points away from dedicated machines (virtual or real) and to code and data that run on the Supercloud handling the details of code execution and data locality automatically and efficiently.

So whether you write your code by breaking it up into functions or ship large pieces of functionality or entire programs, the foundations of the Supercloud means that your code benefits from its efficiency. And more.

The Supercloud advantage

The Supercloud makes scaling easy because no one has to think about how many VMs to provision, no one has to keep hot standby VMs in case there's a flood of visitors. Just as MapReduce (which traces its heritage to the lambda calculus) scales up and down, so should general purpose computing.

And it’s not just about scaling. In the Supercloud both code and data are mobile and move around the network. Attach data to the code (such as with Durable Objects; hello Actor model) and you have a foundation for applications that can scale to any size and move close to users as needed to provide the best performance.

Alternatively, if your data is immovable, we move your code closer to it, no matter how many times you need to access it.

Not only that but working at this level of flexibility means that code enforcing a data privacy or data residence law about where data can be processed or stored can operate at the level of individual users or objects. The same code can behave differently and even be executed in a completely different country based on where its associated data is stored.

A Supercloud has two interesting effects on the cost of running a program. Firstly, it makes it more economical because you only run what you need. There’s never any need for committed VMs waiting for work, or idle machines you’re paying for just in case. Code either runs or it doesn’t. It scales up and down as needed. You only pay for precisely what you need.

Secondly, it creates a more efficient compute platform which is better for everyone. It forces the compute platform (e.g. us) to be as efficient as possible. We have to be able to start code quickly for performance and scale up reasons. We need to efficiently use CPUs because no customer is paying us to keep idle CPUs around. And it’s better for the environment because cloud machines run at very high levels of utilization. This level of efficiency is what allows our platform to scale to the 10 million requests that Cloudflare Workers processed in the time it took you to read the last word of this sentence.

And this compute platform scales well beyond a machine, or a data center, or a country. With the right software (which we’ve built) it scales to the size of the Internet. Software allocates resources automatically across the globe, moving connections, data and processing around for high efficiency and optimal end user experience.

Efficient compute and storage, a global network that’s everywhere everyone is, bound together by software that turns the globe into a single cloud. The Supercloud.

Welcome to the Supercloud (and Developer Week 2022)Welcome to the Supercloud

The Supercloud is performant, scalable, available, private, and cost-efficient. Choosing a region for your application, or provisioning virtual machines, or working out how to auto-scale containers, or worrying about cold starts seems ridiculous, hard, anachronistic, a waste of time, rigid and expensive.

Happily, Cloudflare’s been building the alternative to that traditional cloud into our network and our developer platform for years. The Supercloud. The term may be new, but that doesn’t mean that it’s not real. Today, we have over a million developers building on the Supercloud.

Each of those developers wants to get code running on one machine and perfect it. It’s so much easier to work that way. We just happen to have one machine that scales to the size of the Internet: a global, distributed supercomputer. It’s the Supercloud and we build our own products on it, and you can join those one million developers and build on it too.

We’ve been building the Supercloud for 12 years, and five years ago opened it up to developers through Cloudflare Workers. Cloudflare Workers was built for scale and performance since day one, by running on our global network.

And with that, welcome to the Supercloud and welcome to Cloudflare Developer Week 2022.

As is it the case with all of our Innovation Weeks, we’re excited to kick off another week of announcements, enabling more and more use cases to be built on the Supercloud. In fact, it’s building on the Workers developer platform that gives us the super powers to continue delivering new building blocks for our users. This week, we’re going not to just tell you about all the new tools you can play with, but also how we built many of them, how you can use them, and what our customers are building with them in production today.

Watch on Cloudflare TV

You can watch the complete segment of our weekly show This Week in Net here — or hear it in the audio/podcast format.

Categories: Technology

Cloudflare Workers scale too well and broke our infrastructure, so we are rebuilding it on Workers

Mon, 14/11/2022 - 14:00
Cloudflare Workers scale too well and broke our infrastructure, so we are rebuilding it on WorkersCloudflare Workers scale too well and broke our infrastructure, so we are rebuilding it on Workers

While scaling our new Feature Flagging product DevCycle, we’ve encountered an interesting challenge: our Cloudflare Workers-based infrastructure can handle way more instantaneous load than our traditional AWS infrastructure. This led us to rethink how we design our infrastructure to always use Cloudflare Workers for everything.

The origin of DevCycle

For almost 10 years, Taplytics has been a leading provider of no-code A/B testing and feature flagging solutions for product and marketing teams across a wide range of use cases for some of the largest consumer-facing companies in the world. So when we applied ourselves to build a new engineering-focused feature management product, DevCycle, we built upon our experience using Workers which have served over 140 billion requests for Taplytics customers.

The inspiration behind DevCycle is to build a focused feature management tool for engineering teams, empowering them to build their software more efficiently and deploy it faster. Helping engineering teams reach their goals, whether it be continuous deployment, lower change failure rate, or a faster recovery time. DevCycle is the culmination of our vision of how teams should use Feature Management to build high-quality software faster. We've used DevCycle to build DevCycle, enabling us to implement continuous deployment successfully.

DevCycle architecture

One of the first things we asked ourselves when ideating DevCycle was how we could get out of the business of managing 1000’s of vCPUs worth of AWS instances and move our core business logic closer to our end-user devices. Based on our experience with Cloudflare Workers at Taplytics we knew we wanted it to be a core part of our future infrastructure for DevCycle.

By using the global computing power of Workers and moving as much logic to the SDKs as possible with our local bucketing server-side SDKs, we were able to massively reduce or eliminate the latency of fetching feature flag configurations for our users. In addition, we used a shared WASM library across our Workers and local bucketing SDKs to dramatically reduce the amount of code we need to maintain per SDK, and increase the consistency of our platform. This architecture has also fundamentally changed our business's cost structure to easily serve any customer of any scale.

The core architecture of DevCycle revolves around publishing and consuming JSON configuration files per project environment. The publishing side is managed in our AWS services, while Cloudflare manages the consumption of these config files at scale. This split in responsibilities allows for all high-scale requests to be managed by Cloudflare, while keeping our AWS services simple and low-scale.

Cloudflare Workers scale too well and broke our infrastructure, so we are rebuilding it on Workers“Workers are breaking our events pipeline”

One of the primary challenges as a feature management platform is that we don’t have direct control over the load from our customers’ applications using our SDKs; our systems need the ability to scale instantly to match their load. For example, we have a couple of large customers whose mobile traffic is primarily driven by push notifications, which causes massive instantaneous spikes in traffic to our APIs in the range of 10x increases in load. As you can imagine, any traditional auto-scaled API service and the load balancer cannot manage that type of increase in load. Thus, our choices are to dramatically increase the minimum size of our cluster and load balancer to handle these unknown load spikes, accept that some requests will be rate-limited, or move to an architecture that can handle this load.

Cloudflare Workers scale too well and broke our infrastructure, so we are rebuilding it on Workers

Given that all our SDK API requests are already served with Workers, they have no problem scaling instantly to 10x+ their base load. Sadly we can’t say the same about the traditional parts of our infrastructure.

For each feature flag configuration request to a Worker, a corresponding events request is sent to our AWS events infrastructure. The events are received by our events API container in Kubernetes, where they are then published to Kafka and eventually ingested by Snowflake. While Cloudflare Workers have no problem handling instantaneous spikes in feature flag requests, the events system can't keep up. Our cluster and events API containers need to be scaled up faster to prevent the existing instances from being overwhelmed. Even the load balancer has issues accepting the sudden increase. Cloudflare Workers just work too well in comparison to EC2 instances + EKS.

To solve this issue we are moving towards a new events Cloudflare Worker which will be able to handle the instantaneous events load from these requests and make use of the Kinesis Data Firehose to write events to our existing S3 bucket which is ingested by Snowflake. In the future, we look forward to testing out Cloudflare Queues writing to R2 once a Snowflake connector has been created. This architecture should allow us to ingest events at almost any scale and withstand instantaneous traffic spikes with a predictable and efficient cost structure.

Cloudflare Workers scale too well and broke our infrastructure, so we are rebuilding it on WorkersBuilding without a database next to your code

Workers provide many benefits, including fast response times, infinite scalability, serverless architecture, and excellent up-time performance. However, if you want to see all these benefits, you need to architect your Workers to assume that you don’t have direct access to a centralized SQL / NoSQL database (or D1) like you would with a traditional API service. For example, suppose you build your workers to require reaching out to a database to fetch and update user data every time a request is made to your Workers. In that case, your request latency will be tied to the geographic distance between your Worker and the database plus the latency of the database. In addition, your Workers will be able to scale significantly beyond the number of database connections you can support, and your uptime will be tied to the uptime of your external database. Therefore, when architecting your systems to use Workers, we advise relying primarily on data sent as part of the API request and cacheable data on Cloudflare’s global network.

Cloudflare provides multiple products and services to help with data on their global network:

  • KV: “global, low-latency, key-value data store.”
    • However, the lowest latency way of retrieving data from within a Worker is limited by a minimum 60-second TTL. So you’ll need to be ok with cached data that is 60 seconds stale.
  • Durable Objects: “provide low-latency coordination and consistent storage for the Workers platform through two features: global uniqueness and a transactional storage API.”
    • Ability to store user-level information closer to the end user.
    • Unfamiliar worker interface for accessing data for developers with SQL / NoSQL experience.
  • R2: “store large amounts of unstructured data.”
    • Ability to store arbitrarily large amounts of unstructured data using familiar S3 APIs.
    • Cloudflare’s cache can be used to provide low-latency access within workers.
  • D1: “serverless SQLite database”

Each of these tools that Cloudflare provides has made building APIs far more accessible than when Workers launched initially; however, each service has aspects which need to be accounted for when architecting your systems. Being an open platform, you can also access any publically available database you want from a Worker. For example, we are making use of Macrometa for our EdgeDB product built into our Workers to help customers access their user data.

The predictable cost structure of Workers

One of the greatest advantages of moving most of our workloads towards Cloudflare Workers is the predictable cost structure that can scale 1:1 with our request loads and can be easily mapped to usage-based billing for our customers. In addition, we no longer have to run excess EC2 instances to handle random spikes in load, just in case they happen.

Too many SaaS services have opaque billing based on max usage or other metrics that don’t relate directly to their costs. Moving from our legacy AWS architecture with high fixed costs like databases and caching layers to Workers has resulted in our infrastructure spending is directly tied to using our APIs and SDKs. For DevCycle, this architecture has been over ~5x more cost-efficient to operate.

The future of DevCycle and Cloudflare

With DevCycle we will continue to invest in leveraging serverless computing and moving our core business logic as close to our users as possible, either on Cloudflare’s global network or locally within our SDKs. We’re excited to integrate even more deeply with the Cloudflare developer platform as new services evolve. We already see future use cases for R2, Queues and Durable Objects and look forward to what’s coming next from Cloudflare.

Categories: Technology

Build applications of any size on Cloudflare with the Queues open beta

Mon, 14/11/2022 - 14:00
Build applications of any size on Cloudflare with the Queues open betaBuild applications of any size on Cloudflare with the Queues open beta

Message queues are a fundamental building block of cloud applications—and today the Cloudflare Queues open beta brings queues to every developer building for Region: Earth. Cloudflare Queues follows Cloudflare Workers and Cloudflare R2 in a long line of innovative services built for the Workers Developer Platform, enabling developers to build more complex applications without configuring networks, choosing regions, or estimating capacity. Best of all, like many other Cloudflare services, there are no egregious egress charges!

Build applications of any size on Cloudflare with the Queues open beta

If you’ve ever purchased something online and seen a message like “you will receive confirmation of your order shortly,” you’ve interacted with a queue. When you completed your order, your shopping cart and information were stored and the order was placed into a queue. At some later point, the order fulfillment service picks and packs your items and hands it off to the shipping service—again, via a queue. Your order may sit for only a minute, or much longer if an item is out of stock or a warehouse is busy, and queues enable all of this functionality.

Message queues are great at decoupling components of applications, like the checkout and order fulfillment services for an ecommerce site. Decoupled services are easier to reason about, deploy, and implement, allowing you to ship features that delight your customers without worrying about synchronizing complex deployments.

Queues also allow you to batch and buffer calls to downstream services and APIs. This post shows you how to enroll in the open beta, walks you through a practical example of using Queues to build a log sink, and tells you how we built Queues using other Cloudflare services. You’ll also learn a bit about the roadmap for the open beta.

Getting startedEnrolling in the open beta

Open the Cloudflare dashboard and navigate to the Workers section. Select Queues from the Workers navigation menu and choose Enable Queues Beta.

Review your order and choose Proceed to Payment Details.

Note: If you are not already subscribed to a Workers Paid Plan, one will be added to your order automatically.

Enter your payment details and choose Complete Purchase. That’s it - you’re enrolled in the open beta! Choose Return to Queues on the confirmation page to return to the Cloudflare Queues home page.

Creating your first queue

After enabling the open beta, open the Queues home page and choose Create Queue. Name your queue `my-first-queue` and choose Create queue. That’s all there is to it!

The dash displays a confirmation message along with a list of all the queues in your account.

Build applications of any size on Cloudflare with the Queues open beta

Note: As of the writing of this blog post each account is limited to ten queues. We intend to raise this limit as we build towards general availability.

Managing your queues with Wrangler

You can also manage your queues from the command line using Wrangler, the CLI for Cloudflare Workers. In this section, you build a simple but complete application implementing a log aggregator or sink to learn how to integrate Workers, Queues, and R2.

Setting up resources
To create this application, you need access to a Cloudflare Workers account with a subscription plan, access to the Queues open beta, and an R2 plan.

Install and authenticate Wrangler then run wrangler queues create log-sink from the command line to create a queue for your application.

Run wrangler queues list and note that Wrangler displays your new queue.

Note: The following screenshots use the jq utility to format the JSON output of wrangler commands. You do not need to install jq to complete this application.

Build applications of any size on Cloudflare with the Queues open beta

Finally, run wrangler r2 bucket create log-sink to create an R2 bucket to store your aggregated logs. After the bucket is created, run wrangler r2 bucket list to see your new bucket.

Build applications of any size on Cloudflare with the Queues open beta

Creating your Worker
Next, create a Workers application with two handlers: a fetch() handler to receive individual incoming log lines and a queue() handler to aggregate a batch of logs and write the batch to R2.

In an empty directory, run wrangler init to create a new Cloudflare Workers application. When prompted:

  • Choose “y” to create a new package.json
  • Choose “y” to use TypeScript
  • Choose “Fetch handler” to create a new Worker at src/index.ts
Build applications of any size on Cloudflare with the Queues open beta

Open wrangler.toml and replace the contents with the following:

wrangler.tomlname = "queues-open-beta" main = "src/index.ts" compatibility_date = "2022-11-03" [[queues.producers]] queue = "log-sink" binding = "BUFFER" [[queues.consumers]] queue = "log-sink" max_batch_size = 100 max_batch_timeout = 30 [[r2_buckets]] bucket_name = "log-sink" binding = "LOG_BUCKET"

The [[queues.producers]] section creates a producer binding for the Worker at src/index.ts called BUFFER that refers to the log-sink queue. This Worker can place messages onto the log-sink queue by calling await env.BUFFER.send(log);

The [[queues.consumers]] section creates a consumer binding for the log-sink queue for your Worker. Once the log-sink queue has a batch ready to be processed (or consumed), the Workers runtime will look for the queue() event handler in src/index.ts and invoke it, passing the batch as an argument. The queue() function signature looks as follows:

async queue(batch: MessageBatch<Error>, env: Environment): Promise<void> {

The final binding in your wrangler.toml creates a binding for the log-sink R2 bucket that makes the bucket available to your Worker via env.LOG_BUCKET.


Open src/index.ts and replace the contents with the following code:

export interface Env { BUFFER: Queue; LOG_BUCKET: R2Bucket; } export default { async fetch(request: Request, env: Environment): Promise<Response> { let log = await request.json(); await env.BUFFER.send(log); return new Response("Success!"); }, async queue(batch: MessageBatch<Error>, env: Environment): Promise<void> { const logBatch = JSON.stringify(batch.messages); await env.LOG_BUCKET.put(`logs/${}.log.json`, logBatch); }, };

The export interface Env section exposes the two bindings you defined in wrangler.toml: a queue named BUFFER and an R2 bucket named LOG_BUCKET.

The fetch() handler transforms the request body into JSON, adds the body to the BUFFER queue, then returns an HTTP 200 response with the message Success!

The `queue()` handler receives a batch of messages that each contain log entries, iterates through concatenating each log into a string buffer, then writes that buffer to the LOG_BUCKET R2 bucket using the current timestamp as the filename.

Publishing and running your application
To publish your log sink application, run wrangler publish. Wrangler packages your application and its dependencies and deploys it to Cloudflare’s global network.

Build applications of any size on Cloudflare with the Queues open beta

Note that the output of wrangler publish includes the BUFFER queue binding, indicating that this Worker is a producer and can place messages onto the queue. The final line of output also indicates that this Worker is a consumer for the log-sink queue and can read and remove messages from the queue.

Use your favorite API client, like curl, httpie, or Postman, to send JSON log entries to the published URL for your Worker via HTTP POST requests. Navigate to your log-sink R2 bucket in the Cloudflare dashboard and note that the logs prefix is now populated with aggregated logs from your request.

Build applications of any size on Cloudflare with the Queues open beta

Download and open one of the logfiles to view the JSON array inside. That’s it - with fewer than 45 lines of code and config, you’ve built a log aggregator to ingest and store data in R2!

Build applications of any size on Cloudflare with the Queues open betaBuffering R2 writes with Queues in the real world

In the previous example, you create a simple Workers application that buffers data into batches before writing the batches to R2. This reduces the number of calls to the downstream service, reducing load on the service and saving you money., the fastest UUIDv4-as-a-service, wanted to confirm whether their API truly generates unique IDs on every request. With 80,000 requests per day, it wasn’t trivial to find out. They decided to write every generated UUID to R2 to compare IDs across the entire population. However, writing directly to R2 at the rate UUIDs are generated is inefficient and expensive.

To reduce writes and costs, introduced Cloudflare Queues into their UUID generation workflow. Each time a UUID is requested, a Worker places the value of the UUID into a queue. Once enough messages have been received, the buffered batch of JSON objects is written to R2. This avoids invoking an R2 write on every API call, saving costs and making the data easier to process later.

The uuid-queue application consists of a single Worker with three event handlers:

  1. A fetch handler that receives a JSON object representing the generated UUID and writes it to a Cloudflare Queue.
  2. A queue handler that writes batches of JSON objects to R2 in CSV format.
  3. A scheduled handler that combines batches from the previous hour into a single file for future processing.

To view the source or deploy this application into your own account, visit the repository on GitHub.

How we built Cloudflare Queues

Like many of the Cloudflare services you use and love, we built Queues by composing other Cloudflare services like Workers and Durable Objects. This enabled us to rapidly solve two difficult challenges: securely invoking your Worker from our own service and maintaining a strongly consistent state at scale. Several recent Cloudflare innovations helped us overcome these challenges.

Securely invoking your Worker

In the Before Times (early 2022), invoking one Worker from another Worker meant a fresh HTTP call from inside your script. This was a brittle experience, requiring you to know your downstream endpoint at deployment time. Nested invocations ran as HTTP calls, passing all the way through the Cloudflare network a second time and adding latency to your request. It also meant security was on you - if you wanted to control how that second Worker was invoked, you had to create and implement your own authentication and authorization scheme.

Worker to Worker requests
During Platform Week in May 2022, Service Worker Bindings entered general availability. With Service Worker Bindings, your Worker code has a binding to another Worker in your account that you invoke directly, avoiding the network penalty of a nested HTTP call. This removes the performance and security barriers discussed previously, but it still requires that you hard-code your nested Worker at compile time. You can think of this setup as “static dispatch,” where your Worker has a static reference to another Worker where it can dispatch events.

Dynamic dispatch
As Service Worker Bindings entered general availability, we also launched a closed beta of Workers for Platforms, our tool suite to help make any product programmable. With Workers for Platforms, software as a service (SaaS) and platform providers can allow users to upload their own scripts and run them safely via Cloudflare Workers. User scripts are not known at compile time, but are dynamically dispatched at runtime.

Workers for Platforms entered general availability during GA week in September 2022, and is available for all customers to build with today.

With dynamic dispatch generally available, we now have the ability to discover and invoke Workers at runtime without the performance penalty of HTTP traffic over the network. We use dynamic dispatch to invoke your queue’s consumer Worker whenever a message or batch of messages is ready to be processed.

Consistent stateful data with Durable Objects

Another challenge we faced was storing messages durably without sacrificing performance. We took the design goal of ensuring that all messages were persisted to disk in multiple locations before we confirmed receipt of the message to the user. Again, we turned to an existing Cloudflare product—Durable Objects—which entered general availability nearly one year ago today.

Durable Objects are named instances of JavaScript classes that are guaranteed to be unique across Cloudflare’s entire network. Durable Objects process messages in-order and on a single-thread, allowing for coordination across messages and provide a strongly consistent storage API for key-value pairs. Offloading the hard problem of storing data durably in a distributed environment to Distributed Objects allowed us to reduce the time to build Queues and prepare it for open beta.

Open beta roadmap

Our open beta process empowers you to influence feature prioritization and delivery. We’ve set ambitious goals for ourselves on the path to general availability, most notably supporting unlimited throughput while maintaining 100% durability. We also have many other great features planned, like first-in first-out (FIFO) message processing and API compatibility layers to ease migrations, but we need your feedback to build what you need most, first.


Cloudflare Queues is a global message queue for the Workers developer. Building with Queues makes your applications more performant, resilient, and cost-effective—but we’re not done yet. Join the Open Beta today and share your feedback to help shape the Queues roadmap as we deliver application integration services for the next generation cloud.

Categories: Technology

The road to a more standards-compliant Workers API

Mon, 14/11/2022 - 14:00
The road to a more standards-compliant Workers APIThe road to a more standards-compliant Workers API

Earlier this year, we announced our participation in a new W3C Community Group for the advancement of Web-interoperable API standards. Since then, this new WinterCG has been hard at work identifying the common API standards around which all JavaScript runtimes can build. Today I just want to give a peek at some work the WinterCG has been doing; and show off some of the improvements we have been making in the Workers runtime to increase alignment with Web Platform standards around event handling, task cancellation using AbortController, text encoding and decoding, URL parsing and pattern matching, and streams support.

The WinterCG Minimum Common Web Platform API

Right at the start of the WinterCG activity, the group took some time to evaluate and compare the various non-browser JavaScript runtimes such as Node.js, Deno, Bun, and Workers with the purpose of identifying the Web Platform APIs they all had in common. Following a very simple criteria, we looked at the standard APIs that were already implemented and supported by at least two of these runtimes and compiled those into a list that the WinterCG calls the "Minimum Common Web Platform API". This list will serve as the basis for what the community group defines as the minimum set of Web Platform APIs that should be implemented consistently across runtimes that claim to be "Web-interoperable".

The current list is straightforward:

AbortController ReadableStreamDefaultController AbortSignal ReadableStreamDefaultReader ByteLengthQueuingStrategy SubtleCrypto CompressionStream TextDecoder CountQueuingStrategy TextDecoderStream Crypto TextEncoder CryptoKey TextEncoderStream DecompressionStream TransformStream DOMException TransformStreamDefaultController Event URL EventTarget URLPattern ReadableByteStreamController URLSearchParams ReadableStream WritableStream ReadableStreamBYOBReader WritableStreamDefaultController ReadableStreamBYOBRequest  

In addition to these, the WinterCG also expects Web-interoperable runtimes to have implementations of the atob(), btoa(), queueMicrotask(), structuredClone(), setTimeout(), clearTimeout(), setInterval(), clearInterval(), console, and crypto.subtle APIs available on the global scope.

Today, we are happy to say that the Workers runtime has compliant or nearly compliant implementations of every one of these WinterCG Minimum Common Web Platform APIs. Some of these APIs intentionally diverge from the standards either due to backwards compatibility concerns, Workers-specific features, or performance optimizations. Other APIs diverge still because we are still in the process of updating them to align with the specifications.

Improving standards compliance in the Workers runtime

The Workers runtime has, from the beginning, had the mission to align its developer experience with JavaScript and Web Platform standards as much as possible. Over the past year we have worked hard to continue advancing that mission forward both by improving the standards-compliance of existing APIs such as Event, EventTarget, URL, and streams; and the introduction of new Web Platform APIs such as URLPattern, encoding streams, and compression streams.

Event and EventTarget

The Workers runtime has provided an implementation of the Event and EventTarget Web Platform APIs from the very beginning. These were, however, only limited implementations of what the WHATWG DOM specification defines. Specifically, Workers had only implemented the bare minimum of the Event API that it itself needed to operate.

Today, the Event and EventTarget implementations in Workers provide a more complete implementation.

Let's look at the official definition of Event as defined by the WHATWG DOM standard:

[Exposed=*] interface Event { constructor(DOMString type, optional EventInit eventInitDict = {}); readonly attribute DOMString type; readonly attribute EventTarget? target; readonly attribute EventTarget? srcElement; // legacy readonly attribute EventTarget? currentTarget; sequence<EventTarget> composedPath(); const unsigned short NONE = 0; const unsigned short CAPTURING_PHASE = 1; const unsigned short AT_TARGET = 2; const unsigned short BUBBLING_PHASE = 3; readonly attribute unsigned short eventPhase; undefined stopPropagation(); attribute boolean cancelBubble; // legacy alias of .stopPropagation() undefined stopImmediatePropagation(); readonly attribute boolean bubbles; readonly attribute boolean cancelable; attribute boolean returnValue; // legacy undefined preventDefault(); readonly attribute boolean defaultPrevented; readonly attribute boolean composed; [LegacyUnforgeable] readonly attribute boolean isTrusted; readonly attribute DOMHighResTimeStamp timeStamp; undefined initEvent(DOMString type, optional boolean bubbles = false, optional boolean cancelable = false); // legacy }; dictionary EventInit { boolean bubbles = false; boolean cancelable = false; boolean composed = false; };

Web Platform API specifications are always written in terms of a definition language called Web IDL. Every attribute defined in the interface is a property that is exposed on the object. Event objects, then, are supposed to have properties like type, target, srcElement, currentTarget, bubbles, cancelable, returnValue, defaultPrevented, composed, isTrusted, and timeStamp. They are also expected to have methods such as composedPath(), stopPropagation(), and stopImmediatePropagation(). Because most of these were not immediately needed by Workers, most were not provided originally.

Today, all standard, non-legacy properties and methods defined by the specification are available for use:

const event = new Event('foo', { bubbles: false, cancelable: true, composed: true, }); console.log(event.bubbles); console.log(event.cancelable); console.log(event.composed); addEventListener('foo', (event) => { console.log(event.eventPhase); // 2 AT_TARGET console.log(event.currentTarget); console.log(event.composedPath()); }); dispatchEvent(event);

While we were at it, we also fixed a long standing bug in the implementation of Event that prevented user code from properly subclassing the Event object to create their own custom event types. This change is protected by a compatibility flag that is now enabled by default for all Workers using a compatibility date on or past 2022-01-31.

class MyEvent extends Event { constructor() { super('my-event') } get type() { return super.type.toUpperCase() } } const myEvent = new MyEvent(); // Previously, this would print "my-event" instead of "MY-EVENT" as expected. console.log(myEvent.type);

The EventTarget implementation has also been updated to support once handlers (event handlers that are triggered at-most once then automatically unregistered), cancelable handlers (using AbortSignal), and event listener objects, all in line with the standard.

Using a one-time event handler

addEventListener('foo', (event) => { console.log('printed only once'); }, { once: true }); dispatchEvent(new Event('foo')); dispatchEvent(new Event('foo'));

Once handlers are key for preventing memory leaks in your applications when you know that a particular event is only ever going to be emitted once, or whenever you only care about handling it once. The stored reference to the function or object that is handling the event is removed immediately upon the first invocation, allowing the memory to be garbage collected.

Using a cancelable event handler

const ac = new AbortController(); addEventListener('foo', (event) => { console.log('not printed at all'); }, { signal: ac.signal }); ac.abort(); dispatchEvent(new Event('foo'));

Using an event listener object

While passing a function to addEventListener() is the most common case, the standard actually allows an event listener to be an object with a handleEvent() method as well.

const listener = { handleEvent(event) { console.log(event.type); } }; addEventListener('foo', listener); addEventListener('bar', listener); dispatchEvent(new Event('foo')); dispatchEvent(new Event('bar')); AbortController and AbortSignal

As illustrated in the cancelable event example above, we have also introduced an implementation of the AbortController and AbortSignal APIs into Workers. These provide a standard, and interoperable way of signaling cancellation of several kinds of tasks.

The AbortController/AbortSignal pattern is straightforward: An AbortSignal is just a type of EventTarget that will emit a single "abort" event when it is triggered:

const ac = new AbortController(); ac.signal.addEventListener('abort', (event) => { console.log(event.reason); // 'just because' }, { once: true }); ac.abort('just because');

The AbortController is used to actually trigger the abort event, optionally with a reason argument that is passed on to the event. The reason is typically an Error object but can be any JavaScript value.

The AbortSignal can only be triggered once, so the "abort" event should only ever be emitted once.

It is also possible to create AbortSignals that timeout after a specified period of time:

const signal = AbortSignal.timeout(10);

Or an AbortSignal that is pre-emptively triggered immediately on creation (these will never actually emit the "abort" event):

const signal = AbortSignal.abort('for reasons');

Currently, within Workers, AbortSignal and AbortController has been integrated with the EventTarget, fetch(), and streams APIs in alignment with the relevant standard specifications for each.

Using AbortSignal to cancel a fetch() const ac = new AbortController(); const res = fetch('', { signal: ac.signal }); ac.abort(new Error('canceled')) try { await res; } catch (err) { console.log(err); } TextEncoderStream and TextDecoderStream

The Workers runtime has long provided basic implementations of the TextEncoder and TextDecoder APIs. Initially, these were limited to only supporting encoding and decoding of UTF-8 text. The standard definition of TextDecoder, however, defines a much broader range of text encodings that are now fully supported by the Workers implementation. Per the standard, TextEncoder currently only supports UTF-8.

const win1251decoder = new TextDecoder("windows-1251"); const bytes = new Uint8Array([ 207, 240, 232, 226, 229, 242, 44, 32, 236, 232, 240, 33, ]); console.log(win1251decoder.decode(bytes)); // Привет, мир!

In addition to supporting the full range of encodings defined by the standard, Workers also now provides implementations of the TextEncoderStream and TextDecoderStream, which provide TransformStream implementations that apply encoding and decoding to streaming data:

const { writable, readable } = new TextDecoderStream("windows-1251"); const writer = writable.getWriter(); writer.write(new Uint8Array([ 207, 240, 232, 226, 229, 242, 44, 32, 236, 232, 240, 33, ])); const reader = readable.getReader(); const res = await; console.log(res.value); // Привет, мир!

Using the encoding streams requires the use of the transformstream_enable_standard_constructor compatibility flag.

CompressionStream and DecompressionStream

Streaming compression and decompression is also now supported in the runtime using the standard CompressionStream and DecompressionStream APIs.

const ds = new DecompressionStream('gzip'); const decompressedStream =; const cs = new CompressionStream('gzip'); const compressedStream =;

These are TransformStream implementations that fully conform to the standard definitions. Use of the compression streams does not require a compatibility flag to enable.

URL and URLPattern

Similar to Event, there has been an implementation of the Web Platform standard URL API available within Workers from nearly the beginning. But also like Event, the implementation was not entirely compatible with the standard.

The incompatibilities were subtle, for instance, in the original implementation, the URL string "https://a//b//c//" would be parsed incorrectly as "https://a/b/c" (note that the extra empty path segments are removed) whereas the standard parsing algorithm would produce "https://a//b//c/" as a result. Such inconsistent results were causing interoperability issues with JavaScript written to run across multiple JavaScript runtimes and needed to be fixed.

A new implementation of the URL parsing algorithm has been provided, and as of October 31, 2022 it has been enabled by default for all newly deployed Workers. Older Workers can begin using the new implementation by updating their compatibility dates to 2022-10-31 or by enabling the url_standard compatibility flag.

Along with the updated URL implementation, Workers now provides an implementation of the standard URLPattern API.

URLPattern provides a regular-expression-like syntax for matching a URL string against a pattern. For instance, consider this example taken from the MDN documentation for URLPattern:

// Matching a pathname let pattern1 = new URLPattern('') // same as let pattern2 = new URLPattern( '/books/:id', '', ); // or let pattern3 = new URLPattern({ protocol: 'https', hostname: '', pathname: '/books/:id', }); // or let pattern4 = new URLPattern({ pathname: '/books/:id', baseURL: '', }); ReadableStream, WritableStream, and TransformStream

Last, but absolutely not least, our most significant effort over the past year has been providing new standards compliant implementations of the ReadableStream, WritableStream, and TransformStream APIs.

The Workers runtime has always provided an implementation of these objects but they were never fully conformant to the standard. User code was not capable of creating custom ReadableStream and WritableStream instances, and TransformStreams were limited to simple identity pass-throughs of bytes. The implementations have been updated now to near complete compliance with the standard (near complete because we still have a few edge cases and features we are working on).

The new streams implementation will be enabled by default in all new Workers as of November 30, 2022, or can be enabled earlier using the streams_enable_constructors and transformstream_enable_standard_constructor compatibility flags.

Creating a custom ReadableStreamasync function handleRequest(request) { const enc = new TextEncoder(); const rs = new ReadableStream({ pull(controller) { controller.enqueue(enc.encode('hello world')); controller.close(); } }); return new Response(rs); }

The new implementation supports both "regular" and "bytes" ReadableStream types, supports BYOB readers, and includes performance optimizations for both tee() and pipeThrough().

Creating a custom WritableStream const ws = new WritableStream({ write(chunk) { console.log(chunk); // "hello world" } }); const writer = ws.getWriter(); writer.write("hello world");

WritableStreams are fairly simple objects that can accept any JavaScript value written to them.

Creating a custom TransformStream const { readable, writable } = new TransformStream({ transform(chunk, controller) { controller.enqueue(chunk.toUpperCase()); } }); const writer = writable.getWriter(); const reader = readable.getReader(); writer.write("hello world"); const res = await; console.log(res.value); // "HELLO WORLD"

It has always been possible in Workers to call new TransformStream() (with no arguments) to create a limited version of a TransformStream that only accepts bytes and only acts as a pass-through, passing the bytes written to the writer on to the reader without any modification.

That original implementation is now available within Workers using the IdentityTransformStream class.

const { readable, writable } = new IdentityTransformStream(); const writer = writable.getWriter(); const reader = readable.getReader(); const enc = new TextEncoder(); const dec = new TextDecoder(); writer.write(enc.encode("hello world")); const res = await; console.log(dec.decode(res.value)); // "hello world"

If your code is using new TransformStream() today as this kind of pass-through, the new implementation will continue to work except for one very important difference: the old, non-standard implementation of new TransformStream() supported BYOB reads on the readable side (i.e. readable.getReader({ mode: 'byob' })). The new implementation (enabled via a compatibility flag and becoming the default on November 30 ) does not support BYOB reads as required by the stream standard.

What's next

It is clear that we have made a lot of progress in improving the standards compliance of the Workers runtime over the past year, but there is far more to do. Next we will be turning our attention to the implementation of the fetch() and WebSockets APIs, as well as actively seeking closer alignment with other runtimes through collaboration in the Web-interoperable Runtimes Community Group.

If you are interested in helping drive the implementation of Web Platform APIs forward, and advancing interoperability between JavaScript runtime environments, the Workers Runtime team at Cloudflare is hiring! Reach out, or see our open positions here.

Categories: Technology

2022 US midterm elections attack analysis

Fri, 11/11/2022 - 18:14
2022 US midterm elections attack analysis2022 US midterm elections attack analysis

Through Cloudflare’s Impact programs, we provide cyber security products to help protect access to authoritative voting information and the security of sensitive voter data. Two core programs in this space are the Athenian Project, dedicated to protecting state and local governments that run elections, and Cloudflare for Campaigns, a project with a suite of Cloudflare products to secure political campaigns’ and state parties’ websites and internal teams.

However, the weeks ahead of the elections, and Election Day itself, were not entirely devoid of attacks. Using data from Cloudflare Radar, which showcases global Internet traffic, attack, and technology trends and insights, we can explore traffic patterns, attack types, and top attack sources associated with both Athenian Project and Cloudflare for Campaigns participants.

For both programs, overall traffic volume unsurprisingly ramped up as Election Day approached. SQL Injection (SQLi) and HTTP Anomaly attacks were the two largest categories of attacks mitigated by Cloudflare’s Web Application Firewall (WAF), and the United States was the largest source of observed attacks — see more on this last point below.

Below, we explore the trends seen across both customer sets from October 1, 2022, through Election Day on November 8.

Athenian Project

Throughout October, daily peak traffic volumes effectively doubled over the course of the month, with a weekday/weekend pattern also clearly visible. However, significant traffic growth is visible on Monday, November 7, and Tuesday, November 8 (Election Day), with Monday’s peak just under 2x October’s peaks, while Tuesday saw two peaks, one just under 4x higher than October peaks, while the other was just over 4x higher. Zooming in, the first peak was at 1300 UTC (0800 Eastern time, 0500 Pacific time), while the second was at 0400 UTC (2300 Eastern time, 2000 Pacific time). The first one appears to be aligned with the polls opening on the East Coast, while the second appears to be aligned with the time that the polls closed on the West Coast.

However, aggregating the traffic here presents a somewhat misleading picture. While both spikes were due to increased traffic across multiple customer sites, the second one was exacerbated by a massive increase in traffic for a single customer. Regardless, the increased traffic clearly shows that voters turned to local government sites around Election Day.

2022 US midterm elections attack analysis

Despite this increase in overall traffic, attack traffic mitigated by Cloudflare’s Web Application Firewall (WAF) remained remarkably consistent throughout October and into November, as seen in the graph below. The obvious exception was an attack that occurred on Monday, October 10. This attack targeted a single Athenian Project participant, and was mitigated by rate limiting the requests.

2022 US midterm elections attack analysis

SQL injection (SQLi) attacks saw significant growth in volume in the week and a half ahead of Election Day, along with an earlier significant spike on October 24. While the last weekend in October (October 29 and 30) saw significant SQLi attack activity, the weekend of November 5 and 6 was comparatively quiet. However, those attacks ramped up again heading into and on Election Day, as seen in the graph below.

2022 US midterm elections attack analysis

Attempted attacks mitigated with the HTTP Anomaly ruleset also ramped up in the week ahead of Election Day, though to a much lesser extent than SQLi attacks. As the graph below shows, the biggest spikes were seen on October 31/November 1, and just after midnight UTC on November 4 (late afternoon to early evening in the US). Related request volume also grew heading into Election Day, but without significant short-duration spikes. There is also a brief but significant attack clearly visible on the graph on October 10. However, it occurred several hours after the rate limited attack referenced above — it is not clear if the two are related.

2022 US midterm elections attack analysis

The distribution of attacks over the surveyed period from October 1 through November 9 shows that those categorized as SQLi and HTTP Anomaly were responsible for just over two-thirds of WAF-mitigated requests. Nearly 14% were categorized as “Software Specific,” which includes attacks related to specific CVEs. The balance of the attacks were mitigated by WAF rules in categories including File Inclusion, XSS (Cross Site Scripting), Directory Traversal, and Command Injection.

2022 US midterm elections attack analysis

Media reports suggest that foreign adversaries actively try to interfere with elections in the United States. While this may be the case, analysis of the mitigated attacks targeting Athenian Project customers found that over 95% of the mitigated requests (attacks) came from IP addresses that geolocate to the United States. However, that does not mean that the attackers themselves are necessarily located in the country, but rather that they appear to be using compromised systems and proxies within the United States to launch their attacks against these sites protected by Cloudflare.

2022 US midterm elections attack analysisCloudflare for Campaigns

In contrast to Athenian Project participants, traffic to candidate sites that are participants in Cloudflare for Campaigns began to grow several weeks ahead of Election Day. The graph below shows a noticeable increase (~50%) in peak traffic volumes starting on October 12, with an additional growth (50-100%) starting a week later. Traffic to these sites appeared to quiet a bit toward the end of October, but saw significant growth again heading into, and during, Election Day.

However, once again, this aggregate traffic data presents something of a misleading picture, as one candidate site saw multiple times more traffic than the other participating sites. While those other sites saw similar shifts in traffic as well, they were dwarfed by those experienced by the outlier site.

2022 US midterm elections attack analysis

The WAF-mitigated traffic trend for campaign sites followed a similar pattern to the overall traffic. As the graph below shows, attack traffic also began to increase around October 19, with a further ramp near the end of the month. The October 27 spike visible in the graph was due to an attack targeting a single customer’s site, and was addressed using “Security Level” mitigation techniques, which uses IP reputation information to decide if and how to present challenges for incoming requests.

2022 US midterm elections attack analysis

The top two rule categories, HTTP Anomaly and SQLi, together accounted for nearly three-quarters of the mitigated requests, and Directory Traversal attacks were just under 10% of mitigated requests for this customer set. The HTTP Anomaly and Directory Traversal percentages were higher than those for attacks targeting Athenian Project participants, while the SQLi percentage was slightly lower.

2022 US midterm elections attack analysis

Once again, a majority of the WAF-mitigated attacks came from IP addresses in the United States. However, among Cloudflare for Campaigns participants, the United States only accounted for 55% of attacks, significantly lower than the 95% seen for Athenian Project participants. The balance is spread across a long tail of countries, with allies including Germany, Canada, and the United Kingdom among the top five. As noted above, however, the attackers may be elsewhere, and are using botnets or other compromised systems in these countries to launch attacks.

2022 US midterm elections attack analysisImproving security with data

We are proud to be trusted by local governments, campaigns, state parties, and voting rights organizations to protect their websites and provide uninterrupted access to information and trusted election results. Sharing information about the threats facing these websites helps us further support their valuable work by enabling them, and other participants in the election space, to take proactive steps to improve site security.

Learn more about how to apply to the Athenian Project, and check out Cloudflare Radar for real-time insights into Internet traffic, attack trends, and more.

Categories: Technology

Protecting election groups during the 2022 US midterm elections

Wed, 09/11/2022 - 16:41
Protecting election groups during the 2022 US midterm electionsProtecting election groups during the 2022 US midterm elections

On Tuesday, November 8, 2022, constituents cast their ballots for the 2022 US midterm elections, which included races for all 435 seats in the House of Representatives, 35 of the 100 seats in the Senate, and many gubernatorial races in states including Florida, Michigan, and Pennsylvania. Preparing for elections is a giant task, and states and localities have their work cut out for them with corralling poll workers, setting up polling places, and managing the physical security of ballots and voting machines.

We at Cloudflare are proud to be able to play a role in helping safeguard the integrity of the electoral process. Through our Impact programs, we provide cyber security products to help protect access to authoritative voting information and the security of sensitive voter data.

We have reported on our work in the election space with the Athenian Project, dedicated to protecting state and local governments that run elections; Cloudflare for Campaigns, a project with a suite of Cloudflare products to secure political campaigns’ and state parties’ websites and internal teams; and Project Galileo, in which we have helped voting rights organizations and election results sites stay online during traffic spikes.

Since our reporting in 2020, we have expanded our relationships with government agencies and worked with project participants across the United States in a range of election roles to support free and fair elections. For the midterm elections, we continued to support election entities with the tools and expertise on how to secure their web infrastructure to promote trust in the voting process.

Overall, we were ready for the unexpected, as we had experience supporting those in the election community in 2020 during a time of uncertainty around COVID-19 and increased political polarization. But for the midterms, the Cybersecurity and Infrastructure Security Agency (CISA), the key agency tasked with protecting election infrastructure against cyber threats, reported the morning of November 8 that they “continue to see no specific or credible threat to disrupt election infrastructure” for the day of the election.

At Cloudflare, although we did see reports of a few smaller attacks and outages, we are pleased that the robust cyber security preparations by governments, nonprofits, local municipalities, campaigns, and state parties appeared to be successful, as we did not identify large-scale attacks on November 8, 2022.

Below are highlights on the activity we saw as we approached midterms and how we worked together with all of these groups to secure election resources.

Key takeaways from the 2022 midterm electionsFor state and local governments protected under the Athenian Project
  • We protect 361 election websites in 31 states. This is a 31% increase since our reporting during the 2020 election.
  • Average daily application-layer attack volume against Athenian sites was only 3.4% higher in November through Election Day than it was in October.
  • From October 1 through November 8, 2022, government election sites experienced an average of 16,170,728 threats per day.
  • A majority of the threats to government election sites that Cloudflare mitigated in October 2022 were classified as HTTP anomaly, SQL injection, and software specific CVEs.
For political campaigns and state parties protected under Cloudflare for Campaigns
  • With our partnership with Defending Digital Campaigns, we protected 56 House campaigns, 15 political parties, and 34 Senate campaigns during the midterm elections.
  • Average daily application-layer attack volume against campaign sites was over 3x higher in November through Election Day than it was in October.
  • From October 1 through November 8, 2022, political campaign and state party sites saw an average of 149,949 threats per day.
  • HTTP anomaly, SQL injection, and directory traversal were the most active categories for mitigated requests against campaign sites in October.
Risks to online election groups as we approached the midterms

In preparation for the midterms, the Federal Bureau of Investigation (FBI) and CISA put out a variety of public service announcements calling attention to cyber election risks, like DDoS attacks, and providing reassurance that cyber attacks were “unlikely to result in large-scale disruptions or prevent voting.” Earlier this year, the FBI issued a warning on phishing attempts, with details about a seemingly organized plot to steal election officials’ credentials via an email with a fake invoice attached.

We also saw some threat actors announce plans to target the midterm elections. Killnet, a pro-Russia hacking group, targeted US state websites, successfully taking the public-facing websites of a number of states temporarily offline. Hacking groups will target public-facing government websites to promote mistrust in the democratic process.

Voting authorities face challenges unrelated to malicious activity, too. Without the proper tools in place, traffic spikes during election season can impede voters’ ability to access information about polling places, registration, and results. During the 2020 US election, we saw 4x traffic spikes to government elections sites.

On the political organizing side, political campaigns and state parties increasingly rely on the Internet and their web presence to issue policy stances, raise donations, and organize their campaign operations. In October 2022, the FBI notified Republican and Democratic state parties that Chinese hackers were scanning party websites for vulnerabilities.

So, what happened during the 2022 US midterm elections?Protecting election groups during the 2022 US midterm elections

As we prepared for the midterms, we had a team of engineers ready to assist state and local governments, campaigns, political parties, and voting rights organizations looking for help to protect their websites from cyber attacks. A majority of the threats that we saw and directly assisted on were before the election, especially in the wake of many advisories from federal agencies on Killnet’s targeting of US government sites.

During this time, we worked with CISA’s Joint Cyber Defense Collaborative (JCDC) to provide security briefings to state and local election officials and to make sure our free Enterprise services for state and local governments under the Athenian Project were part of JCDC’s Cybersecurity Toolkit to Protect Elections. We provided additional support in terms of webinars, security recommendations, and best practices to better prepare these groups for the midterms.

A week before the election, we worked with partners such as Defending Digital Campaigns to onboard many political campaigns and state parties to Cloudflare for Campaigns after seeing a number of campaigns come under DDoS attack. With this, we were able to accept 21 of the Senate Campaigns up for re-election, with an overall total of 34 Senate campaigns protected under the project.

Preparing for the next election

Being in the election space means working with local government, campaigns, state parties, and voting rights organizations to build trust. Democracies rely on access to information and trusted election results.

We accept applications to the Athenian Project all year long, not just during election season — learn how to apply. We look forward to providing more information on threats to these actors in the election space in the next few months to support their valuable work.

Categories: Technology

How the Brazilian Presidential elections affected Internet traffic

Thu, 03/11/2022 - 19:42
How the Brazilian Presidential elections affected Internet trafficBrasil, sei lá
Ou o meu coração se engana
Ou uma terra igual não há
— From Tom Jobim’s song, Brasil NativoHow the Brazilian Presidential elections affected Internet traffic

Brazil’s recent presidential election got significant attention from both global and national media outlets, not only because of the size of the country, but also because of premature allegations of electoral fraud. The first round of the Brazilian 2022 general election was held on October 2, and the runoff was held on Sunday, October 30. With 124 million votes counted, former president Lula da Silva (2003-2010) won with 50.9% of the votes, beating incumbent Jair Bolsonaro, who had 49.1% of the votes.

How the Brazilian Presidential elections affected Internet trafficThe final results of the elections as published by the official Tribunal Superior Eleitoral, with more than 124 million votes counted.)

Using Cloudflare’s data, we can explore the impact that this election had on Internet traffic patterns in Brazil, as well as interest in content from election-related websites, news organizations, social media platforms, and video platforms.

Here are a few highlights: while the runoff generated much more interest to election related websites (we actually have a view to DNS queries, a proxy to websites), the first round showed bigger increases in traffic to news organizations.

For the candidate’s domains, Lula’s win had the higher impact.

Also: official results came earlier on the runoff than the first round, and spikes in traffic were higher earlier that day (October 30).

(Note: we’re using local times — that means UTC-3, that is related to the more populated regions of Brazil — in this blog, although some charts have x-axis UTC).

Let’s start by looking at general Internet traffic in Brazil.

On election days, traffic goes down (during the day)

Using Cloudflare Radar, we can see something that has also been observed in other countries that hold Sunday elections: when most people are getting outside to vote, Internet traffic goes down (in comparison with previous Sundays). We saw this in the two rounds of the Presidential elections in France back in April 2022, in Portugal’s legislative elections in January 2022 and now, in Brazil.

How the Brazilian Presidential elections affected Internet traffic

We can also compare Sundays in October. There were five weekends. The two that had elections show the same pattern of lower traffic during the day, as seen in the previous chart. Comparing the two election days, there was a bigger drop in traffic on October 30 (down 21% at around 18:00 local time), than on October 2 (down 10% at around 20:00). Related or not, there was a bigger turnout on the runoff (124 million votes) than on the first round (123 million). Here’s the view on October 30:

How the Brazilian Presidential elections affected Internet traffic

And here’s October 2:

How the Brazilian Presidential elections affected Internet traffic

A more clear view in comparing the October weekends, and where you can see how the October 2 and 30 Sundays have the same pattern and different from the others three of the month, is this one (bear in mind that the x-axis is showing UTC time, it’s -3 hours in Brazil):

How the Brazilian Presidential elections affected Internet traffic

If we look at the main network providers (ASNs) in Brazil, the trend is the same. Claro (AS28573) also shows the drop in traffic on October 30, as does Telefonica (AS27699):

How the Brazilian Presidential elections affected Internet traffic

Here’s Telefonica:

How the Brazilian Presidential elections affected Internet traffic

We observed a similar impact from the October 30 runoff election to traffic from different states in Brazil, including São Paulo, Rio de Janeiro, Rio Grande do Norte, Minas Gerais, and Bahia.

Mobile device usage greater on weekends (and on election days)

When we look at the share of Brazil’s Internet traffic from mobile devices during October, we find that the highest percentages were on October 2 (first round of the elections, 66.3%), October 9 (66.4%) and October 30 (runoff election, 65%). We’ve seen this in other elections, an increase in mobile device traffice, so this seems to follow the same trend.

How the Brazilian Presidential elections affected Internet traffic

This chart also shows how mobile device usage in Brazil is at its highest on the weekends (all the main spikes for percentage of mobile devices are over the weekend, and more on Sundays).

Now, let’s look at anonymized and aggregated DNS traffic data from our resolver. This data provides a proxy for traffic to, and thus interest in, different categories of sites from users in Brazil around the election.

Election-related sites: higher interest in the runoff

Brazil has government websites related to elections, but also its own Tribunal Superior Eleitoral (Electoral Superior Court) that includes a website and app with live updates on the results of the elections for everyone to check. Looking at those related domains and using mean hourly traffic in September as a baseline, we can see that the October 2 first round spiked to 16x more DNS queries at 20:00 local time. However, DNS query traffic during the runoff election peaked at 18:00 local time on October 30 with 17.4x more DNS traffic as compared to the September baseline.

How the Brazilian Presidential elections affected Internet traffic

We can look more closely at each one of those two election days. On October 2, traffic had its first significant increase at around 17:00 local time, reaching 15x more requests to election-related domains as compared to the September baseline. This initial peak occurred at the same time the polling stations were closing. However, the peak that day, at 16x above baseline, was reached at 20:00 local time, as seen in the figure below.

How the Brazilian Presidential elections affected Internet traffic

On Sunday, October 30, 2022, the pattern is similar, although the peak was reached earlier, given that results started to arrive earlier than on the first round. The peak was reached at around 18:00 local time, with request traffic 17.4x above baseline.

How the Brazilian Presidential elections affected Internet traffic

As seen in the figure below, Lula first led in the official results at 18:45 local time, with votes from 67% of the polling stations counted at that time. Around 20:00 Lula was considered the winner (the peak seen in the previous chart was at that time).

How the Brazilian Presidential elections affected Internet trafficCandidate websites: in the end, winner takes all?

For Lula-related domains, there are clear spikes around the first round of elections on October 2. A 13x spike was observed on October 1 at around 21:00 local time. Two notable spikes were observed on October 2 — one at 16.7x above baseline at 09:00 local time, and the other at 10.7x above baseline at 21:00 local time. During the October 30 runoff election, only one clear spike was observed. The spike, at 16.7x above baseline, occurred at around 20:00, coincident with the time Lula was being announced as the winner.

How the Brazilian Presidential elections affected Internet traffic

For Bolsonaro-related domains, we observed a different pattern. Increased traffic as compared to the baseline is visible in the days leading up to the first round election, reaching 10x on September 30. On October 2, a 8x spike above baseline was seen at 18:00 local time. However, the two most significant spikes seen over the course of the month were observed on October 16, at 20x above baseline, a few hours after the first Lula-Bolsonaro television debate, and on October 25, at around 20:00, at 22x above baseline. That was the last week of campaigning before the October 30 runoff and when several polling predictions were announced. The second and last Bolsonaro-Lula debate was on October 28, and there’s a spike at 22:00 to Lula’s websites, and a smaller but also clear one at 21:00 to Bolsonaro’s websites).

How the Brazilian Presidential elections affected Internet trafficNews websites: more interest in the first round

With official election results being available more rapidly, DNS traffic for Brazilian news organization websites peaked much earlier in the evening than what we saw in France, for example, where more definitive election results arrived much later on election day. But another interesting trend here is how the first round, on October 2, had 9.1x more DNS traffic (compared with the September baseline), than what we saw during the runoff on October 30 (6.1x).

How the Brazilian Presidential elections affected Internet traffic

The way the results arrived faster also had an impact on the time of the peak, occurring at around 19:00 local time on October 30, as compared to around 20:00 on October 2.

At 19:45 local time on October 30, Lula was already the winner with more than 98% of the votes counted. After 20:00 there was a clear drop in DNS traffic to news organizations.

How the Brazilian Presidential elections affected Internet traffic

On October 2, it was only around 22:00 that it became official that there would be a runoff between Lula and Bolsonaro. Peak request volume was reached at 20:00 (9x), but traffic remained high (8x) at around 21:00 and until 22:00, like the following chart shows:

How the Brazilian Presidential elections affected Internet trafficConclusion: Real world events impact the Internet

Cloudflare Radar, our tool for Internet insights, can provide a unique perspective on how major global or national events impact the Internet. It is interesting to not only see that a real world event can impact Internet traffic (and different types of websites) for a whole country, but also see how much that impact is represented at specific times. It’s all about human behavior at relevant moments in time, like elections as a collective event is.

Past examples of this include important presidential elections, the Super Bowl, the Oscars, Eurovision, never before seen views of the universe from a telescope , the holiday shopping season, or religious events such as Ramadan.

You can keep an eye on these trends using Cloudflare Radar.

Categories: Technology

Cloudflare is not affected by the OpenSSL vulnerabilities CVE-2022-3602 and CVE-2022-3786

Wed, 02/11/2022 - 09:31
Cloudflare is not affected by the OpenSSL vulnerabilities CVE-2022-3602 and CVE-2022-3786Cloudflare is not affected by the OpenSSL vulnerabilities CVE-2022-3602 and CVE-2022-3786

Yesterday, November 1, 2022, OpenSSL released version 3.0.7 to patch CVE-2022-3602 and CVE-2022-3786, two HIGH risk vulnerabilities in the OpenSSL 3.0.x cryptographic library. Cloudflare is not affected by these vulnerabilities because we use BoringSSL in our products.

These vulnerabilities are memory corruption issues, in which attackers may be able to execute arbitrary code on a victim’s machine. CVE-2022-3602 was initially announced as a CRITICAL severity vulnerability, but it was downgraded to HIGH because it was deemed difficult to exploit with remote code execution (RCE). Unlike previous situations where users of OpenSSL were almost universally vulnerable, software that is using other versions of OpenSSL (like 1.1.1) are not vulnerable to this attack.

How do these issues affect clients and servers?

These vulnerabilities reside in the code responsible for X.509 certificate verification - most often executed on the client side to authenticate the server and the certificate presented. In order to be impacted by this vulnerability the victim (client or server) needs a few conditions to be true:

  • A malicious certificate needs to be signed by a Certificate Authority that the victim trusts.
  • The victim needs to validate the malicious certificate or ignore a series of warnings from the browser.
  • The victim needs to be running OpenSSL 3.0.x before 3.0.7.

For a client to be affected by this vulnerability, they would have to visit a malicious site that presents a certificate containing an exploit payload. In addition, this malicious certificate would have to be signed by a trusted certificate authority (CA).

Servers with a vulnerable version of OpenSSL can be attacked if they support mutual authentication - a scenario where both client and a server provide a valid and signed X.509 certificate, and the client is able to present a certificate with an exploit payload to the server.

How should you handle this issue?

If you’re managing services that run OpenSSL: you should patch vulnerable OpenSSL packages. On a Linux system you can determine if you have any processes dynamically loading OpenSSL with the lsof command. Here’s an example of finding OpenSSL being used by NGINX.

root@55f64f421576:/# lsof | grep nginx 1294 root mem REG 254,1 925009 /usr/lib/x86_64-linux-gnu/ (path dev=0,142)

Once the package maintainers for your Linux distro release OpenSSL 3.0.7 you can patch by updating your package sources and upgrading the libssl3 package. On Debian and Ubuntu this can be done with the apt-get upgrade command

root@55f64f421576:/# apt-get --only-upgrade install libssl3

With that said, it’s possible that you could be running a vulnerable version of OpenSSL that the lsof command can’t find because your process is statically compiled. It’s important to update your statically compiled software that you are responsible for maintaining, and make sure that over the coming days you are updating your operating system and other installed software that might contain the vulnerable OpenSSL versions.

Key takeaways

Cloudflare’s use of BoringSSL helped us be confident that the issue would not impact us prior to the release date of the vulnerabilities.

More generally, the vulnerability is a reminder that memory safety is still an important issue. This issue may be difficult to exploit because it requires a maliciously crafted certificate that is signed by a trusted CA, and certificate issuers are likely to begin validating that the certificates they sign don’t contain payloads that exploit these vulnerabilities.  However, it’s still important to patch your software and upgrade your vulnerable OpenSSL packages to OpenSSL 3.0.7 given the severity of the issue.

To learn more about our mission to help build a better Internet, start here. If you're looking for a new career direction, check out our open positions.

Categories: Technology

Privacy Gateway: a privacy preserving proxy built on Internet standards

Thu, 27/10/2022 - 14:01
 a privacy preserving proxy built on Internet standards a privacy preserving proxy built on Internet standards

If you’re running a privacy-oriented application or service on the Internet, your options to provably protect users’ privacy are limited. You can minimize logs and data collection but even then, at a network level, every HTTP request needs to come from somewhere. Information generated by HTTP requests, like users’ IP addresses and TLS fingerprints, can be sensitive especially when combined with application data.

Meaningful improvements to your users’ privacy require a change in how HTTP requests are sent from client devices to the server that runs your application logic. This was the motivation for Privacy Gateway: a service that relays encrypted HTTP requests and responses between a client and application server. With Privacy Gateway, Cloudflare knows where the request is coming from, but not what it contains, and applications can see what the request contains, but not where it comes from. Neither Cloudflare nor the application server has the full picture, improving end-user privacy.

We recently deployed Privacy Gateway for Flo Health Inc., a leading female health app, for the launch of their Anonymous Mode. With Privacy Gateway in place, all request data for Anonymous Mode users is encrypted between the app user and Flo, which prevents Flo from seeing the IP addresses of those users and Cloudflare from seeing the contents of that request data.

With Privacy Gateway in place, several other privacy-critical applications are possible:

  • Browser developers can collect user telemetry in a privacy-respecting manner– what extensions are installed, what defaults a user might have changed — while removing what is still a potentially personal identifier (the IP address) from that data.
  • Users can visit a healthcare site to report a Covid-19 exposure without worrying that the site is tracking their IP address and/or location.
  • DNS resolvers can serve DNS queries without linking who made the request with what website they’re visiting – a pattern we’ve implemented with Oblivious DNS.
    Privacy Gateway is based on Oblivious HTTP (OHTTP), an emerging IETF standard and is built upon standard hybrid public-key cryptography.
How does it work?

The main innovation in the Oblivious HTTP standard – beyond a basic proxy service – is that these messages are encrypted to the application’s server, such that Privacy Gateway learns nothing of the application data beyond the source and destination of each message.

Privacy Gateway enables application developers and platforms, especially those with strong privacy requirements, to build something that closely resembles a “Mixnet”: an approach to obfuscating the source and destination of a message across a network. To that end, Privacy Gateway consists of three main components:

  1. Client: the user’s device, or any client that’s configured to forward requests to Privacy Gateway.
  2. Privacy Gateway: a service operated by Cloudflare and designed to relay requests between the Client and the Gateway, without being able to observe the contents within.
  3. Application server: the origin or application web server responsible for decrypting requests from clients, and encrypting responses back.

If you were to imagine request data as the contents of the envelope (a letter) and the IP address and request metadata as the address on the outside, with Privacy Gateway, Cloudflare is able to see the envelope’s address and safely forward it to its destination without being able to see what’s inside.

 a privacy preserving proxy built on Internet standardsAn Oblivious HTTP transaction using Privacy Gateway

In slightly more detail, the data flow is as follows:

  1. Client encapsulates an HTTP request using the public key of the application server, and sends it to Privacy Gateway over an HTTPS connection.
  2. Privacy Gateway forwards the request to the server over its own, separate HTTPS connection with the application server.
  3. The application server  decapsulates the request, forwarding it to the target server which can produce the response.
  4. The application server returns an encapsulated response to Privacy Gateway, which then forwards the result to the client.
    As specified in the protocol, requests from the client to the server are encrypted using HPKE, a state-of-the-art standard for public key encryption – which you can read more about here. We’ve taken additional measures to ensure that OHTTP’s use of HPKE is secure by conducting a formal analysis of the protocol, and we expect to publish a deeper analysis in the coming weeks.
How Privacy Gateway improves end-user privacy

This interaction offers two types of privacy, which we informally refer to as request privacy and client privacy.

Request privacy means that the application server does not learn information that would otherwise be revealed by an HTTP request, such as IP address, geolocation, TLS and HTTPS fingerprints, and so on. Because Privacy Gateway uses a separate HTTPS connection between itself and the application server, all of this per-request information revealed to the application server represents that of Privacy Gateway, not of the client. However, developers need to take care to not send personally identifying information in the contents of requests. If the request, once decapsulated, includes information like users’ email, phone number, or credit card info, for example, Privacy Gateway will not meaningfully improve privacy.

Client privacy is a stronger notion. Because Cloudflare and the application server are not colluding to share individual user’s data, from the server’s perspective, each individual transaction came from some unknown client behind Privacy Gateway. In other words, a properly configured Privacy Gateway deployment means that applications cannot link any two requests to the same client. In particular, with Privacy Gateway, privacy loves company. If there is only one end-user making use of Privacy Gateway, then it only provides request privacy (since the client IP address remains hidden from the Gateway). It would not provide client privacy, since the server would know that each request corresponds to the same, single client. Client privacy requires that there be many users of the system, so the application server cannot make this determination.

To better understand request and client privacy, consider the following HTTP request between a client and server:

 a privacy preserving proxy built on Internet standardsNormal HTTP configuration with a client anonymity set of size 1

If a client connects directly to the server (or “Gateway” in OHTTP terms), the server is likely to see information about the client, including the IP address, TLS cipher used, and a degree of location data based on that IP address:

- ipAddress: # the client’s real IP address - ASN: 7922 - AS Organization: Comcast Cable - tlsCipher: AEAD-CHACHA20-POLY1305-SHA256 # potentially unique - tlsVersion: TLSv1.3 - Country: US - Region: California - City: Campbell

There’s plenty of sensitive information here that might be unique to the end-user. In other words, the connection offers neither request nor client privacy.

With Privacy Gateway, clients do not connect directly to the application server itself. Instead, they connect to Privacy Gateway, which in turn connects to the server. This means that the server only observes connections from Privacy Gateway, not individual connections from clients, yielding a different view:

- ipAddress: # a Cloudflare IP - ASN: 13335 - AS Organization: Cloudflare - tlsCipher: ECDHE-ECDSA-AES128-GCM-SHA256 # shared across several clients - tlsVersion: TLSv1.3 - Country: US - Region: California - City: Los Angeles  a privacy preserving proxy built on Internet standardsPrivacy Gateway configuration with a client anonymity set of size k

This is request privacy. All information about the client’s location and identity are hidden from the application server. And all details about the application data are hidden from Privacy Gateway. For sensitive applications and protocols like DNS, keeping this metadata separate from the application data is an important step towards improving end-user privacy.

Moreover, applications should take care to not reveal sensitive, per-client information in their individual requests. Privacy Gateway cannot guarantee that applications do not send identifying info – such as email addresses, full names, etc – in request bodies, since it cannot observe plaintext application data. Applications which reveal user identifying information in requests may violate client privacy, but not request privacy. This is why – unlike our full application-level Privacy Proxy product – Privacy Gateway is not meant to be used as a generic proxy-based protocol for arbitrary applications and traffic. It is meant to be a special purpose protocol for sensitive applications, including DNS (as is evidenced by Oblivious DNS-over-HTTPS), telemetry data, or generic API requests as discussed above.

Integrating Privacy Gateway into your application

Integrating with Privacy Gateway requires applications to implement the client and server side of the OHTTP protocol. Let’s walk through what this entails.

Server Integration

The server-side part of the protocol is responsible for two basic tasks:

  1. Publishing a public key for request encapsulation; and
  2. Decrypting encapsulated client requests, processing the resulting request, and encrypting the corresponding response.

A public encapsulation key, called a key configuration, consists of a key identifier (so the server can support multiple keys at once for rotation purposes), cryptographic algorithm identifiers for encryption and decryption, and a public key:

HPKE Symmetric Algorithms { HPKE KDF ID (16), HPKE AEAD ID (16), } OHTTP Key Config { Key Identifier (8), HPKE KEM ID (16), HPKE Public Key (Npk * 8), HPKE Symmetric Algorithms Length (16), HPKE Symmetric Algorithms (32..262140), }

Clients need this public key to create their request, and there are lots of ways to do this. Servers could fix a public key and then bake it into their application, but this would require a software update to rotate the key. Alternatively, clients could discover the public key some other way. Many discovery mechanisms exist and vary based on your threat model – see this document for more details. To start, a simple approach is to have clients fetch the public key directly from the server over some API. Below is a snippet of the API that our open source OHTTP server provides:

func (s *GatewayResource) configHandler(w http.ResponseWriter, r *http.Request) { config, err := s.Gateway.Config(s.keyID) if err != nil { http.Error(w, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError) return } w.Write(config.Marshal()) }

Once public key generation and distribution is solved, the server then needs to handle encapsulated requests from clients. For each request, the server needs to decrypt the request, translate the plaintext to a corresponding HTTP request that can be resolved, and then encrypt the resulting response back to the client.

Open source OHTTP libraries typically offer functions for request decryption and response encryption, whereas plaintext translation from binary HTTP to an HTTP request is handled separately. For example, our open source server delegates this translation to a different library that is specific to how Go HTTP requests are represented in memory. In particular, the function to translate from a plaintext request to a Go HTTP request is done with a function that has the following signature:

func UnmarshalBinaryRequest(data []byte) (*http.Request, error) { ... }

Conversely, translating a Go HTTP response to a plaintext binary HTTP response message is done with a function that has the following signature:

type BinaryResponse http.Response func (r *BinaryResponse) Marshal() ([]byte, error) { ... }

While there exist several open source libraries that one can use to implement OHTTP server support, we’ve packaged all of it up in our open source server implementation available here. It includes instructions for building, testing, and deploying to make it easy to get started.

Client integration

Naturally, the client-side behavior of OHTTP mirrors that of the server. In particular, the client must:

  1. Discover or obtain the server public key; and
  2. Encode and encrypt HTTP requests, send them to Privacy Gateway, and decrypt and decode the HTTP responses.

Discovery of the server public key depends on the server’s chosen deployment model. For example, if the public key is available over an API, clients can simply fetch it directly:

$ curl https://server.example/ohttp-configs > config.bin

Encoding, encrypting, decrypting, and decoding are again best handled by OHTTP libraries when available. With these functions available, building client support is rather straightforward. A trivial example Go client using the library functions linked above is as follows:

configEnc := ... // encoded public key config, err := ohttp.UnmarshalPublicConfig(configEnc) if err != nil { return err } request, err := http.NewRequest(http.MethodGet, "https://test.example/index.html", nil) if err != nil { return err } binaryRequest := ohttp.BinaryRequest(*request) encodedRequest, err := binaryRequest.Marshal() if err != nil { return err } ohttpClient := ohttp.NewDefaultClient(config) encapsulatedReq, reqContext, err := ohttpClient.EncapsulateRequest(encodedRequest) relayRequest, err := http.NewRequest(http.MethodPost, "https://relay.example", bytes.NewReader(encapsulatedReq.Marshal())) if err != nil { return err } relayRequest.Header.Set("Content-Type", "message/ohttp-req") client := http.Client{} relayResponse, err := client.Do(relayRequest) if err != nil { return err } bodyBytes, err := ioutil.ReadAll(relayResponse.Body) if err != nil { return err } encapsulatedResp, err := ohttp.UnmarshalEncapsulatedResponse(bodyBytes) if err != nil { return err } receivedResp, err := reqContext.DecapsulateResponse(encapsulatedResp) if err != nil { return err } response, err := ohttp.UnmarshalBinaryResponse(receivedResp) if err != nil { return err } fmt.Println(response)

A standalone client like this isn’t likely very useful to you if you have an existing application. To help integration into your existing application, we created a sample OHTTP client library that’s compatible with iOS and macOS applications. Additionally, if there’s language or platform support you would like to see to help ease integration on either or the client or server side, please let us know!


Privacy Gateway is currently in early access – available to select privacy-oriented companies and partners. If you’re interested, please get in touch.

Categories: Technology

Stronger than a promise: proving Oblivious HTTP privacy properties

Thu, 27/10/2022 - 14:00
 proving Oblivious HTTP privacy properties proving Oblivious HTTP privacy properties

We recently announced Privacy Gateway, a fully managed, scalable, and performant Oblivious HTTP (OHTTP) relay. Conceptually, OHTTP is a simple protocol: end-to-end encrypted requests and responses are forwarded between client and server through a relay, decoupling who from what was sent. This is a common pattern, as evidenced by deployed technologies like Oblivious DoH and Apple Private Relay. Nevertheless, OHTTP is still new, and as a new protocol it’s imperative that we analyze the protocol carefully.

To that end, we conducted a formal, computer-aided security analysis to complement the ongoing standardization process and deployment of this protocol. In this post, we describe this analysis in more depth, digging deeper into the cryptographic details of the protocol and the model we developed to analyze it. If you’re already familiar with the OHTTP protocol, feel free to skip ahead to the analysis to dive right in. Otherwise, let’s first review what OHTTP sets out to achieve and how the protocol is designed to meet those goals.

Decoupling who from what was sent

OHTTP is a protocol that combines public key encryption with a proxy to separate the contents of an HTTP request (and response) from the sender of an HTTP request. In OHTTP, clients generate encrypted requests and send them to a relay, the relay forwards them to a gateway server, and then finally the gateway decrypts the message to handle the request. The relay only ever sees ciphertext and the client and gateway identities, and the gateway only ever sees the relay identity and plaintext.

In this way, OHTTP is a lightweight application-layer proxy protocol. This means that it proxies application messages rather than network-layer connections. This distinction is important, so let’s make sure we understand the differences. Proxying connections involves a whole other suite of protocols typically built on HTTP CONNECT. (Technologies like VPNs and WireGuard, including Cloudflare WARP, can also be used, but let’s focus on HTTP CONNECT for comparison.)

 proving Oblivious HTTP privacy propertiesConnection-oriented proxy depiction

Since the entire TCP connection itself is proxied, connection-oriented proxies are compatible with any application that uses TCP. In effect, they are general purpose proxy protocols that support any type of application traffic. In contrast, proxying application messages is compatible with application use cases that require transferring entire objects (messages) between a client and server.

 proving Oblivious HTTP privacy propertiesMessage-oriented proxy depiction

Examples include DNS requests and responses, or, in the case of OHTTP, HTTP requests and responses. In other words, OHTTP is not a general purpose proxy protocol: it’s fit for purpose, aimed at transactional interactions between clients and servers (such as app-level APIs). As a result, it is much simpler in comparison.

Applications use OHTTP to ensure that requests are not linked to either of the following:

  1. Client identifying information, including the IP address, TLS fingerprint, and so on. As a proxy protocol, this is a fundamental requirement.
  2. Future requests from the same client. This is necessary for applications that do not carry state across requests.

These two properties make OHTTP a perfect fit for applications that wish to provide privacy to their users without compromising basic functionality. It’s served as the foundation for a widespread deployment of Oblivious DoH for over a year now, and as of recently, serves as the foundation for Flo Health Inc.’s Anonymous Mode feature.

It’s worth noting that both of these properties could be achieved with a connection-oriented protocol, but at the cost of a new end-to-end TLS connection for each message that clients wish to transmit. This can be prohibitively expensive for all entities that participate in the protocol.

So how exactly does OHTTP achieve these goals? Let’s dig deeper into OHTTP to find out.

Oblivious HTTP protocol design

A single transaction in OHTTP involves the following steps:

  1. A client encapsulates an HTTP request using the public key of the gateway server, and sends it to the relay over a client<>relay HTTPS connection.
  2. The relay forwards the request to the server over its own relay<>gateway HTTPS connection.
  3. The gateway decapsulates the request, forwarding it to the target server which can produce the resource.
  4. The gateway returns an encapsulated response to the relay, which then forwards the result to the client.

Observe that in this transaction the relay only ever sees the client and gateway identities (the client IP address and the gateway URL, respectively), but does not see any application data. Conversely, the gateway sees the application data and the relay IP address, but does not see the client IP address. Neither party has the full picture, and unless the relay and gateway collude, it stays that way.

The HTTP details for forwarding requests and responses in the transaction above are not technically interesting – a message is sent from sender to receiver over HTTPS using a POST – so we’ll skip over them. The fascinating bits are in the request and response encapsulation, which build upon HPKE, a recently ratified standard for hybrid public key encryption.

Let’s begin with request encapsulation, which is hybrid public key encryption. Clients first transform their HTTP request into a binary format, called Binary HTTP, as specified by RFC9292. Binary HTTP is, as the name suggests, a binary format for encoding HTTP messages. This representation lets clients encode HTTP requests to binary-encoded values and for the gateway to reverse this process, recovering an HTTP request from a binary-encoded value. Binary encoding is necessary because the public key encryption layer expects binary-encoded inputs.

Once the HTTP request is encoded in binary format, it is then fed into HPKE to produce an encrypted message, which clients then send to the relay to be forwarded to the gateway. The gateway decrypts this message, transforms the binary-encoded request back to its equivalent HTTP request, and then forwards it to the target server for processing.

 proving Oblivious HTTP privacy properties

Responses from the gateway are encapsulated back to the client in a very similar fashion. The gateway first encodes the response in an equivalent binary HTTP message, encrypts it using a symmetric key known only to the client and gateway, and then returns it to the relay to be forwarded to the client. The client decrypts and transforms this message to recover the result.

 proving Oblivious HTTP privacy propertiesSimplified model and security goals

In our formal analysis, we set out to make sure that OHTTP’s use of encryption and proxying achieves the desired privacy goals described above.

To motivate the analysis, consider the following simplified model where there exists two clients C1 and C2, one relay R, and one gateway, G. OHTTP assumes an attacker that can observe all network activity and can adaptively compromise either R or G, but not C1 or C2. OHTTP assumes that R and G do not collude, and so we assume only one of R and G is compromised. Once compromised, the attacker has access to all session information and private key material for the compromised party. The attacker is prohibited from sending client-identifying information, such as IP addresses, to the gateway. (This would allow the attacker to trivially link a query to the corresponding client.)

In this model, both C1 and C2 send OHTTP requests Q1 and Q2, respectively, through R to G, and G provides answers A1 and A2. The attacker aims to link C1 to (Q1, A1) and C2 to (Q2, A2), respectively. The attacker succeeds if this linkability is possible without any additional interaction. OHTTP prevents such linkability. Informally, this means:

  1. Requests and responses are known only to clients and gateways in possession of the corresponding response key and HPKE keying material.
  2. The gateway cannot distinguish between two identical requests generated from the same client, and two identical requests generated from different clients, in the absence of unique per-client keys.

And informally it might seem clear that OHTTP achieves these properties. But we want to prove this formally, which means that the design, if implemented perfectly, would have these properties. This type of formal analysis is distinct from formal verification, where you take a protocol design and prove that some code implements it correctly. Whilst both are useful they are different processes, and in this blog post we’ll be talking about the former. But first, let’s give some background on formal analysis.

Formal analysis programming model

In our setting, a formal analysis involves producing an algebraic description of the protocol and then using math to prove that the algebraic description has the properties we want. The end result is proof that shows that our idealized algebraic version of the protocol is “secure”, i.e. has the desired properties, with respect to an attacker we want to defend against. In our case, we chose to model our idealized algebraic version of OHTTP using a tool called Tamarin, a security-focused theorem prover and model checker. Tamarin is an intimidating tool to use, but makes intuitive sense once you get familiar with it. We’ll break down the various parts of a Tamarin model in the context of our OHTTP model below.

Modeling the Protocol Behavior

Tamarin uses a technique known as multiset rewriting to describe protocols. A protocol description is formed of a series of “rules” that can “fire” when certain requirements are met. Each rule represents a discrete step in the protocol, and when a rule fires that means the step was taken. For example, we have a rule representing the gateway generating its long-term public encapsulation key, and for different parties in the protocol establishing secure TLS connections. These rules can be triggered pretty much any time as they have no requirements.

 proving Oblivious HTTP privacy propertiesBasic rule for OHTTP gateway key generation

Tamarin represents these requirements as “facts”. A rule can be triggered when the right facts are available. Tamarin stores all the available facts in a “bag” or multiset. A multiset is similar to an ordinary set, in that it stores a collection of objects in an unordered fashion, but unlike an ordinary set, duplicate objects are allowed. This is the “multiset” part of “multiset rewriting”.

The rewriting part refers to the output of our rules. When a rule triggers it takes some available facts out of the bag and, when finished, inserts some new facts into the bag. These new facts might fulfill the requirements of some other rule, which can then be triggered, producing even more new facts, and so on1. In this way we can represent progress through the protocol. Using input and output facts, we can describe our rule for generating long-term public encapsulation keys, which has no requirements and produces a long-term key as output, as follows.

 proving Oblivious HTTP privacy properties

A rule requirement is satisfied if there exist output facts that match the rule’s input facts. As an example, in OHTTP, one requirement for the client rule for generating a request is that the long-term public encapsulation key exists. This matching is shown below.

 proving Oblivious HTTP privacy properties

Let’s put some of these pieces together to show a very small but concrete part of OHTTP as an example: the client generating its encapsulated request and sending it to the relay. This step should produce a message for the relay, as well as any corresponding state needed to process the eventual response from the relay. As a precondition, the client requires (1) the gateway public key and (2) a TLS connection to the relay. And as mentioned earlier, generating the public key and TLS connection do not require any inputs, so they can be done at any time.

 proving Oblivious HTTP privacy properties

Modeling events in time

Beyond consuming and producing new facts, each Tamarin rule can also create side effects, called “action facts.” Tamarin records the action facts each time a rule is triggered. An action fact might be something like “a client message containing the contents m was sent at time t.” Sometimes rules can only be triggered in a strict sequence, and we can therefore put their action facts in a fixed time order. At other times multiple rules might have their prerequisites met at the same time, and therefore we can’t put their action facts into a strict time sequence. We can represent this pattern of partially ordered implications as a directed acyclic graph, or DAG for short.

Altogether, multiset rewriting rules describe the steps of a protocol, and the resulting DAG records the actions associated with the protocol description. We refer to the DAG of actions as the action graph. If we’ve done our job well it’s possible to follow these rules and produce every possible combination of messages or actions allowed by the protocol, and their corresponding action graph.

As an example of the action graph, let’s consider what happens when the client successfully finishes the protocol. When the requirements for this rule are satisfied, the rule triggers, marking that the client is done and that the response was valid. Since the protocol is done at this point, there are no output facts produced.

 proving Oblivious HTTP privacy propertiesAction graph for terminal client response handler ruleModeling the attacker

The action graph is core to reasoning about the protocol’s security properties. We can check a graph for various properties, e.g. “does the first action taken by the relay happen after the first action taken by the client?”. Our rules allow for multiple runs of the protocol to happen at the same time. This is very powerful. We can look at a graph and ask “did something bad happen here that might break the protocol’s security properties?”

In particular, we can prove (security and correctness) properties by querying this graph, or by asserting various properties about it. For example, we might say “for all runs of the protocol, if the client finishes the protocol and can decrypt the response from the gateway, then the response must have been generated and encrypted by an entity which has the corresponding shared secret.”

This is a useful statement, but it doesn’t say much about security. What happens if the gateway private key is compromised, for example? In order to prove security properties, we need to define our threat model, which includes the adversary and their capabilities. In Tamarin, we encode the threat model as part of the protocol model. For example, when we define messages being passed from the client to the relay, we can add a special rule that allows the attacker to read it as it goes past. This gives us the ability to describe properties such as “for all runs of the protocol in our language the attacker never learns the secret key.”

For security protocols, we typically give the attacker the ability to read, modify, drop, and replay any message. This is sometimes described as “the attacker controls the network”, or a Dolev-Yao attacker. However, the attacker can also sometimes compromise different entities in a protocol, learning state associated with that entity. This is sometimes called an extended Dolev-Yao attacker, and it is precisely the attacker we consider in our model.

Going back to our model, we give the attacker the ability to compromise long-term key pairs and TLS sessions as needed through different rules. These set various action facts that mark the fact that compromise took place.

 proving Oblivious HTTP privacy propertiesAction graph for key compromise rule

Putting everything together, we have a way to model the protocol behavior, attacker capabilities, and security properties. Let’s now dive into how we applied these to prove OHTTP secure.

OHTTP Tamarin model

In our model, we give the attacker the ability to compromise the server’s long-term keys and the key between the client and the relay. Against this attacker, we aim to prove these two informal statements stated above:

  1. Requests and responses are known only to clients and gateways in possession of the corresponding response key and HPKE keying material.
  2. The gateway cannot distinguish between two requests generated from the same client, and two requests generated from different clients, in the absence of unique per-client keys.

To prove these formally, we express them somewhat differently. First, we assert that the protocol actually completes. This is an important step, because if your model has a bug in it where the protocol can’t even run as intended, then Tamarin is likely to say it’s “secure” because nothing bad ever happens.

For the core security properties, we translate the desired goals into questions we ask about the model. In this way, formal analysis only provides us proof (or disproof!) of the questions we ask, not the questions we should have asked, and so this translation relies on experience and expertise. We break down this translation for each of the questions we want to ask below, starting with gateway authentication.

Gateway authentication Unless the attacker has compromised the gateway’s long term keys, if the client completes the protocol and is able to decrypt the gateway’s response, then it knows that: the responder was the gateway it intended to use, the gateway derived the same keys, the gateway saw the request the client sent, and the response the client received is the one the gateway sent.

This tells us that the protocol actually worked, and that the messages sent and received were as they were supposed to be. One aspect of authentication can be that the participants agree on some data, so although this protocol seems to be a bit of a grab bag of properties they’re all part of one authentication property.

Next, we need to prove that the request and response remain secret. There are several ways in which secrecy may be violated, e.g., if encryption or decryption keys are compromised. We do so by proving the following properties.

Request and response secrecy
The request and response are both secret, i.e., the attacker never learns them, unless the attacker has compromised the gateway’s long term keys.

In a sense, request and response secrecy covers the case where the gateway is malicious, because if the gateway is malicious then the “attacker” knows the gateway’s long term keys.

Relay connection security
The contents of the connection between the client and relay are secret unless the attacker has compromised the relay.

We don’t have to worry about the secrecy of the connection if the client is compromised because in that scenario the attacker knows the query before it’s even been sent, and can learn the response by making an honest query itself. If your client is compromised then it’s game over.

AEAD nonce reuse resistance
If the gateway sends a message to the client, and the attacker finds a different message encrypted with the same key and nonce, then either the attacker has already compromised the gateway, or they already knew the query.

In translation, this property means that the response encryption is correct and not vulnerable to attack, such as through AEAD nonce reuse. This would obviously be a disaster for OHTTP, so we were careful to check that this situation never arises, especially as we’d already detected this issue in ODoH.

Finally, and perhaps most importantly, we want to prove that an attacker can’t link a particular query to a client. We prove a slightly different property which effectively argues that, unless the relay and gateway collude, then the attacker cannot link the encrypted query to its decrypted query together. In particular, we prove the following:

Client unlinkability
If an attacker knows the query and the contents of the connection sent to the relay (i.e. the encrypted query), then it must have compromised both the gateway and the relay.

This doesn’t in general prove indistinguishability. There are two techniques an attacker can use to link two queries. Direct inference and statistical analysis. Because of the anonymity trilemma we know that we cannot defend against statistical analysis, so we have to declare it out of scope and move on. To prevent direct inference we need to make sure that the attacker doesn't compromise either the client, or both the relay and the gateway together, which would let it directly link the queries. So is there anything we can protect against? Thankfully there is one thing. We can make sure that a malicious gateway can't identify that a single client sent two messages. We prove that by not keeping any state between connections. If a returning client acts in exactly the same way as a new client, and doesn't carry any state between requests, there's nothing for the malicious gateway to analyze.

And that’s it! If you want to have a go at proving some of these properties yourself our models and proofs are available on our GitHub, as are our ODoH models and proofs. The Tamarin prover is freely available too, so you can double-check all our work. Hopefully this post has given you a flavor of what we mean when we say that we’ve proven a protocol secure, and inspired you to have a go yourself. If you want to work on great projects like this check out our careers page.

1Depending on the model, this process can lead to an exponential blow-up in search space, making it impossible to prove anything automatically. Moreover, if the new output facts do not fulfill the requirements of any remaining rule(s) then the process hangs.

Categories: Technology

And here's another one: the Next.js Edge Runtime becomes the fourth full-stack framework supported by Cloudflare Pages

Mon, 24/10/2022 - 14:00
 the Next.js Edge Runtime becomes the fourth full-stack framework supported by Cloudflare Pages the Next.js Edge Runtime becomes the fourth full-stack framework supported by Cloudflare Pages

You can now deploy Next.js applications which opt in to the Edge Runtime on Cloudflare Pages. Next.js is the fourth full-stack web framework that the Pages platform officially supports, and it is one of the most popular in the 'Jamstack-y' space.

Cloudflare Pages started its journey as a platform for static websites, but with last year's addition of Pages Functions powered by Cloudflare Workers, the platform has progressed to support an even more diverse range of use cases. Pages Functions allows developers to sprinkle in small pieces of server-side code with its simple file-based routing, or, as we've seen with the adoption from other frameworks (namely SvelteKit, Remix and Qwik), Pages Functions can be used to power your entire full-stack app. The folks behind Remix previously talked about the advantages of adopting open standards, and we've seen this again with Next.js' Edge Runtime.

Next.js' Edge Runtime

Next.js' Edge Runtime is an experimental mode that developers can opt into which results in a different type of application being built. Previously, Next.js applications which relied on server-side rendering (SSR) functionality had to be deployed on a Node.js server. Running a Node.js server has significant overhead, and our Cloudflare Workers platform was fundamentally built on a different technology, V8.

However, when Next.js introduced the Edge Runtime mode in June 2022, we saw the opportunity to bring this widely used framework to our platform. We're very excited that this is being developed in coordination with the WinterCG standards to ensure interoperability across the various web platforms and to ensure that developers have the choice on where they run their business, without fearing any significant vendor lock-in.

It’s important to note that some existing Next.js apps built for Node.js won't immediately work on Pages. If your application relies on any Node.js built-ins or long-running processes, then Pages may not support your app with today’s announcement as we're working on expanding our support for Node.js.

However, we see the migration to the Edge Runtime as an effort that's worthy of investment, to run your applications, well, on the edge! These applications are cheaper to run, respond faster to users and have the latest features that full-stack frameworks offer. We're seeing increased interest in third-party npm packages and libraries that support standardized runtimes, and in combination with Cloudflare's data products (e.g. KV, Durable Objects and D1), we're confident that the edge is going to be the first place that people will want to deploy applications going forward.

Deploy your Next.js app to Cloudflare Pages

Let’s walk through an example, creating a new Next.js application that opts into this Edge Runtime and deploying it to Cloudflare Pages.

npx create-next-app@latest my-app

This will create a new Next.js app in the my-app folder. The default template comes with a traditional Node.js powered API route, so let's update that to instead use the Edge Runtime.

// pages/api/hello.js // Next.js Edge API Routes: export const config = { runtime: 'experimental-edge', } export default async function (req) { return new Response( JSON.stringify({ name: 'John Doe' }), { status: 200, headers: { 'Content-Type': 'application/json' } } ) }

Thanks to the Edge Runtime adopting the Web API standards, if you've ever written a Cloudflare Worker before, this might look familiar.

Next, we can update the global next.config.js configuration file to use the Edge Runtime. This will enable us to use the getServerSideProps() API and server-side render (SSR) our webpages.

// next.config.js /** @type {import('next').NextConfig} */ const nextConfig = { experimental: { runtime: 'experimental-edge', }, reactStrictMode: true, swcMinify: true, } module.exports = nextConfig

Finally, we're ready to deploy the project. Publish it to a GitHub or GitLab repository, create a new Pages project, and select "Next.js" from the list of framework presets. This will configure your project to use the @cloudflare/next-on-pages CLI which builds and transforms your project into something we can deploy on Pages. Navigate to the project settings and add an environment variable, NODE_VERSION set to 14 or greater, as well as the following compatibility flags: streams_enable_constructors and transformstream_enable_standard_constructor. You should now be able to deploy your Next.js application. If you want to read more, you can find a detailed guide in our documentation.

How it runs on Cloudflare PagesCompatibility Dates and Compatibility Flags

Cloudflare Workers has solved the versioning problem by introducing compatibility dates and compatibility flags. While it has been in beta, Pages Functions has always defaulted to using the oldest version of the Workers runtime. We've now introduced controls to allow developers to set these dates and flags on their Pages projects environments.

 the Next.js Edge Runtime becomes the fourth full-stack framework supported by Cloudflare Pages

By keeping this date recent, you are able to opt in to the latest features and bug fixes that the Cloudflare Workers runtime offers, but equally, you're completely free to keep the date on whatever works for you today, and we'll continue to support the functionality at that point in time, forever. We also allow you to set these dates for your production and preview environments independently which will let you test these changes out safely in a preview deployment before rolling it out in production.

We've been working on adding more support for the Streams API to the Workers Runtime, and some of this functionality is gated behind the flags we added to the project earlier. These flags are currently scheduled to graduate and become on-by-default on a future compatibility date, 2022-11-30.

The @cloudflare/next-on-pages CLI

Vercel introduced the Build Output API in July 2022 as a "zero configuration" directory structure which the Vercel platform inherently understands and can deploy. We've decided to hook into this same API as a way to build Next.js projects consistently that we can understand and deploy.

The open-source @cloudflare/next-on-pages CLI runs npx vercel build behind the scenes, which produces a .vercel/output directory. This directory conforms to the Build Output API, and of particular interest, contains a config.json, static folder and folder of functions. The @cloudflare/next-on-pages CLI then parses this config.json manifest, and combines all the functions into a single Pages Functions 'advanced mode' _worker.js.

At this point, the build is finished. Pages then automatically picks up this _worker.js and deploys it with Pages Functions atop the static directory.

Although currently just an implementation detail, we opted to use this Build Output API for a number of reasons. We’re also exploring other similar functionality natively on the Pages platform. We already have one "magical" directory, the functions directory which we use for the file-based routing of Pages Functions. It's possible that we offer other fixed directory structures which would reduce the need for configuration of any projects using frameworks which adopt the API. Let us know in Discord if you have any thoughts or preferences on what you would like to see!

Additionally, if more full-stack frameworks do adopt Vercel's Build Output API, we may have automatic support for them running on Pages with this CLI. We've only been experimenting with Next.js here so far (and SvelteKit, Remix and Qwik all have their own way of building their projects on Pages at the moment), but it's possible that in the future we may converge on a standard approach which could be shared between frameworks and platforms. We're excited to see how this might transpire. Again, let us know if you have thoughts!

Experimental webpack minification

As part of the compilation from .vercel/output/functions to an _worker.js, the @cloudflare/next-on-pages CLI can perform an experimental minification to give you more space for your application to run on Workers. Right now, most accounts are limited to a maximum script size of 1MB (although this can be raised in some circumstances—get in touch!). You can ordinarily fit quite a lot of code in this, but one thing notable about Next.js' build process at the moment is that it creates webpack-compiled, fully-formed and fully-isolated functions scripts in each of the directories in .vercel/output/functions. This means that each function ends up looking something like this:

let _ENTRIES = {}; (() => { // webpackBootstrap })(); (self["webpackChunk_N_E"] = self["webpackChunk_N_E"] || []).push([100], { 123: (() => { // webpack chunk #123 }), 234: (() => { // webpack chunk #234 }), 345: (() => { // webpack chunk #345 }), // …lots of webpack chunks… }, () => { // webpackRuntimeModules }]); export default { async fetch(request, env, ctx) { return _ENTRIES['some_function']; } }

The script contains everything that's needed to deploy this function, and most of the logic exists in these webpack chunks, but that means that each function has a lot of code shared with its siblings. Quickly, you'll reach the 1MB limit, if you naively deployed all these functions together.

Our @cloudflare/next-on-pages --experimental-minify CLI argument deals with this problem by analyzing webpack chunks which are re-used in multiple places in this .vercel/output/functions directory and extracts out that code to a common place. This allows our compiler (esbuild) to efficiently combine this code, without duplicating it in all of these places. This process is experimental for the time being, while we look to make this as efficient as possible, without introducing any bugs as a result. Please file an issue on GitHub if you notice any difference in behavior when using --experimental-minify.

What's next?

Pages Functions has been in beta for almost a year, and we're very excited to say that general availability is just around the corner. We're polishing off the last of the remaining features which includes analytics, logging, and billing. In fact, for billing, we recently made the announcement of how you'll be able to use the Workers Paid plan to remove the request limits of the Pages Functions beta from November 15.

Finally, we're also looking at how we can bring Wasm support to Pages Functions which will unlock ever more use-cases for your full-stack applications. Stay tuned for more information on how we'll be offering this soon.

Try creating a Next.js Edge Runtime application and deploying it to Cloudflare Pages with the example above or by following the guide in our documentation. Let us know if you have any questions or face any issues in Discord or on GitHub, and please report any quirks of the --experimental-minify argument. As always, we're excited to see what you build!

Categories: Technology

Page Shield can now watch for malicious outbound connections made by third-party JavaScript code

Fri, 21/10/2022 - 14:53
Page Shield can now watch for malicious outbound connections made by third-party JavaScript codePage Shield can now watch for malicious outbound connections made by third-party JavaScript code

Page Shield can now watch for malicious outbound connections made by third-party JavaScript code

Many websites use third party JavaScript libraries to cut development time by using pre-built features. Common examples include checkout services, analytics tools, or live chat integrations. Any one of these JavaScript libraries may be sending site visitors’ data to unknown locations.

If you manage a website, and you have ever wondered where end user data might be going and who has access to it, starting today, you can find out using Page Shield’s Connection Monitor.

Page Shield is our client side security solution that aims to detect malicious behavior and compromises that affect the browser environment directly, such as those that exploit vulnerabilities in third party JavaScript libraries.

Connection Monitor, available from today, is the latest addition to Page Shield and allows you to see outbound connections being made by your users’ browsers initiated by third party JavaScript added to your site. You can then review this information to ensure only appropriate third parties are receiving sensitive data.

Customers on our business and enterprise plans receive visibility in outbound connections provided by Connection Monitor. If you are using our Page Shield enterprise add-on, you also get notifications whenever a connection is found to be potentially malicious.

Covering more attack surface with Connection Monitor

Connection Monitor expands the net of opportunities to catch malicious behavior that might be happening in your users’ browsers by complementing the visibility provided by Script Monitor, the core feature of Page Shield before today.

While Script Monitor is focusing on analyzing JavaScript code to find malicious signals, Connection Monitor is looking at where data is sent to. The two features work perfectly together.

Very frequently, in fact, client side compromises within the context of web applications result in data exfiltration. The most well known example of this is Magecart-style attacks where a malicious actor would attempt to exfiltrate credit card data directly from the application’s check out flow (normally on e-commerce sites) without changing the application behavior.

These attacks are often hard to detect as they exploit JavaScript outside your direct control, for example an embedded widget, and operate without any noticeable effect on the user experience.

Page Shield can now watch for malicious outbound connections made by third-party JavaScript codeComplementing Content Security Policies

Page Shield uses Content Security Policies (CSPs) to receive data from the browser, but complements them by focusing on the core problem: detecting malicious behavior, something that CSPs don’t do out of the box.

Content Security Policies are widely adopted and allow you, as a website administrator, to tell browsers what the browser is allowed to load and from where. This is useful in principle, but in practice CSPs are hard to maintain for large applications, and often end up being very broad making them ineffective. More importantly, CSPs provide no built-in mechanism to detect malicious behavior. This is where Page Shield helps.

Before today, with Script Monitor, Page Shield would detect malicious behavior by focusing on JavaScript files only, by running, among other things, our classifier on JavaScript code. Starting today, with Connection Monitor, we also perform threat intelligence feed lookups against connection URL endpoints allowing us to quickly spot potentially suspicious data leaks.

Connection Monitor: under the hood

Connection Monitor uses the connect-src directive from Content Security Policies (CSPs) to receive information about outbound connections from browsers.

This information is then stored for easy access and enhanced with additional insights including connection status, connection page source, domain information, and if you have access to our enterprise add-on, threat feed intelligence.

To use Connection Monitor you need to proxy your application via Cloudflare. When turned on, it will, on a sampled percentage of HTML page loads only, insert the following HTTP response header that implements the Content Security Policy used to receive data:

content-security-policy-report-only: script-src 'none'; connect-src 'none'; report-uri <HOSTNAME>/cdn-cgi/script_monitor/report?<QUERY_STRING>

This HTTP response header asks the browser to send information regarding scripts (script-src) and connections (connect-src) to the given endpoint. By default, the endpoint hostname is, but you can change it to be the same hostname of your website if you are on our enterprise add-on.

Using the above CSP, browsers will report any connections initiated by:

  • <a> ping,
  • fetch(),
  • XMLHttpRequest,
  • WebSocket,
  • EventSource, and
  • Navigator.sendBeacon()

An example connection report is shown below:

"csp-report": { "document-uri": "", "referrer": "", "violated-directive": "connect-src", "effective-directive": "connect-src", "original-policy": "", "disposition": "report", "blocked-uri": "wss://", "line-number": 5, "column-number": 16, "source-file": "", "status-code": 200, "script-sample": "" }

Using reports like the one above, we can then create an inventory of outbound connection URLs alongside which pages they were initiated by. This data is then made available via the dashboard enhanced with:

  • Connection status: Active if the connection has been seen recently
  • Timestamps: First seen and last seen
  • Metadata: WHOIS information, SSL certificate information, if any, domain registration information
  • Malicious signals: Threat feed domain and URL lookups*

* URL feed lookups are only available if the full connection path is being stored.

A note on privacy

At Cloudflare, we want to ensure both direct customer and end customer privacy. For this reason, Connection Monitor by default will only store and collect the scheme and host portion of the connection URL, so for example, if the endpoint the browser is sending data to is:

Connection Monitor will only store and drop the path /session/abc. This ensures that we are minimizing the risk of storing session IDs, or other sensitive data that may be found in full URLs.

Not storing the path, does mean that in some specific circumstances, we are not able to do full URL feed lookups from our threat intelligence. For this reason, if you know you are not inserting sensitive data in connection paths, you can easily turn on path storage from the dashboard. Domain lookups will continue to work as expected. Support for also storing the query string will be added in the future.

Going further

Script Monitor and Connection Monitor are only two of many directives provided by CSP that we plan to support in Page Shield. Going further, there are a number of additional features we are already working on, including the ability to suggest and implement both positive and negative policies directly from the dashboard.

We are excited to see Connection Monitor providing additional visibility in application behavior and look forward to the next evolutions.

Categories: Technology

Cloudflare Workers and micro-frontends: made for one another

Thu, 20/10/2022 - 14:00
 made for one another

To help developers build better web applications we researched and devised a fragments architecture to build micro-frontends using Cloudflare Workers that is lightning fast, cost-effective to develop and operate, and scales to the needs of the largest enterprise teams without compromising release velocity or user experience.

Here we share a technical overview and a proof of concept of this architecture.

Why micro-frontends?

One of the challenges of modern frontend web development is that applications are getting bigger and more complex. This is especially true for enterprise web applications supporting e-commerce, banking, insurance, travel, and other industries, where a unified user interface provides access to a large amount of functionality. In such projects it is common for many teams to collaborate to build a single web application. These monolithic web applications, usually built with JavaScript technologies like React, Angular, or Vue, span thousands, or even millions of lines of code.

When a monolithic JavaScript architecture is used with applications of this scale, the result is a slow and fragile user experience with low Lighthouse scores. Furthermore, collaborating development teams often struggle to maintain and evolve their parts of the application, as their fates are tied with fates of all the other teams, so the mistakes and tech debt of one team often impacts all.

Drawing on ideas from microservices, the frontend community has started to advocate for micro-frontends to enable teams to develop and deploy their features independently of other teams. Each micro-frontend is a self-contained mini-application, that can be developed and released independently, and is responsible for rendering a “fragment” of the page. The application then combines these fragments together so that from the user's perspective it feels like a single application.

 made for one anotherAn application consisting of multiple micro-frontends

Fragments could represent vertical application features, like “account management” or “checkout”, or horizontal features, like “header” or “navigation bar”.

Client-side micro-frontends

A common approach to micro-frontends is to rely upon client-side code to lazy load and stitch fragments together (e.g. via Module Federation). Client-side micro-frontend applications suffer from a number of problems.

Common code must either be duplicated or published as a shared library. Shared libraries are problematic themselves. It is not possible to tree-shake unused library code at build time resulting in more code than necessary being downloaded to the browser and coordinating between teams when shared libraries need to be updated can be complex and awkward.

Also, the top-level container application must bootstrap before the micro-frontends can even be requested, and they also need to boot before they become interactive. If they are nested, then you may end up getting a waterfall of requests to get micro-frontends leading to further runtime delays.

These problems can result in a sluggish application startup experience for the user.

Server-side rendering could be used with client-side micro-frontends to improve how quickly a browser displays the application but implementing this can significantly increase the complexity of development, deployment and operation. Furthermore, most server-side rendering approaches still suffer from a hydration delay before the user can fully interact with the application.

Addressing these challenges was the main motivation for exploring an alternative solution, which relies on the distributed, low latency properties provided by Cloudflare Workers.

Micro-frontends on Cloudflare Workers

Cloudflare Workers is a compute platform that offers a highly scalable, low latency JavaScript execution environment that is available in over 275 locations around the globe. In our exploration we used Cloudflare Workers to host and render micro-frontends from anywhere on our global network.

Fragments architecture

In this architecture the application consists of a tree of “fragments” each deployed to Cloudflare Workers that collaborate to server-side render the overall response. The browser makes a request to a “root fragment”, which will communicate with “child fragments” to generate the final response. Since Cloudflare Workers can communicate with each other with almost no overhead, applications can be server-side rendered quickly by child fragments, all working in parallel to render their own HTML, streaming their results to the parent fragment, which combines them into the final response stream delivered to the browser.

 made for one anotherA high-level overview of a fragments architectureVisit the “Cloud Gallery”

We have built an example of a “Cloud Gallery” application to show how this can work in practice. It is deployed to Cloudflare Workers at

The demo application is a simple filtered gallery of cloud images built using our fragments architecture. Try selecting a tag in the type-ahead to filter the images listed in the gallery. Then change the delay on the stream of cloud images to see how the type-ahead filtering can be interactive before the page finishes loading.

Multiple Cloudflare Workers

The application is composed of a tree of six collaborating but independently deployable Cloudflare Workers, each rendering their own fragment of the screen and providing their own client-side logic, and assets such as CSS stylesheets and images.

 made for one anotherArchitectural overview of the Cloud Gallery app

The “main” fragment acts as the root of the application. The “header” fragment has a slider to configure an artificial delay to the display of gallery images. The “body” fragment contains the “filter” fragment and “gallery” fragments. Finally, the “footer” fragment just shows some static content.

The full source code of the demo app is available on GitHub.

Benefits and features

This architecture of multiple collaborating server-side rendered fragments, deployed to Cloudflare Workers has some interesting features.


Fragments are entirely encapsulated, so they can control what they own and what they make available to other fragments.

Fragments can be developed and deployed independently

Updating one of the fragments is as simple as redeploying that fragment. The next request to the main application will use the new fragment. Also, fragments can host their own assets (client-side JavaScript, images, etc.), which are streamed through their parent fragment to the browser.

Server-only code is not sent to the browser

As well as reducing the cost of downloading unnecessary code to the browser, security sensitive code that is only needed for server-side rendering the fragment is never exposed to other fragments and is not downloaded to the browser. Also, features can be safely hidden behind feature flags in a fragment, allowing more flexibility with rolling out new behavior safely.


Fragments are fully composable - any fragment can contain other fragments. The resulting tree structure gives you more flexibility in how you architect and deploy your application. This helps larger projects to scale their development and deployment. Also, fine-grain control over how fragments are composed, could allow fragments that are expensive to server-side render to be cached individually.

Fantastic Lighthouse scores

Streaming server-rendered HTML results in great user experiences and Lighthouse scores, which in practice means happier users and higher chance of conversions for your business.

 made for one anotherLighthouse scores for the Cloud Gallery app

Each fragment can parallelize requests to its child fragments and pipe the resulting HTML streams into its own single streamed server-side rendered response. Not only can this reduce the time to render the whole page but streaming each fragment through to the browser reduces the time to the first byte of each fragment.

Eager interactivity

One of the powers of a fragments architecture is that fragments can become interactive even while the rest of the application (including other fragments) is still being streamed down to the browser.

In our demo, the “filter” fragment is immediately interactive as soon as it is rendered, even if the image HTML for the “gallery” fragment is still loading.

To make it easier to see this, we added a slider to the top of the “header” that can simulate a network or database delay that slows down the HTML stream which renders the “gallery” images. Even when the “gallery” fragment is still loading, the type-ahead input, in the “filter” fragment, is already fully interactive.

Just think of all the frustration that this eager interactivity could avoid for web application users with unreliable Internet connection.

Under the hood

As discussed already this architecture relies upon deploying this application as many cooperating Cloudflare Workers. Let’s look into some details of how this works in practice.

We experimented with various technologies, and while this approach can be used with many frontend libraries and frameworks, we found the Qwik framework to be a particularly good fit, because of its HTML-first focus and low JavaScript overhead, which avoids any hydration problems.

Implementing a fragment

Each fragment is a server-side rendered Qwik application deployed to its own Cloudflare Worker. This means that you can even browse to these fragments directly. For example, the “header” fragment is deployed to

 made for one anotherA screenshot of the self-hosted “header” fragment

The header fragment is defined as a Header component using Qwik. This component is rendered in a Cloudflare Worker via a fetch() handler:

export default { fetch(request: Request, env: Record<string, unknown>): Promise<Response> { return renderResponse(request, env, <Header />, manifest, "header"); }, }; cloud-gallery/header/src/entry.ssr.tsx

The renderResponse() function is a helper we wrote that server-side renders the fragment and streams it into the body of a Response that we return from the fetch() handler.

The header fragment serves its own JavaScript and image assets from its Cloudflare Worker. We configure Wrangler to upload these assets to Cloudflare and serve them from our network.

Implementing fragment composition

Fragments that contain child fragments have additional responsibilities:

  • Request and inject child fragments when rendering their own HTML.
  • Proxy requests for child fragment assets through to the appropriate fragment.
Injecting child fragments

The position of a child fragment inside its parent can be specified by a FragmentPlaceholder helper component that we have developed. For example, the “body” fragment has the “filter” and “gallery” fragments.

<div class="content"> <FragmentPlaceholder name="filter" /> <FragmentPlaceholder name="gallery" /> </div> cloud-gallery/body/src/root.tsx

The FragmentPlaceholder component is responsible for making a request for the fragment and piping the fragment stream into the output stream.

Proxying asset requests

As mentioned earlier, fragments can host their own assets, especially client-side JavaScript files. When a request for an asset arrives at the parent fragment, it needs to know which child fragment should receive the request.

In our demo we use a convention that such asset paths will be prefixed with /_fragment/<fragment-name>. For example, the header logo image path is /_fragment/header/cf-logo.png. We developed a tryGetFragmentAsset() helper which can be added to the parent fragment’s fetch() handler to deal with this:

export default { async fetch( request: Request, env: Record<string, unknown> ): Promise<Response> { // Proxy requests for assets hosted by a fragment. const asset = await tryGetFragmentAsset(env, request); if (asset !== null) { return asset; } // Otherwise server-side render the template injecting child fragments. return renderResponse(request, env, <Root />, manifest, "div"); }, }; cloud-gallery/body/src/entry.ssr.tsxFragment asset paths

If a fragment hosts its own assets, then we need to ensure that any HTML it renders uses the special _fragment/<fragment-name> path prefix mentioned above when referring to these assets. We have implemented a strategy for this in the helpers we developed.

The FragmentPlaceholder component adds a base searchParam to the fragment request to tell it what this prefix should be. The renderResponse() helper extracts this prefix and provides it to the Qwik server-side renderer. This ensures that any request for client-side JavaScript has the correct prefix. Fragments can apply a hook that we developed called useFragmentRoot(). This allows components to gather the prefix from a FragmentContext context.

For example, since the “header” fragment hosts the Cloudflare and GitHub logos as assets, it must call the useFragmentRoot() hook:

export const Header = component$(() => { useStylesScoped$(HeaderCSS); useFragmentRoot(); return (...); }); cloud-gallery/header/src/root.tsx

The FragmentContext value can then be accessed in components that need to apply the prefix. For example, the Image component:

export const Image = component$((props: Record<string, string | number>) => { const { base } = useContext(FragmentContext); return <img {...props} src={base + props.src} />; }); cloud-gallery/helpers/src/image/image.tsxService-binding fragments

Cloudflare Workers provide a mechanism called service bindings to make requests between Cloudflare Workers efficiently that avoids network requests. In the demo we use this mechanism to make the requests from parent fragments to their child fragments with almost no performance overhead, while still allowing the fragments to be independently deployed.

Comparison to current solutions

This fragments architecture has three properties that distinguish it from other current solutions.

Unlike monoliths, or client-side micro-frontends, fragments are developed and deployed as independent server-side rendered applications that are composed together on the server-side. This significantly improves rendering speed, and lowers interaction latency in the browser.

Unlike server-side rendered micro-frontends with Node.js or cloud functions, Cloudflare Workers is a globally distributed compute platform with a region-less deployment model. It has incredibly low latency, and a near-zero communication overhead between fragments.

Unlike solutions based on module federation, a fragment's client-side JavaScript is very specific to the fragment it is supporting. This means that it is small enough that we don’t need to have shared library code, eliminating the version skew issues and coordination problems when updating shared libraries.

Future possibilities

This demo is just a proof of concept, so there are still areas to investigate. Here are some of the features we’d like to explore in the future.


Each micro-frontend fragment can be cached independently of the others based on how static its content is. When requesting the full page, the fragments only need to run server-side rendering for micro-frontends that have changed.

 made for one anotherAn application where the output of some fragments are cached

With per-fragment caching you can return the HTML response to the browser faster, and avoid incurring compute costs in re-rendering content unnecessarily.

Fragment routing and client-side navigation

Our demo application used micro-frontend fragments to compose a single page. We could however use this approach to implement page routing as well. When server-side rendering, the main fragment could insert the appropriate “page” fragment based on the visited URL. When navigating, client-side, within the app, the main fragment would remain the same while the displayed “page” fragment would change.

 made for one anotherAn application where each route is delegated to a different fragment

This approach combines the best of server-side and client-side routing with the power of fragments.

Using other frontend frameworks

Although the Cloud Gallery application uses Qwik to implement all fragments, it is possible to use other frameworks as well. If really necessary, it’s even possible to mix and match frameworks.

To achieve good results, the framework of choice should be capable of server-side rendering, and should have a small client-side JavaScript footprint. HTML streaming capabilities, while not required, can significantly improve performance of large applications.

 made for one anotherAn application using different frontend frameworksIncremental migration strategies

Adopting a new architecture, compute platform, and deployment model is a lot to take in all at once, and for existing large applications is prohibitively risky and expensive. To make this  fragment-based architecture available to legacy projects, an incremental adoption strategy is a key.

Developers could test the waters by migrating just a single piece of the user-interface within their legacy application to a fragment, integrating with minimal changes to the legacy application. Over time, more of the application could then be moved over, one fragment at a time.

Convention over configuration

As you can see in the Cloud Gallery demo application, setting up a fragment-based micro-frontend requires quite a bit of configuration. A lot of this configuration is very mechanical and could be abstracted away via conventions and better tooling. Following productivity-focused precedence found in Ruby on Rails, and filesystem based routing meta-frameworks, we could make a lot of this configuration disappear.

Try it yourself!

There is still so much to dig into! Web applications have come a long way in recent years and their growth is hard to overstate. Traditional implementations of micro-frontends have had only mixed success in helping developers scale development and deployment of large applications. Cloudflare Workers, however, unlock new possibilities which can help us tackle many of the existing challenges and help us build better web applications.

Thanks to the generous free plan offered by Cloudflare Workers, you can check out the Gallery Demo code and deploy it yourself.

If all of these sounds interesting to you, and you would like to work with us on improving the developer experience for Cloudflare Workers, we are also happy to share that we are hiring!

Categories: Technology

Cloudflare Pages gets even faster with Early Hints

Fri, 07/10/2022 - 14:00
Cloudflare Pages gets even faster with Early Hints

This post is also available in Deutsch, Español and Français.

Cloudflare Pages gets even faster with Early Hints

Last year, we demonstrated what we meant by “lightning fast”, showing Pages' first-class performance in all parts of the world, and today, we’re thrilled to announce an integration that takes this commitment to speed even further – introducing Pages support for Early Hints! Early Hints allow you to unblock the loading of page critical resources, ahead of any slow-to-deliver HTML pages. Early Hints can be used to improve the loading experience for your visitors by significantly reducing key performance metrics such as the largest contentful paint (LCP).

What is Early Hints?

Early Hints is a new feature of the Internet which is supported in Chrome since version 103, and that Cloudflare made generally available for websites using our network. Early Hints supersedes Server Push as a mechanism to "hint" to a browser about critical resources on your page (e.g. fonts, CSS, and above-the-fold images). The browser can immediately start loading these resources before waiting for a full HTML response. This uses time that was otherwise previously wasted! Before Early Hints, no work could be started until the browser received the first byte of the response. Now, the browser can fill this time usefully when it was previously sat waiting. Early Hints can bring significant improvements to the performance of your website, particularly for metrics such as LCP.

How Early Hints works

Cloudflare caches any preload and preconnect type Link headers sent from your 200 OK response, and sends them early for any subsequent requests as a 103 Early Hints response.

In practical terms, an HTTP conversation now looks like this:


GET / Host:

Early Hints Response

103 Early Hints Link: </styles.css>; rel=preload; as=style


200 OK Content-Type: text/html; charset=utf-8 Link: </styles.css>; rel=preload; as=style <html> <!-- ... --> </html> Early Hints on Cloudflare Pages

Websites hosted on Cloudflare Pages can particularly benefit from Early Hints. If you're using Pages Functions to generate dynamic server-side rendered (SSR) pages, there's a good chance that Early Hints will make a significant improvement on your website.

Performance Testing

We created a simple demonstration e-commerce website in order to evaluate the performance of Early Hints.

Cloudflare Pages gets even faster with Early Hints

This landing page has the price of each item, as well as a remaining stock counter. The page itself is just hand-crafted HTML and CSS, but these pricing and inventory values are being templated in live for every request with Pages Functions. To simulate loading from an external data-source (possibly backed by KV, Durable Objects, D1, or even an external API like Shopify) we've added a fixed delay before this data resolves. We include preload links in our response to some critical resources:

  • an external CSS stylesheet,
  • the image of the t-shirt,
  • the image of the cap,
  • and the image of the keycap.

The very first request makes a waterfall like you might expect. The first request is held blocked for a considerable amount of time while we resolve this pricing and inventory data. Once loaded, the browser parses the HTML, pulls out the external resources, and makes subsequent requests for their contents. The CSS and images extend the loading time considerably given their large dimensions and high quality. The largest contentful paint (LCP) occurs when the t-shirt image loads, and the document finishes once all requests are fulfilled.

Cloudflare Pages gets even faster with Early Hints

Subsequent requests are where things get interesting! These preload links are cached on Cloudflare's global network, and are sent ahead of the document in a 103 Early Hints response. Now, the waterfall looks much different. The initial request goes out the same, but now, requests for the CSS and images slide much further left since they can be started as soon as the 103 response is delivered. The browser starts fetching those resources while waiting for the original request to finish server-side rendering. The LCP again occurs once the t-shirt image has loaded, but this time, it's brought forward by 530ms because it started loading 752ms faster, and the document is fully loaded 562ms faster, again because the external resources could all start loading faster.

Cloudflare Pages gets even faster with Early Hints

The final four requests (highlighted in yellow) come back as 304 Not Modified responses using a If-None-Match header. By default, Cloudflare Pages requires the browser to confirm that all assets are fresh, and so, on the off chance that they were updated between the Early Hints response and when they come to being used, the browser is checking if they have changed. Since they haven't, there's no contentful body to download, and the response completes quickly. This can be avoided by setting a custom Cache-Control header on these assets using a _headers file. For example, you could cache these images for one minute with a rule like:

# _headers /*.png Cache-Control: max-age=60

We could take this performance audit further by exploring other features that Cloudflare offers, such as automatic CSS minification, Cloudflare Images, and Image Resizing.

We already serve Cloudflare Pages from one of the fastest networks in the world — Early Hints simply allows developers to take advantage of our global network even further.

Using Early Hints and Cloudflare Pages

The Early Hints feature on Cloudflare is currently restricted to caching Link headers in a webpage's response. Typically, this would mean that Cloudflare Pages users would either need to use the _headers file, or Pages Functions to apply these headers. However, for your convenience, we've also added support to transform any <link> HTML elements you include in your body into Link headers. This allows you to directly control the Early Hints you send, straight from the same document where you reference these resources – no need to come out of HTML to take advantage of Early Hints.

For example, for the following HTML document, will generate an Early Hints response:

HTML Document

<!DOCTYPE html> <html> <head> <link rel="preload" as="style" href="/styles.css" /> </head> <body> <!-- ... --> </body> </html>

Early Hints Response

103 Early Hints Link: </styles.css>; rel=preload; as=style

As previously mentioned, Link headers can also be set with a _headers file if you prefer:

# _headers / Link: </styles.css>; rel=preload; as=style

Early Hints (and the automatic HTML <link> parsing) has already been enabled automatically for all domains. If you have any custom domains configured on your Pages project, make sure to enable Early Hints on that domain in the Cloudflare dashboard under the "Speed" tab. More information can be found in our documentation.

Additionally, in the future, we hope to support the Smart Early Hints features. Smart Early Hints will enable Cloudflare to automatically generate Early Hints, even when no Link header or <link> elements exist, by analyzing website traffic and inferring which resources are important for a given page. We'll be sharing more about Smart Early Hints soon.

In the meantime, try out Early Hints on Pages today! Let us know how much of a loading improvement you see in our Discord server.

Watch on Cloudflare TV
Categories: Technology
Additional Terms