Blogroll: CloudFlare

I read blogs, as well as write one. The 'blogroll' on this site reproduces some posts from some of the people I enjoy reading. There are currently 36 posts from the blog 'CloudFlare.'

Disclaimer: Reproducing an article here does not necessarily imply agreement or endorsement!

The Cloudflare Blog
Updated: 1 week 22 hours ago

Add Watermarks to your Cloudflare Stream Video Uploads

Fri, 11/09/2020 - 12:00

Since the launch of Cloudflare Stream, our customers have been asking for a programmatic way to add watermarks to their videos. We built the Watermarks API to support a wide range of use cases: from customers who simply want to tell Stream “can you put this watermark image to the top right of my video?” to customers with more detailed asks such as “can you put this watermark image in a way it doesn’t take up more than 10% of the original video and with 20% opacity?” All that and more is now available at no additional cost through the Watermarks API.

What is Cloudflare Stream?

Cloudflare Stream provides out-of-the-box video infrastructure so developers can bring their app ideas to market faster. While building a video streaming app, developers must ask themselves questions like

  • Where do we store the videos affordably?
  • How do we encode the videos to support users with varying Internet speeds?
  • How do we maintain our video pipeline in the long term?

Cloudflare Stream is a single product that handles video encoding, storage, delivery and presentation (with the Stream Player). Stream lets developers launch their ideas faster while having the confidence the video infrastructure will scale with their app’s growth.

How the Watermarks API works

The Watermarks API lets you add a watermark to a video at the time of upload. It consists of two additions to the Stream API:

  • A new /stream/watermarks endpoint that lets you create watermark profiles and returns a uid, a unique identifier for each watermark profile
  • Support for a watermark object containing the uid of the watermark profile that can be passed at the time of upload

Step 1: Creating a Watermark Profile

A watermark profile describes the nature of the watermark, including the image to use as a watermark and properties such as its positioning, padding and scale.


In this example, we are going to create a watermark profile that places the Cloudflare logo to the lower left of the video:

curl --request POST \
  --url "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/stream/watermarks" \
  --header 'content-type: application/json' \
  --header "x-auth-email: $CLOUDFLARE_EMAIL" \
  --header "x-auth-key: $CLOUDFLARE_KEY" \
  --data '{
    "url": "https://storage.googleapis.com/zaid-test/Watermarks%20Demo/cf-icon.png",
    "name": "Cloudflare Icon",
    "opacity": 0.5,
    "padding": 0.05,
    "scale": 0.1,
    "position": "lowerLeft"
  }'

The response contains information about the watermark profile, including a uid that we will use in the next step:

{
  "result": {
    "uid": "a85d289c2e3f82701103620d16cd2408",
    "size": 9165,
    "height": 504,
    "width": 600,
    "created": "2020-09-03T20:43:56.337486Z",
    "downloadedFrom": "REDACTED_VIDEO_URL",
    "name": "Cloudflare Icon",
    "opacity": 0.5,
    "padding": 0.05,
    "scale": 0.1,
    "position": "lowerLeft"
  },
  "success": true,
  "errors": [],
  "messages": []
}

Step 2: Apply the Watermark

We’ve created the watermark and are ready to use it. Below is a screengrab from the Built For This commercial. It contains no watermark:

[Screengrab: a frame from the Built For This commercial, without a watermark]

We are going to upload the commercial and request Stream to add the logo from the previous step as a watermark:

curl --request POST \
  --url "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/stream/copy" \
  --header 'content-type: application/json' \
  --header "x-auth-email: $EMAIL" \
  --header "x-auth-key: $AUTH_KEY" \
  --data '{
    "url": "https://storage.googleapis.com/zaid-test/Watermarks%20Demo/The%20Internet%20was%20BuiltForThis.mp4",
    "watermark": {
      "uid": "a85d289c2e3f82701103620d16cd2408"
    }
  }'

Step 3: Your video, now with a watermark!

You’re done! You can watch the video with a watermark:

[Embedded video: the commercial, now playing with the Cloudflare logo watermark in the lower left]

What’s next

Read the detailed Watermark API docs covering different use cases.

In future iterations, we plan to add support for animated watermarks. Additionally, we want to add Watermark support to the Stream Dashboard so you have a UI to manage and add watermarks.

Categories: Technology

Migrating cdnjs to serverless with Workers KV

Thu, 10/09/2020 - 12:00

Cloudflare powers cdnjs, an open-source project that accelerates websites by delivering popular JavaScript libraries and resources via Cloudflare’s network. Since our major update in December, we focused on remodelling cdnjs for scalability and resilience. Today, we are excited to announce how Cloudflare delivers cdnjs—a migration to a serverless infrastructure using Cloudflare Workers and its distributed key-value store Workers KV!

What is cdnjs and why do I care?

For those unfamiliar, cdnjs is an acronym describing a Content Delivery Network (CDN) for JavaScript (JS). A CDN simply refers to a geographically distributed network of servers that provide Internet content, whether it is memes, cat videos, or HTML pages. In our case, the CDN refers to Cloudflare’s ever expanding network of over 200 globally distributed data centers.

And here’s why this is relevant to you: it makes page load times lightning-fast. Virtually every website you visit needs to fetch JS libraries in order to load, including this one. Let’s say you visit a Sydney-based website that contains a local file from jQuery, a popular library found in 76.2% of websites. If you are located in New York, you may notice a delay, as it can easily exceed 300ms to fetch the file—not to mention the time it takes for the round trips involved with the TLS handshake. However, if the website references jQuery using cdnjs.cloudflare.com, you can retrieve the file from the closest Cloudflare data center in Buffalo, reducing the latency to a blazing 20ms.

While cdnjs operates behind the scenes, it is used by over 11% of websites, making the Internet a much faster and more reliable place. In July, cdnjs served almost 190 billion requests—an enormous 3.46PB of data.

Where are the files stored?


While cdnjs speeds up the Internet, it certainly isn’t magic!

Historically, a number of load-balanced machines at one of Cloudflare’s core data centers would periodically pull cdnjs files from a backing store, acting as the origin for cdnjs.cloudflare.com. When a new file is requested, it is cached by Cloudflare, allowing it to be fetched quickly from any of our data centers.

The backing store is a catalogue of JS, CSS, and other web libraries in the form of an open-source GitHub repository. What this means is that anyone—including you—can contribute to it, subject to review and other processes.

However, until recently, these existing operations were very labor intensive and fragile.

This blog post will explain why we changed the infrastructure behind cdnjs to make it faster, more reliable, and easier to maintain. First, we will discuss how the community used to contribute to cdnjs, outlining the pains and concerns of the old system. Then, we will explore the benefits of migrating to Workers KV. After, we will dive into the new architecture, as well as upgrades to the website and cdnjs API. Finally, we will review the history of cdnjs, and where it is headed in the future.

If you think you know how to make a PR, think again

For the non-technical reader, a pull request (PR) is a request to merge changes you’ve made to a repository. Traditionally, if you wanted to include your JavaScript library in cdnjs, you would first create a PR on GitHub to cdnjs/cdnjs with a JSON file describing your package and additional files for any version you wished to include. Once your PR was approved by our old bot, manually reviewed, and then merged by a maintainer, your package would be integrated with cdnjs.

Sounds easy, right? You can just fork the repo, clone it, and copy-paste a few files, no?

Exactly. Contributing was easy if you had several hours to burn, a case-sensitive file system, and a couple hundred gigabytes of free disk space to git clone the 300GB repo. If you were short on time—no problem, you could always use your advanced knowledge of git sparse-checkout to get the job done. Don’t know git? Just add one file at a time manually through GitHub’s UI.

I think you get the point. I know I certainly did when I naively spent 10 hours cloning the repo, only to discover that macOS is case-insensitive by default.

However, updating cdnjs was not only difficult for the contributors, but also for the maintainers. Historically, the community was able to contribute version files directly, which could potentially be malicious. This created lots of work for maintainers, requiring them to inspect each file manually, diffing files against the official library source and running malware checks.

So how did packages update once they were in cdnjs? In the JSON file describing each package, there was an optional auto-update definition telling the bot where to look for new versions of the library. If present, when your package released a new version on npm or GitHub, the bot would download it, pushing the files to cdnjs/cdnjs and the computed Subresource Integrity (SRI) hashes to cdnjs/SRIs. If the auto-update property was missing, it would be your responsibility to make manual PRs to update cdnjs with any future versions.
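
For context, an SRI hash is simply a base64-encoded cryptographic digest of the file, prefixed with the name of the hash algorithm. A minimal sketch in Go of computing one (the bot’s actual implementation is not shown in this post):

package main

import (
	"crypto/sha512"
	"encoding/base64"
	"fmt"
)

// sriSHA512 computes a Subresource Integrity hash in the
// "sha512-<base64 digest>" form used by integrity="..." attributes.
func sriSHA512(contents []byte) string {
	sum := sha512.Sum512(contents)
	return "sha512-" + base64.StdEncoding.EncodeToString(sum[:])
}

func main() {
	fmt.Println(sriSHA512([]byte("console.log('hello');")))
}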

A wake-up call for cdnjs

In April, during maintenance at one of our core data centers, a technician accidentally disconnected the cables supplying all external connections to our other data centers, causing the data center to go offline for approximately four hours. This incident served as the first wake-up call for cdnjs, especially since the affected data center housed the primary cdnjs origin web servers. In this case, we did have a backup running on an external provider, but what really saved us was Cloudflare’s global cache, which minimized the impact of the outage as only uncached assets failed to load.

We started to think about how we could improve both the reliability and performance of how we serve cdnjs. We went straight to Cloudflare Workers, our own platform for developing on the edge. One powerful tool built into Workers is Workers KV—a low-latency, globally distributed key-value store optimized for high-read applications.

We put two and two together, realizing that instead of pulling the cdnjs/cdnjs repository and serving files from disk, we could cut the physical machines out entirely, distributing the data around the world and serving files straight from the edge. That way, cdnjs would be able to recover from any origin data center failure, while also increasing its scalability.

Workers KV to the rescue

At first glance, the decision to use Workers KV was a no-brainer. Since files in cdnjs never change but require frequent reads, Workers KV was a perfect fit.

However, as we planned our migration, we became concerned that with over 7 million assets in cdnjs, there would undoubtedly exist files that exceed Workers KV’s 10MiB value limit. After investigating, we discovered that several hundred cdnjs files were oversized, the majority being JavaScript Source Maps.

Then the idea hit us. We could store compressed versions of cdnjs files in Workers KV, not only solving our oversized file issue, but also optimizing how we serve files.

If you pay the Internet bill, you’ll know that bandwidth is expensive! For this reason, all modern browsers will try to fetch compressed web content whenever it is available. Similarly, within Cloudflare we often experiment with on-the-fly compression to reduce our bandwidth, always serving compressed content to the eyeball when it is accepted. As a result, we decided to compress all cdnjs files ahead of time, writing them to Workers KV with both optimal Brotli and gzip forms. That way, we could increase the compression level compared to on-the-fly compression as we no longer have the latency requirements.

This means we now serve cdnjs files faster and smaller!
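
As a rough sketch of that ahead-of-time compression step, here is what it might look like in Go, assuming the third-party github.com/andybalholm/brotli package (the standard library has no Brotli writer); the real cdnjs pipeline is not shown in the post:

package main

import (
	"bytes"
	"compress/gzip"
	"fmt"

	"github.com/andybalholm/brotli"
)

// compressBoth produces the Brotli and gzip representations of a file at
// maximum compression levels. Because this runs once at write time, there is
// no latency constraint, so we can afford higher levels than on-the-fly
// compression would allow.
func compressBoth(raw []byte) (br, gz []byte, err error) {
	var brBuf bytes.Buffer
	bw := brotli.NewWriterLevel(&brBuf, brotli.BestCompression) // level 11
	if _, err := bw.Write(raw); err != nil {
		return nil, nil, err
	}
	if err := bw.Close(); err != nil {
		return nil, nil, err
	}

	var gzBuf bytes.Buffer
	gw, _ := gzip.NewWriterLevel(&gzBuf, gzip.BestCompression) // level 9
	if _, err := gw.Write(raw); err != nil {
		return nil, nil, err
	}
	if err := gw.Close(); err != nil {
		return nil, nil, err
	}
	return brBuf.Bytes(), gzBuf.Bytes(), nil
}

func main() {
	br, gz, err := compressBoth([]byte("/* a cdnjs library file */"))
	if err != nil {
		panic(err)
	}
	fmt.Printf("brotli: %d bytes, gzip: %d bytes\n", len(br), len(gz))
}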

A complete makeover for cdnjs


Today, if you want to include your JavaScript library in cdnjs, you first create a PR on GitHub to our new repository cdnjs/packages. The repo is easily cloneable at 50MB and consists of thousands of JSON files, each describing a cdnjs package and how it is auto-updated from npm or git. Once your file is validated by our automated CI—powered by a new bot—and merged by a maintainer, your package is automatically enrolled in our auto-update service.

In the new system, security and maintainability are prioritized. For starters, cdnjs version files are created by our bot, minimizing the possibility of human error when merging a new version. While the JSON files in cdnjs/packages are added by error-prone humans, they are inspected by our bot before being approved by a maintainer. Each file is automatically validated against a JSON schema, as well as checked for popularity on npm or GitHub.

When the bot discovers a new release, it pushes Brotli and gzip-compressed versions of the files to a files namespace in Workers KV. With each entry, the bot writes some metadata in Workers KV for the ETag and Last-Modified HTTP headers. Similar to before, the bot also computes Subresource Integrity (SRI) hashes of the uncompressed files, but now pushes them instead to an SRIs namespace in Workers KV.

Then, when a new file is requested from cdnjs.cloudflare.com, a Cloudflare Worker will inspect the client’s Accept-Encoding header, fetching either the Brotli or gzip-compressed version with its ETag and Last-Modified metadata from Workers KV. As the compressed file travels back through Cloudflare, it is cached for future requests and uncompressed on-the-fly if needed.
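
The production logic is a Cloudflare Worker, but the content negotiation it performs can be sketched as an ordinary HTTP handler in Go (the store map and the "#br"/"#gz" key suffixes are invented for illustration, not cdnjs’s real key scheme):

package main

import (
	"net/http"
	"strings"
)

// asset is one precompressed variant of a file, plus the metadata stored
// alongside it for the ETag and Last-Modified headers.
type asset struct {
	body         []byte
	etag         string
	lastModified string
}

// store stands in for the files namespace in Workers KV.
var store = map[string]asset{}

func serveFile(w http.ResponseWriter, r *http.Request) {
	// Prefer Brotli when the client advertises it; otherwise fall back to
	// gzip. (A production version would parse Accept-Encoding properly,
	// q-values and all, rather than substring matching.)
	enc, key := "gzip", r.URL.Path+"#gz"
	if strings.Contains(r.Header.Get("Accept-Encoding"), "br") {
		enc, key = "br", r.URL.Path+"#br"
	}
	a, ok := store[key]
	if !ok {
		// Oversized files are not in KV; the real system falls back to the
		// legacy origin here.
		http.NotFound(w, r)
		return
	}
	w.Header().Set("Content-Encoding", enc)
	w.Header().Set("ETag", a.etag)
	w.Header().Set("Last-Modified", a.lastModified)
	w.Write(a.body)
}

func main() {
	http.HandleFunc("/", serveFile)
	http.ListenAndServe(":8080", nil)
}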

At the moment, there are still a handful of files exceeding Workers KV’s size limit. Consequently, if the Cloudflare Worker fails to retrieve a file from Workers KV, it is fetched from the origin backed by the original git repo. In the coming months, we plan on gradually removing this infrastructure.

Scaling the website and API

Besides the core cdnjs infrastructure, many of its other components received upgrades as well!

On the cdnjs project’s homepage, you will be greeted by a slick new beta website built by Matt. Constructed with Vue and Nuxt, the beta website is powered entirely by the cdnjs API. As a result, it is always up-to-date with the latest package information and requires low resource usage to serve the site—which runs completely on the client-side after the first page load—helping us scale with cdnjs’s never-ending growth.

In fact, the cdnjs API also strengthened its scalability, benefitting from a serverless architecture close to the one we have seen with cdnjs and Workers KV.

Before migrating to Workers KV, the cdnjs API relied on a regularly scheduled process that involved generating about 300MB of metadata. The cdnjs API’s backend would then fetch this enormous “package.min.js” file into memory and use it to operate the API. If you are curious, the file is still being hosted here, but be warned—it may lag your browser! Similarly, file SRIs were pushed to cdnjs/SRIs, which was cloned by the API locally to serve SRI responses.

After all cdnjs files (within the permitted size limit) were moved to Workers KV, these legacy processes became unsustainable, requiring millions of reads and an unreasonable amount of time. Therefore, we decided to upload all of the metadata into Workers KV. We split the metadata into four namespaces—one for package-level metadata, one for version-specific metadata, one containing aggregated metadata, and one for file SRIs.

Similar to cdnjs’s serverless design, a Cloudflare Worker sits on top of metadata.speedcdnjs.com, serving data from Workers KV using several public endpoints. Currently, the cdnjs API is fully integrated with these endpoints, which provide an elegant solution as cdnjs continues to scale.

Transparency and the future of cdnjs

Since its birth in January 2011, cdnjs has always been deeply rooted in transparency, deriving its strength from the community. Even when cdnjs exploded in size and its founders Ryan Kirkman and Thomas Davis teamed up with us in June 2011, the project remained entirely open-source on GitHub.

As the years passed, it became harder for the founders to stay active, heavily depending on the community for support. With a nearly nonexistent budget and little access to the repository, core cdnjs maintainers were challenged every day to keep the project alive.

Last year, this led us to contact the founders, who were happy to have our assistance with the project. With Cloudflare’s increased role, cdnjs is as stable as ever, with active members from both Cloudflare and the community.

However, as we remove our reliance on the legacy system and store files in Workers KV, there are concerns that cdnjs will become proprietary. Don’t worry, we are working hard to ensure that cdnjs remains as transparent and open-source as possible. To help the community audit updates to Workers KV, there is a new repository, cdnjs/logs, which is used by the bot to log all Workers KV-related events. Furthermore, anyone can validate the integrity of cdnjs files by fetching SRIs from the cdnjs API.

Conclusion

Overall, this past year has been a turbulent time for cdnjs, but all of its shortcomings have acted as red flags to help us build a better system. Most recently, we have mitigated the risks of depending on physical machines at a single location, migrating cdnjs to a serverless infrastructure where its files are stored in Workers KV.

Today, cdnjs is in good hands, and is not going away anytime soon. Shout out especially to the maintainers Sven and Matt for creating tons of momentum with the project, working on everything from scaling cdnjs to editing this post.

Moving forward, we are committed to making cdnjs as transparent as possible. As we continue to improve cdnjs, we will release more blog posts to keep the community up to date. If you are interested, please subscribe to our blog. After all, it is the community that makes cdnjs possible! A special thanks to our active GitHub contributors and members of the cdnjs Community Forum for sticking with us!

Categories: Technology

Unimog - Cloudflare’s edge load balancer

Wed, 09/09/2020 - 12:00

As the scale of Cloudflare’s edge network has grown, we sometimes reach the limits of parts of our architecture. About two years ago we realized that our existing solution for spreading load within our data centers could no longer meet our needs. We embarked on a project to deploy a Layer 4 Load Balancer, internally called Unimog, to improve the reliability and operational efficiency of our edge network. Unimog has now been deployed in production for over a year.

This post explains the problems Unimog solves and how it works. Unimog builds on techniques used in other Layer 4 Load Balancers, but there are many details of its implementation that are tailored to the needs of our edge network.

The role of Unimog in our edge network

Cloudflare operates an anycast network, meaning that our data centers in 200+ cities around the world serve the same IP addresses. For example, our own cloudflare.com website uses Cloudflare services, and one of its IP addresses is 104.17.175.85. All of our data centers will accept connections to that address and respond to HTTP requests. By the magic of Internet routing, when you visit cloudflare.com and your browser connects to 104.17.175.85, your connection will usually go to the closest (and therefore fastest) data center.


Inside those data centers are many servers. The number of servers in each varies greatly (the biggest data centers have a hundred times more servers than the smallest ones). The servers run the application services that implement our products (our caching, DNS, WAF, DDoS mitigation, Spectrum, WARP, etc). Within a single data center, any of the servers can handle a connection for any of our services on any of our anycast IP addresses. This uniformity keeps things simple and avoids bottlenecks.


But if any server within a data center can handle any connection, when a connection arrives from a browser or some other client, what controls which server it goes to? That’s the job of Unimog.

There are two main reasons why we need this control. The first is that we regularly move servers in and out of operation, and servers should only receive connections when they are in operation. For example, we sometimes remove a server from operation in order to perform maintenance on it. And sometimes servers are automatically removed from operation because health checks indicate that they are not functioning correctly.

The second reason concerns the management of the load on the servers (by load we mean the amount of computing work each one needs to do). If the load on a server exceeds the capacity of its hardware resources, then the quality of service to users will suffer. The performance experienced by users degrades as a server approaches saturation, and if a server becomes sufficiently overloaded, users may see errors. We also want to prevent servers being underloaded, which would reduce the value we get from our investment in hardware. So Unimog ensures that the load is spread across the servers in a data center. This general idea is called load balancing (balancing because the work has to be done somewhere, and so for the load on one server to go down, the load on some other server must go up).

Note that in this post, we’ll discuss how Cloudflare balances the load on its own servers in edge data centers. But load balancing is a requirement that occurs in many places in distributed computing systems. Cloudflare also has a Layer 7 Load Balancing product to allow our customers to balance load across their servers. And Cloudflare uses load balancing in other places internally.

Deploying Unimog led to a big improvement in our ability to balance the load on our servers in our edge data centers. Here’s a chart for one data center, showing the difference due to Unimog. Each line shows the processor utilization of an individual server (the colour of the lines indicates server model). The load on the servers varies during the day with the activity of users close to this data center. The white line marks the point when we enabled Unimog. You can see that after that point, the load on the servers became much more uniform. We saw similar results when we deployed Unimog to our other data centers.

[Chart: per-server processor utilization in one data center, with the lines converging after Unimog was enabled]

How Unimog compares to other load balancers

There are a variety of techniques for load balancing. Unimog belongs to a category called Layer 4 Load Balancers (L4LBs). L4LBs direct packets on the network by inspecting information up to layer 4 of the OSI network model, which distinguishes them from the more common Layer 7 Load Balancers.

The advantage of L4LBs is their efficiency. They direct packets without processing the payload of those packets, so they avoid the overheads associated with higher level protocols. For any load balancer, it’s important that the resources consumed by the load balancer are low compared to the resources devoted to useful work. At Cloudflare, we already pay close attention to the efficient implementation of our services, and that sets a high bar for the load balancer that we put in front of those services.

The downside of L4LBs is that they can only control which connections go to which servers. They cannot modify the data going over the connection, which prevents them from participating in higher-level protocols like TLS, HTTP, etc. (in contrast, Layer 7 Load Balancers act as proxies, so they can modify data on the connection and participate in those higher-level protocols).

L4LBs are not new. They are mostly used at companies which have scaling needs that would be hard to meet with L7LBs alone. Google has published about Maglev, Facebook open-sourced Katran, and GitHub has open-sourced their GLB.

Unimog is the L4LB that Cloudflare has built to meet the needs of our edge network. It shares features with other L4LBs, and it is particularly strongly influenced by GLB. But there are some requirements that were not well-served by existing L4LBs, leading us to build our own:

  • Unimog is designed to run on the same general-purpose servers that provide application services, rather than requiring a separate tier of servers dedicated to load balancing.
  • It performs dynamic load balancing: measurements of server load are used to adjust the number of connections going to each server, in order to accurately balance load.
  • It supports long-lived connections that remain established for days.
  • Virtual IP addresses are managed as ranges (Cloudflare serves hundreds of thousands of IPv4 addresses on behalf of our customers, so it is impractical to configure these individually).
  • Unimog is tightly integrated with our existing DDoS mitigation system, and the implementation relies on the same XDP technology in the Linux kernel.

The rest of this post describes these features and the design and implementation choices that follow from them in more detail.

For Unimog to balance load, it’s not enough to send the same (or approximately the same) number of connections to each server, because the performance of our servers varies. We regularly update our server hardware, and we’re now on our 10th generation. Once we deploy a server, we keep it in service for as long as it is cost effective, and the lifetime of a server can be several years. It’s not unusual for a single data center to contain a mix of server models, due to expansion and upgrades over time. Processor performance has increased significantly across our server generations. So within a single data center, we need to send different numbers of connections to different servers to utilize the same percentage of their capacity.

It’s also not enough to give each server a fixed share of connections based on static estimates of their capacity. Not all connections consume the same amount of CPU. And there are other activities running on our servers and consuming CPU that are not directly driven by connections from clients. So in order to accurately balance load across servers, Unimog does dynamic load balancing: it takes regular measurements of the load on each of our servers, and uses a control loop that increases or decreases the number of connections going to each server so that their loads converge to an appropriate value.

Refresher: TCP connections

The relationship between TCP packets and connections is central to the operation of Unimog, so we’ll briefly describe that relationship.

(Unimog supports UDP as well as TCP, but for clarity most of this post will focus on the TCP support. We explain how UDP support differs towards the end.)

Here is the outline of a TCP packet:

[Diagram: outline of a TCP packet, highlighting the source/destination address and port fields]

The TCP connection that this packet belongs to is identified by the four labelled header fields, which span the IPv4/IPv6 (i.e. layer 3) and TCP (i.e. layer 4) headers: the source and destination addresses, and the source and destination ports. Collectively, these four fields are known as the 4-tuple. When we say that Unimog sends a connection to a server, we mean that all the packets with the 4-tuple identifying that connection are sent to that server.

A TCP connection is established via a three-way handshake between the client and the server handling that connection. Once a connection has been established, it is crucial that all the incoming packets for that connection go to that same server. If a TCP packet belonging to the connection is sent to a different server, that server will signal to the client, with a TCP RST (reset) packet, that it doesn’t know about the connection. Upon receiving this notification, the client terminates the connection, probably resulting in the user seeing an error. So a misdirected packet is much worse than a dropped packet. As usual, we consider the network to be unreliable, and it’s fine for occasional packets to be dropped. But even a single misdirected packet can lead to a broken connection.

Cloudflare handles a wide variety of connections on behalf of our customers. Many of these connections carry HTTP, and are typically short lived. But some HTTP connections are used for websockets, and can remain established for hours or days. Our Spectrum product supports arbitrary TCP connections. TCP connections can be terminated or stall for many reasons, and ideally all applications that use long-lived connections would be able to reconnect transparently, and applications would be designed to support such reconnections. But not all applications and protocols meet this ideal, so we strive to maintain long-lived connections. Unimog can maintain connections that last for many days.

Forwarding packets

The previous section described that the function of Unimog is to steer connections to servers. We’ll now explain how this is implemented.

To start with, let’s consider how one of our data centers might look without Unimog or any other load balancer. Here’s a conceptual view:

[Diagram: packets flowing from the Internet through the router directly to the servers]

Packets arrive from the Internet, and pass through the router, which forwards them on to servers (in reality there is usually additional network infrastructure between the router and the servers, but it doesn’t play a significant role here so we’ll ignore it).

But is such a simple arrangement possible? Can the router spread traffic over servers without some kind of load balancer in between? Routers have a feature called ECMP (equal cost multipath) routing. Its original purpose is to allow traffic to be spread across multiple paths between two locations, but it is commonly repurposed to spread traffic across multiple servers within a data center. In fact, Cloudflare relied on ECMP alone to spread load across servers before we deployed Unimog. ECMP uses a hashing scheme to ensure that packets on a given connection use the same path (Unimog also employs a hashing scheme, so we’ll discuss how this can work in further detail below). But ECMP is vulnerable to changes in the set of active servers, such as when servers go in and out of service. These changes cause rehashing events, which break connections to all the servers in an ECMP group. Also, routers impose limits on the sizes of ECMP groups, which means that a single ECMP group cannot cover all the servers in our larger edge data centers. Finally, ECMP does not allow us to do dynamic load balancing by adjusting the share of connections going to each server. These drawbacks mean that ECMP alone is not an effective approach.

Ideally, to overcome the drawbacks of ECMP, we could program the router with the appropriate logic to direct connections to servers in the way we want. But although programmable network data planes have been a hot research topic in recent years, commodity routers are still essentially fixed-function devices.

We can work around the limitations of routers by having the router send the packets to some load balancing servers, and then programming those load balancers to forward packets as we want. If the load balancers all act on packets in a consistent way, then it doesn’t matter which load balancer gets which packets from the router (so we can use ECMP to spread packets across the load balancers). That suggests an arrangement like this:

[Diagram: the router spreading packets across a dedicated tier of load balancers in front of the servers]

And indeed L4LBs are often deployed like this.

Instead, Unimog makes every server into a load balancer. The router can send any packet to any server, and that initial server will forward the packet to the right server for that connection:

[Diagram: the router sending packets to any server, with that server forwarding them on to the right server for the connection]

We have two reasons to favour this arrangement:

First, in our edge network, we avoid specialised roles for servers. We run the same software stack on the servers in our edge network, providing all of our product features, whether DDoS attack prevention, website performance features, Cloudflare Workers, WARP, etc. This uniformity is key to the efficient operation of our edge network: we don’t have to manage how many load balancers we have within each of our data centers, because all of our servers act as load balancers.

The second reason relates to stopping attacks. Cloudflare’s edge network is the target of incessant attacks. Some of these attacks are volumetric - large packet floods which attempt to overwhelm the ability of our data centers to process network traffic from the Internet, and so impact our ability to service legitimate traffic. To successfully mitigate such attacks, it’s important to filter out attack packets as early as possible, minimising the resources they consume. This means that our attack mitigation system needs to occur before the forwarding done by Unimog. That mitigation system is called l4drop, and we’ve written about it before. l4drop and Unimog are closely integrated. Because l4drop runs on all of our servers, and because l4drop comes before Unimog, it’s natural for Unimog to run on all of our servers too.

XDP and xdpd

Unimog implements packet forwarding using a Linux kernel facility called XDP. XDP allows a program to be attached to a network interface, and the program gets run for every packet that arrives, before it is processed by the kernel’s main network stack. The XDP program returns an action code to tell the kernel what to do with the packet:

  • PASS: Pass the packet on to the kernel’s network stack for normal processing.
  • DROP: Drop the packet. This is the basis for l4drop.
  • TX: Transmit the packet back out of the network interface. The XDP program can modify the packet data before transmission. This action is the basis for Unimog forwarding.

XDP programs run within the kernel, making this an efficient approach even at high packet rates. XDP programs are expressed as eBPF bytecode, and run within an in-kernel virtual machine. Upon loading an XDP program, the kernel compiles its eBPF code into machine code. The kernel also verifies the program to check that it does not compromise security or stability. eBPF is not only used in the context of XDP: many recent Linux kernel innovations employ eBPF, as it provides a convenient and efficient way to extend the behaviour of the kernel.

XDP is much more convenient than alternative approaches to packet-level processing, particularly in our context where the servers involved also have many other tasks. We have continued to enhance Unimog since its initial deployment. Our deployment model for new versions of our Unimog XDP code is essentially the same as for userspace services, and we are able to deploy new versions on a weekly basis if needed. Also, established techniques for optimizing the performance of the Linux network stack provide good performance for XDP.

There are two main alternatives for efficient packet-level processing:

  • Kernel-bypass networking (such as DPDK), where a program in userspace manages a network interface (or some part of one) directly without the involvement of the kernel. This approach works best when servers can be dedicated to a network function (due to the need to dedicate processor or network interface hardware resources, and awkward integration with the normal kernel network stack; see our old post about this). But we avoid putting servers in specialised roles. (GitHub’s open-source GLB uses DPDK, and this is one of the main factors that made GLB unsuitable for us.)
  • Kernel modules, where code is added to the kernel to perform the necessary network functions. The Linux IPVS (IP Virtual Server) subsystem falls into this category. But developing, testing, and deploying kernel modules is cumbersome compared to XDP.

The following diagram shows an overview of our use of XDP. Both l4drop and Unimog are implemented by an XDP program. l4drop matches attack packets, and uses the DROP action to discard them. Unimog forwards packets, using the TX action to resend them. Packets that are not dropped or forwarded pass through to the normal Linux network stack. To support our elaborate use of XDP, we have developed the xdpd daemon which performs the necessary supervisory and support functions for our XDP programs.

[Diagram: the XDP program chain on each server: l4drop drops attack packets, Unimog forwards connections, and remaining packets pass to the Linux network stack]

Rather than a single XDP program, we have a chain of XDP programs that must be run for each packet (l4drop, Unimog, and others we have not covered here). One of the responsibilities of xdpd is to prepare these programs, and to make the appropriate system calls to load them and assemble the full chain.

Our XDP programs come from two sources. Some are developed in a conventional way: engineers write C code, our build system compiles it (with clang) to eBPF ELF files, and our release system deploys those files to our servers. Our Unimog XDP code works like this. In contrast, the l4drop XDP code is dynamically generated by xdpd based on information it receives from attack detection systems.

xdpd has many other duties to support our use of XDP:

  • XDP programs can be supplied with data using data structures called maps. xdpd populates the maps needed by our programs, based on information received from control planes.
  • Programs (for instance, our Unimog XDP program) may depend upon configuration values which are fixed while the program runs, but do not have universal values known at the time their C code was compiled. It would be possible to supply these values to the program via maps, but that would be inefficient (retrieving a value from a map requires a call to a helper function). So instead, xdpd will fix up the eBPF program to insert these constants before it is loaded.
  • Cloudflare carefully monitors the behaviour of all our software systems, and this includes our XDP programs: they emit metrics (via another use of maps), which xdpd exposes to our metrics and alerting system (Prometheus).
  • When we deploy a new version of xdpd, it gracefully upgrades in such a way that there is no interruption to the operation of Unimog or l4drop.

Although the XDP programs are written in C, xdpd itself is written in Go. Much of its code is specific to Cloudflare. But in the course of developing xdpd, we have collaborated with Cilium to develop https://github.com/cilium/ebpf, an open source Go library that provides the operations needed by xdpd for manipulating and loading eBPF programs and related objects. We’re also collaborating with the Linux eBPF community to share our experience, and extend the core eBPF technology in ways that make features of xdpd obsolete.

In evaluating the performance of Unimog, our main concern is efficiency: that is, the resources consumed for load balancing relative to the resources used for customer-visible services. Our measurements show that Unimog costs less than 1% of the processor utilization, compared to a scenario where no load balancing is in use. Other L4LBs, intended to be used with servers dedicated to load balancing, may place more emphasis on maximum throughput of packets. Nonetheless, our experience with Unimog and XDP in general indicates that the throughput is more than adequate for our needs, even during large volumetric attacks.

Unimog is not the first L4LB to use XDP. In 2018, Facebook open sourced Katran, their XDP-based L4LB data plane. We considered the possibility of reusing code from Katran. But it would not have been worthwhile: the core C code needed to implement an XDP-based L4LB is relatively modest (about 1000 lines of C, both for Unimog and Katran). Furthermore, we had requirements that were not met by Katran, and we also needed to integrate with existing components and systems at Cloudflare (particularly l4drop). So very little of the code could have been reused as-is.

Encapsulation

As discussed at the start of this post, clients make connections to one of our edge data centers with a destination IP address that can be served by any one of our servers. These addresses, which do not correspond to a specific server, are known as virtual IPs (VIPs). When our Unimog XDP program forwards a packet destined to a VIP, it must replace that VIP address with the direct IP (DIP) of the appropriate server for the connection, so that when the packet is retransmitted it will reach that server. But it is not sufficient to overwrite the VIP in the packet headers with the DIP, as that would hide the original destination address from the server handling the connection (the original destination address is often needed to correctly handle the connection).

Instead, the packet must be encapsulated: Another set of packet headers is prepended to the packet, so that the original packet becomes the payload in this new packet. The DIP is then used as the destination address in the outer headers, but the addressing information in the headers of the original packet is preserved. The encapsulated packet is then retransmitted. Once it reaches the target server, it must be decapsulated: the outer headers are stripped off to yield the original packet as if it had arrived directly.

Encapsulation is a general concept in computer networking, and is used in a variety of contexts. The headers to be added to the packet by encapsulation are defined by an encapsulation format. Many different encapsulation formats have been defined within the industry, tailored to the requirements in specific contexts. Unimog uses a format called GUE (Generic UDP Encapsulation), in order to allow us to re-use the glb-redirect component from GitHub’s GLB (glb-redirect is discussed below).

GUE is a relatively simple encapsulation format. It encapsulates within a UDP packet, placing a GUE-specific header between the outer IP/UDP headers and the payload packet to allow extension data to be carried (and we’ll see how Unimog takes advantage of this):

[Diagram: a GUE-encapsulated packet: outer IP and UDP headers, the GUE header with extension data, then the original packet as the payload]
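
In Go terms, the shape of an encapsulated packet is roughly the following (the field names are illustrative, not the GUE wire format):

// A sketch of a GUE-encapsulated packet: the original packet, VIP address
// and all, becomes the payload of a new UDP packet addressed to the DIP.
type ipHeader struct{ srcAddr, dstAddr string }
type udpHeader struct{ srcPort, dstPort uint16 }
type gueHeader struct {
	ext []byte // extension data; Unimog uses this to carry a second-hop DIP
}

type encapsulatedPacket struct {
	outerIP  ipHeader  // destination address: the DIP of the target server
	outerUDP udpHeader // GUE runs over UDP
	gue      gueHeader
	payload  []byte // the original packet, its headers preserved
}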

When an encapsulated packet arrives at a server, the encapsulation process must be reversed. This step is called decapsulation. The headers that were added during the encapsulation process are removed, leaving the original packet to be processed by the network stack as if it had arrived directly from the client.

An issue that can arise with encapsulation is hitting limits on the maximum packet size, because the encapsulation process makes packets larger. The de-facto maximum packet size on the Internet is 1500 bytes, and not coincidentally this is also the maximum packet size on Ethernet networks. For Unimog, encapsulating a 1500-byte packet results in a 1536-byte packet (the extra 36 bytes being the outer headers added by encapsulation). To allow for these enlarged encapsulated packets, we have enabled jumbo frames on the networks inside our data centers, so that the 1500-byte limit only applies to packets headed out to the Internet.

Forwarding logic

So far, we have described the technology used to implement the Unimog load balancer, but not how our Unimog XDP program selects the DIP address when forwarding a packet. This section describes the basic scheme. But as we’ll see, there is a problem, so then we’ll describe how this scheme is elaborated to solve that problem.

In outline, our Unimog XDP program processes each packet in the following way:

  1. Determine whether the packet is destined for a VIP address. Not all of the packets arriving at a server are for VIP addresses. Other packets are passed through for normal handling by the kernel’s network stack. (xdpd obtains the VIP address ranges from the Unimog control plane.)
  2. Determine the DIP for the server handling the packet’s connection.
  3. Encapsulate the packet, and retransmit it to the DIP.

In step 2, note that all the load balancers must act consistently - when forwarding packets, they must all agree about which connections go to which servers. The rate of new connections arriving at a data center is large, so it’s not practical for load balancers to agree by communicating information about connections amongst themselves. Instead L4LBs adopt designs which allow the load balancers to reach consistent forwarding decisions independently. To do this, they rely on hashing schemes: Take the 4-tuple identifying the packet’s connection, put it through a hash function to obtain a key (the hash function ensures that these key values are uniformly distributed), then perform some kind of lookup into a data structure to turn the key into the DIP for the target server.

Unimog uses such a scheme, with a data structure that is simple compared to some other L4LBs. We call this data structure the forwarding table, and it consists of an array where each entry contains a DIP specifying the target server for the relevant packets (we call these entries buckets). The forwarding table is generated by the Unimog control plane and broadcast to the load balancers (more on this below), so that it has the same contents on all load balancers.

To look up a packet’s key in the forwarding table, the low N bits from the key are used as the index for a bucket (the forwarding table is always a power-of-2 in size):

[Diagram: the low N bits of the key selecting a bucket in the forwarding table]

Note that this approach does not provide per-connection control - each bucket typically applies to many connections. All load balancers in a data center use the same forwarding table, so they all forward packets in a consistent manner. This means it doesn’t matter which packets are sent by the router to which servers, and so ECMP re-hashes are a non-issue. And because the forwarding table is immutable and simple in structure, lookups are fast.
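
A minimal sketch of this lookup in Go (FNV-1a stands in for the hash function, which the post does not name):

package main

import (
	"fmt"
	"hash/fnv"
)

// fourTuple identifies a TCP connection: the layer 3 addresses plus the
// layer 4 ports.
type fourTuple struct {
	srcAddr, dstAddr string
	srcPort, dstPort uint16
}

// lookupDIP hashes the 4-tuple to a uniformly distributed key, then indexes
// a bucket with its low bits. The table length is always a power of two, so
// masking with len-1 selects the low N bits.
func lookupDIP(table []string, t fourTuple) string {
	h := fnv.New64a()
	fmt.Fprintf(h, "%s|%d|%s|%d", t.srcAddr, t.srcPort, t.dstAddr, t.dstPort)
	return table[h.Sum64()&uint64(len(table)-1)]
}

func main() {
	// A toy forwarding table: 8 buckets spread over two DIPs.
	table := []string{"10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.2",
		"10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.2"}
	fmt.Println(lookupDIP(table, fourTuple{"203.0.113.7", "104.17.175.85", 43210, 443}))
}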

Although the above description only discusses a single forwarding table, Unimog supports multiple forwarding tables, each one associated with a trafficset - the traffic destined for a particular service. Ranges of VIP addresses are associated with a trafficset. Each trafficset has its own configuration settings and forwarding tables. This gives us the flexibility to differentiate how Unimog behaves for different services.

Precise load balancing requires the ability to make fine adjustments to the number of connections arriving at each server. So we make the number of buckets in the forwarding table more than 100 times the number of servers. Our data centers can contain hundreds of servers, and so it is normal for a Unimog forwarding table to have tens of thousands of buckets. The DIP for a given server is repeated across many buckets in the forwarding table, and by increasing or decreasing the number of buckets that refer to a server, we can control the share of connections going to that server. Not all buckets will correspond to exactly the same number of connections at a given point in time (the properties of the hash function make this a statistical matter). But experience with Unimog has demonstrated that the relationship between the number of buckets and resulting server load is sufficiently strong to allow for good load balancing.

But as mentioned, there is a problem with this scheme as presented so far. Updating a forwarding table, and changing the DIPs in some buckets, would break connections that hash to those buckets (because packets on those connections would get forwarded to a different server after the update). But one of the requirements for Unimog is to allow us to change which servers get new connections without impacting the existing connections. For example, sometimes we want to drain the connections to a server, maintaining the existing connections to that server but not forwarding new connections to it, in the expectation that many of the existing connections will terminate of their own accord. The next section explains how we fix this scheme to allow such changes.

Maintaining established connections

To make changes to the forwarding table without breaking established connections, Unimog adopts the “daisy chaining” technique described in the paper Stateless Datacenter Load-balancing with Beamer.

To understand how the Beamer technique works, let’s look at what can go wrong when a forwarding table changes: imagine the forwarding table is updated so that a bucket which contained the DIP of server A now refers to server B. A packet that would formerly have been sent to A by the load balancers is now sent to B. If that packet initiates a new connection (it’s a TCP SYN packet), there’s no problem - server B will continue the three-way handshake to complete the new connection. On the other hand, if the packet belongs to a connection established before the change, then the TCP implementation of server B has no matching TCP socket, and so sends a RST back to the client, breaking the connection.

This explanation hints at a solution: the problem occurs when server B receives a forwarded packet that does not match a TCP socket. If we could change its behaviour in this case to forward the packet a second time to the DIP of server A, that would allow the connection to server A to be preserved. For this to work, server B needs to know the DIP for the bucket before the change.

To accomplish this, we extend the forwarding table so that each bucket has two slots, each containing the DIP for a server. The first slot contains the current DIP, which is used by the load balancer to forward packets as discussed (and here we refer to this forwarding as the first hop). The second slot preserves the previous DIP (if any), in order to allow the packet to be forwarded again on a second hop when necessary.

For example, imagine we have a forwarding table that refers to servers A, B, and C, and then it is updated to stop new connections going to server A, but maintaining established connections to server A. This is achieved by replacing server A’s DIP in the first slot of any buckets where it appears, but preserving it in the second slot:

[Diagram: forwarding table buckets with two slots: server A’s DIP moved to the second slot, with servers B and C taking over the first slots]
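
Continuing the sketch from the lookup above, each bucket now carries two slots, and draining a server becomes a table rewrite along these lines (illustrative, not Unimog’s actual code):

// bucket is one forwarding table entry with two slots.
type bucket struct {
	current  string // first slot: the DIP receiving the first hop (and new connections)
	previous string // second slot: the previous DIP, used for the second hop
}

// drain stops new connections to dip while preserving established ones:
// wherever dip occupies the first slot, it moves to the second slot and a
// DIP chosen from the remaining in-service servers takes its place.
func drain(table []bucket, dip string, pickReplacement func() string) {
	for i := range table {
		if table[i].current == dip {
			table[i].previous = dip
			table[i].current = pickReplacement()
		}
	}
}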

In addition to extending the forwarding table, this approach requires a component on each server to forward packets on the second hop when necessary. This diagram shows where this redirector fits into the path a packet can take:

[Diagram: a packet’s path: first-hop server, redirector check, then the second-hop server when there is no matching local socket]

The redirector follows some simple logic, sketched in code after this list, to decide whether to process a packet locally on the first-hop server or to forward it to the second-hop server:

  • If the packet is a SYN packet, initiating a new connection, then it is always processed by the first-hop server. This ensures that new connections go to the first-hop server.
  • For other packets, the redirector checks whether the packet belongs to a connection with a corresponding TCP socket on the first-hop server. If so, it is processed by that server.
  • Otherwise, the packet has no corresponding TCP socket on the first-hop server. So it is forwarded on to the second-hop server to be processed there (in the expectation that it belongs to some connection established on the second-hop server that we wish to maintain).
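
In code, the redirector’s decision amounts to the following (a sketch reusing the fourTuple type from the lookup example; hasLocalSocket stands in for the kernel’s TCP socket lookup):

// verdict is what the redirector decides to do with a packet.
type verdict int

const (
	processLocally   verdict = iota // hand the packet to the local network stack
	forwardSecondHop                // re-encapsulate and send to the second-hop DIP
)

func redirect(isSYN bool, tuple fourTuple, secondHopDIP string,
	hasLocalSocket func(fourTuple) bool) verdict {
	if isSYN {
		return processLocally // new connections always stay on the first hop
	}
	if hasLocalSocket(tuple) {
		return processLocally // an established connection lives on this server
	}
	if secondHopDIP != "" {
		// No local socket: assume the connection was established on the
		// previous DIP for this bucket, and forward the packet there.
		return forwardSecondHop
	}
	return processLocally
}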

In that last step, the redirector needs to know the DIP for the second hop. To avoid the need for the redirector to do forwarding table lookups, the second-hop DIP is placed into the encapsulated packet by the Unimog XDP program (which already does a forwarding table lookup, so it has easy access to this value). This second-hop DIP is carried in a GUE extension header, so that it is readily available to the redirector if it needs to forward the packet again.

This second hop, when necessary, does have a cost. But in our data centers, the fraction of forwarded packets that take the second hop is usually less than 1% (despite the significance of long-lived connections in our context). The result is that the practical overhead of the second hops is modest.

When we initially deployed Unimog, we adopted the glb-redirect iptables module from GitHub’s GLB to serve as the redirector component. In fact, some implementation choices in Unimog, such as the use of GUE, were made in order to facilitate this re-use. glb-redirect worked well for us initially, but subsequently we wanted to enhance the redirector logic. glb-redirect is a custom Linux kernel module, and developing and deploying changes to kernel modules is more difficult for us than for eBPF-based components such as our XDP programs. This is not merely due to Cloudflare having invested more engineering effort in software infrastructure for eBPF; it also results from the more explicit boundary between the kernel and eBPF programs (for example, we are able to run the same eBPF programs on a range of kernel versions without recompilation). We wanted to achieve the same ease of development for the redirector as for our XDP programs.

To that end, we decided to write an eBPF replacement for glb-redirect. While the redirector could be implemented within XDP, like our load balancer, practical concerns led us to implement it as a TC classifier program instead (TC is the traffic control subsystem within the Linux network stack). A downside to XDP is that the packet contents prior to processing by the XDP program are not visible using conventional tools such as tcpdump, complicating debugging. TC classifiers do not have this downside, and in the context of the redirector, which passes most packets through, the performance advantages of XDP would not be significant.

The result is cls-redirect, a redirector implemented as a TC classifier program. We have contributed our cls-redirect code as part of the Linux kernel test suite. In addition to implementing the redirector logic, cls-redirect also implements decapsulation, removing the need to separately configure GUE tunnel endpoints for this purpose.

There are some features suggested in the Beamer paper that Unimog does not implement:

  • Beamer embeds generation numbers in the encapsulated packets to address a potential corner case where an ECMP rehash event occurs at the same time as a forwarding table update is propagating from the control plane to the load balancers. Given the combination of circumstances required for a connection to be impacted by this issue, we believe that in our context the number of affected connections is negligible, and so the added complexity of the generation numbers is not worthwhile.
  • In the Beamer paper, the concept of daisy-chaining encompasses third hops etc. to preserve connections across a series of changes to a bucket. Unimog only uses two hops (the first and second hops above), so in general it can only preserve connections across a single update to a bucket. But our experience is that even with only two hops, a careful strategy for updating the forwarding tables permits connection lifetimes of days.

To elaborate on this second point: when the control plane is updating the forwarding table, it often has some choice in which buckets to change, depending on the event that led to the update. For example, if a server is being brought into service, then some buckets must be assigned to it (by placing the DIP for the new server in the first slot of the bucket). But there is a choice about which buckets. A strategy of choosing the least-recently modified buckets will tend to minimise the impact to connections.

Furthermore, when updating the forwarding table to adjust the balance of load between servers, Unimog often uses a novel trick: due to the redirector logic, exchanging the first-hop and second-hop DIPs for a bucket only affects which server receives new connections for that bucket, and never impacts any established connections. Unimog is able to achieve load balancing in our edge data centers largely through forwarding table changes of this type.
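
In terms of the bucket sketch above, that trick is a one-line swap:

// Exchanging the slots changes which server receives new connections for
// this bucket without breaking established connections on either DIP: SYN
// packets always stay on the first hop, and non-SYN packets are redirected
// to wherever their socket lives.
func swapHops(b *bucket) {
	b.current, b.previous = b.previous, b.current
}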

Control plane

So far, we have discussed the Unimog data plane - the part that processes network packets. But much of the development effort on Unimog has been devoted to the control plane - the part that generates the forwarding tables used by the data plane. In order to correctly maintain the forwarding tables, the control plane consumes information from multiple sources:

  • Server information: Unimog needs to know the set of servers present in a data center, some key information about each one (such as their DIP addresses), and their operational status. It also needs signals about transitional states, such as when a server is being withdrawn from service, in order to gracefully drain connections (preventing the server from receiving new connections, while maintaining its established connections).
  • Health: Unimog should only send connections to servers that are able to correctly handle those connections, otherwise those servers should be removed from the forwarding tables. To ensure this, it needs health information at the node level (indicating that a server is available) and at the service level (indicating that a service is functioning normally on a server).
  • Load: in order to balance load, Unimog needs information about the resource utilization on each server.
  • IP address information: Cloudflare serves hundreds of thousands of IPv4 addresses, and these are something that we have to treat as a dynamic resource rather than something statically configured.

The control plane is implemented by a process called the conductor. In each of our edge data centers, there is one active conductor, but there are also standby instances that will take over if the active instance goes away.

We use Hashicorp’s Consul in a number of ways in the Unimog control plane (we have an independent Consul server cluster in each data center):

  • Consul provides a key-value store, with support for blocking queries so that changes to values can be received promptly. We use this to propagate the forwarding tables and VIP address information from the conductor to xdpd on the servers (see the sketch after this list).
  • Consul provides server- and service-level health checks. We use this as the source of health information for Unimog.
  • The conductor stores its state in the Consul KV store, and uses Consul’s distributed locks to ensure that only one conductor instance is active.
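
To illustrate the first of these, a minimal consumer of such a KV entry, written against Consul's Go client with blocking queries (the key name is invented; xdpd's actual implementation is not shown in this post):

package watcher

import "github.com/hashicorp/consul/api"

// watchForwardingTable long-polls a Consul KV key, invoking apply whenever
// the stored value changes. WaitIndex makes the Get block until Consul's
// index moves past the value we last saw.
func watchForwardingTable(client *api.Client, apply func([]byte)) error {
	kv := client.KV()
	var lastIndex uint64
	for {
		pair, meta, err := kv.Get("unimog/forwarding-table", &api.QueryOptions{
			WaitIndex: lastIndex,
		})
		if err != nil {
			return err
		}
		if pair != nil && meta.LastIndex != lastIndex {
			apply(pair.Value)
		}
		lastIndex = meta.LastIndex
	}
}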

The conductor obtains server load information from Prometheus, which we already use for metrics throughout our systems. It balances the load across the servers using a control loop, periodically adjusting the forwarding tables to send more connections to underloaded servers and fewer connections to overloaded servers. The load for a server is defined by a Prometheus metric expression which measures processor utilization (with some intricacies to better handle characteristics of our workloads). The determination of whether a server is underloaded or overloaded is based on comparison with the average value of the load metric, and the adjustments made to the forwarding table are proportional to the deviation from the average. So the result of the feedback loop is that the load metric for all servers converges on the average.
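
A simplified sketch of that balancing step, assuming load holds the per-server Prometheus metric values and ignoring the workload-specific intricacies mentioned above:

package conductor

// adjustments returns a per-server correction proportional to each server's
// deviation from the average load: positive values mean "assign more
// buckets", negative values mean "take some away". gain controls how
// aggressively each iteration of the control loop reacts.
func adjustments(load []float64, gain float64) []float64 {
	var sum float64
	for _, l := range load {
		sum += l
	}
	avg := sum / float64(len(load))
	out := make([]float64, len(load))
	for i, l := range load {
		out[i] = gain * (avg - l) / avg
	}
	return out
}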

Finally, the conductor queries internal Cloudflare APIs to obtain the necessary information on servers and addresses.

Reliability

Unimog is a critical system: incorrect, poorly adjusted or stale forwarding tables could cause incoming network traffic to a data center to be dropped, or servers to be overloaded, to the point that a data center would have to be removed from service. To maintain a high quality of service and minimise the overhead of managing our many edge data centers, we have to be able to upgrade all components. So to the greatest extent possible, all components are able to tolerate brief absences of the other components without any impact to service. In some cases this is possible through careful design. In other cases, it requires explicit handling. For example, we have found that Consul can temporarily report inaccurate health information for a server and its services when the Consul agent on that server is restarted (for example, in order to upgrade Consul). So we implemented the necessary logic in the conductor to detect and disregard these transient health changes.

Unimog also forms a complex system with feedback loops: the conductor reacts to its observations of the behaviour of the servers, and the servers react to the control information they receive from the conductor. This can lead to behaviours of the overall system that are hard to anticipate or test for. For instance, not long after we deployed Unimog we encountered surprising behaviour when data centers became overloaded. This is of course a scenario that we strive to avoid, and we have automated systems to remove traffic from overloaded data centers when it does occur. But if a data center became sufficiently overloaded, then health information from its servers would indicate that many servers were degraded, to the point that Unimog would stop sending new connections to those servers. Under normal circumstances, this is the correct reaction to a degraded server. But if enough servers become degraded, diverting new connections to other servers would mean those servers became degraded in turn, while the original servers were able to recover. So it was possible for a data center that became temporarily overloaded to get stuck in a state where servers oscillated between healthy and degraded, even after the level of demand on the data center had returned to normal. To correct this issue, the conductor now has logic to distinguish between isolated degraded servers and such data center-wide problems. We have continued to improve Unimog in response to operational experience, ensuring that it behaves in a predictable manner over a wide range of conditions.

UDP Support

So far, we have described Unimog’s support for directing TCP connections. But Unimog also supports UDP traffic. UDP does not have explicit connections between clients and servers, so how Unimog handles it depends upon how the UDP application exchanges packets between the client and server. There are a few cases of interest:

Request-response UDP applications

Some applications, such as DNS, use a simple request-response pattern: the client sends a request packet to the server, and expects a response packet in return. Here, there is nothing corresponding to a connection (the client only sends a single packet, so there is no requirement to make sure that multiple packets arrive at the same server). But Unimog can still provide value by spreading the requests across our servers.

To cater to this case, Unimog operates as described in previous sections, hashing the 4-tuple from the packet headers (the source and destination IP addresses and ports). But the Beamer daisy-chaining technique that allows connections to be maintained does not apply here, and so the buckets in the forwarding table only have a single slot.
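
A sketch of that lookup in Go, with FNV-1a standing in for whatever hash the real XDP program uses (types and table sizing are illustrative):

package forwarding

import (
	"encoding/binary"
	"hash/fnv"
	"net"
)

// dipForRequest hashes the 4-tuple into a bucket index and returns that
// bucket's single DIP. With one slot per bucket there is no daisy chain:
// every packet for a given 4-tuple simply lands on the chosen server.
func dipForRequest(srcIP, dstIP net.IP, srcPort, dstPort uint16, table []string) string {
	h := fnv.New32a()
	h.Write(srcIP)
	h.Write(dstIP)
	var ports [4]byte
	binary.BigEndian.PutUint16(ports[0:2], srcPort)
	binary.BigEndian.PutUint16(ports[2:4], dstPort)
	h.Write(ports[:])
	return table[h.Sum32()%uint32(len(table))]
}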

UDP applications with flows

Some UDP applications have long-lived flows of packets between the client and server. Like TCP connections, these flows are identified by the 4-tuple. It is necessary that such flows go to the same server (even when Cloudflare is just passing a flow through to the origin server, it is convenient for detecting and mitigating certain kinds of attack to have that flow pass through a single server within one of Cloudflare’s data centers).

It's possible to treat these flows by hashing the 4-tuple, skipping the Beamer daisy-chaining technique as for request-response applications. But then adding servers will cause some flows to change servers (this would effectively be a form of consistent hashing). For UDP applications, we can’t say in general what impact this has, as we can for TCP connections. But it’s possible that it causes some disruption, so it would be nice to avoid this.

So Unimog adapts the daisy-chaining technique to apply it to UDP flows. The outline remains similar to that for TCP: the same redirector component on each server decides whether to send a packet on a second hop. But UDP does not have anything corresponding to TCP’s SYN packet that indicates a new connection. So for UDP, the part that depends on SYNs is removed, and the logic applied for each packet becomes:

  • The redirector checks whether the packet belongs to a flow with a corresponding UDP socket on the first-hop server. If so, it is processed by that server.
  • Otherwise, the packet has no corresponding UDP socket on the first-hop server. So it is forwarded on to the second-hop server to be processed there (in the expectation that it belongs to some flow established on the second-hop server that we wish to maintain). A sketch of this decision follows the list.
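
A rough sketch (the real logic runs in the cls-redirect eBPF program; the kernel socket lookup is stubbed out here as a set of locally owned flows, and this extends the earlier hypothetical package):

package forwarding

// FourTuple identifies a UDP flow by its source and destination addresses
// and ports.
type FourTuple struct {
	SrcIP, DstIP     string
	SrcPort, DstPort uint16
}

// redirectUDP reports whether a packet should be processed locally (the
// first-hop server owns a connected socket for its flow) or forwarded to
// the bucket's second hop, where new flows are established. localFlows
// stands in for the kernel's socket lookup.
func redirectUDP(localFlows map[FourTuple]bool, t FourTuple, secondHopDIP string) (deliverLocally bool, nextHop string) {
	if localFlows[t] {
		return true, ""
	}
	return false, secondHopDIP
}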

Although the change compared to the TCP logic is not large, it has the effect of switching the roles of the first- and second-hop servers: for UDP, new flows go to the second-hop server. The Unimog control plane has to take account of this when it updates a forwarding table. When it introduces a server into a bucket, that server should receive new connections or flows. For a TCP trafficset, this means it becomes the first-hop server. For a UDP trafficset, it must become the second-hop server.
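
Continuing the hypothetical Bucket type from the earlier sketches, a control-plane helper that respects this asymmetry might look like:

// introduceServer places newDIP in whichever slot receives new traffic for
// the trafficset's protocol: the first hop for TCP, the second hop for UDP.
// In the UDP case the old second-hop server moves to the first slot, so the
// flows already established on it continue to match its connected sockets.
func introduceServer(b *Bucket, newDIP string, udp bool) {
	if udp {
		b.FirstHop, b.SecondHop = b.SecondHop, newDIP
	} else {
		b.FirstHop, b.SecondHop = newDIP, b.FirstHop
	}
	b.LastModified = time.Now()
}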

This difference between the handling of TCP and UDP also leads to higher overheads for UDP. In the case of TCP, as new connections are formed and old connections terminate over time, fewer packets will require the second hop, and so the overhead tends to diminish. But with UDP, new flows always involve the second hop. This is why we differentiate the two cases, taking advantage of SYN packets in the TCP case.

The UDP logic also places a requirement on services. The redirector must be able to match packets to the corresponding sockets on a server according to their 4-tuple. This is not a problem in the TCP case, because all TCP connections are represented by connected sockets in the BSD sockets API (these sockets are obtained from an accept system call, so that they have a local address and a peer address, determining the 4-tuple). But for UDP, unconnected sockets (lacking a declared peer address) can be used to send and receive packets. So some UDP services only use unconnected sockets. For the redirector logic above to work, services must create connected UDP sockets in order to expose their flows to the redirector.
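
For a service author, the difference looks roughly like this (a minimal illustration; real services typically set SO_REUSEPORT so the connected socket can keep using the service port):

package main

import "net"

func main() {
	// Unconnected socket: no declared peer address, so flows through it
	// are invisible to the redirector's socket lookup.
	unconnected, err := net.ListenUDP("udp", &net.UDPAddr{Port: 5000})
	if err != nil {
		panic(err)
	}
	defer unconnected.Close()

	buf := make([]byte, 1500)
	n, peer, err := unconnected.ReadFromUDP(buf)
	if err != nil {
		panic(err)
	}

	// Connected socket: DialUDP fixes both a local and a peer address, so
	// the flow's 4-tuple exists in the kernel and the redirector can match
	// packets to it.
	connected, err := net.DialUDP("udp", nil, peer)
	if err != nil {
		panic(err)
	}
	defer connected.Close()
	connected.Write(buf[:n])
}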

UDP applications with sessions

Some UDP-based protocols have explicit sessions, with a session identifier in each packet. Session identifiers allow sessions to persist even if the 4-tuple changes. This happens in mobility scenarios - for example, if a mobile device passes from a WiFi to a cellular network, causing its IP address to change. An example of a UDP-based protocol with session identifiers is QUIC (which calls them connection IDs).

Our Unimog XDP program allows a flow dissector to be configured for different trafficsets. The flow dissector is the part of the code that is responsible for taking a packet and extracting the value that identifies the flow or connection (this value is then hashed and used for the lookup into the forwarding table). For TCP and UDP, there are default flow dissectors that extract the 4-tuple. But specialised flow dissectors can be added to handle UDP-based protocols.
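
Conceptually, a flow dissector maps a packet to the bytes that identify its flow, and the hash of those bytes indexes the forwarding table. A sketch, reusing the FourTuple type from earlier and inventing a fixed-offset session-ID layout for the specialised case:

package forwarding

// FlowDissector extracts a flow-identifying key from a packet. The
// signature is illustrative; the real dissectors are written as part of
// the Unimog XDP program.
type FlowDissector func(payload []byte, t FourTuple) []byte

// fourTupleKey is the default behaviour for TCP and UDP: the key is the
// 4-tuple itself.
func fourTupleKey(_ []byte, t FourTuple) []byte {
	key := append([]byte(t.SrcIP), t.DstIP...)
	key = append(key, byte(t.SrcPort>>8), byte(t.SrcPort), byte(t.DstPort>>8), byte(t.DstPort))
	return key
}

// sessionIDKey sketches a dissector for a protocol carrying an explicit
// session identifier at a fixed offset (a made-up header layout): the key
// survives changes to the client's IP address or port.
func sessionIDKey(payload []byte, _ FourTuple) []byte {
	const idOffset, idLen = 1, 8 // hypothetical header layout
	if len(payload) < idOffset+idLen {
		return nil
	}
	return payload[idOffset : idOffset+idLen]
}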

We have used this functionality in our WARP product. We extended the Wireguard protocol used by WARP in a backwards-compatible way to include a session identifier, and added a flow dissector to Unimog to exploit it. There are more details in our post on the technical challenges of WARP.

Conclusion

Unimog has been deployed to all of Cloudflare’s edge data centers for over a year, and it has become essential to our operations. Throughout that time, we have continued to enhance Unimog (many of the features described here were not present when it was first deployed). So the ease of developing and deploying changes, due to XDP and xdpd, has been a significant benefit. Today we continue to extend it, to support more services, and to help us manage our traffic and the load on our servers in more contexts.

Categories: Technology

Rendering React on the Edge with Flareact and Cloudflare Workers

Thu, 03/09/2020 - 13:00

The following is a guest post from Josh Larson, Engineer at Vox Media.

Imagine you’re the maintainer of a high-traffic media website, and your DNS is already hosted on Cloudflare.

Page speed is critical. You need to get content to your audience as quickly as possible on every device. You also need to render ads in a speedy way to maintain a good user experience and make money to support your journalism.

One solution would be to render your site statically and cache it at the edge. This would help ensure you have top-notch delivery speed because you don’t need a server to return a response. However, your site has decades worth of content. If you wanted to make even a small change to the site design, you would need to regenerate every single page during your next deploy. This would take ages.

Another issue is that your site would be static — and future updates to content or new articles would not be available until you deploy again.

That’s not going to work.

Another solution would be to render each page dynamically on your server. This ensures you can return a dynamic response for new or updated articles.

However, you’re going to need to pay for some beefy servers to be able to handle spikes in traffic and respond to requests in a timely manner. You’ll also probably need to implement a system of internal caches to optimize the performance of your app, which could lead to a more complicated development experience. That also means you’ll be at risk of a thundering herd problem if, for any reason, your cache becomes invalidated.

Neither of these solutions are great, and you’re forced to make a tradeoff between one of these two approaches.

Thankfully, you’ve recently come across a project like Next.js which offers a hybrid approach: static-site generation along with incremental regeneration. You’re in love with the patterns and developer experience in Next.js, but you’d also love to take advantage of the Cloudflare Workers platform to host your site.

Cloudflare Workers allow you to run your code on the edge quickly, efficiently and at scale. Instead of paying for a server to host your code, you can host it directly inside the datacenter — reducing the number of network trips required to load your application. In a perfect world, we wouldn’t need to find hosting for a Next.js site, because Cloudflare offers the same JavaScript hosting functionality with the Workers platform. With their dynamic runtime and edge caching capabilities, we wouldn’t need to worry about making a tradeoff between static and dynamic for our site.

Unfortunately, frameworks like Next.js and Cloudflare Workers don’t mesh together particularly well due to technical constraints. Until now:

I’m excited to announce Flareact, a new open-source React framework built for Cloudflare Workers.

With Flareact, you don’t need to make the tradeoff between a static site and a dynamic application.

Flareact allows you to render your React apps at the edge rather than on the server. It is modeled after Next.js, which means it supports file-based page routing, dynamic page paths and edge-side data fetching APIs.

Not only are Flareact pages rendered at the edge — they’re also cached at the edge using the Cache API. This allows you to provide a dynamic content source for your app without worrying about traffic spikes or response times.

With no servers or origins to deal with, your site is instantly available to your audience. Cloudflare Workers gives you a 0ms cold start and responses from the edge within milliseconds.

You can check out the docs and get started now by clicking the button below:


To get started manually, install the latest wrangler, and use the handy wrangler generate command below to create your first project:

npm i @cloudflare/wrangler -g
wrangler generate my-project https://github.com/flareact/flareact-template

What’s the big deal?

Hosting React apps on Cloudflare Workers Sites is not a new concept. In fact, you’ve always been able to deploy a create-react-app project to Workers Sites in addition to static versions of other frameworks like Gatsby and Next.js.

However, Flareact renders your React application at the edge. This allows you to provide an initial server response with HTML markup — which can be helpful for search engine crawlers. You can also cache the response at the edge and optionally invalidate that cache on a timed basis — meaning your static markup will be regenerated if you need it to be fresh.

This isn’t a new pattern: Next.js has done the hard work in defining the shape of this API with SSG support and Incremental Static Regeneration. While there are nuanced differences in the implementation between Flareact and Next.js, they serve a similar purpose: to get your application to your end-user in the quickest and most-scalable way possible.

A focus on developer experience

A magical developer experience is a crucial ingredient to any successful product.

As a longtime fan and user of Next.js, I wanted to experiment with running the framework on Cloudflare Workers. However, Next.js and its APIs are framed around the Node.js HTTP Server API, while Cloudflare Workers use V8 isolates and are modeled after the FetchEvent type.

Since we don’t have typical access to a filesystem inside V8 isolates, it’s tough to mimic the environment required to run a dynamic Next.js server at the edge. Though projects like Fab have come up with workarounds, I decided to approach the project with a clean slate and use existing patterns established in Next.js in a brand-new framework.

As a developer, I absolutely love the simplicity of exporting an asynchronous function from my page to have it supply props to the component. Flareact implements this pattern by allowing you to export a getEdgeProps function. This is similar to getStaticProps in Next.js, and it matches the expected return shape of that function in Next.js — including a revalidate parameter. Learn more about data fetching in Flareact.

I was also inspired by the API Routes feature of Next.js when I implemented the API Routes feature of Flareact — enabling you to write standard Cloudflare Worker scripts directly within your React app.

I hope porting over an existing Next.js project to Flareact is a breeze!

How it works

When a FetchEvent request comes in, Flareact inspects the URL pathname to decide how to handle it:

If the request is for a page or for page props, it checks the cache for that request and returns it if there’s a hit. If there is a cache miss, it generates the page request or props function, stores the result in the cache, and returns the response.

If the request is for an API route, it sends the entire FetchEvent along to the user-defined API function, allowing the user to respond as they see fit.


If you want your cached page to be revalidated after a certain amount of time, you can return an additional revalidate property from getEdgeProps(). This instructs Flareact to cache the endpoint for that number of seconds before generating a new response.


Finally, if the request is for a static asset, it returns it directly from the Workers KV.

The Worker

The core responsibilities of the Worker — or, in a traditional SSR framework, the server — are to:

  1. Render the initial React page component into static HTML markup.
  2. Provide the initial page props as a JSON object, embedded into the static markup in a script tag.
  3. Load the client-side JavaScript bundles and stylesheets necessary to render the interactive page.

One challenge with building Flareact is that Webpack targets the webworker output rather than the node output. This makes it difficult to inform the worker which pages exist in the filesystem, since there is no access to the filesystem.

To get around this, Flareact leverages require.context, a Webpack-specific API, to inspect the project and build a manifest of pages on the client and the worker. I’d love to replace this with a smarter bundling strategy on the client-side eventually.

The Client

In addition to handling incoming Worker requests, Flareact compiles a client bundle containing the code necessary for routing, data fetching and more from the browser.

The core responsibilities of the client are to:

  1. Listen for routing events
  2. Fetch the necessary page component and its props from the worker over AJAX

Building a client router from scratch has been a challenge. It listens for changes to the internal route state, updates the URL pathname with pushState, makes an AJAX request to the worker for the page props, and then updates the current component in the render tree with the requested page.

It was fun building a flareact/link component similar to next/link:

import Link from "flareact/link";

export default function Index() {
  return (
    <div>
      <Link href="/about">
        <a>Go to About</a>
      </Link>
    </div>
  );
}

I also set out to build a custom version of next/head for Flareact. As it turns out, this was non-trivial! With lots of interesting stuff going on behind the scenes to support SSR and client-side routing events, I decided to make flareact/head a simple wrapper around react-helmet instead:

import Head from "flareact/head";

export default function Index() {
  return (
    <div>
      <Head>
        <title>My page title</title>
      </Head>
      <h1>Hello, world.</h1>
    </div>
  );
}

Local Development

The local developer experience of Flareact leverages the new wrangler dev command, sending server requests through a local tunnel to the Cloudflare edge and back to your machine.

This is a huge win for productivity, since you don’t need to manually build and deploy your application to see how it will perform in a production environment.

It’s also a really exciting update to the serverless toolchain. Running a robust development environment in a serverless world has always been a challenge, since your code is executing in a non-traditional context. Tunneling local code to the edge and back is such a great addition to Cloudflare’s developer experience.

Use cases

Flareact is a great candidate for a lot of Jamstack-adjacent applications, like blogs or static marketing sites.

It could also be used for more dynamic applications, with robust API functions and authentication mechanisms — all implemented using Cloudflare Workers.

Imagine building a high-traffic e-commerce site with Flareact, where both site reliability and dynamic rendering for things like price changes and stock availability are crucial.

There are also untold possibilities for integrating the Workers KV into your edge props or API functions as a first-class database solution. No need to reach for an externally-hosted database!

While the project is still in its early days, here are a couple of real-world examples:

The road ahead

I have to be honest: creating a server-side rendered React framework with little prior knowledge was very difficult. There’s still a ton to learn, and Flareact has a long way to go to reach parity with Next.js in the areas of optimization and production-readiness.

Here’s what I’m hoping to add to Flareact in the near future:

  • Smarter client bundling and Webpack chunks to reduce individual page weight
  • A more feature-complete client-side router
  • The ability to extend and customize the root document of the app
  • Support for more style frameworks (CSS-in-JS, Sass, CSS modules, etc.)
  • A more stable development environment
  • Documentation and support for environment variables, secrets and KV namespaces
  • A guide for deploying from GitHub Actions and other CI tools

If the project sounds interesting to you, be sure to check out the source code on GitHub. Contributors are welcome!

Categories: Technology

Two clicks to add region-based Zero Trust compliance

Tue, 01/09/2020 - 15:51

Your team members are probably not just working from home - they may be working from different regions or countries. The flexibility of remote work gives employees a chance to work from the towns where they grew up or countries they always wanted to visit. However, that distribution also presents compliance challenges.

Depending on your industry, keeping data inside of certain regions can be a compliance or regulatory requirement. You might require employees to connect from certain countries or exclude entire countries altogether from your corporate systems.

When we worked in physical offices, keeping data inside of a country was easy. All of your users connecting to an application from that office were, of course, in that country. Remote work changed that and teams had to scramble to find a way to keep people productive from anywhere, which often led to sacrifices in terms of compliance. Starting today, you can make geography-based compliance easy again in Cloudflare Access with just two clicks.

You can now build rules that require employees to connect from certain countries. You can also add rules that block team members from connecting from other countries. This feature works with any identity provider configured and requires no other changes for your users or administrators.

What is Cloudflare Access?

Cloudflare Access secures applications by applying Zero Trust enforcement to every request. Rather than trusting anyone on a private network, Access checks for identity any time someone attempts to reach an application. With Cloudflare’s global network, that check takes place in data centers in over 200 cities around the world to avoid compromising performance.

Behind the scenes, administrators build rules to decide who should be able to reach the tools protected by Access. In turn, when users need to connect to those tools, they are prompted to authenticate with one of the identity provider options. Cloudflare Access checks their login against the list of allowed users and, if permitted, allows the request to proceed.


Cloudflare Access can check more than just their username. As a Zero Trust platform, Access aggregates multiple sources of signal about a user and surfaces those to the administrator. Some signals include whether the user authenticated with a mutual TLS client certificate or a hard key. However, some organizations also have compliance requirements that center around region, in addition to multifactor authentication.

Allow some countries, exclude others

You can build Cloudflare Access rules to be as simple as “only allow team members with @team.com email addresses.” However, usernames and passwords alone are not always sufficient. Depending on where you operate, or where you need to operate, you can use Cloudflare Access to layer country-specific rules on top of your identity provider workflows.

With this release, you can now add rules that require users to connect from certain countries or restrict logins from other countries. For example, you can require that users only connect from Portugal.


You can also exclude countries altogether. Cloudflare does not have an office in Costa Rica, a place I know many of us would love to visit. If a member of the team was on a beach vacation there and I wanted to make sure they really unplugged from work, we could add a rule to block logins to our applications from Costa Rica.


Some applications might not need country-specific requirements. Cloudflare Access rules can be configured on an application-by-application basis. You can add rules about country connections to specific applications that contain sensitive information, while limiting others to just identity.

Audit logins by country and user

Cloudflare Access captures every request a user makes to an internal application, without the need for any code changes. Your organization can export these logs to a third-party storage or SIEM solution to audit the country of origin for each user request. With that data, your compliance and security teams can quickly audit where your corporate devices are operating without the need to deploy additional client-side software.

Layer with other Zero Trust rules

Zero trust security starts with a username. Administrators build rules to determine which users can reach specific applications. Cloudflare Access integrates with your team’s identity provider, or even multiple identity providers, to make those username-based decisions at the edge of our network.

However, identity consists of more than just a username. Cloudflare Access can aggregate multiple sources of signal in Cloudflare's network. Access can use that information to make a decision about identity in our network - long before that request ever reaches your infrastructure.

You can combine user rules with mutual TLS requirements, or device posture checks, and even force logins to always use a hard key. All of these zero-trust rules run inline with Cloudflare’s existing security features, like our WAF and DDoS mitigation, to add layers of security to every request. The Cloudflare network gives your team a zero-trust platform to apply all of the data we can gather about a request to determine whether or not it should be allowed.

The country rules we’re announcing today become another layer in that zero trust model. Like other sources of signal, you can combine these rules to build a comprehensive policy tailored to your organization’s compliance or security needs. For example, you can build a rule that only allows users to login to your application when they connect from Germany and use a physical hard key.

How to get started

To get started, navigate to an application you have added to Cloudflare Access or create a new one. Cloudflare Access policies consist of actions that can allow, block, or bypass requests based on the criteria defined. Access follows policies in order of precedence from top to bottom in the UI.

Inside of a policy you can define the criteria with three types of operators:

  • Include: Include rules function like OR operators. Users must meet at least one criterion in an Include rule. For example, an Include rule can be constructed to allow anyone with an @cloudflare.com email address or the specific address user@team.com to connect.
  • Require: Require rules function like AND operators. Users must meet all Require rule criteria.
  • Exclude: Exclusion rules function like "NOT" operators. Users must not meet the criterion of an Exclude rule.

To require that users connect from a particular country, create an Allow policy that includes your users’ email or identity provider group. Within that Allow policy, add a Require rule and choose the country that will be required. If you want to create a rule that requires multiple countries, you can add them into an Access Group.


You can then add that group into the Require rule.
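
If you prefer to automate this instead of using the dashboard, the same policy can in principle be created through the API. The sketch below assumes the v4 endpoint path and rule JSON shape; check the Access API documentation before relying on it:

package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Hypothetical sketch: an Allow policy that includes @team.com users
	// and requires connections from Portugal. The path and JSON shape are
	// assumptions, not taken from this post.
	policy := []byte(`{
	  "name": "Require Portugal",
	  "decision": "allow",
	  "include": [{"email_domain": {"domain": "team.com"}}],
	  "require": [{"geo": {"country_code": "PT"}}]
	}`)

	url := fmt.Sprintf(
		"https://api.cloudflare.com/client/v4/accounts/%s/access/apps/%s/policies",
		os.Getenv("ACCOUNT_ID"), os.Getenv("APP_ID"))
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(policy))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+os.Getenv("CLOUDFLARE_API_TOKEN"))

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}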

What’s next?

Cloudflare Access, part of Cloudflare for Teams, is available today. The country requirement rule is available in all plans. You can follow the documentation here to add the additional rule.

Categories: Technology

Analysis of Today's CenturyLink/Level(3) Outage

Sun, 30/08/2020 - 21:55

Today CenturyLink/Level(3), a major ISP and Internet bandwidth provider, experienced a significant outage that impacted some of Cloudflare’s customers as well as a significant number of other services and providers across the Internet. While we’re waiting for a post mortem from CenturyLink/Level(3), I wanted to write up the timeline of what we saw, how Cloudflare’s systems routed around the problem, why some of our customers were still impacted in spite of our mitigations, and what appears to be the likely root cause of the issue.

Increase In Errors

At 10:03 UTC our monitoring systems started to observe an increased number of errors reaching our customers’ origin servers. These show up as “522 Errors” and indicate that there is an issue connecting from Cloudflare’s network to wherever our customers’ applications are hosted.

Cloudflare is connected to CenturyLink/Level(3) among a large and diverse set of network providers. When we see an increase in errors from one network provider, our systems automatically attempt to reach customers’ applications across alternative providers. Given the number of providers we have access to, we are generally able to continue to route traffic even when one provider has an issue.

The diverse set of network providers Cloudflare connects to. Source: https://bgp.he.net/AS13335#_asinfo

Automatic Mitigations

In this case, beginning within seconds of the increase in 522 errors, our systems automatically rerouted traffic from CenturyLink/Level(3) to alternate network providers we connect to including Cogent, NTT, GTT, Telia, and Tata.

Our Network Operations Center was also alerted and, beginning at 10:09 UTC, our team began taking additional steps to mitigate any issues our automated systems weren’t able to address automatically. We were successful in keeping traffic flowing across our network for most customers and end users even with the loss of CenturyLink/Level(3) as one of our network providers.

Dashboard showing Cloudflare’s automated systems recognizing the damage to the Internet caused by the CenturyLink/Level(3) failure and automatically routing around it.

The graph below shows traffic between Cloudflare’s network and six major tier-1 networks that are among the network providers we connect to. The red portion shows CenturyLink/Level(3) traffic, which dropped to near-zero during the incident. You can also see how we automatically shifted traffic to other network providers during the incident to mitigate the impact and ensure traffic continued to flow.

Traffic across six major tier-1 networks that are among the network providers Cloudflare connects to. CenturyLink/Level(3) in red.

The following graph shows 522 errors (indicating our inability to reach customers’ applications) across our network during the time of the incident.


The sharp spike up at 10:03 UTC was the CenturyLink/Level(3) network failing. Our automated systems immediately kicked in to attempt to reroute and rebalance traffic across alternative network providers, causing the errors to drop in half immediately and then fall to approximately 25 percent of the peak as those paths were automatically optimized.

Between 10:03 UTC and 10:11 UTC our systems automatically disabled CenturyLink/Level(3) in the 48 cities where we’re connected to them and rerouted traffic across alternate network providers. Our systems take into account capacity on other providers before shifting out traffic in order to prevent cascading failures. This is why the failover, while automatic, isn’t instantaneous in all locations. Our team was able to apply additional manual mitigations to reduce the number of errors another 5 percent.

Why Did the Errors Not Drop to Zero?

Unfortunately, there were still an elevated number of errors indicating we were still unable to reach some customers. CenturyLink/Level(3) is among the largest network providers in the world. As a result, many hosting providers only have single-homed connectivity to the Internet through their network.

To use the old Internet as a “superhighway” analogy, that’s like only having a single offramp to a town. If the offramp is blocked, then there’s no way to reach the town. This was exacerbated in some cases because CenturyLink/Level(3)’s network was not honoring route withdrawals and continued to advertise routes to networks like Cloudflare’s even after they’d been withdrawn. In the case of customers whose only connectivity to the Internet is via CenturyLink/Level(3), or if CenturyLink/Level(3) continued to announce bad routes after they'd been withdrawn, there was no way for us to reach their applications and they continued to see 522 errors until CenturyLink/Level(3) resolved their issue around 14:30 UTC.

The same problem existed on the other (“eyeball”) side of the network. Individuals need an onramp onto the Internet’s superhighway. An onramp to the Internet is essentially what your ISP provides. CenturyLink is one of the largest ISPs in the United States.

Source: https://broadbandnow.com/CenturyLink

Because this outage appeared to take all of the CenturyLink/Level(3) network offline, individuals who are CenturyLink customers would not have been able to reach Cloudflare or any other Internet provider until the issue was resolved. We saw a 3.5% drop in global traffic during the outage, nearly all of which was due to a nearly complete outage of CenturyLink’s ISP service across the United States.

So What Likely Happened Here?

While we will not know exactly what happened until CenturyLink/Level(3) issues a post mortem, we can see clues from BGP announcements and how they propagated across the Internet during the outage. BGP is the Border Gateway Protocol. It is how routers on the Internet announce to each other what IPs sit behind them and therefore what traffic they should receive.

Starting at 10:04 UTC, there were a significant number of BGP updates. A BGP update is the signal a router makes to say that a route has changed or is no longer available. Under normal conditions, the Internet sees about 1.5MB – 2MB of BGP updates every 15 minutes. At the start of the incident, the number of BGP updates spiked to more than 26MB per 15-minute period and stayed elevated throughout the incident.

Source: http://archive.routeviews.org/bgpdata/2020.08/UPDATES/

These updates show the instability of BGP routes inside the CenturyLink/Level(3) backbone. The question is what would have caused this instability. The CenturyLink/Level(3) status update offers some hints and points at a flowspec update as the root cause.

What’s Flowspec?

In CenturyLink/Level(3)’s update they mention that a bad Flowspec rule caused the issue. So what is Flowspec? Flowspec is an extension to BGP, which allows firewall rules to be easily distributed across a network, or even between networks, using BGP. Flowspec is a powerful tool. It allows you to efficiently push rules across an entire network almost instantly. It is great when you are trying to quickly respond to something like an attack, but it can be dangerous if you make a mistake.

At Cloudflare, early in our history, we used to use Flowspec ourselves to push out firewall rules in order to, for instance, mitigate large network-layer DDoS attacks. We suffered our own Flowspec-induced outage more than 7 years ago. We no longer use Flowspec ourselves, but it remains a common protocol for pushing out network firewall rules.

We can only speculate what happened at CenturyLink/Level(3), but one plausible scenario is that they issued a Flowspec command to try to block an attack or other abuse directed at their network. The status report indicates that the Flowspec rule prevented BGP itself from being announced. We have no way of knowing what that Flowspec rule was, but here’s one in Juniper’s format that would have blocked all BGP communications across their network.

route DISCARD-BGP {
    match {
        protocol tcp;
        destination-port 179;
    }
    then discard;
}

Why So Many Updates?

A mystery remains, however, why global BGP updates stayed elevated throughout the incident. If the rule blocked BGP then you would expect to see an increase in BGP announcements initially and then they would fall back to normal.

One possible explanation is that the offending Flowspec rule came near the end of a long list of BGP updates. If that were the case, what may have happened is that every router in CenturyLink/Level(3)’s network would receive the Flowspec rule. They would then block BGP. That would cause them to stop receiving the rule. They would start back up again, working their way through all the BGP rules until they got to the offending Flowspec rule again. BGP would be dropped again. The Flowspec rule would no longer be received. And the loop would continue, over and over.

One challenge of this is that, on every cycle, the queue of BGP updates would continue to increase within CenturyLink/Level(3)’s network. This may have reached a point where the memory and CPU of their routers were overloaded, causing an additional set of challenges to getting their network back online.

Why Did It Take So Long to Fix?

This was a significant global Internet outage and, undoubtedly, the CenturyLink/Level(3) team received immediate alerts. They are a very sophisticated network operator with a world class Network Operations Center (NOC). So why did it take more than four hours to resolve?

Again, we can only speculate. First, it may have been that the Flowspec rule, and the significant load that the large number of BGP updates imposed on their routers, made it difficult for them to log in to their own interfaces. Several of the other tier-1 providers took action, it appears at CenturyLink/Level(3)’s request, to de-peer their networks. This would have limited the number of BGP announcements being received by the CenturyLink/Level(3) network and helped give it time to catch up.

Second, it also may have been that the Flowspec rule was not issued by CenturyLink/Level(3) themselves but rather by one of their customers. Many network providers will allow Flowspec peering. This can be a powerful tool for downstream customers wishing to block attack traffic, but can make it much more difficult to track down an offending Flowspec rule when something goes wrong.

Finally, it never helps when these issues occur early on a Sunday morning. Networks the size and scale of CenturyLink/Level(3)’s are extremely complicated. Incidents happen. We appreciate their team keeping us informed with what was going on throughout the incident. #hugops

Categories: Technology

Asynchronous HTMLRewriter for Cloudflare Workers

Fri, 28/08/2020 - 12:00

Last year, we launched HTMLRewriter for Cloudflare Workers, which enables developers to make streaming changes to HTML on the edge. Unlike a traditional DOM parser that loads the entire HTML document into memory, we developed a streaming parser written in Rust. Today, we’re announcing support for asynchronous handlers in HTMLRewriter. Now you can perform asynchronous tasks based on the content of the HTML document: from prefetching fonts and image assets to fetching user-specific content from a CMS.

How can I use HTMLRewriter?

We designed HTMLRewriter to have a jQuery-like experience. First, you define a handler, then you assign it to a CSS selector; Workers does the rest for you. You can look at our new and improved documentation to see our supported list of selectors, which now include nth-child selectors. The example below changes the alternative text for every second image in a document.

async function editHtml(request) {
  return new HTMLRewriter()
    .on("img:nth-child(2)", new ElementHandler())
    .transform(await fetch(request))
}

class ElementHandler {
  element(e) {
    e.setAttribute("alt", "A very interesting image")
  }
}

Since these changes are applied using streams, we maintain a low TTFB (time to first byte) and users never know the HTML was transformed. If you’re interested in how we’re able to accomplish this technically, you can read our blog post about HTML parsing.

What’s new with HTMLRewriter?

Now you can define an async handler which allows any code that uses await. This means you can perform dynamic HTML injection based on the contents of the document, without having prior knowledge of what it contains. This allows you to customize HTML based on a particular user, feature flag, or even an integration with a CMS.

class UserCustomizer {
  // Remember to add the `async` keyword to the handler method
  async element(e) {
    const user = await fetch(`https://my.api.com/user/${e.getAttribute("user-id")}/online`)
    if (user.ok) {
      // Add the user’s name to the element
      e.setAttribute("user-name", await user.text())
    } else {
      // Remove the element, since this user is not online
      e.remove()
    }
  }
}

What can I build with HTMLRewriter?

To illustrate the flexibility of HTMLRewriter, I wrote an example that you can deploy on your own website. If you manage a website, you know that old links and images can expire with time. Here’s an excerpt from a years-old post I wrote on the Cloudflare Blog:


As you might see, that missing image is not the prettiest sight. However, we can easily fix this using async handlers in HTMLRewriter. Using a service like the Internet Archive API, we can check if an image no longer exists and rewrite the URL to use the latest archive. That means users don’t see an ugly placeholder and won’t even know the image was replaced.

async function fetchAndFixImages(request) {
  return new HTMLRewriter()
    .on("img", new ImageFixer())
    .transform(await fetch(request))
}

class ImageFixer {
  async element(e) {
    var url = e.getAttribute("src")
    var response = await fetch(url)
    if (!response.ok) {
      var archive = await fetch(`https://archive.org/wayback/available?url=${url}`)
      if (archive.ok) {
        var snapshot = await archive.json()
        e.setAttribute("src", snapshot.archived_snapshots.closest.url)
      } else {
        e.remove()
      }
    }
  }
}

Using the Workers Playground, you can view a working sample of the above code. A more complex example could even alert a service like Sentry when a missing image is detected. Using the previous missing image, now you can see the image is restored and users are none the wiser.


If you’re interested in deploying this to your own website, click on the button below:


What else can I build with HTMLRewriter?

We’ve been blown away by developer projects using HTMLRewriter. Here are a few projects that caught our eye and are great examples of the power of Cloudflare Workers and HTMLRewriter:

If you’re interested in using HTMLRewriter, check out our documentation. Also be sure to share any creations you’ve made with @CloudflareDev; we love looking at the awesome projects you build.

Categories: Technology

How Argo Tunnel engineering uses Argo Tunnel

Thu, 27/08/2020 - 12:00

Whether you are managing a fleet of machines or sharing a private site from your localhost, Argo Tunnel is here to help. On the Argo Tunnel team we help make origins accessible from the Internet in a secure and seamless manner. We also care deeply about productivity and developer experience for the team, so naturally we want to make sure we have a development environment that is reliable, easy to set up and fast to iterate on.

A brief history of our development environment (dev-stack)

Docker compose

When our development team was still small, we used a docker-compose file to orchestrate the services needed to develop Argo Tunnel. There was no native support for hot reload, so every time an engineer made a change, they had to restart their dev-stack.

We could hack around it to hot reload with docker-compose, but when that failed, we had to waste time debugging the internals of Docker. As the team grew, we realized we needed to invest in improving our dev stack.

At the same time Cloudflare was in the process of migrating from Marathon to Kubernetes (k8s). We set out to find a tool that could detect changes in source code and automatically upgrade pods with new images.

Skaffold + Minikube

Initially Skaffold seemed to match the criteria. It watches for changes in source code, builds new images and deploys applications onto any k8s. Following Skaffold’s tutorial, we picked minikube as the local k8s, but together they didn’t meet our expectations. Port forwarding wasn’t stable; we got frequent connection refusals and timeouts.

In addition, iteration time didn’t improve, because spinning up minikube takes a long time, and because it doesn’t use the host’s Docker registry it can’t take advantage of caching. At this point we considered reverting to docker-compose, but the k8s ecosystem is booming, so we did some more research.

Tilt + Docker for Mac k8s

Eventually we found a great blog post from Tilt comparing different options for local k8s, and they seem to be solving the exact problem we are having. Tilt is a tool that makes local development on k8s easier. It detects changes in local sources and updates your deployment accordingly.

In addition, it supports live updates without having to rebuild containers, a process that used to take around 20 minutes. With live updates, we can copy the newest source into the container, run cargo build within the container, and restart the service without building a new image. Following Tilt’s blog post, we switched to Docker for Mac’s built-in k8s. Combining Tilt and Docker for Mac k8s, we finally have a development environment that meets our needs.

Rust services that could take 20 minutes to rebuild now take less than a minute.

Collaborating with a distributed team

We reached a much happier state with our dev-stack, but one problem remained: we needed a way to share it. As our teams became distributed with people in Austin, Lisbon and Seattle, we needed better ways to help each other.

One day, I was helping our newest member understand an error observed in cloudflared, Argo Tunnel’s command line interface (CLI) client. I knew the error could either originate from the backend service or a mock API gateway service, but I couldn’t tell for sure without looking at logs.

To get them, I had to ask our new teammate to manually send me the logs of the two services. By the time I discovered the source of the error, reviewed the deployment manifest, and determined the error was caused by a secret set as an empty string, two full hours had elapsed!

I could have solved this in minutes if I had remote access to her development environment. That’s exactly what Argo Tunnel can do! Argo Tunnel provides remote access to development environments by creating secure, outbound-only connections from a resource to Cloudflare’s edge network, exposing it to the Internet. That model helps protect servers and resources from attacks that target an exposed IP address.

I can use Argo Tunnel to expose a remote dev environment, but the information stored there is sensitive. Once it is exposed, we need a way to prevent anyone from reaching it unless they are an authenticated member of my team. Cloudflare Access solves that challenge. Access sits in front of the hostname powered by Argo Tunnel and checks for identity on every request. I can combine both services to share the dev-stack details with the rest of the team in a secure deployment.

The built-in k8s dashboard gives a great overview of the dev-stack, with the list of pods, deployments, services, config maps, secrets, etc. It also allows us to inspect pod logs and exec into a container. By default, it is secured by a token that changes every time the service restarts. To avoid the hassle of distributing the service token to everyone on the team, we wrote a simple reverse proxy that injects the service token in the authorization header before forwarding requests to the dashboard service.

Then we run Argo Tunnel as a sidecar to this reverse proxy, so it is accessible from the Internet. Finally, to make sure no random person can see our dashboard, we put an Access policy that only allows team members to access the hostname.

The request flow is eyeball -> Access -> Argo Tunnel -> reverse proxy -> dashboard service

Working example

Your team can use the same model to develop remotely. Here’s how to get started.

  1. Start a local k8s cluster. https://docs.tilt.dev/choosing_clusters.html offers great advice in choosing a local cluster based on your OS and experience with k8s

2. Enable the dashboard service:


3. Create a reverse proxy that injects the service token of the kubernetes-dashboard service account into the Authorization header before forwarding requests to the kubernetes-dashboard service:

package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
)

func main() {
	config, err := loadConfigFromEnv()
	if err != nil {
		panic(err)
	}
	reverseProxy := httputil.NewSingleHostReverseProxy(config.proxyURL)
	// The default Director builds the request URL. We want our custom Director to add Authorization, in
	// addition to building the URL
	singleHostDirector := reverseProxy.Director
	reverseProxy.Director = func(r *http.Request) {
		singleHostDirector(r)
		r.Header.Add("Authorization", fmt.Sprintf("Bearer %s", config.token))
		fmt.Println("request header", r.Header)
		fmt.Println("request host", r.Host)
		fmt.Println("request URL", r.URL)
	}
	reverseProxy.Transport = &http.Transport{
		TLSClientConfig: &tls.Config{
			InsecureSkipVerify: true,
		},
	}
	server := http.Server{
		Addr:    config.listenAddr,
		Handler: reverseProxy,
	}
	server.ListenAndServe()
}

type config struct {
	listenAddr string
	proxyURL   *url.URL
	token      string
}

func loadConfigFromEnv() (*config, error) {
	listenAddr, err := requireEnv("LISTEN_ADDRESS")
	if err != nil {
		return nil, err
	}
	proxyURLStr, err := requireEnv("DASHBOARD_PROXY_URL")
	if err != nil {
		return nil, err
	}
	proxyURL, err := url.Parse(proxyURLStr)
	if err != nil {
		return nil, err
	}
	token, err := requireEnv("DASHBOARD_TOKEN")
	if err != nil {
		return nil, err
	}
	return &config{
		listenAddr: listenAddr,
		proxyURL:   proxyURL,
		token:      token,
	}, nil
}

func requireEnv(key string) (string, error) {
	result := os.Getenv(key)
	if result == "" {
		return "", fmt.Errorf("%v not provided", key)
	}
	return result, nil
}

4. Create an Argo Tunnel sidecar to expose this reverse proxy

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dashboard-auth-proxy
  namespace: kubernetes-dashboard
  labels:
    app: dashboard-auth-proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dashboard-auth-proxy
  template:
    metadata:
      labels:
        app: dashboard-auth-proxy
    spec:
      containers:
        - name: dashboard-tunnel
          # Image from https://hub.docker.com/r/cloudflare/cloudflared
          image: cloudflare/cloudflared:2020.8.0
          command: ["cloudflared", "tunnel"]
          ports:
            - containerPort: 5000
          env:
            - name: TUNNEL_URL
              value: "http://localhost:8000"
            - name: NO_AUTOUPDATE
              value: "true"
            - name: TUNNEL_METRICS
              value: "localhost:5000"
        # dashboard-auth-proxy injects the dashboard token into the Authorization header
        # before forwarding the request to the kubernetes-dashboard service
        - name: dashboard-auth-proxy
          image: dashboard-auth-proxy
          ports:
            - containerPort: 8000
          env:
            - name: LISTEN_ADDRESS
              value: localhost:8000
            - name: DASHBOARD_PROXY_URL
              value: https://kubernetes-dashboard
            - name: DASHBOARD_TOKEN
              valueFrom:
                secretKeyRef:
                  name: ${TOKEN_NAME}
                  key: token

5. Find out the URL to access your dashboard from Tilt’s UI


6. Share the URL with your collaborators so they can access your dashboard anywhere they are through the tunnel!


You can find the source code for the example in https://github.com/cloudflare/argo-tunnel-examples/tree/master/sharing-k8s-dashboard

If this sounds like a team you want to be on, we are hiring!

Categories: Technology

Cloudflare and Human Rights: Joining the Global Network Initiative (GNI)

Wed, 26/08/2020 - 13:00

Consistent with our mission to help build a better Internet, Cloudflare has long recognized the importance of conducting our business in a way that respects the rights of Internet users around the world. We provide free services to important voices online - from human rights activists to independent journalists to the entities that help maintain our democracies - who would otherwise be vulnerable to cyberattack. We work hard to develop internal mechanisms and build products that empower user privacy. And we believe that being transparent about the types of requests we receive from government entities and how we respond is critical to maintaining customer trust.

As Cloudflare continues to expand our global network, we think there is more we can do to formalize our commitment to help respect human rights online. To that end, we are excited to announce that we have joined the Global Network Initiative (GNI), one of the world's leading human rights organizations in the information and communications technology (ICT) sector, as observers.

Business + Human Rights

Understanding Cloudflare’s new partnership with GNI requires some additional background on how human rights concepts apply to businesses.

In 1945, following the end of World War II, 850 delegates representing forty-six nations gathered in San Francisco to create an international organization dedicated to preserving peace and helping to build a better world. In drafting the eventual United Nations (UN) Charter, the delegates included seven mentions of the need for a set of formal, universal rights that would provide basic protections for all people, everywhere.

Although human rights have traditionally been the purview of nation-states, in 2005 the UN Secretary-General appointed Harvard Professor John Ruggie as the first Special Representative for Business and Human Rights. Ruggie was tasked with determining whether businesses, particularly multinational corporations operating in many jurisdictions, ought to have their own unique obligations under international human rights law.

In 2008, Ruggie proposed the "Protect, Respect and Remedy" framework on business and human rights to the UN Human Rights Council. He argued that while states have an obligation to protect human rights, businesses have a responsibility to respect human rights. Essentially, businesses should act with due diligence in all of their operations to avoid infringing on the rights of others, and remedy any harms that do occur. Moreover, businesses' responsibilities exist independently of any state's ability or willingness to meet its own human rights obligations.

Ruggie's framework was later incorporated into the UN Guiding Principles on Business and Human Rights, which were unanimously endorsed by the UN Human Rights Council in 2011.

Cloudflare + Human Rights

With Cloudflare's network now operating in more than 100 countries, and more than a billion unique IP addresses passing through it every day, we believe we have a significant responsibility to respect, and a tremendous potential to promote, human rights.

For example, privacy is a protected human right under Article 17 of the International Covenant on Civil and Political Rights, which means that the law enforcement and other government orders we periodically receive to access customers' personal information affect the human rights of our customers and users. Twice a year, we publish a Transparency Report that describes exactly how many orders, subpoenas, and warrants we receive, how we respond, and how many customers or domains are affected.

In responding to law enforcement requests for subscriber information, Cloudflare has relied on three principles: due process, respect for our customers' privacy, and notice for those affected. As we say often, Cloudflare's intent is to follow the law, and not to make law enforcement's job any harder, or easier.

However, as Cloudflare expands internationally, relying on due process and transparency alone may not always be sufficient. For one thing, those kinds of protections are dependent on good-faith implementation of legal processes like rule of law and judicial oversight. They also require a government and legal system that carries some political legitimacy, like those established through democratic processes.  

To improve the consistency of how we work with governments around the world, Cloudflare will be incorporating human rights training and due diligence tools like human rights impact assessments into our decision-making processes throughout our company.  

We recognize that as a company there are limits to what Cloudflare can or should do as we operate around the world. But we are committed to more formally documenting our efforts to respect human rights under the UN Guiding Principles on Business and Human Rights, and we've partnered with GNI to help us get there.

 Joining the Global Network Initiative (GNI)

Who is GNI?

GNI is a non-profit organization launched in 2008. GNI members include ICT companies, civil society organizations (including human rights and press freedom groups), academic experts, and investors from around the world. Its mission is to protect and advance freedom of expression and privacy rights in the ICT sector by setting a global standard for responsible decision making and serving as a multistakeholder voice in the face of government restrictions and demands.

GNI provides companies with concrete guidance on how to implement the GNI Principles, which are based on international human rights law and standards, and are informed by the United Nations Guiding Principles on Business and Human Rights.  

When companies join GNI, they agree to have their implementation of the GNI Principles assessed independently by participating in GNI’s assessment process. The assessment is made up of a review of relevant internal systems, policies and procedures for implementing the Principles and an examination of specific cases or examples that show how the company is implementing them in practice.

What's Next?

Cloudflare will serve in observer status in GNI for the next year while we work with GNI's staff to implement and document formal human rights practices and training throughout our company. Cloudflare will undergo its first independent assessment after becoming a full GNI member.

It's an exciting step for us as a company. But, we also think there is more Cloudflare can do to help build a better, more sustainable Internet. As we say often, we are just getting started.

Categories: Technology

Announcing wrangler dev — the Edge on localhost

Wed, 26/08/2020 - 12:00
Announcing wrangler dev — the Edge on localhost

Cloudflare Workers — our serverless platform — allows developers around the world to run their applications from our network of 200 datacenters, as close as possible to their users.

A few weeks ago we announced a release candidate for wrangler dev — today, we're excited to take wrangler dev, the world’s first edge-based development environment, to GA with the release of wrangler 1.11.

Think locally, develop globally

It was once assumed that to successfully run an application on the web, one had to go and acquire a server, set it up (in a data center that hopefully you had access to), and then maintain it on an ongoing basis. Luckily for most of us, that assumption was challenged with the emergence of the cloud. The cloud was always assumed to be centralized — large data centers in a single region (“us-east-1”), reserved for compute. The edge? That was for caching static content.

Again, assumptions are being challenged.

Cloudflare Workers is about moving compute from a centralized location to the edge. And it makes sense: if users are distributed all over the globe, why should all of them be routed to us-east-1, on the opposite side of the world, causing latency and degrading user experience?

But challenging one assumption caused others to come into view. One of the most obvious ones was: would a local development environment actually provide the best experience for someone looking to test their Worker code? Trying to fit the entire Cloudflare edge, with all its dependencies, onto a developer’s machine didn’t seem to be the best approach, especially given that the production environment where that code would eventually run was mere milliseconds away from the developer’s machine.

When I was in college, getting started with programming, one of the biggest barriers to entry was installing all the dependencies required to run a single library. I would go as far as to say that the third, and often forgotten hardest problem in computer science is dependency management.

We’re not the first to try and unify development environments across machines — tools such as Docker aim to solve this exact problem by providing a prepackaged development environment.

Yet, packaging up the Workers runtime is not quite so simple.

Beyond the Workers runtime, there are many components that make up Cloudflare’s edge, including DNS resolution, the Cloudflare cache — all of those parts are what makes Cloudflare Workers so powerful. That means that without those components, a standalone runtime is insufficient to represent the behavior of Worker request handling. The reason to develop locally first is to have the opportunity to experiment without affecting production. Thus, having a local development environment that truly reflects production is a requirement.

wrangler dev

wrangler dev provides all the convenience of a local development environment, without the headache of trying to reproduce the reality of production locally — and then having to keep the two environments in sync.

By running at the edge, it provides a high fidelity, consistent experience for all developers, without sacrificing the speedy feedback loop of a local development environment.

Live reloading

Announcing wrangler dev — the Edge on localhost

As you update your code, wrangler dev will detect changes, and push the new version of your code to the edge.

console.log() at your fingertips

Announcing wrangler dev — the Edge on localhost

Previously, to extract your console logs from the Workers runtime, you had to have the Workers Preview open in a browser window at all times. With wrangler dev, you can receive your logs directly in the terminal of your choice.

Cache API, KV, and more!

Since wrangler dev runs on the edge, you can now easily test the state of a cache.put(), without having to deploy your Worker to production.

wrangler dev will spin up a new KV namespace for development, so you don’t have to worry about affecting your production data.

And if you’re looking to test some of the request.cf features that provide rich information about the request, such as geo-location, they will all be populated from the Cloudflare data center.

Get started

wrangler dev is now available in the latest version of Wrangler, the official Cloudflare Workers CLI.

To get started, follow our installation instructions here.

What’s next?

wrangler dev is just our first foray into giving our developers more visibility and agility with their development process.

We recognize that we have a lot more work to do to meet our developers' needs, including providing an easy testing framework for Workers, and allowing our customers to observe their Workers' behavior in production.

Just as wrangler dev provides a quick feedback loop between our developers and their code, we love to have a tight feedback loop between our developers and our product. We love to hear what you’re building, how you’re building it, and how we can help you build it better.

Categories: Technology

Improving the Wrangler Startup Experience

Tue, 25/08/2020 - 12:00
Improving the Wrangler Startup Experience

Today I’m excited to announce wrangler login, an easy way to get started with Wrangler! This summer, for my internship on the Workers Developer Productivity team, I was tasked with helping improve the Wrangler user experience. For those who don’t know, Workers is Cloudflare’s serverless platform, which allows users to deploy their software directly to Cloudflare’s edge network.

This means you can add custom behaviour to requests heading to your site, or even run fully fledged applications directly on the edge. Wrangler is the open-source CLI tool used to manage your Workers and has a big focus on enabling a smooth developer experience.

When I first heard I was working on Wrangler, I was excited that I would be working on such a cool product but also a little nervous. This was the first time I would be writing Rust in a professional environment, the first time making meaningful open-source contributions, and on top of that the first time doing all of this remotely. But thanks to lots of guidance and support from my mentor and team, I was able to help make the Wrangler and Workers developer experience just a little bit better.

The Problem

The main improvement I focused on this summer was the experience of getting started with Wrangler. For many of the commands to publish and develop live Workers, the user first needs to authenticate with Cloudflare. This is mainly done through the wrangler config command, which requires the user to create an API token and paste it into Wrangler. Creating a token involves going to the Cloudflare dashboard, navigating to your profile, opening the API tokens page, selecting a token template, adding your zones and accounts, and finally creating the token. While this is a completely valid authentication flow, it’s not as easy as it could be.

It could be frustrating to users who have to leave Wrangler and then possibly get lost in the wrong dashboard page or use the wrong settings for their token. When a group of intern candidates was given the task of using Wrangler, most of them got stuck on this step! Many users might forgo using Workers altogether if this is the first thing they encounter when sitting down to develop. Instead we wanted an experience where users could use their Cloudflare login (i.e. their username, password, and, if enabled, two-factor authentication) and immediately be ready to go.

No OAuth? No Problem

What we came up with was a way to create and transfer API Tokens for a user, similar to how Argo Tunnel does their login.

Improving the Wrangler Startup Experience

An overview of the process is shown above, which starts with Wrangler. When the user types wrangler login in their terminal, they will be prompted to open the Cloudflare dashboard in their browser. All dashboard pages require the user to sign in before loading and once the user is signed in, all actions taken by the dashboard page will use the authentication of that user.

This means we can make a dashboard page which automatically creates an API token configured to manage Workers. Then when the user loads this page, a properly configured API token will be created for that user. Our dashboard page will then hand off the token to EdgeWorker Config Service (EWC), which will temporarily store it. While this is all going on, Wrangler polls EWC waiting for the token to appear; once it does, Wrangler retrieves the token and authenticates the user. With this, we have a seamless way to authenticate a Cloudflare user.
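To make the Wrangler side of that exchange concrete, here is a rough sketch of the polling loop in Go. Wrangler itself is written in Rust, and the endpoint URL, response format, and timing below are illustrative assumptions, not the real service contract:

package main

import (
    "fmt"
    "io"
    "net/http"
    "time"
)

// pollForToken repeatedly asks the token store whether the browser flow has
// finished, returning the (still encrypted) token once it appears.
func pollForToken(url string, timeout time.Duration) ([]byte, error) {
    deadline := time.Now().Add(timeout)
    for time.Now().Before(deadline) {
        resp, err := http.Get(url)
        if err == nil && resp.StatusCode == http.StatusOK {
            defer resp.Body.Close()
            return io.ReadAll(resp.Body) // the encrypted token blob
        }
        if resp != nil {
            resp.Body.Close()
        }
        time.Sleep(time.Second) // token not ready yet; try again
    }
    return nil, fmt.Errorf("timed out waiting for login to complete")
}

func main() {
    // Hypothetical per-login URL that the dashboard page was also given.
    token, err := pollForToken("https://ewc.example.com/login/REQUEST_ID", 2*time.Minute)
    if err != nil {
        panic(err)
    }
    fmt.Printf("received %d encrypted bytes\n", len(token))
}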

Security

One thing we had to be mindful of was security: these are users’ tokens, after all. If someone was listening to network traffic and saw the request to the Cloudflare dashboard page, nothing would stop them from polling EWC themselves and stealing the token away from the user to wreak havoc on their Workers and zones. To solve this problem we used asymmetric RSA encryption. Asymmetric encryption lets us create two separate but mathematically connected keys: a private key, which can both encrypt and decrypt information, and a public key, which can only encrypt information.

Wrangler will generate a public-private key pair and pass off the public key to our dashboard page. Once the dashboard page is finished creating our token, EWC will then encrypt the token using the public key before storing. This means in the previous scenario where someone takes the token from our user, all they will have is an encrypted token they can’t use. The only way to decrypt it would be with the private key held by Wrangler.
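As a minimal sketch of both halves of that exchange (in Go, with an illustrative 2048-bit key and OAEP padding, which are assumptions rather than the exact parameters Wrangler uses):

package main

import (
    "crypto/rand"
    "crypto/rsa"
    "crypto/sha256"
    "fmt"
)

func main() {
    // Wrangler's side: generate a key pair and hand only the public key
    // to the dashboard page.
    privateKey, err := rsa.GenerateKey(rand.Reader, 2048)
    if err != nil {
        panic(err)
    }
    publicKey := &privateKey.PublicKey

    // EWC's side: encrypt the newly created API token with the public key
    // before storing it.
    token := []byte("example-api-token")
    ciphertext, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, publicKey, token, nil)
    if err != nil {
        panic(err)
    }

    // An eavesdropper who polls EWC only ever sees ciphertext. Wrangler,
    // holding the private key, recovers the token.
    plaintext, err := rsa.DecryptOAEP(sha256.New(), rand.Reader, privateKey, ciphertext, nil)
    if err != nil {
        panic(err)
    }
    fmt.Printf("recovered token: %s\n", plaintext)
}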

Improving the Wrangler Startup Experience

In the end, this solution results in a smooth experience for Workers users. Now instead of rummaging through dashboard pages you can get started with Wrangler in only a few seconds, sometimes without having to leave the comfort of your own terminal.

Try out wrangler login in the 1.11.0 release of Wrangler and let us know how you like it. Also I would like to thank the Workers team for helping make this possible and giving me an awesome experience this summer! In order to implement this feature I had to touch different parts of Cloudflare like EWC and Stratus (Cloudflare’s front end monorepo) and work in areas unfamiliar to me such as frontend TypeScript and React. The responsiveness and encouragement I received helped get this feature created and helped make for a great summer!

Categories: Technology

Delivering HTTP/2 upload speed improvements

Mon, 24/08/2020 - 12:00
Delivering HTTP/2 upload speed improvements

Cloudflare recently shipped improved upload speeds across our network for clients using HTTP/2. This post describes our journey from troubleshooting an issue to fixing it and delivering faster upload speeds to the global Internet.

We launched speed.cloudflare.com in May 2020 to give our users insight into how well their networks perform. The test provides download, upload and latency tests. Soon after release, we received reports from a small number of users that upload speeds were sometimes underreported. Our investigation determined that it seemed to happen with end users who had high upload bandwidth available (several-hundred-Mbps cable modem or fiber service). Our speed tests are performed via browser JavaScript, and most browsers use HTTP/2 by default. We found that HTTP/2 upload speeds were sometimes much slower than HTTP/1.1 (assuming TLS in both cases) when the user had high available upload bandwidth.

Upload speed is more important than ever, especially for people using home broadband connections. As many people have been forced to work from home, they’re using their broadband connections differently than before. Prior to the pandemic, broadband traffic was very asymmetric (you downloaded far more than you uploaded: think listening to music, or streaming a movie), but now we’re seeing an increase in uploading as people video conference from home or create content from their home office.

Initial Tests

User reports were focused on particularly fast home networks. I set up a dummynet network simulator to test upload speed in a controlled environment. I launched a Linux VM running our code inside my MacBook Pro and set up a dummynet between the VM and the Mac host. Measuring upload speed is simple: create a file and upload it using curl to an endpoint which accepts a request body. I ran the same test 20 times and took the median upload speed (Mbps).

% dd if=/dev/urandom of=test.dat bs=1M count=10
% curl --http1.1 -w '%{speed_upload}\n' -sf -o/dev/null --data-binary @test.dat https://edge/upload-endpoint
% curl --http2 -w '%{speed_upload}\n' -sf -o/dev/null --data-binary @test.dat https://edge/upload-endpoint

Stepping up to uploading a 10MB object over a network with 200Mbps of available bandwidth and 40ms RTT, the result was surprising. Using our production configuration, HTTP/2 upload speed tested at almost half that of HTTP/1.1 under the same conditions (higher is better).

Delivering HTTP/2 upload speed improvements

The result may differ depending on your network, but the gap is bigger when the network is fast. On a slow network, like my home cable connection (5Mbps upload and 20ms RTT), HTTP/2 upload speed was almost identical to the performance observed with HTTP/1.1.

Receiver Flow Control

Before we get into more detail on this topic, I’ll share my intuition: the issue seemed related to receiver flow control. Usually the client (a browser or any HTTP client) is the receiver of data, but when the client is uploading content to the server, the server is the receiver. And the receiver needs some type of flow control of the receive buffer.

How we handle receiver flow control differs between HTTP/1.1 and HTTP/2. For example, HTTP/1.1 doesn’t define protocol-level receiver flow control since there is no multiplexing of requests in the connection and it’s up to the TCP layer which handles receiving data. Note that most of the modern OS TCP stacks have auto tuning of the receive buffer (we will revisit that later) and they tune based on the current BDP (bandwidth-delay product).

In the case of HTTP/2, there is a stream-level flow control mechanism because the protocol supports multiplexing of streams. Each HTTP/2 stream has its own flow control window and there is connection level flow control for all streams in the connection. If it’s too tight, the sender will be blocked by the flow control. If it’s too loose we may end up wasting memory for buffering. So keeping it optimal is important when implementing flow control and the most optimal strategy is to keep the receive buffer matching the current BDP. BDP represents the maximum bytes in flight in the given network and can be used as an optimal buffer size.
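To get a feel for the numbers, BDP is simply bandwidth multiplied by round-trip time. A quick back-of-the-envelope calculation in Go, using the fast test network from the earlier test, shows how much data can be in flight at once:

package main

import "fmt"

func main() {
    // BDP = bandwidth (bytes/sec) x round-trip time (sec).
    bandwidthMbps := 200.0 // the fast test network from earlier
    rttMs := 40.0

    bdp := (bandwidthMbps * 1e6 / 8) * (rttMs / 1e3)
    fmt.Printf("BDP is about %.0f KB\n", bdp/1024)
    // Prints ~977 KB: far more data can be in flight than a small
    // fixed-size receive buffer would allow to go unacknowledged.
}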

How NGINX handles the request body buffer

Initially I tried to find a parameter which controls NGINX upload buffering and tried to see if tuning the values improved the result. Two directives relate to buffering a request body: proxy_request_buffering and client_body_buffer_size.

And this one is HTTP/2 specific: http2_body_preread_size.

Cloudflare does not use the proxy_request_buffering directive, so it can be immediately discounted. client_body_buffer_size is the size of the request body buffer, which is used regardless of the protocol, so it applies to HTTP/1.1 and HTTP/2 alike.

When looking into the code, here is how it works:

Delivering HTTP/2 upload speed improvements
  • HTTP/1.1: use the client_body_buffer_size buffer as a buffer between upstream and the client, simply repeating reading and writing using this buffer.
  • HTTP/2: since we need a flow control window update for the HTTP/2 DATA frame, there are two parameters:
    • http2_body_preread_size: specifies the size of the initial request body read before NGINX starts sending it to the upstream.
    • client_body_buffer_size: specifies the size of the request body buffer.
    • Those two parameters are used for allocating a request body buffer during uploading. Here is a brief summary of how unbuffered upload works:
      • Allocate a single request body buffer whose size is the maximum of http2_body_preread_size and client_body_buffer_size. This means if http2_body_preread_size is 64KB and client_body_buffer_size is 128KB, then a 128KB buffer is allocated. We use 128KB for client_body_buffer_size.
      • The HTTP/2 SETTINGS_INITIAL_WINDOW_SIZE of the stream is set to http2_body_preread_size, and we use 64KB as the default (the RFC 7540 default value).
      • The HTTP/2 module reads up to http2_body_preread_size before sending the body to upstream.
      • After flushing the preread buffer, keep reading from the client, writing to upstream, and sending WINDOW_UPDATE frames back to the client when necessary until the request body is fully received.

To summarise what this means: HTTP/1.1 simply uses a single buffer, so TCP socket buffers do the flow control. However with HTTP/2, the application layer also has receiver flow control and NGINX uses a fixed size buffer for the receiver. This limits upload speed when the current link has a BDP larger than the current request body buffer size. So the bottleneck is HTTP/2 flow control when the buffer size is too tight.

We're going to need a bigger buffer?

In theory, bigger buffer sizes should avoid upload bottlenecks, so I tried a few out by running my tests again. The previous chart result is now labelled "prod" and plotted alongside HTTP/2 tests with client_body_buffer_size set to 256KB, 512KB and 1024KB:

Delivering HTTP/2 upload speed improvements

It appears 512KB is an optimal value for client_body_buffer_size.

What if I test with other network parameters? Here is the result when RTT is 10ms; in this case, 256KB looks optimal.

Delivering HTTP/2 upload speed improvements

Both cases look much better than the current 128KB and achieve similar performance to HTTP/1.1, or even better. However, the optimal buffer size is a moving target, and having too large a buffer can hurt performance: we need a smart way to find the optimal buffer size.

Autotuning request body buffer size

One of the ideas that can help this kind of situation is autotuning. For example, modern TCP stacks autotune their receive buffers automatically. In production, our edge also has TCP receiver buffer autotuning enabled by default.

net.ipv4.tcp_moderate_rcvbuf = 1

But in case of HTTP/2, TCP buffer autotuning is not very effective because the HTTP/2 layer is doing its own flow control and the existing 128KB was too small for a high BDP link. At this point, I decided to pursue autotuning HTTP/2 receive buffer sizing as well, similar to what TCP does.

The basic idea is that NGINX doubles the HTTP/2 request body buffer size based on the connection's BDP. Here is the algorithm currently implemented in our version of NGINX (a sketch follows the list):

  • Allocate a request body buffer as explained above.
  • For every RTT (using Linux tcp_info), update the current BDP.
  • Double the request body buffer size when the current BDP > (receiver window / 4).
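Our NGINX patch is written in C; purely as an illustration of the rule above, here is a Go sketch of the per-RTT update (the cap and the BDP sampling are assumptions):

package main

import "fmt"

// onRTT is called once per round trip with a fresh BDP estimate derived
// from tcp_info. It doubles the request body buffer when the BDP exceeds
// a quarter of the current receiver window, up to a cap.
func onRTT(bufSize, bdp, maxBuf int) int {
    if bdp > bufSize/4 && bufSize*2 <= maxBuf {
        return bufSize * 2
    }
    return bufSize
}

func main() {
    buf := 128 * 1024         // the old fixed production default
    bdp := 1_000_000          // ~1MB, e.g. a 200Mbps / 40ms link
    maxBuf := 4 * 1024 * 1024 // illustrative cap

    for rtt := 1; rtt <= 6; rtt++ {
        buf = onRTT(buf, bdp, maxBuf)
        fmt.Printf("after RTT %d: buffer = %d KB\n", rtt, buf/1024)
    }
    // The buffer steps 128KB -> 256KB -> ... -> 4MB and then stabilizes,
    // keeping the receive window comfortably above the measured BDP.
}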
Test Result

Lab Test

Here is a test result when HTTP/2 autotune upload is enabled (still using client_body_buffer_size 128KB). You can see "h2 autotune" is doing pretty well – similar or even slightly faster than HTTP/1.1 speed (that's the initial goal). It might be slightly worse than a hand-picked optimal buffer size for given conditions, but you can see now NGINX picks up optimal buffer size automatically based on network conditions.

Delivering HTTP/2 upload speed improvements

Production test

After we deployed this feature, I ran similar tests against our production edge, uploading a 10MB file from well connected client nodes to our edge. I created a Linux VM instance in Google Cloud and ran the upload test where the network is very fast (a few Gbps) and low latency (<10ms).

Here is when I run the test in the Google Cloud Belgium region to our CDG (Paris) PoP which has 7ms RTT. This looks very good with almost 3x improvement.

Delivering HTTP/2 upload speed improvements

I also tested between the Google Cloud Tokyo region and our NRT (Tokyo) PoP, which had a 2.3ms RTT. Although this is not realistic for home users, the results are interesting. A 128KB fixed buffer performs well, but HTTP/2 with buffer autotune outperforms even HTTP/1.1.

Delivering HTTP/2 upload speed improvements

Summary

HTTP/2 upload buffer autotuning is now fully deployed in the Cloudflare edge. Customers should now benefit from improved upload performance for all HTTP/2 connections, including speed tests on speed.cloudflare.com. Autotuning upload buffer size logic works well for most cases, and now HTTP/2 upload is much faster than before! When we think about the performance we usually tend to think about download speed or latency reduction, but faster uploading can also help users working from home when they need a large amount of upload, such as photo/video sharing apps, content creation, video conferencing or self broadcasting.

Many thanks to Lucas Pardue and Rustam Lalkaka for great feedback and suggestions on the article.

Update: we made this patch available to the public and sent it to NGINX upstream. You can find it here.

Categories: Technology

How Cloudflare uses Cloudflare Spectrum: A look into an intern’s project at Cloudflare

Fri, 21/08/2020 - 12:00
 A look into an intern’s project at Cloudflare

Cloudflare extensively uses its own products internally in a process known as dogfooding. As part of my onboarding as an intern on the Spectrum (a layer 4 reverse proxy) team, I learned that many internal services dogfood Spectrum, as they are exposed to the Internet and benefit from layer 4 DDoS protection. One of my first tasks was to update the configuration for an internal service that was using Spectrum. The configuration was managed in Salt (used for configuration management at Cloudflare), which was not particularly user-friendly, and required an engineer on the Spectrum team to handle updating it manually.

This process took about a week. That should instantly raise some questions, as a typical Spectrum customer can create a new Spectrum app in under a minute through the Cloudflare Dashboard. So why couldn’t I?

This question formed the basis of my intern project for the summer.

The Process

Cloudflare uses various IP ranges for its products. Some customers also authorize Cloudflare to announce their IP prefixes on their behalf (this is known as BYOIP). Collectively, we can refer to these IPs as managed addresses. To prevent Bad Stuff (defined later) from happening, we prohibit managed addresses from being used as Spectrum origins. To accomplish this, Spectrum had its own table of denied networks that included the managed addresses. For the average customer, this approach works great – they have no legitimate reason to use a managed address as an origin.

Unfortunately, the services dogfooding Spectrum all use Cloudflare IPs, preventing those teams with a legitimate use-case from creating a Spectrum app through the configuration service (i.e. Cloudflare Dashboard). To bypass this check, these internal customers needed to define a custom Spectrum configuration, which needed to be manually deployed to the edge via a pull request to our Salt repo, resulting in a time-consuming process.

If an internal customer wanted to change their configuration, the same time-consuming process had to be used. While this allowed internal customers to use Spectrum, it was tedious and error prone.

Bad Stuff

Bad Stuff is quite vague and deserves a better definition. It may seem arbitrary that we deny Cloudflare managed addresses. To motivate this, consider two Spectrum apps, A and B, where the origin of app A is the Cloudflare edge IP of app B, and the origin of app B is the edge IP of app A. Essentially, app A will proxy incoming connections to app B, and app B will proxy incoming connections to app A, creating a cycle.

This could potentially crash the daemon or degrade performance. In practice, this configuration is useless and would only be created by a malicious user, as the proxied connection never reaches an origin; it is therefore never allowed.

 A look into an intern’s project at Cloudflare

In fact, the more general case of setting another Spectrum app as an origin (even when the configuration does not result in a cycle) is (almost¹) never needed, so it also needs to be avoided.

As well, since we are providing a reverse proxy to customer origins, we do not need to allow connections to IP ranges that cannot be used on the public Internet, as specified in this RFC.

The Problem

To improve usability and allow internal Spectrum customers to create apps using the Dashboard instead of the static configuration workflow, we needed a way to give particular customers permission to use Cloudflare managed addresses in their Spectrum configuration. Solving this problem was my main project for the internship.

A good starting point ended up being the Addressing API. The Addressing API is Cloudflare’s solution to IP management, an internal database and suite of tools to keep track of IP prefixes, with the goal of providing a unified source of truth for how IP addresses are being used across the organization. This makes it possible to provide a cross-product platform for products and features such as BYOIP, BGP On Demand, and Magic Transit.

The Addressing API keeps track of all Cloudflare managed IP prefixes, along with who owns the prefix. As well, the owner of a prefix can give permission for someone else to use the prefix. We call this a delegation.

A user’s permission to use an IP address managed by the Addressing API is determined as follows:

  1. Is the user the owner of the prefix containing the IP address?
    a) Yes, the user has permission to use the IP
    b) No, go to step 2
  2. Has the user been delegated a prefix containing the IP address?
    a) Yes, the user has permission to use the IP.
    b) No, the user does not have permission to use the IP.
The Solution

With the information present in the Addressing API, the solution starts to become clear. For a given customer and IP, we use the following algorithm:

  1. Is the IP managed by Cloudflare (or contained in the previous RFC)?
    a) Yes, go to step 2
    b) No, allow as origin
  2. Does the customer have permission to use the IP address?
    a) Yes, allow as origin
    b) No, deny as origin

As long as the internal customer has been given permission to use the Cloudflare IP (through a delegation in the Addressing API), this approach would allow them to use it as an origin.

However, we run into a corner case here – since BYOIP customers also have permission to use their own ranges, they would be able to set their own IP as an origin, potentially causing a cycle. To mitigate this, we need to check if the IP is a Spectrum edge IP. Fortunately, the Addressing API also contains this information, so all we have to do is check if the given origin IP is already in use as a Spectrum edge IP, and if so, deny it. Since all of the denied networks checks occur in the Addressing API, we were able to remove Spectrum's own deny network database, reducing the engineering workload to maintain it along the way.
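Putting the algorithm and the corner case together, the validation amounts to something like the following Go sketch. The helper functions stand in for Addressing API lookups, and their names and signatures are invented for illustration, not the real internal interfaces:

package main

import "fmt"

// Stand-ins for Addressing API lookups (illustrative only).
func isManagedOrReserved(ip string) bool { return true }  // Cloudflare-managed, or reserved per the RFC above
func isSpectrumEdgeIP(ip string) bool    { return false } // already in use as a Spectrum edge IP?
func hasPermission(user, ip string) bool { return true }  // prefix owner, or holder of a delegation?

// allowOrigin decides whether a customer may use an IP as a Spectrum origin.
func allowOrigin(user, ip string) bool {
    if !isManagedOrReserved(ip) {
        return true // an ordinary public IP is always fine as an origin
    }
    if isSpectrumEdgeIP(ip) {
        return false // would let one Spectrum app proxy to another: a potential cycle
    }
    return hasPermission(user, ip) // internal or BYOIP customers with a delegation
}

func main() {
    fmt.Println(allowOrigin("internal-customer", "104.16.8.54"))
}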

Let's go through a concrete example. Consider an internal customer who wants to use 104.16.8.54/32 as an origin for their Spectrum app. This address is managed by Cloudflare; suppose the customer has permission to use it and the address is not already in use as an edge IP. This means the customer is able to specify this IP as an origin, since it meets all of our criteria.

For example, a request to the Addressing API could look like this:

curl --silent 'https://addr-api.internal/edge_services/spectrum/validate_origin_ip_acl?cidr=104.16.8.54/32' \
  -H "Authorization: Bearer $JWT" | jq .
{
  "success": true,
  "errors": [],
  "result": {
    "allowed_origins": {
      "104.16.8.54/32": {
        "allowed": true,
        "is_managed": true,
        "is_delegated": true,
        "is_reserved": false,
        "has_binding": false
      }
    }
  },
  "messages": []
}

Now we have completely moved the responsibility of validating the use of origin IP addresses from Spectrum’s configuration service to the Addressing API.

Performance

This approach required making another HTTP request on the critical path of every create app request in the Spectrum configuration service. Some basic performance testing showed (as expected) increased response times for the API call (about 100ms). This led to discussion among the Spectrum team about the performance impact of different HTTP requests throughout the critical path. To investigate, we decided to use OpenTracing.

OpenTracing is a standard for providing distributed tracing of microservices. When an HTTP request is received, special headers are added to it to allow it to be traced across the different services. Within a given trace, we can see how long a SQL query took, the time a function took to complete, the amount of time a request spent at a given service, and more.

We have been deploying a tracing system for our services to provide more visibility into a complex system.

After instrumenting the Spectrum config service with OpenTracing, we were able to determine that the Addressing API accounted for a very small amount of time in the overall request, and allowed us to identify potentially problematic request times to other services.

 A look into an intern’s project at Cloudflare

Lessons Learned

Reading documentation is important! Having a good understanding of how the Addressing API and the config service worked allowed me to create and integrate an endpoint that made sense for my use-case.

Writing documentation is just as important. For the final part of my project, I had to onboard Crossbow – an internal Cloudflare tool used for diagnostics – to Spectrum, using the new features I had implemented. I had written an onboarding guide, but some steps were unclear during the onboarding process, so I made sure to gather feedback from the Crossbow team to improve the guide.

Finally, I learned not to underestimate the amount of complexity required to implement relatively simple validation logic. In fact, the implementation required understanding the entire system. This includes how multiple microservices work together to validate the configuration and understanding how the data is moved from the Core to the Edge, and then processed there. I found increasing my understanding of this system to be just as important and rewarding as completing the project.

Footnotes:

1. Regional Services actually makes use of proxying a Spectrum connection to another colocation, and then proxying to the origin, but the configuration plane is not involved in this setup.

Categories: Technology

Introducing Deploy Buttons

Thu, 20/08/2020 - 20:01
Introducing Deploy Buttons

When I first try out a new development platform, the first thing I do is get an OSS (Open Source Software) project I find on GitHub up and running. I used to start by following tutorials or digging through documentation instead. Jumping straight into a working project may seem a little counterintuitive, so let me share why I do it. One reason is that Hello, World! examples rarely show the real “magic” of the platform. I want to feel excited and get a sense of how other people are creatively using the platform.

For example, I love it when I can build and deploy an OSS Pokedex app in a few minutes on Flutter to see if the platform actually lives up to the hype. It’s so much easier to do this than to spend a few hours following tutorials and documentation to get through the initial learning curve. You can think of it as shortening the time to first dopamine.

Another reason is that it makes learning the new platform much faster. Building off of an experienced developer’s work shows me which classes and functions are most useful to learn. There’s more nuance to building out full applications than is usually explained in the documentation. I can see how the pieces fit together and get a deeper understanding.

When I started my internship on the Workers team, I realized I could help create that magical experience of quickly trying out a new platform for the Workers dev experience. I’ve spent the last few months working hard with the Workers team to make this possible. Today, I’m happy to share with you: Deploy Buttons.

Deploy Buttons let you deploy a project to the Workers platform without even needing to set up a local development environment. Before Deploy Buttons, new developers had to jump between multiple places like the signup pages, the docs, the dashboard, and Wrangler to deploy a project. Now, it’s as easy as clicking a Deploy Button and following three short steps in our new web-based deploy tool. This is the fastest way for beginners to get something live in less than five minutes, and for developers to share their projects with others by creating their own deploy buttons.

Try this button to deploy a GraphQL db:

Introducing Deploy Buttons

We’ve also curated a few awesome projects to try deploying now with this new deploy flow.

Before Deploy Buttons, things were a bit more complicated. When you found a project you liked on GitHub or an example in our template gallery, you had to go to cloudflare.com to create an account, go to the docs to set up a local development environment, clone the project, and finally, learn how to use wrangler to deploy it. Now, you can just click on the deploy button to quickly get a project up and running with Workers and experience the magic of serverless.

Introducing Deploy Buttons

Here’s how it works. For example, if you find an interesting Workers project on GitHub and want to deploy it, you would click the “Deploy to Workers” button on the repo README. This takes you to our new web-based deploy tool, where you can deploy that project in just three steps. First, connect your GitHub account. Second, connect your Cloudflare account. Third, click “Deploy”, and the project will be forked to your GitHub account and deployed to Workers with our GitHub Action.

Want to create your own Deploy Button for your projects?

1) Add a GitHub Actions workflow to your project.

Add a new file to .github/workflows, such as .github/workflows/deploy.yml, and create a GitHub workflow for deploying your project. It should include a set of events, including at least repository_dispatch, but probably push and maybe schedule as well. Add a step for publishing your project using wrangler-action:

name: Build
on:
  push:
  pull_request:
  repository_dispatch:
jobs:
  deploy:
    runs-on: ubuntu-latest
    timeout-minutes: 60
    # needs: test  # uncomment if your workflow also defines a "test" job
    steps:
      - uses: actions/checkout@v2
      - name: Publish
        uses: cloudflare/wrangler-action@1.2.0

2) Add support for CF_API_TOKEN and CF_ACCOUNT_ID in your repo workflow:

# Update "Publish" step from last code snippet - name: Publish uses: cloudflare/wrangler-action@1.2.0 with: apiToken: ${{ secrets.CF_API_TOKEN }} env: CF_ACCOUNT_ID: ${{ secrets.CF_ACCOUNT_ID }}

3) Add the Markdown code for your button to your project's README, replacing the example url parameter with your repository URL.

Introducing Deploy Buttons

[![Deploy to Cloudflare Workers](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/YOURUSERNAME/YOURREPO)

Does your project use features only available in a paid Workers plan, like Workers KV? Providing the paid=true query parameter to the /button and deploy application paths will render a "Deploy to Workers Bundled" button, as seen below. It will also render a notice in the UI that the project requires Workers Bundled:

Introducing Deploy Buttons

[![Deploy to Cloudflare Workers](https://deploy.workers.cloudflare.com/button?paid=true)](https://deploy.workers.cloudflare.com/?url=https://github.com/YOURUSERNAME/YOURREPO&paid=true)

Thanks for tuning in. If you have any feedback, please fill out our feedback survey. We have more exciting features and improvements to announce soon for making the developer experience for Workers even better.

Categories: Technology

Require hard key auth with Cloudflare Access

Thu, 20/08/2020 - 14:00
Require hard key auth with Cloudflare Access

Last month, attackers compromised a Twitter team member’s access to an internal administrative panel in order to take over high-profile accounts. Full details of the breach are still pending, but Twitter has shared that the attackers stole credentials through a coordinated spear phishing attack.

The attackers convinced a team member to share login permissions, giving the attackers the ability to access the Twitter control plane. Once authenticated, they sent password reset flows to email accounts they controlled in order to hijack the Twitter accounts.

Administrative panels like Twitter’s are a rich target for phishing attacks because they give attackers a backdoor to privileged systems. Customer-facing teams at SaaS companies rely on these administrative panels to update end-user data and troubleshoot user account issues. If an attacker can compromise a single team member’s account they can potentially impact thousands of end users.

We have our own administrative panel at Cloudflare and we’ve deployed a number of safeguards over the last several years to keep it secure from phishing attacks. However, we had no way to enforce the security feature we think would most insulate us from phishing attacks: physical hard keys.

With hard keys, users can only log in when they use a physical device - one that does not produce codes that could be shared over chat. Google notably eliminated all employee phishing cases by rolling out their own hard keys. We issued them to every team member who had to log in to our identity provider to reach the admin panel, but our identity provider allowed users to fall back to a less secure option for MFA.

To solve that problem, we moved the hard key requirement into Cloudflare’s network. Using Cloudflare Access, we can now restrict the ability to reach our admin panel only to team members who authenticate with a hard key. Starting today, we’re making that feature available to all teams.

Securing our own internal tools

Our admin panel gives our team members the ability to turn on features, manage settings, and investigate issues for our customers. The tool itself maintains its own set of application-level permissions that control the actions that administrators can take. Some customer accounts are scoped to specific team members; other accounts cannot be modified without the customer’s explicit approval.

We layer other safeguards on top of those permissions. Users who are inactive for a number of days need to be manually re-enabled. We regularly audit who can use the tool and trim back the list.

We used to lean on a VPN to keep the front door to the admin panel secure. Team members would connect to our VPN using a client on their device. Once on the VPN, they could reach the login page of the tool.

The VPN was painful for end users and a source of concern for our security team. Two issues, in particular, motivated us to find a better solution.

  • Segmentation was difficult. Not every user in the company needs to reach the admin panel, but any user on the VPN could connect to the login page. We wanted to limit that access to users in specific permission groups as part of a defense-in-depth strategy.
  • We could not add additional signals. Users could connect to the private network with just a password and MFA code. We could not limit VPN connections to corporate laptops, healthy devices, or other more granular restrictions.

About two years ago, we migrated that admin panel’s security perimeter to Cloudflare Access. Access gave us a zero-trust alternative to our VPN. Instead of being able to reach the admin panel because you are on the private network, Access continuously checks every request to the tool for identity against a list of allowed users.

We could use Access to limit who could reach certain tools, and it became a foundation for adding other sources of signal down the road. Access also gave us a new level of visibility. Since all requests pass through Cloudflare’s edge, Access provides our team with visibility into every request to every path. Without making any server-side code changes, Access can log every request and attribute it to an authenticated user. We can export those to our SIEM and create a comprehensive audit trail.

Hard keys without weak fallbacks

We integrated Cloudflare Access with our identity provider, which supports multifactor authentication (MFA). To log in through Cloudflare Access, users would need to authenticate with their password and an MFA option. The ability to choose an option meant a less secure method could be selected.

Those fallback options were subject to a higher risk of phishing attack. SMS-based codes can be vulnerable to SIM swapping attacks. App-based time-based one-time-pin (TOTP) codes can linger on forgotten devices and, more dangerously, be transmitted as part of an attack.

Hard keys stand out because they rely on control of a physical item. With hard keys, users log in with their password and then have to tap an actual key (typically in the form of a USB device). That tap presents a certificate that lives only on the key to the service configured to trust it. A user with the hard key could not inadvertently share that credential over the phone or in a chat with an attacker.

We distribute hard keys to all team members at Cloudflare. However, we could not require team members to use them on an app-by-app basis. If I don’t have my hard key around, I always have the option to fall back to a TOTP code. We needed a filtering engine that could combine multiple sources of identity signal, which Cloudflare Access provided.

How Cloudflare Access solved our own problem

If you remember going to bars or clubs before the pandemic, Cloudflare Access might seem familiar. You had to show your identity card at the door to a bouncer. The bouncer (and the establishment) did not issue that card - your government did. However, they know what it should look like and how to use its information.

Once checked, the bouncer stamped the back of your hand. You could put your ID back in your wallet and the stamp became proof that the bartenders inside knew to trust.

Cloudflare Access works like that bouncer. When users connect to a resource secured by Cloudflare Access, we check for their ID by sending them to login to their identity provider like Okta or Azure AD. Users authenticate and their identity provider sends Cloudflare Access details like who they are and, for certain providers, what MFA method they used.

Like that stamp, Cloudflare Access uses the external identity information to create a distinct badge that we trust. Access generates a JSON Web Token (JWT) for the user and stores it in their browser. That token becomes the user’s badge for the rest of their session. Cloudflare Access looks for that JWT on every request the user makes to the application and enforces rules that an administrator configures about who can proceed.

Cloudflare Access can store more than just the user’s identity in the JWT. If the identity provider captures the MFA method used by a team member, Access can read that value and store it as an additional field in the JWT. RFC 8176, Authentication Method Reference Values, standardizes these values and how they are shared between systems.

We can use that standard to introduce an MFA option into the JWT created by Cloudflare Access. Access could then add an additional check that evaluated both the user’s identity and the MFA method they used to login.
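As a rough illustration of what such a check involves, the sketch below (in Go) decodes a JWT payload and looks for the RFC 8176 value "hwk", which denotes proof-of-possession of a hardware-secured key. The claim layout is a simplified assumption, and real code would verify the token's signature before trusting any of its claims:

package main

import (
    "encoding/base64"
    "encoding/json"
    "fmt"
    "strings"
)

// claims models only the fields this sketch needs.
type claims struct {
    Email string   `json:"email"`
    Amr   []string `json:"amr"`
}

// usedHardKey reports whether the token's amr values include "hwk",
// the RFC 8176 code for a hardware-secured key.
func usedHardKey(jwt string) (bool, error) {
    parts := strings.Split(jwt, ".")
    if len(parts) != 3 {
        return false, fmt.Errorf("malformed JWT")
    }
    payload, err := base64.RawURLEncoding.DecodeString(parts[1])
    if err != nil {
        return false, err
    }
    var c claims
    if err := json.Unmarshal(payload, &c); err != nil {
        return false, err
    }
    for _, method := range c.Amr {
        if method == "hwk" {
            return true, nil
        }
    }
    return false, nil
}

func main() {
    // Build a toy token: a user who logged in with a password plus hard key.
    payload, _ := json.Marshal(claims{Email: "user@example.com", Amr: []string{"pwd", "hwk"}})
    token := "header." + base64.RawURLEncoding.EncodeToString(payload) + ".signature"

    ok, err := usedHardKey(token)
    if err != nil {
        panic(err)
    }
    fmt.Println("hard key used:", ok) // true
}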

The policy flexibility gave us what we needed to work with our security team to solve this problem. By adding a new rule that layers on top of the identity rule, we could immediately require that every team member logging in to our admin panel do so with their hard key.

Require hard key auth with Cloudflare Access

In our case, that includes any hard keys supported by WebAuthN and FIDO2 or keys tied to a physical device like Apple Touch ID and Windows Hello. Access would reject the attempt if they (or an attacker) used the fallback TOTP code - even if the identity provider allowed the login.

How to get started

You can begin using the same feature our own security team needed today, at no additional cost. You’ll need an identity provider that supports Authentication Method Reference, or amr, settings. Today, that consists of Okta and Azure Active Directory. We expect others will add support, and we will update our documentation as they do.

To get started, navigate to an application you have in Cloudflare Access or create a new one. In the rules that determine who is allowed to reach the application, add a rule in the “Require” section. Select “MFA” from the dropdown and then choose which options you want to require.

Require hard key auth with Cloudflare Access

What’s next?

Cloudflare Access, part of Cloudflare for Teams, is available today. You can follow the documentation here to add the additional rule. All accounts can use 50 seats of Cloudflare Access for free, including the hard key requirement feature.

Categories: Technology

Orange Clouding with Secondary DNS

Thu, 20/08/2020 - 12:00
What is secondary DNS?

Orange Clouding with Secondary DNS

In a traditional sense, secondary DNS servers act as a backup to the primary authoritative DNS server.  When a change is made to the records on the primary server, a zone transfer occurs, synchronizing the secondary DNS servers with the primary server. The secondary servers can then serve the records as if they were the primary server, however changes can only be made by the primary server, not the secondary servers. This creates redundancy across many different servers that can be distributed as necessary.

There are many common ways to take advantage of Secondary DNS, some of which are:

  1. Secondary DNS as passive backup - The secondary DNS server sits idle until the primary server goes down, at which point a failover can occur and the secondary can start serving records.
  2. Secondary DNS as active backup - The secondary DNS server works alongside the primary server to serve records.
  3. Secondary DNS with a hidden primary - The nameserver records at the registrar point towards the secondary servers only, essentially treating them as the primary nameservers.
What is secondary DNS Override?

Secondary DNS Override builds on the Secondary DNS with a hidden primary model by not only serving records as the primary tells us to, but also enabling customers to proxy any A/AAAA/CNAME records through Cloudflare's network. This is similar to how Cloudflare as a primary DNS provider currently works.

Consider the following example:

example.com Cloudflare IP - 192.0.2.0
example.com origin IP - 203.0.113.0

In order to take advantage of Cloudflare's security and performance services, we need to make sure that the origin IP stays hidden from the Internet.

Figure 1: Secondary DNS without a hidden primary nameserver

Figure 1 shows that without a hidden primary nameserver, the resolver can choose to query either one. This opens up two issues:

  1. It violates RFC 1034 and RFC 2182, because the Cloudflare server will be responding differently than the primary nameserver.
  2. The origin IP will be exposed to the Internet.
Figure 2: Secondary DNS with a hidden primary nameserver

Figure 2 shows the resolver always querying the Cloudflare Secondary DNS server.

How does Secondary DNS Override work?

The Secondary DNS Override UI looks similar to the primary UI; the only difference is that records cannot be edited.

Figure 3: Secondary DNS Override Dashboard

In figure 3, all of the records have been transferred from the primary DNS server. test-orange and test-aaaa-orange have been overridden to proxy through the Cloudflare network, while test-grey and test-mx are treated as regular DNS records.

Behind the scenes we store override records that pair with transferred records based on the name. For secondary override we don’t care about the type when overriding records, because of two things:

  1. According to RFC 1912 you cannot have a CNAME record with the same name as any other record. (This does not apply to some DNSSEC records, see RFC 2181)
  2. A and AAAA records are both address type records which should be either all proxied or all not proxied under the same name.

This means if you have several A and several AAAA records all with the name “example.com”, and one of them is proxied, all of them will be proxied. The UI abstracts away the fact that we are storing additional override records: clicking the “orange cloud” button creates an override record which applies to all A/AAAA or CNAME records with that name.

CNAME at the Apex

Normally, putting a CNAME at the apex of a zone is not allowed. For example:

example.com CNAME other-domain.com

Is not allowed because this means that there will be at least one other SOA record and one other NS record with the same name, disobeying RFC 1912 as mentioned above. Cloudflare can overcome this through the use of CNAME Flattening, which is a common technique used within the primary DNS product today. CNAME flattening allows us to return address records instead of the CNAME record when a query comes into our authoritative server.

Contrary to what was said above about records not being editable through the Secondary DNS Override UI, the CNAME at the apex is the one exception to this rule. Users are able to create a CNAME at the apex in addition to the regular secondary DNS records; however, the same rules defined in RFC 1912 also apply here. What this means is that the CNAME at the apex record can be treated as a regular DNS record or a proxied record, depending on what the user decides. Regardless of the proxy status of the CNAME at the apex record, it will override any other A/AAAA records that have been transferred from the primary DNS server.

Merging Secondary, Override and CNAME at Apex Records

We do all of the merging of the secondary, override, and CNAME at the apex records at record edit time. This means that when a DNS request comes in to our authoritative server at the edge, we can still return the records blazingly fast. The workflow is shown in figure 4.

Figure 4: Record Merging process

Within the zone builder, the steps are as follows (a code sketch follows the list):

  1. Check if there is a CNAME at the apex; if so, override all other A/AAAA secondary records at the apex.
  2. For each secondary record, check if there is a matching override record; if so, apply the proxy status of the override record to all secondary records with that name.
  3. Leave all other secondary records as-is.
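The merging itself is straightforward. Here is a sketch of the three steps in Go; the record type and function shapes are simplified assumptions, not the zone builder's real internals:

package main

import "fmt"

// record is a deliberately simplified DNS record.
type record struct {
    Name, Type string
    Proxied    bool
}

// merge applies the three steps above to a transferred secondary zone.
// overrides maps a record name to the proxy status of its override record.
func merge(apex string, secondary []record, overrides map[string]bool, apexCNAME *record) []record {
    var out []record
    for _, r := range secondary {
        // Step 1: a CNAME at the apex overrides other A/AAAA records there.
        if apexCNAME != nil && r.Name == apex && (r.Type == "A" || r.Type == "AAAA") {
            continue
        }
        // Step 2: an override record sets the proxy status for every
        // record sharing its name.
        if proxied, ok := overrides[r.Name]; ok {
            r.Proxied = proxied
        }
        // Step 3: everything else passes through unchanged.
        out = append(out, r)
    }
    if apexCNAME != nil {
        out = append(out, *apexCNAME)
    }
    return out
}

func main() {
    zone := []record{
        {"example.com", "A", false},
        {"test-orange.example.com", "A", false},
        {"test-mx.example.com", "MX", false},
    }
    overrides := map[string]bool{"test-orange.example.com": true}
    apexCNAME := &record{"example.com", "CNAME", true}

    for _, r := range merge("example.com", zone, overrides, apexCNAME) {
        fmt.Printf("%-26s %-5s proxied=%v\n", r.Name, r.Type, r.Proxied)
    }
}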
Getting Started

Secondary DNS Override is a great option for any users that want to take advantage of the Cloudflare network, without transferring all of their zones to Cloudflare DNS as a primary provider. Security and access control can be managed on the primary side, without worrying about unauthorized edits of information on the Cloudflare side.

Secondary DNS Override is currently available on the Enterprise plan. If you’d like to take advantage of it, please let your account team know. For additional documentation on Secondary DNS Override, please refer to our support article.

Categories: Technology

New and improved Workers Docs

Wed, 19/08/2020 - 12:00

I’m happy to announce several updates to the Workers Docs that will allow you to take full advantage of our Workers platform. We integrated your feedback about the Docs’ user experience and design, reorganized and reformatted all of our content, and upgraded the Docs engine to add new UI components. The documentation is now intuitive to navigate, and the content is easy and enjoyable to read.

You can find our new and improved documentation site here, and the docs engine in our repo.

We hope this creates a better developer experience for you and makes the Docs more approachable to beginners. We plan to use our work and improvements for the Workers Docs to revamp docs for other Cloudflare products too.

Here’s a more detailed breakdown of the Workers Docs update.

Content Organization: We reorganized site content into four categories to make it easier for you to read and find content: Tutorials, How-to guides, Technical reference, and Learning. The new content structure is heavily inspired by Divio’s documentation system.


The tutorials section groups together step-by-step guides for building a specific project on Workers (e.g. teaching a beginner how to cook). The how-to guides are a collection of examples to help developers understand and implement a specific feature of the Workers platform (e.g. a recipe in a cookbook). In the new Docs, the how-to guides section is split into the examples and starters pages. You can think of the examples as code snippets and the starters as fully completed and cloneable projects. The technical reference is a descriptive breakdown of each technical component for Workers (e.g. a reference encyclopedia article). In the new Docs, we decided to split the Reference section into three groups for Platform, Runtime APIs, and Wrangler documentation to make content easier to find. The learning section builds understanding of a specific topic or concept about the Workers platform (e.g. an article on culinary social history).

The key insight is that by breaking content up in this way, each of the four types of documentation only has one job so it can really nail it. Plus, it makes it easier to find what you are looking for.

Improved Formatting: We added more structure and detail to our content. Here are a few examples. First, reference pages like the one for the Request class now give more context with a background section explaining use cases and tips on how to use the class. We also added a table of contents to the right sidebar to make browsing through a page much easier.


We also cleaned up the formatting for content such as class breakdowns. This used to be done through chunky paragraphs of text. In the new Docs, each variable and parameter is listed and explained with plenty of whitespace for readability.


Similarly, we have improved the formatting for command breakdowns, which used to be tables. They are now easy to read through, using the same list-based format as the class breakdowns.


Redesigned Pages: With our Docs engine rewrite, we added new UI components and created new page designs. For instance, with the examples page, we added tags so you can sort by category. Also, the entire layout has been improved to make it easier to read and understand each example.


Another great example is the tutorials page. We updated the page to show the difficulty and length of each tutorial. This should make the tutorials, and learning how to develop on Workers, more approachable for beginners. We also added new tutorials, such as how to manage multiple Workers projects with Lerna.


Accurate Search: We updated the design for our Algolia docs search. The text is now easier to read and the results are prioritized better than before.


Dark Mode: You can now customize the Docs to your liking and switch between light and dark mode. We designed these two themes to be really easy to read and visually appealing.


I hope this update makes using the docs a better experience for you. We have more updates planned and are always looking to improve. Please feel free to submit an issue here if you have any feedback.

Categories: Technology

Election Cybersecurity: Preparing for the 2020 U.S. Elections.

Mon, 17/08/2020 - 13:30

At Cloudflare, our mission is to help build a better Internet. As we look to the upcoming 2020 U.S. elections, we are reminded that having the Internet be trusted, secure, reliable, and accessible for campaigns and citizens alike is critical to our democracy. We rely on the Internet to share and discover pertinent information such as how to register to vote, find polling locations, or learn more about candidates.

Due to the spread of COVID-19, we are seeing a number of election environments shift online, to varying degrees, with political parties conducting virtual fundraisers, campaigns moving town halls to online platforms and election officials using online forms to facilitate voting by mail. As the 2020 U.S. elections approach, we want to ensure that players in the election space have the tools they need to stay online to promote trust and confidence in the democratic system.

We’re keeping an eye on how this shift to online activities affects cyberattacks. From April to June 2020, for example, we saw a trend of increasing DDoS attacks, with double the number of L3/4 attacks observed over our network compared to the first three months of 2020. In the election space, we are tracking trends and vulnerabilities to better understand the threats against these critical players. Our goal is to use the information to create best practices for election and campaign officials so they can be better prepared for the upcoming elections.

Key Takeaways:

  • When comparing types of attacks against campaigns and government election sites, we saw an inverse pattern: political campaigns experienced more DDoS attacks, while government sites saw more attempts to exploit security vulnerabilities.
  • On average, state and local government election sites experience 122,475 cyber threats per day with an average of 199 SQL injection attempts per day.
  • On average, political campaigns experience 4,949 cyber threats per day, although larger campaigns may see far more.
The Athenian Project & Cloudflare for Campaigns Participants

Since the start of 2020, the number of domains under the Athenian Project has increased by 48 percent, to 229 state and local government election websites in 28 states receiving our security protections. Cloudflare also protects many political campaigns at all levels, on a wide range of plans. Under Cloudflare for Campaigns, an initiative we launched in January 2020 in partnership with Defending Digital Campaigns to provide a free package of security protections to political campaigns, we protect more than 50 political campaigns from candidates in 27 states.

Significant traffic spikes and probing for vulnerabilities to government election websites

For state and local governments, election night and the days leading up to it are typically the most important days of the year. With constituents accessing voter information such as how to vote and where to find polling stations, election officials expect higher traffic to their websites. Over the last few months, we’ve seen this shift at Cloudflare, with noticeable increases in traffic ranging from 2 to 3 times the usual volume of requests to many of these government election websites. We believe there is a wide range of factors behind the traffic spikes, including, but not limited to, states expanding vote-by-mail initiatives and voter registration deadlines under emergency orders issued by 53 states and territories throughout the United States. In March, more than 23 states conducted presidential primaries, including 14 states on Super Tuesday, the most states to host primary elections on a single day.

At this year's DEF CON Voting Village, experts from the Department of Homeland Security identified routine failure due to abnormally high demand as the largest risk to election systems because of the coronavirus pandemic. We have seen this in full effect, with traffic to election websites being unpredictable, and including unexplained spikes outside of election cycles, per the graph below.

[Graph: traffic to government election websites over time]

To help state and local governments under the Athenian Project prepare for elections, we wanted to identify the types of threats that election websites face and how they can better protect their websites from malicious attacks. Since the beginning of this year, we’ve seen a large number of attempts to exploit security vulnerabilities, all mitigated by the web application firewall (WAF): 90 million threats were blocked in March 2020 alone, for example. Cloudflare’s WAF uses managed rulesets, which offer a wide range of protection against known vulnerabilities and suspicious behavior, and custom firewall rules, which allow users to rapidly identify and adapt to the evolving threat landscape. Of the threats we identified, managed rulesets mitigated 51% and custom firewall rules mitigated an additional 35%. Having both managed rulesets and custom firewall rules therefore helps safeguard election information.

In previous elections, attackers used SQL injection against government election websites in attempts to extract information. We therefore did a deeper dive into those types of attacks, to understand whether such threats are being mounted in the run-up to the 2020 election. We identified a large number of SQL injection threats blocked by Cloudflare, averaging 43,884 attempts per day across all domains under the Athenian Project. SQL injection attempts against individual government election sites are common, with the WAF blocking an average of 199 such threats per site per day.

Political Campaigns have experienced more DDoS attacks

When looking at the ecosystem of election security, political campaigns can be soft targets for cyberattacks due to their inability to dedicate resources to sophisticated cybersecurity protections. Campaigns are typically short-term, cash-strapped operations that do not have the IT staff or budget necessary to sustain long-term security strategies.

To gain a better understanding of the threats around political campaigns, we surveyed 80 U.S. federal political campaigns on a range of Cloudflare plans, from Cloudflare for Campaigns to our self-serve plans. Cloudflare has mitigated a total of 77,192,840 threats on these sites since January 2020. That means that, on average, these sites saw 4,949 threats per day from January 2020 to the present. In general, we see larger-scale attacks against Senate candidates’ sites than those of House candidates.


As the election season has progressed, we’ve also seen an increase in the average number of attacks against political campaigns, with a 187% increase from May to June 2020. As face-to-face campaigning is not an option, campaigns now rely on online platforms such as video conferencing software, online fundraising and social media to reach voters. This can present significant cybersecurity challenges to already vulnerable groups, such as political campaigns. Political campaigns are realizing the importance of cybersecurity services and have begun working with state parties and committees on training on the types of cyber threats and the widely available resources for campaigns. With basic cybersecurity hygiene training on issues such as password security, two-factor authentication, identifying phishing scams, network protection, internal application security and social media privacy, campaign staff are less likely to be the victims of a data breach.


There has been a notable amount of DDoS activity against political campaign websites. DDoS attacks, which are cheap and easy to organize yet highly destructive, are often used to target political campaigns. A DDoS attack that takes down a campaign’s website at a critical moment can severely disadvantage the candidate. Campaigns used rate limiting to address 63% of the cyber threats they experienced, suggesting that DDoS attacks remain a significant concern.
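
To see why rate limiting blunts this kind of attack, consider a toy token-bucket limiter (the general idea only, not Cloudflare's actual rate limiting product logic; the numbers here are invented):

import time

class TokenBucket:
    # Each client spends tokens that refill at a fixed rate, so a
    # flood from one source is throttled while normal visitors pass.
    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets = {}  # one bucket per client IP
def allowed(ip: str) -> bool:
    return buckets.setdefault(ip, TokenBucket(rate=5, burst=10)).allow()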

Securing Elections in 2020

Democracies rely on access to information and trust in government institutions, especially during a crisis. Reflecting this reality, election officials are more aware of and focused on reliability and resilience than ever before. Likewise, political campaigns are increasingly aware of the potential risks of DDoS activity and other cyber threats.

As COVID-19 continues to spread, it puts further pressure on ensuring that the Internet can be used to access and share election information. At Cloudflare, we believe that expanding access to tools that election officials and political candidates need to combat a range of online threats both serves our mission to help build a better Internet and strengthens our democracy.

Categories: Technology

Why I joined Cloudflare

Fri, 14/08/2020 - 12:00

Customer Service. Business. Growth. While these three make up a large portion of what keeps most enterprise companies operating, they are just the beginning at Cloudflare.

I am excited to share that I have joined Cloudflare as its Chief Customer Officer. Cloudflare has seen explosive growth: we launched only a decade ago and have already amassed nearly 3 million customers, growing from a few hundred enterprise customers to thousands. Currently, we are at a growth inflection point where more companies are choosing to partner with us and are leveraging our service. We are fortunate to serve these customers with a consistent, high-quality experience, no matter where their end users are located around the world.

But the flare doesn’t stop at performative success

I took this opportunity because Cloudflare serves the world and does what is right over what is easy. Our customers deliver meals to your doors, provide investment and financial advice, produce GPS devices for navigational assistance, and so much more. Our customers span every vertical and industry, as well as every size. By partnering with them, we have a hand in delighting customers everywhere and helping make the Internet better. I am excited to work with them to ensure that their Internet properties are safe, reliable, and fast!

Cloudflare doesn’t arbitrarily boast about wanting to help build a better Internet: it’s a mindset, a lifestyle for each colleague I have met at the company. The key driver is to raise standards across Internet security, reliability, and speed by delivering services like Universal SSL, HTTP/2, TLS 1.3, HTTP/3, …

A bit about me

I have spent most of my career working with enterprise customers, large and small. For the past several years, I have worked primarily in networking and security companies like Palo Alto Networks, RedLock Security and CipherCloud, overseeing customer success and all post-sales activities. Prior to that, my experience includes working at companies like Ernst & Young and Tata Consultancy Services as a management consultant for many years. I enjoy working with customers and focusing on delivering value through the use of technology and services.

So back to the Customer Success Team at Cloudflare. Our team has been serving the business needs of our customers, large and small, with a genuine sense of dedication and commitment. Our team is made up of Customer Success Managers, Solution Engineers, Solution Architects, Senior Support Engineers, and Escalation Engineers. We run a 24x7 global team that is watchful and available to serve the needs of our customers proactively, vigilant about any attacks on any of our customers’ Internet properties. We are growing the team and constantly hiring more team members globally.

While we have earned the trust and confidence of our customers, we are constantly striving to do more: to provide more value-added services and to help our customers grow even faster. We will continue to enhance our delivery capabilities around the world and earn the privilege to be a trusted advisor for all our customers.

My team and I are committed to enhancing the experience of our customers with more offerings in the marketplace, tailored adoption workshops, best practices sessions, digital assets, proactive monitoring services, and much more. Stay tuned for more announcements about improvements to our service models that will lead to a better customer experience. In the meantime, I welcome any feedback that can help us serve you better. We are in this together. I am proud to be part of this growing, interconnected universe of millions of people around the world, focused on helping build a better Internet!

Categories: Technology

Network-layer DDoS attack trends for Q2 2020

Wed, 05/08/2020 - 14:00

In the first quarter of 2020, within a matter of weeks, our way of life shifted. We’ve become more reliant on online services than ever. Employees who can are working from home, students of all ages and grades are taking classes online, and we’ve redefined what it means to stay connected. The more the public depends on staying connected, the larger the potential reward for attackers seeking to cause chaos and disrupt our way of life. It is therefore no surprise that in Q1 2020 (January 1, 2020 to March 31, 2020) we reported an increase in the number of attacks, especially after various government stay-at-home mandates (shelter-in-place orders) went into effect in the second half of March.

In Q2 2020 (April 1, 2020 to June 30, 2020), this trend of increasing DDoS attacks continued and even accelerated:

  • The number of L3/4 DDoS attacks observed over our network doubled compared to the first three months of the year.
  • The scale of the largest L3/4 DDoS attacks increased significantly. In fact, we observed some of the largest attacks ever recorded over our network.
  • We observed more attack vectors being deployed and attacks were more geographically distributed.
The number of global L3/4 DDoS attacks in Q2 doubled

Gatebot is Cloudflare’s primary DDoS protection system. It automatically detects and mitigates globally distributed DDoS attacks. A global DDoS attack is an attack that we observe in more than one of our edge data centers. These attacks are usually generated by sophisticated attackers employing botnets in the range of tens of thousands to millions of bots.


Sophisticated attackers kept Gatebot busy in Q2. The total number of global L3/4 DDoS attacks that Gatebot detected and mitigated in Q2 doubled quarter over quarter. In our Q1 DDoS report, we noted a spike in the number and size of attacks. This trend accelerated through Q2: over 66% of all global DDoS attacks in 2020 occurred in the second quarter (a nearly 100% increase over the first). May was the busiest month in the first half of 2020, followed by June and April. Almost a third of all L3/4 DDoS attacks occurred in May.

In fact, 63% of all L3/4 DDoS attacks that peaked over 100 Gbps occurred in May. As the global pandemic intensified around the world in May, attackers were especially eager to take down websites and other Internet properties.

Small attacks continue to dominate in numbers as big attacks get bigger in size

A DDoS attack’s strength is equivalent to its size—the actual number of packets or bits flooding the link to overwhelm the target. A ‘large’ DDoS attack refers to an attack that peaks at a high rate of Internet traffic. The rate can be measured in terms of packets or bits. Attacks with high bit rates attempt to saturate the Internet link, and attacks with high packet rates attempt to overwhelm the routers or other in-line hardware devices.

Similar to Q1, the majority of L3/4 DDoS attacks that we observed in Q2 were relatively ‘small’ by the standards of Cloudflare’s network. In Q2, nearly 90% of all L3/4 DDoS attacks that we saw peaked below 10 Gbps. Attacks that peak below 10 Gbps can still easily cause an outage for most websites and Internet properties around the world if they are not protected by a cloud-based DDoS mitigation service.


Similarly, from a packet rate perspective, 76% of all L3/4 DDoS attacks in Q2 peaked at up to 1 million packets per second (pps). Typically, a 1 Gbps Ethernet interface can deliver anywhere between 80k and 1.5M pps. Given that the interface also serves legitimate traffic, and that most organizations have far less than a 1 Gbps interface, you can see how even these ‘small’ packet-rate DDoS attacks can easily take down Internet properties.
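
As a back-of-the-envelope check on where that 80k-1.5M pps range comes from (a calculation of ours, assuming standard Ethernet framing): each frame on the wire carries roughly 20 extra bytes of preamble and inter-frame gap, so:

LINK_BPS = 1_000_000_000   # 1 Gbps
WIRE_OVERHEAD = 20         # preamble + inter-frame gap, in bytes

def pps(frame_bytes: int) -> float:
    return LINK_BPS / ((frame_bytes + WIRE_OVERHEAD) * 8)

print(f"{pps(64):,.0f} pps at minimum 64-byte frames")     # ~1,488,095 pps
print(f"{pps(1518):,.0f} pps at maximum 1518-byte frames") # ~81,274 pps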


In terms of duration, 83% of all attacks lasted between 30 and 60 minutes. We saw a similar trend in Q1, with 79% of attacks falling in the same duration range. This may seem like a short duration, but imagine it as a 30 to 60 minute cyber battle between your security team and the attackers. Now it doesn’t seem so short. Additionally, if a DDoS attack creates an outage or service degradation, the recovery time to reboot your appliances and relaunch your services can be much longer, costing you lost revenue and reputation for every minute.

In Q2, we saw the largest DDoS attacks on our network, ever

This quarter, we saw an increasing number of large-scale attacks, both in terms of packet rate and bit rate. In fact, 88% of all DDoS attacks in 2020 that peaked above 100 Gbps were launched after shelter-in-place went into effect in March. Once again, May was not just the month with the most attacks overall, but also the month with the greatest number of large attacks above 100 Gbps.


From a packet rate perspective, June took the lead with a whopping 754 million pps attack. Aside from that attack, maximum packet rates stayed mostly consistent throughout the quarter at around 200 million pps.


The 754 million pps attack was automatically detected and mitigated by Cloudflare. The attack was part of an organized four-day campaign that lasted from June 18 to June 21. As part of the campaign, attack traffic from over 316,000 IP addresses targeted a single Cloudflare IP address.

Cloudflare’s DDoS protection systems automatically detected and mitigated the attack, and due to the size and global coverage of our network, there was no impact to performance. A globally interconnected network is crucial when mitigating large attacks: it allows us to absorb the attack traffic and mitigate it close to the source, while continuing to serve legitimate customer traffic without inducing latency or service interruptions.

The United States is targeted with the most attacks

When we look at the L3/4 DDoS attack distribution by country, our data centers in the United States received the largest number of attacks (22.6%), followed by Germany (4.4%), Canada (2.7%) and Great Britain (2.6%).


However, when we look at the total attack bytes mitigated by each Cloudflare data center, the United States still leads (34.9%), followed by Hong Kong (6.6%), Russia (6.5%), Germany (4.5%) and Colombia (3.7%). This change is due to the total amount of bandwidth generated in each attack. For instance, while Hong Kong did not make the top 10 list by number of attacks (1.8%), the attacks it did observe were highly volumetric and generated enough attack traffic to push Hong Kong into second place by attack bytes.

When analyzing L3/4 DDoS attacks, we bucket the traffic by the Cloudflare edge data center location and not by the location of the source IP. The reason is that when attackers launch L3/4 attacks, they can ‘spoof’ (alter) the source IP address to obfuscate the attack source. If we were to derive the country from a spoofed source IP, we would get a spoofed country. Cloudflare overcomes the challenge of spoofed IPs by reporting attack data by the location of the Cloudflare data center in which the attack was observed. We’re able to achieve geographical accuracy in our report because we have data centers in over 200 cities around the world.

57% of all L3/4 DDoS attacks in Q2 were SYN floods

An attack vector is a term used to describe the attack method. In Q2, we observed an increase in the number of vectors used by attackers in L3/4 DDoS attacks. A total of 39 different types of attack vectors were used in Q2, compared to 34 in Q1. SYN floods formed the majority, with over a 57% share, followed by RST (13%), UDP (7%), CLDAP (6%) and SSDP (3%) attacks.


SYN flood attacks aim to exploit the handshake process of a TCP connection. By repeatedly sending initial connection request packets with a synchronize flag (SYN), the attacker attempts to overwhelm the router’s connection table that tracks the state of TCP connections. The router replies with a packet containing a synchronize-acknowledgment flag (SYN-ACK), allocates a certain amount of memory for each connection, and waits in vain for the client to respond with a final acknowledgment (ACK). Given a sufficient number of SYNs occupying the router’s memory, the router is unable to allocate memory for legitimate clients, causing a denial of service.
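
A toy model of that exhaustion mechanic (the table size and addresses are invented for illustration):

# Each SYN reserves a slot until the final ACK arrives; spoofed
# clients never send one, so the table fills up.
TABLE_SIZE = 4
half_open = {}  # (src_ip, src_port) -> connection state

def on_syn(src):
    if len(half_open) >= TABLE_SIZE:
        return "dropped: table full, new legitimate clients are denied"
    half_open[src] = "SYN_RECEIVED"  # memory allocated, SYN-ACK sent
    return "SYN-ACK sent"

def on_ack(src):
    half_open.pop(src, None)  # a real client completes the handshake

for port in range(6):  # a flood that never sends the final ACK
    print(on_syn(("198.51.100.7", 40000 + port)))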

No matter the attack vector, Cloudflare automatically detects and mitigates stateful or stateless DDoS attacks using our three-pronged protection approach, comprising our home-built DDoS protection systems:

  1. Gatebot - Cloudflare's centralized DDoS protection system for detecting and mitigating globally distributed volumetric DDoS attacks. Gatebot runs in our network’s core data center. It receives samples from every one of our edge data centers, analyzes them, and automatically sends mitigation instructions when attacks are detected. Gatebot also monitors the health of each of our customers’ web servers and triggers tailored protection accordingly.
  2. dosd (denial of service daemon) - Cloudflare’s decentralized DDoS protection system. dosd runs autonomously in each server in every Cloudflare data center around the world, analyzes traffic, and applies local mitigation rules when needed. Besides being able to detect and mitigate attacks at very high speed, dosd significantly improves our network resilience by delegating detection and mitigation capabilities to the edge.
  3. flowtrackd (flow tracking daemon) - Cloudflare’s TCP state tracking machine for detecting and mitigating the most randomized and sophisticated TCP-based DDoS attacks in unidirectional routing topologies. flowtrackd is able to identify the state of a TCP connection and then drops, challenges or rate-limits packets that don’t belong to a legitimate connection (a toy sketch of this idea follows the list).
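
Here is the state-tracking idea in the abstract (a greatly simplified sketch, not Cloudflare's implementation): only packets consistent with a connection whose SYN has been seen are allowed through.

seen_syn = set()  # flows (src, dst) that opened with a SYN

def inspect(src, dst, flags):
    flow = (src, dst)
    if "SYN" in flags:
        seen_syn.add(flow)  # a new handshake attempt; start tracking it
        return "pass"
    if flow in seen_syn:
        return "pass"  # belongs to a tracked connection
    return "drop or challenge"  # e.g. an out-of-state ACK/RST flood

print(inspect("192.0.2.10", "203.0.113.1", {"SYN"}))   # pass
print(inspect("192.0.2.10", "203.0.113.1", {"ACK"}))   # pass
print(inspect("198.51.100.7", "203.0.113.1", {"RST"})) # drop or challenge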

In addition to our automated DDoS protection systems, Cloudflare also generates real-time threat intelligence that automatically mitigates attacks. Furthermore, Cloudflare provides its customers with firewall, rate-limiting, and additional tools to further customize and optimize their protection.

Cloudflare DDoS mitigation

As Internet usage continues to evolve for businesses and individuals, expect DDoS tactics to adapt as well. Cloudflare protects websites, applications, and entire networks from DDoS attacks of any size, kind, or level of sophistication.

Our customers and industry analysts recommend our comprehensive solution for three main reasons:

  • Network scale: Cloudflare’s 37 Tbps network can easily block attacks of any size, type, or level of sophistication. The Cloudflare network has a DDoS mitigation capacity that is higher than the next four competitors—combined.
  • Time-to-mitigation: Cloudflare mitigates most network-layer attacks in under 10 seconds globally, and provides immediate mitigation (0 seconds) when static rules are preconfigured. With our global presence, Cloudflare mitigates attacks close to the source with minimal latency. In some cases, traffic is even faster than over the public Internet.
  • Threat intelligence: Cloudflare’s DDoS mitigation is powered by threat intelligence harnessed from the over 27 million Internet properties on our network. Additionally, that threat intelligence is incorporated into customer-facing firewalls and tools in order to empower our customers.

Cloudflare is uniquely positioned to deliver DDoS mitigation with unparalleled scale, speed, and smarts because of the architecture of our network. Cloudflare’s network is like a fractal—every service runs on every server in every Cloudflare data center that spans over 200 cities globally. This enables Cloudflare to detect and mitigate attacks close to the source of origin, no matter the size, source, or type of attack.

Network-layer DDoS attack trends for Q2 2020

To learn more about Cloudflare’s DDoS solution, contact us or get started.

You can also join an upcoming live webinar where we will discuss these trends, as well as strategies enterprises can implement to combat DDoS attacks and keep their networks online and fast. You can register here.

Categories: Technology
