Blogroll: CloudFlare

I read blogs, as well as write one. The 'blogroll' on this site reproduces some posts from some of the people I enjoy reading. There are currently 58 posts from the blog 'CloudFlare.'

Disclaimer: Reproducing an article here does not necessarily imply agreement or endorsement!

The Cloudflare Blog

Workers KV - free to try, with increased limits!

Mon, 16/11/2020 - 20:00

In May 2019, we launched Workers KV, letting developers store key-value data and make that data globally accessible from Workers running in Cloudflare’s over 200 data centers.

Today, we’re announcing a Free Tier for Workers KV that opens up global, low-latency data storage to every developer on the Workers platform. Additionally, to expand Workers KV’s use cases even further, we’re also raising the maximum value size from 10 MB to 25 MB. You can now write an application that serves larger static files or JSON blobs directly from KV.

Together with our announcement of the Durable Objects limited beta last month, the Workers platform continues to move toward a world where globally deployed applications are as easy to build as an application running in a single data center is today.

What are the new free tier limits?

The free tier includes 100,000 read operations and 1,000 each of write, list and delete operations per day, resetting daily at UTC 00:00, with a maximum total storage size of 1 GB. Operations that exceed these limits will fail with an error.

Additional KV usage costs $0.50 per million read operations, $5.00 per million list, write and delete operations and $0.50 per GB of stored data.
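To make the pricing concrete, here is a quick back-of-the-envelope cost estimator (the function name and example numbers are my own, not part of any KV API):

```javascript
// Estimate monthly Workers KV cost from the published unit prices:
// $0.50 per million reads, $5.00 per million writes/lists/deletes,
// $0.50 per GB of stored data.
function kvMonthlyCost({ reads, writes, lists, deletes, storedGB }) {
  const readCost = (reads / 1e6) * 0.5;
  const mutateCost = ((writes + lists + deletes) / 1e6) * 5.0;
  const storageCost = storedGB * 0.5;
  return readCost + mutateCost + storageCost;
}

// e.g. 100M reads, 1M writes, 10 GB stored:
// 100 * $0.50 + 1 * $5.00 + 10 * $0.50 = $60
console.log(kvMonthlyCost({ reads: 100e6, writes: 1e6, lists: 0, deletes: 0, storedGB: 10 }));
```

Note that billed usage only applies beyond the free tier's daily allowances; this sketch simply applies the unit prices above to raw operation counts.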

We intentionally chose these limits to prioritize use cases where KV works well - infrequently written data that may be frequently read around the globe.

What is the new KV value size limit?

We’re raising the value size limit in Workers KV from 10 MB to 25 MB. Users frequently store static assets in Workers KV to then be served by Workers code. To make it as easy as possible to deploy your entire site on Workers, we’re raising the value size limit to handle even larger assets.

Since Workers Sites hosts your site from Workers KV, the increased size limit also means Workers Sites assets can now be as large as 25 MB.

How does Workers KV work?

Workers KV stores key-value pairs and caches hot keys in Cloudflare’s data centers around the world. When a request hits a Worker that uses KV, it retrieves the KV pair from Cloudflare’s local cache with low latency if the pair has been accessed recently.

While some programs running on the Workers platform are stateless, it is often necessary to distribute files or configuration data to running Workers. Workers KV allows you to persist data and access it across multiple Workers calls.

For example, let’s say I wanted to serve a static text file from Cloudflare’s edge. I could provision my own object storage, host it on my own domain, and put that domain behind Cloudflare.

With Workers KV, however, this reduces to a few simple steps. First, I bind my KV namespace to my Workers code with Wrangler.

wrangler kv:namespace create "BUCKET"

Then, in my wrangler.toml, I add my new namespace id to associate it with my Worker.

kv_namespaces = [ { binding = "BUCKET", id = "<insert-id-here>" } ]

I can upload a new text file from the command line using Wrangler:

$ wrangler kv:key put --binding=BUCKET "my-file" value.txt --path

And then serve that file from my Workers script with low latency from any of Cloudflare’s points of presence around the globe!

addEventListener('fetch', event => {
  event.respondWith(handleEvent(event))
})

async function handleEvent(event) {
  let txt = await BUCKET.get("my-file")
  return new Response(txt, { headers: { "content-type": "text/plain" } })
}

Beyond file hosting, Workers users have built many other types of applications with Workers KV:

  • Mass redirects - handle billions of HTTP redirects.
  • Access control rules - validate user requests to your API.
  • Translation keys - dynamically localize your web pages.
  • Configuration data - manage who can access your origin.

While Workers KV provides low latency access across the globe, it may not return the most up-to-date data if updates to keys are happening more than once a minute or from multiple data centers simultaneously. For use cases that cannot tolerate stale data, Durable Objects is a better solution.

Get started with Workers KV today, for free

The free tier and increased limits are live now!

You can get started with Workers and Workers KV in the Cloudflare dash. To see an example of how to use Workers KV, check out the tutorial in the Workers documentation.

Categories: Technology

When trusted relationships are formed, everyone wins!

Mon, 16/11/2020 - 12:00

Key Points:

  • Customer Success Managers offer continual strategic and technical guidance by way of interactive workshops, account reviews, tuning sessions and regular product updates.
  • Our product development and design teams constantly work on new features and product updates based on your input.
  • It’s a team effort. As part of our Premium Success offering, we can introduce you to Product Managers for in-depth conversations about our solutions and how they can apply to your business goals.
  • Cloudflare is always rapidly evolving and expanding our solutions! As technology advances, so does the sophistication of attacks. Through machine learning and behavioural analysis, we are able to ship new products to ensure you remain secure without impacting performance.

Reach out to your Customer Success Manager to gain more information on how they can accelerate your business.

The Success Story

Hi there. My name is Jake Jones and I’m a Customer Success Manager at Cloudflare covering the Middle East and Africa. When I look at what success means to me, it’s becoming a trusted advisor for my customers by taking a genuine interest in their priorities and helping them reach desired goals. I’ve learnt that successful partnerships are a byproduct of successful relationship building. Every customer is unique with their own specific IT set-up, ranging from legacy, cloud and hybrid infrastructures. Here is a short tale of how a collaboration with a Cloudflare customer resulted in multiple wins...

Let’s call the customer Dynamic, which is a befitting name as we constantly work together to review their infrastructure and optimise the Cloudflare platform further. We’ve all heard the phrase “people buy from people”, but true buy-in comes from forming genuine relationships.

In late 2019, when Dynamic came to Cloudflare, we understood and welcomed their emphasis on vendor collaboration. To achieve this we met regularly and had interactive workshops together, to prioritise which of Cloudflare’s solutions would be the most valuable to them. We sought to understand how they operate, their infrastructure, and most importantly, how we could help. In the first few months of the relationship, we mapped out future initiatives, projects and mitigation strategies (should an attack arise).

In the spring of 2020 attacks did arise and were diverse in nature, aiming to overload their website and IP subnets - we could not let this happen! Fortunately, through proactive communication, we understood each other’s workflow and had pre-established how Cloudflare could quickly protect against vulnerabilities. The attacks were quickly mitigated through our automatic DDoS protection, with Magic Transit ensuring Dynamic’s on-premises infrastructure remained safeguarded. In turn, their reputation was upheld and any financial loss was minimized.

Fortunately, it’s not all doom and gloom. We’re there for the good times too! For example, we love to get feedback and grant exclusive access to our beta (early release) launches as part of our Premium Success offering.

Having access to Cloudflare’s beta programme is an excellent opportunity to share knowledge and improve the user experience. With this information, we relay feedback to our product and design teams with recommendations on new functionality the customer would like to see. However, it’s not only our solutions that are enhanced. We are continually evolving our infrastructure and adding more server capacity to accommodate the growing market landscape.

The moral of the story is that collaboration and relationship building leads to secure platform consolidation and trust in those who manage it.

Get in contact

To all Enterprise Customers, contact your CSM. We are here to make you more successful! For all others that would like to know more, follow this link and one of our team will contact you to start you on your path to success.

Categories: Technology

SAD DNS Explained

Fri, 13/11/2020 - 19:06

This week, at the ACM CCS 2020 conference, researchers from UC Riverside and Tsinghua University announced a new attack against the Domain Name System (DNS) called SAD DNS (Side channel AttackeD DNS). This attack leverages recent features of the networking stack in modern operating systems (like Linux) to allow attackers to revive a classic attack category: DNS cache poisoning. As part of a coordinated disclosure effort earlier this year, the researchers contacted Cloudflare and other major DNS providers, and we are happy to announce that our 1.1.1.1 public resolver is no longer vulnerable to this attack.

In this post, we’ll explain what the vulnerability was, how it relates to previous attacks of this sort, what mitigation measures we have taken to protect our users, and future directions the industry should consider to prevent this class of attacks from being a problem in the future.

DNS Basics

The Domain Name System (DNS) is what allows users of the Internet to get around without memorizing long sequences of numbers. What’s often called the “phonebook of the Internet” is more like a helpful system of translators that take natural language domain names (like example.com) and translate them into the native language of the Internet: IP addresses (like 203.0.113.1 or [2001:db8::cf]). This translation happens behind the scenes so that users only need to remember hostnames and don’t have to get bogged down with remembering IP addresses.

DNS is both a system and a protocol. It refers to the hierarchical system of computers that manage the data related to naming on a network, and it refers to the language these computers use to speak to each other to communicate answers about naming. The DNS protocol consists of pairs of messages that correspond to questions and responses. Each DNS question (query) and answer (reply) follows a standard format and contains a set of parameters that contain relevant information, such as the name of interest (such as example.com) and the type of response record desired (such as A for IPv4 or AAAA for IPv6).

The DNS Protocol and Spoofing

These DNS messages are exchanged over a network between machines using a transport protocol. Originally, DNS used UDP, a simple stateless protocol in which messages are endowed with a set of metadata indicating a source port and a destination port. More recently, DNS has adapted to use more complex transport protocols such as TCP and even advanced protocols like TLS or HTTPS, which incorporate encryption and strong authentication into the mix (see Peter Wu’s blog post about DNS protocol encryption).

Still, the most common transport protocol for message exchange is UDP, which has the advantages of being fast, ubiquitous and requiring no setup. Because UDP is stateless, the pairing of a response to an outstanding query is based on two main factors: the source address and port pair, and information in the DNS message. Given that UDP is both stateless and unauthenticated, anyone, and not just the recipient, can send a response with a forged source address and port, which opens up a range of potential problems.

[Figure: layout of a DNS message; the blue portions contribute randomness]

Since the transport layer is inherently unreliable and untrusted, the DNS protocol was designed with additional mechanisms to protect against forged responses. The first two bytes in the message form a message or transaction ID that must be the same in the query and response. When a DNS client sends a query, it will set the ID to a random value and expect the value in the response to match. This unpredictability introduces entropy into the protocol, which makes it less likely that a malicious party will be able to construct a valid DNS reply without first seeing the query. Other fields, like the DNS query name and query type, are also used to pair query and response, but these are trivial to guess and don’t introduce additional entropy.

Those paying close attention to the diagram may notice that the amount of entropy introduced by this measure is only around 16 bits, which means that there are only 65,536 possibilities to go through to find the matching reply to a given query. More on this later.
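A quick sanity check on those numbers (illustrative only; the constants mirror the 16-bit fields described above):

```javascript
// Entropy available to a blind spoofer: the 16-bit DNS message ID alone,
// versus the message ID combined with a randomized 16-bit UDP source port.
const TXID_BITS = 16;     // DNS message (transaction) ID
const SRC_PORT_BITS = 16; // randomized UDP source port

const idOnly = 2 ** TXID_BITS;                      // 65,536 candidate replies
const idAndPort = 2 ** (TXID_BITS + SRC_PORT_BITS); // 4,294,967,296 candidates

console.log(idOnly, idAndPort);
```

In practice the usable ephemeral port range is somewhat smaller than a full 16 bits, so the combined search space is a little lower, but the order of magnitude is the same.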

The DNS Ecosystem

DNS servers fall into a few main categories: recursive resolvers (like 1.1.1.1 or 8.8.8.8) and nameservers (like the DNS root servers or Cloudflare Authoritative DNS). There are also elements of the ecosystem that act as “forwarders” such as dnsmasq. In a typical DNS lookup, these DNS servers work together to complete the task of delivering the IP address for a specified domain to the client (the client is usually a stub resolver - a simple resolver built into an operating system). For more detailed information about the DNS ecosystem, take a look at our learning site. The SAD DNS attack targets the communication between recursive resolvers and nameservers.

Each of the participants in DNS (client, resolver, nameserver) uses the DNS protocol to communicate with each other. Most of the latest innovations in DNS revolve around upgrading the transport between users and recursive resolvers to use encryption. Upgrading the transport protocol between resolvers and authoritative servers is a bit more complicated, as it requires a new discovery mechanism to instruct the resolver when (and when not) to use a more secure channel. Aside from a few examples like our work with Facebook to encrypt recursive-to-authoritative traffic with DNS-over-TLS, most of these exchanges still happen over UDP. This is the core issue that enables this new attack on DNS, and one that we’ve seen before.

Kaminsky’s Attack

Prior to 2008, recursive resolvers typically used a single open port (usually port 53) to send and receive messages to authoritative nameservers. This made guessing the source port trivial, so the only variable an attacker needed to guess to forge a response to a query was the 16-bit message ID. The attack Kaminsky described was relatively simple: whenever a recursive resolver queried the authoritative name server for a given domain, an attacker would flood the resolver with DNS responses for some or all of the 65 thousand or so possible message IDs. If the malicious answer with the right message ID arrived before the response from the authoritative server, then the DNS cache would be effectively poisoned, returning the attacker’s chosen answer instead of the real one for as long as the DNS response was valid (called the TTL, or time-to-live).

For popular domains, resolvers contact authoritative servers once per TTL (which can be as short as 5 minutes), so there are plenty of opportunities to mount this attack. Forwarders that cache DNS responses are also vulnerable to this type of attack.
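To see why this was practical, consider the expected number of query windows an attacker needs. This is a rough model added for illustration; it ignores ordering and timing details:

```javascript
// If the attacker lands k forged replies before the real answer arrives,
// each query window succeeds with probability ~k/65536 (one 16-bit ID).
// The expected number of windows is the mean of a geometric distribution, 1/p.
function expectedWindows(forgedPerWindow) {
  return 65536 / forgedPerWindow;
}

// e.g. 100 forged replies per window -> ~655 windows on average;
// with a 5-minute TTL, that is roughly 2.3 days of continuous attempts.
console.log(expectedWindows(100));
```

With larger floods per window, or attacks that force extra lookups rather than waiting for the TTL, the expected time drops further, which is what made Kaminsky's attack so alarming.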


In response to this attack, DNS resolvers started doing source port randomization and careful checking of the security ranking of cached data. To poison these updated resolvers, forged responses would not only need to guess the message ID, but they would also have to guess the source port, bringing the number of guesses from tens of thousands to over four billion. This made the attack effectively infeasible. Furthermore, the IETF published RFC 5452 on how to harden DNS from guessing attacks.

It should be noted that this attack did not work for DNSSEC-signed domains since their answers are digitally signed. However, even now in 2020, DNSSEC is far from universal.

Defeating Source Port Randomization with Fragmentation

Another way to avoid having to guess the source port number and message ID is to split the DNS response in two. As is often the case in computer security, old attacks become new again when attackers discover new capabilities. In 2012, researchers Amir Herzberg and Haya Schulman from Bar Ilan University discovered that it was possible for a remote attacker to defeat the protections provided by source port randomization. This new attack leveraged another feature of UDP: fragmentation. For a primer on the topic of UDP fragmentation, check out our previous blog post on the subject by Marek Majkowski.

The key to this attack is the fact that all the randomness that needs to be guessed in a DNS poisoning attack is concentrated at the beginning of the DNS message (the UDP header and DNS header). If the UDP response packet (sometimes called a datagram) is split into two fragments, the first half containing the message ID and source port and the second containing part of the DNS response, then all an attacker needs to do is forge the second fragment and make sure that the fake second fragment arrives at the resolver before the true second fragment does. When a datagram is fragmented, each fragment is assigned a 16-bit ID (called the IP-ID), which is used to reassemble the fragments at the other end of the connection. Since the second fragment only has the IP-ID as entropy (again, a familiar refrain in this area), this attack is feasible with a relatively small number of forged packets. The downside of this attack is the precondition that the response must be fragmented in the first place, and the fragment must be carefully altered to pass the original section counts and UDP checksum.


Also discussed in the original and follow-up papers is a method of forcing two remote servers to send packets between each other which are fragmented at an attacker-controlled point, making this attack much more feasible. The details are in the paper, but it boils down to the fact that the control mechanism for describing the maximum transmission unit (MTU) between two servers -- which determines at which point packets are fragmented -- can be set via a forged UDP packet.


We explored this risk in a previous blog post in the context of certificate issuance last year when we introduced our multi-path DCV service, which mitigates this risk in the context of certificate issuance by making DNS queries from multiple vantage points. Nevertheless, fragmentation-based attacks are proving less and less effective as DNS providers move to eliminate support for fragmented DNS packets (one of the major goals of DNS Flag Day 2020).

Defeating Source Port Randomization via ICMP error messages

Another way to defeat the source port randomization is to use some measurable property of the server that makes the source port easier to guess. If the attacker could ask the server which port number is being used for a pending query, that would make the construction of a spoofed packet much easier. No such thing exists, but it turns out there is something close enough - the attacker can discover which ports are surely closed (and thus avoid having to send traffic). One such mechanism is the ICMP “port unreachable” message.

Let’s say the target receives a UDP datagram destined for its IP and some port. The datagram either ends up being accepted (and silently consumed by the application) or rejected because the port is closed. If the port is closed, or more importantly, closed to the IP address that the UDP datagram was sent from, the target will send back an ICMP message notifying the sender that the port is closed. This is handy to know, since the attacker now doesn’t have to bother trying to guess the pending message ID on this port and can move on to other ports. A single scan of the server effectively reduces the search space of valid UDP responses from 2^32 (over four billion) to 2^17 (around a hundred thousand), at least in theory.

This trick doesn’t always work. Many resolvers use “connected” UDP sockets instead of “open” UDP sockets to exchange messages between the resolver and nameserver. Connected sockets are tied to the peer address and port on the OS layer, which makes it impossible for an attacker to guess which “connected” UDP sockets are established between the target and the victim, and since the attacker isn’t the victim, it can’t directly observe the outcome of the probe.

To overcome this, the researchers found a very clever trick: they leverage ICMP rate limits as a side channel to reveal whether a given port is open or not. ICMP rate limiting was introduced (somewhat ironically, given this attack) as a security feature to prevent a server from being used as an unwitting participant in a reflection attack. In broad terms, it is used to limit how many ICMP responses a server will send out in a given time period. Say an attacker wanted to scan 10,000 ports and sent a burst of 10,000 UDP packets to a server configured with an ICMP rate limit of 50 per second, then only the first 50 would get an ICMP “port unreachable” message in reply.

Rate limiting seems innocuous until you remember one of the core rules of data security: don’t let private information influence publicly measurable metrics. ICMP rate limiting violates this rule because the rate limiter’s behavior can be influenced by an attacker making guesses as to whether a “secret” port number is open or not.


An attacker wants to know whether the target has an open port, so it sends a spoofed UDP message from the authoritative server to that port. If the port is open, no ICMP reply is sent and the rate counter remains unchanged. If the port is inaccessible, then an ICMP reply is sent (back to the authoritative server, not to the attacker) and the rate is increased by one. Although the attacker doesn’t see the ICMP response, it has influenced the counter. The counter itself isn’t known outside the server, but whether it has hit the rate limit or not can be measured by any outside observer by sending a UDP packet and waiting for a reply. If an ICMP “port unreachable” reply comes back, the rate limit hasn’t been reached. No reply means the rate limit has been met. This leaks one bit of information about the counter to the outside observer, which in the end is enough to reveal the supposedly secret information (whether the spoofed request got through or not).

[Figure: diagram inspired by the original paper]

Concretely, the attack works as follows: the attacker sends a bunch (large enough to trigger the rate limiting) of probe messages to the target, but with a forged source address of the victim. In the case where there are no open ports in the probed set, the target will send out the same amount of ICMP “port unreachable” responses back to the victim and trigger the rate limit on outgoing ICMP messages. The attacker can now send an additional verification message from its own address and observe whether an ICMP response comes back or not. If it does then there was at least one port open in the set and the attacker can divide the set and try again, or do a linear scan by inserting the suspected port number into a set of known closed ports. Using this approach, the attacker can narrow down to the open ports and try to guess the message ID until it is successful or gives up, similarly to the original Kaminsky attack.
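The scanning loop can be illustrated with a toy simulation (entirely my own sketch; a real attack sends spoofed UDP packets and times ICMP replies, and each burst needs a fresh rate-limit window):

```javascript
// Toy model of the ICMP rate-limit side channel.
// openPorts: the target's secret set of open UDP ports.
// limit: ICMP "port unreachable" replies allowed per rate-limit window.
function batchHasOpenPort(openPorts, batch, limit) {
  // The attacker sends exactly `limit` spoofed probes per window: the
  // candidate batch, padded with ports known to be closed. Every probe
  // to a closed port consumes one ICMP reply from the budget.
  const closedProbes =
    batch.filter((p) => !openPorts.has(p)).length + (limit - batch.length);
  const remaining = limit - closedProbes;
  // The attacker's verification probe (from its own address) draws a
  // reply only if some budget remains, i.e. a probe hit an open port.
  return remaining > 0;
}

function findOpenPort(openPorts, limit, maxPort = 65535) {
  // Phase 1: sweep the port range, `limit` ports per window.
  for (let base = 0; base <= maxPort; base += limit) {
    const chunk = [];
    for (let p = base; p <= Math.min(base + limit - 1, maxPort); p++) chunk.push(p);
    if (!batchHasOpenPort(openPorts, chunk, limit)) continue;
    // Phase 2: re-test each port of the hit chunk individually,
    // padded out to `limit` probes with known-closed ports.
    for (const p of chunk) {
      if (batchHasOpenPort(openPorts, [p], limit)) return p;
    }
  }
  return -1; // nothing found
}

console.log(findOpenPort(new Set([3421]), 50)); // -> 3421
```

In reality the "budget" is the kernel's global ICMP rate limit, the probes carry a forged source address, and the whole sweep has to finish while the resolver's query to the victim nameserver is still pending.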

In practice there are some hurdles to successfully mounting this attack.

  • First, the target IP, or a set of target IPs must be discovered. This might be trivial in some cases - a single forwarder, or a fixed set of IPs that can be discovered by probing and observing attacker controlled zones, but more difficult if the target IPs are partitioned across zones as the attacker can’t see the resolver egress IP unless she can monitor the traffic for the victim domain.
  • The attack also requires a large enough ICMP outgoing rate limit in order to be able to scan with a reasonable speed. The scan speed is critical, as it must be completed while the query to the victim nameserver is still pending. As the scan speed is effectively fixed, the paper instead describes a method to potentially extend the window of opportunity by triggering the victim's response rate limiting (RRL), a technique to protect against floods of forged DNS queries. This may work if the victim implements RRL and the target resolver doesn’t implement a retry over TCP (A Quantitative Study of the Deployment of DNS Rate Limiting shows about 16% of nameservers implement some sort of RRL).
  • Generally, busy resolvers will have ephemeral ports opening and closing, which introduces false positive open ports for the attacker, and ports open for different pending queries than the one being attacked.

We’ve implemented an additional mitigation to prevent message ID guessing - if the resolver detects an ID enumeration attempt, it will stop accepting any more guesses and switch over to TCP. This reduces the number of attempts available to the attacker even if it guesses the IP address and port correctly, similarly to how the number of password login attempts is limited.


Ultimately these are just mitigations, and the attacker might be willing to play the long game. As long as the transport layer is insecure and DNSSEC is not widely deployed, there will be different methods of chipping away at these mitigations.

It should be noted that trying to hide source IPs or open port numbers is a form of security through obscurity. Without strong cryptographic authentication, it will always be possible to use spoofing to poison DNS resolvers. The silver lining here is that DNSSEC exists, and is designed to protect against this type of attack, and DNS servers are moving to explore cryptographically strong transports such as TLS for communicating between resolvers and authoritative servers.

At Cloudflare, we’ve been helping to reduce the friction of DNSSEC deployment, while also helping to improve transport security in the long run. There is also an effort to increase entropy in DNS messages with RFC 7873 (Domain Name System (DNS) Cookies) and to make DNS over TCP support mandatory with RFC 7766 (DNS Transport over TCP - Implementation Requirements), with even more documentation around ways to mitigate this type of issue available in different places. All of these efforts are complementary, which is a good thing. The DNS ecosystem consists of many different parties and software with different requirements and opinions; as long as operators support at least one of these preventive measures, attacks of this type will become more and more difficult.

If you are an operator of an authoritative DNS server, you should consider taking steps to protect yourself from this attack, such as the measures discussed above: sign your zones with DNSSEC, support DNS cookies, and make sure DNS over TCP is available as a fallback.

We’d like to thank the researchers for responsibly disclosing this attack and look forward to working with them in the future on efforts to strengthen the DNS.

Categories: Technology

Automated Origin CA for Kubernetes

Fri, 13/11/2020 - 12:00

In 2016, we launched the Cloudflare Origin CA, a certificate authority optimized for making it easy to secure the connection between Cloudflare and an origin server. Running our own CA has allowed us to support fast issuance and renewal, simple and effective revocation, and wildcard certificates for our users.

Out of the box, managing TLS certificates and keys within Kubernetes can be challenging and error prone. The secret resources have to be constructed correctly, as components expect secrets with specific fields. Some forms of domain verification require manually rotating secrets to pass. Once you're successful, don't forget to renew before the certificate expires!

cert-manager is a project to fill this operational gap, providing Kubernetes resources that manage the lifecycle of a certificate. Today we're releasing origin-ca-issuer, an extension to cert-manager integrating with Cloudflare Origin CA to easily create and renew certificates for your account's domains.

Origin CA IntegrationCreating an Issuer

After installing cert-manager and origin-ca-issuer, you can create an OriginIssuer resource. This resource creates a binding between cert-manager and the Cloudflare API for an account. Different issuers may be connected to different Cloudflare accounts in the same Kubernetes cluster.

apiVersion: cert-manager.k8s.cloudflare.com/v1
kind: OriginIssuer
metadata:
  name: prod-issuer
  namespace: default
spec:
  signatureType: OriginECC
  auth:
    serviceKeyRef:
      name: service-key
      key: key

This creates a new OriginIssuer named "prod-issuer" that issues certificates using ECDSA signatures, and the secret "service-key" in the same namespace is used to authenticate to the Cloudflare API.

Signing an Origin CA Certificate

After creating an OriginIssuer, we can now create a Certificate with cert-manager. This defines the domains, including wildcards, that the certificate should be issued for, how long the certificate should be valid, and when cert-manager should renew the certificate.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com
  namespace: default
spec:
  # The secret name where cert-manager
  # should store the signed certificate.
  secretName: example-com-tls
  dnsNames:
    - example.com
  # Duration of the certificate.
  duration: 168h
  # Renew a day before the certificate expiration.
  renewBefore: 24h
  # Reference the Origin CA Issuer you created above,
  # which must be in the same namespace.
  issuerRef:
    group: cert-manager.k8s.cloudflare.com
    kind: OriginIssuer
    name: prod-issuer

Once created, cert-manager begins managing the lifecycle of this certificate, including creating the key material, crafting a certificate signature request (CSR), and constructing a certificate request that will be processed by the origin-ca-issuer.

When signed by the Cloudflare API, the certificate will be made available, along with the private key, in the Kubernetes secret specified within the secretName field. You'll be able to use this certificate on servers proxied behind Cloudflare.

Extra: Ingress Support

If you're using an Ingress controller, you can use cert-manager's Ingress support to automatically manage Certificate resources based on your Ingress resource.

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/issuer: prod-issuer
    cert-manager.io/issuer-kind: OriginIssuer
  name: example
  namespace: default
spec:
  rules:
    - host: example.com
      http:
        paths:
          - backend:
              serviceName: examplesvc
              servicePort: 80
            path: /
  tls:
    # specifying a host in the TLS section will tell cert-manager
    # what DNS SANs should be on the created certificate.
    - hosts:
        - example.com
      # cert-manager will create this secret
      secretName: example-tls

Building an External cert-manager Issuer

An external cert-manager issuer is a specialized Kubernetes controller. There's no direct communication between cert-manager and external issuers at all; this means that you can use any existing tools and best practices for developing controllers to develop an external issuer.

We've decided to use the excellent controller-runtime project to build origin-ca-issuer, running two reconciliation controllers.

Automated Origin CA for KubernetesOriginIssuer Controller

The OriginIssuer controller watches for creation and modification of OriginIssuer custom resources. The controller creates a Cloudflare API client using the details and credentials referenced. This API client instance will later be used to sign certificates through the API. The controller will periodically retry creating the API client; once it is successful, it updates the OriginIssuer's status to ready.

CertificateRequest Controller

The CertificateRequest controller watches for the creation and modification of cert-manager's CertificateRequest resources. These resources are created automatically by cert-manager as needed during a certificate's lifecycle.

The controller looks for CertificateRequests that reference a known OriginIssuer (this reference is copied by cert-manager from the originating Certificate resource) and ignores all resources that do not match. The controller then verifies that the OriginIssuer is in the ready state before transforming the certificate request into an API request using the previously created client.

On a successful response, the signed certificate is added to the certificate request, which cert-manager then uses to create or update the secret resource. On an unsuccessful request, the controller periodically retries.

Learn More

Up-to-date documentation and complete installation instructions can be found in our GitHub repository. Feedback and contributions are greatly appreciated. If you're interested in Kubernetes at Cloudflare, including building controllers like these, we're hiring.

Categories: Technology

UK Black History Month at Cloudflare

Thu, 12/11/2020 - 12:00
UK Black History Month at Cloudflare

In February 2019, I started my journey at Cloudflare. Back then, we lived in a COVID-19 free world and I was lucky enough, as part of the employee onboarding program, to visit our San Francisco HQ. As I took my first steps into the office, I was greeted by a beautiful bouquet of Protea flowers at the reception desk. Being from South Africa, seeing our national flower instantly made me feel at home and welcomed to the Cloudflare family - this memory will always be with me.

Later that day, I learnt it was Black History Month in the US. This celebration included African food for lunch, highlights of Black History icons on Cloudflare’s TV screens, and African drummers. At Cloudflare, Black History Month is coordinated and run by Afroflare, one of many Employee Resource Groups (ERGs) that celebrates diversity and inclusion. The excellent delivery of Black History Month demonstrated to me how seriously Cloudflare takes Black History Month and ERGs.

Today, I am one of the Afroflare leads in the London office and led this year’s UK Black History Month celebration. 2020 has been a year of historical events, which made this celebration uniquely significant. George Floyd’s murder in the US raised awareness of the Black Lives Matter movement across the world. The Nigerian #EndSARS movement against police brutality made global headlines and resulted in cyber attacks. And the election of Kamala Harris as US Vice President-Elect made her the first woman, first African-American, and first Asian-American to hold that position.

With this in mind, our approach during UK Black History Month was to celebrate the Past, Present, and Future of Black History. The past triumphs and contributions of people of African and Caribbean heritage were celebrated through social media posts and internal emails documenting their stories. Black History being made every day in the present was highlighted in fireside chats and interviews on Cloudflare TV for everyone to watch, at Cloudflare and around the world. Finally, we wanted to look to the future, at rising stars whose strides today will make the history of tomorrow.

To showcase the incredible talent that we have at Cloudflare and highlight present contributions, we hosted knowledge share sessions delivered by experts in Business Development (Stephen Thompson), Customer Development (Jay Henderson), and Customer Success Management (Warren Rickards). All these talks are available as recordings on Cloudflare TV, and we encourage you to give them a watch!

We also had the honour of hosting phenomenal speakers through fireside chats on Cloudflare TV. These individuals are creating a positive impact in their communities today and are shaping the future of technology in Africa. My first chat was with Lungisa Matshoba, the CTO and Founder of Yoco, a payments company in South Africa that makes it easier for small businesses to accept payments and is positively impacting the South African economy.

Next, I spoke with Thando Tetty, the Head of Engineering at Investec UK, who shared stories from his career and the journey of immigrating from Eswatini to South Africa, then to the UK.  Finally, Cloudflare’s CSO and Afroflare’s Executive Advocate, Joe Sullivan, interviewed Ayotunde Coker, Managing Director at Rack Centre in Nigeria, who shared insights on the state of the Internet in Africa and spoke about African innovations contributing to technology at large.

By leveraging Cloudflare's infrastructure and using services like Cloudflare for Teams and Cloudflare TV, we were able to celebrate UK Black History Month for the first time as a company, and to do so remotely.

Coming from South Africa, where Black History Month doesn’t exist because Black History is made every day, it was crucial to surface these contributions and look beyond a single month, sharing a message of hope and asking how we can celebrate Black History every day.

We aim to leave a legacy of hope and improve diversity and inclusion by believing that anything can be possible when you believe in yourself. In the words of the great Nelson Mandela,

“Sometimes it falls upon a generation to be great. You can be that great generation. Let your greatness blossom.” — Trafalgar Square, London, 2005.
Categories: Technology

My internship: Brotli compression using a reduced dictionary

Wed, 11/11/2020 - 16:32
My internship: Brotli compression using a reduced dictionary

Brotli is a state-of-the-art lossless compression format, supported by all major browsers. It is capable of achieving considerably better compression ratios than the ubiquitous gzip, and is rapidly gaining in popularity. Cloudflare uses the Google brotli library to dynamically compress web content whenever possible. In 2015, we took an in-depth look at how brotli works and its compression advantages.

One of the more interesting features of the brotli file format, in the context of textual web content compression, is the inclusion of a built-in static dictionary. The dictionary is quite large, and in addition to containing various strings in multiple languages, it also supports the option to apply multiple transformations to those words, increasing its versatility.

The open source brotli library, which implements an encoder and decoder for brotli, has 11 predefined quality levels for the encoder, with higher quality levels demanding more CPU in exchange for a better compression ratio. The static dictionary feature is used to a limited extent starting with level 5, and to the full extent only at levels 10 and 11, due to the high CPU cost of this feature.

We improve on this limited dictionary use and add optimizations that improve compression at levels 5 through 9 with negligible performance impact when compressing web content.

Brotli Static Dictionary

Brotli primarily uses the LZ77 algorithm to compress its data. Our previous blog post about brotli provides an introduction.

To improve compression on text files and web content, brotli also includes a static, predefined dictionary. If a byte sequence cannot be matched with an earlier sequence using LZ77, the encoder will try to match the sequence with a reference to the static dictionary, possibly using one of the multiple transforms. For example, every HTML file contains the opening <html> tag, which cannot be compressed with LZ77 as it is unique, but it is contained in the brotli static dictionary and will be replaced by a reference to it. The reference generally takes less space than the sequence itself, which decreases the compressed file size.

The dictionary contains 13,504 words in six languages, with lengths from 4 to 24 characters. To improve the compression of real-world text and web data, some dictionary words are common phrases ("The current") or strings common in web content (‘type=”text/javascript”’). Unlike usual LZ77 compression, a word from the dictionary can only be matched as a whole. Starting a match in the middle of a dictionary word, ending it before the end of a word or even extending into the next word is not supported by the brotli format.

Instead, the dictionary supports 120 transforms of dictionary words to support a larger number of matches and find longer matches. The transforms include adding suffixes (“work” becomes “working”), adding prefixes (“book” => “ the book”), making the first character uppercase ("process" => "Process"), or converting the whole word to uppercase (“html” => “HTML”). In addition to transforms that make words longer or capitalize them, the cut transform allows a shortened match (“consistently” => “consistent”), which makes it possible to find even more matches.
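As a rough illustration of how such transforms multiply the dictionary's reach, here is a toy sketch (not brotli's actual code; the transform encoding is an assumption) covering the transform families named above:

```python
# Illustrative brotli-style word transforms; each transform is a
# (kind, argument) pair applied to a base dictionary word.
def apply_transform(word: str, transform) -> str:
    kind, arg = transform
    if kind == "suffix":    # "work" -> "working"
        return word + arg
    if kind == "prefix":    # "book" -> " the book"
        return arg + word
    if kind == "ucfirst":   # "process" -> "Process"
        return word[:1].upper() + word[1:]
    if kind == "upper":     # "html" -> "HTML"
        return word.upper()
    if kind == "cut":       # drop the last `arg` characters
        return word[:len(word) - arg]
    raise ValueError(kind)

print(apply_transform("work", ("suffix", "ing")))   # working
print(apply_transform("consistently", ("cut", 2)))  # consistent
```

With 120 such transforms, each of the 13,504 base words can stand in for many surface strings, which is how the effective dictionary grows to over 1.6 million entries.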


With the transforms included, the static dictionary contains 1,633,984 different words – too many for exhaustive search, except when used with the slow brotli compression levels 10 and 11. When used at a lower compression level, brotli either disables the dictionary or only searches through a subset of roughly 5,500 words to find matches in an acceptable time frame. It also only considers matches at positions where no LZ77 match can be found and only uses the cut transform.

Our approach to the brotli dictionary uses a larger, but more specialized subset of the dictionary than the default, using more aggressive heuristics to improve the compression ratio with negligible cost to performance. In order to provide a more specialized dictionary, we provide the compressor with a content type hint from our servers, relying on the Content-Type header to tell the compressor if it should use a dictionary for HTML, JavaScript or CSS. The dictionaries can be further refined by colocation language in the future.

Fast dictionary lookup

To improve compression without sacrificing performance, we needed a fast way to find matches if we want to search the dictionary more thoroughly than brotli does by default. Our approach uses three data structures to find a matching word directly. The radix trie is responsible for finding the word while the hash table and bloom filter are used to speed up the radix trie and quickly eliminate many words that can’t be matched using the dictionary.

Lookup for a position starting with “type”

The radix trie easily finds the longest matching word without having to try matching several words. To find the match, we traverse the graph based on the text at the current position and remember the last node with a matching word. The radix trie supports compressed nodes (having more than one character as an edge label), which greatly reduces the number of nodes that need to be traversed for typical dictionary words.
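The longest-match walk can be sketched as follows; for brevity this toy version uses a plain character trie rather than a radix trie with compressed multi-character edges, and the sample words are assumptions:

```python
# Simplified longest-match lookup: walk the trie along the text and
# remember the last node that completes a dictionary word.
class TrieNode:
    __slots__ = ("children", "is_word")
    def __init__(self):
        self.children = {}
        self.is_word = False

def insert(root: TrieNode, word: str) -> None:
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True

def longest_match(root: TrieNode, text: str, pos: int):
    """Return the longest dictionary word matching text at pos, or None."""
    node, best, i = root, None, pos
    while i < len(text) and text[i] in node.children:
        node = node.children[text[i]]
        i += 1
        if node.is_word:
            best = text[pos:i]  # last complete word seen so far
    return best

root = TrieNode()
for w in ("type", "typeof", "text"):
    insert(root, w)

print(longest_match(root, "typeof x", 0))  # typeof
```

A compressed radix trie performs the same walk but consumes several characters per edge, so far fewer nodes are visited for a typical dictionary word.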

The radix trie is slowed down by the large number of positions where we can’t find a match. An important finding is that most mismatching strings have a mismatching character in the first four bytes. Even for positions where a match exists, a lot of time is spent traversing nodes for the first four bytes since the nodes close to the tree root usually have many children.

Luckily, we can use a hash table to look up the node corresponding to the first four bytes if it exists, or to reject the possibility of a match. We thus look up the first four bytes of the string; if there is a matching node, we traverse the trie from there, which will be fast as each four-byte prefix usually only has a few corresponding dictionary words. If there is no matching node, there will not be a matching word at this position and we do not need to consider it further.
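The idea behind the four-byte-prefix lookup can be sketched like this (assumed words and names, not the real implementation):

```python
# Group dictionary words by their first four bytes; at each text position,
# one lookup either rejects the position or yields a small candidate group.
from collections import defaultdict

DICT_WORDS = ["type", "typeof", "text/javascript", "html", "height"]

by_prefix = defaultdict(list)
for w in DICT_WORDS:
    by_prefix[w[:4]].append(w)

def longest_dict_match(text: str, pos: int):
    """Return the longest dictionary word matching text at pos, or None."""
    candidates = by_prefix.get(text[pos:pos + 4])
    if candidates is None:  # most positions are rejected right here
        return None
    best = None
    for w in candidates:
        if text.startswith(w, pos) and (best is None or len(w) > len(best)):
            best = w
    return best

print(longest_dict_match("var x = typeof y;", 8))  # typeof
```

In the real structure the hash table maps the prefix to a trie node, so the trie walk resumes four characters deep instead of scanning a candidate list.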

While the hash table is designed to reject mismatches quickly and avoid cache misses and high search costs in the trie, it still suffers from similar problems: We might search through several 4-byte prefixes with the hash value of the given position, only to learn that no match can be found. Additionally, hash lookups can be expensive due to cache misses.

To quickly reject words that do not match the dictionary, but might still cause cache misses, we use a k=1 bloom filter to quickly rule out most non-matching positions. In the k=1 case, the filter is simply a lookup table with one bit indicating whether any matching 4-byte prefixes exist for a given hash value. If the bit for the given hash value is 0, there won’t be a match. Since the bloom filter uses at most one bit for each four-byte prefix while the hash table requires 16 bytes, cache misses are much less likely. (The actual size of the structures is a bit different since there are many empty spaces in both structures and the bloom filter has twice as many elements to reject more non-matching positions.)

This is very useful for performance as a bloom filter lookup requires a single memory access. The bloom filter is designed to be fast and simple, but still rejects more than half of all non-matching positions and thus allows us to save a full hash lookup, which would often mean a cache miss.
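A minimal sketch of such a k=1 bloom filter follows; the table size and the use of Python's built-in hash() are illustrative assumptions:

```python
# k=1 bloom filter: one presence bit per hash bucket. A zero bit proves
# no dictionary word starts with the prefix; a one bit means "maybe".
TABLE_BITS = 16
table = bytearray(1 << TABLE_BITS)  # one flag per bucket

def bucket(prefix4: bytes) -> int:
    return hash(prefix4) & ((1 << TABLE_BITS) - 1)

def add(prefix4: bytes) -> None:
    table[bucket(prefix4)] = 1

def maybe_present(prefix4: bytes) -> bool:
    # False => definitely absent; True => fall through to the hash table.
    return table[bucket(prefix4)] == 1

add(b"type")
print(maybe_present(b"type"))  # True
```

Because a false answer is definitive, every rejected position skips the hash table lookup entirely, and with it the likely cache miss.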


To improve the compression ratio without sacrificing performance, we employed a number of heuristics:

Only search the dictionary at some positions
This is also done using the stock dictionary, but we search more aggressively. While the stock dictionary only considers positions where the LZ77 match finder did not find a match, we also consider positions that have a bad match according to the brotli cost model: LZ77 matches that are short or have a long distance between the current position and the reference usually only offer a small compression improvement, so it is worth trying to find a better match in the static dictionary.

Only consider the longest match and then transform it
Instead of finding and transforming all matches at a position, the radix trie only gives us the longest match which we then transform. This approach results in a vast performance improvement. In most cases, this results in finding the best match.

Only include some transforms
While all transformations can improve the compression ratio, we only included those that work well with the data structures. The suffix transforms can easily be applied after finding a non-transformed match. For the upper case transforms, we include both the non-transformed and the upper case version of a word in the radix trie. The prefix and cut transforms do not play well with the radix trie, therefore a cut of more than 1 byte and prefix transforms are not supported.

Generating the reduced dictionary

At low compression levels, brotli searches a subset of ~5,500 out of the 13,504 words of the dictionary, negatively impacting compression. To store the entire dictionary, we would need to store ~31,700 words in the trie, considering the uppercase-transformed output of ASCII sequences, and ~11,000 four-byte prefixes in the hash. This would slow down the hash table and radix trie, so we needed to find a different subset of the dictionary that works well for web content.

For this purpose, we used a large data set containing representative content. We made sure to use web content from several world regions to reflect language diversity and optimize compression. Based on this data set, we identified which words are most common and result in the largest compression improvement according to the brotli cost model. We only include the most useful words based on this calculation. Additionally, we remove some words if they slow down hash table lookups of other, more common words based on their hash value.

We have generated separate dictionaries for HTML, CSS and JavaScript content and use the MIME type to identify the right dictionary to use. The dictionaries we currently use include about 15-35% of the entire dictionary including uppercase transforms. Depending on the type of data and the desired compression/speed tradeoff, different options for the size of the dictionary can be useful. We have also developed code that automatically gathers statistics about matches and generates a reduced dictionary based on them, which makes it easy to extend this approach to other textual formats (perhaps majority non-English data, or XML) and achieve better results for those types of data.


We tested the reduced dictionary on a large data set of HTML, CSS and JavaScript files.

The improvement is especially big for small files, as LZ77 compression is less effective on them. Since the improvement on large files is a lot smaller, we only tested files up to 256 KB. We used compression level 5, the same level we currently use for dynamic compression on our edge, and tested on an Intel Core i7-7820HQ CPU.

Compression improvement is defined as 1 - (compressed size using the reduced dictionary / compressed size without dictionary). This ratio is then averaged for each input size range. We also provide an average value weighted by file size. Our data set mirrors typical web traffic, covering a wide range of file sizes with small files being more common, which explains the large difference between the weighted and unweighted average.
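The metric and the weighting described above amount to the following; the sample numbers are made up for illustration:

```python
# Compression improvement: 1 - (size with reduced dictionary / size without).
def compression_improvement(size_with_dict: int, size_without_dict: int) -> float:
    return 1 - size_with_dict / size_without_dict

def weighted_average(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# One small and one large file: the small file improves more, so the
# unweighted average exceeds the size-weighted one.
sizes = [1_000, 100_000]
improvements = [0.20, 0.05]
print(round(sum(improvements) / len(improvements), 3))  # 0.125
print(round(weighted_average(improvements, sizes), 4))  # 0.0515
```

This is why a data set dominated by small files shows a large gap between the weighted and unweighted averages.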


With the improved dictionary approach, we are now able to compress HTML, JavaScript and CSS files as well as, or sometimes even better than, a higher compression level would allow, all while using only 1% to 3% more CPU. For reference, using compression level 6 over 5 would increase CPU usage by up to 12%.

Categories: Technology

Tech Leaders on the Future of Remote Work

Wed, 11/11/2020 - 13:17
Tech Leaders on the Future of Remote Work

Dozens of top leaders and thinkers from the tech industry and beyond recently joined us for a series of fireside chats commemorating Cloudflare’s 10th birthday. Over the course of 24 hours of conversation, many of these leaders touched on how the workplace has evolved during the pandemic, and how these changes will endure into the future.

Here are some of the highlights.

On the competition for talent

Stewart Butterfield
Co-founder and CEO, Slack


The thing that I think people don't appreciate or realize is that this is not a choice that companies are really going to make on an individual basis. I've heard a lot of leaders say, “we're going back to the office after the summer.”

If we say we require you to be in the office five days a week and, you know, Twitter doesn't, Salesforce doesn't — and those offers are about equal — they'll take those ones. I think we would also lose existing employees if they didn't believe that they had the flexibility. Once you do that, it affects the market for talent. If half of the companies support distributed work or flexible hours and flexible time in the office, you can compensate for that, but I think you’ve got to pay a lot more or something like that because that optionality is valuable to people.

Watch the full interview

On harnessing the benefits of remote work

Hayden Brown
President and CEO, Upwork


I think a lot of these things are here to stay. What's fleeting is this idea that we have our children home from school and we don't have a social system around child care and things like that, because that's not sustainable.

What's here to stay are really companies finding, and workers finding, a new balance. It's not about, “let's all lock ourselves in our homes forever.” This is about being very intentional. How can we be intentional to really recognize the benefits that a distributed, more work from home-oriented culture and set of practices can give workers and businesses?

Those benefits include some very powerful tools towards addressing some of the diversity challenges that all of our companies face, because it suddenly opens up pools of talent that we can tap into, outside of the places where we've traditionally hired, and we can tap into those people — and they're not second class citizens, because they're not the only ones working remotely while everyone else is back at the office.

Watch the full interview

On capturing the serendipity of in-person meetings

Brett Hautop
VP of Global Design + Build at LinkedIn


That might be the single hardest thing to figure out. Because the big decision that's made right after the meeting, after you heard everything but you wanted to say it to one person and not everybody else. Or the thing that happens serendipitously on the way into a meeting, just because you're talking about your weekends and then you remember something — that is really unfair to the people who are on the team (working remotely).

And unless you go back to technology like the telepresence person driving around, or each of us having our own drone in the office that follows people around serving as my ability to see — these creepy things — it's really hard to recreate. So it's about changing a cultural norm and getting people to be more thoughtful about how to include people who aren't there, to go out of their way to include them. And that's something that could take years for us to teach ourselves.

Watch the full interview

On securing a hybrid work environment

Chris Young
Former CEO, McAfee


We saw a huge rise in phishing attacks that were directly correlated to the move to work from home. Cyber attackers understand that all of a sudden you've got probably millions of workers across different organizations that are not supervised in the same way — new systems, new protocols for how they work. And they preyed upon that very quickly… there's a whole litany of attacks that have been levied against the work from home model.

It's prudent to make sure that if you're going to have people working from home, that you take some steps to protect the home networking infrastructure because we could find ourselves in a situation where, if we don't pay attention to that over the long run, you start to see an uptick of attackers going after the home networking infrastructure. We always know the attackers will find the path of least resistance. It's like water on a roof: it will find the hole and go right there.

And I think it means a few things for us in a cybersecurity landscape. I think it's going to continue to shift and put a premium on the identity based architecture. The zero trust model authentication is going to be key. It's really the combination of: can I trust the user and can I trust the device in order to make a decision of do I trust this session? Do I trust this transaction?

Watch the full interview

On the opportunity for digital transformation

Bret Taylor
President and Chief Operating Officer at Salesforce


I hear across every industry that people aren't going to come back to the office full time. Maybe they'll come in a couple days a week. But that means our offices are probably going to be a little bit more for on-sites than they are for desks. And I think about: how does that change the shape of our employee engagement? And more importantly, how does it change the shape of our business models?

I think that the companies who were treating their digital initiatives as something sort of on the side are probably suffering right now. And there's an urgency around these shifts now that is more powerful than ever before.

I think a lot of these trends will remain. And that’s where the opportunity is for great companies, whether it's technology companies or other companies, who will lean into these changes and transform themselves. I think the ones that do will benefit from it. And I think there's going to be a lot of business model disruption and technology disruption coming out of this.

Watch the full interview

*Quotes have been lightly edited for clarity and length.

Want to watch more interviews and catch up on all of the announcements from Cloudflare during Birthday Week? Visit Cloudflare Birthday Week 2020

Categories: Technology

Bienvenue Cloudflare France! Why I’m helping Cloudflare grow in France

Tue, 10/11/2020 - 09:01
Bienvenue Cloudflare France!
Why I’m helping Cloudflare grow in France

If you'd like to read this post in French click here.


I am incredibly excited to announce that I have joined Cloudflare as its Head of France to help build a better Internet and expand the company’s growing customer base in France. This is an important milestone for Cloudflare as we continue to grow our presence in Europe. Alongside our London, Munich, and Lisbon offices, Paris marks the fourth Cloudflare office in the EMEA region. With this, we’ll be able to further serve our customers’ demand, recruit local talent, and build on the successes we’ve had in our other offices around the globe. I have been impressed by what Cloudflare has built in EMEA including France, and I am even more excited by what lies ahead for our customers, partners, and employees.

Born in Paris and raised in Paris, Normandie, and Germany, I started my career more than 20 years ago. As a teenager, I had the chance to work on one of the first Apple IIe’s available in France. I have always had a passion for technology and continue to be amazed by the value its adoption brings to businesses large and small. In former roles ranging from Solution Engineer to Account Manager, Partner Director to Sales Director, and more recently Country Manager, I’ve had the chance to manage businesses and teams of different sizes, and am passionate about seeking out and providing the best solutions and value to customers and their unique and challenging needs.

In 2011, I opened the Amazon Web Services office in France. Over the last nine years, I have advised and helped a large number of companies, across varying industries and sizes, move from on-premise infrastructure to cloud and SaaS architectures. I have seen that this major and inevitable transition has increased, exponentially, the complexity of architecture with heterogeneous infrastructure environments across public cloud, on-premise, and hybrid deployments. The threat landscape, functional requirements, and scale of business applications have evolved faster than ever before, and the volume and sophistication of network attacks can strain the defensive capabilities of even the most advanced enterprises. This is forcing a major architectural shift in how enterprises address security, performance, and reliability at the network layer.

Today, companies’ digital assets (web properties, applications, APIs, and so on) have become their most valuable assets. How organizations are able to use the Internet to serve their customers, partners, and employees is now a strategic priority for businesses around the world. Cloudflare is leading this transition.

Why Cloudflare?

Here are four reasons why I’m joining and embarking on this amazing journey.

  • Cloudflare’s customer base and growth: I have been impressed with the growth, technology, and pace of adoption behind the company’s suite of products. Cloudflare services the Internet properties of more than 3.2 million customers around the world, including approximately 16 percent of the Fortune 1000. From the public sector to enterprises to startups, companies of all sizes and types are powered by these critical security, performance, and reliability services. Every day, thousands of new customers sign up for Cloudflare services.
  • Cloudflare’s Global Network: I discovered early on that Cloudflare is powered by its global network, which is always learning and growing. This means that as companies grow and expand, Cloudflare will be able to help them scale and support their growth. This network spans more than 200 cities in over 100 countries. With more than 1 billion unique IP addresses passing through it every day, it works as an immune system, continuously learning and adapting to new threats and optimizing itself, which benefits all Cloudflare customers and users worldwide. Cloudflare’s network operates within 100 milliseconds of 99% of the Internet-connected population in the developed world (for context, the blink of an eye is 300-400 milliseconds!). What’s more, this network blocked an average of 76 billion cyber threats each day last quarter.
  • Cloudflare’s technology and pace of innovation: At Cloudflare, the pace of innovation has stunned me. Leveraging its unique global network, the company continuously releases new products and features in the cloud that are available at massive scale, worldwide, to its customers and users. I discovered some products which are disrupting traditional IT approaches. To name a few: Cloudflare One, a platform to connect and secure companies and teams anywhere (remote and across offices) and on any device; Cloudflare Workers, a serverless solution redefining how applications are deployed at the network edge; Magic Transit, which delivers the power of Cloudflare services to your on-premise, cloud-hosted, and hybrid networks; Argo Smart Routing, which acts like Waze for the Internet and can significantly cut the amount of time users spend waiting for content; and Cloudflare Web Analytics, a privacy-first solution that gives marketers and web creators the information they need in a simple, clean way that doesn't sacrifice visitor privacy.
  • The company’s culture: During the interview process, I had the chance to meet many Cloudflare employees, including some of the leadership team. I met a very diverse team of incredibly smart, curious, kind, and committed people. I was impressed by the builder mindset in all of the people I talked to, and all are truly passionate about the Cloudflare mission. I also loved the culture of openness, collaboration, and transparency, which aligns with the values I have embraced since I started my career. The wider Cloudflare mission has resonated with me: to help build a better Internet. In doing this, we provide organizations with powerful technologies that previously could only be used by those able to afford the large expense and complexity of implementing and maintaining them.
Cloudflare in France

In France, you can find a vibrant startup ecosystem, large enterprises, and a very active SMB business environment. Cloudflare has had customers in France from the very early days, and today we have thousands of French customers spanning the country, from startups, SMBs, and enterprises to government, education, and non-profit organizations. More than 25 percent of the CAC 40 are using Cloudflare services. Major French enterprises such as L’Oréal, Solocal, Criteo, Allianz France, DPD Group (le Groupe LaPoste), and more are protecting and accelerating their Internet properties with Cloudflare services. In addition, more than 30 percent of the Next40, including Back Market, Happn, Wildmoka, and SendinBlue, are equipped with Cloudflare’s Internet security, performance, and reliability solutions. We take pride in being relied on by these organizations and are eager to help more French companies grow.

Looking ahead

Since the beginning of the year, the rise in remote work, cyber threats, and stress on online assets has generated an even greater need to provide secure, fast, and reliable Internet services. This goes for employees, customers, and partners—of any organization. As a result, this demand has never been so critical. We’re here to work with all types of customers. If you are a business, a public sector organisation, an NGO—or anyone that has cybersecurity, performance, or reliability challenges or questions—get in touch with us. We’d love to explore how we can help. If you are a system integrator, consulting company, MSP, and so on—let’s explore a partnership on how we may be able to help you accelerate your business.

If you are interested in joining Cloudflare and helping to build a more secure, fast, and reliable Internet—please explore our open positions and select Paris, France as the location. We are hiring talented people locally and globally, now building our initial team of Account Executives, Channel Managers, Business Development Representatives, Solution Engineers, Customer Success Managers, and more—to further serve our current customers and grow with more organizations in France.
It is a great honour for me to be part of the Cloudflare family, to help build Cloudflare’s future in France, and help French organizations grow. Feel free to reach out to me at

Categories: Technology

Announcing Spectrum DDoS Analytics and DDoS Insights & Trends

Sat, 07/11/2020 - 12:00
Announcing Spectrum DDoS Analytics and DDoS Insights & Trends

We’re excited to announce the expansion of the Network Analytics dashboard to Spectrum customers on the Enterprise plan. Additionally, this announcement introduces two major dashboard improvements for easier reporting and investigation.

Network Analytics

Cloudflare's packet- and bit-oriented dashboard, Network Analytics, provides visibility into Internet traffic patterns and DDoS attacks at Layers 3 and 4 of the OSI model. This allows our users to better understand the traffic patterns and DDoS attacks observed at the Cloudflare edge.

When the dashboard was first released in January, these capabilities were only available to Bring Your Own IP customers on the Spectrum and Magic Transit services, but now Spectrum customers using Cloudflare’s Anycast IPs are also supported.

Protecting L4 applications

Spectrum is Cloudflare’s L4 reverse-proxy service that offers unmetered DDoS protection and traffic acceleration for TCP and UDP applications. It provides enhanced traffic performance through faster TLS, optimized network routing, and high speed interconnection. It also provides encryption to legacy protocols and applications that don’t come with embedded encryption. Customers who typically use Spectrum operate services in which network performance and resilience to DDoS attacks are of utmost importance to their business, such as email, remote access, and gaming.

Spectrum customers can now view detailed traffic reports on DDoS attacks against their configured TCP/UDP applications, including the size of attacks, attack vectors, source locations of attacks, and permitted traffic. What’s more, users can also configure and receive real-time alerts when their services are attacked.

Network Analytics: Rebooted

Since releasing the Network Analytics dashboard in January, we have been constantly improving its capabilities. Today, we’re announcing two major improvements that will make both reporting and investigation easier for our customers: DDoS Insights & Trends, and Group-by Filtering for grouping-based traffic analysis.

DDoS Trends Insights

First and foremost, we are adding a new DDoS Insights & Trends card, which provides dynamic insights into your attack trends over time. This feature provides a real-time view of the number of attacks, the percentage of attack traffic, the maximum attack rates, the total mitigated bytes, the main attack origin country, and the total duration of attacks, which can indicate the potential downtime that was prevented. These data points were surfaced as the most crucial ones by our customers in the feedback sessions. Along with the percentage of change period-over-period, our customers can easily understand how their security landscape evolves.

Trends Insights

Troubleshooting made easy

In the main time series chart seen in the dashboard, we added the ability for users to change the Group-by field, which customizes the Y axis. This way, a user can quickly identify traffic anomalies and sudden changes in traffic based on criteria such as IP protocol, TCP flags, or source country, and take action if needed with Magic Firewall, Spectrum, or BYOIP.

Time Series Group-By Filtering

Harnessing Cloudflare’s edge to empower our users

The DDoS Insights & Trends card, the new investigation tools, and the additional user interface enhancements can help your organization better understand its security landscape and take more meaningful action as needed. We have more updates in the Network Analytics dashboard which are not covered in the scope of this post, including:

  • Export logs as a CSV
  • Zoom-in feature in the time series chart
  • Drop-down view option for average rate and total volume
  • Increased Top N views for source and destination values
  • Addition of country and data center for source values
  • New visualisation of the TCP flag distribution

Details on these updates can be found in our Help Center, which you can now access via the dashboard as well.

In the near future, we will also expand Network Analytics to Spectrum customers on the Business plan, and WAF customers on the Enterprise and Business plans. Stay tuned!

If you are a customer in Magic Transit, Spectrum or BYOIP, go try out the Network Analytics dashboard yourself today.

If you operate your own network, try Cloudflare Magic Transit for free with a limited time offer:

Categories: Technology

Fall 2020 RPKI Update

Fri, 06/11/2020 - 12:36
Fall 2020 RPKI Update

The Internet is a network of networks. In order to find the path between two points and exchange data, the network devices rely on the information from their peers. This information consists of IP addresses and Autonomous Systems (AS) which announce the addresses using Border Gateway Protocol (BGP).

One problem arises from this design: what protects against a malevolent peer who decides to announce incorrect information? The damage caused by route hijacks can be major.

Resource Public Key Infrastructure (RPKI) is a framework created in 2008. Its goal is to provide a source of truth for Internet resources (IP addresses) and ASes in cryptographically signed records called Route Origin Authorizations (ROAs).

Recently, we passed the significant threshold of two hundred thousand ROAs. This represents a big step in making the Internet more secure against accidental and deliberate BGP tampering.

We have talked about RPKI in the past but we thought it would be a good time for an update.

In a more technical context, the RPKI framework consists of two parts:

  • IP addresses need to be cryptographically signed by their owners in a database managed by a Trust Anchor: AFRINIC, APNIC, ARIN, LACNIC, or RIPE. These five organizations are in charge of allocating Internet resources. The ROA indicates which network operator is allowed to announce the addresses using BGP.
  • Network operators download the list of ROAs, perform the cryptographic checks and then apply filters on the prefixes they receive: this is called BGP Origin Validation.
The “Is BGP Safe Yet” website

The launch of the website to test whether your ISP correctly performs BGP Origin Validation was a success. Since launch, it has been visited more than five million times from 223 countries and 13,000 unique networks (20% of the entire Internet), generating half a million BGP Origin Validation tests.

Many providers subsequently indicated on social media (for example, here or here) that they had an RPKI deployment in the works. This increase in Origin Validation by networks is increasing the security of the Internet globally.

The site’s test for Origin Validation consists of queries toward two addresses, one behind an RPKI-invalid prefix and the other behind an RPKI-valid prefix. If the query towards the invalid prefix succeeds, the test fails, as the ISP does not implement Origin Validation. We counted the number of queries that failed to reach the invalid prefix. This also included a few thousand RIPE Atlas tests that were started by Cloudflare and various contributors, providing coverage for smaller networks.
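The pass/fail logic of such a test can be summarized in a few lines; this is an illustrative sketch, not the site's actual code:

```python
def classify(reached_valid: bool, reached_invalid: bool) -> str:
    """Interpret the two fetches: one behind an RPKI-valid prefix, one behind
    an invalid prefix that an Origin Validating network should filter."""
    if not reached_valid:
        return "inconclusive"  # general connectivity issue, not an OV signal
    return "protected" if not reached_invalid else "unprotected"

print(classify(True, False))  # protected: the invalid prefix was filtered
print(classify(True, True))   # unprotected: no Origin Validation
```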

Every month since launch, we’ve seen around 10 to 20 networks deploying RPKI Origin Validation. Among the major providers, we can build the following table:

  • August: Swisscom (Switzerland), Salt (Switzerland)
  • July: Telstra (Australia), Quadranet (USA), Videotron (Canada)
  • June: Colocrossing (USA), Get Norway (Norway), Vocus (Australia), Hurricane Electric (Worldwide), Cogent (Worldwide)
  • May: Sengked Fiber (Indonesia), (France), WebAfrica Networks (South Africa), CableNet (Cyprus), IDnet (Indonesia), Worldstream (Netherlands), GTT (Worldwide)

With the help of many contributors, we have compiled a list of network operators and public statements at the top of the page.

We excluded providers that manually blocked the traffic towards the prefix instead of using RPKI. Among the techniques we see are firewall filtering and manual prefix rejection. The filtering is often propagated to other customer ISPs. In a unique case, an ISP generated a “more-specific” blackhole route that leaked to multiple peers over the Internet.

The deployment of RPKI by major transit providers, also known as Tier 1, such as Cogent, GTT, Hurricane Electric, NTT, and Telia made many downstream networks more secure without them having to deploy validation software themselves.

Overall, we looked at the evolution of successful tests per ASN and noticed a steady increase of 8% over recent months.

Fall 2020 RPKI Update

Furthermore, when we probed the entire IPv4 space this month, using a similar technique to the test, many more networks were unable to reach an RPKI-invalid prefix than during the same period last year. This confirms an increase in RPKI Origin Validation deployment across all network operators. The picture below shows the IPv4 space behind a network with RPKI Origin Validation enabled in yellow and the active space in blue. It uses a Hilbert curve to efficiently plot IP addresses: for example, one /20 prefix (4,096 IPs) is a pixel, while a /16 prefix (65,536 IPs) forms a 4x4 pixel square.
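The pixel sizes follow directly from prefix arithmetic; a quick sketch, using the /20-per-pixel scale stated above:

```python
PIXEL_PREFIX = 20  # one pixel represents a /20, i.e. 2**(32 - 20) = 4096 IPs

def pixels_for(prefixlen: int) -> int:
    # A shorter (less specific) prefix covers 2**(PIXEL_PREFIX - prefixlen) pixels.
    return 2 ** (PIXEL_PREFIX - prefixlen)

print(pixels_for(20))  # 1 pixel
print(pixels_for(16))  # 16 pixels, drawn as a 4x4 square
```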

The more the yellow spreads, the safer the Internet becomes.

Fall 2020 RPKI Update

What does it mean exactly? If you were hijacking a prefix, the users behind the yellow space would likely not be affected. This also applies if you mis-sign your prefixes: you would not be able to reach the services or users behind the yellow space. Once RPKI is enabled everywhere, there will only be yellow squares.

Progression of signed prefixes

Owners of IP addresses indicate the networks allowed to announce them. They do this by signing prefixes: they create Route Origin Authorizations (ROAs). As of today, there are more than 200,000 ROAs. The distribution shows that the RIPE region still leads in ROA count, followed by the APNIC region.

Fall 2020 RPKI Update

2020 started with 172,000 records and the count is getting close to 200,000 at the beginning of November, approximately a quarter of all the Internet routes. Since last year, the database of ROAs grew by more than 70 percent, from 100,000 records, an average pace of 5% every month.

On the following graph of the unique ROA count per day, we can see two points where the creation rate changed: from 140/day to 231/day, and, since August, to 351 new ROAs per day.

It is not yet clear what caused the increase in August.

Free services and software

In 2018 and 2019, Cloudflare was impacted by BGP route hijacks. Both could have been avoided with RPKI. Not long after the first incident, we started signing prefixes and developing RPKI software. It was necessary to make BGP safer and we wanted to do more than talk about it. But we also needed enough networks to be deploying RPKI as well. By making deployment easier for everyone, we hoped to increase adoption.

The following is a reminder of what we built over the years around RPKI and how it grew.

OctoRPKI is Cloudflare’s open source RPKI Validation software. It periodically generates a JSON document of validated prefixes that we pass onto our routers using GoRTR. It generates most of the data behind the graphs here.

The latest version of OctoRPKI, 1.2.0, was released at the end of October. It implements important security fixes, better memory management, and extended logging. It is the first validator to report detailed information about cryptographically invalid records to Sentry and performance data to distributed tracing tools.
GoRTR remains heavily used in production, including by transit providers. It can natively connect to other validators like rpki-client.

When we released our public rpki.json endpoint in early 2019, the idea was to enable anyone to see what Cloudflare was filtering.

The file is also used as a bootstrap by GoRTR, so that users can test a deployment. The file is cached on more than 200 data centers, ensuring quick and secure delivery of a list of valid prefixes, making RPKI more accessible for smaller networks and developers.

Between March 2019 and November 2020, the number of queries more than doubled and there are five times more networks querying this file.

The growth of queries follows approximately the rate of ROA creation (~5% per month).

Fall 2020 RPKI Update

A public RTR server is also available, with a plaintext endpoint on port 8282 and an SSH endpoint on port 8283. This allows us to test new versions of GoRTR before release.

Later in 2019, we also built a public dashboard where you can see in-depth RPKI validation. With a GraphQL API, you can now explore the validation data, test a list of prefixes, or see the status of the current routing table.

Fall 2020 RPKI Update

Currently, the API is used by BGPalerter, an open-source tool that detects routing issues (including hijacks!) from a stream of BGP updates.

Additionally, starting in November, you can access the historical data from May 2019. Data is computed daily and contains the unique records. The team behind the dashboard worked hard to provide a fast and accurate visualization of the daily ROA changes and the volumes of files changed over the day.

The future

We believe RPKI is going to continue growing, and we would like to thank the hundreds of network engineers around the world who are making the Internet routing more secure by deploying RPKI.

25% of routes are signed and 20% of the Internet is performing Origin Validation, and those numbers grow every day. We believe BGP will be safer well before deployment reaches 100%; for instance, once the remaining transit providers enable Origin Validation, it is unlikely a BGP hijack will make it to the front page of world news outlets.

While difficult to quantify, we believe that critical mass of protected resources will be reached in late 2021.

We will keep improving the tooling; OctoRPKI and GoRTR are open-source and we welcome contributions. In the near future, we plan on releasing a packaged version of GoRTR that can be directly installed on certain routers. Stay tuned!

Categories: Technology

ClickHouse Capacity Estimation Framework

Thu, 05/11/2020 - 14:12
ClickHouse Capacity Estimation Framework

We use ClickHouse widely at Cloudflare. It helps us with our internal analytics workload, bot management, customer dashboards, and many other systems. For instance, before Bot Management can analyze and classify our traffic, we need to collect logs. The Firewall Analytics tool needs to store and query data somewhere too. The same goes for our new Cloudflare Radar project. We are using ClickHouse for this purpose. It is a big database that can store huge amounts of data and return it on demand. This is not the first time we have talked about ClickHouse; there is a dedicated blog post on how we introduced ClickHouse for HTTP analytics.

Our biggest cluster has more than 100 nodes, and another has about half that number. Besides that, we have over 20 clusters with at least three nodes and a replication factor of three. Our current insertion rate is about 90M rows per second.

We use the standard approach in ClickHouse schema design. At the top level we have clusters, which hold shards; a shard is a group of nodes, and a node is a physical machine. You can find the technical characteristics of the nodes here. Stored data is replicated between nodes. Different shards hold different parts of the data, but inside each shard the replicas are equal.

Schema of one cluster:

Capacity planning

As engineers, we periodically face the question of how many additional nodes we have to order to support the growing demand for the next X months, with disk space as our prime concern.

ClickHouse stores extensive information in system tables about the operating processes, which is helpful. From the early days of using ClickHouse we added clickhouse_exporter as part of our monitoring stack. One of the metrics we are interested in is exposed from the table. Roughly speaking, clickhouse_exporter runs SQL queries asking how many bytes are used by each table. After that, these metrics are sent from Prometheus to Thanos and stored for at least a year.

Every time we wanted to make a forecast of disk usage we queried Thanos for historical data using this expression:

sum by (table) (sum(table_parts_bytes{cluster="{cluster}"}))

After that, we uploaded data in dataframes to a Jupyter notebook.

There were a few problems with this approach. Only a few people knew where the notebooks were and how to get them running. It wasn't trivial to download historical data. And most importantly, it was hard to look at past predictions and assess whether or not they were correct since results were not stored anywhere except internal blog posts. Also, as the number and size of clusters and products grew it became impossible for a single team to work on capacity planning and we needed to get engineers building products involved as they have the most context on how the growth will change in the future.

We wanted to automate this process and made calculations more transparent for our colleagues, including those who use ClickHouse for their services. Honestly, at the beginning we weren’t sure if it was even possible and what we would get out of it.

Finding the right metrics

The crucial factor in deciding to add new nodes is disk space, so that was the place to start. We decided to use the same metric we had used before with the manual approach.

Luckily, we started doing it for a cluster that had recently changed its topology. That cluster had two shards, with four and five nodes respectively. After the topology change, it was replaced with three shards of three nodes each, while the number of machines and the unreplicated data on the disks remained the same. However, this had an impact on our metrics: we previously had four replicated nodes in one shard and five in another; we took one node from the first shard and two nodes from the second, and created a new shard based on these three nodes. The new shard was empty, so we just added it, but the total amount of data in the first and second shards decreased in proportion to the remaining node count.

You can see on the graph below in April we had this sharp decrease caused by topology changes. We got ~550T instead of ~850T among all shards and replicas.

ClickHouse Capacity Estimation Framework

When we tried to train our model on the real data, the April drop made it infer a downward trend. This was incorrect, as we had only dropped replicated data; the trend for unreplicated data hadn’t changed. So we decided to take into account only unreplicated data, which insulates the model from topology changes and from node replacement in case of hardware problems.

The rule that we use for metrics now is:

sum by(cluster) ( max by (cluster, shardgroup) ( node_clickhouse_shardgroupinfo{} * on (instance) group_right (cluster, shardgroup) sum(table_parts_bytes{cluster="%s"}) by (instance) ))

We continue using the same metric from clickhouse_exporter, but instead of counting the whole amount of data we use the maximum of unreplicated data from every shard.
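Outside PromQL, the rule is easy to restate: within each shard, take the largest per-node byte count (any single replica holds the full unreplicated set), then sum those maxima per cluster. A rough Python equivalent, with made-up instance names and byte counts:

```python
def cluster_usage(shard_of, bytes_of):
    """shard_of maps instance -> (cluster, shardgroup); bytes_of maps
    instance -> stored bytes. Returns unreplicated usage per cluster."""
    shard_max = {}
    for instance, total in bytes_of.items():
        key = shard_of[instance]                      # (cluster, shardgroup)
        shard_max[key] = max(shard_max.get(key, 0), total)
    usage = {}
    for (cluster, _shard), peak in shard_max.items():
        usage[cluster] = usage.get(cluster, 0) + peak  # sum shard maxima
    return usage

shard_of = {"n1": ("logs", "s1"), "n2": ("logs", "s1"), "n3": ("logs", "s2")}
bytes_of = {"n1": 100, "n2": 98, "n3": 50}
print(cluster_usage(shard_of, bytes_of))  # {'logs': 150}
```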

In the image below there is the same cluster as in the image above but instead of counting the whole amount of data we look at unreplicated data from all shards. You can clearly see that we continued to grow and didn’t have any drop in data.

ClickHouse Capacity Estimation Framework

Another problem we faced was that we migrated some tables from one cluster to another because we were running out of space and it required immediate action. However, our model didn’t know that part of the tables didn’t live there anymore, and we didn’t want them to be a part of the prediction. To solve this problem we queried Prometheus to get the list of the tables that existed at the prediction time, then filtered historical data to include only these tables and used them as the input for training a model.

Loading the metrics

After determining the correct metrics, we needed to obtain them for our forecasting procedure. Our long-term metrics solution, Thanos, stores billions of data points. Querying it for a cluster with over one hundred nodes even for one day takes a huge amount of time, and we needed these data points for a year.

As we planned to use Python we wrote a small client using aiohttp that concurrently sends HTTP requests to Thanos. The requests are sent in chunks, and every request has start/end dates with a difference of one hour. We needed to get the data for the whole year once and then append new ones day by day. We got csv files: one file for one cluster. The client became a part of the project, and it runs once a day, queries Thanos for new metrics (previous day) and appends data to the files.
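The chunking itself is simple date arithmetic; here is a sketch of how a range breaks into the one-hour windows described above (the concurrent aiohttp dispatch is omitted):

```python
from datetime import datetime, timedelta

def hour_chunks(start: datetime, end: datetime):
    """Split [start, end) into one-hour windows, one Thanos request each."""
    chunks = []
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(hours=1), end)
        chunks.append((cursor, upper))
        cursor = upper
    return chunks

day = hour_chunks(datetime(2020, 11, 1), datetime(2020, 11, 2))
print(len(day))  # 24 requests cover a single day
```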

Forecasting procedure

At this point, we have the metrics collected in files; now it’s time to make a forecast. We needed something for time-series metrics, so we chose Prophet from Facebook. It’s very simple to use: you can follow the documentation and get good results even with the default parameters.

One challenge we faced using Prophet was the need to feed it one data point per day, while in the metric files we have thousands of points for every day. It seems logical to take the point at the end of each day, but that isn’t quite right. All tables have a retention period: the time for which we store data in ClickHouse. We don’t know when the data is cleared; it happens gradually throughout the day. So, we decided to take the maximum value for each day.
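Collapsing the intra-day samples to the daily maximum is a one-pass reduction; a small sketch with invented sample values:

```python
from datetime import datetime

def daily_max(points):
    """Keep one point per day: the maximum, since retention deletes data
    gradually throughout the day rather than at a fixed time."""
    per_day = {}
    for ts, value in points:
        day = ts.date()
        per_day[day] = max(per_day.get(day, value), value)
    return per_day

samples = [
    (datetime(2020, 11, 1, 3), 900),
    (datetime(2020, 11, 1, 23), 940),  # the day's peak, before cleanup
    (datetime(2020, 11, 2, 12), 910),
]
print(sorted(daily_max(samples).values()))  # [910, 940]
```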

Drawing Graphs

We chose Grafana to present the results, though we needed to store the predicted data points somewhere. The first thought was to use Prometheus, but because of high cardinality, with about 300,000 points in total across clusters and tables, we passed. We decided to use ClickHouse itself. We wanted to have both graphs, real and predicted, on the same dashboard. We had real data points in Prometheus and could do this with a mixed data source. However, the problem was the same as with loading metrics into files: for some clusters it’s impossible to obtain metrics for a long period of time. We added functionality to upload the real metrics to ClickHouse as well; now both real and predicted metrics are displayed in Grafana, taken from ClickHouse.


This is what we have in Grafana:

  • The yellow line shows the real data;
  • The green line was created based on Prophet output;
  • The red line shows the maximum disk capacity, which we have already increased twice.
ClickHouse Capacity Estimation Framework

We have a service running in Kubernetes that does all the toil, and we created an environment for other metrics: a place where we collect metrics from Thanos and expose them to Grafana in the required format. If we find the right metrics for accounting for other resources like IO or CPU, or other systems like Kafka, we can easily add them to our framework. We can also easily replace Prophet with another algorithm, and we can go back months and evaluate how close our predictions were to the real data.

With this automation we were able to spot that a couple of clusters were unexpectedly running out of disk space. We have over 20 clusters and get updates for all of them every day. This dashboard is used not only by our colleagues who are direct customers of ClickHouse but also by the team that plans server purchases. It is easy to read and costs developers no time.

This project was carried out by the Core SRE team to improve our daily work. If you are interested in this project, check out our job openings.

We didn’t know what we would get at the end; we discussed, looked for solutions, and tried different approaches. Huge thanks for this to Nicolae Vartolomei, Alex Semiglazov, and John Skopis.

Categories: Technology

Looking Ahead: Five Opportunities on The Horizon According to Tech Leaders

Tue, 03/11/2020 - 10:26
Looking Ahead: Five Opportunities on The Horizon According to Tech Leaders

Dozens of top leaders and thinkers from the tech industry and beyond recently joined us for a series of fireside chats commemorating Cloudflare’s 10th birthday. Over the course of 24 hours of conversation, these leaders shared their thoughts on everything from entrepreneurship to mental health — and how the Internet will continue to play a vital role.

Here are some of the highlights.

On the global opportunity for entrepreneurs

Anu Hariharan
Partner, Y Combinator’s Continuity Fund


Fast forwarding ten years from now, I think entrepreneurship is global, and you're already seeing signs of that. 27% of YC startups are headquartered outside the US. And I'm willing to bet that in a decade, at least 50% of YC startups will be headquartered outside the US. And so I think the sheer nature of the Internet democratizing information, more companies being global, like Facebook, Google, Uber — talent is everywhere. I think you will see multi-billion dollar companies coming out of other regions.

People have this perception that everything is a zero-sum game, or that we are already at peak Internet penetration. Absolutely not. The global market cap is ~$85 trillion. Less than 10% is e-commerce. Internet-enabled business is $8 trillion. So even if you play this out for another twenty years, Internet-enabled businesses should be at least $66 trillion. So we have a lot more to go. And on the zero-sum game that investors tend to think of, what we've gotten wrong is that most of these Internet-enabled businesses are expanding TAM.

Watch the full interview

On democratizing and normalizing mental health

Karan Singh
Co-founder and COO of Ginger


Our vision is a world where mental health is never an obstacle, and that's a never-ending vision. I don't think that will be done in 10 years, but I am hopeful that in 10 years or even well before that, this whole new virtual-first sort of care paradigm can really start to take shape, where you start digitally and then progress to an in-person should you need it.

And for some people who are more acute, or in specific situations, they absolutely do need to see an in-person provider. But for many people, starting virtual — virtual being the default — feels like a more democratic and equitable experience in the world.

Watch the full interview

On leveling the playing field

Jennifer Hyman,
CEO and co-founder of Rent the Runway


Where I'm optimistic is that I think that in a life post-vaccine, when kids are back in school, when things are a little bit more normal, businesses are no longer going to require their employees to come to work five days a week in the same way and in the same structure that existed in the past. We realize that because of technology, we can work more flexibly, we can work more virtually.

And I think that that is going to have unlocks for everyone, don't get me wrong, but it'll have huge unlocks for women who are often the ones making the sacrifice to spend more time with the kids, be at home, do all of the house-related leadership, so I think that this will be a great equalizer in many ways.

Watch the full interview

On expecting the unexpected

Eric Schmidt
Former CEO & Executive Chairman, Google
Co-Founder, Schmidt Futures


It seems to me that the gains in machine learning and the investment that everyone, including Cloudflare, Google, et cetera, is putting in it — are going to yield a whole new set of applications.

We should expect more of the unexpected because of the level of investment. And so the people who sit there and say, oh, you know, it's Apple and Google and Amazon and Microsoft and so forth, and it's all done. They're missing the narrative. The narrative is that there's a new platform emerging which the big guys and the new guys, the new little guys are going to compete over. And that competition will generally be incredibly helpful. It will produce very significant large companies as they figure out a way to monetize. But more importantly, it'll have an impact on society, both in terms of entertainment, as we saw with TikTok and its predecessors, but also in terms of information and productivity.

Watch the full interview

On the future of video conferencing

Eric Yuan
Founder and CEO of Zoom


For now, if we all work from home, from a productivity perspective there's no productivity loss. However, social interaction is a problem, and mental health is another. That's why, no matter how good we think it is now, it cannot deliver a better experience than a face-to-face meeting.

If I hadn’t seen you for a while, and I wanted to give you a big hug, you cannot feel my intimacy over Zoom, right? And if you are getting a cup of coffee, I cannot enjoy the smell, not like when you and I are in a Starbucks.

I think that technology-wise, in the future, with those cutting-edge technologies, we should believe that videoconferencing like Zoom can deliver a better experience than a face-to-face meeting. If I shake hands with you remotely, you can feel my handshake. And even if you speak a different language, with AI and real-time language translation, those technologies can truly help make sure that online communication is better than a face-to-face meeting. In the next 10 or 15 years, I think we will get there with some technology.

Watch the full interview

Quotes have been lightly edited for clarity and length.

Categories: Technology

The Serverlist: Serverless Wasm AI, Building Automatic Platform Optimizations, and more!

Sat, 31/10/2020 - 12:00
The Serverlist: Serverless Wasm AI, Building Automatic Platform Optimizations, and more!

Check out our twenty-first edition of The Serverlist below. Get the latest scoop on the serverless space, get your hands dirty with new developer tutorials, engage in conversations with other serverless developers, and find upcoming meetups and conferences to attend.

Sign up below to have The Serverlist sent directly to your mailbox.

Categories: Technology

Unwrap the SERVFAIL

Fri, 30/10/2020 - 12:00
Unwrap the SERVFAIL

We recently released a new version of Cloudflare Resolver which adds a piece of information called “Extended DNS Errors” (EDE) along with the response code under certain circumstances. This will be helpful in tracing DNS resolution errors and figuring out what went wrong behind the scenes.

Unwrap the SERVFAIL (image from: tight-lipped agent)

The DNS protocol was designed to map domain names to IP addresses. To inform the client about the result of a lookup, the protocol has a 4-bit field called the response code (RCODE). The logic to serve a response might look something like this:

function lookup(domain) {
    ...
    switch result {
    case "No error condition":
        return NOERROR with client expected answer
    case "No record for the request type":
        return NOERROR
    case "The request domain does not exist":
        return NXDOMAIN
    case "Refuse to perform the specified operation for policy reasons":
        return REFUSED
    default("Server failure: unable to process this query due to a problem with the name server"):
        return SERVFAIL
    }
}

try {
    lookup(domain)
} catch {
    return SERVFAIL
}

Although this basic logic hasn't changed much, protocol extensions such as DNSSEC have been added over the years, and the 4-bit RCODE no longer has enough space to express the server's internal status. To keep backward compatibility, DNS servers have to squeeze various statuses into the existing codes. This can confuse the client, especially with the catch-all SERVFAIL: something went wrong, but what exactly?

Most often, end users don't talk to authoritative name servers directly, but use a stub and/or a recursive resolver as an agent to acquire the information they need. When a user receives SERVFAIL, the failure can be any of the following:

  • The stub resolver fails to send the request.
  • The stub resolver doesn’t get a response.
  • The recursive resolver, which the stub resolver sends its query to, is overloaded.
  • The recursive resolver is unable to communicate with upstream authoritative servers.
  • The recursive resolver fails to verify the DNSSEC chain.
  • The authoritative server takes too long to respond.
  • ...

In such cases, it is nearly impossible for the user to know exactly what's wrong. The resolver usually gets the blame because, as the agent, it fails to get back the answer and doesn't return a clear reason for the failure in the response.

Keep backward compatibility

It seems we need to return more information, but (there's always a but) we also need to keep the behavior of existing clients unchanged.

One way is to extend the RCODE space, which is what the Extension Mechanisms for DNS (EDNS) did. EDNS defines an 8-bit EXTENDED-RCODE that serves as the high-order bits of the existing 4-bit RCODE; together they form a 12-bit integer. Unfortunately, this changes how the RCODE is processed and requires both client and server to fully support the new logic.

Another approach is to provide out-of-band data without touching the current RCODE. This is how Extended DNS Errors is defined. It introduces a new EDNS option containing an INFO-CODE that describes the error in detail, with an EXTRA-TEXT as an optional supplement. The option can be repeated as many times as needed, so it's possible for the client to get a full error chain with detailed messages. The INFO-CODE works much like the RCODE, but is 16 bits wide, while the EXTRA-TEXT is a UTF-8 encoded string. For example, let’s say a client sends a request to a resolver, and the requested domain has two name servers. The client may receive a SERVFAIL response with an OPT record (see below) which contains two extended errors: one from one of the authoritative servers, showing it's not ready to serve, and the other from the resolver, showing it cannot connect to the other name server.

;; OPT PSEUDOSECTION:
; ...
; EDE: 14 (Not Ready)
; EDE: 23 (Network Error): (cannot reach upstream)
; ...
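The EDE option's wire format (a 2-byte INFO-CODE followed by optional UTF-8 EXTRA-TEXT, carried inside a standard EDNS option) is simple enough to parse by hand. Below is a sketch in Python using only the standard library; parse_ede_options is a hypothetical helper, not part of any DNS library:

```python
import struct

# Walk the RDATA of an OPT pseudo-record and pull out Extended DNS Error
# options (option code 15). Each EDNS option on the wire is:
#   OPTION-CODE (2 bytes) | OPTION-LENGTH (2 bytes) | option data
# and for EDE the option data is a 2-byte INFO-CODE plus optional EXTRA-TEXT.
EDE_OPTION_CODE = 15

def parse_ede_options(opt_rdata: bytes):
    errors = []
    offset = 0
    while offset + 4 <= len(opt_rdata):
        code, length = struct.unpack_from("!HH", opt_rdata, offset)
        offset += 4
        data = opt_rdata[offset:offset + length]
        offset += length
        if code == EDE_OPTION_CODE and len(data) >= 2:
            (info_code,) = struct.unpack_from("!H", data, 0)
            extra_text = data[2:].decode("utf-8", errors="replace")
            errors.append((info_code, extra_text))
    return errors

# Two EDE options, mirroring the example above: 14 (Not Ready) with no
# text, and 23 (Network Error) with a short explanation.
rdata = (struct.pack("!HHH", 15, 2, 14) +
         struct.pack("!HH", 15, 2 + len(b"cannot reach upstream")) +
         struct.pack("!H", 23) + b"cannot reach upstream")
print(parse_ede_options(rdata))  # [(14, ''), (23, 'cannot reach upstream')]
```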

Google has something similar in their DoH JSON API, which provides diagnostic information in the "Comment" field.

Let's dig into it

Our service has initial support for the draft version of Extended DNS Errors while we work out the best practice. As we mentioned above, this is not a breaking change, and existing clients will not be affected: the additional options can be safely ignored, since the RCODE stays the same.

If you have a newer version of dig, you can simply check it out with a known problematic domain. As you can see, due to DNSSEC verification failing, the RCODE is still SERVFAIL, but the extended error shows the failure is "DNSSEC Bogus".

$ dig @

; <<>> DiG 9.16.4-Debian <<>> @
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 1111
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; EDE: 6 (DNSSEC Bogus)
;; QUESTION SECTION:
;			IN	A

;; Query time: 111 msec
;; SERVER:
;; WHEN: Wed Sep 01 00:00:00 PDT 2020
;; MSG SIZE  rcvd: 52

Note that Extended DNS Errors rely on EDNS. To receive one, the client needs to support EDNS and enable it in the request. At the time of writing this blog post, about 17% of the queries we received within a short sampling window had EDNS enabled. We hope this information will help you uncover the root cause of a SERVFAIL in the future.
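To make "enabling EDNS" concrete at the wire level, here is a standard-library-only sketch that builds a minimal DNS query carrying an OPT pseudo-record; build_edns_query is a hypothetical helper written for illustration:

```python
import struct

def build_edns_query(qname: str, qtype: int = 1, payload: int = 1232) -> bytes:
    """Build a minimal DNS query with an OPT pseudo-record (EDNS0),
    which is required for the resolver to return EDE options."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1
    header = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 1)
    # Question: length-prefixed labels, then QTYPE and QCLASS=IN
    question = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in qname.rstrip(".").split(".")
    ) + b"\x00" + struct.pack("!HH", qtype, 1)
    # OPT RR: root name, TYPE=41, CLASS=requested UDP payload size,
    # TTL=extended-rcode/version/flags (all zero), RDLEN=0
    opt = b"\x00" + struct.pack("!HHIH", 41, payload, 0, 0)
    return header + question + opt

msg = build_edns_query("example.com")
```

Sending msg over UDP to a resolver (and parsing the response) is left out for brevity; the point is only the presence of the OPT record in the additional section.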

Categories: Technology

Introducing Bot Analytics

Thu, 29/10/2020 - 12:00
Introducing Bot Analytics

Bots — both good and bad — are everywhere on the Internet. Roughly 40% of Internet traffic is automated. Fortunately, Cloudflare offers a tool that can detect and block unwanted bots: we call it Bot Management. This is the most recent platform in our long history of detecting bots for our customers. In fact, Cloudflare has always offered some form of bot detection. Over the past two years, our team has focused on building advanced detection engines, innovating as bots become more sophisticated, and creating new features.

Today, we are releasing Bot Analytics to help you visualize your automated traffic.


It’s worth including some background for those who are new to bots.

Many websites expect human behavior. When I shop online, I behave as anyone else would: I might search for a few items, read reviews when I find something interesting, and eventually complete an order. This is expected. It is a standard use of the Internet.

Introducing Bot Analytics

Unfortunately, without protection these sites can be ripe for exploitation. Those shoes I was looking at? They are limited edition sneakers that resell for five times the price. Sneaker hoarders clamor at the chance to buy a pair (or fifty). Or perhaps I just added a book to my cart: there are probably hundreds of online retailers that sell the same book, each one eager to offer the best price. These retailers desperately want to know what their competitors’ prices are.

You can see where this is going. While most humans make good use of the Internet, some use automated tools to perform abuse at scale. For example, attackers will deplete sneaker inventories by using automated bots to check out quickly. By the time humans click “add to cart,” bots have already paid for shipping. Humans hardly stand a chance. Similarly, online retailers keep track of their competitors with “price scraping” bots that collect pricing information. So when one retailer lowers a book price to $10, another retailer’s bot will respond by pricing at $9.99. This is how we end up with weird prices like $12.32 for toilet paper. Worst of all, malicious bots are incentivized to hide their identities. They’re hidden among us.

Introducing Bot Analytics

Not all bots are bad. Cloudflare maintains a list of verified good bots that we keep separated from the rest. Verified bots are usually transparent about who they are: DuckDuckGo, for example, publicly lists the IP addresses it uses for its search engine. This is a well-intentioned service that happens to be automated, so we verified it. We also verify bots for error monitoring and other tools.

Enter: Bot Analytics
Introducing Bot Analytics

As discussed earlier, we built a Bot Management platform that intelligently detects bots on the Internet, allowing our customers to block bad ones and allow good ones. If you’re curious about how our solution works, read here.

Beginning today, we are going to show you the bots that reach your website. You can see these bots with a new tool called Bot Analytics. It’s fast, accurate, and loaded with information. You can query data up to one month in the past with no noticeable lag. To accomplish this, we exposed the data with GraphQL and paired it with adaptive bitrate (ABR) technology to dynamically load content. If you already have Bot Management added to your Cloudflare account, Bot Analytics is included in your service. Open up your dashboard and let’s take a tour…

The Tour

First: where to go? Bot Analytics lives under the Firewall tab of the dashboard. Once you’re in the Firewall, go to “Overview” and click the second thumbnail on the left. Remember, Bot Management must be added to your account for full access to analytics.

Introducing Bot Analytics

It’s worth noting that Enterprise sites without Bot Management can see a snapshot of their bot traffic. This data is updated in real time and should help you determine if you have a bot problem. Generally speaking, if you have a double-digit percentage of automated traffic, you might be spending more on origin costs than you have to. More importantly, you might be losing revenue or sensitive information to inventory hoarding and credential stuffing.

“Requests by bot score” is the first section on the page. Here, we show traffic over time, but we split it vertically by the traffic type. Green segments represent verified bots, while shades of purple and blue show varying degrees of bot/human likelihood.

Introducing Bot Analytics

“Bot score distribution” is next. This shows similar data, but we display it horizontally without the notion of time. Use the slider below to filter on subsets of traffic and watch the rest of the page adapt.

Introducing Bot Analytics

We recommend that you use the slider to find your ideal bot threshold. In other words: what is the cutoff for suspicious traffic on your site? We generally consider traffic below 30 to be automated, but customers might choose to challenge traffic below 40 or block traffic below 10 (you can even do both!). You should set a threshold that is ambitious but not too aggressive. If your traffic looks like the example below, consider setting a threshold at a “drop off” point like 3 or 14. Why? Notice that the request density is very high near scores 1-2 and 12-13. Many of these requests will have similar characteristics, meaning that the scores immediately above them (3 and 14) offer some differentiating quality. These are the most promising places to segment your bot rules. Notably, not every graph is this pronounced.

Introducing Bot Analytics

“Bot score source” sits lower on the page. Here, you can examine the detection engines that are responsible for scoring your traffic. If you can’t remember the purpose of each engine, simply hover over the tooltip to view a brief description. Customers may wonder why some requests are flagged as “not computed.” This commonly occurs when Cloudflare has issued an error page on your behalf. Perhaps a visitor’s request was met with a gateway timeout (error 504), in which case Cloudflare responded with a branded error page. The error page would not have warranted a challenge or a block, so we did not spend time calculating a bot score. We published another blog post that provides an overview of the most common sources, including machine learning and heuristics.

Introducing Bot Analytics

“Top requests by source” is the final section of Bot Analytics. Although it’s not quite as colorful as the sections above, this section grounds Bot Analytics in highly specific data. You can filter or exclude request attributes, including IP addresses, user agents, and ASNs. In the next section, we’ll use this to spot a bot attack.

Let's Spot A Bot Attack!

First, I’m going to use the “bot score source” tool to select the most obvious bot requests — those detected by our heuristics engine. This provides us with the following information, some of which has been redacted for privacy reasons:

Introducing Bot Analytics

I already suspect a correlation between a few of these attributes. First, the IP addresses all have very similar request counts. No human would access a site 22,000 times, and the uniformity across IPs 2-5 suggests foul play. Not surprisingly, the same pattern occurs for user agents on the right. User agents tell us about the browser and device associated with a particular request. When Bot Analytics shows this much uniformity and presents clear anomalies in country and ASN, I get suspicious (and you should too). I’m now going to filter on these anomalies to see if my instinct is right:

Introducing Bot Analytics

The trends hold true — to be sure, I briefly expanded the table and found nine separate IP addresses exhibiting the same behavior. This is likely an aggressive content scraper. Notably, it is not marked as a verified bot, so Bot Management issued the lowest possible score and flagged it as “automated.” At the top of Bot Analytics, I will narrow down the traffic and keep the time period at 24 hours:

Introducing Bot Analytics

The most severe attacks come and go. This traffic is clearly sustained, and my best guess is that someone is frequently scraping the homepage for content. This isn’t the most malicious of attacks, but content is still being taken. If I wanted to, I could set a firewall rule to target this bot score or any of the filters I used.
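As a sketch of what such a rule might look like in the Firewall Rules expression language (the exact fields and actions available depend on your plan; treat these as illustrative):

```
(cf.bot_management.score lt 10)                                        -> Block
(cf.bot_management.score lt 40 and not cf.bot_management.verified_bot) -> Challenge
```

The second rule uses the verified-bot field so that known good bots are never challenged, matching the threshold discussion above.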

Try It Out

As a reminder, all Enterprise customers will be able to see a snapshot of their bot traffic. Even if you don’t have Bot Management for your site, visit the Firewall for some high-level insights that are updated in real time.

Introducing Bot Analytics

And for those of you with Bot Management — check out Bot Analytics! It’s live now, and we hope you’ll have fun using it. Keep your eyes open for new analytics features in the coming months.

Categories: Technology

Diving into /proc/pid/mem

Tue, 27/10/2020 - 12:00
Diving into /proc/pid/mem

A few months ago, after reading about Cloudflare doubling its intern class size, I quickly dusted off my CV and applied for an internship. Long story short: now, a couple of months later, I found myself staring into Linux kernel code and adding a pretty cool feature to gVisor, a Linux container runtime.

My internship was under the Emerging Technologies and Incubation group on a project involving gVisor. A co-worker contacted my team about not being able to read the debug symbols of stack traces inside the sandbox. For example, when the isolated process crashed, this is what we saw in the logs:

*** Check failure stack trace: ***
    @     0x7ff5f69e50bd  (unknown)
    @     0x7ff5f69e9c9c  (unknown)
    @     0x7ff5f69e4dbd  (unknown)
    @     0x7ff5f69e55a9  (unknown)
    @     0x5564b27912da  (unknown)
    @     0x7ff5f650ecca  (unknown)
    @     0x5564b27910fa  (unknown)

Obviously, this wasn't very useful. I eagerly volunteered to fix this stack unwinding code - how hard could it be?

After some debugging, we found that the logging library used in the project opened /proc/self/mem to look for ELF headers at the start of each memory-mapped region. This was necessary to calculate an offset to find the correct addresses for debug symbols.

It turns out this mechanism is rather common. The stack unwinding code is often run in weird contexts - like a SIGSEGV handler - so it would not be safe to dereference the mapped memory addresses directly to read the ELF: doing so could trigger another SIGSEGV. And a SIGSEGV inside a SIGSEGV handler means either termination via the default handler for a segfault, or (if one sets SA_NODEFER) recursing into the same handler again and again, leading to a stack overflow.

However, inside gVisor, each call of open() on /proc/self/mem resulted in ENOENT, because the entire /proc/self/mem file was missing. In order to provide a robust sandbox, gVisor has to carefully reimplement the Linux kernel interfaces. This particular /proc file was simply unimplemented in the virtual file system of Sentry, one of gVisor's sandboxing components.
Marek asked the devs on the project chat and got confirmation - they would be happy to accept a patch implementing this file.
Diving into /proc/pid/mem

The easy way out would have been to make a small, local patch to the unwinder behavior, yet I found myself diving into the Linux kernel trying to figure how the mem file worked in an attempt to implement it in Sentry's VFS.

What does /proc/[pid]/mem do?

The file itself is quite powerful, because it allows raw access to the virtual address space of a process. According to manpages, the documented file operations are open(), read() and lseek(). Typical use cases are debugging tasks or dumping process memory.
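As a minimal illustration of that raw access (a sketch, Linux only): the snippet below reads a buffer back out of our own address space through /proc/self/mem. Because the reader and the target are the same process, the access checks discussed below pass trivially.

```python
import ctypes

# Place a known string somewhere in our address space.
buf = ctypes.create_string_buffer(b"hello from /proc/self/mem")
addr = ctypes.addressof(buf)

# Seek to the buffer's virtual address and read it back through the
# kernel's mem file instead of dereferencing the pointer directly.
with open("/proc/self/mem", "rb") as mem:
    mem.seek(addr)
    data = mem.read(len(buf.value))

print(data.decode())  # hello from /proc/self/mem
```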

Opening the file

When a process wants to open the file, the kernel does the file permissions check, looks up the associated operations for mem and invokes a method called proc_mem_open. It retrieves the associated task and calls a method named mm_access.

/*
 * Grab a reference to a task's mm, if it is not already going away
 * and ptrace_may_access with the mode parameter passed to it
 * succeeds.
 */

Seems relatively straightforward, right? The special thing about mm_access is that it verifies the permissions the current task has regarding the task to which the memory belongs. If the current task and target task do not share the same memory manager, the kernel invokes a method named __ptrace_may_access.

/*
 * May we inspect the given task?
 * This check is used both for attaching with ptrace
 * and for allowing access to sensitive information in /proc.
 *
 * ptrace_attach denies several cases that /proc allows
 * because setting up the necessary parent/child relationship
 * or halting the specified task is impossible.
 */

According to the manpages, a process which would like to read from an unrelated /proc/[pid]/mem file should have access mode PTRACE_MODE_ATTACH_FSCREDS. This check does not verify that a process is attached via PTRACE_ATTACH, but rather if it has the permission to attach with the specified credentials mode.

Access checks

After skimming through the function, you will see that access is granted if the current task belongs to the same thread group as the target task. Otherwise, access is denied unless one of the following conditions is met (depending on whether PTRACE_MODE_FSCREDS or PTRACE_MODE_REALCREDS is set, the check uses either the filesystem UID/GID, which is typically the same as the effective UID/GID, or the real UID/GID):

  • the current task's credentials (UID, GID) match up with the credentials (real, effective and saved set-UID/GID) of the target process
  • the current task has CAP_SYS_PTRACE inside the user namespace of the target process

In the next check, access is denied if the current task does not have CAP_SYS_PTRACE inside the user namespace of the target task and the target's dumpable attribute is not set to SUID_DUMP_USER. The dumpable attribute is typically required to allow producing core dumps.

After these three checks, we also go through the commoncap Linux Security Module (and other LSMs) to verify our access mode is fine. LSMs you may know are SELinux and AppArmor. The commoncap LSM performs the checks on the basis of effective or permitted process capabilities (depending on the mode being FSCREDS or REALCREDS), allowing access if

  • the capabilities of the current task are a superset of the capabilities of the target task, or
  • the current task has CAP_SYS_PTRACE in the target task's user namespace

In conclusion, one has access (with only commoncap LSM checks active) if:

  • the current task is in the same task group as the target task, or
  • the current task has CAP_SYS_PTRACE in the target task's user namespace, or
  • the credentials of the current and target task match up in the given credentials mode, the target task is dumpable, they run in the same user namespace and the target task's capabilities are a subset of the current task's capabilities

I highly recommend reading through the ptrace manpages to dig deeper into the different modes, options and checks.

Reading from the file

Since all the access checks occur when opening the file, reading from it is quite straightforward. When one invokes read() on a mem file, it calls up mem_rw (which actually can do both reading and writing).

To avoid using lots of memory, mem_rw performs the copy in a loop and buffers the data in an intermediate page. mem_rw has a hidden superpower: it uses FOLL_FORCE to bypass permission checks on user-owned pages (treating pages marked as non-readable/non-writable as readable and writable).

mem_rw has other specialties, such as its error handling. Some interesting cases are:

  • if the target task has exited after opening the file descriptor, performing read() will always succeed with reading 0 bytes
  • if the initial copy from the target task's memory to the intermediate page fails, it does not always return an error but only if no data has been read

You can also perform lseek on the file excluding SEEK_END.

How it works in gVisor

Luckily, gVisor already implemented ptrace_may_access as kernel.task.CanTrace, so one can avoid reimplementing all the ptrace access logic. However, the implementation in gVisor is less complicated due to the lack of support for PTRACE_MODE_FSCREDS (which is still an open issue).

When a new file descriptor is open()ed, the GetFile method of the virtual Inode is invoked, therefore this is where the access check naturally happens. After a successful access check, the method returns a fs.File. The fs.File implements all the file operations you would expect such as Read() and Write(). gVisor also provides tons of primitives for quickly building a working file structure so that one does not have to reimplement a generic lseek() for example.

In case a task invokes a Read() call onto the fs.File, the Read method retrieves the memory manager of the file’s Task.
Accessing the task's memory manager is a breeze with comfortable CopyIn and CopyOut methods, with interfaces similar to io.Writer and io.Reader.

After implementing all of this, we finally got a useful stack trace.

*** Check failure stack trace: ***
    @     0x7f190c9e70bd  google::LogMessage::Fail()
    @     0x7f190c9ebc9c  google::LogMessage::SendToLog()
    @     0x7f190c9e6dbd  google::LogMessage::Flush()
    @     0x7f190c9e75a9  google::LogMessageFatal::~LogMessageFatal()
    @     0x55d6f718c2da  main
    @     0x7f190c510cca  __libc_start_main
    @     0x55d6f718c0fa  _start

Conclusion

A comprehensive victory! The /proc/<pid>/mem file is an important mechanism that gives insight into contents of process memory. It is essential to stack unwinders to do their work in case of complicated and unforeseeable failures. Because the process memory contains highly-sensitive information, data access to the file is determined by a complex set of poorly documented rules. With a bit of effort, you can emulate /proc/[PID]/mem inside gVisor’s sandbox, where the process only has access to the subset of procfs that has been implemented by the gVisor authors and, as a result, you can have access to an easily readable stack trace in case of a crash.

Now I can't wait to get the PR merged into gVisor.

Categories: Technology

Road to gRPC

Mon, 26/10/2020 - 16:40
Road to gRPC

Cloudflare launched support for gRPC® during our 2020 Birthday Week. We’ve been humbled by the immense interest in the beta, and we’d like to thank everyone that has applied and tried out gRPC! In this post we’ll do a deep-dive into the technical details on how we implemented support.

What is gRPC?

gRPC is an open source RPC framework running over HTTP/2. RPC (remote procedure call) is a way for one machine to tell another machine to do something, rather than calling a local function in a library. RPC has a long history in distributed computing, with different implementations focusing on different areas. What makes gRPC unique are the following characteristics:

  • It requires the modern HTTP/2 protocol for transport, which is now widely available.
  • A full client/server reference implementation, demo, and test suites are available as open source.
  • It does not specify a message format, although Protocol Buffers are the preferred serialization mechanism.
  • Both clients and servers can stream data, which avoids having to poll for new data or create new connections.

In terms of the protocol, gRPC uses HTTP/2 frames extensively: requests and responses look very similar to a normal HTTP/2 request.

What’s unusual, however, is gRPC’s usage of the HTTP trailer. While trailers are not widely used in the wild, they have been around since 1999, when they were defined in the original HTTP/1.1 specification, RFC 2616. HTTP message headers must come before the HTTP message body, but an HTTP trailer is a set of headers that can be appended after the body. Because there are few use cases for trailers, many server and client implementations don't fully support them. While HTTP/1.1 needs to use chunked transfer encoding for its body to send an HTTP trailer, in HTTP/2 the trailer is carried in a HEADERS frame after the DATA frames of the body.

There are some cases where an HTTP trailer is useful. For example, we use the HTTP response code to indicate the status of a request, but the response code sits on the very first line of the HTTP response, so it has to be decided very early. A trailer makes it possible to send some metadata after the body. For example, let’s say your web server sends a stream of large data (which is not a fixed size), and at the end you want to send a SHA1 checksum of the data so that the client can verify the contents. Normally, this is not possible with an HTTP status code or a response header, which must be sent at the beginning of the response. Using an HTTP trailer, you can send another header (e.g. Checksum: XXX) after having sent all the data.
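On the wire, the checksum scenario above looks roughly like this with HTTP/1.1 chunked encoding (the chunk sizes and checksum value are illustrative):

```
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Transfer-Encoding: chunked
Trailer: Checksum

400
...first 1024 bytes of data...
400
...next 1024 bytes of data...
0
Checksum: 2ef7bde608ce5404e97d5f042f95f89f1c232871

```

The "Trailer" header announces which fields will follow the final zero-length chunk, and the Checksum field only appears once all the data has been sent.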

gRPC uses HTTP trailers for two purposes. To begin with, it sends its final status (grpc-status) as a trailer header after the content has been sent. The second reason is to support streaming use cases. These use cases last much longer than normal HTTP requests. The HTTP trailer is used to give the post processing result of the request or the response. For example if there is an error during streaming data processing, you can send an error code using the trailer, which is not possible with the header before the message body.

Here is a simple example of a gRPC request and response in HTTP/2 frames:

Road to gRPC

Adding gRPC support to the Cloudflare Edge

Since gRPC uses HTTP/2, it may sound easy to natively support gRPC, because Cloudflare already supports HTTP/2. However, we had a couple of issues:

  • The HTTP request/response trailer headers were not fully supported by our edge proxy: Cloudflare uses NGINX to accept traffic from eyeballs, and it has limited support for trailers. Further complicating things, requests and responses flowing through Cloudflare go through a set of other proxies.
  • HTTP/2 to origin: our edge proxy uses HTTP/1.1 to fetch objects (whether dynamic or static) from origin. To proxy gRPC traffic, we needed to support connections to customer gRPC origins over HTTP/2.
  • gRPC streaming needs to allow bidirectional request/response flow: gRPC has two types of protocol flow; one is unary, which is a simple request and response, and another is streaming, which allows non-stop data flow in each direction. To fully support the streaming, the HTTP message body needs to be sent after receiving the response header on the other end. For example, client streaming will keep sending a request body after receiving a response header.

Due to these reasons, gRPC requests would break when proxied through our network. To overcome these limitations, we looked at various solutions. For example, NGINX has a gRPC upstream module to support HTTP/2 gRPC origin, but it’s a separate module, and it also requires HTTP/2 downstream, which cannot be used for our service, as requests cascade through multiple HTTP proxies in some cases. Using HTTP/2 everywhere in the pipeline is not realistic, because of the characteristics of our internal load balancing architecture, and because it would have taken too much effort to make sure all internal traffic uses HTTP/2.

Road to gRPC

Converting to HTTP/1.1?

Ultimately, we discovered a better way: convert gRPC messages to HTTP/1.1 messages without a trailer inside our network, and then convert them back to HTTP/2 before sending the request off to origin. This would work with most HTTP proxies inside Cloudflare that don't support HTTP trailers, and we would need minimal changes.

Rather than invent our own format, we used one the gRPC community had already come up with: gRPC-web, an HTTP/1.1-compatible modification of the original HTTP/2-based gRPC specification. It was originally designed for web browsers, which lack direct access to HTTP/2 frames. With gRPC-web, the HTTP trailer is moved into the body, so we don’t need to worry about HTTP trailer support inside the proxy. It also comes with streaming support. The resulting HTTP/1.1 message can still be inspected by our security products, such as WAF and Bot Management, providing the same level of security that Cloudflare brings to other HTTP traffic.
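The framing that lets gRPC-web carry trailers in the body is simple: each frame is a 1-byte flag plus a 4-byte big-endian length, and the most significant flag bit marks the trailers frame, whose payload is HTTP/1.1-style header lines. A small sketch of that framing, not tied to any particular gRPC library:

```python
import struct

# gRPC-web frame flags: 0x00 for a data frame, 0x80 for the trailers
# frame (whose payload is header lines such as "grpc-status: 0").
DATA_FRAME, TRAILER_FRAME = 0x00, 0x80

def encode_frame(flags: int, payload: bytes) -> bytes:
    # 1-byte flags + 4-byte big-endian length, then the payload.
    return struct.pack("!BI", flags, len(payload)) + payload

def decode_frames(body: bytes):
    frames, offset = [], 0
    while offset < len(body):
        flags, length = struct.unpack_from("!BI", body, offset)
        offset += 5
        frames.append((flags, body[offset:offset + length]))
        offset += length
    return frames

# A body with one message frame followed by the trailers frame.
body = (encode_frame(DATA_FRAME, b"<serialized protobuf message>") +
        encode_frame(TRAILER_FRAME, b"grpc-status: 0\r\n"))
print(decode_frames(body))
```

Because the trailers travel inside the body, any HTTP/1.1 proxy in the path can forward them without special trailer support, which is exactly the property the conversion relies on.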

When an HTTP/2 gRPC message is received at Cloudflare’s edge proxy, the message is “converted” to the HTTP/1.1 gRPC-web format. Once the gRPC message is converted, it goes through our pipeline, applying services such as WAF, Cache and Argo the same way any normal HTTP request would.

Right before a gRPC-web message leaves the Cloudflare network, it needs to be “converted back” to HTTP/2 gRPC again. Requests that are converted by our system are marked so that our system won’t accidentally convert gRPC-web traffic originated from clients.

HTTP/2 Origin Support

One of the engineering challenges was to support using HTTP/2 to connect to origins. Before this project, Cloudflare didn't have the ability to connect to origins via HTTP/2.

Therefore, we decided to build HTTP/2 origin support in-house. We built a standalone origin proxy that can connect to origins via HTTP/2, and on top of this new platform we implemented the conversion logic for gRPC. gRPC support is the first feature to take advantage of it; broader support for HTTP/2 connections to origin servers is on the roadmap.

gRPC Streaming Support

As explained above, gRPC has a streaming mode in which the request or response body is sent as a stream; over the lifetime of a gRPC request, message blocks can be sent at any time. At the end of the stream, a HEADERS frame indicates that the stream is finished. When converted to gRPC-web, we send the body using chunked encoding and keep the connection open, accepting data in both directions until we receive the trailer block that marks the end of the stream. This requires our proxy to support bidirectional transfer.
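The end-of-stream detection described above can be sketched as an incremental parser over the chunked body (a simplified illustration with my own naming, not Cloudflare's proxy code): frames are buffered until complete, and the stream ends when the trailer frame (flag bit 0x80) arrives.

```python
import struct

class GrpcWebStreamParser:
    """Incrementally parse gRPC-web frames from chunked body data and
    report end-of-stream when the trailer frame arrives."""

    def __init__(self):
        self.buf = b""
        self.messages = []   # completed message payloads
        self.trailers = None  # set once the trailer frame is seen

    def feed(self, chunk: bytes) -> bool:
        """Feed one body chunk; return True once the stream has ended."""
        self.buf += chunk
        while len(self.buf) >= 5:
            flag, length = struct.unpack(">BI", self.buf[:5])
            if len(self.buf) < 5 + length:
                break  # frame incomplete; wait for more data
            payload = self.buf[5:5 + length]
            self.buf = self.buf[5 + length:]
            if flag & 0x80:  # trailer frame ends the stream
                self.trailers = payload
                return True
            self.messages.append(payload)
        return self.trailers is not None
```

Because chunk boundaries need not align with frame boundaries, the parser must tolerate frames split across arbitrarily many chunks, which is the same property the bidirectional proxy needs.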

For example, client streaming is an interesting mode where the server already responds with a response code and its header, but the client is still able to send the request body.

Interoperability Testing

Every new feature at Cloudflare needs proper testing before release. During initial development, we used the Envoy proxy with its gRPC-web filter feature and the official gRPC examples. We prepared a test environment with Envoy and a gRPC test origin to make sure that the edge proxy handled gRPC requests properly. Requests from the gRPC test client are sent to the edge proxy, converted to gRPC-web, and forwarded to the Envoy proxy. Envoy then converts them back to gRPC and sends them to the gRPC test origin. We were able to verify the basic behavior this way.

Once we had basic functionality ready, we also needed to make sure both ends’ conversion functionality worked properly. To do that, we built deeper interoperability testing.

We referenced the existing gRPC interoperability test cases for our test suite and ran the first iteration of tests between the edge proxy and the new origin proxy locally.

For the second iteration of tests we used different gRPC implementations. For example, some servers sent their final status (grpc-status) in a trailers-only response when there was an immediate error. This response would contain the HTTP/2 response headers and trailer in a single HEADERS frame block with both the END_STREAM and END_HEADERS flags set. Other implementations sent the final status as a trailer in a separate HEADERS frame.
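The two response shapes could be told apart roughly like this, modeling each HEADERS frame as a (headers, end_stream) pair (a hypothetical representation for illustration, not any real HTTP/2 library's API):

```python
def classify_grpc_response(header_frames):
    """Distinguish a 'trailers-only' gRPC response (a single HEADERS frame
    carrying grpc-status with END_STREAM set) from a normal response whose
    final status arrives in a separate trailing HEADERS frame.

    header_frames: list of (headers_dict, end_stream_flag) tuples.
    """
    first_headers, first_end_stream = header_frames[0]
    if first_end_stream and "grpc-status" in first_headers:
        return "trailers-only"
    last_headers, _ = header_frames[-1]
    if "grpc-status" in last_headers:
        return "separate-trailers"
    return "incomplete"
```

A converter has to handle both shapes, since either is valid on the wire and different gRPC stacks pick different ones.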

After verifying interoperability locally we ran the test harness against a development environment that supports all the services we have in production. We were then able to ensure no unintended side effects were impacting gRPC requests.

We love dogfooding! One of the first services we successfully deployed edge gRPC support to is the Cloudflare drand randomness beacon. Onboarding was easy and we’ve been running the beacon in production for the last few weeks without a hitch.


Supporting a new protocol is exciting work! Implementing new technologies in existing systems is intricate, often involving tradeoffs between speed of implementation and overall system complexity. In the case of gRPC, we were able to build support quickly and in a way that did not require significant changes to the Cloudflare edge. This was accomplished by carefully considering implementation options before settling on the idea of converting between HTTP/2 gRPC and the HTTP/1.1 gRPC-web format. This design choice made service integration quicker and easier while still satisfying our users’ expectations and constraints.

If you are interested in using Cloudflare to secure and accelerate your gRPC service, you can read more here. And if you want to work on interesting engineering challenges like the one described in this post, apply!

gRPC® is a registered trademark of The Linux Foundation.

Categories: Technology

A Last Call for QUIC, a giant leap for the Internet

Thu, 22/10/2020 - 15:08
A Last Call for QUIC, a giant leap for the Internet

QUIC is a new Internet transport protocol for secure, reliable and multiplexed communications. HTTP/3 builds on top of QUIC, leveraging the new features to fix performance problems such as Head-of-Line blocking. This enables web pages to load faster, especially over troublesome networks.

QUIC and HTTP/3 are open standards that have been under development in the IETF for almost exactly 4 years. On October 21, 2020, following two rounds of Working Group Last Call, draft 32 of the family of documents that describe QUIC and HTTP/3 was put into IETF Last Call. This is an important milestone for the group. We are now telling the entire IETF community that we think we're almost done and that we'd welcome their final review.

Speaking personally, I've been involved with QUIC in some shape or form for many years now. Earlier this year I was honoured to be asked to help co-chair the Working Group. I'm pleased to help shepherd the documents through this important phase, and grateful for the efforts of everyone involved in getting us there, especially the editors. I'm also excited about future opportunities to evolve on top of QUIC v1 to help build a better Internet.

There are two aspects to protocol development. One aspect involves writing and iterating upon the documents that describe the protocols themselves. Then, there's implementing, deploying and testing libraries, clients and/or servers. These aspects operate hand in hand, helping the Working Group move towards satisfying the goals listed in its charter. IETF Last Call marks the point that the group and their responsible Area Director (in this case Magnus Westerlund) believe the job is almost done. Now is the time to solicit feedback from the wider IETF community for review. At the end of the Last Call period, the stakeholders will take stock, address feedback as needed and, fingers crossed, go onto the next step of requesting the documents be published as RFCs on the Standards Track.

Although specification and implementation work hand in hand, they often progress at different rates, and that is totally fine. The QUIC specification has been mature and deployable for a long time now. HTTP/3 has been generally available on the Cloudflare edge since September 2019, and we've been delighted to see support roll out in user agents such as Chrome, Firefox, Safari, curl and so on. Although draft 32 is the latest specification, the community has for the time being settled on draft 29 as a solid basis for interoperability. This shouldn't be surprising: as foundational aspects crystallize, the scope of changes between iterations decreases. For the average person in the street, there's not really much difference between 29 and 32.

So today, if you visit a website with HTTP/3 enabled, you’ll probably see response headers that contain Alt-Svc: h3-29="…". And in a while, once Last Call completes and the RFCs ship, you'll start to see websites simply offer Alt-Svc: h3="…" (note, no draft version!).
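For example, a client could check an Alt-Svc header value for advertised HTTP/3 versions with a small parser like this (an illustrative sketch, not a complete RFC 7838 Alt-Svc parser):

```python
import re

def h3_versions(alt_svc: str) -> list:
    """Extract advertised HTTP/3 protocol IDs (e.g. 'h3', 'h3-29')
    from an Alt-Svc response header value."""
    return re.findall(r'(h3(?:-\d+)?)="[^"]*"', alt_svc)
```

During the draft era a server may advertise several draft versions at once; once the RFCs ship, the same parser would simply see the bare "h3" identifier.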

Need a deep dive?

We've collected a bunch of resource links for those who want to dig deeper. If you're more of an interactive visual learner, you might be pleased to hear that I've also been hosting a series on Cloudflare TV called "Levelling up Web Performance with HTTP/3". There are over 12 hours of content including the basics of QUIC, ways to measure and debug the protocol in action using tools like Wireshark, and several deep dives into specific topics. I've also been lucky to have some guest experts join me along the way. The list below gives an overview of the episodes that are available on demand.

Episode 1: Introduction to QUIC.
Episode 2: Introduction to HTTP/3.
Episode 3: QUIC & HTTP/3 logging and analysis using qlog and qvis. Featuring Robin Marx.
Episode 4: QUIC & HTTP/3 packet capture and analysis using Wireshark. Featuring Peter Wu.
Episode 5: The roles of Server Push and Prioritization in HTTP/2 and HTTP/3. Featuring Yoav Weiss.
Episode 6: "After dinner chat" about curl and QUIC. Featuring Daniel Stenberg.
Episode 7: Qlog vs. Wireshark. Featuring Robin Marx and Peter Wu.
Episode 8: Understanding protocol performance using WebPageTest. Featuring Pat Meenan and Andy Davies.
Episode 9: Handshake deep dive.
Episode 10: Getting to grips with quiche, Cloudflare's QUIC and HTTP/3 library.
Episode 11: A review of SIGCOMM's EPIQ workshop on evolving QUIC.
Episode 12: Understanding the role of congestion control in QUIC. Featuring Junho Choi.

Whither QUIC?

So does Last Call mean QUIC is "done"? Not by a long shot. The new protocol is a giant leap for the Internet, because it enables new opportunities and innovation. QUIC v1 is basically the set of documents that have gone into Last Call. We'll continue to see people gain experience deploying and testing this, and no doubt cool blog posts about tweaking parameters for efficiency and performance are on the radar. But QUIC and HTTP/3 are extensible, so we'll see people interested in trying new things like multipath, different congestion control approaches, or new ways to carry data unreliably such as the DATAGRAM frame.

We're also seeing people interested in using QUIC for other use cases. Mapping other application protocols like DNS to QUIC is a rapid way to get its improvements. We're seeing people who want to use QUIC as a substrate for carrying other transport protocols, hence the formation of the MASQUE Working Group. There are folks who want to use QUIC and HTTP/3 as a "supercharged WebSocket", hence the formation of the WebTransport Working Group.

Whatever the future holds for QUIC, we're just getting started, and I'm excited.

Categories: Technology

A Virtual Product Management Internship Experience

Thu, 22/10/2020 - 14:14
A Virtual Product Management Internship Experience

In July 2020, I joined Cloudflare as a Product Management Intern on the DDoS (Distributed Denial of Service) team to enhance the benefits that Network Analytics brings to our customers. In the following, I am excited to share with you my experience with remote working as an intern, and how I acclimatized into Cloudflare. I also give details about what my work entailed and how we approached the process of Product Management.

Onboarding to Cloudflare during COVID19

As a long-time user of Cloudflare’s Free CDN plan myself, I was thrilled to join the company and learn what was happening behind the scenes while making its products. The entering internship class consisted of students and recent graduates from various backgrounds around the world - all with a mutual passion in helping build a better Internet.

The catch here was that 2020 would make the experience of being an intern very different. As was the case for many other fellow interns, it was the first time I had taken up remote work from scratch. The initial challenge was to integrate into the working environment without ever meeting colleagues in a physical office. Because everything took place online, it was much harder to pick up the non-verbal cues that play a key role in communication, such as eye contact and body language.

To face this challenge, Cloudflare introduced creative and active ways in which we could better interact with one another. From the very first day, I was welcomed to an abundance of knowledge sharing talks and coffee chats with new and existing colleagues in different offices across the world. Whether it was data protection from the Legal team or going serverless with Workers, we were welcomed to afternoon seminars every week on a new area that was being pursued within Cloudflare.

Cloudflare not only retained the summer internship scheme, but in fact doubled the size of the class; this reinforced an optimistic mood within the entering class and a sense of personal responsibility. I was paired up with a mentor, a buddy, and a manager who helped me find my way quickly within Cloudflare, and without whom my experience would not have been the same. Thanks to Omer, Pat, Val and countless others for all your incredible support!

Social interactions took various forms and were scheduled for all global time zones. I was invited to weekly virtual yoga sessions and intern meetups to network and discover what other interns across the world were working on. We got to virtually mingle at an “Intern Mixer” where we shared answers to philosophical prompts – what’s more, this was accompanied by an UberEats coupon for us to enjoy refreshments in our work-from-home setting. We also had Pub Quizzes with colleagues in the EMEA region to brush up on our trivia skills. At this uncertain time of the year, part of which I spent in complete self-isolation, these gatherings helped create a sense of belonging within the community, as well as an affinity towards the colleagues I interacted with.

Product Management at Cloudflare

My internship also offered a unique learning experience from the Product Management perspective. I took on the task of increasing the value of Network Analytics by giving customers and internal stakeholders improved transparency into the traffic patterns and attacks taking place. Network Analytics is Cloudflare’s packet- and bit-oriented dashboard that provides visibility into network- and transport-layer attacks mitigated across the world. Among the updates I led are the new trends insights and extended support for Cloudflare Spectrum, an L4 reverse proxy that provides DDoS protection against attacks and facilitates network performance.

I was at the intersection of multiple teams that contributed to Network Analytics from different angles, including user interface, UX research, product design, product content and backend engineering, among many others. The key to a successful delivery of Network Analytics as a product, given its interdisciplinary nature, meant that I actively facilitated communication and collaboration across experts in these teams as well as reflected the needs of the users.

I spent the first month of the internship approaching internal stakeholders, namely Customer Support engineers, Solutions Engineers, Customer Success Managers, and Product Managers, to better understand the common pain points. Given their past experience with customers, their insights revealed how Network Analytics could both leverage the existing visibility features to reduce overhead costs on the internal support side and empower users with actionable insights. This process also helped ensure that I didn’t reinvent wheels that had already been explored by existing Product Managers.

I then approached customers to enquire about desired areas for improvements. An example of such a desired improvement was that the display of data in the dashboard was not helping users infer any meaning regarding next steps. It did not answer questions like: What do these numbers represent in retrospect, and should I be concerned? Discussing these aspects helped validate the needs, and we subsequently came up with rough solutions to address them, such as dynamic trends view. Over the calls, we confirmed that - especially from those who rarely accessed the dashboard - having an overview of these numbers in the form of a trends card would incentivize users to log in more often and get more value from the product.

Trends Insights

The 1:1 dialogues were incredibly helpful in understanding how Network Analytics could be more effectively utilized, and guided ways for us to better surface the performance of our DDoS mitigation tools to our customers. In the first few weeks of the internship, I shadowed customer calls of other products; this helped me gain the confidence, knowledge, and language appropriate in Cloudflare’s user research. I did a run-through of the interview questions with a UX Researcher, and was informed on the procedure for getting in touch with appropriate customers. We even had bilingual calls where the Customer Success Manager helped translate the dialogues real-time.

In the following weeks, I synthesized these findings into a Product Requirements Document and lined up the features according to quarterly goals that could now be addressed in collaboration with other teams. After a formal review and discussion with Product Managers, engineers, and designers, we developed and rolled out each feature to the customers on a bi-weekly basis. We always welcomed feedback before and after the feature releases, as the goal wasn’t to have an ultimate final product, but to deliver incremental enhancements to meet the evolving needs of our customers.

Of course, all my interactions, including customer and internal stakeholder calls, were all held remotely. We all embraced video conferencing and instant chat messengers to make it feel as though we were physically close. I had weekly check-ins with various colleagues including my managers, Network Analytics team, DDoS engineering team, and DDoS reports team, to ensure that things were on track. For me, the key to working remotely was the instant chat function, which was not as intrusive as a fully fledged meeting, but a quick and considerate way to communicate in a tightly-knit team.

Looking Back

Product Management is a growth process - both for the corresponding individual and the product. As an individual, you grow fast through creative thinking, problem solving and incessant curiosity to better understand a product in the shoes of a customer. At the same time, the product continues to evolve and grow as a result of synergy between experts from diverse fields and customer feedback. Products are used and experienced by people, so it is a no-brainer that maintaining constant and direct feedback from our customers and internal stakeholders is what bolsters their quality.

It was an incredible opportunity to have been a part of an organization that represents one of the largest networks. Network Analytics is a window into the efforts led by Cloudflare engineers and technicians to help secure the Internet, and we are ambitious to scale the transparency across further mitigation systems in the future.

The internship was a successful immersive experience into the world of Network Analytics and Product Management, even in the face of a pandemic. Owing to Cloudflare’s flexibility and ready access to resources for remote work, I was able to adapt to the work environment from the first day onwards and gain an authentic learning experience into how products work. As I now return to university, I look back on an internship that significantly added to my personal and professional growth. I am happy to leave behind the latest evolution of Network Analytics dashboard with hopefully many more to come. Thanks to Cloudflare and all my colleagues for making this possible!

Categories: Technology

The Cloudflare Radar 2020 Elections Dashboard

Wed, 21/10/2020 - 14:00
The Cloudflare Radar 2020 Elections Dashboard

There is significant global attention around the upcoming United States election. Through the Athenian Project and Cloudflare for Campaigns, Cloudflare is providing free protection from cyber attacks to a significant number of state and local elections' websites, as well as those of federal campaigns.

One of the bedrocks of a democracy is that people need to be able to get access to relevant information to make a choice about the future of their country. This includes information about the candidates up for election; learning about how to register, and how to cast a vote; and obtaining accurate information on the results.

A question that I’ve been increasingly asked these past few months: are cyberattacks going to impact these resources leading up to and on election day?

Internally, we have been closely monitoring attacks on the broader elections and campaign websites and have a team standing by 24x7 to help our current customers as well as state and local governments and eligible political campaigns to protect them at no cost from any cyberattacks they may see.

The good news is that, so far, cyberattacks have not been impacting the websites of campaigns and elections officials we are monitoring and protecting. While we do see some background noise of attacks, they have not interfered in the process so far. The attack traffic is below what we saw in 2016 and below what is typical in elections we have observed in other countries.

But there are still nearly two weeks before election day so our guard is up. We thought it was important to provide a view into how overall traffic to campaign and elections sites is trending as well as a view into the cyberattacks we're observing. To that end, today we're sharing data from our internal monitoring systems publicly through Cloudflare Radar, on a special “Election 2020” dashboard.

The dashboard is updated continuously with information we're tracking on traffic to elections-related sites, both legitimate and from cyberattacks. It is normal to see fluctuations in this traffic depending on the time of day, as well as occasional cyberattacks. So far, nothing here surprises us.

It's important to note that Cloudflare does not see everything. We do not, for instance, have any view into misinformation campaigns that may be on social media. We also do not protect every state and local government or every campaign.

That said, we have Athenian Project participants in more than half of US states — including so-called red states, blue states, purple states, and several of the battleground states. We also have hundreds of federal campaigns across the political spectrum using us. While we may not see a targeted cyberattack, given the critical role the web now plays in the election process, we believe we would likely see any widespread attack attempting to disrupt the US elections.

So far, we are not seeing anything that suggests such an attack has impacted the election to date.

Our team will continue to monitor the situation. If any state or local elections agency or campaigns comes under attack, we stand ready to help at no cost through the Athenian Project and Cloudflare for Campaigns.

We could not have built Cloudflare into the company it is today without a stable, functional government. In the United States, that process depends on democracy and fair elections not tainted by outside influence like cyberattacks. We believe it is our duty to provide our technology where we can to help ensure this election runs smoothly.

Categories: Technology


Additional Terms