Blogroll: Cloudflare

I read blogs, as well as write one. The 'blogroll' on this site reproduces some posts from some of the people I enjoy reading. There are currently 70 posts from the blog 'Cloudflare.'

Disclaimer: Reproducing an article here does not necessarily imply agreement or endorsement!

Subscribe to the Cloudflare feed
Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet.

Internship Experience: Software Development Intern

Wed, 18/05/2022 - 13:59

Before we dive into my experience interning at Cloudflare, let me quickly introduce myself. I am currently a master’s student at the National University of Singapore (NUS) studying Computer Science. I am passionate about building software that improves people’s lives and making the Internet a better place for everyone. Back in December 2021, I joined Cloudflare as a Software Development Intern on the Partnerships team to help improve the experience that Partners have when using the platform. I was extremely excited about this opportunity and jumped at the prospect of working on serverless technology to build viable tools for our partners and customers. In this blog post, I detail my experience working at Cloudflare and the many highlights of my internship.

Interview Experience

The process began for me back when I was taking a software engineering module at NUS where one of my classmates shared a job post for an internship at Cloudflare. I already knew about Cloudflare’s DNS service and was really excited to learn more about the internship opportunity because I really resonated with the company's mission to help build a better Internet.

I knew right away that this would be a great opportunity and submitted my application. Soon after, I heard back from the recruiting team and went through the interview process, which was extremely accommodating and is definitely the most enjoyable interview experience I have had. Throughout the process, I was constantly asked about the kind of things I would like to work on and the relevance of the work that I would be doing. This thorough communication carried on throughout the internship and really was a cornerstone of my experience interning at Cloudflare.

My Internship

My internship began with onboarding and training; after that, I had discussions with my mentor, Ayush Verma, about the projects we aimed to complete during the internship and the order of objectives. The main issue we wanted to address was the manual process that our internal teams and partners go through when they want to duplicate the configuration settings on a zone, or when they want to compare one zone to other zones to ensure that there are no misconfigurations. As you can imagine, with the number of different configurations offered on the Cloudflare dashboard, it can take significant time to copy every setting and rule manually from one zone to another. Additionally, this manual process poses a risk of misconfiguration due to human error. Furthermore, as more and more customers onboard new zones onto Cloudflare, there needs to be a more automated way for them to set up these configurations.

Initially, we discussed using Terraform, as Cloudflare already supports Terraform automation. However, this approach would only cater to customers and users with more technical resources and, in true Cloudflare spirit, we wanted to keep it simple enough to be used by anyone and everyone. Therefore, we decided to leverage the publicly available Cloudflare APIs and create a browser-based application that interacts with these APIs to display configurations and make changes easily from a simple UI.

With the end goal of simplifying the experience for our partners and customers in duplicating zone configurations, we decided to build a Zone Copier web application built solely on Cloudflare Workers. This tool would, at the click of a button, automatically copy every setting that can be copied from one zone to another, significantly reducing the time and effort required to make the changes.

Alongside the Zone Copier, we would build auxiliary tools such as a Zone Viewer and a Zone Comparison, so a customer can easily see a full view of their configurations on a single webpage and compare the different zones that they use. These other applications improve upon the existing methods through which Cloudflare users can view their zone configurations, and allow for direct comparison between different zones.

Importantly, these applications are not to replace the Cloudflare Dashboard, but to complement it instead – for deeper dives into a single particular configuration setting, the Cloudflare Dashboard remains the way to go.

To begin building the web application, I spent the first few weeks diving into the publicly available APIs offered by Cloudflare as part of the v4 API to verify the outputs of each endpoint, and the type of data that would be sent as a response from a request. This took much longer than expected as certain endpoints provided different default responses for a zone that has either an empty setting – for example, not having any Firewall Rules created – or uses a nested structure for its relevant response. These different potential responses have to be examined so that when the web application calls the respective API endpoint, the responses are handled appropriately. This process was quite manual as each endpoint had to be verified individually to ensure the output would work seamlessly with the application.
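
To make the shape differences concrete, here is a minimal sketch (not the actual internal tool) of how such responses can be normalized before display. The endpoints are real v4 API routes, but the helper names and the exact handling shown are illustrative assumptions:

    // Minimal sketch: normalize two differently-shaped v4 API responses.
    // CF_API_TOKEN and the helper names are illustrative, not the real tool.
    const BASE = 'https://api.cloudflare.com/client/v4';

    async function cfGet(path, token) {
      const res = await fetch(`${BASE}${path}`, {
        headers: { Authorization: `Bearer ${token}` },
      });
      const body = await res.json();
      if (!body.success) throw new Error(JSON.stringify(body.errors));
      return body.result;
    }

    // Firewall rules: a zone with no rules simply returns an empty array.
    async function getFirewallRules(zoneId, token) {
      const result = await cfGet(`/zones/${zoneId}/firewall/rules`, token);
      return Array.isArray(result) ? result : [];
    }

    // Zone settings: each entry is a nested { id, value, ... } object,
    // flattened here into a simple { settingId: value } map for display.
    async function getZoneSettings(zoneId, token) {
      const result = await cfGet(`/zones/${zoneId}/settings`, token);
      return Object.fromEntries(result.map((s) => [s.id, s.value]));
    }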

Once I completed my research, I was able to start designing the web application. Building the web application was a very interesting experience because the stack rested solely on Workers, a serverless application platform. My prior web applications relied on servers built with Express and Node.js that had to be deployed and operated, whereas for my internship project I relied entirely on a backend built with the itty-router library on Workers to interface with the publicly available Cloudflare APIs. I found this extremely exciting: building a serverless application required less overhead than setting up and deploying a server, and Workers itself has many other benefits, such as zero cold starts. This introduction to serverless technology and my deep dive into the capabilities of Workers really opened my eyes to the possibilities that Workers as a platform can offer. With Workers, you can deploy any application on Cloudflare’s global network like I did!
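
As an illustration, a Worker backend following this pattern can be as small as the sketch below. This is a minimal reconstruction of the pattern, not the project's actual code; the route path and the CF_API_TOKEN binding are assumptions:

    import { Router } from 'itty-router';

    const router = Router();

    // Hypothetical route: proxy a zone's settings from the Cloudflare v4
    // API so the frontend can render them.
    router.get('/api/zones/:zoneId/settings', async ({ params }, env) => {
      const upstream = await fetch(
        `https://api.cloudflare.com/client/v4/zones/${params.zoneId}/settings`,
        { headers: { Authorization: `Bearer ${env.CF_API_TOKEN}` } }
      );
      return new Response(await upstream.text(), {
        headers: { 'Content-Type': 'application/json' },
      });
    });

    // Everything else is a 404.
    router.all('*', () => new Response('Not found', { status: 404 }));

    export default {
      fetch: (request, env, ctx) => router.handle(request, env, ctx),
    };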

For the frontend of the web application, I used React and the Chakra UI library to build the user interface on which the Zone Viewer, Zone Comparison, and Zone Copier are based. The routing between different pages was done using React Router, and the application is deployed directly through Workers.

Here is a screenshot of the application:

Presenting the prototype application

As developers will know, the best way to obtain feedback for the tool that you’re building is to have your customers use it directly and let you know what they think of your application and the kind of features they would like built on top of it. Therefore, once we had a prototype of the Zone Viewer and Zone Comparison complete, we presented the application to the Solutions Engineering team to hear their thoughts on the impact the tool would have on their work and the additional features they would like to see built into the application. I found this process very enriching, as they collectively mentioned how impactful the application would be for their work and the value this project adds for them.

Some interesting feedback and feature requests I received were:

  1. The Zone Copier would definitely be very useful for our partners who have to replicate the configuration of one zone to another regularly, and it's definitely going to help ensure there are fewer human errors in the process of configuring the setups.
  2. Besides duplicating configurations from zone-to-zone, could we use this to replicate the configurations from a best-in-class setup for different use cases and allow partners to deploy this with a few clicks?
  3. Can we use this tool to generate quarterly reports?
  4. The Zone Viewer would be very helpful for us when we produce documentation on a particular zone's configuration as part of a POC report.
  5. The Zone Viewer will also give us much deeper insight to better understand the current zone configurations and provide recommendations to improve them.

It was also a very cool experience speaking to the broad Solutions Engineering team as I found that many were very technically inclined and had many valid suggestions for improving the architecture and development of the applications. A special thanks to Edwin Wong for setting up the sharing session with the internal team, and many thanks to Xin Meng, AQ Jiao, Yonggil Choi, Steve Molloy, Kyouhei Hayama, Claire Lim and Jamal Boutkabout for their great insight and suggestions!

Impact of Cloudflare outside of work

While Cloudflare is known for its impeccable transparency throughout the company, and for the stellar products it provides in helping make the Internet better, I wanted to take this opportunity to talk about the company’s other endeavors, too.

Cloudflare is part of the Pledge 1%, through which the company dedicates 1% of its products and 1% of employee time to giving back to local communities as well as all the communities we support online around the world.

I took part in one of these activities, where we spent a morning cleaning up parts of the East Coast Park beach, by picking up trash and litter that had been left behind by other park users. Here’s a picture of us from that morning:


From day one, I have been thoroughly impressed by Cloudflare’s commitment to its culture and the effort everyone at Cloudflare puts in to make the company a great place to work and have a positive impact on the surrounding community.

In addition to giving back to the community, other aspects of company culture include a good team spirit and a safe working environment where you feel appreciated and taken care of. At Cloudflare, I have found that everyone is very understanding of outside commitments. I faced a few challenges during the internship where I had to spend additional time on university-related projects and work, and my manager was always very supportive and understanding when I required additional time to complete parts of the internship project.

Concluding takeaways

My experience interning at Cloudflare has been extremely positive, and I have seen first hand how transparent the company is with not only its employees but also its customers; it truly is a great place to work. Cloudflare’s collaborative culture allowed me to reach out to members of different teams for their thoughts and assistance with issues that I faced from time to time. I would not have been able to produce an impactful project without the help of the brilliant, motivated people I worked with across the span of the internship, and I am truly grateful for such a rewarding experience.

We are getting ready to open intern roles for this coming Fall, so we encourage you to visit our careers page frequently, to be up-to-date on all the opportunities we have within our teams.

Categories: Technology

Integrating Network Analytics Logs with your SIEM dashboard

Tue, 17/05/2022 - 16:46

We’re excited to announce the availability of Network Analytics Logs. Magic Transit, Magic Firewall, Magic WAN, and Spectrum customers on the Enterprise plan can feed packet samples directly into storage services, network monitoring tools such as Kentik, or Security Information and Event Management (SIEM) systems such as Splunk to gain near real-time visibility into network traffic and DDoS attacks.

What’s included in the logs

By creating a Network Analytics Logs job, Cloudflare will continuously push logs of packet samples directly to the HTTP endpoint of your choice, including Websockets. The logs arrive in JSON format which makes them easy to parse, transform, and aggregate. The logs include packet samples of traffic dropped and passed by the following systems:

  1. Network-layer DDoS Protection Ruleset
  2. Advanced TCP Protection
  3. Magic Firewall

Note that not all mitigation systems are applicable to all Cloudflare services. Below is a table describing which mitigation service is applicable to which Cloudflare service:

Mitigation System                     | Magic Transit | Magic WAN | Spectrum
--------------------------------------|---------------|-----------|---------
Network-layer DDoS Protection Ruleset | ✅            | ❌        | ✅
Advanced TCP Protection               | ✅            | ❌        | ❌
Magic Firewall                        | ✅            | ✅        | ❌

Packets are processed by the mitigation systems in the order outlined above. Therefore, a packet that passed all three systems may produce three packet samples, one from each system. This can be very insightful when troubleshooting and wanting to understand where in the stack a packet was dropped. To avoid overcounting the total passed traffic, Magic Transit users should only take into consideration the passed packets from the last mitigation system, Magic Firewall.

An example of a packet sample log:

{"AttackCampaignID":"","AttackID":"","ColoName":"bkk06","Datetime":1652295571783000000,"DestinationASN":13335,"Direction":"ingress","IPDestinationAddress":"(redacted)","IPDestinationSubnet":"/24","IPProtocol":17,"IPSourceAddress":"(redacted)","IPSourceSubnet":"/24","MitigationReason":"","MitigationScope":"","MitigationSystem":"magic-firewall","Outcome":"pass","ProtocolState":"","RuleID":"(redacted)","RulesetID":"(redacted)","RulesetOverrideID":"","SampleInterval":100,"SourceASN":38794,"Verdict":"drop"}

All the available log fields are documented here: https://developers.cloudflare.com/logs/reference/log-fields/account/network_analytics_logs/
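
Because each record is one flat JSON object, downstream aggregation is straightforward. Here is a rough sketch that follows the overcounting advice above; the field semantics assumed here (Outcome as the per-system action) are my reading of the documentation, so treat this as an assumption:

    // Sketch: aggregate parsed log lines (objects like the example above).
    // Each sample stands for SampleInterval packets, so scale counts by it.

    // Estimate total passed packets, counting only the last system
    // (magic-firewall) to avoid overcounting, per the note above.
    function estimatePassedPackets(samples) {
      return samples
        .filter((s) => s.Outcome === 'pass' && s.MitigationSystem === 'magic-firewall')
        .reduce((total, s) => total + s.SampleInterval, 0);
    }

    // Estimate dropped packets broken down by mitigation system.
    function droppedBySystem(samples) {
      const counts = {};
      for (const s of samples) {
        if (s.Outcome !== 'drop') continue;
        counts[s.MitigationSystem] = (counts[s.MitigationSystem] || 0) + s.SampleInterval;
      }
      return counts;
    }

    // Usage: estimatePassedPackets(rawLines.map(JSON.parse))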

Setting up the logs

In this walkthrough, we will demonstrate how to feed the Network Analytics Logs into Splunk via Postman. At this time, it is only possible to set up Network Analytics Logs via API. Setting up the logs requires three main steps:

  1. Create a Cloudflare API token.
  2. Create a Splunk Cloud HTTP Event Collector (HEC) token.
  3. Create and enable a Cloudflare Logpush job.

Let’s get started!

1) Create a Cloudflare API token
  1. Log in to your Cloudflare account and navigate to My Profile.
  2. On the left-hand side, in the collapsing navigation menu, click API Tokens.
  3. Click Create Token and then, under Custom token, click Get started.
  4. Give your custom token a name, and select an Account-scoped permission to edit Logs. You can also scope it to a specific account, a subset of accounts, or all of your accounts.
  5. At the bottom, click Continue to summary, and then Create Token.
  6. Copy and save your token. You can also test your token with the provided snippet in Terminal.
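
If you prefer JavaScript to the Terminal snippet, the same check can be made against the token verification endpoint (fill in the placeholder token below):

    // Verify a newly created API token against the v4 API.
    const res = await fetch(
      'https://api.cloudflare.com/client/v4/user/tokens/verify',
      { headers: { Authorization: 'Bearer <your-api-token>' } }
    );
    console.log(await res.json()); // expect "success": true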

When you're using an API token, you don't need to provide your email address as part of the API credentials.


Read more about creating an API token on the Cloudflare Developers website: https://developers.cloudflare.com/api/tokens/create/

2) Create a Splunk token for an HTTP Event Collector

In this walkthrough, we’re using a Splunk Cloud free trial, but you can use almost any service that can accept logs over HTTPS. In some cases, if you’re using an on-premise SIEM solution, you may need to allowlist Cloudflare’s IP addresses in your firewall to be able to receive the logs.

  1. Create a Splunk Cloud account. I created a trial account for the purpose of this blog.
  2. In the Splunk Cloud dashboard, go to Settings > Data Input.
  3. Next to HTTP Event Collector, click Add new.
  4. Follow the steps to create a token.
  5. Copy your token and your allocated Splunk hostname and save both for later.

Read more about using Splunk with Cloudflare Logpush on the Cloudflare Developers website: https://developers.cloudflare.com/logs/get-started/enable-destinations/splunk/

Read more about creating an HTTP Event Collector token on Splunk’s website: https://docs.splunk.com/Documentation/Splunk/8.2.6/Data/UsetheHTTPEventCollector

3) Create a Cloudflare Logpush job

Creating and enabling a job is very straightforward: it requires only a single API call to Cloudflare.

To send the API calls I used Postman, which is a user-friendly API client that was recommended to me by a colleague. It allows you to save and customize API calls. You can also use Terminal/CMD or any other API client/script of your choice.

One thing to note is that Network Analytics Logs are account-scoped. The API endpoint is therefore a tad different from what you would normally use for zone-scoped datasets such as HTTP request logs and DNS logs.

This is the endpoint for creating an account-scoped Logpush job:

https://api.cloudflare.com/client/v4/accounts/{account-id}/logpush/jobs

Your account identifier is a unique ID for your account: a string of 32 numbers and characters. If you’re not sure what your account identifier is, log in to Cloudflare, select the appropriate account, and copy the string at the end of the URL.

https://dash.cloudflare.com/{account-id}

Then, set up a new request in Postman (or any other API client/CLI tool).

To successfully create a Logpush job, you’ll need the HTTP method, URL, Authorization token, and request body (data). The request body must include a destination configuration (destination_conf), the specified dataset (network_analytics_logs, in our case), and the token (your Splunk token).


Method:

POST

URL:

https://api.cloudflare.com/client/v4/accounts/{account-id}/logpush/jobs

Authorization: Define a Bearer authorization in the Authorization tab, or add it to the header, and add your Cloudflare API token.

Body: Select Raw > JSON

{ "destination_conf": "{your-unique-splunk-configuration}", "dataset": "network_analytics_logs", "token": "{your-splunk-hec-tag}", "enabled": "true" }

If you’re using Splunk Cloud, then your unique configuration has the following format:

{your-unique-splunk-configuration}=splunk://{your-splunk-hostname}.splunkcloud.com:8088/services/collector/raw?channel={channel-id}&header_Authorization=Splunk%20{your-splunk-hec-token}&insecure-skip-verify=false

Definition of the variables:

{your-splunk-hostname}= Your allocated Splunk Cloud hostname.

{channel-id} = A unique ID that you choose to assign.

{your-splunk-hec-token} = The token that you generated for your Splunk HEC.

An important note is that customers should have a valid SSL/TLS certificate on their Splunk instance to support an encrypted connection.
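
For anyone not using Postman, here is a sketch of the same POST as a JavaScript fetch call. All {placeholder} values follow the definitions above and must be filled in:

    // Build the Splunk destination string from the pieces defined above.
    const destinationConf =
      'splunk://{your-splunk-hostname}.splunkcloud.com:8088/services/collector/raw' +
      '?channel={channel-id}' +
      '&header_Authorization=Splunk%20{your-splunk-hec-token}' +
      '&insecure-skip-verify=false';

    // Create and enable the account-scoped Logpush job.
    const res = await fetch(
      'https://api.cloudflare.com/client/v4/accounts/{account-id}/logpush/jobs',
      {
        method: 'POST',
        headers: {
          Authorization: 'Bearer {your-cloudflare-api-token}',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          destination_conf: destinationConf,
          dataset: 'network_analytics_logs',
          token: '{your-splunk-hec-token}',
          enabled: true,
        }),
      }
    );
    console.log(await res.json());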

After you’ve done that, you can create a GET request to the same URL (no request body needed) to verify that the job was created and is enabled.

The response should be similar to the following:

{ "errors": [], "messages": [], "result": { "id": {job-id}, "dataset": "network_analytics_logs", "frequency": "high", "kind": "", "enabled": true, "name": null, "logpull_options": null, "destination_conf": "{your-unique-splunk-configuration}", "last_complete": null, "last_error": null, "error_message": null }, "success": true }

Shortly after, you should start receiving logs to your Splunk HEC.


Read more about enabling Logpush on the Cloudflare Developers website: https://developers.cloudflare.com/logs/reference/logpush-api-configuration/examples/example-logpush-curl/

Reduce costs with R2 storage

Depending on the amount of logs that you read and write, the cost of third party cloud storage can skyrocket — forcing you to decide between managing a tight budget and being able to properly investigate networking and security issues. However, we believe that you shouldn’t have to make those trade-offs. With R2’s low costs, we’re making this decision easier for our customers. Instead of feeding logs to a third party, you can reap the cost benefits of storing them in R2.

To learn more about the R2 features and pricing, check out the full blog post. To enable R2, contact your account team.

Cloudflare logs for maximum visibility

Cloudflare Enterprise customers have access to detailed logs of the metadata generated by our products. These logs are helpful for troubleshooting, identifying network and configuration adjustments, and generating reports, especially when combined with logs from other sources, such as your servers, firewalls, routers, and other appliances.

Network Analytics Logs joins Cloudflare’s family of products on Logpush: DNS logs, Firewall events, HTTP requests, NEL reports, Spectrum events, Audit logs, Gateway DNS, Gateway HTTP, and Gateway Network.

Not using Cloudflare yet? Start now with our Free and Pro plans to protect your websites against DDoS attacks, or contact us for comprehensive DDoS protection and firewall-as-a-service for your entire network.

Categories: Technology

Debugging Hardware Performance on Gen X Servers

Tue, 17/05/2022 - 13:59

In Cloudflare’s global network, every server runs the whole software stack. Therefore, it's critical that every server performs to its maximum potential capacity. To give us better flexibility from a supply chain perspective, we buy server hardware from multiple vendors in the exact same configuration. However, after the deployment of our Gen X AMD EPYC Zen 2 (Rome) servers, we noticed that servers from one vendor (which we’ll call SKU-B) were consistently performing 5-10% worse than servers from the second vendor (which we'll call SKU-A).

The graph below shows the performance discrepancy between the two SKUs in terms of percentage difference. The performance is gauged on the metric of requests per second, and this data is an average of observations captured over 24 hours.

Machines before implementing performance improvements. The average RPS for SKU-B is approximately 10% below SKU-A.

Compute performance via DGEMM

The initial debugging efforts centered around the compute performance. We ran AMD’s DGEMM high performance computing tool to determine if CPU performance was the cause. DGEMM is designed to measure the sustained floating-point computation rate of a single server. Specifically, the code measures the floating point rate of execution of a real matrix–matrix multiplication with double precision. We ran a modified version of DGEMM equipped with specific AMD libraries to support the EPYC processor instruction set.

The DGEMM test brought out a few points related to the processor’s Thermal Design Power (TDP). TDP refers to the maximum power that a processor can draw for a thermally significant period while running a software application. The underperforming servers were only drawing 215 to 220 watts of power when fully stressed, whereas the max supported TDP on the processors is 240 watts. Additionally, their floating-point computation rate was ~100 gigaflops (GFLOPS) behind the better performing servers.

Screenshot from the DGEMM run of a good system:


Screenshot from an underperforming system:


To debug the issue, we first tried disabling idle power saving mode (also known as C-states) in the CPU BIOS configuration. The servers then reported the expected GFLOPS numbers and achieved max TDP. Believing this could have been the root cause, we moved the servers back into the production test environment for data collection.

However, the performance delta was still there.

Further debugging

We started the debugging process all over again by comparing the BIOS settings logs of both SKU-A and SKU-B. Once this debugging option was exhausted, the focus shifted to the network interface. We ran the open source network interface tool iPerf to check if there were any bottlenecks — no issues were observed either.

During this activity, we noticed that the BIOS of both SKUs were not using the AMD Preferred I/O functionality, which allows devices on a single PCIe bus to obtain improved DMA write performance. We enabled the Preferred I/O option on SKU-B and tested the change in production. Unfortunately, there were no performance gains. After the focus on network activity, the team started looking into memory configuration and operating speed.

AMD HSMP tool & Infinity Fabric diagnosis

The Gen X systems are configured with DDR4 memory modules that can support a maximum of 2933 megatransfers per second (MT/s). With the BIOS memory clock frequency setting on Auto, the 2933 MT/s rate automatically configures the memory clock frequency to 1467 MHz. Double Data Rate (DDR) technology samples the memory signal twice per clock cycle: once on the rising edge and once on the falling edge of the clock signal. Because of this, the reported memory speed is twice the true memory clock frequency. Memory bandwidth was validated to be working well by running a STREAM benchmark test.

Running out of debugging options, we reached out to AMD for assistance and were provided a debug tool called HSMP that lets users access the Host System Management Port. This tool provides a wide variety of processor management options, such as reading and changing processor TDP limits, reading processor and memory temperatures, and reading memory and processor clock frequencies. When we ran the HSMP tool, we discovered that the memory was being clocked at a different rate from AMD’s Infinity Fabric system — the architecture which connects the memory to the processor core. As shown below, the memory clock frequency was set to 1467 MHz while the Infinity Fabric clock frequency was set to 1333 MHz.


Ideally, the two clock frequencies need to be equal for optimized workloads sensitive to latency and throughput. Below is a snapshot for an SKU-A server where both memory and Infinity Fabric frequencies are equal.

Improved Performance

Since the Infinity Fabric clock setting was not exposed as a tunable parameter in the BIOS, we asked the vendor to provide a new BIOS that set the frequency to 1467 MHz at compile time. Once we deployed the new BIOS on the underperforming systems in production, we saw that all servers started performing to their expected levels. Below is a snapshot of the same set of systems, with data captured and averaged over a 24-hour window on the same day of the week as the previous dataset. Now the requests-per-second performance of SKU-B equals, and in some cases exceeds, the performance of SKU-A!

Internet Requests Per Second (RPS) for four SKU-A machines and four SKU-B machines after implementing the BIOS fix on SKU-B. The RPS of SKU-B now equals the RPS of SKU-A.

Hello, I am Yasir Jamal. I recently joined Cloudflare as a Hardware Engineer with the goal of helping provide a better Internet to everyone. If you share the same interest, come join us!

Categories: Technology

Announcing our Spring Developer Challenge

Tue, 17/05/2022 - 13:58

After many announcements from Platform Week, we’re thrilled to make one more: our Spring Developer Challenge!

The theme for this challenge is building real-time, collaborative applications — one of the most exciting use-cases emerging in the Cloudflare ecosystem. This is an opportunity for developers to merge their ideas with our newly released features, earn recognition on our blog, and take home our best swag yet.

Here’s a list of our tools that will get you started:

  • Workers can either be powerful middleware connecting your app to different APIs and an origin — or the entire application itself. We recommend using Worktop, a popular framework for Workers, if you need TypeScript support, routing, and well-organized submodules. Worktop can also complement your existing app even if it already uses a framework, such as Svelte.
  • Cloudflare Pages makes it incredibly easy to deploy sites, which you can make into truly dynamic apps by putting a Worker in front or using the Pages Functions (beta).
  • Durable Objects are great for collaborative apps because you can use websockets while coordinating state at the edge, seen in this chat demo (a minimal sketch of the pattern follows this list). To help scale any load, we also recommend Durable Object Groups.
  • Workers KV provides a global key-value data store that securely stores and quickly serves data across Cloudflare’s network. R2 allows you to store enormous amounts of data without trapping you with costly egress services.
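
Here is that websocket pattern in miniature. This is a generic sketch, not the linked chat demo's code; the class name and room-binding name are made up:

    // A Durable Object: all clients routed to the same object share state.
    export class ChatRoom {
      constructor(state, env) {
        this.sessions = []; // live websockets, coordinated in one place
      }

      async fetch(request) {
        if (request.headers.get('Upgrade') !== 'websocket') {
          return new Response('Expected a websocket', { status: 426 });
        }
        const [client, server] = Object.values(new WebSocketPair());
        server.accept();
        this.sessions.push(server);
        // Broadcast every incoming message to all connected clients.
        server.addEventListener('message', (msg) => {
          for (const ws of this.sessions) ws.send(msg.data);
        });
        return new Response(null, { status: 101, webSocket: client });
      }
    }

A fronting Worker would route each room name to one object instance, e.g. via env.ROOMS.idFromName(roomName), so every participant in a room reaches the same in-memory state.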

Last year, our Developer Spotlight series highlighted how developers around the world built entire applications on Cloudflare. Our Discord server maintained that momentum with users demonstrating that any type of application can be built. Need a way to organize thousands of lines of JSON? JSON Hero, built with Remix and deployed with Workers, provides an incredibly readable UI for your JSON files. Trying to deploy a GraphQL server for your app that scales? helix-flare deploys a GraphQL server easily through Workers and uses Durable Objects to coordinate data.

We hope developers continue to explore the boundaries of what they can build on Cloudflare as our platform evolves. During our Summer Developer Challenge in 2021, we received over 1,200 submissions that revealed you can build almost any app imaginable with Workers, Pages, and the rest of the developer ecosystem. We sent out hundreds of swag boxes to participants, to show our appreciation. The ensuing unboxing videos on Twitter and YouTube thrilled our team.

This year’s Spring Developer Challenge is all about making real-time, collaborative apps such as chat rooms, games, web-based editing tools, or anything else in your imagination! Here are the rules:

  • You must be at least 18 years old to participate
  • You can work in teams of up to 10 people per submission
  • The deadline to submit your repo is June 7 (extended from the original May 24)

Enter the challenge by going to this site.

As you build your app, join our Discord if you or your team need any help. We will be enthusiastically reviewing submissions, promoting them on Twitter, and sending out swag boxes.

If you’re new to Cloudflare or have an exciting idea as a developer, this is your opportunity to see how far our platform has evolved and get rewarded for it!

Categories: Technology

Network performance update: Platform Week

Mon, 16/05/2022 - 14:45

In September 2021, we shared extensive benchmarking results of 1,000 networks all around the world. The results showed that on a range of tests (TCP connection time, time to first byte, time to last byte), and on different measures (p95, mean), Cloudflare was the fastest provider in 49% of networks around the world. Since then, we’ve worked to continuously improve performance, with the ultimate goal of being the fastest everywhere and an intermediate goal to grow the number of networks where we’re the fastest by at least 10% every Innovation Week. We met that goal during Security Week (March 2022), and we’re carrying the work over to Platform Week (May 2022).

We’re excited to update you on the latest results, but before we do: after running with this benchmark for nine months, we've also been looking for ways to improve the benchmark itself — to make it even more representative of speeds in the real world. To that end, we're expanding our measured networks from 1,000 to 3,000, to give an even more accurate sense of real world performance across the globe.

In terms of results: using the old benchmark of 1,000 networks, we’re the fastest in 69% of them. In this new expanded view of 3,000 networks, we’re the fastest in 42% of them. We’ve demonstrated a consistent ability to improve our performance against what we measure, and we’re excited to optimize our performance and lift our ranking in these smaller networks all around the world.

In addition to sharing a general update on where our network performance stands, we’re also sharing updated performance metrics on our Workers platform (given that it’s Platform Week!). We’ve done an extensive benchmark of Cloudflare Workers vs Fastly’s Compute@Edge.

We’ve got the results below, but before we get to that, we want to spend a bit of time on our revamped measurements.

Revamped measurements

A few months ago, we discussed the performance of Cloudflare Workers, as compared to other similar offerings out there. We compared our performance to Fastly’s Compute@Edge, showing that Workers was significantly faster.

After we published our results, there were questions and suggestions on how to improve our testing methodology (including sharing more detail about where and how we ran the tests). As we re-ran tests for this iteration, we made some small changes, and also worked to address the suggestions from the community. Let’s talk about what’s changed and why.

Measuring what matters

To quantify global network performance, we have to get enough data from around the world, across all manner of different networks, comparing ourselves with other providers. We used Real User Measurements (RUM) to fetch a 100kB file from different providers. Users around the world report the performance of different providers. The more users who report the data, the higher fidelity the signal is. The goal is to provide an accurate picture of where different providers are faster, and more importantly, where Cloudflare can improve. You can read more about the methodology in the original Speed Week blog post here.
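
For a sense of what such a measurement looks like on the client, here is a simplified sketch using the browser's Resource Timing API. The URL is a placeholder, and the real measurement harness is more involved:

    // Fetch a test file fully, then read its timing entry.
    const url = 'https://example.com/100kb-test-file';
    const resp = await fetch(url, { cache: 'no-store' });
    await resp.arrayBuffer(); // ensure the body is fully downloaded

    const [entry] = performance.getEntriesByName(url);
    const connectTime = entry.connectEnd - entry.connectStart; // TCP connection time
    const ttfb = entry.responseStart - entry.startTime;        // time to first byte
    const ttlb = entry.responseEnd - entry.startTime;          // time to last byte

    // Each client reports numbers like these; p95 and mean are then
    // computed per network across many reports.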

In the process of quantifying network performance, it became clear where we needed to expand our scope in measuring performance. After Security Week, we were fastest in 71% of networks, and we decided that we wanted to expand the pool of networks from the top 1,000 most reported networks to the top 3,000 most reported networks.

We’ve shown the graph detailing the number of networks where Cloudflare is #1 in network performance before, but from here on out we will only show this graph as the percentage of networks where we are number one, since the denominator will be changing going forward as we set ourselves harder and harder challenges.

Benchmarking for everyone

At the time we published our earlier set of benchmarks, we had an unnecessary “no benchmarking” clause in our Terms of Service. It has since been removed. It’s been a while since we’ve worried about such things, and the clause lived past its intended life in our ToS.

We’ve done the work to show where we are the fastest provider, and it’s important that everyone else be able to validate that independently. We’re also confident that well run benchmarks will only help further improve performance for Workers, other Cloudflare products, and the Internet as a whole. Game on.

Measuring Workers performance from real users

To run our tests, we used Catchpoint, an industry standard “synthetic” testing tool, and measurements collected from real users distributed around the world. Catchpoint is a monitoring platform that has around 2,000 total endpoints distributed around the world that can be configured to fetch specific resources and time each test. Catchpoint is useful for network providers like us as it provides a consistent, repeatable way to measure end-to-end performance of a workload, and delivers a best-effort approximation for what a user sees.

Catchpoint has a series of backbone nodes that are embedded in ISPs around the world. That means that these nodes are plugged into ISP routers just like you are, and the traffic goes through the ISP network to each endpoint they are monitoring. These can approximate a real user, but they will never truly approximate a real user. For example, the bandwidth for these nodes is 100% dedicated for platform monitoring, as opposed to your home Internet connection, where your Internet experience will be a mixed bag of different use cases, some of which won’t talk to Workers applications at all.

For this new test, we chose 300 backbone nodes that are embedded in last mile ISPs around the world. We filtered out nodes in cloud providers, or in metro areas with multiple transit options, trying to remove duplicate paths as much as possible.

We cross-checked these tests with our own data set, which is collected from users connecting to free websites when they are served 1xxx error pages, similar to how we collect data for global network performance. When a user sees this error page, that page will contain a call that will execute these tests and upload performance metrics on these calls to Cloudflare. Users would run these calls independently of the error page, ensuring that Cloudflare did not get a head start in these tests.

We also changed our test methodology to use paid accounts for both Fastly and Cloudflare.

Changing the numbers we look at

For this new test, we decided to look at Wait time in addition to Time to First Byte. Time to First Byte is the time it takes for a web server to establish a connection, fetch the content, and deliver the first byte of a response to a user, and Wait time is the time the client spends waiting for the server to send back that first byte after the connection is established. We are using this particular measurement to look at the actual time it takes for the server to compute the request on the machine. Wait time is a subcomponent of TTFB, and contains the machine processing time, the time it takes the server to give the response to the socket to be sent back to the user, and the time it takes the response to reach the user. There could be latency in passing the request to the socket on the machine, but we measured the actual time it took the server to send the request for all tests and found it to be zero:


So we would calculate the wait time as the amount of time spent on the box plus the time it took the server to send the packet over the wire to the client.

However, others have noted that using Time to First Byte to measure performance of serverless computing solutions could potentially be misleading because user performance measurements can be influenced by more than just the time spent computing functions on the machine. For example, things like the time to connect to the server, DNS resolution, and cache times can impact Time to First Byte. Time to connect to the server (connect) can be impacted by how well peered a network is (or how poorly). DNS resolution can be impacted by client behavior, local DNS behavior, or by the performance of the provider’s DNS. Cache times can be driven by the performance of the cache on the server itself.

We agree that it is difficult to tease out computing time from Time to First Byte. To look at that particular aspect of serverless computing, we look at Wait, which is a value that contains the least amount of variables: we aren’t caching anything during our tests, DNS isn’t part of the wait time, and the only thing impacting Wait aside from the time spent on the machine is the Connect time.

However, since we’re trying to measure the end user experience and not just the amount of time spent on a server, we want to use both the Wait and TTFB so that we can show the time spent both on the server and how that impacts end-to-end performance.

What’s in the test?

For our test this time around, we decided to measure three things: a simple JavaScript function like last time, a complex JavaScript function, and a complex Rust function. Here are the functions for each of them:

    async function getErrorResponse(event, message, status) {
      return new Response(message, {
        status: status,
        headers: { 'Content-Type': 'text/plain' },
      });
    }

    function testHardBusyLoop() {
      let value = 0;
      let offset = Date.now();
      for (let n = 0; n < 15000; n++) {
        value += Math.floor(Math.abs(Math.sin(offset + n)) * 10);
      }
      return value;
    }

    fn test_hard_busy_loop() -> i32 {
        let mut value = 0;
        let offset = Date::now().as_millis();
        for n in 0..15000 {
            value += (((offset + n) as f64).sin().abs() * 10.0) as i32;
        }
        value
    }

The goal of each of these tests is simple: to test the ability of Workers and Compute@Edge to perform compute actions.

Latest update on network performance

We have two updates on data: a general network performance update, and then data on how Workers compares with Compute@Edge.

At Security Week (March 2022), we shared that we were faster in more of the most reported networks than our competitors. Out of the top 1,000 networks in the world (by number of IPv4 addresses advertised), here’s a breakdown of the number of networks in which each provider is number one in p95 TCP Connection Time, which represents the time it takes for a user to connect to the provider. This data is from Security Week (March 2022):


Recognizing that we are now looking at different numbers of networks, here is what the distribution looks like for the top 3,000 networks for Platform Week (May 2022):


In addition to being the fastest across popular networks, Cloudflare is also committed to being the fastest provider in every country.

Using data on the top 1,000 networks from Full Stack Week (March 2022), here’s what the world map looks like:


And here’s what the world looks like while looking at the top 3,000 networks:


Cloudflare became #1 in more countries in Africa and Europe, some of which previously did not have enough samples to be officially attributed to one provider or another.

Workers vs Compute@Edge

Moving on from general network results, what does performance look like when comparing serverless compute products across providers — in this case, Cloudflare Workers to Compute@Edge? Looking at the Catchpoint data, the first thing we noticed was that Cloudflare is faster than Fastly in all tests at the 95th percentile for Time to First Byte:

Test                 | 95th percentile TTFB (ms)
---------------------|--------------------------
Cloudflare JS no-op  | 469
Fastly JS no-op      | 596
Cloudflare JS hard   | 481
Fastly JS hard       | 631
Cloudflare Rust hard | 493
Fastly Rust hard     | 577

Cloudflare is significantly faster at all tests compared to Fastly. But let’s dig into why that is. Looking at p95 Wait, we can see that Cloudflare does have an edge in most tests related to compute on-box:

Test                 | 95th percentile Wait (ms)
---------------------|--------------------------
Cloudflare JS no-op  | 123
Fastly JS no-op      | 123
Cloudflare JS hard   | 136
Fastly JS hard       | 170
Cloudflare Rust hard | 160
Fastly Rust hard     | 121

Looking at the Wait times, you can see that Cloudflare does have a significant edge in on-box performance, except in Rust, where Fastly claims their workloads are the most optimized. This data backs up that claim. But why is Fastly so much slower on Time to First Byte? The answer lies in the rest of the request. While latency spent in compute matters, it only matters in conjunction with the rest of the network performance. And Cloudflare has an advantage over Fastly in Connect times and SSL establishment times:

Test                 | 95th percentile Connect (ms) | 95th percentile SSL (ms)
---------------------|------------------------------|-------------------------
Cloudflare JS no-op  | 81                           | 289
Fastly JS no-op      | 88                           | 293
Cloudflare JS hard   | 81                           | 286
Fastly JS hard       | 88                           | 291
Cloudflare Rust hard | 81                           | 288
Fastly Rust hard     | 88                           | 295

Cloudflare’s hyper-optimized web stack allows us to process all requests faster, meaning that Workers code gets started faster. Having that extra head start in Connect and SSL allows us to further increase the distance between us and Compute@Edge.

To verify these results, we compared the Catchpoint results to our own data set. Here is the p95 TTFB for the JavaScript and Rust hard loops for both Fastly and Cloudflare from our data:


As you can see, Cloudflare is faster on JavaScript and Rust calls. And when you look at wait time, you will see that outside of Catchpoint results, Cloudflare even beats Fastly in the Rust hard tests:


The big takeaway from this is that in addition to Cloudflare being faster for the time spent processing requests, Cloudflare’s network and performance optimizations as a whole set us apart and make our Workers platform even faster.

A fast network makes for a faster developer platform

Cloudflare’s commitment to building the fastest network allows us to deliver unparalleled performance for all applications, including our developer tools. Whether it’s by accelerating Cloudflare Pages performance by hosting on every single Cloudflare server, or by deploying Workers in 275 cities without developers needing to configure a thing, Cloudflare’s developer tools are all built on top of Cloudflare’s global network. We’re committed to making our network faster so that our developer products are as performant as they could possibly be.

And it’s not just the developer platform. Cloudflare runs an integrated and optimized stack that includes DDoS protection, WAF, rate limiting, bot management, caching, SSL, smart routing, and more. By having a single software stack we are able to offer the widest range of features while ensuring that performance (no matter which of our products you use) remains excellent. We don’t want you to have to compromise performance to get security or vice versa.

Categories: Technology

How Ramadan shows up in Internet trends

Mon, 16/05/2022 - 14:44

What happens to Internet traffic in countries where many observe Ramadan? Depending on the country, there are clear shifts and changing patterns in Internet use, particularly before dawn and after sunset.

This year, Ramadan started on April 2 and continued until May 1, 2022 (dates vary and depend on the appearance of the crescent moon). For Muslims, it is a period of introspection, communal prayer, and fasting every day from dawn to sunset. That means that people only eat at night (Iftar is the first meal after sunset that breaks the fast, and often also a family or community event) and before sunrise (Suhur).

In some countries, the impact is so big that we can see in our Internet traffic charts when the sun sets. Sunrise is more difficult to spot in the charts, but in the countries most impacted, people woke up much earlier than usual and were using the Internet in the early morning because of that.

Cloudflare Radar data shows that Internet traffic was impacted in several countries by Ramadan, with a clear increase in traffic before sunrise, and a bigger than usual decrease after sunset. All times in this blog post are local. The data in the charts is bucketed into hours. So, for example, when we show an increase in traffic at 0400 we are showing that an increase occurred between 0400 and 0459 local time.

Indonesia is a clear example of that, showing trends that continued until the end of Ramadan:


In the next table, we show a country ranking by order of impact, including traffic changes before dawn and after sunset. In the last column, you can also see the change in traffic after Ramadan ended, right after sunset. In this case, we’re looking at Wednesday, May 4, right after Eid al-Fitr (the May 2-3, 2022 holiday of breaking the fast), compared with the previous Wednesday at the same time (when Ramadan was ongoing):

Internet traffic: Ramadan’s impact

Country              | Before sunrise | After sunset | Post-Ramadan, May 4 (after sunset)
---------------------|----------------|--------------|-----------------------------------
Afghanistan          | +203%          | -28%         | +20%
Pakistan             | +119%          | -39%         | +13%
Indonesia            | +98%           | -13%         | -
Morocco              | +90%           | -36%         | +44%
Libya                | +81%           | -27%         | +48%
Turkey               | +78%           | -19%         | +22%
Bangladesh           | +62%           | -40%         | +12%
Saudi Arabia         | +55%           | -45%         | -5%
United Arab Emirates | +52%           | -13%         | +4%
Bahrain              | +44%           | -31%         | +21%
Malaysia             | +41%           | -8%          | -9%
Qatar                | +35%           | -23%         | +5%
Egypt                | +31%           | -32%         | +56%
Tunisia              | +25%           | -43%         | +101%
Iran                 | +24%           | +10%         | -12%
Singapore            | +8%            | -5%          | +4%
India                | -              | -15%         | -

Afghanistan, Pakistan, Indonesia, Morocco, Libya, and Turkey saw the biggest increases in traffic before sunrise. After sunset, it was (in order of impact) Saudi Arabia, Tunisia, Bangladesh, and Pakistan that showed the clearest decreases in traffic.

Here’s the impact of the start of Ramadan on Bangladesh, with more highlights inside the next chart:

Waking up earlier

There’s a clear pattern in most of the countries: Internet traffic was much higher than usual between 04:00 and 04:59 local time (usually the hour with the lowest traffic).

The same early spike is seen in Turkey and the United Arab Emirates. In the case of the United Arab Emirates, the time before sunrise for the Suhur meal had more mobile usage than usual (so people were using their mobile devices to access the Internet more than usual at that time).

That’s also the case for Pakistan, where traffic was 119% higher in the 04:00 to 04:59 hour on April 3 than on the previous Sunday, but also in Qatar (sunrise at 05:25 and a spike of 35%) and Afghanistan. In the latter, the spike was 203%:


We also saw the same trend in Indonesia: sunrise was at 05:55 local time at the beginning of April, and there’s a clear spike in traffic in the 04:00 to 04:59 hour, with 98% growth in requests.

Northern African countries like Egypt, Tunisia, Morocco, and Libya (sunrise at 06:54) show the same 04:00 to 04:59 hour spike. In Libya, traffic was 81% higher on Sunday, April 3, than on the previous Sunday at the same time. Usually, the 04:00 to 04:59 hour is the lowest point in traffic in the country, but on April 3 and the following days the lowest point came at 08:00.

Saudi Arabia shows a similar pattern in terms of Internet traffic on Sunday, April 3, 2022: sunrise was at 05:44, and there was 55% more Internet use than at the same time on the previous Sunday, before Ramadan.

Does daily total Internet traffic go up or down?

The short answer is: it depends on the country, given that there are examples of both a general increase and a general decrease in traffic among the most impacted countries. We see similar trends around the sunset and sunrise times of day, but it's a different story throughout the 30 days of Ramadan.

Iran, in general, shows an increase in traffic after Ramadan started on April 2, and a decrease after it ended on May 3 (of around 15%).


Something similar is seen in Pakistan, which had a general decrease in traffic the week after Ramadan ended, but during the 18:00 to 18:59 hour on May 4 had 13% more traffic than at the same time on the previous Wednesday, when Ramadan was being observed and the Iftar meal would have fallen in that hour.


The opposite happens in Libya, where traffic, generally speaking, declined during Ramadan and picked up after — comparing Wednesday, May 4, 2022, with the previous one during the 19:00 to 19:59 hour, traffic grew around 48%. The same trend is seen in another North African country: Morocco (growth of 44% after Ramadan ended).

After Ramadan, sunsets ‘bring’ more Internet traffic

Another pattern, unsurprisingly, that our chart at the beginning of this blog post shows is how the sunset period changes when Ramadan (and the holiday that follows) ends, in most cases with traffic clearly increasing at around 18:00 or 19:00.

Of the 16 countries with the biggest Ramadan impact, only four had a decrease in traffic after sunset on May 4: Iran, Indonesia, Saudi Arabia, and Malaysia. All of these countries had an increase in (or sustained) daily traffic during Ramadan and lost daily Internet usage after it ended (in May).

Here’s the example of Indonesia through the Ramadan period that includes April and May:


And a zoomed-in Indonesia chart after Ramadan ended (May 1, but bear in mind that May 2-3 is the holiday Eid al-fitr) that shows not only the general decrease in traffic, but also how the sunset period doesn’t have a clear drop in requests as seen in the Ramadan period:

Conclusion: a human impact

Ramadan has a clear impact on Internet traffic patterns as humans change their habits.

The Internet may be the network of networks, where there are many bots (friendly and less friendly), but it continues to be a human-powered network, made by humans for humans.

Follow our Internet trends (including details about ASNs) on Cloudflare Radar, and also on Radar’s Twitter account.

Categories: Technology

Proof of Stake and our next experiments in web3

Mon, 16/05/2022 - 13:58

A little under four years ago we announced Cloudflare's first experiments in web3 with our gateway to the InterPlanetary File System (IPFS). Three years ago we announced our experimental Ethereum Gateway. At Cloudflare, we often take experimental bets on the future of the Internet to help new technologies gain maturity and stability and for us to gain deep understanding of them.

Four years after our initial experiments in web3, it’s time to launch our next series of experiments to help advance the state of the art. These experiments are focused on finding new ways to help solve the scale and environmental challenges that face blockchain technologies today. Over the last two years there has been a rapid increase in the use of the underlying technologies that power web3. This growth has been a catalyst for a generation of startups; and yet it has also had negative externalities.

At Cloudflare, we are committed to helping to build a better Internet. That commitment balances technology and impact. The impact of certain older blockchain technologies on the environment has been challenging. The Proof of Work (PoW) consensus systems that secure many blockchains were instrumental in the bootstrapping of the web3 ecosystem. However, these systems do not scale well with the usage rates we see today. Proof of Work networks rely on complex cryptographic functions to create consensus and prevent attacks. Every node on the network is in a race to find a solution to this cryptographic problem. This cryptographic function is called a hash function. A hash function takes an input x and gives an output y. If you know x, it's easy to compute y. However, given y, it's prohibitively costly to find a matching x. Miners have to find an x that produces a y with certain properties, such as 10 leading zeros. Even with advanced chips, this is costly and heavily based on chance.
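
As a toy illustration of that puzzle (real networks hash block headers against far harder targets; this only shows the shape of the problem):

    // Proof of Work in miniature: vary a nonce until sha256(data:nonce)
    // starts with `difficulty` zero hex digits.
    import { createHash } from 'node:crypto';

    function mine(data, difficulty) {
      const target = '0'.repeat(difficulty);
      for (let nonce = 0; ; nonce++) {
        const y = createHash('sha256').update(`${data}:${nonce}`).digest('hex');
        if (y.startsWith(target)) return { nonce, hash: y };
      }
    }

    // Each extra zero multiplies the expected work by 16, while checking
    // a claimed solution takes a single hash.
    console.log(mine('block-data', 5));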


While that process is incredibly secure because of the vast amount of hashing that occurs, Proof of Work networks are wasteful. This waste is driven by the fact that Proof of Work consensus mechanisms are electricity-intensive. In a PoW ecosystem, energy consumption scales directly with usage. So, for example, when web3 took off in 2020, the Bitcoin network started to consume the same amount of electricity as Switzerland (according to the Cambridge Bitcoin Electricity Consumption Index).

This inefficiency in resource use is by design to make it difficult for any single actor to threaten the security of the blockchain, and for the majority of the last decade, while use was low, this inefficiency was an acceptable trade off.

As recently as two years ago, web3 was an up-and-coming ecosystem with minimal traffic compared to the majority of the traffic on the Internet. The last two years changed the web3 use landscape. Traffic to the Bitcoin and Ethereum networks has exploded. Technologies like Smart Contracts that power DeFi, NFTs, and DAOs have started to become commonplace in the developer community (and in the Twitter lexicon). The next generation of blockchains, like Flow, Avalanche, Solana, and Polygon, is being developed and adopted by major companies like Meta.

In order to balance our commitment to the environment and to help build a better Internet, it is clear to us that we should launch new experiments to help find a path forward: one that balances the clear need to drastically reduce the energy consumption of web3 technologies with the capability to scale web3 networks by orders of magnitude if increased user adoption demands it over the next five years. How do we do that? We believe that the next generation of consensus protocols is likely to be based on Proof of Stake and Proof of Spacetime consensus mechanisms.

Today, we are excited to announce the start of our experiments for the next generation of web3 networks that are embracing Proof of Stake; beginning with Ethereum. Over the next few months, Cloudflare will launch, and fully stake, Ethereum validator nodes on the Cloudflare global network as the community approaches its transition from Proof of Work to Proof of Stake with “The Merge.” These nodes will serve as a testing ground for research on energy efficiency, consistency management, and network speed.

There is a lot in that paragraph, so let’s break it down! The short version is that Cloudflare is going to participate in the research and development of the core infrastructure that helps keep Ethereum secure, fast, and energy efficient for everyone.

Proof of Stake is a next-generation consensus protocol for securing blockchains. Unlike Proof of Work, which relies on miners racing each other with increasingly complex cryptography to mine a block, Proof of Stake secures new transactions to the network through self-interest. Validator nodes (operators who verify new blocks for the chain) are required to put up a significant asset as collateral in a smart contract to prove that they will act in good faith. For Ethereum, for instance, that is 32 ETH. Validator nodes that follow the network’s rules earn rewards; validators that violate the rules have portions of their stake taken away. Anyone can operate a validator node as long as they meet the stake requirement. This is key: Proof of Stake networks require lots and lots of validator nodes to validate and attest to new transactions. The more participants there are in the network, the harder it is for bad actors to launch a 51% attack that compromises the security of the blockchain.

To add new blocks to the Ethereum chain, once it shifts to Proof of Stake, validators are chosen at random to create new blocks (validate). Once they have a new block ready it is submitted to a committee of 128 other validators who act as attestors. They verify that the transactions included in the block are legitimate and that the block is not malicious. If the block is found to be valid, it is this attestation that is added to the blockchain. Critically, the validation and attestation process is not computationally complex. This reduction in complexity leads to significant energy efficiency improvements.
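
As a rough illustration of that flow, here is a deliberately simplified model in JavaScript. The committee size comes from the paragraph above; the selection and acceptance logic is naive pseudologic, not Ethereum’s actual specification.

// Simplified Proof of Stake block production: a random proposer,
// a committee of 128 attestors, and cheap verification.
function produceBlock(validators, buildBlock) {
  // A validator is chosen at random to propose the next block
  const proposer = validators[Math.floor(Math.random() * validators.length)];
  const block = buildBlock(proposer);

  // A committee of 128 other validators attests to the block
  const committee = validators.filter((v) => v !== proposer).slice(0, 128);
  const attestations = committee.filter((v) => v.verify(block)).length;

  // Naive acceptance rule; the real protocol aggregates attestations
  // rather than taking a simple majority of one committee
  return attestations > committee.length / 2 ? block : null;
}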

The energy required to operate a Proof of Stake validator node is orders of magnitude less than that of a Proof of Work miner. Early estimates from the Ethereum Foundation suggest that the entire Ethereum network could use as little as 2.6 megawatts of power. Put another way, Ethereum will use 99.5% less energy post-merge than it does today.

We are going to support the development and deployment of Ethereum by running Proof of Stake validator nodes on our network. For the Ethereum ecosystem, running validator nodes on our network allows us to offer even more geographic decentralization in places like EMEA, LATAM, and APJC while also adding infrastructure decentralization to the network. This helps keep the network secure, outage resistant, and closer to the goal of global decentralization for everyone. For us, it’s a commitment to helping research and experiment with scaling the next generation of blockchain networks that could be a key part of the story of the Internet. Running blockchain technology at scale is incredibly difficult, and Proof of Stake systems have their own unique requirements that will take time to understand and improve. Part of this experimentation, though, is a commitment: Cloudflare has not run, and will not run, our own Proof of Work infrastructure on our network.

Finally, what is “The Merge”? The Merge is the planned combination of the Ethereum Mainnet (the chain you interact with today when you make an Eth transaction) and the Ethereum Beacon Chain. The Beacon chain is the Proof of Stake blockchain running in parallel with the Ethereum Mainnet. The Merge is much more significant than a consensus update.

Consensus updates have happened multiple times already on Ethereum, whether because of a vulnerability or the need to introduce new features. The Merge is different in that it cannot be made via a normal upgrade. Previous forks have been enabled at a specific block number: mine enough blocks, and the new consensus rules apply automatically. With the Merge, there has to be a single state root from which the fork happens. That merge state root marks the start of the Ethereum Mainnet Proof of Stake journey. It is chosen and set at a certain total difficulty, instead of a block height, to avoid merging a high block with low difficulty.

Once The Merge occurs later this year in 2022, Ethereum will be a full Proof of Stake ecosystem that no longer relies on Proof of Work.

This is just the start of our commitment to help build the next generation of web3 networks. We are excited to work with our partners across the cryptography, web3, and infrastructure communities to help create the next generation of blockchain ecosystems that are environmentally sustainable, secure, and blazing fast.

Categories: Technology

Public access for our Ethereum and IPFS gateways now available

Mon, 16/05/2022 - 13:57
Public access for our Ethereum and IPFS gateways now available

Today we are excited to announce that our Ethereum and IPFS gateways are publicly available to all Cloudflare customers for the first time. Since the announcement of our private beta last September, interest in our Eth and IPFS gateways has been overwhelming. We are humbled by the demand for these tools, and we are excited to get them into as many developers’ hands as possible. Starting today, any Cloudflare customer can log into the dashboard and configure a zone for Ethereum, IPFS, or both!

Over the last eight months of the private beta, we’ve been busy working to fully operationalize the gateways to ensure they meet the needs of our customers!

First, we have created a new API with end-to-end managed hostname deployment. This ensures that creating and managing gateways remains quick and easy as you scale. It’s paramount to give time back to developers to focus on their core products and services, and to leave the infrastructural components to us!

Second, we’ve added a brand new UI, bringing web3 to Cloudflare’s zone-level dashboard. Regardless of the workflow you are used to, we now have parity between our UI and API, so we fit into your existing processes: no time is wasted figuring out ‘how we integrate’, just a quick setup before you start serving content or connecting your services!

Third, we are pleased to say that you will soon have testnet support, so new development can be tested, hardened, and deployed to your mainnet without additional risk to your brand trust or product availability, and without concern that something may fail silently and begin a cascade of problems throughout your network. We want anyone leveraging our web3 gateways to have confidence that the changes they push do not impact end user experience. At the end of the day, the Internet is for end users, and their experience and perception must always be kept within our purview.

Lastly, Cloudflare loves to build on top of Cloudflare. This helps us stay resilient and also shows our commitment to, and belief in, all the products we create! We have always used our SSL for SaaS and Workers products in the background. Building on our own services gives our customers the ability to define and control their own HTTP features on top of traffic destined for web3 gateways, including rate limits, WAF rules, custom security filters, video delivery, customer-defined Workers logic, custom redirects, and more!

Today, thousands of different individuals, companies, and DAOs are building new products leveraging our web3 gateways: the most reliable web3 infrastructure with the largest network.

Here are just a few snippets of how people are already using our web3 Gateways, and we can’t wait to see what you build on them:

  • DeFi DAOs use the Cloudflare IPFS gateway to serve their front-end web applications globally without latency or cache penalties.
  • NFT designers use the Ethereum Gateway to effortlessly drop new offerings, and the IPFS gateway to store them in a fully decentralized system.
  • Large Dapp developers trust us to handle huge traffic spikes quickly and efficiently, without rate limits or overage caps. They combine all our offerings into a single pane of glass so that they don’t have to juggle multiple systems.
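
To give a flavor of what building against the Ethereum gateway looks like, here is a minimal JSON-RPC call. This sketch assumes the public cloudflare-eth.com endpoint; a zone configured through the dashboard would use its own hostname.

// Ask the gateway for the latest Ethereum block number
const response = await fetch('https://cloudflare-eth.com', {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({
    jsonrpc: '2.0',
    method: 'eth_blockNumber',
    params: [],
    id: 1,
  }),
});

const { result } = await response.json();
console.log(parseInt(result, 16)); // the result is a hex-encoded number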

As part of this announcement, we will begin migrating our existing users away from the legacy gateway endpoints and onto our new API, which is easier to use, highly managed, and more robust. To ensure a smooth transition, you will first need to make sure you have signed up for a Cloudflare account if you do not already have one. We have also kept our free users in mind: free users will continue to use the gateways at no cost with our free tier option! This includes no cap on the amount of traffic that can be pushed through our gateways, along with the most transparent and forecastable pricing models in the market today. We are very excited about the future and look forward to sharing the next iterations of web3 at Cloudflare!

Also, if you can’t wait to start building on our gateways, check out our product documentation for more guidance.

Categories: Technology

Gaining visibility in IPFS systems

Mon, 16/05/2022 - 13:57
Gaining visibility in IPFS systems

We have been operating an IPFS gateway for the last four years. It started as a research experiment in 2018, providing end-to-end integrity with IPFS. A year later, we made IPFS content faster to fetch. Last year, we announced we were committed to making the IPFS gateway part of our product offering. Throughout this process, we needed data on how our setup performed to inform our design decisions.

To this end, we've developed the IPFS Gateway monitor, an observability tool that runs various IPFS scenarios on a given gateway endpoint. In this post, you'll learn how we use this tool and go over discoveries we made along the way.

Refresher on IPFS

IPFS is a distributed system for storing and accessing files, websites, applications, and data. It's different from a traditional centralized file system in that IPFS is completely distributed. Any participant can join and leave at any time without the loss of overall performance.

However, to access a file in IPFS, users cannot simply use a web browser: they need to run an IPFS node that fetches the file over IPFS’s own protocol. IPFS gateways enable users to access the file using only a web browser.

Cloudflare provides an IPFS gateway at https://cloudflare-ipfs.com, so anyone can just access IPFS files by using the gateway URL in their browsers.
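
For example, fetching a piece of content by its CID is an ordinary HTTPS request; the CID below is a placeholder:

// Fetch IPFS content through the gateway like any other web resource
const cid = 'bafybeigdyrzt5...'; // placeholder CID
const res = await fetch(`https://cloudflare-ipfs.com/ipfs/${cid}`);
console.log(res.status, await res.text());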

As IPFS and the Cloudflare IPFS Gateway have become more and more popular, we need a way to know how performant the gateway is: how much time it takes to retrieve IPFS-hosted content, and how reliable it is. However, IPFS gateways are not like normal websites that only receive HTTP requests and return HTTP responses. The gateways need to run IPFS nodes internally, and sometimes do content routing and peer routing to find the nodes that provide IPFS content. They sometimes also need to resolve IPNS names to discover the content. So, to measure performance, we need to take many measurements across many scenarios.

Enter the IPFS Gateway monitor

The IPFS Gateway monitor is that tool. It allows anyone to check the performance of their gateway and export the results to the Prometheus monitoring system.

This monitor is composed of three independent binaries:

  1. ipfs-gw-measure is the tool that calls the gateway URL and does the measurement scenarios.
  2. ipfs-gw-monitor is a way to call the measurement tool multiple times.
  3. Prometheus Exporter exposes prometheus-readable metrics.

To interact with the IPFS network, the codebase also provides a way to start an IPFS node.
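
To sketch how a measurement could be exposed to Prometheus, here is a minimal exporter using the prom-client package; the metric name and buckets are our own illustration, not necessarily what the monitor itself uses.

const http = require('node:http');
const client = require('prom-client');

// Histogram of gateway response times, labelled per scenario
const duration = new client.Histogram({
  name: 'ipfs_gateway_response_seconds',
  help: 'Time taken to fetch content from the gateway',
  labelNames: ['scenario'],
  buckets: [0.1, 0.5, 1, 5, 30, 120, 300],
});

async function measure(scenario, url) {
  const start = process.hrtime.bigint();
  await fetch(url); // fetch is global in Node 18+
  duration.observe({ scenario }, Number(process.hrtime.bigint() - start) / 1e9);
}

// Expose the metrics for Prometheus to scrape
http
  .createServer(async (req, res) => {
    res.setHeader('Content-Type', client.register.contentType);
    res.end(await client.register.metrics());
  })
  .listen(9100);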

A scenario is a set of instructions a user performs given the state of our IPFS system. For instance, we want to know how fast newly uploaded content can be found by the gateway, or if popular content has a low query time. We'll discuss more of this in the next section.

Putting this together, the system works as follows.

During this experiment, we operated both the IPFS monitor and a test IPFS node. The IPFS node allows the monitor to provide content to the IPFS network, ensuring the content provided is fresh and actually hosted. Peers were not fixed; we leveraged IPFS’s default peer discovery mechanism.

The Cloudflare IPFS Gateway is treated as an opaque system. The monitor performs end-to-end tests by contacting the gateway via its public API, either https://cloudflare-ipfs.com or a custom domain registered with the gateway.

The following scenarios were run on consumer hardware in March. They are not representative of all IPFS users. All values provided below were measured against Cloudflare’s IPFS gateway endpoint.

First scenarios: Gateway to serve immutable IPFS contents

IPFS contents are the most primitive contents being served by the IPFS network and IPFS gateways. By their nature, they are immutable and addressable only by the hash of the content. Users can access the IPFS contents by putting the Content IDentifier (CID) in the URL path of the gateway. For example, ipfs://bafybeig7hluox6xefqdgmwcntvsguxcziw2oeogg2fbvygex2aj6qcfo64. We measure three common scenarios that users will often encounter.

The first one is when users fetch popular content, which has a high probability of already being in our cache. During our experiment, we measured a response time of around 116 ms for such content.

The second one is the case where users create and upload content to the IPFS network, such as via the integration between Cloudflare Pages and IPFS (LINK TO BLOG). This scenario is a lot slower than the first because the content is not in our cache yet, and it takes some time to discover it. The content we upload during this measurement is a random piece of 32-byte content.

The last measurement is when users try to download content that does not exist. This one is the slowest. Because of the nature of content routing in IPFS, there is no indication that tells us content doesn’t exist, so a timeout is the only way to decide whether it exists. Currently, the Cloudflare IPFS gateway has a timeout of around five minutes.

Scenario                       Min (s)   Max (s)   Avg (s)
1  ipfs/newly-created-content  0.276     343       44.4
2  ipfs/in-cache-content       0.0825    0.539     0.116
3  ipfs/unavailable-cid        90        341       279

Second scenarios: Gateway to serve mutable IPNS named contents

Since IPFS contents are immutable, when users want to change the content, the only way to do so is to create new content and distribute the new CID to other users. Sometimes distributing the new CID is hard, and is out of scope of IPFS. The InterPlanetary Naming System (IPNS) tries to solve this. IPNS is a naming system that — instead of locating the content using the CID — allows users to locate the content using the IPNS name instead. This name is a hash of a user's public key. Internally, IPNS maintains a section of the IPFS DHT which maps from a name to a CID. Therefore, when the users want to download the contents using names through the gateway, the gateway has to first resolve the name to get the CID, then use the CID to query the IPFS content as usual.

The example for fetching the IPNS named content is at ipns://k51qzi5uqu5diu2krtwp4jbt9u824cid3a4gbdybhgoekmcz4zhd5ivntan5ey

We measured many scenarios for IPNS as shown in the table below. Three scenarios are similar to the ones involving IPFS contents. There are two more scenarios added: the first one is measuring the response time when users query the gateway using an existing IPNS name, and the second one is measuring the response time when users query the gateway immediately after new content is published under the name.

Scenario                     Min (s)   Max (s)   Avg (s)
1  ipns/newly-created-name   5.50      110       33.7
2  ipns/existing-name        7.19      113       28.0
3  ipns/republished-name     5.62      80.4      43.8
4  ipns/in-cache-content     0.0353    0.0886    0.0503
5  ipns/unavailable-name     60.0      146       81.0

Third scenarios: Gateway to serve DNSLink websites

Even though users can use IPNS to provide others a stable address to fetch the content, the address can still be hard to remember. This is what DNSLink is for. Users can address their content using DNSLink, which is just a normal domain name in the Domain Name System (DNS). As a domain owner, you only have to create a TXT record with the value dnslink=/ipfs/baf…1, and your domain can be fetched from a gateway.
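
Since the DNSLink record is ordinary DNS data, you can inspect it yourself; a quick check using Cloudflare’s DNS-over-HTTPS JSON API might look like this (the domain is a placeholder, and the record is commonly published on the _dnslink. subdomain):

// Look up a DNSLink TXT record over DNS-over-HTTPS
const name = '_dnslink.example.com'; // placeholder domain
const res = await fetch(
  `https://cloudflare-dns.com/dns-query?name=${name}&type=TXT`,
  { headers: { accept: 'application/dns-json' } }
);
const { Answer } = await res.json();
console.log(Answer?.map((a) => a.data)); // e.g. ['"dnslink=/ipfs/baf..."']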

There are two ways to access the DNSLink websites using the gateway. The first way is to access the website using the domain name as a URL hostname, for example, https://ipfs.example.com. The second way is to put the domain name as a URL path, for example, https://cloudflare-ipfs.com/ipns/ipfs.example.com.

Scenario                                 Min (s)   Max (s)   Avg (s)
1  dnslink/ipfs-domain-as-url-hostname   0.251     18.6      0.831
2  dnslink/ipfs-domain-as-url-path       0.148     1.70      0.346
3  dnslink/ipns-domain-as-url-hostname   7.87      44.2      21.0
4  dnslink/ipns-domain-as-url-path       6.87      72.6      19.0

What does this mean for regular IPFS users?

These measurements highlight that IPFS content is fetched fastest when cached; this is an order of magnitude faster than fetching new content. This result is expected, as content publication on IPFS can take time to propagate to nodes, as highlighted in previous studies. When it comes to naming IPFS resources, leveraging DNSLink appears to be the faster strategy. This is likely due to the poor connectivity of the IPFS node used in this setup, preventing IPNS names from propagating rapidly. Overall, IPNS names would benefit from using a resolver to speed up resolution without losing the trust they provide.

As we mentioned in September, IPFS use has seen significant growth. So has our tooling. The IPFS Gateway monitor can be found on GitHub, and we will keep improving on this first set of metrics.

At the time of writing, using IPFS via a gateway seems to provide lower retrieval times, while allowing for finer-grained control over security settings in the browser context. This configuration preserves the content validity properties offered by IPFS, but reduces the number of nodes a user peers with to one: the gateway. Ideally, we would like users to peer with Cloudflare because we offer the best service, while still having the option to retrieve content from external sources if they want to. We’ll be conducting more measurements to better understand how to best leverage Cloudflare’s presence in 270 cities to better serve the IPFS network.

Categories: Technology

Part 1: Rethinking Cache Purge, Fast and Scalable Global Cache Invalidation

Sat, 14/05/2022 - 15:50
Rethinking Cache Purge, Fast and Scalable Global Cache Invalidation

There is a famous quote attributed to a Netscape engineer: “There are only two difficult problems in computer science: cache invalidation and naming things.” While naming things does oddly take up an inordinate amount of time, cache invalidation shouldn’t.

In the past we’ve written about Cloudflare’s incredibly fast response times, whether content is cached on our global network or not. If content is cached, it can be served from Cloudflare cache servers, which are distributed across the globe and are generally a lot closer in physical proximity to the visitor. This saves the visitor’s request from needing to go all the way back to an origin server for a response. But what happens when a webmaster updates something on their origin and would like these caches to be updated as well? This is where cache “purging” (also known as “invalidation”) comes in.

Customers thinking about setting up a CDN and caching infrastructure consider questions like:

  • How do different caching invalidation/purge mechanisms compare?
  • How many times a day/hour/minute do I expect to purge content?
  • How quickly can the cache be purged when needed?

This blog will discuss why invalidating cached assets is hard, what Cloudflare has done to make it easy (because we care about your experience as a developer), and the engineering work we’re putting in this year to make the performance and scalability of our purge services the best in the industry.

What makes purging difficult also makes it useful

(i) Scale
The first thing that complicates cache invalidation is doing it at scale. With data centers in over 270 cities around the globe, our most popular users’ assets can be replicated at every corner of our network. This also means that a purge request needs to be distributed to all data centers where that content is cached. When a data center receives a purge request, it needs to locate the cached content to ensure that subsequent visitor requests for that content are not served stale/outdated data. Requests for the purged content should be forwarded to the origin for a fresh copy, which is then re-cached on its way back to the user.

This process repeats for every data center in Cloudflare’s fleet. Due to Cloudflare’s massive network, maintaining this consistency when certain data centers are unreachable or offline is what makes purging at scale difficult.

Making sure that every data center gets the purge command and remains up-to-date with its content logs is only part of the problem. Getting the purge request to data centers quickly so that content is updated uniformly is the next reason why cache invalidation is hard.  

(ii) Speed
When purging an asset, race conditions abound. Requests for an asset can happen at any time, and may not follow a pattern of predictability. Content can also change unpredictably. Therefore, when content changes and a purge request is sent, it must be distributed across the globe quickly. If purging an individual asset, say an image, takes too long, some visitors will be served the new version, while others are served outdated content. This data inconsistency degrades user experience, and can lead to confusion as to which version is the “right” version. Websites can sometimes even break in their entirety due to this purge latency (e.g. by upgrading versions of a non-backwards compatible JavaScript library).

Purging at speed is also difficult when combined with Cloudflare’s massive global footprint. For example, if a purge request is traveling at the speed of light between Tokyo and Cape Town (both cities where Cloudflare has data centers), just the transit alone (no authorization of the purge request or execution) would take over 180ms on average based on submarine cable placement. Purging a smaller network footprint may reduce these speed concerns while making purge times appear faster, but does so at the expense of worse performance for customers who want to make sure that their cached content is fast for everyone.

(iii) Scope
The final thing that makes purge difficult is making sure that only the unneeded web assets are invalidated. Maintaining a cache is important for egress cost savings and response speed. Webmasters’ origins could be knocked over by a thundering herd of requests if they choose to purge all content needlessly. It’s a delicate balance of purging just enough: too much can result in both monetary and downtime costs, and too little will result in visitors receiving outdated content.

At Cloudflare, what to invalidate in a data center is often dictated by the type of purge. Purge everything, as you could probably guess, purges all cached content associated with a website. Purge by prefix purges content based on a URL prefix. Purge by hostname can invalidate content based on a hostname. Purge by URL or single file purge focuses on purging specified URLs. Finally, Purge by tag purges assets that are marked with Cache-Tag headers. These markers offer webmasters flexibility in grouping assets together. When a purge request for a tag comes into a data center, all assets marked with that tag will be invalidated.

With that overview in mind, the remainder of this blog will focus on putting each element of invalidation together to benchmark the performance of Cloudflare’s purge pipeline and provide context for what performance means in the real-world. We’ll be reviewing how fast Cloudflare can invalidate cached content across the world. This will provide a baseline analysis for how quick our purge systems are presently, which we will use to show how much we will improve by the time we launch our new purge system later this year.

How does purge work currently?

In general, purge takes the following route through Cloudflare’s data centers.

  • A purge request is initiated via the API or UI. This request specifies how our data centers should identify the assets to be purged. This can be accomplished via cache-tag header(s), URL(s), entire hostnames, and much more.
  • The request is received by any Cloudflare data center and is identified to be a purge request. It is then routed to a Cloudflare core data center (a set of a few data centers responsible for network management activities).
  • When a core data center receives it, the request is processed by a number of internal services that (for example) make sure the request is being sent from an account with the appropriate authorization to purge the asset. Following this, the request gets fanned out globally to all Cloudflare data centers using our distribution service.
  • When received by a data center, the purge request is processed and all assets with the matching identification criteria are either located and removed, or marked as stale. These stale assets are not served in response to requests and are instead re-pulled from the origin.
  • After being pulled from the origin, the response is written to cache again, replacing the purged version.
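
For reference, here is what kicking off that first step looks like as a purge-by-URL call against the public API; the zone ID and token below are placeholders:

// Purge a single URL from cache via the Cloudflare API
const res = await fetch(
  'https://api.cloudflare.com/client/v4/zones/<ZONE_ID>/purge_cache',
  {
    method: 'POST',
    headers: {
      Authorization: 'Bearer <API_TOKEN>',
      'content-type': 'application/json',
    },
    body: JSON.stringify({ files: ['https://example.com/styles.css'] }),
  }
);
console.log(await res.json()); // success flag plus any errors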

Now let’s look at this process in practice. Below we describe Cloudflare’s purge benchmarking that uses real-world performance data from our purge pipeline.

Benchmarking purge performance design

In order to understand how performant Cloudflare’s purge system is, we measured the time it took from sending the purge request to the moment that the purge is complete and the asset is no longer served from cache.  

In general, the process of measuring purge speeds involves: (i) ensuring that a particular piece of content is cached, (ii) sending the command to invalidate the cache, (iii) simultaneously checking our internal system logs for how the purge request is routed through our infrastructure, and (iv) measuring when the asset is removed from cache (first miss).

This process measures how quickly cache is invalidated from the perspective of an average user.

  • Clock starts
    As noted above, in this experiment we’re using sampled RUM data from our purge systems. The goal of this experiment is to benchmark current data for how long it can take to purge an asset on Cloudflare across different regions. Once an asset was cached in a region on Cloudflare, we identified when a purge request was received for that asset; at that instant, the clock started for this experiment. We include in this time any retries we needed to make (due to data centers missing the initial purge request) to ensure that the purge was done consistently across our network. The clock continues as the request transits our purge pipeline (data center > core > fanout > purge from all data centers).
  • Clock stops
    The clock stopped when the purged asset was removed from cache, meaning the data center was no longer serving the asset from cache to visitors’ requests. Our internal logging measures the precise moment that the cached content has been removed or expired, and from that data we were able to determine the following benchmarks for our purge types in various regions.
Results

We’ve divided our benchmarks in two ways: by purge type and by region.

We singled out Purge by URL because it identifies a single target asset to be purged. While that asset can be stored in multiple locations, the amount of data to be purged is strictly defined.

We’ve combined all other types of purge (everything, tag, prefix, hostname) together because the amount of data to be removed is highly variable. Purging a whole website or by assets identified with cache tags could mean we need to find and remove a multitude of content from many different data centers in our network.

Secondly, we segmented our benchmark measurements by region, and specifically confined the benchmarks to specific data center servers in each region because we were concerned about clock skew between different data centers. Limiting the test to the same cache servers means that even if there was skew, they’d all be skewed in the same way.

We took the latency from the representative data centers in each of the following regions and the global latency. Data centers were not evenly distributed in each region, but in total represent about 90 different cities around the world:  

  • Africa
  • Asia Pacific Region (APAC)
  • Eastern Europe (EEUR)
  • Eastern North America (ENAM)
  • Oceania
  • South America (SA)
  • Western Europe (WEUR)
  • Western North America (WNAM)

The global latency numbers represent the purge data from all Cloudflare data centers in over 270 cities globally. In the results below, global latency numbers may be larger than the regional numbers because they represent all of our data centers instead of only a regional portion, so outliers and retries might have an outsized effect.

Below are the results for how quickly our current purge pipeline was able to invalidate content by purge type and region. All times are represented in seconds and divided into P50, P75, and P99 quantiles; “P50” means that 50% of the purges completed at the indicated latency or faster.

Purge By URL

Region    P50     P75     P99
AFRICA    0.95s   1.94s   6.42s
APAC      0.91s   1.87s   6.34s
EEUR      0.84s   1.66s   6.30s
ENAM      0.85s   1.71s   6.27s
OCEANIA   0.95s   1.96s   6.40s
SA        0.91s   1.86s   6.33s
WEUR      0.84s   1.68s   6.30s
WNAM      0.87s   1.74s   6.25s
GLOBAL    1.31s   1.80s   6.35s

Purge Everything, by Tag, by Prefix, by Hostname

Region    P50       P75     P99
AFRICA    1.42s     1.93s   4.24s
APAC      1.30s     2.00s   5.11s
EEUR      1.24s     1.77s   4.07s
ENAM      1.08s     1.62s   3.92s
OCEANIA   1.16s     1.70s   4.01s
SA        1.25s     1.79s   4.106s
WEUR      1.19s     1.73s   4.04s
WNAM      0.9995s   1.53s   3.83s
GLOBAL    1.57s     2.32s   5.97s

A general note about these benchmarks: the data represented here was taken from 48 hours (two days) of RUM purge latency data in May 2022. If you are interested in how quickly your content can be invalidated on Cloudflare, we suggest you test our platform with your website.

Those numbers are good, and much faster than most of our competitors. Even in the worst case, the time from when you tell us to purge an item to when it is removed globally is less than seven seconds. In most cases, it’s less than a second. That’s great for most applications, but we want to be even faster. Our goal is to get cache purge as close as theoretically possible to the speed-of-light limit for a network our size, which is about 200ms.

Intriguingly, LEO satellite networks may be able to provide even lower global latency than fiber optics because of the straightness of the paths between satellites that use laser links. We've done calculations of latency between LEO satellites that suggest that there are situations in which going to space will be the fastest path between two points on Earth. We'll let you know if we end up using laser-space-purge.

Just as we have with network performance, we are going to relentlessly measure our cache performance as well as the cache performance of our competitors. We won’t be satisfied until we verifiably are the fastest everywhere. To do that, we’ve built a new cache purge architecture which we’re confident will make us the fastest cache purge in the industry.

Our new architecture

Through the end of 2022, we will continue this blog series, incrementally showing how we will become the fastest, most scalable purge system in the industry. We will continue to update you on how our purge system is developing and benchmark our data along the way.

Getting there will involve rearchitecting and optimizing our purge service, which hasn’t received a systematic redesign in over a decade. We’re excited to do our development in the open, and bring you along on our journey.

So what do we plan on updating?

Introducing Coreless Purge

The first version of our cache purge system was designed on top of a set of central core services, including authorization, authentication, request distribution, and filtering, among other features that made it a high-reliability service. These core components had ultimately become a bottleneck in terms of scale and performance as our network continued to expand globally. While most of our purge dependencies have been containerized, the message queue used was still running on bare metal, which led to increased operational overhead when our system needed to scale.

Last summer, we built a proof of concept for a completely decentralized cache invalidation system using in-house tech – Cloudflare Workers and Durable Objects. Using Durable Objects as a queuing mechanism gives us the flexibility to scale horizontally by adding more Durable Objects as needed and can reduce time to purge with quick regional fanouts of purge requests.

In the new purge system we’re ripping out the reliance on core data centers and moving all that functionality to every data center; we’re calling it Coreless Purge.

Here’s a general overview of how coreless purge will work:

  • A purge request will be initiated via the API or UI. This request will specify how we should identify the assets to be purged.
  • The request will be routed to the nearest Cloudflare data center where it is identified to be a purge request and be passed to a Worker that will perform several of the key functions that currently occur in the core (like authorization, filtering, etc).
  • From there, the Worker will pass the purge request to a Durable Object in the data center. The Durable Object will queue all the requests and broadcast them to every data center when they are ready to be processed.
  • When the Durable Object broadcasts the purge request to every data center, another Worker will pass the request to the service in the data center that will invalidate the content in cache (executes the purge).
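
To give a feel for the shape of this design, here is a heavily simplified sketch of a Worker handing a purge request to a queueing Durable Object. Every name and the queueing logic here are our own illustration, not Cloudflare’s internal code.

// Worker: authorize the request, then enqueue it in a Durable Object
const isAuthorized = (request) => request.headers.has('Authorization'); // placeholder check

export default {
  async fetch(request, env) {
    if (!isAuthorized(request)) {
      return new Response('Forbidden', { status: 403 });
    }
    // One queue per location keeps fanout regional and horizontally scalable
    const id = env.PURGE_QUEUE.idFromName(request.cf.colo);
    return env.PURGE_QUEUE.get(id).fetch(request);
  },
};

// Durable Object: queue purge requests and broadcast them when ready
export class PurgeQueue {
  constructor(state, env) {
    this.queue = [];
  }

  async fetch(request) {
    this.queue.push(await request.json());
    await this.broadcast(); // fan the queued purges out to every data center
    return new Response('queued');
  }

  async broadcast() {
    // placeholder: deliver queued purges to the cache invalidation
    // service in each data center, then clear the queue
    this.queue = [];
  }
}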

We believe this re-architecture of our system, built by stringing together multiple services from the Workers platform, will improve both the speed and the scalability of the purge requests we can handle.

Conclusion

We’re going to spend a lot of time building and optimizing purge because, if there’s one thing we learned here today, it’s that cache invalidation is a difficult problem, but those are exactly the types of problems that get us out of bed in the morning.

If you want to help us optimize our purge pipeline, we’re hiring.

Categories: Technology

Announcing the Cloudflare Images Sourcing Kit

Fri, 13/05/2022 - 13:59
Announcing the Cloudflare Images Sourcing Kit

When we announced Cloudflare Images to the world, we introduced a way to store images within the product and to help customers move away from the egress fees incurred when using remote sources for their deliveries via Cloudflare.

To store the images in Cloudflare, customers can upload them via the UI with a simple drag and drop, or via the API for scenarios with a high number of objects, where scripting the upload process makes more sense.

To create flexibility in how images are imported, we’ve recently also added the ability to upload via URL or to define custom names and paths for your images, allowing a simple mapping between customer repositories and the objects in Cloudflare. It’s also possible to serve from a custom hostname, which creates flexibility in how your end users see the path, improves delivery performance by removing the need for extra TLS negotiations, and improves your brand recognition through URL consistency.
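
For instance, a scripted upload via URL with a custom path might look like this; the sketch below targets the Images v1 API, with the account ID and token as placeholders:

// Upload an image into Cloudflare Images from a remote URL
const form = new FormData();
form.append('url', 'https://my-bucket.s3.us-west-2.amazonaws.com/folderA/puppy.png');
form.append('id', 'folderA/puppy.png'); // custom name/path for the image

const res = await fetch(
  'https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/images/v1',
  {
    method: 'POST',
    headers: { Authorization: 'Bearer <API_TOKEN>' },
    body: form, // fetch sets the multipart boundary automatically
  }
);
console.log(await res.json());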

Still, there was no simple way to tell our product: “Tens of millions of images are in this repository URL. Go and grab them all from me”.  

In some scenarios, our customers have buckets with millions of images to upload to Cloudflare Images. Their goal is to migrate all objects to Cloudflare through a one-time process, allowing them to drop the external storage altogether.

In another common scenario, different departments in larger companies use independent systems configured with varying storage repositories, all of which they feed at specific times with uneven upload volumes. It would be best if they could reuse definitions to get all those new images into Cloudflare, ensuring the portfolio is up to date while not paying egregious egress fees by serving the public directly from those multiple storage providers.

These situations required the upload process to Cloudflare Images to include logistical coordination and scripting knowledge. Until now.

Today, we are happy to share with you our Sourcing Kit, where you can define one or more sources containing the objects you want to migrate to Cloudflare Images.

But, what exactly is Sourcing? In industries like manufacturing, it implies a number of operations, from selecting suppliers, to vetting raw materials and delivering reports to the process owners.

So, we borrowed that definition and translated it into a Cloudflare Images set of capabilities allowing you to:

  1. Define one or multiple repositories of images to bulk import;
  2. Reuse those sources and import only new images;
  3. Make sure that only actual usable images are imported and not other objects or file types that exist in that source;
  4. Define the target path and filename for imported images;
  5. Obtain logs for the bulk operations.

The new kit does it all. So let's go through it.

How the Cloudflare Images Sourcing Kit works

In the Cloudflare Dashboard, you will soon find the Sourcing Kit under Images.

In it, you will be able to create a new source definition, view existing ones, and view the status of the last operations.

Clicking on the create button will launch the wizard that will guide you through the first bulk import from your defined source:

First, you will need to input the Name of the Source and the URL for accessing it. You’ll be able to save the definitions and reuse the source whenever you wish.
After running the necessary validations, you’ll be able to define the rules for the import process.

The first option you have allows an optional Path Prefix. Defining a prefix gives the images uploaded from this particular source a unique identifier, differentiating them from the ones imported from other sources.

The naming rule in place respects the source image name and path already, so let's assume there's a puppy image to be retrieved at:

https://my-bucket.s3.us-west-2.amazonaws.com/folderA/puppy.png

When imported without any Path Prefix, you’ll find the image at

https://imagedelivery.net/<AccountId>/folderA/puppy.png

Now, you might want to create an additional Path Prefix to identify the source, for example by mentioning that this bucket is from the Technical Writing department. In the puppy case, the result would be:

https://imagedelivery.net/<AccountId>/techwriting/folderA/puppy.png

Custom Path prefixes also provide a way to prevent name clashes coming from other sources.

Still, there will be times when customers don’t want to use them. And when reusing the source to import images, clashes on the same path and filename might occur.

By default, we don’t overwrite existing images, but we allow you to select that option to refresh the catalog present in the Cloudflare pipeline.

Once these inputs are defined, a click on the Create and start migration button at the bottom will trigger the upload process.

This action will show the final wizard screen, where the migration status is displayed. The progress log will report any errors obtained during the upload and is also available to download.

You can reuse, edit or delete source definitions when no operations are running, and at any point, from the home page of the kit, it's possible to access the status and return to the ongoing or last migration report.

What’s next?

With the Beta version of the Cloudflare Images Sourcing Kit, we will allow you to define AWS S3 buckets as a source for the imports. In the following versions, we will enable definitions for other common repositories, such as the ones from Azure Storage Accounts or Google Cloud Storage.

And while we're aiming for this to be a simple UI, we also plan to make everything available through CLI: from defining the repository URL to starting the upload process and retrieving a final report.

Apply for the Beta version

We will be releasing the Beta version of this kit in the following weeks, allowing you to source your images from third party repositories to Cloudflare.

If you want to be the first to use Sourcing Kit, request to join the waitlist on the Cloudflare Images dashboard.

Categories: Technology

Send email using Workers with MailChannels

Fri, 13/05/2022 - 13:59
Send email using Workers with MailChannels

Here at Cloudflare we often talk about HTTP and related protocols as we work to help build a better Internet. However, the Simple Mail Transfer Protocol (SMTP) — used to send emails — is still a massive part of the Internet too.

Even though SMTP is turning 40 years old this year, most businesses still rely on email to validate user accounts, send notifications, announce new features, and more.

Sending an email is simple from a technical standpoint, but getting an email actually delivered to an inbox can be extremely tricky. Because of the enormous amount of spam that is sent every single day, all major email providers are very wary of things like new domains and IP addresses that start sending emails.

That is why we are delighted to announce a partnership with MailChannels. MailChannels has created an email sending service specifically for Cloudflare Workers that removes all the friction associated with sending emails. To use their service, you do not need to validate a domain or create a separate account. MailChannels filters spam before sending out an email, so you can feel safe putting user-submitted content in an email and be confident that it won’t ruin your domain reputation with email providers. But the absolute best part? Thanks to our friends at MailChannels, it is completely free to send email.

In the words of their CEO Ken Simpson: “Cloudflare Workers and Pages are changing the game when it comes to ease of use and removing friction to get started. So when we sat down to see what friction we could remove from sending out emails, it turns out that with our incredible anti-spam and anti-phishing, the answer is “everything". We can’t wait to see what applications the community is going to build on top of this.”

The only constraint currently is that the integration only works when the request comes from a Cloudflare IP address. So it won’t work yet when you are developing on your local machine or running a test on your build server.

First, let’s walk you through how to send out your first email using a Worker.

export default {
  async fetch(request) {
    const send_request = new Request('https://api.mailchannels.net/tx/v1/send', {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        personalizations: [
          {
            to: [{ email: 'test@example.com', name: 'Test Recipient' }],
          },
        ],
        from: {
          email: 'sender@example.com',
          name: 'Workers - MailChannels integration',
        },
        subject: 'Look! No servers',
        content: [
          {
            type: 'text/plain',
            value: 'And no email service accounts and all for free too!',
          },
        ],
      }),
    });

    // Send the request to MailChannels and relay its response
    return await fetch(send_request);
  },
};

That is all there is to it. You can modify the example to make it send whatever email you want.

The MailChannels integration makes it easy to send emails to and from anywhere with Workers. However, we also wanted to make it easier to send emails to yourself from a form on your website. This is perfect for quickly and painlessly setting up pages such as “Contact Us” forms, landing pages, and sales inquiries.

The Pages Plugin Framework that we announced earlier this week allows other people to email you without exposing your email address.

The only thing you need to do is copy and paste the following code snippet in your /functions/_middleware.ts file. Now, every form that has the data-static-form-name attribute will automatically be emailed to you.

import mailchannelsPlugin from "@cloudflare/pages-plugin-mailchannels";

export const onRequest = mailchannelsPlugin({
  personalizations: [
    {
      to: [{ name: "ACME Support", email: "hello@example.com" }],
    },
  ],
  from: { name: "Enquiry", email: "no-reply@example.com" },
  respondWith: () =>
    new Response(null, {
      status: 302,
      headers: { Location: "/thank-you" },
    }),
});

Here is an example of what such a form could look like. You can make the form as complex as you like; the only thing it needs is the data-static-form-name attribute. You can give it any name you like to be able to distinguish between different forms.

<!DOCTYPE html>
<html>
  <body>
    <h1>Contact</h1>
    <form data-static-form-name="contact">
      <div>
        <label>Name<input type="text" name="name" /></label>
      </div>
      <div>
        <label>Email<input type="email" name="email" /></label>
      </div>
      <div>
        <label>Message<textarea name="message"></textarea></label>
      </div>
      <button type="submit">Send!</button>
    </form>
  </body>
</html>

So, as you can see, there is no barrier left when it comes to sending out emails. You can copy and paste the above Worker or Pages code into your projects and immediately start sending email for free.

If you have any questions about using MailChannels in your Workers, or want to learn more about Workers in general, please join our Cloudflare Developer Discord server.

Categories: Technology

Route to Workers, automate your email processing

Fri, 13/05/2022 - 13:59
Route to Workers, automate your email processing

Cloudflare Email Routing has quickly grown to a few hundred thousand users, and we’re incredibly excited with the number of feature requests that reach our product team every week. We hear you, we love the feedback, and we want to give you all that you’ve been asking for. What we don’t like is making you wait, or making you feel like your needs are too unique to be addressed.

That’s why we’re taking a different approach - we’re giving you the power tools that you need to implement any logic you can dream of to process your emails in the fastest, most scalable way possible.

Today we’re announcing Route to Workers, for which we’ll start a closed beta soon. You can join the waitlist today.

How this works

When using Route to Workers, your Email Routing rules can have a Worker process the messages reaching any of your custom email addresses.

Even if you haven’t used Cloudflare Workers before, we are making onboarding as easy as can be. You can start creating Workers straight from the Email Routing dashboard, with just one click.

After clicking Create, you will be able to choose a starter that allows you to get up and running with minimal effort. Starters are templates that pre-populate your Worker with the code you would write for popular use cases, such as creating a blocklist or allowlist, content-based filtering, tagging messages, or pinging you on Slack for urgent emails.

You can then use the code editor to make your new Worker process emails in exactly the way you want it to - the options are endless.

And for those of you who prefer to jump right into writing your own code, you can go straight to the editor without using a starter. You can write Workers in a language you likely already know. Cloudflare built Workers to execute JavaScript and WebAssembly and has continuously added support for new languages.

The Workers you’ll use for processing emails are just regular Workers that listen to incoming events, implement some logic, and reply accordingly. You can use all the features that a normal Worker would.

The main difference being that instead of:

export default {
  async fetch(request, env, ctx) {
    handleRequest(request);
  }
}

You'll have:

export default {
  async email(message, env, ctx) {
    handleEmail(message);
  }
}

The new `email` event will provide you with the "from" and "to" fields, the full headers, and the raw body of the message. You can then use them in any way that fits your use case, including calling other APIs and orchestrating complex decision workflows. In the end, you can decide what action to take, including rejecting the email or forwarding it to one of your Email Routing destination addresses.

With these capabilities you can easily create logic that, for example, only accepts messages coming from one specific address and, when one matches the criteria, forwards it to one or more of your verified destination addresses while also immediately alerting you on Slack. Code for such a feature could be as simple as this:

export default {
  async email(message, env, ctx) {
    switch (message.to) {
      case "marketing@example.com":
        await fetch("https://webhook.slack/notification", {
          body: `Got a marketing email from ${message.from}, subject: ${message.headers.get("subject")}`,
        });
        sendEmail(message, ["marketing@corp", "sales@corp"]);
        break;
      default:
        message.reject("Unknown address");
    }
  },
};

Route to Workers enables everyone to programmatically process their emails and use them as triggers for any other action. We think this is pretty powerful.

Process up to 100,000 emails/day for free

The first 100,000 Worker requests (or Email Triggers) each day are free, and paid plans start at just $5 per 10 million requests. You will be able to keep track of your Email Workers usage right from the Email Routing dashboard.

Join the Waitlist

You can join the waitlist today by going to the Email section of your dashboard, navigating to the Email Workers tab, and clicking the Join Waitlist button.

We are expecting to start the closed beta in just a few weeks, and can’t wait to hear about what you’ll build with it!

As usual, if you have any questions or feedback about Email Routing, please come see us in the Cloudflare Community and the Cloudflare Discord.

Categories: Technology

Closed Caption support coming to Stream Live

Fri, 13/05/2022 - 13:59
Closed Caption support coming to Stream Live

Building inclusive technology is core to the Cloudflare mission. Cloudflare Stream has supported captions for on-demand videos for several years. Soon, Stream will auto-detect embedded captions and include them in the live stream delivered to your viewers.

Thousands of Cloudflare customers use the Stream product to build video functionality into their apps. With live caption support, Stream customers can better serve their users with a more comprehensive viewing experience.

Enabling Closed Captions in Stream Live

Stream Live scans for CEA-608 and CEA-708 captions in incoming live streams ingested via SRT and RTMPS.  Assuming the live streams you are pushing to Cloudflare Stream contain captions, you don’t have to do anything further: the captions will simply get included in the manifest file.

If you are using the Stream Player, these captions will be rendered by the Stream Player. If you are using your own player, you simply have to configure your player to display captions.  

Currently, Stream Live supports captions for a single language during the live event. While the support for captions is limited to one language during the live stream, you can upload captions for multiple languages once the event completes and the live event becomes an on-demand video.

What is CEA-608 and CEA-708?

When captions were first introduced in 1973, they were open captions. This means the captions were literally overlaid on top of the picture in the video and therefore, could not be turned off. In 1982, we saw the introduction of closed captions during live television. Captions were no longer imprinted on the video and were instead passed via a separate feed and rendered on the video by the television set.

CEA-608 (also known as Line 21) and CEA-708 are well-established standards used to transmit captions. CEA-708 is a modern iteration of CEA-608, offering support for nearly every language and text positioning–something not supported with CEA-608.

Availability

Live caption support will be available in closed beta next month. To request access, sign up for the closed beta.

Including captions in any video stream is critical to making your content more accessible. For example, the majority of live events are watched on mute, which increases the value of captions. While Stream Live does not generate live captions yet, we plan to build support for automatic live captions in the future.

Categories: Technology

Stream with sub-second latency is like a magical HDMI cable to the cloud

Fri, 13/05/2022 - 13:59
Stream with sub-second latency is like a magical HDMI cable to the cloud

Starting today, in open beta, Cloudflare Stream supports video playback with sub-second latency over SRT or RTMPS at scale. Just like HLS and DASH formats, playback over RTMPS and SRT costs $1 per 1,000 minutes delivered regardless of video encoding settings used.

Stream is like a magic HDMI cable to the cloud. You can easily connect a video stream and display it from as many screens as you want wherever you want around the world.

What do we mean by sub-second?

Video latency is the time it takes from when a camera sees something happen live to when viewers of a broadcast see the same thing on their screens. Although we like to think that what’s on TV is happening in the studio and in your living room at the same time, this is not the case. Often, cable TV takes five seconds to reach your home.

On the Internet, latency varies widely across services, from multiple minutes down to a few seconds or less. Live streaming technologies like HLS and DASH, used by the most common video streaming websites, typically offer 10 to 30 seconds of latency, and this is what you can achieve with Stream Live today. However, this range does not feel natural for use cases where the viewers interact with the broadcasters. Imagine a text chat next to an esports live stream, or a Q&A session in a remote webinar. These new ways of interacting with the broadcast won’t work with the latencies the industry is used to. You need one to two seconds at most to achieve the feeling that the viewer is in the same room as the broadcaster.

We expect Cloudflare Stream to deliver sub-second latencies reliably in most parts of the world by routing the video as much as possible within the Cloudflare network. For example, when you’re sending video from San Francisco on your Comcast home connection, the video travels directly to the nearest point where Comcast and Cloudflare interconnect, for example, San Jose. When a viewer joins, say from Austin, the viewer connects to the Cloudflare location in Dallas, which then establishes a connection over the Cloudflare backbone to San Jose. This setup avoids unreliable long-distance connections and allows Cloudflare to monitor the reliability and latency of the video all the way from the broadcaster’s last mile to the viewer’s last mile.

Serverless, dynamic topology

With Cloudflare Stream, the latency of content from the source to the destination depends purely on the physical distance between them: with no centralized routing, each Cloudflare location talks to the other Cloudflare locations and shares the video among them. This results in the minimum possible latency regardless of the location you are broadcasting from.

We’ve measured about 500ms of glass-to-glass latency from San Francisco to London, both from and to residential networks. If both the broadcaster and the viewers were in California, this number would be lower, simply because the video has less distance to travel at the speed of light. An early tester was able to achieve 300ms of latency by broadcasting with OBS via RTMPS to Cloudflare Stream and pulling that content down over SRT using ffplay.


Any server in the Cloudflare Anycast network can receive and publish low-latency video, which means that you're automatically broadcasting to the nearest server with no configuration necessary. To minimize latency and avoid network congestion, we route video traffic between broadcaster and audience servers using the same network telemetry as Argo.

On top of this, we construct a dynamic distribution topology, unique to the stream, which grows to meet the capacity needs of the broadcast. We’re just getting started with low-latency video, and we will continue to focus on latency and playback reliability as our real-time video features grow.

An HDMI cable to the cloud

Most video on the Internet is delivered over HTTP, the same protocol used to load websites in your browser. This has many advantages, such as easy interoperability across viewer devices. Maybe more importantly, HTTP can use existing infrastructure like caches, which reduces the cost of video delivery.

Using HTTP has a latency cost, as it is not a protocol built to deliver video. There have been many attempts to deliver low-latency video over HTTP, some reducing latency to a few seconds, but none reach the levels achievable by protocols designed with video in mind. WebRTC and video delivery over QUIC have the potential to reduce latency further, but face inconsistent support across platforms today.

Video-oriented protocols, such as RTMPS and SRT, sidestep some of the challenges above but often require custom client libraries and are not available in modern web browsers. While we support low-latency video over RTMPS and SRT today, we are actively exploring other delivery protocols.

There’s no silver bullet yet, and our goal is to make video delivery as easy as possible by supporting the set of protocols that enables our customers to meet their unique and creative needs. Today that can mean receiving RTMPS and delivering low-latency SRT, or ingesting SRT while publishing HLS. In the future, that may include ingesting WebRTC or publishing over QUIC, HTTP/3 or WebTransport. There are many interesting technologies on the horizon.

We’re excited to see new use cases emerge as low-latency video becomes easier to integrate and less costly to manage. A remote cycling instructor can ask her students to slow down in response to an increase in heart rate; an esports league can effortlessly repeat their live feed to remote broadcasters to provide timely, localized commentary while interacting with their audience.

Creative uses of low latency video

The viewer experience at events like concerts and sporting events can be augmented with live video delivered in real time to participants’ phones. This way they can experience the event in real time and see the goal scored, or the details of what’s happening on the stage.

In big cities, you can often hear neighbors across the street cheer a goal before you see it scored on your own screen. This can be eliminated when every video screen shows the same content at the same time.

At esports games, large company meetings or conferences, presenters and commentators can react in real time to comments in the chat. The delay between a fan making a comment and seeing the reaction on the video stream can be eliminated.

Online exercise bikes can provide even more relevant and timely feedback from the live instructors, adding to the sense of community developed while riding them.

Participants in esports streams can easily switch from passive viewers to active live participants, as there is no delay in the broadcast.

Security cameras can be monitored from anywhere in the world without having to open ports or set up centralized servers to receive and relay video.

Getting Started

Get started by using your existing inputs on Cloudflare Stream. Without the need to reconnect, they will be available instantly for playback with the RTMPS/SRT playback URLs.

If you don’t have any inputs on Stream, sign up for $5/mo. You will get the ability to push live video, broadcast, record and now pull video with sub-second latency.

You will need to use a program like FFmpeg or OBS to push video. For playback, you can use VLC for RTMPS and ffplay for SRT. To integrate playback into your native app, you can use FFmpeg wrappers such as ffmpeg-kit for iOS.
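As a quick command-line sketch, pushing with FFmpeg and pulling with ffplay might look like the following; the ingest URL, stream key and SRT playback URL are placeholders you would copy from your live input in the Stream dashboard:

# language: Shell
# Push a local file to your live input over RTMPS (placeholder URL and key).
ffmpeg -re -i input.mp4 -c:v libx264 -c:a aac -f flv \
  "rtmps://live.cloudflare.com:443/live/<STREAM_KEY>"

# Pull the stream back over SRT with minimal buffering (placeholder URL).
ffplay -fflags nobuffer "<SRT_PLAYBACK_URL>"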

RTMPS and SRT playback work with the recently launched custom domain support, so you can use the domain of your choice and keep your branding.

Categories: Technology

Bring your own ingest domain to Stream Live

Fri, 13/05/2022 - 13:59

The last two years have given rise to hundreds of live streaming platforms. Most live streaming platforms enable their creators to go live by providing them with a server and an RTMP/SRT key that they can configure in their broadcasting app.

Until today, even if your live streaming platform was called live-yoga-classes.com, your users would need to push the RTMPS feed to live.cloudflare.com. Starting today, every Stream account can configure its own domain in the Stream dashboard. And your creators can broadcast to a domain such as push.live-yoga-classes.com.

This feature is available to all Stream accounts, including self-serve customers at no additional cost. Every Cloudflare account with a Stream subscription can add up to five ingest domains.

Secure CNAMEing for live video ingestion

Cloudflare Stream only supports encrypted video ingestion using RTMPS and SRT protocols. These are secure protocols and, similar to HTTPS, ensure encryption between the broadcaster and Cloudflare servers. Unlike non-secure protocols like RTMP, secure RTMP (or RTMPS) protects your users from monster-in-the-middle attacks.

In an unencrypted world, you could simply CNAME a domain to another domain, regardless of whether you own the domain you are sending traffic to. Because Stream Live intentionally does not support insecure live streams, you cannot simply CNAME your domain to live.cloudflare.com. So we leveraged other Cloudflare products, such as Spectrum, to natively support custom-branded domains in the Stream Live product without making the live streams any less private for your broadcasters.

Configuring Custom Domain for Live Ingestion

To begin configuring your custom domain, add the domain to your Cloudflare account as a regular zone.

Add a zone to your Cloudflare account

Next, CNAME the domain to live.cloudflare.com.

CNAME the zone to live.cloudflare.com
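If you prefer to script this step, the same record can be created with the Cloudflare DNS API. A minimal sketch, assuming an API token with DNS edit permission; the zone ID is a placeholder and the hostname follows the earlier example:

# language: Shell
curl -X POST "https://api.cloudflare.com/client/v4/zones/<ZONE_ID>/dns_records" \
  -H "Authorization: Bearer <API_TOKEN>" \
  -H "Content-Type: application/json" \
  --data '{"type":"CNAME","name":"push.live-yoga-classes.com","content":"live.cloudflare.com","ttl":1}'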

Assuming you have a Stream subscription, visit the Inputs page and click on the Settings icon:

Click on the Settings icon on the Live Inputs page

Next, add the domain you configured in the previous step as a Live Ingest Domain:

Add Custom Ingest Domain

If your domain is successfully added, you will see a confirmation:

Confirmation of the domain being added as an ingest domain

Once you’ve added your ingest domain, test it by changing your existing configuration in your broadcasting software to your ingest domain. You can read the complete docs and limitations in the Stream Live developer docs.

What’s Next

Besides the branding upside (you don’t have to instruct your users to configure a domain such as live.cloudflare.com), custom domains help you avoid vendor lock-in and make migration seamless. For example, if you have an existing live video pipeline that you are considering moving to Stream Live, this makes the migration one step easier, because you no longer have to ask your users to change any settings in their broadcasting app.

A natural next step is to support custom keys. Currently, your users must still use keys that are provided by Stream Live. Soon, you will be able to bring your own keys. Custom domains combined with custom keys will help you migrate to Stream Live with zero breaking changes for your end users.

Categories: Technology

Cloudflare Stream simplifies creator management for creator platforms

Fri, 13/05/2022 - 13:59

Creator platforms across the world use Cloudflare Stream to rapidly build video experiences into their apps. These platforms serve a diverse range of creators, enabling them to share their passion with their beloved audience. While working with creator platforms, we learned that many Stream customers track video usage on a per-creator basis in order to answer critical questions such as:

  • “Who are our fastest growing creators?”
  • “How much do we charge or pay creators each month?”
  • “What can we do more of in order to serve our creators?”
Introducing the Creator Property

Creator platforms enable artists, teachers and hobbyists to express themselves through various media, including video. We built Cloudflare Stream for these platforms, enabling them to rapidly build video use cases without needing to build and maintain a video pipeline at scale.

At its heart, every creator platform must manage ownership of user-generated content. When a video is uploaded to Stream, Stream returns a video ID. Platforms using Stream have traditionally had to maintain their own index to track content ownership. For example, when a user with internal user ID 83721759 uploads a video to Stream with video ID 06aadc28eb1897702d41b4841b85f322, the platform must maintain a database table of some sort to keep track of the fact that Stream video ID 06aadc28eb1897702d41b4841b85f322 belongs to internal user 83721759.

With the introduction of the creator property, platforms no longer need to maintain this index. Stream already has a direct creator upload feature to enable users to upload videos directly to Stream using tokenized URLs and without exposing account-wide auth information. You can now set the creator field with your user’s internal user ID at the time of requesting a tokenized upload URL:

curl -X POST "https://api.cloudflare.com/client/v4/accounts/023e105f4ecef8ad9ca31a8372d0c353/stream/direct_upload" \
  -H "X-Auth-Email: user@example.com" \
  -H "X-Auth-Key: c2547eb745079dac9320b638f5e225cf483cc5cfdda41" \
  -H "Content-Type: application/json" \
  --data '{
    "maxDurationSeconds": 300,
    "expiry": "2021-01-02T02:20:00Z",
    "creator": "<CREATOR_ID>",
    "thumbnailTimestampPct": 0.529241,
    "allowedOrigins": ["example.com"],
    "requireSignedURLs": true,
    "watermark": {
      "uid": "ea95132c15732412d22c1476fa83f27a"
    }
  }'

When the user uploads the video, the creator property will automatically be set to your internal user ID and can be leveraged for operations such as pulling analytics data for your creators.

Query By Creator Property

Setting the creator property on your video uploads is just the beginning. You can now filter Stream Analytics via the Dashboard or the GraphQL API using the creator property.

Filter Stream Analytics in the Dashboard using the Creator property

Previously, if you wanted to generate a monthly report of all your creators and the number of minutes of watch time recorded for their videos, you’d likely use a scripting language such as Python to do the following:

  1. Call the Stream GraphQL API requesting a list of videos and their watch time
  2. Traverse through the list of videos and query your internal index to determine which creator each video belongs to
  3. Sum up the video watch time for each creator to get a clean report showing you video consumption grouped by the video creator

The creator property eliminates this three-step manual process. You can make a single call to the GraphQL API to request a list of creators and the consumption of their videos for a given time period. Here is an example GraphQL API query that returns minutes delivered, grouped by creator:

query {
  viewer {
    accounts(filter: { accountTag: "<ACCOUNT_ID>" }) {
      streamMinutesViewedAdaptiveGroups(
        limit: 10
        orderBy: [sum_minutesViewed_DESC]
        filter: { date_geq: "2022-04-01", date_lt: "2022-05-01" }
      ) {
        sum {
          minutesViewed
        }
        dimensions {
          creator
        }
      }
    }
  }
}
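To run this query outside the dashboard, you can POST it to Cloudflare’s GraphQL endpoint. A minimal sketch, assuming an API token with Analytics read permission:

# language: Shell
curl -X POST "https://api.cloudflare.com/client/v4/graphql" \
  -H "Authorization: Bearer <API_TOKEN>" \
  -H "Content-Type: application/json" \
  --data '{"query":"query { viewer { accounts(filter: {accountTag: \"<ACCOUNT_ID>\"}) { streamMinutesViewedAdaptiveGroups(limit: 10, orderBy: [sum_minutesViewed_DESC], filter: {date_geq: \"2022-04-01\", date_lt: \"2022-05-01\"}) { sum { minutesViewed } dimensions { creator } } } } }"}'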

Stream is focused on helping creator platforms innovate and scale. Matt Ober, CTO of NFT media management platform Piñata, says "By allowing us to upload and then query using creator IDs, large-scale analytics of Cloudflare Stream is about to get a lot easier."

Getting Started

Read the docs to learn more about setting the creator property on new and previously uploaded videos. You can also set the creator property on live inputs, so the recorded videos generated from the live event will already have the creator field populated.
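As a hedged sketch, creating a live input with the creator field set might look like this; the placement of the creator field here is an assumption for illustration, so check the Stream docs for the authoritative request shape:

# language: Shell
# Sketch: create a live input whose recordings carry a creator ID.
# The top-level "creator" field is an assumption, not a confirmed API shape.
curl -X POST "https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/stream/live_inputs" \
  -H "Authorization: Bearer <API_TOKEN>" \
  -H "Content-Type: application/json" \
  --data '{"meta":{"name":"weekly-yoga-class"},"recording":{"mode":"automatic"},"creator":"<CREATOR_ID>"}'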

Being able to filter analytics is just the beginning. We can’t wait to enable more creator-level operations, so you can spend more time on what makes your idea unique and less time maintaining table stakes infrastructure.

Categories: Technology

Announcing Pub/Sub: Programmable MQTT-based Messaging

Thu, 12/05/2022 - 13:59

One of the underlying questions that drives Platform Week is “how do we enable developers to build full stack applications on Cloudflare?”. With Workers as a serverless environment for easily deploying distributed-by-default applications, KV and Durable Objects for caching and coordination, and R2 as our zero-egress cost object store, we’ve continued to discuss what else we need to build to help developers both build new apps and/or bring existing ones over to Cloudflare’s Developer Platform.

With that in mind, we’re excited to announce the private beta of Cloudflare Pub/Sub, a programmable message bus built on the ubiquitous and industry-standard MQTT protocol supported by tens of millions of existing devices today.

In a nutshell, Pub/Sub allows you to:

  • Publish event, telemetry or sensor data from any MQTT capable client (and in the future, other client-facing protocols)
  • Write code that can filter, aggregate and/or modify messages as they’re published to the broker using Cloudflare Workers, and before they’re distributed to subscribers, without the need to ferry messages to a single “cloud region”
  • Push events from applications in other clouds, or from on-prem, with Pub/Sub acting as a programmable event router or a hook into persistent data storage (such as R2 or KV)
  • Move logic out of the client, where it can be hard (or risky!) to push updates, or where running code on devices raises the bill of materials cost (CPU, memory), while still keeping latency as low as possible (your code runs in every location).

And there’s likely a long list of things we haven’t even predicted yet. We’ve seen developers build incredible things on top of Cloudflare Workers, and we’re excited to see what they build with the power of a programmable message bus like Pub/Sub, too.

Why, and what is, MQTT?

If you haven’t heard of MQTT before, you might be surprised to know that it’s one of the most pervasive “messaging protocols” deployed today. There are tens of millions (at least!) of devices that speak MQTT today, from connected payment terminals through to autonomous vehicles, cell phones, and even video games. Sensor readings, telemetry, financial transactions and/or mobile notifications & messages are all common use-cases for MQTT, and the flexibility of the protocol allows developers to make trade-offs around reliability, topic hierarchy and persistence specific to their use-case.

We chose MQTT as the foundation for Cloudflare Pub/Sub because we believe in building on top of open, accessible standards, as we did when we chose the Service Worker API as the foundation for Workers, and with our recently announced participation in the Winter Community Group around server-side runtime APIs. We also wanted to give existing clients an easy path to benefit from Cloudflare’s scale and programmability, and to ensure that developers have a rich ecosystem of client libraries in languages they’re familiar with today.

Beyond that, however, we also think MQTT meets the needs of a modern “publish-subscribe” messaging service. It has flexible delivery guarantees, TLS for transport encryption (no bespoke crypto!), a scalable topic creation and subscription model, extensible per-message metadata, and importantly, it provides a well-defined specification with clear error messages.

With that in mind, we expect to support many more “on-ramps” to Pub/Sub: a lot of the best parts of MQTT can be abstracted away from clients who might want to talk to us over HTTP or WebSockets.

Building Blocks

Given the ability to write code that acts on every message published to a Pub/Sub Broker, what does it look like in practice?

Here’s a simple-but-illustrative example of handling Pub/Sub messages directly in a Worker. We have clients (in this case, payment terminals) reporting back transaction data, and we want to capture the number of transactions processed in each region, so we can track transaction volumes over time.

Specifically, we:

  1. Filter on a specific topic prefix for messages we care about
  2. Parse the message for a specific key:value pair as a metric
  3. Write that metric directly into Workers Analytics Engine, our new serverless time-series analytics service, so we can directly query it with GraphQL.

This saves us having to stand up and maintain an external metrics service, configure another cloud service, or think about how it will scale: we can do it all directly on Cloudflare.

# language: TypeScript
async function pubsub(
  messages: Array<PubSubMessage>,
  env: any,
  ctx: ExecutionContext
): Promise<Array<PubSubMessage>> {
  for (let msg of messages) {
    // Extract a value from the payload and write it to Analytics Engine.
    // In this example, a transactionsProcessed counter that our clients are
    // sending back to us.
    if (msg.topic.startsWith("/transactions/")) {
      // This is non-blocking, and doesn't hold up our message processing.
      env.TELEMETRY.writeDataPoint({
        // We label this metric so that we can query against these labels.
        labels: [
          `${msg.broker}.${msg.namespace}`,
          msg.payload.region,
          msg.payload.merchantId,
        ],
        metrics: [msg.payload.transactionsProcessed ?? 0],
      });
    }
  }

  // Return our messages back to the Broker.
  return messages;
}

const worker = {
  async fetch(req: Request, env: any, ctx: ExecutionContext) {
    // Critical: you must validate the incoming request is from your Broker.
    // (isValidBrokerRequest is defined elsewhere.) In the future, Workers will
    // be able to do this on your behalf for Workers in the same account as
    // your Pub/Sub Broker.
    if (await isValidBrokerRequest(req)) {
      // Parse the incoming PubSub messages.
      let incomingMessages: Array<PubSubMessage> = await req.json();

      // Pass the messages to our pubsub handler, and capture the returned
      // messages.
      let outgoingMessages = await pubsub(incomingMessages, env, ctx);

      // Re-serialize the messages and return a HTTP 200 so our Broker knows
      // we've successfully handled them.
      return new Response(JSON.stringify(outgoingMessages), { status: 200 });
    }
    return new Response("not a valid Broker request", { status: 403 });
  },
};

export default worker;

We can then query these metrics directly using a familiar language: SQL. Our query takes the metrics we’ve written and gives us a breakdown of transactions processed by our payment devices, grouped by merchant (and again, all on Cloudflare):

SELECT
  label_2 AS region,
  label_3 AS merchantId,
  sum(metric_1) AS total_transactions
FROM TELEMETRY
WHERE
  metric_1 > 0
  AND timestamp >= now() - 604800
GROUP BY region, merchantId
ORDER BY total_transactions DESC
LIMIT 10

You could replace or augment the calls to Analytics Engine with any number of examples:

  • Asynchronously writing messages (using ctx.waitUntil) on specific topics to our R2 object storage without blocking message delivery (sketched after this list)
  • Rewriting messages on-the-fly with data populated from KV, before the message is pushed to subscribers
  • Aggregate messages based on their payload and HTTP POST them to legacy infrastructure hosted outside of Cloudflare
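Here is a minimal sketch of the first idea above: archiving messages on an /archive/ topic prefix to R2 without blocking delivery. The ARCHIVE binding name and the key scheme are assumptions for this example:

# language: TypeScript
async function pubsub(
  messages: Array<PubSubMessage>,
  env: any,
  ctx: ExecutionContext
): Promise<Array<PubSubMessage>> {
  for (let msg of messages) {
    if (msg.topic.startsWith("/archive/")) {
      // ARCHIVE is an assumed R2 bucket binding configured on this Worker.
      // waitUntil lets the write finish after we respond to the Broker,
      // so message delivery is not blocked on object storage latency.
      const key = `${msg.namespace}/${msg.topic}/${Date.now()}.json`;
      ctx.waitUntil(env.ARCHIVE.put(key, JSON.stringify(msg.payload)));
    }
  }
  // Returning the messages unmodified lets them flow on to subscribers.
  return messages;
}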

Pub/Sub gives you a way to get data into Cloudflare’s network, filter, aggregate and/or mutate it, and push it back out to subscribers, whether there are 10, 1,000 or 10,000 of them listening on that topic.

Where are we headed?

As we often like to say: we’re just getting started. The private beta for Pub/Sub is just the beginning of our journey, and we have a long list of capabilities we’re already working on.

Critically, one of our priorities is to cover as much of the MQTT v5.0 specification as we can, so that customers can migrate existing deployments and have them “just work”. Useful capabilities like shared subscriptions that allow you to load-balance messages across many subscribers; wildcard subscriptions (both single- and multi-tier) for aggregation use cases; stronger delivery guarantees (QoS); and support for additional authentication modes (specifically, mutual TLS) are just a few of the things we’re working on.

Beyond that, we’re focused on making sure Pub/Sub’s developer experience is the best it can be, and during the beta we’ll be:

  • Supporting a new set of “pubsub” sub-commands in Wrangler, our developer CLI, so that getting started is as low-friction as possible
  • Building ‘native’ bindings (similar to how Workers KV operates) that allow you to publish messages and subscribe to topics directly from Worker code, regardless of whether the message originates from (or is destined for) a client beyond Cloudflare
  • Exploring more ways to publish & subscribe from non-MQTT based clients, including HTTP requests and WebSockets, so that integrating existing code is even easier.

Our developer documentation will cover these capabilities as we land them.

We’re also aware that pricing is a huge part of developer experience, and are committed to ensuring that there is an accessible and flexible free tier. We want to enable developers to experiment, prototype and solve problems we haven’t thought of yet. We’ll be sharing more on pricing during the course of the beta.

Getting Started

If you want to start using Pub/Sub, sign up for the private beta: we plan to start enabling access within the next month. We’re looking forward to collecting feedback from developers and seeing what folks start to build.

In the meantime, review the brand-new Pub/Sub developer documentation to understand how Pub/Sub works under the hood, the MQTT protocol, and how it integrates with Cloudflare Workers.

Categories: Technology

How we built config staging and versioning with HTTP applications

Thu, 12/05/2022 - 13:58

Last December, we announced a closed beta of a new product, HTTP Applications, giving customers the ability to better control their L7 Cloudflare configuration with versioning and staging capabilities. Today, we are expanding this beta to all enterprise customers who want to participate. In this post, I will talk about some of the improvements that have landed and go into more detail about how this product works.

HTTP Applications

A quick recap of what HTTP Applications are and what they can do. For a deeper dive on how to use them, see the previous blog post.

As previously mentioned: HTTP Applications are a way to manage configuration by use case, rather than by hostname. Each HTTP Application has a purpose, whether that is handling the configuration of your marketing website or of an internal application. Each HTTP Application consists of a set of versions, each of which represents a snapshot of settings for managing traffic: Page Rules, Firewall Rules, cache settings, and so on. Each version of configuration inside the HTTP Application is independent of the others, and when a new version is created, it is initialized as a copy of the version that preceded it.

An HTTP Application can be represented with the following diagram:


Each HTTP Application is sourced from an existing zone. That zone’s current configuration is copied to instantiate the first version of the HTTP Application. After that, changes made to the zone and changes made to version 1 are independent of each other. Versions themselves don’t affect any traffic for a zone until they are deployed via the use of Routing Rules.

Routing Rules

Unlike zones, each version of an HTTP Application is independent of any specific hostname. So if versions are not tied to a hostname, like zones, then how do you decide which version of an HTTP Application will affect a specific set of traffic? The answer is Routing Rules. With Routing Rules, you get to decide which version of an HTTP Application is applied to traffic. Routing Rules are powered by Cloudflare’s Ruleset Engine and rely on the use of conditional “if, then” rules to map hostnames controlled in your Cloudflare account to a version of configuration. As an example, a rule could be:

If zone.name = `example.com` Then use configuration of HTTP Application id: xyz, version 2

When this rule executes on our global network, instead of applying the regular zone configuration of example.com, we will instead use the configuration defined in version 2 of the HTTP Application.

Expanding the previous diagram we get the following:


The combination of Routing Rules and HTTP Applications means you can ‘stage’ a set of changes, via a version, without requiring a separate staging zone as has been required in the past. Cloudflare will provide you with specific IPs that can be used to test the configuration before rolling it out to production. This means you can catch misconfigurations in rules or other settings before they impact your customers.

How HTTP Applications and Routing Rules work

Let’s break down how this all works behind the scenes and gives you a safe way to test changes to your configuration. In all of Cloudflare’s data centers around the world, every request is first inspected and associated with an account/config pair so that we know what configuration settings we should apply to this request. If you have the zone ‘example.com’, with an id of 123, in your account, with an id of 777, then when a request for example.com/cat.jpg arrives at the Cloudflare network, the ownership lookup will return a value like 777:123 which then denotes the account and config settings we should use to process that request.

When HTTP Applications and Routing Rules are being used, the ownership lookup occurs as normal, but instead of loading configuration based on the zone for the account:config pair, Cloudflare does one additional lookup to see if any routing rules are in place that would change which configuration should be used. If a rule exists like before:

If zone.name = `example.com` Then use configuration of HTTP Application id: xyz, version 2

Then when ownership is evaluated, instead of loading configuration for account:config 777:123, Cloudflare will load the configuration of that version of the HTTP Application. Let’s say that version 2 from the rule has a config id of 456; the lookup value for loading configuration will instead be 777:456.
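To make the two-step lookup concrete, here is an illustrative sketch; the types and maps are assumptions for the example, not Cloudflare’s internal code:

# language: TypeScript
// Illustrative only: not Cloudflare's internal implementation.
type OwnershipKey = { accountId: number; configId: number };

function resolveConfig(
  hostname: string,
  ownership: Map<string, OwnershipKey>, // e.g. "example.com" -> 777:123
  routingRules: Map<string, number>     // e.g. "example.com" -> config 456
): OwnershipKey | undefined {
  // Step 1: the normal ownership lookup.
  const base = ownership.get(hostname);
  if (!base) return undefined;

  // Step 2: check for a routing rule pointing this zone at an
  // HTTP Application version's configuration.
  const override = routingRules.get(hostname);
  return override !== undefined
    ? { accountId: base.accountId, configId: override } // 777:456
    : base;                                             // 777:123
}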


Because Routing Rules are implemented with the Ruleset Engine, we can implement a special type of rule that allows a version to be staged, such that it executes only for requests sent to IPs reserved for testing. The resulting diagram is almost the same, but because the request is being sent to staging IPs, Cloudflare’s network will route that request to a different version of the HTTP Application, one with a set of changes not yet deployed for all other traffic.


This is what enables you to safely test a set of changes and then deploy the exact same configuration to all traffic at once. If anything goes wrong when testing in staging or rolling out to production, you can simply roll the configuration back to the previously deployed version. No need to hunt down which settings may have changed: that investigation can be done after the issue has been resolved through a quick, one-click rollback.

Now available for Enterprise Customers

HTTP Applications and Routing Rules put power and safety in customers’ hands so that configuration changes can be made more easily. When issues do arise, they can be resolved quickly through rollbacks. We will continue to expand the capabilities offered throughout the year, but if you are interested in trying it out now and are an enterprise customer, talk to your account manager to get access!

Categories: Technology

Introducing Custom Domains for Workers

Thu, 12/05/2022 - 13:58

Today, we’re happy to announce Custom Domains for Workers. Custom Domains allow you to hook up a domain to your Worker without having to fuss about certificates, origin servers or DNS: it just works. Let’s take a look at how we built Custom Domains and how you can use them.

The magic of Cloudflare DNS

Under the hood, we’re leveraging Cloudflare DNS to register your Worker as the origin for your domain. All you need to do is head to your Worker, go to the Triggers tab, and click Add Custom Domain. Cloudflare will handle creating the DNS record and issuing a certificate on your behalf. In seconds, your domain will point to your Worker, and all you need to worry about is writing your code. We’ll also help guide you through the process of creating these new records and replace any existing ones. We built this with a straightforward ethos in mind: we should be clear and transparent about actions we’re taking, and make it easy to understand.

We’ve made a few welcome changes when you’re using a Custom Domain on your Worker. First off, when you send a request to any path on that Custom Domain, your Worker will be triggered. No need to create a route with /* at the end.


Second, your Custom Domain Worker is considered the ‘origin server’. That means there’s no need to `fetch(event.request)` once you’re in that Worker; instead, talk to any internal or external services you need by creating request objects in your code, or talk to other Workers services using any of our available bindings. We’ve increased the limit of external requests you can make when using our Unbound usage model to 1,000. You can talk to any services you’d like: payment, communication, analytics, or tracking services come to mind, not to mention your databases. If that’s not enough for your use case, feel free to reach out via Discord or support, and we’ll happily chat.


Finally, what if you need to talk to your Worker from another one? Since these Workers act as an origin server, you can just send a normal request to the associated endpoint, and we’ll invoke that Worker – even if it’s on the same Cloudflare zone.
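For example, here is a minimal sketch of one Worker calling another Worker through its Custom Domain; the hostname is a placeholder:

# language: TypeScript
export default {
  async fetch(request: Request): Promise<Response> {
    // api.example.com is a placeholder Custom Domain attached to another Worker.
    // Because that Worker is the origin for its domain, a plain fetch invokes
    // it, even from a Worker on the same Cloudflare zone.
    return await fetch('https://api.example.com/login', {
      method: 'POST',
      body: await request.text(),
    });
  },
};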

Let’s build an example application

We’ll start by writing our application. I have an example that I’ve called api-gateway below. It checks the incoming request path, and delegates work to downstream Workers using Service Bindings. For any privileged endpoints, it performs an authorization check before executing that code:

export default {
  async fetch(request, environment) {
    const url = new URL(request.url);
    switch (url.pathname) {
      case '/login':
        return await environment.login.fetch(request);
      case '/logout':
        return await environment.logout.fetch(request);
      case '/admin': {
        // Check that the "Authorization" header is sent when authenticated.
        const authCheck = await environment.auth.fetch(request.clone());
        if (authCheck.status != 200) {
          return authCheck;
        }
        // If the auth check passes, send the request to the /admin endpoint
        return await environment.admin.fetch(request);
      }
      case '/robots.txt':
        return new Response(null, { status: 204 });
    }
    return new Response('Not Found.', { status: 404 });
  }
}

Now that I have a working application, I want to serve it on my Custom Domain. To hook this up, head over to Workers, Triggers, click ‘Add Custom Domain’, and type in your desired hostname. You’ll be guided through a simple workflow to generate your new Worker record, and your Worker will be the target.

Best of all, with Custom Domains you can reap the performance benefits of DNS-routable Workers; Cloudflare never has to look through a routing table to invoke your Worker. And, by leveraging Service Bindings, you can customize your routing to your heart’s content – using URL parameters, headers, request content, or query strings to appropriately invoke the right Worker at the right time.

We’re excited to see what you build with Custom Domains. Custom Domains are available in an open beta starting today. Support is built right into the Cloudflare Dashboard and APIs, and CLI support via Wrangler is coming soon.

Categories: Technology
