Blogroll Category: Technology
I read blogs, as well as write one. The 'blogroll' on this site reproduces some posts from some of the people I enjoy reading. There are currently 38 posts from the category 'Technology.'
Disclaimer: Reproducing an article here does not necessarily imply agreement or endorsement!
Group - Less critical - Access bypass - SA-CONTRIB-2023-054
The Group module has the ability to make content private to specific groups. When viewing a list of entities, e.g. nodes, a visitor should only see those entities that are either not attached to a group or that they have group access to.
The module doesn't sufficiently enforce list access under the scenario where two users have the same outsider and insider permissions, but are members of different groups without any individual roles being assigned to said memberships. In such a scenario, the permissions hash for both will be the same even though it should differ.
This vulnerability is mitigated by the fact that an attacker must have the same hash as someone else, which is quite rare yet not unthinkable.
Solution: Install the latest version:
- Sites using Group version 2 should upgrade to Group v2.2.2
- Sites using Group version 3 should upgrade to Group v3.2.2
- Damien McKenna of the Drupal Security Team
- Greg Knaddison of the Drupal Security Team
Xsendfile - Moderately critical - Access bypass - SA-CONTRIB-2023-053
The Xsendfile module enables fast transfer for private files in Drupal.
In order to control private file downloads, the module overrides ImageStyleDownloadController, for which a vulnerability was disclosed in SA-CORE-2023-005. The Xsendfile module was still based on an insecure version of ImageStyleDownloadController.
Solution: Install the latest version:
- If you use the Xsendfile module for Drupal 8.x, upgrade to Xsendfile 8.x-1.2.
- Greg Knaddison of the Drupal Security Team
PHP 8.1.26 Released!
PHP 8.3.0 Released!
PHP 8.2.13 Released!
International PHP Conference Berlin 2024
Mollie for Drupal - Moderately critical - Faulty payment confirmation logic - SA-CONTRIB-2023-052
This module enables you to pay online via Mollie.
The module might not properly load the correct order to update the payment status when Mollie redirects to the redirect URL. This can allow an attacker to apply other people's orders to their own, getting credit without paying.
This vulnerability is mitigated by the fact that an attacker must have some knowledge about the module's internal functionality. The issue only affects installations that use the Mollie for Drupal Commerce submodule.
Solution: Install the latest version:
- If you use the Mollie for Drupal module, upgrade to Mollie for Drupal 2.2.1.
- Greg Knaddison of the Drupal Security Team
- xjm of the Drupal Security Team
PHP 8.3.0 RC 6 available for testing
GraphQL - Moderately critical - Cross Site Request Forgery - SA-CONTRIB-2023-051
The GraphQL module enables you to build GraphQL APIs which can include data fetching through Queries and data updates (create, update, delete) through mutations.
The module does not sufficiently validate incoming requests that are made from domains other than the one serving the GraphQL endpoint. If a user visits a malicious site, that site may make requests on the user's behalf, which can lead to the execution of mutations, exposing a CSRF vulnerability. Whether data is returned to the malicious site depends on your site's CORS configuration.
This vulnerability is mitigated by the fact that a user with access to the API must have an active session cookie while visiting a malicious site. This vulnerability is also mitigated by restricting session cookies with the SameSite attribute (see solution below).
Solution: Install the latest version:
- If you use the GraphQL module v4, upgrade to GraphQL 8.x-4.6
- If you use the GraphQL module v3, upgrade to GraphQL 8.x-3.4
This vulnerability can also be mitigated by setting the SameSite attribute on session cookies to Lax (recommended) or Strict. This might not be suitable for sites that need to share the Drupal session cookie in some way with other sites. Set the following in your site's services.yml file:
```yaml
parameters:
  session.storage.options:
    # Session cookies are only used for backend admin accounts, so we restrict
    # the cookies to be used only from the backend origin. We don't use "Strict"
    # because that also removes cookies whenever an admin navigates from an
    # email or chat app, which is inconvenient. See
    # https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#samesitesamesite-value
    cookie_samesite: Lax
```
Reported By:
Fixed By:
- Sam Becker
- Klaus Purer
- Alexander Varwijk
- Luis
- Lee Rowlands of the Drupal Security Team
- Greg Knaddison of the Drupal Security Team
- Damien McKenna of the Drupal Security Team
GraphQL - Moderately critical - Access bypass - SA-CONTRIB-2023-050
This module lets you craft and expose a GraphQL schema for Drupal 9 and 10.
The module currently does not adequately verify whether a given user has the necessary permissions to access an entity's label, creating an access bypass vulnerability.
This vulnerability is mitigated by the fact that entity view and entity label access are usually handled by the same access check; developers have to opt in to supporting different logic on entity types. Additionally, your schema must make use of the EntityLabel DataProducer to be affected.
Solution: Install the latest version:
- If you use the GraphQL module v4, upgrade to GraphQL 8.x-4.6
- If you use the GraphQL module v3, upgrade to GraphQL 8.x-3.4
- Greg Knaddison of the Drupal Security Team
Paragraphs admin - Moderately critical - Access bypass - SA-CONTRIB-2023-049
This module enables you to view all paragraph entities in an admin view.
The module contains an access bypass vulnerability that allows non-admin users to access the view.
The vulnerability can be mitigated by editing the view to change the permission required to access the page.
Solution: Install the latest version:
- If you use the paragraphs_admin module for Drupal 8.x, upgrade to paragraphs_admin 8.x-1.5
- Lee Rowlands of the Drupal Security Team
- Greg Knaddison of the Drupal Security Team
- Damien McKenna of the Drupal Security Team
Drupal 9 is end of life - PSA-2023-11-01
Drupal 9 relies on several other software projects, including Symfony, CKEditor, and Twig. With Symfony 4's end of life, CKEditor 4's end of life, and Twig 2's end of life all coming up soon, Drupal 9 went end of life on November 1st, 2023. There will be no further releases of Drupal 9.
Two changes for Drupal contributed projects will occur before the end of January 2024. One is that the automated testing platform DrupalCI support for Drupal 9 will stop. The other is that release branches of contributed projects that only support Drupal 9 will be marked unsupported (see the tracking issue for details).
Thanks to everyone who helped create and maintain Drupal 9.
It is time to update to Drupal 10 compatible releases
Check the documentation on updating a site to Drupal 10.
If a contributed project is not yet compatible with Drupal 10, now is a good time to update it. Check for existing Drupal 10 compatibility issues relevant for your projects.
If your project is already compatible with Drupal 10 but does not yet have a stable release, please tag a release once you are confident in your project's stability. Where possible, tag a minor release supporting both Drupal 9 and 10 to ensure users have a smooth upgrade path.
PHP 8.1.25 Released!
PHP 8.2.12 Released!
PHP 8.3.0 RC 5 available for testing
Southeast PHP Conference
Reduce origin load, save on cloud egress fees, and maximize cache hits with Cache Reserve


Earlier this year, we introduced Cache Reserve. Cache Reserve helps users serve content from Cloudflare’s cache for longer by using R2’s persistent data storage. Serving content from Cloudflare’s cache benefits website operators by reducing their bills for egress fees from origins, while also benefiting website visitors by having content load faster.
Cache Reserve has been in closed beta for a few months while we’ve collected feedback from our initial users and continued to develop the product. After several rounds of iterating on this feedback, today we’re extremely excited to announce that Cache Reserve is graduating to open beta – users will now be able to test it and integrate it into their content delivery strategy without any additional waiting.
If you want to see the benefits of Cache Reserve for yourself and give us some feedback, you can go to the Cloudflare dashboard, navigate to the Caching section, and enable Cache Reserve by pushing one button.
How does Cache Reserve fit into the larger picture?
Content served from Cloudflare's cache begins its journey at an origin server, where the content is hosted. When a request reaches the origin, the origin compiles the content needed for the response and sends it back to the visitor.
The distance between the visitor and the origin can affect the performance of the asset as it may travel a long distance for the response. This is also where the user is charged a fee to move the content from where it’s stored on the origin to the visitor requesting the content. These fees, known as “bandwidth” or “egress” fees, are familiar monthly line items on the invoices for users that host their content on cloud providers.

Cloudflare's CDN sits between the origin and visitor and evaluates the origin's response to see if it can be cached. If it can be added to Cloudflare's cache, then the next time a request comes in for that content, Cloudflare can respond with the cached asset, which means there's no need to send the request to the origin, reducing egress fees for our customers. We also cache content in data centers close to the visitor to improve performance and cut down on the transit time for a response.
To help assets remain cached for longer, a few years ago we introduced Tiered Cache which organizes all of our 250+ global data centers into a hierarchy of lower-tiers (generally closer to visitors) and upper-tiers (generally closer to origins). When a request for content cannot be served from a lower-tier’s cache, the upper-tier is checked before going to the origin for a fresh copy of the content. Organizing our data centers into tiers helps us cache content in the right places for longer by putting multiple caches between the visitor’s request and the origin.
Why do cache misses occur?
Misses occur when Cloudflare cannot serve the content from cache and must go back to the origin to retrieve a fresh copy. This can happen when a customer sets the cache-control time to signify when the content is out of date (stale) and needs to be revalidated. The other element at play – how long the network wants content to remain cached – is a bit more complicated and can fluctuate depending on eviction criteria.
CDNs must consider whether they need to evict content early to optimize storage of other assets when cache space is full. At Cloudflare, we prioritize eviction based on how recently a piece of cached content was requested by using an algorithm called “least recently used” or LRU. This means that even if cache-control signifies that a piece of content should be cached for many days, we may still need to evict it earlier (if it is least-requested in that cache) to cache more popular content.
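The LRU policy described above can be sketched as a small map-based cache. This is a toy illustration only, not Cloudflare's actual cache implementation; it relies on the fact that a JavaScript Map preserves insertion order, so the first key is always the least recently used.

```typescript
// Toy LRU cache: a Map preserves insertion order, so the first key is the
// least recently used entry. Illustrative sketch of the eviction policy
// described above, not Cloudflare's implementation.
class LruCache<K, V> {
  private entries = new Map<K, V>();
  constructor(private capacity: number) {}

  get(key: K): V | undefined {
    const value = this.entries.get(key);
    if (value !== undefined) {
      // Re-insert to mark the entry as most recently used.
      this.entries.delete(key);
      this.entries.set(key, value);
    }
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.capacity) {
      // Evict the least recently used entry, even if its cache-control
      // lifetime has not expired yet -- exactly the behavior that can
      // surprise users with an unexpected cache miss.
      const lru = this.entries.keys().next().value as K;
      this.entries.delete(lru);
    }
    this.entries.set(key, value);
  }
}
```

Note how a popular asset keeps refreshing its position on every `get`, while a rarely requested asset drifts to the front of the eviction queue regardless of its cache-control header.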
This works well for most customers and website visitors, but is often a point of confusion for people wondering why content is unexpectedly displaying a miss. If eviction did not happen then content would need to be cached in data centers that were further away from visitors requesting that data, harming the performance of the asset and injecting inefficiencies into how Cloudflare’s network operates.
Some customers, however, have large libraries of content that may not be requested for long periods of time. Using the traditional cache, these assets would likely be evicted and, if requested again, served from the origin. Keeping assets in cache requires that they remain popular on the Internet which is hard given what’s popular or current is constantly changing. Evicting content that becomes cold means additional origin egress for the customer if that content needs to be pulled repeatedly from the origin.

Enter Cache Reserve
This is where Cache Reserve shines. Cache Reserve serves as the ultimate upper-tier data center for content that might otherwise be evicted from cache. Once admitted to Cache Reserve, content can be stored for a much longer period of time: 30 days by default. If another request comes in during that period, retention is extended for another 30 days (and so on), or until cache-control signifies that we should no longer serve that content from cache. Cache Reserve serves as a safety net to backstop all cacheable content, so customers don't have to worry about unwanted cache eviction and origin egress fees.
The promise of Cache Reserve is that hit ratios will increase and egress fees from origins will decrease for long tail content that is rarely requested and may be evicted from cache.
However, there are additional egress savings built into the product. For example, objects are written to Cache Reserve on misses. This means that when we fetch content from the origin on a cache miss, we both use it to respond to the request and write the asset to Cache Reserve, so customers won't incur egress from serving that asset again for a long time.
Cache Reserve is designed to be used with tiered cache enabled for maximum origin shielding. When there is a cache miss in both the lower and upper tiers, Cache Reserve is checked and if there is a hit, the response will be cached in both the lower and upper tier on its way back to the visitor without the origin needing to see the request or serve any additional data.
Cache Reserve accomplishes these origin egress savings for a low price, based on R2 costs. For more information on Cache Reserve prices and operations, please see the documentation here.
Scaling Cache Reserve on Cloudflare's developer platform
When we first announced Cache Reserve, the response was overwhelming. Over 20,000 users wanted access to the beta, and we quickly made several interesting discoveries about how people wanted to use Cache Reserve.
The first big challenge we found was that users hated egress fees as much as we do and wanted to make sure that as much content as possible was in Cache Reserve. During the closed beta we saw sustained usage above 8,000 PUT operations per second, with objects served at a rate of over 3,000 GETs per second. We were also caching around 600 TB for some of our large customers. We knew that we wanted to open the product up to anyone who wanted to use it, and in order to scale to meet this demand, we needed to make several changes quickly. So we turned to Cloudflare's developer platform.
Cache Reserve stores data on R2 using its S3-compatible API. Under the hood, R2 handles all the complexity of an object storage system using our performant and scalable developer primitives: Workers and Durable Objects. We decided to use developer platform tools because it would allow us to implement different scaling strategies quickly. The advantage of building on the Cloudflare developer platform is that Cache Reserve was easily able to experiment to see how we could best distribute the high load we were seeing, all while shielding the complexity of how Cache Reserve works from users.
With the single press of a button, Cache Reserve performs these functions:
- On a cache miss, Pingora (our new L7 proxy) reaches out to the origin for the content and writes the response to R2. This happens while the content continues its trip back to the visitor (thereby avoiding needless latency).
- Inside R2, a Worker writes the content to R2’s persistent data storage while also keeping track of the important metadata that Pingora sends about the object (like origin headers, freshness values, and retention information) using Durable Objects storage.
- When the content is next requested, Pingora looks up where the data is stored in R2 by computing the cache key. The cache key’s hash determines both the object name in R2 and which bucket it was written to, as each zone’s assets are sharded across multiple buckets to distribute load.
- Once found, Pingora attaches the relevant metadata and sends the content from R2 to the nearest upper-tier to be cached, then to the lower-tier and finally back to the visitor.
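The lookup step above hinges on the cache key's hash determining both the object name in R2 and the bucket it was written to. A minimal sketch of that sharding idea follows; the FNV-1a hash, the bucket naming scheme, and the function names are illustrative assumptions, not Cloudflare's actual internals.

```typescript
// Illustrative sketch of cache-key sharding: a hash of the cache key picks
// both the R2 object name and which of the zone's buckets holds the asset,
// spreading load across buckets. The FNV-1a hash and naming scheme here
// are assumptions for illustration, not Cloudflare's real implementation.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

function shardForCacheKey(cacheKey: string, bucketCount: number) {
  const hash = fnv1a(cacheKey);
  return {
    objectName: hash.toString(16),                // object name within R2
    bucket: `zone-bucket-${hash % bucketCount}`,  // which bucket to query
  };
}
```

Because the mapping is deterministic, any data center can recompute the same object name and bucket from the cache key alone, with no central lookup table to consult.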

This is magic! None of the above needs to be managed by the user. By bringing together R2, Workers, Durable Objects, Pingora, and Tiered Cache we were able to quickly build and make changes to Cache Reserve to scale as needed…
What's next for Cache Reserve
In addition to the work we've done to scale Cache Reserve, opening the product up also opens the door to more features and integrations across Cloudflare. We plan on putting additional analytics and metrics in the hands of Cache Reserve users, so they know precisely what's in Cache Reserve and how much egress it's saving them. We also plan on building out more complex integrations with R2 so if customers want to begin managing their storage, they are able to easily make that transition. Finally, we're going to be looking into providing more options for customers to control precisely what is eligible for Cache Reserve. These features represent just the beginning for how customers will control and customize their cache on Cloudflare.
What's some of the feedback been so far?

"As a long time Cloudflare customer, we were eager to deploy Cache Reserve to provide cost savings and improved performance for our end users. Ensuring our application always performs optimally for our global partners and delivery riders is a primary focus of Delivery Hero. With Cache Reserve, our cache hit ratio improved by 5%, enabling us to scale back our infrastructure, simplify what is needed to operate our global site, and provide additional cost savings."
– Wai Hang Tang, Director of Engineering at Delivery Hero

"Anthology uses Cloudflare's global cache to drastically improve the performance of content for our end users at schools and universities. By pushing a single button to enable Cache Reserve, we were able to provide a great experience for teachers and students and reduce two-thirds of our daily egress traffic."
– Paul Pearcy, Senior Staff Engineer at Anthology

"At Enjoei we're always looking for ways to help make our end-user sites faster and more efficient. By using Cloudflare Cache Reserve, we were able to drastically improve our cache hit ratio by more than 10%, which reduced our origin egress costs. Cache Reserve also improved the performance for many of our merchants' sites in South America, which improved their SEO and discoverability across the Internet (Google, Criteo, Facebook, TikTok) – and it took no time to set it up."
– Elomar Correia, Head of DevOps SRE | Enterprise Solutions Architect at Enjoei

"In the live events industry, the size and demand for our cacheable content can be extremely volatile, which causes unpredictable swings in our egress fees. Additionally, keeping data as close to our users as possible is critical for customer experience in the high traffic and low bandwidth scenarios our products are used in, such as conventions and music festivals. Cache Reserve helps us mitigate both of these problems with minimal impact on our engineering teams, giving us more predictable costs and lower latency than existing solutions."
– Jarrett Hawrylak, VP of Engineering | Enterprise Ticketing at Patron Technology

How can I use it today?
As of today, Cache Reserve is in open beta, meaning that it’s available to anyone who wants to use it.
To use Cache Reserve:
- Simply go to the Caching tile in the dashboard.
- Navigate to the Cache Reserve page and push the enable data sync button (or purchase button).
Enterprise Customers can work with their Cloudflare Account team to access Cache Reserve.
Customers can ensure Cache Reserve is working by looking at the baseline metrics regarding how much data is cached and how many operations we’ve seen in the Cache Reserve section of the dashboard. Specific requests served by Cache Reserve are available by using Logpush v2 and finding HTTP requests with the field “CacheReserveUsed.”
We will continue to make sure that we are quickly triaging the feedback you give us and making improvements to help ensure Cache Reserve is easy to use, massively beneficial, and your choice for reducing egress fees for cached content.

We’ve been so excited to get Cache Reserve in more people’s hands. There will be more exciting developments to Cache Reserve as we continue to invest in giving you all the tools you need to build your perfect cache.
Try Cache Reserve today and let us know what you think.
Indexing millions of HTTP requests using Durable Objects


Our customers rely on their Cloudflare logs to troubleshoot problems and debug issues. One of the biggest challenges with logs is the cost of managing them, so earlier this year, we launched the ability to store and retrieve Cloudflare logs using R2.
In this post, I'll explain how we built the R2 Log Retrieval API using Cloudflare Workers, with a focus on Durable Objects and the Streams API. Using these allows customers to index and query millions of their Cloudflare logs stored in batches on R2.
Before we dive into the internals, you might be wondering why we don't just use a traditional database to index these logs. After all, databases are a well-proven technology. The reason is that individual developers and companies, both large and small, often don't have the resources necessary to maintain such a database and the surrounding infrastructure for this kind of setup.
Our approach instead relies on Durable Objects to maintain indexes of the data stored in R2, removing the complexity of managing and maintaining your own database. It was also super easy to add Durable Objects to our existing Workers code with just a few lines of config and some code. And R2 is very economical.
Indexing
Indexing data is often used to reduce the lookup time for a query by first pre-processing the data and computing an index – usually a file (or set of files) with a known structure that can be used to perform lookups on the underlying data. This approach makes lookups quick, as indexes typically contain the answer for a given query or, at the very least, tell you how to find it. For this project we are going to index records by a unique identifier called a RayID, which our customers use to identify an HTTP request in their logs, but this solution can be modified to index many other types of data.
When indexing RayIDs for logs stored in R2, we chose an index structure that is fairly straightforward and commonly known as a forward index. This type of index consists of a key-value mapping between a document's name and the list of words contained in that document. The terms "document" and "words" are meant to be generic; you get to define what a document and a word are.
In our case, a document is a batch of logs stored in R2 and the words are RayIDs contained within that document. For example, our index currently looks like this:

In order to maintain the state of our index, we chose to use Durable Objects. Each Durable Object has its own transactional key-value storage. Therefore, to store our index, we assign each key to be the document (batch) name and the value being a JSON-array of RayIDs within that document. When extracting the RayIDs for a given batch, updating the index becomes as simple as storage.put(batchName, rayIds). Likewise, getting all the RayIDs for a document is just a call to storage.get(batchName).
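As a sketch, the index logic amounts to a thin wrapper over key-value storage. In a real Worker the storage would be the Durable Object's transactional `state.storage`; here it is typed as a minimal interface so the sketch stays self-contained, and the class and method names are illustrative rather than the actual implementation.

```typescript
// Sketch of the forward-index logic inside a Durable Object: each key is a
// batch (document) name and each value is the array of RayIDs (words) found
// in that batch. KvStorage stands in for the Durable Object's transactional
// state.storage; names are illustrative, not the real code.
interface KvStorage {
  put(key: string, value: unknown): Promise<void>;
  get<T>(key: string): Promise<T | undefined>;
}

class RayIdIndex {
  constructor(private storage: KvStorage) {}

  // Record the RayIDs extracted from one batch: storage.put(batchName, rayIds).
  async indexBatch(batchName: string, rayIds: string[]): Promise<void> {
    await this.storage.put(batchName, rayIds);
  }

  // Check whether a RayID appears in a given batch: storage.get(batchName).
  async contains(batchName: string, rayid: string): Promise<boolean> {
    const rayIds = await this.storage.get<string[]>(batchName);
    return rayIds !== undefined && rayIds.includes(rayid);
  }
}
```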
When performing indexing, since the batches are stored compressed, often with a 30-100x compression ratio, reading an entire batch into memory can lead to out-of-memory (OOM) errors in our Worker. To get around this, we use the Streams API to process the data in smaller chunks. There are two types of streams available: byte-oriented and value-oriented. Byte-oriented streams operate at the byte level for things such as compressing and decompressing data, while value-oriented streams work on first-class JavaScript values: numbers, strings, undefined, null, objects, you name it. If it's a valid JavaScript value, you can process it using a value-oriented stream. The Streams API also allows us to define our own JavaScript transformations for both byte- and value-oriented streams.
So, when our API receives a request to index a batch, our Worker streams the contents from R2 into a pipeline of TransformStreams to decompress the data, decode the bytes into strings, split the data into records based on newlines, and finally collect all the RayIDs. Once we've collected all the RayIDs, the data is persisted to the index by making calls to the Durable Object, which in turn calls the aforementioned storage.put method. The code below details the steps described above.
```typescript
async function index(r2, bucket, key, storage) {
  const obj = await getObject(r2, bucket, key);
  const rawStream = obj.Body as ReadableStream;
  const index: Record<string, string[]> = {};

  // Collect every RayID found in each decoded chunk of the batch.
  const rayIdIndexingStream = new TransformStream({
    transform(chunk: string) {
      for (const match of chunk.matchAll(RAYID_FIELD_REGEX)) {
        const { rayid } = match.groups!;
        if (key in index) {
          index[key].push(rayid);
        } else {
          index[key] = [rayid];
        }
      }
    }
  });

  await collectStream(
    rawStream
      .pipeThrough(new DecompressionStream('gzip'))
      .pipeThrough(textDecoderStream())
      .pipeThrough(readlineStream())
      .pipeThrough(rayIdIndexingStream)
  );
  storage.put(index);
}
```

Searching for a RayID
Once a batch has been indexed, and the index is persisted to our Durable Object, we can then query it using a combination of the RayID and a batch name to check the presence of a RayID in that given batch. Assuming that a zone produces logs at a rate of one batch per second, over 86,400 batches would be produced in a day! Over the course of a week, there would be far too many keys in our index for us to iterate through them all in a timely manner. This is where the encoding of the RayID and the naming of each batch come into play.
A RayID is currently (and I emphasize currently, because this has changed over time and may again) a 16-byte hex encoded value whose first 36 bits encode a timestamp; it looks something like 764660d8ec5c2ab1. The format of the RayID is likely to evolve in the near future, but for now we can use that timestamp to optimize retrieval.
Each batch produced by Logpush also happens to encode the time the batch was started and completed. Last but not least, upon analysis of our logging pipeline we found that 95% of RayIDs can be found in a batch produced within five minutes of the request completing. (Note that the encoded time sets a lower bound of the batch time we need to search).


For example: say we have a request that was made on November 3 at 16:00:00 UTC. We only need to check the batches under the prefix 20221103 and those batches that contain the time range of 16:00:00 to 16:05:00 UTC.
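That narrowing step can be sketched as follows. The YYYYMMDD prefix and five-minute window follow the example above (the window comes from the 95% figure), while the function name and return shape are illustrative assumptions.

```typescript
// Sketch: derive the R2 listing prefix and time window to search for a
// RayID, given the request timestamp decoded from its first 36 bits. The
// YYYYMMDD prefix format and five-minute window follow the example in the
// text; the helper name and return shape are illustrative.
function searchWindow(requestTime: Date, windowMinutes = 5) {
  const yyyy = requestTime.getUTCFullYear().toString();
  const mm = String(requestTime.getUTCMonth() + 1).padStart(2, '0');
  const dd = String(requestTime.getUTCDate()).padStart(2, '0');
  return {
    prefix: `${yyyy}${mm}${dd}`, // e.g. "20221103"
    start: requestTime,          // the request time lower-bounds the batch time
    end: new Date(requestTime.getTime() + windowMinutes * 60_000),
  };
}
```

For the November 3, 16:00:00 UTC example, this yields the prefix "20221103" and the 16:00:00-16:05:00 UTC range, cutting 86,400 daily batches down to a handful of candidates.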

By reducing the number of batches to just a small handful of possibilities for a given RayID, we can simply ask our index if any of those batches contains the RayID by iterating through all the batch names (keys).
```typescript
async function lookup(r2, bucket, prefix, start, end, rayid, storage) {
  const keys = await listObjects(r2, bucket, prefix, start, end);
  for (const key of keys) {
    // Each value in the index is the list of RayIDs for that batch.
    const haystack: string[] | undefined = await storage.get(key);
    if (haystack && haystack.includes(rayid)) {
      return key;
    }
  }
  return undefined;
}
```

If the RayID is found in a batch, we then stream the corresponding batch back from R2 using another TransformStream pipeline to filter out any non-matching records from the final response. If no result was found, we return an error message saying we were unable to find the RayID.
Summary
To recap, we showed how we can use Durable Objects and their underlying storage to create and manage forward indexes for performing efficient lookups on the RayID for potentially millions of records, all without needing to manage your own database or logging infrastructure, or do a full scan of the data.
While this is just one possible use case for Durable Objects, we are just getting started. If you haven't read it before, check out How we built Instant Logs to see another application of Durable Objects: streaming millions of logs in real time to your Cloudflare dashboard!
What's next
We currently offer RayID lookups for HTTP Requests and Firewall Events, with support for Workers Trace Events coming soon. This is just the beginning for our Log Retrieval API, and we are already looking to add the ability to index and query more types of fields, such as status codes and host names. We also plan to integrate this into the dashboard so that developers can quickly retrieve the logs they need without having to craft the necessary API calls by hand.
Last but not least, we want to make our retrieval functionality even more powerful and are looking at adding more complex types of filters and queries that you can run against your logs.
As always, stay tuned to the blog for more updates to our developer documentation for instructions on how to get started with log retrieval. If you’re interested in joining the beta, please email logs-engine-beta@cloudflare.com.
Store and process your Cloudflare Logs... with Cloudflare


Millions of customers trust Cloudflare to accelerate their website, protect their network, or as a platform to build their own applications. But, once you’re running in production, how do you know what’s going on with your application? You need logs from Cloudflare – a record of what happened on our network when your customers interacted with your product that uses Cloudflare.
Cloudflare Logs are an indispensable tool for debugging applications, identifying security vulnerabilities, or just understanding how users are interacting with your product. However, our customers generate petabytes of logs, and store them for months or years at a time. Log data is tantalizing: all those answers, just waiting to be revealed with the right query! But until now, it’s been too hard for customers to actually store, search, and understand their logs without expensive and cumbersome third party tools.
Today we’re announcing Cloudflare Logs Engine: a new product to enable any kind of investigation with Cloudflare Logs — all within Cloudflare.
Starting today, Cloudflare customers who push their logs to R2 can retrieve them by time range and unique identifier. Over the coming months we want to enable customers to:
- Store logs for any Cloudflare dataset, for as long as you want, with a few clicks
- Access logs no matter what plan you use, without relying on third party tools
- Write queries that include multiple datasets
- Quickly identify the logs you need and take action based on what you find
When it comes to visibility into your traffic, most customers start with analytics. The Cloudflare dashboard is full of analytics about all of our products, which give a high-level overview of what's happening: for example, the number of requests served, the ratio of cache hits, or the amount of CPU time used.
But sometimes, more detail is needed. Developers especially need to be able to read individual log lines to debug applications. For example, suppose you notice a problem where your application throws an error in an unexpected way – you need to know the cause of that error and see every request with that pattern.
Cloudflare offers tools like Instant Logs and wrangler tail which excel at real-time debugging. These are incredibly helpful if you’re making changes on the fly, or if the problem occurs frequently enough that it will appear during your debugging session.
In other cases, you need to find that needle in a haystack — the one rare event that causes everything to go wrong. Or you might have identified a security issue and want to make sure you’ve identified every time that issue could have been exploited in your application’s history.
When this happens, you need logs. In particular, you need forensics: the ability to search the entire history of your logs.
A brief overview of log analysis
Before we take a look at Logs Engine itself, I want to briefly talk about alternatives: how have our customers been dealing with their logs so far?
Cloudflare has long offered Logpull and Logpush. Logpull enables enterprise customers to store their HTTP logs on Cloudflare for up to seven days, and retrieve them by either time or RayID. Logpush can send your Cloudflare logs just about anywhere on the Internet, quickly and reliably. While Logpush provides more flexibility, it’s been up to customers to actually store and analyze those logs.
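For context, the Logpull API exposes HTTP request logs for a zone via the `logs/received` endpoint, queried by a start and end time. The sketch below just builds such a request URL; treat the exact parameter set as something to verify against the Logpull documentation, and note that the zone ID and field list here are placeholders.

```python
from urllib.parse import urlencode

API_BASE = "https://api.cloudflare.com/client/v4"

def logpull_url(zone_id, start, end, fields=None):
    """Build a Logpull request URL for HTTP request logs in [start, end).

    start and end are RFC 3339 timestamps (or Unix epochs); fields is an
    optional list of log field names to include in the response.
    """
    params = {"start": start, "end": end}
    if fields:
        params["fields"] = ",".join(fields)
    return f"{API_BASE}/zones/{zone_id}/logs/received?{urlencode(params)}"

# Hypothetical zone ID, purely for illustration.
url = logpull_url(
    "023e105f4ecef8ad9ca31a8372d0c353",
    "2023-01-15T10:00:00Z",
    "2023-01-15T10:05:00Z",
    fields=["RayID", "ClientIP", "EdgeResponseStatus"],
)
```

The actual request would also need an authenticated header (an API token), which is omitted here.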
Cloudflare has a number of partnerships with SIEMs and data warehouses/data lakes. Many of these tools even have pre-built Cloudflare dashboards for easy visibility. And third party tools have a big advantage in that you can store and search across many log sources, not just Cloudflare.
That said, we’ve heard from customers that they have some challenges with these solutions.
First, third party log tooling can be expensive! Most tools require that you pay not just for storage, but for indexing all of that data when it’s ingested. While that enables powerful search functionality later on, Cloudflare (by its nature) is often one of the largest emitters of logs that a developer will have. If you were to store and index every log line we generate, it can cost more money to analyze the logs than to deliver the actual service.
Second, these tools can be hard to use. Logs are often used to track down an issue that customers discover via analytics in the Cloudflare dashboard. After finding what you need in logs, it can be hard to get back to the right part of the Cloudflare dashboard to make the appropriate configuration changes.
Finally, Logpush was previously limited to Enterprise plans. Soon, we will start offering these services to customers at any scale, regardless of plan type or how they choose to pay.
Why Logs Engine?
With Logs Engine, we wanted to solve these problems. We wanted to build something affordable, easy to use, and accessible to any Cloudflare customer. And we wanted it to work for any Cloudflare logs dataset, for any span of time.
Our first insight was that to make logs affordable, we need to separate storage and compute. The cost of storage is actually quite low! Thanks to R2, there’s no reason many of our customers can’t store all of their logs for long periods of time. At the same time, we want to separate out the analysis of logs so that customers only pay for the compute over the logs they analyze – not every line ingested. While we’re still developing our query pricing, our aim is to be predictable, transparent, and upfront. You should never be surprised by the cost of a query (or land a huge bill by accident).
It’s great to separate storage and compute. But if you still need to scan all of your logs to answer your first question, you haven’t gained anything from this separation. To realize cost savings, it’s critical to narrow down your search before executing a query. That’s where our next big idea came in: a tight integration with analytics.
Most of the time, when analyzing logs, you don’t know exactly what you’re looking for. For example, if you’re trying to track down a specific origin status code, you may need to spend some time understanding which origins are impacted, which clients are sending the affected requests, and the time range in which these errors happened. Thanks to our ABR analytics, we can provide a good summary of the data very quickly – but not the exact details of what happened. By integrating with analytics, we can help customers narrow down their queries, then switch to Logs Engine once they know exactly what they’re looking for.
Finally, we wanted to make logs accessible to anyone. That means all plan types – not just Enterprise.
Additionally, we want to make it easy to both set up log storage and analysis, and also to take action on logs once you find problems. With Logs Engine, it will be possible to search logs right from the dashboard, and to immediately create rules based on the patterns you find there.
What’s available today and our roadmap
Today, Enterprise customers can store logs in R2 and retrieve them via time range. Currently in beta, we also allow customers to retrieve logs by RayID (see our companion blog post) — to join the beta, please email logs-engine-beta@cloudflare.com.
Coming soon, we will enable customers on all plan types — not just Enterprise — to ingest logs into Logs Engine. Details on pricing will follow soon.
We also plan to build more powerful querying capability, beyond time range and RayID lookup. For example, we plan to support arbitrary filtering on any column, plus more expressive queries that can look across datasets or aggregate data.
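To illustrate the kind of query this roadmap describes, here is a small local sketch: filter log records on an arbitrary column, then aggregate the matches. The records and field names (real Logpush field names like `EdgeResponseStatus`) are sample data; Logs Engine’s actual query interface is not yet defined, so this is only a model of the concept.

```python
from collections import Counter

def aggregate_by(records, filter_fn, group_key):
    """Filter log records with a predicate, then count values of one column."""
    return Counter(r[group_key] for r in records if filter_fn(r))

# Sample records standing in for parsed HTTP request log lines.
records = [
    {"EdgeResponseStatus": 200, "ClientCountry": "us"},
    {"EdgeResponseStatus": 503, "ClientCountry": "de"},
    {"EdgeResponseStatus": 503, "ClientCountry": "us"},
]

# "Count 5xx responses per client country" as a filter + aggregation.
errors_by_country = aggregate_by(
    records, lambda r: r["EdgeResponseStatus"] >= 500, "ClientCountry")
```

The design point is that filtering happens before aggregation, so a real engine can discard non-matching lines cheaply instead of materializing every column of every record.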
But why stop at logs? This foundation lays the groundwork to one day support other types of data sources and queries. We are just getting started. Over the long term, we’re also exploring the ability to ingest and query data sources outside of Cloudflare. Paired with Analytics Engine, this is a formidable way to explore any data set in a cost-effective way!