Blogroll: Cloudflare

I read blogs, as well as write one. The 'blogroll' on this site reproduces some posts from some of the people I enjoy reading. There are currently 14 posts from the blog 'Cloudflare.'

Disclaimer: Reproducing an article here does not necessarily imply agreement or endorsement!

Subscribe to Cloudflare feed
Cloudflare Blog

Aquele Abraço Rio de Janeiro: Cloudflare's 116th Data Center!

Wed, 19/07/2017 - 18:56

Cloudflare is excited to announce our newest data center in Rio de Janeiro, Brazil. This is our eighth data center in South America, and expands the Cloudflare network to 116 cities across 57 countries. Our newest deployment will improve the performance and security of over six million Internet applications across Brazil, while providing redundancy to our existing São Paulo data center. As additional ISPs peer with us at the local internet exchange (IX.br), we’ll be able to provide even closer coverage to a growing share of Brazil's Internet users.




History

Rio de Janeiro plays a central role in the history of the Internet in Brazil. In 1988, the National Laboratory of Scientific Computation (LNCC), headquartered in Rio de Janeiro, connected to the University of Maryland via Bitnet, a network for exchanging messages between academic institutions. The next year, the Federal University of Rio de Janeiro also connected to Bitnet, becoming the third Brazilian institution (after the LNCC and the São Paulo State Foundation for Research Support, FAPESP) to have access to this technology.


CC BY-NC 2.0 image by Lau Rey

Today, the city of Rio de Janeiro is very well connected. Internet access can be found all over, and better connectivity can boost entrepreneurship. In some favelas (slums), residents are creating their own ISPs, providing Internet access to users that the big ISPs are not able to reach.


LatAm expansion

We have an additional eight data centers in progress across Latin America. If managing the many moving parts of building a large global network interests you, come join our team!


-The Cloudflare team

Categories: Technology

Ninth Circuit Rules on National Security Letter Gag Orders

Tue, 18/07/2017 - 19:39

As we’ve previously discussed on this blog, Cloudflare has for years been challenging the constitutionality of the FBI’s use of national security letters (NSLs) to demand user data on a confidential basis. On Monday morning, a three-judge panel of the U.S. Ninth Circuit Court of Appeals released the latest decision in our lawsuit, and endorsed the use of gag orders that severely restrict a company's disclosures related to NSLs.

CC-BY 2.0 image by a200/a77Wells

This is the latest chapter in a court proceeding that dates back to 2013, when Cloudflare initiated a challenge to the previous form of the NSL statute with the help of our friends at EFF. Our efforts regarding NSLs have already seen considerable success. After a federal district court agreed with some of our arguments, Congress passed a new law that addressed transparency, the USA FREEDOM Act. Under the new law, companies were finally permitted to disclose the number of NSLs they receive in aggregate bands of 250. But other concerns, about judicial review and the limits of gag orders, remained.

Today’s outcome is disappointing for Cloudflare. NSLs are “administrative subpoenas” that fall short of a warrant, and are frequently accompanied by nondisclosure requirements that restrict even bare disclosures regarding the receipt of such letters. Such gag orders hamper transparency efforts, and limit companies’ ability to participate in the political process around surveillance reform.

What did the Court say?

In its ruling, the Ninth Circuit upheld NSL gag orders by ruling that the current system does not run afoul of the First Amendment. Currently, the laws governing the issuance of NSLs permit a nondisclosure requirement so long as the requesting official certifies that the lack of a prohibition “may result” in certain types of harm. However, there is no judicial scrutiny of these claims before the gag order goes into full effect. Only once the restriction has already been imposed can a company seek judicial review before a court. Furthermore, the FBI is only required to reassess the gag order three years in, or when the investigation has closed.

Along with our co-petitioner, CREDO Mobile, Cloudflare challenged the NSL gag orders as a “prior restraint” on free speech. In First Amendment law, prior restraints are judicial orders or administrative rules that function to suppress speech before it ever takes place. There is a heavy presumption against the constitutionality of prior restraints, but they can be justified in narrowly defined circumstances or if the restraint follows certain procedural safeguards. In the context of NSLs, we considered those safeguards to be lacking.

The Appeals Court disagreed: in its ruling, the Court determined that NSL gag orders were indeed prior restraints subject to “strict” constitutional scrutiny, but that such orders were “narrowly tailored to a compelling state interest” and provided enough procedural safeguards to pass constitutional muster.

What’s Next?

While we are still reviewing the specifics of the court’s decision, Cloudflare will continue to report on NSLs to the extent permitted by law. We will also continue to work with EFF as we weigh how to proceed: the next steps may be to request an en banc appeal before all the members of the Ninth Circuit, or to petition the U.S. Supreme Court to take up the case.

Cloudflare’s approach to law enforcement requests will continue to be that while we are supportive of their work, any requests we receive must adhere to due process, and be subject to judicial oversight. When we first decided to challenge the FBI’s request for customer information through a confidential NSL, we were a much smaller company. It was not an easy decision, but we decided to contest a gag order that we felt was overbroad and in violation of our principles. We are grateful to our friends at EFF for taking our case, and applaud the excellent job they have done pushing this effort.

Categories: Technology

Getting started with Cloudflare Apps

Wed, 12/07/2017 - 21:24

We recently launched our new Cloudflare Apps platform, and we love seeing the community it is building. If you run a web service such as a website or an API, our new Apps platform can help make it faster, safer and more reliable by leveraging our 115 points of presence around the world. (Skip ahead to the fun part if you already know how Cloudflare Apps works.)

How Cloudflare apps work

Here is a quick diagram of how Cloudflare apps work:

The “Origin” is the server that is providing your services, such as your website or API. The “Edge” represents a point of presence that is closest to your visitors. Cloudflare uses a routing method known as Anycast to ensure the end user, pictured on the far right, is routed through the best network path to our points of presence closest to them around the world.

Historically, to make changes or additions to a site at the edge, you needed to be a Cloudflare employee. Now, with Apps, anyone can quickly make changes to the pages rendered to their users via JavaScript and CSS. Today, you can do amazing things like add a donation button using PayPal, or intelligently inject a video, using JavaScript and CSS to position the objects wherever you like.

Awesome apps that you can already turn on today

A great way to explore our existing apps is to browse the Cloudflare Apps store.

You can review all of them by visiting your Cloudflare dashboard and opening the Apps section, a button in the far right-hand corner of the dashboard.

Creating an app (AKA the fun part)

Cloudflare has a simple example app that is easy to use. Feel free to fork our app to have fun with it. You can find it on GitHub here.

To start, take a look at the install.json file, then install the project's dependencies:

npm install

It’s also best practice to double-check the JavaScript to ensure there are no errors in the source:

npm run lint

From here, your files are located in the source directory:

source/app.js

This is where the magic happens. Your app starts here.

source/app.css

Styles for your app.

media/**

A directory for icons, tile images, and screenshots.

The easiest way to test your app is to use our app creation dashboard.

From there, it’s as simple as pointing the app creator at your app's folder and testing the app. You can modify source/app.js to change the JavaScript that is injected and source/app.css to control where and how those changes appear. Once you’re happy with your app, simply click “Create app” at the bottom left of the page and it will be reviewed based on the code you have created.

Would you like to get community feedback for your app before submitting it for moderation? Share your work or work-in-progress with the Cloudflare Apps part of the community. We can’t wait to see what you build.

Cloudflare is very excited about the Apps platform, not only because it enables our users to do powerful new things with their Internet properties, but also because it gives them the chance to create an app that will be available to more than 7 million websites around the world.

If you have any questions, feel free to join our new Cloudflare Community today to join in on the fun!

Categories: Technology

High-reliability OCSP stapling and why it matters

Mon, 10/07/2017 - 13:43

At Cloudflare our focus is making the internet faster and more secure. Today we are announcing a new enhancement to our HTTPS service: High-Reliability OCSP stapling. This feature is a step towards enabling an important security feature on the web: certificate revocation checking. Reliable OCSP stapling also improves connection times by up to 30% in some cases. In this post, we’ll explore the importance of certificate revocation checking in HTTPS, the challenges involved in making it reliable, and how we built a robust OCSP stapling service.

Why revocation is hard

Digital certificates are the cornerstone of trust on the web. A digital certificate is like an identification card for a website. It contains identity information including the website’s hostname along with a cryptographic public key. In public key cryptography, each public key has an associated private key. This private key is kept secret by the site owner. For a browser to trust an HTTPS site, the site’s server must provide a certificate that is valid for the site’s hostname and a proof of control of the certificate’s private key. If someone gets access to a certificate’s private key, they can impersonate the site. Private key compromise is a serious risk to trust on the web.

Certificate revocation is a way to mitigate the risk of key compromise. A website owner can revoke a compromised certificate by informing the certificate issuer that it should no longer be trusted. For example, back in 2014, Cloudflare revoked all managed certificates after it was shown that the Heartbleed vulnerability could be used to steal private keys. There are other reasons to revoke, but key compromise is the most common.

Certificate revocation has a spotty history. Most of the revocation checking mechanisms implemented today don’t protect site owners from key compromise. If you already know why revocation checking is broken, feel free to skip ahead to the OCSP stapling section below.

Revocation checking: a history of failure

There are several ways a web browser can check whether a site’s certificate is revoked or not. The most well-known mechanisms are Certificate Revocation Lists (CRL) and Online Certificate Status Protocol (OCSP). A CRL is a signed list of serial numbers of certificates revoked by a CA. OCSP is a protocol that can be used to query a CA about the revocation status of a given certificate. An OCSP response contains signed assertions that a certificate is not revoked.


Certificates that support OCSP contain the responder's URL, and those that support CRLs contain a URL where the CRL can be obtained. When a browser is served a certificate as part of an HTTPS connection, it can use the embedded URL to download a CRL or an OCSP response and check that the certificate hasn't been revoked before rendering the web page. The question then becomes: what should the browser do if the request for a CRL or OCSP response fails? As it turns out, both answers to that question are problematic.

Hard-fail doesn’t work

When browsers encounter a web page and there’s a problem fetching revocation information, the safe option is to block the page and show a security warning. This is called a hard-fail strategy. This strategy is conservative from a security standpoint, but prone to false positives. For example, if the proof of non-revocation could not be obtained for a valid certificate, a hard-fail strategy will show a security warning. Showing a security warning when no security issue exists is dangerous because it can lead to warning fatigue and teach users to click through security warnings, which is a bad idea.

In the real world, false positives are unavoidable. OCSP and CRL endpoints are subject to service outages and network errors. There are also common situations where these endpoints are completely inaccessible to the browser, such as when the browser is behind a captive portal. For some access points used in hotels and airplanes, unencrypted traffic (like requests to OCSP endpoints) is blocked. A hard-fail strategy forces users behind captive portals and other networks that block OCSP requests to click through unnecessary security warnings. This reality is unpalatable to browser vendors.


Another drawback to a hard-fail strategy is that it puts an increased burden on certificate authorities to keep OCSP and CRL endpoints available and online. A broken OCSP or CRL server becomes a central point of failure for all certificates issued by a certificate authority. If browsers followed a hard-fail strategy, an OCSP outage would be an Internet outage. Certificate authorities are organizations optimized to provide trust and accountability, not necessarily resilient infrastructure. In a hard-fail world, the availability of the web as a whole would be limited by the ability of CAs to keep their OCSP services online at all times: a dangerous systemic risk to the Internet.

Soft-fail: it’s not much better

To avoid the downsides of a hard-fail strategy, most browsers take another approach to certificate revocation checking. Upon seeing a new certificate, the browser will attempt to fetch the revocation information from the CRL or OCSP endpoint embedded in the certificate. If the revocation information is available, the browser relies on it; otherwise, it assumes the certificate is not revoked and displays the page without any errors. This is called a “soft-fail” strategy.

The soft-fail strategy has a critical security flaw. An attacker with network position can block the OCSP request. If this attacker also has the private key of a revoked certificate, they can intercept the outgoing connection for the site and present the revoked certificate to the browser. Since the browser doesn’t know the certificate is revoked and is following a soft-fail strategy, the page will load without alerting the user. As Adam Langley described: “soft-fail revocation checks are like a seat-belt that snaps when you crash. Even though it works 99% of the time, it's worthless because it only works when you don't need it.”

A soft-fail strategy also makes connections slower. If revocation information for a certificate is not already cached, the browser will block the rendering of the page until the revocation information is retrieved, or a timeout occurs. This additional step causes a noticeable and unwelcome delay, with marginal security benefits. This tradeoff is a hard sell for the performance-obsessed web. Because of the limited benefit, some browsers have eliminated live revocation checking for at least some subset of certificates.

Live OCSP checking has an additional downside: it leaks private browsing information. OCSP requests are sent over unencrypted HTTP and are tied to a specific certificate. Sending an OCSP request tells the certificate authority which websites you are visiting. Furthermore, everyone on the network path between your browser and the OCSP server will also know which sites you are browsing.

Alternative revocation checking

Some clients still perform soft-fail OCSP checking, but it’s becoming less common due to the performance and privacy downsides described above. To protect high-value certificates, some browsers have explored alternative mechanisms for revocation checking.

One technique is to pre-package a list of revoked certificates and distribute it through browser updates. Because the list of all revoked certificates is so large, only a few high-impact certificates are included in this list. This technique is called OneCRL by Firefox and CRLSets by Chrome. This has been effective for some high-profile revocations, but it is by no means a complete solution. Not only are most certificates not covered, but this technique also leaves a window of vulnerability between the time a certificate is revoked and the time the updated list reaches browsers.

OCSP Stapling

OCSP stapling is a technique to get revocation information to browsers that fixes some of the performance and privacy issues associated with live OCSP fetching. In OCSP stapling, the server includes a current OCSP response for the certificate (or “staples” it) in the initial HTTPS connection. That removes the need for the browser to request the OCSP response itself. OCSP stapling is widely supported by modern browsers.

Not all servers support OCSP stapling, so browsers still take a soft-fail approach to warning the user when the OCSP response is not stapled. Some browsers (such as Safari, Edge and, for now, Firefox) check certificate revocation for all certificates, so OCSP stapling can provide a performance boost of up to 30%. For browsers like Chrome that don’t check for revocation for all certificates, OCSP stapling provides a proof of non-revocation that they would not have otherwise.

High-reliability OCSP stapling

Cloudflare started offering OCSP stapling in 2012. Cloudflare’s original implementation relied on code from nginx that was able to provide OCSP stapling for some, but not all, connections. As Cloudflare’s network grew, the implementation wasn’t able to scale with it, resulting in a drop in the percentage of connections with OCSP responses stapled. The architecture we had chosen had served us well, but we could definitely do better.

In the last year we redesigned our OCSP stapling infrastructure to make it much more robust and reliable. We’re happy to announce that we now provide reliable OCSP stapling for connections to Cloudflare. As long as the certificate authority has set up OCSP for a certificate, Cloudflare will serve a valid stapled OCSP response. All Cloudflare customers now benefit from much more reliable OCSP stapling.

OCSP stapling past

In Cloudflare’s original implementation of OCSP stapling, OCSP responses were fetched opportunistically. Given a connection that required a certificate, Cloudflare would check to see if there was a fresh OCSP response to staple. If there was, it would be included in the connection. If not, then the client would not be sent an OCSP response, and Cloudflare would send a request to refresh the OCSP response in the cache in preparation for the next request.


If a fresh OCSP response wasn’t cached, the connection wouldn’t get an OCSP staple. The next connection for that same certificate would get an OCSP staple, because the cache would have been populated.


This architecture was elegant, but not robust. First, there are several situations in which the client is guaranteed to not get an OCSP response. For example, the first request in every cache region and the first request after an OCSP response expires are guaranteed to not have an OCSP response stapled. With Cloudflare’s expansion to more locations, these failures became more common. Less popular sites would have their OCSP responses fetched less often, resulting in an even lower ratio of stapled connections. Another reason connections could be missing OCSP responses is that the OCSP request from Cloudflare to fill the cache had failed. There was a lot of room for improvement.

Our solution: OCSP pre-fetching

In order to reliably include OCSP staples in all connections, we decided to change the model. Instead of fetching the OCSP response when a request came in, we would fetch it in a centralized location and distribute valid responses to all our servers. When a response started getting close to expiration, we’d fetch a new one. If the OCSP request fails, we put it into a queue to re-fetch at a later time. Since most OCSP staples are valid for around 7 days, there is a lot of flexibility in terms of refreshing expiring responses.


To keep our cache of OCSP responses fresh, we created an OCSP fetching service. This service ensures that there is a valid OCSP response for every certificate managed by Cloudflare. We constantly crawl our cache of OCSP responses and refresh those that are close to expiring. We also make sure to never cache invalid OCSP responses, as this can have bad consequences. This system has been running for several months, and we now reliably include OCSP staples for almost every HTTPS request.
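To make the refresh logic concrete, here is a minimal Python sketch of this kind of pre-fetching loop. It is purely illustrative: the cache, fetch_ocsp and retry_queue helpers are hypothetical stand-ins, not Cloudflare's actual service.

import time

REFRESH_MARGIN = 2 * 24 * 3600  # refresh when less than ~2 days of validity remain (example value)

def refresh_cycle(certificates, cache, fetch_ocsp, retry_queue):
    # One pass over all managed certificates, refreshing responses near expiry.
    # cache.get()/cache.put() store OCSP responses, fetch_ocsp() queries the CA's
    # responder out-of-band, and retry_queue.add() schedules a later retry.
    now = time.time()
    for cert in certificates:
        response = cache.get(cert)
        if response is not None and response.next_update - now > REFRESH_MARGIN:
            continue  # still comfortably fresh, nothing to do
        try:
            fresh = fetch_ocsp(cert)
        except Exception:
            retry_queue.add(cert)  # responder failed; try again later
            continue
        if fresh.is_valid():       # never cache an invalid response
            cache.put(cert, fresh)
        else:
            retry_queue.add(cert)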

Reliable stapling improves performance for browsers that would have otherwise fetched OCSP, but it also changes the optimal failure strategy for browsers. If a browser can reliably get an OCSP staple for a certificate, why not switch back from a soft-fail to a hard-fail strategy?

OCSP must-staple

As described above, the soft-fail strategy for validating OCSP responses opens up a security hole. An attacker with a revoked certificate can simply neglect to provide an OCSP response when a browser connects to it and the browser will accept their revoked certificate.

In the browser OCSP fetching case, a soft-fail approach makes sense. There are many reasons a browser might not be able to obtain an OCSP response: captive portals, broken OCSP servers, network unreliability and more. However, as we have shown with our high-reliability OCSP fetching service, it is possible for a server to fetch OCSP responses without any of these problems. OCSP responses are re-usable and are valid for several days. When one is close to expiring, the server can fetch a new one out-of-band and be able to reliably serve OCSP staples for all connections.


If the client knows that a server will always serve OCSP staples for every connection, it can apply a hard-fail approach, failing a connection if the OCSP response is missing. This closes the security hole introduced by the soft-fail strategy. This is where OCSP must-staple fits in.

OCSP must-staple is an extension that can be added to a certificate that tells the browser to expect an OCSP staple whenever it sees the certificate. This acts as an explicit signal to the browser that it’s safe to use the more secure hard-fail strategy.

Firefox enforces OCSP must-staple, returning the following error if such a certificate is presented without a stapled OCSP response.


Chrome provides the ability to mark a domain as “Expect-Staple”. If Chrome sees a certificate for the domain without a staple, it will send a report to a pre-configured report endpoint.

Reliability

As a part of our push to provide reliable OCSP stapling, we put our money where our mouths are and put an OCSP must-staple certificate on blog.cloudflare.com. Now if we ever don’t serve an OCSP staple, this page will fail to load on browsers like Firefox that enforce must-staple. You can identify a must-staple certificate by looking for the “1.3.6.1.5.5.7.1.24” OID in the certificate details.
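If you would rather check for the extension programmatically, a small sketch like the following works with the third-party Python cryptography package; the PEM file name is just an example.

from cryptography import x509
from cryptography.hazmat.backends import default_backend

# OID of the TLS Feature extension; the status_request feature means OCSP must-staple
MUST_STAPLE_OID = x509.ObjectIdentifier("1.3.6.1.5.5.7.1.24")

def has_must_staple(pem_path):
    with open(pem_path, "rb") as f:
        cert = x509.load_pem_x509_certificate(f.read(), default_backend())
    try:
        cert.extensions.get_extension_for_oid(MUST_STAPLE_OID)
        return True
    except x509.ExtensionNotFound:
        return False

print(has_must_staple("blog_cloudflare_com.pem"))  # example file name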


Cloudflare customers can choose to upload must-staple custom certificates, but we encourage them not to do so yet because there may be a multi-second delay between the certificate being installed and our ability to populate the OCSP response cache. This will be fixed in the coming months. Other than the first few seconds after uploading the certificate, Cloudflare’s new OCSP fetching is robust enough to offer OCSP staples for every connection thereafter.

As of today, an attacker with access to the private key for a revoked certificate can still hijack the connection. All they need to do is to place themselves on the network path of the connection and block the OCSP request. OCSP must-staple prevents that, since an attacker will not be able to obtain an OCSP response that says the certificate has not been revoked.

The weird world of OCSP responders

For browsers, an OCSP failure is not the end of the world. Most browsers are configured to soft-fail when an OCSP responder returns an error, so users are unaffected by OCSP server failures. Some Certificate Authorities have had massive multi-day outages in their OCSP servers without affecting the availability of sites that use their certificates.

There’s no strong feedback mechanism for broken or slow OCSP servers. This lack of feedback has led to an ecosystem of faulty or unreliable OCSP servers. We experienced this first-hand while developing high-reliability OCSP stapling. In this section, we’ll outline half a dozen unexpected behaviors we found when deploying high-reliability OCSP stapling. A big thanks goes out to all the CAs who fixed the issues we pointed out. CA names redacted to preserve their identities.

CA #1: Non-overlapping periods

We noticed CA #1 certificates frequently missing their refresh-deadline, and during debugging we were lucky enough to see this:

$ date -u
Sat Mar 4 02:45:35 UTC 2017
$ ocspfetch <redacted, customer ID>
This Update: 2017-03-04 01:45:49 +0000 UTC
Next Update: 2017-03-04 02:45:49 +0000 UTC
$ date -u
Sat Mar 4 02:45:48 UTC 2017
$ ocspfetch <redacted, customer ID>
This Update: 2017-03-04 02:45:49 +0000 UTC
Next Update: 2017-03-04 03:45:49 +0000 UTC

It shows that CA #1 had configured their OCSP responders to use an incredibly short validity period with almost no overlap between validity periods, which makes it functionally impossible to always have a fresh OCSP response for their certificates. We contacted them, and they reconfigured the responder to produce new responses every half-interval.

CA #2: Wrong signature algorithm

Several certificates from CA #2 started failing with this error:

bad OCSP signature: crypto/rsa: verification error

The issue is that the OCSP response claims to be signed with SHA256-RSA when it is actually signed with SHA1-RSA (and the reverse: some responses indicate SHA1-RSA but are actually signed with SHA256-RSA).

CA #3: Malformed OCSP responses

When we first started the project, our tool was unable to parse dozens of certificates in our database because of this error

asn1: structure error: integer not minimally-encoded

and many OCSP responses that we fetched from the same CA:

parsing ocsp response: bad OCSP signature: asn1: structure error: integer not minimally-encoded

What happened was that this CA had begun issuing <1% of certificates with a minor formatting error that rendered them unparseable by Golang’s x509 package. After we contacted them directly, they quickly fixed the issue, but we also had to patch Golang's parser to be more lenient about encoding bugs.

CA #4: Failed responder behind a load balancer

A small number of CA #4’s OCSP responders fell into a "bad state" without them knowing, and would return 404 on every request. Since the CA used load balancing to round-robin requests to a number of responders, it looked like 1 in 6 requests would fail inexplicably.

Two times between Jan 2017 and May 2017, this CA also experienced some kind of large data-loss event that caused them to return persistent "Try Later" responses for a large number of requests.

CA #5: Delay between certificate and OCSP responder

When a certificate is issued by CA #5, there is a large delay between the time the certificate is issued and the time the OCSP responder is able to start returning signed responses for it. This delay between issuance and OCSP availability has mostly been resolved, but the general pattern is dangerous for OCSP must-staple. There have been some recent changes discussed in the CA/B Forum, an organization that regulates the issuance and management of certificates, to require CAs to offer OCSP soon after issuance.

CA #6: Extra certificates

It is typical to embed only one certificate in an OCSP response, if any. The one embedded certificate is supposed to be a leaf specially issued for signing an intermediate's OCSP responses. However, several CAs embed multiple certificates: the leaf they use for signing OCSP responses, the intermediate itself, and sometimes all the intermediates up to the root certificate.

Conclusions

We made OCSP stapling better and more reliable for Cloudflare customers. Despite the various strange behaviors we found in OCSP servers, we’ve been able to consistently serve OCSP responses for over 99.9% of connections since we’ve moved over to the new system. This is an important step in protecting the web community from attackers who have compromised certificate private keys.

Categories: Technology

Participate in the Net Neutrality Day of Action

Sun, 09/07/2017 - 17:29

We at Cloudflare strongly believe in network neutrality, the principle that networks should not discriminate against content that passes through them.  We’ve previously posted on our views on net neutrality and the role of the FCC here and here.

In May, the FCC took a first step toward revoking bright-line rules it put in place in 2015 to require ISPs to treat all web content equally. The FCC is seeking public comment on its proposal to eliminate the legal underpinning of the 2015 rules, revoking the FCC's authority to implement and enforce net neutrality protections. Public comments are also requested on whether any rules are needed to prevent ISPs from blocking or throttling web traffic, or creating “fast lanes” for some internet traffic.

To raise awareness about the FCC's efforts, July 12th will be “Internet-Wide Day of Action to save Net Neutrality.” Led by the group Battle for the Net, participating websites will show the world what the web would look like without net neutrality by displaying an alert on their homepage. Website users will be encouraged to contact Congress and the FCC in support of net neutrality.

We wanted to make sure our users had an opportunity to participate in this protest. If you install the Battle For The Net App, your visitors will see one of four alert modals — like the “spinning wheel of death” — and have an opportunity to submit a comment to the FCC or a letter to Congress in support of net neutrality. You can preview the app live on your site, even if you don’t use Cloudflare yet.


To participate, install the Battle For The Net App. The app will appear for your site's visitors on July 12th, the Day of Action for Net Neutrality.

Categories: Technology

How to make your site HTTPS-only

Thu, 06/07/2017 - 14:35

The Internet is getting more secure every day as people enable HTTPS, the secure version of HTTP, on their sites and services. Last year, Mozilla reported that the percentage of requests made by Firefox using encrypted HTTPS passed 50% for the first time. HTTPS has numerous benefits that are not available over unencrypted HTTP, including improved performance with HTTP/2, SEO benefits for search engines like Google and the reassuring lock icon in the address bar.


So how do you add HTTPS to your site or service? That’s simple: Cloudflare offers free and automatic HTTPS support for all customers with no configuration. Sign up for any plan and Cloudflare will issue an SSL certificate for you and serve your site over HTTPS.

HTTPS-only

Enabling HTTPS does not mean that all visitors are protected. If a visitor types your website’s name into the address bar of a browser or follows an HTTP link, they will land on the insecure HTTP version of your website. In order to make your site HTTPS-only, you need to redirect visitors from the HTTP to the HTTPS version of your site.

Going HTTPS-only should be as easy as a click of a button, so we literally added one to the Cloudflare dashboard. Enable the “Always Use HTTPS” feature and all visitors of the HTTP version of your website will be redirected to the HTTPS version. You’ll find this option just above the HTTP Strict Transport Security setting and it is of course also available through our API.


If you would like to redirect only a subset of your requests, you can do so by creating a Page Rule. Simply apply the “Always Use HTTPS” setting to any URL pattern.

Securing your site: next steps

Once you have confirmed that your site is fully functional with HTTPS-only enabled, you can take it a step further and enable HTTP Strict Transport Security (HSTS). HSTS is a header that tells browsers that your site is available over HTTPS and will be for a set period of time. Once a browser sees an HSTS header for a site, it will automatically fetch the HTTPS version of HTTP pages without needing to follow redirects. HSTS can be enabled in the crypto app right under the Always Use HTTPS toggle.
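For reference, this is roughly what sending the header from your own origin would look like; a minimal sketch using Python's built-in http.server, with an example max-age of six months (if you enable the toggle, Cloudflare adds the header for you, so this is optional):

from http.server import HTTPServer, SimpleHTTPRequestHandler

class HSTSHandler(SimpleHTTPRequestHandler):
    def end_headers(self):
        # Tell browsers to use HTTPS for the next six months (example value)
        self.send_header("Strict-Transport-Security", "max-age=15768000; includeSubDomains")
        super().end_headers()

if __name__ == "__main__":
    # Only meaningful when the site is actually served over HTTPS; shown here for illustration
    HTTPServer(("", 8080), HSTSHandler).serve_forever()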

It's also important to secure the connection between Cloudflare and your site. To do that, you can use Cloudflare's Origin CA to get a free certificate for your origin server. Once your origin server is set up with HTTPS and a valid certificate, change your SSL mode to Full (strict) to get the highest level of security.

Categories: Technology

Three little tools: mmsum, mmwatch, mmhistogram

Tue, 04/07/2017 - 11:32

In a recent blog post, my colleague Marek talked about some SSDP-based DDoS activity we'd been seeing recently. In that blog post he used a tool called mmhistogram to output an ASCII histogram.

That tool is part of a small suite of command-line tools that can be handy when messing with data. Since a reader asked for them to be open sourced... here they are.

mmhistogram

Suppose you have the following CSV of the ages of major Star Wars characters at the time of Episode IV:

Anakin Skywalker (Darth Vader),42
Boba Fett,32
C-3PO,32
Chewbacca,200
Count Dooku,102
Darth Maul,54
Han Solo,29
Jabba the Hutt,600
Jango Fett,66
Jar Jar Binks,52
Lando Calrissian,31
Leia Organa (Princess Leia),19
Luke Skywalker,19
Mace Windu,72
Obi-Wan Kenobi,57
Palpatine,82
Qui-Gon Jinn,92
R2-D2,32
Shmi Skywalker,72
Wedge Antilles,21
Yoda,896

You can get an ASCII histogram of the ages as follows using the mmhistogram tool.

$ cut -d, -f2 epiv | mmhistogram -t "Age"
Age min:19.00 avg:123.90 med=54.00 max:896.00 dev:211.28 count:21
Age:
 value |-------------------------------------------------- count
     0 | 0
     1 | 0
     2 | 0
     4 | 0
     8 | 0
    16 |************************************************** 8
    32 | ************************* 4
    64 | ************************************* 6
   128 | ****** 1
   256 | 0
   512 | ************ 2

Handy for getting a quick sense of the data. (These charts are inspired by the ASCII output from systemtap).

mmwatch

The mmwatch tool is handy if you want to look at output from a command-line tool that provides a snapshot of values, but you need to see a rate of change instead.

For example, here's df -H on my machine:

$ df -H
Filesystem      Size   Used  Avail Capacity    iused    ifree %iused  Mounted on
/dev/disk1      250G   222G    28G    89%   54231161  6750085   89%   /
devfs           384k   384k     0B   100%       1298        0  100%   /dev
map -hosts        0B     0B     0B   100%          0        0  100%   /net
map auto_home     0B     0B     0B   100%          0        0  100%   /home
/dev/disk4      7.3G    50M   7.2G     1%      12105  1761461    1%   /Volumes/LANGDON

Now imagine you were interested in understanding the rate of change in iused and ifree. You can with mmwatch. It's just like watch but looks for changing numbers and interprets them as rates:

$ mmwatch 'df -H'

Here's a short GIF showing it working:

mmsum

And the final tool is mmsum, which simply sums a list of floating-point numbers (one per line).

Suppose you are downloading real-time rainfall data from the UK's Environment Agency and would like to know the total current rainfall. mmsum can help:

$ curl -s 'https://environment.data.gov.uk/flood-monitoring/id/measures?parameter=rainfall' | jq -e '.items[].latestReading.value+0' | ./mmsum
40.2
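mmsum itself is tiny. If you just want the behaviour, the few lines of Python below do essentially the same thing (a rough equivalent, not the actual mmsum source):

#!/usr/bin/env python
import sys

# Sum one floating-point number per line read from stdin, skipping blank lines
total = sum(float(line) for line in sys.stdin if line.strip())
print(total)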

All these tools can be found on the Cloudflare GitHub.

Categories: Technology

A container identity bootstrapping tool

Mon, 03/07/2017 - 17:21

Everybody has secrets. Software developers have many. Often these secrets—API tokens, TLS private keys, database passwords, SSH keys, and other sensitive data—are needed to make a service run properly and interact securely with other services. Today we’re sharing a tool that we built at Cloudflare to securely distribute secrets to our Dockerized production applications: PAL.

PAL is available on GitHub: https://github.com/cloudflare/pal.

Although PAL is not currently under active development, we have found it a useful tool and we think the community will benefit from its source being available. We believe that it's better to open source this tool and allow others to use the code than to leave it hidden from view and unmaintained.

Secrets in production

CC BY 2.0 image by Personal Creations

How do you get these secrets to your services? If you’re the only developer, or one of a few on a project, you might put the secrets with your source code in your version control system. But if you just store the secrets in plain text with your code, everyone with access to your source repository can read them and use them for nefarious purposes (for example, stealing an API token and pretending to be an authorized client). Furthermore, distributed version control systems like Git will download a copy of the secret everywhere a repository is cloned, regardless of whether it’s needed there, and will keep that copy in the commit history forever. For a company where many people (including people who work on unrelated systems) have access to source control this just isn’t an option.

Another idea is to keep your secrets in a secure place and then embed them into your application artifacts (binaries, containers, packages, etc.) at build time. This can be awkward for modern CI/CD workflows because it results in multiple parallel sets of artifacts for different environments (e.g. production, staging, development). Once you have artifacts with secrets, they become secret themselves, and you will have to restrict access to the “armed” packages that contain secrets after they’re built. Consider the discovery last year that the source code of Twitter’s Vine service was available in public Docker repositories. Not only was the source code for the service leaked, but the API keys that allow Vine to interact with other services were also available. Vine paid over $10,000 when they were notified about this.

A more advanced technique to manage and deploy secrets is to use a secret management service. Secret management services can be used to create, store and rotate secrets as well as distribute them to applications. The secret management service acts as a gatekeeper, allowing access to some secrets for some applications as prescribed by an access control policy. An application that wants to gain access to a secret authenticates itself to the secret manager, the secret manager checks permissions to see if the application is authorized, and if authorized, sends the secret. There are many options to choose from, including Vault, Keywhiz, Knox, Secretary, dssss and even Docker’s own secret management service.

Secret managers are a good solution as long as an identity/authorization system is already in place. However, since most authentication systems involve the client already being in possession of a secret, this presents a chicken and egg problem.

Identity Bootstrapping

Once we have verified a service’s identity, we can make access control decisions about what that service can access. Therefore, the real problem we must solve is the problem of bootstrapping service identity.

This problem has many solutions when services are tightly bound to individual machines: for example, we can simply install host-level credentials on each machine or even use a machine’s hardware to identify it, and virtual machine platforms like Amazon AWS have machine-based APIs for host-level identity, such as IAM and KMS. Containerized services have a much more fluid lifecycle: instances may appear on many machines and may come and go over time. Furthermore, any number of trusted and untrusted containers might be running on the same host at the same time. So what we need instead is an identity that belongs to a service, not to a machine.

Every Application needs an ID to prove to the bouncer that they’re on the guest list for Club Secret.

Bootstrapping the identity of a service that lives in a container is not a solved problem, and most of the existing solutions are deeply integrated into a particular container orchestration system (Kubernetes, Docker Swarm, Mesos, etc.). We ran into the problem of container identity bootstrapping, and wanted something that worked with our current application deployment stack (Docker/Mesos/Marathon) but wasn’t locked down to a given orchestration platform.

Enter PAL

We use Docker containers to deploy many services across a shared, general-purpose Mesos cluster. To solve the service identity bootstrapping problem in our Docker environment, we developed PAL, which stands for Permissive Action Link, a security device for nuclear weapons. PAL makes sure secrets are only available in production, and only when jobs are authorized to run.

PAL makes it possible to keep only encrypted secrets in the configuration for a service while ensuring that those secrets can only be decrypted by authorized service instances in an approved environment (say, a production or staging environment). If those credentials serve to identify the service requesting access to secrets, PAL becomes a container identity bootstrapping solution that you can easily deploy a secret manager on top of.

How it works

The model for PAL is that the secrets are provided in an encrypted form and either embedded in containers, or provided as runtime configuration for jobs running in an orchestration framework such as Apache Mesos.

PAL allows secrets to be decrypted at runtime after the service’s identity has been established. These credentials could allow authenticated inter-service communication, which would allow you to keep service secrets in a central repository such as Hashicorp’s Vault, KeyWhiz, or others. The credentials could also be used to issue service-level credentials (such as certificates for an internal PKI). Without PAL, you must distribute the identity credentials, required by tools like these themselves, inside your infrastructure.

PAL consists of two components: a small in-container initialization tool, pal, that requests secret decryption and installs the decrypted secrets, and a daemon called pald that runs on every node in the cluster. pal and pald communicate with each other via a Unix socket. The pal tool is set as each job’s entrypoint, and it sends pald the encrypted secrets. pald then identifies the process making the request and determines whether it is allowed to access the requested secret. If so, it decrypts the secret on behalf of the job and returns the plaintext to pal, which installs the plaintext within the calling job’s container as either an environment variable or a file.

PAL currently supports two methods of encryption - PGP and Red October - but it can be extended to support more.

PAL-PGP

PGP is a popular form of encryption that has been around since the early 90s. PAL allows you to use secrets that are encrypted with PGP keys that are installed on the host. The current version of PAL does not apply policies at a per-key level (e.g. only containers with Docker Label A can use key 1), but it could easily be extended to do so.

PAL-Red October

The Red October mode is used for secrets that are very high value and need to be managed manually or with multi-person control. We open sourced Red October back in 2013. It has the nice feature of being able to encrypt a secret in such a way that multiple people are required to authorize the decryption.

In the typical PAL-RO setup, each machine in your cluster is provisioned with a Red October account. Before a container is scheduled to run, the secret owners delegate the ability to decrypt the secret to the host on which the container is going to run. When the container starts, pal calls pald, which uses the machine’s Red October credentials to decrypt the secret via a call to the Red October server. Delegations can be limited in time or to a number of decryptions. Once the delegations are used up, Red October has no way to decrypt the secret.

These two modes have been invaluable for protecting high-value secrets, where Red October provides additional oversight and control. For lower-value secrets, PGP provides a non-interactive experience that works well with ephemeral containers.

Authorization details

An important part of a secret management tool is ensuring that only authorized entities can decrypt a secret. PAL enables you to control which containers can decrypt a secret by leveraging existing code signing infrastructure. Both secrets and containers can be given optional labels that PAL will respect. Labels define which containers can access which secrets—a container must have the label of any secret it accesses. Labels are named references to security policies. An example label could be “production-team-secret” which denotes that a secret should conform to the production team’s secret policy. Labels bind cipher texts to an authorization to decrypt. These authorizations allow you to use PAL to control when and in what environment secrets can be decrypted.

By opening the Unix socket with the option SO_PASSCRED, we enable pald to obtain the process-level credentials (uid, pid, and gid) of the caller for each request. These credentials can then be used to identify containers and assign them a label. Labels allow PAL to consult a predefined policy and authorize containers to receive secrets. To get the list of labels on a container, pald uses the process id (pid) of the calling process to get its cgroups from Linux (by reading and parsing /proc/<pid>/cgroups). The names of the cgroups contain the Docker container id, which we can use to get container metadata via Docker’s inspect call. This container metadata carries a list of labels assigned by the Docker LABEL directive at build time.
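As an illustration of that lookup, here is a rough Python sketch of going from a connected Unix socket to a Docker container ID. It uses SO_PEERCRED for brevity rather than SO_PASSCRED, and the helper names are ours, so treat it as a sketch rather than the real pald code.

import re
import socket
import struct

def peer_credentials(conn):
    # Linux-specific: returns the (pid, uid, gid) of the process on the other end of the socket
    creds = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize("3i"))
    return struct.unpack("3i", creds)

def docker_container_id(pid):
    # The cgroup names of a containerized process contain the 64-hex-character container ID
    with open("/proc/%d/cgroup" % pid) as f:
        for line in f:
            match = re.search(r"([0-9a-f]{64})", line)
            if match:
                return match.group(1)
    return None  # not running in a Docker container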

Containers and their labels must be bound together using code integrity tools. PAL supports using Docker’s Notary, which confirms that a specific container hash maps to specific metadata like a container’s name and label.

PAL’s present and future

PAL represents one solution for identity bootstrapping in our environment. Other service identity bootstrapping tools bootstrap at the host level or are highly environmentally coupled. AWS IAM, for example, only works at the level of virtual machines running on AWS; Kubernetes secrets and Docker secrets management only work in Kubernetes and Docker Swarm, respectively. While we’ve developed PAL to work alongside Mesos, we designed it to be used as a service identity mechanism in many other environments simply by plugging in new ways for PAL to read identities from service artifacts (containers, packages, or binaries).

Recall the issue where Vine disclosed their source code in their Docker containers on a public repository, Docker Hub. With PAL, Vine could have kept their API keys (or even the entire codebase) encrypted in the container, published that safe version of the container to Docker Hub, and decrypted the code at container startup in their particular production environment.

Using PAL, you can give your trusted containers an identity that allows them to safely receive secrets only in production, without the risks associated with other secret distribution methods. This identity can be a secret like a cryptographic key, allowing your service to decrypt its sensitive configuration, or it could be a credential that allows it to access sensitive services such as secret managers or CAs. PAL solves a key bootstrapping problem for service identity, making it simple to run trusted and untrusted containers side-by-side while ensuring that your secrets are safe.

Credits

PAL was created by Joshua Kroll, Daniel Dao, and Ben Burkert with design prototyping by Nick Sullivan. This post was adapted from an internal blog post by Joshua Kroll and presentations I gave at O’Reilly Security Amsterdam in 2016 and at BSides Las Vegas.

Categories: Technology

Stupidly Simple DDoS Protocol (SSDP) generates 100 Gbps DDoS

Wed, 28/06/2017 - 16:45

Last month we shared statistics on some popular reflection attacks. Back then the average SSDP attack size was ~12 Gbps and the largest SSDP reflection we recorded was:

  • 30 Mpps (millions of packets per second)
  • 80 Gbps (billions of bits per second)
  • using 940k reflector IPs

This changed a couple of days ago when we noticed an unusually large SSDP amplification. It's worth deeper investigation since it crossed the symbolic threshold of 100 Gbps.

The packets per second chart during the attack looked like this:

The bandwidth usage:

This packet flood lasted 38 minutes. According to our sampled netflow data it utilized 930k reflector servers. We estimate that during the 38 minutes of the attack each reflector sent 112k packets to Cloudflare.

The reflector servers are across the globe, with a large presence in Argentina, Russia and China. Here are the unique IPs per country:

$ cat ips-nf-ct.txt|uniq|cut -f 2|sort|uniq -c|sort -nr|head
 439126 CN
 135783 RU
  74825 AR
  51222 US
  41353 TW
  32850 CA
  19558 MY
  18962 CO
  14234 BR
  10824 KR
  10334 UA
   9103 IT
    ...

The reflector IP distribution across ASNs is typical. It pretty much follows the world’s largest residential ISPs:

$ cat ips-nf-asn.txt |uniq|cut -f 2|sort|uniq -c|sort -nr|head
 318405 4837   # CN China Unicom
  84781 4134   # CN China Telecom
  72301 22927  # AR Telefonica de Argentina
  23823 3462   # TW Chunghwa Telecom
  19518 6327   # CA Shaw Communications Inc.
  19464 4788   # MY TM Net
  18809 3816   # CO Colombia Telecomunicaciones
  11328 28573  # BR Claro SA
   7070 10796  # US Time Warner Cable Internet
   6840 8402   # RU OJSC "Vimpelcom"
   6604 3269   # IT Telecom Italia
   6377 12768  # RU JSC "ER-Telecom Holding"
    ...

What's SSDP anyway?

The attack was composed of UDP packets with source port 1900. This port is used by SSDP, which is part of the UPnP family of protocols. UPnP is one of the zero-configuration networking protocols. Most likely your home devices support it, allowing them to be easily discovered by your computer or phone. When a new device (like your laptop) joins the network, it can query the local network for specific devices, like internet gateways, audio systems, TVs, or printers. Read more on how UPnP compares to Bonjour.

UPnP is poorly standardised, but here's a snippet from the spec about the M-SEARCH frame - the main method for discovery:

When a control point is added to the network, the UPnP discovery protocol allows that control point to search for devices of interest on the network. It does this by multicasting on the reserved address and port (239.255.255.250:1900) a search message with a pattern, or target, equal to a type or identifier for a device or service.

Responses to M-SEARCH frame:

To be found by a network search, a device shall send a unicast UDP response to the source IP address and port that sent the request to the multicast address. Devices respond if the ST header field of the M-SEARCH request is “ssdp:all”, “upnp:rootdevice”, “uuid:” followed by a UUID that exactly matches the one advertised by the device, or if the M-SEARCH request matches a device type or service type supported by the device.

This works in practice. For example, my Chrome browser regularly asks for a Smart TV I guess:

$ sudo tcpdump -ni eth0 udp and port 1900 -A
IP 192.168.1.124.53044 > 239.255.255.250.1900: UDP, length 175

M-SEARCH * HTTP/1.1
HOST: 239.255.255.250:1900
MAN: "ssdp:discover"
MX: 1
ST: urn:dial-multiscreen-org:service:dial:1
USER-AGENT: Google Chrome/58.0.3029.110 Windows

This frame is sent to a multicast IP address. Other devices listening on that address and supporting this specific ST (search-target) multiscreen type are supposed to answer.

Apart from queries for specific device types, there are two "generic" ST query types:

  • upnp:rootdevice: search for root devices
  • ssdp:all: search for all UPnP devices and services

To emulate these queries you can run this python script (based on this work):

#!/usr/bin/env python2
import socket
import sys

dst = "239.255.255.250"
if len(sys.argv) > 1:
    dst = sys.argv[1]
st = "upnp:rootdevice"
if len(sys.argv) > 2:
    st = sys.argv[2]

msg = [
    'M-SEARCH * HTTP/1.1',
    'Host:239.255.255.250:1900',
    'ST:%s' % (st,),
    'Man:"ssdp:discover"',
    'MX:1',
    '']

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
s.settimeout(10)
s.sendto('\r\n'.join(msg), (dst, 1900))

while True:
    try:
        data, addr = s.recvfrom(32*1024)
    except socket.timeout:
        break
    print "[+] %s\n%s" % (addr, data)

On my home network two devices show up:

$ python ssdp-query.py
[+] ('192.168.1.71', 1026)
HTTP/1.1 200 OK
CACHE-CONTROL: max-age = 60
EXT:
LOCATION: http://192.168.1.71:5200/Printer.xml
SERVER: Network Printer Server UPnP/1.0 OS 1.29.00.44 06-17-2009
ST: upnp:rootdevice
USN: uuid:Samsung-Printer-1_0-mrgutenberg::upnp:rootdevice

[+] ('192.168.1.70', 36319)
HTTP/1.1 200 OK
Location: http://192.168.1.70:49154/MediaRenderer/desc.xml
Cache-Control: max-age=1800
Content-Length: 0
Server: Linux/3.2 UPnP/1.0 Network_Module/1.0 (RX-S601D)
EXT:
ST: upnp:rootdevice
USN: uuid:9ab0c000-f668-11de-9976-000adedd7411::upnp:rootdevice

The firewall

Now that we understand the basics of SSDP, understanding the reflection attack should be easy. You see, there are in fact two ways of delivering the M-SEARCH frame:

  • what we presented, over the multicast address
  • directly to a UPnP/SSDP enabled host on a normal unicast address

The latter method works. We can specifically target my printer IP address:

$ python ssdp-query.py 192.168.1.71
[+] ('192.168.1.71', 1026)
HTTP/1.1 200 OK
CACHE-CONTROL: max-age = 60
EXT:
LOCATION: http://192.168.1.71:5200/Printer.xml
SERVER: Network Printer Server UPnP/1.0 OS 1.29.00.44 06-17-2009
ST: upnp:rootdevice
USN: uuid:Samsung-Printer-1_0-mrgutenberg::upnp:rootdevice

Now the problem is easily seen: the SSDP protocol does not check whether the querying party is in the same network as the device. It will happily respond to an M-SEARCH delivered over the public Internet. All it takes is a tiny misconfiguration in a firewall - port 1900 UDP open to the world - and a perfect target for UDP amplification will be available.

Given a misconfigured target, our script will happily work over the Internet:

$ python ssdp-query.py 100.42.x.x
[+] ('100.42.x.x', 1900)
HTTP/1.1 200 OK
CACHE-CONTROL: max-age=120
ST: upnp:rootdevice
USN: uuid:3e55ade9-c344-4baa-841b-826bda77dcb2::upnp:rootdevice
EXT:
SERVER: TBS/R2 UPnP/1.0 MiniUPnPd/1.2
LOCATION: http://192.168.2.1:40464/rootDesc.xml

The amplification

The real damage is done by the ssdp:all ST type though. These responses are much larger:

$ python ssdp-query.py 100.42.x.x ssdp:all
[+] ('100.42.x.x', 1900)
HTTP/1.1 200 OK
CACHE-CONTROL: max-age=120
ST: upnp:rootdevice
USN: uuid:3e55ade9-c344-4baa-841b-826bda77dcb2::upnp:rootdevice
EXT:
SERVER: TBS/R2 UPnP/1.0 MiniUPnPd/1.2
LOCATION: http://192.168.2.1:40464/rootDesc.xml

[+] ('100.42.x.x', 1900)
HTTP/1.1 200 OK
CACHE-CONTROL: max-age=120
ST: urn:schemas-upnp-org:device:InternetGatewayDevice:1
USN: uuid:3e55ade9-c344-4baa-841b-826bda77dcb2::urn:schemas-upnp-org:device:InternetGatewayDevice:1
EXT:
SERVER: TBS/R2 UPnP/1.0 MiniUPnPd/1.2
LOCATION: http://192.168.2.1:40464/rootDesc.xml

... 6 more response packets ...

In this particular case, a single SSDP M-SEARCH packet triggered 8 response packets. Here is the tcpdump view:

$ sudo tcpdump -ni en7 host 100.42.x.x -ttttt
00:00:00.000000 IP 192.168.1.200.61794 > 100.42.x.x.1900: UDP, length 88
00:00:00.197481 IP 100.42.x.x.1900 > 192.168.1.200.61794: UDP, length 227
00:00:00.199634 IP 100.42.x.x.1900 > 192.168.1.200.61794: UDP, length 299
00:00:00.202938 IP 100.42.x.x.1900 > 192.168.1.200.61794: UDP, length 295
00:00:00.208425 IP 100.42.x.x.1900 > 192.168.1.200.61794: UDP, length 275
00:00:00.209496 IP 100.42.x.x.1900 > 192.168.1.200.61794: UDP, length 307
00:00:00.212795 IP 100.42.x.x.1900 > 192.168.1.200.61794: UDP, length 289
00:00:00.215522 IP 100.42.x.x.1900 > 192.168.1.200.61794: UDP, length 291
00:00:00.219190 IP 100.42.x.x.1900 > 192.168.1.200.61794: UDP, length 291

That target exposes 8x packet count amplification and 26x bandwidth amplification. Sadly, this is typical for SSDP.
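For the curious, those factors fall straight out of the UDP payload lengths reported by tcpdump above; here is a quick back-of-the-envelope check in Python:

# Amplification factors for the capture above: one 88-byte query,
# eight responses of the listed sizes.
request_bytes = 88
response_bytes = [227, 299, 295, 275, 307, 289, 291, 291]

packet_amplification = len(response_bytes)                        # 8 responses per request
bandwidth_amplification = sum(response_bytes) / float(request_bytes)

print(packet_amplification, round(bandwidth_amplification, 1))    # 8, ~25.8 (roughly 26x)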

IP Spoofing

The final step of the attack is to fool the vulnerable servers into flooding the target IP - not the attacker. For that, the attacker needs to spoof the source IP address on their queries.

We probed the reflector IPs used in the 100 Gbps+ attack shown earlier. We found that out of the 920k reflector IPs, only 350k (38%) still respond to SSDP probes.

Out of the reflectors that responded, each sent on average 7 packets:

$ cat results-first-run.txt | cut -f 1 | sort | uniq -c | sed -s 's#^ \+##g' | cut -d " " -f 1 | ~/mmhistogram -t "Response packets per IP" -p
Response packets per IP min:1.00 avg:6.99 med=8.00 max:186.00 dev:4.44 count:350337
Response packets per IP:
 value |-------------------------------------------------- count
     0 | ****************************** 23.29%
     1 | **** 3.30%
     2 | ** 2.29%
     4 |************************************************** 38.73%
     8 | ************************************** 29.51%
    16 | *** 2.88%
    32 | 0.01%
    64 | 0.00%
   128 | 0.00%

The response packets had 321 bytes (+/- 29 bytes) on average. Our request packets had 110 bytes.

According to our measurements, with the ssdp:all M-SEARCH an attacker would be able to achieve:

  • 7x packet number amplification
  • 20x bandwidth amplification

We can estimate the 43 Mpps/112 Gbps attack was generated with roughly:

  • 6.1 Mpps of spoofing capacity
  • 5.6 Gbps of spoofed bandwidth

In other words: a single well connected 10 Gbps server able to perform IP spoofing can deliver a significant SSDP attack.
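As a rough sanity check, here is how those estimates follow from the measured 7x and 20x factors above (a sketch with rounded figures, not exact attack data):

# Estimate the capacity needed behind the 43 Mpps / 112 Gbps attack,
# given the measured amplification factors.
packet_amplification = 7        # ~7 response packets per spoofed query
bandwidth_amplification = 20    # ~7 * 321 bytes out per 110-byte query

attack_pps = 43e6               # 43 Mpps observed at the target
attack_bps = 112e9              # 112 Gbps observed at the target

print(round(attack_pps / packet_amplification / 1e6, 1))    # ~6.1 Mpps of spoofing capacity
print(round(attack_bps / bandwidth_amplification / 1e9, 1)) # ~5.6 Gbps of spoofed bandwidth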

More on the SSDP servers

Since we probed the vulnerable SSDP servers, here are the most common Server header values we received:

104833 Linux/2.4.22-1.2115.nptl UPnP/1.0 miniupnpd/1.0
 77329 System/1.0 UPnP/1.0 IGD/1.0
 66639 TBS/R2 UPnP/1.0 MiniUPnPd/1.2
 12863 Ubuntu/7.10 UPnP/1.0 miniupnpd/1.0
 11544 ASUSTeK UPnP/1.0 MiniUPnPd/1.4
 10827 miniupnpd/1.0 UPnP/1.0
  8070 Linux UPnP/1.0 Huawei-ATP-IGD
  7941 TBS/R2 UPnP/1.0 MiniUPnPd/1.4
  7546 Net-OS 5.xx UPnP/1.0
  6043 LINUX-2.6 UPnP/1.0 MiniUPnPd/1.5
  5482 Ubuntu/lucid UPnP/1.0 MiniUPnPd/1.4
  4720 AirTies/ASP 1.0 UPnP/1.0 miniupnpd/1.0
  4667 Linux/2.6.30.9, UPnP/1.0, Portable SDK for UPnP devices/1.6.6
  3334 Fedora/10 UPnP/1.0 MiniUPnPd/1.4
  2814 1.0
  2044 miniupnpd/1.5 UPnP/1.0
  1330 1
  1325 Linux/2.6.21.5, UPnP/1.0, Portable SDK for UPnP devices/1.6.6
   843 Allegro-Software-RomUpnp/4.07 UPnP/1.0 IGD/1.00
   776 Upnp/1.0 UPnP/1.0 IGD/1.00
   675 Unspecified, UPnP/1.0, Unspecified
   648 WNR2000v5 UPnP/1.0 miniupnpd/1.0
   562 MIPS LINUX/2.4 UPnP/1.0 miniupnpd/1.0
   518 Fedora/8 UPnP/1.0 miniupnpd/1.0
   372 Tenda UPnP/1.0 miniupnpd/1.0
   346 Ubuntu/10.10 UPnP/1.0 miniupnpd/1.0
   330 MF60/1.0 UPnP/1.0 miniupnpd/1.0
   ...

The most common ST header values we saw:

298497 upnp:rootdevice
158442 urn:schemas-upnp-org:device:InternetGatewayDevice:1
151642 urn:schemas-upnp-org:device:WANDevice:1
148593 urn:schemas-upnp-org:device:WANConnectionDevice:1
147461 urn:schemas-upnp-org:service:WANCommonInterfaceConfig:1
146970 urn:schemas-upnp-org:service:WANIPConnection:1
145602 urn:schemas-upnp-org:service:Layer3Forwarding:1
113453 urn:schemas-upnp-org:service:WANPPPConnection:1
100961 urn:schemas-upnp-org:device:InternetGatewayDevice:
100180 urn:schemas-upnp-org:device:WANDevice:
 99017 urn:schemas-upnp-org:service:WANCommonInterfaceConfig:
 98112 urn:schemas-upnp-org:device:WANConnectionDevice:
 97246 urn:schemas-upnp-org:service:WANPPPConnection:
 96259 urn:schemas-upnp-org:service:WANIPConnection:
 93987 urn:schemas-upnp-org:service:Layer3Forwarding:
 91108 urn:schemas-wifialliance-org:device:WFADevice:
 90818 urn:schemas-wifialliance-org:service:WFAWLANConfig:
 35511 uuid:IGD{8c80f73f-4ba0-45fa-835d-042505d052be}000000000000
  9822 urn:schemas-upnp-org:service:WANEthernetLinkConfig:1
  7737 uuid:WAN{84807575-251b-4c02-954b-e8e2ba7216a9}000000000000
  6063 urn:schemas-microsoft-com:service:OSInfo:1
   ...

The vulnerable IPs seem to be mostly unprotected home routers.

Open SSDP is a vulnerability

It's hardly news that allowing UDP port 1900 traffic from the Internet to reach your home printer (or similar device) is a bad idea. This problem has been known since at least January 2013.

The authors of SSDP clearly didn't give any thought to its UDP amplification potential. There are a number of obvious recommendations about future use of the SSDP protocol:

  • The authors of SSDP should clarify whether there is any real-world use for unicast M-SEARCH queries. From what I understand, M-SEARCH only makes practical sense as a multicast query on the local network.

  • Unicast M-SEARCH support should be either deprecated or at least rate limited, in a similar way to DNS Response Rate Limiting techniques.

  • M-SEARCH responses should only be delivered to the local network. Responses routed beyond the local network make little sense and open up the vulnerability described above.

In the meantime we recommend:

  • Network administrators should ensure inbound UDP port 1900 is blocked on the firewall.

  • Internet service providers should never allow IP spoofing to be performed on their network. IP spoofing is the true root cause of the issue. See the infamous BCP38.

  • Internet service providers should allow their customers to use BGP flowspec to rate limit inbound UDP source port 1900 traffic, to relieve congestion during large SSDP attacks.

  • Internet providers should internally collect netflow protocol samples. Netflow is needed to identify the true source of the attack. With netflow it's trivial to answer questions like: "Which of my customers sent 6.4 Mpps of traffic to port 1900?". Due to privacy concerns we recommend collecting netflow samples with the largest possible sampling value: 1 in 64k packets. This will be sufficient to track DDoS attacks while preserving decent privacy for individual customer connections.

  • Developers should not roll out their own UDP protocols without careful consideration of UDP amplification problems. UPnP should be properly standardized and scrutinized.

  • End users are encouraged to use the script above to scan their network for UPnP-enabled devices, and to consider whether these devices should be allowed to access the internet.

Furthermore, we prepared an online checking website. Click if you want to know whether your public IP address has a vulnerable SSDP service:

Sadly, most of the unprotected routers we saw in the described attack were in China, Russia and Argentina, places not historically known for the most agile internet service providers.

Summary

Cloudflare customers are fully protected from SSDP and other L3 amplification attacks. These attacks are nicely deflected by the Cloudflare anycast infrastructure and require no special action. Unfortunately, the rising size of SSDP attacks might be a tough problem for other Internet citizens. We should encourage our ISPs to stop IP spoofing within their networks, support BGP flowspec and configure netflow collection.

This article is a joint work of Marek Majkowski and Ben Cartwright-Cox.

Dealing with large attacks sounds like fun? Join our world famous DDoS team in London, Austin, San Francisco and our elite office in Warsaw, Poland.

Categories: Technology

Announcing the Cloudflare Apps Platform and Developer Fund

Tue, 27/06/2017 - 14:01

When we started Cloudflare we had no idea if anyone would validate our core idea. Our idea was that everyone should have the ability to be as fast and secure as the Internet giants like Google, Facebook, and Microsoft. Six years later, it's incredible how far that core idea has taken us.

CC BY-SA 2.0 image by Mobilus In Mobili

Today, Cloudflare runs one of the largest global networks. We have data centers in 115 cities around the world and continue to expand. We've built a core service that delivers performance, security, availability, and insight to more than 6 million users.

Democratizing the Internet

From the beginning, our goal has been to democratize the Internet. Today we're taking another step toward that goal with the launch of the Cloudflare Apps Platform and the Cloudflare Developer Fund. To understand that, you have to understand where we started.

When we started Cloudflare we needed two things: a collection of users for the service, and finances to help us fund our development. In both cases, people were taking a risk on Cloudflare. Our first users came from Project Honey Pot, which Lee Holloway and I created back in 2004. Members of that Open Source community used its service to track online hackers and spammers, and many served as Cloudflare's initial alpha customers. We reached out to the project's users and they gave us feedback on how to make Cloudflare's nascent service valuable to them.

Once we proved the technology worked, we were fortunate to work with some of the Internet's best investors. Venrock, which provided the first funding for companies like Apple and Intel; Pelion Ventures, which was born out of networking giant Novell; and NEA, which has provided funding for some of the most successful startups in the last 25 years, all provided both financial resources and invaluable counsel as we built Cloudflare.

Running at the Edge

Prior to today, if you wanted to write code that took full advantage of Cloudflare's global network you needed to be a Cloudflare employee. Our team is able to run code on thousands of servers in hundreds of locations around the world and modify our customers' packets as they flow through our network in order to make their web site, application, or API faster and more secure. Today, we open that access to the rest of the world.

CC BY 2.0 image by Nik Cubrilovic

Today we're introducing the Cloudflare Apps Platform. The Apps Platform allows third parties to develop applications that can be delivered across Cloudflare's edge to any of our millions of customers. When we started Cloudflare, we had the reach of the thousands of Project Honey Pot users. The Cloudflare Apps Platform amplifies this and gives developers the ability to reach any of Cloudflare's more than six million current users.

Cloudflare sits in a unique place. Our customers' packets flow through our network before reaching origin servers and on the way back to clients. That enables us to filter out bad actors, introduce new compression, change protocols, and modify and inject content. The Cloudflare Apps Platform enables developers to do the same, and produce applications that help build a better Internet.

The Funds You Need

But access to users isn't enough to build a great company. You also need access to capital to fund your development. That's why we're excited to announce that Cloudflare's earliest investors — Venrock, Pelion, and NEA — have teamed up to create the $100 million Cloudflare Developer Fund.

CC BY-SA 2.0 image by Micah Elizabeth Scott

Developers who create applications using Cloudflare's Apps Platform may apply for venture capital funding through the Developer Fund. The venture capitalists who are participating in the Developer Fund understand the scale of Cloudflare, our commitment to the platform, and the opportunity that companies have to build impactful applications leveraging our global infrastructure.

Building Cloudflare was a community effort. We can't wait to see what developers build now that we've opened Cloudflare's infrastructure and access to investors to a greater audience.

Categories: Technology

Announcing the New Cloudflare Apps

Tue, 27/06/2017 - 01:05

Today we’re excited to announce the next generation of Cloudflare Apps. Cloudflare Apps is an open platform of tools to build a high quality website. It’s a place where every website owner can select from a vast catalog of Apps which can improve their websites and internet properties in every way imaginable. Selected apps can be previewed and installed instantly with just a few clicks, giving every website owner the power of technical expertise, and every developer the platform only Cloudflare can provide.

Apps Diagram

Apps can modify content and layout on the page they’re installed on, communicate with external services and dramatically improve websites. Imagine Google Analytics, YouTube videos, in-page chat tools, widgets, themes and every other business which can be built by improving websites. All of these and more can be done with Cloudflare Apps.

Cloudflare Apps makes it possible for a developer in her basement to build the next great new tool and get it on a million websites overnight. With Cloudflare Apps, even the smallest teams can get massive distribution for their apps on the web so that the best products win. With your help we will make it possible for developers like you to build a new kind of business.

Apps makes it possible for the more than six million Internet properties on Cloudflare’s network to take advantage of what you can build. Even non-technical users can preview and install apps, opening up a whole new massive audience to SaaS software companies and independent developers. Unlike other solutions, Apps get served from the site’s original domain, allowing you to get all the performance benefits of HTTP/2, TCP Pipelining and the Cloudflare edge.

We’re working with Oracle, Spotify, Pinterest, Zendesk and more great companies for launch. We can’t wait to see what apps you will create.

Develop an App ›

Live Preview

Before an app is installed we present the user with a live preview of the app to allow them to see what it will look like on their site. Using this preview they can customize it and ensure that it works how they wish. During the preview, users can also customize colors, placement and other options defined by the developer. As users change options in an app the preview gets updated. Even better, supporting this preview often requires no additional work beyond what’s required to build your app.

Live Preview is not just limited to Cloudflare users; as a developer you can use it to show off your apps to any user on any website.

Logins and Registrations using OAuth

OAuth support makes it easy to allow users to log into or register with your service without leaving the installation page. That means users don’t have to copy-paste API keys or embed codes from your service anymore. We’ve found that allowing users to register accounts greatly increases the likelihood that a user who previews an app will install it.

Bidirectional Webhooks

Like webhooks you might be used to, Cloudflare Apps supports hooks which allow you to be notified when your users preview or install your app. Even better though, our hooks allow developers to modify installations as they happen. When we fire a hook, you can respond with the changes we should make to that user’s installation experience or their installation’s options. This allows you to tailor every user's installation experience to their specific account and records. For example, Google Analytics allows its users to select from their analytics accounts while installing, and Cover Message allows users to choose which of their Mailchimp or Constant Contact lists they’d like new leads sent to.
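As an illustration only, a hook receiver might look something like the Python sketch below. The endpoint path, payload fields and response shape here are hypothetical, not the actual Cloudflare Apps hook schema; the point is simply that your server receives the event and can hand back modified installation options.

# Hypothetical hook receiver (Flask). Field names are made up for
# illustration; consult the Apps documentation for the real schema.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/hooks/install", methods=["POST"])
def on_install():
    event = request.get_json(force=True)
    options = event.get("install", {}).get("options", {})

    # Example: populate a dropdown with lists fetched from our own
    # service for the account that is installing the app.
    options["mailing_list"] = {
        "values": ["Newsletter", "Product updates"],
    }

    return jsonify({"install": {"options": options}})

if __name__ == "__main__":
    app.run(port=8080)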

Selling Apps

In the world of mobile apps it’s possible to make money as an independent developer, building apps and selling them. On the web, it’s much harder for a developer to build a business. Growing a sales and marketing team is time consuming and distracting, and ultimately results in the team with the biggest budget getting the most customers.

Cloudflare Apps allows anyone to not just build an app which gets installed onto websites, but also to sell it. You can charge any amount you’d like for your app, all recurring monthly. You can even offer multiple plans for users of different calibers or who need different features.
Providing recurring revenue to app developers makes it possible for them to make a living building their apps, and to create sustainable businesses in the process.

Building an App

For Cloudflare Apps we've built a new Documentation site including example apps and screencasts.

We have also built an App Creator to allow developers to start developing their apps right away without having to set up a development environment. The Creator tool watches your app’s files on your computer and updates the app’s preview as you make changes live.

Develop an App ›
Categories: Technology

Project Jengo: Explaining Challenges to Patent Validity (and a looming threat)

Fri, 23/06/2017 - 14:00

We’ve written a couple of times about the problem of patent trolls, and what we are doing in response to the first case a troll filed against Cloudflare. We set a goal to find prior art on all 38 Blackbird Tech patents and applications and then obtain a legal determination that Blackbird Tech’s patents are invalid. Such a determination will end Blackbird’s ability to file or threaten to file abusive patent claims against us or anyone else.

CC BY-SA 2.0 image by hyku

The patent system exists to reward inventors, so it is no surprise that a patent has to claim something new — an “invention.” Sometimes the United States Patent and Trademark Office (USPTO) — the agency that administers the patent system — mistakenly issues patents that do not claim anything particularly new. The patent examiner may not be aware that the proposed “invention” was already in use in the industry, and the patent applicant (the only party in the process) doesn’t have an incentive to share that information. Often, the USPTO issues patents that are too vague and can later be broadly interpreted by patent owners to cover different and subsequent technologies that could not otherwise have been patented in the first place and couldn’t have been anticipated by the patent examiner.

Bad patents are bad for innovation, so how do we get rid of them?

The first step to invalidating the Blackbird Tech patents is to collect prior art on the patents. Prior art can include almost any type of evidence showing that the “invention” described in a patent already existed when the patent was filed. Common types of prior art include earlier-filed patents, journal articles, and commercial products (e.g., computer programs) that predate the challenged patent.

Prior art can be used to challenge the validity of the patent in several ways. First, we can challenge the validity of the patent asserted against us (the ‘335 patent) in the patent infringement case that Blackbird Tech filed against us. We can also challenge any of Blackbird’s patents in proceedings at the USPTO, through either a petition for inter partes review (IPR) or ex parte reexamination.

In this blog post, we walk through the administrative challenges to patents at the USPTO, especially the IPR, which we consider to be one of the most effective ways to shield ourselves and others from patent troll attacks on innovation.

No gridlock here: Congress agrees that bad patents must go

The IPR process was created by Congress in 2011 through the America Invents Act with overwhelming bipartisan support, passing the House by a vote of 304 to 117 and the Senate 89 to 9, and was signed by President Obama on September 16, 2011. The law was based on a growing awareness that it had become too easy to obtain low quality patents and too hard to challenge them in expensive and long court proceedings. As technologies continued to grow and develop, the patent system became inundated with bad patents issued during the dot-com era, many of which included vague claims relating to rudimentary web technologies. Over a decade later, these claims were being applied out of context and asserted broadly to shake down innovative companies using vastly more sophisticated technologies.

While few of the patents used in these shake-downs would have survived an invalidity challenge in front of a judge, the prohibitively high cost and delay of litigation meant that the vast majority of patent infringement claims settled outside of court for nuisance value. Defending a patent lawsuit in federal court costs millions of dollars (regardless of whether you ultimately have to pay a Plaintiff) and can take 2-3 years to resolve. As a result, courts rarely had an opportunity to invalidate bad patents. The patents lived on, and the nefarious cycle continued.

Seeking to break the cycle of misuse, Congress stepped in and created IPR to provide a streamlined process for invalidating bad patents. IPR is an administrative proceeding conducted by the USPTO itself, rather than a federal court. IPR proceedings have a limited scope and harness the USPTO’s special expertise in resolving questions of patent validity. As a result, a typical IPR proceeding can be completed in about half the time and at one tenth of the cost of a federal court case.

To date, over 6,000 IPR petitions have been filed, of which over 1,600 have resulted in written decisions by the USPTO. Many other cases either settled or remain pending. IPR petitioners have had a high rate of success. Of the 1,600 cases that reached a written decision, 1,300 of those cases were successful in having the USPTO invalidate at least one claim, and 1,000 were successful in having the entire patent invalidated.

Looking under the hood of an IPR

To initiate an IPR, the party challenging the patent files a petition with the USPTO identifying the patents, explaining why the patents should be found invalid, and presenting the relevant prior art. The owner of the challenged patent has an opportunity to file a written response to the petition.

Image by Katherine Tompkins

The USPTO actually makes two major decisions during an IPR. The first is a threshold determination of whether to institute the IPR at all. The USPTO will institute an IPR if the petition (combined with the patent owner’s response, if one was filed) establishes a “reasonable likelihood that the petitioner will prevail” in invalidating one or more claims. One unusual feature of the IPR process is that the USPTO’s institution decision cannot be appealed. Both parties must live with whatever institution decision the USPTO makes. Since 2015, the USPTO has instituted about two thirds of all IPR petitions.

After an IPR is instituted, both sides have an opportunity to gather evidence, file motions, and present arguments. The USPTO then issues a final written decision on the validity of the patent. Unlike the institution decision, the final written decision of an IPR is appealable to the Federal Circuit Court of Appeals (a specialized appeals court that, among other things, hears appeals in most cases that involve patents). The time between the institution decision and the final written decision should not exceed a year, making IPR exceptionally compact compared to most full court proceedings.

In addition to speed, the legal standards that the USPTO applies in IPR proceedings are more favorable to the patent challenger than those applied in federal court:

  • The burden of proof for invalidating a patent in an IPR is lower than in federal court. The petitioner in an IPR proceeding must meet the “preponderance of the evidence” standard. This is the same standard of proof that applies in the vast majority of civil (non-criminal) cases. Simply put, whichever side has more convincing evidence wins. By contrast, a party raising an invalidity challenge in federal court must meet a higher “clear and convincing evidence” standard to prove that the patent is invalid.

  • The “strike zone” for prior art is bigger in an IPR than in federal court. The USPTO interprets challenged patents broadly — using the “broadest reasonable interpretation” standard — making it relatively easy to find prior art that falls within the scope of the patent. This makes sense as patent trolls often take the broadest possible interpretation of their patents when they decide whom to sue, thereby creating pressure for those parties to settle. Federal courts, on the other hand, interpret the challenged patent more narrowly, through the eyes of a hypothetical “person of ordinary skill in the art” (POSITA). A POSITA is someone who has experience in the technological field of the patent and can use technical context to narrow the scope of the patent. This narrower scope makes it harder to find prior art that falls within the scope of the patent.

  • Although the legal standards in an IPR proceeding tend to favor the challenger, other aspects of IPR proceedings can favor the patent owners. For example, unlike federal courts, IPR proceedings provide an opportunity for patent owners to amend their claims to circumvent the challenger’s prior art.

Ex parte reexamination forces abusive patent owners to defend their patents

For all of the built-in efficiencies of the IPR process, an IPR still closely resembles a traditional court case in many important respects. For example, an IPR generally requires a petitioner to undertake a full slate of heavy-duty legal tasks, like writing briefs, filing motions, and making oral arguments. As a result, even though going through an IPR process is far less painful and expensive than being dragged through the federal courts, it still involves significant legal fees and demands a high level of attention that could otherwise be spent innovating.

Enter ex parte reexamination, which strips down the process for challenging patent validity even further. In ex parte reexamination, the party challenging the patent files a petition with the USPTO in much the same manner as an IPR. However, once the petition is filed, the petitioner steps out of the picture entirely. Instead, the USPTO reassesses the validity of the challenged patent by engaging in a semi-collaborative back-and-forth between the USPTO and the patent owner, without the challenger’s continued involvement. This back-and-forth is virtually identical to the original examination process between the USPTO and applicant that led to the patent being issued in the first place.

Like an IPR petition, the petition for ex parte reexamination has to cast sufficient doubt on the validity of the patent before the USPTO will order an ex parte reexamination. In this case, the legal standard that the USPTO uses is whether the petition establishes a “substantial new question of patentability.” To date, over 90% of petitions for ex parte reexamination have been granted.

With a relatively low up-front investment and no continuing costs incurred by the petitioner, an ex parte reexamination has the advantage of forcing patent owners back to the USPTO to defend their patents or risk invalidation. And more often than not, the patent owner in an ex parte reexamination will end up making significant concessions along the way -- amendments that narrow the scope of a patent. To date, 67% of ex parte reexaminations have resulted in at least an amendment to the patent, and another 12% have been successful in having an entire patent invalidated. These concessions, even seemingly minor ones, can successfully thwart a dubious claim of patent infringement and severely limit the owner’s ability to pursue future claims of infringement.

USPTO review of patent validity (IPR and ex parte) faces an uncertain future

In a surprise decision last week, the U.S. Supreme Court announced that it has decided to hear the case of Oil States vs. Greene’s Energy Group, et al. The key question that the Court will decide is whether or not the IPR process is constitutional. The stakes are significant: if the Supreme Court finds IPR unconstitutional, then the entire system of administrative review by the USPTO — including IPR and ex parte — will be abandoned.

Oil States had a patent related to — you guessed it — oil drilling technology invalidated in an IPR proceeding and subsequently lost an appeal in a lower federal court. It has asked the Supreme Court to hear the case because it feels that once the USPTO issues a patent, an inventor has a constitutionally protected property right that — under Article III of the U.S. Constitution (which outlines the powers of the judicial branch of the government), and the 7th Amendment (which addresses the right to a jury trial in certain types of cases) — cannot be revoked without judicial intervention.

Image by Eric Kounce

It’s important to understand what it means for the Supreme Court to agree to hear a case — or as lawyers would insist: grant a writ of certiorari. The Supreme Court only hears the cases it wants to, in order to resolve issues of sufficient importance or disagreement. Over the past several years, the Court has only agreed to hear about 1% of the cases submitted to it. Four Justices have to vote in favor before a writ of certiorari is granted. Grants are almost always made where there is disagreement among the federal appellate courts or on issues taken from the front page.

Oil States is not the first case to make its arguments against the IPR system, but it is the first that the Supreme Court has agreed to hear. Notably, lower courts have repeatedly rejected the stance taken by the petitioner in Oil States. This is more problematic than it seems because if the Supreme Court was comfortable with the status quo, there would be no reason for them to hear the case. The decision to hear the case suggests that at least the four justices who voted to grant the writ of certiorari are considering going in another direction (though perhaps predicting the Supreme Court’s views is better left to science).

A quick fix of the patent system is better than no fix at all

Because Congressional action creating the IPR was a prudent and resoundingly popular action to blunt abuses of the patent system, there are several concerning aspects of a potential invalidation of the IPR system by the Supreme Court. The patent troll system is based on the idea that American companies would rather pay billions of dollars to resolve meritless claims of infringement than suffer the delay and expense of pursuing those claims in federal court.

In response, members of Congress set up a better--though still not perfect--system for having these disputes resolved. The court system should have taken steps to improve its own processes to make them less susceptible to being used to distort the ability to have legitimate disputes resolved. Instead, a decision by the Court to invalidate the IPR process would be the equivalent of the Court failing to clean up its own mess and then prohibiting anyone else from doing so.

And it seems a bit incongruous that rights granted through a system created by statute which gave decision making authority to a patent examiner at the USPTO—the patent system—cannot be modified or altered through a system created by a subsequent statute and using the benefit of actual experience. USPTO reviewers grant patent rights, so why are other USPTO reviewers prohibited from changing that designation?

Getting ourselves into the game

CC BY-SA 2.0 image by meanie

We will be closely monitoring developments in Oil States and expect that we will contribute to an Amicus (or “friend of the court”) brief making sure the justices are aware of our position on this issue. Our hope is that the justices voting to grant the writ of certiorari were only considering the implications for the parties to that case (neither of which is an NPE, or non-practicing entity), and not fully considering the larger community of innovative companies that are helped by the IPR system.

While we await a decision in Oil States, expect to see Cloudflare initiate IPR and ex parte proceedings against Blackbird Tech patents in the coming months. In the meantime, you can continue to support our efforts by searching for prior art on the Blackbird Tech patents, and you can engage in the political process by supporting efforts to improve the patent litigation process.

Categories: Technology

When the Internet (Officially) Became the Public Square

Wed, 21/06/2017 - 14:00

Souvenir Postcard by unknown

Sometimes, well-intended efforts to prevent unacceptable behavior run into the reality of what it means to have an open and free society. That is what happened at the Supreme Court on Monday.

The Supreme Court issued an opinion confirming something we at Cloudflare have long believed -- that the First Amendment protects access to the Internet. Using sweeping language, Justice Kennedy compared internet access to access to a street or park, "essential venues for public gatherings to celebrate some views, to protest others, or simply to learn and inquire,” and concluded that "to foreclose access to social media altogether is to prevent the user from engaging in the legitimate exercise of First Amendment rights."

We share this view of the internet as a forum to discuss and debate ideas, and believe that the Court’s opinion is an important reaffirmation of the free speech principles we support.

The Packingham Case

Like many other First Amendment cases, the law at the heart of the Packingham v. North Carolina case presents complex questions about how to protect the community in ways consistent with the right to free speech.

In 2008, North Carolina passed a law making it a serious criminal offense for a registered sex offender to access certain social media sites that included children as members. Lester Packingham Jr., the defendant in the case, had registered as a sex offender after pleading guilty in 2002 to having sex with a 13 year old when he was a 21 year old college student.

Packingham was charged with a violation of the North Carolina law after he posted a statement on Facebook expressing his relief about the dismissal of a state court traffic ticket. After his conviction, Packingham appealed, arguing that the law was unconstitutional.

The Supreme Court struck down the law as a violation of the First Amendment, which, among other things, prohibits government action (“shall make no law”) that inhibits free expression or assembly. Although all eight justices to rule on the issue (the newest Justice, Neil Gorsuch, didn’t participate in this decision) agreed that the North Carolina law was unconstitutional, the Justices disagreed on the scope of First Amendment protections.

Writing on behalf of five members of the Court, Justice Kennedy emphasized the importance of protecting access to the internet, noting the substantial benefits it provides:

“Social media allows users to gain access to information and communicate with one another about it on any subject that might come to mind. . . . By prohibiting sex offenders from using those websites, North Carolina with one broad stroke bars access to what for many are the principal sources for knowing current events, checking ads for employment, speaking and listening in the modern public square, and otherwise exploring the vast realms of human thought and knowledge. These websites can provide perhaps the most powerful mechanisms available to a private citizen to make his or her voice heard. They allow a person with an Internet connection to ‘become a town crier with a voice that resonates farther than it could from any soapbox.’”

CC BY-SA 2.0 image by shoobydooby

The Court’s broad view of the importance of the internet also prompted the Justices to recommend exercising caution before allowing restrictions on internet speech. As described by Justice Kennedy,

“While we now may be coming to the realization that the Cyber Age is a revolution of historic proportions, we cannot appreciate yet its full dimensions and vast potential to alter how we think, express ourselves, and define who we want to be. The forces and directions of the Internet are so new, so protean, and so far reaching that courts must be conscious that what they say today might be obsolete tomorrow.”

The broad scope of the Court’s ruling suggests that the Supreme Court will look carefully at any restrictions that hinder access to the internet.

Justice Alito’s Concerns About the Opinion’s Implications

In a separate decision setting forth the opinion of the remaining three justices, Justice Alito took issue with the broad sweep and implications of the majority opinion. Because the law would have precluded access to a significant number of websites like Amazon or the Washington Post without furthering the state’s interest in protecting children, Justice Alito agreed that the law violated the First Amendment.

Justice Alito observed, however, that “if the internet or even just ‘social media’ sites are the 21st century equivalent of public streets and parks, then States may have little ability to restrict the sites that may be visited by even the most dangerous sex offenders.” And indeed, this case -- particularly when read in conjunction with other First Amendment cases -- suggests that the Court would have serious concerns about future government restrictions on speech, access, and communication on the Internet.

We recognize, of course, that, regardless of the internet’s value as a critical locale for discussion and debate, there are bad things online. But, as the Court held yesterday, significant restrictions on access to the internet are simply not an appropriate -- or constitutional -- solution. This historic decision confirms the U.S. commitment to freedom of expression online.

Let’s hope that the Court’s broad recognition of the central importance of the internet, along with its concerns about the harmful impact of access restrictions, become a central theme in ongoing discussions about regulation and control of the Internet.

Categories: Technology

Counting things, a lot of different things…

Wed, 07/06/2017 - 13:47

Back in April we announced Rate Limiting of requests for every Cloudflare customer. Being able to rate limit at the edge of the network has many advantages: it’s easier for customers to set up and operate, their origin servers are not bothered by excessive traffic or layer 7 attacks, the performance and memory cost of rate limiting is offloaded to the edge, and more.

In a nutshell, rate limiting works like this:

  • Customers can define one or more rate limit rules that match particular HTTP requests (failed login attempts, expensive API calls, etc.)

  • Every request that matches the rule is counted per client IP address

  • Once that counter exceeds a threshold, further requests are not allowed to reach the origin server and an error page is returned to the client instead

This is a simple yet effective protection against brute force attacks on login pages and other sorts of abusive traffic like L7 DoS attacks.

Doing this with possibly millions of domains and even more millions of rules immediately becomes a bit more complicated. This article is a look at how we implemented a rate limiter that runs quickly and accurately at the edge of the network and copes with the colossal volume of traffic we see at Cloudflare.

Let’s just do this locally!

As the Cloudflare edge servers are running NGINX, let’s first see how the stock rate limiting module works:

http {
    limit_req_zone $binary_remote_addr zone=ratelimitzone:10m rate=15r/m;
    ...
    server {
        ...
        location /api/expensive_endpoint {
            limit_req zone=ratelimitzone;
        }
    }
}

This module works great: it is reasonably simple to use (but requires a config reload for each change), and very efficient. The only problem is that if the incoming requests are spread across a large number of servers, this doesn’t work any more. The obvious alternative is to use some kind of centralized data store. Thanks to NGINX’s Lua scripting module, which we already use extensively, we could easily implement similar logic using any kind of central data backend.

But then another problem arises: how to make this fast and efficient?

All roads lead to Rome? Not with anycast!

Since Cloudflare has a vast and diverse network, reporting all counters to a single central point is not a realistic solution as the latency is far too high and guaranteeing the availability of the central service causes more challenges.

First let’s take a look at how the traffic is routed in the Cloudflare network. All the traffic going to our edge servers is anycast traffic. This means that we announce the same IP address for a given web application, site or API worldwide, and traffic will be automatically and consistently routed to the closest live data center.


This property is extremely valuable: we are sure that, under normal conditions[1], the traffic from a single IP address will always reach the same PoP. Unfortunately each new TCP connection might hit a different server inside that PoP. But we can still narrow down our problem: we can actually create an isolated counting system inside each PoP. This mostly solves the latency problem and greatly improves the availability as well.

Storing counters

At Cloudflare, each server in our edge network is as independent as possible to make their administration simple. Unfortunately for rate limiting, we saw that we do need to share data across many different servers.

We actually had a similar problem in the past with SSL session IDs: each server needed to fetch TLS connection data about past connections. To solve that problem we created a Twemproxy cluster inside each of our PoPs: this allows us to split a memcache[2] database across many servers. A consistent hashing algorithm ensures that when the cluster is resized, only a small number of keys are hashed differently.
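For readers unfamiliar with the idea, here is a minimal, illustrative hash ring in Python. It is not Twemproxy's actual ketama implementation, just a sketch of why resizing the cluster only remaps a small fraction of keys.

import bisect
import hashlib

class HashRing(object):
    """Tiny consistent-hashing ring, for illustration only."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self.ring = {}            # hash -> node
        self.sorted_hashes = []
        for node in nodes:
            self.add(node)

    def _hash(self, value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        # Each node gets many points on the ring so keys spread evenly.
        for i in range(self.replicas):
            h = self._hash("%s#%d" % (node, i))
            self.ring[h] = node
            bisect.insort(self.sorted_hashes, h)

    def get(self, key):
        # A key belongs to the first node point clockwise from its hash.
        h = self._hash(key)
        idx = bisect.bisect(self.sorted_hashes, h) % len(self.sorted_hashes)
        return self.ring[self.sorted_hashes[idx]]

ring = HashRing(["server-1", "server-2", "server-3"])
print(ring.get("ratelimit:198.51.100.7:/api/expensive_endpoint"))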

In our architecture, each server hosts a shard of the database. As we already had experience with this system, we wanted to leverage it for the rate limit as well.

Algorithms

Now let’s take a deeper look at how the different rate limit algorithms work. What we call the sampling period in the next paragraph is the reference unit of time for the counter (1 second for a 10 req/sec rule, 1 minute for a 600 req/min rule, ...).

The most naive implementation is to simply increment a counter that we reset at the start of each sampling period. This works but is not terribly accurate as the counter will be arbitrarily reset at regular intervals, allowing regular traffic spikes to go through the rate limiter. This can be a problem for resource intensive endpoints.

Another solution is to store the timestamp of every request and count how many were received during the last sampling period. This is more accurate, but has huge processing and memory requirements, as checking the state of the counter requires reading and processing a lot of data, especially if you want to rate limit over long periods of time (for instance 5,000 req per hour).

The leaky bucket algorithm allows a great level of accuracy while being nicer on resources (this is what the stock NGINX module is using). Conceptually, it works by incrementing a counter when each request comes in. That same counter is also decremented over time based on the allowed rate of requests until it reaches zero. The capacity of the bucket is what you are ready to accept as “burst” traffic (important given that legitimate traffic is not always perfectly regular). If the bucket is full despite its decay, further requests are mitigated.
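For reference, the leaky bucket idea can be sketched in a few lines of Python. This is only an illustration of the concept described above, not the NGINX module's implementation:

import time

class LeakyBucket(object):
    """Illustrative leaky bucket: `rate` is the allowed requests per
    second, `burst` is the bucket capacity."""

    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.burst = float(burst)
        self.level = 0.0
        self.last = time.time()

    def allow(self):
        now = time.time()
        # The bucket drains continuously at the allowed rate.
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level + 1 > self.burst:
            return False          # bucket full: mitigate this request
        self.level += 1
        return True

bucket = LeakyBucket(rate=15/60.0, burst=5)   # 15 req/min, bursts of up to 5
print(bucket.allow())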


However, in our case, this approach has two drawbacks:

  • It has two parameters (average rate and burst) that are not always easy to tune properly
  • We were constrained to use the memcached protocol and this algorithm requires multiple distinct operations that we cannot do atomically[3]

So the situation was that the only operations available were GET, SET and INCR (atomic increment).

Sliding windows to the rescue

CC BY-SA 2.0 image by halfrain

The naive fixed window algorithm is actually not that bad: we just have to solve the problem of completely resetting the counter for each sampling period. But actually, can’t we just use the information from the previous counter in order to extrapolate an accurate approximation of the request rate?

Let’s say I set a limit of 50 requests per minute on an API endpoint, and that I keep two counters: one for the previous minute and one for the current minute.


In this situation, I did 18 requests during the current minute, which started 15 seconds ago, and 42 requests during the entire previous minute. Based on this information, the rate approximation is calculated like this:

rate = 42 * ((60-15)/60) + 18 = 42 * 0.75 + 18 = 49.5 requests

One more request during the next second and the rate limiter will start being very angry!
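A minimal Python sketch of this estimate (assuming counters are kept per client and aligned to wall-clock minutes):

import time

def estimated_rate(prev_count, curr_count, window=60, now=None):
    # prev_count: requests counted during the previous window
    # curr_count: requests counted so far in the current window
    if now is None:
        now = time.time()
    elapsed = now % window                          # seconds into the current window
    weight = (window - elapsed) / float(window)     # share of the previous window still "in view"
    return prev_count * weight + curr_count

# The worked example above: 42 requests last minute, 18 requests so far,
# 15 seconds into the current minute.
print(estimated_rate(42, 18, window=60, now=15))    # 49.5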

This algorithm assumes a constant rate of requests during the previous sampling period (which can be any time span), which is why the result is only an approximation of the actual rate. The algorithm can be improved, but in practice it proved to be good enough:

  • It smoothes the traffic spike issue that the fixed window method has

  • It is very easy to understand and configure: there is no average vs. burst traffic distinction, and longer sampling periods can be used to achieve the same effect

  • It is still very accurate, as an analysis of 400 million requests from 270,000 distinct sources showed:

    • 0.003% of requests were wrongly allowed or rate limited
    • An average difference of 6% between the real rate and the approximate rate
    • 3 sources were allowed despite generating traffic slightly above the threshold (false negatives); their actual rate was less than 15% above the threshold rate
    • None of the mitigated sources was below the threshold (no false positives)

Moreover, it offers interesting properties in our case:

  • Tiny memory usage: only two numbers per counter

  • Incrementing a counter can be done by sending a single INCR command

  • Calculating the rate is reasonably easy: one GET command[4] and some very simple, fast math

So here we are: we can finally implement a good counting system using only a few memcache primitives and without much contention. Still, we were not happy with that: it requires a memcached query to get the rate. At Cloudflare we’ve seen a few of the largest L7 attacks ever, and we knew that a large-scale attack would crush a memcached cluster used this way. More importantly, such operations would slow down legitimate requests a little, even under normal conditions. This is not acceptable.

This is why the increment jobs are run asynchronously without slowing down the requests. If the request rate is above the threshold, another piece of data is stored asking all servers in the PoP to start applying the mitigation for that client. Only this bit of information is checked during request processing.


Even more interesting: once a mitigation has started, we know exactly when it will end. This means that we can cache that information in the server memory itself. Once a server starts to mitigate a client, it will not even run another query for the subsequent requests it might see from that source!
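Conceptually, that per-server cache can be as simple as the following sketch (illustrative only; shared_store stands in for the memcached cluster interface):

import time

# Illustration only: once we learn that a client is under mitigation, remember
# the expiry locally so subsequent requests never touch the shared store.
local_mitigations = {}

def is_mitigated(client_key, shared_store):
    until = local_mitigations.get(client_key, 0)
    if until > time.time():
        return True                                      # answered from local memory

    value = shared_store.get("mitigate:" + client_key)   # e.g. a memcache GET
    if value and float(value) > time.time():
        local_mitigations[client_key] = float(value)     # cache until the mitigation ends
        return True
    return False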

This last tweak allowed us to efficiently mitigate large L7 attacks without noticeably penalizing legitimate requests.

Conclusion

Despite being a young product, the rate limiter is already being used by many customers to control the rate of requests that their origin servers receive. The rate limiter already handles several billion requests per day and we recently mitigated attacks with as many as 400,000 requests per second to a single domain without degrading service for legitimate users.

We just started to explore how we can efficiently protect our customers with this new tool. We are looking into more advanced optimizations and creating new features on top of the existing work.

Interested in working on high-performance code running on thousands of servers at the edge of the network? Consider applying to one of our open positions!

  1. The inner workings of anycast route changes are outside of the scope of this article, but we can assume that they are rare enough in this case.

  2. Twemproxy also supports Redis, but our existing infrastructure was backed by Twemcache (a Memcached fork)

  3. Memcache does support CAS (Compare-And-Set) operations and so optimistic transactions are possible, but it is hard to use in our case: during attacks, we will have a lot of requests, creating a lot of contention, in turn resulting in a lot of CAS transactions failing.

  4. The counters for the previous and current minute can be retrieved with a single GET command

Categories: Technology
Additional Terms