Blogroll: CloudFlare

I read blogs, as well as write one. The 'blogroll' on this site reproduces some posts from some of the people I enjoy reading. There are currently 17 posts from the blog 'CloudFlare.'

Disclaimer: Reproducing an article here does not necessarily imply agreement or endorsement!


Incident report on memory leak caused by Cloudflare parser bug

Thu, 23/02/2017 - 23:01

Last Friday, Tavis Ormandy from Google’s Project Zero contacted Cloudflare to report a security problem with our edge servers. He was seeing corrupted web pages being returned by some HTTP requests run through Cloudflare.

It turned out that in some unusual circumstances, which I’ll detail below, our edge servers were running past the end of a buffer and returning memory that contained private information such as HTTP cookies, authentication tokens, HTTP POST bodies, and other sensitive data. And some of that data had been cached by search engines.

For the avoidance of doubt, Cloudflare customer SSL private keys were not leaked. Cloudflare has always terminated SSL connections through an isolated instance of NGINX that was not affected by this bug.

We quickly identified the problem and turned off three minor Cloudflare features (email obfuscation, Server-side Excludes and Automatic HTTPS Rewrites) that were all using the same HTML parser chain that was causing the leakage. At that point it was no longer possible for memory to be returned in an HTTP response.

Because of the seriousness of such a bug, a cross-functional team from software engineering, infosec and operations formed in San Francisco and London to fully understand the underlying cause, to understand the effect of the memory leakage, and to work with Google and other search engines to remove any cached HTTP responses.

Having a global team meant that, at 12 hour intervals, work was handed over between offices enabling staff to work on the problem 24 hours a day. The team has worked continuously to ensure that this bug and its consequences are fully dealt with. One of the advantages of being a service is that bugs can go from reported to fixed in minutes to hours instead of months. The industry standard time allowed to deploy a fix for a bug like this is usually three months; we were completely finished globally in under 7 hours with an initial mitigation in 47 minutes.

The bug was serious because the leaked memory could contain private information and because it had been cached by search engines. We have also not discovered any evidence of malicious exploits of the bug or other reports of its existence.

The greatest period of impact was between February 13 and February 18, with around 1 in every 3,300,000 HTTP requests through Cloudflare potentially resulting in memory leakage (that’s about 0.00003% of requests).

We are grateful that it was found by one of the world’s top security research teams and reported to us.

This blog post is rather long but, as is our tradition, we prefer to be open and technically detailed about problems that occur with our service.

Parsing and modifying HTML on the fly

Many of Cloudflare’s services rely on parsing and modifying HTML pages as they pass through our edge servers. For example, we can insert the Google Analytics tag, safely rewrite http:// links to https://, exclude parts of a page from bad bots, obfuscate email addresses, enable AMP, and more by modifying the HTML of a page.

To modify the page, we need to read and parse the HTML to find elements that need changing. Since the very early days of Cloudflare, we’ve used a parser written using Ragel. A single .rl file contains an HTML parser used for all the on-the-fly HTML modifications that Cloudflare performs.

About a year ago we decided that the Ragel parser had become too complex to maintain and we started to write a new parser, named cf-html, to replace it. This streaming parser works correctly with HTML5 and is much, much faster and easier to maintain.

We first used this new parser for the Automatic HTTPS Rewrites feature and have been slowly migrating functionality that uses the old Ragel parser to cf-html.

Both cf-html and the old Ragel parser are implemented as NGINX modules compiled into our NGINX builds. These NGINX filter modules parse buffers (blocks of memory) containing HTML responses, make modifications as necessary, and pass the buffers onto the next filter.
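
For readers unfamiliar with NGINX filter modules, here is a minimal sketch of the body-filter pattern being described (an illustrative skeleton, not Cloudflare's actual module, and it only compiles as part of an NGINX module build):

#include <ngx_config.h>
#include <ngx_core.h>
#include <ngx_http.h>

static ngx_http_output_body_filter_pt ngx_http_next_body_filter;

static ngx_int_t
example_html_body_filter(ngx_http_request_t *r, ngx_chain_t *in)
{
    ngx_chain_t  *cl;
    u_char       *p;

    for (cl = in; cl != NULL; cl = cl->next) {
        /* Each buffer holds one block of the response body between
         * cl->buf->pos and cl->buf->last; an HTML rewriter would scan
         * (and possibly modify) exactly this region. cl->buf->last_buf
         * is set on the final buffer of the response. */
        for (p = cl->buf->pos; p < cl->buf->last; p++) {
            /* inspect or rewrite *p here */
        }
    }

    /* Hand the (possibly modified) chain on to the next filter. */
    return ngx_http_next_body_filter(r, in);
}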

It turned out that the underlying bug that caused the memory leak had been present in our Ragel-based parser for many years but no memory was leaked because of the way the internal NGINX buffers were used. Introducing cf-html subtly changed the buffering which enabled the leakage even though there were no problems in cf-html itself.

Once we knew that the bug was being caused by the activation of cf-html (but before we knew why), we disabled the three features that caused it to be used. Every feature Cloudflare ships has a corresponding feature flag, which we call a ‘global kill’. We activated the Email Obfuscation global kill 47 minutes after receiving details of the problem and the Automatic HTTPS Rewrites global kill 3h05m later. The Email Obfuscation feature had been changed on February 13 and was the primary cause of the leaked memory, so disabling it quickly stopped almost all memory leaks.

Within a few seconds, those features were disabled worldwide. We confirmed we were not seeing memory leakage via test URIs and had Google double check that they saw the same thing.

We then discovered that a third feature, Server-Side Excludes, was also vulnerable and did not have a global kill switch (it was so old it preceded the implementation of global kills). We implemented a global kill for Server-Side Excludes and deployed a patch to our fleet worldwide. From realizing Server-Side Excludes were a problem to deploying a patch took roughly three hours. However, Server-Side Excludes are rarely used and only activated for malicious IP addresses.

Root cause of the bug

The Ragel code is converted into generated C code which is then compiled. The C code uses, in the classic C manner, pointers to the HTML document being parsed, and Ragel itself gives the user a lot of control of the movement of those pointers. The underlying bug occurs because of a pointer error.

/* generated code */ if ( ++p == pe ) goto _test_eof;

The root cause of the bug was that reaching the end of a buffer was checked using the equality operator and a pointer was able to step past the end of the buffer. This is known as a buffer overrun. Had the check been done using >= instead of ==, jumping over the buffer end would have been caught. The equality check is generated automatically by Ragel and was not part of the code that we wrote. This indicated that we were not using Ragel correctly.
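
As a minimal standalone illustration (not the generated parser itself), the difference between the two checks matters once a pointer has already skipped past the end:

#include <stdio.h>

int main(void)
{
    char buf[4] = "abc";
    char *pe = buf + 3;   /* one past the last valid character */
    char *p  = pe + 1;    /* simulates a pointer that has already jumped past pe */

    /* An equality check only fires on exact equality, so once p has
     * stepped over pe it never triggers again; >= still catches it. */
    printf("p == pe: %d\n", p == pe);   /* 0 -- the overrun goes unnoticed */
    printf("p >= pe: %d\n", p >= pe);   /* 1 -- the overrun is caught */
    return 0;
}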

The Ragel code we wrote contained a bug that caused the pointer to jump over the end of the buffer and past the ability of an equality check to spot the buffer overrun.

Here’s a piece of Ragel code used to consume an attribute in an HTML <script> tag. The first line says that it should attempt to find zero or more unquoted_attr_char followed by (that’s the :>> concatenation operator) whitespace, a forward slash, or a > signifying the end of the tag.

script_consume_attr := ((unquoted_attr_char)* :>> (space|'/'|'>'))
  >{ ddctx("script consume_attr"); }
  @{ fhold; fgoto script_tag_parse; }
  $lerr{ dd("script consume_attr failed"); fgoto script_consume_attr; };

If an attribute is well-formed, then the Ragel parser moves to the code inside the @{ } block. If the attribute fails to parse (which is the start of the bug we are discussing today) then the $lerr{ } block is used.

For example, in certain circumstances (detailed below) if the web page ended with a broken HTML tag like this:

<script type=

the $lerr{ } block would get used and the buffer would be overrun. In this case the $lerr does dd("script consume_attr failed"); (that’s a debug logging statement that is a nop in production) and then does fgoto script_consume_attr; (the state transitions to script_consume_attr to parse the next attribute).
From our statistics it appears that such broken tags at the end of the HTML occur on about 0.06% of websites.

If you have a keen eye you may have noticed that the @{ } transition also did a fgoto but right before it did fhold and the $lerr{ } block did not. It’s the missing fhold that resulted in the memory leakage.

Internally, the generated C code has a pointer named p that is pointing to the character being examined in the HTML document. fhold is equivalent to p-- and is essential because when the error condition occurs p will be pointing to the character that caused the script_consume_attr to fail.

And it’s doubly important because if this error condition occurs at the end of the buffer containing the HTML document then p will be after the end of the document (p will be pe + 1 internally) and a subsequent check that the end of the buffer has been reached will fail and p will run outside the buffer.

Adding an fhold to the error handler fixes the problem.
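
Presumably the corrected error action simply mirrors the @{ } block by restoring the pointer before the jump, along these lines:

$lerr{ dd("script consume_attr failed"); fhold; fgoto script_consume_attr; };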

Why now

That explains how the pointer could run past the end of the buffer, but not why the problem suddenly manifested itself. After all, this code had been in production and stable for years.

Returning to the script_consume_attr definition above:

script_consume_attr := ((unquoted_attr_char)* :>> (space|'/'|'>'))
  >{ ddctx("script consume_attr"); }
  @{ fhold; fgoto script_tag_parse; }
  $lerr{ dd("script consume_attr failed"); fgoto script_consume_attr; };

What happens when the parser runs out of characters to parse while consuming an attribute differs depending on whether the buffer currently being parsed is the last buffer or not. If it’s not the last buffer, then there’s no need to use $lerr: the parser doesn’t know whether an error has occurred, since the rest of the attribute may be in the next buffer.

But if this is the last buffer, then the $lerr is executed. Here’s how the code ends up skipping over the end-of-file and running through memory.

The entry point to the parsing function is ngx_http_email_parse_email (the name is historical, it does much more than email parsing).

ngx_int_t ngx_http_email_parse_email(ngx_http_request_t *r, ngx_http_email_ctx_t *ctx) {
  u_char *p = ctx->pos;
  u_char *pe = ctx->buf->last;
  u_char *eof = ctx->buf->last_buf ? pe : NULL;

You can see that p points to the first character in the buffer, pe to the character after the end of the buffer and eof is set to pe if this is the last buffer in the chain (indicated by the last_buf boolean), otherwise it is NULL.

When the old and new parsers are both present during request handling a buffer such as this will be passed to the function above:

(gdb) p *in->buf
$8 = {
  pos = 0x558a2f58be30 "<script type=\"",
  last = 0x558a2f58be3e "",
  [...]
  last_buf = 1,
  [...]
}

Here there is data and last_buf is 1. When the new parser is not present the final buffer that contains data looks like this:

(gdb) p *in->buf
$6 = {
  pos = 0x558a238e94f7 "<script type=\"",
  last = 0x558a238e9504 "",
  [...]
  last_buf = 0,
  [...]
}

A final empty buffer (pos and last both NULL and last_buf = 1) will follow that buffer but ngx_http_email_parse_email is not invoked if the buffer is empty.

So, in the case where only the old parser is present, the final buffer that contains data has last_buf set to 0. That means that eof will be NULL. Now when trying to handle script_consume_attr with an unfinished tag at the end of the buffer the $lerr will not be executed because the parser believes (because of last_buf) that there may be more data coming.

The situation is different when both parsers are present. last_buf is 1, eof is set to pe and the $lerr code runs. Here’s the generated code for it:

/* #line 877 "ngx_http_email_filter_parser.rl" */
{
  dd("script consume_attr failed");
  {goto st1266;}
}
goto st0;
[...]
st1266:
  if ( ++p == pe )
    goto _test_eof1266;

The parser runs out of characters while trying to perform script_consume_attr and p will be pe when that happens. Because there’s no fhold (which would have done p--), when the code jumps to st1266 p is incremented and is now past pe.

It then won’t jump to _test_eof1266 (where EOF checking would have been performed) and will carry on past the end of the buffer trying to parse the HTML document.

So, the bug had been dormant for years until the internal feng shui of the buffers passed between NGINX filter modules changed with the introduction of cf-html.

Going bug hunting

Research by IBM in the 1960s and 1970s showed that bugs tend to cluster in what became known as “error-prone modules”. Since we’d identified a nasty pointer overrun in the code generated by Ragel it was prudent to go hunting for other bugs.

Part of the infosec team started fuzzing the generated code to look for other possible pointer overruns. Another team built test cases from malformed web pages found in the wild. A software engineering team began a manual inspection of the generated code looking for problems.

At that point it was decided to add explicit pointer checks to every pointer access in the generated code to prevent any future problem and to log any errors seen in the wild. The errors generated were fed to our global error logging infrastructure for analysis and trending.

#define SAFE_CHAR ({\
  if (!__builtin_expect(p < pe, 1)) {\
    ngx_log_error(NGX_LOG_CRIT, r->connection->log, 0, "email filter tried to access char past EOF");\
    RESET();\
    output_flat_saved(r, ctx);\
    BUF_STATE(output);\
    return NGX_ERROR;\
  }\
  *p;\
})
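
As a rough, self-contained illustration of the same statement-expression pattern (the function and names here are invented for the example, not Cloudflare's actual call sites):

#include <stdio.h>
#include <stddef.h>

#define CHECKED_CHAR(p, pe)                          \
    ({                                               \
        if (!__builtin_expect((p) < (pe), 1)) {      \
            fprintf(stderr, "access past EOF\n");    \
            return -1;                               \
        }                                            \
        *(p);                                        \
    })

static int first_char(const char *buf, size_t len)
{
    const char *p = buf, *pe = buf + len;
    return CHECKED_CHAR(p, pe);   /* bails out with -1 instead of reading past pe */
}

int main(void)
{
    printf("%d\n", first_char("hi", 2));   /* 104 ('h') */
    printf("%d\n", first_char("", 0));     /* logs the error and returns -1 */
    return 0;
}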

And we began seeing log lines like this:

2017/02/19 13:47:34 [crit] 27558#0: *2 email filter tried to access char past EOF while sending response to client, client: 127.0.0.1, server: localhost, request: "GET /malformed-test.html HTTP/1.1"

Every log line indicates an HTTP request that could have leaked private memory. By logging how often the problem was occurring we hoped to get an estimate of the number of times an HTTP request had leaked memory while the bug was present.

In order for the memory to leak the following had to be true:

The final buffer containing data had to finish with a malformed script or img tag
The buffer had to be less than 4k in length (otherwise NGINX would crash)
The customer had to either have Email Obfuscation enabled (because it uses both the old and new parsers as we transition), or Automatic HTTPS Rewrites/Server-Side Excludes (which use the new parser) in combination with another Cloudflare feature that uses the old parser
And Server-Side Excludes only execute if the client IP has a poor reputation (i.e. it does not work for most visitors)

That explains why the buffer overrun resulting in a leak of memory occurred so infrequently.

Additionally, the Email Obfuscation feature (which uses both parsers and would have enabled the bug to happen on the most Cloudflare sites) was only enabled on February 13 (four days before Tavis’ report).

The three features implicated were rolled out as follows. The earliest date memory could have leaked is 2016-09-22.

2016-09-22 Automatic HTTPS Rewrites enabled
2017-01-30 Server-Side Excludes migrated to new parser
2017-02-13 Email Obfuscation partially migrated to new parser
2017-02-18 Google reports problem to Cloudflare and leak is stopped

The greatest potential impact occurred for four days starting on February 13 because Automatic HTTPS Rewrites wasn’t widely used and Server-Side Excludes only activate for malicious IP addresses.

Internal impact of the bug

Cloudflare runs multiple separate processes on the edge machines and these provide process and memory isolation. The memory being leaked was from a process based on NGINX that does HTTP handling. It has a separate heap from processes doing SSL, image re-compression, and caching, which meant that we were quickly able to determine that SSL private keys belonging to our customers could not have been leaked.

However, the memory space being leaked did still contain sensitive information. One obvious piece of information that had leaked was a private key used to secure connections between Cloudflare machines.

When processing HTTP requests for customers’ web sites our edge machines talk to each other within a rack, within a data center, and between data centers for logging, caching, and to retrieve web pages from origin web servers.

In response to heightened concerns about surveillance activities against Internet companies, we decided in 2013 to encrypt all connections between Cloudflare machines to prevent such an attack even if the machines were sitting in the same rack.

The private key that leaked was the one used for this machine-to-machine encryption. A small number of secrets used internally at Cloudflare for authentication were also present.

External impact and cache clearing

More concerning was the fact that chunks of in-flight HTTP requests for Cloudflare customers were present in the dumped memory. That meant that information that should have been private could be disclosed.

This included HTTP headers, chunks of POST data (perhaps containing passwords), JSON for API calls, URI parameters, cookies and other sensitive information used for authentication (such as API keys and OAuth tokens).

Because Cloudflare operates a large, shared infrastructure, an HTTP request to a Cloudflare web site that was vulnerable to this problem could reveal information about an unrelated Cloudflare site.

An additional problem was that Google (and other search engines) had cached some of the leaked memory through their normal crawling and caching processes. We wanted to ensure that this memory was scrubbed from search engine caches before the public disclosure of the problem so that third parties would not be able to go hunting for sensitive information.

Our natural inclination was to get news of the bug out as quickly as possible, but we felt we had a duty of care to ensure that search engine caches were scrubbed before a public announcement.

The infosec team worked to identify URIs in search engine caches that had leaked memory and get them purged. With the help of Google, Yahoo, Bing and others, we found 770 unique URIs that had been cached and which contained leaked memory. Those 770 unique URIs covered 161 unique domains. The leaked memory has been purged with the help of the search engines.

We also undertook other search expeditions looking for potentially leaked information on sites like Pastebin and did not find anything.

Some lessons

The engineers working on the new HTML parser had been so worried about bugs affecting our service that they had spent hours verifying that it did not contain security problems.

Unfortunately, it was the ancient piece of software that contained a latent security problem and that problem only showed up as we were in the process of migrating away from it. Our internal infosec team is now undertaking a project to fuzz older software looking for other potential security problems.

Detailed Timeline

We are very grateful to our colleagues at Google for contacting us about the problem and working closely with us through its resolution, all of which occurred without any reports that outside parties had identified the issue or exploited it.

All times are UTC.

2017-02-18 0011 Tweet from Tavis Ormandy asking for Cloudflare contact information
2017-02-18 0032 Cloudflare receives details of bug from Google
2017-02-18 0040 Cross functional team assembles in San Francisco
2017-02-18 0119 Email Obfuscation disabled worldwide
2017-02-18 0122 London team joins
2017-02-18 0424 Automatic HTTPS Rewrites disabled worldwide
2017-02-18 0722 Patch implementing kill switch for cf-html parser deployed worldwide

2017-02-20 2159 SAFE_CHAR fix deployed globally

2017-02-21 1803 Automatic HTTPS Rewrites, Server-Side Excludes and Email Obfuscation re-enabled worldwide

NOTE: This post was updated to reflect new information.

Categories: Technology

LuaJIT Hacking: Getting next() out of the NYI list

Tue, 21/02/2017 - 13:40

At Cloudflare we’re heavy users of LuaJIT and in the past have sponsored many improvements to its performance.

LuaJIT is a powerful piece of software, maybe the highest performing JIT in the industry. But it’s not always easy to get the most out of it, and sometimes a small change in one part of your code can negatively impact other, already optimized, parts.

One of the first pieces of advice anyone receives when writing Lua code to run quickly using LuaJIT is “avoid the NYIs”: the language or library features that can’t be compiled because they’re NYI (not yet implemented). And that means they run in the interpreter.


Another very attractive feature of LuaJIT is the FFI library, which allows Lua code to directly interface with C code and memory structures. The JIT compiler weaves these memory operations in line with the generated machine language, making it much more efficient than using the traditional Lua C API.

Unfortunately, if for any reason the Lua code using the FFI library has to run under the interpreter, it takes a very heavy performance hit. As it happens, under the interpreter the FFI is usually much slower than the Lua C API or the basic operations. For many people, this means either avoiding the FFI or committing to permanent vigilance to keep the code from falling back to the interpreter.

Optimizing LuaJIT Code

Before optimizing any code, it’s important to identify which parts are actually important. It’s useless to discuss what’s the fastest way to add a few numbers before sending some data, if the send operation will take a million times longer than that addition. Likewise, there’s no benefit avoiding NYI features in code like initialization routines that might run only a few times, as it’s unlikely that the JIT would even try to optimize them, so they would always run in the interpreter. Which, by the way, is also very fast; even faster than the first version of LuaJIT itself.

But optimizing the core parts of a Lua program, like any deep inner loops, can yield huge improvements in the overall performance. In similar situations, experienced developers using other languages are used to inspecting the assembly language generated by the compiler, to see if there’s some change to the source code that can make the result better.

The command line LuaJIT executable provides a bytecode list when running with the -jbc option, a statistical profiler, activated with the -jp option, a trace list with -jv, and finally a detailed dump of all the JIT operations with -jdump.

The last two provide lots of information very useful for understanding what actually happens with the Lua code while executing, but it can be a lot of work to read the huge lists generated by -jdump. Also, some messages are hard to understand without a fairly complete understanding of how the tracing compiler in LuaJIT actually works.

One very nice feature is that all these JIT options are implemented in Lua. To accomplish this the JIT provides ‘hooks’ that can execute a Lua function at important moments with the relevant information. Sometimes the best way to understand what some -jdump output actually means is to read the code that generated that specific part of the output.


Introducing Loom

After several rounds there, and being frustrated by the limitations of the sequentially-generated dump, I decided to write a different version of -jdump, one that gathered more information to process and add cross-references to help see how things are related before displaying. The result is loom, which shows roughly the same information as -jdump, but with more resolved references and formatted in HTML with tables, columns, links and colors. It has helped me a lot to understand my own code and the workings of LuaJIT itself.

For example, let's consider the following code in a file called twoloops.lua:

for i=1,1000 do
  for j=1,1000 do
  end
end

With the -jv option:

$ luajit -jv twoloops.lua
[TRACE 1 twoloops.lua:2 loop]
[TRACE 2 (1/3) twoloops.lua:1 -> 1]

This tells us that there were two traces, the first one contains a loop, and the second one spawns from exit #3 of the other (the “(1/3)” part) and its endpoint returns to the start of trace #1.

Ok, let’s get more detail with -jdump:

$ luajit -jdump twoloops.lua
---- TRACE 1 start twoloops.lua:2
0009 FORL 4 => 0009
---- TRACE 1 IR
0001 int SLOAD #5 CI
0002 + int ADD 0001 +1
0003 > int LE 0002 +1000
0004 ------ LOOP ------------
0005 + int ADD 0002 +1
0006 > int LE 0005 +1000
0007 int PHI 0002 0005
---- TRACE 1 mcode 47
0bcbffd1 mov dword [0x40db1410], 0x1
0bcbffdc cvttsd2si ebp, [rdx+0x20]
0bcbffe1 add ebp, +0x01
0bcbffe4 cmp ebp, 0x3e8
0bcbffea jg 0x0bcb0014 ->1
->LOOP:
0bcbfff0 add ebp, +0x01
0bcbfff3 cmp ebp, 0x3e8
0bcbfff9 jle 0x0bcbfff0 ->LOOP
0bcbfffb jmp 0x0bcb001c ->3
---- TRACE 1 stop -> loop

---- TRACE 2 start 1/3 twoloops.lua:1
0010 FORL 0 => 0005
0005 KSHORT 4 1
0006 KSHORT 5 1000
0007 KSHORT 6 1
0008 JFORI 4 => 0010
---- TRACE 2 IR
0001 num SLOAD #1 I
0002 num ADD 0001 +1
0003 > num LE 0002 +1000
---- TRACE 2 mcode 81
0bcbff79 mov dword [0x40db1410], 0x2
0bcbff84 movsd xmm6, [0x41704068]
0bcbff8d movsd xmm5, [0x41704078]
0bcbff96 movsd xmm7, [rdx]
0bcbff9a addsd xmm7, xmm6
0bcbff9e ucomisd xmm5, xmm7
0bcbffa2 jb 0x0bcb0014 ->1
0bcbffa8 movsd [rdx+0x38], xmm6
0bcbffad movsd [rdx+0x30], xmm6
0bcbffb2 movsd [rdx+0x28], xmm5
0bcbffb7 movsd [rdx+0x20], xmm6
0bcbffbc movsd [rdx+0x18], xmm7
0bcbffc1 movsd [rdx], xmm7
0bcbffc5 jmp 0x0bcbffd1
---- TRACE 2 stop -> 1

This tells us... well, a lot of things. If you look closely, you’ll see the same two traces, one is a loop, the second starts at 1/3 and returns to trace #1. Each one shows some bytecode instructions, an IR listing, and the final mcode. There are several options to turn on and off each listing, and more info like the registers allocated to some IR instructions, the “snapshot” structures that allow the interpreter to continue when a compiled trace exits, etc.

Now using loom:

There’s the source code, with the corresponding bytecodes, and the same two traces, with IR and mcode listings. The bytecode lines on the traces and on the top listings are linked; hovering on some arguments on the IR listing highlights the source and use of each value; the jumps between traces are correctly labeled (and colored); and finally, clicking on the bytecode or IR column headers reveals more information: excerpts from the source code and snapshot formats, respectively.

Writing it was a great learning experience, I had to read the dump script’s Lua sources and went much deeper in the LuaJIT sources than ever before. And then, I was able to use loom not only to analyze and optimize Cloudflare’s Lua code, but also to watch the steps the compiler goes through to make it run fast, and also what happens when it’s not happy.

The code is the code is the code is the code

LuaJIT handles up to four different representations of a program’s code:

First comes the source code, what the developer writes.

The parser analyzes the source code and produces the Bytecode, which is what the interpreter actually executes. It has the same flow as the source code, grouped in functions, with all the calls, iterators, operations, etc. Of course, there’s no nice formatting or comments, the local variable names are replaced by indices, and all constants (other than small numbers) are stored in a separate area.

When the interpreter finds that a given point of the bytecode has been repeated several times, it’s considered a “hot” part of the code, and the interpreter runs it once again, but this time it records each bytecode it encounters, generating a “code trace” or just “a trace”. At the same time, it generates an “intermediate representation”, or IR, of the code as it’s executed. The IR doesn’t represent the whole of the function or code portion, just the path it actually takes.

A trace is finished when it hits a loop or a recursion, returns to a lower level than when started, hits a NYI operation, or simply becomes too long. At this point, it can be either compiled into machine language, or aborted if it has reached some code that can’t be correctly translated. If successful, the bytecode is patched with an entry to the machine code, or “mcode”. If aborted, the initial trace point is “penalized” or even “blacklisted” to avoid wasting time trying to compile it again.

What’s next()?

One of the most visible characteristics of the Lua language is the heavy use of dictionary objects called tables. From the Lua manual:

“Tables are the sole data structuring mechanism in Lua; they can be used to represent ordinary arrays, symbol tables, sets, records, graphs, trees, etc.”

To iterate over all the elements in a table, the idiomatic way is to use the standard library function pairs() like this:

for k, v in pairs(t) do
  -- use the key in 'k' and the value in 'v'
end

In the standard Lua manual, pairs() is defined as “Returns three values: the next function, the table t, and nil”, so the previous code is the same as:

for k, v in next, t, nil do
  -- use the key in 'k' and the value in 'v'
end

But unfortunately, both the next() and pairs() functions are listed as “not compiled” in the feared NYI list. That means that any such code runs on the interpreter and is not compiled, unless the code inside is complex enough, and has other inner loops (loops that don’t use next() or pairs(), of course). Even in that case, the code would have to fall back to the interpreter at each loop end.

This sad news creates a tradeoff: for performance-sensitive parts of the code, don’t use the most Lua-like code style. That motivates people to come up with several contortions to be able to use numerical iteration (which is compiled, and very efficient), like replacing any key with a number, storing all the keys in a numbered array, or storing both keys and values at even/odd numeric indices.

Getting next() out of the NYI list

So, I finally have a non-NYI next() function! I'd like to say "a fully JITtable next() function", but it wouldn't be totally true; as it happens, there's no way to avoid some annoying trace exits on table iteration.

The purpose of the IR is to provide a representation of the execution path so it can be quickly optimized to generate the final mcode. For that, the IR traces are linear and type-specific, which creates some interesting challenges for iteration on a generic container.

Traces are linear

Being linear means that each trace captures a single execution path, it can't contain conditional code or internal jumps. The only conditional branches are the "guards" that make sure that the code to be executed is the appropriate one. If a condition changes and it must now do something different, the trace must be exited. If it happens several times, it will spawn a side trace and the exit will be patched into a conditional branch. Very nice, but this still means that there can be at most one loop on each trace.

The implementation of next() has to internally skip over empty slots in the table to only return valid key/value pairs. If we try to express this in IR code, this would be the "inner" loop and the original loop would be an "outer" one, which doesn't have as much optimization opportunities. In particular, it can't hoist invariable code out of the loop.

The solution is to do that slot skipping in C. Not using the Lua C API, of course, but the inner IR CALL instruction that is compiled into a "fast" call, using CPU registers for arguments as much as possible.

The IR is in Type-specific SSA form

The SSA form (Static Single Assignment) is key for many data flow analysis heuristics that allow quick optimizations like dead code removal, allocation sinking, type narrowing, strength reduction, etc. In LuaJIT's IR it means every instruction is usable as a value for subsequent instructions and has a declared type, fixed at the moment when the trace recorder emits this particular IR instruction. In addition, every instruction can be a type guard, if the arguments are not of the expected type the trace will be exited.

Lua is dynamically typed, every value is tagged with type information so the bytecode interpreter can apply the correct operations on it. This allows us to have variables and tables that can contain and pass around any kind of object without changing the source code. Of course, this requires the interpreter to be coded very "defensively", to consider all valid ramifications of every instruction, limiting the possibility of optimizations. The IR traces, on the other hand, are optimized for a single variation of the code, and deal with only the value types that are actually observed while executing.

For example, this simple code creates a 1,000 element array and then copies it to another table:

local t,t2 = {},{}
for i=1,1000 do t[i] = i end
for i,v in ipairs(t) do t2[i]=v end

resulting in this IR for the second loop, the one that does the copy:

0023 ------------ LOOP ------------
0024      num CONV   0017  num.int
0025 >    int ABC    0005  0017
0026      p32 AREF   0007  0017
0027      num ASTORE 0026  0022
0028 rbp + int ADD   0017  +1
0029 >    int ABC    0018  0028
0030      p32 AREF   0020  0028
0031 xmm7 >+ num ALOAD 0030
0032 xmm7 num PHI    0022  0031
0033 rbp  int PHI    0017  0028
0034 rbx  nil RENAME 0017  #3
0035 xmm6 nil RENAME 0022  #2

Here we see that the ALOAD in instruction 0031 ensures that the value loaded from the table is indeed a number. If it happens to be any other value, the guard fails and the trace is exited.

But what if we build an array of strings instead of numbers?

A small change:

local t,t2 = {},{}
for i=1,1000 do t[i] = 's'..i end
for i,v in ipairs(t) do t2[i]=v end

gives us this:

0024 ------------ LOOP ------------
0025      num CONV   0018  num.int
0026 >    int ABC    0005  0018
0027      p32 AREF   0007  0018
0028      str ASTORE 0027  0023
0029 rbp + int ADD   0018  +1
0030 >    int ABC    0019  0029
0031      p32 AREF   0021  0029
0032 rbx >+ str ALOAD 0031
0033 rbx  str PHI    0023  0032
0034 rbp  int PHI    0018  0029
0035 r15  nil RENAME 0018  #3
0036 r14  nil RENAME 0023  #2

It's the same code, but the type that ALOAD is guarding is now a string (and it now uses a different register, I guess a vector register isn't appropriate for a string pointer).

And if the table has values of a mix of types?

local t,t2={},{}
for i=1,1000,2 do t[i], t[i+1] = i, 's'..i end
for i,v in ipairs(t) do t2[i]=v end

0031 ------------ LOOP ------------
0032      num CONV   0027  num.int
0033 >    int ABC    0005  0027
0034      p32 AREF   0007  0027
0035      str ASTORE 0034  0030
0036 r15  int ADD    0027  +1
0037 >    int ABC    0019  0036
0038      p32 AREF   0021  0036
0039 xmm7 > num ALOAD 0038
0040 >    int ABC    0005  0036
0041      p32 AREF   0007  0036
0042      num ASTORE 0041  0039
0043 rbp + int ADD   0027  +2
0044 >    int ABC    0019  0043
0045      p32 AREF   0021  0043
0046 rbx >+ str ALOAD 0045
0047 rbx  str PHI    0030  0046
0048 rbp  int PHI    0027  0043

Now there are two ALOADs, (and two ASTOREs), one for 'num' and one for 'str'. In other words, the JIT unrolled the loop and found that that made the types constant. =8-O

Of course, this would happen only on very simple and regular patterns. In general, it's wiser to avoid unpredictable type mixing; but polymorphic code will be optimized for each type that it's actually used with.

Back to next()

First let's see the current implementation of next() as used by the interpreter:

lj_tab.c

/* Advance to the next step in a table traversal. */
int lj_tab_next(lua_State *L, GCtab *t, TValue *key)
{
  uint32_t i = keyindex(L, t, key);  /* Find predecessor key index. */
  for (i++; i < t->asize; i++)  /* First traverse the array keys. */
    if (!tvisnil(arrayslot(t, i))) {
      setintV(key, i);
      copyTV(L, key+1, arrayslot(t, i));
      return 1;
    }
  for (i -= t->asize; i <= t->hmask; i++) {  /* Then traverse the hash keys. */
    Node *n = &noderef(t->node)[i];
    if (!tvisnil(&n->val)) {
      copyTV(L, key, &n->key);
      copyTV(L, key+1, &n->val);
      return 1;
    }
  }
  return 0;  /* End of traversal. */
}

It takes the input key as a TValue pointer and calls keyindex(). This helper function searches for the key in the table and returns an index; if the key is an integer in the range of the array part, the index is the key itself. If not, it performs a hash query and returns the index of the Node, offset by the array size, if successful, or signals an error if not found (it's an error to give a nonexistent key to next()).

Back at lj_tab_next(), the index is first incremented, and if it's still within the array, it's iterated over any hole until a non-nil value is found. If it wasn't in the array (or there’s no next value there), it performs a similar "skip the nils" on the Node table.

The new lj_record_next() function in lj_record.c, like some other record functions there, first checks not only the input parameters, but also the return values to generate the most appropriate code for this specific iteration, assuming that it will likely be optimal for subsequent iterations. Of course, any such assumption must be backed by the appropriate guard.

For next(), we choose between two different forms: if the return key is in the array part, then it uses lj_tab_nexta(), which takes the input key as an integer and returns the next key, also as an integer, in the rax register. We don't do the equivalent to the keyindex() function, just check (with a guard) that the key is within the bounds of the array:

lj_tab.c

/* Get the next array index */
MSize LJ_FASTCALL lj_tab_nexta(GCtab *t, MSize k)
{
  for (k++; k < t->asize; k++)
    if (!tvisnil(arrayslot(t, k)))
      break;
  return k;
}

The IR code looks like this:

0014 r13  int FLOAD  0011  tab.asize
0015 rsi > int CONV  0012  int.num
0017 rax + int CALLL lj_tab_nexta  (0011 0015)
0018 >    int ABC    0014  0017
0019 r12  p32 FLOAD  0011  tab.array
0020      p32 AREF   0019  0017
0021 [8] >+ num ALOAD 0020

Clearly, the CALL itself (at 0017) is typed as 'int', as natural for an array key; and the ALOAD (0021) is 'num', because that's what the first few values happened to be.

When we finish with the array part, the bounds check (instruction ABC on 0018) would fail and soon new IR would be generated. This time we use the lj_tab_nexth() function.

lj_tab.c

LJ_FUNCA const Node *LJ_FASTCALL lj_tab_nexth(lua_State *L, GCtab *t, const Node *n)
{
  const Node *nodeend = noderef(t->node)+t->hmask;
  for (n++; n <= nodeend; n++) {
    if (!tvisnil(&n->val)) {
      return n;
    }
  }
  return &G(L)->nilnode;
}

But before doing the "skip the nils", we need to do a hash query to find the initial Node entry. Fortunately, the HREF IR instruction does exactly that. This is the IR:

0014 rdx  p32 HREF   0011  0012
0016 r12  p32 CALLL  lj_tab_nexth  (0011 0014)
0017 rax >+ str HKLOAD 0016
0018 [8] >+ num HLOAD  0016

There's a funny thing here: HREF is supposed to return a reference to a value in the hash table, and the last argument in lj_tab_nexth() is a Node pointer. Let's see the Node definition:

lj_obj.h

/* Hash node. */
typedef struct Node {
  TValue val;   /* Value object. Must be first field. */
  TValue key;   /* Key object. */
  MRef next;    /* Hash chain. */
#if !LJ_GC64
  MRef freetop; /* Top of free elements (stored in t->node[0]). */
#endif
} Node;

Ok... the value is the first field, and it says right there "Must be first field". Looks like it's not the first place with some hand-wavy pointer casts.

The return value of lj_tab_nexth() is a Node pointer, which can likewise be implicitly cast by HLOAD to get the value. To get the key, I added the HKLOAD instruction. Both are guarding for the expected types of the value and key, respectively.

Let's take it for a spin

So, how does it perform? These tests do a thousand loops over a 10,000 element table, first using next() and then pairs(), with a simple addition in the inner loop. To get pairs() compiled, I just disabled the ISNEXT/ITERN optimization, so it actually uses next(). In the third test the variable in the addition is initialized to 0ULL instead of just 0, triggering the use of FFI.

The first test is with all 10,000 elements at sequential integer keys, making the table a valid sequence, so ipairs() (which is already compiled) can be used just as well:

So, compiled next() is quite a lot faster, but the pairs() optimization in the interpreter is very fast. On the other hand, the smallest smell of FFI completely trashes interpreter performance, while making compiled code slightly tighter. Finally, ipairs() is faster, but a big part of it is because it stops on the first nil, while next() has to skip over every nil at the end of the array, which by default can be up to twice as big as the sequence itself.

Now with 5,000 (sequential) integer keys and 5,000 string keys. Of course, we can't use ipairs() here:

Roughly the same pattern: the compiled next() performance is very much the same on the three forms (used directly, under pairs() and with FFI code), while the interpreter benefits from the pairs() optimization and almost dies with FFI. In this case, the interpreted pairs() actually surpasses the compiled next() performance, hinting that separately optimizing pairs() is still desirable.

A big factor in the interpreter pairs() is that it doesn't use next(); instead it directly drives the loop with a hidden variable to iterate in the Node table without having to perform a hash lookup on every step.

Repeating that in a compiled pairs() would be equally beneficial, but it has to be done carefully to maintain compatibility with the interpreter. On any trace exit the interpreter would kick in and must be able to seamlessly continue iterating. For that, the rest of the system has to be aware of that hidden variable.

The best part of this is that we have lots of very challenging, yet deeply rewarding, work ahead of us! Come work for us on making LuaJIT faster and more.

Categories: Technology

You can now use Google Authenticator and any TOTP app for Two-Factor Authentication

Thu, 16/02/2017 - 21:52

Since the very beginning, Cloudflare has offered two-factor authentication with Authy, and starting today we are expanding your options to keep your account safe with Google Authenticator and any Time-based One Time Password (TOTP) app of your choice.

If you want to get started right away, visit your account settings. Setting up Two-Factor with Google Authenticator or with any TOTP app is easy - just use the app to scan the barcode you see in the Cloudflare dashboard, enter the code the app returns, and you’re good to go.

Importance of Two-Factor Authentication

Often when you hear that an account was ‘hacked’, it really means that the password was stolen.

If the media stopped saying 'hacking' and instead said 'figured out their password', people would take password security more seriously.

— Khalil Sehnaoui (@sehnaoui) January 5, 2017

Two-Factor authentication is sometimes thought of as something that should be used to protect important accounts, but the best practice is to always enable it when it is available. Without a second factor, any mishap involving your password can lead to a compromise. Journalist Mat Honan’s high profile compromise in 2012 is a great example of the importance of two-factor authentication. When he later wrote about the incident he said, "Had I used two-factor authentication for my Google account, it’s possible that none of this would have happened."

What is a TOTP app?

TOTP (Time-based One Time Password) is the mechanism that Google Authenticator, Authy and other two-factor authentication apps use to generate short-lived authentication codes. We’ve written previously on the blog about how TOTP works.
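
As a refresher, here is a minimal sketch of the RFC 6238 computation using OpenSSL's HMAC (the hard-coded secret, 30-second step and 6 digits are illustrative only; real authenticator apps Base32-decode the secret embedded in the QR code):

#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <openssl/evp.h>
#include <openssl/hmac.h>

static uint32_t totp(const unsigned char *secret, size_t secret_len,
                     time_t now, unsigned step, unsigned digits)
{
    uint64_t counter = (uint64_t)now / step;   /* T = floor(unix_time / step) */
    unsigned char msg[8];
    for (int i = 7; i >= 0; i--) {             /* counter as 8 big-endian bytes */
        msg[i] = counter & 0xff;
        counter >>= 8;
    }

    unsigned char mac[EVP_MAX_MD_SIZE];
    unsigned int mac_len = 0;
    HMAC(EVP_sha1(), secret, (int)secret_len, msg, sizeof(msg), mac, &mac_len);

    unsigned off = mac[mac_len - 1] & 0x0f;    /* dynamic truncation (RFC 4226) */
    uint32_t bin = ((uint32_t)(mac[off] & 0x7f) << 24) |
                   ((uint32_t)mac[off + 1] << 16) |
                   ((uint32_t)mac[off + 2] << 8)  |
                    (uint32_t)mac[off + 3];

    uint32_t mod = 1;
    for (unsigned i = 0; i < digits; i++) mod *= 10;
    return bin % mod;                          /* the short-lived code the app displays */
}

int main(void)
{
    const unsigned char secret[] = "12345678901234567890";  /* RFC 6238 test key */
    printf("%06u\n", totp(secret, sizeof(secret) - 1, time(NULL), 30, 6));
    return 0;
}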

We didn’t want to limit you to only using two-factor providers that we'd built integrations with, so we built an open TOTP integration in the Cloudflare dashboard, allowing you to set up two-factor with any app that implements TOTP. That means you can choose from a wide array of apps for logging into Cloudflare securely with two-factor such as Symantec, Duo Mobile and 1Password.

Get Started

If you want to enable Two-Factor Authentication with Google Authenticator or any other TOTP provider, visit your account settings here. It’s easy to set up and the best way to secure your account. We also have step by step instructions for you in our knowledge base.

Categories: Technology

Discovering Great Talent with Path Forward

Wed, 15/02/2017 - 19:20

Cloudflare's Path Forward Candidates with Janet

In the fall of 2016, I was just beginning my job search. I’d been lucky to lead HR at a number of great cutting-edge technology start-ups, and I was looking for my next adventure. I wanted to find a company that wasn’t just a great business--I wanted one that was also making a positive impact on the world, and one that had a mission I felt passionately about.

During my two decades running HR/People organizations, I’ve spent a lot of time working with--and talking to--parents in the workplace. I’ve been motivated to do so for a few reasons. According to the US census, mothers are the fastest-growing segment of the US workforce. Companies struggle to retain talented workers after they’ve become parents, especially mothers. It’s been reported that 43 percent of highly qualified women with children leave their careers. Millennials (who make up the majority of the US workforce) are reporting that they want to be more engaged parents and are placing a high value on companies that allow them to parent and still get promoted. Ultimately, I’ve come to believe that the skills you acquire while parenting are extremely relevant and valuable to the workforce.

So when Path Forward announced its launch partners in 2016, I read about the participating companies with great interest. And that is where I discovered Cloudflare! It immediately went to the top of my short list as I knew a company that valued a partnership with Path Forward was aligned with my values.

Path Forward is a nonprofit organization that aims to empower women and men to return to the paid workforce after taking two or more years away from their career to focus on caregiving. This could mean taking two years off to care for a child, or taking multiple years off to care for an elderly family member. Everyone in this program has put their careers on hold to care for the ones they love.

Candidates apply for various roles, undergo a series of interviews, and if selected, participate in an 18-week returnship. The goal, for both candidates and participating companies, is to ultimately hire the candidates for full-time employment.

At Cloudflare, we’re focused on helping to build a better Internet, and to do that, we need the best and brightest. Sometimes that means hiring people who have tenure and skills in a very specific field, and other times, that means bringing in people who can adapt quickly and think critically, contributing to both our culture and our company mission.

Path Forward & Cloudflare

Gloria Mancu re-entered the workforce as a tech support engineer at Cloudflare after taking 10 years off to care for her son. She had initially applied to Cloudflare after seeing a job opening online, and was later integrated into the Path Forward program because of its emphasis on returnships. “Being an intern is a tremendous opportunity, because you get a feeling for the group, the company, and the culture, firsthand. On the other hand, the employer gets to know you, so it goes both ways.”

The Path Forward program is indeed a two-way street. Yes, it helps people return to the workforce, but participants also bring a ton of value to their respective companies. Men and women who’ve taken time off to care for their families bring the kind of maturity and professionalism that only come with life experience.

Wanda Chiu, a software engineer on our Edge team, took time off initially to care for her ailing mother. She later decided to start a family and wanted to be there to watch her kids grow up. Fifteen years later, she says that Path Forward has helped her comfortably transition back into the workforce. “I wasn’t sure I was qualified to apply for software engineering positions, because the industry has adapted so much in the last 15 years and there are so many new tools,” she says. “Cloudflare was willing to give me the time to pick up the new skills I needed to succeed in this software engineering role and contribute to the team.”

It’s so crucial to note that a lot of people returning to the workforce think they have to start at square one. They’ll apply for entry-level positions, only to be bumped up to the next level based on their experience and age, and then finally rejected due to their employment gap. At Cloudflare, we really wanted to give people the opportunity to pick up where they left off and bring with them all of the life experience they’ve gained.

On Monday night, we hosted the Path Forward graduation at our headquarters in San Francisco and celebrated the work of the 10 participants from Demandbase, Coursera, and Zendesk, in addition to Cloudflare. Graduates snacked on hors d’oeuvres and discussed their returnship experiences, following a keynote from Tami Forman, the program’s executive director.

We’ve extended full-time offers to both Wanda and Gloria and look forward to continuing with the Path Forward program. We’re currently interviewing and plan to welcome a new group of Path Forward participants in April. We have five open returnship positions across our Marketing, Engineering, and People teams in San Francisco, so if you or someone you know is interested, please reach out to Ed Burns at ed@cloudflare.com.

Categories: Technology

NCC Group's Cryptography Services audits our Go TLS 1.3 stack

Wed, 15/02/2017 - 00:49

The Cloudflare TLS 1.3 beta is run by a Go implementation of the protocol based on the Go standard library, crypto/tls. Starting from that excellent Go codebase allowed us to quickly start experimenting, to be the first wide server deployment of the protocol, and to effectively track the changes to the specification draft.

Of course, the security of a TLS implementation is critical, so we engaged NCC Group's Cryptography Services to perform an audit at the end of 2016.

You can find the codebase on the Cloudflare GitHub. It's a drop-in replacement for crypto/tls and comes with a go wrapper to patch the standard library as needed.

The code is developed in the open but is currently targeted only at internal use: the repository is frequently rebased and the API is not guaranteed to be stable or fully documented. You can take a sneak peek at the API here.

The final goal is to upstream the patches to the Go project so that all users of the Go standard library benefit from it. You can follow the process here.

Below we republish the article about the audit, which first appeared on the NCC Group's blog.

NCC Group's Cryptography Services Complete an Audit of Cloudflare's TLS1.3

NCC Group's Cryptography Services practice recently completed a two-week audit of Cloudflare's TLS 1.3 implementation. The audit took place between November 11, 2016 and December 9, 2016.

The TLS standard was last updated almost ten years ago and this version brings new features and a simplified handshake to the protocol. Many old cryptographic algorithms have been replaced with more modern ones, key exchanges have forward secrecy by default, the handshake phase will be faster, certificates will be able to enjoy security-proven signature schemes, MAC-then-Encrypt constructions are out—the weakest features of older TLS versions have been updated or removed.

Cryptography Services analyzed Cloudflare's TLS 1.3 implementation for protocol-level flaws and for deviations from the draft specification. The team found a small number of issues during the review—all of which were promptly fixed—and was pleased with the quality of the code.

Cloudflare built their implementation of TLS 1.3 on the Go programming language's standard TLS library, making use of the existing base to correctly and safely parse TLS packets. While building on top of older versions can be challenging, Cloudflare has added TLS 1.3 code in a safe and segregated way, with new defenses against downgrade attacks being added in the final implementation of the specification. This permits support for older versions of TLS while being free from unexpected conflicts or downgrades.

Using Go and its standard libraries enables Cloudflare to avoid common implementation issues stemming from vulnerable strcpy and memcpy operations, pointer arithmetic and manual memory management while providing a best-in-class crypto API.

Cloudflare implemented a conservative subset of the TLS 1.3 specification. State-of-the-art algorithms, such as Curve25519, are given priority over legacy algorithms. Session resumption is limited to the forward secure option. Cloudflare's implementation also considers efficiency, using AES-GCM if it detects accelerated hardware support and the faster-in-software Chacha20-Poly1305 in its absence.

There is still work to be done before TLS 1.3 enjoys large scale adoption. Cloudflare is paving the way with its reliable server implementation of TLS 1.3, and Firefox and Chrome's client implementations make end-to-end testing of the draft specification possible. NCC Group applauds the work of the IETF and these early implementers.

Written by: Scott Stender

Categories: Technology

Want to see your DNS analytics? We have a Grafana plugin for that

Tue, 14/02/2017 - 18:04

Curious where your DNS traffic is coming from, how much DNS traffic is on your domain, and what records people are querying for that don’t exist? We now have a Grafana plugin for you.

Grafana is an open source data visualization tool that you can use to integrate data from many sources into one cohesive dashboard, and even use it to set up alerts. We’re big Grafana fans here - we use Grafana internally for our ops metrics dashboards.

In the Cloudflare Grafana plugin, you can see the response code breakdown of your DNS traffic. During a random prefix flood, a common type of DNS DDoS attack where an attacker queries random subdomains to bypass DNS caches and overwhelm the origin nameservers, you will see the number of NXDOMAIN responses increase dramatically. It is also common during normal traffic to have a small amount of negative answers due to typos or clients searching for missing records.

You can also see the breakdown of queries by data center and by query type to understand where your traffic is coming from and what your domains are being queried for. This is very useful to identify localized issues, and to see how your traffic is spread globally.

You can filter by specific data centers, record types, query types, response codes, and query name, so you can filter down to see analytics for just the MX records that are returning errors in one of the data centers, or understand whether the negative answers are generated because of a DNS attack, or misconfigured records.

Once you have the Cloudflare Grafana Plugin installed, you can also make your own charts using the Cloudflare data set in Grafana, and integrate them into your existing dashboards.

Virtual DNS customers can also take advantage of the Grafana plugin. There is a custom Grafana dashboard that comes installed with the plugin to show traffic distribution and RTT from different Virtual DNS origins, as well as the top queries that are uncached or are returning SERVFAIL.

The Grafana plugin takes three steps to install once you have Grafana up and running - cd into the plugins folder, download the plugin, and restart Grafana. Instructions are here. Once you sign in using your user email and API key, the plugin will automatically discover zones and Virtual DNS clusters you have access to.

The Grafana plugin is built on our new DNS analytics API. If you want to explore your DNS traffic but Grafana isn’t your tool of choice, our DNS analytics API is very easy to get started with. Here’s a curl to get you started:

curl -s -H 'X-Auth-Key:####' -H 'X-Auth-Email:####' 'https://api.cloudflare.com/client/v4/zones/####/dns_analytics/report?metrics=queryCount'

To make all of this work, Cloudflare DNS is answering and logging millions of queries each second. Having high resolution data at this scale enables us to quickly pinpoint and resolve problems, and we’re excited to share this with you. More on this in a follow up deep dive blog post on improvements in our new data pipeline.

Instructions for how to get started with Grafana are here and DNS analytics API documentation is here. Enjoy!

Categories: Technology

Cloudflare Crypto Meetup #5: February 28, 2017

Tue, 07/02/2017 - 19:31

Come join us at Cloudflare HQ in San Francisco on Tuesday, February 28, 2017 for another cryptography meetup. We had such a great time at the last one that we decided to host another. It's becoming a pattern.

We’ll start the evening at 6:00 p.m. with time for networking, followed by short talks from leading experts starting at 6:30 p.m. Pizza and beer are provided! RSVP here.

Here are the confirmed speakers:

Deirdre Connolly

Deirdre is a senior software engineer at Brightcove, where she is trying to secure old and new web applications. Her interests include applied cryptography, secure defaults, elliptic curves and their isogenies.

Post-quantum cryptography

Post-quantum cryptography is an active field of research into new cryptosystems that will be resistant to attack by future quantum computers. Recently a somewhat obscure area, isogeny-based cryptography, has been getting more attention, including impressive speed and compression optimizations and robust security analyses, bringing it into regular discussion alongside other post-quantum candidates. This talk will cover isogeny-based crypto, specifically these recent results regarding supersingular isogeny Diffie-Hellman, which is a possible replacement for the ephemeral key exchanges in use today.

Maya Kaczorowski

Maya Kaczorowski is a Product Manager at Google in Security & Privacy. Her work focuses on encryption at rest and encryption key management.

How data at rest is encrypted in Google's Cloud, at scale

How does Google encrypt data at rest? This talk will cover how Google shards and encrypts data by default, Google's key management system, root of trust, and Google's cryptographic library. Google Cloud Platform encrypts customer content stored at rest, without any action from the customer, using one or more encryption mechanisms. We will also discuss best practices in implementing encryption for your storage system(s).

Andrew Ayer

Andrew Ayer is a security researcher interested in the Web's Public Key Infrastructure. He is the founder of SSLMate, an automated SSL certificate service, and the author of Cert Spotter, an open source Certificate Transparency monitor. Andrew participates in the IETF's Public Notary Transparency working group and recently used Certificate Transparency logs to uncover over 100 improperly-issued Symantec certificates.

Certificate Transparency

Certificate Transparency improves the security of the Web PKI by logging every publicly-trusted SSL certificate to public, verifiable, append-only logs, which domain owners can monitor to detect improperly-issued certificates for their domains. Certificate Transparency was created by Google and is now being standardized by the IETF. Beginning October 2017, Chrome will require all new certificates be logged with Certificate Transparency.

This talk will explore how Certificate Transparency works, how domain owners can take advantage of it, and what the future holds for Certificate Transparency.

Categories: Technology

DDoS Ransom: An Offer You Can Refuse

Mon, 06/02/2017 - 21:43

DDoS ransom

Cloudflare has covered DDoS ransom groups several times in the past. First, we reported on the copycat group claiming to be the Armada Collective and then not too long afterwards, we covered the "new" Lizard Squad. While in both cases the groups made threats that were ultimately empty, these types of security events can send teams scrambling to determine the correct response. Teams in this situation can choose from three types of responses: pay the ransom and enable these groups to continue their operations, not pay and hope for the best, or prepare an action plan to get protected.

Breaking the Ransom Cycle

We can’t stress enough that you should never pay the ransom. We fully understand that in the moment, when your website is being attacked, it might seem like a reasonable solution, but by paying the ransom you only perpetuate the DDoS ransom group’s activities and entice other would-be ransomers to start making similar threats. In fact, we have seen reports of victim organizations receiving multiple subsequent threats after they have paid the ransom. It would seem these groups share lists of organizations that pay, and those organizations are more likely to be targeted again in the future. Victim organizations pay the ransom often enough that we see new “competitors” pop up every few months. As of a few weeks ago, a new group, intentionally left unnamed, has emerged and begun targeting financial institutions around the world. This group follows a similar modus operandi to previous groups, but with a significant twist.

Mostly Bark and Little Bite

The main difference between previous copycats and this new group is that this group actually sends a small demonstration attack before sending the ransom email to the typical role-based email accounts. The hope is to demonstrate to the target that the group will follow through with the ransom threat and convince them to pay the amount requested before the deadline passes. Unsurprisingly though, if the ransom amount is not paid before the deadline expires, the group does not launch a second attack.

When targeting an organization, the group sends two variations of a ransom email. The first variation is a standard threat:

Subject: ddos attack Hi! If you dont pay 8 bitcoin until 17. january your network will be hardly ddosed! Our attacks are super powerfull. And if you dont pay until 17. january ddos attack will start and price to stop will double! We are not kidding and we will do small demo now on [XXXXXXXX] to show we are serious. Pay and you are safe from us forever. OUR BITCOIN ADDRESS: [XXXXXXXX] Dont reply, we will ignore! Pay and we will be notify you payed and you are safe. Cheers!

Interestingly, the second email variation makes reference to "mirai" -- the IoT-based botnet that has been in the news recently as having contributed to many significant attacks. It is important to note -- while the second variation of ransom email references “mirai” there is no actual evidence that these demonstration attacks have anything to do with the Mirai botnet.

Subject: DDoS Attack on XXXXXXXX! Hi! If you dont pay 6 bitcoin in 24 hours your servers will be hardly ddosed! Our attacks are super powerfull. And if you dont pay in 24 hours ddos attack will start and price to stop will double and keep go up! IMPORTANT - You think you protected by CloudFlare but we pass CloudFlare and attack your servers directly. We are not kidding and we will do small demo now to show we are serious. We dont want to make damage now so we will run small attack on 2 not important your IPs - XXXXXXXX and XXXXXXXX. Just small UDP flood for 1 hour to prove us. But dont ignore our demand as we then launch heavy attack by Mirai on all your servers!! Pay and you are safe from us forever. OUR BITCOIN ADDRESS: [XXXXXXXX] Dont reply, we will ignore! Pay and we will be notify you payed and you are safe. Cheers!

While no two attacks are identical, the group’s demonstration attacks do generally follow a pattern. The attacks usually peak around 10 Gbps, last for less than an hour, and use either DNS amplification or NTP reflection as the attack method. Without detailing specifics so as not to tip off the bad guys, there are also specific characteristics of the demonstration attacks that support the theory that they are carried out using a booter/stresser type of service. Neither of these attack types is new, and Cloudflare successfully mitigates attacks that are substantially larger in volume many times a week.

While in this instance not paying the ransom doesn’t lead to a subsequent attack, this outcome isn’t guaranteed. Not only can your site possibly go down during the demonstration attack, but there is still nothing stopping either the original ransomer or a different attacker from launching a future attack. Regardless of an attacker’s true intent, taking no action is a suboptimal plan.

Building an Action Plan

Scrambling to build an action plan while actively under attack is not only stressful, but this is often when avoidable mistakes happen. We recommend doing your research about what protection is right for you ahead of time. DDoS protection, as well as other application level protections, don’t have to be a hassle to implement, and it can be done in under an hour with Cloudflare. Having a plan and implementing protection before a security event occurs can keep your site running smoothly. However, if you find yourself under attack and without an action plan, it’s important to remember that many of these groups are bluffing. Even when these groups are not bluffing, paying the ransom will only encourage them to continue their efforts. If you have received one of these emails, we encourage you to reach out so that we can discuss the specifics of your situation, and whether or not the specific group in question is known to follow through with their threats.

Categories: Technology

NANOG - the art of running a network and discussing common operational issues

Thu, 02/02/2017 - 12:15

The North American Network Operators Group (NANOG) is the locus of modern Internet innovation and the day-to-day cumulative network-operational knowledge of thousands and thousands of network engineers. NANOG itself is a non-profit membership organization; but you don’t need to be a member in order to attend the conference or join the mailing list. That said, if you can become a member, then you’re helping a good cause.

The next NANOG conference starts in a few days (February 6-8 2017) in Washington, DC. Nearly 900 network professionals are converging on the city to discuss a variety of network-related issues, both big and small; but all related to running and improving the global Internet. For this upcoming meeting, Cloudflare has three network professionals in attendance. Two from the San Francisco office and one from the London office.

With the conference starting next week, it seemed a great opportunity to explain to readers of the blog why a NANOG conference is so worth attending.

Tutorials

While some network tasks seem obvious (you unpack the spiffy new wireless router from its box, you set up its security and plug it in), alas, the global Internet is somewhat more complex. Even seasoned professionals could do with a recap on how traceroute actually works, how DNSSEC operates, or this year's subtle BGP complexities, or be enlightened about Optical Networking. All this can assist you with deployments within your networks or datacenter.

Peering

If there’s one thing that keeps the Internet (a network-of-networks) operating, it’s peering. Peering is the act of bringing together two or more networks to allow traffic (bits, bytes, packets, email messages, web pages, audio and video streams) to flow efficiently and cleanly between source and destination. The Internet is nothing more than a collection of individual networks. NANOG provides one of many forums for diverse network operators to meet face-to-face and negotiate and enable those interconnections.

While NANOG isn’t the only event that draws networks together to discuss interconnection, it’s one of the early forums to support these peering discussions.

Security and Reputation

In this day and age we are brutally aware that security is the number-one issue when using the Internet. It's something to think about when you choose your email password or the lock screen password on your laptop, tablet or smartphone. Hint: you should always have a lock screen!

At NANOG the security discussion focuses on a much deeper part of the global Internet: the very hardware, software, and practices that operate and support the underlying networks we all use on a daily basis. An Internet backbone (rarely seen) is a network that moves traffic from one side of the globe to the other (or from one side of a city to the other). At NANOG we discuss how that underlying infrastructure can operate efficiently and securely, and be continually strengthened. The growth of the Internet over the last handful of decades has pushed the envelope when it comes to hardware deployments and network complexity. Sometimes it only takes one compromised box to ruin your day. Discussions at conferences like NANOG are vital to the sharing of knowledge and collective improvement of everyone's networks.

Above the hardware layer (from a network stack point of view) is the Domain Name System (DNS). DNS has always been a major subject of discussion within the NANOG community. It’s very much up to the operational community to make sure that when you type a website name into a web browser, or type someone’s email address into your email program, there’s a highly efficient process to convert names into numbers (numbers, or IP addresses, are the address book and routing method of the Internet). DNS has had its fair share of focus in the security arena and it comes down to network operators (and their system administrator colleagues) to protect DNS infrastructure.

Network Operations: best practices and stories of disasters

Nearly everyone knows that bad news sells. It’s a fact. To be honest, the same is the case in the network operator community. However, within NANOG, those stories of disasters are nearly always told from a learning and improvement point of view. There’s simply no need to repeat a failure, no-one enjoys it a second time around. Notable stories have included subjects like route-leaks, BGP protocol hiccups, peering points, and plenty more.

We simply can’t rule out failures within portions of the network; hence NANOG has spent plenty of time discussing redundancy. The Internet operates using routing protocols that explicitly allow for redundancy in the paths that traffic travels. Should a failure occur (a hardware failure, or a fiber cut), the theory is that traffic will be routed around it. This is a recurring topic at NANOG meetings. Subsea cables (and their occasional cuts) always make for good talks.

Network Automation

While we learned twenty or more years ago how to configure Internet routers by typing on the command line, those days are quickly becoming history. We simply can’t scale if network operations engineers have to type the same commands into hundreds (or thousands?) of boxes around the globe. We need automation, and NANOG has been a leader in this space. Cloudflare has been active in this arena: Mircea Ulinic presented our experience with network automation using Salt and NAPALM at the previous NANOG meeting. Mircea (and Jérôme Fleury) will be giving a follow-up in-depth tutorial on the subject at next week’s meeting.

Many more subjects covered

The first NANOG conference was held in June 1994 in Ann Arbor, Michigan, and the conference has grown significantly since then. While it’s fun to follow the history, it’s perhaps more important to realize that NANOG has covered a multitude of subjects since that start. Go scan the archives at nanog.org and/or watch some of the online videos.

The socials (downtime between technical talks)

Let’s not forget the advantages of spending time with other operators within a relaxed setting. After all, sometimes the big conversations happen when spending time over a beer discussing common issues. NANOG has long understood this and it’s clear that the Tuesday evening Beer ’n Gear social is set up specifically to let network geeks both grab a drink (soft drinks included) and poke around with the latest and greatest network hardware on show. The social is as much about blinking lights on shiny network boxes as it is about tracking down that network buddy.

Oh, and there’s a fair number of vendor giveaways (so far 15 hardware and software vendors have signed up for next week’s event). After all, who doesn’t need a new t-shirt?

But there’s more to the downtime and casual hallway conversations. For myself (the author of this blog), I know that sometimes the most important work is done within the hallways during breaks in the meeting vs. standing in front of the microphone presenting at the podium. The industry has long recognized this and the NANOG organizers were one of the early pioneers in providing full-time coffee and snacks that cover the full conference agenda times. Why? Because sometimes you have to step out of the regular presentations to meet and discuss with someone from another network. NANOG knows its audience!

Besides NANOG, there’s IETF, ICANN, ARIN, and many more

NANOG isn’t the only forum to discuss network operational issues, however it’s arguably the largest. It started off as a “North American” entity; however, in the same way that the Internet doesn’t have country barriers, NANOG meetings (which take place in the US, Canada and at least once in the Caribbean) have fostered an online community that has grown into a global resource. The mailing list (well worth reviewing) is a bastion of networking discussions.

In a different realm, the Internet Engineering Task Force (IETF) focuses on protocol standards. Its existence is why diverse entities can communicate. Operators participate in IETF meetings; however, those meetings are focused outside of the core operational mindset.

Central to the Internet’s existence is ICANN. Meeting three times a year at locations around the globe, it focuses on the governance arena, on domain names, and on related items. Within the meetings there’s an excellent Tech Day.

In the numbers arena, ARIN is an example of a Regional Internet Registry (an RIR) that runs member meetings. An RIR deals with allocating resources like IP addresses and AS numbers. ARIN focuses on the North American area and sometimes holds its meetings alongside NANOG meetings.

ARIN’s counterparts in other parts of the world also hold meetings. Sometimes they simply focus on resource policy and sometimes they also cover network operational issues. For example, RIPE (in Europe, Central Asia and the Middle East) runs a five-day meeting that covers operational and policy issues. APNIC (Asia Pacific), AFRINIC (Africa), and LACNIC (Latin America & Caribbean) all do similar variations. There isn’t one absolute method, and that's a good thing. It’s worth pointing out that APNIC holds its member meetings once a year in conjunction with APRICOT, which is the primary operations meeting in the Asia Pacific region.

While NANOG is somewhat focused on North America, there are also the regional NOGs. These regional NOGs are vital to the education of network operators globally. Japan has JANOG, Southern Africa has SAFNOG, the Middle East has MENOG, Australia & New Zealand have AUSNOG & NZNOG, Germany has DENOG, the Philippines has PHNOG, and, just to be different, the UK has UKNOF (“Forum” vs. “Group”). It would be hard to list them all; but each is a worthwhile forum for operational discussions.

Peering-specific meetings also exist: the Global Peering Forum, the European Peering Forum, and the Peering Forum de LACNOG, for example. Those gather network operators and administrators for bilateral meetings focused specifically on interconnect agreements.

In the commercial realm there are plenty of other meetings attended by networks like Cloudflare. PTC and International Telecoms Week (ITW) are global telecom meetings specifically designed to host one-to-one (bilateral) meetings. They are very commercial in nature and less operational in focus.

NANOG isn’t the only forum Cloudflare attends

As you would guess, you will find our network team at RIR meetings, sometimes at IETF meetings, sometimes at ICANN meetings, and often at various regional NOG meetings (like SANOG in South Asia, NoNOG in Norway, RONOG in Romania, AUSNOG/NZNOG in Australia/New Zealand and many other NOGs). We get around; however, we also run a global network and we need to interact with many, many networks around the globe. These meetings provide an ideal opportunity for one-to-one discussions.

If you've heard something you like from Cloudflare at one of these operations-focused conferences, then check out our job listings (in various North American cities, London, Singapore, and beyond!).

Categories: Technology

Protecting everyone from WordPress Content Injection

Wed, 01/02/2017 - 16:53

Today a severe vulnerability was announced by the WordPress Security Team that allows unauthenticated users to change content on sites running unpatched WordPress (below version 4.7.2).

CC BY-SA 2.0 image by Nicola Sap De Mitri

The problem was found by the team at Sucuri and reported to WordPress. The WordPress team worked with WAF vendors, including Cloudflare, to roll out protection before the patch became available.

Earlier this week we rolled out two rules to protect against exploitation of this issue (both types mentioned in the Sucuri blog post). We have been monitoring the situation and have not observed any attempts to exploit this vulnerability before it was announced publicly.

Customers on a paid plan will find two rules in WAF, WP0025A and WP0025B, that protect unpatched WordPress sites from this vulnerability. If the Cloudflare WordPress ruleset is enabled then these rules are automatically turned on and blocking.

Protecting Everyone

As we have in the past with other serious and critical vulnerabilities like Shellshock and previous issues with JetPack, we have enabled these two rules for our free customers as well.

Free customers who want full protection for their WordPress sites can upgrade to a paid plan and enable the Cloudflare WordPress ruleset in the WAF.

Categories: Technology

TLS 1.3 explained by the Cloudflare Crypto Team at 33c3

Wed, 01/02/2017 - 14:57

Nick Sullivan and I gave a talk about TLS 1.3 at 33c3, the latest Chaos Communication Congress. The congress, attended by more than 13,000 hackers in Hamburg, has been one of the hallmark events of the security community for more than 30 years.

You can watch the recording below, or download it in multiple formats and languages on the CCC website.

The talk introduces TLS 1.3 and explains how it works in technical detail, why it is faster and more secure, and touches on its history and current status.


The slide deck is also online.

This was an expanded and updated version of the internal talk previously transcribed on this blog.

TLS 1.3 hits Chrome and Firefox Stable

In related news, TLS 1.3 is reaching a percentage of Chrome and Firefox users this week, so websites with the Cloudflare TLS 1.3 beta enabled will load faster and more securely for all those new users.


You can enable the TLS 1.3 beta from the Crypto section of your control panel.

TLS 1.3 toggle

Categories: Technology

Firebolt: the fastest, safest ads on the web

Mon, 30/01/2017 - 19:46

Cloudflare’s mission is to help build a better Internet. That means a faster, more secure, open Internet world-wide. We have millions of customers using our services like free SSL, an advanced WAF, the latest compression and the most up to date security to ensure that their web sites, mobile apps and APIs are secure and fast.

One vital area of web technology has lagged behind in terms of speed and security: online ads. And consumers have been turning to ad blocking technology to secure and speed up their own web browsing.


Today, Cloudflare is introducing a new product to make web ads secure, fast and safe. That product is Firebolt.

Firebolt

With Firebolt, ad networks can instantly speed up and secure their ads, resulting in happy consumers and better conversion rates.

Firebolt delivers:

Lightning fast ad delivery

Cloudflare's global network of 102 data centers in 50 countries, combined with routing and performance technologies, makes the delivery of online ads to any device up to five times faster.

Free, simple SSL

Adding SSL to ad serving has been challenging for some ad networks. Cloudflare has years of experience providing free, one click SSL for our customers. Firebolt ads are automatically available over SSL with no complex process of getting and maintaining SSL certificates.

Firebolt includes AMP for Ads

Firebolt enables any independent ad network to leverage the new AMP ad format easily. This makes it possible for ads to appear in AMP content served by Google and an increasing number of sites. Firebolt is the only independent way to serve the newly announced AMP for Ads outside of Google’s advertising network.

Cryptographically signed ads

All ad content delivered by Firebolt for AMP for Ads is cryptographically signed to ensure that it meets the required format and security standards. Signed ads reduce the risk of malware and increase confidence in ads for consumers.

The most advanced browser security

Firebolt ads take advantage of web browser security features including CORS, X-Content-Type-Options and Strict-Transport-Security to ensure the integrity of ads delivered to browsers.

A faster, safer Internet for everyone

Firebolt takes us one step closer to making the Internet a better place by benefitting everyone in the ad ecosystem, including the consumer.

During a recent test, ad platform TripleLift used Cloudflare's Firebolt to serve AMP ads on Time Inc.'s properties. Ads loaded six times faster and Time Inc. saw 13 percent more revenue relative to traditional ads. “Cloudflare was easy to set up, and we saw an impressive difference in the speed of ad delivery with Firebolt's support for AMP for Ads," said Shaun Zacharia, co-founder and President of TripleLift. "AMP Ads loaded six times faster and were three times lighter than comparable standard ads."

If you are an ad network or publisher, please reach out to firebolt@cloudflare.com to learn more about Firebolt and how Cloudflare can help you monetize the Internet content we all rely on.

Categories: Technology

Introducing Accelerated Mobile Links: Making the Mobile Web App-Quick

Thu, 12/01/2017 - 06:00

For 2017, we've predicted that more than half of the traffic to Cloudflare's network will come from mobile devices. Yet even when pages are formatted for display on a small screen, the mobile web is built on traditional web protocols and technologies that were designed for desktop CPUs, network connections, and displays. As a result, browsing the mobile web feels sluggish compared with using native mobile apps.

In October 2015, the team at Google announced Accelerated Mobile Pages (AMP), a new, open technology to make the mobile web as fast as native apps. Since then, a large number of publishers have adopted AMP. Today, 600 million pages across 700,000 different domains are available in the AMP format.

The majority of traffic to this AMP content comes from people running searches on Google.com. If a visitor finds content through some source other than a Google search, even if the content can be served from AMP, it typically won't be. As a result, the mobile web continues to be slower than it needs to be.

Making the Mobile Web App-Quick

Cloudflare's Accelerated Mobile Links helps solve this problem, making content, regardless of how it's discovered, app-quick. Once enabled, Accelerated Mobile Links automatically identifies links on a Cloudflare customer's site to content with an AMP version available. If a link is clicked from a mobile device, the AMP content will be loaded nearly instantly.


To see how it works, try viewing this post from your mobile device and clicking any of these links:


Increasing User Engagement

One of the benefits of Accelerated Mobile Links is that AMP content is loaded in a viewer directly on the site that linked to the content. As a result, when a reader is done consuming the AMP content, closing the viewer returns them to the original source of the link. In that way, every Cloudflare customer's site can be more like a native mobile app, with the corresponding increase in user engagement.

For large publishers that want an even more branded experience, Cloudflare will offer the ability to customize the domain of the viewer to match the publisher's domain. This, for the first time, provides a seamless experience where AMP content can be consumed without having to send visitors to a Google owned domain. If you're a large publisher interested in customizing the Accelerated Mobile Links viewer, you can contact Cloudflare's team.

Innovating on AMP

While Google was the initial champion of AMP, the technologies involved are open. We worked closely with the Google team in developing Cloudflare's Accelerated Mobile Links as well as our own AMP cache. Malte Ubl, the technical lead for the AMP Project at Google, said of our collaboration:

"Working with Cloudflare on its AMP caching solution was as seamless as open-source development can be. Cloudflare has become a regular contributor on the project and made the code base better for all users of AMP. It is always a big step for a software project to go from supporting specific caches to many, and it is awesome to see Cloudflare’s elegant solution for this."

Cloudflare now powers the only compliant non-Google AMP cache with all the same performance and security benefits as Google.

In the spirit of open source, we're working to help develop updates to the project to address some of publishers' and end users' concerns. Specifically, here are some features we're developing to address concerns that have been expressed about AMP:

  • Easier ways to share AMP content using publishers' original domains
  • Automatically redirecting desktop visitors from the AMP version back to the original version of the content
  • A way for end users who would prefer not to be redirected to the AMP version of content to opt out
  • The ability for publishers to brand the AMP viewer and serve it from their own domain

Cloudflare is committed to the AMP project. Accelerated Mobile Links is the first AMP feature we're releasing, but we'll be doing more over the months to come. As of today, Accelerated Mobile Links is available to all Cloudflare customers for free. You can enable it in your Cloudflare Performance dashboard. Stay tuned for more AMP features that will continue to increase the speed of the mobile web.

Categories: Technology

Cloudflare’s Transparency Report for Second Half 2016 and an Additional Disclosure for 2013

Tue, 10/01/2017 - 23:20

Today Cloudflare is publishing its seventh transparency report, covering the second half of 2016. For the first time, we are able to present information on a previously undisclosed National Security Letter (NSL) Cloudflare received in the 2013 reporting period.

Wikipedia provides the most succinct description of an NSL:

An NSL is an administrative subpoena issued by the United States federal government to gather information for national security purposes. NSLs do not require prior approval from a judge.… NSLs typically contain a nondisclosure requirement, frequently called a "gag order", preventing the recipient of an NSL from disclosing that the FBI had requested the information. https://en.wikipedia.org/wiki/National_security_letter


Shortly before the New Year, the FBI sent us the following letter about that NSL.

The letter withdrew the nondisclosure provisions (the “gag order”) contained in NSL-12-358696, which had constrained Cloudflare since the NSL was served in February 2013. At that time, Cloudflare objected to the NSL. The Electronic Frontier Foundation agreed to take our case, and with their assistance, we brought a lawsuit under seal to protect our customers' rights.

Early in the litigation, in July 2013, the FBI rescinded the NSL and withdrew the request for information, so no customer information was ever disclosed by Cloudflare pursuant to this NSL.

Even though the request for information was no longer at issue, the NSL’s gag order remained. For nearly four years, Cloudflare has pursued its legal rights to be transparent about this request despite the threat of criminal liability. As explained above, the FBI recently removed that gag order, so we are now able to share the redacted text of NSL-12-358696, which reads as follows:

Consistent with the FBI’s request and Cloudflare policy, we have voluntarily redacted personal information about the FBI Special Agent named in the NSL as well as customer account information. Disclosing this information would provide no public benefit.

The gag order not only impacted our transparency report and our ability to talk about the sealed case; it also constrained our policy work. Cloudflare has been involved in public policy discussions related to the Internet and matters of electronic communications, both in Congress and in the public sphere more broadly, since the early days of the company. We believe that participation in policy debates is an axiomatic part of our mission to build a better Internet. Being unable to disclose the receipt of NSLs, or to participate in a robust discussion of the policy issues surrounding them, mattered deeply to Cloudflare and the members of our community.

One personal experience is particularly telling about the gag order’s negative impact on our policy advocacy efforts. In early 2014, I met with a key Capitol Hill staffer who worked on issues related to counter-terrorism, homeland security, and the judiciary. In our conversation I explained how Cloudflare values transparency and due process of law, and expressed concerns that NSLs are unconstitutional tools of convenience rather than necessity. The staffer dismissed my concerns and suggested that Cloudflare’s position on NSLs was a product of needless worrying, speculation, and misinformation. The staffer noted it would be impossible for an NSL to issue against Cloudflare, since the services our company provides expressly did not fall within the jurisdiction of the NSL statute. The staffer went so far as to open a copy of the U.S. Code and read from the statutory language to make her point.

Because of the gag order, I had to sit in silence, implicitly confirming the point in the staffer's mind. At the time, I knew with certainty that the FBI’s interpretation of the statute diverged from hers (and presumably that of her boss).

Cloudflare fought this battle for four years, even after the request for customer information had been dismissed. In addition to protecting our customers’ information, we want to remain a vigorous participant in public policy discussions about our services and public law enforcement efforts. The gag rule did not allow that.

Now that this gag order has been lifted, Cloudflare is able to publish a more accurate transparency report to its customers and constituents. For us, this is not the end of the story, but the beginning of a more robust, fact-informed debate.

Categories: Technology

Token Authentication for Cached Private Content and APIs

Tue, 10/01/2017 - 13:52

While working to make the Internet a better place, we also want to make it easier for our customers to have control of their content and APIs, and who has access to them. Using Cloudflare’s Token Authentication features, customers can implement access control via URL tokens or HTTP request headers without having to build complex back-end systems.

Cloudflare will check these tokens at the edge before any request is relayed to an origin or served from cache. If the token is not valid the request is blocked. Since Cloudflare handles all the token validation, the origin server does not need to have complex authentication logic. In addition, a malicious user who attempts to forge tokens will be blocked from ever reaching the origin.

Cloudflare Private Content CC BY 2.0 image by zeevveez

Leveraging our edge network of over 100 data centers, customers can use token authentication to perform access control checks on content and APIs, as well as allowing Cloudflare to cache private content and only serve it to users with a valid token tied specifically to that cached asset.

Performing access control on the edge has many benefits. Brute force attempts and other attacks on private assets don't ever reach an origin server, preventing origin CPU and bandwidth from being wasted on malicious requests.

By performing authentication on the edge it's possible to cache protected content, giving users faster access to private content because there’s no round trip to the origin for authentication. At the same time, web application owners are assured that only valid, authenticated users have access to the cached content.

By validating that an API request is from a valid client, Cloudflare is able to eliminate forged requests coming from bots, attackers or non-authenticated users.

Content Access Control

Many Internet applications are not built with access control features for assets, especially static assets like images, PDFs, zip files, apps, eBooks, and other downloadable content. Building an access control layer for these can be difficult and expensive.

We’ve worked with many customers to solve problems such as:

  • A website provides downloadable static content to registered users; however, users tend to share/publish links to that content on social media platforms;
  • A website provides downloadable static content, but crawlers and scrapers are constantly trying to find/leech/look for available links;
  • An access control system is in place, but the customer would like to cache content at the edge for a better user experience and reduced bandwidth bills;
  • A website would like to generate links with an expiry time;
  • Access to specific resources hosted outside of the main application needs to be limited and restricted.

API Protection

Today most applications are client software that connects to HTTP-based APIs on the Internet. Protecting those APIs from malicious use is important, as it’s possible to write client software, such as bots, that talks directly to the APIs, bypassing the original application. This can lead to abuse and unwanted load on API servers.

Cloudflare’s token authentication can be used to validate at the edge that an API request is coming from a valid user, client or mobile device. Cloudflare will filter out non-authenticated or forged requests and not pass them on to the origin API server.

Along with Cloudflare’s Rate Limiting and WAF, a mobile application with an Internet API can be protected at the edge, far from the origin API server.

Cloudflare’s Token Authentication Solution

Token Authentication leverages tokens to verify that a user has access to a specific resource. The token can be sent as a URL parameter or in an HTTP header.

The token is an HMAC generated from the following:

  • A secret shared between Cloudflare and the web application or mobile app;
  • The path to the resource or API;
  • A Unix epoch timestamp;
  • Optional additional parameters (e.g. IP address, cookie value, username).

The Cloudflare edge validates the token and allows or denies access based on the result. The generated HMAC can also be configured to expire after a certain time (e.g. 10 minutes), or so that the expiry is controlled directly from the origin server. In the latter case, the generated URLs would simply include an absolute future Unix timestamp.

Protecting Private Content with Token Authentication

In the simplest implementations, tokens can be used to protect static private content. The code required in the back end application would be as follows (in PHP):

<?php
// Generate valid URL token
$secret = "thisisasharedsecret";
$time = time();
$token = $time . "-" . urlencode(base64_encode(hash_hmac("sha256", "/download/private.jpg$time", $secret, true)));
$url = "http://www.domain.com/download/private.jpg?verify=" . $token;
?>

The code above, given a shared secret:

  • Generates the current timestamp;
  • Generates the token by concatenating the timestamp with the cryptographic hash separated by a dash -;
  • The cryptographic hash is a SHA256 based HMAC generated from the relative path to the restricted asset concatenated with the timestamp. The key of the hash is the shared secret;
  • The hash is base64 encoded, and subsequently, URL encoded;
  • Finally, the URL to the private asset is generated by simply adding the resulting token to the query string. The token HTTP GET parameter name is customizable.
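Cloudflare performs the matching check at the edge for you, but for illustration, here is a minimal sketch in Go of the equivalent verification: recompute the HMAC over the path and timestamp with the shared secret, compare it against the supplied token, and reject expired timestamps. The function name, the expiry handling and the assumption that the token has already been URL-decoded are our own illustration, not Cloudflare's edge code.

package tokenauth

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"strconv"
	"strings"
	"time"
)

// verifyToken checks a "timestamp-base64(HMAC-SHA256(path+timestamp, secret))"
// token, mirroring the PHP generation code above. The token is assumed to be
// URL-decoded already; maxAge is the expiry window (e.g. 10 minutes).
func verifyToken(secret, path, token string, maxAge time.Duration) error {
	parts := strings.SplitN(token, "-", 2)
	if len(parts) != 2 {
		return errors.New("malformed token")
	}
	ts, err := strconv.ParseInt(parts[0], 10, 64)
	if err != nil {
		return errors.New("bad timestamp")
	}
	if time.Since(time.Unix(ts, 0)) > maxAge {
		return errors.New("token expired")
	}
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(path + parts[0])) // same message as the PHP hash_hmac call
	want := base64.StdEncoding.EncodeToString(mac.Sum(nil))
	if !hmac.Equal([]byte(want), []byte(parts[1])) {
		return errors.New("invalid signature")
	}
	return nil
}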

Once deployed, the authentication rules are available under the Web Application Firewall Custom User Rule Set Package. From here the rules can be set to simulate, challenge or block, or be deactivated completely:

WAF Rules

Once active, should a user try to access a restricted resource without a valid token, Cloudflare presents the default WAF block page shown below:

WAF Block Page

The block page can be fully customized to match the customer branding as necessary.

API Requests with Token Authentication

In more advanced implementations tokens can also be used to perform API authentication:

  • User requests access using a standard authentication method (e.g. username and password);
  • The origin server validates access and provides a token to the client. The token is specific to the user;
  • Client stores the token and includes it in any subsequent request to API endpoints;
  • The Cloudflare edge validates the token on every request. If the token is missing or the token is not valid the request is denied;
  • The token can be configured to expire after a certain time, forcing the client to re-authenticate with the origin server if necessary.
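On the client side, including the token on every subsequent request is straightforward. Here's a minimal sketch in Go; the header name X-Auth-Token is an illustrative assumption of ours, since, as noted above, the actual parameter or header name is configurable.

package apiclient

import "net/http"

// newAuthedRequest attaches a previously issued token to an API call.
// "X-Auth-Token" is an illustrative header name; the real name is whatever
// has been configured for the zone.
func newAuthedRequest(method, url, token string) (*http.Request, error) {
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Auth-Token", token)
	return req, nil
}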

Using tokens for API endpoints provides many benefits:

  • No session information is stored so it is much easier to scale applications;
  • Tokens help to prevent CSRF attacks as the token is required on every request;
  • Ability to provide selective access to third party applications;
  • Lower load on API servers.

Allowing access to API servers only from Cloudflare IP ranges will ensure that users cannot bypass the token authentication.

Additionally, for API endpoints, Cloudflare can be configured to generate custom JSON responses compatible with the API specification.

Validating Mobile Apps with Token Authentication

Most mobile applications leverage HTTP based API endpoints to provide dynamic functionality to the end user. The shared secret used to generate the token can be embedded and encrypted within native mobile applications, improving protection of mobile app API endpoints and ensuring only requests from legitimate clients are allowed to access the underlying API.

Conclusion

Token Authentication is available on the Business Plan if you are able to follow the default Cloudflare parameter format and expiry times are known beforehand. Our support team is able to provide implementation details on request.

If this is not possible, or if you are looking for additional logic and/or custom behavior, please contact us and enquire about our Enterprise Plan and reference Token Authentication.

Categories: Technology

The Porcupine Attack: investigating millions of junk requests

Mon, 09/01/2017 - 14:08

We extensively monitor our network and use multiple systems that give us visibility including external monitoring and internal alerts when things go wrong. One of the most useful systems is Grafana that allows us to quickly create arbitrary dashboards. And a heavy user of Grafana we are: at last count we had 645 different Grafana dashboards configured in our system!

grafana=> select count(1) from dashboard;
 count
-------
   645
(1 row)

This post is not about our Grafana systems though. It's about something we noticed a few days ago, while looking at one of those dashboards. We noticed this:

This chart shows the number of HTTP requests per second handled by our systems globally. You can clearly see multiple spikes, and this chart most definitely should not look like a porcupine! The spikes were large in scale - 500k to 1M HTTP requests per second. Something very strange was going on.

Tracing the spikes [1]

Our intuition indicated an attack - but our attack mitigation systems didn't confirm it. We'd seen no major HTTP attacks at those times.

It would be bad if we were under such heavy HTTP attack and our mitigation systems didn't notice it. Without more ideas, we went back to one of our favorite debugging tools - tcpdump.

The spikes happened every 80 minutes and lasted about 10 minutes. We waited, and tried to catch the offending traffic. Here is what the HTTP traffic looked like on the wire:

The client had sent some binary junk to our HTTP server on port 80; they weren't even sending a fake GET or POST line!

Our server politely responded with an HTTP 400 error. This explains why it wasn't caught by our attack mitigation systems. Invalid HTTP requests don't trigger our HTTP DDoS mitigations: it makes no sense to mitigate traffic which is never accepted by NGINX in the first place!

The payload

At first glance the payload sent to HTTP servers seems random. A colleague of mine, Chris Branch, investigated and proved me wrong. The payload has patterns.

Let me show what's happening. Here are the first 24 bytes of the mentioned payload:

If you look closely, the pattern will start to emerge. Let's add some colors and draw it in not eight, but seven bytes per row:

This checkerboard-like pattern is exhibited in most of the requests with payload sizes below 512 bytes.

Another engineer pointed out that there actually appear to be two separate sequences generated in the same fashion. Starting with the a6 and the cb, take alternating bytes:

a6 ef 39 82 cb 15 5e a7 f0 3a 83 cc 16 5f cb 15 5e a7 f0 3a 83 cc 16 5f a8 f1 3b

Aligning that differently shows that the second sequence is essentially the same as the first:

a6 ef 39 82 cb 15 5e a7 f0 3a 83 cc 16 5f cb 15 5e a7 f0 3a 83 cc 16 5f a8 f1 3b

Thinking of that as one sequence gives:

a6 ef 39 82 cb 15 5e a7 f0 3a 83 cc 16 5f a8 f1 3b

This is generated by starting at ef and repeatedly adding the following sequence of deltas:

4a 49 49 4a 49 49 49

The 'random' binary junk is actually generated by some simple code.
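Just how simple becomes clear if you try to reproduce it. The following Go snippet (our reconstruction for illustration, not the attacker's actual code) regenerates the sequence above by starting at 0xef and repeatedly adding the deltas 4a 49 49 4a 49 49 49, with byte arithmetic wrapping at 256:

package main

import "fmt"

func main() {
	// Reconstruction of the observed pattern: start at 0xef and keep adding
	// the repeating deltas 4a 49 49 4a 49 49 49; byte addition wraps at 256.
	deltas := []byte{0x4a, 0x49, 0x49, 0x4a, 0x49, 0x49, 0x49}
	b := byte(0xef)
	fmt.Printf("%02x", b)
	for i := 0; i < 15; i++ {
		b += deltas[i%len(deltas)]
		fmt.Printf(" %02x", b)
	}
	fmt.Println()
	// Prints: ef 39 82 cb 15 5e a7 f0 3a 83 cc 16 5f a8 f1 3b
}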

The length distribution of the requests is also interesting. Here's the histogram showing the popularity of particular lengths of payloads.

About 80% of the junk requests we received had a length of up to 511 bytes, uniformly distributed.

The remaining 20% had length uniformly distributed between 512 and 2047 bytes, with a few interesting spikes. For some reason lengths of 979, 1383 and 1428 bytes stand out. The rest of the distribution looks uniform.

The scale

The spikes were large. It takes a lot of firepower to generate a spike in our global HTTP statistics! On the first day the spikes reached about 600k junk requests per second. On the second day they went up to 1M rps. In total we recorded 37 spikes.

Geography

Unlike L3 attacks, L7 attacks require TCP/IP connections to be fully established. That means the source IP addresses are not spoofed and can be used to investigate the geographic distribution of attacking hosts.

The spikes were generated by IP addresses from all around the world. We recorded IP addresses from 4,912 distinct Autonomous Systems. Here are the top ASNs by number of unique attacking IP addresses:

Percent of unique IP addresses seen:
21.51% AS36947 # AS de Algerie Telecom, Algeria
 5.34% AS18881 # Telefonica Brasil S.A, Brasil
 3.60% AS7738  # Telemar Norte Leste S.A., Brasil
 3.48% AS27699 # Telefonica Brasil S.A, Brasil
 3.37% AS28573 # CLARO S.A., Brasil
 3.20% AS8167  # Brasil Telecom S/A, Brasil
 2.44% AS2609  # Tunisia BackBone, Tunisia
 2.22% AS6849  # PJSC "Ukrtelecom", Ukraine
 1.77% AS3320  # Deutsche Telekom AG, Germany
 1.73% AS12322 # Free SAS, France
 1.73% AS8452  # TE-AS, Egypt
 1.35% AS12880 # Information Technology Company, Iran
 1.30% AS37705 # TOPNET, Tunisia
 1.26% AS53006 # Algar Telecom S/A, Brasil
 1.22% AS36903 # ASN du reseaux MPLs de Maroc Telecom, Morocco
 ...
4897 AS numbers below 1% of IP addresses.

You get the picture - the traffic was sourced all over the place, with bias towards South America and North Africa. Here is the country distribution of attacking IPs:

Percent of unique IP addresses seen:
31.76% BR
21.76% DZ
 7.49% UA
 5.73% TN
 4.89% IR
 3.96% FR
 3.76% DE
 2.09% EG
 1.78% SK
 1.36% MA
 1.15% GB
 1.05% ES
 ...
109 countries below 1% of IP addresses

The traffic was truly global and launched with IPs from 121 countries. This kind of globally distributed attack is where Cloudflare's Anycast network shines. During these spikes the load was nicely distributed across dozens of datacenters. Our datacenter in São Paulo absorbed the most traffic, roughly 4 times more traffic than the second in line - Paris. This chart shows how the traffic was distributed across many datacenters:

Unique IPs

During each of the spikes our systems recorded 200k unique source IP addresses sending us junk requests.

Normally we would conclude that whoever generated the attack controlled roughly 200k bots, and that's it. But these spikes were different; it seems the bots rotated IPs aggressively. Here is an example: during these 16 spikes we recorded a whopping total of 1.2M unique IP addresses attacking us.

This can be explained by bots churning through IP addresses. We believe that out of the estimated 200k bots, between 50k and 100k changed their IP addresses during the 80 minutes between attacks. Starting from 200k addresses and adding 50k-100k fresh ones for each of the following 15 spikes works out to roughly 1M-1.7M addresses, consistent with the 1.2M unique IP addresses we saw during the 16 spikes over 24 hours.

A botnet?

These spikes were unusual for a number of reasons.

  • They were generated by a large number of IP addresses. We estimate 200k concurrent bots.
  • The bots were rotating IP addresses aggressively.
  • The bots were from around the world with an emphasis on South America and North Africa.
  • The traffic generated was enormous, reaching 1M junk connections per second.
  • The spikes happened exactly every 80 minutes and lasted for 10 minutes.
  • The payload of the traffic was junk, not a usual HTTP request attack.
  • The payload sizes were uniformly distributed.

It's hard to draw conclusions, but we can imagine two possible scenarios. It is possible these spikes were an attack intended to break our HTTP servers.

A second possibility is that these spikes were legitimate connection attempts by some weird, obfuscated protocol. For some reason the clients were connecting to port 80/TCP and retried precisely every 80 minutes.

We are continuing our investigation. In the meantime we are looking for clues. Please do let us know if you have encountered this kind of TCP/IP payload. We are puzzled by these large spikes.

If you'd like to work on this type of problem we're hiring in London, San Francisco, Austin, Champaign and Singapore.

  1. Yes, we're aware that porcupines have spines/quills not spikes.

Categories: Technology

How and why the leap second affected Cloudflare DNS

Sun, 01/01/2017 - 22:40

At midnight UTC on New Year’s Day, deep inside Cloudflare’s custom RRDNS software, a number went negative when it should always have been, at worst, zero. A little later this negative value caused RRDNS to panic. This panic was caught using the recover feature of the Go language. The net effect was that some DNS resolutions to some Cloudflare managed web properties failed.

The problem only affected customers who use CNAME DNS records with Cloudflare, and only affected a small number of machines across Cloudflare's 102 PoPs. At peak approximately 0.2% of DNS queries to Cloudflare were affected and less than 1% of all HTTP requests to Cloudflare encountered an error.

This problem was quickly identified. The most affected machines were patched in 90 minutes and the fix was rolled out worldwide by 0645 UTC. We are sorry that our customers were affected, but we thought it was worth writing up the root cause for others to understand.

A little bit about Cloudflare DNS

Cloudflare customers use our DNS service to serve the authoritative answers for DNS queries for their domains. They need to tell us the IP address of their origin web servers so we can contact the servers to handle non-cached requests. They do this in two ways: either they enter the IP addresses associated with the names (e.g. the IP address of example.com is 192.0.2.123 and is entered as an A record) or they enter a CNAME (e.g. example.com is origin-server.example-hosting.biz).

This image shows a test site with an A record for theburritobot.com and a CNAME for www.theburritobot.com pointing directly to Heroku.

When a customer uses the CNAME option, Cloudflare occasionally has to do a lookup, using DNS, for the actual IP address of the origin server. It does this automatically using standard recursive DNS. It was this CNAME lookup code that contained the bug that caused the outage.

Internally, Cloudflare operates DNS resolvers to look up DNS records from the Internet, and RRDNS talks to these resolvers to get IP addresses when doing CNAME lookups. RRDNS keeps track of how well the internal resolvers are performing and does a weighted selection of possible resolvers (we operate multiple per PoP for redundancy), choosing the most performant. During the leap second, some of these resolutions ended up recording a negative value in a data structure.

The weighted selection code, at a later point, was fed the negative number, which caused it to panic. The negative number got there through a combination of the leap second and smoothing.

A falsehood programmers believe about time

The root cause of the bug that affected our DNS service was the belief that time cannot go backwards. In our case, some code assumed that the difference between two times would always be, at worst, zero.

RRDNS is written in Go and uses Go’s time.Now() function to get the time. Unfortunately, this function does not guarantee monotonicity. Go currently doesn’t offer a monotonic time source (see issue 12914 for discussion).

To measure the performance of the upstream DNS resolvers used for CNAME lookups, RRDNS contains the following code:

// Update upstream sRTT on UDP queries, penalize it if it fails
if !start.IsZero() {
    rtt := time.Now().Sub(start)
    if success && rcode != dns.RcodeServerFailure {
        s.updateRTT(rtt)
    } else {
        // The penalty should be a multiple of actual timeout
        // as we don't know when the good message was supposed to arrive,
        // but it should not put server to backoff instantly
        s.updateRTT(TimeoutPenalty * s.timeout)
    }
}

In the code above rtt could be negative if time.Now() was earlier than start (which was set by a call to time.Now() earlier).

That code works well if time moves forward. Unfortunately, we’ve tuned our resolvers to be very fast which means that it’s normal for them to answer in a few milliseconds. If, right when a resolution is happening, time goes back a second the perceived resolution time will be negative.

RRDNS doesn’t just keep a single measurement for each resolver, it takes many measurements and smoothes them. So, the single measurement wouldn’t cause RRDNS to think the resolver was working in negative time, but after a few measurements the smoothed value would eventually become negative.

When RRDNS selects an upstream to resolve a CNAME it uses a weighted selection algorithm. The code takes the upstream time values and feeds them to Go’s rand.Int63n() function. rand.Int63n promptly panics if its argument is negative. That's where the RRDNS panics were coming from.
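This failure mode is easy to reproduce in isolation. The snippet below (a standalone illustration, not RRDNS code) panics with "invalid argument to Int63n" because the standard library rejects non-positive arguments:

package main

import "math/rand"

func main() {
	weight := int64(-42)    // stands in for a smoothed RTT that drifted negative
	_ = rand.Int63n(weight) // panics: invalid argument to Int63n
}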

(Aside: there are many other falsehoods programmers believe about time)

The one character fix

One precaution when using a non-monotonic clock source is to always check whether the difference between two timestamps is negative. Should this happen, it’s not possible to accurately determine the time difference until the clock stops rewinding.

In this patch we allowed RRDNS to forget about current upstream performance, and let it normalize again if time skipped backwards. This prevents leaking of negative numbers to the server selection code, which would result in throwing errors before attempting to contact the upstream server.

The fix we applied prevents the recording of negative values in server selection. Restarting all the RRDNS servers then fixed any recurrence of the problem.
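As an illustration of the kind of guard described above (the actual RRDNS patch went further and let the upstream statistics reset when time skipped backwards), a minimal sketch looks like this:

package rttguard

import "time"

// elapsedSince returns the time elapsed since start, clamped at zero so that
// a clock stepping backwards (for example across a leap second) can never
// feed a negative duration into the smoothed RTT and, later, rand.Int63n.
func elapsedSince(start time.Time) time.Duration {
	d := time.Now().Sub(start)
	if d < 0 {
		return 0
	}
	return d
}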

Timeline

The following is the complete timeline of the events around the leap second bug.

2017-01-01 00:00 UTC Impact starts
2017-01-01 00:10 UTC Escalated to engineers
2017-01-01 00:34 UTC Issue confirmed
2017-01-01 00:55 UTC Mitigation deployed to one canary node and confirmed
2017-01-01 01:03 UTC Mitigation deployed to canary PoP and confirmed
2017-01-01 01:23 UTC Fix deployed in most impacted PoP
2017-01-01 01:45 UTC Fix being deployed to major PoPs
2017-01-01 01:48 UTC Fix being deployed everywhere
2017-01-01 02:50 UTC Fix rolled out to most of the affected PoPs
2017-01-01 06:45 UTC Impact ends

This chart shows error rates for each Cloudflare PoP (some PoPs were more affected than others) and the rapid drop in errors as the fix was deployed. We deployed the fix prioritizing those locations with the most errors first.

Conclusion

We are sorry that our customers were affected by this bug and are inspecting all our code to ensure that there are no other leap second sensitive uses of time intervals.

Categories: Technology