
TLS and its key exchange

http://security.stackexchange.com/questions/8343/what-key-exchange-mechanism-should-be-used-in-tls

You may use a key exchange (as part of a cipher suite) only if the server key type and certificate match. To see this in detail, let’s have a look at the cipher suites defined in the TLS 1.2 specification. Each cipher suite defines the key exchange algorithm, as well as the subsequently used symmetric encryption and integrity check algorithms; we concentrate here on the key exchange part.

  • RSA: the key exchange works by encrypting a random value (chosen by the client) with the server public key. This requires that the server public key is an RSA key, and that the server certificate does not prohibit encryption (mainly through the “Key Usage” certificate extension: if that extension is present, it must include the “keyEncipherment” flag).
  • DH_RSA: the key exchange is a static Diffie-Hellman: the server public key must be a Diffie-Hellman key; moreover, that certificate must have been issued by a Certification Authority which itself was using a RSA key (the CA key is the key which was used to sign the server certificate).
  • DH_DSS: like DH_RSA, except that the CA used a DSA key.
  • DHE_RSA: the key exchange is an ephemeral Diffie-Hellman: the server dynamically generates a DH public key and sends it to the client; the server also signs what it sends. For DHE_RSA, the server public key must be of type RSA, and its certificate must be appropriate for signatures (the Key Usage extension, if present, must include the digitalSignature flag).
  • DHE_DSS: like DHE_RSA, except that the server key has type DSA.
  • DH_anon: there is no server certificate. The server uses a Diffie-Hellman key that it may have dynamically generated. The “anon” cipher suites are vulnerable to impersonating attacks (including, but not limited to, the “Man in the Middle”) since they lack any kind of server authentication. On a general basis, you shall not use an “anon” cipher suite.

Key exchange algorithms which use elliptic-curve cryptography are specified in another RFC and propose the following:

  • ECDH_ECDSA: like DH_DSS, but with elliptic curves: the server public key must be an ECDH key, in a certificate issued by a CA which itself was using an ECDSA public key.
  • ECDH_RSA: like ECDH_ECDSA, but the issuing CA has a RSA key.
  • ECDHE_ECDSA: the server sends a dynamically generated EC Diffie-Hellman key, and signs it with its own key, which must have type ECDSA. This is equivalent to DHE_DSS, but with elliptic curves for both the Diffie-Hellman part and the signature part.
  • ECDHE_RSA: like ECDHE_ECDSA, but the server public key is a RSA key, used for signing the ephemeral elliptic-curve Diffie-Hellman key.
  • ECDH_anon: an “anon” cipher suite, with dynamic elliptic-curve Diffie-Hellman.
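The two lists above can be summarised as a lookup table. The following is a minimal sketch: the table layout and the function name `usable_key_exchanges` are hypothetical, and the Key Usage checks mentioned above are deliberately left out.

```python
# Hypothetical summary of the lists above.
# cert_key: key type in the server certificate (None = no certificate).
# ca_key:   key type of the issuing CA, where the suite constrains it.
KEY_EXCHANGE_REQUIREMENTS = {
    "RSA":         {"cert_key": "RSA",   "ca_key": None},
    "DH_RSA":      {"cert_key": "DH",    "ca_key": "RSA"},
    "DH_DSS":      {"cert_key": "DH",    "ca_key": "DSA"},
    "DHE_RSA":     {"cert_key": "RSA",   "ca_key": None},
    "DHE_DSS":     {"cert_key": "DSA",   "ca_key": None},
    "DH_anon":     {"cert_key": None,    "ca_key": None},
    "ECDH_ECDSA":  {"cert_key": "ECDH",  "ca_key": "ECDSA"},
    "ECDH_RSA":    {"cert_key": "ECDH",  "ca_key": "RSA"},
    "ECDHE_ECDSA": {"cert_key": "ECDSA", "ca_key": None},
    "ECDHE_RSA":   {"cert_key": "RSA",   "ca_key": None},
    "ECDH_anon":   {"cert_key": None,    "ca_key": None},
}


def usable_key_exchanges(cert_key, ca_key):
    """Return the key exchanges compatible with a given server certificate."""
    usable = []
    for name, req in KEY_EXCHANGE_REQUIREMENTS.items():
        if req["cert_key"] != cert_key:
            continue  # wrong key type in the certificate
        if req["ca_key"] is not None and req["ca_key"] != ca_key:
            continue  # suite constrains the issuing CA's key type
        usable.append(name)
    return usable
```

For instance, `usable_key_exchanges("RSA", "RSA")` lists the suites open to a server holding an ordinary RSA certificate.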

 

 

http://security.stackexchange.com/questions/41205/diffie-hellman-and-its-tls-ssl-usage

Diffie-Hellman is used in SSL/TLS, as “ephemeral Diffie-Hellman” (the cipher suites with “DHE” in their name; see the standard). What is very rarely encountered is “static Diffie-Hellman” (cipher suites with “DH” in their name, but neither “DHE” nor “DH_anon”): these cipher suites require that the server owns a certificate with a DH public key in it, which is rarely supported for a variety of historical and economic reasons. The main one is the availability of a free standard for RSA (PKCS#1), while the corresponding standard for Diffie-Hellman (X9.42) costs a hundred bucks, which is not much, but sufficient to deter most amateur developers.

Diffie-Hellman is a key agreement protocol, meaning that if two parties (say, the SSL client and the SSL server) run this protocol, they end up with a shared secret K. However, neither client nor server gets to choose the value of K; from their points of view, K looks randomly generated. It is secret (only they know K; eavesdroppers on the line do not) and shared (they both get the same value K), but not chosen. This is not encryption. A shared secret K is good enough, though, to process terabytes of data with a symmetric encryption algorithm (same K to encrypt on one side and decrypt on the other), and that is what happens in SSL.
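To make the “shared but not chosen” property concrete, here is a toy sketch of the key agreement. The modulus (the Mersenne prime 2^127 - 1) and the generator 3 are assumptions picked for readability; they are far too small for real use, and the function names are hypothetical.

```python
import secrets

# Toy parameters for illustration only; real deployments use much larger groups.
P = 2**127 - 1   # a prime modulus (assumption, not from the text)
G = 3            # generator (assumption)


def dh_keypair():
    """Pick a random private exponent and compute the matching public value."""
    private = secrets.randbelow(P - 3) + 2   # uniform in [2, P-2]
    public = pow(G, private, P)
    return private, public


def dh_shared(private, peer_public):
    """Combine our private exponent with the peer's public value."""
    return pow(peer_public, private, P)


client_priv, client_pub = dh_keypair()
server_priv, server_pub = dh_keypair()

# Only the public values cross the wire; each side derives K locally.
k_client = dh_shared(client_priv, server_pub)
k_server = dh_shared(server_priv, client_pub)
```

Both sides end up with the same K, yet neither of them picked it: it depends on both random private exponents.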

There is a well-known asymmetric encryption algorithm called RSA, though. With RSA, the sender can encrypt a message M with the recipient’s public key, and the recipient can decrypt it and recover M using his private key. This time, the sender can choose the contents of M. So your question might be: in an RSA world, why do we bother with AES at all? The answer lies in the following points:

  • There are constraints on M. If the recipient’s public key has size n (in bytes, e.g. n = 256 for a 2048-bit RSA key), then the maximum size of M is n-11 bytes. In order to encrypt a longer message, we would have to split it into sufficiently small blocks, and include some reassembly mechanism. Nobody really knows how to do that securely. We have good reasons to believe that RSA on a single message is safe, but subtle weaknesses can lurk in any split-and-reassembly system and we are not comfortable with that. It is already bad enough with symmetric ciphers, where the mathematical situation is simpler.
  • Even if we could handle the splitting-and-reassembly, there would be a size expansion. With a 2048-bit RSA key, an internal message chunk has size at most 245 bytes, but yields, when encrypted, a 256-byte sequence. This wastes our lifeforce, i.e. network bandwidth. Symmetric encryption incurs only a bounded overhead (well, SSL adds a slight overhead proportional to the data size, but it is much smaller than what would occur with a RSA-only protocol).
  • Compared to AES, RSA is slow as Hell.
  • We really like to have the option of using key agreement protocols like DH instead of RSA. In older times (before 2001), RSA was patented but not DH, so the US government was recommending DH. Nowadays, we want to be able to switch algorithms in case one becomes broken. In order to support key agreement protocols, we need some symmetric encryption, so we may just as well use it with RSA. It simplifies implementation and protocol analysis.
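The size arithmetic from the first two points can be worked through in a few lines. This is a sketch under the assumption of the PKCS#1 v1.5 padding implied by the “n-11” figure above; the function names are hypothetical.

```python
def rsa_pkcs1_v15_limits(key_bits):
    """Return (max plaintext bytes per block, ciphertext bytes per block)."""
    n = (key_bits + 7) // 8      # modulus size in bytes
    return n - 11, n             # 11 bytes are lost to padding


def rsa_only_ciphertext_size(message_len, key_bits=2048):
    """Ciphertext size if a long message were naively split into RSA blocks."""
    max_chunk, block = rsa_pkcs1_v15_limits(key_bits)
    chunks = -(-message_len // max_chunk)   # ceiling division
    return chunks * block


max_chunk, block = rsa_pkcs1_v15_limits(2048)   # 245-byte chunks, 256-byte blocks
```

With a 2048-bit key, every 245 bytes of data cost 256 bytes on the wire, an expansion that grows linearly with the message.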

 

http://security.stackexchange.com/questions/20803/how-does-ssl-tls-work/20847#20847

Since the general concept of SSL has already been covered in some other questions (e.g. this one and that one), this time I will go for details. Details are important. This answer is going to be somewhat verbose.

History

SSL is a protocol with a long history and several versions. First prototypes came from Netscape, when they were developing the first versions of their flagship browser, Netscape Navigator (this browser killed off Mosaic in the early times of the Browser Wars, which are still raging, albeit with new competitors). Version 1 has never been made public so we do not know what it looked like. SSL version 2 is described in a draft which can be read there; it has a number of weaknesses, some of them rather serious, so it is deprecated and newer SSL/TLS implementations do not support it (while older ones have it deactivated by default). I will not speak of SSL version 2 any further, except as an occasional reference.

SSL version 3 (which I will call “SSLv3”) was an enhanced protocol which still works today and is widely supported. Although still a property of Netscape Communications (or whoever owns that nowadays), the protocol has been published as a “historical RFC” (RFC 6101). Meanwhile, the protocol has been standardized, with a new name in order to avoid legal issues; the new name is TLS.

Three versions of TLS have been produced so far, each with its dedicated RFC: TLS 1.0, TLS 1.1 and TLS 1.2. They are internally very similar to each other, and to SSLv3, to the point that an implementation can easily support SSLv3 and all three TLS versions with at least 95% of the code being common. Still internally, all versions are designated by a version number with the major.minor format; SSLv3 is then 3.0, while the TLS versions are, respectively, 3.1, 3.2 and 3.3. Thus, it is no wonder that TLS 1.0 is sometimes called SSL 3.1 (and it is not incorrect either). SSL 3.0 and TLS 1.0 differ by only some minute details. TLS 1.1 and 1.2 are not yet widely supported, although there is impetus for that, because of possible weaknesses (see below, for the “BEAST attack”). SSLv3 and TLS 1.0 are supported “everywhere” (even IE 6.0 knows them).

Context

SSL aims at providing a secure bidirectional tunnel for arbitrary data. Consider TCP, the well known protocol for sending data over the Internet. TCP works over the IP “packets” and provides a bidirectional tunnel for bytes; it works for all byte values and sends them in two streams which can operate simultaneously. TCP handles the hard work of splitting the data into packets, acknowledging them, reassembling them back into their right order, while removing duplicates and reemitting lost packets. From the point of view of the application which uses TCP, there are just two streams, and the packets are invisible; in particular, the streams are not split into “messages” (it is up to the application to apply its own encoding rules if it wishes to have messages, and that’s precisely what HTTP does).

TCP is reliable in the presence of “accidents”, i.e. transmission errors due to flaky hardware, network congestion, people with smartphones who walk out of range of a given base station, and other non-malicious events. However, an ill-intentioned individual (the “attacker”) with some access to the transport medium could read all the transmitted data and/or alter it intentionally, and TCP does not protect against that. Hence SSL.

SSL assumes that it works over a TCP-like protocol, which provides a reliable stream; SSL does not implement reemission of lost packets and things like that. The attacker is supposed to be in power to disrupt communication completely in an unavoidable way (for instance, he can cut the cables) so SSL’s job is to:

  • detect alterations (the attacker must not be able to alter the data silently);
  • ensure data confidentiality (the attacker must not gain knowledge of the exchanged data).

SSL fulfills these goals to a large (but not absolute) extent.

Records

SSL is layered and the bottom layer is the record protocol. Whatever data is sent in a SSL tunnel is split into records. Over the wire (the underlying TCP socket or TCP-like medium), a record looks like this:

HH V1:V2 L1:L2 data

where:

  • HH is a single byte which indicates the type of data in the record. Four types are defined: change_cipher_spec (20), alert (21), handshake (22) and application_data (23).
  • V1:V2 is the protocol version, over two bytes. For all versions currently defined, V1 has value 0x03, while V2 has value 0x00 for SSLv3, 0x01 for TLS 1.0, 0x02 for TLS 1.1 and 0x03 for TLS 1.2.
  • L1:L2 is the length of data, in bytes (big-endian convention is used: the length is 256*L1+L2). The total length of data cannot exceed 18432 bytes, but in practice it cannot even reach that value.

So a record has a five-byte header, followed by at most 18 kB of data. The data is where symmetric encryption and integrity checks are applied. When a record is emitted, both sender and receiver are supposed to agree on which cryptographic algorithms are currently applied, and with which keys; this agreement is obtained through the handshake protocol, described in the next section. Compression, if any, is also applied at that point.
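The five-byte header described above can be decoded as follows. This is a minimal sketch; the function and table names are hypothetical, and only the record types and versions listed above are recognised.

```python
import struct

CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}
VERSIONS = {
    (3, 0): "SSLv3",
    (3, 1): "TLS 1.0",
    (3, 2): "TLS 1.1",
    (3, 3): "TLS 1.2",
}
MAX_RECORD_DATA = 18432   # 16384 payload + 1024 compression + 1024 encryption


def parse_record_header(header):
    """Decode HH V1:V2 L1:L2 into (type name, version name, data length)."""
    if len(header) != 5:
        raise ValueError("a record header is exactly 5 bytes")
    # "!" selects network (big-endian) byte order, so H yields 256*L1 + L2.
    content_type, major, minor, length = struct.unpack("!BBBH", header)
    if content_type not in CONTENT_TYPES:
        raise ValueError("unknown record type %d" % content_type)
    if length > MAX_RECORD_DATA:
        raise ValueError("record too long: %d bytes" % length)
    version = VERSIONS.get((major, minor), "unknown")
    return CONTENT_TYPES[content_type], version, length
```

A receiver would read five bytes, call this function, then read exactly `length` more bytes before handing the data to the decryption step.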

In full details, the building of a record works like this:

  • Initially, there are some bytes to transfer; these are application data or some other kind of bytes. This payload consists of at most 16384 bytes, but possibly less (a payload of length 0 is legal, but it turns out that Internet Explorer 6.0 does not like that at all).
  • The payload is then compressed with whatever compression algorithm is currently agreed upon. Compression is stateful, and thus may depend upon the contents of previous records. In practice, compression is either “null” (no compression at all) or “Deflate” (RFC 3749), the latter being currently courteously but firmly shown the exit door in the Web context, due to the recent CRIME attack. Compression aims at shortening data, but it must necessarily expand it slightly in some unfavourable situations (due to the pigeonhole principle). SSL allows for an expansion of at most 1024 bytes. Of course, null compression never expands (but never shortens either); Deflate will expand by at most 10 bytes, if the implementation is any good.
  • The compressed payload is then protected against alterations and encrypted. If the current encryption-and-integrity algorithms are “null”, then this step is a no-operation. Otherwise, a MAC is appended, then some padding (depending on the encryption algorithm), and the result is encrypted. These steps again induce some expansion, which the SSL standard limits to 1024 extra bytes (combined with the maximum expansion from the compression step, this brings us to the 18432 bytes, to which we must add the 5-byte header).

The MAC is, usually, HMAC with one of the usual hash functions (mostly MD5, SHA-1 or SHA-256) (with SSLv3, this is not the “true” HMAC but something very similar and, to the best of our knowledge, as secure as HMAC). Encryption will use either a block cipher in CBC mode, or the RC4 stream cipher. Note that, in theory, other kinds of modes or algorithms could be employed, for instance one of these nifty modes which combine encryption and integrity checks; there are even some RFCs for that. In practice, though, deployed implementations do not know of these yet, so they do HMAC and CBC. Crucially, the MAC is first computed and appended to the data, and the result is encrypted. This is MAC-then-encrypt and it is actually not a very good idea. The MAC is computed over the concatenation of the (compressed) payload and a sequence number, so that an industrious attacker may not swap records.
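The MAC-then-pad part of that process can be sketched as below. Assumptions: the MAC input layout is the one used by TLS 1.0 to 1.2 (sequence number, type, version, length, then the fragment), the hash is SHA-1, and the padding is the minimal TLS-style padding for a 16-byte block cipher. The CBC encryption itself is left out because the Python standard library has no block cipher; function names are hypothetical.

```python
import hashlib
import hmac
import struct


def record_mac(mac_key, seq_num, content_type, version, fragment):
    """HMAC over an implicit 64-bit sequence number, the header fields and the data."""
    prefix = struct.pack("!QBBBH", seq_num, content_type,
                         version[0], version[1], len(fragment))
    return hmac.new(mac_key, prefix + fragment, hashlib.sha1).digest()


def mac_then_pad(mac_key, seq_num, content_type, version, fragment, block_size=16):
    """Build fragment || MAC || padding, i.e. the input to CBC encryption."""
    data = fragment + record_mac(mac_key, seq_num, content_type, version, fragment)
    # One length byte plus pad_len padding bytes, all holding the value pad_len.
    pad_len = (block_size - (len(data) + 1) % block_size) % block_size
    return data + bytes([pad_len]) * (pad_len + 1)


key = b"\x01" * 20
mac_first = record_mac(key, 0, 23, (3, 1), b"hello")
mac_second = record_mac(key, 1, 23, (3, 1), b"hello")
```

The same payload yields a different MAC under a different sequence number, which is what defeats record swapping.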

Handshake

The handshake is a protocol which is played within the record protocol. Its goal is to establish the algorithms and keys which are to be used for the records. It consists of messages. Each handshake message begins with a four-byte header, one byte which describes the message type, then three bytes for the message length (big-endian convention). The successive handshake messages are then sent with records tagged with the “handshake” type (first byte of the header of each record has value 22).

Note the layers: the handshake messages, complete with four-byte header, are then sent as records, and each record also has its own header. Furthermore, several handshake messages can be sent within the same record, and a given handshake message can be split over several records. From the point of view of the module which builds the handshake messages, the “records” are just a stream on which bytes can be sent; it is oblivious to the actual split of that stream into records.
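From the handshake layer’s point of view, the record payloads form one byte stream that has to be cut back into messages. A minimal sketch of that reassembly, with a hypothetical function name:

```python
def split_handshake_messages(stream):
    """Cut a byte stream into (type, body) handshake messages.

    Returns the complete messages plus the leftover bytes of a message
    whose remainder has not arrived yet (it may continue in the next record).
    """
    messages = []
    offset = 0
    while offset + 4 <= len(stream):
        msg_type = stream[offset]
        # Three-byte big-endian length follows the one-byte type.
        length = int.from_bytes(stream[offset + 1:offset + 4], "big")
        end = offset + 4 + length
        if end > len(stream):
            break   # incomplete message: wait for more data
        messages.append((msg_type, stream[offset + 4:end]))
        offset = end
    return messages, stream[offset:]
```

The caller keeps the leftover bytes and prepends them to the payload of the next handshake record.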

Full Handshake

Initially, client and server “agree upon” null encryption with no MAC and null compression. This means that the records they first send will be sent as cleartext and unprotected.

First message of a handshake is a ClientHello. It is the message by which the client states its intention to do some SSL. Note that “client” is a symbolic role; it means “the party which speaks first”. It so happens that in the HTTPS context, which is HTTP-within-SSL-within-TCP, all three layers have a notion of “client” and “server”, and they all agree (the TCP client is also the SSL client and the HTTP client), but that’s kind of a coincidence.

The ClientHello message contains:

  • the maximum protocol version that the client wishes to support;
  • the “client random” (32 bytes, out of which 28 are supposed to be generated with a cryptographically strong number generator);
  • the “session ID” (in case the client wants to resume a session in an abbreviated handshake, see below);
  • the list of “cipher suites” that the client knows of, ordered by client preference;
  • the list of compression algorithms that the client knows of, ordered by client preference;
  • some optional extensions.

A cipher suite is a 16-bit symbolic identifier for a set of cryptographic algorithms. For instance, the TLS_RSA_WITH_AES_128_CBC_SHA cipher suite has value 0x002F, and means “records use HMAC/SHA-1 and AES encryption with a 128-bit key, and the key exchange is done by encrypting a random key with the server’s RSA public key”.
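As an illustration, the symbolic name can be split into its parts. The table below holds only a handful of registered suites, and the parsing relies on the naming pattern of these particular entries; the helper name is hypothetical.

```python
# A few registered cipher suite identifiers (far from the full registry).
CIPHER_SUITES = {
    0x0005: "TLS_RSA_WITH_RC4_128_SHA",
    0x002F: "TLS_RSA_WITH_AES_128_CBC_SHA",
    0x0033: "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
    0x0035: "TLS_RSA_WITH_AES_256_CBC_SHA",
    0xC013: "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA",
}


def describe_cipher_suite(suite_id):
    """Split a suite name into key exchange, record cipher and MAC hash."""
    name = CIPHER_SUITES[suite_id]
    key_exchange, _, rest = name[len("TLS_"):].partition("_WITH_")
    cipher, _, mac = rest.rpartition("_")
    return {"name": name, "key_exchange": key_exchange,
            "cipher": cipher, "mac": mac}


def encode_cipher_suite(suite_id):
    """On the wire, a suite is just its 16-bit value in big-endian order."""
    return suite_id.to_bytes(2, "big")
```

In these names “SHA” stands for HMAC with SHA-1, as in the example above.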

The server responds to the ClientHello with a ServerHello which contains:

  • the protocol version that the client and server will use;
  • the “server random” (32 bytes, with 28 random bytes);
  • the session ID for this connection;
  • the cipher suite that will be used;
  • the compression algorithm that will be used;
  • optionally, some extensions.

The full handshake looks like this:

  Client                                               Server

  ClientHello                  -------->
                                                  ServerHello
                                                 Certificate*
                                           ServerKeyExchange*
                                          CertificateRequest*
                               <--------      ServerHelloDone
  Certificate*
  ClientKeyExchange
  CertificateVerify*
  [ChangeCipherSpec]
  Finished                     -------->
                                           [ChangeCipherSpec]
                               <--------             Finished
  Application Data             <------->     Application Data

(This schema has been shamelessly copied from the RFC.)

We see the ClientHello and ServerHello. Then, the server sends a few other messages, which depend on the cipher suite and some other parameters:

  • Certificate: the server’s certificate, which contains its public key. More on that below. This message is almost always sent, except if the cipher suite mandates a handshake without a certificate.
  • ServerKeyExchange: some extra values for the key exchange, if what is in the certificate is not sufficient. In particular, the “DHE” cipher suites use an ephemeral Diffie-Hellman key exchange, which requires that message.
  • CertificateRequest: a message requesting that the client also identifies itself with a certificate of its own. This message contains the list of names of trust anchors (aka “root certificates”) that the server will use to validate the client certificate.
  • ServerHelloDone: a marker message (of length zero) which says that the server is finished, and the client should now talk.

The client must then respond with:

  • Certificate: the client certificate, if the server requested one. There are subtle variations between versions (with SSLv3, the client must omit this message if it does not have a certificate; with TLS 1.0+, in the same situation, it must send a Certificate message with an empty list of certificates).
  • ClientKeyExchange: the client part of the actual key exchange (e.g. some random value encrypted with the server RSA key).
  • CertificateVerify: a digital signature computed by the client over all previous handshake messages. This message is sent when the server requested a client certificate, and the client complied. This is how the client proves to the server that it really “owns” the public key which is encoded in the certificate it sent.

Then the client sends a ChangeCipherSpec message, which is not a handshake message: it has its own record type, so it will be sent in a record of its own. Its contents are purely symbolic (a single byte of value 1). This message marks the point at which the client switches to the newly negotiated cipher suite and keys. The subsequent records from the client will then be encrypted.

The Finished message is a cryptographic checksum computed over all previous handshake messages (from both the client and server). Since it is emitted after the ChangeCipherSpec, it is also covered by the integrity check and the encryption. When the server receives that message and verifies its contents, it obtains a proof that it has indeed talked to the same client all along. This message protects the handshake from alterations (the attacker cannot modify the handshake messages and still get the Finished message right).
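For a concrete picture of that checksum, here is a sketch of how TLS 1.2 computes the Finished contents when the cipher suite uses the default SHA-256-based PRF (earlier versions use different hash combinations, so this is an assumption about the version in use). The function names are hypothetical, and the master secret is a placeholder.

```python
import hashlib
import hmac


def p_sha256(secret, seed, length):
    """TLS 1.2 data expansion: chained HMAC-SHA256 outputs, truncated."""
    output = b""
    a = seed                                              # A(0) = seed
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i)
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]


def finished_verify_data(master_secret, label, handshake_messages):
    """12 bytes binding the master secret to the whole handshake transcript."""
    transcript_hash = hashlib.sha256(handshake_messages).digest()
    return p_sha256(master_secret, label + transcript_hash, 12)


master = b"\x00" * 48                         # placeholder master secret
transcript = b"ClientHello...ServerHello..."  # stand-in for the real messages
client_check = finished_verify_data(master, b"client finished", transcript)
```

Changing a single byte of the transcript changes the result, so an attacker who tampered with earlier handshake messages cannot produce a matching Finished without the master secret.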

The server finally responds with its own ChangeCipherSpec then Finished. At that point, the handshake is finished, and the client and server may exchange application data (in encrypted records tagged as such).

To remember: the client suggests but the server chooses. The cipher suite is in the hands of the server. Courteous servers are supposed to follow the preferences of the client (if possible), but they can do otherwise and some actually do (e.g. as part of protection against BEAST).
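The “client suggests, server chooses” rule can be sketched as a small selection function. The name `choose_cipher_suite` and the `honor_client_order` switch are hypothetical; real servers expose this choice through their own configuration options.

```python
def choose_cipher_suite(client_suites, server_suites, honor_client_order=True):
    """Pick the suite the server will announce in its ServerHello.

    client_suites: identifiers from the ClientHello, in client preference order.
    server_suites: identifiers the server supports, in server preference order.
    Returns None when there is no suite in common (the handshake then fails).
    """
    if honor_client_order:
        preferred, allowed = client_suites, set(server_suites)
    else:
        preferred, allowed = server_suites, set(client_suites)
    for suite in preferred:
        if suite in allowed:
            return suite
    return None


client = [0x002F, 0x0005, 0x0035]   # client prefers AES-128-CBC
server = [0x0005, 0x002F]           # server prefers RC4
```

With these lists, a courteous server picks 0x002F, while a server enforcing its own order picks 0x0005.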

Abbreviated Handshake

In the full handshake, the server sends a “session ID” (i.e. a bunch of up to 32 bytes) to the client. Later on, the client can come back and send the same session ID as part of his ClientHello. This means that the client still remembers the cipher suite and keys from the previous handshake and would like to reuse these parameters. If the server also remembers the cipher suite and keys, then it copies that specific session ID in its ServerHello, and then follows the abbreviated handshake:

  Client                                                Server

  ClientHello                   -------->
                                                   ServerHello
                                            [ChangeCipherSpec]
                                <--------             Finished
  [ChangeCipherSpec]
  Finished                      -------->
  Application Data              <------->     Application Data

The abbreviated handshake is shorter: fewer messages, no asymmetric cryptography business, and, most importantly, reduced latency. Web browsers and servers do that a lot. A typical Web browser will open an SSL connection with a full handshake, then do abbreviated handshakes for all other connections to the same server: the other connections it opens in parallel, and also the subsequent connections to the same server. Indeed, typical Web servers will close connections after 15 seconds of inactivity, but they will remember sessions (the cipher suite and keys) for a lot longer (possibly for hours or even days).
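On the server side, remembering sessions amounts to a lookup table keyed by session ID. A minimal in-memory sketch follows; the class name, the one-hour default lifetime and the `now` parameter (present only to make expiry testable) are all assumptions.

```python
import secrets
import time


class SessionCache:
    """Remembers (cipher suite, master secret) per session ID."""

    def __init__(self, lifetime_seconds=3600):
        self.lifetime = lifetime_seconds
        self._sessions = {}

    def store(self, cipher_suite, master_secret, now=None):
        """Called after a full handshake; returns the new session ID."""
        now = time.time() if now is None else now
        session_id = secrets.token_bytes(32)   # session IDs are at most 32 bytes
        self._sessions[session_id] = (cipher_suite, master_secret,
                                      now + self.lifetime)
        return session_id

    def resume(self, session_id, now=None):
        """Return (cipher_suite, master_secret), or None to force a full handshake."""
        now = time.time() if now is None else now
        entry = self._sessions.get(session_id)
        if entry is None or entry[2] < now:
            self._sessions.pop(session_id, None)   # forget expired sessions
            return None
        return entry[0], entry[1]
```

When `resume` returns None, the server simply answers with a fresh session ID and the full handshake takes place.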

Key Exchange

There are several key exchange algorithms which SSL can use. This is specified by the cipher suite; each key exchange algorithm works with some kinds of server public key. The most common key exchange algorithms are:

  • RSA: the server’s key is of type RSA. The client generates a random value (the “pre-master secret” of 48 bytes, out of which 46 are random) and encrypts it with the server’s public key. There is no ServerKeyExchange.
  • DHE_RSA: the server’s key is of type RSA, but used only for signature. The actual key exchange uses Diffie-Hellman. The server sends a ServerKeyExchange message containing the DH parameters (modulus, generator) and a newly-generated DH public key; moreover, the server signs this message. The client will respond with a ClientKeyExchange message which also contains a newly-generated DH public key. The DH yields the “pre-master secret”.
  • DHE_DSS: like DHE_RSA, but the server has a DSS key (“DSS” is also known as “DSA”). DSS is a signature-only algorithm.
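For the RSA case, the “48 bytes, out of which 46 are random” layout can be sketched as follows: the two non-random bytes carry the protocol version the client offered in its ClientHello. The function name is hypothetical, and the RSA encryption step is omitted.

```python
import secrets


def make_rsa_premaster_secret(client_version=(3, 3)):
    """Build the 48-byte pre-master secret that the client then RSA-encrypts.

    client_version: (major, minor) offered in the ClientHello, e.g. (3, 1)
    for TLS 1.0. Repeating it here lets the server detect version rollback.
    """
    return bytes(client_version) + secrets.token_bytes(46)


premaster = make_rsa_premaster_secret((3, 1))
```

The server decrypts the ClientKeyExchange with its private key and is expected to check that the first two bytes match the version from the ClientHello.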

Less commonly used key exchange algorithms include:

  • DH: the server’s key is of type Diffie-Hellman (we are talking of a certificate which contains a DH key). This used to be “popular” in an administrative way (US federal government mandated its use) when the RSA patent was still active (this was during the previous century). Despite the bureaucratic push, it was never as widely deployed as RSA.
  • DH_anon: like the DHE suites, but without the signature from the server. This is a certificate-less cipher suite. By construction, it is vulnerable to Man-in-the-Middle attacks, thus very rarely enabled at all.
  • PSK: pre-shared key cipher suites. A symmetric-only key exchange, building on a pre-established shared secret.
  • SRP: application of the SRP protocol, which is a Password Authenticated Key Exchange protocol. Client and server authenticate each other with regard to a shared secret, which can be a low-entropy password (whereas PSK requires a high-entropy shared secret). Very nifty. Not widely supported yet.
  • An ephemeral RSA key: like DHE but with a newly-generated RSA key pair. Since generating RSA keys is expensive, this is not a popular option, and was specified only as part of “export” cipher suites which complied with the pre-2000 US export regulations on cryptography (i.e. RSA keys of at most 512 bits). Nobody does that nowadays.
  • Variants of the DH* algorithms with elliptic curves. Very fashionable. Should become common in the future.

Certificates and Authentication

Digital certificates are vessels for asymmetric keys. They are intended to solve key distribution. Namely, the client wants to use the server’s public key. The attacker will try to make the client use the attacker’s public key. So the client must have a way to make sure that it is using the right key.

SSL is supposed to use X.509. This is a standard for certificates. Each certificate is signed by a Certification Authority. The idea is that the client inherently knows the public keys of a handful of CAs (these are the “trust anchors” or “root certificates”). With these keys, the client can verify the signature computed by a CA over a certificate which has been issued to the server. This process can be extended recursively: a CA can issue a certificate for another CA (i.e. sign the certificate structure which contains the other CA name and key). A chain of certificates beginning with a root CA and ending with the server’s certificate, with intermediate CA certificates in between, each certificate being signed relatively to the public key which is encoded in the previous certificate, is called, unimaginatively, a certificate chain.

So the client is supposed to do the following:

  • Get a certificate chain ending with the server’s certificate. The Certificate message from the server is supposed to contain, precisely, such a chain.
  • Validate the chain, i.e. verify all the signatures and names and the various X.509 bits. Also, the client should check revocation status of all the certificates in the chain, which is complex and heavy (Web browsers now do it, more or less, but it is a recent development).
  • Verify that the intended server name is indeed written in the server’s certificate. Because the client does not only want to use a validated public key, it also wants to use the public key of a specific server. See RFC 2818 for details on how this is done in a HTTPS context.
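In practice a client delegates these steps to its TLS library. As a sketch, Python’s standard `ssl` module enables chain validation and host name checking when a context is built with `ssl.create_default_context()`; revocation checking is not part of these defaults. The helper `open_verified_connection` is a hypothetical name.

```python
import socket
import ssl

# Loads the system trust anchors and turns on both validation steps.
context = ssl.create_default_context()
chain_is_validated = context.verify_mode == ssl.CERT_REQUIRED
name_is_checked = context.check_hostname


def open_verified_connection(host, port=443):
    """Connect and run the handshake; raises ssl.SSLError on a bad certificate."""
    sock = socket.create_connection((host, port))
    try:
        # server_hostname is the name that must appear in the certificate
        # (it is also sent in the Server Name Indication extension).
        return context.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
```

Disabling either setting on the context recreates exactly the situation where the attacker can substitute his own public key.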

The certification model with X.509 certificates has often been criticized, not really on technical grounds, but rather for politico-economic reasons. It concentrates validation power into the hands of a few players, who are not necessarily well-intentioned, or at least not always competent. Now and again, proposals for other systems are published (e.g. Convergence or DNSSEC) but none has gained wide acceptance (yet).

For certificate-based client authentication, it is entirely up to the server to decide what to do with a client certificate (and also what to do with a client who declined to send a certificate). In the Windows/IIS/Active Directory world, a client certificate should contain an account name as a “User Principal Name” (encoded in a Subject Alt Name extension of the certificate); the server looks it up in its Active Directory server.

Handshake Again

Since a handshake is just some messages which are sent as records with the current encryption/compression conventions, nothing theoretically prevents a SSL client and server from doing a second handshake within an established SSL connection. And, indeed, it is supported and it happens in practice.

At any time, the client or the server can initiate a new handshake (the server can send a HelloRequest message to trigger it; the client just sends a ClientHello). A typical situation is the following:

  • An HTTPS server is configured to listen to SSL requests.
  • A client connects and a handshake is performed.
  • Once the handshake is done, the client sends its application data, which consists of an HTTP request. At that point (and at that point only), the server learns the target path. Up to that point, the URL which the client wishes to reach was unknown to the server (the server might have been made aware of the target server name through a Server Name Indication SSL extension, but this does not include the path).
  • Upon seeing the path, the server may learn that this is for a part of its data which is supposed to be accessed only by clients authenticated with certificates. But the server did not ask for a client certificate in the handshake (in particular because not-so-old Web browsers displayed freakish popups when asked for a certificate, in particular if they did not have one, so a server would refrain from asking a certificate if it did not have good reason to believe that the client has one and knows how to use it).
  • Therefore, the server triggers a new handshake, this time requesting a certificate.

There is an interesting weakness in the situation I just described; see RFC 5746 for a workaround. In a conceptual way, SSL transfers security characteristics only in the “forward” way. When doing a new handshake, whatever could be known about the client before the new handshake is still valid after (e.g. if the client had sent a good username+password within the tunnel) but not the other way round. In the situation above, the first HTTP request which was received before the new handshake is not covered by the certificate-based authentication of the second handshake, and it could have been chosen by the attacker! Unfortunately, some Web servers just assumed that the client authentication from the second handshake extended to what was sent before that second handshake, and it allowed some nasty tricks from the attacker. RFC 5746 attempts to fix that.

Alerts

Alert messages are just warning and error messages. They are rather uninteresting except when they can be subverted by some attacks (see later on).

There is an important alert message, called close_notify: it is a message which the client or the server sends when it wishes to close the connection. Upon receiving this message, the server or client must also respond with a close_notify and then consider the tunnel to be closed (but the session is still valid, and can be reused in a later abbreviated handshake). The interesting part is that these alert messages are, like all other records, protected by the encryption and MAC. Thus, the connection closure is covered by the cryptographic umbrella.

This is important in the context of (old) HTTP, where some data can be sent by the server without an explicit “content-length”: the data extends until the end of the transport stream. Old HTTP with SSLv2 (which did not have the close_notify) allowed an attacker to force a connection close (at the TCP level) which the client would have taken for a normal close; thus, the attacker could truncate the data without being caught. This is one of the problems with SSLv2 (arguably, the worst) and SSLv3 fixes it. Note that “modern” HTTP uses “Content-Length” headers and/or chunked encoding, which is not vulnerable to such truncation, even if the SSL layer allowed it. Still, it is nice to know that SSL offers protection on closure events.
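The truncation check boils down to asking whether the stream ended with a close_notify. The sketch below works on already decrypted and verified records, given as hypothetical (type, payload) pairs; an alert body is two bytes, a level (1 for warning) followed by a description (0 for close_notify).

```python
ALERT = 21
APPLICATION_DATA = 23
CLOSE_NOTIFY = bytes([1, 0])   # level = warning (1), description = close_notify (0)


def read_until_close(records):
    """Collect application data; report whether the peer closed cleanly.

    records: iterable of (content_type, payload) pairs that have already
    passed decryption and MAC verification.
    """
    data = bytearray()
    for content_type, payload in records:
        if content_type == APPLICATION_DATA:
            data += payload
        elif content_type == ALERT and payload == CLOSE_NOTIFY:
            return bytes(data), True    # authenticated end of stream
    return bytes(data), False           # transport ended first: possible truncation
```

An application relying on end-of-stream to delimit its data should treat a False flag as an error rather than as a normal close.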

Attacks

There is a limit on Stack Exchange answer length, so the description of some attacks on SSL will be in another answer (besides, I have some pancakes to cook). Stay tuned.

TLS handshake

http://chimera.labs.oreilly.com/books/1230000000545/ch04.html

The SSL protocol was originally developed at Netscape to enable ecommerce transaction security on the Web, which required encryption to protect customers’ personal data, as well as authentication and integrity guarantees to ensure a safe transaction. To achieve this, the SSL protocol was implemented at the application layer, directly on top of TCP (Figure 4-1), enabling protocols above it (HTTP, email, instant messaging, and many others) to operate unchanged while providing communication security when communicating across the network.

When SSL is used correctly, a third-party observer can only infer the connection endpoints, type of encryption, as well as the frequency and an approximate amount of data sent, but cannot read or modify any of the actual data.

Figure 4-1. Transport Layer Security (TLS)

When the SSL protocol was standardized by the IETF, it was renamed to Transport Layer Security (TLS). Many use the TLS and SSL names interchangeably, but technically, they are different, since each describes a different version of the protocol.

SSL 2.0 was the first publicly released version of the protocol, but it was quickly replaced by SSL 3.0 due to a number of discovered security flaws. Because the SSL protocol was proprietary to Netscape, the IETF formed an effort to standardize the protocol, resulting in RFC 2246, which became known as TLS 1.0 and is effectively an upgrade to SSL 3.0:

The differences between this protocol and SSL 3.0 are not dramatic, but they are significant enough to preclude interoperability between TLS 1.0 and SSL 3.0.

The TLS Protocol RFC 2246

Since the publication of TLS 1.0 in January 1999, two new versions have been produced by the IETF working group to address found security flaws, as well as to extend the capabilities of the protocol: TLS 1.1 in April 2006 and TLS 1.2 in August 2008. Internally the SSL 3.0 implementation, as well as all subsequent TLS versions, are very similar, and many clients continue to support SSL 3.0 and TLS 1.0 to this day, although there are very good reasons to upgrade to newer versions to protect users from known attacks!

TLS was designed to operate on top of a reliable transport protocol such as TCP. However, it has also been adapted to run over datagram protocols such as UDP. The Datagram Transport Layer Security (DTLS) protocol, defined in RFC 6347, is based on the TLS protocol and is able to provide similar security guarantees while preserving the datagram delivery model.

Encryption, Authentication, and Integrity

The TLS protocol is designed to provide three essential services to all applications running above it: encryption, authentication, and data integrity. Technically, you are not required to use all three in every situation. You may decide to accept a certificate without validating its authenticity, but you should be well aware of the security risks and implications of doing so. In practice, a secure web application will leverage all three services.

Encryption
A mechanism to obfuscate what is sent from one computer to another.
Authentication
A mechanism to verify the validity of provided identification material.
Integrity
A mechanism to detect message tampering and forgery.
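
As a rough illustration of opting in or out of these services, here is a minimal sketch using Python's standard `ssl` module (the choice of Python is an assumption of this example; the original text does not name a language). It places the default client settings, which validate the certificate, next to a deliberately unsafe context that skips validation. The TLS 1.2 floor is an illustrative policy choice, not a requirement stated above.

```python
import ssl

# Default client context: certificate validation and hostname checks are on.
secure_ctx = ssl.create_default_context()
# Refuse protocol versions older than TLS 1.2 (illustrative policy choice).
secure_ctx.minimum_version = ssl.TLSVersion.TLSv1_2

# Unsafe variant: traffic is still encrypted, but the peer is never authenticated.
# check_hostname must be disabled before verify_mode can be set to CERT_NONE.
insecure_ctx = ssl.create_default_context()
insecure_ctx.check_hostname = False
insecure_ctx.verify_mode = ssl.CERT_NONE

print("secure:", secure_ctx.verify_mode, secure_ctx.check_hostname)
print("insecure:", insecure_ctx.verify_mode, insecure_ctx.check_hostname)
```

With the second context, encryption and integrity still apply, but nothing ties the peer to the identity you expected.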

In order to establish a cryptographically secure data channel, the connection peers must agree on which ciphersuites will be used and the keys used to encrypt the data. The TLS protocol specifies a well-defined handshake sequence to perform this exchange, which we will examine in detail in “TLS Handshake”. The ingenious part of this handshake, and the reason TLS works in practice, is its use of public key cryptography (also known as asymmetric key cryptography), which allows the peers to negotiate a shared secret key without having to establish any prior knowledge of each other, and to do so over an unencrypted channel.

As part of the TLS handshake, the protocol also allows both connection peers to authenticate their identity. When used in the browser, this authentication mechanism allows the client to verify that the server is who it claims to be (e.g., your bank) and not someone simply pretending to be the destination by spoofing its name or IP address. This verification is based on the established chain of trust (see “Chain of Trust and Certificate Authorities”). In addition, the server can also optionally verify the identity of the client: for example, a company proxy server can authenticate all employees, each of whom could have their own unique certificate signed by the company.

Finally, with encryption and authentication in place, the TLS protocol also provides its own message framing mechanism and signs each message with a message authentication code (MAC). The MAC algorithm is a one-way cryptographic hash function (effectively a checksum), the keys to which are negotiated by both connection peers. Whenever a TLS record is sent, a MAC value is generated and appended for that message, and the receiver is then able to compute and verify the sent MAC value to ensure message integrity and authenticity.
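
The per-record integrity check can be sketched with Python's standard `hmac` module. This is a simplified illustration, not the exact TLS construction: the function names and the choice of HMAC-SHA256 are assumptions of this example, and the real record MAC covers additional header fields.

```python
import hashlib
import hmac

def compute_mac(mac_key: bytes, record: bytes) -> bytes:
    # Keyed hash over the record payload; only holders of mac_key can produce it.
    return hmac.new(mac_key, record, hashlib.sha256).digest()

def verify_mac(mac_key: bytes, record: bytes, received_mac: bytes) -> bool:
    # Recompute the MAC and compare in constant time.
    return hmac.compare_digest(compute_mac(mac_key, record), received_mac)

key = b"negotiated-mac-key-placeholder"  # hypothetical key, normally derived in the handshake
payload = b"GET / HTTP/1.1"
tag = compute_mac(key, payload)

print(verify_mac(key, payload, tag))            # untouched record
print(verify_mac(key, b"GET /x HTTP/1.1", tag)) # modified in transit
```

A modified record fails verification because the attacker cannot recompute the tag without the negotiated key.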

Combined, all three mechanisms serve as a foundation for secure communication on the Web. All modern web browsers provide support for a variety of ciphersuites, are able to authenticate both the client and server, and transparently perform message integrity checks for every record.

TLS Handshake

Before the client and the server can begin exchanging application data over TLS, the encrypted tunnel must be negotiated: the client and the server must agree on the version of the TLS protocol, choose the ciphersuite, and verify certificates if necessary. Unfortunately, each of these steps requires new packet roundtrips (Figure 4-2) between the client and the server, which adds startup latency to all TLS connections.

Figure 4-2. TLS handshake protocol

Figure 4-2 assumes the same 28 millisecond one-way “light in fiber” delay between New York and London as used in previous TCP connection establishment examples; see Table 1-1.

0 ms

TLS runs over a reliable transport (TCP), which means that we must first complete the TCP three-way handshake, which takes one full roundtrip.

56 ms

With the TCP connection in place, the client sends a number of specifications in plain text, such as the version of the TLS protocol it is running, the list of supported ciphersuites, and other TLS options it may want to use.

84 ms

The server picks the TLS protocol version for further communication, decides on a ciphersuite from the list provided by the client, attaches its certificate, and sends the response back to the client. Optionally, the server can also send a request for the client’s certificate and parameters for other TLS extensions.

112 ms

Assuming both sides are able to negotiate a common version and cipher, and the client is happy with the certificate provided by the server, the client initiates either the RSA or the Diffie-Hellman key exchange, which is used to establish the symmetric key for the ensuing session.

140 ms

The server processes the key exchange parameters sent by the client, checks message integrity by verifying the MAC, and returns an encrypted “Finished” message back to the client.

168 ms

The client decrypts the message with the negotiated symmetric key, verifies the MAC, and if all is well, then the tunnel is established and application data can now be sent.

Negotiating a secure TLS tunnel is a complicated process, and there are many ways to get it wrong. The good news is all the work just shown will be done for us by the server and the browser, and all we need to do is provide and configure the certificates.

Having said that, while our web applications do not have to drive the preceding exchange, it is nonetheless important to realize that every TLS connection will require up to two extra roundtrips on top of the TCP handshake—that’s a long time to wait before any application data can be exchanged! If not managed carefully, delivering application data over TLS can add hundreds, if not thousands of milliseconds of network latency.
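
The timeline arithmetic can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch: the function name is made up for this example, and it ignores processing time and packet loss.

```python
def handshake_timeline_ms(one_way_delay_ms: float, tls_roundtrips: int = 2) -> dict:
    """Estimate when the TCP and TLS handshakes complete, counting only propagation delay."""
    rtt = 2 * one_way_delay_ms
    tcp_established = rtt  # SYN and SYN-ACK; the final ACK travels with the ClientHello
    tls_established = tcp_established + tls_roundtrips * rtt
    return {"tcp_established_ms": tcp_established, "tls_established_ms": tls_established}

full = handshake_timeline_ms(28)                           # New York to London, full handshake
abbreviated = handshake_timeline_ms(28, tls_roundtrips=1)  # one TLS roundtrip instead of two

print(full)
print(abbreviated)
```

With a 28 ms one-way delay, the full handshake finishes at 168 ms, matching the last step of the timeline, while a single TLS roundtrip would finish at 112 ms.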

RSA, Diffie-Hellman and Forward Secrecy

Due to a variety of historical and commercial reasons the RSA handshake has been the dominant key exchange mechanism in most TLS deployments: the client generates a symmetric key, encrypts it with the server’s public key, and sends it to the server to use as the symmetric key for the established session. In turn, the server uses its private key to decrypt the sent symmetric key and the key-exchange is complete. From this point forward the client and server use the negotiated symmetric key to encrypt their session.

The RSA handshake works, but has a critical weakness: the same public-private key pair is used both to authenticate the server and to encrypt the symmetric session key sent to the server. As a result, if an attacker gains access to the server’s private key and listens in on the exchange, then they can decrypt the entire session. Worse, even if an attacker does not currently have access to the private key, they can still record the encrypted session and decrypt it at a later time once they obtain the private key.

By contrast, the Diffie-Hellman key exchange allows the client and server to negotiate a shared secret without explicitly communicating it in the handshake: the server’s private key is used to sign and verify the handshake, but the established symmetric key never leaves the client or server and cannot be intercepted by a passive attacker even if they have access to the private key.

For the curious, the Wikipedia article on Diffie-Hellman key exchange is a great place to learn about the algorithm and its properties.

Best of all, Diffie-Hellman key exchange can be used to reduce the risk of compromise of past communication sessions: we can generate a new “ephemeral” symmetric key as part of each and every key exchange and discard the previous keys. As a result, because the ephemeral keys are never communicated and are actively renegotiated for each new session, the worst-case scenario is that an attacker could compromise the client or server and access the session keys of the current and future sessions. However, knowing the private key, or the ephemeral key, for those sessions does not help the attacker decrypt any of the previous sessions!

The combination of Diffie-Hellman and the use of ephemeral session keys is what enables “Forward Secrecy”: even if an attacker gains access to the server’s private key they are not able to passively listen in on the active session, nor can they decrypt previously recorded sessions.
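
To make the ephemeral exchange concrete, here is a toy Diffie-Hellman sketch in Python. The parameters (a 127-bit Mersenne prime and base 3) are assumptions chosen for readability and are far too weak for real use; production TLS relies on standardized groups or elliptic curves, and the server additionally signs its public value.

```python
import secrets

# Toy public parameters, NOT secure: a 127-bit Mersenne prime and a small base.
P = 2**127 - 1
G = 3

def generate_keypair():
    # Fresh (ephemeral) private exponent in [2, P-2] for every session.
    private = secrets.randbelow(P - 3) + 2
    public = pow(G, private, P)
    return private, public

def shared_secret(own_private: int, peer_public: int) -> int:
    # Each side combines its own private value with the peer's public value.
    return pow(peer_public, own_private, P)

def run_session():
    client_private, client_public = generate_keypair()
    server_private, server_public = generate_keypair()
    # Only the public values cross the wire; both sides derive the same secret.
    client_view = shared_secret(client_private, server_public)
    server_view = shared_secret(server_private, client_public)
    return client_view, server_view

client_view, server_view = run_session()
print(client_view == server_view)
```

Because the private exponents are generated per session and then discarded, a later compromise of the server's long-term signing key reveals nothing about them.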

Despite the historical dominance of the RSA handshake, it is now being actively phased out to address the weaknesses we saw above: all the popular browsers prefer ciphers that enable forward secrecy (i.e. Diffie-Hellman key exchange), and may only enable certain protocol optimizations when forward secrecy is available. Long story short, consult your server documentation on how to enable forward secrecy.

Application Layer Protocol Negotiation (ALPN)

Two network peers may want to use a custom application protocol to communicate with each other. One way to resolve this is to determine the protocol upfront, assign a well-known port to it (e.g., port 80 for HTTP, port 443 for TLS), and configure all clients and servers to use it. However, in practice, this is a slow and impractical process: each port assignment must be approved and, worse, firewalls and other intermediaries often permit traffic only on ports 80 and 443.

As a result, to enable easy deployment of custom protocols, we must reuse ports 80 or 443 and use an additional mechanism to negotiate the application protocol. Port 80 is reserved for HTTP, and the HTTP specification provides a special Upgrade flow for this very purpose. However, the use of Upgrade can add an extra network roundtrip of latency, and in practice is often unreliable in the presence of many intermediaries; see “Proxies, Intermediaries, TLS, and New Protocols on the Web”.

For a hands-on example of HTTP Upgrade flow, flip ahead to “Upgrading to HTTP/2”.

The solution is, you guessed it, to use port 443, which is reserved for secure HTTPS sessions (running over TLS). The use of an end-to-end encrypted tunnel obfuscates the data from intermediate proxies and enables a quick and reliable way to deploy new and arbitrary application protocols. However, while use of TLS addresses reliability, we still need a way to negotiate the protocol!

An HTTPS session could, of course, reuse the HTTP Upgrade mechanism to perform the required negotiation, but this would result in another full roundtrip of latency. What if we could negotiate the protocol as part of the TLS handshake itself?

As the name implies, Application Layer Protocol Negotiation (ALPN) is a TLS extension that introduces support for application protocol negotiation into the TLS handshake (Figure 4-2), thereby eliminating the need for an extra roundtrip required by the HTTP Upgrade workflow. Specifically, the process is as follows:

  • The client appends a new ProtocolNameList field, containing the list of supported application protocols, into the ClientHello message.
  • The server inspects the ProtocolNameList field and returns a ProtocolName field indicating the selected protocol as part of the ServerHello message.

The server may respond with only a single protocol name, and if it does not support any that the client requests, then it may choose to abort the connection. As a result, once the TLS handshake is complete, the secure tunnel is established and the client and server agree on which application protocol will be used, so they can begin communicating immediately.
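
The two ALPN steps can be sketched in Python. The client part uses the standard `ssl` module's `set_alpn_protocols`; the `select_protocol` helper is a hypothetical stand-in for the server's choice and assumes the server picks by its own preference order, which is a common policy but not the only possible one.

```python
import ssl

def select_protocol(client_protocols, server_preferences):
    """Return the first server-preferred protocol that the client also offered, else None."""
    offered = set(client_protocols)
    for name in server_preferences:
        if name in offered:
            return name
    return None  # no overlap: the server may abort the connection

# Client side: these names are placed in the ProtocolNameList of the ClientHello.
client_ctx = ssl.create_default_context()
client_ctx.set_alpn_protocols(["h2", "http/1.1"])

print(select_protocol(["h2", "http/1.1"], ["h2", "http/1.1"]))
print(select_protocol(["spdy/3"], ["h2", "http/1.1"]))
```

On a real connection, the negotiated value is available from `SSLSocket.selected_alpn_protocol()` once the handshake has finished.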

ALPN eliminates the need for the HTTP Upgrade exchange, saving an extra roundtrip of latency. However, note that the TLS handshake itself still must be performed; hence ALPN negotiation is not any faster than HTTP Upgrade over an unencrypted channel. Instead, it ensures that application protocol negotiation over TLS is not any slower.

Server Name Indication (SNI)

An encrypted TLS tunnel can be established between any two TCP peers: the client only needs to know the IP address of the other peer to make the connection and perform the TLS handshake. However, what if the server wants to host multiple independent sites, each with its own TLS certificate, on the same IP address—how does that work? Trick question; it doesn’t.

To address the preceding problem, the Server Name Indication (SNI) extension was introduced to the TLS protocol, which allows the client to indicate the hostname the client is attempting to connect to at the start of the handshake. As a result, a web server can inspect the SNI hostname, select the appropriate certificate, and continue the handshake.
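
On the server side, certificate selection by SNI hostname might look like the following Python sketch. It uses the `sni_callback` hook of the standard `ssl` module (Python 3.7+); the hostnames are placeholders, and the `load_cert_chain` calls that would attach each site's certificate are omitted because the file paths would be hypothetical.

```python
import ssl

def make_sni_callback(contexts_by_host):
    """Build a callback that swaps in the context matching the requested hostname."""
    def on_server_name(ssl_socket, server_name, initial_context):
        chosen = contexts_by_host.get(server_name)
        if chosen is not None:
            # Continue the handshake with the certificate configured for this site.
            ssl_socket.context = chosen
        return None  # None tells the ssl module to proceed with the handshake
    return on_server_name

# One context per hosted site; each would normally call load_cert_chain(...).
contexts = {
    "a.example": ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER),
    "b.example": ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER),
}

# The listening socket is wrapped with this context; the callback runs per ClientHello.
default_context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
default_context.sni_callback = make_sni_callback(contexts)
```

Clients supply the hostname through the `server_hostname` argument of `SSLContext.wrap_socket`.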

TLS Session Resumption

The extra latency and computational costs of the full TLS handshake impose a serious performance penalty on all applications that require secure communication. To help mitigate some of the costs, TLS provides an ability to resume or share the same negotiated secret key data between multiple connections.

Session Identifiers

The first Session Identifiers (RFC 5246) resumption mechanism was introduced in SSL 2.0, which allowed the server to create and send a 32-byte session identifier as part of its “ServerHello” message during the full TLS negotiation we saw earlier.

Internally, the server could then maintain a cache of session IDs and the negotiated session parameters for each peer. In turn, the client could then also store the session ID information and include the ID in the “ClientHello” message for a subsequent session, which serves as an indication to the server that the client still remembers the negotiated cipher suite and keys from previous handshake and is able to reuse them. Assuming both the client and the server are able to find the shared session ID parameters in their respective caches, then an abbreviated handshake (Figure 4-3) can take place. Otherwise, a full new session negotiation is required, which will generate a new session ID.

Figure 4-3. Abbreviated TLS handshake protocol

Leveraging session identifiers allows us to remove a full roundtrip, as well as the overhead of public key cryptography, which is used to negotiate the shared secret key. This allows a secure connection to be established quickly and with no loss of security, since we are reusing the previously negotiated session data.

In practice, most web applications attempt to establish multiple connections to the same host to fetch resources in parallel, which makes session resumption a must-have optimization to reduce latency and computational costs for both sides.

Most modern browsers intentionally wait for the first TLS connection to complete before opening new connections to the same server: subsequent TLS connections can reuse the SSL session parameters to avoid the costly handshake.

However, one of the practical limitations of the Session Identifiers mechanism is the requirement for the server to create and maintain a session cache for every client. This results in several problems on the server, which may see tens of thousands or even millions of unique connections every day: consumed memory for every open TLS connection, a requirement for session ID cache and eviction policies, and nontrivial deployment challenges for popular sites with many servers, which should, ideally, use a shared TLS session cache for best performance.

None of the preceding problems are impossible to solve, and many high-traffic sites are using session identifiers successfully today. But for any multiserver deployment, session identifiers will require some careful thinking and systems architecture to ensure a well operating session cache.
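
A server-side session cache with a simple eviction policy could be sketched as follows. This is an in-memory, single-process illustration in Python; the class name, the size limit, and the lifetime are assumptions of this example, and a multiserver deployment would need a shared store instead.

```python
import secrets
import time
from collections import OrderedDict

class SessionCache:
    """Maps 32-byte session IDs to negotiated parameters, with expiry and a size cap."""

    def __init__(self, max_entries=10000, lifetime_seconds=300.0):
        self.max_entries = max_entries
        self.lifetime_seconds = lifetime_seconds
        self._entries = OrderedDict()  # insertion order doubles as eviction order

    def store(self, session_params, now=None):
        now = time.time() if now is None else now
        session_id = secrets.token_bytes(32)
        self._entries[session_id] = (now + self.lifetime_seconds, session_params)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the oldest session
        return session_id

    def lookup(self, session_id, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get(session_id)
        if entry is None:
            return None  # unknown ID: fall back to a full handshake
        expires_at, session_params = entry
        if now >= expires_at:
            del self._entries[session_id]
            return None  # expired: fall back to a full handshake
        return session_params
```

A successful `lookup` corresponds to the abbreviated handshake; a miss, for any reason, forces a full negotiation and a new session ID.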

Session Tickets

To address this concern for server-side deployment of TLS session caches, the “Session Ticket” (RFC 5077) replacement mechanism was introduced, which removes the requirement for the server to keep per-client session state. Instead, if the client indicated that it supports Session Tickets, in the last exchange of the full TLS handshake, the server can include a New Session Ticket record, which includes all of the session data encrypted with a secret key known only by the server.

This session ticket is then stored by the client and can be included in the SessionTicket extension within the ClientHello message of a subsequent session. Thus, all session data is stored only on the client, but the ticket is still safe because it is encrypted with a key known only by the server.

The session identifier and session ticket mechanisms are commonly referred to as session caching and stateless resumption, respectively. The main improvement of stateless resumption is the removal of the server-side session cache, which simplifies deployment by requiring that the client provide the session ticket on every new connection to the server, at least until the ticket expires.

In practice, deploying session tickets across a set of load-balanced servers also requires some careful thinking and systems architecture: all servers must be initialized with the same session key, and an additional mechanism may be needed to periodically rotate the shared key across all servers.
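
The idea of a ticket that only the server can read can be illustrated with the Python standard library. This is a toy construction under stated assumptions: it builds a keystream from HMAC-SHA256 and adds an HMAC tag, purely because the standard library has no block cipher; real servers use vetted ciphers such as AES, and the function names here are hypothetical.

```python
import hashlib
import hmac
import json
import secrets

def _subkey(server_key: bytes, label: bytes) -> bytes:
    # Derive separate encryption and MAC keys from the single server secret.
    return hmac.new(server_key, label, hashlib.sha256).digest()

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]

def seal_ticket(server_key: bytes, session_state: dict) -> bytes:
    plaintext = json.dumps(session_state, sort_keys=True).encode()
    nonce = secrets.token_bytes(16)
    stream = _keystream(_subkey(server_key, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(_subkey(server_key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag  # opaque blob stored by the client

def open_ticket(server_key: bytes, ticket: bytes):
    nonce, ciphertext, tag = ticket[:16], ticket[16:-32], ticket[-32:]
    expected = hmac.new(_subkey(server_key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # forged, corrupted, or sealed under a different key
    stream = _keystream(_subkey(server_key, b"enc"), nonce, len(ciphertext))
    return json.loads(bytes(a ^ b for a, b in zip(ciphertext, stream)))
```

Any server holding the same `server_key` can open the ticket, which is why load-balanced deployments must share and rotate that key together.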

Chain of Trust and Certificate Authorities

Authentication is an integral part of establishing every TLS connection. After all, it is possible to carry out a conversation over an encrypted tunnel with any peer, including an attacker, and unless we can be sure that the computer we are speaking to is the one we trust, then all the encryption work could be for nothing. To understand how we can verify the peer’s identity, let’s examine a simple authentication workflow between Alice and Bob:

  • Both Alice and Bob generate their own public and private keys.
  • Both Alice and Bob hide their respective private keys.
  • Alice shares her public key with Bob, and Bob shares his with Alice.
  • Alice generates a new message for Bob and signs it with her private key.
  • Bob uses Alice’s public key to verify the provided message signature.

Trust is a key component of the preceding exchange. Specifically, public key encryption allows us to use the public key of the sender to verify that the message was signed with the right private key, but the decision to approve the sender is still one that is based on trust. In the exchange just shown, Alice and Bob could have exchanged their public keys when they met in person, and because they know each other well, they are certain that their exchange was not compromised by an impostor—perhaps they even verified their identities through another, secret (physical) handshake they had established earlier!

Next, Alice receives a message from Charlie, whom she has never met, but who claims to be a friend of Bob’s. In fact, to prove that he is friends with Bob, Charlie asked Bob to sign his own public key with Bob’s private key and attached this signature with his message (Figure 4-4). In this case, Alice first checks Bob’s signature of Charlie’s key. She knows Bob’s public key and is thus able to verify that Bob did indeed sign Charlie’s key. Because she trusts Bob’s decision to verify Charlie, she accepts the message and performs a similar integrity check on Charlie’s message to ensure that it is, indeed, from Charlie.

Figure 4-4. Chain of trust for Alice, Bob, and Charlie

What we have just done is established a chain of trust: Alice trusts Bob, Bob trusts Charlie, and by transitive trust, Alice decides to trust Charlie. As long as nobody in the chain gets compromised, this allows us to build and grow the list of trusted parties.
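
The Alice, Bob, and Charlie exchange can be sketched with textbook RSA signatures in Python (3.8+ for the modular inverse). Everything here is a toy under stated assumptions: the keys are built from well-known Mersenne primes, there is no padding, and the helper names are invented for this example, so it demonstrates the logic only and must not be used for real signing.

```python
import hashlib

E = 65537  # common public exponent

def make_keypair(p: int, q: int):
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # private exponent
    return (n, E), (n, d)              # (public key, private key)

def _digest(message: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(private_key, message: bytes) -> int:
    n, d = private_key
    return pow(_digest(message, n), d, n)

def verify(public_key, message: bytes, signature: int) -> bool:
    n, e = public_key
    return pow(signature, e, n) == _digest(message, n)

# Toy keys from Mersenne primes (insecure, chosen only to keep the example short).
bob_public, bob_private = make_keypair(2**89 - 1, 2**107 - 1)
charlie_public, charlie_private = make_keypair(2**61 - 1, 2**127 - 1)

# Bob vouches for Charlie by signing Charlie's public key.
charlie_key_bytes = repr(charlie_public).encode()
bobs_endorsement = sign(bob_private, charlie_key_bytes)

# Charlie signs his own message to Alice.
message = b"Hi Alice, I am a friend of Bob's."
message_signature = sign(charlie_private, message)

def alice_accepts(msg, msg_sig, sender_public, endorsement):
    # Step 1: did Bob (whose key Alice already trusts) sign the sender's key?
    if not verify(bob_public, repr(sender_public).encode(), endorsement):
        return False
    # Step 2: was the message signed with the sender's private key?
    return verify(sender_public, msg, msg_sig)

print(alice_accepts(message, message_signature, charlie_public, bobs_endorsement))
```

Alice needs only Bob's public key in advance; Charlie's key becomes usable because Bob's signature over it checks out.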

Authentication on the Web and in your browser follows the exact same process as shown. Which means that at this point you should be asking: whom does your browser trust, and whom do you trust when you use the browser? There are at least three answers to this question:

Manually specified certificates
Every browser and operating system provides a mechanism for you to manually import any certificate you trust. How you obtain the certificate and verify its integrity is completely up to you.
Certificate authorities
A certificate authority (CA) is a trusted third party that is trusted by both the subject (owner) of the certificate and the party relying upon the certificate.
The browser and the operating system
Every operating system and most browsers ship with a list of well-known certificate authorities. Thus, you also trust the vendors of this software to provide and maintain a list of trusted parties.

In practice, it would be impractical to store and manually verify each and every key for every website (although you can, if you are so inclined). Hence, the most common solution is to use certificate authorities (CAs) to do this job for us (Figure 4-5): the browser specifies which CAs to trust (root CAs), and the burden is then on the CAs to verify each site they sign, and to audit and verify that these certificates are not misused or compromised. If the security of any site with the CA’s certificate is breached, then it is also the responsibility of that CA to revoke the compromised certificate.

Figure 4-5. CA signing of digital certificates

Every browser allows you to inspect the chain of trust of your secure connection (Figure 4-6), usually accessible by clicking on the lock icon beside the URL.

Figure 4-6. Certificate chain of trust for igvita.com (Google Chrome, v25)
  • igvita.com certificate is signed by StartCom Class 1 Primary Intermediate Server.
  • StartCom Class 1 Primary Intermediate Server certificate is signed by the StartCom Certification Authority.
  • StartCom Certification Authority is a recognized root certificate authority.

The “trust anchor” for the entire chain is the root certificate authority, which in the case just shown, is the StartCom Certification Authority. Every browser ships with a pre-initialized list of trusted certificate authorities (“roots”), and in this case, the browser trusts and is able to verify the StartCom root certificate. Hence, through a transitive chain of trust in the browser, the browser vendor, and the StartCom certificate authority, we extend the trust to our destination site.

Every operating system vendor and every browser provide a public listing of all the certificate authorities they trust by default. Use your favorite search engine to find and investigate these lists.

In practice, there are hundreds of well-known and trusted certificate authorities, which is also a common complaint against the system. The large number of CAs creates a potentially large attack surface area against the chain of trust in your browser.

Certificate Revocation

Occasionally the issuer of a certificate will need to revoke or invalidate the certificate due to a number of possible reasons: the private key of the certificate has been compromised, the certificate authority itself has been compromised, or due to a variety of more benign reasons such as a superseding certificate, change in affiliation, and so on. To address this, the certificates themselves contain instructions (Figure 4-7) on how to check if they have been revoked. Hence, to ensure that the chain of trust is not compromised, each peer can check the status of each certificate by following the embedded instructions, along with the signatures, as it walks up the certificate chain.

Figure 4-7. CRL and OCSP instructions for igvita.com (Google Chrome, v25)

Certificate Revocation List (CRL)

Certificate Revocation List (CRL) is defined by RFC 5280 and specifies a simple mechanism to check the status of every certificate: each certificate authority maintains and periodically publishes a list of revoked certificate serial numbers. Anyone attempting to verify a certificate is then able to download the revocation list and check the presence of the serial number within it—if it is present, then it has been revoked.

The CRL file itself can be published periodically or on every update and can be delivered via HTTP, or any other file transfer protocol. The list is also signed by the CA, and is usually allowed to be cached for a specified interval. In practice, this workflow works quite well, but there are instances where the CRL mechanism may be insufficient:

  • The growing number of revocations means that the CRL list will only get longer, and each client must retrieve the entire list of serial numbers.
  • There is no mechanism for instant notification of certificate revocation—if the CRL was cached by the client before the certificate was revoked, then the CRL will deem the revoked certificate valid until the cache expires.
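
The caching limitation in the last bullet can be shown with a small Python sketch. The `CachedCrl` type, the serial numbers, and the timestamps are all hypothetical; a real client would also verify the CA's signature over the list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CachedCrl:
    revoked_serials: frozenset
    fetched_at: float
    max_age_seconds: float

    def is_fresh(self, now: float) -> bool:
        return now - self.fetched_at < self.max_age_seconds

def check_serial(serial: int, crl: CachedCrl, now: float) -> str:
    if not crl.is_fresh(now):
        return "refetch"  # cache expired: download the full list again
    return "revoked" if serial in crl.revoked_serials else "not listed"

# The client fetched the list at t=0 and may cache it for one hour.
cached = CachedCrl(frozenset({1001}), fetched_at=0.0, max_age_seconds=3600.0)

# Suppose the CA revokes serial 2002 at t=600; the cached copy cannot know that.
print(check_serial(2002, cached, now=900.0))   # still served from the stale cache
print(check_serial(2002, cached, now=4000.0))  # cache expired

# Only after downloading the updated list does the revocation become visible.
updated = CachedCrl(frozenset({1001, 2002}), fetched_at=4000.0, max_age_seconds=3600.0)
print(check_serial(2002, updated, now=4001.0))
```

Between the revocation and the cache expiry, the client keeps treating the revoked certificate as valid.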

Online Certificate Status Protocol (OCSP)

To address some of the limitations of the CRL mechanism, the Online Certificate Status Protocol (OCSP) was introduced by RFC 2560, which provides a mechanism to perform a real-time check for status of the certificate. Unlike the CRL, which contains all the revoked serial numbers, OCSP allows the verifier to query the certificate database directly for just the serial number in question while validating the certificate chain.

As a result, the OCSP mechanism should consume much less bandwidth and is able to provide real-time validation. However, no mechanism is perfect! The requirement to perform real-time OCSP queries creates several problems of its own:

  • The CA must be able to handle the load of the real-time queries.
  • The CA must ensure that the service is up and globally available at all times.
  • The client must block on OCSP requests before proceeding with the navigation.
  • Real-time OCSP requests may impair the client’s privacy because the CA knows which sites the client is visiting.

In practice, CRL and OCSP mechanisms are complementary, and most certificates will provide instructions and endpoints for both.

The more important part is the client support and behavior: some browsers distribute their own CRL lists, others fetch and cache the CRL files from the CAs. Similarly, some browsers will perform the real-time OCSP check but will differ in their behavior if the OCSP request fails. If you are curious, check your browser and OS certificate revocation settings!

TLS Record Protocol

Not unlike the IP or TCP layers below it, all data exchanged within a TLS session is also framed using a well-defined protocol (Figure 4-8). The TLS Record protocol is responsible for identifying different types of messages (handshake, alert, or data via the “Content Type” field), as well as securing and verifying the integrity of each message.

Figure 4-8. TLS record structure

A typical workflow for delivering application data is as follows:

  • Record protocol receives application data.
  • Received data is divided into blocks: maximum of 2^14 bytes, or 16 KB per record.
  • Application data is optionally compressed.
  • Message authentication code (MAC) or HMAC is added.
  • Data is encrypted using the negotiated cipher.

Once these steps are complete, the encrypted data is passed down to the TCP layer for transport. On the receiving end, the same workflow, but in reverse, is applied by the peer: decrypt data using negotiated cipher, verify MAC, extract and deliver the data to the application above it.

Once again, the good news is all the work just shown is handled by the TLS layer itself and is completely transparent to most applications. However, the record protocol does introduce a few important implications that you should be aware of:

  • Maximum TLS record size is 16 KB
  • Each record contains a 5-byte header, a MAC (up to 20 bytes for SSLv3, TLS 1.0, TLS 1.1, and up to 32 bytes for TLS 1.2), and padding if a block cipher is used.
  • To decrypt and verify the record, the entire record must be available.

Picking the right record size for your application, if you have the ability to do so, can be an important optimization. Small records incur a larger overhead due to record framing, whereas large records will have to be delivered and reassembled by the TCP layer before they can be processed by the TLS layer and delivered to your application.
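
The trade-off can be quantified with a short Python sketch. It assumes a 5-byte header and a 20-byte MAC per record, as listed above, and ignores block-cipher padding; the function names are made up for this example.

```python
MAX_FRAGMENT = 2**14  # 16,384 bytes of payload per record
HEADER_BYTES = 5
MAC_BYTES = 20        # upper bound quoted for SSLv3, TLS 1.0, and TLS 1.1

def fragment(data: bytes, max_len: int = MAX_FRAGMENT):
    """Split application data into record-sized payloads."""
    return [data[i:i + max_len] for i in range(0, len(data), max_len)]

def framing_overhead(payload_len: int, record_payload_len: int, mac_bytes: int = MAC_BYTES) -> int:
    """Total header plus MAC bytes needed to carry payload_len bytes (padding ignored)."""
    records = -(-payload_len // record_payload_len)  # ceiling division
    return records * (HEADER_BYTES + mac_bytes)

payload = bytes(40000)
print([len(part) for part in fragment(payload)])
print(framing_overhead(len(payload), MAX_FRAGMENT))  # few large records
print(framing_overhead(len(payload), 1000))          # many small records
```

For a 40,000-byte payload, maximum-size records cost 75 bytes of framing, while 1,000-byte records cost 1,000 bytes, but each large record must arrive in full before it can be decrypted.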

Optimizing for TLS

Due to the layered architecture of the network protocols, running an application over TLS is no different from communicating directly over TCP. As such, there are no, or at most minimal, application modifications that you will need to make to deliver it over TLS. That is, assuming you have already applied the “Optimizing for TCP” best practices.

However, what you should investigate are the operational pieces of your TLS deployments: how and where you deploy your servers, size of TLS records and memory buffers, certificate sizes, support for abbreviated handshakes, and so on. Getting these parameters right on your servers can make an enormous positive difference in the user experience, as well as in your operational costs.

Computational Costs

Establishing and maintaining an encrypted channel introduces additional computational costs for both peers. Specifically, first there is the asymmetric (public key) encryption used during the TLS handshake (explained in “TLS Handshake”). Then, once a shared secret is established, it is used as a symmetric key to encrypt all TLS records.

As we noted earlier, public key cryptography is more computationally expensive when compared with symmetric key cryptography, and in the early days of the Web often required additional hardware to perform “SSL offloading.” The good news is this is no longer the case. Modern hardware has made great improvements to help minimize these costs, and what once required additional hardware can now be done directly on the CPU. Large organizations such as Facebook, Twitter, and Google, which offer TLS to hundreds of millions of users, perform all the necessary TLS negotiation and computation in software and on commodity hardware.

In January this year (2010), Gmail switched to using HTTPS for everything by default. Previously it had been introduced as an option, but now all of our users use HTTPS to secure their email between their browsers and Google, all the time. In order to do this we had to deploy no additional machines and no special hardware. On our production frontend machines, SSL/TLS accounts for less than 1% of the CPU load, less than 10 KB of memory per connection and less than 2% of network overhead. Many people believe that SSL/TLS takes a lot of CPU time and we hope the preceding numbers (public for the first time) will help to dispel that.

If you stop reading now you only need to remember one thing: SSL/TLS is not computationally expensive anymore.

Adam Langley (Google)

We have deployed TLS at a large scale using both hardware and software load balancers. We have found that modern software-based TLS implementations running on commodity CPUs are fast enough to handle heavy HTTPS traffic load without needing to resort to dedicated cryptographic hardware. We serve all of our HTTPS traffic using software running on commodity hardware.

Doug Beaver (Facebook)

Elliptic Curve Diffie-Hellman (ECDHE) is only a little more expensive than RSA for an equivalent security level… In practical deployment, we found that enabling and prioritizing ECDHE cipher suites actually caused negligible increase in CPU usage. HTTP keepalives and session resumption mean that most requests do not require a full handshake, so handshake operations do not dominate our CPU usage. We find 75% of Twitter’s client requests are sent over connections established using ECDHE. The remaining 25% consists mostly of older clients that don’t yet support the ECDHE cipher suites.

Jacob Hoffman-Andrews (Twitter)

Previous experiences notwithstanding, techniques such as “TLS Session Resumption” are still important optimizations, which will help you decrease the computational costs and latency of public key cryptography performed during the TLS handshake. There is no reason to spend CPU cycles on work that you don’t need to do.

Speaking of optimizing CPU cycles, make sure to upgrade your SSL libraries to the latest release, and build your web server or proxy against them! For example, recent versions of OpenSSL have made significant performance improvements, and chances are your system default OpenSSL libraries are outdated.

Early Termination

The connection setup latency imposed on every TLS connection, new or resumed, is an important area of optimization. First, recall that every TCP connection begins with a three-way handshake (explained in “Three-Way Handshake”), which takes a full roundtrip for the SYN/SYN-ACK packets. Following that, the TLS handshake (explained in “TLS Handshake”) requires up to two additional roundtrips for the full process, or one roundtrip if an abbreviated handshake (explained in “Optimizing TLS handshake with Session Resumption and False Start”) can be used.

In the worst case, before any application data can be exchanged, the TCP and TLS connection setup process will take three roundtrips! Following our earlier example of a client in New York and the server in London, with a roundtrip time of 56 milliseconds (Table 1-1), this translates to 168 milliseconds of latency for a full TCP and TLS setup, and 112 milliseconds for a TLS session that is resumed. Even worse, the higher the latency between the peers, the worse the penalty, and 56 milliseconds is definitely an optimistic number.
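The arithmetic above can be sketched in a few lines of Python. The function name and its parameters are illustrative only; the 56 ms figure is the New York to London roundtrip time quoted in the text.

```python
def setup_latency_ms(rtt_ms, tls_roundtrips):
    """Latency before the first application byte can be sent:
    one TCP roundtrip (SYN / SYN-ACK) plus the TLS handshake roundtrips."""
    tcp_roundtrips = 1
    return rtt_ms * (tcp_roundtrips + tls_roundtrips)

RTT_MS = 56  # New York <-> London, per the example in the text

full = setup_latency_ms(RTT_MS, tls_roundtrips=2)     # full TLS handshake
resumed = setup_latency_ms(RTT_MS, tls_roundtrips=1)  # abbreviated handshake
print(full, resumed)
```

Substituting a mobile or intercontinental roundtrip time for RTT_MS shows how quickly the setup cost grows with distance.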

Because all TLS sessions run over TCP, all the advice for “Optimizing for TCP” applies here as well. If TCP connection reuse was an important consideration for unencrypted traffic, then it is a critical optimization for all applications running over TLS—if you can avoid doing the handshake, do so. However, if you have to perform the handshake, then you may want to investigate using the “early termination” technique.

As we discussed in Chapter 1, we cannot expect any dramatic improvements in latency in the future, as our packets are already traveling within a small constant factor of the speed of light. However, while we may not be able to make our packets travel faster, we can make them travel a shorter distance. Early termination is a simple technique of placing your servers closer to the user (Figure 4-9) to minimize the latency cost of each roundtrip between the client and the server.

Figure 4-9. Early termination of client connections

The simplest way to accomplish this is to replicate or cache your data and services on servers around the world instead of forcing every user to traverse across oceans and continental links to the origin servers. Of course, this is precisely the service that many content delivery networks (CDNs) are set up to offer. However, the use case for geo-distributed servers does not stop at optimized delivery of static assets.

A nearby server can also terminate the TLS session, which means that the TCP and TLS handshake roundtrips are much quicker and the total connection setup latency is greatly reduced. In turn, the same nearby server can then establish a pool of long-lived, secure connections to the origin servers and proxy all incoming requests and responses to and from the origin servers.

In a nutshell, move the server closer to the client to accelerate TCP and TLS handshakes! Most CDN providers offer this service, and if you are adventurous, you can also deploy your own infrastructure with minimal costs: spin up cloud servers in a few data centers around the globe, configure a proxy server on each to forward requests to your origin, add geographic DNS load balancing, and you are in business.

Session Caching and Stateless Resumption

Terminating the connection closer to the user is an optimization that will help decrease latency for your users in all cases, but once again, no bit is faster than a bit not sent—send fewer bits. Enabling TLS session caching and stateless resumption will allow you to eliminate an entire roundtrip and reduce computational overhead for repeat visitors.

Session identifiers, on which TLS session caching relies, were introduced in SSL 2.0 and have wide support among most clients and servers. However, if you are configuring TLS on your server, do not assume that session support will be on by default. In fact, it is more common to have it off on most servers by default—but you know better! You should double-check and verify your configuration:

  • Servers with multiple processes or workers should use a shared session cache.
  • Size of the shared session cache should be tuned to your levels of traffic.
  • A session timeout period should be provided.
  • In a multi-server setup, routing the same client IP, or the same TLS session ID, to the same server is one way to provide good session cache utilization.
  • Where “sticky” load balancing is not an option, a shared cache should be used between different servers to provide good session cache utilization, and a secure mechanism needs to be established to share and update the secret keys to decrypt the provided session tickets.
  • Check and monitor your TLS session cache statistics for best performance.

In practice, and for best results, you should configure both session caching and session ticket mechanisms. These mechanisms are not exclusive and can work together to provide best performance coverage both for new and older clients.
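As a rough illustration of the "check and monitor" item above, the following sketch computes a hit rate from session cache counters. The helper name and the sample numbers are hypothetical; the dictionary is shaped like a subset of what Python's ssl.SSLContext.session_stats() returns, and most servers expose comparable counters under their own names.

```python
def session_hit_rate(stats):
    """Fraction of session lookups that found a cached session.
    Expired sessions (timeouts) are not counted in this simple ratio."""
    attempts = stats["hits"] + stats["misses"]
    return stats["hits"] / attempts if attempts else 0.0

# Hypothetical counters for illustration only.
sample = {"hits": 750, "misses": 250, "timeouts": 40, "cache_full": 0}

rate = session_hit_rate(sample)
print(f"session cache hit rate: {rate:.0%}")
```

A persistently low ratio, or a non-zero cache_full count, would suggest revisiting the cache size or the session timeout.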

TLS False Start

Session resumption provides two important benefits: it eliminates an extra handshake roundtrip for returning visitors and reduces the computational cost of the handshake by allowing reuse of previously negotiated session parameters. However, it does not help in cases where the visitor is communicating with the server for the first time, or if the previous session has expired.

To get the best of both worlds—a one roundtrip handshake for new and repeat visitors, and computational savings for repeat visitors—we can use TLS False Start, which is an optional protocol extension that allows the sender to send application data (Figure 4-10) when the handshake is only partially complete.

Figure 4-10. TLS handshake with False Start

False Start does not modify the TLS handshake protocol; rather, it only affects the timing of when the application data can be sent. Intuitively, once the client has sent the ClientKeyExchange record, it already knows the encryption key and can begin transmitting application data; the rest of the handshake is spent confirming that nobody has tampered with the handshake records, and can be done in parallel. As a result, False Start allows us to keep the TLS handshake at one roundtrip regardless of whether we are performing a full or abbreviated handshake.

TLS Record Size

All application data delivered via TLS is transported within a record protocol (Figure 4-8). The maximum size of each record is 16 KB, and depending on the chosen cipher, each record will add anywhere from 20 to 40 bytes of overhead for the header, MAC, and optional padding. If the record then fits into a single TCP packet, then we also have to add the IP and TCP overhead: 20-byte header for IP, and 20-byte header for TCP with no options. As a result, there is potential for 60 to 100 bytes of overhead for each record. For a typical maximum transmission unit (MTU) size of 1,500 bytes on the wire, this packet structure translates to roughly 4% to 7% of framing overhead.

The smaller the record, the higher the framing overhead. However, simply increasing the size of the record to its maximum size (16 KB) is not necessarily a good idea! If the record spans multiple TCP packets, then the TLS layer must wait for all the TCP packets to arrive before it can decrypt the data (Figure 4-11). If any of those TCP packets get lost, reordered, or throttled due to congestion control, then the individual fragments of the TLS record will have to be buffered before they can be decoded, resulting in additional latency. In practice, these delays can create significant bottlenecks for the browser, which prefers to consume data byte by byte and as soon as possible.

Figure 4-11. Wireshark capture of 11,211-byte TLS record split over 8 TCP segments

Small records incur overhead, large records incur latency, and there is no one value for the “optimal” record size. Instead, for web applications, which are consumed by the browser, the best strategy is to dynamically adjust the record size based on the state of the TCP connection:

  • When the connection is new and TCP congestion window is low, or when the connection has been idle for some time (see “Slow-Start Restart”), each TCP packet should carry exactly one TLS record, and the TLS record should occupy the full maximum segment size (MSS) allocated by TCP.
  • When the connection congestion window is large and if we are transferring a large stream (e.g., streaming video), the size of the TLS record can be increased to span multiple TCP packets (up to 16 KB) to reduce framing and CPU overhead on the client and server.

If the TCP connection has been idle, and even if Slow-Start Restart is disabled on the server, the best strategy is to decrease the record size when sending a new burst of data: the conditions may have changed since last transmission, and our goal is to minimize the probability of buffering at the application layer due to lost packets, reordering, and retransmissions.

Using a dynamic strategy delivers the best performance for interactive traffic: small record size eliminates unnecessary buffering latency and improves the time-to-first-{HTML byte, …, video frame}, and a larger record size optimizes throughput by minimizing the overhead of TLS for long-lived streams.

To determine the optimal record size for each state let’s start with the initial case of a new or idle TCP connection where we want to avoid TLS records from spanning multiple TCP packets:

  • Allocate 20 bytes for IPv4 framing overhead and 40 bytes for IPv6.
  • Allocate 20 bytes for TCP framing overhead.
  • Allocate 40 bytes for TCP options overhead (timestamps, SACKs).

Assuming a common 1,500-byte starting MTU, this leaves 1,420 bytes for a TLS record delivered over IPv4, and 1,400 bytes for IPv6. To be future-proof, use the IPv6 size, which leaves us with 1,400 bytes for each TLS record payload—adjust as needed if your MTU is lower.
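The budget above is simple subtraction, and a short Python sketch makes it easy to redo for a different MTU. The constant and function names are illustrative; the byte counts are the ones listed in the text.

```python
MTU = 1500          # bytes; the common starting MTU assumed in the text
TCP_HEADER = 20     # bytes
TCP_OPTIONS = 40    # bytes; allowance for timestamps, SACKs
IP_HEADER = {"IPv4": 20, "IPv6": 40}

def max_record_bytes(ip_version, mtu=MTU):
    """Largest TLS record that still fits in a single TCP packet."""
    return mtu - IP_HEADER[ip_version] - TCP_HEADER - TCP_OPTIONS

sizes = {version: max_record_bytes(version) for version in IP_HEADER}
print(sizes)
```

Calling max_record_bytes("IPv6", mtu=1280) gives the corresponding budget for a path limited to the IPv6 minimum MTU.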

Next, the decision as to when the record size should be increased and reset if the connection has been idle, can be set based on pre-configured thresholds: increase record size to up to 16 KB after X KB of data have been transferred, and reset the record size after Y milliseconds of idle time.
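A minimal sketch of this threshold logic follows, assuming the caller supplies timestamps in milliseconds. The class name, its method names, and the default thresholds (standing in for X and Y) are placeholders chosen for illustration, not values taken from any particular server.

```python
class RecordSizer:
    """Pick a TLS record size from bytes sent so far and idle time."""

    SMALL = 1400       # fits a single TCP packet (see the budget above)
    LARGE = 16 * 1024  # maximum TLS record size

    def __init__(self, boost_after_bytes=1_000_000, idle_reset_ms=1000):
        self.boost_after_bytes = boost_after_bytes  # "X" (placeholder default)
        self.idle_reset_ms = idle_reset_ms          # "Y" (placeholder default)
        self.sent_since_reset = 0
        self.last_send_ms = None

    def next_record_size(self, now_ms):
        # After an idle period, fall back to small records again.
        if (self.last_send_ms is not None
                and now_ms - self.last_send_ms > self.idle_reset_ms):
            self.sent_since_reset = 0
        if self.sent_since_reset >= self.boost_after_bytes:
            return self.LARGE
        return self.SMALL

    def record_sent(self, nbytes, now_ms):
        self.sent_since_reset += nbytes
        self.last_send_ms = now_ms

sizer = RecordSizer(boost_after_bytes=2800, idle_reset_ms=1000)
first = sizer.next_record_size(now_ms=0)
sizer.record_sent(1400, now_ms=0)
sizer.record_sent(1400, now_ms=10)
boosted = sizer.next_record_size(now_ms=20)
sizer.record_sent(boosted, now_ms=20)
after_idle = sizer.next_record_size(now_ms=2000)
print(first, boosted, after_idle)
```

A real implementation would live inside the TLS server or library and would more likely key off the TCP congestion window; byte and time thresholds are simply the approximation the text describes.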

Typically, configuring the TLS record size is not something we can control at the application layer. Instead, this is a setting and perhaps even a compile-time constant or flag on your TLS server. For details on how to configure these values, check the documentation of your server.

TLS Compression

A little-known feature of TLS is built-in support for lossless compression of data transferred within the record protocol: the compression algorithm is negotiated during the TLS handshake, and compression is applied prior to encryption of each record. However, in practice, you should disable TLS compression on your server for several reasons:

  • The “CRIME” attack, published in 2012, leverages TLS compression to recover secret authentication cookies and allows the attacker to perform session hijacking.
  • Transport-level TLS compression is not content aware and will end up attempting to recompress already compressed data (images, video, etc.).

Double compression will waste CPU time on both the server and the client, and the security breach implications are quite serious: disable TLS compression. In practice, most browsers disable support for TLS compression, but you should nonetheless also explicitly disable it in the configuration of your server to protect your users.

Instead of relying on TLS compression, make sure your server is configured to Gzip all text-based assets and that you are using an optimal compression format for all other media types, such as images, video, and audio.
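How compression is disabled depends on the server. As one hedged illustration, if the TLS endpoint happens to be built on Python's standard ssl module, the option can be set on the context as shown below; certificate and key loading is omitted, and recent Python versions already set this flag by default.

```python
import ssl

# Server-side context; loading the certificate and key is omitted here.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

# Ask OpenSSL not to negotiate TLS-level compression (CRIME mitigation).
ctx.options |= ssl.OP_NO_COMPRESSION

compression_disabled = bool(ctx.options & ssl.OP_NO_COMPRESSION)
print(compression_disabled)
```

For other servers, look for the equivalent setting in their documentation rather than assuming the default is safe.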

Certificate-Chain Length

Verifying the chain of trust requires that the browser traverse the chain, starting from the site certificate, and recursively verifying the certificate of the parent until it reaches a trusted root. Hence, the first optimization you should make is to verify that the server does not forget to include all the intermediate certificates when the handshake is performed. If you forget, many browsers will still work, but they will instead be forced to pause the verification and fetch the intermediate certificate on their own, verify it, and then continue. This will most likely require a new DNS lookup, TCP connection, and an HTTP GET request, adding hundreds of milliseconds to your handshake.

How does the browser know from where to fetch it? The child certificate will usually contain the URL for the parent.

Conversely, make sure you do not include unnecessary certificates in your chain! Or, more generally, you should aim to minimize the size of your certificate chain. Recall that server certificates are sent during the TLS handshake, which is likely running over a new TCP connection that is in the early stages of its slow-start algorithm. If the certificate chain exceeds TCP’s initial congestion window (Figure 4-12), then we will inadvertently add yet another roundtrip to the handshake: certificate length will overflow the congestion window and cause the server to stop and wait for a client ACK before proceeding.

Figure 4-12. Wireshark capture of a 5,323-byte TLS certificate chain

The certificate chain in Figure 4-12 is over 5 KB in size, which will overflow the initial congestion window size of older servers and force another roundtrip of delay into the handshake. One possible solution is to increase the initial congestion window; see “Increasing TCP’s Initial Congestion Window”. In addition, you should investigate if it is possible to reduce the size of the sent certificates:

  • Minimize the number of intermediate CAs. Ideally, your sent certificate chain should contain exactly two certificates: your site and the CA’s intermediary certificate; use this as a criterion in the selection of your CA. The third certificate, which is the CA root, should already be in the browser’s trusted root and hence should not be sent.
  • It is not uncommon for sites to include the root certificate of their CA in the chain, which is entirely unnecessary: if your browser does not already have the certificate in its trust store, then it won’t be trusted, and including the root certificate won’t change that.
  • A carefully managed certificate chain can be as low as 2 or 3 KB in size, while providing all the necessary information to the browser to avoid unnecessary roundtrips or out-of-band requests for the certificates themselves. Optimizing your TLS handshake mitigates a critical performance bottleneck, since every new TLS connection is subject to its overhead.
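To check a given chain against the initial congestion window, the sketch below uses the initial-window formulas from RFC 3390 (older default) and RFC 6928 (ten segments). The 1,460-byte MSS is an assumption (1,500-byte MTU, IPv4, no TCP options), the function names are illustrative, and the other handshake messages sent in the same flight are ignored, so treat the result as optimistic.

```python
def initial_window_bytes(mss, rfc6928=False):
    """Initial congestion window in bytes for a given MSS."""
    if rfc6928:
        return min(10 * mss, max(2 * mss, 14600))
    return min(4 * mss, max(2 * mss, 4380))  # RFC 3390

def fits_initial_window(payload_bytes, mss=1460, rfc6928=False):
    return payload_bytes <= initial_window_bytes(mss, rfc6928)

CHAIN_BYTES = 5323  # the certificate chain from Figure 4-12

fits_old = fits_initial_window(CHAIN_BYTES)
fits_new = fits_initial_window(CHAIN_BYTES, rfc6928=True)
print(fits_old, fits_new)
```

Under these assumptions the 5,323-byte chain overflows the older window but fits comfortably once the initial window has been raised.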

OCSP Stapling

Every new TLS connection requires that the browser verify the signatures of the sent certificate chain. However, there is one more step we can’t forget: the browser also needs to verify that the certificate is not revoked. To do so, it may periodically download and cache the CRL of the certificate authority, but it may also need to dispatch an OCSP request during the verification process for a “real-time” check. Unfortunately, the browser behavior for this process varies wildly:

  • Some browsers may use their own update mechanism to push updated CRLs instead of relying on on-demand requests.
  • Some browsers may do real-time OCSP and CRL checks only for Extended Validation (EV) certificates.
  • Some browsers may block the TLS handshake on either revocation method, others may not, and this behavior will vary by vendor, platform, and version of the browser.

Unfortunately, it is a complicated space with no single best solution. However, one optimization that can be made for some browsers is OCSP stapling: the server can include (staple) the OCSP response from the CA to its certificate chain, allowing the browser to skip the online check. Moving the OCSP fetch to the server allows the server to cache the signed OCSP response and save the extra request for many clients. However, there are also a few things to watch out for:

  • OCSP responses can vary from 400 to 4,000 bytes in size. Stapling this response to your certificate chain may once again overflow your TCP congestion window—pay close attention to the total size.
  • Only one OCSP response can be included, which may still mean that the browser will have to issue OCSP requests for the intermediate certificates if their responses have not been cached already.

Finally, to enable OCSP stapling, you will need a server that supports it. The good news is popular servers such as Nginx, Apache, and IIS meet this criterion. Check the documentation of your own server for support and configuration instructions.

HTTP Strict Transport Security (HSTS)

HTTP Strict Transport Security is a security policy mechanism that allows the server to declare access rules to a compliant browser via a simple HTTP header—e.g. Strict-Transport-Security: max-age=31536000. Specifically, it instructs the user-agent to enforce the following rules:

  • All requests to the origin should be sent over HTTPS.
  • All insecure links and client requests should be automatically converted to HTTPS on the client before the request is sent.
  • In case of a certificate error, an error message is displayed, and the user is not allowed to circumvent the warning.
  • max-age specifies the lifetime of the specified HSTS ruleset in seconds (e.g., max-age=31536000 is equal to a 365-day cache lifetime).
  • Optionally, the UA can be instructed to remember (“pin”) the fingerprint of a host in the specified certificate chain for future access, effectively limiting the scope of authorities who can authenticate the certificate.

HSTS converts the origin to an HTTPS-only destination and helps protect the application from a variety of passive and active network attacks against the user. Performance-wise, it also helps eliminate unnecessary HTTP-to-HTTPS redirects by shifting this responsibility to the client, which will automatically rewrite all links to HTTPS.
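Most servers can append the header through configuration. Where it has to be added in application code, a small wrapper is usually enough; the following is a minimal sketch for a Python WSGI application, in which add_hsts, demo_app, and fake_start_response are hypothetical names used only for this illustration.

```python
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31,536,000 seconds

def add_hsts(app, max_age=ONE_YEAR_SECONDS):
    """Wrap a WSGI app so that every response carries an HSTS header."""
    header = ("Strict-Transport-Security", f"max-age={max_age}")

    def wrapped(environ, start_response):
        def start_with_hsts(status, headers, exc_info=None):
            # Drop any existing HSTS header, then append ours.
            kept = [(k, v) for k, v in headers
                    if k.lower() != header[0].lower()]
            return start_response(status, kept + [header], exc_info)
        return app(environ, start_with_hsts)

    return wrapped

def demo_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]

captured = {}

def fake_start_response(status, headers, exc_info=None):
    captured["headers"] = headers

body = add_hsts(demo_app)({}, fake_start_response)
print(captured["headers"])
```

Browsers only honor this header when it arrives over HTTPS, so the wrapper belongs on the secure endpoint; the plain HTTP endpoint still needs a redirect for first-time visitors.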

As of early 2013, HSTS is supported by Firefox 4+, Chrome 4+, Opera 12+, and Chrome and Firefox for Android. For the latest status, see caniuse.com/stricttransportsecurity.

Performance Checklist

As an application developer, you are shielded from virtually all the complexity of TLS. Short of ensuring that you do not mix HTTP and HTTPS content on your pages, your application will run transparently on both. However, the performance of your entire application will be affected by the underlying configuration of your server.

The good news is it is never too late to make these optimizations, and once in place, they will pay high dividends for every new connection to your servers! A short list to put on the agenda:

  • Get best performance from TCP; see “Optimizing for TCP”.
  • Upgrade TLS libraries to latest release, and (re)build servers against them.
  • Enable and configure session caching and stateless resumption.
  • Monitor your session caching hit rates and adjust configuration accordingly.
  • Configure forward secrecy ciphers to enable TLS False Start.
  • Terminate TLS sessions closer to the user to minimize roundtrip latencies.
  • Use dynamic TLS record sizing to optimize latency and throughput.
  • Ensure that your certificate chain does not overflow the initial congestion window.
  • Remove unnecessary certificates from your chain; minimize the depth.
  • Configure OCSP stapling on your server.
  • Disable TLS compression on your server.
  • Configure SNI support on your server.
  • Append HTTP Strict Transport Security header.

Testing and Verification

Finally, to verify and test your configuration, you can use an online service, such as the Qualys SSL Server Test to scan your public server for common configuration and security flaws. Additionally, you should familiarize yourself with the openssl command-line interface, which will help you inspect the entire handshake and configuration of your server locally.

  $> openssl s_client -state -CAfile startssl.ca.crt -connect igvita.com:443

  CONNECTED(00000003)
  SSL_connect:before/connect initialization
  SSL_connect:SSLv2/v3 write client hello A
  SSL_connect:SSLv3 read server hello A
  depth=2 /C=IL/O=StartCom Ltd./OU=Secure Digital Certificate Signing
          /CN=StartCom Certification Authority
  verify return:1
  depth=1 /C=IL/O=StartCom Ltd./OU=Secure Digital Certificate Signing
          /CN=StartCom Class 1 Primary Intermediate Server CA
  verify return:1
  depth=0 /description=ABjQuqt3nPv7ebEG/C=US
          /CN=www.igvita.com/emailAddress=ilya@igvita.com
  verify return:1
  SSL_connect:SSLv3 read server certificate A
  SSL_connect:SSLv3 read server done A (1)
  SSL_connect:SSLv3 write client key exchange A
  SSL_connect:SSLv3 write change cipher spec A
  SSL_connect:SSLv3 write finished A
  SSL_connect:SSLv3 flush data
  SSL_connect:SSLv3 read finished A
  ---
  Certificate chain (2)
   0 s:/description=ABjQuqt3nPv7ebEG/C=US
       /CN=www.igvita.com/emailAddress=ilya@igvita.com
     i:/C=IL/O=StartCom Ltd./OU=Secure Digital Certificate Signing
       /CN=StartCom Class 1 Primary Intermediate Server CA
   1 s:/C=IL/O=StartCom Ltd./OU=Secure Digital Certificate Signing
       /CN=StartCom Class 1 Primary Intermediate Server CA
     i:/C=IL/O=StartCom Ltd./OU=Secure Digital Certificate Signing
       /CN=StartCom Certification Authority
  ---
  Server certificate
  -----BEGIN CERTIFICATE-----
  ... snip ...
  ---
  No client certificate CA names sent
  ---
  SSL handshake has read 3571 bytes and written 444 bytes (3)
  ---
  New, TLSv1/SSLv3, Cipher is RC4-SHA
  Server public key is 2048 bit
  Secure Renegotiation IS supported
  Compression: NONE
  Expansion: NONE
  SSL-Session:
      Protocol  : TLSv1
      Cipher    : RC4-SHA
      Session-ID: 269349C84A4702EFA7 ... (4)
      Session-ID-ctx:
      Master-Key: 1F5F5F33D50BE6228A ...
      Key-Arg   : None
      Start Time: 1354037095
      Timeout   : 300 (sec)
      Verify return code: 0 (ok)
  ---

  (1) Client completed verification of received certificate chain.
  (2) Received certificate chain (two certificates).
  (3) Size of received certificate chain.
  (4) Issued session identifier for stateful TLS resume.

In the preceding example, we connect to igvita.com on the default TLS port (443), and perform the TLS handshake. Because s_client makes no assumptions about known root certificates, we manually specify the path to the root certificate of the StartSSL Certificate Authority. This is important: your browser already has StartSSL’s root certificate and is thus able to verify the chain, but s_client does not. Try omitting the root certificate, and you will see a verification error in the log.

Inspecting the certificate chain shows that the server sent two certificates, which added up to 3,571 bytes. That is very close to the three- to four-segment initial TCP congestion window size, so we should either be careful not to overflow it or raise the cwnd size on the server. Finally, we can inspect the negotiated SSL session variables (chosen protocol, cipher, key), and we can also see that the server issued a session identifier for the current session, which may be resumed in the future.