You may use a key exchange (as part of a cipher suite) only if the server key type and certificate match. To see this in detail, let’s have a look at the cipher suites defined in the TLS 1.2 specification. Each cipher suite defines the key exchange algorithm, as well as the subsequently used symmetric encryption and integrity check algorithms; we concentrate here on the key exchange part.
- RSA: the key exchange works by encrypting a random value (chosen by the client) with the server public key. This requires that the server public key is an RSA key, and that the server certificate does not prohibit encryption (mainly through the “Key Usage” certificate extension: if that extension is present, it must include the “keyEncipherment” flag).
- DH_RSA: the key exchange is a static Diffie-Hellman: the server public key must be a Diffie-Hellman key; moreover, that certificate must have been issued by a Certification Authority which itself was using an RSA key (the CA key is the key which was used to sign the server certificate).
- DH_DSS: like DH_RSA, except that the CA used a DSA key.
- DHE_RSA: the key exchange is an ephemeral Diffie-Hellman: the server dynamically generates a DH public key and sends it to the client; the server also signs what it sends. For DHE_RSA, the server public key must be of type RSA, and its certificate must be appropriate for signatures (the Key Usage extension, if present, must include the digitalSignature flag).
- DHE_DSS: like DHE_RSA, except that the server key has type DSA.
- DH_anon: there is no server certificate. The server uses a Diffie-Hellman key that it may have dynamically generated. The “anon” cipher suites are vulnerable to impersonation attacks (including, but not limited to, the “Man in the Middle”) since they lack any kind of server authentication. As a general rule, you should not use an “anon” cipher suite.
Key exchange algorithms which use elliptic-curve cryptography are specified in another RFC and propose the following:
- ECDH_ECDSA: like DH_DSS, but with elliptic curves: the server public key must be an ECDH key, in a certificate issued by a CA which itself was using an ECDSA public key.
- ECDH_RSA: like ECDH_ECDSA, but the issuing CA has an RSA key.
- ECDHE_ECDSA: the server sends a dynamically generated EC Diffie-Hellman key, and signs it with its own key, which must have type ECDSA. This is equivalent to DHE_DSS, but with elliptic curves for both the Diffie-Hellman part and the signature part.
- ECDHE_RSA: like ECDHE_ECDSA, but the server public key is an RSA key, used for signing the ephemeral elliptic-curve Diffie-Hellman key.
- ECDH_anon: an “anon” cipher suite, with dynamic elliptic-curve Diffie-Hellman.
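The matching rules from the two lists above can be summarized as a small lookup table. This is only a sketch of the constraints as described here: the names `KEY_EXCHANGE_REQUIREMENTS` and `key_exchange_usable` are made up for illustration, and the Key Usage checks are left out.

```python
# For each key exchange: (required server key type, required issuing-CA key type).
# None means "no constraint" (or, for the anon suites, "no certificate at all").
KEY_EXCHANGE_REQUIREMENTS = {
    "RSA":         ("RSA",   None),
    "DH_RSA":      ("DH",    "RSA"),
    "DH_DSS":      ("DH",    "DSA"),
    "DHE_RSA":     ("RSA",   None),
    "DHE_DSS":     ("DSA",   None),
    "DH_anon":     (None,    None),
    "ECDH_ECDSA":  ("ECDH",  "ECDSA"),
    "ECDH_RSA":    ("ECDH",  "RSA"),
    "ECDHE_ECDSA": ("ECDSA", None),
    "ECDHE_RSA":   ("RSA",   None),
    "ECDH_anon":   (None,    None),
}


def key_exchange_usable(key_exchange, server_key_type, ca_key_type=None):
    """Return True if the server key and issuing CA key fit the key exchange."""
    required_key, required_ca = KEY_EXCHANGE_REQUIREMENTS[key_exchange]
    if required_key is None:
        # "anon" suites use no server certificate, so nothing to match.
        return True
    if server_key_type != required_key:
        return False
    return required_ca is None or ca_key_type == required_ca
```

For instance, a server with an RSA key can do RSA, DHE_RSA or ECDHE_RSA, but none of the static DH suites.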
Diffie-Hellman is used in SSL/TLS, as “ephemeral Diffie-Hellman” (the cipher suites with “DHE” in their name; see the standard). What is very rarely encountered is “static Diffie-Hellman” (cipher suites with “DH” in their name, but neither “DHE” nor “DH_anon”): these cipher suites require that the server owns a certificate with a DH public key in it, which is rarely supported for a variety of historical and economic reasons. The main one is the availability of a free standard for RSA (PKCS#1), while the corresponding standard for Diffie-Hellman (X9.42) costs a hundred bucks, which is not much, but sufficient to deter most amateur developers.
Diffie-Hellman is a key agreement protocol, meaning that if two parties (say, the SSL client and the SSL server) run this protocol, they end up with a shared secret K. However, neither client nor server gets to choose the value of K; from their points of view, K looks randomly generated. It is secret (only they know K; eavesdroppers on the line do not) and shared (they both get the same value K), but not chosen. This is not encryption. A shared secret K is good enough, though, to process terabytes of data with a symmetric encryption algorithm (same K to encrypt on one side and decrypt on the other), and that is what happens in SSL.
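To make the “shared but not chosen” property concrete, here is a toy Diffie-Hellman run in Python. It is a sketch only: the modulus (the Mersenne prime 2^127 - 1) and the base 3 are illustrative assumptions, far too weak for real use, and the function names are made up.

```python
import secrets

# Toy parameters for illustration only; real deployments use much larger,
# carefully chosen groups.
P = 2**127 - 1   # a known prime (Mersenne prime)
G = 3            # base; not claimed to generate the whole group


def dh_keypair():
    """Pick a random private exponent and compute the matching public value."""
    private = secrets.randbelow(P - 3) + 2      # in [2, P-2]
    public = pow(G, private, P)
    return private, public


def dh_shared(private, peer_public):
    """Combine our private exponent with the peer's public value."""
    return pow(peer_public, private, P)


client_priv, client_pub = dh_keypair()
server_priv, server_pub = dh_keypair()

# Only the public values travel on the wire; both sides end with the same K,
# and neither side could have picked K in advance.
k_client = dh_shared(client_priv, server_pub)
k_server = dh_shared(server_priv, client_pub)
```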
There is a well-known asymmetric encryption algorithm called RSA, though. With RSA, the sender can encrypt a message M with the recipient’s public key, and the recipient can decrypt it and recover M using his private key. This time, the sender can choose the contents M. So your question might be: in an RSA world, why do we bother with AES at all? The answer lies in the following points:
- There are constraints on M. If the recipient’s public key has size n (in bytes, e.g. n = 256 for a 2048-bit RSA key), then the maximum size of M is n-11 bytes. In order to encrypt a longer message, we would have to split it into sufficiently small blocks, and include some reassembly mechanism. Nobody really knows how to do that securely. We have good reasons to believe that RSA on a single message is safe, but subtle weaknesses can lurk in any split-and-reassembly system and we are not comfortable with that. It is already bad enough with symmetric ciphers, where the mathematical situation is simpler.
- Even if we could handle the splitting-and-reassembly, there would be a size expansion. With a 2048-bit RSA key, an internal message chunk has size at most 245 bytes, but yields, when encrypted, a 256-byte sequence. This wastes our lifeforce, i.e. network bandwidth. Symmetric encryption incurs only a bounded overhead (well, SSL adds a slight overhead proportional to the data size, but it is much smaller than what would occur with an RSA-only protocol).
- Compared to AES, RSA is slow as Hell.
- We really like to have the option of using key agreement protocols like DH instead of RSA. In older times (before 2001), RSA was patented but not DH, so the US government was recommending DH. Nowadays, we want to be able to switch algorithms in case one becomes broken. In order to support key agreement protocols, we need some symmetric encryption, so we may just as well use it with RSA. It simplifies implementation and protocol analysis.
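The size figures in the first two points follow from simple arithmetic, sketched below. The function names are hypothetical, and the “RSA-only protocol” being costed is the imaginary split-and-reassemble scheme discussed above, not something that exists in SSL.

```python
PADDING_OVERHEAD = 11  # bytes lost per RSA block, as stated in the text


def rsa_max_chunk(key_bits):
    """Largest message (in bytes) that fits in one RSA encryption."""
    n = key_bits // 8
    return n - PADDING_OVERHEAD


def rsa_only_ciphertext_size(message_len, key_bits):
    """Ciphertext size if a message were split into RSA-sized chunks."""
    n = key_bits // 8
    chunk = rsa_max_chunk(key_bits)
    chunks = -(-message_len // chunk)   # ceiling division
    return chunks * n
```

With a 2048-bit key, `rsa_max_chunk(2048)` is 245, and a message of one million bytes would need 4082 chunks, i.e. 1,044,992 bytes of ciphertext.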
Since the general concept of SSL has already been covered in some other questions (e.g. this one and that one), this time I will go for details. Details are important. This answer is going to be somewhat verbose.
SSL is a protocol with a long history and several versions. First prototypes came from Netscape, when they were developing the first versions of their flagship browser, Netscape Navigator (this browser killed off Mosaic in the early times of the Browser Wars, which are still raging, albeit with new competitors). Version 1 has never been made public so we do not know what it looked like. SSL version 2 is described in a draft which can be read there; it has a number of weaknesses, some of them rather serious, so it is deprecated and newer SSL/TLS implementations do not support it (while older ones have it deactivated by default). I will not speak of SSL version 2 any further, except as an occasional reference.
SSL version 3 (which I will call “SSLv3”) was an enhanced protocol which still works today and is widely supported. Although still a property of Netscape Communications (or whoever owns that nowadays), the protocol has been published as a “historical RFC” (RFC 6101). Meanwhile, the protocol has been standardized, with a new name in order to avoid legal issues; the new name is TLS.
Three versions of TLS have been produced so far, each with its dedicated RFC: TLS 1.0, TLS 1.1 and TLS 1.2. They are internally very similar to each other, and to SSLv3, to the point that an implementation can easily support SSLv3 and all three TLS versions with at least 95% of the code being common. Still internally, all versions are designated by a version number with the major.minor format; SSLv3 is then 3.0, while the TLS versions are, respectively, 3.1, 3.2 and 3.3. Thus, it is no wonder that TLS 1.0 is sometimes called SSL 3.1 (and it is not incorrect either). SSL 3.0 and TLS 1.0 differ by only some minute details. TLS 1.1 and 1.2 are not yet widely supported, although there is impetus for that, because of possible weaknesses (see below, for the “BEAST attack”). SSLv3 and TLS 1.0 are supported “everywhere” (even IE 6.0 knows them).
SSL aims at providing a secure bidirectional tunnel for arbitrary data. Consider TCP, the well known protocol for sending data over the Internet. TCP works over the IP “packets” and provides a bidirectional tunnel for bytes; it works for all byte values and sends them into two streams which can operate simultaneously. TCP handles the hard work of splitting the data into packets, acknowledging them, reassembling them back into their right order, while removing duplicates and reemitting lost packets. From the point of view of the application which uses TCP, there are just two streams, and the packets are invisible; in particular, the streams are not split into “messages” (it is up to the application to take its own encoding rules if it wishes to have messages, and that’s precisely what HTTP does).
TCP is reliable in the presence of “accidents”, i.e. transmission errors due to flaky hardware, network congestion, people with smartphones who walk out of range of a given base station, and other non-malicious events. However, an ill-intentioned individual (the “attacker”) with some access to the transport medium could read all the transmitted data and/or alter it intentionally, and TCP does not protect against that. Hence SSL.
SSL assumes that it works over a TCP-like protocol, which provides a reliable stream; SSL does not implement reemission of lost packets and things like that. The attacker is assumed to have the power to disrupt communication completely in an unavoidable way (for instance, he can cut the cables) so SSL’s job is to:
- detect alterations (the attacker must not be able to alter the data silently);
- ensure data confidentiality (the attacker must not gain knowledge of the exchanged data).
SSL fulfills these goals to a large (but not absolute) extent.
SSL is layered and the bottom layer is the record protocol. Whatever data is sent in an SSL tunnel is split into records. Over the wire (the underlying TCP socket or TCP-like medium), a record looks like this:
    HH V1:V2 L1:L2 data
where:
- HH is a single byte which indicates the type of data in the record. Four types are defined: change_cipher_spec (20), alert (21), handshake (22) and application_data (23).
- V1:V2 is the protocol version, over two bytes. For all versions currently defined, V1 has value 0x03, while V2 has value 0x00 for SSLv3, 0x01 for TLS 1.0, 0x02 for TLS 1.1 and 0x03 for TLS 1.2.
- L1:L2 is the length of data, in bytes (big-endian convention is used: the length is 256*L1+L2). The total length of data cannot exceed 18432 bytes, but in practice it cannot even reach that value.
So a record has a five-byte header, followed by at most 18 kB of data. The data is where symmetric encryption and integrity checks are applied. When a record is emitted, both sender and receiver are supposed to agree on which cryptographic algorithms are currently applied, and with which keys; this agreement is obtained through the handshake protocol, described in the next section. Compression, if any, is also applied at that point.
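As an illustration, the five-byte record header described above can be decoded with a few lines of Python. This is a minimal sketch; `parse_record_header` and the lookup tables are names invented for this example.

```python
import struct

RECORD_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}
VERSIONS = {
    (3, 0): "SSLv3",
    (3, 1): "TLS 1.0",
    (3, 2): "TLS 1.1",
    (3, 3): "TLS 1.2",
}
MAX_RECORD_DATA = 18432


def parse_record_header(header):
    """Decode the type, version and data length of one record header."""
    if len(header) != 5:
        raise ValueError("a record header is exactly five bytes")
    # One byte of type, two bytes of version, two bytes of big-endian length.
    rtype, major, minor, length = struct.unpack("!BBBH", header)
    if rtype not in RECORD_TYPES:
        raise ValueError("unknown record type %d" % rtype)
    if length > MAX_RECORD_DATA:
        raise ValueError("record data too long: %d" % length)
    return RECORD_TYPES[rtype], VERSIONS.get((major, minor), "unknown"), length
```

A caller would read five bytes from the socket, call this function, then read exactly `length` more bytes to get the (possibly encrypted) data.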
In full details, the building of a record works like this:
- Initially, there are some bytes to transfer; these are application data or some other kind of bytes. This payload consists of at most 16384 bytes, but possibly less (a payload of length 0 is legal, but it turns out that Internet Explorer 6.0 does not like that at all).
- The payload is then compressed with whatever compression algorithm is currently agreed upon. Compression is stateful, and thus may depend upon the contents of previous records. In practice, compression is either “null” (no compression at all) or “Deflate” (RFC 3749), the latter being currently courteously but firmly shown the exit door in the Web context, due to the recent CRIME attack. Compression aims at shortening data, but it must necessarily expand it slightly in some unfavourable situations (due to the pigeonhole principle). SSL allows for an expansion of at most 1024 bytes. Of course, null compression never expands (but never shortens either); Deflate will expand by at most 10 bytes, if the implementation is any good.
- The compressed payload is then protected against alterations and encrypted. If the current encryption-and-integrity algorithms are “null”, then this step is a no-operation. Otherwise, a MAC is appended, then some padding (depending on the encryption algorithm), and the result is encrypted. These steps again induce some expansion, which the SSL standard limits to 1024 extra bytes (combined with the maximum expansion from the compression step, this brings us to the 18432 bytes, to which we must add the 5-byte header).
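The size limits quoted in these three steps add up as follows (a trivial sketch; the constant names are mine):

```python
MAX_PAYLOAD = 16384                # largest plaintext payload in one record
MAX_COMPRESSION_EXPANSION = 1024   # worst case allowed for the compression step
MAX_PROTECTION_EXPANSION = 1024    # worst case allowed for MAC + padding + encryption
HEADER_LEN = 5                     # HH, V1:V2, L1:L2

MAX_RECORD_DATA = MAX_PAYLOAD + MAX_COMPRESSION_EXPANSION + MAX_PROTECTION_EXPANSION
MAX_RECORD_ON_WIRE = HEADER_LEN + MAX_RECORD_DATA
```

So the data part never exceeds 18432 bytes, and a full record never exceeds 18437 bytes on the wire.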
The MAC is, usually, HMAC with one of the usual hash functions (mostly MD5, SHA-1 or SHA-256) (with SSLv3, this is not the “true” HMAC but something very similar and, to the best of our knowledge, as secure as HMAC). Encryption will use either a block cipher in CBC mode, or the RC4 stream cipher. Note that, in theory, other kinds of modes or algorithms could be employed, for instance one of these nifty modes which combine encryption and integrity checks; there are even some RFC for that. In practice, though, deployed implementations do not know of these yet, so they do HMAC and CBC. Crucially, the MAC is first computed and appended to the data, and the result is encrypted. This is MAC-then-encrypt and it is actually not a very good idea. The MAC is computed over the concatenation of the (compressed) payload and a sequence number, so that an industrious attacker may not swap records.
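The role of the sequence number can be shown with a simplified sketch. It is not the exact SSL/TLS computation (the real MAC input also covers the record header fields, and the encryption step is omitted here because the Python standard library has no block cipher); the function names are hypothetical.

```python
import hashlib
import hmac
import struct

MAC_LEN = hashlib.sha1().digest_size  # 20 bytes for HMAC/SHA-1


def record_mac(mac_key, seq_num, payload):
    """HMAC/SHA-1 over a 64-bit sequence number followed by the payload."""
    return hmac.new(mac_key, struct.pack("!Q", seq_num) + payload, hashlib.sha1).digest()


def protect(mac_key, seq_num, payload):
    """Append the MAC to the payload (the result would then be encrypted)."""
    return payload + record_mac(mac_key, seq_num, payload)


def verify(mac_key, seq_num, protected):
    """Check the MAC against the expected sequence number; return the payload."""
    payload, tag = protected[:-MAC_LEN], protected[-MAC_LEN:]
    if not hmac.compare_digest(tag, record_mac(mac_key, seq_num, payload)):
        raise ValueError("bad record MAC")
    return payload
```

The sequence number is never transmitted: each side counts records on its own. If an attacker swaps two records, the receiver verifies each one under the wrong counter value and the MAC check fails.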
The handshake is a protocol which is played within the record protocol. Its goal is to establish the algorithms and keys which are to be used for the records. It consists of messages. Each handshake message begins with a four-byte header, one byte which describes the message type, then three bytes for the message length (big-endian convention). The successive handshake messages are then sent with records tagged with the “handshake” type (first byte of the header of each record has value 22).
Note the layers: the handshake messages, complete with four-byte header, are then sent as records, and each record also has its own header. Furthermore, several handshake messages can be sent within the same record, and a given handshake message can be split over several records. From the point of view of the module which builds the handshake messages, the “records” are just a stream on which bytes can be sent; it is oblivious to the actual split of that stream into records.
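From the point of view of the handshake layer, the incoming data is thus one long byte stream which must be cut back into messages. A minimal sketch of that reassembly (the function name is made up for this example):

```python
def split_handshake_messages(stream):
    """Cut a byte stream into complete handshake messages.

    Returns (messages, leftover), where messages is a list of
    (message_type, body) pairs and leftover holds the bytes of a message
    that is not complete yet (more records are needed).
    """
    messages = []
    offset = 0
    while len(stream) - offset >= 4:
        msg_type = stream[offset]
        length = int.from_bytes(stream[offset + 1:offset + 4], "big")
        end = offset + 4 + length
        if end > len(stream):
            break  # the rest of this message is in a later record
        messages.append((msg_type, stream[offset + 4:end]))
        offset = end
    return messages, stream[offset:]
```

The caller appends the data of each new “handshake” record to the leftover bytes and calls the function again.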
Initially, client and server “agree upon” null encryption with no MAC and null compression. This means that the record they will first send will be sent as cleartext and unprotected.
The first message of a handshake is a ClientHello. It is the message by which the client states its intention to do some SSL. Note that “client” is a symbolic role; it means “the party which speaks first”. It so happens that in the HTTPS context, which is HTTP-within-SSL-within-TCP, all three layers have a notion of “client” and “server”, and they all agree (the TCP client is also the SSL client and the HTTP client), but that’s kind of a coincidence.
The ClientHello message contains:
- the maximum protocol version that the client wishes to support;
- the “client random” (32 bytes, out of which 28 are supposed to be generated with a cryptographically strong random number generator);
- the “session ID” (in case the client wants to resume a session in an abbreviated handshake, see below);
- the list of “cipher suites” that the client knows of, ordered by client preference;
- the list of compression algorithms that the client knows of, ordered by client preference;
- some optional extensions.
A cipher suite is a 16-bit symbolic identifier for a set of cryptographic algorithms. For instance, the TLS_RSA_WITH_AES_128_CBC_SHA cipher suite has value 0x002F, and means “records use HMAC/SHA-1 and AES encryption with a 128-bit key, and the key exchange is done by encrypting a random key with the server’s RSA public key”.
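Here is a sketch of how these fields could be serialized into a ClientHello handshake message. The length prefixes (one byte for the session ID and compression lists, two bytes for the cipher suite list) and the use of a timestamp for the four non-random bytes come from the TLS specification rather than from the description above; extensions are left out, and `build_client_hello` is a name invented for this example.

```python
import os
import struct
import time

TLS_RSA_WITH_AES_128_CBC_SHA = 0x002F
CLIENT_HELLO = 1  # handshake message type


def build_client_hello(max_version=(3, 3), session_id=b"",
                       cipher_suites=(TLS_RSA_WITH_AES_128_CBC_SHA,),
                       compression_methods=(0,)):
    """Serialize a ClientHello without extensions, including its 4-byte header."""
    if len(session_id) > 32:
        raise ValueError("session ID is at most 32 bytes")
    # Client random: 4 bytes of timestamp, then 28 cryptographically random bytes.
    client_random = struct.pack("!I", int(time.time())) + os.urandom(28)
    suites = b"".join(struct.pack("!H", suite) for suite in cipher_suites)
    body = (
        bytes(max_version)
        + client_random
        + bytes([len(session_id)]) + session_id
        + struct.pack("!H", len(suites)) + suites
        + bytes([len(compression_methods)]) + bytes(compression_methods)
    )
    # Handshake header: one byte of type, three bytes of big-endian length.
    return bytes([CLIENT_HELLO]) + len(body).to_bytes(3, "big") + body
```

The result would then be handed to the record layer, which adds its own five-byte header with type 22.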
The server responds to the ClientHello with a ServerHello which contains:
- the protocol version that the client and server will use;
- the “server random” (32 bytes, with 28 random bytes);
- the session ID for this connection;
- the cipher suite that will be used;
- the compression algorithm that will be used;
- optionally, some extensions.
The full handshake looks like this:
    Client                                               Server

    ClientHello                  -------->
                                                    ServerHello
                                                   Certificate*
                                             ServerKeyExchange*
                                            CertificateRequest*
                                 <--------      ServerHelloDone
    Certificate*
    ClientKeyExchange
    CertificateVerify*
    [ChangeCipherSpec]
    Finished                     -------->
                                             [ChangeCipherSpec]
                                 <--------             Finished
    Application Data             <------->     Application Data
(This schema has been shamelessly copied from the RFC.)
We see the ServerHello. Then, the server sends a few other messages, which depend on the cipher suite and some other parameters:
- Certificate: the server’s certificate, which contains its public key. More on that below. This message is almost always sent, except if the cipher suite mandates a handshake without a certificate.
- ServerKeyExchange: some extra values for the key exchange, if what is in the certificate is not sufficient. In particular, the “DHE” cipher suites use an ephemeral Diffie-Hellman key exchange, which requires that message.
- CertificateRequest: a message requesting that the client also identifies itself with a certificate of its own. This message contains the list of names of trust anchors (aka “root certificates”) that the server will use to validate the client certificate.
- ServerHelloDone: a marker message (of length zero) which says that the server is finished, and the client should now talk.
The client must then respond with:
- Certificate: the client certificate, if the server requested one. There are subtle variations between versions (with SSLv3, the client must omit this message if it does not have a certificate; with TLS 1.0+, in the same situation, it must send a Certificate message with an empty list of certificates).
- ClientKeyExchange: the client part of the actual key exchange (e.g. some random value encrypted with the server RSA key).
- CertificateVerify: a digital signature computed by the client over all previous handshake messages. This message is sent when the server requested a client certificate, and the client complied. This is how the client proves to the server that it really “owns” the public key which is encoded in the certificate it sent.
Then the client sends a ChangeCipherSpec message, which is not a handshake message: it has its own record type, so it will be sent in a record of its own. Its contents are purely symbolic (a single byte of value 1). This message marks the point at which the client switches to the newly negotiated cipher suite and keys. The subsequent records from the client will then be encrypted.
The Finished message is a cryptographic checksum computed over all previous handshake messages (from both the client and server). Since it is emitted after the ChangeCipherSpec, it is also covered by the integrity check and the encryption. When the server receives that message and verifies its contents, it obtains a proof that it has indeed talked to the same client all along. This message protects the handshake from alterations (the attacker cannot modify the handshake messages and still get the Finished message right).
The server finally responds with its own ChangeCipherSpec and Finished. At that point, the handshake is finished, and the client and server may exchange application data (in encrypted records tagged as such).
To remember: the client suggests but the server chooses. The cipher suite is in the hands of the server. Courteous servers are supposed to follow the preferences of the client (if possible), but they can do otherwise and some actually do (e.g. as part of protection against BEAST).
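The server-side choice can be sketched in a few lines. The function name and the `honor_client_order` switch are assumptions made for illustration, not the API of any particular server.

```python
def choose_cipher_suite(client_suites, server_suites, honor_client_order=True):
    """Pick one suite supported by both sides, or None if there is none.

    client_suites and server_suites are lists ordered by preference.
    """
    if honor_client_order:
        preferred, supported = client_suites, set(server_suites)
    else:
        preferred, supported = server_suites, set(client_suites)
    for suite in preferred:
        if suite in supported:
            return suite
    return None
```

For example, with a client offering `[0x002F, 0x0005]` (AES-CBC first, then RC4) and a server configured with `[0x0005, 0x002F]`, a courteous server picks 0x002F, while a server enforcing its own order picks the RC4 suite 0x0005.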
In the full handshake, the server sends a “session ID” (i.e. a bunch of up to 32 bytes) to the client. Later on, the client can come back and send the same session ID as part of its ClientHello. This means that the client still remembers the cipher suite and keys from the previous handshake and would like to reuse these parameters. If the server also remembers the cipher suite and keys, then it copies that specific session ID in its ServerHello, and then follows the abbreviated handshake:
    Client                                                Server

    ClientHello                   -------->
                                                     ServerHello
                                              [ChangeCipherSpec]
                                  <--------             Finished
    [ChangeCipherSpec]
    Finished                      -------->
    Application Data              <------->     Application Data
The abbreviated handshake is shorter: fewer messages, no asymmetric cryptography business, and, most importantly, reduced latency. Web browsers and servers do that a lot. A typical Web browser will open an SSL connection with a full handshake, then do abbreviated handshakes for all other connections to the same server: the other connections it opens in parallel, and also the subsequent connections to the same server. Indeed, typical Web servers will close connections after 15 seconds of inactivity, but they will remember sessions (the cipher suite and keys) for a lot longer (possibly for hours or even days).
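On the server side, the decision between the two handshakes boils down to a cache lookup. The class below is a hypothetical sketch (no expiry, no size limit), not how any specific server implements it.

```python
import os


class SessionCache:
    """Remembers the cipher suite and keys of past sessions, by session ID."""

    def __init__(self):
        self._sessions = {}

    def new_session(self, cipher_suite, keys):
        """Store the parameters of a full handshake; return the new session ID."""
        session_id = os.urandom(32)
        self._sessions[session_id] = (cipher_suite, keys)
        return session_id

    def handshake_kind(self, client_session_id):
        """Decide which handshake to run for the ID found in a ClientHello."""
        if client_session_id and client_session_id in self._sessions:
            return "abbreviated"
        return "full"

    def lookup(self, session_id):
        """Return the remembered (cipher_suite, keys) pair, or None."""
        return self._sessions.get(session_id)
```

A client connecting for the first time sends an empty session ID and gets a full handshake; when it comes back with the ID it was given, and the server still has the entry, both sides skip the key exchange.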
There are several key exchange algorithms which SSL can use. This is specified by the cipher suite; each key exchange algorithm works with some kinds of server public key. The most common key exchange algorithms are:
- RSA: the server’s key is of type RSA. The client generates a random value (the “pre-master secret” of 48 bytes, out of which 46 are random) and encrypts it with the server’s public key. There is no ServerKeyExchange.
- DHE_RSA: the server’s key is of type RSA, but used only for signature. The actual key exchange uses Diffie-Hellman. The server sends a ServerKeyExchange message containing the DH parameters (modulus, generator) and a newly-generated DH public key; moreover, the server signs this message. The client will respond with a ClientKeyExchange message which also contains a newly-generated DH public key. The DH yields the “pre-master secret”.
- DHE_DSS: like DHE_RSA, but the server has a DSS key (“DSS” is also known as “DSA”). DSS is a signature-only algorithm.
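For the RSA key exchange, the 48-byte pre-master secret can be built as sketched below. The two non-random bytes hold the maximum protocol version offered by the client in its ClientHello (this detail comes from the TLS specification); the function name is made up, and the RSA encryption itself is not shown.

```python
import os


def make_rsa_pre_master_secret(client_max_version=(3, 3)):
    """Return 48 bytes: 2 bytes of protocol version, then 46 random bytes."""
    return bytes(client_max_version) + os.urandom(46)
```

The client would then encrypt this value with the server’s RSA public key and send the result in its ClientKeyExchange.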
Less commonly used key exchange algorithms include:
- DH: the server’s key is of type Diffie-Hellman (we are talking of a certificate which contains a DH key). This used to be “popular” in an administrative way (US federal government mandated its use) when the RSA patent was still active (this was during the previous century). Despite the bureaucratic push, it was never as widely deployed as RSA.
- DH_anon: like the DHE suites, but without the signature from the server. This is a certificate-less cipher suite. By construction, it is vulnerable to Man-in-the-Middle attacks, thus very rarely enabled at all.
- PSK: pre-shared key cipher suites. This is a symmetric-only key exchange, building on a pre-established shared secret.
- SRP: application of the SRP protocol which is a Password Authenticated Key Exchange protocol. Client and server authenticate each other with regard to a shared secret, which can be a low-entropy password (whereas PSK requires a high-entropy shared secret). Very nifty. Not widely supported yet.
- An ephemeral RSA key: like DHE but with a newly-generated RSA key pair. Since generating RSA keys is expensive, this is not a popular option, and was specified only as part of “export” cipher suites which complied with the pre-2000 US export regulations on cryptography (i.e. RSA keys of at most 512 bits). Nobody does that nowadays.
- Variants of the DH* algorithms with elliptic curves. Very fashionable. Should become common in the future.
Certificates and Authentication
Digital certificates are vessels for asymmetric keys. They are intended to solve key distribution. Namely, the client wants to use the server’s public key. The attacker will try to make the client use the attacker’s public key. So the client must have a way to make sure that it is using the right key.
SSL is supposed to use X.509. This is a standard for certificates. Each certificate is signed by a Certification Authority. The idea is that the client inherently knows the public keys of a handful of CAs (these are the “trust anchors” or “root certificates”). With these keys, the client can verify the signature computed by a CA over a certificate which has been issued to the server. This process can be extended recursively: a CA can issue a certificate for another CA (i.e. sign the certificate structure which contains the other CA name and key). A chain of certificates beginning with a root CA and ending with the server’s certificate, with intermediate CA certificates in between, each certificate being signed relatively to the public key which is encoded in the previous certificate, is called, unimaginatively, a certificate chain.
So the client is supposed to do the following:
- Get a certificate chain ending with the server’s certificate. The Certificate message from the server is supposed to contain, precisely, such a chain.
- Validate the chain, i.e. verify all the signatures and names and the various X.509 bits. Also, the client should check revocation status of all the certificates in the chain, which is complex and heavy (Web browsers now do it, more or less, but it is a recent development).
- Verify that the intended server name is indeed written in the server’s certificate. Because the client does not only want to use a validated public key, it also wants to use the public key of a specific server. See RFC 2818 for details on how this is done in an HTTPS context.
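In practice, applications rarely implement these steps themselves; they ask their SSL library to do it. As an example, here is a sketch with Python’s standard `ssl` module, whose default client context enables both chain validation against the system trust anchors and the server name check. The helper `open_verified` is a hypothetical name, and, as far as I know, this default context does not check revocation.

```python
import socket
import ssl

# Default client-side context: validates the chain against the system's
# trust anchors and checks that the server name matches the certificate.
context = ssl.create_default_context()


def open_verified(host, port=443):
    """Connect to host, run the handshake, and return the server certificate.

    Raises ssl.SSLCertVerificationError if the chain or the name is wrong.
    """
    with socket.create_connection((host, port)) as sock:
        # server_hostname is the intended server name; it is both sent to the
        # server and compared with the names in the certificate.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```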
The certification model with X.509 certificates has often been criticized, not really on technical grounds, but rather for politico-economic reasons. It concentrates validation power into the hands of a few players, who are not necessarily well-intentioned, or at least not always competent. Now and again, proposals for other systems are published (e.g. Convergence or DNSSEC) but none has gained wide acceptance (yet).
For certificate-based client authentication, it is entirely up to the server to decide what to do with a client certificate (and also what to do with a client who declined to send a certificate). In the Windows/IIS/Active Directory world, a client certificate should contain an account name as a “User Principal Name” (encoded in a Subject Alt Name extension of the certificate); the server looks it up in its Active Directory server.
Since a handshake is just some messages which are sent as records with the current encryption/compression conventions, nothing theoretically prevents an SSL client and server from doing a second handshake within an established SSL connection. And, indeed, it is supported and it happens in practice.
At any time, the client or the server can initiate a new handshake (the server can send a HelloRequest message to trigger it; the client just sends a ClientHello). A typical situation is the following:
- An HTTPS server is configured to listen to SSL requests.
- A client connects and a handshake is performed.
- Once the handshake is done, the client sends its “applicative data”, which consists of an HTTP request. At that point (and at that point only), the server learns the target path. Up to that point, the URL which the client wishes to reach was unknown to the server (the server might have been made aware of the target server name through a Server Name Indication SSL extension, but this does not include the path).
- Upon seeing the path, the server may learn that this is for a part of its data which is supposed to be accessed only by clients authenticated with certificates. But the server did not ask for a client certificate in the handshake (in particular because not-so-old Web browsers displayed freakish popups when asked for a certificate, in particular if they did not have one, so a server would refrain from asking a certificate if it did not have good reason to believe that the client has one and knows how to use it).
- Therefore, the server triggers a new handshake, this time requesting a certificate.
There is an interesting weakness in the situation I just described; see RFC 5746 for a workaround. In a conceptual way, SSL transfers security characteristics only in the “forward” way. When doing a new handshake, whatever could be known about the client before the new handshake is still valid after (e.g. if the client had sent a good username+password within the tunnel) but not the other way round. In the situation above, the first HTTP request which was received before the new handshake is not covered by the certificate-based authentication of the second handshake, and it would have been chosen by the attacker! Unfortunately, some Web servers just assumed that the client authentication from the second handshake extended to what was sent before that second handshake, and it allowed some nasty tricks from the attacker. RFC 5746 attempts to fix that.
Alert messages are just warning and error messages. They are rather uninteresting except when they could be subverted by some attacks (see later on).
There is an important alert message, called close_notify: it is a message which the client or the server sends when it wishes to close the connection. Upon receiving this message, the server or client must also respond with a close_notify and then consider the tunnel to be closed (but the session is still valid, and can be reused in a later abbreviated handshake). The interesting part is that these alert messages are, like all other records, protected by the encryption and MAC. Thus, the connection closure is covered by the cryptographic umbrella.
This is important in the context of (old) HTTP, where some data can be sent by the server without an explicit “content-length”: the data extends until the end of the transport stream. Old HTTP with SSLv2 (which did not have the close_notify) allowed an attacker to force a connection close (at the TCP level) which the client would have taken for a normal close; thus, the attacker could truncate the data without being caught. This is one of the problems with SSLv2 (arguably, the worst) and SSLv3 fixes it. Note that “modern” HTTP uses “Content-Length” headers and/or chunked encoding, which is not vulnerable to such truncation, even if the SSL layer allowed it. Still, it is nice to know that SSL offers protection on closure events.
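For reference, here is what a close_notify looks like before the record protection is applied. The two-byte alert body (a level, then a description code, with “warning” = 1 and close_notify = 0) comes from the TLS specification, not from the description above; in an established tunnel these two bytes are MACed and encrypted like any other record data, so this cleartext form is only a sketch.

```python
ALERT_RECORD_TYPE = 21   # record type "alert"
LEVEL_WARNING = 1        # alert level, per the TLS specification
CLOSE_NOTIFY = 0         # alert description code, per the TLS specification


def cleartext_close_notify(version=(3, 3)):
    """Build an unprotected alert record announcing a clean closure."""
    body = bytes([LEVEL_WARNING, CLOSE_NOTIFY])
    header = bytes([ALERT_RECORD_TYPE, *version]) + len(body).to_bytes(2, "big")
    return header + body
```

A receiver that sees the TCP connection end without having received this alert knows that the stream may have been truncated.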
There is a limit on Stack Exchange answer length, so the description of some attacks on SSL will be in another answer (besides, I have some pancakes to cook). Stay tuned.