Author Archives: teeatom

SQL Injection

This is not so much about security as about efficiency; security comes as a bonus. You certainly realize that queries must be parsed by the server in order to turn them into meaningful instructions that it can execute. It is the same situation as with regular programmes, where source code must be compiled into machine code, but with the compilation work done by the DB server.

The server will have to do that work pretty much each time it receives a SQL query, unless there’s a way to tell it to remember a query which must be executed several times. Preparing a query is really telling that to the server.

The second idea is to allow a query to be parameterized, since programmes often need to run the same query several times with only a few values changing from one iteration to the next. Parameters let this pattern blend naturally with the caching of prepared queries.

As for how this whole process works, as said earlier, this is just compilation. SQL is parsed and transformed into an abstract syntax tree; many transformations are then applied to that tree to produce a more optimal version of it, either by eliminating useless clauses or by reformulating them in a way which better leverages the structure of the targeted schema. Then the resulting tree is transformed into the stream of instructions that the server will execute each time the EXECUTE command is issued. Depending on where parameters are placed in the SQL query, some compilation steps may be delayed until after the initial parsing, as the values may dictate how the optimization should be conducted.

The main reason why injections cannot work on prepared queries is that injections rely on SQL syntax. After a statement is prepared, i.e. at the time when the injection data is actually submitted to the server, SQL parsing has already been done, so the data can no longer alter the syntax. Parameters are taken “as-is”, without any form of syntactic interpretation within the surrounding query: they are solely coerced to the type required by the expression in which their associated bindings appear.

For instance, the classical injection method is to cut short the statement, insert some other commands, and then add a final command to ensure that the trailing SQL code does not produce a syntax error.

 SELECT * FROM table WHERE x = ? AND k = 1

If in the above query, we replace the question mark by the following: 0; DROP TABLE table, we realize an injection. The meaning of the statement has been diverted from its original intent, and it will now execute unwanted code. However, the trailing AND k = 1 will produce a syntax error, so we have to append another command which will syntactically correct the whole string, for instance ; SELECT 0 FROM table WHERE 1 = 1.

This injection can only work if the placeholder replacement happens before parsing. Otherwise the whole string bound to ? will be just a string, without any other syntactic significance. Its coercion to another type (int, enum, etc.) may produce an error, though.
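To make the "replacement before parsing" case concrete, here is a small Java sketch. It involves no database at all: `substituteBeforeParsing` and `statementCount` are hypothetical helpers that only imitate a client which pastes the value into the SQL text, and the statement count is a naive split on semicolons that ignores string literals.

```java
final class NaiveSubstitutionDemo {
    // Hypothetical helper: imitates a client that pastes the value into the
    // SQL text BEFORE the server ever parses it (the vulnerable pattern).
    static String substituteBeforeParsing(String template, String value) {
        return template.replace("?", value);
    }

    // Naive count of statements: splits on ';' and ignores string literals.
    // Good enough for this illustration, not a real SQL tokenizer.
    static int statementCount(String sql) {
        return sql.split(";").length;
    }

    public static void main(String[] args) {
        String template = "SELECT * FROM table WHERE x = ? AND k = 1";
        String payload = "0; DROP TABLE table; SELECT 0 FROM table WHERE 1 = 1";

        String sent = substituteBeforeParsing(template, payload);
        System.out.println(sent);
        System.out.println("statements seen by the parser: " + statementCount(sent));
    }
}
```

With a prepared statement the server parses the template alone, so it only ever sees one statement; the payload arrives later as a single value for x and at worst fails its type coercion.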

The Wikipedia page on the topic does a much better job of covering it than this humble answer, with many examples and references; don’t hesitate to consult it for a better understanding.

SQL Injection Prevention Cheat Sheet


Last revision (mm/dd/yy): 11/5/2015


This article is focused on providing clear, simple, actionable guidance for preventing SQL Injection flaws in your applications. SQL Injection attacks are unfortunately very common, and this is due to two factors:

  1. the significant prevalence of SQL Injection vulnerabilities, and
  2. the attractiveness of the target (i.e., the database typically contains all the interesting/critical data for your application).

It’s somewhat shameful that there are so many successful SQL Injection attacks occurring, because it is EXTREMELY simple to avoid SQL Injection vulnerabilities in your code.

SQL Injection flaws are introduced when software developers create dynamic database queries that include user supplied input. Avoiding SQL injection flaws is simple. Developers need to either: a) stop writing dynamic queries; and/or b) prevent user supplied input which contains malicious SQL from affecting the logic of the executed query.

This article provides a set of simple techniques for preventing SQL Injection vulnerabilities by avoiding these two problems. These techniques can be used with practically any kind of programming language with any type of database. There are other types of databases, like XML databases, which can have similar problems (e.g., XPath and XQuery injection) and these techniques can be used to protect them as well.

Primary Defenses:

  • Option #1: Use of Prepared Statements (Parameterized Queries)
  • Option #2: Use of Stored Procedures
  • Option #3: Escaping all User Supplied Input

Additional Defenses:

  • Also Enforce: Least Privilege
  • Also Perform: White List Input Validation

Unsafe Example

SQL injection flaws typically look like this:

The following (Java) example is UNSAFE, and would allow an attacker to inject code into the query that would be executed by the database. The unvalidated “customerName” parameter that is simply appended to the query allows an attacker to inject any SQL code they want. Unfortunately, this method for accessing databases is all too common.

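 String query = "SELECT account_balance FROM user_data WHERE user_name = "
   + request.getParameter("customerName"); // UNSAFE: user input is concatenated into the SQL text
 try {
 	Statement statement = connection.createStatement( … );
 	ResultSet results = statement.executeQuery( query );
 	// … result set handling
 } catch (SQLException se) {
 	// … logging and error handling
 }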

Primary Defenses

Defense Option 1: Prepared Statements (with Parameterized Queries)

The use of prepared statements with variable binding (aka parameterized queries) is how all developers should first be taught how to write database queries. They are simple to write, and easier to understand than dynamic queries. Parameterized queries force the developer to first define all the SQL code, and then pass in each parameter to the query later. This coding style allows the database to distinguish between code and data, regardless of what user input is supplied.

Prepared statements ensure that an attacker is not able to change the intent of a query, even if SQL commands are inserted by an attacker. In the safe example below, if an attacker were to enter the userID of tom' or '1'='1, the parameterized query would not be vulnerable and would instead look for a username which literally matched the entire string tom' or '1'='1.

Language specific recommendations:

  • Java EE – use PreparedStatement() with bind variables
  • .NET – use parameterized queries like SqlCommand() or OleDbCommand() with bind variables
  • PHP – use PDO with strongly typed parameterized queries (using bindParam())
  • Hibernate – use createQuery() with bind variables (called named parameters in Hibernate)
  • SQLite – use sqlite3_prepare() to create a statement object

In rare circumstances, prepared statements can harm performance. When confronted with this situation, it is best to either a) strongly validate all data or b) escape all user supplied input using an escaping routine specific to your database vendor as described below, rather than using a prepared statement.

Safe Java Prepared Statement Example

The following code example uses a PreparedStatement, Java’s implementation of a parameterized query, to execute the same database query.

 String custname = request.getParameter("customerName"); // This should REALLY be validated too
 // perform input validation to detect attacks
 String query = "SELECT account_balance FROM user_data WHERE user_name = ? ";
 PreparedStatement pstmt = connection.prepareStatement( query );
 pstmt.setString( 1, custname); 
 ResultSet results = pstmt.executeQuery( );

Safe C# .NET Prepared Statement Example

With .NET, it’s even more straightforward. The creation and execution of the query doesn’t change. All you have to do is simply pass the parameters to the query using the Parameters.Add() call as shown here.

 String query = 
 	 "SELECT account_balance FROM user_data WHERE user_name = ?";
 try {
 	OleDbCommand command = new OleDbCommand(query, connection);
 	// Bind the user input as a parameter instead of concatenating it
 	command.Parameters.Add(new OleDbParameter("customerName", CustomerName.Text));
 	OleDbDataReader reader = command.ExecuteReader();
 	// …
 } catch (OleDbException se) {
 	// error handling
 }

We have shown examples in Java and .NET but practically all other languages, including Cold Fusion, and Classic ASP, support parameterized query interfaces. Even SQL abstraction layers, like the Hibernate Query Language (HQL) have the same type of injection problems (which we call HQL Injection). HQL supports parameterized queries as well, so we can avoid this problem:

Hibernate Query Language (HQL) Prepared Statement (Named Parameters) Examples
 First is an unsafe HQL Statement
 Query unsafeHQLQuery = session.createQuery("from Inventory where productID='"+userSuppliedParameter+"'");
 Here is a safe version of the same query using named parameters
 Query safeHQLQuery = session.createQuery("from Inventory where productID=:productid");
 safeHQLQuery.setParameter("productid", userSuppliedParameter);

For examples of parameterized queries in other languages, including Ruby, PHP, Cold Fusion, and Perl, see the Query Parameterization Cheat Sheet.

Developers tend to like the Prepared Statement approach because all the SQL code stays within the application. This makes your application relatively database independent.

Defense Option 2: Stored Procedures

Stored procedures have the same effect as the use of prepared statements when implemented safely* which is the norm for most stored procedure languages. They require the developer to just build SQL statements with parameters which are automatically parameterized unless the developer does something largely out of the norm. The difference between prepared statements and stored procedures is that the SQL code for a stored procedure is defined and stored in the database itself, and then called from the application. Both of these techniques have the same effectiveness in preventing SQL injection so your organization should choose which approach makes the most sense for you.

*Note: ‘Implemented safely’ means the stored procedure does not include any unsafe dynamic SQL generation. Developers do not usually generate dynamic SQL inside stored procedures. However, it can be done, but should be avoided. If it can’t be avoided, the stored procedure must use input validation or proper escaping as described in this article to make sure that all user supplied input to the stored procedure can’t be used to inject SQL code into the dynamically generated query. Auditors should always look for uses of sp_execute, execute or exec within SQL Server stored procedures. Similar audit guidelines are necessary for similar functions for other vendors.

There are also several cases where stored procedures can increase risk. For example, on MS SQL Server, you have 3 main default roles: db_datareader, db_datawriter and db_owner. Before stored procedures came into use, DBAs would give db_datareader or db_datawriter rights to the webservice’s user, depending on the requirements. However, stored procedures require execute rights, a role that is not available by default. Some setups where the user management has been centralized, but is limited to those 3 roles, cause all web apps to run under db_owner rights so that stored procedures can work. Naturally, that means that if a server is breached the attacker has full rights to the database, where previously they might only have had read access.

Safe Java Stored Procedure Example

The following code example uses a CallableStatement, Java’s implementation of the stored procedure interface, to execute the same database query. The “sp_getAccountBalance” stored procedure would have to be predefined in the database and implement the same functionality as the query defined above.

 String custname = request.getParameter("customerName"); // This should REALLY be validated
 try {
 	CallableStatement cs = connection.prepareCall("{call sp_getAccountBalance(?)}");
 	cs.setString(1, custname);
 	ResultSet results = cs.executeQuery();		
 	// … result set handling 
 } catch (SQLException se) {			
 	// … logging and error handling
 }

Safe VB .NET Stored Procedure Example

The following code example uses a SqlCommand, .NET’s implementation of the stored procedure interface, to execute the same database query. The “sp_getAccountBalance” stored procedure would have to be predefined in the database and implement the same functionality as the query defined above.

 Try
 	Dim command As SqlCommand = new SqlCommand("sp_getAccountBalance", connection)
 	command.CommandType = CommandType.StoredProcedure
 	command.Parameters.Add(new SqlParameter("@CustomerName", CustomerName.Text))
 	Dim reader As SqlDataReader = command.ExecuteReader()
 	' …
 Catch se As SqlException 
 	' error handling
 End Try

Defense Option 3: Escaping All User Supplied Input

This third technique is to escape user input before putting it in a query. However, this methodology is frail compared to using parameterized queries and we cannot guarantee it will prevent all SQL Injection in all situations. This technique should only be used, with caution, to retrofit legacy code in a cost effective way. Applications built from scratch, or applications requiring low risk tolerance, should be built or re-written using parameterized queries.

This technique works like this. Each DBMS supports one or more character escaping schemes specific to certain kinds of queries. If you then escape all user supplied input using the proper escaping scheme for the database you are using, the DBMS will not confuse that input with SQL code written by the developer, thus avoiding any possible SQL injection vulnerabilities.

To find the ESAPI javadoc specifically for the database encoders, look at the ‘Codec’ interface. There are lots of Codecs implemented; the two database specific codecs are OracleCodec and MySQLCodec, both listed under ‘All Known Implementing Classes:’ at the top of the Interface Codec page.

At this time, ESAPI has database encoders for:

  • Oracle
  • MySQL (Both ANSI and native modes are supported)

Database encoders for the following are forthcoming:

  • SQL Server
  • PostgreSQL

If your database encoder is missing, please let us know.

Database Specific Escaping Details

If you want to build your own escaping routines, here are the escaping details for each of the databases that we have developed ESAPI Encoders for:

Oracle Escaping

This information is based on the Oracle escape character documentation.

Escaping Dynamic Queries

To use an ESAPI database codec is pretty simple. An Oracle example looks something like:

 ESAPI.encoder().encodeForSQL( new OracleCodec(), queryparam );

So, if you had an existing Dynamic query being generated in your code that was going to Oracle that looked like this:

 String query = "SELECT user_id FROM user_data WHERE user_name = '" + req.getParameter("userID") 
 + "' and user_password = '" + req.getParameter("pwd") +"'";
 try {
     Statement statement = connection.createStatement( … );
     ResultSet results = statement.executeQuery( query );
     // … result set handling
 } catch (SQLException se) {
     // … logging and error handling
 }

You would rewrite the first line to look like this:

Codec ORACLE_CODEC = new OracleCodec();
 String query = "SELECT user_id FROM user_data WHERE user_name = '" + 
   ESAPI.encoder().encodeForSQL( ORACLE_CODEC, req.getParameter("userID")) + "' and user_password = '"
   + ESAPI.encoder().encodeForSQL( ORACLE_CODEC, req.getParameter("pwd")) +"'";

And it would now be safe from SQL injection, regardless of the input supplied.

For maximum code readability, you could also construct your own OracleEncoder.

 Encoder oe = new OracleEncoder();
 String query = "SELECT user_id FROM user_data WHERE user_name = '" 
   + oe.encode( req.getParameter("userID")) + "' and user_password = '" 
   + oe.encode( req.getParameter("pwd")) +"'";

With this type of solution, all your developers would have to do is wrap each user supplied parameter being passed in into an ESAPI.encoder().encodeForOracle( ) call or whatever you named it, and you would be done.

Turn off character replacement

Use SET DEFINE OFF or SET SCAN OFF to ensure that automatic character replacement is turned off. If this character replacement is turned on, the & character will be treated like a SQL*Plus variable prefix that could allow an attacker to retrieve private data.


Escaping Wildcard characters in Like Clauses

The LIKE keyword allows for text scanning searches. In Oracle, the underscore ‘_’ character matches exactly one character, while the percent sign ‘%’ matches zero or more occurrences of any characters. These characters must be escaped in LIKE clause criteria. For example:

SELECT name FROM emp 
WHERE id LIKE '%/_%' ESCAPE '/';
SELECT name FROM emp 
WHERE id LIKE '%\%%' ESCAPE '\';
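
The same escaping can be applied in application code before the value is placed in a LIKE pattern. The following Java sketch is only an illustration: `escapeLikeLiteral` is a hypothetical helper, and the escape character passed to it must match the one named in the ESCAPE clause of the query.

```java
final class LikePatternEscaper {
    // Hypothetical helper: prefixes the LIKE wildcards '%' and '_' (and the
    // escape character itself) so the value is matched literally.
    static String escapeLikeLiteral(String literal, char escapeChar) {
        StringBuilder out = new StringBuilder(literal.length() + 8);
        for (char c : literal.toCharArray()) {
            if (c == '%' || c == '_' || c == escapeChar) {
                out.append(escapeChar);
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Intended for use with: ... WHERE id LIKE ? ESCAPE '/'
        String pattern = "%" + escapeLikeLiteral("50%_off", '/') + "%";
        System.out.println(pattern);
    }
}
```

This only neutralizes the wildcards. The resulting pattern should still reach the database through a bound parameter, or through the quote escaping described in this section.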

Oracle 10g escaping

An alternative for Oracle 10g and later is to place { and } around the string to escape the entire string. However, you have to be careful that there isn’t a } character already in the string. You must search for these and if there is one, then you must replace it with }}. Otherwise that character will end the escaping early, and may introduce a vulnerability.

MySQL Escaping

MySQL supports two escaping modes:

  1. ANSI_QUOTES SQL mode, and
  2. a mode with this off, which we call MySQL mode.

ANSI SQL mode: Simply encode all ' (single tick) characters with '' (two single ticks).

MySQL mode: do the following:

 NUL (0x00) --> \0  [This is a zero, not the letter O]
 BS  (0x08) --> \b
 TAB (0x09) --> \t
 LF  (0x0a) --> \n
 CR  (0x0d) --> \r
 SUB (0x1a) --> \Z
 "   (0x22) --> \"
 %   (0x25) --> \%
 '   (0x27) --> \'
 \   (0x5c) --> \\
 _   (0x5f) --> \_ 
 all other non-alphanumeric characters with ASCII values less than 256  --> \c
 where 'c' is the original non-alphanumeric character.

This information is based on the MySQL escape character documentation.
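
As a rough illustration of the MySQL mode table above, here is a Java sketch of such an escaping routine. `MySqlModeEscaper` is a hypothetical class, not the ESAPI MySQLCodec, and it assumes that "alphanumeric" means ASCII letters and digits only.

```java
final class MySqlModeEscaper {
    // Applies the mapping from the table: named control characters get their
    // short escapes; every other non-alphanumeric character below 256 is
    // prefixed with a backslash (this covers ", %, ', \ and _).
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case 0x00: out.append("\\0"); break;
                case 0x08: out.append("\\b"); break;
                case 0x09: out.append("\\t"); break;
                case 0x0a: out.append("\\n"); break;
                case 0x0d: out.append("\\r"); break;
                case 0x1a: out.append("\\Z"); break;
                default:
                    // Assumption: only ASCII letters and digits count as alphanumeric.
                    boolean alphanumeric = (c >= '0' && c <= '9')
                            || (c >= 'A' && c <= 'Z')
                            || (c >= 'a' && c <= 'z');
                    if (!alphanumeric && c < 256) {
                        out.append('\\');
                    }
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("O'Reilly"));
        System.out.println(escape("100%_done"));
    }
}
```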

SQL Server Escaping

We have not implemented the SQL Server escaping routine yet, but the following has good pointers to articles describing how to prevent SQL injection attacks on SQL server

DB2 Escaping

This information is based on DB2 WebQuery special characters, as well as some information from Oracle’s JDBC DB2 driver.

Information in regards to differences between several DB2 Universal drivers can be found here:

Hex-encoding all input

A somewhat special case of escaping is the process of hex-encoding the entire string received from the user (this can be seen as escaping every character). The web application should hex-encode the user input before including it in the SQL statement. The SQL statement should take this fact into account, and compare the data accordingly. For example, if we have to look up a record matching a sessionID, and the user transmitted the string abc123 as session ID, the select statement would be:

   SELECT ... FROM session
   WHERE hex_encode (sessionID) = '616263313233'

(hex_encode should be replaced by the particular facility for the database being used). The string 616263313233 is the hex encoded version of the string received from the user (it is the sequence of hex values of the ASCII/UTF-8 codes of the user data).

If an attacker were to transmit a string containing a single-quote character followed by their attempt to inject SQL code, the constructed SQL statement will only look like:

   WHERE hex_encode ( ... ) = '2720 ... '

27 being the ASCII code (in hex) of the single-quote, which is simply hex-encoded like any other character in the string. The resulting SQL can only contain numeric digits and a to f letters, and never any special character that could enable an SQL injection.
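
On the application side, the encoding step could look like the following Java sketch. `hexEncode` is a hypothetical helper name, and the sketch assumes the input is encoded as UTF-8 before being converted to hex digits.

```java
import java.nio.charset.StandardCharsets;

final class HexEncodeInput {
    // Turns every byte of the UTF-8 encoding of the input into two lowercase
    // hex digits, so the result contains only 0-9 and a-f.
    static String hexEncode(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            out.append(String.format("%02x", b & 0xff));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(hexEncode("abc123"));
        System.out.println(hexEncode("' OR 1=1 --"));
    }
}
```

Because the output alphabet is limited to hex digits, no quote or comment marker can survive the encoding, whatever the attacker sends.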

Additional Defenses

Beyond adopting one of the three primary defenses, we also recommend adopting all of these additional defenses in order to provide defense in depth. These additional defenses are:

  • Least Privilege
  • White List Input Validation

Least Privilege

To minimize the potential damage of a successful SQL injection attack, you should minimize the privileges assigned to every database account in your environment. Do not assign DBA or admin type access rights to your application accounts. We understand that this is easy, and everything just ‘works’ when you do it this way, but it is very dangerous. Start from the ground up to determine what access rights your application accounts require, rather than trying to figure out what access rights you need to take away. Make sure that accounts that only need read access are only granted read access to the tables they need access to. If an account only needs access to portions of a table, consider creating a view that limits access to that portion of the data and assigning the account access to the view instead, rather than the underlying table. Rarely, if ever, grant create or delete access to database accounts.

If you adopt a policy where you use stored procedures everywhere, and don’t allow application accounts to directly execute their own queries, then restrict those accounts to only be able to execute the stored procedures they need. Don’t grant them any rights directly to the tables in the database.

SQL injection is not the only threat to your database data. Attackers can simply change the parameter values from one of the legal values they are presented with, to a value that is unauthorized for them, but the application itself might be authorized to access. As such, minimizing the privileges granted to your application will reduce the likelihood of such unauthorized access attempts, even when an attacker is not trying to use SQL injection as part of their exploit.

While you are at it, you should minimize the privileges of the operating system account that the DBMS runs under. Don’t run your DBMS as root or system! Most DBMSs run out of the box with a very powerful system account. For example, MySQL runs as system on Windows by default! Change the DBMS’s OS account to something more appropriate, with restricted privileges.

Multiple DB Users

The designer of web applications should avoid using the same owner/admin account in the web applications to connect to the database. Beyond that, different DB users could be used for different web applications. In general, each separate web application that requires access to the database could have a designated database user account that the web-app will use to connect to the DB. That way, the designer of the application can have good granularity in the access control, thus reducing the privileges as much as possible. Each DB user will then have select access to what it needs only, and write-access as needed.

As an example, a login page requires read access to the username and password fields of a table, but no write access of any form (no insert, update, or delete). However, the sign-up page certainly requires insert privilege to that table; this restriction can only be enforced if these web apps use different DB users to connect to the database.


SQL views can further increase the granularity of access by limiting the read access to specific fields of a table or joins of tables. It could potentially have additional benefits: for example, suppose that the system is required (perhaps due to some specific legal requirements) to store the passwords of the users, instead of salted-hashed passwords. The designer could use views to compensate for this limitation; revoke all access to the table (from all DB users except the owner/admin) and create a view that outputs the hash of the password field and not the field itself. Any SQL injection attack that succeeds in stealing DB information will be restricted to stealing the hash of the passwords (could even be a keyed hash), since no DB user for any of the web applications has access to the table itself.

White List Input Validation

Input validation can be used to detect unauthorized input before it is passed to the SQL query. For more information please see the Input Validation Cheat Sheet. Proceed with caution here. Validated data is not necessarily safe to insert into SQL queries via string building.
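
A minimal Java sketch of white list validation follows. Everything in it is an assumption made for illustration: the class name, the set of permitted sort columns, and the pattern used for customer names.

```java
import java.util.Set;
import java.util.regex.Pattern;

final class AllowListValidation {
    // Hypothetical allow list: the only identifiers the application will accept.
    private static final Set<String> SORT_COLUMNS = Set.of("user_name", "account_balance");

    // Hypothetical pattern: a letter, then up to 63 letters, spaces, dots,
    // apostrophes or hyphens.
    private static final Pattern CUSTOMER_NAME = Pattern.compile("[A-Za-z][A-Za-z .'-]{0,63}");

    // Returns the column only if it is one of the known names, else rejects it.
    static String requireSortColumn(String requested) {
        if (requested == null || !SORT_COLUMNS.contains(requested)) {
            throw new IllegalArgumentException("unexpected sort column");
        }
        return requested;
    }

    static boolean isPlausibleCustomerName(String value) {
        return value != null && CUSTOMER_NAME.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPlausibleCustomerName("O'Brien"));
        System.out.println(isPlausibleCustomerName("tom' or '1'='1"));
        System.out.println(requireSortColumn("user_name"));
    }
}
```

Note that a name such as O'Brien passes this validation and still contains a single quote, which is exactly why validated data should reach the query through a bound parameter rather than string building.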


Cookie path


The cookie path doesn’t provide any security (in most real-world situations).

It is important to understand that the cookie spec is ancient technology. It dates back from the earliest days of the web. The security model of the web has evolved since then, and become more carefully thought-out. The security model for cookies hasn’t evolved correspondingly.

As another example of impedance mismatches between the web’s security model and cookies, the same-origin policy can treat two URLs as different origins even though they are treated as identical for purposes of cookies. You can find more discussion of security oddities with cookies from Michal Zalewski.

Cookies can be read by JavaScript. While the browser may take the path into account when JavaScript tries to read cookies, this is not a security feature: the path is not a security boundary, so malicious JavaScript on one page can still read cookies for other paths (e.g., by opening an invisible iframe with the proper path, injecting malicious JavaScript into it, and then grabbing the cookie). The only effective security boundary is at the granularity of an origin. As a result, the bottom line from a security perspective is: malicious JavaScript on any page of an origin can read a cookie whose path points elsewhere on that same origin.

In practice, developers typically avoid these corner cases that are left over from earlier days, or at least avoid relying upon them to provide extra security. For instance, web developers should not rely upon the cookie-path to provide security (at best it reduces the number of cookies sent back, which could perhaps be used to reduce bandwidth in some situations). As another example, sites these days usually avoid serving content from non-standard port numbers, since that situation is another corner case that exposes unexpected semantics.

At this point, the cookie path is mostly a vestigial remnant of earlier days. It doesn’t serve much purpose any longer, as far as I can tell, and if the cookie-path had never been introduced, today you’d probably never notice. But browsers still need to support it, for backwards-compatibility reasons. And so it goes, on the web. It is best to think of the web not as a carefully designed artifact but as something that evolved over time — and as a result, has accumulated now-useless gunk, like our appendix.

TLS and its key exchange

You may use a key exchange (as part of a cipher suite) only if the server key type and certificate match. To see this in detail, let’s have a look at the cipher suites defined in the TLS 1.2 specification. Each cipher suite defines the key exchange algorithm, as well as the subsequently used symmetric encryption and integrity check algorithms; we concentrate here on the key exchange part.

  • RSA: the key exchange works by encrypting a random value (chosen by the client) with the server public key. This requires that the server public key is an RSA key, and that the server certificate does not prohibit encryption (mainly through the “Key Usage” certificate extension: if that extension is present, it must include the “keyEncipherment” flag).
  • DH_RSA: the key exchange is a static Diffie-Hellman: the server public key must be a Diffie-Hellman key; moreover, that certificate must have been issued by a Certification Authority which itself was using an RSA key (the CA key is the key which was used to sign the server certificate).
  • DH_DSS: like DH_RSA, except that the CA used a DSA key.
  • DHE_RSA: the key exchange is an ephemeral Diffie-Hellman: the server dynamically generates a DH public key and sends it to the client; the server also signs what it sends. For DHE_RSA, the server public key must be of type RSA, and its certificate must be appropriate for signatures (the Key Usage extension, if present, must include the digitalSignature flag).
  • DHE_DSS: like DHE_RSA, except that the server key has type DSA.
  • DH_anon: there is no server certificate. The server uses a Diffie-Hellman key that it may have dynamically generated. The “anon” cipher suites are vulnerable to impersonating attacks (including, but not limited to, the “Man in the Middle”) since they lack any kind of server authentication. On a general basis, you shall not use an “anon” cipher suite.

Key exchange algorithms which use elliptic-curve cryptography are specified in another RFC and propose the following:

  • ECDH_ECDSA: like DH_DSS, but with elliptic curves: the server public key must be an ECDH key, in a certificate issued by a CA which itself was using an ECDSA public key.
  • ECDH_RSA: like ECDH_ECDSA, but the issuing CA has an RSA key.
  • ECDHE_ECDSA: the server sends a dynamically generated EC Diffie-Hellman key, and signs it with its own key, which must have type ECDSA. This is equivalent to DHE_DSS, but with elliptic curves for both the Diffie-Hellman part and the signature part.
  • ECDHE_RSA: like ECDHE_ECDSA, but the server public key is an RSA key, used for signing the ephemeral elliptic-curve Diffie-Hellman key.
  • ECDH_anon: an “anon” cipher suite, with dynamic elliptic-curve Diffie-Hellman.
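
The matching rule between key exchange and server key type can be summarized in a small lookup table. The Java sketch below is only a restatement of the two lists above: the class and method names are hypothetical, the key type labels are informal strings, and constraints on the issuing CA or on the Key Usage extension are not modelled.

```java
import java.util.Map;

final class KeyExchangeRequirements {
    // Key exchange name -> type of the public key in the server certificate.
    static final Map<String, String> SERVER_KEY_TYPE = Map.ofEntries(
            Map.entry("RSA", "RSA"),
            Map.entry("DH_RSA", "DH"),
            Map.entry("DH_DSS", "DH"),
            Map.entry("DHE_RSA", "RSA"),
            Map.entry("DHE_DSS", "DSA"),
            Map.entry("ECDH_ECDSA", "ECDH"),
            Map.entry("ECDH_RSA", "ECDH"),
            Map.entry("ECDHE_ECDSA", "ECDSA"),
            Map.entry("ECDHE_RSA", "RSA"));

    // True if a server holding a key of the given type could offer this key
    // exchange. The "anon" suites use no certificate, so any server can offer
    // them (which is precisely why they should not be used).
    static boolean usable(String keyExchange, String serverKeyType) {
        if (keyExchange.endsWith("_anon")) {
            return true;
        }
        return serverKeyType.equals(SERVER_KEY_TYPE.get(keyExchange));
    }

    public static void main(String[] args) {
        System.out.println(usable("DHE_RSA", "RSA"));
        System.out.println(usable("DH_RSA", "RSA"));
    }
}
```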

Diffie-Hellman is used in SSL/TLS as “ephemeral Diffie-Hellman” (the cipher suites with “DHE” in their name; see the standard). What is very rarely encountered is “static Diffie-Hellman” (cipher suites with “DH” in their name, but neither “DHE” nor “DH_anon”): these cipher suites require that the server owns a certificate with a DH public key in it, which is rarely supported for a variety of historical and economic reasons. The main one is the availability of a free standard for RSA (PKCS#1), while the corresponding standard for Diffie-Hellman (X9.42) costs a hundred bucks, which is not much, but sufficient to deter most amateur developers.

Diffie-Hellman is a key agreement protocol, meaning that if two parties (say, the SSL client and the SSL server) run this protocol, they end up with a shared secret K. However, neither client nor server gets to choose the value of K; from their points of view, K looks randomly generated. It is secret (only they know K; eavesdroppers on the line do not) and shared (they both get the same value K), but not chosen. This is not encryption. A shared secret K is good enough, though, to process terabytes of data with a symmetric encryption algorithm (same K to encrypt on one side and decrypt on the other), and that is what happens in SSL.
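
The "shared but not chosen" property can be seen with a toy computation. The Java sketch below uses deliberately tiny, insecure parameters (p = 23, g = 5) and hypothetical class and method names; real deployments use large standardized groups and randomly generated private values.

```java
import java.math.BigInteger;

final class ToyDiffieHellman {
    // Toy group parameters, far too small for any real use.
    static final BigInteger P = BigInteger.valueOf(23);
    static final BigInteger G = BigInteger.valueOf(5);

    // What each side sends over the wire: g^private mod p.
    static BigInteger publicValue(BigInteger privateValue) {
        return G.modPow(privateValue, P);
    }

    // What each side computes locally: peerPublic^private mod p.
    static BigInteger sharedSecret(BigInteger peerPublic, BigInteger privateValue) {
        return peerPublic.modPow(privateValue, P);
    }

    public static void main(String[] args) {
        BigInteger clientPrivate = BigInteger.valueOf(6);   // fixed here; random in practice
        BigInteger serverPrivate = BigInteger.valueOf(15);  // fixed here; random in practice

        BigInteger kClient = sharedSecret(publicValue(serverPrivate), clientPrivate);
        BigInteger kServer = sharedSecret(publicValue(clientPrivate), serverPrivate);
        System.out.println("both sides agree: " + kClient.equals(kServer));
    }
}
```

Neither side picks K directly: it falls out of both private values, which is why the result is then fed to a symmetric cipher instead of being used to carry a message.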

There is a well-known asymmetric encryption algorithm called RSA, though. With RSA, the sender can encrypt a message M with the recipient’s public key, and the recipient can decrypt it and recover M using his private key. This time, the sender can choose the contents M. So your question might be: in an RSA world, why do we bother with AES at all? The answer lies in the following points:

  • There are constraints on M. If the recipient’s public key has size n (in bytes, e.g. n = 256 for a 2048-bit RSA key), then the maximum size of M is n-11 bytes. In order to encrypt a longer message, we would have to split it into sufficiently small blocks, and include some reassembly mechanism. Nobody really knows how to do that securely. We have good reasons to believe that RSA on a single message is safe, but subtle weaknesses can lurk in any split-and-reassembly system and we are not comfortable with that. It is already bad enough with symmetric ciphers, where the mathematical situation is simpler.
  • Even if we could handle the splitting-and-reassembly, there would be a size expansion. With a 2048-bit RSA key, an internal message chunk has size at most 245 bytes, but yields, when encrypted, a 256-byte sequence. This wastes our lifeforce, i.e. network bandwidth. Symmetric encryption incurs only a bounded overhead (well, SSL adds a slight overhead proportional to the data size, but it is much smaller than what would occur with a RSA-only protocol).
  • Compared to AES, RSA is slow as Hell.
  • We really like to have the option of using key agreement protocols like DH instead of RSA. In older times (before 2001), RSA was patented but not DH, so the US government was recommending DH. Nowadays, we want to be able to switch algorithms in case one becomes broken. In order to support key agreement protocols, we need some symmetric encryption, so we may just as well use it with RSA. It simplifies implementation and protocol analysis.
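The size figures from the first two points can be checked with a few lines of arithmetic. This is only a sketch: the function names are made up, and the 11-byte overhead is the "n-11" rule quoted above (it corresponds to the classic PKCS#1 v1.5 padding).

```python
import math

PADDING_OVERHEAD = 11  # minimum padding bytes, as in the "n-11" rule above


def max_chunk(key_bytes):
    """Largest plaintext chunk for a key of key_bytes bytes."""
    return key_bytes - PADDING_OVERHEAD


def rsa_only_size(message_len, key_bytes=256):
    """Bytes on the wire if message_len bytes were split and RSA-encrypted."""
    chunks = math.ceil(message_len / max_chunk(key_bytes))
    return chunks * key_bytes


one_megabyte = 1_000_000
wire = rsa_only_size(one_megabyte)
expansion = wire / one_megabyte   # a bit under 1.05 with a 2048-bit key
```

So an RSA-only protocol would pay roughly 4.5% extra bandwidth with a 2048-bit key, on top of the speed penalty.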

Since the general concept of SSL has already been covered in some other questions (e.g. this one and that one), this time I will go for details. Details are important. This answer is going to be somewhat verbose.


SSL is a protocol with a long history and several versions. First prototypes came from Netscape, when they were developing the first versions of their flagship browser, Netscape Navigator (this browser killed off Mosaic in the early times of the Browser Wars, which are still raging, albeit with new competitors). Version 1 has never been made public so we do not know what it looked like. SSL version 2 is described in a draft which can be read there; it has a number of weaknesses, some of them rather serious, so it is deprecated and newer SSL/TLS implementations do not support it (while older ones have it deactivated by default). I will not speak of SSL version 2 any further, except as an occasional reference.

SSL version 3 (which I will call “SSLv3”) was an enhanced protocol which still works today and is widely supported. Although still a property of Netscape Communications (or whoever owns that nowadays), the protocol has been published as an “historical RFC” (RFC 6101). Meanwhile, the protocol has been standardized, with a new name in order to avoid legal issues; the new name is TLS.

Three versions of TLS have been produced so far, each with its dedicated RFC: TLS 1.0, TLS 1.1 and TLS 1.2. They are internally very similar with each other, and with SSLv3, to the point that an implementation can easily support SSLv3 and all three TLS versions with at least 95% of the code being common. Still internally, all versions are designated by a version number with the major.minor format; SSLv3 is then 3.0, while the TLS versions are, respectively, 3.1, 3.2 and 3.3. Thus, it is no wonder that TLS 1.0 is sometimes called SSL 3.1 (and it is not incorrect either). SSL 3.0 and TLS 1.0 differ by only some minute details. TLS 1.1 and 1.2 are not yet widely supported, although there is impetus for that, because of possible weaknesses (see below, for the “BEAST attack”). SSLv3 and TLS 1.0 are supported “everywhere” (even IE 6.0 knows them).


SSL aims at providing a secure bidirectional tunnel for arbitrary data. Consider TCP, the well known protocol for sending data over the Internet. TCP works over the IP “packets” and provides a bidirectional tunnel for bytes; it works for all byte values and sends them into two streams which can operate simultaneously. TCP handles the hard work of splitting the data into packets, acknowledging them, reassembling them back into their right order, while removing duplicates and reemitting lost packets. From the point of view of the application which uses TCP, there are just two streams, and the packets are invisible; in particular, the streams are not split into “messages” (it is up to the application to define its own encoding rules if it wishes to have messages, and that’s precisely what HTTP does).

TCP is reliable in the presence of “accidents”, i.e. transmission errors due to flaky hardware, network congestion, people with smartphones who walk out of range of a given base station, and other non-malicious events. However, an ill-intentioned individual (the “attacker”) with some access to the transport medium could read all the transmitted data and/or alter it intentionally, and TCP does not protect against that. Hence SSL.

SSL assumes that it works over a TCP-like protocol, which provides a reliable stream; SSL does not implement reemission of lost packets and things like that. The attacker is assumed to have the power to disrupt communication completely in an unavoidable way (for instance, he can cut the cables), so SSL’s job is to:

  • detect alterations (the attacker must not be able to alter the data silently);
  • ensure data confidentiality (the attacker must not gain knowledge of the exchanged data).

SSL fulfills these goals to a large (but not absolute) extent.


SSL is layered and the bottom layer is the record protocol. Whatever data is sent in an SSL tunnel is split into records. Over the wire (the underlying TCP socket or TCP-like medium), a record looks like this:

HH V1:V2 L1:L2 data


  • HH is a single byte which indicates the type of data in the record. Four types are defined: change_cipher_spec (20), alert (21), handshake (22) and application_data (23).
  • V1:V2 is the protocol version, over two bytes. For all versions currently defined, V1 has value 0x03, while V2 has value 0x00 for SSLv3, 0x01 for TLS 1.0, 0x02 for TLS 1.1 and 0x03 for TLS 1.2.
  • L1:L2 is the length of data, in bytes (big-endian convention is used: the length is 256*L1+L2). The total length of data cannot exceed 18432 bytes, but in practice it cannot even reach that value.

So a record has a five-byte header, followed by at most 18 kB of data. The data is where symmetric encryption and integrity checks are applied. When a record is emitted, both sender and receiver are supposed to agree on which cryptographic algorithms are currently applied, and with which keys; this agreement is obtained through the handshake protocol, described in the next section. Compression, if any, is also applied at that point.
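The five-byte header is simple enough to decode by hand. A minimal sketch in Python follows; the function and table names are invented for the illustration, while the numeric values are the ones listed above.

```python
import struct

CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert",
                 22: "handshake", 23: "application_data"}
VERSIONS = {(3, 0): "SSLv3", (3, 1): "TLS 1.0",
            (3, 2): "TLS 1.1", (3, 3): "TLS 1.2"}
MAX_RECORD_DATA = 18432


def parse_record_header(header):
    """Decode the five header bytes HH V1:V2 L1:L2."""
    if len(header) != 5:
        raise ValueError("a record header is exactly five bytes")
    # B = one byte, H = two-byte big-endian integer (256*L1 + L2)
    hh, v1, v2, length = struct.unpack("!BBBH", header)
    if length > MAX_RECORD_DATA:
        raise ValueError("record data too long")
    return (CONTENT_TYPES.get(hh, "unknown"),
            VERSIONS.get((v1, v2), "unknown"),
            length)


# 0x16 = 22 (handshake), version 3.1 (TLS 1.0), length 0x002F = 47 bytes
example = parse_record_header(bytes([0x16, 0x03, 0x01, 0x00, 0x2F]))
```

A receiver would read these five bytes first, then read exactly `length` more bytes from the TCP stream to get the record data.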

In full details, the building of a record works like this:

  • Initially, there are some bytes to transfer; these are application data or some other kind of bytes. This payload consists of at most 16384 bytes, but possibly less (a payload of length 0 is legal, but it turns out that Internet Explorer 6.0 does not like that at all).
  • The payload is then compressed with whatever compression algorithm is currently agreed upon. Compression is stateful, and thus may depend upon the contents of previous records. In practice, compression is either “null” (no compression at all) or “Deflate” (RFC 3749), the latter being currently courteously but firmly shown the exit door in the Web context, due to the recent CRIME attack. Compression aims at shortening data, but it must necessarily expand it slightly in some unfavourable situations (due to the pigeonhole principle). SSL allows for an expansion of at most 1024 bytes. Of course, null compression never expands (but never shortens either); Deflate will expand by at most 10 bytes, if the implementation is any good.
  • The compressed payload is then protected against alterations and encrypted. If the current encryption-and-integrity algorithms are “null”, then this step is a no-operation. Otherwise, a MAC is appended, then some padding (depending on the encryption algorithm), and the result is encrypted. These steps again induce some expansion, which the SSL standard limits to 1024 extra bytes (combined with the maximum expansion from the compression step, this brings us to the 18432 bytes, to which we must add the 5-byte header).
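The size limits in the three steps above add up as follows. This is plain arithmetic wrapped in two helper functions whose names are assumptions of this sketch.

```python
MAX_PAYLOAD = 16384             # plaintext bytes per record
MAX_COMPRESSION_GROWTH = 1024   # worst-case expansion allowed for compression
MAX_PROTECTION_GROWTH = 1024    # MAC + padding + encryption
HEADER_LEN = 5


def max_record_on_wire():
    """Return (max data length, max total length with header)."""
    data = MAX_PAYLOAD + MAX_COMPRESSION_GROWTH + MAX_PROTECTION_GROWTH
    return data, data + HEADER_LEN


def records_needed(total_bytes):
    """How many records are needed to carry total_bytes of payload."""
    return max(1, -(-total_bytes // MAX_PAYLOAD))  # ceiling division


limits = max_record_on_wire()
```

For instance, a payload of 16385 bytes no longer fits in one record and has to be split over two.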

The MAC is, usually, HMAC with one of the usual hash functions (mostly MD5, SHA-1 or SHA-256) (with SSLv3, this is not the “true” HMAC but something very similar and, to the best of our knowledge, as secure as HMAC). Encryption will use either a block cipher in CBC mode, or the RC4 stream cipher. Note that, in theory, other kinds of modes or algorithms could be employed, for instance one of these nifty modes which combine encryption and integrity checks; there are even some RFCs for that. In practice, though, deployed implementations do not know of these yet, so they do HMAC and CBC. Crucially, the MAC is first computed and appended to the data, and the result is encrypted. This is MAC-then-encrypt and it is actually not a very good idea. The MAC is computed over the concatenation of the (compressed) payload and a sequence number, so that an industrious attacker may not swap records.
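The role of the sequence number can be shown with a simplified sketch. It is not the exact TLS computation (the real MAC input also covers the record type, version and length), the encryption step is left out entirely, and the function names are made up; it only shows why a swapped record fails verification.

```python
import hashlib
import hmac
import struct


def record_mac(mac_key, seq_num, payload):
    """HMAC/SHA-1 over an 8-byte sequence number followed by the payload."""
    return hmac.new(mac_key, struct.pack("!Q", seq_num) + payload,
                    hashlib.sha1).digest()


def protect(mac_key, seq_num, payload):
    """MAC-then-encrypt, minus the encryption: payload || MAC."""
    return payload + record_mac(mac_key, seq_num, payload)


def verify(mac_key, seq_num, protected):
    """Return the payload if the MAC matches for this sequence number."""
    payload, tag = protected[:-20], protected[-20:]   # SHA-1 tag is 20 bytes
    if not hmac.compare_digest(tag, record_mac(mac_key, seq_num, payload)):
        raise ValueError("bad record MAC")
    return payload


key = b"k" * 20
rec0 = protect(key, 0, b"first")
rec1 = protect(key, 1, b"second")
```

The receiver counts records on its own, so if the attacker delivers `rec1` where record 0 is expected, `verify(key, 0, rec1)` raises an error even though `rec1` is a genuine record.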


The handshake is a protocol which is played within the record protocol. Its goal is to establish the algorithms and keys which are to be used for the records. It consists of messages. Each handshake message begins with a four-byte header, one byte which describes the message type, then three bytes for the message length (big-endian convention). The successive handshake messages are then sent with records tagged with the “handshake” type (first byte of the header of each record has value 22).

Note the layers: the handshake messages, complete with four-byte header, are then sent as records, and each record also has its own header. Furthermore, several handshake messages can be sent within the same record, and a given handshake message can be split over several records. From the point of view of the module which builds the handshake messages, the “records” are just a stream on which bytes can be sent; it is oblivious to the actual split of that stream into records.
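Here is a small sketch of that layering: handshake messages are encoded with their four-byte header, concatenated into a byte stream, and split again on the other side regardless of record boundaries. The function names are assumptions of this example; the message type numbers (2 for ServerHello, 14 for ServerHelloDone) are the ones from the TLS specification.

```python
def encode_handshake(msg_type, body):
    """One type byte, three big-endian length bytes, then the body."""
    return bytes([msg_type]) + len(body).to_bytes(3, "big") + body


def split_handshake_stream(stream):
    """Split reassembled record data into (type, body) messages.

    Returns the parsed messages and any trailing bytes that belong to a
    message whose remainder has not arrived yet.
    """
    messages, pos = [], 0
    while len(stream) - pos >= 4:
        msg_type = stream[pos]
        length = int.from_bytes(stream[pos + 1:pos + 4], "big")
        if len(stream) - pos - 4 < length:
            break                      # message continues in a later record
        messages.append((msg_type, stream[pos + 4:pos + 4 + length]))
        pos += 4 + length
    return messages, stream[pos:]


# Two messages back to back, as they could appear within a single record.
stream = encode_handshake(2, b"hello-body") + encode_handshake(14, b"")
parsed, leftover = split_handshake_stream(stream)
```

Feeding only the first few bytes of `stream` returns no message and keeps the bytes as leftover, which is what happens when a handshake message straddles two records.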

Full Handshake

Initially, client and server “agree upon” null encryption with no MAC and null compression. This means that the record they will first send will be sent as cleartext and unprotected.

First message of a handshake is a ClientHello. It is the message by which the client states its intention to do some SSL. Note that “client” is a symbolic role; it means “the party which speaks first”. It so happens that in the HTTPS context, which is HTTP-within-SSL-within-TCP, all three layers have a notion of “client” and “server”, and they all agree (the TCP client is also the SSL client and the HTTP client), but that’s kind of a coincidence.

The ClientHello message contains:

  • the maximum protocol version that the client wishes to support;
  • the “client random” (32 bytes, out of which 28 are supposed to be generated with a cryptographically strong number generator);
  • the “session ID” (in case the client wants to resume a session in an abbreviated handshake, see below);
  • the list of “cipher suites” that the client knows of, ordered by client preference;
  • the list of compression algorithms that the client knows of, ordered by client preference;
  • some optional extensions.

A cipher suite is a 16-bit symbolic identifier for a set of cryptographic algorithms. For instance, the TLS_RSA_WITH_AES_128_CBC_SHA cipher suite has value 0x002F, and means “records use HMAC/SHA-1 and AES encryption with a 128-bit key, and the key exchange is done by encrypting a random key with the server’s RSA public key”.

The server responds to the ClientHello with a ServerHello which contains:

  • the protocol version that the client and server will use;
  • the “server random” (32 bytes, with 28 random bytes);
  • the session ID for this connection;
  • the cipher suite that will be used;
  • the compression algorithm that will be used;
  • optionally, some extensions.

The full handshake looks like this:

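  Client                                               Server

  ClientHello                  -------->
                                                  ServerHello
                                                 Certificate*
                                           ServerKeyExchange*
                                          CertificateRequest*
                               <--------      ServerHelloDone
  Certificate*
  ClientKeyExchange
  CertificateVerify*
  [ChangeCipherSpec]
  Finished                     -------->
                                           [ChangeCipherSpec]
                               <--------             Finished
  Application Data             <------->     Application Data

  (* marks messages which are optional or depend on the situation.)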

(This schema has been shamelessly copied from the RFC.)

We see the ClientHello and ServerHello. Then, the server sends a few other messages, which depend on the cipher suite and some other parameters:

  • Certificate: the server’s certificate, which contains its public key. More on that below. This message is almost always sent, except if the cipher suite mandates a handshake without a certificate.
  • ServerKeyExchange: some extra values for the key exchange, if what is in the certificate is not sufficient. In particular, the “DHE” cipher suites use an ephemeral Diffie-Hellman key exchange, which requires that message.
  • CertificateRequest: a message requesting that the client also identifies itself with a certificate of its own. This message contains the list of names of trust anchors (aka “root certificates”) that the server will use to validate the client certificate.
  • ServerHelloDone: a marker message (of length zero) which says that the server is finished, and the client should now talk.

The client must then respond with:

  • Certificate: the client certificate, if the server requested one. There are subtle variations between versions (with SSLv3, the client must omit this message if it does not have a certificate; with TLS 1.0+, in the same situation, it must send a Certificate message with an empty list of certificates).
  • ClientKeyExchange: the client part of the actual key exchange (e.g. some random value encrypted with the server RSA key).
  • CertificateVerify: a digital signature computed by the client over all previous handshake messages. This message is sent when the server requested a client certificate, and the client complied. This is how the client proves to the server that it really “owns” the public key which is encoded in the certificate it sent.

Then the client sends a ChangeCipherSpec message, which is not a handshake message: it has its own record type, so it will be sent in a record of its own. Its contents are purely symbolic (a single byte of value 1). This message marks the point at which the client switches to the newly negotiated cipher suite and keys. The subsequent records from the client will then be encrypted.

The Finished message is a cryptographic checksum computed over all previous handshake messages (from both the client and server). Since it is emitted after the ChangeCipherSpec, it is also covered by the integrity check and the encryption. When the server receives that message and verifies its contents, it obtains a proof that it has indeed talked to the same client all along. This message protects the handshake from alterations (the attacker cannot modify the handshake messages and still get the Finished message right).
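The following sketch shows why tampering with the handshake is caught by the Finished message. It is deliberately simplified: HMAC/SHA-256 stands in for the real TLS pseudorandom function, and the message contents are made-up placeholders. Only the "client finished" label and the 12-byte output length follow the TLS specification.

```python
import hashlib
import hmac


def finished_value(master_secret, label, handshake_messages):
    """Stand-in for the Finished computation (NOT the real TLS PRF).

    Hashes every handshake message seen so far, then keys the result
    with the shared secret and a per-side label.
    """
    transcript = hashlib.sha256()
    for message in handshake_messages:
        transcript.update(message)
    return hmac.new(master_secret, label + transcript.digest(),
                    hashlib.sha256).digest()[:12]


secret = b"\x01" * 48
seen_by_client = [b"ClientHello:AES128,RC4", b"ServerHello:AES128"]
# An attacker strips the strong suite from the ClientHello in transit.
seen_by_server = [b"ClientHello:RC4", b"ServerHello:AES128"]

client_finished = finished_value(secret, b"client finished", seen_by_client)
server_expects = finished_value(secret, b"client finished", seen_by_server)
```

Since the two sides did not see the same handshake bytes, `client_finished` and `server_expects` differ and the server aborts.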

The server finally responds with its own ChangeCipherSpec then Finished. At that point, the handshake is finished, and the client and server may exchange application data (in encrypted records tagged as such).

To remember: the client suggests but the server chooses. The cipher suite is in the hands of the server. Courteous servers are supposed to follow the preferences of the client (if possible), but they can do otherwise and some actually do (e.g. as part of protection against BEAST).
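A sketch of the two selection policies, courteous and not, could look like this. The function and parameter names are invented for the example; the three suite identifiers are the registered values.

```python
def choose_cipher_suite(client_offer, server_supported, honor_client_order=True):
    """Return the suite the server picks, or None if there is no overlap."""
    if honor_client_order:
        preferred, allowed = client_offer, set(server_supported)
    else:
        preferred, allowed = server_supported, set(client_offer)
    for suite in preferred:
        if suite in allowed:
            return suite
    return None


TLS_RSA_WITH_RC4_128_SHA = 0x0005
TLS_RSA_WITH_AES_128_CBC_SHA = 0x002F
TLS_RSA_WITH_AES_256_CBC_SHA = 0x0035

client = [TLS_RSA_WITH_AES_256_CBC_SHA, TLS_RSA_WITH_AES_128_CBC_SHA,
          TLS_RSA_WITH_RC4_128_SHA]
server = [TLS_RSA_WITH_RC4_128_SHA, TLS_RSA_WITH_AES_128_CBC_SHA]

courteous = choose_cipher_suite(client, server)
server_first = choose_cipher_suite(client, server, honor_client_order=False)
```

With these lists, the courteous server ends up with AES-128 in CBC mode, while a server enforcing its own order picks RC4, the kind of override that was used against BEAST.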

Abbreviated Handshake

In the full handshake, the server sends a “session ID” (i.e. a bunch of up to 32 bytes) to the client. Later on, the client can come back and send the same session ID as part of his ClientHello. This means that the client still remembers the cipher suite and keys from the previous handshake and would like to reuse these parameters. If the server also remembers the cipher suite and keys, then it copies that specific session ID in its ServerHello, and then follows the abbreviated handshake:

  Client                                                Server

  ClientHello                   -------->
                                                   ServerHello
                                            [ChangeCipherSpec]
                                <--------             Finished
  [ChangeCipherSpec]
  Finished                      -------->
  Application Data              <------->     Application Data

The abbreviated handshake is shorter: fewer messages, no asymmetric cryptography business, and, most importantly, reduced latency. Web browsers and servers do that a lot. A typical Web browser will open an SSL connection with a full handshake, then do abbreviated handshakes for all other connections to the same server: the other connections it opens in parallel, and also the subsequent connections to the same server. Indeed, typical Web servers will close connections after 15 seconds of inactivity, but they will remember sessions (the cipher suite and keys) for a lot longer (possibly for hours or even days).
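On the server side, the decision between a full and an abbreviated handshake boils down to a lookup in a session cache. The class below is a hypothetical sketch (names, lifetime and storage are all assumptions), not how any particular server implements it.

```python
import os
import time


class SessionCache:
    """Server-side memory of past sessions, keyed by session ID."""

    def __init__(self, lifetime_seconds=3600):
        self.lifetime = lifetime_seconds
        self._sessions = {}

    def store(self, cipher_suite, keys, now=None):
        """Remember a finished full handshake; return its new session ID."""
        session_id = os.urandom(32)
        created = time.time() if now is None else now
        self._sessions[session_id] = (cipher_suite, keys, created)
        return session_id

    def handshake_kind(self, offered_session_id, now=None):
        """'abbreviated' if the offered ID is known and fresh, else 'full'."""
        now = time.time() if now is None else now
        entry = self._sessions.get(offered_session_id)
        if entry is None or now - entry[2] > self.lifetime:
            self._sessions.pop(offered_session_id, None)   # forget stale entries
            return "full"
        return "abbreviated"


cache = SessionCache(lifetime_seconds=3600)
sid = cache.store(0x002F, b"\x00" * 48, now=0)
```

A client coming back with `sid` ten seconds later gets the abbreviated handshake; the same client coming back after the lifetime has elapsed, or a client offering an empty or unknown ID, goes through the full handshake again.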

Key Exchange

There are several key exchange algorithms which SSL can use. This is specified by the cipher suite; each key exchange algorithm works with some kinds of server public key. The most common key exchange algorithms are:

  • RSA: the server’s key is of type RSA. The client generates a random value (the “pre-master secret” of 48 bytes, out of which 46 are random) and encrypts it with the server’s public key. There is no ServerKeyExchange.
  • DHE_RSA: the server’s key is of type RSA, but used only for signature. The actual key exchange uses Diffie-Hellman. The server sends a ServerKeyExchange message containing the DH parameters (modulus, generator) and a newly-generated DH public key; moreover, the server signs this message. The client will respond with a ClientKeyExchange message which also contains a newly-generated DH public key. The DH yields the “pre-master secret”.
  • DHE_DSS: like DHE_RSA, but the server has a DSS key (“DSS” is also known as “DSA”). DSS is a signature-only algorithm.
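For the RSA key exchange, the 48-byte pre-master secret can be sketched as below. In the TLS specification the two non-random bytes carry the protocol version the client offered in its ClientHello; the function name and default value are assumptions of this example, and the RSA encryption itself is not shown.

```python
import os


def rsa_pre_master_secret(client_version=(3, 3)):
    """48 bytes: 2 version bytes followed by 46 random bytes."""
    major, minor = client_version
    return bytes([major, minor]) + os.urandom(46)


pms = rsa_pre_master_secret((3, 1))   # a client offering TLS 1.0
```

The client would then encrypt `pms` with the server's RSA public key and send the result in its ClientKeyExchange.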

Less commonly used key exchange algorithms include:

  • DH: the server’s key is of type Diffie-Hellman (we are talking of a certificate which contains a DH key). This used to be “popular” in an administrative way (US federal government mandated its use) when the RSA patent was still active (this was during the previous century). Despite the bureaucratic push, it was never as widely deployed as RSA.
  • DH_anon: like the DHE suites, but without the signature from the server. This is a certificate-less cipher suite. By construction, it is vulnerable to Man-in-the-Middle attacks, thus very rarely enabled at all.
  • PSK: pre-shared key cipher suites. The symmetric-only key exchange, building on a pre-established shared secret.
  • SRP: application of the SRP protocol which is a Password Authenticated Key Exchange protocol. Client and server authenticate each other with regards to a shared secret, which can be a low-entropy password (whereas PSK requires a high-entropy shared secret). Very nifty. Not widely supported yet.
  • An ephemeral RSA key: like DHE but with a newly-generated RSA key pair. Since generating RSA keys is expensive, this is not a popular option, and was specified only as part of “export” cipher suites which complied with the pre-2000 US export regulations on cryptography (i.e. RSA keys of at most 512 bits). Nobody does that nowadays.
  • Variants of the DH* algorithms with elliptic curves. Very fashionable. Should become common in the future.

Certificates and Authentication

Digital certificates are vessels for asymmetric keys. They are intended to solve key distribution. Namely, the client wants to use the server’s public key. The attacker will try to make the client use the attacker’s public key. So the client must have a way to make sure that it is using the right key.

SSL is supposed to use X.509. This is a standard for certificates. Each certificate is signed by a Certification Authority. The idea is that the client inherently knows the public keys of a handful of CAs (these are the “trust anchors” or “root certificates”). With these keys, the client can verify the signature computed by a CA over a certificate which has been issued to the server. This process can be extended recursively: a CA can issue a certificate for another CA (i.e. sign the certificate structure which contains the other CA name and key). A chain of certificates beginning with a root CA and ending with the server’s certificate, with intermediate CA certificates in between, each certificate being signed relatively to the public key which is encoded in the previous certificate, is called, unimaginatively, a certificate chain.

So the client is supposed to do the following:

  • Get a certificate chain ending with the server’s certificate. The Certificate message from the server is supposed to contain, precisely, such a chain.
  • Validate the chain, i.e. verify all the signatures and names and the various X.509 bits. Also, the client should check revocation status of all the certificates in the chain, which is complex and heavy (Web browsers now do it, more or less, but it is a recent development).
  • Verify that the intended server name is indeed written in the server’s certificate. Because the client does not only want to use a validated public key, it also wants to use the public key of a specific server. See RFC 2818 for details on how this is done in an HTTPS context.
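In practice an application rarely codes these steps itself; it asks its SSL library to do them. As an illustration, here is how a client could request chain validation and server name verification with Python's standard ssl module. The helper names are assumptions of this sketch, and note that this configuration does not check revocation.

```python
import socket
import ssl


def make_client_context():
    """Client context that validates the chain and the server name."""
    context = ssl.create_default_context()   # loads the system trust anchors
    # Both are already the defaults; set explicitly to make the intent visible.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def fetch_peer_certificate(host, port=443, timeout=5.0):
    """Connect, let the handshake validate the chain, return the server cert."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        # server_hostname is the name checked against the certificate.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()


ctx = make_client_context()
```

If the chain does not lead to a trust anchor, or the name does not match, `wrap_socket` fails with an exception during the handshake and no application data is exchanged.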

The certification model with X.509 certificates has often been criticized, not really on technical grounds, but rather for politico-economic reasons. It concentrates validation power into the hands of a few players, who are not necessarily well-intentioned, or at least not always competent. Now and again, proposals for other systems are published (e.g. Convergence or DNSSEC) but none has gained wide acceptance (yet).

For certificate-based client authentication, it is entirely up to the server to decide what to do with a client certificate (and also what to do with a client who declined to send a certificate). In the Windows/IIS/Active Directory world, a client certificate should contain an account name as a “User Principal Name” (encoded in a Subject Alt Name extension of the certificate); the server looks it up in its Active Directory server.

Handshake Again

Since a handshake is just some messages which are sent as records with the current encryption/compression conventions, nothing theoretically prevents an SSL client and server from doing a second handshake within an established SSL connection. And, indeed, it is supported and it happens in practice.

At any time, the client or the server can initiate a new handshake (the server can send a HelloRequest message to trigger it; the client just sends a ClientHello). A typical situation is the following:

  • An HTTPS server is configured to listen to SSL requests.
  • A client connects and a handshake is performed.
  • Once the handshake is done, the client sends its “applicative data”, which consists of an HTTP request. At that point (and at that point only), the server learns the target path. Up to that point, the URL which the client wishes to reach was unknown to the server (the server might have been made aware of the target server name through a Server Name Indication SSL extension, but this does not include the path).
  • Upon seeing the path, the server may learn that this is for a part of its data which is supposed to be accessed only by clients authenticated with certificates. But the server did not ask for a client certificate in the handshake (in particular because not-so-old Web browsers displayed freakish popups when asked for a certificate, in particular if they did not have one, so a server would refrain from asking a certificate if it did not have good reason to believe that the client has one and knows how to use it).
  • Therefore, the server triggers a new handshake, this time requesting a certificate.

There is an interesting weakness in the situation I just described; see RFC 5746 for a workaround. In a conceptual way, SSL transfers security characteristics only in the “forward” way. When doing a new handshake, whatever could be known about the client before the new handshake is still valid after (e.g. if the client had sent a good username+password within the tunnel) but not the other way round. In the situation above, the first HTTP request which was received before the new handshake is not covered by the certificate-based authentication of the second handshake, and it could have been chosen by the attacker! Unfortunately, some Web servers just assumed that the client authentication from the second handshake extended to what was sent before that second handshake, and it allowed some nasty tricks from the attacker. RFC 5746 attempts to fix that.


Alert messages are just warning and error messages. They are rather uninteresting except when they can be subverted in some attacks (see later on).

There is an important alert message, called close_notify: it is a message which the client or the server sends when it wishes to close the connection. Upon receiving this message, the server or client must also respond with a close_notify and then consider the tunnel to be closed (but the session is still valid, and can be reused in a later abbreviated handshake). The interesting part is that these alert messages are, like all other records, protected by the encryption and MAC. Thus, the connection closure is covered by the cryptographic umbrella.

This is important in the context of (old) HTTP, where some data can be sent by the server without an explicit “content-length”: the data extends until the end of the transport stream. Old HTTP with SSLv2 (which did not have the close_notify) allowed an attacker to force a connection close (at the TCP level) which the client would have taken for a normal close; thus, the attacker could truncate the data without being caught. This is one of the problems with SSLv2 (arguably, the worst) and SSLv3 fixes it. Note that “modern” HTTP uses “Content-Length” headers and/or chunked encoding, which is not vulnerable to such truncation, even if the SSL layer allowed it. Still, it is nice to know that SSL offers protection on closure events.


There is a limit on Stack Exchange answer length, so the description of some attacks on SSL will be in another answer (besides, I have some pancakes to cook). Stay tuned.

iOS Automatic Reference Counting (ARC)

copied from

With iOS 5 Apple has released the most appealing language feature, automatic reference counting or ARC. It makes memory management a lot simpler because you no longer have to manually keep a count of object references – the compiler takes care of it.

Manual Reference Counting

If you have been developing on iOS 4 or earlier then you are probably familiar with reference counting. In a manual reference counting environment you have to keep track of object ownership. If you own an object you must make sure to release it; if you don’t, you will have a memory leak, which can eventually lead to app crashes. Aside from crashes or leaks, keeping count of the retain, release and autorelease calls is cumbersome.

Here’s a typical example, you allocate a new object called obj1, then you point obj2 to obj1. At this point you have to make sure that you retain obj2, which increases the reference count. In addition, you have to make sure you send a release to each object to free memory once the retain count goes down to zero. This is a simple example but in a normal sized app this can get very complicated, very quickly.

Manual reference counting

Automatic Reference Counting (ARC)

The above example is simplified when using ARC.

MyClass *obj1 = [[MyClass alloc] init];
MyClass *obj2 = obj1;

As you can see, you still allocate obj1 and then point obj2 to obj1. The retains and releases are still occurring, although now it all happens behind the scenes. It is the compiler’s job to maintain the life of an object and to ensure that it is appropriately deallocated when no longer in use. Since this is not garbage collection, you do not sacrifice speed. Your app is just as fast and now with less maintenance. You get to focus on the features of your app and not get distracted by memory management.

Strong vs Weak

With ARC the only things you have to consider are the qualifiers strong and weak.

The strong qualifier is set by default. When using this qualifier you are instructing the compiler that you would like to retain an object for as long as the reference to it exists.

You can still use retain, however it is recommended that you use strong instead.

The weak qualifier, also known as a zeroing weak reference, instructs the compiler that you do not need to retain the object. Once all the strong references to the object are gone, the object is deallocated and the weak reference is set to nil. This is important because a message sent to nil does not cause a crash; it simply doesn’t do anything.

You can still use assign, however it is recommended that you use weak instead because it sets the reference to nil when the object is deallocated.

A weak qualifier is typically used in a parent-child object relationship, where the parent has a strong reference to a child object and the child object has a weak reference back to the parent; otherwise you end up creating a circular reference, and neither object is ever deallocated.
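The same parent/child pattern can be tried out in Python, whose weakref module behaves much like a zeroing weak reference. This is an analogy in another language, not Objective-C code, and it relies on CPython's reference counting to free the parent immediately; the class names are made up.

```python
import weakref


class Child:
    def __init__(self, parent):
        # Weak back-reference: does not keep the parent alive.
        self._parent = weakref.ref(parent)

    @property
    def parent(self):
        return self._parent()      # None once the parent is gone


class Parent:
    def __init__(self):
        self.child = Child(self)   # strong reference: parent owns the child


parent = Parent()
child = parent.child
had_parent = child.parent is parent
del parent                         # drop the last strong reference
```

After `del parent`, `child.parent` evaluates to None instead of pointing at freed memory, which is the behaviour that weak gives you over assign.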

Weak references

Upgrading to ARC

Upgrading an existing project to use ARC is very simple. Open up your project in Xcode 4.2 and from the menu select Edit > Refactor > Convert to Objective-C ARC. The conversion process first performs a pre-flight check and then ensures that you are ready to upgrade to ARC. It then gives you a preview of all the changes it will make to your code before finally committing the changes.

Upgrading to ARC

Further Reading

This is just a preview of the changes that come with ARC. There are plenty of little nuances that you can read about in the articles below:
Technical specification of ARC
Transition to ARC release notes

Introduction to Sockets

Originally at

If you’re a beginner to networking, this is the place to start. Working with a socket can be very different from working with a file, even though the APIs may be similar. A little bit of investment in your knowledge and understanding of networking fundamentals can go a long way. And it can save you a lot of time and frustration in the long run.

We will keep it brief, and will maintain a focus on developers: just what developers need to accomplish their goal, while not skipping important fundamentals that could later cause problems.

Sockets, Ports, and DNS – Oh My!

In networking parlance, a computer is a host for a number of sockets. A socket is one end of a communication channel called a network connection; the other end is another socket. From its own point of view, any socket is the local socket, and the socket at the other end of the connection is the remote socket.

To establish the connection, one of the two sockets must contact the other socket. To make contact the socket must know the other socket’s address. Every socket has an address. The address consists of two parts: the host address and the port number. The host address is the IP address of the computer, and the port number uniquely identifies each socket hosted on the computer.
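Here is a short Python sketch of two sockets on the same machine connecting to each other over the loopback interface. Variable names are illustrative; port 0 simply asks the operating system for any free port.

```python
import socket

# Listening socket on the loopback interface; port 0 lets the OS pick a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
listen_address = listener.getsockname()      # (host address, port number)

# The client socket makes contact using the listener's address.
client = socket.create_connection(listen_address, timeout=5)
server_side, _ = listener.accept()

# Each end sees itself as local and the other end as remote.
client_local, client_remote = client.getsockname(), client.getpeername()
server_local, server_remote = server_side.getsockname(), server_side.getpeername()

for s in (client, server_side, listener):
    s.close()
```

The local address of one end is the remote address of the other: `client_local` equals `server_remote`, and `client_remote` equals `server_local`.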

A computer can have multiple host addresses because it can have multiple networking interfaces. For example, a computer might be equipped with an ethernet card, a modem, a WiFi card, a VPN connection, Bluetooth, etc. And in addition to all this, there is a special interface for connecting to itself (called “loopback” or sometimes “localhost”).

An address such as “” corresponds to a host address, but it is not a host address itself. It is a DNS entry or DNS name, which is converted to a host address by a DNS look-up operation. One can think of DNS like a phone book. If you wanted to call someone, but didn’t know their number, you could look up their number in the phone book. Their name is matched to a phone number. Similarly, DNS matches a name (such as “”) to an IP address.
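A name look-up takes one call in most languages. The Python sketch below resolves the name "localhost", chosen only because it resolves without network access; the helper name is an assumption of this example.

```python
import socket


def resolve(name):
    """Return the distinct host addresses a name resolves to."""
    infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the host address as text.
    return sorted({info[4][0] for info in infos})


loopback_addresses = resolve("localhost")
```

A name may map to several host addresses (for instance one IPv4 and one IPv6 address), which is why the function returns a list.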

Networking Huh?

The crux of the problem is that the network you’ll be communicating over is unreliable. Perhaps you’re sending data out over the Internet. Maybe it’s going to be sent via WiFi, or some cellular connection. Or maybe it’s going to be sent into space via a satellite. You might not even know.

But let’s assume for a moment that you did know. Let’s assume you knew that all communication was going to take place over regular ethernet, within a closed business network. The communication would be 100% reliable right? Wrong. And I’m not referring to cut wires or power outages either.

All data that gets sent or received gets broken into little packets. These packets then get pumped onto the network, and arrive at routers which have to decide where they go. But during bursts of traffic, a router might get overloaded with packets. Packets are coming in faster than the router can figure out where to route them. What happens? The same thing that happens millions of times a day all over the world: the router starts dropping packets.

In addition to lost packets on the network, the receiving computer might be forced to drop packets too. Perhaps the computer is overloaded, or the receiving application isn’t reading the data from the OS fast enough. There’s also the potential that the packet was corrupted during transmission, perhaps from electrical interference. And all of this is without getting into other issues introduced by things like the WiFi or the Internet.

If you’re new to networking, you might be thinking that it’s a miracle that everything works as well as it does. The fact is, the miracle is derived from the networking protocols that have been perfected over the last several decades, and from the developers that understand them and use them effectively. (That’s you!)

Bring on the Protocols

You can probably list dozens of protocols that have something to do with computer networking:


But every single one of these protocols is layered on top of another protocol that handles the networking for it. These lower level protocols handle the majority of the networking aspect so that the application layer protocol (those listed above) can focus on the application aspect.

The “application layer protocols” listed above are layered on top of a “transport layer protocol”. And of all the protocols listed above, there are only two transport layer protocols that are used: TCP and UDP.


The User Datagram Protocol (UDP) is the simpler of the two. You can only put a small amount of data into a UDP packet, and then you send it on its way. And then… that’s it. There is no guarantee that the message will arrive. And if you send multiple packets back-to-back, there is no guarantee that they will arrive in order. Seems pretty unreliable, no? But its weakness is also its strength. If you are sending time-sensitive data, such as audio in a VoIP call, then you don’t want your transport protocol wasting time retransmitting lost packets since the lost audio would arrive too late to be played anyway. In fact, streaming audio and video are some of the biggest uses for UDP.

UDP also has the advantage that it doesn’t require a “connection handshake”. Think about it like this: If you were sitting on a train, and you wanted to have a long conversation with the stranger next to you, you would probably start with an introduction. Something like, “Where are you heading? Oh yeah, I’m heading in that direction too. My name’s Robbie, what’s yours?” But if you just wanted to know what the time was, then you could skip the introduction. You wouldn’t be expected to tell the stranger your name. You could just say, “Excuse me, do you have the time?” To which the stranger could quickly respond, and you could both go back to doing whatever you were doing. This is why a protocol like DNS uses UDP. That way your computer can say, “Excuse me, what is the IP of” And the server can quickly respond.
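Here is a rough sketch of that “do you have the time?” exchange, using two UDP sockets on the loopback interface. The message text and the two-second timeout are assumptions made for the example. On a real network either datagram could be lost, which is why the code allows for a timeout.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)   # do not wait forever: a datagram may never arrive

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# No connect, no handshake: just address the datagram and send it.
sender.sendto(b"Excuse me, do you have the time?", receiver.getsockname())

try:
    question, asker = receiver.recvfrom(1024)
    receiver.sendto(b"Half past nine.", asker)   # reply straight to the asker
    sender.settimeout(2.0)
    answer, _ = sender.recvfrom(1024)
except socket.timeout:
    question = answer = None   # UDP gives no delivery guarantee

sender.close()
receiver.close()
```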


The Transmission Control Protocol (TCP) is probably the protocol you use the most. Whether you’re browsing the web, checking your email, or sending instant messages to friends, you’re probably using TCP.

TCP is designed for “long conversations”. So there is an initial connection handshake, and after that data can flow back and forth for as long as necessary. But the great thing about TCP is that it was designed to make communication reliable in the face of an unreliable network. So it does all kinds of really cool stuff for us. If you send some information over TCP, and part of it gets lost, the protocol will automatically figure out what got lost and resend it. And when you send information, TCP makes sure that information always arrives in the correct order. But wait, there’s more! The protocol will also detect congestion in the network, and automatically scale accordingly so everybody can share.

So there are a lot of great reasons to use TCP, and it fits in nicely with a lot of networking tasks. Plus there is no limit to the amount of data you can send via TCP. It is designed to be an open stream of data flowing in both/either direction. It is simply up to the application layer to determine what that data looks like.

Where do we fit in?

So… UDP and TCP… how do we use them? Is that what the CocoaAsyncSocket libraries provide? Implementations of TCP and UDP? Nope, not quite. As you can imagine, TCP and UDP are used all over the place. So naturally they are provided by the operating system. If you open up your terminal and type “man socket” you can see the low level BSD socket API. The libraries are essentially wrappers that sit on top of low-level socket APIs and provide you, the developer, an easy to use framework in Objective-C.

So CocoaAsyncSocket provides a great API that simplifies networking for you. But networking can still be tricky, so we recommend you read the following before you get started:

TCP is a stream

The TCP protocol is modeled on the concept of a single continuous stream of unlimited length. This is a very important concept to understand, and is the number one cause of confusion that we see.

What exactly does this mean, and how does it affect developers?

Imagine that you’re trying to send a few messages over the socket. So you do something like this (in pseudocode):

socket.write("Hi Sandy.");
socket.write("Are you busy tonight?");

How does the data show up on the other end? If you think the other end will receive two separate sentences in two separate reads, then you’ve just fallen victim to a common pitfall! Gasp! Read on.

TCP does not treat the writes as separate data. TCP considers all writes to be part of a single continuous stream. So when you issue the above writes, TCP will simply copy the data into its buffer:

TCP_Buffer = “Hi Sandy.Are you busy tonight?”

and then proceed to send the data as fast as possible. And in order to send data over the network, TCP and other networking protocols will be required to break that data into small pieces that can be transmitted over the medium (ethernet, WiFi, etc). In doing so, TCP may break apart the data in any way it sees fit. Here are some examples of how that data might be broken apart and sent:

  1. “Hi San” , “dy.Ar” , “e you ” , “busy to” , “night?”
  2. “Hi Sandy.Are you busy” , “ tonight?”
  3. “Hi Sandy.Are you busy tonight?”

The above examples also demonstrate how the data will arrive at the other end. Let’s consider example 1 for a moment.

Sandy has issued a read command, and is waiting for data to arrive. So the result of her first read might be “Hi San”. Sandy will likely begin to process that data. And while the application is processing the data, the TCP stream continues to receive the 2nd and 3rd packets. Sandy then issues another read command, and this time she gets “dy.Are you ”.

This highlights the continuous stream nature of TCP. The TCP protocol, at the developer API level, has absolutely no concept of packets or separation of data.
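You can see this for yourself with a small experiment. The sketch below uses Python over a loopback TCP connection; the 6-byte read size is an arbitrary assumption. It performs the two writes from the example and then reads the stream back in small pieces.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
writer = socket.create_connection(server.getsockname())
reader, _ = server.accept()

# Two separate writes...
writer.sendall(b"Hi Sandy.")
writer.sendall(b"Are you busy tonight?")
writer.close()   # closing marks the end of the stream

# ...read back in whatever pieces happen to be available.
chunks = []
while True:
    chunk = reader.recv(6)   # ask for at most 6 bytes at a time
    if not chunk:            # empty result: the writer closed the stream
        break
    chunks.append(chunk)

print(chunks)
print(b"".join(chunks))

reader.close()
server.close()
```

The exact chunk boundaries may differ from run to run, but they bear no relation to where one write ended and the next began. Only the joined stream is guaranteed.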

But isn’t this a major shortcoming? How do all those other protocols that use TCP work?

HTTP is a great example because it’s so simple, and because most everyone has seen it before. When a client connects to a server and sends a request, it does so in a very specific manner. It sends an HTTP header, and each line of the header is terminated with a CRLF (carriage return, line feed). So something like this:

GET /page.html HTTP/1.1

Furthermore, the end of the HTTP header is signaled by two CRLF’s in a row. Since the protocol specifies the terminators, it is easy to read data from a TCP socket until the terminators are reached.

Then the server sends the response:

HTTP/1.1 200 OK
Content-Length: 216

{ Exactly 216 bytes of data go here }

Again, the HTTP protocol makes it easy to use TCP. Read data until you get back-to-back CRLF. That’s your header. Then parse the content-length from the header, and now you can simply read a certain number of bytes.
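The read-until-terminator, then read-N-bytes recipe might look something like this in Python. It is a simplified sketch, not a real HTTP parser: parse_response is a hypothetical helper, the incoming data is modeled as a list of arbitrary chunks rather than a live socket, and it assumes the chunks contain a complete response (otherwise it fails with StopIteration).

```python
def parse_response(chunks):
    """Split an HTTP-style response, given as arbitrary chunks, into (header, body)."""
    buffer = b""
    chunks = iter(chunks)

    # Step 1: keep reading until the blank line (CRLF CRLF) that ends the header.
    while b"\r\n\r\n" not in buffer:
        buffer += next(chunks)
    header, _, rest = buffer.partition(b"\r\n\r\n")

    # Step 2: find Content-Length among the header lines (skipping the status line).
    length = 0
    for line in header.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())

    # Step 3: read exactly that many body bytes, however they are chunked.
    while len(rest) < length:
        rest += next(chunks)
    return header.decode("ascii"), rest[:length]

# The chunk boundaries deliberately cut through the header and the terminator.
pieces = [b"HTTP/1.1 200 OK\r\nContent-Le", b"ngth: 5\r\n\r", b"\nhel", b"lo"]
header, body = parse_response(pieces)
print(body)
```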

Returning to our original example, we could simply use a designated terminator for our messages:

socket.write("Hi Sandy.\n");
socket.write("Are you busy tonight?\n");

And if Sandy was using AsyncSocket she would be in luck! Because AsyncSocket provides really easy-to-use read methods that allow you to specify the terminator to look for. AsyncSocket does the rest for you, and would deliver two separate sentences in two separate reads!
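To make the idea concrete, here is a small Python sketch of terminator-based framing. This is not AsyncSocket's API: TerminatorFramer is a hypothetical class that only illustrates the kind of buffering such a read method has to do. It is fed the chunks from example 1, with the newline terminators added.

```python
class TerminatorFramer:
    """Collects stream chunks and hands back complete, terminated messages."""

    def __init__(self, terminator=b"\n"):
        self.terminator = terminator
        self.buffer = b""

    def feed(self, chunk):
        """Add a chunk from the stream; return the messages it completed."""
        self.buffer += chunk
        # Everything before the last terminator is complete; the tail is kept
        # in the buffer until the rest of that message arrives.
        *messages, self.buffer = self.buffer.split(self.terminator)
        return messages

framer = TerminatorFramer()
received = []
for chunk in [b"Hi San", b"dy.\nAr", b"e you ", b"busy to", b"night?\n"]:
    received.extend(framer.feed(chunk))
print(received)
```

However the stream is chopped up, the framer hands back the two sentences as two separate messages.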


What happens when you write data to a TCP socket? When the write is complete, does that mean the other party received that data? Can we at least assume the computer has sent the data? The answer is NO and NO.

Recall two things:

  • All data sent and received must get broken into little pieces in order to send it over the network.
  • TCP handles a lot of complicated issues such as resending lost packets, and providing in-order delivery so information arrives in the proper sequence.

So when you issue a write, the data is simply copied into an underlying buffer within the OS networking stack. At that point the TCP software will begin its magic, which consists of all the cool stuff mentioned earlier such as:

  • breaking the data into small pieces such that they can be sent over the network
  • ensuring that lost pieces get properly resent
  • ensuring that your data arrives at the remote destination in the proper order
  • watching out for congestion in the network
  • employing fancy algorithms to accomplish all of this as fast as possible

So when you issue the command, “write this data” the operating system responds with “I have your data, and I will do everything in my power to deliver this to the remote destination.”
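A quick way to convince yourself: in the Python sketch below the server listens but never accepts the connection or reads a single byte, yet the write still completes. The loopback setup and variable names are assumptions made for the demonstration.

```python
import socket

# A server that listens but never accepts the connection or reads from it.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.create_connection(server.getsockname())
message = b"Are you busy tonight?"
client.sendall(message)   # returns once the OS has taken the bytes
write_completed = True    # reached even though nobody has read anything

client.close()
server.close()
```

All a completed write tells you is that the OS has accepted your data.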

BUT… how do I know when the remote destination has received my data?

And this is exactly where most people run into problems. A good way to think about it is like this:

Imagine you want to send a letter to a friend. Not an email, but the traditional snail mail. You know, through the post office. So you write the letter and put it in your mailbox. The mailman later comes by and picks it up. You can rest assured at this point that the post office will make every effort to deliver the letter to your friend. But how do you know for sure if your friend received the letter? I suppose if the letter came back with a “return to sender” stamped on it you can be certain your friend didn’t receive it. But what if it doesn’t come back? Is it enough to know that it made it into your friend’s mailbox? (Assume this is a really, really important letter.) The answer is no. Maybe it never leaves the mailbox. Maybe his roommate picks it up and accidentally throws it away. And if the roommate was responsible and left the letter on your friend’s desk? Would that be enough? What if your friend was on vacation and your letter gets lost in a pile of junk mail? So the only way to truly know if your friend received the letter is when you receive their response.

This is a great metaphor for sockets. When you write data to a socket, that is like putting the letter in the mailbox. The operating system is like the local mailman that comes by and picks up the letter. The giant post office system that routes the letter toward its destination is like the network. And the mailman that drops off your letter in your friend’s mailbox is like the operating system on your friend’s computer. It is then up to the application on your friend’s computer to read the data from the OS and process it (fetch the letter from the mailbox, and actually read it).

So how do I know when the remote destination has received my data? This is not something that TCP can tell you. At best, it can only tell you that the letter was delivered into their mailbox. It can’t tell you if the application has read that data and processed it. Maybe the application on the remote side crashed. Or maybe the remote user quit the application before it had a chance to read the data. Or maybe the remote user experienced a power outage. Long story short, it is up to the application layer to answer this question if need be.
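One common way for the application layer to answer the question is an explicit acknowledgement: the receiver replies only after it has actually processed the message. The Python sketch below shows the idea over a loopback connection. The "ACK" reply, the newline framing and the five-second timeout are all assumptions made for the example, not part of TCP or of any particular protocol.

```python
import socket
import threading

def receiver(server):
    """Read one newline-terminated message, process it, then acknowledge it."""
    connection, _ = server.accept()
    with connection:
        data = b""
        while not data.endswith(b"\n"):
            chunk = connection.recv(1024)
            if not chunk:
                return                          # sender went away mid-message
            data += chunk
        processed.append(data.rstrip(b"\n"))    # "open and read the letter"
        connection.sendall(b"ACK\n")            # the friend's response

processed = []
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
worker = threading.Thread(target=receiver, args=(server,))
worker.start()

with socket.create_connection(server.getsockname()) as sender:
    sender.settimeout(5.0)
    sender.sendall(b"Are you busy tonight?\n")  # the letter is in the mailbox
    reply = b""
    while not reply.endswith(b"\n"):
        chunk = sender.recv(1024)
        if not chunk:
            break                               # closed without an ACK
        reply += chunk
    acknowledged = reply == b"ACK\n"

worker.join()
server.close()
```

Only once the ACK arrives does the sender know the message was read and processed, not merely delivered.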

ARM64 Function Calling Conventions

In general, iOS adheres to the generic ABI specified by ARM for the ARM64 architecture. However, there are some choices to be made within that framework, and some divergences from it. This document describes these issues.

Choices Made Within the Generic Procedure Call Standard

Procedure Call Standard for the ARM 64-bit Architecture delegates certain decisions to platform designers. Decisions made for iOS are described below.

  • The register x18 is reserved for the platform. Conforming software should not make use of it.
  • wchar_t is 32-bit and long is a 64-bit type.
  • Where applicable, the __fp16 type is IEEE754-2008 format.
  • The frame pointer register (x29) must always address a valid frame record, although some functions—such as leaf functions or tail calls—may elect not to create an entry in this list. As a result, stack traces will always be meaningful, even without debug information.
  • Empty struct types are ignored for parameter-passing purposes. This behavior applies to the GNU extension in C and, where permitted by the language, in C++. (This issue is not directly specified by the generic procedure call standard, but a decision was required.)

Divergences from the Generic Procedure Call Standard

iOS diverges from Procedure Call Standard for the ARM 64-bit Architecture in several ways, as described here.

Argument Passing in General

  • In the generic procedure call standard, all function arguments passed on the stack consume slots in multiples of 8 bytes. In iOS, this requirement is dropped, and values consume only the space required. For example, on entry to the function in Listing 1, s0 occupies 1 byte at sp and s1 occupies 1 byte at sp+1. Padding is still inserted on the stack to satisfy arguments’ alignment requirements.

    Listing 1  Example of space occupied by values

    void two_stack_args(char w0, char w1, char w2, char w3, char w4, char w5, char w6, char w7, char s0, char s1) {}
  • The generic procedure call standard requires that arguments with 16-byte alignment passed in integer registers begin at an even-numbered xN, skipping a previous odd-numbered xN if necessary. The iOS ABI drops this requirement. For example, in Listing 2, the parameter x1_x2 does indeed get passed in x1 and x2 instead of x2 and x3.

    Listing 2  Example of 16-byte aligned arguments passed in integer registers

    void large_type(int x0, __int128 x1_x2) {}
  • The general ABI specifies that it is the callee’s responsibility to sign or zero-extend arguments having fewer than 32 bits, and that unused bits in a register are unspecified. In iOS, however, the caller must perform such extensions, up to 32 bits.
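The stack-packing rule from the first point (Listing 1) can be modeled with a little arithmetic. The Python sketch below is a simplified illustration, not a statement of either ABI: stack_offsets is a hypothetical helper, and it assumes every argument is a scalar whose size equals its natural alignment.

```python
def stack_offsets(sizes, ios=True):
    """Return the offset from sp of each stack-passed argument.

    `sizes` lists the byte size of each argument, assumed here to equal its
    natural alignment (true for char, short, int, long and pointers).
    """
    offsets = []
    next_free = 0
    for size in sizes:
        # iOS: align only to the argument's own requirement.
        # Generic PCS: every argument starts a fresh slot of at least 8 bytes.
        alignment = size if ios else max(size, 8)
        next_free = (next_free + alignment - 1) // alignment * alignment
        offsets.append(next_free)
        next_free += size if ios else max(size, 8)
    return offsets

# s0 and s1 from Listing 1: two chars passed on the stack.
print(stack_offsets([1, 1], ios=True))    # iOS packing
print(stack_offsets([1, 1], ios=False))   # generic 8-byte slots
# A char followed by an int: iOS still pads to the int's 4-byte alignment.
print(stack_offsets([1, 4], ios=True))
```

Under this model the two chars land at sp and sp+1 with the iOS rule, and at sp and sp+8 with the generic rule.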

Variadic Functions

The iOS ABI for functions that take a variable number of arguments is entirely different from the generic version.

Stages A and B of the generic procedure call standard are performed as usual—in particular, even variadic aggregates larger than 16 bytes are passed via a reference to temporary memory allocated by the caller. After that, the fixed arguments are allocated to registers and stack slots as usual in iOS.

The NSAA (next stacked argument address) is then rounded up to the next multiple of 8 bytes, and each variadic argument is assigned to the appropriate number of 8-byte stack slots.

The C language requires arguments smaller than int to be promoted before a call, but beyond that, unused bytes on the stack are not specified by this ABI.

As a result of this change, the type va_list is an alias for char * rather than for the struct type specified in the generic PCS. It is also not in the std namespace when compiling C++ code.

Fundamental C Types

The iOS version of the ABI has the following differences from the generic ABI in the fundamental types provided by the C language.

  • Generally, long double is a quad-precision IEEE754 binary floating-point type. In iOS, however, it is a double-precision IEEE754 binary floating-point type. In other words, long double is identical to double in iOS.
  • In iOS, as with other Darwin platforms, both char and wchar_t are signed types.

Red Zone

The ARM64 iOS red zone consists of the 128 bytes immediately below the stack pointer sp. As with the x86-64 ABI, the operating system has committed not to modify these bytes during exceptions. User-mode programs can rely on them not to change unexpectedly, and can potentially make use of the space for local variables.

In some circumstances, this approach can save an sp-update instruction on function entry and exit.

Divergences from the Generic C++ ABI

The generic ARM64 C++ ABI is specified in C++ Application Binary Interface Standard for the ARM 64-bit architecture, which is in turn based on the Itanium C++ ABI used by many UNIX-like systems.

Some sections are ELF-specific and not applicable to the underlying object format used by iOS. There are, however, some significant differences from these specifications in iOS.

Name Mangling

When compiling C++ code, types get incorporated into the names of functions in a process referred to as “mangling.” The iOS ABI differs from the generic specification in the following small ways.

  • Because va_list is an alias for char *, it is mangled in the same way—as Pc instead of St9__va_list.
  • NEON vector types are mangled in the same way as their 32-bit ARM counterparts, rather than using the 64-bit scheme. For example, iOS uses 17__simd128_int32_t instead of the generic 11__Int32x4_t.

Other Itanium Divergences

  • In the generic ABI, empty structs are treated as aggregates with a single byte member for parameter passing. In iOS, however, they are ignored unless they have a nontrivial destructor or copy-constructor. If they do have such functions, they are considered as aggregates with one byte member in the generic manner.
  • As with the ARM 32-bit C++ ABI, iOS requires the complete-object (C1) and base-object (C2) constructors to return this to their callers. Similarly, the complete object (D1) and base object (D2) destructors return this. This requirement is not made by the generic ARM64 C++ ABI.
  • In the generic C++ ABI, array cookies change their size and alignment according to the type being allocated. As with the 32-bit ARM, iOS provides a fixed layout of two size_t words, with no extra alignment requirements.
  • In iOS, object initialization guards are nominally uint64_t rather than int64_t. This affects the prototypes of the functions __cxa_guard_acquire, __cxa_guard_release and __cxa_guard_abort.
  • In the generic ARM64 ABI, function pointers whose types differ only in being extern "C" or extern "C++" are interchangeable. This is not the case in iOS.

Data Types and Data Alignment

Using the correct data types for your variables helps to maximize the performance and portability of your programs. Data alignment specifies how data is laid out in memory. A data type’s natural alignment specifies the default alignment of values of that type.

Table 1 lists the integer data types and their sizes and natural alignment in the ARM64 environment.

Table 1  Size and alignment of integer data types

Data type     Size (in bytes)   Natural alignment (in bytes)
BOOL, bool    1                 1
char          1                 1
short         2                 2
int           4                 4
long          8                 8
long long     8                 8
pointer       8                 8
size_t        8                 8
NSInteger     8                 8
CFIndex       8                 8
fpos_t        8                 8
off_t         8                 8