Category Archives: Network

Introduction to Sockets

Originally at

https://github.com/robbiehanson/CocoaAsyncSocket/wiki/Intro

If you’re a beginner to networking, this is the place to start. Working with a socket can be very different from working with a file, even though the APIs may be similar. A little bit of investment in your knowledge and understanding of networking fundamentals can go a long way. And it can save you a lot of time and frustration in the long run.

We will keep it brief, and will maintain a focus on developers: just what developers need to accomplish their goal, while not skipping important fundamentals that could later cause problems.

Sockets, Ports, and DNS – Oh My!

In networking parlance, a computer is a host for a number of sockets. A socket is one end of a communication channel called a network connection; the other end is another socket. From its own point of view, any socket is the local socket, and the socket at the other end of the connection is the remote socket.

To establish the connection, one of the two sockets must contact the other socket. To make contact the socket must know the other socket’s address. Every socket has an address. The address consists of two parts: the host address and the port number. The host address is the IP address of the computer, and the port number uniquely identifies each socket hosted on the computer.

A computer can have multiple host addresses because it can have multiple networking interfaces. For example, a computer might be equipped with an ethernet card, a modem, a WiFi card, a VPN connection, Bluetooth, etc. And in addition to all this, there is a special interface for connecting to itself (called “loopback” or sometimes “localhost”).

An address such as “google.com” corresponds to a host address, but it is not a host address itself. It is a DNS entry or DNS name, which is converted to a host address by a DNS look-up operation. One can think of DNS like a phone book. If you wanted to call someone, but didn’t know their number, you could lookup their number in the phone book. Their name is matched to a phone number. Similarly, DNS matches a name (such as “google.com”) to an IP address.

Networking Huh?

The crux of the problem is that the network you’ll be communicating over is unreliable. Perhaps you’re sending data out over the Internet. Maybe it’s going to be sent via WiFi, or some cellular connection. Or maybe it’s going to be sent into space via a satellite. You might not even know.

But let’s assume for a moment that you did know. Let’s assume you knew that all communication was going to take place over regular ethernet, within a closed business network. The communication would be 100% reliable right? Wrong. And I’m not referring to cut wires or power outages either.

All data that gets sent or received gets broken into little packets. These packets then get pumped onto the network, and arrive at routers which have to decide where they go. But during bursts of traffic, a router might get overloaded with packets. Packets are coming in faster than the router can figure out where to route them. What happens? The same thing that happens millions of times a day all over the world: the router starts dropping packets.

In addition to lost packets on the network, the receiving computer might be forced to drop packets too. Perhaps the computer is overloaded, or the receiving application isn’t reading the data from the OS fast enough. There’s also the potential that the packet was corrupted during transmission, perhaps from electrical interference. And all of this is without getting into other issues introduced by things like the WiFi or the Internet.

If you’re new to networking, you might be thinking that it’s a miracle that everything works as well as it does. The fact is, the miracle is derived from the networking protocols that have been perfected over the last several decades, and from the developers that understand them and use them effectively. (That’s you!)

Bring on the Protocols

You can probably list dozens of protocols that have something to do with computer networking:

HTTP, FTP, XMPP, POP, IMAP, SMTP, DHCP, DNS, VoIP, SIP, RTP, RTCP, …

But every single one of these protocols is layered on top of another protocol that handles the networking for it. These lower level protocols handle the majority of the networking aspect so that the application layer protocol (those listed above) can focus on the application aspect.

The “application layer protocols” listed above are layered on top of a “transport layer protocol”. And of all the protocols listed above, there are only two transport layer protocols that are used: TCP and UDP.

UDP

The User Datagram Protocol (UDP) is the simpler of the two. You can only put a small amount of data into a UDP packet, and then you send it on its way. And then… that’s it. There is no guarantee that the message will arrive. And if you send multiple packets back-to-back, there is no guarantee that they will arrive in order. Seems pretty unreliable, no? But it’s weakness is also its strength. If you are sending time-sensitive data, such as audio in a VoIP call, then you don’t want your transport protocol wasting time retransmitting lost packets since the lost audio would arrive too late to be played anyway. In fact, streaming audio and video are some of the biggest uses for UDP.

UDP also has an advantage that it doesn’t require a “connection handshake”. Think about it like this: If you were sitting on a train, and you wanted to have a long conversation with the stranger next to you, you would probably start with an introduction. Something like, “Where are you heading? Oh yeah, I’m heading in that direction too. My name’s Robbie, what’s yours?” But if you just wanted to know what the time was, then you could skip the introduction. You wouldn’t be expected to tell the stranger your name. You could just say, “Excuse me, do you have the time?” To which the stranger could quickly respond, and you could both go back to doing whatever you were doing. This is why a protocol like DNS uses UDP. That way your computer can say, “Excuse me, what is the IP of google.com?” And the server can quickly respond.

TCP

The Transmission Control Protocol (TCP) is probably the protocol you use the most. Whether you’re browsing the web, checking your email, or sending instant messages to friends, you’re probably using TCP.

TCP is designed for “long conversations”. So there is an initial connection handshake, and after that data can flow back and forth for as long as necessary. But the great thing about TCP is that it was designed to make communication reliable in the face of an unreliable network. So it does all kinds of really cool stuff for us. If you send some information over TCP, and part of it gets lost, the protocol will automatically figure out what got lost and resend it. And when you send information, TCP makes sure that information always arrives in the correct order. But wait, there’s more! The protocol will also detect congestion in the network, and automatically scale accordingly so everybody can share.

So there are a lot of great reasons to use TCP, and it fits in nicely with a lot of networking tasks. Plus there is no limit to the amount of data you can send via TCP. It is designed to be an open stream of data flowing in both/either direction. It is simply up to the application layer to determine what that data looks like.

Where do we fit in?

So… UDP and TCP… how do we use them? Is that what the CocoaAsyncSocket libraries provide? Implementations of TCP and UDP? Nope, not quite. As you can imagine, TCP and UDP are used all over the place. So naturally they are provided by the operating system. If you open up your terminal and type “man socket” you can see the low level BSD socket API. The libraries are essentially wrappers that sits on top of low-level socket API’s and provide you, the developer, an easy to use framework in Objective-C.

So CocoaAsyncSocket provides a great API that simplifies networking for you. But networking can still be tricky, so we recommend you read the following before you get started:

Advertisements

TCP is a stream

Copied from 

https://github.com/robbiehanson/CocoaAsyncSocket/wiki/CommonPitfalls

The TCP protocol is modeled on the concept of a single continuous stream of unlimited length. This is a very important concept to understand, and is the number one cause of confusion that we see.

What exactly does this mean, and how does it affect developers?

Imagine that you’re trying to send a few messages over the socket. So you do something like this (in pseudocode):

socket.write("Hi Sandy.");
socket.write("Are you busy tonight?");

How does the data show up on the other end? If you think the other end will receive two separate sentences in two separate reads, then you’ve just fallen victim to a common pitfall! Gasp! Read on.

TCP does not treat the writes as separate data. TCP considers all writes to be part of a single continuous stream. So when you issue the above writes, TCP will simply copy the data into its buffer:

TCP_Buffer = “Hi Sandy.Are you busy tonight?”

and then proceed to send the data as fast as possible. And in order to send data over the network, TCP and other networking protocols will be required to break that data into small pieces that can be transmitted over the medium (ethernet, WiFi, etc). In doing so, TCP may break apart the data in any way it sees fit. Here are some examples of how that data might be broken apart and sent:

  1. “Hi San” , “dy.Ar” , “e you ” , “busy to” , “night?”
  2. “Hi Sandy.Are you busy” , ” tonight?”
  3. “Hi Sandy.Are you busy tonight?”

The above examples also demonstrate how the data will arrive at the other end. Let’s consider example 1 for a moment.

Sandy has issued a socket.read() command, and is waiting for data to arrive. So the result of her first read might be “Hi San”. Sandy will likely begin to process that data. And while the application is processing the data, the TCP stream continues to receive the 2nd and 3rd packet. Sandy then issues another socket.read() command, and this time she gets “dy.Are you “.

This highlights the continuous stream nature of TCP. The TCP protocol, at the developer API level, has absolutely no concept of packets or separation of data.

But isn’t this a major shortcoming? How do all those other protocols that use TCP work?

HTTP is a great example because it’s so simple, and because most everyone has seen it before. When a client connects to a server and sends a request, it does so in a very specific manner. It sends an HTTP header, and each line of the header is terminated with a CRLF (carriage return, line feed). So something like this:

GET /page.html HTTP/1.1
Host: google.com

Furthermore, the end of the HTTP header is signaled by two CRLF’s in a row. Since the protocol specifies the terminators, it is easy to read data from a TCP socket until the terminators are reached.

Then the server sends the response:

HTTP/1.1 200 OK
Content-Length: 216

{ Exactly 216 bytes of data go here }

Again, the HTTP protocol makes it easy to use TCP. Read data until you get back-to-back CRLF. That’s your header. Then parse the content-length from the header, and now you can simply read a certain number of bytes.

Returning to our original example, we could simply use a designated terminator for our messages:

socket.write("Hi Sandy.\n");
socket.write("Are you busy tonight?\n");

And if Sandy was using AsyncSocket she would be in luck! Because AsyncSocket provides really easy-to-use read methods that allow you to specify the terminator to look for. AsyncSocket does the rest for you, and would deliver two separate sentences in two separate reads!

Writes

What happens when you write data to a TCP socket? When the write is complete, does that mean the other party received that data? Can we at least assume the computer has sent the data? The answer is NO and NO.

Recall two things:

  • All data sent and received must get broken into little pieces in order to send it over the network.
  • TCP handles a lot of complicated issues such as resending lost packets, and providing in-order delivery so information arrives in the proper sequence.

So when you issue a write, the data is simply copied into an underlying buffer within the OS networking stack. At that point the TCP software will begin its magic, which consists of all the cool stuff mentioned earlier such as:

  • breaking the data into small pieces such that they can be sent over the network
  • ensuring that lost pieces get properly resent
  • ensuring that your data arrives at the remote destination in the proper order
  • watching out for congestion in the network
  • employing fancy algorithms to accomplish all of this as fast as possible

So when you issue the command, “write this data” the operating system responds with “I have your data, and I will do everything in my power to deliver this to the remote destination.”

BUT… how do I know when the remote destination has received my data?

And this is exactly where most people run into problems. A good way to think about it is like this:

Imagine you want to send a letter to a friend. Not an email, but the traditional snail mail. You know, through the post office. So you write the letter and put it in your mailbox. The mailman later comes by and picks it up. You can rest assured at this point that the post office will make every effort to deliver the letter to your friend. But how do you know for sure if your friend received the letter? I suppose if the letter came back with a “return to sender” stamped on it you can be certain your friend didn’t receive it. But what if it doesn’t come back? Is it enough to know that it made it into your friend’s mailbox? (Assume this is a really, really important letter.) The answer is no. Maybe it never leaves the mailbox. Maybe his roommate picks it up and accidentally throws it away. And if the roommate was responsible and left the letter on your friends desk? Would that be enough? What if your friend was on vacation and your letter gets lost in a pile of junk mail? So the only way to truly know if your friend received the letter is when you receive their response.

This is a great metaphor for sockets. When you write data to a socket, that is like putting the letter in the mailbox. The operating system is like the local mailman that comes by and picks up the letter. The giant post office system that routes the letter toward its destination is like the network. And the mailman that drops off your letter in your friends mailbox is like the operating system on your friends computer. It is then up to the application on your friends computer to read the data from the OS and process it (fetch the letter from the mailbox, and actually read it).

So how do I know when the remote destination has received my data? This is not something that TCP can tell you. At best, it can only tell you that the letter was delivered into their mailbox. It can’t tell you if the application has read that data and processed it. Maybe the application on the remote side crashed. Or maybe the remote user quit the application before it had a chance to read the data. Or maybe the remote user experienced a power outage. Long story short, it is up to the application layer to answer this question if need be.