Monthly Archives: February 2014

Network socket

http://en.wikipedia.org/wiki/Network_socket

 

network socket is an endpoint of an inter-process communication flow across a computer network. Today, most communication between computers is based on the Internet Protocol; therefore most network sockets areInternet sockets.

socket API is an application programming interface (API), usually provided by the operating system, that allows application programs to control and use network sockets. Internet socket APIs are usually based on theBerkeley sockets standard.

socket address is the combination of an IP address and a port number, much like one end of a telephone connection is the combination of a phone number and a particular extension. Based on this address, internet sockets deliver incoming data packets to the appropriate application process or thread.

 

Overview[edit]

An Internet socket is characterized by a unique combination of the following:

  • Local socket address: Local IP address and port number
  • Remote socket address: Only for established TCP sockets. As discussed in the client-server section below, this is necessary since a TCP server may serve several clients concurrently. The server creates one socket for each client, and these sockets share the same local socket address from the point of view of the TCP server.
  • Protocol: A transport protocol (e.g., TCPUDPraw IP, or others). TCP port 53 and UDP port 53 are consequently different, distinct sockets.

Within the operating system and the application that created a socket, a socket is referred to by a unique integer value called a socket descriptor. The operating system forwards the payload of incoming IP packets to the corresponding application by extracting the socket address information from the IP and transport protocol headers and stripping the headers from the application data.

In IETF Request for CommentsInternet Standards, in many textbooks, as well as in this article, the term socket refers to an entity that is uniquely identified by the socket number. In other textbooks,[1] the socket term refers to a local socket address, i.e. a “combination of an IP address and a port number”. In the original definition of socket given in RFC 147, as it was related to the ARPA network in 1971, “the socket is specified as a 32 bit number with even sockets identifying receiving sockets and odd sockets identifying sending sockets.” Today, however, socket communications are bidirectional.

On Unix-like and Microsoft Windows based operating systems the netstat command line tool may be used to list all currently established sockets and related information.

Socket types[edit]

There are several Internet socket types available:

There are also non-Internet sockets, implemented over other transport protocols, such as Systems Network Architecture (SNA).[2] See also Unix domain sockets (UDS), for internal inter-process communication.

Advertisements

Named pipe

http://en.wikipedia.org/wiki/Named_pipe

 

In computing, a named pipe (also known as a FIFO for its behavior) is an extension to the traditional pipe concept on Unix and Unix-like systems, and is one of the methods ofinter-process communication (IPC). The concept is also found in Microsoft Windows, although the semantics differ substantially. A traditional pipe is “unnamed” because it exists anonymously and persists only for as long as the process is running. A named pipe is system-persistent and exists beyond the life of the process and must be deleted once it is no longer being used. Processes generally attach to the named pipes (usually appearing as a file) to perform inter-process communication.

 

 

In Unix[edit]

Instead of a conventional, unnamed, shell pipeline, a named pipeline makes use of the filesystem. It is explicitly created using mkfifo() or mknod(), and two separate processes can access the pipe by name — one process can open it as a reader, and the other as a writer.

For example, one can create a pipe and set up gzip to compress things piped to it:

 mkfifo my_pipe
 gzip -9 -c < my_pipe > out.gz &

In a separate process shell, independently, one could send the data to be compressed:

cat file > my_pipe

The named pipe can be deleted just like any file:

rm my_pipe

A named pipe can be used to transfer information from one application to another without the use of an intermediate temporary file. For example, you can pipe the output of gzip into a named pipe like so:

 mkfifo --mode=0666 /tmp/namedPipe
 gzip --stdout -d file.gz > /tmp/namedPipe

Then load the uncompressed data into a MySQL table[1] like so:

 LOAD DATA INFILE '/tmp/namedPipe' INTO TABLE tableName;

Without this named pipe one would need to write out the entire uncompressed version of file.gz before loading it into MySQL. Writing the temporary file is both time consuming and results in more I/O and less free space on the hard drive.

PostgreSQL‘s command line terminal, psql, also supports loading data from named pipes.[2]

inode

http://en.wikipedia.org/wiki/Inode

 

In a Unix-style file system, an index node, informally referred to as an i-node, is a data structure used to represent a filesystem object, which can be one of various things including a file or a directory. Each i-node stores the attributes and disk block location(s) of the filesystem object’s data.[1] Filesystem object attributes may include manipulationmetadata (e.g. creation, access, modify time), as well as owner and permission data (e.g. group-iduser-idpermissions).[2]

Linux directory lists other filesystem objects by name, normally identifying the listed object by referring to its inode. The directory contains an entry for itself, its parent, and each of its children.

Unix domain socket

http://en.wikipedia.org/wiki/Unix_domain_socket

Unix domain socket or IPC socket (inter-process communication socket) is a data communications endpoint for exchanging data between processes executing within the same host operating system. While similar in functionality to named pipes, Unix domain sockets may be created as connection‑mode (SOCK_STREAM or SOCK_SEQPACKET) or as connectionless (SOCK_DGRAM), while pipes are streams only. Processes using Unix domain sockets do not need to share a common ancestry. The API for Unix domain sockets is similar to that of an Internet socket, but it does not use an underlying network protocol for communication. The Unix domain socket facility is a standard component of POSIX operating systems.

Unix domain sockets use the file system as their address name space. They are referenced by processes as inodes in the file system. This allows two processes to open the same socket in order to communicate. However, communication occurs entirely within the operating system kernel.

In addition to sending data, processes may send file descriptors across a Unix domain socket connection using the sendmsg() and recvmsg() system calls.

B-tree

http://en.wikipedia.org/wiki/B-tree

computer science, a B-tree is a tree data structure that keeps data sorted and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree is a generalization of a binary search tree in that a node can have more than two children (Comer 1979, p. 123). Unlike self-balancing binary search trees, the B-tree is optimized for systems that read and write large blocks of data. It is commonly used in databases and filesystems.

Overview[edit]

A B-tree of order 2 (Bayer & McCreight 1972) or order 5 (Knuth 1998).

In B-trees, internal (non-leaf) nodes can have a variable number of child nodes within some pre-defined range. When data is inserted or removed from a node, its number of child nodes changes. In order to maintain the pre-defined range, internal nodes may be joined or split. Because a range of child nodes is permitted, B-trees do not need re-balancing as frequently as other self-balancing search trees, but may waste some space, since nodes are not entirely full. The lower and upper bounds on the number of child nodes are typically fixed for a particular implementation. For example, in a 2-3 B-tree (often simply referred to as a 2-3 tree), each internal node may have only 2 or 3 child nodes.

Each internal node of a B-tree will contain a number of keys. The keys act as separation values which divide its subtrees. For example, if an internal node has 3 child nodes (or subtrees) then it must have 2 keys: a1 and a2. All values in the leftmost subtree will be less thana1, all values in the middle subtree will be between a1 and a2, and all values in the rightmost subtree will be greater than a2.

Usually, the number of keys is chosen to vary between d and 2d, where d is the minimum number of keys, and d+1 is the minimum degree or branching factor of the tree. In practice, the keys take up the most space in a node. The factor of 2 will guarantee that nodes can be split or combined. If an internal node has 2d keys, then adding a key to that node can be accomplished by splitting the 2d key node into two d key nodes and adding the key to the parent node. Each split node has the required minimum number of keys. Similarly, if an internal node and its neighbor each have d keys, then a key may be deleted from the internal node by combining with its neighbor. Deleting the key would make the internal node have d-1 keys; joining the neighbor would add d keys plus one more key brought down from the neighbor’s parent. The result is an entirely full node of 2d keys.

The number of branches (or child nodes) from a node will be one more than the number of keys stored in the node. In a 2-3 B-tree, the internal nodes will store either one key (with two child nodes) or two keys (with three child nodes). A B-tree is sometimes described with the parameters (d+1) — (2d+1) or simply with the highest branching order, (2d+1).

A B-tree is kept balanced by requiring that all leaf nodes be at the same depth. This depth will increase slowly as elements are added to the tree, but an increase in the overall depth is infrequent, and results in all leaf nodes being one more node farther away from the root.

B-trees have substantial advantages over alternative implementations when otherwise the time to access the data of a node greatly exceeds the time spent processing that data, because then the cost of accessing the node may be amortized over multiple operations within the node. This usually occurs when the node data are in secondary storage such as disk drives. By maximizing the number of keys within each internal node, the height of the tree decreases and the number of expensive node accesses is reduced. In addition, rebalancing of the tree occurs less often. The maximum number of child nodes depends on the information that must be stored for each child node and the size of a full disk block or an analogous size in secondary storage. While 2-3 B-trees are easier to explain, practical B-trees using secondary storage need a large number of child nodes to improve performance.

Variants[edit]

The term B-tree may refer to a specific design or it may refer to a general class of designs. In the narrow sense, a B-tree stores keys in its internal nodes but need not store those keys in the records at the leaves. The general class includes variations such as the B+-tree and the B*-tree.

  • In the B+-tree, copies of the keys are stored in the internal nodes; the keys and records are stored in leaves; in addition, a leaf node may include a pointer to the next leaf node to speed sequential access (Comer 1979, p. 129).
  • The B*-tree balances more neighboring internal nodes to keep the internal nodes more densely packed (Comer 1979, p. 129). This variant requires non-root nodes to be at least 2/3 full instead of 1/2 (Knuth 1998, p. 488). To maintain this, instead of immediately splitting up a node when it gets full, its keys are shared with a node next to it. When both nodes are full, then the two nodes are split into three. The act of deleting nodes is somewhat more complex than inserting however.
  • B-trees can be turned into order statistic trees to allow rapid searches for the Nth record in key order, or counting the number of records between any two records, and various other related operations.[1]

B+ tree

http://en.wikipedia.org/wiki/B+_tree

Image

Sumit tree is an n-ary tree with a variable but often large number of children per node. A B+ tree consists of a root, internal nodes and leaves.[1] The root may be either a leaf or a node with two or more children.[2]

A B+ tree can be viewed as a B-tree in which each node contains only keys (not key-value pairs), and to which an additional level is added at the bottom with linked leaves.

The primary value of a B+ tree is in storing data for efficient retrieval in a block-oriented storage context — in particular, filesystems. This is primarily because unlike binary search trees, B+ trees have very high fanout (number of pointers to child nodes in a node,[1] typically on the order of 100 or more), which reduces the number of I/O operations required to find an element in the tree.

The NTFSReiserFSNSSXFSJFS, and ReFS filesystems all use this type of tree for metadata indexing. Relational database management systems such as IBM DB2,[3] Informix,[3] Microsoft SQL Server,[3] Oracle 8,[3] Sybase ASE,[3] and SQLite[4] support this type of tree for table indices. Key-value database management systems such as CouchDB[5] and Tokyo Cabinet[6] support this type of tree for data access.

Shumit Paliwal[edit]

The order, or branching factor, b of a B+ tree measures the capacity of nodes (i.e., the number of children nodes) for internal nodes in the tree. The actual number of children for a node, referred to here as m, is constrained for internal nodes so that  \lceil b/2 \rceil \le m \le b. The root is an exception: it is allowed to have as few as two children.[1] For example, if the order of a B+ tree is 7, each internal node (except for the root) may have between 4 and 7 children; the root may have between 2 and 7. Leaf nodes have no children, but are constrained so that the number of keys must be at least  \lfloor b/2 \rfloor  and at most  b - 1 . In the situation where a B+ tree is nearly empty, it only contains one node, which is a leaf node. (The root is also the single leaf, in this case.) This node is permitted to have as little as one key if necessary, and at most b.

Node Type Children Type Min Children Max Children Example b = 7 Example b = 100
Root Node (when it is the only node in the tree) Records 1 b 1 – 7 1 – 100
Root Node Internal Nodes or Leaf Nodes 2 b 2 – 7 2 – 100
Internal Node Internal Nodes or Leaf Nodes  \lceil b/2 \rceil b 4 – 7 50 – 100
Leaf Node Records  \lfloor b/2 \rfloor b – 1 3 – 6 50 – 99

File descriptor

http://en.wikipedia.org/wiki/File_descriptor

File descriptor

From Wikipedia, the free encyclopedia

In computer programming, a file descriptor (FD) is an abstract indicator for accessing a file. The term is generally used in POSIX operating systems.

In POSIX, a file descriptor is an integer, specifically of the C type int. There are three standard POSIX file descriptors, corresponding to the three standard streams, which presumably every process (save perhaps a daemon) should expect to have:

Integer value Name
0 Standard input (stdin)
1 Standard output (stdout)
2 Standard error (stderr)

Generally, a file descriptor is an index for an entry in a kernel-resident array data structure containing the details of open files. In POSIX this data structure is called a file descriptor table, and each process has its own file descriptor table. The process passes the file descriptor to the kernel through a system call, and the kernel will access the file on behalf of the process. The process itself cannot read or write the file descriptor table directly.

On Linux, the set of file descriptors open in a process can be accessed under the path /proc/PID/fd/, where PID is the process identifier.

In Unix-like systems, file descriptors can refer to any Unix file type named in a file system. As well as regular files, this includes directoriesblock and character devices (also called “special files”), Unix domain sockets, and named pipes. File descriptors can also refer to other objects that do not normally exist in the file system, such as anonymous pipes and network sockets.

The FILE data structure in the C standard I/O library usually includes a low level file descriptor for the object in question on Unix-like systems. The overall data structure provides additional abstraction and is instead known as a file handle.