P2P Applications
Why is P2P application interesting? Client-Server application requires a server to response to repetative file requests that may lead to server overload when a large number of clients assess the server for a few popular files. This scalability problem is resolved in P2P application in which the server transmits only one copy of a requested file. A P2P server would divide a file into equal size chunks, for ease of reconstruction of the original file, and transmits different set of chunks to different peers. The collection of peers involve in a file transfer is known as a swamp. A peer will then obtain missing chunks from other peers in the swamp. It will also transmit what it receives from the source to other peers of the swamp.
P2P systems exploits the non-symmetry properties of uplink and downlink rates at each peer. Because every peer may retransmit chunks of the file it receives, the maximum aggregate upload rate is the sum of upload rates of all peers and the server, eliminating the bottleneck rate at the server in client-server applications.
The key idea behind file transfer in client-server and P2P systems are one-to-many and many-to-many transmissions respectively. With its high scalability P2P protocols or probably its enhanced version will find their applications in many future networked applications.
Basic Operations of P2P
For P2P peers to communicate, a new peer will register its presence at a Tracter when it joins the swamp. The Tracter maintains a database of all peers, their keys and IP address in a swamp. Information of current active peers is then download from the Tracter. With the information, the peer may query its peers about which chunks they have and requests for different set of chunks from different peers.
Some peers may response positively and some may not at all, the peer will then request missing sets of chunks from other unchoke peers. Unchoke peers are those that response with rates above a specific “unchoked” threshold. Choked peers are those that response below a specific “choked” threshold. The exchanges will continue until the file download is completed.
Server architecture
Centralised and distributed architectures are available. In centalised architecture, files are stored only in a server and a new file download must originate from the cental server while file storage in distributed server architecture is spread over selected peers that act as server for specific files. Distributed file storage is obviously the choice for the sake of scalability by means of a hash table.
Distributed Hash Table
A hash table maps a given string to a value within a range 0 to n. Any string can be hashed to a particular value within the range. More than one string may map to a particular value and therefore hashing is not unique. It is many to one mapping. Hashing is often applied in security system to speed up the verification and authentication processes to security applications practical. Because hashing keyword to value is not unique, it is possible for a fraudulent key to bypass a security system.
Hashing is a many to one algorithm that allows quick mapping from a keyword to a value by which certain action may be taken. For example, one may hash the Internet Protocol (IP) address to a value to allow fast routing action. So long each legitimate address maps to a unique value in the range, routing actions can be taken. Therefore hash table is a very useful tool for quick decoding of complex information for fast actions.
In P2P distributed architecture, the hash table concept in the form of distributed hash table (DHT) has been applied to spread the storage of files across as many peers as possible. Suppose there are n peers, where n = 2k for integer k, hashing can be applied to stored a file in one of the peers for future retrieval. Using our example of n peers, a key is assigned to a file. The key is then hashed to a value m in the range 0 to n. The file is then stored in peer m. If a peer needs to retrieve this file from the P2P system, it queries the DHT to find the peer in which the file is stored and request the file from the peer.
In a system when the number of peers is less than n, then a new file will be stored in a peer closest to the hashed value. When the key of a file hashes to a missing peer, the file will be stored in the next successor peer. Therefore the DHT provides a quick mapping from key-to-location hashing.
Peer Churning
So far we have neglected peer churns in which any peer may enter or leave the swamp anytime. In order to maintain the loop, every peer maintains information on its predecessor, its successor and its successor’s successor and polls its successor periodically to ensure that its successor is active. If its successor is no longer active, the peer will signal to its successor’s successor to close the loop and requests for the new successor’s successor. With this information, the loop is maintained unless two or more two sequential peers leave within a polling period.
For illustration, let’s consider an example of five sequential peers u,w,x,y and z. Peer w has u as predecessor and x as successor and x has y as successor. If x leaves, w will detect that x does not response to its poll and set y as its new successor and ask y for its successor, which is z. Then w will update its own successor information and inform y that it has become its successor. Peer y will update its predecessor as w. The logical loop is thus maintained.
Open Issues?
If more than two or more logical sequential peers left during an update period, the closed loop of peers is broken. In this case, how do we recover from this condition? How do we recover the closed loop if it becomes broken in two or more locations?
In distributed P2P architecture, storage of files is spread over all the peers. Canyou provide a solution to maximize the probability of retrieving a file in spite of peer churning?
What next?
A P2P protocol may deliver a file to a peer by means of many to many mesh connections for distributed delivery of chunks. It may be viewed as an application layer distributed multicast where a peer will retansmit the set of chunks it received to every peer in the swamp. Delay bound and out-of-sequence chunks are expected in such choatic delivery subject to underlying traffic conditions in the network layer. In this way, long playout delay at a peer is needed to build up sufficient number of contiguous chunks for probable continuous video playout. Because of the underlying best-effort Internet, smooth playout cannot be guaranteed.
Open Problem
As distributed P2P architecture depends on peers to retransmit chunks to a participating peer for file transfer, it is challenging to control the end-to-end delay as different set of chunks may take different routes to their destination. Some routes are choked and others not but the end-to-end delay dependent on the worst route. P2P file delivery is suitable for download and play video but not for real-time video such as video conferencing. In general, the playout buffer have to be sized to take into consideration of the expected end-to-end delay due to P2P transmissions. Even with this setting, the quality of experience may degrade sharply for transmission over the Internet during peak periods. Therefore, while P2P file delivery is an interesting concept, applying P2P delivery for real-time applications remains an open issue.
October 4th, 2009 at 4:31 pm
uGkgRU I want to say – thank you for this!
October 16th, 2009 at 6:03 am
Hello from Russia!
Can I quote a post in your blog with the link to you?
October 16th, 2009 at 8:29 pm
ok with proper reference to me.
February 6th, 2010 at 6:45 am
I really liked reading your post!. Quallity content. With such a valuable blog i believe you deserve to be ranking even higher in the search engines
. Check out the link in my name. That links to a tool that really helped me rank high in google. This way even more people can enjoy your posts and nothing beats a big audiance