Shared Flashcard Set

Details

Comp2005 - Lecture 10 - Distributed Hash Tables (DHT)
Alejandro Saucedo - Comp2005 Lecture 10 FlashCard Set
21
Computer Science
Undergraduate 2
05/18/2013

Additional Computer Science Flashcards

 


 

Cards

Term
Why are decentralised P2P systems desirable?
Definition
  • To mitigate attacks, legal challenges, government intervention, or the central component simply being turned off
  • Reduce large loads on specific download/update servers
  • Many hobbyists don't have access to big servers with fat bandwidth
  • Exploit millions of devices in home networks
Term
What are the goals of a Distributed Hash Table?
Definition
  • A hash table stored across a large set of nodes
    • Store K keys across N nodes
  • Seek to build a distributed P2P index where 
    • Each node keeps a small amount of information about other nodes
      • Intentionally do not have whole index at all nodes
    • Nodes can look up entries in the index quickly
    • Each node can use the index at the same time, and as nodes join and leave
      • This allows performance to grow with the number of nodes
  • The goal of the DHT is to be able to find the node responsible for holding data about given key
    • Data is location of the content peer(s), as per a tracker
Term
What are some considerations of Distributed Hash Tables (DHT)?
Definition
  • Scalable - Potentially millions of nodes
  • Robust to faults and attacks - implies some replication within the system
  • Allow nodes to come and go freely - Without refreshing the keys
  • Every node can be used to find the answer (Key, value) - want to be able to start a query at any node
  • The (key, value) is not placed on an arbitrary node, but a node identified by a hash of the key
  • The hash funtion should have uniform distribution
  • Don't want many keys in one node and none in other
Term
What are some DHT solutions available?
Definition
  • Chord
  • CAN
  • Pastry
  • Tapestry
Term
What does Chord provide?
Definition
  • Can provide a way for client to find a key in a collection of distributed nodes
  • And then the client can start talking to those peer(s) to fetch/offer the content (chunks) it is interested in
  • Stores the peer as the value in a key value pair, and store that key at some node in the DHT
Term
What are the principles of Chord?
Definition
  • At any time, a DHT instance will have N active nodes
    • Each node runs on an IP address/port
      • Can be IPv4 or IPv6 addresses
    • Each node stores part of the overall index
    • For N active nodes and K keys, each node is on average for K/N keys
  • Chord uses an m-bit number space for identifiers and keys
    • SHA-1 is typically the hash function (so generates 160-bit hash)
    • Node IP address (and port) hashed to the node identifier
      • In practice identifier is the hash modulo m
      • m should be large enough to make collisions improbable
      • Key (perhaps a torrent file) hashed to the key identifier
Term
How are keys mappend to nodes in Chord?
Definition
  • Chord maps the node identifier space to a circle with 2^m potential node locations
    • There can thus be up to 2^m active nodes in the system
    • This also means that node 0 is adjacent to node (2^m)-1
  • Chord maps keys to the same circular space
    • uses the same hash function (modulo m)
    • Each node has a predecessor (Anti-clockwise) and a successor (clockwise) in the form of a node identifier and an IP address/port
  • Key k allocated to first active node with an equal or higher identifier
    • The node n that becomes responsible for k is thus at position successor(k) in the ring
    • That node may hav ea bucket(list) with multiple keys
    • Not every potential node location is active
Term
Give an abstract explanaiton of Chord
Definition
  • You want to find the value held by a given key k
    • Your client can query at least one node in the DHT
  • The value is held by the node n whose identifier is successor(k)
  • But you don't know the IP/port of n to connect directly to it
    • If all nodes held the locations of all other nodes, it would be easy
    • But there is a big cost of doing that, keeping it up to date and consistent
  • Instead, each active node keeps just the locations (IP/port) of a small number of other nodes, and you query through the nodes to resolve the lookup
    • Nodes know of predecessor, sucessor and finger table entries
Term
Give an example of Chord successor
Definition
[image]
Term
What is bootstraping?
Definition
  • Join function assumes an existing Chord ring is in place
  • A small number of co-operating nodes can initiate a ring, with successor and predecessor information
  • A node wishing to join must know about at least one node in the system
    • If node n joins, it can ask any node for successor(n)
    • Can then notify successor(n) that it is its new predecessor and successor(n)'s former predecessor becomes the predecessor of n
    • Then the relevant key information needs to be transferred
Term
Show a diagram for transferring keys when joining a new node
Definition
[image]
Term
What happens when a node leaves?
Definition
  • All keys held by the node need to be passed to its successor
    • If node 10 leaves, keys that hash to identifiers 9 and 10 need to be managed by node 14
  • No problem if node leaves gracefully
    • Messages can be exchanged to ensure keys are passed
    • On average move K/N keys
  • If it does not there is a problem
    • Each node maintains a successor list of its r nearest successors (it also keeps a list of r predecessors)
    • So for example if its direct successor leaves abruptly it can contact the next sucessor in the list
    • Can verify its list by asking its successor for its list
Term
What does a Chord finger table consist of?
Definition
  • Makes DHT search more efficient
    • Form of 'routing' through identifier space
    • Not required, but improves efficiency significant
  • A node's finger table has m entries, indexed 0...m-1
    • Each entry identifies another active node
    • Each entry has two fields
      • Node n and successor(n)
  • To set the value for entry i in the table for node n:
    • start = (n + 2^i) modulo 2^m
    • Then entry i = successor(start(i))
    • The entry holds both the identifier and IP/port of the node
Term
How is the finger table used?
Definition
  • Looking up the location of a key k starting at node n
  • Node examines route to the request to the closest predecessor of k
  • If key identifier k is between n and successor(n)
    • Then key is held by successor(n)
  • Otherwise examine the finger table to find the entry to advance in search towards k
    • Request is then forwarded to that node
    • Essentially the search moves towards the target, with the finger table typically allowing a node to foward its query half way along the remaining distance
    • Average number of lookups is log2(N)
      • Improves on average of N/2 for a simple search
Term
Show an example of a Chord finger table lookup
Definition
[image]
Term
What are some characteristics of Chord stabilisation
Definition
  • It is important that successor information is up to date
  • Each node periodically runs a process to verify its successor data and its finger table
    • e.g. a node n will ask its successor what it believes its predecessor p to be; if a different value, then p becomes a candidate for n's new successor
    • This also allows n's successor to update its predecessor to be n if necessary, if n's successor no longer has a predecessor p
  • The stabilisation messages run independently of any key lookup queries between nodes
Term
Why avoid .torrent files and use Magnet links?
Definition
  • Torrent files specify content and where tracker is
  • Magnet links
    • Hash code of torrent
      • Unique for a given torrent (and thus the content it points to)
    • DHT used to find values that are IP of peers
    • Pirate bay default for a while now
    • Can find further peers with Peer Exchange (PEX)
      • Nodes ask peers to report their peers
  • Link is simply a URI including a hash
    • Hash is entry point to a DHT 
Term
What are the principles of GNutella?
Definition
  • Other P2P approach
  • Forms dynamic overlay network
    • Each network is aware of and talks to a set of other nodes
    • Nodes may come and go
  • Used message flooding
Term
What are the principles of GNutella flooding?
Definition
  • Original GNutella floods to find its content
  • Node queries all active neighbours in the overlay to find content corresponding to particular key
    • If neighbour has answer, responds
    • Otherwise, neighbour forwards to its neighbours
    • There is a TTL on the queries
      • Decremented each time query is forwarded
    • Generates a lot of messages and many hops
Term
What does GNutella flooding consist of?
Definition
  • Sends ripple of queries through the overlay
    • Maximum TTL typically 7
    • Nodes must node which queries they have seen to avoid loops
  • Initial neighbours bootstrapped from a well known list
    • Node builds a list of clients (to max size) from quering other clients it knows
    • Concept fo 'ultra peers' was added to broaden reach
Term
How are GNutella Query responses handled?
Definition
  • Traverse back to originator
    • Back propagation process
    • Response goes in reverse path
    • Can provide some level of anonimity
Supporting users have an ad free experience!