Shared Flashcard Set

Details

COMP3017 - Lecture 3 (B) - Access Structures
COMP3017 - Lecture 3 (B) - Access Structures
25
Computer Science
Undergraduate 2
05/01/2014

Additional Computer Science Flashcards

 


 

Cards

Term
Give an example of a B+tree
Definition
[image]
Term
What is a B+tree?
Definition
  • An ordered binary tree where the leafs can have more than one node/value.
  • Root node typically kept in memory, as it is the entrance point to the index
Term
What are some characteristics of leaf nodes?
Definition
  • If index is a primary index
    • Leaf nodes are records contiaining data, stored in the order of the primary key
    • The index provides an alternative to a sequential scan
  • If the index is a secondary index
    • Leaf nodes contain pointers to the data records
    • Data can be accessed in teh sequence of the secondary key
    • A secondary index can point to any sort of data file, for example one created by hashing
Term
What are characteristics of nodes?
Definition
  • Each node is of fixed size and contains n keys with n+1 pointers
  • Minimum nodes: Don't want nodes to be too empty (efficient use of space)
    • Non-Leaf: [(n+1)/2] pointers
    • Leaf: [(n+1)/2] pointers
Term
What are the rules for B+tree rules?
Definition
  1. All leaves same distance from root (balanced tree)
  2. Pointers in leaves point to records except for "sequence pointer"
  3. Number of pointers/keys for B+tree of order n:[image]
Term
Give an example of B+tree arithmetic
Definition
  • Parameters:
    • Block size 8kb, of which b=8000 bytes available for storage of records
    • key length k=10 bytes
    • record lengths r=100 bytes (including key)
    • key pointer p=6 bytes
  • A leaf node in a primary index can accomodate lp records, where lp=floor(b/r) = 80 records
  • A leaf node in a secondary index can accomodate ls records where ls=floor((b-p)/(k+p)) = 499 records
  • A non-leaf node could accomodate i entries, where i=floor((b-p)/(k+p)=499 records

To allow for expansion, assume initial node occupancy of u, where u=0.6

 

Term
Give an example of b+tree primary index arithmetic
Definition
  • Parameters:
    • Block size 8kb, of which b=8000 bytes available for storage of records
    • key length k=10 bytes
    • record lengths r=100 bytes (including key)
    • key pointer p=6 bytes
  • For primary index(the leaf nodes hold the records)
    • A non-leaf node initially points to i*u = 299 blocks
    • Each leaf initially contains lp*u = 48 records
    • 1 level of non-leaf nodes initially points to (1p*u)*(i*u) = 14,352 records
    • 2 levels of non-leaf nodes initially point to (i*u)^2 = 89,401 blocks, and (lp*u)*(i*u)^2 = 4,291,248 records
Term
Give an example of arithmetic for B+tree secondary index
Definition
  • Parameters:
    • Block size 8kb, of which b=8000 bytes available for storage of records
    • key length k=10 bytes
    • record lengths r=100 bytes (including key)
    • key pointer p=6 bytes
  • For primary index(the leaf nodes hold the records)
    • A non-leaf node initially points to i*u = 299 blocks
    • Each leaf initially contains lp*u = 299 records
    • 1 level of non-leaf nodes initially points to (1p*u)*(i*u) = 14,352 records
    • 2 levels of non-leaf nodes initially point to (i*u)^2 = 89,401 blocks, and (lp*u)*(i*u)^2 = 26,730,899 records
Term
What should be considered in B+tree insertion?
Definition
  1. Space available in leaf
  2. Leaf overflow
  3. Non-leaf overflow
  4. New root
Term
What should be considered for B+Tree deletion?
Definition
  • Simple case
  • Re-distribute keys
  • Coalesce with sibling (Often not implemented due to complexity)
Term
What is the comparison of B-trees vs. Static indexed sequential files?
Definition
  • Btrees consume more space
    • Blocks are not contiguous
    • Fewer disk accesses for static indexes, even allowing for reorganization
  • Concurrency control is harder in B-trees
  • However, DBA does not know when to reorganize or how full to load pages of new index
Term
What are the main concepts of hashing?
Definition
  • Main memory hash table
    • Hash function h() takes a key and computes an integer value
    • Value is used to select a bucket from a bucket array
    • Bucket array contains linked lists of records
  • Secondary storage hash table
    • Stores many more records than a main memory hash table
    • Bucket array consists of disk blocks
Term
What are some hashing approaches?
Definition
  1. Hash function calculates block pointer directly or as an offset from first block
    • Requires blocks to be in fixed, consecutive locations
  2. Hash function calculates offset in aray of block pointers (directory)
    • Used for "secondary" search keys

 

Term
Give an example of a hash function
Definition
  • Key='x1 x2 ... xn' (n byte character string), b buckets
  • h: add x1+x2+ ... xn and compute sum modulo b
  • Not a particularly good function
  • Good hash function has the same expected number of keys per bucket for each bucket
Term
Should keys be kept sorted? (In hashing)
Definition
Buckets should be kept sorted if CPU time is critical and inserts/deletes are relative infrequent
Term
What is the utilisation rule of thumb in hashing?
Definition
  • Space utilisation should be between 50% and 80%
    • Utilisation = #keys used / total #keys that fit
  • If < 50% wasting space
  • If > 80% overflows significant
    • Depends on how good hash function is and on #keys/bucket

 

Term
How is it possible to cope with growth in hashing?
Definition
  • Overflows and reorganizations
  • Dynamic hashing 
    • Extensible
    • Linear
Term
What is extensible hashing?
Definition

Combines two ideas:

  • Use i of b bits output by hash function where i grows over time
  • Use directory

[image]

Term
What are the pros and cons of extensible hashing?
Definition
  • Pros:
    • Can handle growing files
      • Wish less wasted space
      • With no full reorganizations
  • Cons:
    • Indirection
      • Not bad if directory in memory
    • Directory doubles in size
      • Now it fits in memory, now it doesn't 
      • Sudden increase in disk accesses
Term
What are the main concepts of linear hashing?
Definition
  • Another dynamic hashing scheme
  • Keeps track of utilisation. U=#used slots / total #slots
    • if U>threshold, then increase m (and maybe i)
  • Combines two ideas:
    1. Use i least significant bits of hash, where i grows over time
    2. Hash file grows incrementally and linearly (unlike extensible hash file, which periodically increases)
Term
What are the pros and cons of linear hashing?
Definition
  • Pros:
    • Can handle growing files
      • With less wasted space
      • With no full reorganizations
    • No indication like extensible hashing
  • Cons:
    • Can still have overflow chains
Term
What are some key points of index vs hashing?
Definition
  • Hashing good for probes given a key:
    • SELECT {...} FROM R WHERE R.A=5
  • Indexing (Including B-trees) good for range searches):
    • SELECT FROM R WHERE R.A > 5
Supporting users have an ad free experience!