Term
Give an example of a B+tree 

Definition


Term

Definition
 An ordered binary tree where the leafs can have more than one node/value.
 Root node typically kept in memory, as it is the entrance point to the index



Term
What are some characteristics of leaf nodes? 

Definition
 If index is a primary index
 Leaf nodes are records contiaining data, stored in the order of the primary key
 The index provides an alternative to a sequential scan
 If the index is a secondary index
 Leaf nodes contain pointers to the data records
 Data can be accessed in teh sequence of the secondary key
 A secondary index can point to any sort of data file, for example one created by hashing



Term
What are characteristics of nodes? 

Definition
 Each node is of fixed size and contains n keys with n+1 pointers
 Minimum nodes: Don't want nodes to be too empty (efficient use of space)
 NonLeaf: [(n+1)/2] pointers
 Leaf: [(n+1)/2] pointers



Term
What are the rules for B+tree rules? 

Definition
 All leaves same distance from root (balanced tree)
 Pointers in leaves point to records except for "sequence pointer"
 Number of pointers/keys for B+tree of order n:[image]



Term
Give an example of B+tree arithmetic 

Definition
 Parameters:
 Block size 8kb, of which b=8000 bytes available for storage of records
 key length k=10 bytes
 record lengths r=100 bytes (including key)
 key pointer p=6 bytes
 A leaf node in a primary index can accomodate lp records, where lp=floor(b/r) = 80 records
 A leaf node in a secondary index can accomodate ls records where ls=floor((bp)/(k+p)) = 499 records
 A nonleaf node could accomodate i entries, where i=floor((bp)/(k+p)=499 records
To allow for expansion, assume initial node occupancy of u, where u=0.6



Term
Give an example of b+tree primary index arithmetic 

Definition
 Parameters:
 Block size 8kb, of which b=8000 bytes available for storage of records
 key length k=10 bytes
 record lengths r=100 bytes (including key)
 key pointer p=6 bytes
 For primary index(the leaf nodes hold the records)
 A nonleaf node initially points to i*u = 299 blocks
 Each leaf initially contains lp*u = 48 records
 1 level of nonleaf nodes initially points to (1p*u)*(i*u) = 14,352 records
 2 levels of nonleaf nodes initially point to (i*u)^2 = 89,401 blocks, and (lp*u)*(i*u)^2 = 4,291,248 records



Term
Give an example of arithmetic for B+tree secondary index 

Definition
 Parameters:
 Block size 8kb, of which b=8000 bytes available for storage of records
 key length k=10 bytes
 record lengths r=100 bytes (including key)
 key pointer p=6 bytes
 For primary index(the leaf nodes hold the records)
 A nonleaf node initially points to i*u = 299 blocks
 Each leaf initially contains lp*u = 299 records
 1 level of nonleaf nodes initially points to (1p*u)*(i*u) = 14,352 records
 2 levels of nonleaf nodes initially point to (i*u)^2 = 89,401 blocks, and (lp*u)*(i*u)^2 = 26,730,899 records



Term
What should be considered in B+tree insertion? 

Definition
 Space available in leaf
 Leaf overflow
 Nonleaf overflow
 New root



Term
What should be considered for B+Tree deletion? 

Definition
 Simple case
 Redistribute keys
 Coalesce with sibling (Often not implemented due to complexity)



Term
What is the comparison of Btrees vs. Static indexed sequential files? 

Definition
 Btrees consume more space
 Blocks are not contiguous
 Fewer disk accesses for static indexes, even allowing for reorganization
 Concurrency control is harder in Btrees
 However, DBA does not know when to reorganize or how full to load pages of new index



Term
What are the main concepts of hashing? 

Definition
 Main memory hash table
 Hash function h() takes a key and computes an integer value
 Value is used to select a bucket from a bucket array
 Bucket array contains linked lists of records
 Secondary storage hash table
 Stores many more records than a main memory hash table
 Bucket array consists of disk blocks



Term
What are some hashing approaches? 

Definition
 Hash function calculates block pointer directly or as an offset from first block
 Requires blocks to be in fixed, consecutive locations
 Hash function calculates offset in aray of block pointers (directory)
 Used for "secondary" search keys



Term
Give an example of a hash function 

Definition
 Key='x1 x2 ... xn' (n byte character string), b buckets
 h: add x1+x2+ ... xn and compute sum modulo b
 Not a particularly good function
 Good hash function has the same expected number of keys per bucket for each bucket



Term
Should keys be kept sorted? (In hashing) 

Definition
Buckets should be kept sorted if CPU time is critical and inserts/deletes are relative infrequent 


Term
What is the utilisation rule of thumb in hashing? 

Definition
 Space utilisation should be between 50% and 80%
 Utilisation = #keys used / total #keys that fit
 If < 50% wasting space
 If > 80% overflows significant
 Depends on how good hash function is and on #keys/bucket



Term
How is it possible to cope with growth in hashing? 

Definition
 Overflows and reorganizations
 Dynamic hashing



Term
What is extensible hashing? 

Definition
Combines two ideas:
 Use i of b bits output by hash function where i grows over time
 Use directory
[image] 


Term
What are the pros and cons of extensible hashing? 

Definition
 Pros:
 Can handle growing files
 Wish less wasted space
 With no full reorganizations
 Cons:
 Indirection
 Not bad if directory in memory
 Directory doubles in size
 Now it fits in memory, now it doesn't
 Sudden increase in disk accesses



Term
What are the main concepts of linear hashing? 

Definition
 Another dynamic hashing scheme
 Keeps track of utilisation. U=#used slots / total #slots
 if U>threshold, then increase m (and maybe i)
 Combines two ideas:
 Use i least significant bits of hash, where i grows over time
 Hash file grows incrementally and linearly (unlike extensible hash file, which periodically increases)



Term
What are the pros and cons of linear hashing? 

Definition
 Pros:
 Can handle growing files
 With less wasted space
 With no full reorganizations
 No indication like extensible hashing
 Cons:
 Can still have overflow chains



Term
What are some key points of index vs hashing? 

Definition
 Hashing good for probes given a key:
 SELECT {...} FROM R WHERE R.A=5
 Indexing (Including Btrees) good for range searches):
 SELECT FROM R WHERE R.A > 5


