# Shared Flashcard Set

## Details

COMP3017 - Lecture 3 (B) - Access Structures
COMP3017 - Lecture 3 (B) - Access Structures
25
Computer Science
05/01/2014

Term
 Give an example of a B+tree
Definition
 [image]
Term
 What is a B+tree?
Definition
 An ordered binary tree where the leafs can have more than one node/value. Root node typically kept in memory, as it is the entrance point to the index
Term
 What are some characteristics of leaf nodes?
Definition
 If index is a primary index Leaf nodes are records contiaining data, stored in the order of the primary key The index provides an alternative to a sequential scan If the index is a secondary index Leaf nodes contain pointers to the data records Data can be accessed in teh sequence of the secondary key A secondary index can point to any sort of data file, for example one created by hashing
Term
 What are characteristics of nodes?
Definition
 Each node is of fixed size and contains n keys with n+1 pointers Minimum nodes: Don't want nodes to be too empty (efficient use of space) Non-Leaf: [(n+1)/2] pointers Leaf: [(n+1)/2] pointers
Term
 What are the rules for B+tree rules?
Definition
 All leaves same distance from root (balanced tree) Pointers in leaves point to records except for "sequence pointer" Number of pointers/keys for B+tree of order n:[image]
Term
 Give an example of B+tree arithmetic
Definition
 Parameters: Block size 8kb, of which b=8000 bytes available for storage of records key length k=10 bytes record lengths r=100 bytes (including key) key pointer p=6 bytes A leaf node in a primary index can accomodate lp records, where lp=floor(b/r) = 80 records A leaf node in a secondary index can accomodate ls records where ls=floor((b-p)/(k+p)) = 499 records A non-leaf node could accomodate i entries, where i=floor((b-p)/(k+p)=499 records To allow for expansion, assume initial node occupancy of u, where u=0.6
Term
 Give an example of b+tree primary index arithmetic
Definition
 Parameters: Block size 8kb, of which b=8000 bytes available for storage of records key length k=10 bytes record lengths r=100 bytes (including key) key pointer p=6 bytes For primary index(the leaf nodes hold the records) A non-leaf node initially points to i*u = 299 blocks Each leaf initially contains lp*u = 48 records 1 level of non-leaf nodes initially points to (1p*u)*(i*u) = 14,352 records 2 levels of non-leaf nodes initially point to (i*u)^2 = 89,401 blocks, and (lp*u)*(i*u)^2 = 4,291,248 records
Term
 Give an example of arithmetic for B+tree secondary index
Definition
 Parameters: Block size 8kb, of which b=8000 bytes available for storage of records key length k=10 bytes record lengths r=100 bytes (including key) key pointer p=6 bytes For primary index(the leaf nodes hold the records) A non-leaf node initially points to i*u = 299 blocks Each leaf initially contains lp*u = 299 records 1 level of non-leaf nodes initially points to (1p*u)*(i*u) = 14,352 records 2 levels of non-leaf nodes initially point to (i*u)^2 = 89,401 blocks, and (lp*u)*(i*u)^2 = 26,730,899 records
Term
 What should be considered in B+tree insertion?
Definition
 Space available in leaf Leaf overflow Non-leaf overflow New root
Term
 What should be considered for B+Tree deletion?
Definition
 Simple case Re-distribute keys Coalesce with sibling (Often not implemented due to complexity)
Term
 What is the comparison of B-trees vs. Static indexed sequential files?
Definition
 Btrees consume more space Blocks are not contiguous Fewer disk accesses for static indexes, even allowing for reorganization Concurrency control is harder in B-trees However, DBA does not know when to reorganize or how full to load pages of new index
Term
 What are the main concepts of hashing?
Definition
 Main memory hash table Hash function h() takes a key and computes an integer value Value is used to select a bucket from a bucket array Bucket array contains linked lists of records Secondary storage hash table Stores many more records than a main memory hash table Bucket array consists of disk blocks
Term
 What are some hashing approaches?
Definition
 Hash function calculates block pointer directly or as an offset from first block Requires blocks to be in fixed, consecutive locations Hash function calculates offset in aray of block pointers (directory) Used for "secondary" search keys
Term
 Give an example of a hash function
Definition
 Key='x1 x2 ... xn' (n byte character string), b buckets h: add x1+x2+ ... xn and compute sum modulo b Not a particularly good function Good hash function has the same expected number of keys per bucket for each bucket
Term
 Should keys be kept sorted? (In hashing)
Definition
 Buckets should be kept sorted if CPU time is critical and inserts/deletes are relative infrequent
Term
 What is the utilisation rule of thumb in hashing?
Definition
 Space utilisation should be between 50% and 80% Utilisation = #keys used / total #keys that fit If < 50% wasting space If > 80% overflows significant Depends on how good hash function is and on #keys/bucket
Term
 How is it possible to cope with growth in hashing?
Definition
 Overflows and reorganizations Dynamic hashing  Extensible Linear
Term
 What is extensible hashing?
Definition
 Combines two ideas: Use i of b bits output by hash function where i grows over time Use directory [image]
Term
 What are the pros and cons of extensible hashing?
Definition
 Pros: Can handle growing files Wish less wasted space With no full reorganizations Cons: Indirection Not bad if directory in memory Directory doubles in size Now it fits in memory, now it doesn't  Sudden increase in disk accesses
Term
 What are the main concepts of linear hashing?
Definition
 Another dynamic hashing scheme Keeps track of utilisation. U=#used slots / total #slots if U>threshold, then increase m (and maybe i) Combines two ideas: Use i least significant bits of hash, where i grows over time Hash file grows incrementally and linearly (unlike extensible hash file, which periodically increases)
Term
 What are the pros and cons of linear hashing?
Definition
 Pros: Can handle growing files With less wasted space With no full reorganizations No indication like extensible hashing Cons: Can still have overflow chains
Term
 What are some key points of index vs hashing?
Definition
 Hashing good for probes given a key: SELECT {...} FROM R WHERE R.A=5 Indexing (Including B-trees) good for range searches): SELECT FROM R WHERE R.A > 5
Supporting users have an ad free experience!