Shared Flashcard Set

Details

COMP3017 - Lecture 4 (A) - Multidimensional Access Structure
COMP3017 - Lecture 4 (A) - Multidimensional Access Structure
24
Computer Science
Undergraduate 2
05/01/2014

Additional Computer Science Flashcards

 


 

Cards

Term

If we have a personal database, with EMPLOYEE table and attributes Dept, Salary.

 

How can we find employees who work in the sales department and have salaries greater than 40,000?

Definition
  1. Approach 1
    1. Get all matching records using an index on one attribute
    2. Check values of other attribute on those records
  2. Approach 2
    1. Use secondary indexes on each attribute to get two sets of record pointers
    2. Take intersection of sets
  3. Approach 3
    1. Use secondary inde on one attribute to select suitable index on other attribute
    2. Get all matching recrods using selected index
Term
What are the main concepts of a Grid File?
Definition
  • Partition multi-dimensional space with a grid
  • Grid lines partition space into stripes
  • Intersections of stripes from different dimensions define regions
  • Each region associated with a pointer to a bucket of record pointers
  • Attribute for values for record determine region and therefore bucket
  • Fixed number of regions - overflow blocks used to increase bucket size as necessary
  • Can index grid on values ranges
Term
What are the Pros and Cons of Grid Files?
Definition
  • Pros
    • Good for multiple-key search
    • Supports partial-match, range and nearest-neighbour queries
  • Cons
    • Space, management overhead (nothing is free)
    • Need partitioning ranges that evenly split keys
Term
What are the main concepts of partitioned hash?
Definition
  • Hash functions takes a list of attribute values as arguments
  • Bits of hash value are divided between attributes
    • Effectively a hash function per attribute
Term
What are the pros and cons of partitioned hash?
Definition
  • Pros
    • Good hash function will evenly distribute records between buckets
    • Supports partial-match queries
  • Cons
    • No good for nearest-neighbour or range queries
Term
What are the main characteristics of a kd-tree?
Definition
  • Multidimentional binary search tree
  • Each node splits the k-dimensional space along a hyperplane
  • Nodes contain
    • An attribute-value pair
    • A pair of pointers
  • All nodes at the same level discriminate for the same attribute
  • Levels rotate between attributes of all dimensions
  • Partial-match queries
    • If we know value of attribute, we can choose which branch to explore
    • If we don't know value of attribute, must explore both branches
Term
Give an example of partitioned hash
Definition
[image]
Term
Give an example of a kd-tree
Definition
[image]
Term
How is it possible to adapt kd-trees to secondary storage?
Definition
  • Average path length form root to leaf: log2(n)
  • Disk accesses should be kept as few as possible
  • Two approaches
    • Multiway nodes (split values into n ranges)
    • Group nodes in blocks (node plus descendants to a given ply)
Term
What are Region Quad trees?
Definition
  • Each partition divides the space into four equal sub-regions
    • -ne, nw, se, sw
  • Split regions if they contain records that will fit into a block
  • Operations similar to those kd-trees
Term
Give an example of a region quad-tree
Definition
[image]
Term
What are the main characteristics of Point Quad-Tree?
Definition
  • Partitions are not equal area
    • Split lines centred on data points
    • ne/nw/se/sw sub-regions
  • Otherwise, equivalent region quad-tree
Term
Give an example of a Point Quad-tree
Definition
[image]
Term
What are the main characteristics of R-Trees?
Definition
  • Used to represent data that consists of k-dimensional data regions
  • International nodes of tree represent regions that contain data regions
  • Regions typically defined as top-right, bottom-left coordinates
Term
Give an example of an R-tree
Definition
[image]
Term
What are the main characteristics of an UB-tree?
Definition

Basic approach

  • Map n-dimensional sparce space onto a 1-dimensional line using a fractal space-filling curve
  • Partion ranges and index using a B+tree
  • When querying, identify regions of n-d space (= segments 1-d line) that intersects with query triangle)
Term
What are the characteristics of the Z-index in a UB-tree?
Definition
  • Map domain of each attribute onto n-bit integer
  • Order of points on Z-curve given by bit interleaving the portions of the axes
    • x = x1x2
    • y = y1y2
    • z-index = y1x2y2x2
Term
What does the z-region partition consist of?
Definition
  • Z-curve partitioned into contiguous ranges (z-regions)
    • Note that these may not be contiguous regions in the multidimensional space
  • Z-regions mapped to leaf nodes of a B+tree
    • A leaf node contains pointers to records whose attribute value locate them within the associated z-region
Term
How is it possible to query UB-trees?
Definition
  • Multidimensional range query can be considered as a k-dimensional rectangle
  • Algorithm identifies z-regions that intersect wth the query rectangle
Term
What are the key characteristics fo Bitmap Indexes?
Definition
  • Collection of bit-vectors used to index an attribute
    • One bit-vector for each unique attribute value
    • Once bit for each record
  • Querying index involves combining bit-vectors with bitwise operators (&,|)
    • A 1 in the ith position indicates that record i is a match

 

Term

Give the Bitmap Index Table for the following situation:

 

An online homeware vendor sells products p1... p10

  • Products p3 and p5 cost £100 pounds
  • products p1 costs £200
  • products p2,p7 and p10 costs £300
  • Products p4, p6, p8 and p9 cost £400
  • Products p1, p4, p5 and p9 are designed for lounges
  • Products p5 and p7 are designed for dinning rooms
  • Products p3, p5, p6 and p10 are designed for kitchens
Definition
[image]
Term
What are the compression characteristics of Bitmap Indexes?
Definition
  • Bit vactors are typcially sparse, with few 1 bits
    • Large amounts of wasted space
    • Run-length encoding of bit-vectors to reduce stored size
  • Bitwise operators must be applied to original bit-vectors
    • Can decode RLE bit-vectors one run at a time

 

Term
What are the pros and cons of Bitmap indexes?
Definition
  • Pros
    • Efficient answering of partial-match queries
  • Cons
    • Requires fixed record numbers
    • Changes to data file require changes to bitmap index
Supporting users have an ad free experience!