Shared Flashcard Set

Details

COMP3017 - Lecture 2 (C) - Data
COMP3017 - Lecture 2 (C) - Data
20
Computer Science
Undergraduate 2
05/01/2014

Additional Computer Science Flashcards

 


 

Cards

Term
What are some considerations when placing records in blocks?
Definition
  • Separating Records
  • Spanned vs. Unspanned
  • Sequencing
  • Indirection
Term
What are some approaches to separating records in a block?
Definition

Three approaches:

  1. Use fixed length records - no need to separate
  2. Use a special marker to indicate record end
  3. Give record lengths (or offsets)
    1. Within each record
    2. In block header
Term
What are some points of Spanned vs. Unspanned blocks?
Definition
  • Unspanned: Each record must fit within a single block
    • Simpler but may waste space
    • Problem if record size > block size
  • Spanned: Records may be split between blocks
    • If record doesn't fit, it needs a pointer to where the rest of the data is
    • The rest of the data needs a pointer to where the beginning of the data is
Term
What are some key points of squencing?
Definition
  • Ordering records in file (and block) by some key value
  • This makes it possible to efficiently read records in order (e.g. to do a merge-join)
  • Options for sequencing are:
    • Next record physically contiguous
    • Linked records (not physically contiguous)
    • Overflow area (Records in sequence)
Term
What are some characteristics of indeirection?
Definition
  • The ways to refer to records can be:
    • Physical addressing
    • Indirect addressing
    • Other options in between
  • Tradeoff between:
    • Flexibility (easier to move records on insertion/deletion)
    • Cost (of maintaining indirection)
Term
Give the specification for Physical Addressing
Definition
[image]
Term
Give an example of indirect addressing
Definition

Record ID is arbitrary bit string

[image]

Term
How is indirection implemented in block?
Definition
  • Records can be shifted within block without changing RID
  • Acces to a given RID is fast - only a single block access needed
[image]
Term
What are some characteristics of Block headers?
Definition
  • Data at beginning of block that describes block, and may contain
    • File ID (or RELATION or DB ID)
    • This block ID
    • Pointer to free space
    • Type of block (e.g. contains recs type 4; is overflow,...)
    • Pointer to other blocks "like it"
    • Timestamp
Term
What are some key points of Insertion?
Definition
  • When records are not in sequence (Easy case)
    • Insert new record at end of file or in deleted slot
    • If records are variable size it is not as easy
  • Records in squence (Hard)
    • If there's free space "close by", not too  bad...
    • Or use overflow idea...
  • Considerations:
    • How much free space to leave in each block,track, cylinder?
    • How often do I recognize file+overflow?
Term
What are some main points for deletion?
Definition
  • How to deal with dangling pointers
  • Tombstones (Leave "MARK" in map or old location, or logical IDs)
  • Two main options:
    • Immediately reclaim space
    • Mark space as deleted
      • May need chain of deleted records (for re-use)
      • Need a way to mark deleted records:
        • special characters
        • delete field
        • in map

 

Term
What are the details of Single Buffering?
Definition
  • File consisting of blocks B1, B2,...
  • Program that processes the file block-by-block
  1. Read B1 -> Buffer
  2. Process Data in buffer
  3. Read B2 -> Buffer
  4. Process Data in Buffer
  5. etc... 

Single Buffer Cost: n(P+R)

P = Time to process a block

R = Time to read a block

n = Number of blocks

 

(Recall Double Buffer Time = R + nP)

Term
What are the key concepts of double buffering?
Definition

Use a pair of buffers

  1. While reading a block and writing into buffer A
  2. Process block previously read into buffer B
  3. After block read into A, process A and read next block into B

Double Buffer Time = R + nP

 

P = Time to process a block

R = Time to read a block

n = Number of blocks

 

(Recall Single Buffer time = n(R+P))

Term
What are some key concepts of address management?
Definition
  • Every block and record has two addresses
    • A database address (when in secondary storage)
    • A memory address (when copied into main memory)
  • When in main memory, using memory address (=pointers) is more efficient
  • Otherwise, translation table is required
    • Converts database address into current memory address
Term
What are some main concepts of pointer swizzling?
Definition
  • General term for techniques used to translate database address to virtual memory address space
  • Swizzled pointers consist of 
    • One bit to indicate whether the pointer is a database address or a memory address
    • Database or memory pointer, as appropriate
Term
Give a visual example of swizling pointers from database to virtual memory address space
Definition

[image]

[image]

[image]

Term
What are some Swizzling Strategies?
Definition
  • Automatic
    • As soon as block brought into memory, locate all pointers and addresses and enter them into translation table
    • Replace pointers in blocks with new entries
  • On demand
    • Leave all pointers unswizzled whenblock in brought into memory
    • Swizzle pointers only when dereferenced
  • No swizzling
    • Use translation table to map pointers on each dereference
Term
What are some key concepts of swizzling?
Definition
  • Reverse of the swizzling operation
    • When a block is written back to disk, rewrite swizzled pointers using the translate table
    • Need to beware of pinned blocks (that cannot yet be safely written to disk)

 

Term
Give an example of row column store
Definition

Row store:

----------------------------------------------------

|      id1     ||     cust1    ||  prod1 ||    date1   |

----------------------------------------------------

|      id2     ||     cust2    ||  prod2 ||    date2   |

----------------------------------------------------


Column store:


 ---------       --------------

|   id1   |  ||     cust1    ||  

-----------      --------------

|   id2   |  ||     cust2    |

 ---------       --------------


 ---------       --------------

|   id1   |  ||     prod1    ||  

-----------      --------------

|   id2   |  ||     prod2    |

 

 ---------       --------------


 ---------       --------------

|   id1   |  ||     date1    ||  

-----------      --------------

|   id2   |  ||     date2    |

 

 ---------       --------------

Term
What are the advantages of row and column store
Definition
  • Advantages of Column Store:
    • More efficient compact storage (Fields need not start at byte boundaries)
    • Efficient reads on data mining operations
  • Advantages of Row Store:
    • Writes (Multiple fields of one record) more efficient
    • Efficient reads for record access (OLTP)

 

Supporting users have an ad free experience!