Shared Flashcard Set

Details

COMP3017 - Lecture 7 (B) - Distributed Databases
COMP3017 - Lecture 7 (B) - Distributed Databases
25
Computer Science
05/03/2014

Term
 What are the characteristics of Localisation in Query Processing
Definition
 Fragmentation expressed as relational algebra expressions Global relations can be reconstructed using these expression (A localisation program) Naively, generate distributed query plan by substituting localisation programs for relations (Use reduction techniques to optimise queries)
Term
 What are the characteristics of Reduciton for Horizontal Fragmentation in Query Processing
Definition
 Given a relation R fragmented as FR={R1,R2,...,Rn}  Localization program is R=R1U R2 U ... U Rn Reduce by identifying fragments of localized query that give empty relations Two classes to consider: Reduction with selection Reduciton with join
Term
 What are the characteristics of Horizontal Selection Reduction in Query Processing
Definition
 Given horizontal fragmentation of R such that Rj=opj(R): σp(Rj) = ∅ if ∀x∈R, ¬(p(x) ∧ pj(x)) [image]
Term
 What are the characteristics of Horizontal Join Reduction in Query Processing
Definition
 Recalling that join distributes over unions: (R1 ∪ R2) ⨝ S ≣ (R1 ⨝ S) ∪ (R2 ⨝ S) Given fragments Ri and Rj defined with predicates pi and pj: Ri ⨝ Rj = ∅ if ∀x∈Ri, ∀y∈Rj ¬(pi(x) ∧ pj(y)) [image]
Term
 What are the characteristics of Reduction for Vertical Fragmentation in Query Processing
Definition
 Given a relation R fragmented as FR = {R1, R2, ..., Rn} Localization program is R = R1 ⨝ R2 ⨝ ... ⨝ Rn   Reduce by identifying useless intermediate relations. One case to consider is reduction with projection.
Term
 What are the characteristics of Vertical Projection Reduction in Query Processing
Definition
 Given a relation R with attributes A = {a1, a2, ..., an} vertically fragmented as Ri = πAi(R) where Ai ⊆ A   πD,K(Ri) is useless if D ⊈ Ai [image]
Term
 Where would we compute the join between relations R and S when these are stored in different sites?
Definition
 It is possible to move one relation to the other site and perform the join there CPU cost of performing join same regardless of site Communication costs depend on size of hte relation being moved CostCOM = size(R) = cardinality(R) * length(R) Move the relation with the smallest size to the other's site
Term
 What are the characteristics of Semijoins in Query Processing
Definition
 R ▷p S  ≣ πR(R ⨝p S)   ≣ πR(R ⨝p πp(S)) Where πp(S) projects out from S only the attributes used in predicate p   This reduces communication cost by only moving that part of a relation that will be used in the join   Recall: R ▷p S ≣ πR(R ⨝p S) where p is a predicate defined over R and S and πR projects out only those attributes from R Then:   size(R ▷p S) < size(R ⨝p S)   This is because: R ⨝p S ≣ (R ▷p S) ⨝p S ≣R ⨝p (R ◁p S) ≣ (R ▷p S) ⨝p (R ◁p S)
Term
 What are the steps to perform Semijoin Reduction with R in site 1 and S in site 2?
Definition
 Site 2 sends πp(S) to site 1 Site 1 calculates R ▷p S ≣ πR(R ⨝p πp(S)) Site 1 sends R ▷p S to site 2 Site 2 calculates R ⨝p S ≣ (R ▷p S) ⨝p S So Costcom= size(πp(S)) + size(R ▷p S)   This approach is better if: size(πp(S)) + size(R ▷p S) < size(R)
Term
 What are the principles of distributed transactions?
Definition
 Transaction processing may be spread across several sites in the distributed database The site from which the transaction originated is known as the coordinator The sites on which the transaction is executed are known as the participants
Term
 What are the principles on Distribution and ACID on DDBMSs?
Definition
 Non-distributed databases aim to maintain isolation Isolation: a transaction should not make updates externally visible until committed Distributed databases use two-phase locking (2PL) to preserve isolation 2PL ensures serialisability, the highest isolation level
Term
 What does two phase locking consist of?
Definition
 Two phases (Duh) Growing phase: obtain locks, access data items Shrinking phase: release locks Guarantees serializable transactions
Term
 Show an image of the growing and shrinking phase of 2pl
Definition
 [image]
Term
 How is locking controlled in a non-distributed and distributed database
Definition
 Former: Locking is controlled by a lock manager Latter: Two main approaches to implementing 2pl: Centralized 2PL (C2PL) Responsibility for lock management lies with a single site Distributed 2PL (D2PL) Each site has its own lock manager
Term
 What are the main characteristics of Centralised Two-Phase Locking (C2PL)?
Definition
 Coordinating sire runs transaction manager TM Participant sites run data processors DP Lock manager LM runs on central site TM requests locks from LM If granted, TM submits operations to processors DP When DP finish, TM sends message to LM to release locks
Term
 Show an image of how two-phase locking works
Definition
 [image]
Term
 What are the disadvantages of Centralized two-phase locking?
Definition
 LM is a single point of failure (less reliable) LM is a bottleneck (Affects transaction throughput)
Term
 How does Distributed Two-Phase Locking work?
Definition
 Coordinating site C runs TM Each participant runs both an LM and a DP TM sends operations and lock requests to each LM If lock can be granted, LM forwards operation to local DP DP sends "end of operation" to TM TM sends message to LM to release locks Variant: DPs may send "end of operation" to their own LM LM releases lock and informs TM
Term
 Show an image of how Distributed Two-phase locking works
Definition
 [image]
Term
 What are the characteristics and conditions for deadlock?
Definition
 Exists when two or more transactions are waiting for each other to release a lock on an item Three conditions must be satisfied for a deadlock to occur: Concurrency: Two transactions claim exlusive control of one resource Hold: one transaction continues to hold exclusively controlled resources until its need is satisfied Wait: transactions wait in queues for additional resources while holding resources already allocated
Term
 What is a wait-for graph?
Definition
 Representation for interactions between transactions Directed graph containing: A vertex for each transaction that is currently executing An edge from T1 to T2 if T1 is waiting to lock an item that is currenly locked by T2 Deadlock exists ifff the WFG contains a cycle. Ex: T1 -> T2 -> T3 -> T1
Term
 What are the types of Wait For graph?
Definition
 Local: One per site, only considers transactions on that site Global: union of all LWFGs Deadlock may occur on a single site (within LWFG) or between sites (within GWFG)
Term
 How is it possible to manage distributed deadlocks?
Definition
 Prevention (Pre-declaration, etc) Avoidance (Resource Ordering, Transaction Prioritization) Detection and Resolution
Term
 What are the characteristics of prevention for DDeadlocks?
Definition
 Guarantees that deadlocks cannot occur in the first place   Transaction pre-declares all data items that it will access TM checks that locking data items will not cause deadlock Proceed to lock only if all data items are available (unlocked) It is difficult to know in advance which data items will be accessed by a transaction
Term
 What are the characteristics of avoidance for DDeadlocks?
Definition
 Two main sub-approaches Resource ordering (Concurrency controlled such that deadlocks won't happen) All resources (data items) are ordered Transactions always access resources in order Transaction prioritisation (Potential deadlocks detected and avoided) Each transaction has timestamp that corresponds to the time it started: ts(T)  Transactions are prioritised using these timestamps When lock request is denied, use priorities to choose a transaction to abort (WAIT-DIE and WOUND-WAIT rules)
Supporting users have an ad free experience!