Shared Flashcard Set


COMP3017 - Lecture 7 (B) - Distributed Databases
COMP3017 - Lecture 7 (B) - Distributed Databases
Computer Science
Undergraduate 2

Additional Computer Science Flashcards




What are the characteristics of Localisation in Query Processing
  • Fragmentation expressed as relational algebra expressions
  • Global relations can be reconstructed using these expression (A localisation program)
  • Naively, generate distributed query plan by substituting localisation programs for relations (Use reduction techniques to optimise queries)
What are the characteristics of Reduciton for Horizontal Fragmentation in Query Processing
  • Given a relation R fragmented as FR={R1,R2,...,Rn
  • Localization program is R=R1U R2 U ... U Rn
  • Reduce by identifying fragments of localized query that give empty relations
  • Two classes to consider:
    • Reduction with selection
    • Reduciton with join
What are the characteristics of Horizontal Selection Reduction in Query Processing

Given horizontal fragmentation of R such that

Rj=opj(R): σp(Rj) = ∅ if ∀x∈R, ¬(p(x) ∧ pj(x))


What are the characteristics of Horizontal Join Reduction in Query Processing
  • Recalling that join distributes over unions:
    • (R1 ∪ R2) ⨝ S ≣ (R1 ⨝ S) ∪ (R2 ⨝ S)
  • Given fragments Ri and Rj defined with predicates pi and pj:
    • Ri ⨝ Rj = ∅ if ∀x∈Ri, ∀y∈Rj ¬(pi(x) ∧ pj(y))


What are the characteristics of Reduction for Vertical Fragmentation in Query Processing

Given a relation R fragmented as FR = {R1, R2, ..., Rn}

Localization program is R = R1 ⨝ R2 ⨝ ... ⨝ Rn


Reduce by identifying useless intermediate relations.

One case to consider is reduction with projection.

What are the characteristics of Vertical Projection Reduction in Query Processing

Given a relation R with attributes A = {a1, a2, ..., an} vertically fragmented as Ri = πAi(R) where Ai ⊆ A


πD,K(Ri) is useless if D ⊈ Ai


Where would we compute the join between relations R and S when these are stored in different sites?
  • It is possible to move one relation to the other site and perform the join there
    • CPU cost of performing join same regardless of site
    • Communication costs depend on size of hte relation being moved
  • CostCOM = size(R) = cardinality(R) * length(R)
  • Move the relation with the smallest size to the other's site
What are the characteristics of Semijoins in Query Processing

R ▷S  πR(R p S)
  πR(R p πp(S))

Where πp(S) projects out from S only the attributes used in predicate p


This reduces communication cost by only moving that part of a relation that will be used in the join


Recall: R ▷p S ≣ πR(R ⨝p S) where p is a predicate defined over R and S and πR projects out only those attributes from R Then:


size(R ▷p S) < size(R ⨝p S)


This is because:

R ⨝p S

≣ (R ▷p S) ⨝p S

≣R ⨝p (R ◁p S)

≣ (R ▷p S) ⨝p (R ◁p S)

What are the steps to perform Semijoin Reduction with R in site 1 and S in site 2?
  1. Site 2 sends πp(S) to site 1
  2. Site 1 calculates R ▷p S ≣ πR(R ⨝p πp(S))
  3. Site 1 sends R ▷p S to site 2
  4. Site 2 calculates R ⨝p S ≣ (R ▷p S) ⨝p S

So Costcom= size(πp(S)) + size(R ▷p S)


This approach is better if:

size(πp(S)) + size(R ▷p S) < size(R)

What are the principles of distributed transactions?

Transaction processing may be spread across several sites in the distributed database

  • The site from which the transaction originated is known as the coordinator
  • The sites on which the transaction is executed are known as the participants
What are the principles on Distribution and ACID on DDBMSs?
  • Non-distributed databases aim to maintain isolation
    • Isolation: a transaction should not make updates externally visible until committed
  • Distributed databases use two-phase locking (2PL) to preserve isolation
    • 2PL ensures serialisability, the highest isolation level
What does two phase locking consist of?
  • Two phases (Duh)
    • Growing phase: obtain locks, access data items
    • Shrinking phase: release locks
  • Guarantees serializable transactions
Show an image of the growing and shrinking phase of 2pl
How is locking controlled in a non-distributed and distributed database
  • Former: Locking is controlled by a lock manager
  • Latter: Two main approaches to implementing 2pl:
    • Centralized 2PL (C2PL)
      • Responsibility for lock management lies with a single site
    • Distributed 2PL (D2PL)
      • Each site has its own lock manager
What are the main characteristics of Centralised Two-Phase Locking (C2PL)?
  • Coordinating sire runs transaction manager TM
  • Participant sites run data processors DP
  • Lock manager LM runs on central site
    1. TM requests locks from LM
    2. If granted, TM submits operations to processors DP
    3. When DP finish, TM sends message to LM to release locks



Show an image of how two-phase locking works
What are the disadvantages of Centralized two-phase locking?

LM is a single point of failure (less reliable)

LM is a bottleneck (Affects transaction throughput)

How does Distributed Two-Phase Locking work?
  • Coordinating site C runs TM
  • Each participant runs both an LM and a DP
    1. TM sends operations and lock requests to each LM
    2. If lock can be granted, LM forwards operation to local DP
    3. DP sends "end of operation" to TM
    4. TM sends message to LM to release locks
  • Variant:
    • DPs may send "end of operation" to their own LM
    • LM releases lock and informs TM
Show an image of how Distributed Two-phase locking works
What are the characteristics and conditions for deadlock?
  • Exists when two or more transactions are waiting for each other to release a lock on an item
  • Three conditions must be satisfied for a deadlock to occur:
    • Concurrency: Two transactions claim exlusive control of one resource
    • Hold: one transaction continues to hold exclusively controlled resources until its need is satisfied
    • Wait: transactions wait in queues for additional resources while holding resources already allocated
What is a wait-for graph?
  • Representation for interactions between transactions
  • Directed graph containing:
    • A vertex for each transaction that is currently executing
    • An edge from T1 to T2 if T1 is waiting to lock an item that is currenly locked by T2
  • Deadlock exists ifff the WFG contains a cycle. Ex:

T1 -> T2 -> T3 -> T1

What are the types of Wait For graph?
  • Local: One per site, only considers transactions on that site
  • Global: union of all LWFGs

Deadlock may occur on a single site (within LWFG) or between sites (within GWFG)

How is it possible to manage distributed deadlocks?
  1. Prevention (Pre-declaration, etc)
  2. Avoidance (Resource Ordering, Transaction Prioritization)
  3. Detection and Resolution
What are the characteristics of prevention for DDeadlocks?

Guarantees that deadlocks cannot occur in the first place


  • Transaction pre-declares all data items that it will access
  • TM checks that locking data items will not cause deadlock
  • Proceed to lock only if all data items are available (unlocked)

It is difficult to know in advance which data items will be accessed by a transaction

What are the characteristics of avoidance for DDeadlocks?
  • Two main sub-approaches
    1. Resource ordering (Concurrency controlled such that deadlocks won't happen)
      • All resources (data items) are ordered
      • Transactions always access resources in order
    2. Transaction prioritisation (Potential deadlocks detected and avoided)
      • Each transaction has timestamp that corresponds to the time it started: ts(T) 
      • Transactions are prioritised using these timestamps
      • When lock request is denied, use priorities to choose a transaction to abort (WAIT-DIE and WOUND-WAIT rules)
Supporting users have an ad free experience!