Shared Flashcard Set

Details

Title

COMP3017 - Lecture 7 (B) - Distributed Databases

Description

COMP3017 - Lecture 7 (B) - Distributed Databases

Total Cards

Subject

Computer Science

Level

Undergraduate 2

Created

05/03/2014

Click here to study/print these flashcards.

Create your own flash cards! Sign up here.

Additional Computer Science Flashcards

Cards Return to Set Details

Term

What are the characteristics of Localisation in Query Processing

Definition

Fragmentation expressed as relational algebra expressions

Global relations can be reconstructed using these expression (A localisation program)

Naively, generate distributed query plan by substituting localisation programs for relations (Use reduction techniques to optimise queries)

Term

What are the characteristics of Reduciton for Horizontal Fragmentation in Query Processing

Definition

Given a relation R fragmented as F_R={R₁,R₂,...,R_n}

Localization program is R=R₁U R₂ U ... U R_n

Reduce by identifying fragments of localized query that give empty relations

Two classes to consider:

Reduction with selection

Reduciton with join

Term

What are the characteristics of Horizontal Selection Reduction in Query Processing

Definition

Given horizontal fragmentation of R such that

R_j=o_pj(R): σ_p(R_j) = ∅ if ∀x∈R, ¬(p(x) ∧ p_j(x))

[image]

Term

What are the characteristics of Horizontal Join Reduction in Query Processing

Definition

Recalling that join distributes over unions:

(R1 ∪ R2) ⨝ S ≣ (R1 ⨝ S) ∪ (R2 ⨝ S)

Given fragments Ri and Rj defined with predicates pi and pj:

Ri ⨝ Rj = ∅ if ∀x∈Ri, ∀y∈Rj ¬(pi(x) ∧ pj(y))

[image]

Term

What are the characteristics of Reduction for Vertical Fragmentation in Query Processing

Definition

Given a relation R fragmented as FR = {R1, R2, ..., Rn}

Localization program is R = R1 ⨝ R2 ⨝ ... ⨝ Rn

Reduce by identifying useless intermediate relations.

One case to consider is reduction with projection.

Term

What are the characteristics of Vertical Projection Reduction in Query Processing

Definition

Given a relation R with attributes A = {a1, a2, ..., an} vertically fragmented as Ri = π_Ai(R) where Ai ⊆ A

πD,K(Ri) is useless if D ⊈ Ai

[image]

Term

Where would we compute the join between relations R and S when these are stored in different sites?

Definition

It is possible to move one relation to the other site and perform the join there

CPU cost of performing join same regardless of site

Communication costs depend on size of hte relation being moved

Cost_COM = size(R) = cardinality(R) * length(R)

Move the relation with the smallest size to the other's site

Term

What are the characteristics of Semijoins in Query Processing

Definition

R ▷p S ≣ πR(R ⨝p S)
≣ πR(R ⨝p πp(S))

Where πp(S) projects out from S only the attributes used in predicate p

This reduces communication cost by only moving that part of a relation that will be used in the join

Recall: R ▷_p S ≣ π_R(R ⨝_p S) where p is a predicate defined over R and S and π_R projects out only those attributes from R Then:

size(R ▷_p S) < size(R ⨝_p S)

This is because:

R ⨝p S

≣ (R ▷p S) ⨝p S

≣R ⨝p (R ◁p S)

≣ (R ▷p S) ⨝p (R ◁p S)

Term

What are the steps to perform Semijoin Reduction with R in site 1 and S in site 2?

Definition

Site 2 sends π_p(S) to site 1

Site 1 calculates R ▷p S ≣ π_R(R ⨝p π_p(S))

Site 1 sends R ▷p S to site 2

Site 2 calculates R ⨝_p S ≣ (R ▷p S) ⨝_p S

So Cost_com= size(πp(S)) + size(R ▷p S)

This approach is better if:

size(π_p(S)) + size(R ▷p S) < size(R)

Term

What are the principles of distributed transactions?

Definition

Transaction processing may be spread across several sites in the distributed database

The site from which the transaction originated is known as the coordinator

The sites on which the transaction is executed are known as the participants

Term

What are the principles on Distribution and ACID on DDBMSs?

Definition

Non-distributed databases aim to maintain isolation

Isolation: a transaction should not make updates externally visible until committed

Distributed databases use two-phase locking (2PL) to preserve isolation

2PL ensures serialisability, the highest isolation level

Term

What does two phase locking consist of?

Definition

Two phases (Duh)

Growing phase: obtain locks, access data items

Shrinking phase: release locks

Guarantees serializable transactions

Term

Show an image of the growing and shrinking phase of 2pl

Definition

[image]

Term

How is locking controlled in a non-distributed and distributed database

Definition

Former: Locking is controlled by a lock manager

Latter: Two main approaches to implementing 2pl:

Centralized 2PL (C2PL)

Responsibility for lock management lies with a single site

Distributed 2PL (D2PL)

Each site has its own lock manager

Term

What are the main characteristics of Centralised Two-Phase Locking (C2PL)?

Definition

Coordinating sire runs transaction manager TM

Participant sites run data processors DP

Lock manager LM runs on central site

TM requests locks from LM

If granted, TM submits operations to processors DP

When DP finish, TM sends message to LM to release locks

Term

Show an image of how two-phase locking works

Definition

[image]

Term

What are the disadvantages of Centralized two-phase locking?

Definition

LM is a single point of failure (less reliable)

LM is a bottleneck (Affects transaction throughput)

Term

How does Distributed Two-Phase Locking work?

Definition

Coordinating site C runs TM

Each participant runs both an LM and a DP

TM sends operations and lock requests to each LM

If lock can be granted, LM forwards operation to local DP

DP sends "end of operation" to TM

TM sends message to LM to release locks

Variant:

DPs may send "end of operation" to their own LM

LM releases lock and informs TM

Term

Show an image of how Distributed Two-phase locking works

Definition

[image]

Term

What are the characteristics and conditions for deadlock?

Definition

Exists when two or more transactions are waiting for each other to release a lock on an item

Three conditions must be satisfied for a deadlock to occur:

Concurrency: Two transactions claim exlusive control of one resource

Hold: one transaction continues to hold exclusively controlled resources until its need is satisfied

Wait: transactions wait in queues for additional resources while holding resources already allocated

Term

What is a wait-for graph?

Definition

Representation for interactions between transactions

Directed graph containing:

A vertex for each transaction that is currently executing

An edge from T1 to T2 if T1 is waiting to lock an item that is currenly locked by T2

Deadlock exists ifff the WFG contains a cycle. Ex:

T1 -> T2 -> T3 -> T1

Term

What are the types of Wait For graph?

Definition

Local: One per site, only considers transactions on that site

Global: union of all LWFGs

Deadlock may occur on a single site (within LWFG) or between sites (within GWFG)

Term

How is it possible to manage distributed deadlocks?

Definition

Prevention (Pre-declaration, etc)

Avoidance (Resource Ordering, Transaction Prioritization)

Detection and Resolution

Term

What are the characteristics of prevention for DDeadlocks?

Definition

Guarantees that deadlocks cannot occur in the first place

Transaction pre-declares all data items that it will access

TM checks that locking data items will not cause deadlock

Proceed to lock only if all data items are available (unlocked)

It is difficult to know in advance which data items will be accessed by a transaction

Term

What are the characteristics of avoidance for DDeadlocks?

Definition

Two main sub-approaches

Resource ordering (Concurrency controlled such that deadlocks won't happen)

All resources (data items) are ordered

Transactions always access resources in order

Transaction prioritisation (Potential deadlocks detected and avoided)

Each transaction has timestamp that corresponds to the time it started: ts(T)

Transactions are prioritised using these timestamps

When lock request is denied, use priorities to choose a transaction to abort (WAIT-DIE and WOUND-WAIT rules)

Flashcard Machine - create, study and share online flash cards

Shared Flashcard Set

Details

Additional Computer Science Flashcards

Cards Return to Set Details

My Flashcards

Flashcard Library

Browse

About

Help

Mobile