Shared Flashcard Set

Details

COMP3017 - Lecture 6 (B) - Parallel Databases
COMP3017 - Lecture 6 (B) - Parallel Databases
28
Computer Science
Undergraduate 2
05/02/2014

Additional Computer Science Flashcards

 


 

Cards

Term
What is the meaning of Parallelism and parallel databases?
Definition
  • An arrangement or state that permits several operations or tasks to be performed simultaneously rather than consecutively
  • Parallel databases
    • Have the ability to split
      • Processing data
      • Access to data
    • Accross multiple processors
Term
What are some benefits of using parallel databases?
Definition
  • Hardware trends
  • Reduced elapsed time for queries
  • Increased transaction throughput
  • Incresed scalability
  • Better price/performance
  • Improved application availability
  • Access to more data
  • In short, for better performance
Term
What are the characteristics of software shared memory?
Definition
  • Tightly coupled
  • Symmetric Multiprocessor (SMP)
  • Less complex database software
  • Limited scalability
  • Single buffer
  • Single database storage
Term
What are the characteristics of shared disk architecture?
Definition
  • Loosely coupled
  • Distributed memory
  • Avoids memory bottleneck
  • Same page may be in more than one buffer at once - can lead to incoherence
  • Needs global locking mechanism
  • Single logical database storage
  • Each processor has its own database buffer
Term
What are the characteristics of shared-nothing architecture?
Definition
  • Massively parallel
  • Loosely coupled
  • High speed interconnect (between processors)
  • One page is only local buffer - no buffer incoherence
  • Needs distributed deadlock detection
  • Needs multiphase commit protocol
  • Needs to break SQL requests into multiple sub-requests
  • Each processor has its own database buffer
  • Each processor owns part of the data
Term
What are the comparisons between hardware and Software architecture?
Definition
  • It is possible to use one software strategy on a different hardware arrangement
  • Also possible to simulate one hardware configuration on another
    • Virtual shared disk (VSD) makes an IBM SP shared nothing system look like a shared disk setup (for oracle)
  • From this point on, we deal only with shared nothing software mode
Term
What are some challenges of the shared nothing architecture?
Definition
  • Partitioning the data
  • Keeping the partitioned data balanced
  • Splitting up the queries to get the work done
  • Avoiding distributed deadlock
  • Concurrency control
  • Dealing with node failure
Term
What are some characteristics of transaction parallelism?
Definition
  • Improves throughput
  • Inter-transaction
    • Run many transactions in parallel, sharing the DB
    • Often referred to as Concurrency
Term
What are the characteristics of query parallelism
Definition
  • Improves response times (lower latency)
  • Intra operator (horizontal) parallelism
    • Operators decomposed into independent operator instances, which perform the same operation on different subsets of data
  • Inter-operator (vertical) parallelism
    • Operations are overlapped
    • Pipeline data from one stage to the next without materialisation
Term
What are the benefits of function shipping?
Definition
  • Database operations are performed where the data is, as far as possible
  • Network traffic is minimized
  • For basic database operators, code developed for serial implementations can be reused
  • In practice, mixture of function shipping and IO shipping has to be employed
Term
What are some characteristics of basic operators?
Definition
  • The DBMS has a set of basic operations which it can perform
  • The result of each operation is another table
    • Scan a table - sequentially or selectively via an index - to produce another table
    • Join two table together
    • Sort a table
    • Perform aggretation functions (sum, average, count)
  • Those are combied to execute a query

 

Term
What are some characteristics of the Volcano Architecture?
Definition
  • The volcano query processing system was devised by Goetze Graefe (Oregon)
  • Introduced the exchange operator
    • Inserted between the steps of a query to:
      • Pipeline results
      • Direct streams of data to the next step(s), redistributing as necessary
    • Provides mechanism to support both vertical and horizontal prallelism
    • Informix 'Dynamic Scalable Architecture' based on volcano
Term
What are some parallel queries?
Definition
  • Enquiry (i.e. how many customer live in UK)
  • Collocated Join (Which customers placed order in july?)
  • Directed Join (Which customer placed orders in July? - Tables have differnt keys)
  • Broadcast Join ( Which customers and suppliers are in the same country?)
  • Repartitioned Join (Which customers and suppliers are in the same country?)
Term
Enquiry (i.e. how many customer live in UK)
Definition
[image]
Term
  • Collocated Join (Which customers placed order in july?)
Definition
[image]
Term
Directed Join (Which customer placed orders in July? - Tables have differnt keys)
Definition
[image]
Term
Broadcast Join (Which customers and suppliers are in the same country?)
Definition
[image]
Term
Repartitioned/Parallel-Hash Join (Which customers and suppliers are in the same country?)
Definition
[image]
Term
What are some characteristics on Query handling?
Definition
  • Critical aspect of Parallel DBMS
  • User wants to issue simple SQL, and not be concerned with parallel aspects
  • DBMS needs structured and facilitates to parallelise queries
  • Good optimizer technology is essential
  • As database grows, effects (good or bad) of optimiser become increasingly significant
  • A single transaction may update data in several different places
  • Multiple transactions may be using the same distributed tables simultaneously
  • One or several nodes could fail
  • Requires concurrency control and recovery across multiple nodes for:
    • Locking and deadlock detection
    • Two-phase commit to ensure 'all or nothing'
Term
What are some characteristics of locking and deadlocks?
Definition
  • With shared nothing architecture, each node is responsible for locking its own data.
  • No global locking mechanism
  • However
    • T1 locks item A on Node 1 and wants item B on Node 2
    • T2 locks item B on Node 2 and wants item A on Node 1
    •  = Distributed Deadlock
Term
What are some approaches to resolving deadlocks?
Definition
  • One approach: Timeouts
  • Timeout T2, after wait exceeds a certain interval
    • Interval may need random element to avoid 'chatter' (i.e. both transactions give up at the same time and then try again
  • Rollback T2 to let T1 to proceed
  • Restart T2, which can now complete
Term
What are some other approaches to resolving deadlocks?
Definition
  • More sophisticated approach (DB2)
  • Each node maintains a local 'wait-for' graph
  • Distributed deadlock detector (DDD) runs at the catalogue node for each database
  • Periodically all nodes send their graphs to the DDD
  • DDD records all locks found in wait state
  • Transaction becomes a candidate for termination if found in same lock wait state on two successive iterations
Term
What are some characteristics of reliability?
Definition
  • We wish to preserve the ACID properties for parallelised transactions
    • Isolation is taken care of by 2PL protocol
    • Isolation implies consistency
    • Durability can be taken care of node-by-node, with proper logging and recobery routines
    • Atomicity is the hard part. We need to commit all parts of a transaction or abort all parts
  • Two-phase commit protocol (2PC) is used to ensure that atomicity is preserved
Term
What are the main characteristits of Two-phase commit (2PC)?
Definition
  • Distinguish between:
    • The global transaction
    • The local transactions into which the global transaction is decomposed
  • Global transaction is managed by a single site, known as the coordinator
  • Local transactions may be executed on separate sites, known as the participants

 

Term
What are the steps of 2 Phase Commits (2PC)?
Definition
  1. Phase 1: Voting
    1. Coordinator sends "prepare T" message to all participants
    2. Participants repond with either "Vote-commit T" or "Vote-abort T"
    3. Coordinator waits for participants to resond within a timeout period
  2. Phase 2: Decision
    1. If all participants return "vote-commit T" (to commit), send "Commit T" to all participants. Wait for acknowledgements within timeout period.
    2. If any participants returns "vote-abort T", send "abort T" to all participants. Wait for acknowledgements within timeout period
    3. When all acknowledgements received, transaction is completed
    4. If a site does not acknowledge, resend global decision until it is acknowledged
Term
What are some characteristics of parallel utilities?
Definition
  • Ancillary operations can also exploit parallel hardware
    • Parallel data loading/import/export
    • Parallel index creation
    • Parallel Rebalancing
    • Parallel Backup
    • Parallel Recovery
Supporting users have an ad free experience!