Shared Flashcard Set

Details

COMP3017 - Lecture 11 - NoSQL Databases
15
Computer Science
Undergraduate 2
05/04/2014

Cards

Term
What does the impedance mismatch consist of?
Definition
  • To store data persistently in modern programs:
    • A single logical structure
    • Must be split up (i.e. normalized)
  • Object orientation
    • Based on software engineering principles
  • Mapping from one world to the other has problems
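A minimal sketch of the mismatch, using Python's built-in sqlite3 module. The order structure and the table and column names are invented for illustration. One in-memory object has to be split across two normalized tables to be stored, then joined back together to be read.

```python
import sqlite3

# One logical structure as the program sees it (hypothetical example data).
order = {
    "id": 1,
    "customer": "Ann",
    "lines": [
        {"product": "pen", "qty": 2},
        {"product": "ink", "qty": 1},
    ],
}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
db.execute("CREATE TABLE order_lines (order_id INTEGER, product TEXT, qty INTEGER)")

# Storing: the single object is split up (normalized) into rows in two tables.
db.execute("INSERT INTO orders VALUES (?, ?)", (order["id"], order["customer"]))
db.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?)",
    [(order["id"], line["product"], line["qty"]) for line in order["lines"]],
)

# Loading: a join plus application code is needed to rebuild the object.
rows = db.execute(
    "SELECT o.id, o.customer, l.product, l.qty "
    "FROM orders o JOIN order_lines l ON l.order_id = o.id "
    "WHERE o.id = ? ORDER BY l.rowid",
    (1,),
).fetchall()

rebuilt = {
    "id": rows[0][0],
    "customer": rows[0][1],
    "lines": [{"product": product, "qty": qty} for _, _, product, qty in rows],
}
```

The mapping code at both ends is the part an ORM tries to hide.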
Term
What are some trends?
Definition
  • Data size
    • Can be dealt with by building bigger databases
      • Expensive + physically limited
    • Or by building clusters of smaller machines
      • Complex
  • Increased data connectivity
Term
What are the problems with relational databases?
Definition
  • RDBMS have fundamental issues in dealing with horizontal scale
    • Designed to work on single, large machines
    • Difficult to distribute effectively
  • More subtle: an impedance mismatch
    • Create logical structures in memory and then rip them apart to stick them in an RDBMS
    • RDBMS data model often disjoint from its intended use (normalization not always good)
    • Uncomfortable to program with (joins, ORM, etc.)
Term
What are the key attributes of NoSQL?
Definition
  • Non-relational (though they can be, they aren't good at it)
  • Schema free (except the implicit schema, application side)
  • Inherently distributed (In different ways, some more than others)
  • Open source (Mostly. eg. Oracle's NoSQL)
Term
What are some key value basics?
Definition
  • "A hashtable with persistence"
  • Use a key, ask a database for a value
  • Key is usually a string
  • Value can be anything (DB often unaware of value content)
  • Examples:
    • Riak
      • Buckets/keys/values/links
      • Query with key, process with map-reduce
      • Secondary indexes (metadata)
    • Redis
      • More understanding of value types
      • In memory (very fast)
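As a rough illustration of "a hashtable with persistence", the sketch below uses Python's standard shelve module as a stand-in for a key-value store. It is not Riak or Redis; the key naming and the temporary path are assumptions made for the example.

```python
import os
import shelve
import tempfile

# A temporary location for the store; a real deployment would use a fixed path.
path = os.path.join(tempfile.mkdtemp(), "kvstore")

# Write: the store maps a string key to a value it does not interpret.
with shelve.open(path) as store:
    store["user:42"] = {"name": "Ann", "langs": ["en", "fr"]}

# Reopen (persistence) and ask for the value by key.
with shelve.open(path) as store:
    value = store["user:42"]
    missing = store.get("user:99")
```

The only access path is the key: there is no way to ask the store for "all users who speak French" without reading every value in the application.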
Term
What are the basics for document databases?
Definition
  • Database as storage of a mass of different documents
  • A document is a complex data structure
  • Can contain completely different data from other documents
  • Document Data Stores understand their documents
    • Queries can run against values of document fields
    • Indexes can be constructed for document fields
    • Batch style (mapreduce etc.) often supported
  • Examples
    • MongoDB
      • Master/slave design
      • .find() queries like ORM
      • geo-spatial indexing
    • CouchDB
      • Master/master
      • Only map reduce queries
      • Favours availability over consistency
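The idea that a document store "understands" its documents can be sketched in plain Python. This is an in-memory toy, not the MongoDB or CouchDB API; the find and build_index helpers and the sample documents are hypothetical.

```python
from collections import defaultdict

# Documents in one collection need not share the same fields.
docs = [
    {"_id": 1, "type": "book", "title": "Dune", "year": 1965},
    {"_id": 2, "type": "film", "title": "Alien", "director": "Scott"},
    {"_id": 3, "type": "book", "title": "Emma", "year": 1815},
]

def find(collection, **criteria):
    """Return documents whose fields match every criterion (full scan)."""
    return [
        d for d in collection
        if all(k in d and d[k] == v for k, v in criteria.items())
    ]

def build_index(collection, field):
    """Map each value of `field` to the ids of the documents holding it."""
    index = defaultdict(list)
    for d in collection:
        if field in d:
            index[d[field]].append(d["_id"])
    return index

books = find(docs, type="book")
by_type = build_index(docs, "type")
```

Unlike a pure key-value store, the query and the index both look inside the stored values.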
Term
What are the basics of column databases?
Definition
  • Entries held in rows
    • Rows have unique keys
  • Tables define a set of "column families"
    • Rows contain 0 or more columns for each column family
  • No schema (the columns in a family can vary per row)
  • On querying:
    • Key lookup is fast
    • Batch processing via mapreduce (OLAP lives here)

[image]

Examples:

  • Primarily for batch processing
  • HBase
    • Uses HDFS for storage, hadoop for processing
    • Built to favour consistency over availability
  • Cassandra
    • Supports key ranges
    • Works over a variety of processing architectures (Hadoop, storm, etc)
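The row / column family / column layout can be pictured as nested dictionaries. The sketch below is a plain Python model with made-up data, not the HBase or Cassandra API.

```python
# table: row key -> column family -> {column name: value}
table = {
    "user:1": {
        "profile": {"name": "Ann", "email": "ann@example.org"},
        "visits": {"2014-04-01": "home", "2014-04-02": "search"},
    },
    "user:2": {
        "profile": {"name": "Bob"},   # fewer columns than user:1
        "visits": {},                 # zero columns in this family
    },
}

def get(row_key, family, column):
    """Fast path: look up one cell by row key."""
    return table[row_key][family].get(column)

def count_columns(family):
    """Batch-style pass over every row, as a map-reduce job might do."""
    return sum(len(row[family]) for row in table.values())
```

The table fixes only the set of families ("profile", "visits"); which columns appear inside a family is decided per row.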

 

Term
What are some characteristics of aggregates?
Definition
  • These DBs represent a variety of "aggregate" databases
    • Columns, Key-values, documents
  • They store some form of self-contained thing that is useful in isolation
    • A document in MongoDB
    • The column families in HBase
    • The values in Riak
  • Many leverage this for scale
    • It completely sidesteps the sharding issues in RDBMS
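Why aggregates help with scale can be sketched as follows: because each aggregate is self-contained, its key alone decides which machine holds it. The node names and the simple modulo hashing are assumptions for illustration; real stores typically use more elaborate placement schemes.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members

def node_for(key):
    """Pick a node from a stable hash of the aggregate's key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

cluster = {name: {} for name in NODES}

def put(key, aggregate):
    cluster[node_for(key)][key] = aggregate

def get(key):
    # One key, one node: no cross-node join is needed to read the aggregate.
    return cluster[node_for(key)][key]

put("order:1", {"customer": "Ann", "lines": [{"product": "pen", "qty": 2}]})
put("order:2", {"customer": "Bob", "lines": []})
```

In a normalized relational schema the order and its lines would live in separate tables, so splitting the data across machines risks separating rows that are always read together.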
Term
What are some basics on Graph DBs?
Definition
  • Focus on modelling the data's structure
  • Graphs are composed of Vertices and Edges
  • Queried with a graph traversal API or query language (e.g. Cypher, SPARQL)
  • Can be much faster at querying graph-like data structures
    • Like friends of friends or web links
  • Examples
    • Neo4j
      • Not distributed 
      • ACID transactions
      • Cypher for query
    • 4store (5store, other triple stores)
      • RDF and Semantic Web technologies
      • 5store supports 1000s of machines easily
      • SPARQL for query
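The "friends of friends" query mentioned above is a two-hop traversal. A minimal sketch over an adjacency map in plain Python (made-up data, not Cypher or SPARQL):

```python
# Vertices are names; each edge is a "friend" relationship (made-up data).
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan", "eve"},
    "dan": {"bob", "cat"},
    "eve": {"cat"},
}

def friends_of_friends(graph, person):
    """People reachable in two hops who are not the person or a direct friend."""
    direct = graph[person]
    two_hops = set()
    for friend in direct:
        two_hops |= graph[friend]
    return two_hops - direct - {person}

result = friends_of_friends(friends, "ann")
```

Each hop follows edges directly from a vertex, whereas the same query in a relational schema needs one self-join on the friendship table per hop.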
Term
Give a definition of ACID
Definition
  • Atomic - Entire transaction succeeds or the entire transaction rolls back
  • Consistent - A transaction must leave the database "valid"
  • Isolated - Concurrent transactions behave as though they occurred serially
  • Durable - Once committed, transactions survive power loss, etc
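Atomicity and consistency can be seen in a small sketch with Python's built-in sqlite3 module. The accounts table and the transfer helper are invented for the example; the CHECK constraint stands in for the rule that defines a "valid" database.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, "
           "balance INTEGER CHECK (balance >= 0))")
db.executemany("INSERT INTO accounts VALUES (?, ?)", [("ann", 100), ("bob", 0)])
db.commit()

def transfer(amount):
    """Move money from ann to bob; both updates happen or neither does."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            db.execute("UPDATE accounts SET balance = balance + ? "
                       "WHERE name = 'bob'", (amount,))
            db.execute("UPDATE accounts SET balance = balance - ? "
                       "WHERE name = 'ann'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(db.execute("SELECT name, balance FROM accounts"))

ok = transfer(30)        # valid transfer
failed = transfer(500)   # would make ann's balance negative
```

In the failed transfer the credit to bob has already been applied when the debit violates the constraint, and the rollback undoes it as well.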
Term
What is the CAP theorem?
Definition
  • Consistent - Writes are atomic, all subsequent requests retrieve the new value
  • Available - the database will always return a value as long as the server is running
  • Partition tolerant - The system will still function even if the cluster network is partitioned (i.e. the cluster loses contact with parts of itself)
  • The theorem: a distributed system cannot guarantee all three at once; when the network partitions, it must give up either consistency or availability
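The trade-off during a partition can be made concrete with a toy two-replica cluster. This is a deliberately simplified model, not a real database; the "CP" and "AP" mode labels and all class names are assumptions for the sketch.

```python
class Replica:
    def __init__(self):
        self.value = None

class Cluster:
    """Two replicas; `mode` is "CP" or "AP" (labels used in this sketch only)."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, value):
        """A client writes through replica a."""
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent by refusing the write: not available.
                raise RuntimeError("unavailable: cannot reach replica b")
            self.a.value = value  # stay available: replica b is now stale
        else:
            self.a.value = self.b.value = value

    def read_from_b(self):
        """A client reads through replica b."""
        return self.b.value

def run(mode):
    cluster = Cluster(mode)
    cluster.write("v1")
    cluster.partitioned = True
    try:
        cluster.write("v2")
        return cluster.read_from_b()
    except RuntimeError:
        return "error"
```

Here run("AP") answers with the stale value "v1", while run("CP") reports an error rather than let the replicas diverge.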
Term
What is an alternative to ACID?
Definition
  • If we want partition tolerance (the P in CAP), ACID can be restrictive; BASE is an alternative
  • Basic Availability - The application works basically all the time
  • Soft-state - does not have to be consistent all the time
  • Eventual consistency - But will be in some known state eventually
Term
What is stated by eventual consistency?
Definition
  • A work around of CAP
  • From Amazon's Dynamo paper: "The storage system guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value"
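A minimal sketch of that guarantee: two replicas may disagree for a while, but once updates stop and they exchange state, every read returns the last update. The last-write-wins rule with integer timestamps is an assumption chosen for brevity, not the mechanism Dynamo itself uses.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        self.data[key] = (timestamp, value)

    def read(self, key):
        return self.data[key][1]

def sync(x, y):
    """Anti-entropy pass: both replicas keep the newest version of each key."""
    for key in set(x.data) | set(y.data):
        newest = max(
            (r.data[key] for r in (x, y) if key in r.data),
            key=lambda version: version[0],
        )
        x.data[key] = y.data[key] = newest

r1, r2 = Replica(), Replica()
r1.write("cart", ["pen"], timestamp=1)
sync(r1, r2)
r2.write("cart", ["pen", "ink"], timestamp=2)   # update reaches r2 only
stale = r1.read("cart")                          # r1 has not seen it yet
sync(r1, r2)                                     # no new updates: replicas converge
```

Between the second write and the second sync, a reader of r1 sees the old cart; that window is the "soft state" in BASE.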
Term
Explain Multi-Version Concurrency Control (MVCC) in Eventual Consistency
Definition
  • Some document DBs use multi-version concurrency control (MVCC)
    • Like a version control system
    • Writes without locks
    • Multiple versions of documents
  • Distributed versions on different machines
  • Collisions detected during replication
  • App developer can be informed of collisions and decide how to resolve them
  • Used by CouchDB
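The lock-free write and collision detection can be sketched with revision numbers. CouchDB tracks a revision per document in a _rev field; the integer revisions, the Store class and the Conflict exception below are simplifications invented for this example, and this single-store sketch detects the collision at write time rather than during replication.

```python
class Conflict(Exception):
    pass

class Store:
    def __init__(self):
        self.docs = {}  # doc id -> list of (rev, body); every version is kept

    def get(self, doc_id):
        """Return the latest (rev, body) for a document."""
        return self.docs[doc_id][-1]

    def put(self, doc_id, body, rev=None):
        """Write a new version; `rev` must name the version being replaced."""
        versions = self.docs.get(doc_id, [])
        current = versions[-1][0] if versions else None
        if rev != current:
            raise Conflict(f"{doc_id}: expected rev {current}, got {rev}")
        new_rev = (current or 0) + 1
        self.docs[doc_id] = versions + [(new_rev, dict(body))]
        return new_rev

store = Store()
store.put("note", {"text": "draft"})
rev_a, _ = store.get("note")     # writer A reads rev 1
rev_b, _ = store.get("note")     # writer B reads rev 1, no lock taken
store.put("note", {"text": "A's edit"}, rev=rev_a)
try:
    store.put("note", {"text": "B's edit"}, rev=rev_b)
    conflict = False
except Conflict:
    conflict = True              # B must re-read and decide how to merge
```

Neither writer ever waits for a lock; the cost is that the losing writer has to handle the conflict itself.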
Term
Explain Vector Clocks in Eventual Consistency
Definition
  • An extension of Lamport timestamps
  • They help understand the order of events in a distributed system
  • Vector clocks can be used to
    • Identify the provenance of an item of data
    • Decide order in which data was changed
    • Help resolve conflicts
    • Flag inconsistencies for app specific decisions
  • Used by Amazon's Dynamo and Riak
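A minimal sketch of a vector clock as a dictionary of per-node counters. The function names and the node names "x" and "y" are illustrative; this is not the Dynamo or Riak implementation.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def merge(a, b):
    """Element-wise maximum: the clock after seeing both histories."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

def compare(a, b):
    """Return "before", "after", "equal" or "concurrent" for a relative to b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"

v1 = increment({}, "x")          # {"x": 1}
v2 = increment(v1, "x")          # {"x": 2}, descends from v1
v3 = increment(v1, "y")          # {"x": 1, "y": 1}, also descends from v1
resolved = increment(merge(v2, v3), "x")   # {"x": 3, "y": 1}
```

Comparing v2 with v3 yields "concurrent": neither version descends from the other, so the store flags a conflict for the application to resolve, and the resolved version is then "after" both.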