Shared Flashcard Set

Details

COMP3017 - Lecture 8 (B) - Data Streams
COMP3017 - Lecture 8 (B) - Data Streams
24
Computer Science
Undergraduate 2
05/04/2014

Additional Computer Science Flashcards

 


 

Cards

Term
What are several assumptions that traditional DBMS make?
Definition
  • Persistent Data storage
  • Relatively static records
  • (Typically) no predefined notion of time
  • Complex one-off queries
Term
What are some requirements in applications that make Data Streams Useful? Give some application domains
Definition
  • Data arrives in real time
  • Data is ordered (implicitly by arrival time or explicitly by timestamp)
  • Too much data to store
  • Data never stops coming
  • Ongoing analysis of rapidly changing data

Application domains

  • Network monitoring and traffic engineering
  • Sensor networks, RFID tags
  • Telecommunication call records
  • Financial applications
  • Web logs and click streams
  • Manufacturing processes
Term
What are the key principles of Data Streams?
Definition
  • A potentially unbounded sequence of tuples
  • Transactional data streams: log interaction between entities
    • Credit card: purchases by consumers from merchants
    • Telecommunications: phone calls by callers to dialed parties
    • Web: accesses by clients of resources at servers
  • Measurement data streams: monitor evolution of entity states
    • Sensor networks: physical phenomena, road traffic
    • IP network: traffic at router interfaces
    • Earth climate: temperature, moisture, etc.
Term
Give a comparison of One-time vs Continuous Queries
Definition
  • One-time queries
    • Run once to completion over the current dataset
  • Continuous queries
    • Issued once and then continuously evaluated over a data stream
      • Notify me when the temperature drops below X
      • tell me when prices of stock Y > 300
Term
Give a comparison of DBMS vs DSMS
Definition
DBMS DBMS
Persistent relations (relatively ststic, stored) Transient streams (on-line analysis)
One-time queries Continuous Queries (CQ)
Random Access Sequential Access
Unbounded disk store Bounded main memory
Only current state matters Historical data is important
No real-time services Real TIme requirements
Relatively low update rate Data at fine granularity
assume precise data Data scale/imprecise
Access plan determined by query processor, physical db design Unpredictable/variable data arrival and characteristics
Term
What are the architectural issues of DBMS vs DSMS?
Definition
DBMS DBMS
Resource (memory, disk, per-tuple computation) rich Resource limited
Extremely sophisticated query processing Reasonably complex, near real time query processing
Useful to audit query results of data streams Useful to identify what data to populate in database
Query evaluation: Arbitrary Query evaluation: One pass
Query Plan: Fixed Query Plan: Adaptive
Term
What are the concepts Continuous Query Language is based on?
Definition
  • Queries produce/refer to relations and streams
  • Based on SQL, with the addition of:
    • Streams as new data type
    • Continuous instead of one-time semantics
    • Windows on streams (derived from SQL-99)
    • Sampling on streams (basic)
Term
What are the principles of Query Processing?
Definition
  • Construct query plan based on relational operators, as in an RDBMS
    • Selection 
    • Projection
    • Join
    • Aggregation (group by)
  • Combine plans form continuous queries (reduce redundancy)
  • Stream tuples through the resulting network of operators
Term
What are tuple-at-a-time operators?
Definition
Evaluation only requires consideration of one tuple at a time (Selection and projection)
Term
What are the concepts of full relation operators?
Definition
  • Some full relation operators can work on a tuple at a time
    • Count, sum, average, min, max (even with group by)
    • (Order by, however, can't)

[image]

 

Term
What are some things that full relation operators can't do
Definition

Intersection, difference, product, join

(Union, however, can be evaluated tuple-by-tuple)

Term
What relational operators may block when applied to streams?
Definition
  • No output until entire input seen, but streams are unbounded
  • Joins may need to join tuples that are aritrarily far apart
Term
What does relation/stream translation consist of?
Definition
  • Some relational operators can work directly on streams
    • Selection, projection, union, some aggregates
  • Some relational operators need to work on relations
    • Join, product, difference, interseciton, other aggregates
  • Stream-to-relation operators
    • Windows
  • Relation-to-stream operators
    • Istream, Dstream, Rstream
Term
What does Windowing consist of in stream operations?
Definition
  • Mechanism for extracting a finite relation (synopsis) from an infinite stream
  • Various window proposals for restricting operator scope.
    • Windows absed on ordering attribute (e.g. last 5 minutes of tuples)
    • Windows absed on tuple counts (e.g. last 1000 tuples)
    • Windows based on explicit markers e.g. punctuations)
    • Variants (e.g. partitioning tuples in a window)
  • Various window behaviours
    • Sliding, tumbling
Term
Show a diagram of a sliding window
Definition
[image]
Term
What are some characteristics of join evaluation?
Definition
  • Consider a stream-based join operation
    • a conventional join over a pair of windows on the input streams
    • outputs a stream of tuples joined from the input streams
Term
What is the scalability and completeness of DBMS and DSMS?
Definition
  • DBMS deals with finite relations
    • Query evaluation should produce all results for a given query
  • DSMS deal with unbounded data streams
    • May not be possible to return all results for a given query
    • Trade-off between resource use and comlpeteness of result set
    • Size of buffers used for windows is one example of a parameter that affects resource use and completeness
    • Can further reduce resource by randomly sampling from streams
Term
What are some Relation-to-Stream Operators?
Definition
  • Insert Stream (Istream)
    • Whenever a tuple is inserted into the relation, emit it on the stream
  • Delete Stream (Dstream)
    • Whenever a tuple is deleted from the releation, emit it on the stream
  • Relation Stream (Rstream)
  • At every time instant, emit every tuple in the relation on the stream
Term
Give an example CQL query (uses window)
Definition

SELECT Istream(*)

FROM S [rows unbounded] 

WHERE S.A > 10

 

S is converted into a relation (of unbounded size)

Resulting relation is converted back to a stream via Istream

Term
Given an example of CQL query (No window)
Definition

SELECT *

FROM S

WHERE S.A > 10

 

S is a stream - query plan involves only selection, so window is now unnecessary

Term
Give an example of a CQL query (window specified on streams)
Definition

SELECT *

FROM S1 [rows 1000], S2[range 2 minutes]

WHERE S1.A > S2.A and S1.A > 10

 

Windows specified on streams:

  • Tuple based sliding window - [rows 1000]
  • Time-based sliding window - [range 2 minutes]
Term
Give an example of a CQL query (Probing stored table)
Definition

SELECT Rstream(S.A, R.B)

FROM S1 [now], R

WHERE S1.A = R.A

  • Query probes a stored table R based on each tuple in streams S and streams the result
    • [now] - time based sliding window containing tuples in the last time step
Term
What are some key concepts on query optimization?
Definition
  • Traditionally relation cardinalities used in query optimizer
    • Minimize the size of intermediate results
  • Problematic in a streaming environment
    • All streams are unbounded = infinite size
  • Need novel optimization objectives that are relevant when input sources are streams 
    • Stream rate based (e.g. NiagaraCQ)
    • Resource-based (e.g. STREAM)
    • QoS based (e.g. Aurora)
  • Solution: Continuous adaptive optimization
Supporting users have an ad free experience!