Shared Flashcard Set

Details

CS 347 - Final Exam Notes
Miranker
113
Computer Science
Undergraduate 3
10/22/2013

Additional Computer Science Flashcards

 


 

Cards

Term
Database
Definition
a collection of data
Term
Database Management System (DBMS)
Definition
a software system responsible for acting as the go-between a user and a database, offering a set of services on that database
Term
Three tier architecture
Definition
Presentation tier
Application tier
Data tier
Term
Presentation tier
Definition
What’s presented to the user; the UI

In simple terms it is a layer which users can access directly such as a web page, or an operating systems GUI
Term
Application tier
Definition
AKA (business logic, logic tier, data access tier, or middle tier)

The query engine
● The logical tier is pulled out from the presentation tier and, as its own layer, it controls an application’s functionality by performing detailed processing.
Term
Data tier
Definition
The database

This tier consists of database servers. Here information is stored and retrieved. This tier keeps data neutral and independent from application servers or business logic. Giving data its own tier also improves scalability and performance.
Term
Relation schema
Definition
The name of a table, its attributes (column names)

○ Name(First, Last)
○ T1(a1, a2, … , an)
Term
Database schema
Definition
a set of relation schema

{R(a, b), S(c, d)}
Term
SQL and its 3 components
Definition
DDL, DQL, DML
Term
DDL
Definition
Data Definition Language
Used to create databases
CREATE TABLE
Term
DQL
Definition
Data Query Language
Used to get data from database
SELECT FROM WHERE queries
Term
DML
Definition
Data Manipulation Language
Used to alter the data in the database
● INSERT INTO
● UPDATE
Term
Super Key
Definition
a set of attributes such that for all tuples in the table, no two have the same values for those attributes.
Term
Candidate Key
Definition
the minimum amount of attributes required to identify a tuple from all other tuples in table
Term
Foreign Key
Definition
An attribute of a table that references a candidate key in another table
Term
Index Key
Definition
an attribute or set of attributes that serve as the basis of organization of an index structure.
Term
Primary index
Definition
An index that determines the locations of the records of the file
Term
Secondary index
Definition
An index that does not determine the locations of the records.
Term
Data model
Definition
A data model is a conceptual representation of the data structures that are required by a database application.

Includes logical/physical models in MagicDraw
Term
Steps of data modeling
Definition
■ planning and analysis
■ conceptual design // logic without the details
■ logical design
■ physical design
■ implementation
Term
Entity
Definition
anything in a model that can be named.
Term
Attribute
Definition
description of an entity’s characteristics. all instances of the same entity have the same attributes
Term
Identifier
Definition
an attribute of an entity that identifies it. in SQL, this would be equivalent to the primary key of the entity. But entities don’t have keys. They have identifiers.
Term
Relationship
Definition
Any connection between two or more entities. Relationships can have attributes, and can involve more than two entities.
Term
Logical Model
Definition
the data model with no physical details. Includes entities (tables), attributes (columns/fields) and relationships (keys)
Term
Physical Model
Definition
the data model with physical details. Includes Includes tables, columns, keys, data types, validation rules, database triggers, stored procedures, domains, and access constraints
Term
DDL
Definition
the database definition language file that’s generated from the physical model of the data model.
Term
Package (UML modeling)
Definition
A collection of UML objects, similar to a package in java
Term
Class (UML modeling)
Definition
A data model unit that represents a single relational schema
Term
Association (UML modeling)
Definition
A symbol used to signify foreign key/primary key relations in the UML diagram
Term
Multiplicity
Definition
one to one, one to many, many to many; details how many objects are involved in the association
Term
Aggregation
Definition
When one object contains a bunch of other ones, it is an aggregational relationship. This is a form of ‘weak’ association; the aggregatees do not die when the aggregator dies.
Term
Indentifying/Nonidentifying
Definition
Identifying attributes are those that can be used to reference a tuple in a relation. Pretty much just primary keys. Identifying attributes are unique in the schema.
Term
Inheritance (UML modeling)
Definition
Inheritance can be used to simulate complex parent child relations, such as : employee is a person. therefore, employee shall inherit from person.
Term
How are individual constructs of a logical model compiled to a physical model?
Definition
Abstract tables and types are given physical SQL names
Term
What additional details must (and how) be addressed in the physical model specification panel when transforming a logical model?
Definition
Associations must be transformed into foreign key relationships and ID values
Term
Design Patterns and Models
Definition
You should that repeatable patterns in models may connect to abstractions of problem solving at a higher level.
Term
Referential integrity constraints (how to define)
Definition
In oracle sql, referential integrity constraints are defined by using the A REFERENCES B(x) syntax.
Term
Referential integrity constraints (how they operate)
Definition
Conceptually, foreign keys allow us to create relations amongst the data that enforce consistency. By maintaining consistency, we can have more reliable data that is free of errors and we can use features like ON DELETE/UPDATE to keep data that way.
Term
Not null
Definition
“NOT NULL” means that if a tuple is created, the constraint can at no time be ‘null.’ It must have a value
Term
Primary key
Definition
A primary key is a value in a tuple that must be unique for all the tuples in a relation.
Term
Check
Definition
In SQL, a ‘check’ is a constraint used to ensure that a value is within a set of specified values. This can be used to create enums.
Term
Triggers
Definition
used as another form of referential integrity. Triggers are checks that have certain conditions that run during specified events and can perform specified actions. For the most part, triggers are just like if (this happens) then (do this) statements.
Term
View (syntax/semantics)
Definition
Views allow us to do a few things. We can mask of sensitive parts of a table from prying eyes, and we can also get a sub-table out of a table that might be frequently used without ever touching a certain set of attributes.
Term
View (use as a subroutine mechanism)
Definition
Queries that contain sub-queries can sometimes be simplified by first setting up a view that gets the info needed out of a subquery.
Term
View (use per logical restructuring of a database)
Definition
views can also be used to create a ‘view of how customers see the database,’ allowing us to interact with the data as a customer would by masking out all of the back-end specific implementation details.
Term
Supply chain
Definition
The management of resources in a production line
Term
EDI (Electronic Data Interchange)
Definition
A high speed data interchange that allows companies to quickly transfer data between one another

Used to enable commerce between partners
Term
Select operator
Definition
returns a relation containing tuples that satisfy specified condition

σcondition(relation)
Term
Project operator
Definition
Used to remove attributes from a table so that less data has to be handled

πa1,a2 (relation)
■ returns a relation that contains only attributes a1, a2 of the relation

Is similar to SQL SELECT but it removes columns instead of tuples
Term
Join operator
Definition
○ Taking the attributes of two tables, comparing them, and then creating a new relation based on how the data compares
○ Used to join the data of two tables
○ how the data is compared depends on the type of join
Term
Group by
Definition
○ returns a selection of the relation grouped by a selected attribute
○ For example, Table(ShipperName, QuantityShipped)
■ Selecting shippername.count(quantityshipped) would give you the total number of things shipped, in a single tuple
■ Selecting it with groupby(shippername) will keep each distinct shippername separate, so that you get a total number of items shipped by each individual shipper.
Term
Ordered by
Definition
returns the relation sorted by one or more columns (and if specified in ascending or descending

ORDER BY columnName ASC|DESC
Term
Aggregation functions
Definition
Aggregation functions are functions that can be used to tell you data about the table, not the actual data of the table itself

ex: MIN - returns the minimum value of an attribute
MAX - returns the max value of an attribute
COUNT - returns the number of attributes that have values in a relation
Term
Algebraic Identifiers
Definition
you can use + and - on relations that have the same attributes

ex: R+S concatenates the rows of R and S
R-S is equal to the relation R after you remove all the tuples in R(intersect)S
Term
Subqueries/Nested queries
Definition
Queries can be nested within each other, creating a tree like-structure of queries

■ For example,
● select * from t1 where t1.x = (select * from t2)
● Nested select query
■ Leads to the question of which order should the queries be executed to maximize speed with respect to the tree
Term
OLAP steps
Definition
Step I

identify multidimensional data


Step II

Analyze multidimensional data into cross-tabulation


Step III

Visualize n-dimensional cube - data cube


Step IV

After you design the cube, you will use the cube's structure to build a relational database (known as a star schema) to house the data for the cube

Step V

Once you load data into the relational database, and then into the cube, you'll be able to see how attributes, dimensions, measures, and measure groups fit together within a cube to create a powerful analytical tool.
Term
Federated/Enterprise Schema
Definition
an element of data warehouse, is basically ● A database database. Sort of a ‘meta database’ that companies can use to keep track of their databases
Term
OLAP (on-line analytic processing)
Definition
groupby/aggregation queries


Basic idea: converting data into information that decision makers need



Concept to analyze data by multiple dimension in a structure called data cube
Term
Dense index
Definition
an index where there is a key-pointer pair to every record in a data block
Dense index requires more space, but is able to tell if a record exists by a quick look up (sparse must iterate through block)
Term
Sparse index
Definition
There is one key-pointer pair for every block

A lot more cost efficient on disk, doesn’t take up as much space
Term
B+tree (primary index)
Definition
● Usually sparse
Used to cluster the primary keys of a table on the disk
Term
B+tree (secondary index)
Definition
usually dense
Term
Quantitative aspects of a B+tree
Definition
The lowest levels of a B tree are all leaves/data, everything else is just keys pointing downward in the tree
Every internal node is at least half full (except for the root)
Term
Parsing (query evaluation)
Definition
process of turning a query into a parse tree that represents the structure of the query itself
Term
Logical Plan (query evaluation)
Definition
parse tree is transformed into expression tree of relational algebra
Term
Physical Plan (query evaluation)
Definition
● logical plan turned into the physical plan
● indicates operations performed, in which order, the algorithm used to perform the step, ways data is stored and passed around
Term
table_scan
Definition
Scan all the blocks of the disk looking for the required data
Term
index_scan
Definition
If there’s an index on the data, that can be used to look for required data
Term
Nested loops
Definition
Nested Join is just a simple loop through R loop through S compare their tuples and output the result
Term
Merge Join
Definition
○ Sort the attribute that is to be joined, and then merge it on that attribute
○ quicker because there’s no searching, if the attributes match, then they’ll show up at the same time
Term
B(R)
Definition
number of blocks in R
Term
T(R)
Definition
number of tuples in R
Term
T(R)/B(R)
Definition
number of tuples per block
Term
V(R, a)
Definition
number of distinct values for attribute a on R
Term
Linear cost of table_scan
Definition
c * B(R)

c is cost of finding a block
Term
Affine cost of a table_scan
Definition
c + c’ (B(R) -1)
c is cost of finding a block, c’ is the seek time of finding adjacent block
Term
Cost of an index_scan
Definition
Only one cost - T(R)/B(R)
Term
size of a select from relation R
Definition
T(R) / (V(R, x1) * V(R, x2) … * V(R, xn)) for all x you’re selecting on
Term
size of a join on relations R and S
Definition
T(R) * T(S) / (max(V(R, x), V(S, x)) * max(V(R, y), V(S, y) …)
○ the denominator is the max of V() for all attributes that R and S have in common
Term
Greedy rules e.g. pushing selects
Definition
execute select and project before the joins
Term
Dynamic programming method of optimizing join orders
Definition
● bottom up style
● Steps: start with A, B, C, D
○ consider two way joins (AB, AC, AD, BC, BD, CD, threeway joins (ABC, ABD, ACD, BCD), etc - consider minimal cost of all possible plans
● extreme case of left-deep join trees with n! possible sequences - found in 2n-1 steps
Term
Definition and use of a query graph
Definition
● For query Q, a query graph is a labeled graph (V,E) where:
○ each vertex in V (Vi) corresponds with a relation in R (Ri)
○ an edge exists from Vi to Vj if predicate P(Ri, Rj) exists in Q
● Pros: avoids Cartesian products


● Steps:
○ remove vertex V from query graph and add to join plan
○ choose vertex in remaining query graph but connected edge-wise to plan
Term
Transactions
Definition
A transaction is a collection of actions upon a database bundled up as a single package that contain follow the ACID properties.
Term
A in ACID
Definition
■ A…TOMICITY
All of the actions in a transaction happen, or NONE of them do
Term
C in ACID
Definition
■ C...ONSISTENCY
● If the database was consistent before the transaction, it will remain consistent after the transaction
Term
I in ACID
Definition
■ I...SOLOATION
● the execution of a transaction is unaffected by other transactions
Term
D in ACID
Definition
■ D…URABILITY
● if a transaction commits, then the effects of that transaction persist in the database
Term
Architectural Component of RDBMS (i.e. logs)
Definition
● What enable transactions to occur
● A logbook of every event that happens in the DBMS
● If a transaction does not complete, then it is not added to the log
Term
JDBC
Definition
Java Database Connectivity
Term
JDBC Steps
Definition
■ establish running programs/drivers (database server & app/client driver)
■ create connectivity (“open a connection”)
■ ask something of the database using Statements
■ process returned results (after executing queries)
Term
MapReduce Master/Slave architecture
Definition
Term
MapReduce Two phase algorithm
Definition
Map: ● function written by user that takes input data, and outputs a key/value pair
● similar to GROUPBY

Reduce: ● function that takes a list of key,value pairs that were generated by the Mapper and condenses them down
● simliar to COUNT
Term
Example of a word counting program
Definition
● Mapper would take each word, and emit ● Reducer would take all of these and then count up all of them ● ● reduces to
Term
Data Warehouse
Definition
a collection of data designed to support management decision making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time.
Term
Data Warehouse (parts)
Definition
Development of a data warehouse includes development of systems to extract data from operating systems plus installation of a warehouse database system that provides managers flexible access to the data.
Term
Stable store
Definition
Stable storage must be provided to prevent data loss

Um I guess you have paired sectors that you write data to with parity checks to make sure everything is ok
Term
RDF
Definition
Resource Description Framework Has triples: Triplets are used on a directed label graph
Term
Two Phase Sort
Definition
Merge Sort - divide and conquer
Divide (sort) and Merge
Term
Joins are commutative and associative
Definition
Term
Left-deep join
Definition
right child is always a relation, not a result of a join
Term
How to compute I/O cost for a select using a primary key (linear)
Definition
Since we're using a primary key, there could only be at most one record with that value. So it would be 1 I/O lookup
Term
How to compute I/O cost for a select using a compound primary key where all the values are given in the select query (linear)
Definition
this is essentially one unique primary key so at most 1 record, 1 I/O lookup
Term
How to compute I/O cost for a select using a compound primary key where not all the values are given in the select query (linear)

where x = a AND y = b etc
Definition
order of the compound key is important because it tells you the order to which the attributes were sorted. Whichever is the first attribute in the compound key, take the total number of blocks of the relation and divide it by the number of unique vales of that attribute. Take this number and divide by the number of unique values of the second attribute and so on until you're done. I guess you take the ceiling of the final number you end up with (may very well be a fraction as to which case the I/O is just 1 since the answer is stored on a fraction of a block). The total number of records would be the final number * (total number of tuples/the number of blocks for the relation)
Term
If there's no indexing, you have to do a table_scan when trying to figure out I/O cost
Definition
Term
Figuring I/O with an AND clause in a select
Definition
take the total number of blocks and divide by the number of unique values of the first attribute. take this number and divide by the number of unique values of the second attribute and so. this well tell you how many blocks (I/O accesses) hold that info. just times this number by tuples/blocks to get the number of records this will return
Term
Figuring I/O with an OR clause in a select
Definition
You have to do the union law:
AUB = A+B-|A intersect B|

so for example, see how many unique values there are of A, times this by the records/block. this will give you the number of blocks holding A

see how many unique values there are of B, times this by the records/block. this will give you the number of blocks holding B

figure out how many blocks hold them together (remember the divide trick

then add A+B - AandB
Term
Figuring out the size of a join query
Definition
T(R)T(S)/Max of the unique value of the attribute they have in common
Term
Figuring out the size of a join query - no common attributes
Definition
T(R)T(S)
Term
Figuring out the size of a join query, multiple shared attribute
Definition
T(R)T(S)/Max of attribute 1 between the two * Max of attribute 2 between the two
Term
How to figure out the number of blocks and I/O for a dense, sequential B tree
Definition
Take the number of records and divide it by the number of records per block.
-------------------------------------
Take the total number of records and divide it by the number of pointers and keep successfully dividing the answer by the number of pointers until you can't anymore.

Add up the total number of blocks and however many divisions you did (including the first one) + 1 is the number of IO
Term
How to figure out the number of blocks and I/O for a sparse, sequential B tree
Definition
Take the number of records and divide it by the number of records per block

Use this number and divide it by the number of keys until you can't anymore


Add up the total number of blocks and however many divisions you did (including the first one) + 1 is the number of IO
Term
How to figure out the number of blocks and I/O for a dense B tree in no particular order
Definition
same process as if the B tree was sorted
Supporting users have an ad free experience!