Shared Flashcard Set

Details

Database Design
University of Arizona - CSc 460 - Final Review
122
Computer Science
Undergraduate 4
12/12/2009

Additional Computer Science Flashcards

 


 

Cards

Term
DBMS
Definition
A software system that enable users to define, create, maintain and control access to the database.
Term
Database
Definition
An collection of related records or files consolidated into a common pool that provides data for one or more multiple uses.
Term

DBMS Components

Definition
  1. Database
  2. Database Administrator
  3. Application Programs
  4. Hardware
Term
Advantages of a DBMS
Definition
  1. Data sharing
  2. Centralized control
  3. Redundancy control (normalization)
  4. Data integrity
  5. Data security
  6. Views
  7. Data independence
Term
Logical Data Independence (LDI)
Definition
  • Conceptual schema is a description of data and relationships
  • HARD to achieve
Term
Physical Data Independence (PDI)
Definition
  • Physical storage can be changed without bothering users
  • EASIER to achieve than LDI
Term
Disadvantages of a DBMS
Definition
  1. DB design is complex
  2. Cost (hardware/software)
  3. Availability (single point of failure)
  4. Speed vs. Flexibility
Term
Data Languages
Definition
  1. Data Description Language - turns conceptual schema into physical DB description
  2. Data Manipulation Language - insert/delete data
  3. Query Language - ask questions
  4. Data Control Language - security
Term
Schema
Definition
The overall description of a database
Term
Architecture (DBMS)
Definition
A description of DBMS components and their interconnections
Term
ANSI/SPARC (External Level)
Definition
Describes end-user perspectives on the data
Term
ANSI/SPARC (Conceptual Level)
Definition
Defines field groupings, relationships between data, etc.
Term
ANSI/SPARC (Internal Level)
Definition

Defines record sizes, field representations, types of indices, etc.

Term
External-Conceptual mapping
Definition
Allows field renamings and provides LDI
Term
Conceptual-Internal mapping
Definition
Converts logical structure to physical representations, and provides PDI
Term
Sources of Read/Write Delay
Definition
  1. Seek time
  2. Rotational delay
  3. Transfer time
Term
Disk Mirroring
Definition

Advantages

  • can operate with a failed drive

Disadvantages

  • cost
Term
Disk Striping
Definition

Advantages

  • performance (parallelism)

Disadvantages

  • no data replication
  • increased prob. of disk failure
Term
RAID (Level 0)
Definition
  • Non-Redundant
  • striping only
  • good write performance
  • no data redundancy
Term
RAID (Level 1)
Definition
  • Mirrored
  • no striping
  • opportunity for parallel reads
  • 2x the number disks ($$$)
Term
RAID (Level 3)
Definition
  • Bit-Interleaved Parity
  • striping unit = 1 bit
  • can recover from disk failure
Term
RAID (Level 5)
Definition
  • Block-Interleaved Distributed Parity
  • striping unit = 1 block
  • parity blocks are distributed across all disks
  • fairly complex controller design
Term
Blocking Factor
Definition
The number of whole records that can be stored within a single block
Term
Index
Definition
A file consisting of structured references to records of another file
Term
Candidate Key
Definition
A key that can uniquely identify a record
Term
Primary Key
Definition
A chosen candidate key
Term
Secondary Key
Definition
Any other field besides the primary key field
Term
Sort Key
Definition
The field on which the records are sorted
Term
Primary Index
Definition
  • Indexed field is a candidate key
  • Indexed records are sorted on the CK
  • The DB file records are sorted on the CK
  • Can have only one per DB file
Term
Clustered Index
Definition
  • The indexed field is not a CK
  • The indexed records are sorted on the key
  • The DB records are sorted on the key
Term
Secondary Index
Definition
  • The indexed field may be any field
  • The index records are sorted
  • The DB records are not sorted
  • Can have as many secondary indices as needed
Term
Dense Index
Definition
  • Holds one record for each in DB file
  • Can do 'existence check' on just the index
Term
Sparse Index
Definition
  • Indexes only a subset of the field values
  • Smaller, therefore faster searching
Term
B-Tree of Order M
Definition
  • Each node contains at most 2M keys
  • Each node (excepting the root) has at least M keys
  • A non-leaf node has at least 2 children
  • All leaf nodes are at the same level
  • A non-leaf node of n keys has n+1 children
Term
B+-Tree
Definition
  • Each key is stored in a leaf node
  • Copies of some keys occupy internal nodes
  • Leaf nodes are linked sequentially left to right to create a sequence set
    • Linear searching
    • No extra storage required
Term
Advantages of a B+-Tree
Definition
  • Has built-in disk pointers for keys in leaves
  • Supports exact-match and range queries

 

Term
Disadvantages of a B+-Tree
Definition
  • 'Wastes' storage capacity of internal nodes (relatively minimal)
  • Insertions and deletions are a bit more complex
Term
Null
Definition
A marker that indicates that a field does not have a value
Term
Foreign Key
Definition
A field in one file whose values are drawn from the primary key field of another file
Term
DB Design Process
Definition
  1. Requirements Analysis - talk to the client
  2. Conceptual Design - create conceptual schema
  3. Logical Design - convert to DBMS' implementation data model (normalization)
  4. Physical Design - physical storage requirements
Term
Entity
Definition
An instance of a general classification
Term
Relationship
Definition
An association between a set of entities
Term
Hierarchical Data Model
Definition
  • Schema is tree-structured
  • Only 1:M relationships (1 parent & M children)
Term
Network Data Model
Definition
  • Graph-based
  • A std. theory of DB Systems
Term
Relational Data Model
Definition
  • Theoretical foundation: set theory
  • Uses no pointers
  • No physical/logical schema distinction
Term
Object Data Model
Definition
  • Could map object data into a tuple (awkward)
  • Object persistence
Term
What are the names and salaries of the people in department #5 (TRC)?
Definition

{ e.surname, e.givenname, e.salary | Employee(e) Λ e.deptid = 5 }

Term
What are the names of the parts that can be supplied by individual suppliers in quantity > 200 (TRC)?
Definition

{ p.Pname | P(p) Λ q (SP(q) Λ p.P# = q.P#

Λ q.Qty > 200) }

Term
What are the names of the active suppliers of nuts (TRC)?
Definition

{ s.sname | S(s) Λ s.status = 'Active' Λ 

(q)(SP(q) Λ s.S# = q.S# Λ

(p)(P(p) Λ q.P# = p.P# Λ

p.pname = 'Nut')) }

Term
What is the content of the Employee Relation (DRC)?
Definition
{<efghi> | <efghi> ∈ Employee }
Term
What are the names and salaries of the people in department #5 (DRC)?
Definition
{ <fei> | (h)(<efghi> Employee Λ h=5) }
Term
What are the names of the parts that can be supplied by individual suppliers in quantity > 200 (DRC)?
Definition

{ <o> | (n)(<nopqr> P Λ

(tu)(<stu> SP Λ n=t Λ

u > 200)) }

Term
What are the names of the active suppliers of nuts (DRC)?
Definition

{ <k> | (jl)(<jklm> S Λ l='Active' Λ

    (st)(<stu> SP Λ j=s Λ

       (no)(<nopqr>P Λ n=t Λ

o='Nut'))) }

Term
Find the names & salaries of the employees in department #5 (Relational Alg.)
Definition
givenname,surname,salary dept#=5(Employee))
Term
What are the names of the active suppliers of nuts (Relational Alg.)?
Definition

sname status>0 ∧ pname='Nut'

S.S#=SP.S#P.P#=SP.P#

(S × (SP × P)))))

Term
What are the names of the parts that can be supplied by individual suppliers in qty > 200 (Relational Alg.)?
Definition

pname qty>200 SP.P#=P.P# (SP × P)))


or

 

pname qty>200 (SP ⋈SP.P#=P.P# P)

Term
Theta Join
Definition
r ⋈θ s where θ is any PK-FK comparison
Term
Equijoin
Definition
A theta join where θ is an equality comparison between PK & FK
Term
Natural Join
Definition
An equijoin in which columns with matching names in both tables are compared, and only one of the matching columns will appear in the resulting table
Term
If we have one of each part in a box, how much does the content weight (SQL)?
Definition
SELECT SUM(weight) from p;
Term
What are the average quantities in which suppliers are supplying parts (SQL)?
Definition

SELECT sno, AVG(qty)

FROM spj

GROUP BY sno;

Term
Outer Joins
Definition
  • Left Outer Join: retains unmatched tuples from left relation
  • Right Outer Join: retains unmatched tuples from right relation
  • Full Outer Join: retains all unmatched tuples
Term
Create a Relation (SQL)
Definition

CREATE TABLE <name> (

     <attr. name> <data type> [NOT NULL],

     ...

     [PRIMARY KEY (<attr>)]

);

 

Term
Creating a View (SQL)
Definition

CREATE VIEW <name>

[(<attr. list>)]

AS <select stmt>;


CREATE VIEW supplierpart("Supplier Name","Part #")

AS SELECT DISTINCT sname, pno

     FROM s, spj

     WHERE s.sno = spj.sno;

 

Term
Updating Content of Tuples (SQL)
Definition

UPDATE <relation name>

SET <attr. name> = <expression> [,...]

[FROM <relation list>]

[WHERE <condition>];

 

Ex:

  UPDATE m SET name = 'Ray' where id = 1;

Term
Deleting Tuples (SQL)
Definition

DELETE FROM <relation name>

WHERE <condition>;

Term
Deleting Relations (SQL)
Definition

DROP [TABLE|INDEX|VIEW|DATABASE] <name>;

Term
Transaction
Definition
A group of one or more operations treated as a single, indivisible unit of work
Term
ACID Properties of Transactions
Definition
  • Atomicity: Xacts are all or nothing
  • Consistency: An Xact's actions retain the consistency of the DB
  • Isolation: Each Xact thinks it's the only Xact in the system
  • Durability: A completed Xact's changes are permanent - won't be lost

 

Term
Transaction States
Definition
  • Active: Xact is being executed
  • Partially Committed: Xact's actions are complete but changes may not yet be on disk
  • Committed: Xact's changes are on disk
  • Failed: The Xact cannot finish, but partial changes still linger
  • Aborted: All changes have been cleared away
Term
Triggers (SQL)
Definition
  • Follow the ECA model:
    • Event - causes the activation
    • Condition - if true, action is performed
    • Action - SQL statement(s)
Term
Disadvantages of Triggers
Definition
  1. Hard to write appropriate actions
  2. Specified separately from relations
  3. Can reduce concurrency of the DBMS
  4. Trigger interaction
  5. Rule Termination
Term
Basic Trigger Syntax (Oracle)
Definition

CREATE TRIGGER <name>

{BEFORE/AFTER} {INSERT/DELETE/UPDATE}

ON <relation>

[ [FOR EACH ROW] WHEN (<condition>) ]

<PL/SQL block>;

Term
Functional Determination
Definition
The set of attributes X functionally determines the set of attributes Y iff whenever any two tuples of the relation agree on their X values, they must also agree on their Y values.
Term
Closure (of a set of attributes)
Definition
Let A be a set of attributes. The closure of A, denoted A+, is the set of attributes that A functionally determines with respect to a set of FD's.
Term
Closure (of a set of FD's)
Definition
Let G be a set of FD's. The closure of G, denoted G+, is the set of FD's that are implied by G, including those of G itself.
Term
Armstrong's Axioms
Definition
  1. Reflexivity: If B ⊆ A, then A→B
  2. Augmentation: If C→D, then:
    1. C→CD
    2. CE→D
    3. CE→DE
  3. Transitivity: If F→G and G→H, then F→H
Term
Covers (of sets of FD's)
Definition
Let G and H be sets of FD's. G covers H if every FD in H is also in G+.
Term
Equivalence (of sets of FD's)
Definition
Let G and H be sets of FD's. G and H are equivalent if G+ = H+.
Term
Minimal Sets (of FD's)
Definition

A set of FD's is minimal if all of the following hold:

  1. All FD's have only 1 attribute on their RHS
  2. No attribute can be removed from the LHS of any FD w/o changing the closure
  3. All of the FD's in the set are needed to retain the closure
Term
Minimal Cover (of a set of FD's)
Definition
A minimal cover of a set of FD's G is a minimal set of FD's that is equivalent to G.
Term
Finding a Minimal Cover of a FD Set
Definition
  1. Put FD's in Standard Form
  2. Minimize the LHS of the FD's
  3. Remove redundant FD's
Term
Normalization
Definition
The process of decomposing relations into relations of lower degree that no longer posses certain undesirable properties.
Term
First Normal Form (1NF)
Definition
A relation is in 1NF if its attributes are not set-valued.
Term
Full Functional Dependency
Definition
An FD X→Y is a full functional dependency if removing any attribute from X destroys the dependency.
Term
Prime Attribute
Definition
An attribute that is a member of any CK of the relation
Term
Second Normal Form (2NF)
Definition
A relation R is in 2NF if every non-prime attribute of R is fully functionally dependent upon every CK of R.
Term
Trivial Functional Dependency
Definition
An FD X→A is a trivial FD when A ⊆ X
Term
Superkey
Definition
A superkey is a set of attributes that includes a candidate key.
Term
Third Normal Form (3NF)
Definition

A relation R is in 3NF if, for every non-trivial FD X→A that holds in R, either:

  1. X is a superkey of R, or
  2. A is a prime attribute of R.
Term
Boyce-Codd Normal Form (BCNF)
Definition
A relation R is in BCNF if, for every non-trivial FD X→A in R, X is a superkey of R.
Term
Multidependency
Definition

Let R be a relational schema, and let A,B & C be subsets of R's attributes. B is multidependent on A (or A multidetermines B) iff the set of B values matching a given (A,C) pair of values depends only on the A set and is independent of the C set.

Term
Fourth Normal Form (4NF)
Definition
A relational schema is in 4NF iff whenever there exist attribute subsets A and B of R such that A multidetermines B, then all attributes of R are also functionally dependent on A.
Term
CREATE USER (SQL)
Definition

CREATE USER <username> [<options>];

  • PASSWORD or IDENTIFIED BY <password>
  • quotas
  • limits on access (e.g. time)
  • capabilities (to create DB's, to add users, etc.)
Term
GRANT (SQL)
Definition

GRANT <privilege>

[ON <object>]

TO <user>

[WITH GRANT OPTION];

 

Ex:

  GRANT CREATE VIEW to smithj;

  GRANT SELECT ON supplier TO PUBLIC;

Term
REVOKE (SQL)
Definition

REVOKE <privilege>

[ON <object>]

FROM <user>;

 

Ex:

  REVOKE ALL PRIVILEGES

  ON supplier

  FROM PUBLIC;

Term
Hierarchy of Security Classes (MAC)
Definition

TS - Top Secret

S - Secret

C - Classified

U - Unclassified

 

TS > S > C > U

Term
Bell-LaPadula Model
Definition

Subjects (users, groups, etc.) have clearances; Objects (tables, tuples, etc.) have classifications

  1. Simple Security Property: S can read O only when class(S) ≥ class(O)
  2. Star Property: S can write O only when class(S) ≤ class(O)
Term
DBMS Security Issues
Definition
  1. Availability - to authorized users only
  2. Confidentiality - preserve secrecy
  3. Integrity - prevent data corruption and/or loss
Term
Confidentiality
Definition

Provide security from a more 'external' perspective, via:

  1. Quality passwords
  2. Data encryption (public-key encryption, digital signatures, etc.)
Term
Integrity
Definition

Be Able to recover DB's after accidents/disasters:

  • Maintain off-site backups
  • Log/journal DBMS actions
  • Use a non-0-level of RAID
Term
SQL Injection
Definition
A DBMS attack in which a user tries to add (inject) SQL into an incomplete query to get the DBMS to reveal additional information
Term
Preventing SQL Injection
Definition
  • Sanity-check all user input (i.e. range-checking)
  • Trim excess characters from input strings
  • Accept raw input only when necessary (i.e. use option lists)
Term
Data Warehouse
Definition
A DBMS of data from many sources used to support an organization's decision-making
Term
Data Mart
Definition
A small data warehouse supporting a portion of an organization
Term
Characteristics of a Data Warehouse
Definition
  • Usually orders of magnitude larger than an OLTP system
  • Used to generate data summaries and to identify trends
  • Often known as On-line Analytical Processing (OLAP)
  • Serves as a key input source for data mining algorithms
Term
Data Flow
Definition
  • Cleaning - attempts to detect/correct errors and omissions in the data
  • Reformatting - structures the data to match the schema of the DW
  • Metadata ("data about data") - shows the origins of the DW's content
  • Backflushing - returning cleaned data to the source DBMSes for re-integration
Term
Data Design for Data Warehouses
Definition
  • The basic OLAP schema is the Star Schema
    • Fact table is at the center
  • A variation: The Snowflake Schema
    • Becomes more generalized away from the center
Term
Querying a Data Warehouse
Definition

Slicing: An equality selection in 1 or more dimensions
Dicing: A range selection in 1 or more dimensions
Pivoting: Changing the orientation (viewpoint) by rotating the n-dimensional array to 'hide' a dimension.

Roll-up: Moves from the current level in a dimension to the next more general level (ex. Dept→College)

Drill-down: Moves from the current level to the next more specific level (ex. Vegetable→Carrot)

Term
Data Mining
Definition
The discovery of patterns in, or rules about, the content of a data warehouse
Term
Goals of Data Mining
Definition
  1. Predictions of future events (DANGEROUS)
  2. Identification of events (ex. system intrusion)
  3. Classification (ex. Which discounts appeal to which customers?)
Term
Applications of Data Mining
Definition
  • Marketing - customer behavior, targetted emailing
  • Finance - credit ratings, fraud detection
  • Manufacturing - optimizing processes & resources
  • Health care - drug side-effects, treatment effectiveness
Term
Itemset
Definition
The items contained within a grouping of selected items, ignoring quantities
Term
Support (of an itemset)
Definition
The percentage of all transactions that contain the itemset
Term
Support (of Association Rules)
Definition

The support of a rule is the fraction of transactions that satisfy the rule.

 

support(L=>R) = support(L∪R)

Term
Confidence (of Association Rules)
Definition

The confidence of a rule is the fraction of itemsets containing L that also contain R.

 

conf(L=>R) = supp(L∪R) / supp(L)

 

 

Term
Order of Operations (Query Trees)
Definition
  1. Join(s)
  2. Select(s)
  3. Project(s)
Term
Query Optimization
Definition
The process of generating and selecting different potential query operation orderings
Term
Algorithms for Project and Select
Definition

Project: Sequential Scan of relation

 

Select:

  1. Sequential Scan (full table scan, linear search)
  2. Binary Search
  3. Index Scan
Term
Join Algorithms
Definition
  1. Nested-Loops Join (NLJ)
  2. Sort-Merge Join (SMJ)
  3. Hash Join
Term
Influencing the Query Optimizer
Definition
  1. Change operators (ex. Do joins w/ nested queries)
  2. Insist on a join ordering (ex. Replace "from s,sp" with "s JOIN sp")
Term
Data Storage Pyramid
Definition

/    CPU Registers     \

/      Cache Memory      \

/    Main Memory (RAM)    \

/ Solid-State Devices (SSDs) \

/  Hard (Magnetic Disk) Drives \

/ Tapes/Optical Disks (Libraries) \

 

 

Supporting users have an ad free experience!