Shared Flashcard Set

Details

COMP3017 - Lecture 10 - Data warehousing
COMP3017 - Lecture 10 - Data warehousing
22
Computer Science
Undergraduate 2
05/04/2014

Additional Computer Science Flashcards

 


 

Cards

Term
What is OLTP?
Definition
  • On-line transaction processing
    • Traditional workloads, 'bread and butter' processing
    • Volumes of data, transactions grow, networks getting larger
Term
What is OLAP?
Definition
  • On-line analytical processing
    • Includes the use of data warehouses
    • Multidimensional databases
    • Data analysis
  • The name given to the dynamic enterprise analysis required to create, manipulate, animate and synthesise information from exagetical, contemplative and formulaic data analysis models (exagesis=critical explanation, complentative=asking what if questions, formulaic=which parameters must be varied in order to achieve a given outcome)
Term
What are the 12 rules of OLAP?
Definition
  1. Multidimensional conceptual view
  2. Transparency
  3. Accessibility
  4. Consistent reporting
  5. Client server architecture
  6. Generic dimensionality
  7. Dynamic sparse matrix handling
  8. Multi-user support
  9. Unrestricted cross-dimensioal operations
  10. Intuitive data manipulation
  11. Flexible reporting
  12. Unlimited dimensions and aggregation levels
Term
What are the key concepts of Data Mining?
Definition
  • The process of discovering hidden patterns and relations in large databases using a variety of advanced analytical techniques
  • Attempts to use the computer to discover relationships that can be used to make predictions
  • Data mining tools often find unsuspected relationships in data that other techniques would overlook
Term
What are some data mining approaches?
Definition
  • Rule-based analysis
  • Neural networks
  • Fuzzy logic
  • K-nearest neighbour
  • Genetic algorithms
  • Advanced visualisation
  • Combination of any of the above
Term
What is the data warehouse?
Definition
  • A subject-oriented, ingegrated, time-variant, non-volatile collection of data that is used primarily in organizational decision making
    • Subject oriented - The data is organized according to subject data instead of application and contains only the information necessary for 'decision support' processing
    • Integrated - Data encoding is made uniform (e.g. sex=f or m, 1 or 2, b or g. Data naming is made consistent)
    • Time variant - Data is collected over time and can then be used for comparisons, trends and forecasting
    • Non-volatile - The data is not updated or changed once in the data warehouse, but is simply loaded and then accessed. The data warehouse is held quite separately from the operational databse, which supports OLTP
Term
Why is it that a separate warehouse is needed?
Definition
  • Performance
    • Operational databases are optimized to support known transactions and workloads
    • Special data organization, access methods and implementation methods are needed
    • Complex OLAP queries would degrade performance for operational transactions
  •  Missing data
    • Decision support requires historical data, which operational databases do not typically maintain
  • Data consolidation
    • Decision support requires consolidation (aggregation summarization) of data from many heterogeneous resources, including operational databases and external sources
  • Data quality
    • Different sources typically use inconsistent data representations, codes and formats, which have to be reconciled
Term
Show a diagram on how data is extracted
Definition
[image]
Term
How would a data warehouse may be realised?
Definition
  • Via a front end to existend databases and files
  • In a fresh relational database
  • In a multi-dimensional database (MDDB)
  • In a proprietary databse format
  • Using a mixture of the above
Term
How can data from a data warehouse may be accessed?
Definition
  • Decision support systems (DSS)
  • Executive Information Systems (EIS)
  • Data mining
  • On-line analytical processing
Term
What are the characteristics of Data Marts?
Definition
  • Focus on either:
    • Only one subject area or
    • Only one group of users
  • An organization can have:
    • One enterprise data warehouse
    • Many data marts
  • Do not contain operational data
  • Are more easily understood and navigated
Term
What are the characteristics of Multidimensional analysis?
Definition
  • Need to examine data in various ways
  • Produce views of multidimensional data for users
    • Slice
    • Dice
    • Pivot
    • Drill Down
    • Roll Up
Term
What are the characteristics of slice in multidimensional analysis?
Definition
[image]
Term
What are the characteristics of dice in multidimensional analysis?
Definition
[image]
Term
What are the characteristics of pivot in multidimensional analysis?
Definition
[image]
Term
What are the characteristics of drill down in multidimensional analysis?
Definition
[image]
Term
What are the characteristics of roll up in multidimensional analysis?
Definition
[image]
Term
What are some internal aspects of data warehouses?
Definition
  • Schemas
    • Star schema
    • Snowflake schema
    • Fact constellation schema
  • Aggregated data
  • Specialised indexes
    • Bitmap indexes 
    • Join indexes
  • Specialised join methods
Term
Give an example of star schema
Definition
[image]
Term
Give an example of a fact table
Definition
[image]
Term
Show an example of a snowflake schema
Definition
[image]
Term
What are some applications/types of Data Warehouse Databases?
Definition
  • Relational and specialised RDBMSs
    • Specialised indexing techniques, join and scan methods
  • Relational OLAP (ROLAP) servers
    • Explicitly developed to use a relational engine to support OLAP 
    • Include aggregation navigation logic, the ability to generate multi-statements SQL, and other additional services
  • Multidimensional OLAP (MOLAP) servers
    • The storage model is an n-dimensional array
    • May use a 2-level approach, with 2D dense arrays indexed by B-trees
    • Time is often one of the dimensions
Supporting users have an ad free experience!