Term
|
Definition
- On-line transaction processing
- Traditional workloads, 'bread and butter' processing
- Volumes of data, transactions grow, networks getting larger
|
|
|
Term
|
Definition
- On-line analytical processing
- Includes the use of data warehouses
- Multidimensional databases
- Data analysis
- The name given to the dynamic enterprise analysis required to create, manipulate, animate and synthesise information from exagetical, contemplative and formulaic data analysis models (exagesis=critical explanation, complentative=asking what if questions, formulaic=which parameters must be varied in order to achieve a given outcome)
|
|
|
Term
| What are the 12 rules of OLAP? |
|
Definition
- Multidimensional conceptual view
- Transparency
- Accessibility
- Consistent reporting
- Client server architecture
- Generic dimensionality
- Dynamic sparse matrix handling
- Multi-user support
- Unrestricted cross-dimensioal operations
- Intuitive data manipulation
- Flexible reporting
- Unlimited dimensions and aggregation levels
|
|
|
Term
| What are the key concepts of Data Mining? |
|
Definition
- The process of discovering hidden patterns and relations in large databases using a variety of advanced analytical techniques
- Attempts to use the computer to discover relationships that can be used to make predictions
- Data mining tools often find unsuspected relationships in data that other techniques would overlook
|
|
|
Term
| What are some data mining approaches? |
|
Definition
- Rule-based analysis
- Neural networks
- Fuzzy logic
- K-nearest neighbour
- Genetic algorithms
- Advanced visualisation
- Combination of any of the above
|
|
|
Term
| What is the data warehouse? |
|
Definition
- A subject-oriented, ingegrated, time-variant, non-volatile collection of data that is used primarily in organizational decision making
- Subject oriented - The data is organized according to subject data instead of application and contains only the information necessary for 'decision support' processing
- Integrated - Data encoding is made uniform (e.g. sex=f or m, 1 or 2, b or g. Data naming is made consistent)
- Time variant - Data is collected over time and can then be used for comparisons, trends and forecasting
- Non-volatile - The data is not updated or changed once in the data warehouse, but is simply loaded and then accessed. The data warehouse is held quite separately from the operational databse, which supports OLTP
|
|
|
Term
| Why is it that a separate warehouse is needed? |
|
Definition
- Performance
- Operational databases are optimized to support known transactions and workloads
- Special data organization, access methods and implementation methods are needed
- Complex OLAP queries would degrade performance for operational transactions
- Missing data
- Decision support requires historical data, which operational databases do not typically maintain
- Data consolidation
- Decision support requires consolidation (aggregation summarization) of data from many heterogeneous resources, including operational databases and external sources
- Data quality
- Different sources typically use inconsistent data representations, codes and formats, which have to be reconciled
|
|
|
Term
| Show a diagram on how data is extracted |
|
Definition
|
|
Term
| How would a data warehouse may be realised? |
|
Definition
- Via a front end to existend databases and files
- In a fresh relational database
- In a multi-dimensional database (MDDB)
- In a proprietary databse format
- Using a mixture of the above
|
|
|
Term
| How can data from a data warehouse may be accessed? |
|
Definition
- Decision support systems (DSS)
- Executive Information Systems (EIS)
- Data mining
- On-line analytical processing
|
|
|
Term
| What are the characteristics of Data Marts? |
|
Definition
- Focus on either:
- Only one subject area or
- Only one group of users
- An organization can have:
- One enterprise data warehouse
- Many data marts
- Do not contain operational data
- Are more easily understood and navigated
|
|
|
Term
| What are the characteristics of Multidimensional analysis? |
|
Definition
- Need to examine data in various ways
- Produce views of multidimensional data for users
- Slice
- Dice
- Pivot
- Drill Down
- Roll Up
|
|
|
Term
| What are the characteristics of slice in multidimensional analysis? |
|
Definition
|
|
Term
| What are the characteristics of dice in multidimensional analysis? |
|
Definition
|
|
Term
| What are the characteristics of pivot in multidimensional analysis? |
|
Definition
|
|
Term
| What are the characteristics of drill down in multidimensional analysis? |
|
Definition
|
|
Term
| What are the characteristics of roll up in multidimensional analysis? |
|
Definition
|
|
Term
| What are some internal aspects of data warehouses? |
|
Definition
- Schemas
- Star schema
- Snowflake schema
- Fact constellation schema
- Aggregated data
- Specialised indexes
- Bitmap indexes
- Join indexes
- Specialised join methods
|
|
|
Term
| Give an example of star schema |
|
Definition
|
|
Term
| Give an example of a fact table |
|
Definition
|
|
Term
| Show an example of a snowflake schema |
|
Definition
|
|
Term
| What are some applications/types of Data Warehouse Databases? |
|
Definition
- Relational and specialised RDBMSs
- Specialised indexing techniques, join and scan methods
- Relational OLAP (ROLAP) servers
- Explicitly developed to use a relational engine to support OLAP
- Include aggregation navigation logic, the ability to generate multi-statements SQL, and other additional services
- Multidimensional OLAP (MOLAP) servers
- The storage model is an n-dimensional array
- May use a 2-level approach, with 2D dense arrays indexed by B-trees
- Time is often one of the dimensions
|
|
|