Shared Flashcard Set

Details

ch2
notes
44
Internet/New Media
Undergraduate 2
03/05/2013

Additional Internet/New Media Flashcards

 


 

Cards

Term
Data Warehouse
Definition
• A pool of data produced to support decision making
• A repository of current and historical data of potential interest to managers throughout an organization
Term
• Subject Oriented
- Data are organized by topics, such as sales, products, customers, etc.
- Best for providing a more comprehensive view of organization
- Not only how a business is operating, but why!
• Integrated
- Data from different sources are stored in a consistent format
- Must deal with naming conflicts and discrepancies among units of measure
• Time Variant (Time Series)
- Provides data at various points in time
• Nonvolatile
- Users cannot change the data once entered into the data warehouse
Definition
Four major characteristics of Data Warehousing and what each means:
Term
• Web based
- Designed to provide an efficient computing environment for Web-based applications
• Relational/multidimensional structure
• Client/server
- Used client/server architecture to provide easy access to end-users
• Real time
- Provides real-time data access and analysis capabilities
• Metadata
- Contains metadata (data about data) about how the data are organized and how to effectively use them
Definition
Additional characteristics of Data Warehousing:
Term
• Data Mart
- A subset of a data warehouse, typically consisting of a single subject area (marketing, operations, sales, customer satisfaction, etc.)
- Dependent Data Mart
o Created directly from the data warehouse
o Ensures that the user is viewing the same data available to all other users
o EDW must be constructed first
- Independent Data Mart
o A small data warehouse designed for a department or strategic business unit (SBU)
o Its source is not an EDW
• Operational Data Stores (ODS)
- Often used as an interim staging area for a data warehouse, especially for customer information files (CIF)
- Data are updated frequently throughout the course of business operations
- “Short-term memory” – stores only very recent information
- Consolidates data from multiple source systems and provides a near-real-time, integrated view of volatile, current data
• Enterprise Data Warehouses (EDWs)
- A large-scale data warehouse that is used across the enterprise for decision support
- Integrates data in standard format from many sources
Definition
Three main types of Data Warehouses and their definitions:
Term
• Data about data
• Describes the contents of a data warehouse (structure, meaning, syntax and the manner of its use)
Definition
The definition of Metadata and be able to identify examples:
Term
• Data Sources
- Transactional data (OLTP), web logs, external data (ex: census data), “legacy” systems (reference to outdated computer systems), etc.
• ETL (Extraction, Transformation, Load) Process
- Data are extracted from external data sources using custom ETL software, maintained in a staging area where transformed, cleansed, and integrated, then loaded into the Data Warehouse and/or data marts
• Comprehensive Database
- The Enterprise Data Warehouse (EDW) used to support all decision analyses
• Metadata
- Maintained so that it can be used by IT personnel and users
- Includes software programs with rules for organizing data that can be indexed and searched
• Middleware Tools
- Enable access to the data warehouse
- Front-end applications that users use to interact with data
o Data mining, queries, OLAP, predictive analyses, reporting and visualization tools
Definition
The major components of the Data Warehouse process and describe those components:
Term
• Three-Tiered Architecture
Definition
- Client Workstation (Tier 1) – allows the end user to request application functions which, in turn, requests data content
- Application Server (Tier 2) – responsible for execution of programs
- Database Server (Tier 3) – houses the database or data warehouse
- Advantages:
o Separates the application and database functions, allowing for greater capacity and performance of the respective servers
- Disadvantages:
o Increased cost due to more hardware requirements
Term
• Two-Tiered Architecture
Definition
- Client Workstation (Tier 1) – allows end-user to request both the application functions and data content from one server
- Application and Database Server (Tier 2) - run both on the same server (hardware platform)
- Advantages:
o More economical than three-tiered architecture
- Disadvantages:
o Can have performance problems for large data warehouses using data-intensive applications
Term
• Server
Definition
- Computer hardware that provides a specific service used by other companies
Term
• Application Server
Definition
- Computer hardware responsible for the efficient execution of procedures (programs)
Term
• Database Server
Definition
- Holds the database or data warehouse
- “Back-end”
Term
• Client Software
Definition
- Allows users to request a server’s content or function
- “Front-end”
Term
• Which database management system (DBMS) should be used?
- Most data warehouses are built using relational database management systems (RDBMS).
• Will parallel processing and/or partitioning be used?
- Parallel processing enables multiple CPUs to process data warehouse query requests simultaneously and provides scalability. Data warehouse designers need to decide whether the database tables will be partitioned (i.e., split into smaller tables) for access efficiency and what the criteria will be.
• Will data migration tools be used to load the data warehouse?
- Depending on the diversity and location of the data assets, migration may be relatively simple procedure or a months-long project. The results of a thorough assessment of the existing data assets should be used to determine whether to use migration tools, and if so, what capabilities to seek in those commercial tools.
• What tools will be used to support data retrieval and analysis?
- A decision has to be made on (i) developing the migration tools in-house, (ii) purchasing them from a third-party provider, or (iii) using the ones provided with the data warehouse system. Overly complex, real-time migrations warrant specialized third-party ETL tools.
Definition
The issues considered when deciding on the architecture to use:
Term
• Independent Data Marts
- Operate independently of each other to serve for the needs of individual organizational units
- Simplest and least costly
• Data Mart Bus
- Individual marts are linked to each other via some kind of middleware
- Consistency
- Kimball Approach
• Hub-and-Spoke (Corporate Information Factory)
- Includes a centralized data warehouse and several dependent data marts
- Inmon Approach
• Centralized Data Warehouse
- A gigantic enterprise data warehouse that serves for the needs of all organizational units
• Federated
- Involves integrating disparate systems
Definition
Alternative data warehouse architectures and their basic descriptions
Term
• Information interdependence between organizational units
• Upper management’s information needs
• Urgency of need for a data warehouse
• Nature of end-user tasks
• Constraints on resources
• Strategic view of the data warehouse prior to implementation
• Compatibility with existing systems
• Perceived ability of the in-house IT staff
• Technical issues
• Social/political factors
Definition
Ten factors that potentially affect the architecture selection decision
Term
architectures obtain similar acceptance ratings, while independent data marts were believed to be a poor solution, followed by the federate architecture
Definition
According to Ariyachandra and Watson (2006), Data Mart Bus, Hub-and-Spoke, and Centralized (EDW)
Term
central data warehouse architecture
Definition
TeraData Corporation supports the
Term
• Data Access
- The ability to access and extract data from any data source
• Data Federation
- The integration of business views across multiple data stores
- Exists when one application/one user interface is able to treat multiple data stores/sources as one entity
• Change Capture
- The process of capturing changes made at the data source and applying them throughout the entire enterprise
- Ensures data synchronicity (one version of the truth)
Definition
Three major data integration processes
Term
• A way of ensuring and maintaining data quality
• Extraction
- Reading data from one or more databases
• Transformation
- Converting data from their original form to whatever form the data warehouse needs
- Includes cleansing of the data to remove as many errors as possible, splitting up fields, recording, creating new variables with formulae, eliminating duplicates
- Transformation rules/procedures can be saved as metadata
• Load
- Putting the transformed data into the data warehouse
Definition
Extraction, Transformation, Load (ETL) Process
Term
• Define the plan (business objectives and strategies)
- Where the company wants to go and why, how to get there, and what it will do once it gets there
• Gather support from managers and end-users
• Set reasonable time frames and budgets
• Manage expectations
Definition
What must a company do to ensure a successful data warehouse implementation process?
Term
• Financial strength
• How well it links to EPR (Enterprise Resource Planning) systems
- EPR: a computer system used to manage internal and external resources, including tangible assets, financials, materials and human resources
• Qualified consultants
• Market share
• Industry experience
• Established partnerships
• Product functionality
• Customer references
Definition
Criteria for selecting a data warehouse vendor
Term
enterprise data warehouse
Definition
The Inmon model, the Kimball model, and that both ultimately result in an
Term
• The Inmon Model: The EDW Approach
Definition
• The Inmon Model: The EDW Approach
- Created by Bill Inmon (father of DW)
- “top-down approach”
o The enterprise as one data warehouse and data marts are created from the data warehouse
Term
• The Kimball Model: The Data Mart Approach
Definition
- “bottom-up approach”
o The data warehouse is an aggregate of all data marts across the enterprise
o The data warehouse is built one data mart at a time
- A data mart is subject-oriented and focuses on the departmental level
- “Plan big, build small”
Term
• Companies can completely outsource all DW efforts and hire a company that provides a hosted data warehouse. That company develops and maintains the data warehouse.
• There are however security and privacy concerns
Definition
Additional data warehouse development considerations
Term
The Star Schema
Definition
• Dimensional modeling design
- Dimensional modeling is a retrieval-based system that allows for querying data tables, usually in terms of location, time period, etc.
• Consists of a central FACT table, surrounded by several DIMENSION tables
- FACT table
o Data to be analyzed
o Contains foreign keys (FK) to be linked to the dimension tables
- DIMENSION tables
o Contain classification information used to define how your facts/data are to be aggregated or summarized (by location, time period, store, type of product, etc.)
o One-to-many relationships with rows in the FACT table
Term
• Slice
- A subset of a multidimensional array corresponding to a single value on ONE of the dimensions
- Usually a two-dimensional representation of a 3-dimensional cube
• Dice
- A slice on more than two dimensions (slicing the slice of a slice)
• Drill Down/Up
- Technique where the user navigates among levels of data ranging from the most summarized (up) to the most detailed (down)
• Roll Up
- Involves computing totals across all levels of a dimension (or multiple dimensions)
• Pivot
- Changing the dimensional orientation of a cube (ex: rotating a cube)
Definition
Basic OLAP operations
Term
Granularity
Definition
• The highest level of detail
• Example:
- Sales by region, vs. sales by state, vs. sales by store, vs. sales by department, vs. sales by sales rep
Term
• Starting with the wrong sponsorship chain
• Setting expectations that you cannot meet
• Engaging in politically naïve behavior
• Loading the data warehouse with information just because it is available
• Believing that data warehousing database design is the same as transactional database design
• Choosing a data warehouse manager who is technology oriented rather than user oriented
• Focusing on traditional internal record-oriented data and ignoring the value of external data and of text, images, and, perhaps, sound and video
• Delivering data with overlapping and confusing definitions
• Believing promises of performance, capacity, and scalability
• Believing that your problems are over when the data warehouse is up and running
• Focusing on ad hoc data mining and periodic reporting instead of alerts
Definition
Data warehouse implementation issues
Term
Scalability
Definition
• The ability to accommodate increases in activity/volume without major changes to the process efficiency
Term
• Real-time (Active) Data Warehousing
- The process of loading and providing data via the data warehouse as they become available
- Data used for decision making data are updated on an ongoing basis as business transactions occur
• Evolutionary Process of Increased DW Requirements
- The data warehouse reports what happened in terms of reports
o Ex: daily sales report
- DW allows for analyzing WHY something happened
o Ex: sales dropped because of defective billiards tables
- As business needs increase, decisions may be made to collect more data that allow for predicting WHAT WILL HAPPEN
- Technology and systems are put in place to allow for answering “what IS happening NOW”
Definition
Description of real-time (active) data warehousing and the evolutionary process of the increased DW requirements
Term
• Maintains the data warehouse
• Skills:
- Familiarity with high-performance hardware, software and networking technologies, since a data warehouse is based on those
- Solid business insight, to understand purpose of the DW and its business justification
- Familiarity with business decision-making processes to understand how the DW will be used
- Excellent communication skills, to communicate with the rest of the organization
Definition
Definition and skills of the data warehouse administrator (DWA)
Term
• Establishing effective corporate and security policies and procedures
• Implementing logical security procedures and techniques to restrict access, including user authentication, access controls, and encryption
• Limiting physical access to the data center environment
• Establishing an effective internal control review process for security and privacy
Definition
Four main areas to consider when ensuring effective security of a data warehouse
Term
• Problem: as their business grew, DirecTV faced the challenge of accommodating such a large data volume and rapidly changing market conditions. Management wanted reports that could be used for measuring and maintaining customer service, attracting new customers, and preventing customer churn. Also, they wanted to reduce the resource load that its current data management system imposed on its CPUs. They implemented a data warehouse from which information had to be pulled from every night, this process was very long and was straining the system.
• Solution: The companies’ main two goals were to send fresh data to the call center in a very small period of time (less than 15mins), and simplify changed data capture to reduce the amount of maintenance. They decided to implement the GoldenGate integration system.
• Results: It allowed measuring the churn rate in real time, so they used that data to target specific customers in order to reduce the churn rate. They started contacting customers who had just discontinued their service and get them back, also they started sales campaigns that could target specific customers for retention and prioritize them for special offers. They also received information on customer service calls, in order to manage the service and make it more efficient. Real-time call-center reports were used by management to compare daily call volumes.
Definition
DirecTV, Opening Vignette:
Term
• Problem: old system which did not keep up with user demand, consisted of access and excel, manual process of analyses and trending, time was long, error rate was high and low accuracy
• Solution: Microsoft BI –an end-to-end solution, focused on data and analysis of data, user now has instant response and they can pull data down out of the cubes and data warehouse, everybody in the company was accustomed to Excel and this is a common user interface, gives them one version of the truth, delivering the right information to the right people
• Benefits: flexible and self-service, information is more accurate, saves time, saves money (“Big Picture” of analyzed data), helped their bottom line
Definition
Premier Bank Card (hamburger)
Term
• Problem: many calls to complete rentals, needed better customer service, a lot of info from insurance
• Solution: made it faster, single version of the truth, all information together
Definition
Enterprise Rental Video (DW is built by Teradata Corp)
Term
• Problem: challenge to grow into an integrated, solution-oriented business structure with a global focus
• Solution: Teradata EDW system
• Results: EDW drove huge organizational and process changes, reduced cost, improved process efficiency and used to drive growth-focused goals for the global organization
Definition
Enterprise Data Warehouse Delivers Cost Savings and Process Efficiencies for NCR
Term
• Problem: $60 million lost, traditional banking system
• Solution: Transformed from a traditional banking system to one that was centered on CRM, by using the VISION data warehouse, which stores information about customer behavior, products they used, buying preferences.
• Results: VISION provided the information about profitable and non profitable customers, retention strategies, lower-cost distribution channels, strategies to expand customer relationships, redesigned information flows. First American achieved a revolutionary change, moving itself into the “sweet 16” of financial corporation services.
Definition
First American Corporation:
Term
• Problem: improve consistency, transparency and accessibility of management information and BI
• Solution: came up with Enterprise Data Warehouse, very complex
• Results: consistent, easier, faster, more flexible reporting, improved ability to respond intelligently to new business opportunities
Definition
BP Lubricants Achieves BIGS Success (Case 2.3), page 46.
Term
• Problem: Competitive pressures and consumer demand, coca cola needed to make its vending machines more profitable
• Solution: HCCBD data warehouse and analytical software implemented by Teradata, collects near-real time data from vending machines (viewed as a store). Put on a wireless network instead of using a modem, to gather near0real time point of sales (POS) data. → Forecasted demand and identified problems quickly
• Results: Total sales increased by 10%, costs reduces by 46%, each salesperson was able to service up to 42% more vending machines.
Definition
Hokuriku Coca-Cola Bottling Company:
Term
• Problem: There was a need for analytical data in the company that led to the creation of many data marts. Data silos were very expensive to design and maintain.
• Solution: Hewlett-Packard planned to consolidate its 762 data marts around the world into one single EDW. In order to gain a superior sense of its own business and to determine how best to serve its customers.
• Results: in 2006 consolidated the data marts into the new data warehouse, all the disparate data marts will ultimately be eliminated.
Definition
Hewlett-Packard Company (case 2.5)
Term
• Problem: insurance company grew and did not have an integrated data management system
• Solution: a developed standardized into assets in EDW / AXIS using best breed approach
• Results: Hob-and-spoke architecture (picture)
Definition
A Large Insurance Company Integrates Its Enterprise Data with Axis (Case 2.6)
Term
• Problem: system needed to be updated to address latency issues
• Solution: CDW refreshed in a near real time
• Results: sales and marketing campaigns developed in minutes, faster decision making about specific customers and customer classes
Definition
Egg Pic Fries the Competition in Near Real Time (Case 2.7), page 66.
Term
• Problem: In 1995 continental was in deep financial trouble. It had filed bankruptcy twice and was going to file the third one. This reflected on its performance, low percentage of on-time departures, frequent baggage arrival problem, and customers turned away due to overbooking (in 2006 was the 5th largest airline in the United States
• Solution: 1994 Gordon Bethune became CEO initiated the Go Forward Plan 4 interrelated parts to be implemented simultaneously; in order to improve customers valued performance and better understanding their needs and perceptions. 1999 decided to integrate everything into one single data warehouse (IT, revenue, and operational data sources) Continental moves real-time data (to-minute, hourly) about customers, reservations, and operations includes: revenue, customer relationship, crew operations and payroll, security and fraud, flight operations)
• Results: Continental became a leader in the real-time BI; eliminated$7million in fraud, reduced costs by $41 million, with a $30 million investment in hardware over 6 years, continental had reached over $500 million in increased revenues and reduced costs.
Definition
Continental Airlines:
Supporting users have an ad free experience!