Database Management Flashcards
69 cards | Graduate | 05/03/2010

Cards

Term

 

 

 

Define what a transaction is

Definition

A transaction is a logical unit of work: a sequence of actions, carried out by a single user or process, that constitutes some well-defined business activity.

Term

Define each of the ACID properties of transactions

 

Atomicity

Consistency

Independence/Isolation

Durability

Definition

Atomicity - A transaction MAY NOT BE subdivided. ALL steps that are part of a transaction are completed, or none are. "ALL OR NOTHING"

Consistency - A transaction must transform the database from one valid state to another (or abort). "VALID STATE TO ANOTHER"

Independence/Isolation - A transaction's behavior is not affected by other transactions running concurrently. (Partial results of transactions in progress are not visible to other transactions.) "BY ITSELF. NOT SEEN BY OTHERS UNTIL DONE"

Durability - Once a transaction is committed, its results should survive, even if there is a subsequent system failure. "IT LASTS AFTER FAILURE IF COMMITTED"

Term

Describe mechanisms available in the DBMS for preserving each of the ACID properties.

Definition
Atomicity - Commit and rollback. Either a transaction is committed or it is rolled back; there is no middle ground.
Consistency - Group multiple SQL statements together into a single atomic transaction. Rely on referential integrity, with cascading updates and denial of permission to delete.
Isolation - Locking, i.e., withhold shared resources from other transactions for the duration of the transaction. There are many types of locks: implicit and explicit locks are set by the DBMS or by user command, respectively; locks may also be exclusive or shared.
Durability - Database recovery. A recovery manager subsystem of the DBMS uses transaction logs and checkpoint facilities to restore what was lost due to system failure.
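
As a minimal sketch of how commit and rollback bound a unit of work, assume a hypothetical account table; the exact keyword for starting a transaction (BEGIN vs. START TRANSACTION) varies by DBMS:

    -- Transfer funds: both updates succeed together or not at all (atomicity).
    BEGIN TRANSACTION;

    UPDATE account SET balance = balance - 100 WHERE acct_id = 1;
    UPDATE account SET balance = balance + 100 WHERE acct_id = 2;

    -- Make the changes permanent (durability)...
    COMMIT;
    -- ...or, if anything went wrong, undo both updates:
    -- ROLLBACK;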
 
Term

Analyze the core issues with the lost-update problem, the uncommitted dependency problem, and the inconsistent analysis problem.  Provide a solution to each.

 

FYI - all three of these are known as concurrency problems, which arise when more than one transaction is executed at the same time.

 

answers - pg 575 and 588

Definition
Lost-update problem - Core issue: An apparently successfully completed update operation by one user can be overridden by another user.
Solution: Exclusive locking (2PL - two-phase locking)

Uncommitted dependency problem - Core issue: (also known as a "dirty read") occurs when one transaction is allowed to see the intermediate results of another transaction before it has committed.
Solution: Exclusive locking (2PL - two-phase locking)

Inconsistent analysis problem - Core issue: occurs when a transaction reads several values from the database, but a second transaction updates some of them during the execution of the first. For example, a transaction that is summarizing data in a database will obtain inaccurate results if, while it is executing, other transactions are updating the database.
Solution: Exclusive locking (2PL - two-phase locking)
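
As an illustration of the exclusive-locking solution to the lost-update problem, here is a hedged sketch using SELECT ... FOR UPDATE on a hypothetical stock table (supported by many, though not all, DBMSs):

    -- T1 locks the row it intends to change, so a concurrent T2 cannot
    -- read the same quantity and later overwrite T1's update.
    BEGIN TRANSACTION;

    SELECT qty FROM stock WHERE item_id = 42 FOR UPDATE;  -- exclusive row lock
    UPDATE stock SET qty = qty - 5 WHERE item_id = 42;

    COMMIT;  -- under 2PL the lock is held until commit, then released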
Term

 

 

 

 

Describe the role played by the concept ‘serializability’ in managing concurrent execution of transactions.

 

"I" refers to the I in ACID

Definition
Independence/Isolation can be guaranteed if one transaction commits before the next one begins. (They execute IN SERIES.)

Serializability guarantees that concurrently executing transactions will leave the database in the same state as though they had executed IN SERIES.
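
Most DBMSs expose this through isolation levels; a minimal sketch using the standard SQL statement (where it is placed relative to the start of the transaction, and the default level, vary by product):

    -- Ask the DBMS to schedule this transaction so the final result is the
    -- same as some serial (one-after-another) execution.
    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

    BEGIN TRANSACTION;
    SELECT SUM(balance) FROM account;  -- protected from inconsistent analysis
    COMMIT;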

Term

 

 

 

 

Compare optimistic and pessimistic concurrency control schemes.

Definition
Pessimistic schemes seek to prevent transactions from conflicting with each other, i.e., more locking. "Lock It Up!!!"


Optimistic schemes permit transactions to execute, checking for conflict only at commit time. (This results in some aborts or reruns.)
Term

 

 

 

 

 

Contrast shared and exclusive locks

Definition
Shared locks allow you to read an item but not update it.

Exclusive locks allow you to read and update the item.
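
A hedged illustration using explicit table-level locks; the LOCK TABLE syntax below follows Oracle/PostgreSQL conventions and is not part of every DBMS:

    -- Shared lock: many transactions may read at once, but no one may write.
    LOCK TABLE account IN SHARE MODE;
    SELECT * FROM account WHERE branch = 'North';

    -- Exclusive lock: this transaction alone may read and update the table.
    LOCK TABLE account IN EXCLUSIVE MODE;
    UPDATE account SET balance = balance * 1.01;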
Term

Distinguish between different types of failure as described in the lectures or text.  Identify:

  • what is lost, and
  • what is needed to recover from each.

FYI - this seems like a really good test question.

Questionable parts are marked with "??".

Definition
Local failure - A single transaction fails. The transaction is lost. To recover - redo the transaction??
System failure - Main memory is lost. We lose in-progress transactions and their data, plus RAM buffers which may contain data from committed transactions. The challenge is to recover committed transactions (durability) and roll back uncommitted transactions. You need a Recovery Manager, transaction logs and checkpoint facilities to survive a system failure.
Media failure - Secondary storage is lost (drive failure, etc.). Recovery is the same as for system failure, plus full backups (a copy of the entire database), incremental backups (copies of the parts of the database that have changed) and partial backups (copies of individual pieces).
Incorrect data entry - The transaction will fail. To recover - redo the transaction??
Term

 

 

 

 

It is the role of the recovery manager to guarantee two of the four ACID properties of transactions:


which two?

Definition

If a failure occurs between writing to the buffers and flushing the buffers to secondary storage, the recovery manager must determine the status of the transaction that performed the write at the time of failure. If the transaction had issued its commit, then to ensure durability the recovery manager would have to redo that transaction's updates to the database (also known as rollforward). On the other hand, if the transaction had not committed at the time of failure, then the recovery manager would have to undo (rollback) any effects of that transaction on the database to guarantee transaction atomicity. So the two properties it guarantees are atomicity and durability.

 

Term

 

 

 

Checkpoint

Definition

The information in the log file is used to recover from a database failure. One difficulty with this scheme is that when a failure occurs we may not know how far back in the log to search and we may end up redoing transactions that have been safely written to the database. To limit the amount of searching and subsequent processing that we need to carry out on the log file, we can use a technique called checkpointing.

 

Checkpoint: The point of synchronization between the database and the transaction log file. All buffers are force-written to secondary storage.

 

Checkpoints are scheduled at predetermined intervals and involve the following operations:
  • writing all log records in main memory to secondary storage;
  • writing the modified blocks in the database buffers to secondary storage;
  • writing a checkpoint record to the log file. This record contains the identifiers of all transactions that are active at the time of the checkpoint.


Term

 

 

 

Backup

Definition

The DBMS should provide a mechanism to allow backup copies of the database  to be made at regular intervals without necessarily having to stop the system first.

 

  • The backup copy of the database can be used in the event that the database has been damaged or destroyed.
  • A backup can be a complete copy of the entire database or an incremental backup, consisting only of modifications made since the last complete or incremental backup.
Term

 

 

 

Volatile Storage

Definition

 

 

Volatile storage: storage whose contents can be destroyed by a failure. Main memory is volatile storage that usually does not survive system crashes.

Term

 

 

 

 

 

Non Volatile Storage

Definition

 

Non-volatile storage: storage that is kept separate from the main system (disks or tape) and therefore is not destroyed by a system crash. Also referred to as secondary storage.

Term

 

 

 

 


Transaction Log

Definition

Transaction log: a special file called a log (or journal) that contains information about all updates to the database. The log may contain the following data:

• Transaction records, containing:

  - transaction identifier;
  - type of log record (transaction start, insert, update, delete, abort, commit);
  - identifier of the data item affected by the database action (insert, delete, and update operations);
  - before-image of the data item, that is, its value before the change (update and delete operations only);
  - after-image of the data item, that is, its value after the change (insert and update operations only);
  - log management information, such as a pointer to the previous and next log records for that transaction (all operations).

The log is also used for purposes other than recovery (for example, for performance monitoring and auditing).
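
Purely as an illustration of the record layout described above (the real log is an internal DBMS file, not a user table), here is a hypothetical relational rendering:

    CREATE TABLE transaction_log (
        log_seq      BIGINT PRIMARY KEY,  -- position of the record in the log
        txn_id       INTEGER,             -- transaction identifier
        record_type  VARCHAR(10),         -- START, INSERT, UPDATE, DELETE, ABORT, COMMIT
        data_item    VARCHAR(100),        -- identifier of the data item affected
        before_image VARCHAR(4000),       -- value before change (update/delete only)
        after_image  VARCHAR(4000),       -- value after change (insert/update only)
        prev_seq     BIGINT,              -- previous log record for this transaction
        next_seq     BIGINT               -- next log record for this transaction
    );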

Term

 

 

 

 

 

Recovery Manager

Definition

 Allows the system to restore the database to a consistent state following a failure.

 

It is the role of the recovery manager to guarantee two of the four ACID properties of transactions, namely atomicity and durability, in the presence of failures. The recovery manager has to ensure that, on recovery from failure, either all the effects of a given transaction are permanently recorded in the database or none of them are.

Term

 

 

 

 

 

Roll forward (redo)
Definition

If a failure occurs between writing to the buffers and flushing the buffers to secondary storage, the recovery manager must determine the status of the transaction that performed the write at the time of failure.


If the transaction had issued its commit, then to ensure durability the recovery manager would have to redo that transaction's updates to the database (also known as rollforward).


  • It all depends on whether or not the transaction had issued its commit.
Term

 

 

 

 

 

Rollback (undo)

Definition
If the transaction had not committed at the time of failure, then the recovery manager would have to undo (rollback) any effects of that transaction on the database to guarantee transaction atomicity.
Term

 

 

 

Describe what must be done to recover a given transaction, given information about when it started and what its state was at the time of failure.

Definition
  1. Find the most recent backup.
  2. Find the most recent checkpoint up to the time of failure.
  3. Roll back any transactions in process at the time of the failure, to preserve atomicity.
  4. Roll forward any transactions that had committed by the time of failure, to preserve durability.
Term

 

 

 


Significance of ODBC

 

Definition

Significance of ODBC: There are problems with connecting to remote databases.

  • How do you identify and establish connection?
  • How do client and server ensure that their messages are correctly interpreted?
  • What if you need to change the DBMS? Will the client and the server still connect?

ODBC provides a standard API for accessing data in both relational and non-relational DBMSs. It shields the user from the complexities of the underlying database and the communications software needed to access the data.


In short, it provides a common platform that hides that complexity from the user.

Term

 

 

 

 

 

Architecture of ODBC 1

Definition
[image]
Term

 

 

 

 

 

Architecture of ODBC 2

Definition
[image]
Term

Compare and contrast data warehouses with on-line transaction processing (OLTP) systems. (I)


FYI: This looks like an excellent test question.

 

Definition

OLTP captures data and stores it in a database. It happens everywhere data originates (POS systems, ticket reservations, etc.). It consists of highly repetitive actions (shoppers in a grocery line) and compact transactions. The system must be fast, accurate, available, and reliable. OLTP systems are process-oriented and nicely normalized, with many tables. However, they suffer from:

  • SQL formulation issues
  • difficulty with time-based queries
  • aggregation performance problems

Data warehouses are subject-oriented. They bring together all the information on a single subject, so that decision makers can recognize patterns and support decision making.

Term

Compare and contrast data warehouses with on-line transaction processing (OLTP) systems. (II)

 

Excellent test question, part II.

Definition
[image]
Term

 

 

 


Identify shortcomings of OLTP systems in light of the requirements of decision-support systems.

Definition
  • Supporting decision support on OLTP systems takes resources away from OLTP. (The system is busy keeping up with the day's data.)
  • OLTP systems do not present an integrated/consistent view of the business. (They usually show only one piece of the pie.)
  • OLTP systems do not store data in a way that reflects how decision-makers think about data. (A day's work in time order.)
Term

 

 

Define ‘data warehouse’.   Be able to explain each of the adjectives in Bill Inmon’s classic definition of a data warehouse.

Definition

"The queriable source of data in the enterprise".

"A subject-oriented, integrated, time-variant, and non-updatable collection of data in support of management's decision making processes and business intelligence."

  • Subject-oriented - Organized around subjects (customer, product), not processes.
  • Integrated - Data originates in several source systems.
  • Time-variant - Time is captured explicitly in a time dimension.
  • Non-updatable collection of data - Users do not update values.
Term

 

 

 

 

Identify some of the risks/potential problems with data warehouses and data warehouse projects.

Definition

 

Enterprise systems touch a lot of different kinds of data. In addition, turf wars can start between the managers of different departments - whose data is it, anyway? It can be nearly impossible to build a single, monolithic, all-encompassing data warehouse for an organization. There is also homogenization of the data and end-user demand to deal with. The data staging phase alone accounts for 60-80% of the effort involved in creating a data warehouse.

Term

 

 

 

Explain what a conformed dimension is and the role it plays in data warehouse design.

Definition

Conformed dimension: “A dimension that means the same thing with every possible fact table to which it can be joined.”

 

To use a conformed dimension is to define a dimension so that it can be shared by every data mart that uses the same subject area. If the dimension is customer, then we want one definition of customer.

Term

 

 

 

 

 

Define what a star schema is.

Definition

 

 

Star schema: A dimensional data model that has a fact table in the center, surrounded by denormalized dimension tables.

 

The star schema exploits the characteristics of factual data such that facts are generated by events that occurred in the past, and are unlikely to change, regardless of how they are analyzed.
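
A minimal star-schema sketch with hypothetical table and column names - one fact table at a single grain, surrounded by denormalized dimension tables:

    -- Denormalized dimension tables.
    CREATE TABLE dim_time (
        time_key  INTEGER PRIMARY KEY,
        full_date DATE,
        month     VARCHAR(10),
        quarter   CHAR(2),
        year      INTEGER
    );

    CREATE TABLE dim_property (
        property_key  INTEGER PRIMARY KEY,
        property_type VARCHAR(20),
        city          VARCHAR(40),
        region        VARCHAR(40)
    );

    -- Central fact table: one row per property sale.
    CREATE TABLE fact_property_sale (
        time_key     INTEGER REFERENCES dim_time,
        property_key INTEGER REFERENCES dim_property,
        sale_price   DECIMAL(12,2),   -- numeric, additive fact
        PRIMARY KEY (time_key, property_key)
    );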

Term

 

 

 

 

 

Fact tables

Definition

Fact tables: A fact is “an observation in the marketplace”

  • Each occurrence of a fact table represents an indicator of the business
  • Non-key attributes are functionally dependent on the entire (composite) key
  • Fact tables represent a many-to-many relationship between dimension tables.
  • A single ER diagram, when converted to a dimensional model, may have several fact tables
  • Within a data warehouse, a query typically involves only one fact table at a time.
Term

 

 

 

 

 

Dimension tables

Definition

Dimension tables contain highly correlated attributes, which describe properties of “business things”

  • A principal source of constraints in a query
  • Dimension tables do not need to be normalized. (denormalized)
  • Use an artificial key for dimensions, and never rely on the keys from the legacy systems.
  • Dimension tables usually contain one or more hierarchies (e.g. Brand--sub-category--category)
Term

 

 

 

 

 

The time dimension

Definition

The time dimension: A distinctive feature of data warehouses

  • Each occurrence is a single day
  • The attributes make aggregating by any portion of a calendar very easy
  • Almost all data warehouses have a time dimension
  • Applications need not keep track of the calendar.
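
Building on the hypothetical star-schema sketch above, this shows why an explicit time dimension makes calendar aggregation easy - no date arithmetic is needed in the application:

    -- Aggregate sales by any portion of the calendar simply by grouping on
    -- attributes of the time dimension.
    SELECT t.year, t.quarter, SUM(f.sale_price) AS total_sales
    FROM   fact_property_sale f
    JOIN   dim_time t ON t.time_key = f.time_key
    GROUP  BY t.year, t.quarter;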
Term

 

 

 

 

 

Design a data warehouse using the steps discussed in class or in your text.

Definition

Pre-design - gather informal requirements as a set of questions; this set of questions does not freeze at this point.

1. Understand the initial set of questions the DW should be able to answer (identify marts and dimensions, conform dimensions).
2. Choose the mart - just start with one.
3. Declare the (whole) grain of the fact table - for the data mart, we will develop a selection of star schemas; at the heart of each star schema lies a fact table, and each fact table has a particular grain.
4. Choose the dimensions (dimension tables).
5. Choose the facts (fact tables).
Term

 

 

 

Recognize violations of the principle that all facts in a fact table must be at the same level of granularity.

Definition
In a line-item fact table, each of the facts should describe a part of the transaction. What if you were to include the order total? You'd get a lot of redundancy, and if you were to aggregate it, you would get a meaningless answer.

Facts must have the same grain as the fact table; otherwise it is too easy to make calculations on the wrong column. Item total and order total should never be in the same table. Facts must match the grain of the fact table.
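
A sketch of the resulting double-counting, assuming a hypothetical fact_order_line table that wrongly carries the order-level total on every line item:

    -- order_total is at order grain, not line-item grain: it repeats on every
    -- line of the order, so summing it inflates revenue by the number of lines.
    SELECT SUM(item_total)  AS correct_revenue,      -- matches the table's grain
           SUM(order_total) AS meaningless_revenue   -- double (triple, ...) counted
    FROM   fact_order_line;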
Term

 

 

 


Compare and contrast dimensional modeling with entity-relationship modeling.

Definition

An ER diagram is used to build a typical OLTP database. ER models describe relationships among entities and remove redundancy in data. Since OLTP processes need to be fast, available, and accurate, this is very beneficial.


Dimensional models are typically used to build data marts or data warehouses. A dimensional model supports intuitive, high-performance retrieval of data, whereas OLTP supports intuitive organization and storage of data. Dimensional models support ad hoc queries from management, supporting decision-making processes; OLTP efficiently stores the information on which the decision is made. The key to understanding the relationship between dimensional models and ER models is that a single ER model normally decomposes into multiple dimensional models.

Term

 

 

 

 

 

Describe what ETL tools do

Definition
  • Extract: Extract data from transaction systems to staging area
  • Transform: Remove duplicates, conform codes, fill in missing data, prune unnecessary data, reconcile data types & formats, calculate new values, etc. (This is the most difficult stage.)
  • Load: Data moved from staging area to data warehouse


ETL Issues - heterogeneity, complexity, time, scale
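
A toy sketch of the transform-and-load steps, assuming hypothetical staging and warehouse tables (real ETL uses dedicated tools rather than a single statement):

    -- Move cleansed, deduplicated rows from the staging area into the warehouse,
    -- conforming a code and filling in a missing value along the way.
    INSERT INTO dw_customer (customer_key, name, gender_code, country)
    SELECT DISTINCT
           s.customer_id,
           s.name,
           CASE WHEN s.gender IN ('M', 'MALE')   THEN 'M'
                WHEN s.gender IN ('F', 'FEMALE') THEN 'F'
                ELSE 'U' END,                    -- conform codes
           COALESCE(s.country, 'UNKNOWN')        -- fill in missing data
    FROM   stg_customer s;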

Term

 

 

 

 

 

Compare and contrast ‘data warehouse’ with ‘data mart’.

 

Definition

Data warehousing: A subject- oriented, integrated, time- variant, and nonvolatile collection of data in support of management’s decision making process.

 

Data mart: A database that contains a subset of corporate data to support the analytical requirements of a particular business unit ( such as the Sales department) or to support users who share the same requirement to analyze a particular business process ( such as property sales).

Term

 

 

 

 

 

Be able to define what we mean by the grain of a fact table.

Definition

The grain of the fact table determines which facts can be used in the dimensional model.

 

All the facts must be expressed at the level implied by the grain. In other words, if the grain of the fact table is an individual property sale, then all the numerical facts must refer to this particular sale.

 

Also, the facts should be numeric and additive.

Term

 

 

 

 

Example of a badly done fact table, with errors called out. Why are these errors bad?

Definition
[image]
Term

 

 

 

 

Explain what a multi-dimensional view of data is and how this applies to the subject of data warehousing and OLAP

Definition

OLAP is technology that uses a multidimensional view of aggregate data to provide quick access to information for the purposes of analysis.


Multidimensional data are facts (numeric measurements such as sales data) and the association of this data with dimensions (such as location, time, branch, property type, etc)


Multi-dimensional structures are often visualized as cubes of data, and cubes within cubes. Every cell represents the indicator for a particular combination of dimensions. Matrices and tables can be used too.

Term

 

 

 

 

Distinguish between the different types of business intelligence (e.g. OLAP, dashboards, etc.)

 

OLAP, analytics, alerts and events, dashboard

Definition

OLAP: multidimensional data analysis technique, advanced database support, easy to use interfaces


Analytics: Techniques to discover patterns and trends in large databases (statistical analysis, data mining, modeling and forecasting); both descriptive and predictive analytics.


Alerts and Events: define event conditions of interest, respond as appropriate (email alerts, initiate process, etc)


Dashboard: provides an "at-a-glance" overview of the key performance indicators of the business, using sophisticated data visualization techniques. Interactive. Designed for top-level decision makers.

Term

 

 

 

Compare and contrast the various ways in which a multi-dimensional view may be represented in a database

 

Relational table, Matrix, 3D data cube, series, lattice

Definition

Relational table - can be used to represent two- or three-dimensional data, but it is rather complicated.


Matrix - a more natural way of representing two-dimensional data.


Three-dimensional data cube - a more natural representation of three-dimensional data. The facts are represented by the cells of the data cube, and each cell is identified by the intersection of the values held by each dimension.


A series of three-dimensional cubes - used to represent n-dimensional data.


A lattice of cuboids - with each cuboid representing a subset of the given dimensions (p. 1205). Does not show dimensional hierarchies.

Term

 

 

 

 

 

MOLAP

 

what's the M stand for?

Definition

 

 

MOLAP (Multidimensional OLAP):

  • Data is loaded into a multidimensional structure.

  • Data is typically pre-aggregated and stored to enhance performance.

  • Only a limited amount of data can be stored and used efficiently.

Term

 

 

 

 

 

ROLAP

 

what's the R stand for?

Definition

 

ROLAP (Relational OLAP):

  • Uses SQL variants to query a traditional relational database.

  • Creates a multidimensional view through a metadata layer.

  • Stores much more data than MOLAP.

  • Some implementations use enhanced SQL servers, and others use a highly denormalized database design to increase performance.

Term

 

 

 

 

 

Hybrid OLAP

 

ROLAP + MOLAP

Definition

Hybrid OLAP:

  • Data is loaded into a relational database (like ROLAP).

  • Aggregations are stored in a multidimensional data structure (like MOLAP).

  • Promotes speed of access as well as access to large volumes of data.

  • Only a limited amount of data can be efficiently maintained in the MOLAP structure.

Term

 

 

 

 

 

Desktop OLAP

 

a desktop is something a client would have

Definition



Desktop OLAP:

  • Stores OLAP data in client-based files and supports multidimensional processing using a client multidimensional engine.

Term

 

 

 

 

 

Roll Up

 

4D to 3D aggregation

Definition

Roll-up:

Performs the aggregation of data either by moving up in a dimensional hierarchy or by dimensional reduction.

(e.g., view four-dimensional sales data as three-dimensional)

(location, time, type, office are possible dimensions)

Term

 

 

 

 

 

Drill Down

 

 

Definition

Drill-down:

Involves revealing the detailed data that forms the aggregated data, either by moving down in the dimensional hierarchy or by dimensional introduction.

(e.g., viewing three-dimensional data as four-dimensional)

Going deeper into the data - finding out more by adding dimensions.

Term

 

 

 

 

 

Slice and Dice

Definition

 

 

Slice - a selection on one dimension of the data (e.g., all of the sales revenue for the property type "Flat").


Dice - a selection on two or more dimensions (e.g., all the sales for type = "Flat" and time = "Q1").
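
Expressed against the hypothetical star schema sketched earlier, a slice and a dice are simply restrictions on dimension values:

    -- Slice: restrict one dimension.
    SELECT SUM(f.sale_price) AS flat_sales
    FROM   fact_property_sale f
    JOIN   dim_property p ON p.property_key = f.property_key
    WHERE  p.property_type = 'Flat';

    -- Dice: restrict two or more dimensions.
    SELECT SUM(f.sale_price) AS flat_q1_sales
    FROM   fact_property_sale f
    JOIN   dim_property p ON p.property_key = f.property_key
    JOIN   dim_time     t ON t.time_key     = f.time_key
    WHERE  p.property_type = 'Flat' AND t.quarter = 'Q1';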

Term

 

 

 

 

Describe some extensions to the SQL standard to accommodate OLAP requirements.

Definition

Extended grouping capabilities:

  • ROLLUP

    - Supports aggregations (SUM, COUNT, MAX, MIN, AVG)
    - Includes multiple levels of subtotals and a grand total
    - Most useful for aggregating up a hierarchy (time, geography)

  • CUBE

    - Takes a group of columns and computes the subtotals for all possible combinations


Extended OLAP operations:

  • Ranking functions

    - RANK() OVER (ORDER BY columnList)
    - DENSE_RANK() OVER (ORDER BY columnList)

  • Windowing calculations

    - Moving averages
    - LAG/LEAD (e.g., compare this year's results against last year's)
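
A hedged sketch of these extensions against the hypothetical fact_property_sale / dim_time / dim_property tables used in earlier examples (ROLLUP, CUBE, and the ranking functions are widely, but not universally, supported):

    -- ROLLUP: subtotals up the time hierarchy plus a grand total.
    SELECT t.year, t.quarter, SUM(f.sale_price) AS total_sales
    FROM   fact_property_sale f
    JOIN   dim_time t ON t.time_key = f.time_key
    GROUP  BY ROLLUP (t.year, t.quarter);

    -- CUBE: subtotals for every combination of the grouped columns.
    SELECT p.property_type, t.year, SUM(f.sale_price) AS total_sales
    FROM   fact_property_sale f
    JOIN   dim_property p ON p.property_key = f.property_key
    JOIN   dim_time     t ON t.time_key     = f.time_key
    GROUP  BY CUBE (p.property_type, t.year);

    -- Ranking: order cities by their total sales.
    SELECT p.city, SUM(f.sale_price) AS city_sales,
           RANK() OVER (ORDER BY SUM(f.sale_price) DESC) AS sales_rank
    FROM   fact_property_sale f
    JOIN   dim_property p ON p.property_key = f.property_key
    GROUP  BY p.city;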

Term

 

 

 

 

Define the problems that XML is designed to solve and explain how XML solves those problems.

 

Likely test question.

Definition
Problems:
  • The Semantic Web
  • Data interchange
  • Semi-structured data

Solutions:
XML adds structure to the content, and all operating systems can read XML. The structure and the meaning of the data are embedded in the data itself. (The lecturer repeats the separation of content, presentation, structure, and behavior as a data-interchange need.)
Term

 

 

 

 

Define semi-structured data and discuss the role it plays in today’s world.

Definition

Semi-structured data: data that may be irregular or incomplete and have a structure that may change rapidly or unpredictably. It has some structure, but the structure may not be rigid, regular, or complete, and in general the data does not conform to a fixed schema. The information that is normally associated with a schema is contained in the data itself.

Semi-structured data has gained importance recently for the following reasons:

1. It may be desirable to treat Web sources like a database, but we cannot constrain these sources within a schema;

2. It may be desirable to have a flexible format for data exchange between disparate databases;

3. The emergence of XML as the standard for data representation and exchange on the Web, and the similarity between XML documents and semi-structured data.
Term

 

 

 

 

Be able to identify violations of the rules that govern the definition of XML tags.

Definition

 

 

Tags must be paired, AND nested tags must not cross each other; i.e., only COMPLETE sets of tags may be nested within each other. You cannot have two start tags followed by two end tags. [image]


If unsure, draw brackets around each start/end tag pair and make sure the brackets don't cross.

 

 

Term

 

 

 

 

Distinguish between the terms well-formed and valid as they apply to XML documents.

 

Do we need some example code for this?

Definition
"A document is well-formed" means it conforms to the general XML syntax conventions.
"A document is valid" means its structure conforms to a specific structure definition (the document's map).

Document Type Definitions (DTDs) are one way to define a valid structure; XML Schema is a second way.

FOR EXAM, know the difference between well-formed documents and valid documents. A document can be well-formed but not valid!! (See the example below.)
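
Since the card asks for example code, here is a hedged illustration with hypothetical element names. The first snippet is well-formed; whether it is also valid depends on the DTD or XML Schema it is checked against (for example, one that requires every <book> to contain a <title>).

Well-formed (every start tag has a matching end tag, and tags nest properly):

    <library>
      <book>
        <title>Database Systems</title>
        <year>2010</year>
      </book>
    </library>

Not well-formed (the <title> and <year> tags cross instead of nesting):

    <book>
      <title>Database Systems <year></title>2010</year>
    </book>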
Term




Explain the purpose of DTD, XSL, XSD (XML Schema) documents in the world of XML.
Definition

DTD - A set of markup declarations that define a document type. DTDs use a syntax that declares precisely which elements and references may appear where in the document, and also declares entities.


XSL - A language for defining how an XML document's data is rendered and how to transform one XML document into another.

XSD (XML Schema Definition) - An XML Schema provides a richer description of the structure of other XML documents.


These are all road maps for the XML documents that reference them; they are used to VALIDATE THE XML.

Term

 

 

 

 

 

Compare and contrast DTD and XML Schema.

Definition

 DTD is the old way of doing things. 

 

It is a separate TEXT document (NOT an XML document), whereas an XML Schema is in fact an XML document.


XML Schema offers a variety of data types; you can define integers and dates. In a DTD the only data types were PCDATA and the like - nothing specific to character vs. number, etc.


XML Schema also offers the ability to put more constraints on the data.

Term

 

 

 

 

 

Compare and contrast ways of storing XML in a database.

Definition

There are two ways to store XML in a database: storing actual XML documents (tricky!) or storing data to be transmitted to another system (more common).


Storing actual documents is tricky because the XML has to be stored as XML, which means that it cannot be translated. There are three ways to accomplish this (see the sketch below):

  • store the XML document in an attribute (a CLOB),
  • store the XML inside a relational DBMS in shredded form (not terribly flexible), or
  • store the XML inside a hierarchical database (IMS, for example).
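
A minimal sketch of the first option, assuming a hypothetical table; CLOB is the generic large-character type (some products instead offer a native XML column type):

    -- Store each XML document whole, as a large character object.
    CREATE TABLE xml_documents (
        doc_id   INTEGER PRIMARY KEY,
        doc_name VARCHAR(100),
        doc_body CLOB               -- the XML document stored as-is
    );

    INSERT INTO xml_documents (doc_id, doc_name, doc_body)
    VALUES (1, 'staff_list', '<staff><member id="S1">Ann</member></staff>');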
Term

 

 

 

 

Compare and contrast the system perspective on data quality and the data consumer perspective.

Definition

System perspective - data conforms to the real world. Intrinsic qualities: complete, unambiguous, correct, meaningful. (CUCM)


Consumer perspective - data is fit for the use of data consumers. Four data quality categories: intrinsic, contextual, representational, accessibility. (ICRA)

Data needs to be readily available and put together in a form that is easy to comprehend and process.

 

Both perspectives are important.

Term

 

 

 

 

Define data lifecycle management and identify some of the key issues/questions with which it is associated.

Definition

The process of managing business data throughout its lifecycle from conception until disposal across different storage media, within the constraints of the business process.


Data lifecycle management issues:

Data availability:
  - Costs to acquire and operate storage devices; performance

Data preservation:
  - Backup and recovery; long-term archiving

Regulatory compliance:
  - Privacy

Retention:

Term

 

 

 

 

Explain the relationship between data governance and such issues as data quality, data integration, regulatory demands on data management.

Definition

 

These are data management issues that data governance resolves (as areas of focus for data governance, they drive the need for it).


Better control of data can help achieve the desired data quality, make data consistent enough to be fed into a data warehouse, and enforce the specific usage and retention of data required by certain regulatory acts.

Term

 

 

 

 

 

Define data governance

Definition

Data governance is an organizational structure that oversees the broad use and usability of data as an enterprise asset.


A system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take which actions with what information, and when, under what circumstances, using what methods.


It is manifested as an executive-level organizational structure that creates and enforces policies and procedures for the business and technical use of data across the entire organization. It involves IT, but is not driven by IT.

Term

 

 

 

 

Articulate the benefits and challenges of Master Data Management

Definition

Benefits:

  - Offers a single version of the truth;

  - Overcomes barriers between units/processes in the organization;

  - Simplifies integration;

  - Ensures data consistency, accuracy, accountability.


Challenges (common mistakes):

  - Treating it as a purely technical initiative;

  - Assuming dirty data is an IT problem;

  - Managing multiple data domains without the right techniques;

  - Underestimating and over-prioritizing costs; underestimating the level of executive sponsorship needed.

Term

 

 

 

 

Explain the impact of regulations like Sarbanes-Oxley on data management

Definition

SOX tightened requirements on how companies form their boards of directors, interact with auditors, and report their financial statements.


Data management is affected since companies now have to certify the accuracy of their financial statements. SOX places increased requirements on the security and auditing of financial data, and this has implications for how data is collected, processed, secured, and reported, both internally and externally to the organization.


Impact on IT: managing metadata (any changes made to the data, who touched the data last).

Term

 

 

 

 

Define metadata

Definition

Metadata captures properties of any IT asset (not just data):

  - Documenting review of specifications and design: dates, individuals, results;
  - Ownership
  - Status
  - Name
  - Type
  - Identity
  - Configuration
  - Quality
  - Dependencies with other IT assets

Term

 

 

 

 

 

Explain the role that metadata plays in data management

Definition

In order to comply with regulations, metadata needs to be captured and managed. Metadata provides the following information:

 

What the data is, What data means

 

Where it comes from, Where it is, Where it goes

 

Who owns it, Who is responsible for it, Who has access to it

 

 

Term

 

 

 

Describe the components of a data warehouse system and trace the flow of data through these components from the time it is generated to the time it is used by the end-user.

Definition
[image]
Term

 

 

 


Identify how and why a Resource Description Framework (RDF) may be used.

Definition

RDF adds meaning - the next step up from XML, and the most generic way to describe a model. It is an XML-based language for describing resources; a resource is anything that has an IDENTITY.

Resource (e.g., file, image, person): anything that has an identity.
Description: metadata about the resource itself (e.g., author, type, creation date).

RDF model (the "triple"):
  - Subject (e.g., customer)
  - Predicate (e.g., makes reservation)
  - Object (e.g., reservation)
Term

 

 

 

 

Define the semantic web and describe the technologies and standards that make it a reality.

Definition

The Semantic Web is an evolving extension of the WWW in which the semantics of information and services on the Web are defined, making it possible for the Web to understand and satisfy the requests of people and machines to use Web content.


Layer 1: XML - adds structure to Web content

Layer 2: XML-based protocols

Layer 3: XML vocabularies - public schemas defining a shared vocabulary

Layer 4: Ontology frameworks

Term

 

 

 

 

Identify some of the major similarities and differences between SQL and XQuery

Definition

Similarities:

  - Both support joins, sorts and other common operations

  - Both can query rows and columns in a table

Differences:

  - SQL can't address part of an XML document

  - SQL can't create new XML

  - SQL doesn't work with sequences of both nodes and values

  - SQL doesn't support XML Schema

  - XQuery doesn't support insert/update/delete.
