Shared Flashcard Set

Details

Data Integrity
IACA Certification - Chapter Six - Data Integrity
36
Criminology
Professional
10/26/2009

Cards

Term

Data Integrity

Definition
Data Integrity is universally critical, because every part of the analytical process that follows depends on the data on which it is based.
Term

Crime Mapping

 

Definition
Crime Mapping is the use of a GIS to perform spatial analysis of crime and police activity.
Term

Geocoding Score

Definition

The Geocoding Score, often called the "hit rate," gauges the level of geocoding success. A score of 80-90% is often considered desirable for law enforcement. That is, 80-90% of the data are represented on the map, and only these data will be analyzed.
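As a rough illustration of the arithmetic behind a hit rate, the sketch below divides the number of matched records by the total submitted for geocoding. The function name and the sample counts are assumptions made for this example, not part of the IACA material.

```python
def geocoding_hit_rate(matched: int, total: int) -> float:
    """Return the percentage of records that were successfully geocoded."""
    if total <= 0:
        raise ValueError("total must be a positive number of records")
    return 100.0 * matched / total


# Hypothetical counts: 850 of 1,000 incident addresses were matched.
rate = geocoding_hit_rate(matched=850, total=1000)
print(f"Hit rate: {rate:.1f}%")
```

With these sample counts the score falls inside the 80-90% band described above; the remaining 150 records would not appear on the map.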

Term

Data Cleaning

(or Data Scrubbing)

Definition
Data Cleaning is the process of correcting data integrity errors: taking tabular data and correcting its mistakes before the data is used for analysis.
Term

Origination Errors

Definition
Origination Errors occur when the data is collected or transcribed.
Term

Management Errors

Definition
Management Errors occur when the data is stored.
Term

Retrieval Errors

Definition
Retrieval Errors occur when we query or download data to analyze it.
Term

"Empties"

Definition
"Empties" are an Origination Error that occurs when an entire data field is null, or empty.
Term

Typographical Errors

(or Typos)

Definition
Typographical Errors are Origination Errors that occur when keystrokes aren't what the operator intends.
Term

Punctuation

Definition
Punctuation is an Origination Error that occurs when punctuation marks are used when they should not be, or excluded when they are expected.
Term

Abbreviations

Definition
Abbreviations are Origination Errors that occur when words are shortened to reduce the time and effort necessary to enter them into a data table.
Term

Omissions

(or Blanks)

Definition
Omissions are Origination Errors that occur when necessary data elements are erased or, more likely, not entered in the first place.
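One way to spot omissions is to scan a table for records where a required field is missing or blank. The sketch below is a minimal example under that assumption; the record layout and field names are hypothetical.

```python
def find_blanks(records, field):
    """Return the indexes of records whose `field` is missing, None, or blank."""
    blanks = []
    for index, record in enumerate(records):
        value = record.get(field)
        if value is None or str(value).strip() == "":
            blanks.append(index)
    return blanks


# Hypothetical incident records; field names are illustrative only.
incidents = [
    {"case": "09-001", "address": "123 E Main St"},
    {"case": "09-002", "address": "   "},
    {"case": "09-003", "address": None},
    {"case": "09-004"},
]
print(find_blanks(incidents, "address"))
```

If every record in the table were returned for a given field, the field as a whole would be an "Empty" rather than a set of individual omissions.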
Term

Alias Errors

Definition
Alias Errors are Origination Errors that occur from the use of names and phrases that make perfect sense to a human being but are useless to a computer.
Term

Malapropisms

(or Malaprops or Mals)

Definition
Malapropisms are Origination Errors that occur when the data is not in the expected format or does not contain the expected information. For example, instead of "123 E Main St," the writer might enter "behind the fence" or "100 yards S/B."
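A simple format check can flag entries like these for review. The sketch below assumes a very narrow address shape (house number, optional one-letter direction, street name, and a suffix from a short list); the pattern and the suffix list are assumptions for illustration, and real address data would need a much broader rule set.

```python
import re

# Assumed shape: "<number> [N|S|E|W] <street name> <suffix>".
ADDRESS_PATTERN = re.compile(
    r"\d+\s+(?:[NSEW]\s+)?[A-Z0-9 ]+?\s+(?:ST|AVE|RD|BLVD|DR|LN)"
)


def looks_like_address(value: str) -> bool:
    """Return True if the whole value matches the assumed address shape."""
    return ADDRESS_PATTERN.fullmatch(value.strip().upper()) is not None


for entry in ["123 E Main St", "behind the fence", "100 yards S/B"]:
    print(entry, "->", looks_like_address(entry))
```

Note that checking only for a leading number would wrongly accept "100 yards S/B", which is why the sketch also requires a recognized suffix.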
Term

Generalizations

Definition
Generalizations are Origination Errors that occur when a common place name or broad location is entered rather than a specific location. A common example is the use of "hundred blocks."
Term

Invalid Entries

Definition
Invalid Entries are Origination Errors that look like correct data, but are not valid. The most common type of invalid entry is the non-address.
Term

Extraneous

Definition
Extraneous entries are Origination Errors that occur when extra data is entered into a field that should contain only limited data. This often happens in address fields, especially in CAD records.
Term

Management Errors

Definition
Management errors arise from how we store our data - usually in the form of computer data files, but not necessarily.
Term

Record Truncations

Definition
Record Truncations are Management Errors that occur when a database or file system can't hold all the data that's been put into it, resulting in some records being deleted or not accepted.
Term

Field Truncations

Definition
Field Truncations are Management Errors that occur when specific fields aren't long enough or detailed enough to hold the information that's placed in them. The most common offenders are simple text fields.
Term

Field Conversion

Definition
Field Conversions are Management Errors that happen when data is changed from one type into another type. This usually occurs when we transfer data from one system into another electronically using some kind of automation. For example, a date field might inadvertently be converted to a numeric field, causing unpredictable changes in the resulting values.
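To illustrate how a date can turn into a number during a transfer, the sketch below converts a date to a day count from a fixed epoch and back. The epoch of 1899-12-30 is an assumption chosen for the example; the actual behavior depends on the systems involved.

```python
from datetime import date, timedelta

# Assumed epoch for the illustration; real systems may use a different one.
EPOCH = date(1899, 12, 30)


def date_to_serial(value: date) -> int:
    """Represent a date as the number of days since EPOCH."""
    return (value - EPOCH).days


def serial_to_date(serial: int) -> date:
    """Recover the date from a day count, assuming the same EPOCH."""
    return EPOCH + timedelta(days=serial)


report_date = date(2009, 10, 26)
serial = date_to_serial(report_date)
print(serial, serial_to_date(serial))
```

The number is only recoverable when the receiving side knows the epoch; if it assumes a different one, every date shifts by the difference between the two.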
Term

Physical

Definition
Physical errors are Management Errors that result from data corruption. They occur when the physical record containing the data is damaged or misplaced.
Term

Retrieval Errors

Definition
Retrieval Errors may occur when the analyst retrieves information. Even when the source data is accurate and reliable, the search, query, or reporting functions used to extract it can introduce problems. In general, retrieval errors take the same forms as management errors.
Term

Data "Chain of Custody"

Definition

A typical data "chain of custody" looks something like this:

  1. Victim
  2. Officer
  3. Records Clerk
  4. Database Administrator
  5. Crime Analyst
Term

Compensate

Definition
The final way to overcome data integrity problems is to compensate for them. This method is by far the weakest and least desirable way to cope with data errors, yet it is commonly used by crime analysts.
Term

Alias Tables

Definition
Probably the most common example of compensating for dirty data is the use of Alias Tables in GIS software. An alias table converts a known erroneous address into a valid, matchable address. The data is still dirty and the damage is not repaired. Alias tables serve only as a band-aid.
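The idea of an alias table can be sketched as a simple lookup from a known bad entry to a matchable address. The table contents and function name below are hypothetical, and GIS packages implement alias tables in their own ways.

```python
# Hypothetical alias table: known non-address entries mapped to valid addresses.
ALIAS_TABLE = {
    "CITY HALL": "100 N MAIN ST",
    "HIGH SCHOOL": "2500 W OAK AVE",
}


def resolve_alias(address: str, alias_table: dict) -> str:
    """Return the aliased address if one exists, else the original value."""
    key = address.strip().upper()
    return alias_table.get(key, address)


print(resolve_alias(" city hall ", ALIAS_TABLE))
print(resolve_alias("123 E Main St", ALIAS_TABLE))
```

The lookup changes only what the geocoder sees; the stored record still reads "city hall", which is why this approach compensates for the error rather than repairing it.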
Term

Fault & Fix

Definition
The two elements of a data cleaning operation are the "fault," the error we're searching for, and the "fix," what we replace it with.
Term

Manual Data Cleaning

Definition
Manual Data Cleaning consists of a human operator searching through records, spotting instances of errors and replacing the faults with valid fixes.
Term

Semi-Automatic Data Cleaning

Definition
Semi-Automatic Data Cleaning consists of the human operator using an automated function to quickly perform an individual operation.
Term

Fully Automatic Data Cleaning

Definition
Fully Automatic Data Cleaning is the strongest approach for most users. This method consists of preparing a list of fault/fix operations ahead of time, which are then followed in order by a completely automated set of search and replace actions.
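A minimal sketch of this approach, assuming the fault/fix list is expressed as regular-expression patterns paired with replacement text. The specific operations shown are illustrative, not a recommended rule set.

```python
import re

# Ordered fault/fix list prepared ahead of time (illustrative entries only).
OPERATIONS = [
    (r"\bSTREET\b", "ST"),
    (r"\bAVENUE\b", "AVE"),
    (r"\s+", " "),  # collapse runs of whitespace
]


def clean(value: str, operations) -> str:
    """Apply every fault/fix operation to the value, in list order."""
    result = value.upper()
    for fault, fix in operations:
        result = re.sub(fault, fix, result)
    return result.strip()


print(clean("123  East Main Street", OPERATIONS))
```

Because the operations run in order, a later operation sees the output of earlier ones, so the order of the list is part of the design.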
Term

Unintended Consequences

Definition
Cleaning operations can often have unintended consequences. These consequences occur when a given cleaning operation affects data other than that intended by the user (over-inclusion), or fails to clean all the targets intended by the user (under-inclusion).
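Both failure modes can be seen with a single fault/fix pair. In the sketch below, the street name and the "AVE" to "AVENUE" operation are made up for the example.

```python
import re

address = "100 HAVEN AVE"

# Over-inclusion: a plain substring replace also hits the "AVE" inside "HAVEN".
naive = address.replace("AVE", "AVENUE")

# Word boundaries restrict the fault to the standalone token.
bounded = re.sub(r"\bAVE\b", "AVENUE", address)

# Under-inclusion: the same pattern misses a mixed-case entry.
missed = re.sub(r"\bAVE\b", "AVENUE", "100 Haven Ave")

print(naive)
print(bounded)
print(missed)
```

Tightening a fault to avoid over-inclusion tends to raise the risk of under-inclusion, so it helps to review what an operation changed and what it left alone.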
Term

Rules of the Road

Definition
  1. Never clean source data - always use a copy.
  2. Never replace numbers with other numbers.
  3. Don't fix a problem with another problem.
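The first rule can be sketched as follows: the cleaning function works on a deep copy and returns it, leaving the source records untouched. The record layout, the field name, and the "STRET" fault are assumptions for illustration.

```python
import copy


def clean_copy(source_records, field, fault, fix):
    """Return a cleaned deep copy; the source records are never modified."""
    working = copy.deepcopy(source_records)
    for record in working:
        value = record.get(field)
        if isinstance(value, str):
            record[field] = value.replace(fault, fix)
    return working


source = [{"address": "123 E MAIN STRET"}]
cleaned = clean_copy(source, "address", "STRET", "STREET")
print(source[0]["address"], "|", cleaned[0]["address"])
```

Keeping the source intact means a cleaning operation with unintended consequences can be discarded and rerun instead of having to be undone.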
Term

Off-the-Shelf Automation

Data Cleaning

Definition
Off-the-Shelf automation data cleaning products offer professionalism and customer support; however, they may require an initial outlay of money.
Term

Homegrown Data Cleaning

Definition
Homegrown data cleaning applications are programs written by someone at the local level. The product is tailor-made and the support is local; however, that support is often weak, and the application may be time-consuming to build.
Term

Ad Hoc Data Cleaning

Definition
Ad Hoc data cleaning applications are macro-type automations written at the local level, often by the crime analyst. Ad hoc solutions usually employ macro automation technology such as VBA.
Term

Cleaning Applications

Definition

A well-written cleaning application, whether created internally or purchased from a vendor, should not have its cleaning parameters "hard-coded" in the application. It should be adaptable. Macros are inflexible and contain hard-coded operations. Applications are typically far more flexible.
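One way to keep cleaning parameters out of the code is to read the fault/fix list from an external table. The sketch below assumes a CSV with `fault` and `fix` columns; the column names, the sample rules, and the whole-word matching are assumptions for the example.

```python
import csv
import io
import re


def load_operations(csv_text: str):
    """Parse fault,fix rows (with a header row) into an ordered list of pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["fault"], row["fix"]) for row in reader]


def apply_operations(value: str, operations) -> str:
    """Replace each fault, matched as a whole word, with its fix."""
    for fault, fix in operations:
        pattern = rf"\b{re.escape(fault)}\b"
        # A function replacement keeps the fix text literal.
        value = re.sub(pattern, lambda _match, fix=fix: fix, value)
    return value


# In practice this text would come from a file the analyst maintains.
RULES_CSV = "fault,fix\nSTREET,ST\nAVENUE,AVE\n"
operations = load_operations(RULES_CSV)
print(apply_operations("123 E MAIN STREET", operations))
```

Changing the rules then means editing the table, not the program.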
