Shared Flashcard Set

Details

Title: Statistics
Description: Exam 1 Short answer
Subject: Mathematics
Level: Undergraduate 2
Created: 02/12/2013

Term

What are the differences between descriptive vs. inferential statistics?

Definition

Descriptive statistics are tables, graphs, or numbers that organize or summarize a set of data. Inferential statistics are mathematical techniques that allow us to make decisions, estimates, or predictions about a larger group of individuals on the basis of data collected from a much smaller group.

Descriptive Statistics: certainty, results could describe sample or population, results reflect same 'level' as the data I have

Inferential: probability, results describe population, results reflect different level than the data I have

Term

What role do inferential statistics play in the scientific process?

Definition

To determine if we can be confident that the results for our sample will hold true for the entire population from which it was drawn.

Term

How/why are descriptive statistics used when we conduct inferential statistical tests?

Definition

Descriptive statistics are used when we conduct inferential statistical tests because scientists typically cannot study entire large populations. Instead, we describe samples and then conduct inferential statistical tests to determine whether we can be confident that the results for our sample will hold true for the entire population from which it was drawn.

Term

What do we mean when we say the results of an inferential test are "statistically significant"?

Definition

If the results of a study are statistically significant, we are confident that the results we see for our sample will hold true (1) for most other samples drawn from the same population, and thus (2) for the entire population from which the sample was drawn.

Term

What's the difference between a sample statistic and a population parameter? How are they similar?

Definition

A descriptive statistic for a sample is called a sample statistic. A descriptive statistic for an entire population is called a population parameter. They are similar because both are descriptive statistics.

Term

What is a representative sample?

Definition

A representative sample is a subset of the population that exhibits the important characteristics and the diversity of the population.

Term

What 3 questions should you ask yourself to determine whether a particular sample described in a study is representative or unrepresentative?

Definition

1. What is the population of interest?
2. What diversity would we expect to find in that population?
3. Does the sample described in the problem include the diversity expected in the population?

Term

Why are representative samples critically important in science?

Definition

Representative samples are critically important in science because if a sample is not representative of a particular population, then conclusions we reach about the population based on that sample are not valid.

Term

Which 2 sampling methods are likely to yield representative samples? How are these methods similar to and different from each other?

Definition

Two sampling methods that are likely to yield representative samples are the simple random sample (one in which every individual in the population has an equal chance of being chosen for the sample) and the stratified random sample (to ensure that particular groups within a population are adequately represented, we randomly select individuals from each group until the proportion of individuals from each group in our sample equals the proportion of that group in the larger population). They are similar in that both rely on random selection. They differ in that stratified sampling first divides the population into groups and then samples randomly within each group.
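
The two methods can be sketched in a few lines of Python. This is only an illustration: the population of 60 first-year and 40 second-year students, the function names, and the fixed seed are all assumptions made for the example, not part of the original card.

```python
import random

def simple_random_sample(population, k, seed=0):
    """Every individual has an equal chance of being chosen."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def stratified_random_sample(population, group_of, k, seed=0):
    """Randomly select within each group, in proportion to the group's
    share of the population. Rounding the quotas can make the total
    differ slightly from k when the shares do not divide evenly."""
    rng = random.Random(seed)
    groups = {}
    for person in population:
        groups.setdefault(group_of(person), []).append(person)
    sample = []
    for members in groups.values():
        quota = round(k * len(members) / len(population))
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical population: 60 first-year and 40 second-year students.
population = ([("first-year", i) for i in range(60)]
              + [("second-year", i) for i in range(40)])

srs = simple_random_sample(population, 10)
strat = stratified_random_sample(population, lambda p: p[0], 10)

# The stratified sample always contains 6 first-year students (60%).
first_years_in_strat = sum(1 for p in strat if p[0] == "first-year")
```

A simple random sample may by chance over- or under-represent a group; the stratified version fixes each group's share in advance.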

Term

Why do convenience and voluntary response sampling methods often fail to yield representative samples?

Definition

Convenience sampling and voluntary response sampling are likely to yield unrepresentative (biased) samples. Convenience sampling occurs whenever the sample is selected based on the ease of collecting data, rather than using a random method. Voluntary response sampling is a type of convenience sampling in which only those who volunteer are in the sample.

Term

What is replication? Why do scientists conduct replication?

Definition

Replication is repeating the study, with essentially the same methodology, on a new sample. Scientists conduct replications to check to see if findings based on a sample are really true for an entire population.

Term

Most consumers of scientific information want to understand and predict characteristics of an individual. What caution must we consider when using scientific principles to predict what may be true for individuals?

Definition

Remember that scientific principles are meant to describe what is true in most cases in a population. We know that there will always be individuals that are different because there will always be variability among individuals in any population

Term

What is the key difference between categorical (qualitative) and numerical (quantitative) data?

Definition

Categorical / qualitative data is not naturally numerical. Numerical / quantitative data is naturally numerical.

Term

What characteristics define nominal variables?

Definition

a categorical variable that cannot be ranked

Term

What characteristics define ordinal variables?

Definition

a categorical variable that can be ranked; if denoted in numbers, the distances between values are not necessarily equal in interval

Term

What characteristics define interval variables?

Definition

a numeric variable that falls on a scale with equal intervals, but does not have an absolute zero point

Term

What characteristics define ratio variables?

Definition

a numeric variable that falls on a scale with equal intervals and DOES have an absolute 0 point

Term

How does precision of measurement vary along the NOIR scale?

Definition

At each successive level along the NOIR scale (Nominal --> Ordinal --> Interval --> Ratio) we have a MORE precise measurement. Scientists generally choose to use the most precise scale of measurement possible.

Term

What is the difference between discrete and continuous variables?

Definition

Discrete variables produce numerical responses, typically from a counting process, and therefore tend to take on only a finite number of real values.

Continuous variables produce numerical responses, typically from a measuring process, and therefore can assume an infinite number of values.

Term

Science proceeds from words to numbers to words. When we are measuring categorical variables, the only numbers we can generate are frequencies -- the frequency or relative frequency of individuals who fall in each category. What's the difference between class frequency and class relative frequency?

Definition

Class frequency is the number of observations in the data set falling in a particular class. Class relative frequency is the proportion or percentage of observations falling in a class.
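
As a small illustration, class frequency (CF) and class relative frequency (CRF) can be computed as follows. The eye-color data are made up for the example.

```python
from collections import Counter

# Hypothetical categorical data: eye color for 10 people.
eye_colors = ["brown", "blue", "brown", "green", "brown",
              "blue", "brown", "hazel", "blue", "brown"]

# Class frequency: number of observations falling in each class.
class_frequency = Counter(eye_colors)

# Class relative frequency: proportion of observations in each class.
n = len(eye_colors)
class_relative_frequency = {c: f / n for c, f in class_frequency.items()}

print(class_frequency["brown"])           # 5
print(class_relative_frequency["brown"])  # 0.5
```

The relative frequencies always sum to 1 (or 100%), which is why either CF or CRF can be shown on the same bar chart or pie chart.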

Term

Can bar charts present CF, CRF, or either one?

Definition

Bar charts can be used to represent either CF or CRF.

Term

Why are there typically spaces between bars on bar charts?

Definition

The separation between bars communicates to the viewer that each bar represents a distinct category.

Term

Can pie charts present CF, CRF, or either one?

Definition

Pie charts can present either CF or CRF.

Term

How does the placement of dots on a dot plot differ depending on whether our variable is discrete or continuous?

Definition

If your quantitative variable is discrete, then you put the dots directly above the values on the x-axis. If the variable is continuous, then you have to estimate where each dot should fall relative to the values on the x-axis.

Term

When creating a pie chart, in what order do we place the slices if our variable is nominal? In what order do we place the slices if our variable is ordinal?

Definition

If data is nominal, pie slices are typically placed in order from the category with the highest frequency (largest slice) to lowest frequency, starting at 0-degrees. If data is ordinal, the order of slices is typically determined by the order in which the categories naturally fall.

Term

Dot plots and histograms are used to display quantitative data -- which one requires that we create measurement classes?

Definition

Histograms require measurement classes.

Term

Why are there never spaces between bars on a histogram?

Definition

Bars on a histogram are placed side-to-side because a histogram shows the frequency of individual scores that fall into measurement classes that are continuous along a quantitative scale.

Term

Do we typically present continuous numerical variables on bar charts or histograms?

Under what circumstances are we likely to present discrete numerical variables on a histogram?

Definition

Typically, continuous numerical variables are displayed on a histogram.

We create measurement classes for discrete numerical variables when the number of discrete values in our data set exceeds 15.

Term

What do measures of central tendency represent?

Definition

Measures of central tendency reflect something about "middleness" -- the middle of a data set. Measures of central tendency are single numerical measures used to represent an entire set of scores.

Term

How do the concepts of mean, median, and mode differ?

Definition

The mean is the average of a set of scores. The mean is calculated by adding up all of the scores and dividing by the total number of scores.

The median is the middle number in a set of measurements. To calculate a median, arrange the N measurements from smallest to largest. If N is odd, the median is the middle number. If N is even, it is the mean of the middle two numbers.

Mode is the most frequently occurring score in a set of scores. In a large data set, we estimate the mode as the midpoint of the modal class.
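
A short sketch of the three measures using Python's standard `statistics` module. The scores are invented for the example and chosen so that the three measures differ.

```python
import statistics

# Hypothetical set of N = 6 scores, already sorted.
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)      # (2+3+3+5+7+10) / 6 = 5
median = statistics.median(scores)  # N is even: mean of 3 and 5 = 4.0
mode = statistics.mode(scores)      # 3 occurs most often

print(mean, median, mode)
```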

Term

If the median of a set of test scores is 84 -- what does that mean about how the other scores in the set are distributed?

Definition

It means that half of the scores are above 84 and half are below it.

Term

If, for a given set of scores, the mode = median = mean, what shape is the distribution?

Definition

A symmetric, unimodal distribution; the normal distribution is the most common example.

Term

Which is the more reliable measure of central tendency: mode or median? Why?

Definition

You have to know how the scores in your data set are distributed before you can know which measure of central tendency best represents your data

Term

Why is the median a more reliable measure of central tendency than the mean when a distribution has outliers?

Definition

The mean is most strongly affected in a skewed distribution. Extreme scores (outliers) in the tail of the distribution "pull" the mean in that direction and the more extreme the outliers are the more the mean will be affected. The median is somewhat "pulled" in the direction of the tail, but not as much as the mean because the median only responds to the number of scores above or below it, not how far above or below the outliers fall.
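
The "pull" of an outlier can be seen with a small made-up data set. The scores and the outlier of 400 are assumptions chosen only to make the effect obvious.

```python
import statistics

scores = [50, 55, 60, 65, 70]   # hypothetical, symmetric
with_outlier = scores + [400]   # add one high outlier

mean_shift = statistics.mean(with_outlier) - statistics.mean(scores)
median_shift = statistics.median(with_outlier) - statistics.median(scores)

print(statistics.mean(scores), statistics.median(scores))  # 60 60
print(statistics.median(with_outlier))                     # 62.5
print(round(mean_shift, 2), median_shift)                  # 56.67 2.5
```

The mean moves by more than 56 points while the median moves by 2.5, because the median responds only to how many scores lie on each side of it.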

Term

If you have high outliers in a set of data, what will the distribution be?

If you have low outliers in a data set, what will the distribution be?

Definition

High outliers = skewed right

Low outliers = skewed left

Term

What do measures of variability represent?

Definition

A measure of variability represents how alike or different the scores in a data set are -- whether they cluster together around the mean or whether they are widely dispersed.

Term

When and why can range be an unreliable measure of variability?

Definition

Range can be an unreliable measure of variability when a data set has a high or low outlier, because range is based only on the highest and lowest scores in the data set: just two numbers, no matter how large the set.

Term

What quality do variance and standard deviation have that makes them preferred over range as a measure of variability?

Definition

Range is the least reliable measure of variability, as it only includes 2 scores in the set. Variance and standard deviation are calculated from all of the scores in a set.

Term

When using the definitional formula to calculate variance, why can't we just sum up the distances of each score from the mean -- and use that as a measure of variability?

Definition

When we sum the distances of each score from the mean, the sum always equals zero because the mean is the balancing point of any distribution.

Term

Why is standard deviation the preferred measure for reporting variability in a set of data, as compared to variance?

Definition

We prefer to report standard deviation instead of variance because standard deviation is in the original units of measurement -- which makes standard deviation easier to interpret than variance (which is expressed as [original units of measurement]^2).
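
The last two cards can be checked numerically. The sketch below uses made-up scores and assumes the population form of the definitional formula (dividing by n); a sample variance would divide by n - 1 instead.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
n = len(scores)
mean = sum(scores) / n             # 5.0

# Signed distances from the mean always cancel out.
deviations = [x - mean for x in scores]
sum_of_deviations = sum(deviations)               # 0.0

# Squaring removes the signs; the result is in squared units.
variance = sum(d ** 2 for d in deviations) / n    # 4.0

# The square root returns to the original units of measurement.
std_dev = math.sqrt(variance)                     # 2.0

print(sum_of_deviations, variance, std_dev)
```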

Term

How does the Empirical Rule help us describe data that is distributed normally?

Definition

If we know that a set of data is distributed normally, and we know the mean and standard deviation of the distribution, we can use the empirical rule to estimate how many measurements will occur within one or two or three standard deviations of the mean.
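
The percentages behind the empirical rule (roughly 68%, 95%, and 99.7%) can be reproduced with `statistics.NormalDist` from the Python standard library. The mean of 100 and standard deviation of 15 are assumptions for the example; the shares are the same for any normal distribution.

```python
from statistics import NormalDist

def share_within(k, mu=100.0, sigma=15.0):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```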

Term

What does a measure of relative standing represent?

Definition

A measure of relative standing indicates how a particular score compares to the other scores in a data set.

Term

What does a percentile rank represent?

Definition

The percentile rank for a score is the percentage of scores that are less than that score.

Term

How do we calculate percentile rank differently depending on whether the score for which we want to know the rank occurs only once or whether it occurs more than once?

Definition

If a score only appears once in a data set, then its percentile rank is simply the percentage of scores that are less than it. If a score appears more than once in a dataset, then its percentile rank is the percentage of scores less than it plus half of the percentage of scores that are equal to that value.
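
A minimal sketch of the rule on this card. The function name and the ten scores are hypothetical.

```python
def percentile_rank(scores, x):
    """Percentile rank of x, following the rule on the card above."""
    n = len(scores)
    below = sum(1 for s in scores if s < x)
    equal = sum(1 for s in scores if s == x)
    if equal <= 1:
        # Score occurs once: percentage of scores below it.
        return 100 * below / n
    # Score occurs more than once: add half of the tied scores.
    return 100 * (below + 0.5 * equal) / n

scores = [60, 70, 70, 80, 90, 90, 90, 95, 98, 100]

print(percentile_rank(scores, 80))  # 30.0 (3 of 10 scores are below 80)
print(percentile_rank(scores, 90))  # 55.0 (4 below, plus half of the 3 ties)
```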

Term

What do z-scores represent?

Definition

A z-score represents the number of standard deviations away from the mean a score falls.

Term

When we want to compare scores for individuals who belong to two different distributions why are z-scores useful?

What do "standard normal distributions" have to do with this sort of comparison?

Definition

When we calculate z-scores for raw scores drawn from two different normal distributions, we are standardizing both distributions so that the mean = 0 and standard deviation = 1 for both distributions. We can then compare how far each score falls from the mean of the distribution from which it was drawn.
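
For example, suppose two students took different tests. The means, standard deviations, and scores below are invented for illustration; the formula is the usual z = (x - mean) / standard deviation.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x falls from the mean."""
    return (x - mu) / sigma

# Hypothetical Test A: mean 75, SD 10. Student scored 85.
z_a = z_score(85, mu=75, sigma=10)

# Hypothetical Test B: mean 60, SD 5. Student scored 70.
z_b = z_score(70, mu=60, sigma=5)

print(z_a, z_b)  # 1.0 2.0
```

Although 85 is the higher raw score, the score of 70 is further above its own distribution's mean (2 standard deviations versus 1), so it has the higher relative standing.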

Term

Why is using a z-table preferred as the method for estimating areas beneath the normal curve instead of using the empirical rule?

Definition

A z-table is preferred as the method for estimating areas beneath the normal curve instead of using the empirical rule because the empirical rule can only address questions about relative standing when the score falls EXACTLY on a standard deviation mark.

Term

In lesson 2, we used the empirical rule to estimate areas beneath the normal curve to answer questions about proportions and percentages. In this lesson we began to estimate areas beneath the normal curve to answer questions about probability (p) -- why will this be important to keep in mind as we conduct inferential statistical tests?

Definition

Inferential statistics are based on estimates of probability. We draw normal curves and shade in areas beneath the curve because we are reasoning about probability.

Term

Do normal distributions all have the same height and width?

Definition

No, normal distributions do not all have the same height and width because they differ in mean and standard deviation.

Term

What must we do to normally distributed data so that we can estimate probability using the normal probability curve?

Definition

Any normal distribution can be transformed to match the qualities of the normal probability curve.

Term

What do you do to estimate an area in one tail of the distribution?

Definition

Shade the area you are estimating. Look up the area between 0 and z in the z-table. Subtract the area you got from the table from 0.5 to get an area in one tail of the distribution.

Term

What do you do to estimate an area that is more than 1/2 of the distribution?

Definition

Shade the area you are estimating. Look up the area between 0 and z in the table. Add 0.5 to the area you got from the table to find an area that is more than 1/2 of the distribution.

Term

What do you do to estimate an area that is between a negative z and a positive z?

Definition

Shade the area you are estimating. Look up the area between 0 and the negative z. Look up the area between 0 and the positive z. Add the two areas together.

Term

What do you do to estimate an area between 2 negative or 2 positive z-scores?

Definition

Take the absolute values of the two positive or negative z-scores. Look up the area between 0 and the larger z. Look up the area between 0 and the smaller z. Subtract the smaller area from the larger.
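
The four area recipes on the preceding cards can be written as small functions. This sketch replaces the printed z-table with `statistics.NormalDist`, which is an assumption of the example; the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def area_0_to_z(z):
    """Area between 0 and z, the quantity this style of z-table lists."""
    return STD_NORMAL.cdf(abs(z)) - 0.5

def one_tail(z):
    """Area in one tail: 0.5 minus the table area."""
    return 0.5 - area_0_to_z(z)

def more_than_half(z):
    """Area covering more than half the curve: 0.5 plus the table area."""
    return 0.5 + area_0_to_z(z)

def between_opposite_signs(z_neg, z_pos):
    """Area between a negative z and a positive z: add the two table areas."""
    return area_0_to_z(z_neg) + area_0_to_z(z_pos)

def between_same_sign(z1, z2):
    """Area between two z-scores of the same sign: larger area minus smaller."""
    return abs(area_0_to_z(z1) - area_0_to_z(z2))

print(round(one_tail(1.0), 4))                      # 0.1587
print(round(more_than_half(1.0), 4))                # 0.8413
print(round(between_opposite_signs(-1.0, 1.0), 4))  # 0.6827
print(round(between_same_sign(1.0, 2.0), 4))        # 0.1359
```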

Term

Explain the problem-solving steps we use to estimate probabilities of events in a normal distribution.

Definition

1. Convert the question in words to a probability statement using the raw score (x).
2. Label the normal distribution with appropriate scales: (1) z-scores, (2) raw scores (x), and (3) standard deviations.
3. Calculate z-score(s) for the value(s) of x you're given and label the z-scores on the distribution.
4. Shade the area on the distribution corresponding to the probability you want to find.
5. Look up the area between 0 and z in the z-table and label it on the distribution -- always!
6. Separately label any other sections of the distribution that are included in the area you shaded.
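
The steps above follow the path x, then z, then P. A minimal worked sketch, assuming a hypothetical normal distribution with mean 100 and standard deviation 15 and the question "what is the probability of a score above 130?":

```python
from statistics import NormalDist

mu, sigma = 100, 15   # hypothetical population values
x = 130               # step 1: we want P(x > 130)

# Step 3: convert the raw score to a z-score.
z = (x - mu) / sigma                              # 2.0

# Step 5: area between 0 and z (what the z-table would give).
area_0_to_z = NormalDist().cdf(abs(z)) - 0.5      # about 0.4772

# Steps 4 and 6: the shaded area is the upper tail beyond z.
p = 0.5 - area_0_to_z

print(z, round(area_0_to_z, 4), round(p, 4))      # 2.0 0.4772 0.0228
```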

Term

What do you need to remember about what a percentile rank represents to set up and solve problems that ask you to use a z-table to estimate the percentile rank of a score?

Definition

You must recall that a percentile rank is a measure of relative standing that tells us what percentage of individuals in the distribution score below a particular individual.

Term

Explain: To estimate probabilities of events in a normal distribution, we start with an X, solve for a Z, and then obtain a P.

Definition

Term

What is the key difference in how mystery-z problems are worded, as compared to problems in which you are asked to estimate the probability of a particular event (x)?

Definition

In mystery-z problems we will be given a percentage, proportion, or probability (p) and asked to determine the score (x) that relates to the probability given.

Mystery z: start with P. We then solve for z and then x.
Others: Start with x. We then solve for z and then P.

Term

Explain: In mystery-z problems, we start with a P, solve for a z, and then obtain an x.

Definition

Term

Explain the problem-solving steps we use to conduct mystery-z problems.

Definition

1. Shade and label the area given in the problem on a normal distribution.
2. Place a "tag" for the mystery z on the x-axis at the boundary of the area you have shaded and labeled -- to remind yourself that it is the mystery z for which you are solving.
3. Calculate the area between 0 and the mystery z and label it on your distribution.
4. Look up the "area between 0 and mystery z" - you will read the z-table "inside-out" because areas between 0 and z are shown in the body of the z-table. You know the area between 0 and z and must find it IN the body of the table and "read" the z-score from the OUTer left column and top row.
5. Plug the mystery z into the formula and solve for x.
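
A mystery-z problem runs the other way: P, then z, then x. The sketch below assumes a hypothetical distribution with mean 100 and standard deviation 15 and asks for the score that cuts off the top 10%. It uses `NormalDist.inv_cdf` in place of reading the z-table inside-out.

```python
from statistics import NormalDist

mu, sigma = 100, 15    # hypothetical population values
tail_area = 0.10       # step 1: the top 10% is shaded

# Step 3: area between 0 and the mystery z.
area_0_to_z = 0.5 - tail_area                 # 0.40

# Step 4: find the z whose area from 0 is 0.40 (table gives about 1.28).
z = NormalDist().inv_cdf(0.5 + area_0_to_z)

# Step 5: rearrange z = (x - mu) / sigma to solve for x.
x = mu + z * sigma

print(round(z, 2), round(x, 1))               # 1.28 119.2
```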

Term

What is a Distribution of Sample Means (DSM)?

Definition

A DSM shows the means of many samples drawn from one particular population

Term

How do we create a DSM from scratch?

Definition

To create a DSM from scratch, we collect many samples from one particular population; calculate the mean of each sample; and then display the sample means (x-bar) on a distribution.
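
Building a DSM from scratch can be simulated. Everything specific here is an assumption for the sketch: a simulated population of 10,000 scores with mean near 100 and standard deviation near 15, samples of size 25, 500 samples, and fixed random seeds.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population of 10,000 individual scores.
population = [rng.gauss(100, 15) for _ in range(10_000)]

def build_dsm(population, n, num_samples, rng):
    """Collect many samples of size n and return the mean of each one."""
    return [statistics.mean(rng.sample(population, n))
            for _ in range(num_samples)]

sample_means = build_dsm(population, n=25, num_samples=500, rng=rng)

# The sample means center on the population mean ...
print(round(statistics.mean(population), 1),
      round(statistics.mean(sample_means), 1))

# ... but vary much less than individual scores do.
print(round(statistics.pstdev(population), 1),
      round(statistics.pstdev(sample_means), 1))
```

Plotting `sample_means` as a histogram would give the DSM itself; the printed summaries describe its center and spread.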

Term

Do we expect that samples drawn from the same population will have identical means? Why or why not?

Definition

No, it is unusual to draw samples with means (x-bar) that are identical to each other or to the population mean (mu)

Term

Do we expect the mean of a sample drawn at random from a population to equal the mean of the population? Why or why not?

Definition

Sample means (x bar) tend to resemble the population mean (mu) more than individual scores (x) from the population. The means of samples (x bars) drawn from a population will cluster more tightly around the population mean (mu) than will individual scores (x) drawn from the population.

Term

When we calculate the standard deviation of all sample means for a DSM, is "n" equal to the total number of samples we collect in order to create the DSM or is "n" equal to the number of individuals in each sample collected?

Definition

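When we calculate the standard deviation directly from the sample means we collected, n is equal to the total number of samples we collect in order to create the DSM. (In the standard error formula, sigma / square root of n, n is instead the number of individuals in each sample.)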

Term

Why do we create a DSM for a population?

Definition

1. So we can visualize / describe the types of sample means we would expect to draw from a particular population.
2. So we can compare one sample (x-bar) to other samples (x-bars) drawn from the same population -- to say how rare or common that type of sample is in the population.
3. So we can compare a "test sample" to the DSM -- to say whether it is unlikely that the "test sample" came from that population.

Term

Which two population parameters do we need to know to create a DSM without having to actually collect 100s of samples?

How do we use those population parameters to get the mean and standard deviation of the DSM?

Definition

We set the mean of all sample means (mu sub x-bar) equal to the mean of the population (mu)

We calculate the standard deviation of all sample means, also called standard error (sigma sub x-bar) using

sigma sub x-bar = sigma/square root of n
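
The two formulas on this card amount to a one-line calculation. In the sketch below the function name and the numbers (mu = 100, sigma = 15) are hypothetical; n is the number of individuals in each sample.

```python
import math

def dsm_parameters(mu, sigma, n):
    """Return (mean of the DSM, standard error) from population parameters."""
    mean_of_sample_means = mu               # mu sub x-bar = mu
    standard_error = sigma / math.sqrt(n)   # sigma sub x-bar = sigma / sqrt(n)
    return mean_of_sample_means, standard_error

print(dsm_parameters(mu=100, sigma=15, n=25))    # (100, 3.0)
print(dsm_parameters(mu=100, sigma=15, n=100))   # (100, 1.5)
print(dsm_parameters(mu=100, sigma=30, n=25))    # (100, 6.0)
```

Quadrupling the sample size halves the standard error, and doubling the population standard deviation doubles it.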

Term

The standard deviation of sample means is also called what?

Definition

standard error

Term

What can we say will always be true about the relationship between variability among individual scores (sigma) drawn from a population as compared to variability among the means of samples (sigma sub x bar) drawn from the same population? Explain.

Definition

Sigma sub x bar = sigma / square root of n

So, standard error -- which represents variability among sample means (x bar) in a DSM -- will always be LESS THAN the variability among individual scores in a population (sigma) whenever each sample contains more than one individual (n > 1).

Why?
1. Each sample mean calculated is a summary statistic that represents the center of that sample --> high and low individual scores in each sample are 'washed out'
2. Because the mean of each sample is an approximation of the population mean, the x-bars cluster more tightly around the population mean (mu) than do individual scores (x's)

Term

If we used both methods to create DSMs for a particular population -- one DSM from scratch and one using population parameters -- which would be most accurate?

Definition

Estimation from population parameters would be more accurate because the number of samples in a sampling distribution is assumed to be infinite. Even though creating a DSM from scratch includes hundreds of sample means, we would need many more to get the exact same results as we get when we estimate from population parameters.

Term

The amount of variability among sample means drawn from a population depends on what two values?

Definition

-the size of samples (n) used to create the DSM
-the amount of variability in individual scores in the population (sigma)

Term

As sample size increases, what happens to standard error? As variability of scores in a population increases, what happens to standard error?

Definition

-Standard error (sigma sub x-bar) will be reduced as sample size (n) is increased.
- Standard error (sigma sub x-bar) will increase as variability of scores in a population (population standard deviation, sigma) is increased.

Term

The accuracy of sample means (x-bar) as estimates of the population mean (mu) increases as sample size does what? Explain why this is so by explaining how sample size affects standard error and how the standard error of the DSM indicates whether sample means cluster tightly around the mean of the DSM / mean of the population.

Definition

The accuracy of sample means (x-bar) as estimates of the population mean (mu) increases as the size of samples (n) used to create the DSM increases. The larger the sample size (n), the lower the standard error in the DSM (sigma sub x-bar). The lower the standard error, the more tightly sample means will cluster around the mean of the DSM, and thus, the more likely it is that a sample mean drawn at random from the population will accurately estimate the population mean.

Term

State the exact definition of the Central Limit Theorem.

Definition

When a distribution of sample means is created from large samples (N > or equal to 30), the DSM will resemble a normal distribution regardless of whether the samples were drawn from a population that was distributed normally or non-normally
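
The theorem can be illustrated with a simulation. This is a sketch under stated assumptions: the population is a simulated, strongly right-skewed (exponential) set of 20,000 scores, the samples have size 30, and skewness (0 for a symmetric distribution) is used as a rough stand-in for "resembles a normal distribution". The helper function and seed are hypothetical.

```python
import random
import statistics

def skewness(values):
    """Average cubed standardized deviation; 0 for a symmetric distribution."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

rng = random.Random(7)

# Hypothetical non-normal population: heavily skewed to the right.
population = [rng.expovariate(1.0) for _ in range(20_000)]

# DSM built from 2,000 samples of size n = 30.
sample_means = [statistics.mean(rng.sample(population, 30))
                for _ in range(2_000)]

print(round(skewness(population), 2))    # strongly skewed
print(round(skewness(sample_means), 2))  # much closer to 0
```

The DSM is not perfectly symmetric at n = 30, which matches the "at least approximately normal" wording used elsewhere in this set.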

Term

What sample sizes are needed to ensure a normally distributed DSM if individual scores are normally distributed in the population of interest?

What sample sizes are needed to ensure a normally distributed DSM if individual scores are not normally distributed in a population of interest?

Definition

-If the individual scores in a population are distributed normally, then a DSM created for that population will be normal for any/all sample sizes.
-If the individual scores in a population are not distributed normally, a DSM created for that population will be at least approximately normal if we create the DSM using a sample size of N greater than or equal to 30.

Term

We want to conduct inferential statistical tests on data taken from many populations that are not distributed normally. Why is the Central Limit Theorem crucially important to ensure the accuracy of our estimates of probability for these tests?

Definition

Individual scores for many populations in which we are interested may NOT be distributed normally. Still, we can conduct our inferential statistical tests using the standard normal distribution because we know that any DSM, created with sample size of 30 or greater, will be at least approximately normal -- even when the data in the population from which the samples were drawn is not normally distributed.
