Shared Flashcard Set

Details

Stats
210
125
Mathematics
Not Applicable
12/11/2004

Cards

Term
Design
Definition
The __________ of a sample refers to the method used to choose the sample from the population.
Term
Lurking Variable
Definition
Observational studies of the effect of one variable on another often fail because the explanatory variable is confounded with __________. Well-designed experiments take steps to defeat confounding.
Term
stratum
Definition
A stratified sampling design can produce more exact information than a simple random sample of the same size by taking advantage of the fact that individuals in the same __________ are similar to one another.
Term
response bias
Definition
The wording of a question is the most important influence on the answers given to a sample survey. Confusing or loaded questions can introduce strong __________. Never trust the results of a sample survey until you have read the actual questions posed!
Term
causation
Definition
A statistically significant association in data from a well-designed experiment does imply __________.
Term
Experiment
Definition
An observational study is a poor way to gauge the effect of an intervention. To see the response to a change, we must actually impose the change. When our goal is to understand cause and effect, a/an __________ is the only source of fully convincing data.
Term
voluntary response sample
Definition
A/an __________ is biased because people with strong opinions, especially negative opinions, are most likely to respond.
Term
Chance
Definition
In a voluntary response sample, people choose whether to respond. In a convenience sample, the interviewer makes the choice. In both cases, personal choice produces bias. The statistician’s remedy is to choose the sample by __________. This ensures that neither favoritism by the sampler nor self-selection by respondents takes place in selecting the sample.
Term
Sample
Definition
A simple random sample gives each individual an equal chance to be chosen. It also gives every possible __________ an equal chance to be chosen.
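As a rough illustration of the card above, a simple random sample can be drawn with Python's standard library. The population of numbered individuals, the sample size of 10, and the seed are all made up for this sketch.

```python
import random

# Hypothetical population: individuals labeled 1 through 100.
population = list(range(1, 101))

# A seeded generator makes the sketch repeatable; the seed value is arbitrary.
rng = random.Random(2004)

# Draw 10 individuals without replacement. Every individual, and every
# possible set of 10 individuals, has the same chance of being chosen.
srs = rng.sample(population, 10)
print(sorted(srs))
```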
Term
Response Bias
Definition
That some respondents lie, especially when asked about illegal or unpopular behavior, is an example of __________. The sample then underestimates the occurrence of such behavior in the population.
Term
laws of probability
Definition
Properly designed samples avoid systematic bias, but their results are rarely exactly correct and they vary from sample to sample. However, the results of random sampling don’t change haphazardly from sample to sample. Because we deliberately use chance to select the sample, the results obey the __________ that govern chance behavior. We can say how large an error we are likely to make in drawing conclusions about the population from a sample.
Term
lurking variables
Definition
When conducting an experiment, we have a treatment group and a control group. The group of patients who received a sham treatment is called a control group, because it enables us to control the effects of __________ on the outcome.
Term
Sample size
Definition
Larger random samples give more accurate results than smaller samples. In other words, the __________ determines how close to the population truth the sample result is likely to fall.
Term
Factors
Definition
Because the purpose of an experiment is to reveal the response of one variable to changes in other variables, the distinction between explanatory and response variables is essential. The explanatory variables in an experiment are often called __________.
Term
Bias
Definition
Voluntary response samples and convenience samples are sampling methods which display __________, or systematic error. That is, these sampling methods systematically favor some parts of the population over others.
Term
Randomization
Definition
How can we assign experimental units to treatments in a way that is fair to all the treatments? The answer is the same as in sampling: let impersonal chance make the assignment. The use of chance to divide experimental units into groups is called __________.
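One way to picture randomization is to shuffle the experimental units and deal them into groups. The function name `randomize`, the subject labels, and the seed below are hypothetical, chosen only for this sketch.

```python
import random

def randomize(units, n_groups=2, seed=None):
    """Shuffle the units by chance, then deal them into near-equal groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal round-robin: the unit at position i goes to group i mod n_groups.
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Ten hypothetical subjects split by chance into treatment and control.
subjects = [f"subject_{i}" for i in range(1, 11)]
treatment, control = randomize(subjects, seed=2004)
print("treatment:", treatment)
print("control:  ", control)
```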
Term
statistically significant differences
Definition
If we observe __________ among the groups in a comparative randomized experiment, then we have good evidence for a cause-and-effect relationship between the explanatory and response variables.
Term
chance variation
Definition
The principle of replication means that we should use enough experimental units to reduce __________.
Term
Double Blind
Definition
The __________ avoids unconscious bias by, for example, a physician who doesn’t think that “just a placebo” can benefit the patient.
Term
Order
Definition
Sometimes in a matched-pairs design, each subject serves as his or her own control. The __________ of the treatments can influence the subject’s response. So we toss a coin to decide which treatment the subject gets first.
Term
exploratory data analysis
Definition
Statistical tools and ideas help us examine data in order to describe their main features. This examination is called __________. We use graphs and numerical summaries to describe the variables in the data set and the relations among them.
Term
Count
Definition
The distribution of a categorical variable lists the categories and gives the __________ of individuals who fall in each category.
Term
Overall Pattern
Definition
In any graph of data, we look for the __________ and for unusual features.
Term
vertical scale
Definition
When comparing two histograms, we use on the __________ not the actual counts but the percents. The reason is that the two histograms may not have the same total number of counts. A histogram of percents rather than counts is also convenient when the counts are very large.
Term
horizontal scale
Definition
If we are interested in the change of a child’s height over time, we make a time plot. We plot each observation against the time at which it was measured. In this plot, we put time on the __________.
Term
Relationships
Definition
In conducting exploratory data analysis, we begin by examining each variable in the data set by itself. Then we move on to study the __________ among the variables.
Term
Categories
Definition
When making a pie chart, you must include all the __________ that make up a whole. Bar graphs are more flexible.
Term
Spread
Definition
We can describe the __________ of a distribution by giving the smallest and largest values.
Term
Trend
Definition
A time plot of a variable plots each observation against the time at which it was measured. When we examine a time plot, we look for a/an __________. An example is a long-term upward or downward movement over time.
Term
Distribution
Definition
The __________ of a variable describes what values the variable takes and how often it takes these values.
Term
numerical summaries
Definition
In conducting exploratory data analysis, we look at a graph or graphs. Then we make __________ of specific aspects of the data for more complete description.
Term
Range
Definition
The bars of a histogram should cover the entire __________ of values of a variable. Our eyes respond to the area of the bars in a histogram.
Term
Outlier
Definition
In any graph of data, we look for an overall pattern and for striking deviations from that pattern. A/an __________ is an individual value that falls outside the overall pattern.
Term
Trend
Definition
When observations on a variable are taken over time, we make a time plot that graphs time horizontally and the values of the variable vertically. A time plot can reveal a/an __________ or other changes over time.
Term
Distribution
Definition
A/an __________ is symmetric if the right and left sides of the histogram are approximately mirror images of each other.
Term
Overall Pattern
Definition
Shape, center and spread provide a good description of the __________ of any distribution for a quantitative variable.
Term
mean
Definition
An important fact about the __________ as a measure of center is that it is sensitive to the influence of a few extreme observations.
Term
Median
Definition
In a skewed distribution, the mean is farther out in the long tail than is the __________.
Term
first quartile
Definition
The __________ is larger than 25% of the observations.
Term
variance
Definition
The __________ of a set of observations is the average of the squares of the deviations of the observations from their mean.
Term
Mean
Definition
The __________ is a measure of center that uses the actual value of each observation and is thus sensitive to extreme values.
Term
Median
Definition
The __________ is a measure of center that is resistant to outliers. It is also the second quartile.
Term
Five Number Summary
Definition
The __________ of a data set consists of the smallest observation, the first quartile, the median, the third quartile and the largest observation, written in order from smallest to largest.
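The summary described above can be sketched in a few lines. Quartile conventions differ between textbooks and software; this sketch assumes the quartiles are the medians of the observations below and above the position of the overall median. The function name and data are hypothetical.

```python
from statistics import median

def five_number_summary(xs):
    """Return (min, Q1, median, Q3, max) for at least two observations."""
    s = sorted(xs)
    n = len(s)
    half = n // 2
    lower = s[:half]           # observations below the median's position
    upper = s[half + n % 2:]   # observations above the median's position
    return (s[0], median(lower), median(s), median(upper), s[-1])

data = [1, 3, 4, 7, 8, 9, 12, 15, 20]
summary = five_number_summary(data)
print(summary)  # (1, 3.5, 8, 13.5, 20)
```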
Term
Symmetric distribution
Definition
A boxplot gives an indication of the symmetry or skewness of a distribution. In a/an __________, the first and third quartiles are equally distant from the median.
Term
standard deviation
Definition
The __________ is the positive square root of the variance.
Term
Median
Definition
If the distribution is exactly symmetric, the mean and the __________ are exactly the same.
Term
Standard Deviation
Definition
The __________ is a measure of spread that looks at how far the observations are from their mean.
Term
Degrees of Freedom
Definition
In calculating the variance or the standard deviation, we use n - 1 in the formula. The number n - 1 is called the __________ of the variance or the standard deviation.
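A short sketch of the calculation with the n - 1 divisor, checked against Python's `statistics` module, which uses the same divisor in `variance` and `stdev`. The data values are made up.

```python
import math
import statistics

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5, squared deviations sum to 32
s2 = sample_variance(data)        # 32 / 7
s = math.sqrt(s2)                 # standard deviation: positive square root
print(s2, s)
```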
Term
The Same Value
Definition
When the standard deviation is zero, all the observations have __________.
Term
Mean
Definition
A skewed distribution that has no outliers will, in all likelihood, pull the __________ toward its long tail.
Term
resistant measure
Definition
Because the mean is sensitive to the influence of extreme observations, we say that it is not a/an __________ of center.
Term
median
Definition
The __________ is the midpoint of the distribution, the number such that half the observations are smaller and the other half are larger.
Term
third quartile
Definition
The __________ is larger than 75% of the observations.
Term
five-number summary
Definition
The __________ of a distribution is used to construct a boxplot.
Term
standard deviation
Definition
The __________ measures spread about the mean and should be used only when the mean is chosen as the measure of center.
Term
linear relationships
Definition
Correlation and regression must be interpreted with caution. Because they describe only __________, we must not forget to first plot the data to see the form of the relationship and also to detect outliers and influential observations.
Term
lurking variables
Definition
A strong association between X and Y does not necessarily mean that X causes Y. Indeed, the strong association may be explained by __________, and, in this scenario, the conclusion that X causes Y is either wrong or not proved.
Term
regression lines
Definition
__________ are straight lines that describe how a response variable Y changes as an explanatory variable X changes.
Term
conditional distribution
Definition
When working on a two-way table, to find the __________ of the row variable for one specific value of the column variable, look only at that one column in the table. Find each entry in the column as a percent of the column total.
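The column-percent recipe above might look like this in code. The table of counts, its labels, and the function name are hypothetical.

```python
# Hypothetical counts: rows are education levels, columns are age groups.
table = {
    "no degree": {"25-34": 10, "35-54": 30},
    "degree":    {"25-34": 30, "35-54": 30},
}

def conditional_distribution(table, column):
    """Each entry in one column, expressed as a percent of the column total."""
    column_total = sum(row[column] for row in table.values())
    return {label: 100 * row[column] / column_total
            for label, row in table.items()}

young = conditional_distribution(table, "25-34")
print(young)  # {'no degree': 25.0, 'degree': 75.0}
```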
Term
Simpson’s paradox
Definition
A comparison between two variables that holds for each individual value of a third variable can be changed or even reversed when the data for all values of the third variable are combined. This is called __________. It is an example of the effect of lurking variables on an observed association.
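A small numerical sketch of such a reversal follows. The counts are invented for illustration: treatment X has the higher success rate within each severity group, yet treatment Y has the higher rate once the groups are combined, because X was mostly given to the severe cases.

```python
# Hypothetical (successes, trials) for two treatments in two severity groups.
counts = {
    "mild":   {"X": (9, 10),   "Y": (80, 100)},
    "severe": {"X": (30, 100), "Y": (2, 10)},
}

# Success rate of each treatment within each group.
within = {
    group: {t: successes / trials for t, (successes, trials) in by_t.items()}
    for group, by_t in counts.items()
}

def combined(treatment):
    """Success rate after pooling both severity groups."""
    successes = sum(counts[g][treatment][0] for g in counts)
    trials = sum(counts[g][treatment][1] for g in counts)
    return successes / trials

print(within)                        # X is ahead in both groups
print(combined("X"), combined("Y"))  # Y is ahead overall
```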
Term
lurking variables
Definition
The best way to get good evidence that X causes Y is to do an experiment in which we change X and keep __________ under control.
Term
roundoff error
Definition
In a two-way table, sometimes the row and column totals do not match. The explanation is __________.
Term
categorical variables
Definition
In Simpson’s paradox, the lurking variables are __________. The paradox is an extreme form of the fact that observed associations can be misleading when there are lurking variables.
Term
two-way tables
Definition
__________ of counts organize data about two categorical variables. Values of the row variable label the rows that run across the table, and values of the column variable label the columns that run down the table.
Term
marginal distributions
Definition
The row totals and the column totals of a two-way table give the __________ of the two individual variables. It is clearer to present these distributions as percents of the table total.
Term
extrapolation
Definition
__________ is the use of a regression line for prediction far outside the range of values of the explanatory variable X that you used to obtain the line. Such predictions are often not accurate.
Term
lurking variable
Definition
In studying a relationship between X and Y, we sometimes find that the relationship is influenced by other variables we did not measure or even think about. A/an __________ can thus falsely suggest a strong relationship between X and Y or it can hide a relationship that is really there.
Term
cause-and-effect relationship
Definition
Be careful not to conclude that there is a/an __________ between two variables just because they are strongly associated.
Term
two-way table
Definition
To study the relationship between two categorical variables (say, education and age group), we construct a/an __________. In this case, education becomes a row variable and age group becomes a column variable.
Term
influential observation
Definition
A/an __________ is an individual point that substantially changes the regression line. It is often an outlier in the X direction, but it need not have a large residual.
Term
lurking variables
Definition
__________ that you did not measure may explain the relations between the variables that you did measure. Correlation and regression can be misleading if you ignore these variables that you did not measure.
Term
right margin
Definition
In studying the relationship between two categorical variables, say education and age group, we construct a two-way table. Education is the row variable and age group is the column variable. The distribution of education alone is called a marginal distribution because it appears at the __________ of the two-way table.
Term
appropriate percents
Definition
In a two-way table, relationships between the categorical variables are described by calculating __________ from the counts given. Counts are often hard to compare.
Term
Simpson’s paradox
Definition
An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called __________.
Term
residuals
Definition
You can examine the fit of a regression line by studying the __________, which are the differences between the observed and predicted values of Y.
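To make the idea concrete, this sketch fits a least-squares line to a few made-up points and lists the residuals (observed Y minus predicted Y). The helper name `least_squares` and the data are hypothetical.

```python
def least_squares(xs, ys):
    """Slope and intercept of the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares(xs, ys)

# Residual = observed y minus the y predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
print(slope, intercept, residuals)
```

For a least-squares line the residuals sum to zero (up to floating-point rounding), so it is their pattern, not their total, that shows how well the line fits.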
Term
total area 1
Definition
We can sometimes describe the overall pattern of a distribution by a density curve. A density curve has __________ underneath it.
Term
approximation
Definition
In drawing a density curve, minor irregularities and outliers are ignored. Of course, no set of real data is exactly described by a density curve. The curve is a/an __________ that is easy to use and accurate enough for practical use.
Term
balance point
Definition
The mean of a density curve can be located by eye. The mean µ is the __________ of the curve.
Term
z-score
Definition
When we subtract the mean of the distribution from an observation x and then divide the difference by the standard deviation, we get what is called a __________.
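The standardization described above is a one-line computation. The mean, standard deviation, and observation used here are made-up values.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# Hypothetical distribution with mean 64.5 and standard deviation 2.5.
z = z_score(70, mean=64.5, sd=2.5)
print(z)  # about 2.2: the observation lies 2.2 standard deviations above the mean
```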
Term
standard deviation
Definition
The normal distribution is completely determined when we know its mean and __________.
Term
mathematical model
Definition
The density curve is a/an __________ for the distribution of a quantitative variable. It is an idealized description. It gives a compact picture of the overall pattern of the data but ignores minor irregularities as well as any outliers.
Term
proportion
Definition
An area under a density curve gives the __________ of observations that fall in a range of values.
Term
Area
Definition
The median of a density curve can be located by eye. The median divides the __________ under the curve in half.
Term
change-of-curvature points
Definition
On normal density curves, the standard deviation σ is the distance from the mean µ to the __________ on either side.
Term
standard deviation
Definition
The __________ of the standard normal distribution is 1.
Term
density curve
Definition
A/an __________ is an idealized description of the overall pattern of a distribution that smooths out the irregularities in the actual data.
Term
standard deviations
Definition
The z-score of an observation x says how many __________ x lies from the mean of the distribution.
Term
Mean
Definition
The __________ of the standard normal distribution is zero.
Term
Area
Definition
You can roughly locate the median and the quartiles of any density curve by dividing the __________ under the curve into four equal parts.
Term
curvature
Definition
On a normal curve, the point at which the __________ changes is located at a distance σ on either side of the mean µ.
Term
histogram bars
Definition
When we have a large number of observations and graph the distribution of the quantitative variable, we sometimes get an overall pattern which is so regular that we can describe it by a smooth curve. We then draw a smooth curve through the tops of the __________.
Term
Area
Definition
A density curve is a curve that is always on or above the horizontal axis. It has __________ exactly 1 underneath it.
Term
median
Definition
The __________ of a density curve can be located by eye. It is the point with half the observations on either side.
Term
Standard Normal
Definition
If X has the N(µ, σ) distribution, then Z = (X – µ)/σ has the __________ distribution.
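As a quick check of this fact, the sketch below uses `statistics.NormalDist` (Python 3.8 or later) to compare the area to the left of x under a normal curve with the area to the left of the standardized value z under the standard normal curve. The values of mu, sigma, and x are made up.

```python
from statistics import NormalDist

# Hypothetical normal distribution and observation.
mu, sigma = 64.5, 2.5
x = 68.0

# Standardize: subtract the mean, then divide by the standard deviation.
z = (x - mu) / sigma

# Area to the left of x under the N(mu, sigma) curve ...
p_direct = NormalDist(mu, sigma).cdf(x)
# ... matches the area to the left of z under the N(0, 1) curve.
p_standard = NormalDist().cdf(z)
print(z, p_direct, p_standard)
```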
Term
The Long tail
Definition
The mean and median are equal for symmetric density curves. The mean of a skewed curve is located farther toward the __________ than is the median.
Term
laws of probability
Definition
Statistical inference is most secure when we produce data by random sampling. The reason is that when we use chance to choose respondents, the __________ answer the question: “What would happen if we did this many times?”
Term
random phenomenon
Definition
A/an __________ has outcomes that we cannot predict but that nonetheless have a regular distribution in very many repetitions.
Term
Original population
Definition
When we choose many simple random samples from the same population, the sampling distribution of the sample means is centered at the mean of the __________.
Term
Spread
Definition
In estimating the population parameter µ, the sample mean is correct on the average in many samples. How close the sample mean falls to the parameter µ in most samples is determined by the __________ of the sampling distribution.
Term
Averages
Definition
__________ are less variable than individual observations. In general, the results of large samples are less variable than the results of small samples.
Term
sampling distribution
Definition
The __________ of a statistic is the distribution of values taken by the statistic in all possible samples of the same size from the same population.
Term
law of large numbers
Definition
Draw observations at random from any population with finite mean µ. The __________ guarantees that as the number of observations drawn increases, the mean of the observed values gets closer and closer to the mean µ of the population.
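A small simulation can illustrate this behavior. It assumes a fair six-sided die as the population, so the population mean is 3.5. The seed and sample sizes are arbitrary, and the means from small samples will differ from run to run if the seed is changed.

```python
import random

def mean_of_rolls(n, rng):
    """Mean of n simulated rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(2004)  # arbitrary seed, for repeatability only
for n in (10, 1_000, 100_000):
    # With more rolls, the observed mean tends to settle near 3.5.
    print(n, mean_of_rolls(n, rng))
```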
Term
chance behavior
Definition
The value of a statistic varies in repeated random sampling. But sampling variability is not fatal. __________ is unpredictable in the short run but has a regular and predictable pattern in the long run.
Term
Central Limit Theorem
Definition
The __________ allows us to use normal probability calculations to answer questions about sample means from many observations even when the population distribution is not normal.
Term
systematic tendency
Definition
In repeated sampling, the sample mean will sometimes fall above the true value of µ and sometimes below, but there is no __________ to overestimate or underestimate the parameter.
Term
probability
Definition
The __________ of an event is the proportion of times the event occurs in many repeated trials of a random phenomenon.
Term
individual observations
Definition
When we choose many simple random samples from the same population, the sampling distribution of the sample means is less spread out than the distributions of the __________.
Term
Central Limit Theorem
Definition
Draw a simple random sample of size n from any population with mean µ and standard deviation σ. According to the __________, when n is large, the sampling distribution of the sample mean is approximately normal.
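The simulation below sketches this idea under stated assumptions: the population is exponential with mean 1 and standard deviation 1 (strongly right-skewed), each sample has n = 40 observations, and 2,000 samples are drawn. The sample means should center near 1 with spread near 1/sqrt(40), which is about 0.158.

```python
import math
import random
import statistics

def sample_means(n, reps, rng):
    """Means of `reps` random samples of size n from an exponential population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(0)  # arbitrary seed, for repeatability only
means = sample_means(n=40, reps=2000, rng=rng)

center = statistics.fmean(means)   # should be close to the population mean, 1
spread = statistics.stdev(means)   # should be close to 1 / sqrt(40)
print(center, spread, 1 / math.sqrt(40))
```

A histogram of `means` would look far more symmetric and bell-shaped than the skewed population the observations came from.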
Term
unbiased estimate
Definition
Because the sample means are centered at µ, we say that the sample mean is a/an __________ of the parameter µ.
Term
sampling distribution
Definition
The __________ of the sample mean describes how the sample mean varies in all possible samples of the same size from the same population.
Term
confidence interval
Definition
A/an __________ is one of the two types of statistical inference. We use it when our goal is to estimate a population parameter.
Term
good evidence
Definition
The basic idea of significance tests is simple: an outcome that would rarely happen if a claim were true is __________ that the claim is not true.
Term
alternative hypothesis
Definition
The claim about the population that we are trying to find evidence for is the __________.
Term
P-Value
Definition
The __________ of a test is the probability, computed assuming that the null hypothesis is true, that the observed outcome would take a value as extreme as or more extreme than that actually observed.
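As one concrete case, the sketch below computes this probability for a one-sided test about a mean. It assumes a known population standard deviation and a normal sampling distribution for the sample mean. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def one_sided_p_value(sample_mean, mu0, sigma, n):
    """P-value for H0: mu = mu0 against Ha: mu > mu0, with sigma known."""
    # Standardize the sample mean under the null hypothesis.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Probability of a result at least this large if H0 were true.
    return 1.0 - NormalDist().cdf(z)

# Hypothetical data: sample mean 0.3 from n = 25 observations, sigma = 1.
p = one_sided_p_value(sample_mean=0.3, mu0=0.0, sigma=1.0, n=25)
print(p)  # about 0.067, so not significant at the 0.05 level
```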
Term
null hypothesis
Definition
Large p-values fail to give evidence against the __________.
Term
test of significance
Definition
A/an __________ is one of the two types of statistical inference. We use it when our goal is to assess the evidence provided by data about some claim concerning a population.
Term
null hypothesis
Definition
The __________ is the statement being tested in a statistical test. The test is designed to assess the strength of the evidence against it.
Term
Evidence
Definition
The smaller the p-value is, the stronger is the __________ against the null hypothesis provided by the data.
Term
one-sided alternative hypothesis
Definition
We have a/an __________ when we are interested only in deviations from the null hypothesis in one direction. An example is Ha: µ > 0.
Term
P-Value
Definition
If the __________ is as small as or smaller than α, we say that the data are statistically significant at level α.
Term
test of significance
Definition
A/an __________ assesses the evidence against the null hypothesis by giving a probability, the p-value.
Term
Chance
Definition
Small p-values are evidence against the null hypothesis, because they say that the observed result is unlikely to occur just by __________.
Term
sampling distribution
Definition
Calculating p-values requires knowledge of the __________ of the test statistic when the null hypothesis is true.
Term
null hypothesis
Definition
The __________ is a claim that we will try to find evidence against.
Term
p-value
Definition
A statistical test is based on a test statistic. The __________ is the probability, computed supposing that the null hypothesis is true, that the test statistic will take a value at least as extreme as that actually observed.