SOC 113 Sociology Hypothesis Testing Questions

Soc 113 Review Guide for November 6 Midterm (2019)

Form: 20 questions in total: 10 multiple choice or fill-in-the-blank, and 10 short responses related to the statistical tables provided (such as the tables in the HW assignments). Key points are summarized below:

1. Level of measurement: understand what continuous and discrete variables are, with examples of the different types (discrete, continuous, and the 4 levels of measurement below)

2. Hypothesis testing: how to form hypotheses (null and alternative); what it means to reject the null or fail to reject the null; how to compare the p-value to the significance level (such as alpha = 0.05), and what a smaller p-value means.

3. How to interpret the one-sample t-test results: what are Ho and Ha; the standard for

determining statistical significance, i.e., t statistic and p-value; what are the steps for the

one-sample t test; what a normal distribution looks like.

4. How to interpret the one-way ANOVA results: what are Ho and Ha; the standard for

determining statistical significance, i.e., F statistic and p-value; what an F distribution looks

like.

5. How to interpret the simple linear regression results: what are Ho and Ha; the standard for

determining statistical significance, i.e., t statistic and p-value of the slope; what is the slope

and what it means; what is the R-square (not R, it is R-square!) and what it means; what are

independent variables and dependent variable, and what their relationships are; how would

you plot the relationship between a dependent variable and an independent variable; from a

given independent variable, how would you predict the value of a dependent variable.

6. How to interpret the multiple regression results: how to interpret the slope of an independent variable (i.e., the impact of this independent variable, holding the other independent variables constant).


● Understand how to use SPSS or Stata to produce all of the tables that you have had to handle so

far.

○ Homework 1:

■ Tables used:

○ Homework 2:

■ Tables used:

○ Homework 3:

■ Tables used:

● Be familiar with the variables housed in the GSS dataset.

○ Limited because it doesn’t have a lot of the best kind of variables, but the variables still

work.

■ Limitations: the levels of measurement are often not ideal, so there will be a lot of times when you have to overlook the problems

■ HAPMAR (happiness in marriage), RINCOME (income), PAPRES10 (father’s prestige

score)

● How are they coded?

○ HAPMAR → 1 = very happy, 2 = pretty happy, 3 = not too happy, 8 = don’t know, 9 = no answer, 0 = not applicable

○ RINCOME → 1 = Lt $1000, 2 = $1000 – $2999, […], 12 = $25000 or more, 13 = Refused, 98 = Don’t know, 99 = No answer, 0 = Not applicable

○ PAPRES → for the 3 different “papres” variables on GSS, there are no labels associated with the codes

● Levels of measurement?

○ HAPMAR – ordinal (the three substantive categories are ranked from very happy to not too happy)

○ RINCOME – ordinal

○ PAPRES – interval

● Be able to distinguish among various levels of measurement for variables.

● Nominal

○ Data cannot be ordered, nor can it be used in calculations

■ Republican, Democrat, Green, Libertarian

● Not useful in calculations: the data is qualitative, so summaries such as means and standard deviations are not meaningful

● Ordinal

○ Data that can be ordered, but the differences cannot be measured

■ Small – 8oz, medium – 12oz, large – 32oz

■ Cities ranked 1-10, but the differences between the cities don’t make sense; you can’t know how much better life is in city 1 vs city 2

● Also shouldn’t be used in calculations

● Interval


○ Data with a definite ordering but no starting point (no true zero); the differences can be measured, but ratios are not meaningful

○ Not only classifies and orders the measurements, but it also specifies that the

distances between each interval on the scale are equivalent along the scale from

low interval to high interval

○ Can be ordered and differences between the data make sense

○ Data at this level does not have a starting point

■ 0 degrees doesn’t mean absence of temperature

■ Think temperature: 10℃ + 10℃ = 20℃, but 20℃ is not twice as hot as 10℃. We can see this when we convert to Fahrenheit: 10℃ = 50℉, but 20℃ = 68℉.

● Ratio Data

○ Data with a starting point that can be ordered; the differences have meaning and

ratios can be calculated

○ All features of interval data plus absolute zero

■ Phrases such as “four times as likely” are actually meaningful

○ Defined as quantitative data with the same properties as interval data, plus an equal and definite ratio between values, with absolute “zero” treated as the point of origin

○ Tells us about the order and the exact distance between units

■ Height, weight, duration

■ Both descriptive and inferential statistics can be applied

■ Your highest level, your most sophisticated

■ Axis of whatever you are measuring

■ There can be no negative numeric value in ratio data

■ Amount of money in your pocket right now
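The temperature example above can be checked with a short sketch. It uses the standard Celsius-to-Fahrenheit conversion; the function and variable names are illustrative only. It shows that differences survive a change of scale in a consistent way, while ratios do not, which is why temperature in ℃ or ℉ is interval rather than ratio data.

```python
def celsius_to_fahrenheit(celsius):
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


low_c, high_c = 10.0, 20.0
low_f = celsius_to_fahrenheit(low_c)    # 50.0
high_f = celsius_to_fahrenheit(high_c)  # 68.0

# The ratio depends on the scale, so "twice as hot" is not meaningful.
ratio_c = high_c / low_c  # 2.0
ratio_f = high_f / low_f  # 1.36

# The difference is meaningful: a 10-degree gap in Celsius is always
# an 18-degree gap in Fahrenheit, wherever it sits on the scale.
diff_f = high_f - low_f
diff_f_elsewhere = celsius_to_fahrenheit(30.0) - celsius_to_fahrenheit(20.0)

print(f"ratio in C: {ratio_c}, ratio in F: {ratio_f:.2f}")
print(f"10 C gap in F: {diff_f} and {diff_f_elsewhere}")
```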

● Understand the difference between continuous and discrete variables.


○ Discrete data

■ Distinct gaps between values; there are no values in between the whole numbers

● A countable set of values; positive, whole numbers (like number of people)

○ Continuous data

■ Fractional values are possible between any two values

■ Capturing every moment of the process / any value within a given range

● Height, weight, etc.

■ Not restricted to separate values

■ Can take any value over a continuous range

● Age

● Why is it important to know #4 and #5 when performing statistical procedures?

○ Not all variable types can have statistical procedures performed on them

○ Affects what type of analytical techniques can be used on the data and what conclusions

can be drawn

○ Important to understand that they are just 2 different types of data which will explain the

relationship of the data & create a better understanding for analysis

○ Important because you always want to know the level of measurement before you start

analysis – you want to choose the right way of doing analysis

● What do we mean by inference?

○ Inference: causal

■ Something caused/influenced another thing

■ A caused by B

○ Concerned primarily with understanding the quality of parameter estimates

■ How sure are we that estimated xbar is near true population mean µ

○ Reliability of statistical relationships, typically on the basis of random sampling

● Would you need to perform any work regarding inference with population data?

○ No, inferential statistics allows you to make inferences about the population based on

sample data. No inferences would need to be made if you had population data.

● What is the purpose of hypothesis testing, and on what kind of data?

○ Hypothesis testing is the primary mechanism for making decisions based on observed

sample statistics

○ We want to know if there’s any relationship – causal or correlated

■ Related to the conclusion we can get/ pre-score and post-score see if there’s a

difference

■ Must be done with continuous sample data

○ The alpha level is the chance of being wrong (rejecting a null hypothesis that is actually true) that you agree to operate under

■ Working cautiously and understanding limitations

● What are the important components of hypothesis testing? What are the essential elements?


○ Read all the elements to understand what the question is about

○ Know the sample statistic – derived from your own data

○ Critical value – read off the curve

○ Compare the critical value to the test statistic you derive from your data

○ Based on the level of significance, you draw a conclusion

■ There are a lot of components: you have to have a dataset, construct your own hypothesis, and find the mean & variance to construct the analysis

● Null & alternative hypotheses

● Test statistic

● Sampling statistic

● Critical value

● Probability values and statistical significance

● Conclusions of hypothesis testing

● What are the steps in performing a hypothesis test?

○ 1. Specify the null hypothesis and alternative hypothesis

○ 2. State the assumptions / givens

■ Random sampling, known parameters, levels of measurement, known statistics

○ 3. Set the significance level (alpha value)

○ 4. Calculate the test statistic and corresponding p-value

○ 5. Draw a conclusion
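The five steps above can be sketched in code. This is a minimal illustration, not the course's SPSS or Stata procedure: it uses a one-sample z-test (which assumes the population standard deviation is known) because Python's standard library provides the normal distribution but not the t distribution. The data, the null value of 3.0, and sigma = 1.0 are all made up for the example.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical sample: hours of TV watched per day by 16 respondents.
sample = [2.0, 2.5, 3.0, 3.0, 3.5, 3.5, 3.5, 3.5,
          3.5, 3.5, 3.5, 3.5, 4.0, 4.0, 4.5, 5.0]

# Step 1: hypotheses. H0: mu = 3.0; Ha: mu != 3.0 (two-tailed).
mu_0 = 3.0

# Step 2: assumptions / givens. Random sample, continuous variable,
# population standard deviation assumed known (this is what makes it a z-test).
sigma = 1.0

# Step 3: significance level.
alpha = 0.05

# Step 4: test statistic and p-value.
n = len(sample)
standard_error = sigma / sqrt(n)
z = (mean(sample) - mu_0) / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails

# Step 5: conclusion.
reject_null = p_value < alpha

print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

With these made-up numbers the sample mean is 3.5, so z = 2.0 and the two-tailed p-value is just under 0.05: the null is rejected at alpha = 0.05 but would not be rejected at alpha = 0.01.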

● Be able to draw a “curve” and label that curve appropriately for a hypothesis test.

○ Plot number line below curve and be able to do the math

○ Make sure math matches curve

○ If it’s a two-tailed test, make sure you split the rejection region across the two sides of the curve

○ F is always one tail

○ If the question asks about “greater than or equal to”, it’s a one-sided test


● What alternative is there to a “curve”?

○ a. You can walk through the equation without drawing a curve

■ Ex: calculate the p-value and compare it to the significance level (alpha), or compare the test statistic to the critical value

○ You perform the test and afterwards tell people how to determine whether the result is significant or not

● How do tests of proportion differ from tests of means?


○ A test of proportions seeks to find a statistically significant difference between the

proportions of two groups. A test of means seeks to find a statistically significant

difference between the means of two groups.
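As a rough sketch of the proportion side of this comparison, the block below runs a two-proportion z-test with a pooled standard error. The counts are invented and the function name is illustrative. A test of means has the same overall shape (difference divided by standard error), but the difference is between group means and the standard error comes from the groups' standard deviations rather than from p(1 - p).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(successes_1, n_1, successes_2, n_2):
    """Two-tailed z-test for H0: the two population proportions are equal."""
    p1 = successes_1 / n_1
    p2 = successes_2 / n_2
    # Under H0 both groups share one proportion, so pool the two samples.
    pooled = (successes_1 + successes_2) / (n_1 + n_2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p1 - p2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: 60 of 100 in group 1 and 45 of 100 in group 2 say "yes".
z_prop, p_prop = two_proportion_z(60, 100, 45, 100)
print(f"z = {z_prop:.3f}, p = {p_prop:.4f}")
```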

● What is a sampling distribution and how is it derived?

○ A sampling distribution is a probability distribution of a statistic obtained through a large

number of samples drawn from a specific population

■ It tells us which outcomes we should expect for some sample statistics (mean, standard deviation, correlation, etc.)

○ Represents the distribution of the point estimates based on samples of a fixed size from a

certain population. It is useful to think of a particular point estimate as being drawn from

such distribution. Understanding the concept of a sampling distribution is central to

understanding statistical inference.

■ Example: a sampling distribution of the sample mean that is unimodal and approximately symmetric, centered exactly at the true population mean µ = 3.90. Sample means should tend to fall around the population mean.

● What are sampling distributions used for?

○ Knowing the sampling distribution is what lets us make inferences about the overall population from a single sample
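One way to see how a sampling distribution is derived is to simulate it. The sketch below builds a made-up population centered near 3.90 (the value used in the example above), draws many samples of a fixed size, and records each sample mean. The population, sample size, and number of repetitions are assumptions chosen only for illustration.

```python
import random
from math import sqrt
from statistics import mean, pstdev, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 values centered near 3.90.
population = [random.gauss(3.90, 1.0) for _ in range(10_000)]
mu = mean(population)
sigma = pstdev(population)

# Draw many samples of a fixed size and keep each sample's mean.
sample_size = 50
sample_means = [mean(random.sample(population, sample_size))
                for _ in range(1_000)]

# The collection of sample means approximates the sampling distribution.
center = mean(sample_means)   # should sit close to mu
spread = stdev(sample_means)  # should sit close to sigma / sqrt(n)

print(f"population mean = {mu:.3f}, mean of sample means = {center:.3f}")
print(f"spread of sample means = {spread:.3f}, "
      f"sigma / sqrt(n) = {sigma / sqrt(sample_size):.3f}")
```

The spread of the simulated sample means is the standard error of the mean, which is why it tracks sigma / sqrt(n) rather than sigma itself.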

● What is a significance level? How is it interpreted? (significance level = α)

○ Probability of error; we do our best to get as close as we can by restricting it to 5%, 1%, etc.

○ The significance level, also denoted as alpha or α, is the probability of rejecting the null hypothesis when it is true. For example, a significance level of .05 indicates a 5% risk of concluding that a difference exists when there is no actual difference (this corresponds to using a 95% confidence interval to evaluate the hypothesis test).


■ With this example, we will make an error whenever the point estimate is at least 1.96 standard errors away from the population parameter (about 5% of the time, 2.5% in each tail)
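The 1.96 cutoff and the 2.5% in each tail can be recovered from the standard normal distribution. This is a small sketch using Python's standard library; the helper name is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def two_tailed_critical_z(alpha):
    """Critical z that leaves alpha / 2 in each tail."""
    return std_normal.inv_cdf(1 - alpha / 2)


z_05 = two_tailed_critical_z(0.05)  # about 1.96
z_01 = two_tailed_critical_z(0.01)  # about 2.58

# Area beyond +1.96 in the upper tail: about 0.025.
upper_tail_area = 1 - std_normal.cdf(1.96)

print(f"alpha = .05 -> critical z = {z_05:.2f}")
print(f"alpha = .01 -> critical z = {z_01:.2f}")
print(f"area above 1.96 = {upper_tail_area:.4f}")
```

A smaller alpha pushes the critical value further out, so the sample has to land further from the null value before the null is rejected.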

● Can you set your level of significance anywhere?

○ Yes you can – you’re essentially making an assumption at the beginning of your statistical experiment, so you can set it to whatever you want

○ The lower the alpha (significance level), the more confident you can be in a significant result

■ Coming in with an alpha of .01 sets a stricter standard, so a finding that is still significant is less likely to be a false positive

● What do we mean by a “significant” finding?

○ The differences being studied are unlikely to be due to chance alone

● What are the basic things you need to perform a hypothesis test?

○ 1. Parameter & Statistic

■ parameter: summary description of a fixed characteristic or measure of the target

population. Denotes the true value that would be obtained if a census rather than a

sample were undertaken

● Mean (µ), variance (σ^2), standard deviation (σ), proportion (p)

■ Statistic: summary description of a characteristic or measure of the sample. The

sample statistic is used as an estimate of the population parameter

● Sample mean (xbar), sample variance (S^2), sample standard deviation (S),

sample proportion (pbar)

○ 2. Sampling Distribution: probability distribution of a statistic obtained through a large

number of samples drawn from a specific population

○ 3. Standard Error: similar to standard deviation – both are measures of spread. The higher the number, the more spread out the values are. The standard deviation describes the spread of individual observations, while the standard error describes the spread of a sample statistic (such as the sample mean) across repeated samples; for the mean it is estimated as S/√n

■ Tells you how far your sample statistic (such as the sample mean) is likely to deviate from the actual population mean. The larger your sample size, the smaller the SE and the closer your sample mean is likely to be to the actual population mean.

○ 4. Null hypothesis: a statement in which no difference or effect is expected

○ 5. Alternate hypothesis: a statement that some difference or effect is expected

○ Descriptive statistics

■ Brief descriptive coefficients that summarize a given data set, which can represent either the entire population or a sample of it; they summarize or describe the characteristics of a data set

■ Broken down into measures of central tendency (mean, median, mode) and measures of variability (spread – standard deviation, variance, minimum and maximum values, skewness)
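A short sketch of the descriptive statistics and the standard error listed above, using Python's standard library on a made-up set of eight scores. The standard error here is the usual estimate for the mean, S divided by the square root of n.

```python
from math import sqrt
from statistics import mean, median, mode, stdev, variance

# Hypothetical sample of eight scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency.
sample_mean = mean(scores)      # xbar
sample_median = median(scores)
sample_mode = mode(scores)

# Measures of variability (sample versions, dividing by n - 1).
sample_variance = variance(scores)  # S^2
sample_sd = stdev(scores)           # S
score_range = (min(scores), max(scores))

# Standard error of the mean: S / sqrt(n).
n = len(scores)
standard_error = sample_sd / sqrt(n)

print(f"mean = {sample_mean}, median = {sample_median}, mode = {sample_mode}")
print(f"S^2 = {sample_variance:.3f}, S = {sample_sd:.3f}, range = {score_range}")
print(f"SE of the mean = {standard_error:.3f}")
```

The standard deviation describes how spread out the eight scores are; the standard error is smaller because it describes how much the sample mean itself would vary from sample to sample.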

● What do you run on the computer at the very start of a hypothesis test? (Varies with type of test)

○ Run a frequency distribution to make sure your levels of measurement match the

procedures you want to do

● What is a test statistic and how many test statistics have we worked with so far?

○ A test statistic measures how far the sample data fall from what the null hypothesis predicts. Its observed value changes randomly from one random sample to another. A test statistic contains the information about the data that is relevant for deciding whether or not to reject the null hypothesis

○ Hypothesis test → Test statistic

■ Z-test → Z-statistic

■ t-test → t-statistic

■ ANOVA → F-statistic

■ Chi-square tests → Chi-square statistic

● What is a frequency distribution and a cross tabulation and how do you interpret them?

○ Frequency distribution: shows you how common values are within the variable

■ We can get an idea about whether something is a continuous or categorical

variable/ snapshot view of the characteristics of a data set – allows you to see how

scores are distributed across the whole set of scores (spread evenly, skew, etc.)

● SPSS steps: click on analyze —> descriptive statistics —> frequencies

○ Move the variable of interest into the right-hand column

○ Click on the chart button, select histograms, and press continue and

OK to generate distribution table

○ Cross tabulations: show where the variables have something in common, seen at the intersection of the row and the column

■ summarize the association between two categorical variables

■ joint frequency distribution of cases based on two or more categorical variables

● SPSS steps: analyze —> descriptive statistics —> select cross tabulation

○ Here you will see Rows and Columns. You can select one or more

than one variable in each of these boxes, depending on what you have

to compare, then click on OK.

■ For percentages – analyze —> descriptive statistics —>

crosstabs —> cells —> under percentage, select all 3 options
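Outside SPSS, the same two tables can be sketched by hand to see what they contain. The block below uses Python's standard library on a small made-up set of respondents (sex and marital happiness); the data and variable names are hypothetical and are not taken from the GSS.

```python
from collections import Counter

# Hypothetical respondents: (sex, happiness in marriage).
respondents = [
    ("female", "very happy"), ("female", "very happy"),
    ("female", "very happy"), ("female", "pretty happy"),
    ("male", "very happy"), ("male", "very happy"),
    ("male", "pretty happy"), ("male", "pretty happy"),
]

# Frequency distribution: how common each value of one variable is.
happiness_freq = Counter(happiness for _, happiness in respondents)

# Cross tabulation: joint counts for each (row, column) combination.
crosstab = Counter(respondents)

# Row percentages: each cell as a share of its row (sex) total.
row_totals = Counter(sex for sex, _ in respondents)
row_percent = {
    cell: 100 * count / row_totals[cell[0]]
    for cell, count in crosstab.items()
}

print(happiness_freq)
for (sex, happiness), pct in sorted(row_percent.items()):
    print(f"{sex:6} | {happiness:12} | {pct:.0f}%")
```

Reading across a row answers questions such as "what share of women report being very happy?", which is the comparison the row percentages in a crosstab are meant to support.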

● Can you determine the level of measurement from a frequency distribution?

○ Yes, the values listed for the variable in a frequency distribution should indicate its level of measurement – which is typically categorical

● What is the purpose of an analysis of variance? Is it relevant for data that comes in proportions?

○ ANOVA uses a single hypothesis test to check whether the means across many groups are equal: H0: The mean outcome is the same across all groups. In statistical notation, µ1 = µ2 = … = µk, where µi represents the mean of the outcome for observations in category i. HA: At least one mean is different. Generally we must check three conditions on the data before performing ANOVA:

■ the observations are independent within and across groups,

■ the data within each group are nearly normal, and

■ the variability across the groups is about equal

● How do you calculate Eta^2 from ANOVA and how do you interpret it? (from the reading)

○ A measure in ANOVA that tells you how much of the variance in the dependent variable is associated with the independent variable

○ Eta squared (η^2) is the proportion of the total variance that is attributed to an effect. It is calculated as the ratio of the effect variance (SSeffect) to the total variance (SStotal)

○ We will be given the value and just need to interpret it on the test

■ Example: Total SS: 62.29, Anxiety SS: 4.08 —> 4.08/62.29 = 6.6%

● 6.6% of variance is associated with anxiety
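The arithmetic in the example above, written out as a tiny sketch (the function name is illustrative):

```python
def eta_squared(ss_effect, ss_total):
    """Proportion of total variance attributed to the effect."""
    return ss_effect / ss_total


# Values from the example: Anxiety SS = 4.08, Total SS = 62.29.
eta2 = eta_squared(4.08, 62.29)
print(f"eta^2 = {eta2:.4f}, or about {eta2 * 100:.1f}% of the variance")
```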

● What kind of data is needed for an analysis of variance?

○ The dependent variable must be at a continuous (interval or ratio) level of measurement

○ The independent variable must be a categorical (nominal or ordinal) variable

■ Two way ANOVA has 2 independent variables

● Females may have higher IQ scores compared to males, but this difference

could be greater or less in European countries compared to North American

countries

○ ANOVA assumes: data is normally distributed, homogeneity of variance (variance among

groups should be approx. equal), observations independent of each other

● How does ANOVA work with both means and variances?

○ Inferences about means are made by analyzing variance

● What is the equation for ANOVA?

○ F = MST / MSE

■ where F = ANOVA test statistic, MST = mean sum of squares due to treatment, MSE = mean sum of squares due to error

■ MST = SST / (p - 1)

■ SST = ∑ n_i (xbar_i - xbar)^2

● where SST = sum of squares due to treatment, p = total number of groups (populations), n_i = number of observations in group i, xbar_i = mean of group i, and xbar = overall (grand) mean

■ MSE = SSE / (N - p)

■ SSE = ∑ (n_i - 1) S_i^2

● where SSE = sum of squares due to error, S_i = standard deviation of the sample in group i, and N = total number of observations

○ Equivalently, F = MSbetween / MSwithin
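The formulas above can be followed step by step on a tiny made-up data set. This is a sketch, not the SPSS or Stata output: the three groups and their scores are invented, and because Python's standard library has no F distribution, the result is compared to a critical value looked up in a standard F table (5.14 for 2 and 6 degrees of freedom at alpha = .05) instead of computing a p-value.

```python
from statistics import mean, variance

# Hypothetical exam scores for three lecture sections.
groups = {
    "A": [4.0, 5.0, 6.0],
    "B": [6.0, 7.0, 8.0],
    "C": [8.0, 9.0, 10.0],
}

all_scores = [score for scores in groups.values() for score in scores]
grand_mean = mean(all_scores)
p = len(groups)       # number of groups
N = len(all_scores)   # total number of observations

# SST: sum over groups of n_i * (group mean - grand mean)^2.
sst = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())

# SSE: sum over groups of (n_i - 1) * S_i^2.
sse = sum((len(s) - 1) * variance(s) for s in groups.values())

mst = sst / (p - 1)   # mean square between groups
mse = sse / (N - p)   # mean square within groups
f_stat = mst / mse

# Critical value from a standard F table: df = (2, 6), alpha = .05.
f_critical = 5.14
reject_null = f_stat > f_critical

# Eta squared: share of total variance associated with group membership.
eta2 = sst / (sst + sse)

print(f"SST = {sst}, SSE = {sse}, F = {f_stat}, eta^2 = {eta2}")
print(f"reject H0 (all means equal): {reject_null}")
```

Here the group means (5, 7, 9) are far apart relative to the spread inside each group, so the between-group mean square is much larger than the within-group mean square and F is large.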

● What kind of conclusion are we looking to draw from an ANOVA procedure? What is ALL that we can report?

○ We are looking to see if the means between groups are statistically equal to one another,

which is all we can report.

○ P-value and Eta^2

● What are we able to conclude from linear regression that we have not been able to conclude with

other procedures? Based on what?

○ The change in the dependent variable (positive or negative) associated with a 1-unit change in the independent variable, based on the slope.

○ Which group is significantly different from the others (by coding each group as a binary independent variable).
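To make the slope interpretation concrete, here is a minimal least-squares sketch on five made-up data points. It computes the slope, intercept, and R-square by hand and then predicts the dependent variable for a new value of the independent variable. The data are hypothetical, and the significance test on the slope (its t statistic and p-value) is left to SPSS or Stata.

```python
from statistics import mean

# Hypothetical data: x = independent variable, y = dependent variable.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar, y_bar = mean(x), mean(y)

# Sums of squares and cross-products around the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

slope = s_xy / s_xx                   # change in y per 1-unit change in x
intercept = y_bar - slope * x_bar     # predicted y when x = 0
r_squared = s_xy ** 2 / (s_xx * s_yy) # share of variance in y explained by x


def predict(x_new):
    """Predicted value of the dependent variable for a given x."""
    return intercept + slope * x_new


print(f"y-hat = {intercept:.1f} + {slope:.1f} * x, R-square = {r_squared:.1f}")
print(f"predicted y at x = 6: {predict(6.0):.1f}")
```

With these numbers each 1-unit increase in x is associated with a 0.6-unit increase in y, and x accounts for 60% of the variance in y.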

● What level of variable measurement is ideal for regression? Why?

○ Continuous variable

○ Any time you’re working with means, you want to be working with ratios because you want

to be able to have continuous data with an absolute zero

● Why are certain levels of measurement problematic?

○ TA doesn’t think they are problematic, but – for some variables getting the mean doesn’t

make sense

■ If not continuous, maybe it’s not normally distributed

OTHER NOTES / READING NOTES

● Descriptive statistics: uses the data to provide descriptions of the population, either through

numerical calculations or graphs or tables

● Inferential statistics: makes inferences and predictions about a population based on a sample of

data taken from the population in question

● ANOVA

○ Analysis of variance using a test statistic F/ uses single hypothesis test to check whether

the means across many groups are equal

■ Null: mean outcome is the same across all groups; Alternate: at least one mean is

different

○ Interval or ratio level data

○ 3 conditions before performing ANOVA:

■ the observations are independent within and across groups

■ The data within each group are nearly normal

■ The variability across the groups is about equal

○ Example: consider a stats department that runs three lectures of an introductory stats

course. We might like to determine whether there are statistically significant differences in

first exam scores in these three classes (A,B, and C). Describe appropriate hyp…
