
Math1041 assignment pdf

Course: Statistics for Life and Social Science (MATH1041)


University: University of New South Wales

Academic year: 2022/2023


Gabriela Prasetiyo z5391941


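1.a. The following RStudio code is used to determine the standard deviation of ATAR:

ATAR1 <- as.numeric(unlist(data$ATAR))  # convert the ATAR column to a numeric vector
ATAR2 <- ATAR1[!is.na(ATAR1)]            # remove missing (NA) values
sd(ATAR2)                                # sample standard deviation

sd(ATAR2) = 14.73493 ≈ 14.73 (4 s.f.)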

1.b. The following formula and RStudio code are used to determine Daniel's z-score:

$$z = \frac{x - \mu}{\sigma}$$

z_score <- (92.7 - mean(ATAR2)) / sd(ATAR2)  # Daniel's ATAR is 92.7
z_score

z-score = 0.5847759 ≈ 0.5848 (4 s.f.)

If x is an observation from a normal distribution with mean $\mu$ and standard deviation $\sigma$, its z-score is the number of standard deviations that x lies from the mean. Daniel's ATAR z-score is positive, indicating that his ATAR lies about 0.5848 standard deviations to the right of (above) the mean $\mu$.

1.c. Before computing any numerical summary, Daniel should first have filled in the NA values using a method called imputation. One imputation method that can be used in RStudio is central imputation, in which the missing values are replaced by a central value (the mode, median, or mean) of the specified dataset.
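A minimal sketch of central imputation in base R, assuming the values are held in a numeric vector with NA entries. The vector `atar_raw` and the helper `impute_central()` are hypothetical names used only for illustration; the mode is left out because base R has no built-in function for the statistical mode.

```r
# Hypothetical helper: replace NA values with a central value of the observed data
impute_central <- function(x, method = c("median", "mean")) {
  method <- match.arg(method)
  centre <- if (method == "median") median(x, na.rm = TRUE) else mean(x, na.rm = TRUE)
  x[is.na(x)] <- centre   # every missing entry gets the same central value
  x
}

# Hypothetical ATAR values with two missing entries (not the assignment data)
atar_raw <- c(85.5, NA, 92.7, 70.1, NA, 99.0)
atar_median <- impute_central(atar_raw, "median")
atar_mean   <- impute_central(atar_raw, "mean")
print(atar_median)
print(atar_mean)
```

Filling every gap with one central value tends to understate the spread, so sd() computed on imputed data will usually be smaller than sd() on the observed values alone.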

1.d. Isabella is right. The different distributions of the data can make the bar chart misleading: by relying on measures of central tendency alone, the graph suggests that the data are normally distributed without any outliers, which is incorrect. There are also incomplete data with NA values.

1.f. The differences are that most female students chose "Labor", while most male students chose "Liberal", and more male students than female students have a political preference between "Liberal" and "Labor". Also, more females than males voted.
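Because more females than males responded, raw counts alone can exaggerate differences between genders; comparing proportions within each gender avoids this. A sketch in base R, assuming two character vectors `gender` and `party` (hypothetical names and values, not the assignment data):

```r
# Hypothetical survey responses (illustrative only)
gender <- c("Female", "Female", "Female", "Female", "Female", "Male", "Male", "Male")
party  <- c("Labor", "Labor", "Labor", "Liberal", "Other", "Liberal", "Liberal", "Labor")

counts <- table(gender, party)        # two-way frequency table (rows = gender)
row_props <- prop.table(counts, 1)    # proportions within each gender; each row sums to 1
print(counts)
print(round(row_props, 2))
```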


2. The explanatory variable is the type of high school attended, and it is categorical.

2. The response variable is the WAM, and it is quantitative.

2. It is an observational study because the responses are observed and the variables are measured, but no treatments are imposed, as they would be in an experiment.

2. In increasing order of spread, the groups are Australian public school, Australian private school, Australian selective school, and non-Australian high school. The distribution for Australian public schools is skewed to the left, the distribution for Australian private schools is symmetric, and the distributions for both Australian selective schools and non-Australian high schools are skewed to the right. Australian public and Australian selective schools each have an outlier at one end, while Australian private and non-Australian high schools have no outliers. An outlier means that there was one unusual score compared with the other scores. The medians of Australian public and Australian selective schools are almost identical, while non-Australian high schools have a lower median. Non-Australian high schools have more spread than the other groups.
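A sketch of how the spread, centre and outliers behind this comparison could be summarised in base R, assuming WAM values and a school-type factor. The names `wam` and `school` and all values are hypothetical, and using the IQR as the measure of spread is an assumption, since the text does not say which measure was read off the plot.

```r
# Hypothetical WAM values for four school types (illustrative only)
wam <- c(65, 70, 72, 75, 80,      # Australian public
         68, 71, 73, 74, 76,      # Australian private
         70, 74, 75, 77, 90,      # Australian selective
         50, 60, 70, 80, 88)      # non-Australian
school <- factor(rep(c("Public", "Private", "Selective", "Non-Australian"), each = 5))

groups <- split(wam, school)          # one numeric vector per school type
spread <- sapply(groups, IQR)         # interquartile range of each group
centre <- sapply(groups, median)      # median of each group
bp <- boxplot(groups, plot = FALSE)   # boxplot statistics without drawing the plot
print(sort(spread))                   # groups in increasing order of spread
print(centre)
print(bp$out)                         # points more than 1.5 box-lengths beyond the box
```

Calling boxplot(groups) without plot = FALSE draws the side-by-side boxplots used to judge shape and outliers.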

2. Confounding variables are variables that influence both the explanatory (independent) and the response (dependent) variables, leading to a misleading relationship between the two. In this case, a possible confounding variable is the school curriculum, because it is linked to the type of high school and also affects the students' WAM: students from Australian high schools that teach the curriculum of New South Wales, Victoria, Queensland, etc. seem to achieve a better WAM than students who attended a non-Australian high school with a foreign curriculum.

3. The residual measures the difference between the observed and predicted values (residual = observed − predicted). In this context, it is the difference between Daniel's current WAM and his predicted WAM. Daniel's residual is negative, meaning that the predicted value is greater than the observed value.
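A minimal sketch of the residual calculation with a least-squares line in base R. The predictor (ATAR), the data, and the student with ATAR 92.7 and observed WAM 70 are all hypothetical; the text does not state which model produced Daniel's predicted WAM.

```r
# Hypothetical data: predicting WAM from ATAR (illustrative values only)
atar <- c(80, 85, 90, 95, 99)
wam  <- c(62, 66, 71, 74, 80)

fit <- lm(wam ~ atar)                 # least-squares regression line
res <- residuals(fit)                 # observed WAM minus fitted WAM for each student

# Residual for one hypothetical student with ATAR 92.7 and observed WAM 70
pred_wam <- predict(fit, newdata = data.frame(atar = 92.7))
resid_new <- 70 - unname(pred_wam)    # negative here: the predicted WAM is above the observed WAM
print(c(predicted = unname(pred_wam), residual = resid_new))
```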


4. The formula for the confidence interval is

$$CI = \bar{x} \pm t^{*}\frac{s}{\sqrt{n}}$$

From mean(data$WAM) and sd(data$WAM), the sample mean and sample standard deviation are

$$\bar{x} = 72.\ldots, \qquad s = 7.\ldots$$

$t^{*}$ is the critical value from the $t(n-1)$ distribution, where $n - 1$ is the degrees of freedom (df).

n = 108

Therefore, we use $t(n - 1) = t(108 - 1) = t(107)$.

Since it is a 95% confidence interval, the required quantile is

$$0.95 + \frac{1 - 0.95}{2} = 0.975$$

and in RStudio, qt(0.975, df = 107) ≈ 1.982.

Then we substitute these values into the confidence interval formula for the lower and upper bounds:

$$CI = \left[72.\ldots - 1.982 \times \frac{7.\ldots}{\sqrt{108}},\; 72.\ldots + 1.982 \times \frac{7.\ldots}{\sqrt{108}}\right]$$

Therefore, the 95% confidence interval for the mean WAM is (71, 73).
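The same calculation can be sketched as a small R function working from summary statistics. `t_ci()` is a hypothetical helper, and the inputs 72 and 7 below are placeholders, not the (truncated) values above; with the raw data, t.test(data$WAM)$conf.int computes the same t interval directly.

```r
# Hypothetical helper: t-based confidence interval for a mean from summary statistics
t_ci <- function(xbar, s, n, level = 0.95) {
  t_star <- qt(1 - (1 - level) / 2, df = n - 1)  # upper critical value, e.g. qt(0.975, df = 107)
  me <- t_star * s / sqrt(n)                      # margin of error
  c(lower = xbar - me, upper = xbar + me, t_star = t_star, me = me)
}

# Placeholder inputs; substitute mean(data$WAM) and sd(data$WAM)
out <- t_ci(xbar = 72, s = 7, n = 108)
print(round(out, 3))
```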

4. From the confidence interval above, the margin of error is

$$ME = t^{*}\frac{s}{\sqrt{n}} = 1.982 \times \frac{7.\ldots}{\sqrt{108}} = 1.\ldots \text{ (4 s.f.)}$$

Because incomplete surveys were removed, non-response in this case will not affect the margin of error.

4. We can estimate the true mean using the sample mean and standard deviation, whereas the confidence interval only provides lower and upper bounds. As a result, the statement is untrue.

A confidence interval is not a measure of probability in the sense stated in the question. The true mean is a fixed parameter that does not vary from sample to sample, so it is either within the limits of a given interval or beyond them. Unlike the sample mean, which varies from sample to sample, the true mean's position relative to those limits cannot be expressed as a probability.
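A small simulation can illustrate this point: the population mean is held fixed, each random sample produces a different interval, and the 95% describes how often the procedure captures $\mu$ in the long run. The population mean, standard deviation and seed below are hypothetical.

```r
# Simulation sketch with hypothetical population values: mu is fixed, the intervals vary
set.seed(1041)                           # arbitrary seed for reproducibility
mu <- 72; sigma <- 7; n <- 108           # assumed population mean, SD and sample size
reps <- 10000
t_star <- qt(0.975, df = n - 1)

covered <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)   # a fresh random sample each time
  me <- t_star * sd(x) / sqrt(n)         # margin of error for this sample
  abs(mean(x) - mu) <= me                # TRUE if this interval contains mu
})

coverage <- mean(covered)                # long-run proportion of intervals containing mu
print(coverage)
```

Each individual interval either contains mu or does not; only the long-run proportion is close to 0.95.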


The p-value is

$$p = P(T \geq t)$$

where $t$ is the observed test statistic, so

$$P(T \geq 1.\ldots) = 1 - P(T < 1.\ldots)$$

The null distribution is $t(n - 1) = t(108 - 1) = t(107)$. In RStudio, the p-value is 1 - pt(1.…, df = 107) = 0.… (4 s.f.).

Therefore, we conclude that there is little to no evidence of an increase in students' stress levels at Randwick university during Covid compared with the pre-pandemic situation.
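A sketch of this one-sided p-value calculation in R, assuming the data are paired differences (stress during Covid minus pre-pandemic stress) and the null hypothesis is a mean difference of 0. The vector `d` is hypothetical, not the survey data; the manual result is checked against t.test().

```r
# Hypothetical paired differences (during Covid minus pre-pandemic stress scores)
d <- c(0.5, -1.2, 2.0, 0.3, -0.4, 1.1, -0.8, 0.9, 0.2, -0.6)

n <- length(d)
t_obs <- mean(d) / (sd(d) / sqrt(n))      # one-sample t statistic for H0: mean difference = 0
p_manual <- 1 - pt(t_obs, df = n - 1)     # P(T >= t_obs) under the t(n - 1) null distribution

p_builtin <- t.test(d, mu = 0, alternative = "greater")$p.value
print(c(t = t_obs, p = p_manual))
```

pt(t_obs, df = n - 1, lower.tail = FALSE) gives the same upper-tail probability and is more accurate when the p-value is very small.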

The observations are independent.
The distribution of each random variable is normal, with the same mean $\mu$ and standard deviation $\sigma$.

Both conditions above are satisfied. The dataset was generated at random, with each individual having an equal chance of being chosen, so the differences between stress levels at Randwick university during Covid and before the pandemic are independent; the quantile plot also suggests that these differences are approximately normal.
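A sketch of the quantile-plot check in base R. The differences below are simulated stand-ins (only n = 108 comes from the text), and the correlation between theoretical and sample quantiles is used here as a rough numeric companion to the plot, not as a formal test. Independence cannot be checked this way; it comes from the random sampling design.

```r
set.seed(2022)                          # arbitrary seed
d <- rnorm(108, mean = 0.3, sd = 1.5)   # hypothetical differences, n = 108 as in the text

qq <- qqnorm(d, plot.it = FALSE)        # theoretical vs sample quantiles, no plot drawn
qq_cor <- cor(qq$x, qq$y)               # close to 1 when the points lie near a straight line
print(qq_cor)

# Interactive use: qqnorm(d); qqline(d)
```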


