How To Lie With Statistics Summary - How to Read a Paper

Academic year: 2017/2018

Listed book: How to Lie With Statistics

Author: Darrell Huff

How To Lie With Statistics Summary

The Sample with the Built-in Bias

Response Bias: Tendency for people to over- or under-state the truth

Non-response: People who complete surveys are systematically different from those who fail to respond. Accessibility/Pride.

Representative Sample: One where all sources of bias have been removed. (Literary Digest)

Questionnaire wording/Interviewer effects

Recall Bias: Tendency for one group to remember prior exposure in retrospective studies

The sample with a built-in bias: most problems with a statistic originate in its sample. Any statistic is based on some sample (because the whole population can't be tested), and every sample has some sort of bias, even if the person wanting the statistic tries hard not to create any. A few of the biases that creep in when building a statistic: respondents not replying honestly, the market researcher picking a sample that gives better numbers, answers colored by the respondent's perception of the market researcher, and data not being available for a certain past time. One example (from the 1950s) that the author mentions is a readership survey of two magazines. Respondents were asked which magazine they read the most, Harper's or True Story. Most respondents said that they read Harper's, but the publishers' figures showed that True Story had a much higher circulation than Harper's, refuting the results from the sampling. The reason for this discrepancy: people were not willing to answer truthfully. As Dr. House says: Everybody Lies! Summary of the chapter: given any statistic, question the sample that was taken. Assume that there is always a bias in the sample.
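To see how far response bias alone can move a survey result, here is a minimal sketch. All of the numbers are hypothetical and are not taken from the book: it assumes 1000 respondents, of whom 200 really read magazine A and 800 really read magazine B, and that 75% of B's readers claim A when asked.

```python
def reported_share(true_readers_a: int, true_readers_b: int, misreport_rate_b: float) -> float:
    """Share of respondents who CLAIM magazine A when a fraction of
    magazine B's readers say "A" instead of the truth."""
    switched = true_readers_b * misreport_rate_b  # B readers who answer "A"
    total = true_readers_a + true_readers_b
    return (true_readers_a + switched) / total


# Hypothetical figures, chosen only for illustration.
true_share_a = 200 / (200 + 800)
survey_share_a = reported_share(200, 800, misreport_rate_b=0.75)

print(f"true share of A:     {true_share_a:.0%}")
print(f"surveyed share of A: {survey_share_a:.0%}")
```

Under these assumptions the survey reports the minority magazine as the clear favorite, even though the sample itself was counted correctly.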

The Well-Chosen Average

Arithmetic Mean: Evenly distributes the total among individuals. Can be unrepresentative when measurements are highly skewed right. (e.g. per capita income)

Median: Value dividing distribution into two equal parts. 50th percentile. (e.g. median household income)

Mode: Most frequently observed outcome (rarely reported with numeric data)

The well-chosen average: how not qualifying an average can change the meaning of the data. Before I delve into this, quickly, when I say average, what comes to your mind? Sum(x1...xn) / N, right? The arithmetic mean. But I said average, not arithmetic average, did I? Not many people know that there are 3 averages:

Arithmetic average / mean - sum of quantities / number of quantities

Median - the middle point of the data, i.e. the midpoint when the data is sorted

Mode - the data point that occurs the most in a given set of data

And when someone says average, leaving it unqualified, there is a lot of room for juggling. The author mentions a very simple example. If an organization publishes a statistic that the average pay of its employees is $1000, what does this mean? Most of us take it to mean that almost everyone makes around $1000; the reader assumes the figure is typical, like a median. But the corporation can be quoting an arithmetic mean, where the boss earns, say, $10,500 and the other 19 employees earn $500 each: the mean is still $1000, while the median and the mode are both $500. Just by not qualifying the average, the published fact can be completely twisted out of form from the real facts. The way out: always ask what kind of average someone is talking about.
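The payroll example can be checked with a few lines of Python. This is a small sketch using the standard `statistics` module; the variable names are illustrative, and the figures are the ones from the example (one boss at $10,500 and 19 employees at $500).

```python
from statistics import mean, median, mode

# One boss at $10,500 and 19 employees at $500 each.
payroll = [10_500] + [500] * 19

averages = {
    "mean": mean(payroll),      # total pay divided by head count
    "median": median(payroll),  # middle value of the sorted payroll
    "mode": mode(payroll),      # most frequently occurring salary
}

for name, value in averages.items():
    print(f"{name:>6}: ${value:,.0f}")
```

All three numbers are legitimately "the average", yet only the mean reaches $1000; the median and the mode both describe what a typical employee actually earns.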

The Little Figures That Are Not There

Small samples: Estimators with large standard errors, can provide seemingly very strong effects

Low incidence rates: Need very large samples for meaningful estimates of low frequency events

Significance levels/margins of error: Measures of the strength and precision of inference

Ranges: Report ranges or standard deviations along with means (e.g. "normal" ranges)

Inferring among individuals versus populations

Clearly label chart axes

The little figures that are not there: This chapter is about how sample data can be picked to prove a desired result, something we are all too aware of in marketing campaigns. Picking the sample data "right" can mean picking a sample size, or a small number of trials, that gives the kind of results we are looking for. The author demonstrates this with a very important issue for parents: is my child normal or not? The author talks about the 'Gesell Norms', where Dr Gesell stated that most kids sit erect by the age of two. A parent immediately compares this with his/her own kid and decides whether the kid is normal or not. What is missing in this case is that, on the way from the source of the information (the research) to the Sunday paper where the parent reads it, the average has been changed from a range to an exact figure. If the writer of the Sunday magazine article had mentioned that there is a range of ages in which a child sits erect, the reader would be reassured; the omitted range is one of the little figures that are not there. The way out: ask whether the information presented is a single exact figure or whether there is a range involved.
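Why small samples can show seemingly strong effects is easier to see from the standard error of a sample proportion, sqrt(p(1 - p) / n). The sketch below is illustrative only: the true proportion of 0.5 and the sample sizes are assumptions, not figures from the book.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion for true proportion p and sample size n."""
    return math.sqrt(p * (1 - p) / n)


# Assumed true proportion; the sample sizes are arbitrary illustrations.
p = 0.5
for n in (10, 100, 1_000, 10_000):
    se = standard_error(p, n)
    print(f"n={n:>6}: standard error = {se:.4f}")
```

Because the standard error shrinks only with the square root of n, a result from a dozen cases can land far from the true value by chance alone, which is why the sample size is one of the figures to ask for.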

Much Ado about Practically Nothing

Probable Error: Estimation error with probability 0.5. If the estimator is approximately normal, PE is approximately 0.67 standard errors. (Old school)

Margin of Error: Estimation error with probability 0.95. If the estimator is approximately normal, the margin of error is approximately 2 standard errors.

Clinical (practical) significance: In very large samples an effect may be significant statistically, but not in a practical sense. Report confidence intervals as well as P-values.

Much ado about practically nothing: This little chapter is about errors in measurement. There are two measures of error: Probable Error and Standard Error. The probable error expresses how far off a measurement may be, given the imprecision of the measuring device. For example, if you were using a measuring scale that is 3 inches off in a foot, then your measurements across trials are only good to +/- 3 inches. This kind of difference becomes important when business decisions are taken based on a positive or negative result.
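For a normally distributed estimator, the probable error is the half-width that contains the estimate with probability 0.5, and the usual margin of error is the half-width for probability 0.95. A short sketch with Python's `statistics.NormalDist` (Python 3.8+) recovers both multipliers; the helper `error_bounds` and the standard error of 1.5 are hypothetical, added only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Half-width covering 50% of a normal distribution: the 75th percentile.
probable_error_multiplier = std_normal.inv_cdf(0.75)
# Half-width covering 95% of a normal distribution: the 97.5th percentile.
margin_of_error_multiplier = std_normal.inv_cdf(0.975)


def error_bounds(standard_error: float) -> tuple:
    """Return (probable error, margin of error) for a given standard error."""
    return (probable_error_multiplier * standard_error,
            margin_of_error_multiplier * standard_error)


pe, me = error_bounds(1.5)  # hypothetical standard error
print(f"probable error = {pe:.2f}, margin of error = {me:.2f}")
```

The multipliers come out near 0.67 and 1.96, so a difference smaller than these bounds may be nothing more than measurement noise.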

The Gee-Whiz Graph

The Gee-Whiz graph: This one is something that we see quite often: how to manipulate a graph so that it shows an inflated or deflated picture of whatever is being plotted. Some tricks include omitting the unit of measure on an axis, or not labeling the axis at all, leaving only numbers and letting the reader make his/her own assumptions.

The One-Dimensional Picture

The one-dimensional picture: This one is an interesting trick. The trick here is to use some sort of symbol (a money bag, a factory symbol, things like that) on the graph. So, when showing the growth of, say, the factory, increase the size of the factory image, and increase it across all the dimensions. An example: you try to display a difference in pay scale. In a bar chart, you'd have one bar with a measure of (say) 10 and another of (say) 30, so the 1:3 ratio is clear when you see the chart. Now picture a money bag of similar proportions - one with a money bag of size 1 and the other
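The effect of scaling a pictorial symbol in every dimension can be worked out directly. This is a minimal sketch; the function name `perceived_ratios` is hypothetical, and the factor of 3 is the 1:3 pay ratio from the bar-chart example.

```python
def perceived_ratios(linear_scale: int) -> dict:
    """How much bigger a symbol looks when every dimension is scaled by linear_scale."""
    return {
        "height": linear_scale,       # what a bar chart would show
        "area": linear_scale ** 2,    # ink on the page for a flat picture
        "volume": linear_scale ** 3,  # impression given by a picture drawn as a solid
    }


ratios = perceived_ratios(3)
print(ratios)
```

A symbol drawn three times as tall and three times as wide covers nine times the area, and if the reader pictures it as a solid object it suggests twenty-seven times the volume, far more than the 1:3 ratio in the data.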
