Full notes introduction to statistics

Introduction to statistics (cumt105)

Academic year: 2017/2018

Contents

1 Introduction
1.1. Overview of Statistics
1.2. Definition of terms
1.3. Sampling Techniques
1.4. Probability Sampling methods
1.4.1. Simple Random Sampling
1.4.2. Systematic Random Sampling
1.4.3. Stratified Sampling
1.4.4. Cluster Sampling
1.5. Non-probability sampling methods
1.5.1. Convenience or Availability
1.5.2. Quota / Proportionate
1.5.3. Expert or Judgemental
1.5.4. Chain referral / Snowballing / Networking
1.6. Errors in sampling
1.7. Data Collection Methods
1.7.1. Observation
1.7.2. Interview
1.7.3. Experimentation

2 Data and Data Presentation
2.1. Introduction
2.2. Data Types
2.2.1. Qualitative random variables
2.2.2. Quantitative random variables
2.3. Data sources
2.4. Data presentation
2.4.1. Pie Charts
2.4.2. Bar Chart
2.4.3. Histograms
2.4.4. Stem and leaf diagram
2.4.5. Frequency Polygons
2.5. Exercises

3 Measures of Central Tendency
3.1. Introduction
3.2. Measures of Central Tendency
3.3. Arithmetic Mean
3.3.1. Mean for ungrouped data
3.3.2. Mean for grouped data
3.4. The Mode
3.4.1. Mode for ungrouped data
3.4.2. Mode for grouped data
3.5. The Median
3.5.1. Median for ungrouped data
3.5.2. Median for grouped data
3.6. Quartiles
3.6.1. Quartiles for ungrouped data
3.6.2. Quartiles for grouped data
3.6.3. The second quartile, Q2 (Median)
3.6.4. The upper quartile, Q3
3.6.5. Percentiles
3.7. Skewness
3.8. Kurtosis
3.9. Exercises

4 Measures of Dispersion
4.1. Introduction
4.2. Range
4.3. Variance
4.4. Standard deviation
4.5. Coefficient of variation
4.6. Exercises

5 Basic Probability
5.1. Introduction
5.2. Definition
5.3. Approaches to probability theory
5.4. Properties of probability
5.5. Basic probability concepts
5.6. Types of events
5.7. Laws of probability
5.8. Types of probabilities
5.9. Contingency Tables
5.10. [...] diagram
5.11. [...] rules
5.11.1. Multiplication Rule
5.11.2. Permutations
5.11.3. Combinations
5.12. [...]

6 Probability Distributions
6.1. Introduction
6.2. Definition
6.3. Random variables
6.4. Discrete probability distribution
6.5. Properties of discrete probability mass function
6.6. Probability terminology and notation

[...]

10 Index numbers
10.1. [...]
10.2. [...]
10.3. What is an Index Number?
10.3.1. Characteristics of Index Numbers
10.3.2. Uses of Index Numbers
10.4. [...] of Index Numbers
10.5. [...] of constructing index numbers
10.5.1. Aggregate Method
10.5.2. Merits and demerits of this method
10.5.3. Weighted Aggregates Index
10.5.4. Laspeyres Method
10.5.5. Merits and demerits of Laspeyres method
10.5.6. Paasche's Method
10.5.7. Merits and Demerits of Paasche's Index
10.6. [...] Index

Chapter 1
Introduction
1.1. Overview of Statistics

Statistics is the process by which individual data values are collected, summarized, analysed and presented, and then used for decision making. It is an important tool for transforming raw data into meaningful and usable information. Statistics can also be regarded as a decision support tool. The table below shows the transformation of data into information.

Input             Process                  Output
Data              Statistical analysis     Information
Raw observation   Transformation process   Useful, usable and meaningful

An understanding of statistics allows managers to:
i) Perform simple statistical analysis.
ii) Intelligently prepare and interpret reports expressed in numerical terms.
iii) Communicate effectively with statistical analysts.
iv) Make good decisions.

1.2. Definition of terms

The following terms are used frequently in this module.

Statistics
Definition 1: Statistics refers to the methodology [collection techniques] for collection, presentation and analysis of data and the use of such data [Neter J. et al (1988)].
Definition 2: In common usage, it refers to numerical data. This means any collection of data or information constitutes what is referred to as Statistics. Some examples under this definition are: [...]

Population
A population is a collection of elements about which we wish to make an inference. The population must be clearly defined before the sample is taken.

Target population
The population whose properties are estimated via a sample, usually the 'total' population.

Sample
A sample is a collection of sampling units drawn from a population. Data are obtained from the sample and are used to describe characteristics of the population. A sample can also be defined as a subset, part or fraction of a population.

Statistic(s)
These are numeric measure(s) derived from a sample, e.g. the sample mean (x̄), the sample variance (s²) and the sample standard deviation (s).

Sampling Frame
A sampling frame is a list of sampling units.
It is a set of information used to identify a sample population for statistical treatment. It includes a numerical identifier for each individual, plus other identifying information about characteristics of the individuals, to aid in analysis and allow for division into further frames for more in-depth analysis.

Sampling
A process used in statistical analysis in which a predetermined number of observations is taken from a larger population. The methodology used to sample from a larger population depends on the type of analysis being performed, and includes simple random sampling, systematic sampling and observational sampling. These will be discussed later.

Sampling Units
Sampling units are non-overlapping collections of elements from the population that cover the entire population. A sampling unit is a member of both the sampling frame and of the sample. The sampling units partition the population of interest, for example households or individual persons in a census.

1.3. Sampling Techniques

We explore the sampling techniques in order to be able to decide which one is the most appropriate for each given situation. Sampling techniques are methods by which data can be collected from a given population.

Types of Sampling

Probability Sampling
Its distinguishing characteristic is that each unit in the population has a known, nonzero probability of being included in the sample. These probabilities are usually, though not necessarily, equal; when they are equal, every unit has the same chance of being selected from the population. It eliminates the danger of bias in the selection process due to one's own opinions or desires.

Non-probability Sampling
This is a process where probabilities cannot be assigned to the units objectively, and hence it becomes difficult to determine the reliability of the sample results in terms of probability. A sample is selected according to one's convenience, or is general in nature. It is a good technique for pilot or feasibility studies.
Examples include purposive sampling, convenience sampling, and quota sampling. In non-probability sampling, the units that make up the sample are collected with no specific probability structure in mind, e.g. units making up the sample through volunteering.

Remark: We shall focus on probability sampling because, if an appropriate technique is chosen, it assures sample representativeness and hence the sampling errors can be estimated.

Reasons to use Sampling
Sampling is done mostly for reasons of cost, time, accessibility, utility and speed. Expansion on the reasons is left for the lecture.

Some points to define clearly when sampling:
i) The sampling method to be employed.
ii) The sample size.
iii) The degree of reliability of the conclusions that we can obtain, i.e. an estimation of the error that we are going to have.
An inappropriate selection of the elements of the sample [...]

Illustration
An example of simple random sampling is writing the name of each member of the population on a piece of paper and putting the papers in a hat. Selecting the sample from the hat is random and each member of the population has an equal chance of being selected. This approach is not feasible for large populations, but can be completed easily if the population is very small.

1.4.2. Systematic Random Sampling

Sampling units are selected in sequence from a list, separated by a fixed selection interval. In this method, every kth element from the list is selected for the sample, starting with an element randomly selected from the first k elements. For example, if the population has 1000 elements and a sample size of 100 is needed, then k would be 1000/100 = 10. Now, if the number 7 is randomly selected from the first ten elements on the list, then the sample would continue down the list selecting the 7th element from each subsequent group of ten elements.
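The worked example above (1000 elements, a sample of 100, random start 7) can be sketched in Python. This is a minimal illustration and not part of the original notes: the function name systematic_sample and the labelling of units as 1 to 1000 are assumptions made for the sketch.

```python
import random


def systematic_sample(population, sample_size, start=None, rng=None):
    """Return every k-th unit of `population`, beginning at a random start.

    k is the sampling interval (population size // sample size).
    `start` is a 1-based position within the first k units; if it is
    omitted, it is drawn at random.
    """
    k = len(population) // sample_size
    if k < 1:
        raise ValueError("sample_size cannot exceed the population size")
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, k)
    if not 1 <= start <= k:
        raise ValueError("start must lie within the first k units")
    # Convert the 1-based start to a 0-based index, then step by k.
    return population[start - 1::k][:sample_size]


units = list(range(1, 1001))  # hypothetical unit labels 1..1000
sample = systematic_sample(units, sample_size=100, start=7)
print(sample[:5])  # [7, 17, 27, 37, 47]
```

Fixing start=7 reproduces the example in the text; leaving it out lets the function draw the starting position at random from the first k units.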
Care must be taken when using systematic sampling to ensure that the original population list has not been ordered in a way that introduces any non-random factors into the sampling.

Illustration
Suppose an official from the Academic Registry of a hypothetical university is to register students for a tour of regional universities. The official may select at random the 15th student out of the first 20 students in a list of all students in the university. The official would then keep adding twenty, selecting the 35th student, 55th student, 75th student and so on, until the end of the list is reached.

Remark: In cases where the population is large and the population list is available, systematic sampling is usually preferred over simple random sampling since it is more convenient for the experimenter.

1.4.3. Stratified Sampling

Stratified sampling is used when each homogeneous subgroup within the population needs to be represented in the sample. The first step in stratified sampling is to divide the population into subgroups (strata) based on mutually exclusive criteria. Random or systematic samples are then taken from each subgroup. The sampling fraction for each subgroup may be taken in the same proportion as the subgroup has in the population.

Illustration
The owner of a local supermarket conducting a customer satisfaction survey may wish to select random customers from each customer type in proportion to the number of customers of that type in the population. Suppose 40 sample units are to be selected, and 10% of the customers are managers, 60% are users, 25% are operators and 5% are students from CUT. Then 4 managers, 24 users, 10 operators and 2 students from CUT would be randomly selected.

Remark: Stratified sampling can also sample an equal number of items from each subgroup.
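The proportional allocation in the supermarket illustration can be checked with a short Python sketch. The function name proportional_allocation and the dictionary of stratum shares are illustrative assumptions, not part of the notes. Rounding is harmless here because the shares divide the sample of 40 exactly; other shares may need a rule for leftover units.

```python
from fractions import Fraction


def proportional_allocation(shares_percent, sample_size):
    """Allocate `sample_size` units to strata in proportion to their share.

    `shares_percent` maps each stratum name to its percentage of the
    population. Exact fractions avoid floating-point rounding surprises.
    """
    if sum(shares_percent.values()) != 100:
        raise ValueError("stratum shares must add up to 100%")
    return {
        stratum: round(Fraction(pct, 100) * sample_size)
        for stratum, pct in shares_percent.items()
    }


# Shares taken from the illustration: 10%, 60%, 25% and 5%.
shares = {"managers": 10, "users": 60, "operators": 25, "students": 5}
allocation = proportional_allocation(shares, sample_size=40)
print(allocation)
# {'managers': 4, 'users': 24, 'operators': 10, 'students': 2}
```

Once the allocation is known, the units within each stratum would still have to be drawn at random, for example by simple random or systematic sampling as described earlier.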
1.4.4. Cluster Sampling

In cluster sampling, the population that is being sampled is divided into naturally occurring groups called clusters. Each cluster should be as heterogeneous as possible so that it matches the population, that is, each cluster is representative of the population. A random sample is then taken from within one or more selected clusters.

Illustration
An organization with 300 small branches providing a service countrywide has an employee at the HQ who is interested in auditing for compliance with some coding standard. The employee might use cluster sampling to randomly select 40 branches as representatives for the audit and then randomly sample coding systems for auditing from just those 40.

Remark: Cluster sampling can tell us a lot about a particular cluster, but unless the clusters are selected randomly and a lot of clusters are sampled, generalizations cannot always be made about the entire population.

Difference between a Cluster and a Stratum
A cluster is a heterogeneous subgroup, whereas a stratum is a homogeneous subgroup.

A summary of probability sampling methods is given below.

Simple random [...]

[...] that some population members have better or more information than others, or that some members are more representative than others.

1.5.4. Chain referral / Snowballing / Networking

The researcher starts with a person who displays the qualities of interest, who then refers the researcher to the next person, and so on.

1.6. Errors in sampling

During sampling, errors can be committed by the statistician. These are either sampling or non-sampling errors. Errors can be reduced by sampling without bias. Some common sources of bias are incorrect sampling operations and non-interviews. Some errors that arise in sampling are discussed below.

Selection error
Selection error occurs when some elements of the population have a higher probability of being selected than others. Consider a scenario where the manager of a local supermarket wishes to measure how satisfied his customers are.
He proceeds to interview some of them from 08:00 to 12:00. Clearly, the customers who do their shopping in the afternoon are left out and will not be represented, making the sample unrepresentative of all the customers. Errors of this kind can be avoided by choosing the sample so that all the customers have the same probability of being selected. This is a sampling error.

Non-Response Error
It is possible that some of the elements of the population do not want to or cannot answer certain questions. It may also happen, when a questionnaire includes personal questions, that some members of the population do not answer honestly or would rather avoid answering. These errors are generally very difficult to avoid, but if we want to check the honesty of answers, we can include some questions, called filter questions, to detect whether the answers are honest. This is a non-sampling error.

Interviewer influence
The interviewer may fail to be impartial, i.e. s/he can promote some answers more than others.

Remark: A sample that is not representative of the population is called a biased sample.

Questions relating to selecting a sample out of a population naturally arise. These are:
i) When concluding about the population, how many of the population elements are represented by each one of the sample elements?
ii) What proportion of the population are we selecting?
The answers lie in the following factors. [...]

1.7. Data Collection Methods

The three data collection methods are observation, interview and experimentation. Depending on the type of research and the data to be collected, different methods can be used to collect the data set.

1.7.1. Observation

This method comprises the direct and desk research methods. Direct observation involves collecting data by observing the item in action. Examples of this method are: pedestrian flow, vehicle traffic, purchase behaviour for a commodity in a shop, quality control inspection, etc.
An advantage of this method is that the respondent behaves in a natural way since he is not aware that he is being observed. A disadvantage is that it is a passive form of data collection. Also, there is no opportunity to investigate the behaviour further. Desk research involves consulting source documents and extracting secondary data from them.

1.7.2. Interview

This method collects primary data through direct questioning. A questionnaire is the instrument used to structure the data collection process. Three approaches to data collection using interviews are: personal, postal and telephone interviews.

Personal Interviews
A questionnaire is completed through face-to-face contact with the respondent. Advantages of this method are: a high response rate, it allows probing for reasons, data collection is immediate, data accuracy is assured, it is useful when technical data is required, non-verbal responses can be observed and noted, more questions can be asked, responses are spontaneous, and the use of aided-recall questions is possible. Disadvantages of this method are that it is time consuming, it requires trained interviewers, fewer interviews are conducted because of cost and time constraints, and biased data can be collected if the interviewer is inexperienced.

[...]

Chapter 2
Data and Data Presentation

2.1. Introduction

A statistician collects data (in an appropriate manner), analyses it using statistical techniques, interprets the results and makes conclusions and recommendations on the basis of the data analysis. The word data keeps turning up in our discussion. Data is the "blood of statistics". The world of statistics revolves around data; there is no statistics without data. What is data? How is it collected? Why do we collect it? These are the questions to be answered in this chapter.

2.2. Data Types

An understanding of the nature of data is necessary for two reasons.
It enables a user to assess data quality and to select the appropriate statistical method to use to analyse the data. The quality of data is influenced by three factors: the type, the source and the method used to collect the data. The type of data gathered determines the type of analysis which can be performed on the data. Certain statistical methods are valid for certain data types only. An incorrect application of a statistical method to a particular data type can render the findings invalid. Data type is determined by the nature of the random variables which the data represents. Random variables are essentially of two kinds: qualitative and quantitative.

2.2.1. Qualitative random variables

These are variables which yield categorical (non-numeric) responses. The data generated by qualitative random variables are classified into one of a number of categories.

[...]

Examples of nominal-scaled data
The table below shows examples of nominal-scaled data.

Qualitative random variable   Response categories                    Data code
Gender                        Male / Female                          1/2
Car type owned                Mazda / Golf / Toyota / Honda          1/2/3/4
City lived in                 Harare / Byo / Mutare / Gweru          1/2/3/4
Marital status                Married / Single / Divorced / Widow    1/2/3/4
Engineering profession        Civil / Electrical / Mechanical        1/2/3

Each observation of the random variable is assigned to only one of the categories provided. Arithmetic calculations cannot be meaningfully performed on the coded values assigned to each category. They are only numeric codes which are arbitrarily assigned and can be counted. Nominal-scaled data is the weakest form of data, since only a limited range of statistical analysis can be performed on such data.

Ordinal-scaled data
Objects or events are distinguished on the basis of the relative amounts of some characteristic they possess. The magnitude of the difference between measurements is not reflected in the rank. Such data is associated mainly with qualitative random variables.
Like nominal-scaled data, ordinal-scaled data is also assigned to only one of a number of coded categories, but there is now a ranking implied between the categories in terms of being better, bigger, longer, older, taller, stronger, etc. While there is an implied difference between the categories, this difference cannot be measured exactly. That is, the distance between categories cannot be quantified nor assumed to be equal. Ordinal-scaled data is generated from ranked responses in market research studies.

Examples of ordinal-scaled data

Qualitative random variable   Response categories                            Data codes
T-shirt size                  Small / Medium / Large                         1/2/3
Company turnover              Small / Medium / Large                         1/2/3
Management levels             Lower / Middle / Senior                        1/2/3
Work experience               Little / Moderate / Extensive                  1/2/3
Magazine type                 Rank the top three magazines you often read    1/2/3
Sizes of bulbs                Smallest / Small / Large / Largest             1/2/3/4

There is a wider range of valid statistical methods (i.e. the area of non-parametric statistics) available for the analysis of ordinal-scaled data than there is for nominal-scaled data. Ordinal-scaled data is also generated from a "counting process".

Interval-scaled data
Interval-scaled data is associated with quantitative random variables. Differences can be measured between values of a quantitative random variable. Thus interval-scaled data possesses both order and distance properties. Interval-scaled data, however, does not possess an absolute origin. Therefore the ratio of values cannot be meaningfully compared for interval-scaled data. The absolute difference makes sense when interval-scaled data has been collected.

Examples of interval-scaled data
Suppose four places A, B, C and D have temperatures 20 °C, 25 °C, 35 °C and 40 °C respectively. Using the interval scale we see that the difference between A and B is equal to that between C and D. However, ratios are not used.
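The temperature example can be checked numerically. The Python sketch below is an illustration added for this purpose, with celsius_to_fahrenheit as an assumed helper name. It shows that equal differences remain equal when the unit of measurement changes, whereas the ratio of two readings does not, which is why ratios are not meaningful on an interval scale.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius reading to Fahrenheit."""
    return c * 9 / 5 + 32


# Temperatures of the four places in the example, in degrees Celsius.
celsius = {"A": 20, "B": 25, "C": 35, "D": 40}
fahrenheit = {place: celsius_to_fahrenheit(t) for place, t in celsius.items()}

# Differences: B - A equals D - C in both units.
diff_c = (celsius["B"] - celsius["A"], celsius["D"] - celsius["C"])
diff_f = (fahrenheit["B"] - fahrenheit["A"], fahrenheit["D"] - fahrenheit["C"])

# Ratios: D / A depends on the unit chosen, so it carries no meaning.
ratio_c = celsius["D"] / celsius["A"]
ratio_f = fahrenheit["D"] / fahrenheit["A"]

print(diff_c, diff_f)
print(ratio_c, round(ratio_f, 3))
```

The ratio of D to A is 2 in Celsius but about 1.53 in Fahrenheit, so a statement such as "twice as hot" depends entirely on where the scale places its zero.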
A value of 0 °C does not mean the absence of temperature, and it is not correct to say that the temperature at D is twice that at A. Interval-scaled data is most often generated in marketing studies through rating responses on a continuum scale. A wide range of statistical techniques can be applied to interval-scaled data as it possesses numeric (measurement) properties.

Ratio-scaled data
This data is associated mainly with quantitative random variables. If the full range of arithmetic operations can be meaningfully performed on the observations of a random variable, the data associated with that random variable is termed ratio-scaled. It is numeric data with a zero origin. The zero origin indicates the absence of the attribute being measured.

Example 1 of ratio-scaled data

Quantitative random variable   Example of data values
Age                            42 years
Income                         $2,500
Distance                       35 km
Time                           32 minutes
Mass                           240 g
Price                          $7

Such data is the strongest form of statistical data which can be gathered and lends itself to the widest range of statistical methods. Ratio-scaled data can be manipulated meaningfully through normal arithmetic operations. Ratio-scaled data is [...]
