Basic Statistical Description of Data
Measures of Central Tendency and Dispersion of Data:
- In the previous section, we learned about data and the different attributes of data.
- As a next step, we need to understand the different values of the same attribute with the help of a statistical description of data. That means we have to carefully observe the values of each attribute across the different records in the data set.
- This helps us find the variation between records, remove noise and identify outliers. Removing noise and identifying outliers are part of data cleaning, which is a step in data pre-processing.
- To clean data, we can use certain statistical operations. These operations can be applied to different types of attributes.
- In this section, I will give a brief overview of each operation. I will explain each statistical operation in detail in subsequent sections.
- At a broad level, there are two major groups of statistical measures widely used to identify the properties of data:
- Measures of central tendency
- Dispersion of data
Measures of central tendency:
- Central tendency tells us where most of the values of an attribute fall. It measures the location of the middle, or centre, of a data distribution. In layman's terms, it describes the typical or "average" value.
- To find the central tendency, there are different techniques for different attribute types.
- Important techniques to measure central tendency:
- Mean
- Median
- Mode
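As a minimal sketch, the three common measures (mean, median and mode) can be computed with Python's standard `statistics` module. The sample data below is made up for illustration, and Python is simply a convenient choice; this post does not prescribe a tool. The example also reflects the point that the technique depends on the attribute type: the mean needs numeric values, while the mode also works for nominal (categorical) values.

```python
from statistics import mean, median, mode

# Hypothetical sample data, for illustration only.
prices = [12, 15, 15, 18, 21, 24, 95]                     # numeric attribute
sizes = ["small", "medium", "medium", "large", "medium"]  # nominal attribute

# Mean: sum of the values divided by their count (numeric attributes only).
avg_price = mean(prices)
# Median: middle value of the sorted data (meaningful for numeric or ordinal data).
mid_price = median(prices)
# Mode: most frequent value (usable for any attribute type, including nominal).
common_price = mode(prices)
common_size = mode(sizes)

print(f"mean={avg_price:.2f} median={mid_price} mode={common_price}")
print(f"mode of sizes={common_size}")
```

With this data the single large value 95 pulls the mean (about 28.57) well above the median (18), which is why the median is often preferred for skewed data.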
Dispersion of data:
- Dispersion of data evaluates how the data are spread out, that is, how far the values of an attribute lie from their average value.
- This measure is useful for identifying outliers.
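To make "how far from the average" concrete, the sketch below computes each value's deviation from the mean and derives the variance and standard deviation from those deviations. The data is hypothetical, and the code assumes the values form the whole population (dividing by n); the sample versions in the standard library, `statistics.variance` and `statistics.stdev`, divide by n - 1 instead.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical sample values of one numeric attribute.
values = [2, 4, 4, 4, 5, 5, 7, 9]

avg = mean(values)
# Deviation of each value from the mean: how far or near it lies.
deviations = [v - avg for v in values]
# Population variance: the average of the squared deviations.
variance = sum(d * d for d in deviations) / len(values)
# Standard deviation: the square root of the variance, in the data's own units.
std_dev = variance ** 0.5

print(avg, variance, std_dev)
# The standard library computes the same population figures directly.
print(pvariance(values), pstdev(values))
```

For this data the mean is 5, the variance is 4 and the standard deviation is 2.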
- Important techniques to measure dispersion of data:
- Standard deviation
- Variance
- Box plot (range, quartiles and interquartile range)
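The box plot measures can be sketched with `statistics.quantiles` (available since Python 3.8). This is a minimal sketch on made-up data. It assumes the common 1.5 * IQR rule of thumb for flagging suspected outliers, which this post does not specify, and it uses the "inclusive" quartile method, one of several conventions that can give slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical sample values of one numeric attribute.
values = [12, 15, 15, 18, 21, 24, 95]

# Quartiles Q1, Q2 (the median) and Q3. "inclusive" is one of several
# conventions; other tools may report slightly different quartiles.
q1, q2, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1                        # interquartile range: spread of the middle 50%
data_range = max(values) - min(values)

# Five-number summary that a box plot draws.
summary = (min(values), q1, q2, q3, max(values))

# Rule of thumb (assumed here): values more than 1.5 * IQR beyond the
# quartiles are flagged as suspected outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print("five-number summary:", summary)
print("range:", data_range, "IQR:", iqr)
print("suspected outliers:", outliers)
```

For this data Q1 = 15, Q3 = 22.5 and the IQR is 7.5, so the upper fence is 33.75 and the value 95 is flagged as a suspected outlier.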