Median Value and Median range in Data Science
Median value in Data Science:
Median is another popular statistical measure of central tendency in data science. We have discussed the Mean value in an earlier section.
It is mostly used for Ordinal attribute types; however, it can be used for Numerical attributes as well. You can learn about attribute types here.
Median is the value that separates the higher half of the data set from the lower half.
To calculate the Median: Sort the data and then take the middle value.
Suppose N = Number of data values.
If N is an odd number, then
Median = Middle value.
If N is an even number, then
Median = Average of the two middle values.
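The rule above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the function name median and the sample values are made up for this sketch. Python's standard library also provides statistics.median for the same purpose.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    data = sorted(values)          # step 1: sort the data
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # N is odd: the single middle value
        return data[mid]
    # N is even: average of the two middle values
    return (data[mid - 1] + data[mid]) / 2


# Illustrative values only (not taken from the article's example).
odd_result = median([7, 3, 9])       # sorted: 3, 7, 9
even_result = median([7, 3, 9, 1])   # sorted: 1, 3, 7, 9
print(odd_result, even_result)
```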
Example:
Given data: 30, 36, 47, 50, 52, 52, 56, 60, 63, 70, 70, 110
Data is already sorted.
N = Number of data = 12 (Even number)
Median = Average of the two middle values.
Median = (52 + 56)/2 = 54.
Because calculating the Median this way requires sorting the data, which takes O(N log N) time, it becomes expensive for large data sets, for example when you have millions of records.
You can approximate the Median value by grouping the values into intervals (ranges) and then calculating the Median from the interval frequencies.
Median for interval data or range data:
Suppose you have to calculate the Median salary of the employees of a very large organisation. You can create salary ranges like $10000-$20000, $20001-$30000, and so on.
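As a rough sketch of the grouping step, the snippet below counts how many salaries fall into each range in a single pass, without sorting the records. The function name bin_counts, the sample salaries, and the range edges are all hypothetical, and the ranges are treated as half-open intervals [lower, upper), which is an assumption that differs slightly from the inclusive ranges written above.

```python
from bisect import bisect_right


def bin_counts(values, edges):
    """Count values per interval [edges[i], edges[i+1]).

    `edges` must be sorted in ascending order. Values that fall
    outside the first or last edge are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1   # index of the interval containing v
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts


# Hypothetical salaries and range boundaries, for illustration only.
salaries = [12000, 18500, 21000, 25000, 29999, 31000, 45000]
edges = [10000, 20000, 30000, 40000, 50000]
counts = bin_counts(salaries, edges)
print(counts)
```

The resulting per-range frequencies are the input needed for the approximate median of grouped data.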
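You can approximate the Median by using the formula below:
M = L + {((N/2) - m)/F} * C
where
M = Median (approximate median of the grouped data)
L = Lower boundary of the median class
F = Frequency of the median class
m = Cumulative frequency of all classes below the median class
C = Length of the class interval of the median class
N/2 = Median item (N = total number of data values)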
Example: Calculate the approximate median age from the data below, where the median falls in the age group 20 to 50.
Calculate cumulative frequency:
L = 20, F = 1500, m = 950, C = 50-20 = 30, N/2 = 3194/2
M = 20 + {((3194/2) - 950)/1500} * 30
M = 32.94
So, the approximate median age, which falls in the age range 20-50, is 32.94.
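The calculation above can be checked with a few lines of Python. This is only a sketch: the function name grouped_median is made up for illustration, and the arguments are the values stated in the example (L = 20, F = 1500, m = 950, C = 30, N = 3194).

```python
def grouped_median(L, F, m, C, N):
    """Approximate median of grouped data.

    L: lower boundary of the median class
    F: frequency of the median class
    m: cumulative frequency of the classes below the median class
    C: length of the median class interval
    N: total number of data values
    """
    if F <= 0:
        raise ValueError("frequency of the median class must be positive")
    return L + ((N / 2 - m) / F) * C


# Values taken from the worked example above.
M = grouped_median(L=20, F=1500, m=950, C=30, N=3194)
print(round(M, 2))  # 32.94
```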