Understanding Attribute Types
Know your Data
What is data?
Data is a set of records, each having one or more attributes (also called features). It is also called a collection of data objects.
Example of Data
Name | Income | Profession | Mother Tongue | Native Place |
Ram | 70000 | Doctor | Bengali | Village |
Shyam | 50000 | Carpenter | Hindi | Small Town |
Mohan | 60000 | Engineer | Hindi | Suburban |
Kabir | 90000 | Doctor | Bengali | Metropolitan |
Fig1.
As shown in Fig1, there are four rows and five columns.
Record: Each row is called a record. There are four records in the example dataset.
Feature or Attribute: Each column is called a feature or attribute of a record. There are five attributes for each record in the example dataset.
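To make the record/attribute distinction concrete, here is a minimal sketch of the Fig1 dataset in plain Python. It assumes each record is stored as a tuple and the attribute names are kept in a separate list; the variable names are illustrative only.

```python
# Attribute (column) names from Fig1.
ATTRIBUTES = ["Name", "Income", "Profession", "Mother Tongue", "Native Place"]

# Each tuple is one record (row); its positions line up with ATTRIBUTES.
RECORDS = [
    ("Ram", 70000, "Doctor", "Bengali", "Village"),
    ("Shyam", 50000, "Carpenter", "Hindi", "Small Town"),
    ("Mohan", 60000, "Engineer", "Hindi", "Suburban"),
    ("Kabir", 90000, "Doctor", "Bengali", "Metropolitan"),
]

# Every record must have exactly one value per attribute.
assert all(len(record) == len(ATTRIBUTES) for record in RECORDS)

print(len(RECORDS), "records")        # 4 records
print(len(ATTRIBUTES), "attributes")  # 5 attributes

# Reading one attribute across all records gives a column.
professions = [record[ATTRIBUTES.index("Profession")] for record in RECORDS]
print(professions)  # ['Doctor', 'Carpenter', 'Engineer', 'Doctor']
```

In practice a table library would usually hold this data, but plain tuples are enough to show that a row is a record and a column is an attribute.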
Understanding the Data
- Knowing your data is very important in data science, whether you are doing data mining or applying machine learning techniques.
- To understand the data, you have to understand each attribute of a record.
- The following properties of numbers are typically used to describe attributes.
- Distinctness: = and !=
- Order: > and <
- Addition: + and -
- Multiplication: * and /
- Given these properties, we can define four types of attributes. Each type supports the operations listed for it plus those of the types above it:
- Nominal: = and !=
- Ordinal: > and <
- Interval: + and -
- Ratio: * and /
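Because each attribute type inherits the operations of the types before it, the list above can be sketched as a cumulative lookup table. This is an illustrative sketch, not a library API; `ALLOWED_OPS` and `is_meaningful` are hypothetical names.

```python
# Operations that are meaningful for each attribute type.
# Each type inherits the operations of the types before it.
NOMINAL = {"=", "!="}
ORDINAL = NOMINAL | {"<", ">"}
INTERVAL = ORDINAL | {"+", "-"}
RATIO = INTERVAL | {"*", "/"}

ALLOWED_OPS = {
    "nominal": NOMINAL,
    "ordinal": ORDINAL,
    "interval": INTERVAL,
    "ratio": RATIO,
}


def is_meaningful(attr_type: str, op: str) -> bool:
    """Return True if `op` is a meaningful operation for `attr_type`."""
    return op in ALLOWED_OPS[attr_type]


print(is_meaningful("ordinal", "<"))   # True: values can be ranked
print(is_meaningful("nominal", "<"))   # False: nominal values have no order
print(is_meaningful("interval", "/"))  # False: ratios need a true zero point
print(is_meaningful("ratio", "-"))     # True: ratio inherits interval operations
```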
Nominal and ordinal attributes are categorical (qualitative) in nature: you can perform qualitative operations on them, e.g., checking whether two values are equal or which of two values ranks higher.
Interval and ratio attributes are numerical (quantitative) in nature: you can perform quantitative analysis and mathematical operations on them, e.g., addition and subtraction (interval and ratio) or multiplication and division (ratio only).
The table in Fig3 summarizes the attribute types, giving a description, the applicable statistical operations and examples for each type:
Fig3
In Fig1 (the example dataset):
‘Name’, ‘Profession’ and ‘Mother Tongue’ are nominal attributes.
‘Income’ is a ratio attribute and ‘Native Place’ is an ordinal attribute, since its values can be ordered by settlement size, from Village to Metropolitan.
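The classification of the Fig1 columns can be recorded as a simple mapping, which then makes it easy to split columns into categorical and numerical groups. This is a sketch; the dictionary names are illustrative, and the order of the ‘Native Place’ values (smallest to largest settlement) is an assumption about what the author intends.

```python
# Attribute types of the Fig1 columns, as classified in the text.
ATTRIBUTE_TYPES = {
    "Name": "nominal",
    "Income": "ratio",
    "Profession": "nominal",
    "Mother Tongue": "nominal",
    "Native Place": "ordinal",
}

CATEGORICAL = {"nominal", "ordinal"}
NUMERICAL = {"interval", "ratio"}

categorical_cols = [c for c, t in ATTRIBUTE_TYPES.items() if t in CATEGORICAL]
numerical_cols = [c for c, t in ATTRIBUTE_TYPES.items() if t in NUMERICAL]

print(categorical_cols)  # ['Name', 'Profession', 'Mother Tongue', 'Native Place']
print(numerical_cols)    # ['Income']

# ASSUMED order of the ordinal 'Native Place' values (smallest to largest).
NATIVE_PLACE_ORDER = ["Village", "Small Town", "Suburban", "Metropolitan"]
places = ["Suburban", "Village", "Metropolitan", "Small Town"]
print(sorted(places, key=NATIVE_PLACE_ORDER.index))
# ['Village', 'Small Town', 'Suburban', 'Metropolitan']
```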
Nominal Attributes:
- Examples of nominal attributes are the marital status or eye colour of a person. Possible values of marital status are Married, Unmarried and Widowed. Possible values of eye colour are black, brown and blue.
- Representation of Nominal Attributes
- Nominal attribute values can be represented by numbers, such as 0 for Married, 1 for Unmarried and 2 for Widowed. Another example is a zip code, which consists entirely of digits.
- In such cases the numbers are not intended to be used quantitatively; that is, mathematical operations such as addition or averaging are meaningless on nominal attributes.
- Binary Attributes:
- A binary attribute is a special type of nominal attribute with only two categories or states, typically coded as 0 and 1.
Example: Gender: Male or Female.
- Marital status is NOT a binary attribute, as it has more than two possible values: Married, Unmarried and Widowed.
- Binary attributes are referred to as Boolean if the two states correspond to True and False.
- Symmetric binary attributes: both states are equally important. Example: Gender, with the states Male and Female.
- Asymmetric binary attributes: the two states are not equally important. Example: a positive result of a medical test is more important than a negative result.
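To illustrate why only equality tests are meaningful on numeric codes for a nominal attribute, here is a minimal sketch using a hypothetical sample of marital-status codes (0 = Married, 1 = Unmarried, 2 = Widowed). The sample values and the helper `is_binary` are made up for illustration.

```python
from collections import Counter

# Hypothetical sample of a nominal attribute stored as numeric codes.
CODES = {0: "Married", 1: "Unmarried", 2: "Widowed"}
marital_status = [0, 1, 1, 2, 1, 0]

# The mode (most frequent value) is a meaningful summary of nominal data.
mode_code, mode_count = Counter(marital_status).most_common(1)[0]
print(CODES[mode_code], mode_count)  # Unmarried 3

# Equality tests are meaningful: how many people in the sample are married?
married = sum(1 for code in marital_status if code == 0)
print(married)  # 2

# The mean of the codes (5 / 6, about 0.83) can be computed,
# but it does not correspond to any marital status.
mean_code = sum(marital_status) / len(marital_status)


def is_binary(values):
    """Treat an attribute as binary if the sample has exactly two distinct values."""
    return len(set(values)) == 2


print(is_binary(marital_status))                # False: three states
print(is_binary(["Male", "Female", "Female"]))  # True: two states
```

Note that `is_binary` only inspects the observed sample; an attribute is binary by definition of its possible states, not by how many happen to appear in the data.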
Ordinal Attributes:
- Examples of ordinal attributes are student grades or survey ratings. Possible values of grade are A+, A, A-, B+, B, B-. Possible values of a rating are 0: very dissatisfied, 1: neutral, 2: somewhat satisfied, 3: satisfied, 4: very satisfied.
- Ordinal attributes describe a feature of an object without giving an actual size or quantity. Their values have a meaningful order or ranking, but the magnitude of the difference between successive values is not known. For example, the gap between A+ and A need not equal the gap between B+ and B.
- Ordinal attributes may also be obtained by discretising a numeric quantity, i.e., dividing its range into several groups. Example: age 1 to 12: Kid, 13 to 19: Teen, 20 and above: Adult.
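The two ideas above, ranking ordinal values and discretising a numeric quantity into ordinal groups, can be sketched as follows. The helper names are hypothetical, ages are assumed to be whole numbers, and for an even number of grades the sketch takes the lower of the two middle values as the median.

```python
# Grades from the text, listed from highest to lowest.
GRADES = ["A+", "A", "A-", "B+", "B", "B-"]
# Map each grade to a rank: B- -> 1, ..., A+ -> 6.
RANK = {grade: len(GRADES) - i for i, grade in enumerate(GRADES)}


def sort_grades(grades):
    """Sort from lowest to highest by rank (plain string order would be wrong)."""
    return sorted(grades, key=RANK.__getitem__)


def median_grade(grades):
    """Middle value after sorting; for an even count, take the lower middle value."""
    ordered = sort_grades(grades)
    return ordered[(len(ordered) - 1) // 2]


def age_group(age):
    """Discretise an integer age into the ordinal groups from the text."""
    if 1 <= age <= 12:
        return "Kid"
    if 13 <= age <= 19:
        return "Teen"
    if age >= 20:
        return "Adult"
    raise ValueError(f"age {age} is outside the ranges defined in the text")


print(sort_grades(["B", "A+", "B-", "A"]))  # ['B-', 'B', 'A', 'A+']
print(median_grade(["B", "A+", "B-"]))      # B
print([age_group(a) for a in (5, 15, 42)])  # ['Kid', 'Teen', 'Adult']
```

The ranks only encode order: a median is meaningful, but the difference between two ranks (for example 6 - 5) should not be read as a quantity.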
Interval Attributes:
- Interval attributes are measured on a scale of equal-size units. Such attributes allow us to compare and quantify the difference between values.
- Examples of interval attributes are calendar dates and temperature in Celsius or Fahrenheit. You can quantify the difference between two dates or two temperatures, but ratios are not meaningful because the zero point is arbitrary: 20 °C is not twice as hot as 10 °C.
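A short sketch of why differences are meaningful for interval attributes while ratios are not. It uses Python's standard `datetime.date` for calendar dates and a hypothetical helper `celsius_to_fahrenheit`; the sample dates and temperatures are made up.

```python
from datetime import date

# Differences between calendar dates are meaningful (2020 is a leap year).
start = date(2020, 1, 1)
end = date(2020, 3, 1)
print((end - start).days)  # 60


def celsius_to_fahrenheit(c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


low_c, high_c = 10.0, 20.0
low_f, high_f = celsius_to_fahrenheit(low_c), celsius_to_fahrenheit(high_c)

# Differences stay consistent across scales: 10 deg C apart is 18 deg F apart.
print(high_c - low_c, high_f - low_f)  # 10.0 18.0

# Ratios depend on the arbitrary zero point, so they change with the scale.
print(high_c / low_c)  # 2.0
print(high_f / low_f)  # 1.36
```

The same pair of temperatures gives a ratio of 2.0 in Celsius but 1.36 in Fahrenheit, which is exactly why ratios of interval values carry no meaning.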
Ratio Attributes:
- Ratio attributes have a true zero point, so both differences and ratios are meaningful: one value can be a multiple (ratio) of another.
- Examples of ratio attributes are counts (years of experience, number of words), height, mass and length.
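Using the ‘Income’ column of Fig1, a ratio attribute, here is a minimal sketch showing that ratios, differences and means all make sense. It relies only on the standard `statistics.mean` function; the dictionary is just a convenient container for the Fig1 values.

```python
from statistics import mean

# Income values (a ratio attribute) from Fig1.
incomes = {"Ram": 70000, "Shyam": 50000, "Mohan": 60000, "Kabir": 90000}

# Ratios are meaningful because income has a true zero point.
print(incomes["Kabir"] / incomes["Shyam"])  # 1.8

# Differences and means are meaningful too.
print(incomes["Kabir"] - incomes["Shyam"])  # 40000
print(mean(incomes.values()))               # 67500
```

So it is correct to say that Kabir earns 1.8 times as much as Shyam, a statement that would have no meaning for an interval attribute such as temperature in Celsius.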
So far we have organised attributes into categorical and numerical types, but this is not the only way to organise them.
In machine learning, attributes are often classified as discrete or continuous, and each type may be processed differently.
Discrete attribute: It has a finite or countably infinite set of values, which may or may not be represented as integers.
Example: Eye colour, medical test result, drink size, grade, etc. each have a finite number of values, so they are discrete. A discrete attribute may also have numerical values, such as 0 and 1 for a binary attribute or 1 to 110 for age. Similarly, employee IDs and zip codes are discrete.
Continuous attribute: An attribute that is not discrete is called a continuous attribute. Continuous attributes take real-number values and are typically represented as floating-point variables.
Example: Temperature, weight, length, etc.
Now that we know the attribute types of data, what is next?
We have to preprocess the data and find the relationships between attributes. We also have to remove noise and identify outliers in the dataset. Certain statistical operations are performed on the data to preprocess it.
I will cover these operations in the next section.