Discuss Dummy encoding in the context of feature engineering

Dummy encoding: In order to represent a categorical variable in a machine learning model, it usually must be somehow coded numerically before it is used in the training of a model. The easiest way to do so is to create a series of dummy variables (k-1 dummies for a variable with k distinct categories) that take on a value of 1 when the original field is equal to the kth category and 0 otherwise. As one level can be represented when all of the dummy variables are set to 0, there should be one less dummy variable compared to the total number of unique categories of the variable to avoid redundancy. An illustration of dummy variable transformation is shown below, assuming the four categories for country are the only unique values present in the dataset. 

Author

Help us improve this post by suggesting in comments below:

– modifications to the text, and infographics
– video resources that offer clear explanations for this question
– code snippets and case studies relevant to this concept
– online blogs, and research publications that are a “must read” on this topic

Leave the first comment

Partner Ad
Find out all the ways that you can
Contribute