To incorporate categorical features, also known as qualitative predictors, into a linear regression model, they must first be converted into a numerical format.
There are two types of categorical features:
- Nominal features: These are features with no inherent ordering among their values, for example gender or color. They can be represented using one-hot encoding or dummy encoding. The primary disadvantage of these techniques is the explosion of the feature set, especially when the number of unique values is large.
- Ordinal features: These are categorical features that have an inherent ordering, for example the size of a t-shirt (Small/Medium/Large). They can be represented using ordinal encoding. The disadvantage of ordinal encoding is that the difference between two representative integers may not faithfully reflect the true distance between the underlying categories.
There are three common approaches to dealing with categorical or qualitative predictors: (a) dummy encoding, (b) one-hot encoding, and (c) ordinal encoding.
Dummy Encoding
The classical approach to dealing with nominal categorical or qualitative predictors is dummy encoding, which represents the different levels of a feature using binary 1's and 0's. If a predictor has k categories, only k-1 dummy variables are needed to uniquely represent that attribute in the model: one level (the reference level) is represented by setting all of the dummy variables to 0, so a kth dummy variable would be redundant.
The table below depicts dummy encoding using an Auto Loan example with three types of loan: New Auto, Used Auto, and Signature. Dummy encoding produces the following transformation, creating two new dummy variables; the third category ('Signature' in this example) is inferred when both dummy variables are 0, and is sometimes called the reference level.
LoanID | Loan Type (Original) | New Auto (Dummy variable) | Used Auto (Dummy variable) | Explanation |
---|---|---|---|---|
L1 | New Auto | 1 | 0 | The value of dummy variable New Auto is 1 and that of Used Auto is 0 |
L2 | Used Auto | 0 | 1 | The value of dummy variable New Auto is 0 and that of Used Auto is 1 |
L3 | Used Auto | 0 | 1 | Same as L2 |
L4 | Signature | 0 | 0 | The value of both dummy variables New Auto and Used Auto is 0 |
L5 | New Auto | 1 | 0 | Same as L1 |
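The transformation in the table above can be sketched with pandas, using a hypothetical DataFrame that mirrors the Auto Loan data. `pd.get_dummies` creates one column per level; explicitly dropping the 'Signature' column makes it the reference level, leaving k-1 = 2 dummy variables.

```python
import pandas as pd

# Hypothetical Auto Loan data mirroring the table above.
loans = pd.DataFrame({
    "LoanID": ["L1", "L2", "L3", "L4", "L5"],
    "LoanType": ["New Auto", "Used Auto", "Used Auto", "Signature", "New Auto"],
})

# One column per level, then drop 'Signature' so it becomes
# the reference level (all-zero row), leaving k-1 = 2 dummies.
dummies = pd.get_dummies(loans["LoanType"]).drop(columns=["Signature"]).astype(int)
print(dummies)
```

Note that `pd.get_dummies(..., drop_first=True)` achieves the same k-1 result more concisely, but it drops the alphabetically first level rather than letting you choose the reference level.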
One-hot Encoding
One-hot encoding is a similar but slightly different technique from dummy encoding, also used for nominal categorical features. For the same scenario of a predictor with k categories, k binary columns are created, each taking the value 1 if the observation belongs to that particular category and 0 otherwise. For the same dataset, one-hot encoding would transform the loan type variable as follows:
LoanID | Loan Type (Original) | New Auto (One-hot variable) | Used Auto (One-hot variable) | Signature (One-hot variable) | Explanation |
---|---|---|---|---|---|
L1 | New Auto | 1 | 0 | 0 | The value of one-hot variable New Auto is 1 and others are 0 |
L2 | Used Auto | 0 | 1 | 0 | The value of one-hot variable Used Auto is 1 and others are 0 |
L3 | Used Auto | 0 | 1 | 0 | Same as L2 |
L4 | Signature | 0 | 0 | 1 | The value of one-hot variable Signature is 1 and others are 0. In Dummy encoding, 'Signature' does not appear as a separate variable |
L5 | New Auto | 1 | 0 | 0 | Same as L1 |
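The same hypothetical Auto Loan data can be one-hot encoded with pandas; here no column is dropped, so all k = 3 categories get their own binary column (the column order is alphabetical by default).

```python
import pandas as pd

# The same hypothetical Auto Loan data as in the table above.
loans = pd.DataFrame({
    "LoanID": ["L1", "L2", "L3", "L4", "L5"],
    "LoanType": ["New Auto", "Used Auto", "Used Auto", "Signature", "New Auto"],
})

# One-hot encoding: k = 3 categories -> 3 binary columns, one per level.
onehot = pd.get_dummies(loans["LoanType"]).astype(int)
print(onehot)
```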
In one-hot encoding, each category has its own regression coefficient in the model. This is in contrast to dummy encoding, where only k-1 levels have coefficients representing the effect of that level. Under either encoding, an observation with a missing value for the original variable is represented by 0 in every column created by the transformation, since a 1 appears only to indicate the presence of a value.
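The all-zeros behavior for missing values can be observed directly in pandas: with the default `dummy_na=False`, a missing entry produces a row of zeros across every encoded column.

```python
import pandas as pd

# A hypothetical loan type column containing one missing value.
loan_type = pd.Series(["New Auto", None, "Signature"])

# With the default dummy_na=False, the missing-value row gets 0
# in every encoded column -- no 1 appears anywhere for it.
encoded = pd.get_dummies(loan_type).astype(int)
print(encoded)
```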
Ordinal Encoding
When dealing with ordinal categorical features, where the levels of a categorical variable have a natural and consistently spaced ordering, such as temperature recorded on a scale of low, medium, or high, it can make sense to map the values to the integer values 1, 2, and 3, respectively. The transformation of such a variable would look like the following:
ObservationID | Temperature (Original) | Temperature_coded (Ordinal variable) |
---|---|---|
O1 | Medium | 2 |
O2 | High | 3 |
O3 | High | 3 |
O4 | Low | 1 |
O5 | Medium | 2 |
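The mapping in the table above can be implemented with an explicit dictionary, which keeps the ordering under the analyst's control rather than relying on alphabetical order (hypothetical data matching the table):

```python
import pandas as pd

# Temperature observations matching the table above.
temps = pd.Series(["Medium", "High", "High", "Low", "Medium"])

# Explicit mapping from ordered category to integer code.
order = {"Low": 1, "Medium": 2, "High": 3}
temps_coded = temps.map(order)
print(temps_coded)
```

An explicit dictionary also makes the assumed ordering visible in the code, which matters here because a generic label encoder would sort the levels alphabetically (High, Low, Medium) and destroy the intended order.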
It should be clearly noted that this is not a viable approach if there is no intrinsic order to the variable's categories. It is also questionable when the original categories are unevenly spaced. For example, if Low represented 0 degrees, Medium 10 degrees, and High 50 degrees, the practical meaning of that spacing would be difficult to preserve after an ordinal transformation, possibly leading to information loss. If ordinal encoding is not used, dummy encoding would likely be more suitable than one-hot encoding, since the Low category naturally lends itself to being the reference level, and the order would still be preserved among the three levels.