What is Tweedie Regression?

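The Tweedie distribution is a family of exponential dispersion models. For power parameters between 1 and 2 it is a compound Poisson-Gamma distribution: it places a point mass at exactly 0 and has a continuous, right-skewed density over the positive values. Analogous to Zero-Inflated Poisson regression for count data, Tweedie regression can be used for continuous, non-negative data that contains many zeros.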
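To make the shape of the distribution concrete, the following minimal sketch simulates a compound Poisson-Gamma variable with NumPy. The function name and the parameter values (`lam`, `shape`, `scale`) are illustrative assumptions, not taken from any dataset. Each draw is the sum of a Poisson number of Gamma-distributed amounts, so the share of exact zeros should be close to exp(-lam).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_compound_poisson_gamma(n, lam, shape, scale, rng):
    """Draw n totals, each the sum of N ~ Poisson(lam) Gamma(shape, scale) amounts."""
    counts = rng.poisson(lam, size=n)
    totals = np.zeros(n)
    positive = counts > 0
    # The sum of k independent Gamma(shape, scale) draws is Gamma(k * shape, scale);
    # rows with zero events keep a total of exactly 0.
    totals[positive] = rng.gamma(shape * counts[positive], scale)
    return totals

# Hypothetical parameter values chosen only for illustration.
lam = 0.3
y = simulate_compound_poisson_gamma(100_000, lam=lam, shape=2.0, scale=500.0, rng=rng)

zero_fraction = np.mean(y == 0)
print(f"share of zeros: {zero_fraction:.3f}, exp(-lam): {np.exp(-lam):.3f}")
print(f"mean of positive values: {y[y > 0].mean():.1f}")
```

A histogram of `y` would show a spike at 0 followed by a long right tail, which is the pattern Tweedie regression is meant to handle.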

A common use case of the Tweedie distribution is modeling the pure premium of insurance policies, that is, the total claim amount per unit of exposure. The pure premium combines the frequency of claims (count data with many zeros) and the amount per claim (continuous, right-skewed data). One approach is to model the frequency of claims with a Poisson-like model and the amount per claim with a Gamma-like model, and then multiply the two predictions to obtain the pure premium.
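The two-model approach described above can be sketched with scikit-learn's `PoissonRegressor` and `GammaRegressor`. The data here is synthetic, and the coefficients, feature layout, and regularization strength are assumptions made only so the example runs; a real portfolio would also need exposure handling, which is left out.

```python
import numpy as np
from sklearn.linear_model import GammaRegressor, PoissonRegressor

rng = np.random.default_rng(42)
n = 20_000
X = rng.normal(size=(n, 2))  # two hypothetical rating factors

# Assumed ground truth: frequency depends on feature 0, severity on feature 1.
true_freq = np.exp(-1.5 + 0.4 * X[:, 0])
true_sev = np.exp(6.0 + 0.3 * X[:, 1])

claim_counts = rng.poisson(true_freq)
has_claim = claim_counts > 0
total_amount = np.zeros(n)
# Sum of k Gamma(2, mean / 2) claim amounts is Gamma(2k, mean / 2).
total_amount[has_claim] = rng.gamma(
    2.0 * claim_counts[has_claim], true_sev[has_claim] / 2.0
)

# Frequency model: claim counts on all policies.
freq_model = PoissonRegressor(alpha=1e-4, max_iter=1000).fit(X, claim_counts)

# Severity model: average amount per claim, only on policies with claims,
# weighted by the number of claims behind each average.
avg_amount = total_amount[has_claim] / claim_counts[has_claim]
sev_model = GammaRegressor(alpha=1e-4, max_iter=1000).fit(
    X[has_claim], avg_amount, sample_weight=claim_counts[has_claim]
)

# Pure premium = expected frequency x expected severity.
pure_premium_pred = freq_model.predict(X) * sev_model.predict(X)
print(f"mean predicted: {pure_premium_pred.mean():.1f}, "
      f"mean observed: {total_amount.mean():.1f}")
```

The severity model only ever sees policies with at least one claim, which is one reason this route needs two separate fits.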

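However, the Tweedie distribution can model the pure premium directly, which removes the need to fit the two components separately. When performing a Tweedie regression, the user must specify a power parameter p. It sets the variance-mean relationship (the variance is proportional to the mean raised to the power p) and therefore the target distribution: p = 0 corresponds to the Normal, p = 1 to the Poisson, p = 2 to the Gamma, and values between 1 and 2 to the compound Poisson-Gamma case suited to zero-heavy data. The power parameter can be tuned using cross-validation.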
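As a rough sketch of tuning the power parameter by cross-validation, the example below uses scikit-learn's `TweedieRegressor` inside `GridSearchCV` on synthetic data. The candidate grid, the regularization strength, and the choice of scoring every candidate with the Tweedie deviance at a fixed reference power of 1.5 are all assumptions of this example; deviances computed at different powers are not directly comparable, so some common yardstick is needed, and other choices are possible.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor
from sklearn.metrics import make_scorer, mean_tweedie_deviance
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(7)
n = 5_000
X = rng.normal(size=(n, 2))  # two hypothetical features

# Synthetic zero-heavy target: Poisson number of Gamma-distributed amounts.
counts = rng.poisson(np.exp(-1.0 + 0.4 * X[:, 0]))
mean_amount = np.exp(1.0 + 0.3 * X[:, 1])
y = np.zeros(n)
pos = counts > 0
y[pos] = rng.gamma(2.0 * counts[pos], mean_amount[pos] / 2.0)

# Assumption: compare all candidates on one fixed metric (deviance at power 1.5).
scorer = make_scorer(mean_tweedie_deviance, greater_is_better=False, power=1.5)

search = GridSearchCV(
    TweedieRegressor(link="log", alpha=1e-4, max_iter=1000),
    param_grid={"power": [1.1, 1.3, 1.5, 1.7, 1.9]},
    scoring=scorer,
    cv=3,
)
search.fit(X, y)

best_power = search.best_params_["power"]
predictions = search.predict(X)  # uses the model refit with the best power
print(f"selected power: {best_power}")
```

A single model is fit on the raw target, zeros included, so no split into frequency and severity data is needed.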
