What is a p-value, and what is its significance?

The p-value indicates which terms in a regression model are statistically significant. It is a central concept in classical statistics: the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true. For example, when testing the significance of regression coefficients, the usual setup is to test against the null hypothesis that a coefficient is 0, meaning there is no relationship between the predictor and the response. A low p-value means that a result like the one obtained would be unlikely if the true value of the coefficient were indeed 0, which is taken as evidence that the coefficient is statistically significant. On the other hand, a large p-value means that it would not be rare to observe this result if the true value of the coefficient were actually 0, so there is insufficient evidence that the variable is related to the target. Exactly what counts as large or small is somewhat a matter of convention: α = 0.05 is commonly used in most settings, but that doesn’t preclude other thresholds from being used instead. Because α is the probability of making a type I error (falsely detecting significance) when the null hypothesis is true, using a larger cutoff (e.g. 0.10) increases the chance of that error, while using a smaller threshold (e.g. 0.01) makes a type II error more likely (reduced power), so that tradeoff must be considered when choosing a threshold.
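To make the definition concrete, here is a minimal Python sketch that estimates a p-value for a regression slope against the null hypothesis that the slope is 0. It uses a permutation test rather than the t-test that regression software typically reports, because the permutation version needs only the standard library and mirrors the definition directly: shuffle the response to break any relationship, then count how often the shuffled slope is at least as extreme as the observed one. The function names, the synthetic data, and the number of permutations are all illustrative assumptions.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x in a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the null hypothesis slope = 0."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    y_shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffling y simulates a world with no x-y relationship (the null).
        rng.shuffle(y_shuffled)
        if abs(slope(x, y_shuffled)) >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly 0.
    return (extreme + 1) / (n_permutations + 1)


# Synthetic data (assumed for illustration): one response depends on x,
# the other is pure noise.
data_rng = random.Random(42)
x = [i / 10 for i in range(50)]
y_related = [2.0 * xi + data_rng.gauss(0, 1) for xi in x]
y_unrelated = [data_rng.gauss(0, 1) for _ in x]

alpha = 0.05  # conventional threshold; other values are equally legitimate
p_related = permutation_p_value(x, y_related)
p_unrelated = permutation_p_value(x, y_unrelated)

for label, p in [("related", p_related), ("unrelated", p_unrelated)]:
    verdict = "significant" if p < alpha else "not significant"
    print(f"{label}: p = {p:.4f} -> {verdict} at alpha = {alpha}")
```

Changing alpha to 0.10 or 0.01 and rerunning shows the tradeoff described above: a looser cutoff flags more pure-noise predictors as significant, while a stricter one misses more weak but real relationships.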
