How does hinge loss differ from logistic loss?

Hinge loss penalizes misclassifications in proportion to how badly they are off: the cost grows linearly as the decision function output moves further away from the correct side of the margin. This property is one of the reasons SVMs perform well on many data sets, since it pushes the separating hyperplane toward a large-margin solution. As can be seen in the graphs above, hinge loss is not differentiable at the hinge point; the optimization problem is still convex, but it must be solved with subgradient methods or via the dual formulation rather than plain gradient descent. Logistic (cross-entropy) loss is smooth everywhere and also yields predicted probabilities rather than just class labels, which is why it underlies logistic regression. In practice, SVM is usually preferred to logistic regression when the decision boundary is non-linear or many variable transformations would otherwise be required, whereas for simpler problems where direct probability estimates are desired, logistic regression is often the better choice.
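
To make the comparison concrete, both losses can be written in terms of the margin y·f(x), with labels y ∈ {−1, +1}: hinge loss is max(0, 1 − y·f(x)) and logistic loss is log(1 + exp(−y·f(x))). The minimal sketch below (a NumPy illustration added here, not part of the original post) evaluates both over a range of margins, showing that hinge loss is exactly zero once the margin reaches 1, while logistic loss decays smoothly but never reaches zero.

```python
import numpy as np

def hinge_loss(y, score):
    """Hinge loss: max(0, 1 - y * f(x)), with labels y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * score)

def logistic_loss(y, score):
    """Logistic (log) loss: log(1 + exp(-y * f(x))), with labels y in {-1, +1}."""
    return np.log1p(np.exp(-y * score))

# Compare the two losses over a range of margins y * f(x)
for m in np.linspace(-3, 3, 7):
    print(f"margin={m:+.1f}  hinge={hinge_loss(1, m):.3f}  logistic={logistic_loss(1, m):.3f}")
```

Running the loop makes the qualitative difference visible: for confidently correct predictions (margin ≥ 1) hinge loss stops caring entirely, whereas logistic loss keeps rewarding larger margins slightly and always assigns some loss, which is what allows it to be read as a probability model.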

For a survey of other, less common loss functions used in classification, see: https://en.wikipedia.org/wiki/Loss_functions_for_classification
