Classical statistics tends to focus on the process of inference, or learning from data, while in machine learning, prediction accuracy is usually the primary interest. However, there is considerable overlap between statistics and machine learning. Both require data in order to conduct either inference or prediction, so in that sense it is accurate to say that learning happens in both “fields”.
The similarities and differences between the two can be illuminated by considering different use cases of a linear regression model. In a more classical statistical setting, a subject matter researcher might collect data and fit a regression in order to learn the relationships between the input features and the target. If the data were collected as part of a study with a specific scope, and there is no clear intent to collect new data in the future, the researcher might not be interested in how the model performs on holdout data, so the cross-validation process might not be necessary. However, quantifying significant relationships in an interpretable manner, and perhaps uncovering more complex insights such as interactions between predictors, is likely an important part of the conclusions drawn in this setting.
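As a concrete illustration of this inferential workflow, the sketch below fits an ordinary least squares regression with an interaction term using Python's statsmodels library. The variable names (dose, age, outcome) and the simulated study data are hypothetical stand-ins for whatever a researcher might actually collect.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical study data: two predictors and a continuous outcome.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"dose": rng.uniform(0, 10, n), "age": rng.uniform(20, 70, n)})
df["outcome"] = (2.0 + 0.5 * df["dose"] - 0.1 * df["age"]
                 + 0.03 * df["dose"] * df["age"] + rng.normal(0, 1, n))

# Fit an OLS regression with an interaction term; "dose * age" in the
# formula expands to dose + age + dose:age.
model = smf.ols("outcome ~ dose * age", data=df).fit()

# The summary reports coefficients, standard errors, and p-values --
# the interpretable quantities of interest in the inferential setting.
print(model.summary())
```

Note that nothing here is held out for prediction: the entire data set is used to estimate the coefficients, and the conclusions come from reading the summary table rather than from a performance metric.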
In the machine learning context, prediction accuracy is usually a more important aim than drawing interpretable conclusions. In the regression case, an elastic net model might be trained over a grid of hyperparameters covering the overall regularization strength and the relative weighting of the LASSO and ridge penalties. Depending on the need to preserve some degree of interpretability, the modeler may not know, or even care, what coefficient was assigned to each predictor, provided the model performs at an acceptable level. Instead, the focus might be on tuning the model to find the hyperparameter settings that optimize the performance metric most applicable to the business setting. However, underlying both settings is a shared statistical framework that allows a model to be built to serve either purpose.
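The sketch below shows what this tuning workflow might look like using scikit-learn's ElasticNet and GridSearchCV. The grid values, the synthetic data, and the choice of mean absolute error as the stand-in business metric are all illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the training data.
X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grid over the overall regularization strength (alpha) and the
# LASSO/ridge mixing weight (l1_ratio: closer to 1.0 is more LASSO-like,
# closer to 0.0 more ridge-like).
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0], "l1_ratio": [0.1, 0.5, 0.9]}

# Cross-validated grid search tuned to a chosen metric; here negative
# mean absolute error stands in for whatever metric fits the business need.
search = GridSearchCV(ElasticNet(max_iter=10_000), param_grid,
                      scoring="neg_mean_absolute_error", cv=5)
search.fit(X_train, y_train)

print("best hyperparameters:", search.best_params_)
print("holdout score (neg mean absolute error):", search.score(X_test, y_test))
```

The fitted coefficients remain available via search.best_estimator_.coef_, but in this setting they may never be inspected: the deliverable is the tuned model and its holdout performance, not an interpretation of the weights.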
Deep learning is a subfield of machine learning that seeks to automate feature extraction in addition to the learning itself, thus attempting to eliminate human intervention at an earlier stage of the process than “non-deep” machine learning. Because classical machine learning relies on humans to supply the features used to train a model, it is usually better suited to structured problems where feature engineering is domain-specific and benefits from subject matter knowledge of the data. Deep learning, on the other hand, is better suited to unstructured data, such as text or images, from which it would be difficult for humans to hand-engineer features.
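To make the contrast concrete, here is a minimal sketch, assuming PyTorch and an MNIST-sized dummy input purely for illustration, of a small convolutional network whose early layers learn feature detectors directly from raw pixels rather than relying on hand-engineered features.

```python
import torch
import torch.nn as nn

# A small convolutional network: the convolutional layers learn feature
# detectors directly from raw pixels, replacing hand-engineered features,
# and the final linear layer acts as a classifier on top of them.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # learned low-level features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # learned higher-level features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                   # classifier head
)

# Forward pass on a batch of dummy 28x28 grayscale images.
images = torch.randn(32, 1, 28, 28)
logits = model(images)
print(logits.shape)  # torch.Size([32, 10])
```

The point of the sketch is what is absent: no column of human-designed features is ever constructed, since the convolutional layers learn their own representations from the raw input during training.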