This paper is a great find. Not the least because the argument (statistics versus data science) was already in full swing 50 years ago.
I have no problem with predictive modelling, but it is a different task. And it does seem that the emphasis on ML has obscured the value (in a pendulum swing from the days of Tukey) on importance of understanding the generative model. From Donoho …
Predictive modeling is effectively silent about the underlying mechanism generating the data, and allows for many different predictive algorithms, preferring to discuss only accuracy of prediction made by different algorithm on various datasets. The relatively recent discipline of machine learning, often sitting within computer science departments, is identified by Breiman as the epicenter of the predictive modeling culture.
I like to think that our lab is pulling hard at the pendulum. That we care massively about the underlying mechanism. That for me is the ‘science’ in ‘data science’. Science because when right it tells us something about how the world works, not just how it will be. The difference between a super accurate weather forecast, and understanding the principles of the atmosphere and the climate. None of that devalues predictive modelling, but these are separate activities.