This is the paper I wrote for the Data Mining Seminar at TUM. It is a short survey of the use and role of statistics in data mining. The abstract:
Data mining is a cross-disciplinary field at the intersection of computer science and statistics that aims to make valuable inferences and predictions from data. Statistical methods give data mining invaluable tools for understanding and interpreting data. This paper presents the role of statistics in data mining through an overview of selected use cases and concludes with a short comparison of the two fields. The emphasis is on classical statistical tools such as estimation, sampling, and hypothesis testing.