This repository contains the code and analysis for a writing assignment in CS328, a course taught by Prof. Anirban Dasgupta at IITGN. The assignment explores key trends in machine learning research over the past decade using publication data from ICML, NeurIPS, and ICLR.
➤ Paper: writing_assignment_cs328.pdf
- `hypothesis/` – Contains notebooks evaluating each of the four hypotheses
- `ds-summary.ipynb` – Exploratory data analysis used to support the insights
- `README.md` – Project overview
- `.gitignore` – Standard ignore file
- `writing_assignment_cs328.pdf` – Main submission file
Growth of Industry Research
Hypothesis: Over the years 2017 through 2024, the number of ICML publications authored by industry-affiliated researchers has steadily increased, and by 2024, it has reached approximately one-fourth the number of publications authored by academia-affiliated researchers.
➤ Notebook: hypothesis/h1.ipynb
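A minimal sketch of the kind of tally behind this hypothesis, assuming a hypothetical per-paper CSV (`icml_papers.csv`) with columns `year` and `affiliation_type` labeled `industry` or `academia`; the actual notebook may structure the data differently.

```python
import pandas as pd

# Hypothetical input: one row per ICML paper, with a pre-labeled
# affiliation type ("industry" or "academia") and a publication year.
papers = pd.read_csv("icml_papers.csv")  # assumed columns: year, affiliation_type

# Papers per year for each affiliation type, 2017-2024 (inclusive).
counts = (
    papers[papers["year"].between(2017, 2024)]
    .groupby(["year", "affiliation_type"])
    .size()
    .unstack(fill_value=0)
)

# Ratio of industry to academia counts; the hypothesis expects this
# to rise steadily and reach roughly 0.25 by 2024.
counts["industry_to_academia"] = counts["industry"] / counts["academia"]
print(counts)
```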
Topic Evolution in NeurIPS Abstracts
Hypothesis: The most common words in the abstracts of NeurIPS papers from 2017–2019 show a noticeable evolution compared to those from 2014–2016, reflecting the rapidly changing landscape of machine learning research in just a few years. While earlier abstracts (2014–2016) commonly referenced statistical/probabilistic learning, clustering, and kernel methods, the later years (2017–2019) exhibit a stronger emphasis on deep learning architectures, reinforcement learning, and emerging trends such as attention mechanisms and adversarial robustness.
➤ Notebook: hypothesis/h2.ipynb
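A word-frequency comparison of this kind can be sketched as below; the stopword list and tokenizer are illustrative assumptions, and the notebook likely uses a fuller pipeline.

```python
from collections import Counter
import re

# A tiny stopword list for illustration; the notebook likely uses a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "we", "for", "is", "that", "on", "with"}

def top_words(abstracts, k=20):
    """Most common lowercase words across a list of abstract strings."""
    counts = Counter()
    for text in abstracts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(k)

# Hypothetical usage: `early` holds 2014-2016 abstracts, `late` holds 2017-2019.
# Comparing top_words(early) against top_words(late) surfaces the vocabulary shift.
```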
Interdisciplinary Influence in ML
Hypothesis: The frequency of interdisciplinary terms from domains such as biology, physics, and neuroscience in titles has increased from 2018 to 2024 in ICLR, NeurIPS, and ICML, indicating growing cross-domain influence in ML research.
➤ Notebook: hypothesis/h4.ipynb
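One way to measure this is a per-year hit rate against a seed lexicon of cross-domain terms; the lexicon and input file below are hypothetical placeholders, not the notebook's actual choices.

```python
import pandas as pd

# Hypothetical seed lexicon; stems like "biolog" also match "biology"/"biological".
TERMS = ["protein", "molecul", "neuron", "brain", "physic", "quantum", "genom", "biolog"]

def term_rate(titles):
    """Fraction of titles mentioning at least one interdisciplinary term."""
    hits = sum(any(t in title.lower() for t in TERMS) for title in titles)
    return hits / len(titles)

# Assumed input: one row per paper with columns year, venue, title.
df = pd.read_csv("titles.csv")
rates = df.groupby("year")["title"].apply(lambda s: term_rate(list(s)))
print(rates)  # the hypothesis expects an upward trend from 2018 to 2024
```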
Regional and Institutional Research Focus
Hypothesis: Between 2018 and 2024, paper titles from US, Chinese, and corporate-affiliated institutions in ICLR and ICML disproportionately emphasize different machine learning subfields, revealing regional and institutional specialization in research focus. Specifically, titles from US institutions are more likely to highlight areas such as Fairness, Causal Inference, and Graph Learning; Chinese institutions tend to focus on Federated Learning, Semi-supervised Learning, and Adversarial Attacks; while corporate-affiliated papers (e.g., from Google, Meta, Microsoft) emphasize topics like Large Language Models, Self-supervised Learning, and Optimization.
➤ Notebook: hypothesis/h5.ipynb
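A sketch of how such regional emphasis could be quantified, assuming a hypothetical CSV with `title` and `region` columns and a keyword map mirroring the subfields named in the hypothesis; the notebook's actual grouping and keywords may differ.

```python
import pandas as pd

# Hypothetical keyword map built from the subfields named in the hypothesis.
SUBFIELDS = {
    "fairness/causal/graph": ["fairness", "causal", "graph"],
    "federated/semi-sup/adversarial": ["federated", "semi-supervised", "adversarial"],
    "llm/self-sup/optimization": ["language model", "self-supervised", "optimization"],
}

def subfield_shares(titles):
    """Share of titles mentioning each subfield's keywords."""
    lowered = [t.lower() for t in titles]
    return {
        name: sum(any(k in t for k in keys) for t in lowered) / len(lowered)
        for name, keys in SUBFIELDS.items()
    }

# Assumed input columns: title, region ("US", "China", or "Corporate").
df = pd.read_csv("titles_with_region.csv")
for region, group in df.groupby("region"):
    print(region, subfield_shares(list(group["title"])))
```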
This repository is for academic purposes only.