Who’s Ahead in AI Research in 2020? Insights from the International Conference on Machine Learning (ICML 2020)
Please subscribe to our Twitter if you’d like to follow our Deep Tech research. 🤖
Introduction
The International Conference on Machine Learning (ICML) is one of the two most prestigious AI research conferences (the other is the Conference on Neural Information Processing Systems, or NeurIPS). In 2020, the acceptance rate at ICML was 21.8% (vs. 22.6% in 2019): 1,088 of 4,990 submitted papers were accepted. Using the ICML 2020 conference proceedings, we went through each accepted paper, compiled a list of authors and their affiliated organizations, and then calculated the Publication Index for each organization (see the “Methodology” section below). The most intuitive way to think of the Publication Index is in terms of full-paper equivalents: Google’s Publication Index of 92.2 means that Google’s share of authorship at ICML 2020 adds up to the equivalent of 92.2 full papers.
We will start this analysis with details on methodology, continue on to AI research rankings at ICML 2020, then show further interesting descriptive statistics, discuss the changes between ICML 2019 and ICML 2020, and finally conclude with a link to the dataset.
Methodology
The methodology of our Publication Index is inspired by the Nature Index:
To glean a country’s, a region’s or an institution’s contribution to an article, and to ensure they are not counted more than once, the Nature Index uses fractional count (FC), which takes into account the share of authorship on each article. The total FC available per article is 1, which is shared among all authors under the assumption that each contributed equally. For instance, an article with 10 authors means that each author receives an FC of 0.1. For authors who are affiliated with more than one institution, the author’s FC is then split equally between each institution. The total FC for an institution is calculated by summing the FC for individual affiliated authors. The process is similar for countries/regions, although complicated by the fact that some institutions have overseas labs that will be counted towards host country/region totals.
The only difference is that our Publication Index counts overseas labs towards the headquarters country/region (instead of the host country/region). This is a contentious point, but we believe that this approach better reflects the assignment of intellectual property and respective accrual of benefit to the headquarters, rather than the local lab.
Here is an example of the Publication Index calculation. If a paper has five authors (three from MIT, one from the University of Oxford, and one from Google), each author gets 1/5 of one point, or 0.2. From this paper alone, MIT increases its Publication Index by 3 × 0.2 = 0.6 points, while the University of Oxford and Google each add 0.2. At the country level, MIT is based in the United States, so its affiliation adds 0.6 to the Publication Index of the United States. The University of Oxford is based in the UK, which we count in the EEA + Switzerland category, so that category increases by 0.2. Finally, Google is a multinational corporation headquartered in the United States, so the United States gains an additional 0.2, for a total increase of 0.8. If an author has multiple affiliations, we split that author’s fraction equally among the affiliated institutions. For instance, if the last author in the example above listed two affiliations, Google and Stanford University (instead of just Google), Google and Stanford University would each get 0.2/2 = 0.1 points.
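The worked example above can be reproduced with a few lines of Python. This is only a minimal sketch of the fractional-count idea, not the code used for the rankings: the function names, the input layout (a paper as a list of authors, each author as a list of affiliations), and the small HQ_REGION lookup table are all assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical lookup from organization to headquarters region (illustrative only).
HQ_REGION = {
    "MIT": "United States",
    "University of Oxford": "EEA + Switzerland",
    "Google": "United States",
    "Stanford University": "United States",
}

def publication_index(papers):
    """Sum fractional counts per organization.

    papers: list of papers; each paper is a list of authors;
    each author is a list of affiliated organizations.
    """
    org_index = defaultdict(Fraction)
    for authors in papers:
        author_share = Fraction(1, len(authors))  # each paper is worth 1 point in total
        for affiliations in authors:
            org_share = author_share / len(affiliations)  # split across affiliations
            for org in affiliations:
                org_index[org] += org_share
    return org_index

def region_index(org_index, hq_region=HQ_REGION):
    """Roll organization scores up to the headquarters region."""
    regions = defaultdict(Fraction)
    for org, score in org_index.items():
        regions[hq_region[org]] += score
    return regions

# Five authors: three from MIT, one from Oxford, one from Google.
paper = [["MIT"], ["MIT"], ["MIT"], ["University of Oxford"], ["Google"]]
orgs = publication_index([paper])
regions = region_index(orgs)
print({org: float(score) for org, score in orgs.items()})

# Variant: the last author lists both Google and Stanford University.
paper_variant = paper[:4] + [["Google", "Stanford University"]]
orgs_variant = publication_index([paper_variant])
print({org: float(score) for org, score in orgs_variant.items()})
```

Exact fractions are used so that the shares of a paper always sum to exactly 1. Converting to floats only for display avoids accumulating rounding error across roughly a thousand papers.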
Who’s Ahead in AI Research at ICML 2020?
Top 50 Global Organizations (Industry & Academia) Leading in AI Research at ICML 2020 (with Publication Indices):
1. Google (USA) — 92.2
2. Stanford University (USA) — 39.2
3. MIT (USA) — 38.5
4. UC Berkeley (USA) — 34.2
5. Carnegie Mellon University (USA) — 24.0
6. Microsoft (USA) — 22.6
7. Facebook (USA) — 17.1
8. Princeton University (USA) — 17.0
9. University of Oxford (UK) — 16.3
10. UT Austin (USA) — 14.3
11. UCLA (USA) — 14.3
12. Duke University (USA) — 14.1
13. EPFL (Switzerland) — 13.9
14. Harvard University (USA) — 13.7
15. Cornell University (USA) — 12.6
16. ETH (Switzerland) — 12.4
17. Tsinghua University (China) — 12.3
18. National University of Singapore (Singapore) — 12.2
19. University of Pennsylvania (USA) — 12.1
20. Technion (Israel) — 12.1
21. IBM (USA) — 10.7
22. University of Washington (USA) — 9.7
23. UC San Diego (USA) — 9.5
24. University of Maryland (USA) — 9.0
25. Peking University (China) — 8.9
26. Georgia Institute of Technology (USA) — 8.8
27. University of Illinois at Urbana-Champaign (USA) — 8.7
28. University of Wisconsin-Madison (USA) — 8.7
29. University of Toronto (Canada) — 8.3
30. MILA (Canada) — 8.0
31. KAIST (South Korea) — 8.0
32. Texas A&M University (USA) — 7.9
33. RIKEN (Japan) — 7.8
34. University of Cambridge (UK) — 7.8
35. Columbia University (USA) — 7.8
36. UMass Amherst (USA) — 7.5
37. INRIA (France) — 7.5
38. New York University (USA) — 7.1
39. University College London (UK) — 6.8
40. University of Southern California (USA) — 6.8
41. Yale University (USA) — 6.6
42. Yandex (Russia) — 6.0
43. Shanghai Jiao Tong University (China) — 5.7
44. University of Minnesota (USA) — 5.6
45. University of Chicago (USA) — 5.6
46. McGill University (Canada) — 5.5
47. Seoul National University (South Korea) — 5.5
48. University of Tuebingen (Germany) — 5.5
49. University of Alberta (Canada) — 5.4
50. Rice University (USA) — 5.3
Top 20 American Universities Leading in AI Research at ICML 2020 (with Publication Indices):
1. Stanford University — 39.2
2. MIT — 38.5
3. UC Berkeley — 34.2
4. Carnegie Mellon University — 24.0
5. Princeton University — 17.0
6. UT Austin — 14.3
7. UCLA — 14.3
8. Duke University — 14.1
9. Harvard University — 13.7
10. Cornell University — 12.6
11. University of Pennsylvania — 12.1
12. University of Washington — 9.7
13. UC San Diego — 9.5
14. University of Maryland — 9.0
15. Georgia Institute of Technology — 8.8
16. University of Illinois at Urbana-Champaign — 8.7
17. University of Wisconsin-Madison — 8.7
18. Texas A&M University — 7.9
19. Columbia University — 7.8
20. UMass Amherst — 7.5
Top 20 Global Universities Leading in AI Research at ICML 2020 (with Publication Indices):
1. Stanford University (USA) — 39.2
2. MIT (USA) — 38.5
3. UC Berkeley (USA) — 34.2
4. Carnegie Mellon University (USA) — 24.0
5. Princeton University (USA) — 17.0
6. University of Oxford (UK) — 16.3
7. UT Austin (USA) — 14.3
8. UCLA (USA) — 14.3
9. Duke University (USA) — 14.1
10. EPFL (Switzerland) — 13.9
11. Harvard University (USA) — 13.7
12. Cornell University (USA) — 12.6
13. ETH (Switzerland) — 12.4
14. Tsinghua University (China) — 12.3
15. National University of Singapore (Singapore) — 12.2
16. University of Pennsylvania (USA) — 12.1
17. Technion (Israel) — 12.1
18. University of Washington (USA) — 9.7
19. UC San Diego (USA) — 9.5
20. University of Maryland (USA) — 9.0
Top 20 Companies Leading in AI Research at ICML 2020 (with Publication Indices):
1. Google (USA) — 92.2
2. Microsoft (USA) — 22.6
3. Facebook (USA) — 17.1
4. IBM (USA) — 10.7
5. Yandex (Russia) — 6.0
6. Amazon (USA) — 5.2
7. OpenAI (USA) — 4.4
8. Criteo (France) — 4.4
9. Uber (USA) — 4.3
10. Samsung (South Korea) — 4.2
11. Baidu (China) — 3.9
12. Apple (USA) — 3.7
13. Alibaba (China) — 2.8
14. Huawei (China) — 2.6
15. Intel (USA) — 2.1
16. NVIDIA (USA) — 2.0
17. Qualcomm (USA) — 2.0
18. NEC (Japan) — 1.8
19. Salesforce (USA) — 1.7
20. Bosch (Germany) — 1.6
Further Analysis
Top 50 Global Organizations (Industry & Academia) at ICML 2020 vs. ICML 2019 (with Publication Indices):
Changes in Publication Indices at Top 50 Global Organizations at ICML 2020 vs. ICML 2019 (positive change means more publications at ICML 2020 than at ICML 2019):
1. Google: +19.4
2. Stanford University: +14.7
3. MIT: +15.4
4. UC Berkeley: +10.0
5. Carnegie Mellon University: +4.8
6. Microsoft: +5.9
7. Facebook: +7.6
8. Princeton University: +6.3
9. University of Oxford: +2.7
10. UT Austin: +3.4
11. UCLA: +5.9
12. Duke University: +6.9
13. EPFL: +4.3
14. Harvard University: +6.7
15. Cornell University: +2.0
16. ETH: +0.3
17. Tsinghua University: +2.8
18. National University of Singapore: +9.7
19. University of Pennsylvania: +8.1
20. Technion: +3.9
21. IBM: +0.3
22. University of Washington: +0.6
23. UC San Diego: +7.2
24. University of Maryland: +4.7
25. Peking University: +3.1
26. Georgia Institute of Technology: -5.7
27. University of Illinois at Urbana-Champaign: -0.8
28. University of Wisconsin-Madison: +4.7
29. University of Toronto: +1.8
30. MILA: +4.3
31. KAIST: -3.1
32. Texas A&M University: +5.1
33. RIKEN: +2.2
34. University of Cambridge: +1.7
35. Columbia University: +1.9
36. UMass Amherst: +3.3
37. INRIA: +1.0
38. New York University: +3.4
39. University College London: +2.6
40. University of Southern California: +2.4
41. Yale University: +3.0
42. Yandex: +4.3
43. Shanghai Jiao Tong University: +3.5
44. University of Minnesota: +2.2
45. University of Chicago: +3.0
46. McGill University: +1.7
47. Seoul National University: -1.8
48. University of Tuebingen: +4.5
49. University of Alberta: +3.2
50. Rice University: +3.9
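Since each change is the ICML 2020 index minus the ICML 2019 index, the 2019 value can be recovered by subtraction. The sketch below does this for a handful of organizations, using figures copied from the lists above. The implied 2019 values are derived from rounded numbers, so they may differ from the actual 2019 indices by about 0.1, and the helper name implied_2019 is hypothetical.

```python
from decimal import Decimal

# (2020 index, change vs. 2019) for a few organizations, copied from the lists above.
DATA = {
    "Google": ("92.2", "19.4"),
    "Stanford University": ("39.2", "14.7"),
    "MIT": ("38.5", "15.4"),
    "Georgia Institute of Technology": ("8.8", "-5.7"),
    "KAIST": ("8.0", "-3.1"),
}

def implied_2019(index_2020, change):
    """Recover the 2019 index: change = index_2020 - index_2019."""
    return Decimal(index_2020) - Decimal(change)

implied = {org: implied_2019(*values) for org, values in DATA.items()}

# Organizations in this sample whose index fell between 2019 and 2020.
decliners = sorted(org for org, (_, change) in DATA.items() if Decimal(change) < 0)

for org, value in implied.items():
    print(f"{org}: implied ICML 2019 index {value}")
print("Declined:", decliners)
```

Decimal is used instead of float so that one-decimal figures such as 92.2 and 19.4 subtract exactly.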
Word Cloud of Paper Titles at ICML 2020:
Discussion
Let’s see what changed between ICML 2019 and ICML 2020 (for 2019 data, please see our AI Research Rankings 2019, where we combined insights from NeurIPS 2019 and ICML 2019). As you can see, the Top 5 global organizations leading in AI research are still Google, Stanford University, MIT, UC Berkeley, and Carnegie Mellon University. Each of them increased its Publication Index substantially at ICML 2020: Google published the equivalent of 19.4 more papers than at ICML 2019, Stanford University is up by 14.7, MIT is up by 15.4, UC Berkeley is up by 10.0, and Carnegie Mellon University is up by 4.8. Just like in Lewis Carroll’s Red Queen’s race, top researchers need to publish more papers each year just to maintain their lead.
“Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!” (Lewis Carroll, Through the Looking-Glass)
Dataset
Please note that even data science conferences still don’t release publication data in any sort of Python-friendly form 🤷‍♂️, so our analysis ended up being quite manual (first parsing the HTML, then fixing typos in organization names, standardizing them, splitting lines with multiple organizations, summarizing with a pivot table, and so on). If you find any bugs, please email us, and we’ll be happy to fix them.
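For readers who want to attempt a similar cleanup, here is a rough sketch of the standardizing, splitting, and summarizing steps in plain Python. It is not the workflow we actually used, which was largely manual. The alias table, the ';' separator, and the row layout (one row per paper and author) are all assumptions for illustration, and HTML parsing is left out.

```python
import re
from collections import defaultdict

# Hypothetical alias table: lowercase spelling variants -> one canonical name.
ALIASES = {
    "massachusetts institute of technology": "MIT",
    "mit": "MIT",
    "google research": "Google",
    "google": "Google",
    "stanford": "Stanford University",
    "stanford university": "Stanford University",
}

def split_affiliations(line):
    """Split a raw line listing several organizations (';' separator is assumed)."""
    return [part.strip() for part in line.split(";") if part.strip()]

def standardize(name):
    """Collapse whitespace, ignore case, and map known variants to a canonical name."""
    key = re.sub(r"\s+", " ", name).strip().lower()
    return ALIASES.get(key, name.strip())

def pivot(rows):
    """Summarize rows of (paper_id, author, raw_affiliation_line).

    Assumes exactly one row per (paper, author).
    Returns {organization: summed fractional count}.
    """
    authors_per_paper = defaultdict(set)
    for paper_id, author, _ in rows:
        authors_per_paper[paper_id].add(author)

    totals = defaultdict(float)
    for paper_id, author, raw in rows:
        orgs = [standardize(org) for org in split_affiliations(raw)]
        if not orgs:
            continue  # no usable affiliation on this row
        share = 1.0 / len(authors_per_paper[paper_id]) / len(orgs)
        for org in orgs:
            totals[org] += share
    return dict(totals)

rows = [
    ("p1", "A", "Massachusetts  Institute of Technology"),
    ("p1", "B", "google research ; Stanford"),
]
totals = pivot(rows)
print(totals)
```

In practice the alias table is where most of the manual effort goes, since every new spelling variant in the proceedings needs its own entry.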
About the author: My name is Gleb Chuvpilo, and I’m the Managing Partner at Thundermark Capital, a Venture Capital firm that invests in Deep Tech startups. I have a Master’s degree from the MIT Computer Science and Artificial Intelligence Lab and an MBA in Finance and Strategic Management from The Wharton School at the University of Pennsylvania.