AI Research Rankings 2019: Insights from NeurIPS and ICML, Leading AI Conferences
Introduction
Welcome to the long-awaited refresh of our annual AI Research Rankings, 2019 edition (here is the first pilot of the rankings we published last year). This time we analyzed publications at the two most prestigious AI research conferences, Neural Information Processing Systems (NeurIPS, or NIPS) and International Conference on Machine Learning (ICML). Using conference proceedings (NeurIPS 2019 and ICML 2019), we went into each of the 2,200 accepted papers and compiled a list of authors and their affiliated organizations, and then calculated the Publication Index for each organization (see “Methodology” section below). The most intuitive way to think of the Publication Index is from the point of view of full paper equivalents: Google’s Publication Index of 167.3 can be interpreted as if Google published 167.3 full papers at the two leading AI conferences in 2019.
We will start this analysis with details on methodology, continue to AI research rankings for 2019, then show further interesting descriptive statistics, and conclude with a discussion on who is ahead in AI.
Methodology
The methodology of our Publication Index is inspired by the Nature Index:
To glean a country’s, a region’s or an institution’s contribution to an article, and to ensure they are not counted more than once, the Nature Index uses fractional count (FC), which takes into account the share of authorship on each article. The total FC available per article is 1, which is shared among all authors under the assumption that each contributed equally. For instance, an article with 10 authors means that each author receives an FC of 0.1. For authors who are affiliated with more than one institution, the author’s FC is then split equally between each institution. The total FC for an institution is calculated by summing the FC for individual affiliated authors. The process is similar for countries/regions, although complicated by the fact that some institutions have overseas labs that will be counted towards host country/region totals.
The only difference is that our Publication Index counts overseas labs towards the headquarters country/region (instead of the host country/region). This is a contentious point, but we believe that this approach better reflects the assignment of intellectual property and respective accrual of benefit to the headquarters, rather than the local lab.
Here is an example of the Publication Index calculation. If a paper has five authors (three from MIT, one from the University of Oxford, and one from Google), each author gets 1/5th of one point, or 0.2. As a result, from this paper alone, MIT increases its Publication Index by 3*0.2=0.6 points, the University of Oxford increases its index by 0.2, and Google adds 0.2. Since MIT is based in the United States, the MIT affiliation increases the Publication Index of the United States by 0.6. Similarly, since the University of Oxford is based in the UK, the EEA + Switzerland category increases by 0.2. Finally, Google is a multinational corporation headquartered in the United States, so the United States gains an additional 0.2, for a total increase of 0.8. If an author has multiple affiliations, we split that author’s fraction equally across the affiliated institutions. For instance, in the case above, if the last author listed two affiliations, Google and Stanford University (instead of just Google), Google and Stanford University would each get 0.2/2=0.1 points.
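The worked example above can be reproduced with a few lines of Python. This is only a minimal sketch of the fractional-count idea, not the code used to build the rankings: the function names, the nested-list input format, and the `HQ_REGION` lookup table are all assumptions made for illustration, and the sketch assumes every paper has at least one author and every author at least one affiliation.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical lookup (illustration only): institution -> headquarters region.
HQ_REGION = {
    "MIT": "United States",
    "Google": "United States",
    "Stanford University": "United States",
    "University of Oxford": "EEA + Switzerland",
}


def publication_index(papers):
    """Return {institution: fractional count}.

    papers: list of papers; each paper is a list of authors;
    each author is a list of one or more affiliation names.
    """
    index = defaultdict(Fraction)
    for authors in papers:
        # Each paper is worth 1 point, shared equally among its authors.
        author_share = Fraction(1, len(authors))
        for affiliations in authors:
            # An author's share is split equally across their affiliations.
            inst_share = author_share / len(affiliations)
            for inst in affiliations:
                index[inst] += inst_share
    return dict(index)


def region_index(inst_index, hq_region):
    """Roll institution scores up to the headquarters region."""
    regions = defaultdict(Fraction)
    for inst, score in inst_index.items():
        regions[hq_region[inst]] += score
    return dict(regions)


# The five-author paper from the example: three MIT, one Oxford, one Google.
paper = [["MIT"], ["MIT"], ["MIT"], ["University of Oxford"], ["Google"]]
scores = publication_index([paper])
regions = region_index(scores, HQ_REGION)
print({name: float(value) for name, value in scores.items()})
print({name: float(value) for name, value in regions.items()})

# Variant: the last author lists both Google and Stanford University.
paper_dual = paper[:4] + [["Google", "Stanford University"]]
scores_dual = publication_index([paper_dual])
print({name: float(value) for name, value in scores_dual.items()})
```

Exact fractions are used instead of floats so that the shares of a single paper always add up to exactly 1, which makes it easy to check the totals against the number of papers.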
Finally, the reason why we thought it was fair to combine NeurIPS and ICML publications into the same dataset is that they have similar perceived prestige among top AI researchers, similar institutional participation, and similar paper acceptance rates (21.2% for NeurIPS and 22.6% for ICML).
AI Research Rankings 2019
Top 40 Global Organizations (Industry & Academia) Leading in AI Research in 2019 (with Publication Indices):
1. Google (USA) — 167.3
2. Stanford University (USA) — 82.3
3. MIT (USA) — 69.8
4. Carnegie Mellon University (USA) — 67.7
5. UC Berkeley (USA) — 54.0
6. Microsoft (USA) — 51.9
7. University of Oxford (UK) — 37.7
8. Facebook (USA) — 33.1
9. Princeton University (USA) — 31.5
10. Cornell University (USA) — 30.9
11. Georgia Tech (USA) — 30.1
12. UT Austin (USA) — 29.9
13. University of Illinois (USA) — 29.4
14. Columbia University (USA) — 29.2
15. Tsinghua University (China) — 28.4
16. UCLA (USA) — 27.2
17. ETH (Switzerland) — 27.0
18. IBM (USA) — 25.8
19. University of Washington (USA) — 24.0
20. INRIA (France) — 23.2
21. EPFL (Switzerland) — 22.3
22. Peking University (China) — 21.6
23. University of Toronto (Canada) — 21.4
24. Harvard University (USA) — 19.2
25. Duke University (USA) — 18.7
26. New York University (USA) — 17.7
27. University of Cambridge (UK) — 15.1
28. KAIST (South Korea) — 14.8
29. Technion (Israel) — 14.6
30. UC San Diego (USA) — 14.6
31. University of Wisconsin Madison (USA) — 14.4
32. Amazon (USA) — 14.3
33. UMass Amherst (USA) — 13.8
34. University College London (UK) — 13.7
35. MILA (Canada) — 13.5
36. University of Southern California (USA) — 13.5
37. University of Pennsylvania (USA) — 13.3
38. Seoul National University (South Korea) — 12.7
39. Johns Hopkins University (USA) — 12.6
40. RIKEN (Japan) — 12.3
Top 20 Regions Leading in AI Research in 2019 (with Publication Indices):
1. United States — 1260.2
2. EEA* + Switzerland — 431.5
3. China — 184.5
4. Canada — 80.3
5. Japan — 49.4
6. South Korea — 46.8
7. Israel — 43.3
8. Australia — 27.0
9. India — 17.1
10. Singapore — 13.2
11. Russia — 10.6
12. Taiwan — 5.3
13. Saudi Arabia — 5.0
14. United Arab Emirates — 2.3
15. Iran — 2.2
16. South Africa — 1.0
17. Chile — 1.0
18. Malaysia — 0.7
19. Turkey — 0.6
20. New Zealand — 0.5
*Countries that belong to the EEA include Austria, Belgium, Bulgaria, Croatia, Republic of Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, UK, Iceland, Liechtenstein, and Norway (source).
Top 20 Countries Leading in AI Research in 2019 (with Publication Indices):
1. United States — 1260.2
2. China — 184.5
3. United Kingdom — 126.1
4. France — 94.3
5. Canada — 80.3
6. Germany — 64.5
7. Switzerland — 59.3
8. Japan — 49.4
9. South Korea — 46.8
10. Israel — 43.3
11. Australia — 27.0
12. India — 17.1
13. Netherlands — 15.3
14. Singapore — 13.2
15. Denmark — 12.2
16. Italy — 11.5
17. Sweden — 11.3
18. Russia — 10.6
19. Finland — 9.6
20. Austria — 7.4
Top 20 American Universities Leading in AI Research in 2019 (with Publication Indices):
1. Stanford University — 82.3
2. MIT — 69.8
3. Carnegie Mellon University — 67.7
4. UC Berkeley — 54.0
5. Princeton University — 31.5
6. Cornell University — 30.9
7. Georgia Tech — 30.1
8. UT Austin — 29.9
9. University of Illinois — 29.4
10. Columbia University — 29.2
11. UCLA — 27.2
12. University of Washington — 24.0
13. Harvard University — 19.2
14. Duke University — 18.7
15. New York University — 17.7
16. UC San Diego — 14.6
17. University of Wisconsin Madison — 14.4
18. UMass Amherst — 13.8
19. University of Southern California — 13.5
20. University of Pennsylvania — 13.3
Top 20 Global Universities Leading in AI Research in 2019 (with Publication Indices):
1. Stanford University (USA) — 82.3
2. MIT (USA) — 69.8
3. Carnegie Mellon University (USA) — 67.7
4. UC Berkeley (USA) — 54.0
5. University of Oxford (UK) — 37.7
6. Princeton University (USA) — 31.5
7. Cornell University (USA) — 30.9
8. Georgia Tech (USA) — 30.1
9. UT Austin (USA) — 29.9
10. University of Illinois (USA) — 29.4
11. Columbia University (USA) — 29.2
12. Tsinghua University (China) — 28.4
13. UCLA (USA) — 27.2
14. ETH (Switzerland) — 27.0
15. University of Washington (USA) — 24.0
16. INRIA (France) — 23.2
17. EPFL (Switzerland) — 22.3
18. Peking University (China) — 21.6
19. University of Toronto (Canada) — 21.4
20. Harvard University (USA) — 19.2
Top 20 Companies Leading in AI Research in 2019 (with Publication Indices):
1. Google (USA) — 167.3
2. Microsoft (USA) — 51.9
3. Facebook (USA) — 33.1
4. IBM (USA) — 25.8
5. Amazon (USA) — 14.3
6. Tencent (China) — 8.8
7. Alibaba (China) — 7.5
8. Bosch (Germany) — 7.2
9. Uber (USA) — 7.1
10. Intel (USA) — 6.9
11. Toyota (Japan) — 6.0
12. Yandex (Russia) — 5.8
13. Baidu (China) — 5.5
14. Nvidia (USA) — 5.2
15. Apple (USA) — 4.6
16. Salesforce (USA) — 4.2
17. PROWLER.io (UK) — 4.2
18. Criteo (France) — 3.9
19. Huawei (China) — 3.7
20. NEC (Japan) — 3.5
Further Analysis
Academia vs. Industry — Share of Total Publication Index:
Academia — 77.8%
Industry — 22.2%
Top 150 Words in 2200 Paper Titles at NeurIPS 2019 and ICML 2019 (“word cloud”):
Top 30 Countries by Per Capita Publication Index (Publication Index divided by country population in millions):
1. Switzerland — 6.97
2. Israel — 4.88
3. United States — 3.85
4. Singapore — 2.34
5. Canada — 2.17
6. Denmark — 2.11
7. United Kingdom — 1.90
8. Finland — 1.75
9. France — 1.41
10. Sweden — 1.11
11. Australia — 1.08
12. South Korea — 0.91
13. Netherlands — 0.89
14. Austria — 0.84
15. Germany — 0.78
16. Latvia — 0.67
17. Belgium — 0.44
18. Estonia — 0.44
19. Japan — 0.39
20. Norway — 0.32
21. Cyprus — 0.28
22. United Arab Emirates — 0.26
23. Taiwan — 0.22
24. Ireland — 0.21
25. Italy — 0.19
26. Saudi Arabia — 0.15
27. Greece — 0.14
28. China — 0.13
29. Czech Republic — 0.11
30. New Zealand — 0.11
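The per capita figures are a simple ratio, sketched below for the top three countries. The Publication Index values come from the country ranking above; the populations are rounded values that we assume purely for illustration (the exact population figures behind the list are not stated), so the results may differ from the list in the second decimal place.

```python
# Publication Index values taken from the country ranking above.
PUBLICATION_INDEX = {
    "Switzerland": 59.3,
    "Israel": 43.3,
    "United States": 1260.2,
}

# ASSUMPTION: rough populations in millions, rounded for illustration only.
POPULATION_MILLIONS = {
    "Switzerland": 8.5,
    "Israel": 8.9,
    "United States": 328.0,
}


def per_capita_index(pub_index, population_millions):
    """Publication Index divided by population in millions, per country."""
    return {
        country: pub_index[country] / population_millions[country]
        for country in pub_index
    }


per_capita = per_capita_index(PUBLICATION_INDEX, POPULATION_MILLIONS)
ranking = sorted(per_capita, key=per_capita.get, reverse=True)
for rank, country in enumerate(ranking, start=1):
    print(f"{rank}. {country}: {per_capita[country]:.2f}")
```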
Treemap of Top 40 Global Organizations Leading in AI Research (Area is Proportional to the Publication Index):
Collectively, the top 40 organizations contributed 55% of the total Publication Index, with a combined value of 1,212.3 out of 2,200 total.
Measuring Competition in AI Research (the Herfindahl Index):
The Herfindahl index (also known as Herfindahl–Hirschman Index) is a measure of the size of participants in relation to the industry and an indicator of the amount of competition among them.
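Calculation: H = \sum_{i=1}^{N} s_i^2, where s_i is the share of organization i expressed as a percentage and N is the number of organizations, so H ranges from close to 0 (many small participants) to 10,000 (a single participant).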
Interpretation:
- An H below 100 indicates a highly competitive industry.
- An H below 1,500 indicates an unconcentrated industry.
- An H between 1,500 and 2,500 indicates moderate concentration.
- An H above 2,500 indicates high concentration.
For our dataset (using each organization’s share of the Total Publication Index): H=146.47, which indicates an unconcentrated industry. In other words, there are no signs of monopolization of AI research in 2019.
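For readers who want to run this kind of calculation on their own data, here is a minimal Python sketch of the Herfindahl index using percentage shares, which is the scale the thresholds above assume. The function names are ours, and the full per-organization dataset is not reproduced here, so the sketch uses toy inputs plus Google’s single term computed from the figures quoted in this article.

```python
def herfindahl_index(indices):
    """Sum of squared percentage shares; ranges from near 0 to 10,000."""
    total = sum(indices)
    if total <= 0:
        raise ValueError("the total Publication Index must be positive")
    return sum((100.0 * value / total) ** 2 for value in indices)


def interpret(h):
    """Map H to the concentration bands listed above."""
    if h < 100:
        return "highly competitive"
    if h < 1500:
        return "unconcentrated"
    if h <= 2500:
        return "moderate concentration"
    return "high concentration"


# Toy inputs: two equal participants, then one hundred equal participants.
duopoly = herfindahl_index([50, 50])
hundred_equal = herfindahl_index([1] * 100)

# Google's term alone, using its Publication Index of 167.3 out of a total
# of 2,200 (both figures quoted in this article).
google_term = (100.0 * 167.3 / 2200) ** 2

print(duopoly, interpret(duopoly))
print(hundred_equal, interpret(hundred_equal))
print(round(google_term, 2), interpret(146.47))
```

Google’s term alone comes to roughly 57.8, which is a sizable part of the reported H=146.47; the rest is spread across the long tail of organizations.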
Discussion: Who’s Ahead in AI?
A heated debate is going on today on the state of the strategic race between the United States and China to dominate in AI. We tend to side with a more balanced perspective, but before we begin our analysis, a bit of history is in order:
- Two major events happened in AI in 2016: in March, Google’s AlphaGo became the first computer program to beat a 9-dan Go professional, Lee Sedol, without handicaps; in October, President Obama’s administration released a strategy on future directions and considerations for AI called Preparing for the Future of Artificial Intelligence.
- In China, these two events created a “Sputnik moment” which helped convince the Chinese government to prioritize and dramatically increase funding for artificial intelligence (see Kai-Fu Lee’s AI Superpowers).
- In response, in July 2017 the Communist Party of China set 2030 as the deadline for an ambitious AI goal: it called for China to reach the top tier of AI economies by 2020, achieve major new breakthroughs by 2025, and become the global leader in AI by 2030. The strategy became known as the New Generation Artificial Intelligence Development Plan, and it has spurred many policies and billions of dollars of investment in research and development from ministries, provincial governments, and private companies.
- Certain think tanks, such as CNAS, have argued that China’s AI strategy reflected the key principles from the Obama administration report — now it is China adopting them, instead of the United States.
- This copying strategy isn’t new: to quote Peter Thiel’s Zero to One, “The Chinese have been straightforwardly copying everything that has worked in the developed world: 19th-century railroads, 20th-century air conditioning, and even entire cities. They might skip a few steps along the way — going straight to wireless without installing landlines, for instance — but they’re copying all the same.”
- 2017 is precisely the year when we started tracking the state of AI research, so we established China’s baseline summarized in the following chart showing that the United States had an 11x lead in the total Publication Index over China:
- In 2019 the United States has a 7x lead (USA — 1260.2, China — 184.5), so the gap is clearly narrowing. Furthermore, the analysis by the Allen Institute for Artificial Intelligence found that China has steadily increased its share of authorship of the top 10% most-cited papers: China’s share was at 26.5% in 2018, not far behind the United States at 29%.
One might say that it is not looking good for American competitiveness in AI in the next decade. However, we believe that the outcome will depend on the interplay of the advancement of three key ingredients of modern AI: algorithms, hardware, and training data, and it takes getting all three right in order to dominate the field.
We believe that the United States will have a strong lead in AI algorithms for years to come, grounded in several decades of track record in the advancement of computer science at world-class universities, such as MIT, Stanford, CMU and UC Berkeley. In addition, the openness of the likes of Google and Facebook to publishing internal research at these conferences has created a thriving ecosystem (and a revolving door of sorts) for top AI researchers to move seamlessly between academia and industry (think Yann LeCun or Andrew Ng).
In addition, the United States is the home of Silicon Valley (in its original silicon-focused definition), which has been at the forefront of hardware innovation ever since the Traitorous Eight left Shockley Semiconductor Laboratory to found Fairchild Semiconductor in 1957. Deep Learning algorithms are extremely compute-hungry, right up there with bitcoin mining, which now consumes more power than Switzerland. We think that it will be extremely difficult for China to catch up to the United States in hardware over the next decade.
However, the American advantage is questionable when it comes to training data, and this is by design. It is, in fact, part of a broader privacy vs. public good debate, where the United States tends to choose the former and China the latter. In China today AI scans faces from hundreds of millions of street cameras, reads billions of WeChat messages, and analyzes millions of health records, all following the data-as-a-public-good argument. This training data availability, combined with China’s 1.4B population, creates an enormous strategic advantage for China.
Hard-pressed to draw a conclusion, we still think that the first two factors (algorithms and hardware) will outweigh the last one (availability of data), and the United States will maintain its lead in AI for years to come.
Dataset
Please note that conferences don’t release publication data in a standard form, so our analysis ended up being quite manual (HTML parsing, Python transformations, a lot of manual name standardization, and a few unknown affiliations). If you find any bugs, please email us, and we’ll be happy to fix them.
About the author: My name is Gleb Chuvpilo, and I’m the Managing Partner at Thundermark Capital, a Venture Capital firm that invests in Deep Tech startups. I have a Master’s degree from the MIT Computer Science and Artificial Intelligence Lab and an MBA in Finance and Strategic Management from The Wharton School at the University of Pennsylvania.