First of all, two months ago I got the official announcement that I got tenure. This is the end of a journey that started in 2003 with a PhD in computer vision for humanoid robotics at the University of Genova (Italy). This was an incredibly stressful journey for a number of reasons: switching research topics after my PhD, living in Milan on a post-doc salary, moving to the US with a PhD from an Italian university, not having a famous American professor as mentor/letter writer, etc.
In “Mostly Harmless” by Douglas Adams, the protagonist Arthur Dent visits a woman seer to receive advice. The woman, who swats at flies in front of a cave and smells horrible, hands photocopies of her life to him suggesting he should live his life the opposite way she did so he will not end up living in a rancid cave. I should do the same for my academic life…
However, I cannot complain too much: I did get tenure, I contributed 2 or 3 things that I am proud of, and I do not live in a rancid cave! So, now I am going to use my new tenure powers to explain why I think the expert reviewer is a myth.
2. The Fairytale of the Expert Reviewer
There is a common myth in the theory ML community that the main problem with reviews at conferences is that reviewers are not really experts.
First, I think it is hardly controversial that nowadays most reviewers do not have the experience and knowledge to judge scientific papers: reviewers at big ML conferences are mostly young and inexperienced PhD students. Also, because of the Dunning-Kruger effect, they vastly overestimate what they actually know. The final effect is that the average review tends to be a mix of arrogant, meaningless, semi-random critiques. Let me be clear: PhD students from 10 years ago were equally terrible as reviewers. For example, I cringe thinking about the reviews I used to write as a PhD student. However, my terrible inexperienced reviews were also for third-tier conferences. Nowadays, instead, the exponential growth of ML submissions means that we have to enroll all the possible reviewers at first-tier venues.
However, I do not want to focus on the young PhD students: Mainly, it is not their fault. I understand that the growth of some communities made the reviewing process basically random and nobody has a real solution to it, me included. Instead, here I want to talk about the Expert Reviewer.
You should know that the theory Machine Learning community is convinced that it has a better reviewing system. The reason, they say, is that they have the Expert Reviewers. This mythological figure knows all the results in his/her area, has read all the Russian papers from 50 years ago, and thanks to his/her infinite wisdom only accepts the best and finest products of the human intellect. He/she can also check the correctness of 40 pages of appendix purely by the laying on of hands and can kill any living being just by pronouncing the magic words “this is not surprising”.
Now, the fact that not even in Advanced Dungeons & Dragons can you have a Mage/Paladin with Wisdom and Intelligence at 19 should make you suspect this is not exactly the reality…
3. The Reality
Now, let me tell you what actually happens, most of the time. The Expert Reviewer typically is someone obsessed with a particular area whose volume tends to 0, but very often with close to 0 knowledge of anything outside of it. You can visualize him/her as a Delta function.
If you work on a less fashionable area, the probability of randomly meeting an Expert Reviewer for your specific sub-topic is close to 0. Moreover, he/she will often refuse to think that anything outside of his/her area is actually of any interest. Finally, due to a weird bug in the symmetry property, the Expert Reviewer is always sure to know more than the authors of the paper he/she reviews.
The net effect is that
- most of the time the authors are actually the only real experts;
- papers are seldom accepted or rejected based on a deep understanding of what is going on;
- most of the time the most important decision factor is how much the reviewer likes your topic and/or you.
Sometimes it does happen that the review process actually works as intended, but I would argue that the above is what happens at least 60% of the time an Expert Reviewer is involved.
Now, I could argue forever on these 3 points above. Instead, thanks to my tenure, I decided to do something that is taboo in academia: I’ll describe the review process of a real paper!
The paper is here and it was accepted at COLT 2017 only thanks to me.
Let me start from the beginning.
4. The Story of the Unlucky Paper
First of all, for a long time COLT had a reviewer/subreviewer system: Each PC member was allocated a number of papers to review and he/she could decide whether to review each paper him/herself or send it to subreviewers. The subreviewers were not taken from a fixed list of reviewers, but they had to be contacted personally: As you can imagine, it was not an easy task. How often do you accept review requests from random people? Exactly! On the other hand, this gave PC members the possibility to really select the best reviewer for each paper, even going outside of the usual clique of people.
So, I was a PC member and I assigned that paper to an expert in regression in Reproducing Kernel Hilbert Spaces, because the paper was clearly an extension of the seminal work on the 1/T rate of SGD without strong convexity, which in turn was based on the line of work developed by Rosasco, Caponnetto, De Vito, Smale, Zhou, etc. Now, for a number of reasons, this specific line of work is unknown to 99% of the theory ML people. I happen to know it because I published in this subfield after Lorenzo Rosasco showed me its beauty. This is to say that I was pretty confident about my understanding of the paper and my choice of the subreviewer.
The other two PC members assigned to the paper were online learning people. This requires some explanation: for a long time at COLT online learning people were the only people with some background in convex optimization. So, any optimization paper was going to them, by default. If you know classic online learning theory and convex optimization, you should see why sometimes this can be a terrible idea. (This situation has improved a bit in recent years, but not by much.) Anyway, the reviews came in and one PC member personally destroyed the paper, clearly not understanding it. The other one sent it to an OR person, who also did not understand the paper at all. My subreviewer instead firmly accepted the paper.
Now, it is necessary to open a parenthesis. There is something that young reviewers might not understand: Most of the time, it is evident when a reviewer did not understand a paper. It is painfully clear to the authors, but it is also very clear to an experienced Area Chair/PC/Chair in charge of the paper. So, luckily the Chair decided to intervene and asked for the opinion of an external (famous but oblivious to the area of the paper) reviewer. Let me say that this is also not common: It heavily depends on the Chairs and the load they have. Quite understandably, they often do not have the time to intervene in individual papers. As expected, the fourth reviewer also did not understand the paper and rejected it… At this point, I started a long discussion in which I successfully refuted every single point of the fourth external reviewer, until he/she conceded that the paper was borderline because he/she was still not excited about it.
Let me pause here to explain another thing: Expert Reviewers are humans and humans are rarely rational. So, one of the main ways they have to judge a paper is “Do I like this topic?” that often translates to “Do I work on this topic? No, I don’t, because this is clearly a bad topic!”. So, the external reviewer decided that the paper was not interesting because he/she thought the entire area was not interesting. End of the story.
Let me stress here that the problem is not how well a paper is written. In fact, all the involved reviewers understood the paper and its claims. However, deciding whether these results warranted acceptance or not is just a matter of taste among different, mostly orthogonal sub-communities, like the pineapple-on-pizza community and the never-pineapple-on-pizza one.
By this stage, the paper was doomed: two reviewers against me and my subreviewer, and one reviewer completely silent. At this stage, any further discussion was also counter-productive because the Expert Reviewer sees any attack on his/her argument as an attack on his/her auctoritas.
So, I had an idea: I stopped sending messages to the other reviewers and wrote directly to the COLT Chairs. I plainly explained that none of these people but my subreviewer and I actually worked in this area. The Chair was not convinced, so I had to propose another PC member who i) actually understood the area, and ii) was famous enough to carry weight in the decision. (The Chair did not mention the second point, I read between the lines…) I proposed a fifth reviewer and the Chair contacted him/her.
After a few days, the fifth reviewer came back and the first sentence of the message was:
I’m a bit surprised: the improvement over existing work is pretty clear!!
The Chair was convinced, the paper accepted.
It took only 2 weeks of discussions, 7 Expert Reviewers, and 1 Chair.
Was it worth it? In my personal opinion, definitely yes.
What did I get in return? Absolutely nothing, zero, nada, zip, zilch.
So, trust me when I tell you that this is not what normally happens. And it does not matter how many Expert Reviewers you have: The problem is that the probability of getting someone who really understands your subtopic is very low, even when you submit to a prestigious theory conference. Even assuming you got reviewers who actually understand your paper, you have to be really lucky to avoid your Nemesis (who rejects all your papers just because he/she does not like you and you are not even sure why), the Egomaniac (who rejects anything vaguely similar to what he/she does, because nothing compares to what he/she does), and the Purist (who rejects anything that might actually work in practice). All the above are things that actually happened, but not even tenure makes me brave enough to describe these episodes. But just to give you some fun facts, recently a Chair of a prestigious conference told me that I indeed might have “enemies”. He/she also plainly told me that I should declare the people I suspect are my enemies as conflicts (never mind that almost none of the conferences have a system in place for “negative” conflicts…).
In reality, in my years of experience (yes, I am old) I very rarely saw the reviewing system working as it should. Most of the time, in order to get a meaningful decision on a paper, you have to work hard, so hard that people might end up deciding that it is not worth it. I myself have less and less strength and patience to fight many of these battles. I did not gain anything in any of them, probably only more enemies.
An entire system of checks and balances is badly needed in conference reviewing, much more than just amassing expert reviewers. Indeed, you also need somebody who allocates them properly, who checks that they do their job, who prevents them from acting like jerks, who keeps them open to discussion, who makes them plainly admit that after all they might not have understood the paper, who helps them admit that they might not know the area, who (God forbid!) prevents them from rejecting papers just because they don’t like them, etc.
However, the main problem is: why should anyone waste so much time on reviewing/discussing/analyzing papers of other people? More importantly, how exactly does the community give feedback on these poor reviews? How are we teaching people to simply say “Actually, this is not exactly my topic”? Indeed, not only was no feedback whatsoever given to any of the people in my story, somehow they also got a prize: in later years I fought similar battles to have papers of the above-mentioned reviewers accepted at other conferences!
Overall, I do not know what a better system to review papers would be, but I am pretty sure Expert Reviewers are not the answer.
So, there is no happy ending here: The Expert Reviewers are still convinced that they are always right and, from time to time, are still rejecting papers of yours that they did not actually understand.
P.S. I am also an Expert Reviewer.
Thank you for sharing your experience. This gave me the opportunity to think one more time about the reviewing process. Though, I have to admit that I am a little bit skeptical about your criticisms of the “Expert Reviewer”, or the fact that this is the prism for your analysis. In my opinion you do not describe Experts but rather researchers who:
– either took no more than 2 or 3 hours to complete the review,
– or were not qualified (at all) to review and did not admit it.
Said differently, you describe researchers who did not fulfil their review role (even if it is super important).
I admit that my opinion is kind of biased as I am fairly confident in human beings: if you put them in a good position, they will do good work. In other words, if you take some time and have some respect in the review task, then you will surely be able to deliver a good review (or just admit that you do not know enough on the topic to judge). There is no such thing as being Experts or non-Experts but only researchers that take time and consideration when reviewing and researchers that despise the review role (and it is not necessarily their fault –as they were put in this position, see below).
Hence, my take is that if some people (and not all!!!) tend to think poorly of the reviewing role, it is because our community thinks poorly of reviewers. Here are a few examples:
-How come in other communities you have 1 paper to review in 6 months and we have 7 papers in 1 month? Reviewing should not be assembly-line production work.
-Furthermore I guess that everybody has had at least one experience of a paper full of typos, imprecisions, non-rigorous or even non-mathematical claims. And this occurs in top-ranked conferences. Do you think this can occur when you review for Annals of Probability? This also raises the question of partitioning the conferences in Tiers. Tier 1 would be for top discoveries, Tier 2 for nice results, etc… I guess that we are one of the only communities that has such a horizontal structure, isn’t it time to be more humble?
-There are only rare times (and I was sooo glad when this occurred) that the papers are actually discussed further (with scientific arguments) after all reviewers submit their reviews. Even rarer is the fact that the Area Chair gives comments. Each time, I LEARNED a lot, and was really happy with my reviewing experience. In other communities, reviewing is actually learning something deeply and having nice discussions!
All of this seems discouraging when you are reviewing. And if the ML community wants reviewers to respect the high standards established by mathematics history (or science in general), then the ML community itself should respect these.
Hi Loucas, thanks a lot for sharing your comments!
Let me say that my criticism started from a series of recent and past tweets about how incredibly better the reviews are in the theory community thanks to the Expert Reviewers (sic!). Indeed, this is a very common thought in the COLT community, and it is actually difficult to find CS theoreticians that fully disagree with this view. Even if I am more of a theoretician, I have a lot of respect for applied research and I cannot stand any kind of (unsupported?) snobbery.
Anyway, my post has a lot of irony, but I would still defend the main points.
In particular, I would like to share your optimism, but I have seen too much unfairness to still believe that “if you take some time and have some respect in the review task, then you will surely be able to deliver a good review (or just admit that you do not know enough on the topic to judge)”. It just does not happen as often as one would expect. As an empirical measure, for example, count how many papers on RKHS were accepted at COLT over the years and how many at NeurIPS (after a rejection at COLT). I don’t know why these kinds of things happen: Maybe there is a will to shape the community by accepting only papers in some subfields or maybe it is the real belief that some directions are not appropriate for COLT or maybe the reviewers just don’t realize that they didn’t understand the paper. I really don’t know the reason, but it happens on a regular basis. For example, I know people that still think that papers on regression in RKHS are useless and will actively reject *all* of them. Think about it: Why should it not happen? The community never gave any feedback to these reviewers.
That said, I completely agree with your other considerations: too many reviews, low quality of the submissions to top ML conferences, no real discussion. All very true. I even strongly support the idea of having different tiers.
But, my main point was that the theory community is not in a better shape than other communities, possibly just for different reasons.