The proliferation of international university rankings has created a need for some sort of critical assessment and comparison of the rankings themselves. The Research Evaluation Working Group of the International Network of Research Management Societies (INORMS) is working to create a rating tool, Rating the Rankers, to guide researchers, students, administrators and other stakeholders in using international rankings. This is emphatically not a ranking of rankers, with a top and a bottom, but rather an instrument that will identify strengths and weaknesses and encourage improvement.
It is hoped that eventually all global rankings will be covered, but the pilot stage of the project involved only six: the Times Higher Education (THE) World University Rankings, the QS World University Rankings, the Shanghai Rankings (ARWU), the US News Best Global Universities, the Leiden Ranking and U-Multirank. The project is led by Elizabeth Gadd of Loughborough University and Justin Shearer of the University of Melbourne.
The rating is based on questions relating to four core values: Good Governance, Measuring What Matters, Transparency, and Rigour. An international team of expert reviewers assessed each ranking against these questions, classifying it for each question as Fully Compliant, Partially Compliant, Not Compliant, or Not Applicable. The ranking organisations were asked to complete a self-assessment, but only CWTS Leiden did so.
After the reviews were completed, they were calibrated by Richard Holmes, writer of the blog University Ranking Watch. During the rating process, some problems with scoring and interpretation were noted; these will be addressed as the work progresses.
The reviewers found that the rankers made efforts towards good governance, although there were weaknesses, especially with regard to conflicts of interest. The rankers were generally transparent about their aims and methods, but there were areas of weakness, including how far outsiders could reverse-engineer the published results. Only the Shanghai Rankings were fully compliant on this point.
The rankers did not perform well on Measuring What Matters, although U-Multirank and the Leiden Ranking did fairly well at measuring institutions against their missions.
Rigour was another area where the well-known rankers performed poorly. The best performance here came from the Leiden Ranking, which avoided surveys and used error bars to indicate uncertainty.