Citations and Metrics - Shifting Goalposts of Academic Recognition?

I've been thinking a lot recently about how the experience of being an academic has changed since I started my PhD back in 2004. I wrote about one aspect of this a couple of weeks ago - the shift that many funders have made from the traditional bullet-point CV to narrative résumés. Related to this is the attempt to move away from what you might call 'traditional' metrics such as journal impact factors and author citation counts. The journal impact factor refers to the average number of citations received by articles published in a journal over a specific period, typically two years. Author citations track how often a researcher’s work is referenced by others, often aggregated into metrics like the h-index (an h-index of 10 would mean a researcher has 10 papers each cited at least 10 times). These numbers have traditionally been used to assess the "impact" of research, with higher citation counts taken to mean greater impact. I am told they influence decisions around hiring, promotion, and funding, though in my experience they have barely been looked at in hiring or funding decisions - certainly on the panels I have been involved in they barely get a mention, though I think this may vary a lot between disciplines. Promotion is possibly a different matter - when I have applied for promotion in the past, I have been asked to present my h-index and explain it alongside a narrative around the papers themselves and their impact.
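For anyone who likes to see the arithmetic spelled out, here is a minimal sketch of how an h-index is computed (in Python, with made-up citation counts - none of the numbers below refer to a real researcher):

```python
def h_index(citations):
    """The h-index is the largest h such that the researcher has
    h papers with at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # this paper still clears the bar
        else:
            break  # every later paper has fewer citations than its rank
    return h

# Eight papers with these (invented) citation counts:
papers = [25, 18, 12, 10, 10, 7, 4, 1]
print(h_index(papers))  # -> 6: six papers each cited at least 6 times
```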
One of the key issues with these metrics is that they vary dramatically between disciplines. In large fields like biomedicine, where thousands of researchers publish prolifically and cite each other frequently, citation counts can be very high. In contrast, in disciplines like archaeology, where the research community is much smaller and publication cycles are slower, citation counts are generally lower. This makes cross-disciplinary comparisons misleading and unfair. A paper in a high-impact biomedical journal might receive hundreds of citations in a year, while a landmark archaeological study might accumulate the same number of citations slowly over a decade. I would emphasise that there is no point making direct comparisons between disciplines in this way.

Even within a single discipline, citation patterns can vary widely depending on the topic. In archaeology, for example, studies involving ancient DNA often attract significantly more citations than work focused on micromorphology or material culture. These differences reflect not just the perceived novelty or interdisciplinarity of certain approaches, but also the size and activity of subfields. An experienced researcher understands these nuances and can account for them when evaluating impact. My view is that metrics have their place, alongside a range of other factors to be considered when looking at a researcher's impact in their field. Looking at impact factors and citation counts in context has always seemed like the most sensible approach to me. It’s obvious that an early career researcher isn’t going to have dozens of papers or hundreds of citations straight out of their PhD. But if they have one paper from their doctoral research that starts being cited by other scholars soon after graduation, that’s a strong signal. It likely means the work is original, well-executed, and resonating with others in the field. How else can we assess this type of impact? Of course there are lots of issues here around who and what gets cited and why, but it is hard to argue that a paper that isn't getting cited is having much of an impact. One of my favourite papers, on formation processes at Margat Castle in Syria, published in 2014, has only been cited 4 times! I still think it's a solid bit of work, but it clearly hasn't had much impact.

One metric I’ve found particularly useful in navigating these complexities is the field-weighted citation impact (FWCI). This metric compares the actual number of citations a paper receives to the average number of citations received by similar publications in the same field, year, and document type. A score of 1.0 means the paper is cited as expected; a score above 1.0 indicates above-average impact. What makes FWCI helpful is that it accounts for disciplinary differences and publication norms, offering a more nuanced view of how a piece of research is performing relative to its peers. It’s not perfect, but it’s a step toward recognising quality and influence in context.
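As a rough illustration of the idea (and only that - this is a toy version with invented numbers, not how the databases that report FWCI actually construct their benchmark sets), the calculation looks something like this:

```python
def fwci(paper_citations, benchmark_citations):
    """Field-weighted citation impact: a paper's citation count divided
    by the mean citations of comparable publications (same field,
    publication year, and document type)."""
    expected = sum(benchmark_citations) / len(benchmark_citations)
    return paper_citations / expected

# Invented numbers: an article with 4 citations, benchmarked against
# comparable articles that average 8 citations each.
peers = [12, 9, 8, 6, 5]  # citations of comparable papers (made up)
print(fwci(4, peers))  # -> 0.5: cited half as often as expected
```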

Despite the critiques of citation metrics, I still remember the quiet pride I felt when my PhD papers began to be cited by other archaeologists. It wasn’t about the numbers; it was about recognition. When my 2011 paper on coprolites from Çatalhöyük reached 100 citations, it felt like a milestone. That work, one of the key outputs from my PhD and the basis of much of my subsequent research, was being used, built upon, and valued. For scholars from marginalised backgrounds, these moments of recognition can be especially meaningful. Academia often feels like a landscape of shifting goalposts; what counts as impact seems to change depending on who you are and where you’re positioned. If citations aren’t the answer, what is? How do we judge the quality and impact of research in ways that are fair, inclusive, and meaningful?

Sometimes the debate around citations and the h-index feels a bit like the debate around using academic titles. Some people argue that using academic titles is pretentious, but researchers from marginalised backgrounds point out that titles are markers of achievement and respect (see On Academic Titles, Perception, and Respect, which critiques the idea that using titles is pretentious, highlighting how this view often ignores the gendered and racialised dynamics of authority and respect). I am really keen to hear different perspectives on this!

