PWC207 - H-Index

TL;DR

On with TASK #2 from The Weekly Challenge #207. Enjoy!

The challenge

You are given an array of integers containing citations a researcher has received for each paper.

Write a script to compute the researcher’s H-Index. For more information please checkout the wikipedia page.

The H-Index is the largest number h such that h articles have at least h citations each. For example, if an author has five publications, with 9, 7, 6, 2, and 1 citations (ordered from greatest to least), then the author’s h-index is 3, because the author has three publications with 3 or more citations. However, the author does not have four publications with 4 or more citations.

Example 1

Input: @citations = (10,8,5,4,3)
Output: 4

Because the 4th publication has 4 citations and the 5th has only 3.

Example 2

Input: @citations = (25,8,5,3,3)
Output: 3

The H-Index is 3 because the fourth paper has only 3 citations.

The questions

No questions asked, as the domain makes it pretty clear that we’re talking “human” numbers, both in terms of size of the input array, as well as each individual count of citations.

The solution

The text/definition steals us the joy of coming up with a solution, because it hints about having the array sorteded in descending order.

In fact, the key is that we have to compare a count of elements in a subset with the values in the subsets. If we start with an array sorted like above, as we move on we progressively include articles with less citations, while at the same time increasing the size of the subset. At this point, it’s just a matter of meeting in the middle.

… and, of course, a viable solution will exist to include all top-cited articles…

Let’s go Perl first:

#!/usr/bin/env perl
use v5.24;
use warnings;
use experimental 'signatures';

say h_index(@ARGV);

sub h_index (@citations) {
   @citations = reverse sort { $a <=> $b } grep { $_ } @citations;
   $_ < $citations[$_] || return $_ for 0 .. $#citations;
   return scalar(@citations);
}

We’re removing articles with no citations at all, because they’re pretty much useless to calculate the H-Index. Then we sort and reverse, to get our descending list of counts.

Then we iterate through the whole array. If we find a crossing point, then we can return; otherwise, every article is part of the H-Index, so we just return the size of the subset of articles with at least one citation.

Raku goes pretty much the same way, making the first preparatory part a bit more readable for many westerners (at least those who are used to reading left-to-right):

#!/usr/bin/env raku
use v6;
sub MAIN (*@args) { put h-index(@args) }

sub h-index (@citations) {
   @citations = @citations».Int.grep({.so}).sort.reverse;
   $_ < @citations[$_] || return $_ for ^@citations;
   return @citations.elems;
}

Stay safe folks!


Comments? Octodon, , GitHub, Reddit, or drop me a line!