Google:
Compute the h-index of a list of papers, given their citation counts. Can you do it in linear time? How about a distributed algorithm for the task?
Facebook:
Given: a vector holding the citation count of every paper a researcher has authored. The h-index is a measure of researcher importance. h-index: the largest number i such that there are i papers each with at least i citations.
1. Suppose the citation vector is sorted. How can the h-index be computed efficiently?
2. Suppose the citation vector is not sorted. How can the h-index be computed efficiently, and what is the time complexity? Is there an algorithm that runs in O(n) time?
Princeton algorithm:
Given an array of N positive integers, its h-index is the largest integer h such that there are at least h entries in the array greater than or equal to h. Design an algorithm to compute the h-index of an array.
Hint: median or quicksort-like partitioning and divide-and-conquer.
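One way to read the partitioning hint is sketched below. This is an illustrative sketch, not the course's reference solution; the class name HIndexPartition and its method names are made up for the example. It works on a copy of the input, does a three-way partition around a random pivot into descending order, and then continues into only one side, which gives expected linear time.

```java
import java.util.Random;

final class HIndexPartition {
    private static final Random RNG = new Random();

    // Expected O(n) time: quickselect-style search for the h-index.
    static int hIndex(int[] citations) {
        int[] a = citations.clone();   // leave the caller's array untouched
        int lo = 0, hi = a.length;     // undecided candidates live in a[lo..hi)
        // Invariant: a[0..lo) holds the lo largest values, each >= lo,
        // and no value in a[hi..) can count toward the h-index.
        while (lo < hi) {
            int pivot = a[lo + RNG.nextInt(hi - lo)];
            // Three-way partition in descending order:
            // a[lo..gt) > pivot, a[gt..lt) == pivot, a[lt..hi) < pivot.
            int gt = lo, i = lo, lt = hi;
            while (i < lt) {
                if (a[i] > pivot) {
                    swap(a, gt++, i++);
                } else if (a[i] < pivot) {
                    swap(a, i, --lt);
                } else {
                    i++;
                }
            }
            // In descending order the pivot copies hold 1-based ranks gt+1..lt.
            if (pivot >= lt) {
                lo = lt;               // every pivot copy qualifies, so h >= lt
            } else if (pivot <= gt) {
                hi = gt;               // no pivot copy qualifies, so h <= gt
            } else {
                return pivot;          // gt < pivot < lt: exactly 'pivot' papers qualify
            }
        }
        return lo;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        System.out.println(hIndex(new int[] {3, 0, 6, 1, 5})); // prints 3
    }
}
```

The random pivot only gives an expected bound; a deterministic median-of-medians pivot would make the bound worst-case linear at the cost of more code.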
Solution:
- Create an int[] Histogram indexed by reference count, so it needs one slot for every value from 0 up to the largest reference count of any of the scientist's publications.
- If all publication reference counts are stored in another int[] references, go over this array and, for each publication with reference count R, do Histogram[R]++. While doing this, keep the maximum reference count in Max.
- After building the histogram, loop downward over Histogram from i = Max, adding each Histogram[i] to a running total hIndex (the number of publications with at least i references). As soon as hIndex >= i, return i as the h-index.
... As to the distributed part, let several machines build the Histogram of disjoint sets of somebody's publications, and then have one machine add up those histograms and return hIndex as described above.
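A minimal single-process sketch of that distributed scheme follows, with each "machine" simulated by one shard of the data. The class ShardedHIndex and its method names are hypothetical. The sketch also assumes every worker is told the total number of papers (cap) and clamps larger reference counts to it, so all partial histograms have the same length; that clamping is an assumption of the sketch and is safe because the h-index can never exceed the number of papers. Reference counts are assumed to be non-negative.

```java
final class ShardedHIndex {
    // Runs on each worker: histogram of one shard of reference counts.
    // Counts above cap are clamped into the last bucket.
    static int[] buildHistogram(int[] shard, int cap) {
        int[] hist = new int[cap + 1];
        for (int refs : shard) {
            hist[Math.min(refs, cap)]++;
        }
        return hist;
    }

    // Runs on the coordinator: element-wise sum of the partial histograms.
    static int[] merge(int[][] partials, int cap) {
        int[] total = new int[cap + 1];
        for (int[] partial : partials) {
            for (int i = 0; i <= cap; i++) {
                total[i] += partial[i];
            }
        }
        return total;
    }

    // Scan from the top bucket down, keeping a running paper count.
    static int hIndexFromHistogram(int[] hist) {
        int papers = 0; // papers with at least i references
        for (int i = hist.length - 1; i >= 0; i--) {
            papers += hist[i];
            if (papers >= i) {
                return i;
            }
        }
        return 0; // only reached for an empty histogram
    }

    public static void main(String[] args) {
        int cap = 5; // total number of papers across both shards
        int[][] partials = {
            buildHistogram(new int[] {3, 0}, cap),
            buildHistogram(new int[] {6, 1, 5}, cap)
        };
        System.out.println(hIndexFromHistogram(merge(partials, cap))); // prints 3
    }
}
```

Only the fixed-size histograms travel between machines, so the data sent to the coordinator is proportional to cap per worker rather than to the size of each shard.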
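1. Binary search, O(log n). With the array sorted in descending order and a 0-based index i, citations[i] >= i + 1 implies h >= i + 1. The condition holds for a prefix of the array and fails afterwards, so the boundary can be found by binary search.
2. There is an O(n) time and space solution. The trick is that any citation count larger than n can be treated as n.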
If there are n papers in total, this problem can be solved in O(n) time with O(n) space. Note that the h-index lies between 0 and n. If the h-index is 10, for example, there must be 10 papers with citation count >= 10. So if we find the number of papers with citations >= X for every X between 0 and n (and store it in an array C), then by scanning C from right to left, the h-index is the first index i with C[i] >= i. Testing for C[i] == i is not enough, because the count can jump past i without ever equaling it.
Pseudocode:
input array A of length n.
- init array C[0] to C[n] with 0
- foreach p in A, if p >= n, C[n]++; else C[p]++
- for i = n-1 down to 0, C[i] = C[i] + C[i+1]
- for i = n down to 0, if C[i] >= i return i
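import java.util.Arrays;

// Assumes the array is sorted in descending order. O(log N).
public static int getHIndexFromSorted(int[] citation) {
    int low = 0;
    int high = citation.length - 1;
    while (low <= high) {
        int idx = low + (high - low) / 2; // avoids overflow of low + high
        if (citation[idx] >= idx + 1) {
            low = idx + 1;  // the top idx + 1 papers all qualify
        } else {
            high = idx - 1;
        }
    }
    return low;
}

// Sorts a copy of the array. O(N log N).
public static int computeHIndexBySorting(int[] A) {
    int[] sorted = A.clone(); // do not reorder the caller's array
    Arrays.sort(sorted);      // ascending order
    int h = 0;
    for (int i = sorted.length - 1; i >= 0; i--) {
        if (sorted[i] > h) {
            h++;
        } else {
            return h;
        }
    }
    return h; // every paper qualified, so h equals the number of papers
}

// No need to sort the array. O(N).
public static int computeHIndex(int[] A) {
    int n = A.length;
    int[] s = new int[n + 1];
    for (int num : A) {
        s[Math.min(n, num)]++; // counts above n are clamped to n
    }
    int sum = 0; // number of papers with at least i citations
    for (int i = s.length - 1; i >= 0; i--) {
        sum += s[i];
        if (sum >= i) {
            return i;
        }
    }
    return 0; // unreachable: the test always succeeds at i = 0
}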
Reference:
http://en.wikipedia.org/wiki/H-index
http://www.careercup.com/question?id=14585874
http://algs4.cs.princeton.edu/25applications/