flmflm’s goal is to rank challenge submissions fairly. We publicly describe how we calculate leaderboards so that you know what to expect and can hold us to account. The following is a technical description of our leaderboard algorithm.
Summary
Partially order a set of video submissions by the relative votes each receives vis-à-vis the others. By relative vote we mean the votes that submission A receives given that the voter had previously seen submission B. Assuming votes are normally distributed, if the votes of two submissions are not significantly different, then the two are ranked equally.
Assumptions
- the minimum sample size of relative votes is 10
- the confidence level used to decide whether the mean votes for two submissions are unequal is 90%
Algorithm
Given a challenge and its submissions:
- Collect all votes for the challenge and calculate the attenuated score of each vote
- Index each vote by the pair formed by its submission’s id and the id of each other submission the voter saw previously. If any of these sets of votes is smaller than the minimum sample size, stop: the leaderboard cannot yet be calculated.
- Sort the set of all submission cards using merge sort and the following comparator:
  - Calculate the average attenuated score of the votes for submission A given B seen prior, and of the votes for submission B given A seen prior. If the former is higher, then A is less than B (A sorts ahead of B); otherwise B is less than A.
- Take the top 5 submissions from the sorted set. For each pair of adjacent submissions (1-2, 2-3, 3-4, 4-5), determine whether the two are statistically equal using the following procedure:
  - Given the set of votes for A given B seen prior and the set of votes for B given A seen prior, calculate the sample mean and sample standard deviation of each set
  - Calculate the degrees of freedom, v, for the two sets together using the Welch-Satterthwaite equation
  - Calculate the t statistic using Welch’s t-test
  - Interpolate the critical value, q, from a table of Student’s t-distribution using v and the 90% confidence level. If |t| < q, then the two submissions are equally ranked.
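The indexing, sample-size check and comparator sort can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not flmflm’s implementation: the `Vote` record, its field names and all helper functions are hypothetical; each vote’s `score` is assumed to be already attenuated (attenuation is not described here); and the sample-size check is assumed to apply to every ordered pair of submissions, since merge sort may compare any two of them. Ties keep their input order, a small departure from the comparator described above, which treats B as less than A on a tie.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean

MIN_SAMPLE_SIZE = 10  # from the assumptions above


@dataclass
class Vote:
    """Hypothetical vote record; `score` is assumed to be already attenuated."""
    submission_id: str
    score: float
    seen_before: list = field(default_factory=list)  # ids the voter saw earlier


def index_votes(votes, submission_ids):
    """Group scores by (voted submission, previously seen submission).

    Returns None when any ordered pair has fewer than MIN_SAMPLE_SIZE votes,
    meaning the leaderboard cannot be calculated yet.
    """
    index = defaultdict(list)
    for vote in votes:
        for prior in vote.seen_before:
            if prior != vote.submission_id:
                index[(vote.submission_id, prior)].append(vote.score)
    for a in submission_ids:
        for b in submission_ids:
            if a != b and len(index[(a, b)]) < MIN_SAMPLE_SIZE:
                return None
    return dict(index)


def ranks_ahead(a, b, index):
    """Comparator: a sorts ahead of b if a-given-b averages higher than b-given-a."""
    return mean(index[(a, b)]) > mean(index[(b, a)])


def merge_sort(ids, index):
    """Stable top-down merge sort; ties keep their input order."""
    if len(ids) <= 1:
        return list(ids)
    mid = len(ids) // 2
    left = merge_sort(ids[:mid], index)
    right = merge_sort(ids[mid:], index)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # Take from the right half only if it strictly ranks ahead.
        if ranks_ahead(right[j], left[i], index):
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def rank(votes, submission_ids):
    """Return submission ids best-first, or None if there are too few votes."""
    index = index_votes(votes, submission_ids)
    if index is None:
        return None
    return merge_sort(list(submission_ids), index)
```

A hand-written merge sort is used because the text specifies merge sort; Python’s built-in `sorted` with `functools.cmp_to_key` could wrap the same comparator instead.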
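The tie check for adjacent top submissions can be sketched with Python’s standard `statistics` and `math` modules. The critical-value table is an assumption: it holds two-sided 90% critical values of Student’s t-distribution (the one-sided 95th percentile) for selected degrees of freedom, linearly interpolated for fractional v and clamped at both ends. Starting the table at v = 9 is enough here because the Welch-Satterthwaite v is never below min(n1, n2) - 1, and the minimum sample size is 10. `votes` is assumed to map an ordered pair `(a, b)` to the attenuated scores a received after the voter had seen b; the function names are hypothetical.

```python
import math
import statistics

# Assumed lookup table of two-sided 90% critical values of Student's t
# (the one-sided 95th percentile) for selected degrees of freedom.
# It starts at v = 9 because Welch's v is at least min(n1, n2) - 1 and the
# minimum sample size is 10.
T_CRIT_90 = [
    (9, 1.833), (10, 1.812), (12, 1.782), (15, 1.753), (20, 1.725),
    (25, 1.708), (30, 1.697), (40, 1.684), (60, 1.671), (120, 1.658),
]


def critical_value(v):
    """Linearly interpolate q for a possibly fractional v, clamping at the ends."""
    if v <= T_CRIT_90[0][0]:
        return T_CRIT_90[0][1]
    for (v0, q0), (v1, q1) in zip(T_CRIT_90, T_CRIT_90[1:]):
        if v <= v1:
            return q0 + (q1 - q0) * (v - v0) / (v1 - v0)
    return T_CRIT_90[-1][1]  # beyond the table: slightly conservative


def welch(a_given_b, b_given_a):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(a_given_b), len(b_given_a)
    m1, m2 = statistics.mean(a_given_b), statistics.mean(b_given_a)
    # Squared standard error of each mean, using the sample standard deviation.
    se1 = statistics.stdev(a_given_b) ** 2 / n1
    se2 = statistics.stdev(b_given_a) ** 2 / n2
    if se1 + se2 == 0:
        # No spread at all: the means are either identical or certainly different.
        return (0.0 if m1 == m2 else math.inf), math.inf
    t = (m1 - m2) / math.sqrt(se1 + se2)
    v = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, v


def statistically_equal(a_given_b, b_given_a):
    """Two-sided test: equal when |t| falls below the critical value q."""
    t, v = welch(a_given_b, b_given_a)
    return abs(t) < critical_value(v)


def adjacent_ties(ranked_ids, votes, top=5):
    """For adjacent pairs (1-2, 2-3, ...) among the top entries, flag ties."""
    head = ranked_ids[:top]
    return [
        (a, b, statistically_equal(votes[(a, b)], votes[(b, a)]))
        for a, b in zip(head, head[1:])
    ]
```

For example, two identical sets of ten scores give t = 0 and are ranked equal, while scores around 8.5 versus scores around 1.5 give |t| far above q.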