Learn how to estimate percentiles alongside the average.

Percentiles

The average (mean) is easily biased by a skewed distribution, so knowing the average alone is not always enough. If you do not want to display the full distribution, you can report the 95th and/or 99th percentiles instead.

To estimate percentiles from aggregated data, you first calculate the standard deviation and then assume the values follow a normal distribution.

Usually the standard deviation is calculated from the raw data, but in this case we are not preserving individual events, only a summary of them. To get around this, we need to preserve three values:

  • count - total number of events.
  • sum - aggregated sum of the event values.
  • square - aggregated sum of the squared event values.

With these three values, we can calculate the standard deviation. Let's take the practical example of a duration.

def duration(seconds)
  {
    count: 1,
    sum: seconds,
    square: seconds**2
  }
end

When you run this a few times, you will see the kind of payload it generates. Trifle::Stats will then preserve the sum of these payloads.

irb(main):001:1* def duration(seconds)
irb(main):002:2*   {
irb(main):003:2*     count: 1,
irb(main):004:2*     sum: seconds,
irb(main):005:2*     square: seconds**2
irb(main):006:1*   }
irb(main):007:0> end
=> :duration
irb(main):008:0> duration(10)
=> {:count=>1, :sum=>10, :square=>100}
irb(main):009:0> duration(12)
=> {:count=>1, :sum=>12, :square=>144}
irb(main):010:0> duration(8)
=> {:count=>1, :sum=>8, :square=>64}
irb(main):011:0> duration(16)
=> {:count=>1, :sum=>16, :square=>256}
irb(main):012:0> duration(12)
=> {:count=>1, :sum=>12, :square=>144}
irb(main):013:0> duration(10)
=> {:count=>1, :sum=>10, :square=>100}

With these values, the sum will be { count: 6, sum: 68, square: 808 }. This is still a bit far from percentiles.
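The summing step can be reproduced in plain Ruby to check the totals by hand. This is only a sketch: `aggregate` is a hypothetical helper written for illustration, not part of the Trifle::Stats API, and it simply adds payloads together key by key.

```ruby
# Builds the per-event payload, same shape as in the guide.
def duration(seconds)
  { count: 1, sum: seconds, square: seconds**2 }
end

# Hypothetical helper (not a Trifle::Stats method): adds payloads
# together key by key, the way summed tracked values would accumulate.
def aggregate(payloads)
  payloads.reduce({ count: 0, sum: 0, square: 0 }) do |total, payload|
    total.merge(payload) { |_key, a, b| a + b }
  end
end

summary = aggregate([10, 12, 8, 16, 12, 10].map { |s| duration(s) })
p summary
# count is 6, sum is 68, square is 808
```

Because every field is a plain sum, payloads can be added in any order and in any grouping, which is what makes this summary cheap to store.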

Standard Deviation

Just like with the average, Trifle::Stats won't give you that right away. To get the average, you calculate sum / count. The same goes for the standard deviation: you need to use an equation. Luckily, we can leverage the Rapid Calculation Method for standard deviation.

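# Convert to Float first; Integer division in Ruby would truncate the result.
average = sum.to_f / count
# Sample standard deviation; requires count > 1.
sd = Math.sqrt((count * square - sum * sum).to_f / (count * (count - 1)))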

That's fairly simple, though not entirely straightforward. There are also other methods of calculating standard deviation that you may want to explore.
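To see that the summary really carries enough information, the sketch below computes the sample standard deviation twice: once from count, sum and square only, and once directly from the raw values. Both `mean_and_sd` and `direct_sd` are hypothetical helper names introduced for this example, and the input values are the six durations from the irb session above.

```ruby
# Hypothetical helper: derives the mean and the sample standard deviation
# from the three aggregated values alone. Requires count > 1.
def mean_and_sd(count:, sum:, square:)
  average = sum.to_f / count
  variance = (count * square - sum**2).to_f / (count * (count - 1))
  # With Float inputs, rounding can push a near-zero variance slightly
  # below zero, so clamp it before taking the square root.
  [average, Math.sqrt([variance, 0.0].max)]
end

# Reference calculation that works from the raw values instead.
def direct_sd(values)
  mean = values.sum.to_f / values.size
  Math.sqrt(values.sum { |v| (v - mean)**2 } / (values.size - 1))
end

average, sd = mean_and_sd(count: 6, sum: 68, square: 808)
puts format("average=%.3f sd=%.3f", average, sd)
# average=11.333 sd=2.733
puts format("direct sd=%.3f", direct_sd([10, 12, 8, 16, 12, 10]))
# direct sd=2.733
```

The two results agree because count * square - sum**2 is an algebraic rearrangement of the sum of squared deviations from the mean, multiplied by count.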

95th and 99th Percentile

Once you have the standard deviation sd, it is easy to estimate percentiles. The multipliers are the one-sided z-scores of the standard normal distribution.

p95 = average + sd * 1.645
p99 = average + sd * 2.326

And voilà. Now you have the average as well as the 95th and 99th percentiles, which are the holy grail of performance monitoring.
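Putting the steps together, a percentile estimate can be wrapped in one small function. The sketch below is an assumption-laden illustration rather than a library feature: `estimate_percentile` and `Z_SCORES` are hypothetical names, the multipliers 1.645 and 2.326 are the one-sided standard normal quantiles for 95% and 99%, and the result is only as good as the assumption that durations are roughly normally distributed.

```ruby
# One-sided standard normal quantiles.
# Assumption: the tracked values are roughly normally distributed.
Z_SCORES = { 95 => 1.645, 99 => 2.326 }.freeze

# Hypothetical helper: estimates a percentile from the aggregated values.
# Requires count > 1; raises KeyError for percentiles not in Z_SCORES.
def estimate_percentile(percentile, count:, sum:, square:)
  z = Z_SCORES.fetch(percentile)
  average = sum.to_f / count
  sd = Math.sqrt((count * square - sum**2).to_f / (count * (count - 1)))
  average + z * sd
end

summary = { count: 6, sum: 68, square: 808 }
p95 = estimate_percentile(95, **summary)
p99 = estimate_percentile(99, **summary)
puts format("p95=%.2f p99=%.2f", p95, p99)
# p95=15.83 p99=17.69
```

With only six samples the estimate is rough, and for heavily skewed data such as response times it can differ noticeably from the percentile measured on raw events.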

Transponders

Alternatively, you can use the Standard Deviation Transponder and then calculate the appropriate percentile in the Timeline Formatter, which avoids doing the manual calculation yourself.