Learn how to start using Trifle::Stats.
Getting Started
Start by adding Trifle::Stats into your Gemfile as gem 'trifle-stats' and then run bundle install.
Once you're done with that, create a global configuration. If you are doing this as part of a Rails app, add an initializer config/initializers/trifle.rb; otherwise place the configuration somewhere in your Ruby code that gets called once your app is launched.
require 'redis'

Trifle::Stats.configure do |config|
  config.driver = Trifle::Stats::Driver::Redis.new(Redis.new)
  config.track_ranges = [:hour, :day]
  config.time_zone = 'Europe/Bratislava'
  config.beginning_of_week = :monday
  config.designator = Trifle::Stats::Designator::Linear.new(min: 0, max: 100, step: 10)
end
This configuration will create a Redis driver that will be used to persist your metrics. It will create metrics for two ranges: per-hour and per-day. These ranges will be localized against the Europe/Bratislava timezone. The beginning_of_week setting is used if you track weekly metrics (which in the above case we don't). The designator is used for tracking histogram distributions.
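Just to give you an idea of what linear designation means, here is a rough, hypothetical sketch of bucketing a value with min: 0, max: 100, step: 10. This is only an illustration of the concept, not the gem's internal code.

# Hypothetical sketch of linear bucketing, not Trifle::Stats internals:
# clamp the value to [min, max] and round it up to the nearest step.
def linear_bucket(value, min: 0, max: 100, step: 10)
  return min if value <= min
  return max if value >= max

  (value.to_f / step).ceil * step
end

linear_bucket(7)   # => 10
linear_bucket(42)  # => 50
linear_bucket(250) # => 100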
Storing some metrics
Let's say, for the sake of example, that we're gonna track the execution of a background job that handles uploading something for your customers into a 3rd party service. We run tons of them and need to know what they do and how they perform.
The TL;DR version
If you just want to throw some metrics into Trifle::Stats, go ahead and run the following snippet a couple of times.
100.times do
  Trifle::Stats.track(key: 'event::uploads', at: Time.zone.now - rand(1.day), values: { count: 1, duration: rand(4..16), products: rand(20..1000) })
end
This will create a bunch of metrics within the last 24 hours, tracking count as how many times the specific event happened, duration as how long it took to happen, and products as additional data related to that event.
The in-depth version
It may sound counterproductive, but one of the hardest parts of dealing with analytics is modeling your data. So before we start storing anything, let's talk a bit about what we're trying to track and what its structure will be.
The job from this example may look something like this.
class UploadJob
  include Sidekiq::Job

  def perform(customer_id)
    @customer_id = customer_id

    upload
  end

  def upload
    # the magic of uploading lives here
  end

  def products
    @products ||= Product.where(customer:)
  end

  def customer
    @customer ||= Customer.find(@customer_id)
  end
end
This looks pretty straightforward. The job gets triggered and, beyond setting up some customer and products logic, it only calls the upload method that does the heavy lifting.
We would like to keep track of the performance of these uploads. We definitely want to track these per-customer, and it seems that the upload is gonna be somewhat related to the number of uploaded products.
We're gonna track these:
- count - how many times we executed the job/upload
- duration - how long it took to perform the upload
- products - how many products we uploaded
To be able to get the duration, we need to store a timestamp before we start the execution/upload. So let's add two new methods: start and track.
def start
  @start ||= Time.zone.now
end

def track
  Trifle::Stats.track(
    key: 'event::uploads', at: Time.zone.now,
    values: {
      count: 1,
      duration: Time.zone.now - start,
      products: products.count
    }
  )
end
The key is the identifier of your metrics. You will use it later to retrieve all your metrics for a specified timeframe. The above example is still missing tracking per customer.
Here is a decision that you need to make. There are two ways to move forward from here, and it really depends on how many items you will have in a subcategory.
- If you have only a fixed and small number of customers, it is better to insert the customer into the main payload and retrieve all these values at once. This allows you to display it all in a single dashboard with a single query, without users needing to filter anything out.
- If you have an unknown number of customers that will (hopefully) grow, you may want to avoid putting everything into a single key, as these values would grow into huge payloads. In this case it is better to set up a separate tracking where the customer is part of the key, e.g. "event::upload::#{customer.id}", and then use this customer-only key to retrieve the data (see the sketch below).
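As a rough sketch of that second approach (the key format here is just an example), you would track the same values once more under a customer-specific key:

# Sketch: tracking under a per-customer key in addition to the global one.
# The key format "event::upload::#{customer.id}" is just an example.
Trifle::Stats.track(
  key: "event::upload::#{customer.id}", at: Time.zone.now,
  values: { count: 1, duration: Time.zone.now - start, products: products.count }
)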
It really depends on the use case you have. For now, let's pretend that we have a small and fixed number of customers.
We're gonna expand the track method a bit further.
def track
  Trifle::Stats.track(
    key: 'event::uploads', at: Time.zone.now,
    values: {
      count: 1,
      duration: Time.zone.now - start,
      products: products.count,
      customers: {
        customer.id => {
          count: 1,
          duration: Time.zone.now - start,
          products: products.count
        }
      }
    }
  )
end
Now within the values you will be storing count, duration and products per specific customer as well. We may want to DRY this up a bit, as calling Time.zone.now repeatedly may result in slightly different values each time. So let's do something about it.
def track
  values = {
    count: 1,
    duration: Time.zone.now - start,
    products: products.count
  }

  Trifle::Stats.track(
    key: 'event::uploads', at: Time.zone.now,
    values: values.merge(customers: { customer.id => values })
  )
end
Alright, now we build values once and merge the same values under each customer's ID inside customers. Sounds reasonable. Here is the full job sample.
class UploadJob
  include Sidekiq::Job

  def perform(customer_id)
    @customer_id = customer_id

    start
    upload
    track
  end

  def upload
    # the magic of uploading lives here
  end

  def products
    @products ||= Product.where(customer:)
  end

  def customer
    @customer ||= Customer.find(@customer_id)
  end

  def start
    @start ||= Time.zone.now
  end

  def track
    values = {
      count: 1,
      duration: Time.zone.now - start,
      products: products.count
    }

    Trifle::Stats.track(
      key: 'event::uploads', at: Time.zone.now,
      values: values.merge(customers: { customer.id => values })
    )
  end
end
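With that in place, every execution records its own metrics. Assuming you have a customer record at hand, enqueuing works the standard Sidekiq way:

# Enqueue an upload for a given customer (standard Sidekiq API).
UploadJob.perform_async(customer.id)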
You may feel like we've already invested too much time into this by now. Trust me when I say that modeling your data early on will save you headaches down the road. I can already tell you that we've made a mistake by storing duration as a value. You can read more about that in the How to Guides. Now let's move on to retrieving the data.
Retrieving stats
By now you've either run the quick snippet or let the background job execute a couple of times. That means you should have some numbers stored.
Trifle::Stats allows you to retrieve values for a tracked range and a specified period between from and to. You can do this using either the .values or .series method.
irb(main):001:0> stats = Trifle::Stats.values(key: 'event::uploads', from: 1.day.ago, to: Time.zone.now, range: :hour)
In the above case we're retrieving hourly values for the last 24 hours for the event::uploads key. The returned data is a hash with two arrays: under the at key you receive a list of timestamps, and under the values key you receive a list of values.
=>
{:at=>
[2023-03-04 16:00:00 +0100,
2023-03-04 17:00:00 +0100,
2023-03-04 18:00:00 +0100,
2023-03-04 19:00:00 +0100,
2023-03-04 20:00:00 +0100,
2023-03-04 21:00:00 +0100,
2023-03-04 22:00:00 +0100,
2023-03-04 23:00:00 +0100,
2023-03-05 00:00:00 +0100,
2023-03-05 01:00:00 +0100,
2023-03-05 02:00:00 +0100,
2023-03-05 03:00:00 +0100,
2023-03-05 04:00:00 +0100,
2023-03-05 05:00:00 +0100,
2023-03-05 06:00:00 +0100,
2023-03-05 07:00:00 +0100,
2023-03-05 08:00:00 +0100,
2023-03-05 09:00:00 +0100,
2023-03-05 10:00:00 +0100,
2023-03-05 11:00:00 +0100,
2023-03-05 12:00:00 +0100,
2023-03-05 13:00:00 +0100,
2023-03-05 14:00:00 +0100,
2023-03-05 15:00:00 +0100],
:values=>
[{"count"=>5, "duration"=>58, "products"=>2551},
{"count"=>5, "duration"=>57, "products"=>2844},
{"count"=>2, "duration"=>24, "products"=>1400},
{"count"=>3, "duration"=>32, "products"=>2019},
{"count"=>7, "duration"=>74, "products"=>3248},
{"count"=>5, "duration"=>42, "products"=>1520},
{"count"=>4, "duration"=>33, "products"=>1668},
{"count"=>10, "duration"=>89, "products"=>6172},
{"count"=>2, "duration"=>23, "products"=>994},
{"count"=>3, "duration"=>23, "products"=>1483},
{"count"=>6, "duration"=>60, "products"=>4400},
{"count"=>2, "duration"=>27, "products"=>796},
{"count"=>1, "duration"=>10, "products"=>456},
{"count"=>3, "duration"=>29, "products"=>1602},
{"count"=>5, "duration"=>69, "products"=>2238},
{"count"=>3, "duration"=>32, "products"=>1619},
{"count"=>8, "duration"=>67, "products"=>4594},
{"count"=>1, "duration"=>4, "products"=>322},
{"count"=>1, "duration"=>9, "products"=>963},
{"count"=>11, "duration"=>107, "products"=>6146},
{"count"=>2, "duration"=>11, "products"=>1762},
{"count"=>2, "duration"=>12, "products"=>1218},
{"count"=>3, "duration"=>20, "products"=>2028},
{"count"=>6, "duration"=>45, "products"=>3572}]}
You may recall that in configure we specified that we want to track hour and day ranges. All you need to do is provide the desired range.
irb(main):001:0> stats = Trifle::Stats.values(key: 'event::uploads', from: 1.day.ago, to: Time.zone.now, range: :day)
=> {:at=>[2023-03-04 00:00:00 +0100, 2023-03-05 00:00:00 +0100], :values=>[{"count"=>41, "duration"=>409, "products"=>21422}, {"count"=>59, "duration"=>548, "products"=>34193}]}
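As a minimal sketch of working with this structure directly (plain Ruby, nothing Trifle-specific), you can pair each timestamp with its values hash:

# Pair each timestamp with its values hash and print a simple summary.
stats[:at].zip(stats[:values]).each do |at, values|
  puts "#{at}: #{values['count']} uploads, #{values['products']} products"
end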
Working with values is great as it gives you full control over what you want to do with them. Sometimes you don't wanna get that dirty, and staying at a higher level is completely fine. In that case you can use series to get the same values and work with the data using transponders, aggregators and formatters.
irb(main):001:0> series = Trifle::Stats.series(key: 'event::uploads', from: 1.day.ago, to: Time.zone.now, range: :day)
=> #<Trifle::Stats::Series:0x0000ffffa14256e8 @series={:at=>[2023-03-04 00:00:00 +0100, 2023-03-05 00:00:00 +0100], :values=>[{"count"=>41, "duration"=>409, "products"=>21422}, {"count"=>59, "duration"=>548, "products"=>34193}]}>
irb(main):002:1> series.aggregate.sum(path: 'count')
=> 0.1e3
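Since sum is the only aggregator shown here, one hedged follow-up sketch: assuming sum returns a single numeric value as in the output above, you can combine two sums to get the average duration per executed job.

# Sketch: combine two sum aggregations to compute an average.
# Assumes sum returns a single numeric value, as shown above.
total_duration = series.aggregate.sum(path: 'duration')
total_count    = series.aggregate.sum(path: 'count')
average_duration = total_duration / total_count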
And that's it. You've now successfully stored a bunch of metrics in Redis and retrieved hourly stats for the last day. I know it's simple. It's crazy simple. And you can do quite a lot with it. Check out Case Studies for some real-world examples.