Learn how to start using Trifle.Stats.
Getting Started
Start by adding Trifle.Stats
into your Mix file as {:trifle_stats, "~> 0.1.0"}
and then run mix deps.get
.
Once you're done with that, create a global configuration. If you are doing this as a part of a Rails App, add an initializer config/initializers/trifle.rb
; otherwise place the configuration _somewhere in your ruby code that gets called once your app gets launched.
Once you're done with that, create a global configuration. If you are doing this as a part of Phoenix App, add it into config/config.exs
or appropriate environment file. Otherwise you can create a local configuration and use that.
{:ok, connection} = Mongo.start_link(url: "mongodb://mongo:27017/trifle")
driver = Trifle.Stats.Driver.Mongo.new(connection)
config = Trifle.Stats.Configuration.configure(driver, "Europe/Bratislava", Tzdata.TimeZoneDatabase, :monday, [:hour, :day], "::")
This configuration will create a Mongo driver that will be used to persist your metrics. It will create metrics for two ranges: per-hour and per-day. These ranges will be localized against Europe/Bratislava
timezone.
The beginning_of_week
setting is used if you track weekly metrics (which in above case we don't).
Storing some metrics
Lets say for the sake of sample that we're gonna track execution of a background job that handles uploading your customers something into 3rd party service. We run tons of them and need to know what/how they perform.
The TL;DR version
If you just want to throw some metrics into Trifle.Stats
, go ahead and run couple times following snippet.
now = DateTime.now("Europe/Bratislava")
for i <- 0..10, i > 0, do: Trifle.Stats.track('event::uploads', DateTime.add(now, Enum.random(1..24) * -1, :hour, Tzdata.TimeZoneDatabase), %{ count: 1, duration: Enum.random(4..16) }, config)
What this will do is create you bunch of metrics within last 24h that will track count
as how many times specific event happened, duration
as how long it took it to happen and products
as additional data related to that event.
In depth version
It may sounds contraproductive, but one of the hardest parts of dealing with analytics is modeling your data. So before we start storing anything, lets talk a bit what we're trying to track and what will be its structure.
It doesn't matter if you use Oban, Verc or anything else. There is usually some kind of perform
method and thats all that matters now. The job from this example may have look something like this.
defmodule UploadJob
def perform(customer_id) do
customer = MyApp.Business.get_customer(customer_id)
products = MyApp.Business.list_customers_products(customer)
upload(products)
end
def upload(products) do
# the magic of uploading lives here
end
end
This looks pretty straight forward. Job gets triggered, and beyond setting up some customer
and products
logic, it only calls upload
method that does the heavy lifting.
We would like to keep track of performance of these uploads. We definitely want to keep track of these per-customer and it seems that the upload is gonna be somewhat related to number of uploaded products.
We're gonna track these:
count
- how many times we executed the job/uploadduration
- how long it took to perform the uploadproducts
- how many products we uploaded
To be able to get duration, we need to store timestamp before we started the execution/upload. So lets add two new methods: start
and track
.
def track(start, products) do
Trifle.Stats.track(
'event::uploads', DateTime.utc_now(),
%{
count: 1,
duration: DateTime.diff(DateTime.utc_now(), start),
products: List.count(products)
}
)
end
The key
is the identifier of your metrics. You will use this later to retrieve all your metrics for a specified timeframe. The above example is still missing tracking per customer.
Here is a decision that you need to make. There are two ways to move forward from here and it really depends on "how many" items you will have in a subcategory.
- If you have only fixed and small number of customers, it is better to insert customer into main payload and retrieve all these values at once. This would allow you to display it all in a single dashboard with a single query without users need to filter anything out.
- If you have unknown number of customers that will (hopefully) grow, you may want to avoid putting everything into single key as these values would grow into huge payloads. In this case it is better to specify another tracking where customer is part of the key. Ie
"event::upload::#{customer.id}"
. And then use this customer-only key to retrieve the data.
It really depends on the usecase you have. For now, let's pretend that we have small and fixed number of customers.
We're gonna expand the track
method bit further.
def track(start, customer, products) do
Trifle.Stats.track(
'event::uploads', DateTime.utc_now(),
%{
count: 1,
duration: DateTime.diff(DateTime.utc_now(), start),
products: List.count(products),
customers: %{
customer.id => %{
count: 1,
duration: DateTime.diff(DateTime.utc_now(), start),
products: List.count(products)
}
}
}
)
end
Now within the values
you will be storing count
, duration
and products
per specific customer as well. We may want to dry this up a bit as calling DateTime.utc_now()
may result in bit different values each time you call it. So lets do something with it.
def track(start, customer, products) do
values = %{
count: 1,
duration: DateTime.diff(DateTime.utc_now(), start),
products: List.count(products)
}
Trifle.Stats.track(
'event::uploads', DateTime.utc_now(),
Map.merge(values, %{customers: %{ customer.id => values }})
)
end
Alright, now we store values
and inside of it we merge customers
values under each customers ID. Sounds reasonable. Here is the full job sample.
defmodule UploadJob
def perform(customer_id) do
start = DateTime.utc_now()
customer = MyApp.Business.get_customer(customer_id)
products = MyApp.Business.list_customers_products(customer)
upload(products)
track(start, customer, product)
end
def upload(products) do
# the magic of uploading lives here
end
def track(start, customer, products) do
values = %{
count: 1,
duration: DateTime.diff(DateTime.utc_now(), start),
products: List.count(products)
}
Trifle.Stats.track(
'event::uploads', DateTime.utc_now(),
Map.merge(values, %{customers: %{ customer.id => values }}),
config()
)
end
def config do
Trifle.Stats.Configuration.configure(...)
end
end
You may feel like by now we've already invested too much time into this. Trust me when I say that modeling your data early on will save you headaches down the road. I can already tell you that we've made a mistake by storing duration
as a value. You can read more about that in How to Guides
. Now lets move on to retrieving the data.
Retrieving stats
By now you've either run the quick snippet or you let the background job be executed couple times. That means you should have some numbers stored in.
Trifle.Stats
allows you to retrieve values for a tracked range
and specified period between from
and to
.
now = DateTime.now("Europe/Bratislava")
before = DateTime.add(now, -24, :hour, Tzdata.TimeZoneDatabase)
stats = Trifle.Stats.values('event::uploads', before, now, :hour, UploadJob.config())
In above case we're retrieving hourly values for last 24 hours for the event::uploads
key. The returned data is a hash with two arrays. In first array under key at
you receive list of timestamps and in second array under key values
you receive list of values.
%{
at: [#DateTime<2023-06-23 04:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 05:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 06:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 07:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 08:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 09:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 10:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 11:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 12:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 13:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 14:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 15:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 16:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 17:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 18:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 19:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 20:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 21:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 22:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-23 23:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-24 00:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-24 01:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-24 02:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-24 03:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-24 04:00:00+02:00 CEST Europe/Bratislava>],
values: [
%{"count" => 6, "duration" => 70},
%{"count" => 2, "duration" => 25},
%{"count" => 3, "duration" => 31},
%{"count" => 2, "duration" => 20},
%{"count" => 2, "duration" => 9},
%{"count" => 2, "duration" => 16},
%{"count" => 2, "duration" => 15},
%{"count" => 3, "duration" => 16},
%{"count" => 3, "duration" => 19},
%{"count" => 3, "duration" => 29},
%{"count" => 3, "duration" => 24},
%{"count" => 2, "duration" => 20},
%{"count" => 2, "duration" => 24},
%{"count" => 4, "duration" => 27},
%{"count" => 4, "duration" => 34},
%{"count" => 4, "duration" => 45},
%{"count" => 5, "duration" => 52},
%{"count" => 2, "duration" => 20},
%{"count" => 4, "duration" => 42},
%{"count" => 3, "duration" => 35},
%{"count" => 3, "duration" => 26},
%{"count" => 2, "duration" => 24},
%{"count" => 2, "duration" => 21},
%{"count" => 2, "duration" => 21},
%{"count" => 3, "duration" => 25}
]
}
You may recall that in configure
we specified that we want to track hour
and day
. All you need to do is to provide a desired range
.
iex(37)> stats = Trifle.Stats.values('event::uploads', before, now, :day, config)
%{
at: [#DateTime<2023-06-23 00:00:00+02:00 CEST Europe/Bratislava>,
#DateTime<2023-06-24 00:00:00+02:00 CEST Europe/Bratislava>],
values: [
%{"count" => 61, "duration" => 573},
%{"count" => 9, "duration" => 92}
]
}
And thats it. Now you've successfully stored bunch of metrics in Mongo and retrieved the hourly stats for last day. I know it's simple. It's crazy simple. And you can do quite a lot with it. Check out Case Studies for some real world examples.