Copyright (c) 2012-2016 Chef Software, Inc.
Authors: Seth Falcon ([email protected]).
The stats_hero application can help you instrument your Erlang
application with metrics that will be reported to estatsd and
browsable in Graphite.
Include the stats_hero.hrl file. Wrap calls to upstream services
using the ?SH_TIME(ReqId, Mod, Fun, Args) macro. Here's an example
from the chef_db module:
fetch_user(#context{reqid = ReqId, otto_connection = _Server} = _Context, UserName) ->
case ?SH_TIME(ReqId, chef_sql, fetch_user, (UserName)) of
{ok, not_found} ->
not_found;
{ok, #chef_user{}=User} ->
User;
{error, Error} ->
{error, Error}
end.
The ReqId argument is a request id generated by our front-end load
balancers (via nginx module). It is used to lookup a helper process
that aggregates metrics before sending them to estatsd. The macro
calls stats_hero:ctime/3 and the stats_hero side of the computation is a
no-op if no worker process is found that matches the specified request
id. Note, however, that determining if there is a match requires the
stats_hero application to be running. So instrumented code will run
fine with a missing or crashed worker process, but will not function
if the stats_hero application is not running.
In order to generate consistent labels in graphite, stats_hero
maintains a mapping of module names to upstream names. The Mod
argument serves a double purpose: it is used as the module name for
the call to evaluate (and time) and it gets mapped to a generic
upstream name (e.g. chef_sql maps to rdbms). A future enhancement
would put this mapping in app config for stats_hero. If Mod is not
recognized, an error is raised.
At the begining of your request, start a stats_hero worker process
using stats_hero_worker_sup:new_worker/1 like this:
Config = [{estatsd_host, EstatsdServer},
{estatsd_port, EstatsdPort},
{request_id, ReqId},
{my_app, "ChefAPI"},
{request_label, RequestLabel},
{request_action, atom_to_list(wrq:method(Req))},
{upstream_prefixes, [<<"mysql">>, <<"couch">>, <<"authz">>, <<"solr">>]}],
stats_hero_worker_sup:new_worker(Config)
This will create the stats_hero worker and set the calling process as parent. In the event the calling process terminates prior to cleaning up the worker, then the stats_hero process will log and terminate gracefully. It is possible to change the monitored parent process using the reparent api.
stats_hero:reparent(ReqId),
You can also reparent using the two arity api where you supply a target process.
stats_hero:reparent(ReqId, Pid)
Then proceed to call code instrumented as described above. When your
request is complete, tell your worker process and it will send a total
time for the request to estatsd along with details on cummulative time
spent in the specified upstreams. Use stats_hero:report_metrics/2
for that. For example,
stats_hero:report_metrics(ReqId, Code),
Where ReqId is your request id as a binary and Code is an HTTP
response code (or similar integer).
Finally, you need to tell the stats_hero worker that you're done
with it. Use stats_her:stop_worker/1 for that.
You can retrieve call timing and count data from your stats_hero worker using stats_hero:snapshot/2. This will return a proplist suitable for logging with the fast_log module tuple logger.
Here's the usage from chef_rest_wm:
log_request(Req, #base_state{reqid = ReqId, log_msg = Msg}) ->
Status = wrq:response_code(Req),
Tuples = [{req_id, ReqId},
{status, Status},
{method, wrq:method(Req)},
{path, wrq:raw_path(Req)},
{user, wrq:get_req_header("x-ops-userid", Req)},
{msg, {raw, Msg}}],
PerfTuples = stats_hero:snapshot(ReqId, agg),
Level = log_level(Status),
fast_log:Level(erchef, Tuples ++ PerfTuples).
log_level(Code) when Code >= 500 ->
err;
log_level(_) ->
info.
The stats_hero workers are started using a simple_one_for_one
dynamic supervisor. Each worker updates a shared ETS table that
maintains a two-way mapping between worker pids and request ids. The
workers also register with the stats_hero_monitor which monitors
each worker to ensure that the pid/reqid mapping is cleaned from the
ETS table when the worker exits. The top-level supervisor initializes
the ETS table.
Currently, the metrics sent to estatsd (UDP) are as follows and are hard-coded into the stats_hero worker. At the start of a request, the worker sends the following data:
1|167
test_hero.application.allRequests:1|m
test_hero.test-host.allRequests:1|m
test_hero.application.byRequestType.nodes.PUT:1|m
When stats_hero:report_metrics/2 is called the worker sends:
1|640
test_hero.application.byStatusCode.200:1|m
test_hero.test-host.byStatusCode.200:1|m
test_hero.application.allRequests:108|h
test_hero.test-host.allRequests:108|h
test_hero.application.byRequestType.nodes.PUT:108|h
test_hero.upstreamRequests.rdbms:1200|h
test_hero.upstreamRequests.authz:100|h
test_hero.upstreamRequests.rdbms.nodes.put:200|h
test_hero.upstreamRequests.rdbms.nodes.fetch:1000|h
test_hero.upstreamRequests.authz.nodes.read:100|h
test_hero.application.byRequestType.nodes.PUT.upstreamRequests.rdbms:1200|h
test_hero.application.byRequestType.nodes.PUT.upstreamRequests.authz:100|h
The current handling of upstream prefixes and module mappings is not
very flexible. You can specify a list of upstream prefixes when the
worker is initialized and any metrics (ctimer) with a label that
matches one of the prefixes will be appropriately aggregated in
reporting. However, the helper macro used to instrument code currently
relies on stats_hero:label/2 which hard-codes a mapping of
module to upstream label. We'd like to make this configurable.
Copyright 2011-2016 Chef Software, Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you
may not use this file except in compliance with the License. You may
obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied. See the License for the specific language governing
permissions and limitations under the License.
| stats_hero |
| stats_hero_app |
| stats_hero_monitor |
| stats_hero_sender |
| stats_hero_sender_sup |
| stats_hero_sup |
| stats_hero_worker_sup |