Introduction
Puppet's current reporting falls short of what people would like. This document describes where Puppet stands today in terms of reporting and where we would like it to be.
Definitions
client - A Puppet agent or a machine running the Puppet agent.
reporting - The entire process of getting feedback from where data is produced to the people who can use it. This includes collection, transport, storage, and display of the data. Used without qualification, "reporting" means this entire process.
transaction report - An instance of the Puppet::Transaction::Report class. Each transaction run by a client creates a transaction report, and if reporting is enabled in puppetd the report is sent to the server.
server - The central puppet master(s).
Current State
Puppet has a simple reporting process in place already. Each client-side transaction generates a transaction report containing metrics and log messages. This report is sent to the central server when the transaction is finished, and the server processes the report through each enabled report type (e.g., tagmail or rrdgraph).
Current transaction reports contain relatively few metrics, but they contain every log message generated during the transaction. The metrics mostly revolve around how many resources there are and what their various states are, plus a couple of timing-related metrics.
However, it's currently not possible to get a complete picture of the state of a client. You cannot get a clear report of which resources are having trouble or what that trouble is.
Desired State
The most important missing features are all related to resource details. Specifically, transaction reports need to add information about every resource they check; at the least, the report should say whether the resource was out of sync (or if it was skipped and if so why), and if the resource was out of sync then the report should say whether it was successfully fixed and what attributes were fixed. Optionally, the report could say what the previous and new values are.
With this significant new trove of information contained in the reports, the server needs the ability to process it and users need the ability to view it. The best method for both of these is to create a report processor that stores the reports in a database via Rails, and then extend PuppetShow to enable viewing those reports.
The Plan
There are three aspects of the reporting process to address here: collection, storage, and display. We'll discuss each piece separately and ignore transport, since it is already solved for now and won't change as part of this work.
Data Collection
Fortunately, adding this new data will be straightforward: The transaction steps through the resource lists, checking and applying each resource in turn, so we just need to record a little information in the transaction report as we iterate.
- Extend the Transaction::Report class: The class itself needs to be extended to understand this new type of data. This probably means creating a simple 'ResourceStatus' struct or something similar, and storing a list of these structs in the report object.
- Add information in each step of the iteration: The transaction class needs to be modified to create an instance of the ResourceStatus struct for each resource under management and then add all available information to this struct. Note that the amount of information will vary dramatically from resource to resource, because resources can be in so many different states. These states are not formally defined; the list is just from Luke's memory:
  - skipped (either because they are not scheduled or because their tags do not match)
  - in sync and thus will do nothing
  - out of sync but in noop mode
  - out of sync but failed to sync (we will probably want the associated failure or log message)
  - out of sync but fixed (we will probably want the associated events or success messages, and we might want to know the previous and new values for each changed parameter)
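As a rough illustration of the two steps above, here is a minimal Ruby sketch of a ResourceStatus struct and a report object that collects one status per resource. Everything here is an assumption made for illustration: the field names, the state symbols, and the SketchReport class are hypothetical and are not the actual Transaction::Report API.

```ruby
# Illustrative only: these names are assumptions, not Puppet's actual classes.

# The informal states listed above, as symbols.
RESOURCE_STATES = [:skipped, :in_sync, :noop, :failed, :fixed].freeze

# One record per resource checked during a transaction.
ResourceStatus = Struct.new(:resource, :state, :skip_reason, :message, :changes) do
  # A resource is out of sync in the noop, failed, and fixed states.
  def out_of_sync?
    [:noop, :failed, :fixed].include?(state)
  end
end

# A stand-in for the transaction report, holding one status per resource.
class SketchReport
  attr_reader :resource_statuses

  def initialize
    @resource_statuses = []
  end

  # Called once per resource as the transaction iterates.
  def record(resource, state, skip_reason: nil, message: nil, changes: {})
    unless RESOURCE_STATES.include?(state)
      raise ArgumentError, "unknown state #{state.inspect}"
    end
    status = ResourceStatus.new(resource, state, skip_reason, message, changes)
    @resource_statuses << status
    status
  end

  # Resources that were out of sync and could not be fixed.
  def failed
    @resource_statuses.select { |s| s.state == :failed }
  end
end

report = SketchReport.new
report.record("File[/etc/motd]", :in_sync)
report.record("Service[ntpd]", :skipped, skip_reason: "not scheduled")
report.record("Package[foo]", :failed, message: "install failed")
# changes maps each changed parameter to [previous value, new value].
report.record("File[/etc/hosts]", :fixed, changes: { "mode" => ["0600", "0644"] })
```

Most fields stay empty for most states, which matches the note above that the amount of information varies widely per resource.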
Storage
We should continue to support simply storing the YAML files on disk (note that our ResourceStatus needs to support dumping to YAML, both for transport and for plain-text storage), but the main goal is to get this data into a database so PuppetShow can use it.
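One possible way to give ResourceStatus YAML support is sketched below using Ruby's standard yaml library. The struct fields and the to_yaml_text / from_yaml_text helpers are hypothetical names chosen for this example. Dumping a plain hash rather than the object itself is also just one design choice, made here so that the file stays readable and can be loaded with YAML.safe_load.

```ruby
require "yaml"

# Hypothetical struct; the field names are assumptions for illustration.
ResourceStatus = Struct.new(:resource, :state, :message) do
  # Dump as a plain hash with string keys so the YAML is easy to read
  # and can be loaded without any special class handling.
  def to_yaml_text
    YAML.dump("resource" => resource, "state" => state.to_s, "message" => message)
  end

  # Rebuild a status from text produced by to_yaml_text.
  def self.from_yaml_text(text)
    data = YAML.safe_load(text)
    new(data["resource"], data["state"].to_sym, data["message"])
  end
end

status = ResourceStatus.new("File[/etc/motd]", :fixed, "content changed")
text = status.to_yaml_text
restored = ResourceStatus.from_yaml_text(text)
```

The same text can be written to disk or sent to the server, so a single serialization path would cover both uses mentioned above.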
There's already a resource table in the database, so we just need to add some fields to that table. The fields should roughly correspond to the data collected in the ResourceStatus struct, but the lifecycle of a report is different from that of a resource in the database, so there will be some differences. At the least, there needs to be another field for storing a timestamp to record when the information was last refreshed.
Also, the report will often return information for resources that were not specified in the original configuration. For instance, a recursive file copy will just specify a single file resource in the configuration, but will return a file resource for every file under the tree on the client. The stored resource information should reflect this difference, probably with a boolean indicating whether the resource is specified or generated.
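The two extra fields discussed above, a refresh timestamp and a specified-versus-generated flag, could be derived roughly as follows. This is a sketch under assumed names: rows_for, the hash keys, and the example resources are all hypothetical and do not reflect the existing resource table.

```ruby
# Hypothetical helper: builds the row to store for each reported resource.
# configured: resource references present in the original configuration.
# reported:   hash of resource reference => state from the transaction report.
def rows_for(configured, reported, now = Time.now)
  reported.map do |ref, state|
    {
      resource: ref,
      state: state,
      # true when the client generated this resource (for example during a
      # recursive file copy) rather than it being specified in the configuration
      generated: !configured.include?(ref),
      # when this information was last refreshed
      last_refreshed: now
    }
  end
end

configured = ["File[/srv/www]"]
reported = {
  "File[/srv/www]" => :in_sync,
  "File[/srv/www/index.html]" => :fixed
}
rows = rows_for(configured, reported, Time.at(0))
```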
The database schema should probably also be enhanced to allow storage of transaction logs and possibly metrics, although the log table would grow very quickly and thus would need to be trimmed often. It'd be best if transactions could be looked up by id, returning all logs, metrics, and resources related to that transaction. However, this would require per-transaction-per-resource storage in the database, which is probably excessive. This can always be enhanced later, so I'd tend towards storing less information, at least at first.
Once the database is updated, models need to be created as necessary for storing information like logs and metrics, and possibly transactions.
Display
PuppetShow will be the display engine for these reports. It will need to be enhanced to support interacting with the new resource information. Based on the data we're storing, it should be relatively obvious what information users will want to see.
Note that users will want the ability to create their own reports, so PuppetShow should support its own drop-in reports just like the current reporting system.
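To make the drop-in idea concrete, here is a minimal sketch of a report registry in plain Ruby. It is purely illustrative: the DropInReports module, its methods, and the shape of the status hashes are assumptions, and PuppetShow's actual extension mechanism is not specified in this document.

```ruby
# Hypothetical registry; not an existing PuppetShow or Puppet API.
module DropInReports
  @reports = {}

  # Register a report under a name; the block receives the stored statuses.
  def self.register(name, &block)
    @reports[name] = block
  end

  # Run a registered report against a list of resource statuses.
  def self.run(name, statuses)
    handler = @reports.fetch(name) { raise ArgumentError, "no report named #{name}" }
    handler.call(statuses)
  end

  # Names of all registered reports.
  def self.names
    @reports.keys
  end
end

# A user-supplied report listing the resources that failed to sync.
DropInReports.register(:failures) do |statuses|
  statuses.select { |s| s[:state] == :failed }.map { |s| s[:resource] }
end

statuses = [
  { resource: "Package[foo]", state: :failed },
  { resource: "File[/etc/motd]", state: :in_sync }
]
result = DropInReports.run(:failures, statuses)
```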