Cfengine vs. Puppet
Cfengine is currently the most widely deployed configuration management tool. In many ways, Puppet can be thought of as a next-generation version of cfengine: many of Puppet's design goals derive directly from experience with cfengine and are meant to overcome its weaknesses. Puppet differs significantly from cfengine and is not simply a direct descendant, but this document focuses on the areas where the two tools correspond directly.
This document summarizes the primary advances that Puppet makes over cfengine's current state.
An Open Development Community
While Puppet's technical innovations are clearly important to its success, one of the biggest differences is non-technical: the development community around Puppet is very open and is founded on the belief that everyone should be able to contribute ideas and code to the project. Commit access is granted readily, and we do everything we can to encourage contributions of all kinds. We know that we don't have all the answers, and we're more interested in building a great tool than in being right.
Cross-Platform Support
Puppet was developed with a strong focus on cross-platform support, both because most organizations have to manage more than one platform and because, without it, solutions cannot easily be shared among organizations. Puppet's configuration language and its functional back end are the core of its multi-platform support.
A Resource Abstraction Layer
Cfengine is a great way to scale common administrative practices: you can move from using SSH and a for loop to using cfengine quite smoothly. However, the same amount of complexity is present in either form. You still have to handle file contents, and you still have to manage operating system differences yourself: you have to know whether it's useradd or adduser, whether it's init or Sun's SMF, and what the format of the filesystem table is.
One of Puppet's primary innovations is a resource abstraction layer, so that you do not have to know those details. You can speak in terms of resources like users, services, or filesystems, and Puppet will translate them to the appropriate commands on each system. Puppet administrators are free to focus on the complexity of their configurations, rather than being forced to also handle that complexity plus the complexity of the differences between the operating systems.
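To make the idea concrete, here is a minimal Python sketch of a resource abstraction layer, assuming a single abstract service resource and two platform-specific providers. The names `Service`, `PROVIDERS`, and `start_command`, and the service name `webapp`, are hypothetical; Puppet's actual type and provider code is written in Ruby and is considerably richer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Service:
    """An abstract service resource: what you want, not how to get it."""
    name: str
    ensure: str = "running"


def init_script_start(svc: Service) -> list[str]:
    # SysV-style systems start a service through its script in /etc/init.d.
    return [f"/etc/init.d/{svc.name}", "start"]


def smf_start(svc: Service) -> list[str]:
    # Solaris SMF starts (enables) a service with svcadm.
    return ["svcadm", "enable", svc.name]


# Hypothetical provider table: the only place that knows about OS differences.
PROVIDERS = {
    "solaris": smf_start,
    "default": init_script_start,
}


def start_command(svc: Service, operatingsystem: str) -> list[str]:
    """Translate the abstract resource into this platform's command."""
    provider = PROVIDERS.get(operatingsystem.lower(), PROVIDERS["default"])
    return provider(svc)


webapp = Service("webapp")  # hypothetical service name
print(start_command(webapp, "Solaris"))  # ['svcadm', 'enable', 'webapp']
print(start_command(webapp, "Debian"))   # ['/etc/init.d/webapp', 'start']
```

The administrator only ever writes `Service("webapp")`; adding support for another platform means adding one provider, not touching every configuration that uses services.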
Puppet's development was heavily influenced by the many external modules that Luke Kanies wrote for cfengine, each managing a separate resource type such as users, packages, or cron jobs. Accordingly, one of Puppet's primary goals was to make it easy to expand the number of resource types it can manage.
Configuration Language
Cfengine makes it surprisingly difficult to support multiple platforms because it is so low-level. All path variance (e.g., /usr/sbin/sshd vs. /usr/local/sbin/sshd) must be handled manually, often in multiple places (e.g., both when starting a service and when restarting it). Puppet gives every resource both a title (e.g., "sshd") and a name (which might be "sshd" or "openssh"), allowing administrators to use a single title for a resource on every platform while setting the name to whatever each platform requires. Dependency resolution can use either, which keeps resource relationships simple. For instance, here is how one might support ssh on multiple platforms in Puppet:
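class ssh {
    # One title (sshdconfig) for the file everywhere; only the path varies
    file { sshdconfig:
        path   => $operatingsystem ? {
            solaris => "/etc/sshd_config",
            default => "/etc/ssh/sshd_config"
        },
        source => "puppet://..."
    }
    # One title (sshd) for the service; the name is what each platform calls it
    service { sshd:
        name      => $operatingsystem ? {
            solaris => "openssh",
            default => "sshd"
        },
        ensure    => running,
        subscribe => File[sshdconfig]
    }
}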
The 'subscribe' attribute in the service causes the service to restart if the configuration file changes.
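A minimal Python sketch of that refresh behaviour, assuming resources are applied in an order that has already been computed and that each resource can tell whether it had to change anything. The function `run`, its arguments, and the state strings are hypothetical and model only the behaviour described above, not Puppet's transaction code.

```python
from __future__ import annotations


def run(order: list[str], current: dict[str, str], desired: dict[str, str],
        subscribes: dict[str, list[str]]) -> list[str]:
    """Apply resources in order; refresh subscribers whose targets changed.

    order:      resource references, already sorted by dependency
    current:    observed state per resource (updated in place when fixed)
    desired:    state the manifest asks for
    subscribes: subscriber -> resources it subscribes to
    """
    changed: set[str] = set()
    log: list[str] = []
    for ref in order:
        if current.get(ref) != desired[ref]:
            current[ref] = desired[ref]  # stand-in for fixing the system
            changed.add(ref)
            log.append(f"changed {ref}")
        # A refresh (e.g. a service restart) fires only on an upstream change.
        if any(dep in changed for dep in subscribes.get(ref, [])):
            log.append(f"refreshed {ref}")
    return log


order = ["File[sshdconfig]", "Service[sshd]"]
current = {"File[sshdconfig]": "old config", "Service[sshd]": "running"}
desired = {"File[sshdconfig]": "new config", "Service[sshd]": "running"}
subs = {"Service[sshd]": ["File[sshdconfig]"]}

print(run(order, current, desired, subs))
# ['changed File[sshdconfig]', 'refreshed Service[sshd]']
print(run(order, current, desired, subs))  # nothing left to fix
# []
```

The second run does nothing because the file already matches, so the service is not restarted needlessly.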
Here is how a similar configuration would look in Cfengine:
# ssh.cf
control:
    AddInstallable = ( restart_sshd )
    solaris::
        sshdconfig = ( "/etc/sshd_config" )
        sshd       = ( openssh )
        sshdinit   = ( "/etc/init.d/openssh" )
    !solaris::
        sshdconfig = ( "/etc/ssh/sshd_config" )
        sshd       = ( sshd )
        sshdinit   = ( "/etc/init.d/sshd" )
copy:
    "/my/path/to/source" dest=${sshdconfig}
        define=restart_sshd
processes:
    "${sshd}" restart "${sshdinit} start"
shellcommands:
    restart_sshd::
        "${sshdinit} restart"
Notice how much effort it takes to abstract just a couple of path differences, and how much that extra code obscures your real goal: making sure sshd is running and restarts whenever its configuration file changes.
Handling Configuration Complexity
The true test of a tool's usefulness is how well it remains usable as configurations become more complex. A significant motivation for developing Puppet was frustration at how difficult it is to maintain complex configurations in cfengine.
Ordering
Cfengine makes it very difficult to order operations correctly. While the basic point made by Mark Burgess, cfengine's author, is that procedural tools are a poor fit for configuration management, some operations clearly must happen in a specific order, such as installing a package before attempting to start the associated service. Cfengine provides some workarounds for this, but they are very difficult to maintain as configurations grow, and they significantly increase the amount and complexity of code.
Puppet resolves ordering problems with relationships. Just as newer init replacements use dependency information to determine service start-up order, Puppet uses dependency information to determine the order of operations. For instance, if you specify that a service depends on its configuration file, Puppet guarantees that the file is always checked (and fixed, if necessary) before the service. The following is a very common construct in Puppet:
class ntp {
    # Make sure the package is installed
    package { ntp: ensure => installed }
    # Manage the configuration file after the package, so that our
    # version overwrites the default file the package installs
    file { "/etc/ntpd.conf":
        source  => "/.../ntpd.conf",
        require => Package[ntp]
    }
    # Start the service after both other resources are done, and
    # restart it if either of them changes
    service { ntpd:
        ensure    => running,
        subscribe => [Package[ntp], File["/etc/ntpd.conf"]]
    }
}
This will always apply these resources in the fixed order of package, file, then service.
Puppet builds a graph of all of the resources and their relationships, then does a topological sort to determine the order.
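A minimal Python sketch of that idea, using one standard topological sort (Kahn's algorithm) on the ntp example above. The function name and the resource-reference strings are illustrative assumptions, not Puppet's internals; Puppet's own graph code is written in Ruby.

```python
from __future__ import annotations

from collections import deque


def topological_order(edges: dict[str, list[str]]) -> list[str]:
    """Order resources so each one comes after everything it depends on.

    edges maps each resource to the resources it depends on (its require
    or subscribe targets). Raises ValueError on a dependency cycle.
    """
    # Collect every node, including ones that only appear as dependencies.
    nodes = set(edges)
    for deps in edges.values():
        nodes.update(deps)

    # indegree counts unmet dependencies; dependents holds the reverse edges.
    indegree = {n: 0 for n in nodes}
    dependents: dict[str, list[str]] = {n: [] for n in nodes}
    for node, deps in edges.items():
        for dep in deps:
            indegree[node] += 1
            dependents[dep].append(node)

    # Start with resources that depend on nothing; sort for determinism.
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order: list[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in sorted(dependents[node]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order


ntp = {
    "File[/etc/ntpd.conf]": ["Package[ntp]"],
    "Service[ntpd]": ["Package[ntp]", "File[/etc/ntpd.conf]"],
}
print(topological_order(ntp))
# ['Package[ntp]', 'File[/etc/ntpd.conf]', 'Service[ntpd]']
```

A side benefit of the graph approach is that circular dependencies are detected up front instead of producing an arbitrary order.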
Code Reuse
Cfengine has no way to reuse code without code generation, so any complex configuration relies either on copy-and-paste development, which inevitably duplicates code, or on a templating system such as m4. Puppet has a built-in mechanism for reusing a set of resources, the define statement:
# Assumes Service[apache] is declared elsewhere in the manifest
define vhost($ip = "*:80", $docroot = false, $htmlsource, $order = 500, $ensure = "enabled") {
    # Default the docroot from the site name if none was given
    $realdocroot = $docroot ? {
        false   => "/export/docroots/$name/htdocs",
        default => $docroot
    }
    # Pull down the data to serve
    file { $realdocroot: source => $htmlsource, recurse => true }
    # Create the vhost config file (a minimal stanza, for illustration)
    file { "/etc/apache2/sites-available/$name":
        content => "<VirtualHost $ip>\n    ServerName $name\n    DocumentRoot $realdocroot\n</VirtualHost>\n",
        notify  => Service[apache] # restart apache if this changes
    }
    case $ensure {
        enabled: {
            # Create the link
            file { "/etc/apache2/sites-enabled/$order-$name":
                ensure => "/etc/apache2/sites-available/$name",
                notify => Service[apache]
            }
        }
        default: {
            # Make sure the link is missing
            file { "/etc/apache2/sites-enabled/$order-$name":
                ensure => absent,
                notify => Service[apache]
            }
        }
    }
}
Note that you could pull that virtual host configuration into an external template if you wanted:
# Create the vhost config file
file { "/etc/apache2/sites-available/$name":
    content => template("vhost.erb"),
    notify  => Service[apache] # restart apache if this changes
}
Now that you have the definition, you can reuse it as many times as you want:
vhost { "reductivelabs.com":
    htmlsource => "/nfs/html/reductivelabs.com"
}
vhost { "madstop.com":
    htmlsource => "/nfs/html/madstop.com"
}
Or you can make a bunch at once:
vhost {
    "reductivelabs.com": htmlsource => "/nfs/html/reductivelabs.com";
    "madstop.com": htmlsource => "/nfs/html/madstop.com";
    "kanies.com":
        htmlsource => "/nfs/html/kanies.com",
        ip         => "192.168.0.3:80";
    "nosite.com":
        ensure     => disabled,
        htmlsource => "/nfs/html/nosite.com"
}
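One way to think about a define is as a function that expands into ordinary resources. The following Python sketch, under that assumption, mirrors the vhost logic above; the function `vhost` and its (type, title, attributes) tuple format are hypothetical and say nothing about how Puppet's compiler actually represents resources.

```python
from __future__ import annotations


def vhost(name: str, htmlsource: str, ip: str = "*:80",
          docroot: str | None = None, order: int = 500,
          ensure: str = "enabled") -> list[tuple[str, str, dict]]:
    """Expand one vhost into the concrete file resources it stands for."""
    # Default the document root from the site name, as the define does.
    realdocroot = docroot or f"/export/docroots/{name}/htdocs"
    available = f"/etc/apache2/sites-available/{name}"
    link = f"/etc/apache2/sites-enabled/{order}-{name}"

    resources = [
        ("file", realdocroot, {"source": htmlsource, "recurse": True}),
        ("file", available, {"notify": "Service[apache]"}),
    ]
    # Enabled sites get a symlink; anything else makes sure it is absent.
    link_ensure = available if ensure == "enabled" else "absent"
    resources.append(("file", link, {"ensure": link_ensure,
                                     "notify": "Service[apache]"}))
    return resources


catalog: list[tuple[str, str, dict]] = []
for site in ["reductivelabs.com", "madstop.com"]:
    catalog += vhost(site, f"/nfs/html/{site}")
for rtype, title, attrs in catalog:
    print(rtype, title, attrs)
```

Each call contributes three resources, so the logic lives in one place no matter how many sites you declare.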
Dedication
Puppet is supported by an organization dedicated to creating the best system automation software, and we expect to have a staff of at least a few people dedicated to development, support, consulting, and custom development. Contrast this with cfengine, which is supported by a professor whose primary use for the software is research into anomaly detection.
Cfengine's author is only now starting to invest in community involvement in its development. Although he has always accepted patches from the community, he has been hesitant to provide standard project features such as a public version repository and a bug database, and as a result cfengine's large user base has not produced a large development community.
Because Reductive Labs is a commercial enterprise dependent on customer satisfaction for its survival, our customers will have a large say in how best to develop Puppet, and we'll be doing everything we can to develop a strong community just as dedicated to Puppet and server automation as we are. Our goal is also to have multiple developers dedicated full time to Puppet development, which should significantly accelerate feature development compared to cfengine.
Decoupling
Puppet's parser knows how to interact with the list of available resource types, but it never knows anything about specific types. Adding a new type therefore only requires creating the type; you never have to modify the parser or anything else in the stack. In contrast, new types in cfengine require modifications all the way through the stack, including the lexer. Puppet will even load new types automatically: just drop them into your search path and you can start using them (or, even better, drop them into the plugins directory and clients will automatically retrieve and load them).
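The decoupling can be sketched in Python as a registry that a generic evaluator consults by name, with types autoloaded from a search path on first use. Everything here is hypothetical: `TYPES`, `find_type`, `evaluate`, and the `motd` type are illustrative, and although the decorator name loosely echoes Puppet's Ruby `newtype` method, this is not Puppet's API.

```python
from __future__ import annotations

import importlib.util
import pathlib
import tempfile

# Hypothetical registry: type name -> implementing class.
TYPES: dict[str, type] = {}


def newtype(name: str):
    """Decorator a type file uses to register itself under a name."""
    def register(cls: type) -> type:
        TYPES[name] = cls
        return cls
    return register


def find_type(name: str, search_path: list[str]) -> type:
    """Return the class for a type, autoloading <dir>/<name>.py if needed."""
    if name not in TYPES:
        for directory in search_path:
            candidate = pathlib.Path(directory) / f"{name}.py"
            if candidate.exists():
                spec = importlib.util.spec_from_file_location(
                    f"restype_{name}", str(candidate))
                module = importlib.util.module_from_spec(spec)
                module.newtype = newtype  # let the type file register itself
                spec.loader.exec_module(module)
                break
    if name not in TYPES:
        raise KeyError(f"unknown resource type: {name}")
    return TYPES[name]


def evaluate(type_name: str, title: str, attrs: dict, search_path: list[str]):
    """Generic evaluator: it never names a specific type, only asks the registry."""
    return find_type(type_name, search_path)(title, **attrs)


# Usage: drop a new type into a directory on the search path.
plugin_dir = tempfile.mkdtemp()
pathlib.Path(plugin_dir, "motd.py").write_text(
    "@newtype('motd')\n"
    "class Motd:\n"
    "    def __init__(self, title, message=''):\n"
    "        self.title = title\n"
    "        self.message = message\n"
)
res = evaluate("motd", "main", {"message": "hello"}, [plugin_dir])
print(type(res).__name__, res.message)  # Motd hello
```

Note that `evaluate` did not change when the `motd` type was added; only a new file appeared on the search path.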
Puppet also uses the industry-standard XML-RPC protocol for communication between Puppet clients and servers, so the protocol is easy to study and either end could be replaced by another service if desired. You can even write your own tools to interact with Puppet clients.
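Because XML-RPC is a published standard, ordinary library code can read and write it. As a hedged illustration, Python's standard `xmlrpc.client` module can encode and decode a call; the method name `config.fetch`, the host name, and the arguments are hypothetical and are not Puppet's actual interface.

```python
import xmlrpc.client

# Hypothetical method name and arguments, for illustration only.
request = xmlrpc.client.dumps(
    ("web01.example.com", {"operatingsystem": "Debian"}),
    methodname="config.fetch",
)
print(request)  # plain, human-readable XML

# Any XML-RPC implementation can decode it back into native values.
params, method = xmlrpc.client.loads(request)
print(method, params[0], params[1]["operatingsystem"])
# config.fetch web01.example.com Debian

# Responses use the same encoding.
reply = xmlrpc.client.dumps(("ok",), methodresponse=True)
print(xmlrpc.client.loads(reply))  # (('ok',), None)
```

Since the wire format is just XML with typed values, inspecting or replacing either end does not require any Puppet-specific tooling.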
Development Methodology
Reductive Labs is a big believer in enhancing developer productivity. Puppet is being written in Ruby because it is a high-level language that is easy to use yet provides significant productivity enhancements over low-level languages like C. Reductive Labs also strongly believes that unreadable code is bad code; if you can't easily follow a code path in Puppet, then you've found a bug.
Lastly, we assiduously unit test our code. We're always looking for more ways to test it, and every bug we fix gets turned into a unit test so that it cannot quietly reappear in a later release. Puppet's code base has consistently been more than 30% test code, which makes a big difference in how supportable and stable Puppet is.