Fabric is a deployment automation tool popular among Python developers. Since the 1.0 release it allows you to do pretty much anything you would normally do manually over ssh.
Fabric is simple and concentrates on providing an API, a number of utility functions and an easy way to run tasks defined inside fabfiles. But although fabric allows you to execute functions on multiple hosts at once, it lacks any sufficiently advanced instrument for storing and applying server-specific options. Implementing such an instrument is the main goal of this post.
I’ve spent some time researching possible solutions to the following problem. I should note, however, that I could have missed some obvious, easier solution, or, since this research predates the release of fabric 1.0, such a solution may have been introduced in fabric itself by the time of this writing.
The Problem
Suppose we have three projects: ProjectA, ProjectB and ProjectC deployed on three corresponding servers: ServerA, ServerB and ServerC. We want:
- to deploy each project to its own server with one shell command
- to deploy all projects at once with one shell command
- to run maintenance tasks on a subgroup of servers with one shell command
Fabric solves the first problem really well, but it proposes no apparent solution for the rest.
Let’s say our deployment workflow is similar for all projects and consists of two basic steps: updating the project source code from the repository and restarting the server process. We could implement such a workflow with three fabric tasks:
def update():
    # ...

def restart():
    # ...

def deploy():
    update()
    restart()
Writing these tasks for a single project is simple; however, since we need them for each project, it would be nice to follow the DRY principle and implement them only once, for use with any project that has a similar deployment workflow.
And that’s where things get tricky. We can’t really implement the update() function without either hardcoding the source code repository location and the shell command to pull the update, or having some sort of project config with the repository location that update() can read. The same applies to the restart() function. It’s easy to write a function that restarts apache, but what if one of our projects is served by nginx or lighttpd? How can we write a restart() function that takes care of all the boilerplate of a process restart (switching to the correct user, performing appropriate checks and so on) and restarts the correct process depending on the project config?
When working with fabric, this is usually solved by defining environment-initializing functions for each server:
def server_a():
    env.hosts = ['a.example.org']
    # ...

def server_b():
    # ...
But this makes it impossible to run one task on multiple servers at once. That is, running
$ fab server_a server_b deploy
won’t do what you expect it to (if you expect it to deploy to two servers, that is).
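The reason is that each environment-initializing task simply overwrites env.hosts, so by the time deploy runs, only the value set by the last such task remains (a minimal sketch of the failure mode):

def server_a():
    env.hosts = ['a.example.org']

def server_b():
    # Overwrites the list set by server_a, so `deploy` will only
    # ever run against the last "selected" server.
    env.hosts = ['b.example.org']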
What follows is a fabfile template that will allow you to use server-specific configuration options inside your tasks, instructions on how to use it, and a detailed explanation of the implementation.
Solution in a nutshell
- Download the base fabfile.py template. It requires fabric version 0.9 or later, includes a bunch of internal functions and defines one task: s. You can add your own tasks to this fabfile or, if you’re using fabric version 1.0 or later, you can create a fabfile folder, move it there and define your own tasks in other files inside that folder.
- Add the @_setup decorator to tasks that need additional server options.
- Access all required server options as env attributes from your fabric tasks:

      @_setup
      def restart():
          if env.webserver == "apache":
              pass  # restart apache
          else:
              pass  # ...

  Attributes aren’t namespaced and get automatically added to env by the _setup decorator, so you have to be careful not to overwrite any of the built-in env attributes.
- In the root folder of your project create a file server_config.yaml, or server_config.json if you prefer JSON syntax. To use a YAML config file you need to have PyYAML installed (pip install pyyaml will generally suffice), while JSON requires either simplejson or Python 2.6 with the built-in json module.
- Open the created file and define parameters for your servers. host is the only required parameter and it must contain the server hostname.

  YAML:

      server_a:
          host: a.example.com
          webserver: apache
          repository: git@example.com:project.git
      server_b:
          host: b.example.com
          webserver: nginx
          web_folder: /var/www/project
      ...

  JSON:

      {
          "server_a": {
              "host": "a.example.com",
              "webserver": "apache",
              "repository": "git@example.com:project.git"
          },
          "server_b": {
              ...
          }
      }

- Define server groups in the config file.

  YAML:

      frontend-servers: [server_a, server_c, ...]

  JSON:

      {
          ...
          "frontend-servers": ["server_a", "server_c", ...]
      }

After everything is set up, you can call

$ fab s

to list all available server configs, or

$ fab s:server_a,server_b,frontend-servers task

to execute task once for server A, server B and the members of the frontend-servers group.
Since fabric looks for a fabfile in parent directories, you can move the fabfile to a common parent directory of your projects and leave the server config inside each project directory. This way you can use the same fabfile with different projects and keep each server config together with other project-related files.
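For example, a hypothetical layout (directory names are illustrative):

projects/
    fabfile.py              # shared across projects
    project_a/
        server_config.yaml  # ProjectA's servers
    project_b/
        server_config.yaml  # ProjectB's servers

Running fab from inside project_a picks up the shared fabfile.py from the parent directory, while _load_config reads the local server_config.yaml.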
If the above worked as expected and you aren’t interested in any further explanations you can stop reading now.
Having described the basic usage and configuration, let’s move on to the initial requirements and the internal details of the implementation.
Detailed explanation
Requirements
- The proposed solution must be portable, in the sense that it should be possible to transfer the whole thing to other developers in one step, and it should work on any machine with fabric already installed. This basically means patching fabric isn’t an option.
- Conforming to the DRY principle means the user has to describe a task and list the specific server options only once.
- Allow any number of user-defined parameters for each server.
- Allow parameters of arbitrary type: strings, integers, lists and Python dictionaries when needed.
- Introduce as little overhead as possible to both writing tasks and running them. Fabric is mostly about making the common thing easy, and it should remain that way.
- Make it easy to add servers and config parameters to existing configuration files.
- It should scale to any number of servers. Defining and running env-initialization tasks may work for 2 or 3 servers, but it is basically unusable for 10 or more.
- Allow defining server groups, the same way env.roledefs allows arranging hosts in groups.
- Allow running one task on multiple servers using only one shell command.
- It must be compatible with fabric’s built-in methods of declaring hosts, and with tasks that don’t need this functionality.
Fabric’s handling of env
Fabric keeps config options (including the list of hosts and user info) inside the global environment variable env. env itself isn’t very interesting: it’s a simple dictionary subclass (fabric/state.py) that allows access to its keys as attributes. It stores all the settings that influence the underlying ssh connections, as well as fabric’s internal settings. The full list of environment variables can be found in the fabric documentation.
There are, however, two important details about how env is treated in fabric’s main loop:
- Once a task has started running, you can’t in any way change its destination hosts. Any changes you make to env.hosts will influence the following tasks, not the current one. This is due to the fact that fabric iterates over an internal hosts variable that is set once for each task, before the task starts running.
- Fabric removes duplicates from the host list by transforming it into a set and back, so the initial host order is lost during execution.
The second fact is interesting outside the context of our problem as well, because it means you can’t know whether a certain task will be executed first on server A or on server B, so you shouldn’t rely on any specific order in your deployment routines.
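A minimal sketch of what this deduplication effectively does (illustrative, not fabric’s actual code):

hosts = ['b.example.com', 'a.example.com', 'b.example.com']
hosts = list(set(hosts))
# Duplicates are gone, but the original order is no longer guaranteed:
# this may well produce ['a.example.com', 'b.example.com'].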
Config file syntax
A multiple-server config file is best represented by a dictionary: each server is represented by a server name (key) and its parameters (value). Since we want to support an arbitrary number of parameters, we will represent each server config as a dictionary as well, where parameter names are keys and parameter values are, well, values. We’re using symbolic server names as keys, so each config must provide the server’s hostname in the standard required parameter host.
We’ve mentioned support for server groups before. A server group is basically a list of server names stored under a group-name key. The reason for this is quite simple: if, for example, we have a number of database servers and want to back them all up, instead of writing
$ fab dbserver_1 dbserver_2 ... backup
each time, we would rather store the list of database servers in our config file under the key “dbservers” and start the actual task with:
$ fab dbservers backup
Server groups may be nested, for example:
dbservers: [dbserver_1, dbserver_2]
...
allservers: [dbservers, webservers]
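With a config like this, a single command such as

$ fab s:allservers backup

(using the s task described below) runs backup once for every member of dbservers and webservers.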
To write the actual config files we can use either YAML or JSON syntax: YAML is somewhat easier to read and write by hand, but JSON usually won’t require any additional packages, since most developers already have either simplejson or Python 2.6. We’ll support both and choose the appropriate loader depending on the installed libraries and the config file extension. Examples of server configs are shown in the “Solution in a nutshell” section.
Parsing config files
We’ve come to the actual implementation details. The first thing to do is to find the config file and convert it from its markup language into a Python data structure. We’ll start by figuring out which packages are available:
# Imports used by the snippets below.
import os

import fabric.state
from fabric import colors
from fabric.api import env
from fabric.utils import puts

YAML_AVAILABLE = True
try:
    import yaml
except ImportError:
    YAML_AVAILABLE = False

JSON_AVAILABLE = True
try:
    import simplejson as json
except ImportError:
    try:
        import json
    except ImportError:
        JSON_AVAILABLE = False
We’re looking for PyYAML for YAML, and either simplejson or the built-in json module for JSON. The actual function that transforms a file into a Python dictionary:
def _load_config(**kwargs):
    """Find and parse server config file.

    If `config` keyword argument wasn't set look for default
    'server_config.yaml' or 'server_config.json' file.
    """
    config, ext = os.path.splitext(kwargs.get('config',
        'server_config.yaml' if os.path.exists('server_config.yaml')
        else 'server_config.json'))
    if not os.path.exists(config + ext):
        print colors.red('Error. "%s" file not found.' % (config + ext))
        return {}
    if YAML_AVAILABLE and ext == '.yaml':
        loader = yaml
    elif JSON_AVAILABLE and ext == '.json':
        loader = json
    else:
        print colors.red('Parser package not available')
        return {}
    # Open file and deserialize settings.
    with open(config + ext) as config_file:
        return loader.load(config_file)
It starts by looking for one of the following:
- an existing file with the name passed in the config keyword argument;
- an existing server_config.yaml file;
- an existing server_config.json file.
If none is found, the function prints an error message and returns an empty dictionary. If one of the files exists and the corresponding parser package is available, we parse the file and return a Python dictionary.
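For example (the explicit path is illustrative):

# Uses server_config.yaml / server_config.json from the current directory:
servers = _load_config()
# Or an explicit path passed through the `config` keyword argument:
servers = _load_config(config='deploy/servers.yaml')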
Server selection task
Now we need to actually tell fabric to load and parse the config files, and we need a way to choose which servers we’ll be using for task execution. For this purpose we have created a single task, s. You call it before your own tasks and pass it server names the way you would pass arguments to any fabric task:
$ fab s:server_a,server_b,server_group mytask mytask2
Additionally, you can specify the location of the config file:
$ fab s:server_a,server_b,config=servers.yaml mytask mytask2
Here’s what it does:
def s(*args, **kwargs):
    """Set destination servers or server groups by comma delimited list of names"""
    # Load config
    servers = _load_config(**kwargs)
    # If no arguments were received, print a message with a list of
    # available configs.
    if not args:
        print 'No server name given. Available configs:'
        for key in servers:
            print colors.green('\t%s' % key)
    # Create `env.group` - a dictionary containing copies of configs for the
    # selected servers. Server hosts are used as dictionary keys, which
    # allows us to connect the current command's destination host with the
    # correct config. This is important, because somewhere along the way
    # fabric messes up the hosts order, so simple list index incrementation
    # won't suffice.
    env.group = {}
    # For each given server name
    for name in args:
        # Recursive function call to retrieve all server records. If `name`
        # is a group (e.g. `all`) - get its members, iterate through them
        # and create `group` records. Else, get fields from the `name`
        # server record. If the requested server is not in the settings
        # dictionary, output an error message and list all available
        # servers.
        _build_group(name, servers)
    # Copy server hosts from `env.group` keys - this gives us a complete
    # list of unique hosts to operate on. No host is added twice, so we can
    # safely add overlapping groups. Each added host is guaranteed to have
    # a config record in `env.group`.
    env.hosts = env.group.keys()
s takes as arguments a list of server names and, optionally, a keyword argument config containing the path to the server config file. It starts by calling _load_config to parse the config file. If no server names were specified, s will print all the server names found by _load_config.
We create a group dictionary in env, and for each server name the user has specified we call the _build_group function which, as we’ll see soon, modifies env.group to store the server configs. The last thing we do in s is rewriting env.hosts with env.group.keys(), which will influence the next task. At this point env.hosts will contain a list of server hosts that:
- belong to the servers that the user specified directly;
- belong to the servers that appear inside the server groups the user specified.
Let’s look at the _build_group function next to see how this list gets computed:
def _build_group(name, servers):
    """Recursively walk through the servers dictionary and search for all
    server records."""
    # We're going to reference the server a lot, so we'd better store it.
    server = servers.get(name, None)
    # If `name` exists in the servers dictionary we
    if server:
        # check whether it's a group (a list of member names)
        if isinstance(server, list):
            if fabric.state.output['debug']:
                puts("%s is a group, getting members" % name)
            for item in server:
                # and call this function for each of them.
                _build_group(item, servers)
        # When, finally, we dig through to the standalone server records, we
        # retrieve configs and store them in `env.group`
        else:
            if fabric.state.output['debug']:
                puts("%s is a server, filling up env.group" % name)
            env.group[server['host']] = server
    else:
        print colors.red('Error. "%s" config not found. '
                         'Run `fab s` to list all available configs' % name)
The function starts by checking the servers dictionary (which is just a parsed version of the config file) for an existing name key, which indicates whether the user specified a valid server name. If the key was found, we check whether its value is a list, in which case we call _build_group for each item of that list, or a dictionary, in which case we add the server config to the env.group dictionary using server["host"] as the key.
Thus, for each server name that is either directly specified by the user or is a member of a user-specified server group, we store its config dictionary inside env.group[server['host']]. This is exactly why host is required in server configs: we need a way to build a host list for fabric, and a way to connect the server we are currently executing a task on with its config dictionary.
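To make this concrete, here is roughly what env.group and env.hosts would contain after fab s:server_a,server_b with the example config from the “Solution in a nutshell” section (values are illustrative):

env.group = {
    'a.example.com': {'host': 'a.example.com', 'webserver': 'apache',
                      'repository': 'git@example.com:project.git'},
    'b.example.com': {'host': 'b.example.com', 'webserver': 'nginx',
                      'web_folder': '/var/www/project'},
}
env.hosts = ['a.example.com', 'b.example.com']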
We could stop at this point, since we have everything we need for a working solution to the stated problem:
$ fab s:server_a,server_b mytask
will allow us to read the current server’s options from within mytask; e.g. to get the repository option value for the current server we could do:
repository = env.group[env.host_string]["repository"]
but this looks somewhat ugly and doesn’t really qualify as “making the common thing easy”. There are a number of things we could do to simplify this, one of which is to define a helper function:
def _get(key):
    return env.group[env.host_string].get(key, None)
This way, _get("repository") will return either the option value or None if the option isn’t defined for the current server. This is much nicer to use, but one problem still remains: this solution isn’t compatible with the usual way of storing server settings directly as attributes of env (which is common inside special initialization tasks). This means that one task won’t work out of the box with both types of server initialization.
Task setup decorator
To resolve this issue we’ll need to copy every key of the current server’s configuration from env.group[env.host_string][key] to env.key before each task run. Let’s define a _setup decorator that will do just that:
def _setup(task):
    """Copies server config settings from `env.group` dictionary to env variable.

    This way, tasks have easier access to server-specific variables:
    `env.owner` instead of `env.group[env.host]['owner']`
    """
    def task_with_setup(*args, **kwargs):
        # If `s:server` was run before the current command - then we should
        # copy values to `env`. Otherwise, hosts were passed through the
        # command line with `fab -H host1,host2 command` and we skip.
        # (`env.get` avoids an AttributeError when `env.group` was never set.)
        if env.get('group'):
            for key, val in env.group[env.host].items():
                setattr(env, key, val)
                if fabric.state.output['debug']:
                    puts("[env] %s : %s" % (key, val))
        task(*args, **kwargs)
    return task_with_setup
Now all you have to do to make env.key access available inside your task is to decorate it with @_setup. Generally this works just fine, but you should be wary of two things:
- If you define a config option with the same name as one of the built-in env attributes, like host_string or hosts, things might break in unexpected ways.
- Attributes aren’t deleted after a task has finished. This means that if the current server config doesn’t define a key1 option, env.key1 might still exist if one of the previous server configs defined it.
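Putting it all together, a decorated task might look like this (a sketch only: webserver and repository come from the example config, and the shell commands are illustrative assumptions, not part of the template):

from fabric.api import env, run, sudo

@_setup
def deploy():
    # `repository` is read from the current server's config record.
    run('git pull %s' % env.repository)
    # `webserver` decides which process gets restarted (commands are
    # illustrative).
    if env.webserver == 'apache':
        sudo('/etc/init.d/apache2 restart')
    else:
        sudo('/etc/init.d/nginx restart')

You would invoke it with, e.g., $ fab s:server_a,frontend-servers deploy.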
The whole thing
The complete source code contains everything described here, except the _get() function example. Note that the link above points to the most recent published version of the script, while the code snippets in this post may be outdated.