Nov 12, 2010
All Drupal developers have experienced this nightmare. You have been creating a great project, now it’s time to deploy it. No one wants to deal with it. No one wants to be the responsible of doing it.
If you are building a simple site, and there’s only a one time deployment, it’s not a big deal. You just have to follow the largely discussed and know simple rules.
The rules
- Use SVN (or CVS) to keep your code updated among the developers
- Deploy to production by backing up/restoring the database
- Create scripts to make the deployment process automatic
That’s all, the unique problem you could have is maybe that you will have to delete the test content/users and change some settings before going live, but in summary, it’s pretty easy.
When it’s about deploying code, it’s still easy to do if you followed the simple rules. You will have only to update your SVN branch and you are done.
When does the nightmare begin?
The problem is deploying changes made in the CMS, but without wiping all the existing data. In other words, merging the database changes, the settings and content stored in the database.
Scenario 1: Big (or huge) Drupal projects
You have to deploy a project in multiple phases. So while the first phase is deployed and working in production, you have a bunch of developers working on the next phase (this means, implementing new code, creating new nodes, creating new views, editing existing ones, changing settings, users, permissions, etc.)
Let me illustrate the problem:
The first release of the site is deployed, the database is identical. The site is looking good and working great. Everyone is happy.
A couple of days later, on the production site:
- A content editor creates a new node; Drupal assigns it a nid (let’s say 100)
- The content editor edits an existing node, change a node reference field, and guess what, it’s changing the node reference to the newly created node! (nid=100)
- The site administrator changes some parameters in the configuration of the site (which ones? I don’t know)
- The site administrator changes even more configuration in some views (let’s say changed items displayed per page, and activated some caching options)
- New users enter to the site and register themselves. They’re assigned some roles and they start creating some content as well.
While on the development site:
- A developer is creating a new node for a feature required for phase 2. This new node should persist in production in the next release. Drupal assigns it a nid (Yes, you are right, it’s nid 100 again!)
- Another developer is creating new test content in order to test a new module he is developing. It’s working great; but wait a minute, all this new data should not be migrated to prod in the next release, right?
- Another developer is creating new views, and changing some settings in the entire site. Which of these features should be migrated in the next release? I’m not sure, maybe most of them. Which ones? Hmm…. Again, I don’t know.
A few weeks later: It’s time to deploy, in the development department, some questions arise:
- How do we filter what to migrate and what not migrate? Which content should we migrate?
- Should we use a module like deploy? But it does not have a stable release! Will it work for us?
- What about the settings? should we go one by one reproducing them in the new site?
- Which settings should we reproduce? How will we know that the settings entered by the site administrator in live will be preserved?
- Should we copy all the permissions? Or roles? Or even maybe some users?
- Why Drupal was designed it this way? Why we can’t separate content and settings?
- What will be the downtime for implementing this new release?
- What happens with the nodes that has the same nid? What if they’re linked in some part that any existing module will migrate? (For example, they’re linked from a panel page)
- No one knows the right answer. It’s just too complicated.
Before going to my proposed solution, let’s analyze another scenario.
Scenario 2: A preview environment is required
The client wants a preview environment before doing any changes, and they want to deploy all the changes they make in preview, by just clicking a button.
No, I won’t make a illustration of this scenario. This is not the worst scenario and if the site is not as complex, there could be some easy solutions.
I included this scenario in this post to make a contrast on when you can use and when you can’t use the existing Drupal deployment modules.
The proposed solutions
I will describe two possible solutions:
- Starting from the end, for the scenario when a preview environment is required: use Deployment or Node Export module
- For Big/Huge Drupal projects or when the first solution is not enough: Use DB synchronization software
A first solution: Use Deployment or Node Export modules
If the site is not complex, and you just want to deploy content, you could try one of these modules:
- Deployment module
- Node export module
Maybe there are other modules, but I think these ones are the most mature and that fits the best for this job.
Deployment module could suit perfectly for this scenario; you can see a screencast of how it works. The drawbacks of deploy module are described in their project page:
- “Deploy is under active development. API and/or UI changes can and almost certainly will happen. Deploy for Drupal 7 will likely be a very extensive rewrite”
- “The Drupal 5 version of Deploy is currently non-functioning and I do not plan to revive it”
- There’s no a stable release yet
Because of those drawbacks, we could consider using Node Export Module. From their project page: “The export module allows users to export an existing node and then import it into another Drupal installation”
Using this module, you can allow the content editor to enter to the site’s content page, and export the nodes and then import them in the production site.
If you want to give your users even a better way to find the nodes to export, you could create a view, expose some filters and use the Views Bulk Operations module to add an Export button in the view.
When these solutions are not enough?
I’d say that you will run into problems when your site is complex. The examples that I can think on are ones that I’ve had in real projects, but I’m pretty sure that a lot of people have had more problems, beyond the ones I’m describing below.
1. You have rules that create related content.
The general rule could be that you want rules to be executed in both environments; since we’re using Drupal functions to create the content, the rules will be executed in both sides. This is great. (And this is maybe the reason of why we want to use these modules).But there could be some cases when you don’t want that behavior. Most people will ask me... when? Well, these cases only appear in real life projects, and we never read about them in the theory or see them in the demo screen casts.
You have the A and B content types. When the A node is created, you have to create also a B node, that has a node reference to the newly created A node. Great, this is easy to implement, you just have to create a rule.
So, the editor creates an A node in PREVIEW environment, the node B is created automatically by a rule. After creating the A node, the editor edits some of its fields, and edits the B node as well. Now they want it to be deployed in production.
They use deploy or node export and the nodes are in production now. The problem is that now in production you have one A node and two B nodes.
This specific problem could be solved by deactivating rules or modifying them (We could add PHP code to verify if the node exists before creating it).
Anyway I wanted to point out this problem as an example of what could happen.
2. You want also to export settings, not only nodes
What if you want an editor to be able to change a lot of settings in preview, and not on production directly? These modules won’t help us to deploy those settings. Deploy module tries to overcome this limitation, but we know that only some basic settings of Drupal can be exported. If you use a lot of modules, all those settings will not be migrated.
3. You want to export views, panels, and others
None of them are supported yet but the modules mentioned above. You have to export them and import them manually (Or create them again) in the other site.
The problem is even bigger if you are adding new nodes contents to a panel page.
A second solution: DB Synchronization
I’d recommend this solution for the scenario of big Drupal projects, (and when the first proposed solution is not enough)
If you have been googling about how to deploy a Drupal site, I’m sure you found a lot of posts that can only give you advices. They will tell you to use SVN, to use deploy or node export (or a similar module), to test the site a lot after a migration. But no one will tell you the solution. No one has created a tested solution that can be always used.
My intent in this post isn’t describing a complete solution, but it’s to describe an approach that worked for some Enterprise Drupal applications.
In simple words, the solution is: use DB comparison software instead of Drupal modules to perform the migration.
Why?
- DB comparison tools are tested, and work fine. I don’t have to deal with bugs, (bugs in my application or bugs in deployment modules)
- The DB always contain the latest data, I don’t have to trust that everyone is recording a log to be reproduced in the other environment
- The migration can be done running a script, and not following a lot of steps. This will reduce the downtime when deploying a new version.
Certainly there are some drawbacks. You could ask me:
- The nids can collide; how do I migrate content? Other primary keys can collide as well.
- How to track what to migrate and what not to migrate?
Ok, I’m going to describe the process and these questions will be answered. I’m going to focus only on the DB migration and not the code because that should be migrated using a SVN merge.
DB migration, how the process works
1. Make an initial release, with all the data
You make an initial release of all the data: content, users, roles and settings in production site are the same that in development site. Database structure is identical.
2. Keep a snapshot of the identical database
It’s a good idea to keep a snapshot of the initial database that you are deploying. Why? You will need it later. Keep reading.
3. Just after you made the initial migration, you run a script in PRODUCTION, so the nids (and all the auto-generated numbers) will start in a higher number.
We don’t want collisions of ids, right? The idea is to separate a quantity of ids that can be used in development. While in production the new ids start in a higher number. The tables, we would look like this:
To change the ids that are going to be generated, the MySQL sentence to be used is:
ALTER TABLE node AUTO_INCREMENT = 1000;This sentence will reserve the nids up to a thousand.
Obviously “node” is not the unique table that you want to deal with. But you don’t want to go to the database and search which fields are auto-numeric, do you?
You can find which tables have an auto-increment field, by using SHOW TABLES and SHOW COLUMNS FROM <table> WHERE extra like '%auto_increment%' scripts.
Also, there are tables that for sure you don’t want to include. You don’t want to deploy the watchdog log, or the caches tables, right?
All this process could get complicated if you are using a lot of modules in your site, so I decided to put it all together in a PHP script. Note that I’m not running the alter scripts, but just printing them. I’d like to review the script before running it in the DB, right?
<?php
// my DB prefix
$prefix = 'cms_';
$not_update_tables = array(
'accesslog',
'batch',
'boost_cache',
'boost_cache_relationships',
'boost_crawler',
'ctools_css_cache',
'ctools_object_cache',
'devel_queries',
'devel_times',
'flood',
'location_search_work',
'node_comment_statistics',
'node_counter',
'search_dataset',
'search_index',
'search_node_links',
'search_total',
'sessions',
'views_object_cache',
'watchdog',
// ANY OTHER TABLE YOU DON'T WANT TO INCLUDE
// BECAUSE YOU WILL NEVER MIGRATE IT FROM DEV TO PROD
);
$query = "SHOW TABLES";
$table_result = db_query($query);
$fields_info = array();
while ($table = db_result($table_result)){
$query = "SHOW COLUMNS FROM %s where extra like '%auto_increment%';";
$result = db_query($query, $table);
$process = TRUE;
// check if we should not process this table
foreach ($not_update_tables as $not_update_table){
$not_update_table = $preffix . $not_update_table;
if ($table == $not_update_table){
$process = FALSE;
break;
}
}
if (!$process){
continue;
}
while ($field = db_fetch_array($result)){
$query = "SELECT max(" . $field['Field'] . ") FROM $table";
$max = db_result(db_query($query));
$field_info = array(
'table' => $table,
'field' => $field['Field'],
'max' => $max,
);
$already_inserted = FALSE;
foreach($fields_info as $values){
if ($values['table'] == $table && $values['field'] == $field['Field']){
$already_inserted = TRUE;
}
}
if (!$already_inserted){
$fields_info[] = $field_info;
}
}
}
print "<h1>Tables/fields to be updated</h1>";
print "<table><tr><td>Table</td><td>Field</td><td>Current Max value</td></tr>";
foreach($fields_info as $field){
print "<tr><td>" . $field['table'] . "</td><td>" . $field['field'] . "</td><td>" . $field['max'] . "</td></tr>";
}
print "</table>";
print "<h1>Queries</h1>";
foreach($fields_info as $field){
$table = $field['table'];
$max = $field['max'];
if ($max < 500){
$max = 1000;
}elseif($max < 5000){
$max = 10000;
}elseif($max < 20000){
$max = 50000;
}elseif($max < 50000){
$max = 100000;
}elseif($max < 100000){
$max = 150000;
}
print "ALTER TABLE $table AUTO_INCREMENT = $max;<br/>";
}
?>
IMPORTANT: (a.k.a. DANGER) this script could not be complete for your site. You have to review which tables you must exclude based on the project you are working on. You have also to review the code for generating the new “auto_increment” values.
How did I determine which table should be excluded? I had to navigate through all the tables in the project and exclude the ones that I’m completely sure that we don’t want to migrate between the environments.
4. Phase 2 is ready, we want to deploy it
You have to use a DB comparison and synchronization tool in order to create a script to migrate the data from the DEV server to the PRODUCTION server.
The first recommended step is deleting all the test data in DEV, that way we won’t be passing it to PRODUCTION.
Do you remember we saved a snapshot of the database we deployed in the first release? It will be useful at this point. Restore it somewhere. Currently we have 3 databases:
4.1. Compare structure
Probably the structure of the database changed because you installed/uninstalled modules. The first thing to do is a structure comparison. Based on the comparison you can either decide: 1. to install the modules in PROD manually, so the structure keeps similar, or 2. include the structure synchronization into the script that you will run later over PRODUCTION database. The script generated could be something like:
// you installed the new memberships moduleCREATE table memeberships (mid int NOT NULL PK, description varchar(50), …);
4.2. Determine which changes were made in DEV and need to be migrated to PRODUCTION
We make a comparison between the DEVELOPMENT database and the INITIAL SNAPSHOT database. We exclude all the tables that we don’t want to migrate. (The same ones that we talked before… watchdog, cache, logs, etc.) The script generated could be something like:
INSERT INTO node (nid, type, title,…) VALUES (4, ‘Page’, ‘A node created in DEV after the first release’, …);INSERT INTO node (nid, type, title,…) VALUES (5, ‘Story’, ‘Yet another node created in DEV’, …);
UPDATE TABLE variable SET value = ‘s:7:”garland”’ WHERE name = ‘theme_default’;
UPDATE TABLE variable SET value = ‘s:13:”development-mail@mysite.com”’ WHERE name = ‘site_mail’;
4.3. Determine which changes were made in PRODUCTION and need to be kept
We need to make a comparison between the PRODUCTION database and the INITIAL SNAPSHOT database. We might want to compare only the settings tables. So we should exclude all the content_* tables and logs/caching tables. The script generated could be something like:
UPDATE TABLE variable SET value = ‘s:13:”production-mail@mysite.com”’ WHERE name = ‘site_mail’;
We may want to remove some INSERTS and DELETES from this script. Why? Because this script will be run in the same production database, just to make sure that the settings are kept, and not changed by a similar UPDATE sentence generated by the other script.
You could have a bunch of options that were changed, not only in the variable table. You will have to determine which ones need to be kept.
4.4. Put all the scripts together and test them
You have to put all the scripts together. First the structure changes, second the changes to be moved from DEV to PROD, and finally, the UPDATES statements of the changes to be kept in PROD. In our sample, our script will look like this:
/* STRUCTURE CHANGES*/CREATE table memeberships (mid int NOT NULL PK, description varchar(50), …);
/* DATA DEPLOYED FROM DEV */
INSERT INTO node (nid, type, title,…) VALUES (4, ‘Page’, ‘A node created in DEV after the first release’, …);
INSERT INTO node (nid, type, title,…) VALUES (5, ‘Story’, ‘Yet another node created in DEV’, …);
UPDATE TABLE variable SET value = ‘s:7:”garland”’ WHERE name = ‘theme_default’;
UPDATE TABLE variable SET value = ‘s:13:”development-mail@mysite.com”’ WHERE name = ‘site_mail’;
/* VALUES TO BE KEPT IN PRODUCTION */
UPDATE TABLE variable SET value = ‘s:13:”production-mail@mysite.com”’ WHERE name = ‘site_mail’;
Make a backup of your latest PRODUCTION database and test the script on that database.
4.5. Time to deploy
You have to have the code updated using SVN. Now it’s time to merge the database. I would recommend following these steps in PRODUCTION before running the script:
- stop running cron
- put the site in maintenance mode
- make a backup of database
Then we run the database script. If no error, great, let’s turn the site on again. If any error occurs, you can always restore the backup you made a minute ago. (It has the latest data, no one made any modification in the database, the site is in maintenance mode, remember?)
Downtime? Just couple of minutes, and you are sure that all the changes were migrated. Isn’t this great?
What’s next?
There’s a lot more to talk about, a blog post is not enough. There are still some questions that need to be answered. I’d like to describe a complete scenario on how to implement a complete scenario and how to automate all the process, but that will be another post.
Related Insights
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.