Attempting some permalink magic
My good friend and colleague over there in the Valley of the Sun, Tim Heuer, recently blogged to ‘respect the permalink’. This was in response to a blog post that caught Tim’s attention, entitled ‘.aspx considered harmful’. The point in contention is that file types, as part of a blog posting’s URL, are a bad thing because if you ever switch blogging engines, you run the risk of ending up with different permalinks than your past (and previously indexed) blog entries. Tim’s observation is that it doesn’t matter what the URL looks like, just as long as you handle the past permalinks accordingly when you do switch blogging engines.
Tim uses this very (this one, the one you’re reading right now) blog as an example of … get this … the right way to handle it. Tim points out my little experiment back in the fall when I switched my blogging engine of choice from Community Server to Wordpress. This was just a little experiment on my end to get a taste of using/customizing/enhancing a PHP application. Nothing more. Nothing less. However, I had this very problem pop up: how do I handle all of my old indexed content? The URLs stuffed into the depths of Google et al. would be turning up 404s left and right. Surely not the experience I wanted my readers to face.
Tim asked how I accomplished such a task. Well, Tim… it wasn’t easy!
I have been blogging since 2003. My blog started out running on .Text and then moved to various editions of Community Server. A logical progression, as the .Text code was ingested by the Community Server project. The team over at Telligent did a stand-up job of handling the old permalinks of .Text inside of Community Server. However, moving from Community Server to Wordpress, I had to come up with a way to handle the old Community Server permalinks inside of Wordpress. As an example, the old .Text blog rendered its permalinks with an associated PostID, http://davebost.com/blog/archive/2003/11/05/145.aspx. Later versions of Community Server offered up a more customary permalink, as in http://davebost.com/blog/archive/2007/06/21/walking-through-a-movie-shoot.aspx. Wordpress, however, worked with a slightly different permalink pattern – http://davebost.com/blog/index.php/2007/06/21/walking-through-a-movie-shoot/; notice the index.php stuffed inside the URL. What I wanted was a URL clean of any file types – period. No ‘.aspx’. No ‘.php’. Just a standard (is it a standard?), clean permalink URL – http://davebost.com/blog/2007/06/21/walking-through-a-movie-shoot.
Thankfully, Wordpress comes with a plethora of plug-ins to enhance the Wordpress experience. To remove the index.php from every single blog post and to prevent “locking” myself into Wordpress, I downloaded and installed the ‘Remove index.php from Permalinks in IIS’ plug-in. Not one for a catchy name, but it does the trick. Now all my new posts in Wordpress would follow the url pattern of http://davebost.com/blog/{yyyy}/{mm}/{dd}/{post title}.
The next challenge was to handle all of the old URLs. There are many articles on the web on using ISAPI filters in IIS or mod_rewrite in Apache to handle such tasks. However, I’m running my blog on an IIS server hosted at WebHost4Life, and I don’t have the ability to just dump an ISAPI filter willy-nilly on the box. I needed another way.
In ASP.NET you have the ability to handle tasks before the application processes the page request through the Global.asax file. I was looking for a PHP equivalent to the Global.asax but couldn’t find one. If something like this does exist in PHP, please contact me and enlighten me. Because my blog is running on an IIS box, I kind of cheated with my solution. There’s nothing stopping ASP.NET and PHP from both handling requests at http://davebost.com/blog, so I thought to myself… “can I pre-process a page request for a Wordpress blog with a Global.asax file?” It turns out, you can!
Essentially here are the steps I use in the Global.asax file to accomplish my permalink conversion magic…
- At Application_BeginRequest, peek at the requested URL. If ‘/archive/’ is present in the URL, convert the URL to our new permalink structure (http://davebost.com/blog/{yyyy}/{mm}/{dd}/{post title}).
- Maintain a collection of PostID mappings to post titles (a reference dictionary). I created a query against my Community Server database that rendered the .NET code to create a dictionary of references between PostID and PostTitle. This dictionary is built during the Application_Start event in the Global.asax file and stored in cache. This will handle the old .Text URLs, i.e. http://davebost.com/blog/archive/2003/11/05/145.aspx.
- If the URL is determined to contain a number as the file name (i.e. 145 in 145.aspx), use the reference dictionary to look up the post title and build the URL per our template.
- Community Server generated some Unicode characters in URLs for post titles that had special characters (“:”, “?”, “-”, etc.). Wordpress either ignores these special characters or uses a hyphen (-) as a placeholder. In this case, I created a dictionary to maintain these rules. Once again, this dictionary is loaded at Application_Start and is cached. The post title is processed against this rules dictionary to handle the special characters accordingly.
- Once the URL is converted over to the new URL pattern, a 301 Redirect is sent to notify the interested parties that this URL has “permanently moved”.
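The steps above can be sketched roughly as follows. This is Python rather than the actual Global.asax code, and it is only an illustration of the logic: the dictionary entries, the encoded sequences in the rules table, the lowercasing step, and the function names are all made up for the example and are not taken from the real implementation.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical sample data. The real mapping was generated from the
# Community Server database and cached at application start.
POST_ID_TO_TITLE: Dict[int, str] = {145: "hypothetical-post-title"}

# Hypothetical rules: encoded special-character sequences on the left,
# the replacement (nothing, or a hyphen) on the right.
TITLE_RULES: Dict[str, str] = {"_3A00_": "", "_3F00_": "", "_2D00_": "-"}

# Matches .../archive/yyyy/mm/dd/name.aspx and keeps everything before /archive/.
LEGACY_PATTERN = re.compile(
    r"^(?P<prefix>.*?)/archive/(?P<y>\d{4})/(?P<m>\d{2})/(?P<d>\d{2})/"
    r"(?P<name>[^/]+)\.aspx$",
    re.IGNORECASE,
)


def convert_legacy_url(url: str) -> Optional[str]:
    """Return the new-style permalink, or None if the URL is not a legacy one."""
    if "/archive/" not in url:
        return None
    match = LEGACY_PATTERN.match(url)
    if match is None:
        return None

    name = match.group("name")
    if name.isdecimal():
        # Old .Text style URL: the file name is a PostID, so look up the title.
        title = POST_ID_TO_TITLE.get(int(name))
        if title is None:
            return None  # unknown PostID; let the request fall through
    else:
        title = name

    # Apply the special-character rules to the post title.
    for encoded, replacement in TITLE_RULES.items():
        title = title.replace(encoded, replacement)
    title = title.lower()  # assumption: the new permalinks use lowercase titles

    return "{}/{}/{}/{}/{}".format(
        match.group("prefix"), match.group("y"), match.group("m"),
        match.group("d"), title,
    )


def redirect_for(url: str) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return (status, headers) for a permanent redirect, or None to pass through."""
    new_url = convert_legacy_url(url)
    if new_url is None:
        return None
    return 301, {"Location": new_url}
```

Calling `redirect_for` with one of the old ‘/archive/’ URLs gives back a 301 status and a Location header carrying the converted URL; anything else returns None, so the request carries on to Wordpress untouched.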
If you’re interested in this code, I’d be more than happy to share it. I’m sure there are other ways to do this. This solution isn’t the most elegant, but it does the job. Or it seems to be doing the job. So much so that it seems the search engines are catching up. I tried searching for some old URLs but found that the search engines had the new URL pattern for my old blog posts. There may be some edge cases. If you encounter a 404 from a previous link, please let me know.
Thanks Tim, for calling me out. I’ve been meaning to post this for quite a while. You forced my hand and now maybe someone else can learn from my adventures.

I’m about to embark on the same journey, and I have to admit, I was pretty surprised to find someone that had been through this process. How did you handle the actual migration of data from the CS database to the WP database? Any gotchas folks should be aware of or specific guidance you might have?
Also, I’d like to take you up on that offer to have a look at your URL rewriting code :)
-k
Belay that comment. I’ve put together a WordPress importer to handle data migration from Community Server. You can download it here: http://www.bettersoftwarenow.com/2008/08/02/migrating-community-server-to-wordpress/
Thanks again for the excellent tips on handling the URL rewriting in WordPress.
-k
Kristopher Cargile
http://www.bettersoftwarenow.com
Dave,
I’m struggling with this exact same problem at the moment. Would you be able to share your code/binaries with me?
Thanks in advance, Niall