Today I've been working on a WordPress site that needed to be moved to another server and to another domain. The site had quite a few absolute internal links in its content, with URLs like http://oldsite.com, and I needed to replace them with http://newsite.com. Exact same URL structure; only the domain name had to change. It could have been pretty tedious and boring to go over all these pages and check their source to make sure none of them contained a reference to oldsite.com. But in my extreme proactive laziness, I found a solution that makes this task a lot more fun, and reusable too. I used the Anemone and Nokogiri Ruby gems. Here is the code:

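require 'rubygems' # only needed on Ruby 1.8; harmless on newer versions
require 'anemone'
require 'nokogiri'

# Crawl the new site, starting from its home page.
Anemone.crawl("http://www.newsite.com/") do |anemone|
  anemone.on_every_page do |page|
    # page.doc is the parsed Nokogiri document; skip pages without one.
    next unless page.doc

    page.doc.css('a').each do |link|
      href = link['href'].to_s
      # Report every link that still points at the old domain.
      puts "#{href} in #{page.url}" if href =~ /oldsite\.com/
    end
  end
end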

Here is what the script does: Anemone crawls all the pages of newsite.com. For each page, I use Nokogiri to find its links, then loop over them and print the ones whose href contains “oldsite.com”, along with the page they were found on.

After running the script for a couple of minutes, I got this output:

http://www.oldsite.com/the-link in http://www.newsite.com
http://www.oldsite.com/wp-content/uploads/2010/09/dude1.jpg in http://www.newsite.com/the-guy

Then all I had to do was go to the WordPress admin at www.newsite.com/wp-admin and replace the old domain in each of the URLs listed.
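I did the replacement by hand, but since only the domain changes, the rewrite itself is mechanical. As a rough sketch (swap_domain is a hypothetical helper, and this only works on plain strings, not on the WordPress database), it could look like this:

```ruby
# Hypothetical helper: swaps old_domain for new_domain in absolute http(s)
# URLs, leaving the scheme, an optional "www." prefix, and the path untouched.
def swap_domain(text, old_domain, new_domain)
  pattern = %r{(https?://(?:www\.)?)#{Regexp.escape(old_domain)}}
  text.gsub(pattern) { "#{$1}#{new_domain}" }
end

content = 'See <a href="http://www.oldsite.com/the-link">this</a> and http://oldsite.com/about'
puts swap_domain(content, 'oldsite.com', 'newsite.com')
# prints: See <a href="http://www.newsite.com/the-link">this</a> and http://newsite.com/about
```

This is handy for previewing what a page's content should look like after the change, before pasting it back into the editor.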

Done!