Created: 12 Jul 2025, last update: 12 Jul 2025
From WordPress to Sitecore – A New Migration Approach with SitecoreCommander
Why a new tool?
Some time ago, I built a simple tool to migrate content from WordPress to Sitecore. That old tool worked fine for basic blog posts, but it had limitations when dealing with more complex block types like galleries. Also, it wasn’t designed to extract components.
The new migrator is based on the default XML export from WordPress, which includes almost everything: users, media, content, tags, and even data from various plugins. In this demo version of the migrator, support has been added for Gutenberg blocks and the Yoast SEO plugin. It works with SXA Headless and Sitecore XM Cloud, and can also be adapted for Sitecore XP.
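To give an idea of what the export contains, here is a minimal sketch of reading posts and Yoast fields from the XML. It is written in Python for brevity and is not the migrator's own C# code. The namespace URIs assume a WXR 1.2 export, and the two `_yoast_wpseo_*` meta keys are the ones Yoast commonly writes; check both against your own export file.

```python
import xml.etree.ElementTree as ET

# Namespaces as declared in a WXR 1.2 export (assumption: verify in your file).
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "wp": "http://wordpress.org/export/1.2/",
}

def read_posts(wxr_xml: str) -> list[dict]:
    """Return title, post type, body and Yoast SEO fields for each <item>."""
    root = ET.fromstring(wxr_xml)
    posts = []
    for item in root.iter("item"):
        # Plugins such as Yoast store their data as key/value post meta.
        meta = {
            m.findtext("wp:meta_key", default="", namespaces=NS):
                m.findtext("wp:meta_value", default="", namespaces=NS)
            for m in item.findall("wp:postmeta", NS)
        }
        posts.append({
            "title": item.findtext("title", default=""),
            "type": item.findtext("wp:post_type", default="", namespaces=NS),
            "body": item.findtext("content:encoded", default="", namespaces=NS),
            "seo_title": meta.get("_yoast_wpseo_title", ""),
            "seo_description": meta.get("_yoast_wpseo_metadesc", ""),
        })
    return posts

# Tiny hand-written sample, not a real export.
SAMPLE = """<rss version="2.0"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:wp="http://wordpress.org/export/1.2/">
<channel>
<item>
  <title>Hello</title>
  <content:encoded><![CDATA[<!-- wp:paragraph --><p>Hi</p><!-- /wp:paragraph -->]]></content:encoded>
  <wp:post_type>post</wp:post_type>
  <wp:postmeta>
    <wp:meta_key>_yoast_wpseo_metadesc</wp:meta_key>
    <wp:meta_value>Short description</wp:meta_value>
  </wp:postmeta>
</item>
</channel>
</rss>"""

posts = read_posts(SAMPLE)
```

Each entry in `posts` is a plain dictionary, so the body can be handed to a block parser while the SEO fields are mapped to whatever metadata fields your Sitecore templates define.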
Migrating just raw content is relatively easy; some (headless) CMS platforms offer that out of the box. But with this new migrator, we want to go a step further and also migrate the layout. This includes the structure of components, columns, and positioning, things often hidden inside WordPress shortcodes, Gutenberg blocks, or content created with page builders like Elementor or WPBakery.
The challenge of layout migration
Unlike Sitecore, WordPress doesn’t really support the concept of “shared data,” except for media and tags. That limits content reuse. In this demo, all migrated components are placed directly under the page item. But you're free to add logic that moves components to a shared location if needed.
It’s important to know: this is a tailor-made migration, not a plug-and-play “download and run” tool. Every migration is different and requires customization. This tool gives you a strong starting point, but you’ll need to adapt it to match your content model and component setup.
WordPress stores content as a mix of raw HTML, shortcodes, and Gutenberg block markup with JSON attributes inside post bodies. Layout and content are often tightly coupled, and plugins can inject arbitrary custom data. This makes full-fidelity migration to a structured CMS like Sitecore both interesting and challenging.
This demo focuses on Gutenberg blocks, the default editor format, but WordPress content may also include shortcodes, custom fields (like ACF), reusable blocks, or content from visual builders like Elementor or WPBakery. These formats store content in different ways: sometimes as raw HTML, sometimes as JSON, and sometimes even outside the standard export. Each format comes with its own migration challenges and may require custom logic or additional exports.
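Before choosing a parsing strategy, it can help to check which of these formats a post body actually contains. The following Python sketch is a rough heuristic, not part of SitecoreCommander. It assumes Gutenberg blocks appear as `<!-- wp:... -->` comments and that WPBakery shortcodes start with `vc_`. Data stored outside the body, such as ACF fields or Elementor data in post meta, is not covered.

```python
import re

# Gutenberg serializes blocks as HTML comments that start with "wp:".
BLOCK_COMMENT = re.compile(r"<!--\s*wp:[a-z][a-z0-9/_-]*")
# Opening shortcodes such as [gallery ids="1,2"]; closing tags like [/x] are skipped.
SHORTCODE = re.compile(r"\[([a-z][a-z0-9_-]*)(?:\s[^\]]*)?\]", re.IGNORECASE)

def detect_formats(body: str) -> set[str]:
    """Return the content formats found in one post body (heuristic)."""
    formats = set()
    if BLOCK_COMMENT.search(body):
        formats.add("gutenberg")
    for match in SHORTCODE.finditer(body):
        name = match.group(1).lower()
        # Assumption: WPBakery shortcodes use the "vc_" prefix (vc_row, vc_column).
        formats.add("wpbakery" if name.startswith("vc_") else "shortcode")
    if not formats and body.strip():
        formats.add("html")  # nothing special found: treat as plain HTML
    return formats

gutenberg = detect_formats("<!-- wp:paragraph --><p>Hi</p><!-- /wp:paragraph -->")
bakery = detect_formats("[vc_row][vc_column]text[/vc_column][/vc_row]")
gallery = detect_formats('[gallery ids="1,2"]<p>x</p>')
plain = detect_formats("<p>plain</p>")
```

Bracketed text such as `[sic]` also matches the shortcode pattern, so treat the result as a hint for routing posts to the right renderer rather than a guarantee.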
What is SitecoreCommander?
SitecoreCommander is a C#-based tool used to manage and automate Sitecore tasks. It's typically used by developers and administrators working with the Sitecore CMS. It supports:
- Running admin tasks through the Sitecore APIs
- Importing data from XML files (like WordPress XML)
- Translating content into structured, flexible Sitecore components
- Using GraphQL APIs to fetch or write content
- Full development and debugging experience in Visual Studio
This WordPress migrator is built on top of SitecoreCommander and uses a modular structure to make it easy to extend.
Key components
- WordPressSampleImport.cs: main import logic
- eXtendedXml.cs: loads the WordPress XML into a model
- WordPressMediaTextRenderer.cs: renders wp:media-text blocks into Sitecore components
- WordPressGalleryRenderer.cs: processes gallery blocks and shortcodes
- SmalleContentRenderer.cs: handles sogutenberg smallecontent blocks
Lessons learned
- WordPress XML is messier than expected. Gutenberg columns are difficult to detect and really need a proper parser or library to handle them correctly. The mostly AI-generated code for interpreting Gutenberg blocks is helpful but limited. It works reasonably well, but often misses important details like buttons, which end up inside plain rich text fields.
- Component optimization is important. The migrator sometimes generates 20 or more components on a single page. With smarter rules, many blocks can be merged. This improves both performance and maintainability.
- Mapping quality depends on your Sitecore setup. The more precise your component model is, the better the migration will match your actual frontend structure. Generic components lead to less usable content.
- Set sensible default styles. Sitecore XM Cloud allows you to define styling on the component and column level. Define defaults for padding, margins, and other layout values early, so content editors don’t have to manually fix styling later.
- Post-migration layout updates are possible. With SitecoreCommander, you can apply layout changes or updates across a tree of pages after migration. This lets you preserve manually edited content without having to delete and reimport everything.
- Media structure needs attention. WordPress media has no clear folder structure. I initially grouped media by type and year, which produced good results. It’s good to aim for folders with fewer than 100 items each, using a structure that fits your long-term media management strategy.
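The component optimization lesson can be illustrated with a small merge rule. This Python sketch is an assumption about what a "smarter rule" might look like, not the migrator's actual logic. It folds consecutive text-like blocks into a single component, here called `RichText` as a hypothetical name, and leaves every other block as its own component.

```python
from dataclasses import dataclass

# Block names as they appear in the export (wp:paragraph, wp:heading, wp:list).
TEXT_BLOCKS = {"paragraph", "heading", "list"}

@dataclass
class Component:
    kind: str  # hypothetical Sitecore component name, or the original block name
    html: str

def merge_text_blocks(blocks: list[tuple[str, str]]) -> list[Component]:
    """Merge runs of adjacent text blocks into one RichText component."""
    components: list[Component] = []
    for name, html in blocks:
        if name in TEXT_BLOCKS:
            if components and components[-1].kind == "RichText":
                # Previous component is also text: append instead of adding a new one.
                components[-1].html += "\n" + html
            else:
                components.append(Component("RichText", html))
        else:
            components.append(Component(name, html))
    return components

blocks = [
    ("heading", "<h2>A</h2>"),
    ("paragraph", "<p>b</p>"),
    ("gallery", "<figure></figure>"),
    ("paragraph", "<p>c</p>"),
]
components = merge_text_blocks(blocks)
```

In this example, four blocks become three components. On a long article made up mostly of paragraphs and headings, the reduction is much larger.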
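The media grouping from the last lesson can be planned before anything is uploaded. The Python sketch below groups files by type and year and splits any group that would exceed the limit into numbered subfolders. The folder naming and the numbered split are my own assumptions; a split by month or by topic may fit your media strategy better.

```python
from collections import defaultdict
from datetime import date

MAX_PER_FOLDER = 99  # "fewer than 100 items each"

def plan_media_folders(media: list[tuple[str, str, date]]) -> dict[str, list[str]]:
    """Map (file name, MIME type, upload date) tuples to {folder path: file names}."""
    grouped = defaultdict(list)
    for name, mime, uploaded in media:
        kind = mime.split("/")[0]  # "image/png" -> "image"
        grouped[(kind, uploaded.year)].append(name)

    folders: dict[str, list[str]] = {}
    for (kind, year), names in sorted(grouped.items()):
        names.sort()
        if len(names) <= MAX_PER_FOLDER:
            folders[f"{kind}/{year}"] = names
            continue
        # Too many items: spread them over numbered subfolders 01, 02, ...
        for start in range(0, len(names), MAX_PER_FOLDER):
            part = start // MAX_PER_FOLDER + 1
            folders[f"{kind}/{year}/{part:02d}"] = names[start:start + MAX_PER_FOLDER]
    return folders

# 150 images from 2024 plus one PDF from 2023 (made-up data).
media = [(f"img{i:03d}.png", "image/png", date(2024, 3, 1)) for i in range(150)]
media.append(("report.pdf", "application/pdf", date(2023, 11, 5)))
folders = plan_media_folders(media)
```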
A closer look at WordPress HTML quirks
Sometimes you’ll run into tricky Gutenberg block variants. Here’s an example with three slightly different implementations of a quote block:
<!-- wp:quote -->
<blockquote class="wp-block-quote">
<!-- wp:sogutenberg/quote {"naam":"","functie":"","quote":"<em>Quote Text.</em>","imageUrl1":"..."} /-->
</blockquote>
<!-- /wp:quote -->
<!-- wp:sogutenberg/quote {"naam":"Test user","functie":"Director","quote":"\"Test Quote Text.\"","imageUrl1":"..."} /-->
<!-- wp:sogutenberg/quotelogo {"naam":"Test user","functie":"Manager","quote":"<em>\"Test Quote.\"</em>","imageUrl1":"..."} /-->
- The first block is wrapped inside a blockquote, which is easier to detect.
- The second and third are self-closing tags and represent two different block types (quote and quotelogo).
- Most regex-based approaches (like those in the sample code) don’t handle these variations well.
This highlights an important point: every WordPress export has its own edge cases. You’ll often need custom parsing logic for your use case.
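As a starting point for such custom parsing, here is a small Python tokenizer that handles the three quote variants above. It is a simplified sketch based on the comment-delimiter format visible in the export, not the regex used in the sample code and not a full Gutenberg parser. It records each opening or self-closing block with its JSON attributes and nesting depth, and ignores the HTML between the comments.

```python
import json
import re

# Matches <!-- wp:name {json} -->, <!-- wp:name {json} /--> and <!-- /wp:name -->.
BLOCK = re.compile(
    r"<!--\s+(?P<closer>/)?wp:(?P<name>[a-z][a-z0-9_-]*(?:/[a-z][a-z0-9_-]*)?)"
    r"\s+(?P<attrs>\{.*?\}\s+)?(?P<void>/)?-->",
    re.DOTALL,
)

def parse_blocks(body: str) -> list[dict]:
    """Return a flat list of blocks with name, attributes and nesting depth."""
    blocks = []
    depth = 0
    for m in BLOCK.finditer(body):
        if m.group("closer"):
            depth -= 1  # closing comment: leave the current container
            continue
        attrs = m.group("attrs")
        blocks.append({
            "name": m.group("name"),
            "attrs": json.loads(attrs) if attrs else {},
            "self_closing": bool(m.group("void")),
            "depth": depth,
        })
        if not m.group("void"):
            depth += 1  # opening comment: following blocks are nested inside

    return blocks

SAMPLE = r'''<!-- wp:quote -->
<blockquote class="wp-block-quote">
<!-- wp:sogutenberg/quote {"naam":"","functie":"","quote":"<em>Quote Text.</em>","imageUrl1":"..."} /-->
</blockquote>
<!-- /wp:quote -->
<!-- wp:sogutenberg/quote {"naam":"Test user","functie":"Director","quote":"\"Test Quote Text.\"","imageUrl1":"..."} /-->
<!-- wp:sogutenberg/quotelogo {"naam":"Test user","functie":"Manager","quote":"<em>\"Test Quote.\"</em>","imageUrl1":"..."} /-->'''

blocks = parse_blocks(SAMPLE)
```

With the depth value, a renderer can tell that the first `sogutenberg/quote` sits inside a `wp:quote` wrapper while the other two stand alone, and it can dispatch on the block name to treat `quote` and `quotelogo` differently.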
Post-processing tips
As with Sitecore PowerShell, SitecoreCommander makes it easy to update content through the Sitecore APIs after import. For example, the current migrator doesn’t handle things like &nbsp; inside plain text fields. You can clean those up afterward with a post-processing script, without wiping and reimporting all your content. This is especially helpful if you’ve already started editing migrated content manually.
Example
To replace all occurrences of &nbsp; with a regular space (" ") in the Title fields under /sitecore/content, you can use the following code.
var result = await ReplaceFieldFromSubtree.ReplaceAsync(env, "/sitecore/content", "en", "Title", "&nbsp;", " ", "Sample Item");
In this example, a template filter is also active, so only items based on the "Sample Item" template are updated.
You can also use this method to perform replacements in the __Final Layout field. The code is easy to adjust, so you can apply more complex transformations if needed.
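To make the filtering rules concrete, here is an in-memory Python sketch of a subtree replace: path check, optional template filter, then the field replacement. It only illustrates the logic. It is not the SitecoreCommander implementation, which works through the Sitecore APIs, and the `Item` class and function name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Item:
    path: str
    template: str
    fields: dict[str, str] = field(default_factory=dict)

def replace_in_subtree(items: list[Item], root: str, field_name: str,
                       old: str, new: str, template: Optional[str] = None) -> int:
    """Replace old with new in one field for items under root; return the change count."""
    prefix = root.rstrip("/") + "/"
    changed = 0
    for item in items:
        if item.path != root and not item.path.startswith(prefix):
            continue  # outside the subtree
        if template is not None and item.template != template:
            continue  # template filter active and not matching
        value = item.fields.get(field_name)
        if value is not None and old in value:
            item.fields[field_name] = value.replace(old, new)
            changed += 1
    return changed

items = [
    Item("/sitecore/content/Home", "Sample Item", {"Title": "Hello&nbsp;world"}),
    Item("/sitecore/content/Home/News", "Article", {"Title": "Big&nbsp;news"}),
    Item("/sitecore/media library/x", "Sample Item", {"Title": "Out&nbsp;of scope"}),
]
changed = replace_in_subtree(items, "/sitecore/content", "Title",
                             "&nbsp;", " ", template="Sample Item")
```

Only the first item is touched: the second fails the template filter and the third lies outside the subtree.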
Want to try?
Check out the migrator on GitHub: SitecoreCommander, or feel free to reach out if you have feedback, ideas, or edge cases to share.