Roadmap to Enhance the WordPress WXR Importer: A Comprehensive Guide
Hey guys! Today, let's dive into an exciting journey of enhancing the official WordPress WXR importer. Our goal is to make it more robust, efficient, and user-friendly. We'll be leveraging the power of data processing libraries to bring about meaningful improvements. Forget about massive overhauls; we're focusing on delivering small, frequent updates that make a real difference.
Our Mission: Elevating the WordPress WXR Importer
Our primary mission is simple: to improve the official WordPress WXR importer. We aim to ship a series of incremental yet impactful updates to the existing wordpress-importer plugin. We're not looking to reinvent the wheel but rather to refine and optimize what we already have. This approach allows us to deliver value consistently and avoid getting bogged down in lengthy, complex rewrites. Our roadmap focuses on practical enhancements that address common pain points and lay the groundwork for future innovations. By focusing on iterative improvements, we ensure a smoother, more reliable import experience for WordPress users worldwide.
What We're Not Doing
What we're explicitly not aiming for is a complete rewrite from scratch that takes months before the first user-facing change. We believe in agile development and delivering value quickly. So, no getting stuck in rewrite hell for us!
Draft Roadmap: Let's Brainstorm!
Here’s a draft of our roadmap. Let’s discuss and refine it together!
1. Testability: Taming the Spaghetti Code
The current wordpress-importer implementation is, let’s face it, a bit of a tangled mess. It's a spaghetti of wp-admin UI rendering, input processing, and importing logic. This makes it incredibly difficult to unit test the WP_Import class. Testing is crucial for ensuring the reliability of our changes. Without a solid testing foundation, we risk introducing bugs and regressions. Imagine trying to debug a complex system without any way to isolate and test individual components: it's a nightmare scenario. That's why our first step is to untangle the code and make it testable. By breaking the importer into logical components, we can write targeted tests that give us confidence in our changes. This not only improves the quality of our code but also makes it easier to maintain and extend in the future.
- [ ] Split importer code: We need to divide the code into UI, input processing, and importing parts. This separation of concerns will make the codebase more modular and easier to manage.
- [ ] Add a light test harness: A test harness will allow us to ensure that the data importing code remains robust as we make changes. This is crucial for maintaining the stability of the importer.
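To make the split a bit more concrete, here is a minimal, language-agnostic sketch written in Python rather than the plugin's PHP. Every name in it (`Post`, `PostStore`, `InMemoryStore`, `import_posts`) is hypothetical and does not come from the wordpress-importer codebase. It only illustrates the idea: once the importing logic takes its input and its storage as arguments, a test can drive it without any wp-admin UI or request handling.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Post:
    """A parsed post, as the input-processing layer might hand it over."""
    source_id: int
    title: str


class PostStore(Protocol):
    """Whatever persists posts; the importing logic only needs this method."""
    def insert_post(self, post: Post) -> int: ...


@dataclass
class InMemoryStore:
    """Test double standing in for the real database layer (hypothetical)."""
    posts: dict[int, Post] = field(default_factory=dict)

    def insert_post(self, post: Post) -> int:
        new_id = len(self.posts) + 1
        self.posts[new_id] = post
        return new_id


def import_posts(posts: list[Post], store: PostStore) -> dict[int, int]:
    """Importing logic only: no UI rendering, no request parsing.

    Returns a mapping from each source post ID to its newly assigned ID.
    """
    return {post.source_id: store.insert_post(post) for post in posts}
```

A light test harness can then be as small as building an `InMemoryStore`, calling `import_posts`, and asserting on the returned mapping.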
Side Effects: Reusability FTW!
As a fantastic side effect, this will enable reusing the importer in other libraries, like WP-CLI, Blueprints, and Playground. Think of the possibilities!
2. Compatibility: No Host Left Behind
Currently, the wordpress-importer plugin relies on libxml, which means it won’t work on some hosts. That’s quite restrictive, guys. We want it to work on every host!
- [ ] Add an XMLProcessor-based XML parsing driver: This will provide an alternative to libxml and broaden our compatibility.
- [ ] Adjust runtime checks: We need to ensure that the importer can gracefully handle environments where libxml is not available.
- [ ] Add test coverage: Testing is key to confirming that the importer remains functional under the new driver.
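The driver idea can be sketched roughly like this. It is an illustrative Python sketch, not the plugin's PHP code: `ParserDriver`, `pick_driver`, and both parsing functions are hypothetical names, and the naive scanner is only a stand-in for a dependency-free parser such as the XMLProcessor-based one. The point is the shape: each driver reports whether it can run in the current environment, and the importer picks the first one that can.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable


@dataclass
class ParserDriver:
    name: str
    is_available: Callable[[], bool]          # runtime check for this driver
    parse_titles: Callable[[str], list[str]]  # the actual parsing work


def titles_via_elementtree(xml_text: str) -> list[str]:
    """Stand-in for the extension-backed driver."""
    root = ET.fromstring(xml_text)
    return [el.text or "" for el in root.iter("title")]


def titles_via_scanner(xml_text: str) -> list[str]:
    """Deliberately naive stand-in for a dependency-free parser."""
    titles, pos = [], 0
    open_tag, close_tag = "<title>", "</title>"
    while (start := xml_text.find(open_tag, pos)) != -1:
        end = xml_text.find(close_tag, start)
        if end == -1:
            break
        titles.append(xml_text[start + len(open_tag):end])
        pos = end + len(close_tag)
    return titles


def pick_driver(drivers: list[ParserDriver]) -> ParserDriver:
    """Return the first driver whose runtime check passes."""
    for driver in drivers:
        if driver.is_available():
            return driver
    raise RuntimeError("No XML parsing driver is available")
```

Test coverage for the new driver could then feed the same fixture through every driver and assert that the results match.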
By ensuring compatibility across all hosting environments, we're making the WordPress import process accessible to a wider audience. No more frustrating error messages or compatibility issues – just a smooth, seamless import experience for everyone.
3. URL Rewriting: Fixing Broken Links
The wordpress-importer plugin currently doesn’t rewrite absolute URLs in the imported posts and comments. This often results in broken links to the source site. It’s like moving houses and forgetting to update your address: visitors end up at the old place!
- [ ] Integrate structured URL Rewriting: We'll integrate the logic from the StreamImporter class in this repository to fix this issue. This will ensure that all links within the imported content point to the correct location on the new site.
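As a rough illustration of what "structured" rewriting means, compared with a plain search-and-replace, here is a small Python sketch. It is not the StreamImporter logic, whose API is not shown here; `rewrite_url` and `rewrite_html` are hypothetical helpers, and the regex over `href`/`src` attributes is a simplification where a real implementation would use a proper HTML parser. The key idea is to parse each URL, compare host and path against the source site, and rebuild it against the target site while leaving everything else untouched.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Simplification: only double-quoted href/src attributes are considered.
ATTR_URL = re.compile(r'(?P<attr>\b(?:href|src)=")(?P<url>[^"]*)"')


def rewrite_url(url: str, source_base: str, target_base: str) -> str:
    """Move `url` from the source site to the target site if it belongs there."""
    u, src, dst = urlsplit(url), urlsplit(source_base), urlsplit(target_base)
    # Relative URLs and URLs on other hosts are left alone.
    if u.hostname is None or u.hostname != src.hostname:
        return url
    src_path = src.path.rstrip("/")
    # Only rewrite URLs that live under the source site's base path.
    if not (u.path == src_path or u.path.startswith(src_path + "/")):
        return url
    new_path = dst.path.rstrip("/") + u.path[len(src_path):]
    return urlunsplit((dst.scheme, dst.netloc, new_path, u.query, u.fragment))


def rewrite_html(html: str, source_base: str, target_base: str) -> str:
    """Rewrite every matching href/src attribute value in an HTML fragment."""
    def replace(match: re.Match) -> str:
        new_url = rewrite_url(match["url"], source_base, target_base)
        return f'{match["attr"]}{new_url}"'
    return ATTR_URL.sub(replace, html)
```

Because the comparison happens on parsed components, a link to a different domain that merely contains the old site's name in its path is not touched, which is the kind of mistake a naive string replacement tends to make.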
Imagine the frustration of importing a website only to find that half the links are broken. It's a common problem that can be easily avoided with proper URL rewriting. By implementing this feature, we're saving users time and headaches, ensuring a smooth transition for their content.
4. Fast, Concurrent Assets Download: Speeding Things Up
The current wordpress-importer plugin downloads remote assets one by one, which is slow. Let's parallelize those downloads and fetch, say, up to 10 files concurrently. Think of it as going from a single checkout lane at the grocery store to ten: much faster!
- [ ] Reorganize data flows: We need to support initiating the download and continuing the import before the download is complete.
- [ ] Integrate the AttachmentDownloader class: This will enable concurrent downloads and significantly speed up the import process.
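Here is a minimal Python sketch of the bounded-concurrency idea. It does not reproduce the AttachmentDownloader class, whose interface is not described here; `download_all` and its `fetch` parameter are hypothetical. It assumes the caller supplies a function that fetches one URL, caps the number of simultaneous fetches at 10, and collects failures instead of aborting the whole batch.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def download_all(
    urls: list[str],
    fetch: Callable[[str], bytes],
    max_concurrency: int = 10,
) -> tuple[dict[str, bytes], dict[str, Exception]]:
    """Fetch all URLs with at most `max_concurrency` downloads in flight.

    Returns (results, failures), both keyed by URL.
    """
    results: dict[str, bytes] = {}
    failures: dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # Submitting returns immediately; work proceeds in the background.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one failed asset should not stop the rest
                failures[url] = exc
    return results, failures
```

Keeping the failures in their own dictionary also leaves room for a later retry-or-skip decision rather than failing the import outright.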
Time is precious, and nobody wants to wait around for hours while their media files slowly trickle in. By implementing concurrent downloads, we're drastically reducing import times, making the process more efficient and user-friendly. It's a simple change that can have a big impact on the overall experience.
5. Fork the Importer Plugin: A Necessary Step
Existing filters, such as wp_import_categories, assume the entire import context is stored in memory. This isn’t a viable approach for processing larger datasets. To support them, we need to break backward compatibility (BC) on the existing filters without breaking the plugins that hook into those filters. There’s only one way I can think of doing that: forking the importer plugin and removing those filters until a new API settles down.
- [ ] Fork the plugin: This will give us the flexibility to make significant changes without disrupting existing users.
- [ ] Remove all filters and hooks for the time being: This allows us to create a clean slate and develop a more robust and scalable API for future extensions.
Forking a plugin can seem like a drastic step, but it's often necessary when dealing with complex legacy code and the need for significant changes. By forking the importer, we're creating a safe space to experiment and innovate, without the constraints of backward compatibility. This allows us to build a better foundation for the future, ensuring that the importer can handle even the most demanding import scenarios.
6. Naive Large File Support: Breaking the Limits
The wordpress-importer is currently unable to process large files due to two main constraints: PHP request timeout and the memory limit. Let’s break out of those by supporting a re-entrant, multi-request importing flow. First, we wouldn’t store everything in memory. Second, we’d know how to pause the import process and resume it later. Think of it as downloading a large file: you want to be able to pause and resume without starting over from scratch.
- [ ] Store mapping data in the database: We'll store user/post mapping data in the database instead of in memory. This will significantly reduce memory usage and allow us to handle larger imports.
- [ ] Store the current import cursor: Storing the cursor allows us to track the progress of the import and resume it later if necessary.
- [ ] Support resuming the import: This is crucial for handling large files that may exceed PHP’s time and memory limits.
Large files can be a major hurdle for website migrations, and the current importer's limitations can be a real bottleneck. By implementing naive large file support, we're removing these limitations and making it possible to import even the largest websites. This feature is a game-changer for users who need to migrate large amounts of content, ensuring a smooth and hassle-free process.
Explicitly Not Covered (For Now)
Cases explicitly not covered at this stage include unsorted WXR files where parent posts come after their children. We’ll tackle this in a future iteration.
7. TBD: The Future is Bright
The items above will already take some time to implement. Here are some things that would be good to look into afterwards:
- More data formats: Think Markdown, HTML, and more!
- More data sources: WXR URL, Git repo, another WordPress site, an arbitrary URL – the possibilities are endless.
- UI improvements: A dropzone, progress bar, detailed import log and statistics would be fantastic additions.
- Error recovery: Features like “5 media files couldn’t be fetched, do you want to retry? ignore them? upload alternative files?” would greatly enhance the user experience.
Let's Build a Better Importer Together
This is just a draft roadmap, and we want your input! Let’s discuss these points and make the WordPress WXR importer the best it can be. Your feedback and ideas are invaluable in shaping the future of this essential tool. Together, we can create a seamless and efficient import experience for WordPress users around the world.