Why off-the-shelf migration tools did not help

Migration tools: are they worth the money?

Our organization is in the fortunate position that the team is able to migrate our customers’ content ourselves, without the need for third-party tools. And trust me, it is not because we wanted to. We like using good tools, and most of these tools look great on the surface. We have purchased tools and use them on projects where it makes sense to do so, but unfortunately most of the time we have been let down by the experience. Either the customer’s structural requirements are too complex for the tools (complex flattening and merging requirements), or the metadata requirements are more complex than the tools support (managed metadata was a huge issue), or the documents themselves have broken Quick Parts that have lost their relationship with the metadata in the library, and the tools will not fix the content of a document, just move it. None of the migration tools do a good enough job in my opinion; it is the last 2% that makes the difference.

Overview of project

To shift gears for a second, I’d like to paint a quick picture of the system we replaced. The existing application was a bespoke, highly customised SharePoint 2010 document management and quality system. The system was implemented using a variety of custom components such as webparts, timer jobs, workflows, InfoPath forms and event receivers. We replaced these with “modern” alternatives suitable for SharePoint Online – SPFx webparts and forms, plus workflows and webhooks implemented using Azure Functions and storage queues.

Content migration

We had assumed that the most challenging aspect of the project would be the re-implementation of the assortment of SharePoint 2010 components as their newer, cloud-capable alternatives. The other part of the project, the content migration, was supposed to be relatively straightforward. We had invested in an industry-leading content migration tool, successfully tested it on a subset of the customer’s content and were confident that with the necessary amount of finesse (and brute force when required), the tool could handle whatever challenges we would throw at it.

Unfortunately, despite our optimism and promising results from our early migration efforts, the content migration exercise would stymie the commercial migration tool and throw up a series of increasingly complicated challenges for us. We started the project with two parallel streams of work – development of the application components proceeded as planned and we had first versions of the complex event receivers implemented as webhooks within a matter of weeks. By that time, we had expected to have made serious headway into the content migration, but this was not the case.

We had approximately 60,000 individual file versions to migrate and whilst this doesn’t sound like a lot, it was going to test our mettle as experienced SharePoint consultants. Details of the major challenges and how we approached them can be found below.

Simplifying the site structure through “flattening”

All SharePoint implementations are undertaken with the best of intentions – information architectures are formulated, sites created, applications built, and content is uploaded. Though the reality is that after many years of use, when a system has been pushed to its limits with tens of thousands of documents, workflows and forms – the information architecture and content type hierarchy that seemed like a good idea all those years ago may no longer be working as well as the customer would like.

Commercial migration tools are often referred to as “lift-and-shift”. They take content from one site and move it to another; however, they struggle to do things like change the structure or the metadata of the content during the migration process. Issues arise when the customer wants their content in the cloud, but they want it transformed during the migration process to better suit their current needs. If you just lift-and-shift then you are moving the content online but the customer may still be saddled with the productivity-sapping idiosyncrasies of their current on-premises systems.

During this project we were able to significantly simplify the site structure by “flattening” it. Each subsite and its ~30-80 libraries were migrated into a single library. The result is that ~1000 libraries in the source site were converted into fewer than 20 libraries in the destination site. Some benefits of this include:

  • Out of box web parts vs custom web parts: we were able to use the out-of-the-box list view web parts to present content on landing pages rather than relying on custom / third-party web parts which would have been required to aggregate the content from multiple lists (a core requirement for the customer).
  • Offline access using OneDrive: having the content in a single library, stored within a folder structure, made it accessible for users in offline mode using mobile devices (e.g. iPads) as they don’t have to maintain hundreds of connections for individual libraries.
  • Easier to maintain: it is easier for the customer to maintain the simplified site – and not just the content. It means that there are far fewer sites, fewer libraries, fewer workflows, landing pages and webhooks.
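
As an illustration only (not the utility described in this article), the flattening rule can be thought of as a path mapping. The sketch below assumes that each subsite maps to one destination library and that the source library name becomes a top-level folder; the function name, the mapping table and the site names are hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical mapping: each source subsite becomes one destination library.
LIBRARY_FOR_SUBSITE = {
    "engineering": "Engineering Documents",
    "quality": "Quality Documents",
}


def flatten_path(source_path: str) -> str:
    """Map '<subsite>/<library>/<folders...>/<file>' to
    '<destination library>/<library>/<folders...>/<file>'.

    The source library name is kept as a folder so that files with the
    same name in different source libraries do not collide.
    """
    parts = PurePosixPath(source_path.strip("/")).parts
    if len(parts) < 3:
        raise ValueError(f"expected subsite/library/file, got {source_path!r}")
    subsite, rest = parts[0], parts[1:]
    try:
        library = LIBRARY_FOR_SUBSITE[subsite.lower()]
    except KeyError:
        raise ValueError(f"no destination library for {subsite!r}") from None
    return str(PurePosixPath(library, *rest))
```

Keeping the mapping in one small function also means a validation report can reuse it to find where each source file should have ended up.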

Issues with off-the-shelf-migration-tools

Along with the need to simplify the structure through flattening, we ran into several issues with our chosen off-the-shelf-migration-tools. Their support teams were unable to help us resolve them and when we asked for our money back they just ignored us!

  • Corrupted managed metadata fields: after migrating the structure (site columns, content types and libraries) we found that some of the managed metadata columns would be broken. We had this issue appear a number of times and each time a different field would be corrupted. The affected fields would be broken to the extent that their values were not populated in the destination system and we were unable to set them manually using the SharePoint UI. To resolve this issue we had to write a script to modify affected fields to fix the corruption.
  • Managed metadata values not set: we had an issue where our off-the-shelf-migration-tool would not set values for specific managed metadata fields. This was not isolated to a handful of files but affected thousands of documents. To make matters worse it was reporting that all content was migrated successfully – we didn’t realise the problem was happening until we did our own review of the content. The lack of error messages resulted in a general distrust of the off-the-shelf-migration-tool.
  • Inconsistent results: we ran into several issues migrating documents where the off-the-shelf-migration-tool would be unable to migrate the content due to nonsensical validation errors (e.g. required fields missing when all required fields were populated or had default values) only to find that on subsequent attempts the documents would migrate without issues. Other documents which had previously migrated fine would fail when we tried to remigrate them. This further eroded any confidence we had in the tool at the start of the project.
  • Validation based on the content source: some off-the-shelf-migration-tools are designed to validate content based on the rules defined in the source system. For example if a field is mandatory in the source but optional in the destination, they will not honour the configuration of the destination system and require that the field has a value. This caused a number of issues migrating older content that had missing metadata (see “required fields” below).

Iterative migrations take time and effort

Off-the-shelf-migration-tools are intended to be used iteratively. You run a pre-check report, fix any errors, try to migrate the content, review the errors and make any corrections and remigrate – repeating this process until all your content has been migrated. This works well on smaller sites with a simple structure and a small amount of content but does not scale to complex sites with a lot of content where it can take weeks to do a single migration pass.

One of the shortcomings of the existing commercial tools is that they can require a significant amount of user intervention – this doesn’t scale to large sites. In many cases it is not enough to configure the migration tool and leave it to move the content unattended. It can be a full-time job to monitor the tool, analyse reports, troubleshoot errors and make the necessary remediation.

We developed our custom migration utility in such a way that it was able to run largely unattended. For example, to do a full migration of a library (containing 3000 documents with ~30000 file versions) we could start it running at the end of the day and leave it running unattended overnight. The following morning we could review the results and easily re-migrate any failed documents (e.g. due to connection drop-outs). This meant we spent less time babysitting the migration tool and were able to focus on higher value activities for the customer.
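
The shape of an unattended run can be sketched roughly as follows. This is a simplified illustration under our own assumptions: `migrate` stands in for whatever copies a single document, and the automatic extra passes are an addition for the example (the article only describes reviewing and re-migrating failures the next morning).

```python
from typing import Callable, Iterable


def run_unattended(
    documents: Iterable[str],
    migrate: Callable[[str], None],
    max_passes: int = 3,
) -> list[str]:
    """Migrate every document, retrying failures in later passes.

    Returns the documents that still failed after the final pass so they
    can be reviewed by a person afterwards.
    """
    pending = list(documents)
    for _ in range(max_passes):
        failed = []
        for doc in pending:
            try:
                migrate(doc)
            except Exception as exc:  # e.g. a dropped connection
                print(f"FAILED {doc}: {exc}")
                failed.append(doc)
        if not failed:
            return []
        pending = failed  # only retry what failed in this pass
    return pending
```

The key design point is that one failed document never stops the run; it is logged and revisited later.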

Required fields that were once optional (or did not exist)

A field may be mandatory now, but this does not mean it has always been mandatory (it may have started out as optional and later been made mandatory), or that it will always have a value in the source system. Documents, especially older versions, may be missing required metadata. This caused difficulties during the migration process as SharePoint does not allow a document to be published unless all required fields are filled in.

In some cases, the metadata was missing because when the documents were created (many years ago) the (now mandatory) fields did not exist or were not required. In other cases this was because documents were uploaded without the user filling in any of the metadata.

This caused issues because the migration tool we were using determined whether a field was mandatory by looking at the source system rather than the destination. With the tool, we couldn’t simply make the field optional in the destination site, upload the content and then make it mandatory again. The vendor’s solution was to either use default values or to interactively specify values for each version of a file, which isn’t practical when you have thousands of files.

We were able to solve this issue in our custom migration process as we could make all required fields optional in the destination site and then migrate the older file versions with the missing metadata. This ensured that all of the key fields were 100% the same in the source and destination system and we didn’t have to rely on default values or interactive metadata entry.
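
The "relax, migrate, restore" pattern can be sketched with a context manager. This is an illustration, not a SharePoint API call: the fields here are plain dictionaries with a `Required` flag standing in for the destination field definitions, and `required_fields_relaxed` is a hypothetical name. A real implementation would update the fields through the SharePoint API at the two marked points.

```python
from contextlib import contextmanager


@contextmanager
def required_fields_relaxed(fields):
    """Temporarily make every required field optional.

    `fields` is a list of dicts with a boolean "Required" key (a stand-in
    for destination field definitions). The original setting is restored
    even if the migration inside the `with` block raises.
    """
    relaxed = [f for f in fields if f["Required"]]
    for field in relaxed:
        field["Required"] = False  # real code: update the destination field
    try:
        yield relaxed
    finally:
        for field in relaxed:
            field["Required"] = True  # real code: restore the setting
```

Used as `with required_fields_relaxed(fields): migrate_library()`, where `migrate_library` is whatever performs the copy, this guarantees the fields are made mandatory again when the pass ends.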

Orphaned users

Orphaned users are those who are referenced by documents in the SharePoint source system (e.g. as a value in a person field) but have since been deleted from the on-premises Active Directory and do not exist in the SharePoint Online environment. When orphaned users are referenced by a column we are unable to migrate the document as we can’t resolve those users when setting the document metadata.

After discussions with the client, we were able to resolve this issue in our migration process by setting any orphaned users to a value of “Previous Employee” and creating a “Previous Employees” column to store information about these users. This column showed which fields contain which orphaned users so that from an auditing perspective, all the information in the source system is retained even with the missing users in the destination environment. It also meant the customer could quickly identify any documents referencing orphaned users and update the metadata appropriately.

There were some added complications with a small number of users (and groups) having different names in the source and destination environments as they had been renamed.
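
A minimal sketch of this substitution is shown below. It assumes single-valued person fields, metadata held as a plain dictionary, and a simple "field: user" text format for the audit column; the function name and the `renamed` lookup table are hypothetical.

```python
PLACEHOLDER = "Previous Employee"
AUDIT_COLUMN = "Previous Employees"


def replace_orphaned_users(metadata, person_fields, known_users, renamed=None):
    """Return a copy of `metadata` that is safe to apply in the destination.

    Users missing from `known_users` are replaced by the placeholder and
    recorded in the audit column so no source information is lost.
    `renamed` maps old account names to their new names.
    """
    renamed = renamed or {}
    result = dict(metadata)
    audit = []
    for field in person_fields:
        user = metadata.get(field)
        if not user:
            continue  # field is empty, nothing to resolve
        user = renamed.get(user, user)
        if user in known_users:
            result[field] = user
        else:
            result[field] = PLACEHOLDER
            audit.append(f"{field}: {user}")
    result[AUDIT_COLUMN] = "; ".join(audit)
    return result
```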

Migrating into a live site

The production site was not an empty site; it was already in use and had some structural elements, such as site columns and content types, already created. This caused a number of issues both with the existing content in the live site and the new content that was being migrated – it may explain some of the difficulties that the off-the-shelf-migration-tool ran into.

Some of the site columns in the source site had been created manually in the destination site with the same internal names but with different field types. For example they may have been managed metadata fields in the source site but were lookups in the destination site. Other issues were caused by fields that had the same internal names but different casing in the source and destination site (e.g. “DocumentId” vs “DocumentID”).

When the off-the-shelf migration tool ran it overwrote the site columns in the destination site with definitions from the source system without warning. This caused some unexpected behaviour and corruption / loss of data in the live site when the result was a site column changing from one field type to another. The lesson here is to migrate into an empty site where possible. If you have to migrate into an existing site then carefully scrutinise the existing structure and content before your migration.
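
One way to scrutinise an existing site is to compare field definitions before anything is copied. The sketch below assumes the internal names and field types of both sites have already been exported into dictionaries; the function name and the tuple format are hypothetical.

```python
def find_field_conflicts(source_fields, destination_fields):
    """Compare {internal_name: field_type} maps from both sites.

    Returns (kind, source_name, destination_name) tuples, where kind is
    "casing" (same name, different capitalisation) or "type" (same name,
    different field type).
    """
    dest_by_lower = {name.lower(): name for name in destination_fields}
    conflicts = []
    for name, field_type in source_fields.items():
        dest_name = dest_by_lower.get(name.lower())
        if dest_name is None:
            continue  # field does not exist yet, nothing to clash with
        if dest_name != name:
            conflicts.append(("casing", name, dest_name))
        if destination_fields[dest_name] != field_type:
            conflicts.append(("type", name, dest_name))
    return conflicts
```

For example, a source field `Department` of type `TaxonomyFieldType` would be reported as a "type" conflict against a destination `Department` of type `Lookup`.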

Issues with Managed Metadata

On this project the site made extensive use of managed metadata. This can make any migration / development effort more complex, as the implementation is complicated (hidden lists, hidden fields, the term store) and the API has several quirks which need to be taken into consideration. Some of the complications resulting from this included:

  • Disabled terms: there were many disabled terms in the source environment, so we had to write some scripts to export and temporarily enable those terms in the destination environment (and disable them again after the migration). If the terms were disabled then they couldn’t be used during the content migration (either by off-the-shelf-migration-tool or the custom process).
  • Invalid values embedded into the DOCX files: many documents had invalid managed metadata values embedded into the DOCX files. This meant that after the documents were uploaded, the SharePoint parser would try to set the invalid values as metadata, resulting in corrupt documents being uploaded. This could be reproduced by downloading a document from the source system and manually uploading it (e.g. by drag and drop) into the destination system. The migration process had to be modified to handle this.

Validation of migrated content

A shortcoming of the off-the-shelf-migration-tool we were trying to use is that after a migration, it did not validate that all content in the source system had been migrated into the destination system. We wanted to know that all versions of all files had been migrated and all the metadata was populated correctly.

For a site with tens of thousands of file versions it is too time consuming to manually validate the content. We needed to implement a validation process, a report that could be run after the content migration to compare content in the source and destination sites. Without this there would be no way to be certain that all the content had been migrated correctly.

In our case this process had to be relatively sophisticated as the structure of the content was different in the source and destination sites after flattening. Some of the values were also different between the two environments but needed to still be considered valid – e.g. the handling of orphaned users.
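
The core of such a report can be sketched as a comparison of two inventories. This is a simplified illustration: it assumes both sites have been exported to dictionaries keyed by (path, version label), and the `map_path` and `normalise` callbacks (both hypothetical) carry the flattening rule and the accepted differences such as orphaned users.

```python
def compare_inventories(source, destination, map_path,
                        normalise=lambda field, value: value):
    """Report versions that are missing or whose metadata differs.

    `source` and `destination` map (path, version_label) to a dict of
    field values. `map_path` converts a source path to its expected
    destination path; `normalise` converts a source value to the value
    expected in the destination.
    """
    problems = []
    for (path, version), src_meta in source.items():
        dest_path = map_path(path)
        dst_meta = destination.get((dest_path, version))
        if dst_meta is None:
            problems.append(f"missing: {dest_path} v{version}")
            continue
        for field, value in src_meta.items():
            expected = normalise(field, value)
            found = dst_meta.get(field)
            if found != expected:
                problems.append(
                    f"mismatch: {dest_path} v{version} {field}: "
                    f"expected {expected!r}, found {found!r}"
                )
    return problems
```

An empty result means every source version was found in the destination with the expected metadata.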

Copying version history

The first issue we ran into when writing our custom migration process was a fundamental one – we found that the SharePoint API has no mechanism to easily copy the version history of a document when you are moving content between two different sites.

We found that to copy a document, we needed to read all the versions from the source system and then re-create them one at a time (both the file data and the metadata), in the same order in the destination system. For each version we had to also override the created / modified dates and users so that they reflected the values in the source system.

This process was made more difficult when migrating content from a SharePoint 2010 environment as the newer REST API doesn’t exist in 2010 and significant parts of the CSOM API are missing. We had to integrate with the legacy ASMX / SOAP APIs to extract this data out of the 2010 system which was more time consuming than using the more modern APIs.

There were some additional complications for the migration process caused by some documents in the source system having gaps in their version history. A typical example would have a version history similar to the following: 1.0, 2.0, 3.5, 3.6, 3.7. This needed to be considered by the incremental migration process so that duplicate versions were not created to fill in the gaps.
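
A minimal sketch of the incremental selection is shown below. It assumes the migration keeps its own record of which source version labels have already been copied (the `already_migrated` argument is hypothetical); comparing labels rather than counting versions is what keeps the gaps from being treated as missing content.

```python
def versions_to_migrate(source_versions, already_migrated):
    """Return the source version labels that still need copying.

    Labels such as "3.5" are compared as given, so a history of
    1.0, 2.0, 3.5 is treated as three versions, not as a history with
    missing entries. The result is sorted numerically, oldest first.
    """
    def sort_key(label):
        major, minor = label.split(".")
        return int(major), int(minor)

    done = set(already_migrated)
    return sorted((v for v in source_versions if v not in done), key=sort_key)
```

Sorting on the numeric parts rather than the text avoids placing 10.0 before 9.0.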

Minor versions are not always needed

At face value it doesn’t seem like an excessive amount of content to migrate. The core content was ~3000 documents in one main library, ~6000 in an archive library and ~5000 spread across the various business unit sites. However, the number of files that needed to be migrated was significantly higher once you included the version history. The main library had ~20,000 individual file versions that needed copying – and this is after all the minor versions had been deleted.

When we started the migration exercise, minor versions existed across all document libraries which increased the amount of content to migrate by an order of magnitude. After careful consideration by the customer, they were happy to delete the minor versions since to them they were essentially drafts and they only wanted to migrate published versions of documents (plus the current version regardless of whether it was draft or published).
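
The rule the customer settled on (published versions plus the current version) is simple to express. The sketch below assumes version labels are supplied oldest to newest and relies on SharePoint's convention that major versions end in ".0"; the function name is hypothetical.

```python
def versions_to_keep(labels):
    """Keep published (major) versions plus the current version.

    `labels` must be ordered oldest to newest, so the last entry is the
    current version and is kept whether it is a draft or published.
    """
    if not labels:
        return []
    current = labels[-1]
    return [v for v in labels if v.endswith(".0") or v == current]
```

For a history of 0.1, 0.2, 1.0, 1.1, 2.0, 2.1, 2.2 this keeps 1.0, 2.0 and 2.2.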

In many cases, minor versions are just not needed as these are drafts and simply not required after a major version of a document is published. The problem is further exacerbated when documents are automatically saved in the background as users are editing in SharePoint Online, and by the feature of the information panel in modern SharePoint that lets users edit one field at a time. Simply filling in the metadata for a document can result in 20-30 versions being created if users are not using the “edit all properties” function.

It goes without saying that this decision requires careful consideration and in many scenarios you will want to retain and migrate the minor versions. There may be some libraries however where the minor versions are not required – or in some cases only the latest version of a document may be required. It is always worth asking the question since if you don’t need to migrate some or all of the versions then you will save a significant amount of time when doing the content migration and you will be using fewer resources in your SharePoint Online environment.

Covering the virtues of minor versions is for another time, but it is puzzling as to why Microsoft don’t restrict minor versions by default (OOTB) and allow customers to change the setting. This would save space, and I think minor version management is a potential cost saving for businesses and Microsoft.

Moving a lot of content takes a lot of time

The 14,000 documents we had to migrate aren’t an excessively large amount of content for a SharePoint site. Even so, the reality of migrating large libraries is that downloading and uploading files (along with the API calls required to set all the metadata and orchestrate the migration process) takes a significant amount of time. We found that it would take ~24 hours (non-stop) to do a single migration pass of the 3000 documents in the core library and several days for the archive library.

The time it took wasn’t just a side effect of us implementing a custom migration process either. We found performance of our custom tool was comparable to that of the commercial migration product we were using. This may have been due to the throttling limitations that SharePoint places on API consumers. Regardless of how much bandwidth you have, you are limited in the number of files you can upload in parallel before you start getting throttled.
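
SharePoint Online signals throttling with an HTTP 429 (or 503) response that carries a Retry-After header, and a client is expected to wait that long before trying again. A minimal sketch of honouring that delay is shown below; the `Throttled` exception and `call_with_backoff` helper are hypothetical, and the code that turns a 429 response into `Throttled` is left to the caller.

```python
import time


class Throttled(Exception):
    """Raised by the caller's request function on an HTTP 429/503."""

    def __init__(self, retry_after):
        super().__init__(f"throttled, retry after {retry_after}s")
        self.retry_after = retry_after  # seconds, from the Retry-After header


def call_with_backoff(request, max_attempts=5, sleep=time.sleep):
    """Call `request()`, waiting for the server-suggested delay when throttled.

    Re-raises `Throttled` once `max_attempts` calls have all been throttled.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except Throttled as exc:
            if attempt == max_attempts:
                raise
            sleep(exc.retry_after)
```

The `sleep` parameter exists only so the wait can be replaced in tests.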

It is also worth considering that many of the files in the environment are going to be larger than you might expect. For example you might encounter a library full of 40 MB PowerPoint files, each having 30-50 versions, which can mean multiple gigabytes need to be transferred just to migrate each file.

This was compounded by the fact that we needed to re-migrate the content at various stages during the project after review / feedback from the customer. For example, after the solution for the orphaned users was implemented (described above), we needed to re-migrate the content to populate the additional columns containing this information.

Documents with quick parts

Word has a feature called “Quick Parts” where you can embed metadata values from a document library within the content of a Word document. These were used as a standard practice within this customer’s organisation which meant nearly every document used them for values such as the document number and document type. Unfortunately quick parts stop working when you move a document from one SharePoint environment to another – either with a migration tool or by manually downloading it and uploading it. If you read the support section of their websites, many migration tool vendors simply don’t support migrating documents with quick parts.

Anecdotal evidence online indicates that Microsoft’s recommended solution for this is to manually open each document and remove / re-add the quick parts after migration. This is practical if you have a handful of documents but not if you have thousands.

One of the first questions you should ask when starting a migration project is – are quick parts being used, and if so, how many documents use them? If the customer makes extensive use of quick parts then you may find that you simply cannot use a commercial migration tool.

We had to build a custom tool to scan documents for broken quick parts and fix them in-place. We plan to publish another article on this in the coming months.
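
As a starting point for answering the "how many documents use them" question, the sketch below lists the data-bound content controls in a DOCX file. It is an illustration under stated assumptions, not the tool mentioned above: it only finds `w:dataBinding` elements (the element Word uses to bind a content control to a custom XML value) and does not decide whether a binding is broken. The function name is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace used by document, header and footer parts.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_data_bindings(docx_bytes: bytes) -> list[tuple[str, str]]:
    """Return (part name, XPath) for every data-bound content control."""
    found = []
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        for name in sorted(package.namelist()):
            # The body, headers and footers are XML parts under word/.
            if not (name.startswith("word/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(package.read(name))
            for binding in root.iter(f"{{{W}}}dataBinding"):
                found.append((name, binding.get(f"{{{W}}}xpath", "")))
    return found
```

Running this over a sample of the source library gives a rough count of affected documents and of which metadata fields they embed.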

Conclusion

Off-the-shelf-migration-tools have their place when moving essentially static content from one SharePoint environment to another, but please don’t set your expectations too high. Our experience shows that you may come crashing down to Earth faster than you hoped. Good luck: migrating complex and bespoke SharePoint solutions to the cloud is not for the faint hearted!
