A Guide to the Use of Equivalence Partitioning Skills During Data Migration Analysis and Mapping
Highlighting how a skill associated with software testing can be applied in the context of data migration analysis.
Keith Hutchings
1/22/2024 · 5 min read


Equivalence partitioning is a term most commonly associated with software test design, where input values are classified into groups, or equivalence classes, that are expected to be processed in a similar manner by the software under test. The test designer then applies various design criteria to identify a limited number of test cases associated with each equivalence class. In this way the tester aims to test the input scenarios that they consider most likely to reveal faults in the software, distributing the test cases in a structured way across the input domain.
In this blog I outline how this approach can be adapted to analyse data to be migrated and to structure the migration rules necessary to successfully transfer the data to a new system.
Data Migration Background
Generally speaking, the objective of a data migration is to transfer data contained within a group of files or tables held in a 'source' database or file system to a new datastore. The complexity of the migration depends on the degree to which the new 'target' datastore differs in structure from the source, the difference in the data integrity rules enforced by the old and new datastores, and the difference in the expected use of the data following migration. So, for instance, a migration that moves a database into the cloud, with no change in structure or integrity rules and with the original system continuing to use the data, can be expected to be of lower complexity than one where the structure of the data is changed, new data integrity rules are applied in the target datastore and a new system is to use the migrated information.
Definition of equivalence classes for data migration
In the context of data migration we can define an equivalence class as any definition of source data that identifies a distinct set of records to be loaded into a target entity. This differs slightly from the software testing context, where equivalence classes are allowed to overlap if that helps usefully identify additional test cases. In data migration we are interested in distinctly identifying the entries to be placed into the target datastore, so the classes must not overlap. In the simple case, where the target datastore's structure, integrity rules and functional usage are unchanged compared to the source, as might be the case during a migration to the cloud, there would be just one equivalence class for each target entity: the records in the equivalent source entity. The value of this approach is more evident in more complex scenarios where the target data structure, integrity rules or functional use differ from those of the source, as described below.
Differences in source and target data structure will necessitate some data transformation in order to populate the target entities, so an individual target entity may require multiple sources or transformations in order to be fully populated. Also, depending on the target entity's integrity rules, where source data is missing it may be acceptable simply to omit an entry or to populate columns with default values. Where a target entity may be populated in multiple differing ways, each of these can be considered a different equivalence subclass for that target entity. The number of subclasses identified for a particular target entity will be a factor in the effort required to build the transformation logic, so an early count can be used as an input to the work and lead-time estimation model.
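As a minimal sketch of that estimation input, assuming purely illustrative entity names, subclass counts and a per-rule effort figure (none of which come from a real project), an early estimate might look like this in Python:

```python
# Illustrative sketch: subclass counts per target entity feed a rough effort estimate.
# The entity names, counts and per-rule effort figure are invented for illustration
# and would need to be calibrated against real mapping work.

SUBCLASSES_PER_TARGET = {"Target 1": 5, "Target 2": 2, "Target 3": 8}
EFFORT_DAYS_PER_RULE = 1.5  # assumed average build-and-test effort per transform rule


def estimated_effort_days(subclass_counts):
    """Sum the subclass counts and scale by the assumed effort per transform rule."""
    return sum(subclass_counts.values()) * EFFORT_DAYS_PER_RULE


print(estimated_effort_days(SUBCLASSES_PER_TARGET))  # 22.5 days in this illustration
```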
Where data is being received from a third party, during client onboarding for example, it is likely that some of the source data content will violate integrity constraints of the target entity. It may be possible to address these with adjustments to the transform rule that discard or amend the problem value, but initially at least it is useful to classify these as separate invalid equivalence classes and to report them so that subject matter experts (SMEs) can consider the nature and significance of the discrepancy.
As an example, considering the above categorisation, a particular target entity, Target 1, might be populated through a combination of the following equivalence classes (a code sketch of this classification follows the list):
• Target 1 from Source entity A
• Target 1 from Source entity B & C, transform rule (i)
• Target 1 from Source entity B & C, transform rule (ii)
• Target 1 from Default value
• Target 1 Invalid source data to be reported
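The sketch below shows, in Python, how such a classification might be expressed. The source entities, column names and transform rules are invented for illustration rather than taken from a real mapping; the point is that the classes are checked in a fixed order so every record lands in exactly one class, and records that cannot be mapped are collected in the invalid class for SME review.

```python
# Illustrative sketch: classify source records for "Target 1" into the
# equivalence classes listed above. Entity and column names are invented;
# real rules would come from the mapping specification.

def classify_target1_record(rec):
    """Return (equivalence_class, target_row_or_None) for one source record.

    Classes are checked in a fixed order so that every record falls into
    exactly one class, keeping the classes non-overlapping.
    """
    if rec["source"] == "A":
        # Target 1 from Source entity A: straight copy of the relevant columns
        return "T1_from_A", {"id": rec["id"], "value": rec["value"]}

    if rec["source"] in ("B", "C"):
        if rec.get("value") is not None:
            # Transform rule (i): value present, apply a simple conversion
            return "T1_from_BC_rule_i", {"id": rec["id"], "value": rec["value"] * 100}
        if rec.get("fallback") is not None:
            # Transform rule (ii): derive the value from another column
            return "T1_from_BC_rule_ii", {"id": rec["id"], "value": rec["fallback"]}
        # Default value class: source data missing but a default is acceptable
        return "T1_default", {"id": rec["id"], "value": 0}

    # Anything else violates the target integrity rules; report for SME review
    return "T1_invalid_reported", None


def classify_all(records):
    """Group records by equivalence class so counts can drive estimation and reporting."""
    classes = {}
    for rec in records:
        cls, row = classify_target1_record(rec)
        classes.setdefault(cls, []).append(row if row is not None else rec)
    return classes
```

Running classify_all over the full source extract then gives, per class, both the rows ready to load and the counts needed for the estimation model and the SME report.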
Additional equivalence classes that are implied by rules embedded in target system functionality
In addition to ensuring that all of the required source data is addressed and that the integrity rules enforced by the target datastore are fulfilled, the classification needs to address one further constraint. This final layer of classification relates to integrity rules that are not enforced directly by the target datastore, but which the functionality that uses the target datastore assumes to hold. Any such rule violations in the source data should initially be assigned to the invalid source data equivalence class; if additional rules can subsequently be identified to acceptably transform these records, they should be moved into a separate valid equivalence class linked to the new rule.
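As a minimal illustration of this final layer, assuming an invented implicit rule (that a status date must not precede an account open date) and invented column names, such a check might be layered on top of the classification like this:

```python
# Illustrative sketch of an implicit rule: the target datastore does not enforce it,
# but the target system's functionality assumes it holds. The rule and column
# names are invented for illustration.

from datetime import date


def check_implicit_rules(row):
    """Return the name of the violated implicit rule, or None if the row is clean."""
    if row["status_date"] < row["open_date"]:
        return "status_date_before_open_date"
    return None


def route_row(row):
    violation = check_implicit_rules(row)
    if violation is None:
        return "T1_valid", row
    # Initially the violation is simply reported; if SMEs later agree a correction
    # rule, rows like this move to a new valid equivalence class linked to that rule.
    return "T1_invalid_" + violation, row


row = {"open_date": date(2020, 1, 1), "status_date": date(2019, 12, 31)}
print(route_row(row))  # ('T1_invalid_status_date_before_open_date', {...})
```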
Identification of any such implicit rule violations is more challenging and is expected to occur only where the target system functionality has not been designed to accommodate data imported from a particular source, is faulty, or where the source data does not comply with anticipated data integrity rules. In the author's experience, this most commonly occurs where third-party data is being received from another administrator during client onboarding, or where functional development of a replacement system has been conducted without sufficiently detailed consideration of the source data characteristics or without sufficient functional testing using data imported from the source.
Conclusion
The task of migrating data from one system to another can vary massively in complexity, and hence costs and lead-times are similarly varied. When undertaking a migration project it is important to adopt an approach that can accommodate any level of complexity: projects often start with the wildly over-optimistic hopes of the sponsors, so the approach must be able to reveal and explain data anomalies as they emerge during analysis, engaging stakeholders where necessary to address them and re-plan delivery. This can be achieved by systematically classifying the source data required to populate each target entity into equivalence classes as described above. Having classified the source data in this way, the necessary transformations can be specified and automated.
Although data migration equivalence classes differ in important ways from those designed to structure functional test cases, there is likely to be significant overlap in the knowledge and skills needed to perform these activities. So, in development programmes that combine functional development and data migration projects, some useful resource sharing may result if the above approach is adopted.
Piersolutions Ltd has successfully migrated large-volume transaction histories in various business contexts, structuring the analysis to understand the data set as above and then defining and automating the necessary migration logic. If you are interested in the above approach and would like further information, please contact us.