Don't Get Duped When Duplicating Records

By Daniel Wood, 28 March 2023


Introduction

On the surface, duplicating records in FileMaker is straightforward. We use the Duplicate Record script step to do this.

However, much like everything in FileMaker, what appears simple on the surface can become rather technical and complex. There's always more than one way to skin a cat, as they say.

In this article we're going to explore some of the different approaches to duplicating multiple records and their relative performances.

We should note at this stage that these approaches were born purely out of experimentation, playing with different ideas in an attempt to find an optimal solution with regard to performance. Some of the approaches are certainly not recommended but are still mentioned for comparison and out of interest. You may find you learn something along the way!

 


 

Duplicating Parent and Child Records

Duplicating a single record is simple, but here we will be duplicating many hundreds of records. In addition, not only do those records need to be duplicated, but their related child records need to be duplicated too.

An important factor in parent/child duplication is that each duplicated child must be reassigned to its new parent's ID. This point becomes quite significant when deciding upon your approach to record duplication.


 

Example File!

As with all our articles, a lot of time and effort goes into our example files to help explain techniques used and to allow you to explore and try for yourself.

Please take time to download this as the article very closely follows the content of the example file — no email address required!

Click Here to Download

 

Trying some Different Approaches

We began with a single, straightforward approach to parent/child record duplication; each subsequent approach grew out of the previous one as we looked for ways to change it and hunt out optimisations in the process.

In the end we came up with seven different approaches. We must warn you that not all of them are great. We left in the bad ones because you can still learn something from them, and hey — you may just come up with a better solution yourself — we always love feedback!


  

Our Test Scenarios

We ran each duplication test five times and took the average duration as our result. Taking the average helps smooth out minor server performance fluctuations. Tests were performed on an M1 Mac mini running FileMaker Server 19.6.3 with no connected users.

The tests were run using Perform Script on Server and before each test we cleared out any previously duplicated test data.

We ran ten variations per approach. We incremented the number of parent records being duplicated by 100, for a total of ten tests ranging from 100 to 1,000 parent records.

Finally, each parent record duplicated had around six child records in each of two different child tables, for a total of roughly 12 child records per parent. All child records were duplicated and assigned their new parent ID.

 

Approach 1: Loop From End

Our first approach is the simplest: it involves looping through the parent records. Looping is itself an entire subject worth delving into (click here for an earlier article on this topic).


The loop used in this approach has the following important aspects:

  • Looping begins at the end of the found set.
  • After we duplicate we omit both the new record (last record) and the parent record (one prior to that).
  • We proceed to duplicate any child records.
  • The loop ends when we have no more records in the parent found set.

This approach is called looping from end because we start at the end of the found set, working backwards.

For each parent record duplicated in this fashion we repeat the exact same procedure for any child records. 

Child records are found by first navigating to the child layout and performing a find for any child records that are related to the parent. If any records are found they are duplicated using looping from end.
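
To make the pattern concrete, here is a rough sketch of the parent loop and the child find (this is illustrative only and not the exact script from the example file — the layout, table, and field names are placeholders):

Set Error Capture [ On ]
Go to Layout [ "Parent" ]
Go to Record/Request/Page [ Last ]
Loop
    Set Variable [ $parentID ; Value: PARENT::id ]
    Duplicate Record/Request
    # the duplicate is appended to the found set and becomes the current record
    Set Variable [ $newParentID ; Value: PARENT::id ]
    Omit Record
    # first omit removes the duplicate (the last record)
    Omit Record
    # second omit removes the original parent we just processed
    Go to Layout [ "Child A" ]
    Enter Find Mode [ Pause: Off ]
    Set Field [ CHILD_A::parent_id ; $parentID ]
    Perform Find [ ]
    # loop from the end of this child found set, duplicating each child
    # and setting CHILD_A::parent_id to $newParentID on the duplicate
    # (repeat the same block for the second child table)
    Go to Layout [ "Parent" ]
    Exit Loop If [ Get ( FoundCount ) = 0 ]
End Loop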

Pros:

A benefit of this approach is that the script is easy to write and simple for other developers to follow and understand. It is also easy to add extra child tables for duplication in the future if required.

Cons:

A downside to this approach is that the duplicated records end up in reverse creation order compared to the originals. When we loop from the end we duplicate the newest record first and work back to the oldest.

This may not seem like an issue, but if you are duplicating records that you expect to be presented to the user in their original creation order, then this will not work unless you sort them. Sometimes you simply don't wish to sort records, and if that is the case this approach may not be for you.

Another often overlooked issue with reverse looping is how FileMaker pre-fetches records from the server. When you are looking at a layout that has at least one field on it, FileMaker downloads the next set of records in the found set from the server. In Form View this is the next 25 records. In List View it is the visible records plus 50 more, and the same rule applies to Table View.

FileMaker does this to allow for smooth scrolling through records as it makes the assumption you will be doing just that. This only works forwards however, not backwards.

Consider starting at the end of a found set. No pre-fetching occurs because there are no records ahead in the found set. When you then navigate back one record, FileMaker must query and download it from the server, as it has not been pre-fetched. Repeating this backwards through a found set means an individual query and download for every record.

 

Approach 2: Loop From Start

To help combat both the reverse creation order issue and the lack of pre-fetching in the first approach, a small change can be made: simply loop from the start of the found set rather than the end.

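Sketched the same way (placeholder names again), the change is purely in the navigation:

Go to Record/Request/Page [ First ]
Loop
    Set Variable [ $parentID ; Value: PARENT::id ]
    Duplicate Record/Request
    # the duplicate is appended to the end of the found set
    Set Variable [ $newParentID ; Value: PARENT::id ]
    Omit Record
    # omits the duplicate from the end of the found set
    Go to Record/Request/Page [ First ]
    Omit Record
    # omits the original parent; the next parent is now the first record
    # child duplication happens here exactly as in Approach 1
    Exit Loop If [ Get ( FoundCount ) = 0 ]
End Loop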

Pros:

That's it: that is the only real change from Approach 1. The benefit now is that our duplicates are created in the same order as the originals. We also take advantage of pre-fetching, as FileMaker will have at least the next 25 records in the found set pre-fetched, reducing the number of server queries required for record data. This also means that looping from the start in List or Table View is preferable, as more records will be pre-fetched.

Cons:

Not necessarily a bad thing, but note that we now require some extra record navigation to duplicate records. Duplicates are added to the end of the found set, so after we omit the duplicate we have to jump back to the first record to omit the original. With Approach 1 this was not required.

In the grand scheme of things this record navigation is probably going to have a negligible impact.

 

Reducing Overhead

The first two looping approaches are simple to follow and robust. However, keen eyes might have spotted in the example file that there is quite a lot of finding going on, as well as layout switching.

For each parent record we navigate to both child layouts and perform a find. We duplicate any found records before switching back to the parent layout and continuing.

That's a lot of layout switching! Duplicating 100 parent records means upwards of 300 layout switches and 200 finds (a switch to each of the two child layouts plus a switch back to the parent layout, and one find per child table, for every parent record).

With this in mind the next approaches aim to eliminate the need for these finds and switches as much as possible.

  

Approach 3: Go to Related Record - Per Record

Relationships can be created between parent and child, and these relationships can be used to test whether any child records exist. By checking first we can eliminate the need to switch layouts and search for child records.

Furthermore, the relationship itself can act as a means to navigate to the child records for duplication rather than doing a find.

This approach makes use of this fact by first testing child relationships for records and using the Go to Related Record script step to navigate to the child records for duplication.

Duplication of a found set is still achieved through looping using Approach 2.

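In script terms, the per-parent child handling becomes something like the following sketch (placeholder names; parent_CHILD_A stands in for a relationship from the parent context to the first child table):

# inside the parent loop, after duplicating the parent
If [ not IsEmpty ( parent_CHILD_A::id ) ]
    # the relationship has records, so go straight to them
    Go to Related Record [ Show only related records ; From table: "parent_CHILD_A" ; Using layout: "Child A" ]
    # loop over this related found set, duplicating each child
    # and setting its parent ID field to $newParentID
    Go to Layout [ "Parent" ]
End If
# repeat for the second child relationship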

Pros:

The significant benefit is that we eliminate needless layout switching and finding by testing the relationship instead and using Go to Related Record for navigating to the child records.

Cons:

Whether this is a con or not is all in the eye of the beholder. This method does require you to establish a relationship from your parent context to your child tables.  If you've somehow managed to build a FileMaker database without relationships (shame on you!) then you could see this as being a nuisance but realistically most solutions benefit from having parent/child relationships.

 

Go to Related Record…

At this point we were pretty happy with Approach 3. However, there was still some potential for improvement. So what could still be optimised?

We looked to the Go to Related Record script step. In the previous approach we used this step to duplicate child records for each parent record. If we were duplicating 1,000 parent records then at worst that means 1,000 Go to Related Record steps per child table.

What if we could do the same thing but just with a single Go to Related Record step?

Well the step does give us the ability to navigate to child records for all records in the found set - can we use this?


The plan was as follows:

  • Loop and duplicate parents (but keep the original found set - do not omit records)
  • Go to all related child records for all parent records in a single step
  • Loop and duplicate all children.

Sounds great! There is however just one problem...

Child records must be assigned the new parent ID. With Approaches 1 to 3 this is simple, because we just store the new parent ID in a variable and set it onto the children.

However, with this modified approach we don't know what the new parent ID is for a given child record. To know this we have to create a mapping between each original parent ID and its duplicate's ID.

If we can establish a mapping structure then when we duplicate children we can consult the mapping to derive the new parent ID by looking up the old one.

 

Approach 4: Go to Related Record Found Set - List

Approach 4 is the first mapping approach and uses lists to map original parent IDs to their new duplicate IDs. These lists are maintained in local variables.

When it comes time to duplicate child records, we take the existing parent ID on the child record and use it to look up its position in the list of original parent IDs. That position can then be used to look up the corresponding new parent ID in the duplicate list.

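Here is a sketch of the idea (placeholder names; the position lookup shown is one common way to find a value's index in a return-delimited list, not necessarily the exact calculation used in the example file):

# while looping over and duplicating parents (Approach 2 style), build the mapping
Set Variable [ $origIDs ; Value: List ( $origIDs ; $parentID ) ]
Set Variable [ $dupIDs ; Value: List ( $dupIDs ; $newParentID ) ]

# later, after a single GTRR to all child records, loop over the children
Set Variable [ $pos ; Value:
    Let ( [
        padded = ¶ & $origIDs & ¶ ;
        p = Position ( padded ; ¶ & CHILD_A::parent_id & ¶ ; 1 ; 1 )
    ] ;
        PatternCount ( Left ( padded ; p ) ; ¶ )
    ) ]
Duplicate Record/Request
Set Field [ CHILD_A::parent_id ; GetValue ( $dupIDs ; $pos ) ]
# (loop navigation and omitting of records not shown)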

Pros:

A benefit of this is that we only require a single Go to Related Record script step to find and duplicate child records, rather than one per parent record.

Creating the mapping is pretty simple and we are just using the List function to compile lists of old and duplicate IDs as we go.

Cons:

Having to maintain the mapping is itself a negative - the reasons why will become apparent later when we discuss results.

The overhead required to generate the mapping and to retrieve the new parent ID from it is quite expensive, and at high record duplication counts it completely negates any performance gained by doing a single GTRR step.

It's an elegant approach overall and one that does work well for low record duplication counts - but once you hit a certain point the cost outweighs the benefit.

  

Approach 5: Go to Related Record Found Set - JSON

Because of the list method's poor performance, caused by the creation, management, and retrieval of parent IDs, we looked instead to JSON as our mapping mechanism.

JSON is great at storing information and in this case we are going to use simple key/value pairs to be the mapping between old and new parent IDs.

We simply add a mapping to the JSON using the following:

JSONSetElement ( theJSON ; parentID ; duplicateID ; JSONString )

When needing to retrieve the duplicate ID we simply use:

JSONGetElement ( theJSON ; parentID )
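
Inside the loops that looks roughly like this (placeholder names):

# while duplicating parents: record the mapping in a JSON object held in a variable
Set Variable [ $map ; Value: JSONSetElement ( $map ; $parentID ; $newParentID ; JSONString ) ]

# while duplicating children: look the old parent ID up in the map
Set Field [ CHILD_A::parent_id ; JSONGetElement ( $map ; CHILD_A::parent_id ) ]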

Pros:

You could consider the use of JSON a more robust method of mapping parent/duplicate IDs but that's about it!

Cons:

JSON functions in FileMaker are terrible! Their performance at large sizes is just garbage. Again, at low duplicate numbers it's acceptable, but for anything over a couple of hundred records the performance of these functions ruins any performance gain.

  

We failed?

The mapping approaches in 4 and 5 should in theory be faster than all that layout switching, but the cost of having to establish a mapping between parent and duplicate IDs is just far too great. Until a better and much more efficient method of mapping can be found, these approaches only work well at low duplicate counts.


So what else is there to try?

  

Approach 6: Export / Import

The Import Records script step is another way in which records can be created in a table. What if we exported all required records to disk and then imported them straight back into the same table?

This approach removes the need for any looping to create the records.


We export a found set and import it back in as the new set of duplicates.

However, we do still suffer from the issue of having to establish a mapping and use it to assign new parent IDs to the child records. Once we have the mapping, the IDs can be assigned using the Replace Field Contents script step rather than a loop.
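
In outline, the parent duplication looks something like this sketch (the export format, path, and field handling here are assumptions for illustration; the working version is in the example file):

# duplicate the parent found set via export/import
Set Variable [ $path ; Value: Get ( TemporaryPath ) & "parents.mer" ]
Export Records [ With dialog: Off ; "$path" ]
# export only the fields that should be carried over to the duplicates
Import Records [ With dialog: Off ; "$path" ; Add ]
# import back into the same table with "perform auto-enter options" enabled,
# so every imported record receives its own new primary key
# child records can then be re-pointed using the mapping, e.g. via Replace Field Contents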

Pros:

There are no loops in this approach, as the import itself creates the duplicates.

Cons:

The Export Records script step is not very server friendly because it cannot be used to export a FileMaker file. Nor can you import from a FileMaker file on server.

This means that if the process is to be run on server, things like container fields cannot be duplicated (unless you did something like Base64-encode them as text for the export, but that would negate the performance benefit).

This technique is best suited to smaller, text-only record structures.

 

An Idea about Improving Mapping....

The key issue with our approaches is the performance of mapping old/new IDs when it comes to using GTRR or Export/Import.

Mapping performance seems to be slow at large duplication sizes for two reasons:

  • The cost of compiling the lists.
  • The cost of retrieval of duplicate IDs from the list.

In thinking about this we came up with a variation on Approach 4 which attempts to eliminate the need to call the List function at every step of duplication. Rather than adding one item to the list during each iteration, we instead compile the lists before and after duplication.

 

Approach 7: Go to Related Record Found Set - List ++

This final approach makes use of the "List of" summary field type.

To obtain the original list of parent IDs, we just store the summary field's value in a variable.

Next, we need to duplicate the parent records such that at the end of this process we are on a found set just of the new duplicate records.

This is fairly easily done by a small tweak to the looping process.


We do this in a new window to preserve our original found set for later. Next, we store the very first duplicate's ID in a variable - this will act as our stopping point in the loop.

As we duplicate we omit the parent only from the start of the found set, and we stop once we reach that very first duplicate record created.

Now we're on a found set of duplicates and we can again use the summary field to return the list of duplicate IDs.
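
Put together, the parent side of this approach looks something like the following sketch (placeholder names; s_list_of_ids stands in for a "List of" summary field over the parent's primary key):

Set Variable [ $origIDs ; Value: PARENT::s_list_of_ids ]
# the summary field returns the IDs of the original found set
New Window [ ]
# work in a new window so the original found set is preserved for later
Go to Record/Request/Page [ First ]
Loop
    Duplicate Record/Request
    If [ IsEmpty ( $stopID ) ]
        Set Variable [ $stopID ; Value: PARENT::id ]
        # the very first duplicate marks our stopping point
    End If
    Go to Record/Request/Page [ First ]
    Omit Record
    # drop the original parent from the front of the found set
    Exit Loop If [ PARENT::id = $stopID ]
End Loop
Set Variable [ $dupIDs ; Value: PARENT::s_list_of_ids ]
# only duplicates remain, so the summary field now returns the duplicate IDs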

Pros:

We do gain a little performance here using this method over the List function.

Cons: 

Still, at high record counts it's not great, most likely due to the method of retrieving IDs from the mapping.

 

We had to stop somewhere!

This article took so long to write because at every point in the process we kept finding new ways to optimise and tweak approaches. Tests had to be rewritten and re-run, and the whole thing would have kept going forever had we not drawn a line in the sand.

We invite readers to try these ideas themselves and let us know in the comments how you get on and how your performance compares to the other methods.

Some other ideas explored include:

  • Sorting found sets by parent ID before duplication. This way we know the order of the mapping, and if we also sort child records by parent ID we can be sure their order matches the mapping. Retrieval of the duplicate ID from the mapping can then be done by tracking a position in the list as you loop, rather than always searching the list (where the ID could be in any random position).
  • Combinations of techniques - can you use export/import for child duplication, and looping for parent generation?
  • Does Replace Field Contents work better for assigning new parent IDs to child records?

Also if you have any other ideas around optimisations we'd love to hear them. 

  

Overall Results

So, with all that said and done, here are the final results of our performance tests (times are in milliseconds):

Parents   Approach 1   Approach 2   Approach 3   Approach 4   Approach 5   Approach 6   Approach 7
    100          711          676          645          473          482          326          468
    200         1423         1356         1260         1022         1058          671          966
    300         2147         2080         1871         1640         1751         1098         1513
    400         2756         2667         2495         2320         2537         1544         2154
    500         3619         3431         3125         3115         4167         2046         2814
    600         4121         4211         3740         4179         5536         2630         3518
    700         4901         4883         4360         5298         6851         3293         4441
    800         5559         5314         4981         6501         8242         4023         5482
    900         6181         6100         5581         7792         9771         4737         6579
   1000         7120         6728         6299         9204        11351         5531         7448

 

[Charts: the test results above plotted for each approach]

 

What Did We Learn?

What we learnt could be summarised as follows:

  • Looping is simplest and easiest for developers to write and understand and works well at low duplicate counts.
  • GTRR with mapping is even faster at lower record counts, but can suffer at higher duplicate counts.
  • Export/Import works great and is fast so long as your conditions for using it are correct.
  • The methods can always be optimised further.

   

In Conclusion

Realistically speaking, the need for massive amounts of record duplication is low and infrequent. The intention of this article is more to give you an understanding of how different approaches can be employed to solve a single, seemingly simple problem. Every step used can be considered and may have an impact on performance. Not all techniques scale the same way - some start off well but get progressively worse than others as record counts increase. Never assume a technique is going to scale linearly.

 

Example File

A final word – please check out our example file :)

Click here to download the example file if you haven't already.

 

 


Comments

  • Daniel Wood 29/03/2023 9:22am (1 year ago)

    Hi Dan, thanks for your comment. Interesting you should mention repeating variables, as one approach I tried but didn't make the cut due to it being so bad was to use a repeating variable as the mapping.

    The idea was prior to duplicating the parent records I sort them by primary key. I then just insert each into a successive repetition of a variable e.g. $id_duplicates

    The mapping becomes a repetition number to an ID.

    Then, when duplicating child records, I also sort them by primary key so they're in the same order. As I loop through and duplicate records I maintain a counter. Each time you move from one primary key value to another, you increment this counter, and this counter tells you which repetition in $id_duplicates to pull the new ID from.

    In theory it sounds like it should be optimal; however, the sorting process itself added significant overhead, and my results were way higher than I had anticipated. Maybe I made a mistake in my code and should revisit this.

    The sorting approach to mapping was, however, also attempted for other approaches, to avoid having to look up positions in lists and retrieve values from certain positions. Again, the sort overhead negated the benefit.

  • Matt Petrowsky 29/03/2023 9:16am (1 year ago)

    Great content as always Daniel! Always nice to see someone in the community take on a performance challenge. To that end I would like to suggest a method which may not have been considered.

    While it's not always the most performant, exporting a subset of data as XML means it's already styled in the format that FileMaker expects. If you don't need to export/import container fields, then exporting as XML means you can limit the export to only the data fields you need duplicated. To bring that same data back in, yet be able to duplicate and associate to child records, you can use an additional field and simply use FileMaker's Data File script steps (or use a plug-in) to change just one single value within the XML export.

    In the case of your example file, finding the byte offset of NAME="zzrec_id" and simply changing it to NAME="parent_id" allows me to import the desired data with the parent id shifted into another field. By allowing perform auto-enter to run on the import, all new duplicate parent records get their own new id. You can now do a variety of things to replicate the child record sets.

    I would personally try exporting the full set of child records and via a dedicated, yet isolated set of TOs with relationships based on the alternate key fields, run a Replace Field Contents to reconnect children based on the new parent ID.

    Just an idea, given that it's really easy to manipulate a data file on disk and you only need to change the name of the field in one location within the XML file. Again, however, it won't work if you have container fields you want to replicate.

    Again, thanks for the contribution and maybe my idea is some food for thought.

  • Dan Shockley 28/03/2023 6:41pm (1 year ago)

    Love your articles as always!
    One thing that might help quite a bit with speed is using the “named buckets” technique for storing your mapping between the old parent IDs and the new ones. It turns out that using a repeating variable to handle a large quantity of things is much more performant than a return-delimited list (or JSON, I assume).
