By Daniel Wood, 28 March 2023
On the surface, duplicating records in FileMaker is straightforward. We use the Duplicate Record script step to do this.
However, much like everything in FileMaker, what appears simple on the surface can become rather technical and complex. There's always more than one way to skin a cat, as they say.
In this article we're going to explore some of the different approaches to duplicating multiple records and their relative performances.
We should note at this stage that these approaches were purely born out of experimenting and playing with different approaches in an attempt to find an optimal solution with regards to performance. Some of the approaches are certainly not recommended but are still mentioned for comparison and out of interest. You may find you learn something along the way!
Duplicating a single record is simple, but we will be duplicating many hundreds of records. In addition, not only will these records be duplicated but they have related child records that also need to be duplicated.
An important factor of parent/child duplication is the child must be reassigned to the new parent ID. This point becomes quite significant when deciding upon your approach to record duplication.
As with all our articles, a lot of time and effort goes into our example files to help explain techniques used and to allow you to explore and try for yourself.
Please take time to download this as the article very closely follows the content of the example file — no email address required!
Our exploration of parent/child record duplication began with a first, simple approach; each subsequent approach grew out of it as we looked for ways to change it and hunt out optimisations in the process.
In the end we came up with 7 different approaches. We must warn you not all approaches are great. We left in the bad ones because you can still learn something from them, and hey — you may just come up with a better solution yourself — we always love feedback!
We ran each duplication test a series of 5 times and took the average duration of each test to be our result. Taking the average means we can try to eliminate minor server performance fluctuations. Tests were performed on a Mac Mini M1 machine running FileMaker Server 19.6.3 with no connected users.
The tests were run using Perform Script on Server and before each test we cleared out any previously duplicated test data.
We ran ten variations per approach. We incremented the number of parent records being duplicated by 100, for a total of ten tests ranging from 100 to 1,000 parent records.
Finally, each parent record duplicated had about 6 child records in each of two different child tables, for a total of 12 child records overall. All child records were duplicated and assigned their new parent ID.
Our first approach is the simplest approach. It involves looping through parent records. Looping is itself an entire subject worth delving into (click here for an earlier article on this topic).
The above is the loop method used in this approach. A few aspects are important to note:
This approach is called looping from end because we start at the end of the found set, working backwards.
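As script steps, the loop-from-end pattern might look something like the following. This is a sketch only — table and field names (Parent::id) are assumptions, and the example file is the authoritative version:

```
Go to Record/Request/Page [ Last ]
Loop
    Set Variable [ $oldParentID ; Value: Parent::id ]
    Duplicate Record/Request               # duplicate is appended to the end of the found set and becomes active
    Set Variable [ $newParentID ; Value: Parent::id ]
    # ...duplicate related child records here, assigning them $newParentID...
    Omit Record                            # omit the duplicate; the original (now last) becomes active
    Omit Record                            # omit the original we just processed
    Exit Loop If [ Get ( FoundCount ) = 0 ]
End Loop
```

Because each processed record is omitted, the loop simply exits when the found set is empty.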
For each parent record duplicated in this fashion we repeat the exact same procedure for any child records.
Child records are found by first navigating to the child layout and performing a find for any child records that are related to the parent. If any records are found they are duplicated using looping from end.
A benefit of this approach is that the script is very simple to follow and understand for other developers as well as easy to write. It is also easy to add in extra child tables for duplication if required in future.
A downside to this approach is that the resulting duplicated records are in reverse creation order to that of the originals. When we loop from the end we are duplicating the newest record first and working back to the oldest.
This may not seem like an issue, but if you are duplicating records where you expect them to be presented to the user in their original creation order, then this will not work unless you sort them. Sometimes you simply don't wish to sort records, and if that is the case this approach may not be for you.
Another often overlooked issue with reverse looping is how FileMaker pre-fetches records from server. When you are looking at a layout that has at least one field on it, FileMaker will download from the server the next set of records in the found set. When on Form View this is the next 25 records. For List View it is the visible records plus 50 more, and the same rule applies for Table View.
FileMaker does this to allow for smooth scrolling through records as it makes the assumption you will be doing just that. This only works forwards however, not backwards.
Consider starting at the end of a found set. No pre-fetching occurs because there are no records ahead in the found set. Now when you navigate back 1 record FileMaker must query and download this from the server as it has not been pre-fetched. Repeating this backwards through a found set requires an individual query and download of each record to occur.
To help combat the reverse creation order issue presented in the first approach as well as pre-fetching, a small change can be made to simply loop from the start of the found set rather than the end.
That's it. The only real change from approach 1. The benefit now is we do get our duplicates in the same creation order as the originals. We also utilise pre-fetching as FileMaker will give us at least the next 25 records in the found set pre-fetched to reduce the number of server queries for record data required. This fact also means doing a loop from Start in list or table view is preferable as more records will be pre-fetched.
Not necessarily a bad thing but it can be noted that we now require some extra record navigation to duplicate records. Duplicate records are added to the end of the found set, so after we omit the duplicate, we have to jump back to the first record to omit that. With Approach 1 this was not required.
In the grand scheme of things this record navigation is probably going to have a negligible impact.
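The loop-from-start variation, including the extra navigation just described, might be sketched like this (again, names are assumptions — refer to the example file):

```
Go to Record/Request/Page [ First ]
Loop
    Set Variable [ $oldParentID ; Value: Parent::id ]
    Duplicate Record/Request               # duplicate appended to the end of the found set, becomes active
    Set Variable [ $newParentID ; Value: Parent::id ]
    # ...duplicate related child records here...
    Omit Record                            # omit the duplicate
    Go to Record/Request/Page [ First ]    # the extra navigation this approach requires
    Omit Record                            # omit the original; the next original becomes first
    Exit Loop If [ Get ( FoundCount ) = 0 ]
End Loop
```

Working forwards like this means FileMaker's pre-fetching is on our side: the next records in the found set are already downloaded by the time we reach them.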
The first two looping approaches are simple to follow and robust. However keen eyes might have spotted in the example file that there is quite a lot of finding occurring as well as layout switching.
For each parent record we navigate to both child layouts and perform a find. We duplicate any found records before switching back to the parent layout and continuing.
That's a lot of layout switching! If duplicating 100 parent records, this means upwards of 300 layout switches and 200 finds!
With this in mind the next approaches aim to eliminate the need for these finds and switches as much as possible.
Relationships can be created between parent & child and these relationships can be used to test whether any child records exist. By checking first we can eliminate the need to switch layout and search for child records.
Furthermore, the relationship itself can act as a means to navigate to the child records for duplication rather than doing a find.
This approach makes use of this fact by first testing child relationships for records and using the Go to Related Record script step to navigate to the child records for duplication.
Duplication of a found set is still achieved through looping using Approach 2.
The significant benefit is that we eliminate needless layout switching and finding by testing the relationship instead and using Go to Related Record for navigating to the child records.
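Inside the parent loop, the layout switch and find can be replaced with a relationship test and a Go to Related Record step. A sketch, where the relationship name "Parent_CHILD" and layout names are assumptions:

```
If [ not IsEmpty ( Parent_CHILD::id ) ]    # any related child records for this parent?
    Go to Related Record [ Show only related records ; From table: "Parent_CHILD" ; Using layout: "Child" ]
    # ...duplicate the child found set with the loop-from-start method,
    #    setting each duplicate's Child::parentID to $newParentID...
    Go to Layout [ original layout ]
End If
```

When the relationship is empty, no navigation happens at all — the whole find/switch round trip is skipped for childless parents.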
Whether this is a con or not is all in the eye of the beholder. This method does require you to establish a relationship from your parent context to your child tables. If you've somehow managed to build a FileMaker database without relationships (shame on you!) then you could see this as being a nuisance but realistically most solutions benefit from having parent/child relationships.
At this point we were pretty happy with Approach 3. However there was still some potential here for improvement. So what in the approaches could still be optimised?
We looked to the Go to Related Record script step. In the previous approach we used this step to duplicate child records for each parent record. If we were duplicating 1,000 parent records then at worst that means 1,000 Go to Related Record steps per child table.
What if we could do the same thing but just with a single Go to Related Record step?
Well the step does give us the ability to navigate to child records for all records in the found set - can we use this?
The plan was as follows: duplicate all parent records first, then use a single Go to Related Record step to bring up every child of the original parents in one found set, and duplicate that entire child found set in one pass.
Sounds great! There is however just one problem...
Child records must be assigned the new parent ID. When using Approaches 1-3 this is simple because we're just storing the new parent ID into a variable and setting it onto the children.
However, with this modified approach we don't know what the new parent ID is for a given child record. In order to know this we have to create a mapping between the original parent ID and its duplicate ID.
If we can establish a mapping structure then when we duplicate children we can consult the mapping to derive the new parent ID by looking up the old one.
Approach 4 is the first mapping approach and uses lists to map original parent IDs to their new duplicate IDs. These lists are maintained in local variables.
When it comes time to duplicate child records, we take the existing parent ID on the child record and use it to look up its position in the list of original parent IDs. This position can then be used to look up the corresponding parent ID from the new duplicate list.
A benefit of this is we only require a single Go to Related Record script step to find and duplicate children records rather than one per parent record.
Creating the mapping is pretty simple and we are just using the List function to compile lists of old and duplicate IDs as we go.
Having to maintain the mapping is itself a negative - the reasons why will become apparent later when we discuss results.
The overhead required to generate the mapping and to retrieve the new parent ID from this mapping is quite expensive and at high record duplication numbers will just completely negate any performance gained by doing a single GTRR step.
It's an elegant approach overall and one that does work well for low record duplication counts - but once you hit a certain point the cost outweighs the benefit.
Because of the list method's poor performance — owing to the cost of creating, managing, and retrieving parent IDs — we looked instead to JSON as our mapping mechanism.
JSON is great at storing information and in this case we are going to use simple key/value pairs to be the mapping between old and new parent IDs.
We simply add a mapping to the JSON using the following:
JSONSetElement ( theJSON ; parentID ; duplicateID ; JSONString )
When needing to retrieve the duplicate ID we simply use:
JSONGetElement ( theJSON ; parentID )
You could consider the use of JSON a more robust method of mapping parent/duplicate IDs but that's about it!
JSON functions in FileMaker are terrible! Their performance at large sizes is just garbage. Again, at low duplicate numbers it's acceptable, but for anything over a couple hundred the performance of these functions just ruins any performance gain.
The mapping approaches in 4 and 5 in theory should be faster than layout switching all the time but the cost of having to establish a mapping between parent and duplicate IDs is just far too great. Until a better and much more efficient method of mapping can be found, these approaches only work well at low duplicate counts.
So what else is there to try?
The Import Records script step is another way in which records can be created in a table. What if we exported all required records to disk and then imported them straight back in to the same table?
This approach negates the need for any looping necessary for record creation.
We export a found set and import it back in as the new set of duplicates.
However we do still suffer from the issue of having to establish a mapping and use this to assign new parent IDs to the child records. Once we establish a mapping then they can be assigned using the "Replace Field Contents" script step as opposed to a loop.
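The export/import cycle plus the child reassignment might be sketched as follows. The temporary path, file format and mapping retrieval are assumptions — the example file is the real reference:

```
Set Variable [ $path ; Value: Get ( TemporaryPath ) & "parents.csv" ]
Export Records [ With dialog: Off ; "$path" ]      # export the found set of parent records
Import Records [ With dialog: Off ; "$path" ; Add ]  # import them straight back in as the duplicates

# assign new parent IDs to the duplicated children in a single pass:
Replace Field Contents [ Child::parentID ; Replace with calculation:
    /* look up the child's old parent ID in the mapping and return the new one */ ]
```

Replace Field Contents does the reassignment across the whole child found set without a loop.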
There are no loops in this approach as the import is the process of creating duplicates.
The Export Records script step is not very server friendly because it cannot be used to export a FileMaker file. Nor can you import from a FileMaker file on server.
This means if the process is to be run on server then things like container fields could not be duplicated (unless you did something like base64 encode them as text for the export but that would negate performance).
This technique is best suited for smaller text only record structures.
The key issue with our approaches is the performance of mapping old/new IDs when it comes to using GTRR or Export/Import.
Mapping performance seems to be slow at large duplication sizes for two reasons: the mapping must be rebuilt on every iteration as it grows (the List function recompiles the whole list each time), and every child record then requires a search through that entire mapping to retrieve its new parent ID.
In thinking about this we came up with a variation on Approach 4 which attempts to eliminate the need for using the List function at each step of duplication. Rather than add one item to the list during each iteration, we instead compile the list before and after duplication.
This final approach makes use of the List Of summary function type.
To obtain the original list of parent IDs we just store the summary fields value into a variable.
Next, we need to duplicate the parent records such that at the end of this process we are on a found set just of the new duplicate records.
This is fairly easily done by a small tweak to the looping process.
We do this in a new window to preserve our original found set for later. Next we store the very first duplicate's ID into a variable - this will act as our stopping point in the loop.
As we duplicate we omit the parent only from the start of the found set, and we stop once we reach that very first duplicate record created.
Now we're on a found set of duplicates and we can again use the summary field to return the list of duplicate IDs.
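One way the tweaked loop might look is sketched below. The summary field (Parent::sListOfIDs, defined as List of Parent::id) and other names are assumptions — see the example file for the real implementation:

```
Set Variable [ $oldIDs ; Value: Parent::sListOfIDs ]   # summary field: list of original parent IDs
New Window [ ]                                         # preserve the original found set
Go to Record/Request/Page [ First ]
Duplicate Record/Request
Set Variable [ $stopID ; Value: Parent::id ]           # the very first duplicate marks the stop point
Go to Record/Request/Page [ First ]
Omit Record                                            # drop the original we just duplicated
Loop
    Exit Loop If [ Parent::id = $stopID ]              # first record is now the first duplicate: done
    Duplicate Record/Request                           # duplicate appended to end of found set
    Go to Record/Request/Page [ First ]
    Omit Record                                        # omit the original from the front
End Loop
Set Variable [ $newIDs ; Value: Parent::sListOfIDs ]   # summary now covers only the duplicates
```

Because originals are omitted from the front while duplicates accumulate at the end, the moment the first record is the stop record the found set contains nothing but duplicates, in creation order.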
We do gain a little performance here using this method over the List function.
Still, at high record counts it's not great, most likely due to the method of ID retrieval from the mapping.
This article took so long to write because at every point in the process we kept finding new ways to optimise and tweak approaches. Tests had to be rewritten and re-run, and without drawing a line in the sand the whole thing would have kept going forever.
We invite any readers to attempt this approach and let us know in the comments how you go and how your performance compares to the other methods.
Some other ideas were explored as well.
Also if you have any other ideas around optimisations we'd love to hear them.
So with all that said and done, here are our final results for the performance tests (times are in milliseconds):
[Results table from the example file: durations in milliseconds for Tests 1-7 (Approaches 1-7) at 100 to 1,000 parent records]
What we learnt could be summarised as follows:
Realistically speaking the need for massive amounts of record duplication is low and infrequent. The intention of this article is more to give you an understanding of how different approaches can be employed to solve a single seemingly simple situation. Every step used can be considered and may have an impact on performance. Not all techniques are linear - some start off good but get progressively worse compared to others as record sizes increase. Never assume a technique is going to scale linearly.
A final word – please check out our example file :)