By Daniel Wood, 13 August 2011
In the first of a two part series, we begin the search for a faster way to perform aggregates in FileMaker. We cover why aggregates can be quite a costly operation - especially on wide area networks - and use that as a jumping off point for finding a faster method.
An aggregate is a collection of items gathered together to form a total quantity. In FileMaker terms, the collection of items are records, and functions can be applied to the quantity to obtain useful information such as the sum, count, average, minimum and maximum to name a few.
Aggregates generally come in two varieties. First are summary fields. These are commonly used on sub-summary reports and are placed in the various layout parts to summarize records in parts of the report. Summary fields can also be used to summarize record values through a relationship/portal.
The second type comes in the form of functions that perform the aggregate operation on their passed parameters. You can pass them a specific set of values as parameters, but a more common usage is to pass them a related field via a relationship. This performs the aggregate on that field's value from all related records, returning you the summarized result. Examples of these functions are List, Sum, Average, Min and Max, along with a couple of less commonly used ones.
Sum ( Home to Contacts for All Contacts::Weight )
If you do enough development in FileMaker you will undoubtedly have come across both summary fields and aggregate functions and so are familiar with how they work.
Both summary fields and aggregate functions are hugely useful and beneficial features of FileMaker, and they run great on local databases. However, if you have ever tried to evaluate a summary field or aggregate function over a local or wide area network, you may have seen the following dialog from time to time:
A real favorite among developers and clients. Why do we see this dialog more often than not on a remotely hosted solution, and why does it sometimes take longer to process than other times? The answer lies in what is actually happening when a summary field or aggregate function is evaluated.
The key point here is that the entire set of records being aggregated is sent from host to client (with the exception of container fields, summary fields and unstored calculations).
To put it another way: if you want to aggregate a single field in any given table, then every single record you aggregate is downloaded from the host. Want the total of a field across 10,000 records? You had better have some hard disk space handy, because they're all heading your way! The second you reference anything on a record, that entire record is sent to you.
It can be summarized (ha) in a single sentence that Mark Richman gave in his DevCon session this year on improving WAN performance:
If you touch it, you own it
I can only assume the reason for this is that all of the processing to determine the result happens on the client side, and FileMaker doesn't do partial transfer of records - it's all or nothing.
So, now that we have established that aggregate operations require the transfer of entire record sets from host to client, we can begin to identify some key factors in their performance:

- the number of records being aggregated
- the size of each record
- the bandwidth of the connection between host and client
- the latency of that connection
All of these things impede the performance of aggregate functions. This article focuses in particular on the issue of entire records having to be sent from host to client. The performance hit is generally much worse the first time an aggregate is evaluated, as this is the point at which the records are sent. After that, aggregate functions can make use of the records cached on the client, so they generally perform much better. However, once you close the database and reopen it, all of that cached data is lost, and you must re-download the records all over again to obtain aggregates.
This challenge has been somewhat of an obsession of mine for the past month, with numerous failed attempts to crack the issue and find a technique that would allow faster aggregates. The technique proposed here is by no means perfect - in fact it has some limitations that make it a poor candidate for replacing aggregate functions in certain situations. There are other situations, however, where it is well suited and greatly outperforms its summary field or aggregate function counterparts. Think of it as an exercise in "can it be done", as we walk through the trial and error process of looking for a faster way to do aggregates. Hopefully the remainder of this article not only gives you a possible technique you could apply, but also sparks some ideas of your own :-)
This was the single most important question to answer to begin looking for a solution. The main bottleneck in aggregates is the downloading of records being aggregated. If we could somehow eliminate this aspect, it would give us a good chance of greatly improving aggregate speed.
There aren't a whole lot of things in FileMaker that send us only partial information about records. A couple of those we identified were:

- the Get(FoundCount) and Get(RecordID) functions
- value lists
Get(FoundCount) was used in an earlier article to carry out very fast count aggregations, but unfortunately it is of no use for the other aggregate types. Get(RecordID) is handy to know about insofar as obtaining it for a record will not send you the entire record. Perhaps not useful for aggregates, but there is potential there, and we are currently working with it to look for useful applications in the area of fast record sorting.
The last item is value lists.
Value lists can be set up in a few different ways: using custom values, using a list from another file, or obtaining a field value either from all records in a table or from a subset via a relationship.
In testing all of these possible value list configurations, we found that one specific setup outperforms the rest in terms of traffic sent between host and client: a field-based value list using all values, in other words all records in the table.
When the above configuration of a value list is used, record data from all records is not sent from host to client. Instead, it appears that a form of the field's index is sent, or perhaps just the field values for that specific field.
We are not entirely sure why, but we hypothesize that no query is required from the client to obtain the value list from the host. FileMaker simply sees that the value list covers every record in the table, and decides it is faster to send the index for that field rather than every single record (hopefully someone can explain exactly what is happening here).
In any event, the client is not sent entire record data, which is VERY interesting for us.
Let's assume for now that our end goal is to obtain the sum of a field called Weight across every record in the table. If we take what we know now, the next logical step may be to build a value list of all values from the Weight field. We could then obtain the value list contents using the ValueListItems function. With a list of the weights from every record obtained, we could then figure out a nice way to sum each value in that list to obtain the overall sum, sounds simple enough :-)
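The plan above can be sketched outside FileMaker. Below is a minimal Python stand-in (the function name sum_value_list is ours, not FileMaker's) assuming the value list arrives as a return-separated string, the way ValueListItems returns it; here a newline plays the role of FileMaker's return character:

```python
def sum_value_list(items: str) -> float:
    """Sum every non-empty value in a return-separated value list."""
    # Split on the separator, skip empty entries, total the numbers.
    return sum(float(v) for v in items.split("\n") if v)

weights = "100.5\n72\n88.2"  # what ValueListItems might hand us
total = sum_value_list(weights)
```

This is exactly the "nice way to sum each value in that list" we need, assuming for the moment that every record's weight actually appears in the list.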
The primary purpose of a value list is to assist in selecting data for data entry, perhaps via a drop-down menu, popup, or checkboxes. Because of this, there is no real need for duplicate values in a value list. If two records contain the same value in a field, the list does not show it twice; the value list simply combines them into a single entry.
This had major implications for us in devising this technique. If we have 10,000 records in our table, then we want our value list to contain values for every single record with a non-empty value in the field - duplicates included. It is no use to us if 100 records with the same weight value end up as a single value in the value list, because when we go to add up all items in the value list, we are not going to get the correct result.
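A quick Python sketch shows why this deduplication breaks the technique: an "all values" value list behaves like a set of values, so records sharing a weight collapse into one entry and the naive total comes out wrong.

```python
weights = [80, 80, 80, 95]           # four records, three share a weight

true_total = sum(weights)            # the aggregate we actually want
value_list = sorted(set(weights))    # what a deduplicated value list gives us
naive_total = sum(value_list)        # wrong: the duplicates are gone
```

With these sample values, the true total is 335 but the value list only yields 80 and 95, so summing it gives 175.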
The challenge now was to somehow modify the value list, so that we would obtain all values from the records, rather than just unique values.
Hey, what about the above setup of a value list? We keep the first field as the field we want to aggregate, but we make use of the "show second field" option and set it to our primary key field, which we know to be unique. Surely this will force the value list to keep identical Weight values as separate entries, because each value in Contact ID is unique, right?
As we discovered, as soon as you introduce a second display field into any value list - be it an "All Values" setup or a related-field setup - this triggers FileMaker's knee-jerk reaction of sending you every single record in its entirety.
So with that option dashed, we were left with the question of how to build this value list using only a single field. We know the field must contain the weight value, as this is what we are aggregating, yet records with the same value must remain distinct from one another.
The solution was to build a new stored calculation field comprising both the weight and the Contact ID. If we could make this field unique and use it in the value list, then perhaps there would be a way at the other end to parse the Contact ID out, leaving just the weight values we are after.
Ahh, another problem! The Weight field is a number, but our primary key field is also a number - a serial number. If we combine the two, how will we know which is which when it comes time to parse out the Contact ID later on?
We looked at a number of ways to do this. Some involved building recursive custom functions: we could wrap the Contact ID in special identifying characters, which we could later find and use to rip the Contact ID out. The problem is that recursion is itself a costly process, and the overhead of recursing through a huge list might negate any performance benefit of the technique overall. A better way would be a native FileMaker function that is non-recursive and fast.
For this particular situation we realised that the Weight field will only ever contain numeric characters, and the same applies to the Contact ID. Perhaps we could simply turn the Contact ID into an alphabetical string rather than a numerical one.
For example, if we had a Weight of 100.5, and our contact ID was 1234, our calculation might begin looking something like:
Weight & Contact ID
which would give us:

100.51234

Not very nice - there is no way to tell where the weight ends and the Contact ID begins. But if we did a simple number-to-character translation, what if we made it:

100.5BCDE
and then after we obtain the value list items, we could strip out the Contact ID section simply by using the Filter function:
Filter ( value ; "0123456789.-¶" )
This would leave us with just the numerical component, minus our contact ID whose only purpose was to make the entry in the value list unique. This is exactly what we did.
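The stripping step is easy to model in Python. Here fm_filter is our stand-in for FileMaker's Filter function, which keeps only the characters listed in the second parameter and discards everything else:

```python
def fm_filter(text: str, keep: str) -> str:
    """Keep only the characters of `text` that appear in `keep`."""
    return "".join(c for c in text if c in keep)

# "100.5BCDE" is weight 100.5 plus the letter-encoded Contact ID 1234.
weight = fm_filter("100.5BCDE", "0123456789.-")
```

Running the filter over that sample entry strips the alphabetic suffix and leaves the string "100.5", ready to be summed.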
The first step was to build a straightforward custom function - numToText ( number ) - to carry out the translation of the Contact ID into an alphabetical string:

Substitute ( number ;
[ "0" ; "A" ] ;
[ "1" ; "B" ] ;
[ "2" ; "C" ] ;
[ "3" ; "D" ] ;
[ "4" ; "E" ] ;
[ "5" ; "F" ] ;
[ "6" ; "G" ] ;
[ "7" ; "H" ] ;
[ "8" ; "I" ] ;
[ "9" ; "J" ]
)
Then, our value list calculation field looks something like:
Weight & numToText ( Contact ID )
We can now be sure that every record, given its unique Contact ID, will also have a unique result for the above calculation. This field provided the basis of our single-field "all values" value list!
Note that if your primary key is a combination of numeric and alpha characters, you will need to pay more attention when converting it into something non-numeric. You may need to translate the numeric components of the primary key into characters that you know do not currently exist anywhere in your primary keys; that way, no two records can end up with the same converted value for this calculation.
This brings us to the end of the first part of this two-part series. So far we have identified that aggregates are slow in FileMaker for a number of reasons, the main one being that any aggregate operation on a set of records requires all of those records to be sent from host to client. Number of records, size of records, bandwidth and latency all tie into this to impede performance.
In the article we sought to identify something in FileMaker which will let us obtain values from records we wish to aggregate, but without sending us the entire record contents. We found that value lists were a key candidate for this.
Of the value lists we tested, one particular configuration transmitted a much smaller amount of data between host and client when evaluated: value lists that use a field value from a table and are set to use "All Values" from that table.
We identified an issue with this particular value list configuration in that all common values across records are combined into a single value in the value list. Because our goal is to aggregate from all records, we needed each record in the table to be represented in the value list with its weight value, respecting duplicates in the list.
We then tried to achieve this using a value list's "show value from second field" option, but found that enabling it meant FileMaker transmitted entire record data - the very thing we were trying to avoid.
The only option left to us was to build a calculation that was unique for every record, but contained the weight value we required. We needed to construct the calculation not only to be unique, but to be fast and easy for us to obtain just the weight value after the value list was evaluated.
To do this we built the calculation as a combination of the weight value and the record's primary key field. To avoid confusing the two, we first converted the primary key from a numerical value into a strictly alphabetical text string. This later allowed us to use the Filter function to easily strip out all non-numeric characters - i.e. the primary key - leaving us with the original weight values.
This topic turned out to be a bit too large to cram into a single article, so we have split it into two. This first part identified the issue and constructed a path toward a solution.
In part two, we are going to illustrate the technique in action, introduce and solve a couple more issues, and give some performance results across both a locally hosted solution and a wide area network.
Please find attached an example file. This file was used for all the screenshots in this article, and is provided to help you fully understand what is going on in this article, and to let you experiment in FileMaker with this solution.