The Search for Fast Aggregates - Trial & Error

By Daniel Wood, 13 August 2011

In the first of a two-part series, we begin the search for a faster way to perform aggregates in FileMaker. We cover why aggregates can be quite costly operations - especially over wide area networks - and use that as a jumping-off point for finding a faster method.

What are Aggregates?

An aggregate is a collection of items gathered together to form a total quantity. In FileMaker terms, the collection of items are records, and functions can be applied to the quantity to obtain useful information such as the sum, count, average, minimum and maximum to name a few.

Aggregates generally come in two varieties. First are summary fields. These are commonly used on sub-summary reports and are placed in the various layout parts to summarize records in parts of the report. Summary fields can also be used to summarize record values through a relationship/portal.

[Image: fastaggregates1 1]

The second type comes in the form of functions that perform the aggregate operation on their passed parameters. You can pass them a specified set of values as parameters, but a more common usage is to pass them a related field via a relationship. This performs the aggregate on that field's value from all related records, returning you the summarized result. Examples of these functions are List, Sum, Average, Min and Max, plus a couple of other less commonly used ones.

Sum ( Home to Contacts for All Contacts::Weight )
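
The other aggregate functions follow the same pattern on the same relationship, for example:

Average ( Home to Contacts for All Contacts::Weight )
Count ( Home to Contacts for All Contacts::Weight )
Max ( Home to Contacts for All Contacts::Weight )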

If you do enough development in FileMaker you will undoubtedly have come across both summary fields and aggregate functions and so are familiar with how they work.

So what's the problem with them?

Both summary fields and aggregate functions are hugely useful and beneficial features of FileMaker, and they run great on local databases. However, if you have ever tried to evaluate a summary field or aggregate function over a local or wide area network, you may have seen the following dialog from time to time:

[Image: fastaggregates1 2]

A real favorite among developers and clients. Why do we see this dialog more often than not on a remotely hosted solution, and why does it sometimes take longer to process than other times? The answer lies in what is actually happening when a summary field or aggregate function is evaluated.

The key point here is that the entire set of records being aggregated is sent from host to client (with the exception of container field data, summary fields and unstored calculations, which are not transferred).

To put it another way: if you want to find an aggregate of a single field in any given table, then every single record you aggregate is downloaded from the host. Want the total of a field from 10,000 records? Hope you have some hard disk space handy, because they're all heading your way! The second you reference anything on a record, that entire record is sent to you.

It can be summarized (ha) in a single sentence that Mark Richman gave in his DevCon session this year about improving WAN performance:

If you touch it, you own it

I can only assume the reason for this is that all of the processing to determine the result happens on the client side, and FileMaker doesn't do partial transfer of records - it's all or nothing.

The implications this has for Aggregate Speed

So, now that we have established that aggregate operations require the transfer of entire record sets from host to client, we can begin to identify some key factors in their performance:

  • Bandwidth speed, notably download speed
  • Latency between host and client
  • The number of records being aggregated
  • The size of the records being aggregated

All of these things impede the performance of aggregate functions. This article in particular focuses on the issue of entire records having to be sent from host to client. The performance hit is generally much worse the first time an aggregate is evaluated, as that is the point at which the records are sent. After that, aggregate functions can make use of the records cached on the client, so they generally perform much better. However, once you close and reopen the database, all of that cached data is lost, and you must re-download the records all over again to obtain aggregates.

Where oh where to begin??

This challenge has been somewhat of an obsession of mine for the past month. There have been numerous failed attempts to crack this issue and find a technique that would allow faster aggregates. The technique proposed is by no means perfect, so think of it more as an exercise in "can it be done", as we walk through the trial and error process of trying to find a faster method of doing aggregates.

The technique found and explained in this article does have some limitations which make it a poor candidate for replacing aggregate functions in some situations. However, there are other situations where it is well suited and will greatly outperform its summary field or aggregate function counterparts. Hopefully the remainder of this article not only gives you a possible technique you could apply, but also sparks some ideas of your own to develop, and illustrates some of the trial and error process behind looking for a technique :-)

So if everything we do in FileMaker sends us records, is there anything that doesn't?

This was the single most important question to answer to begin looking for a solution. The main bottleneck in aggregates is the downloading of records being aggregated. If we could somehow eliminate this aspect, it would give us a good chance of greatly improving aggregate speed.

There aren't a whole lot of things in FileMaker that send us only partial information about records. A couple that we identified were:

  • Certain functions, e.g. Get(RecordID) or Get(FoundCount)
  • Certain value lists

Get(FoundCount) was used in a previous article to carry out very fast count aggregate operations, but unfortunately it is of no use for the other aggregate types. Get(RecordID) is handy to know about inasmuch as obtaining it for a record will not send you the entire record. Perhaps not useful for aggregates, but there is potential there, and we are currently working with it to look for useful applications in the area of fast record sorting.
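
As a rough sketch of that fast-count idea (the find criterion below is a hypothetical example - see the linked article for the full technique), a script can count records via Get(FoundCount) without referencing any field contents on the client:

# find every record with a non-empty Weight (hypothetical criterion)
Enter Find Mode [ Pause: Off ]
Set Field [ Contacts::Weight ; "*" ]
Perform Find [ ]
Set Variable [ $count ; Value: Get ( FoundCount ) ]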

The last item is value lists.

A Value List you say?

Value lists can be set up in a few different ways: custom values, using a value list from another file, or obtaining a field value either from all records in a table or from a subset via a relationship.

In testing all of these possible value list configurations, we found one specific setup that outperforms the rest in terms of traffic sent between host and client: a field-based value list using all values - in other words, all records in the table.

[Image: fastaggregates1 3]

When the above configuration of a value list is used, record data from all records is not sent from host to client. Instead, it appears that a form of the field's index is sent, or perhaps just the field values for that specific field.

We are not entirely sure why, but we hypothesize that the reason is that no query is required from the client to obtain the value list from the host. FileMaker simply sees that the value list covers every record in the table, and decides it is faster to send the index for that field to the user rather than every single record (hopefully someone can explain exactly what is happening here).

In any event, the client is not sent entire record data, which is VERY interesting for us.

Okay, so now what?

Let's assume for now that our end goal is to obtain the sum of a field called Weight across every record in the table. Taking what we know now, the next logical step may be to build a value list of all values from the Weight field. We could then obtain the value list contents using the ValueListItems function. With a list of the weights from every record obtained, we could then figure out a nice way to sum each value in that list to obtain the overall sum. Sounds simple enough :-)
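
As a minimal sketch (assuming a value list named "All Weights" - a hypothetical name - defined in the current file), reading the list would look like:

ValueListItems ( Get ( FileName ) ; "All Weights" )  // returns a return-delimited list such as "100.5¶82¶63.2"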

Another roadblock: value lists don't play nice

The primary purpose of a value list is to assist in selecting data during data entry, perhaps via a drop-down list, pop-up menu, or checkboxes. Because of this, there is no real need for duplicate values in a value list. If two records contain the same value in a field, both values are not shown in the list; the value list simply combines them into a single entry.

This had major implications for devising this technique. If we have 10,000 records in our table, then we want our value list to contain values for every single record with a non-empty value in the field - duplicates included. It is no use to us if 100 records with the same weight end up as a single value in the value list, because when we go to add up all items in the list, we will not get the correct result. For example, if three records have weights of 80, 80 and 95, the value list contains just 80 and 95, and summing its items gives 175 instead of the correct 255.

The challenge now was to somehow modify the value list, so that we would obtain all values from the records, rather than just unique values.

[Image: fastaggregates1 4]

Hey, what about the above setup of a value list? We keep the first field as the field we want to aggregate, but we make use of the "show second field" option and set it to our primary key field, which we know to be unique. Surely this will force the value list to keep identical values in the Weight field as separate entries, because we know each value in Contact ID is unique, right?

WRONG!

As we discovered, as soon as you introduce a second display field into any value list - be it an "All Values" setup or a related-field setup - this triggers FileMaker's knee-jerk reaction of sending you every single record in its entirety.

Back to a single field in the value list - how do we make it unique?

So with that option dashed, we were left with the question of how to build this value list using only a single field. We know the field must contain the weight, as this is what we are going to aggregate. However, records with the same value have to be kept unique from one another.

The solution here was to build a new stored calculation field comprising both the weight and the Contact ID. If we could make this field unique and use it in the value list, then maybe there is a way at the other end to parse the Contact ID out, leaving just the weight values, which is what we are after.

Building the Value List calculation field

Ahh, another problem! The weight field is a number, but in our database the primary key field is also a number - a serial number. If we combine these two fields, how will we know which is which when it comes time to parse out the Contact ID later on?

We looked at a number of ways we could possibly do this. Some involved building recursive custom functions: we could insert the Contact ID into the calculation wrapped in special identifying characters, which we could then find later on and use to rip the Contact ID out. The problem with this is that recursion is itself a costly process, and the overhead of recursing through a huge list may in fact negate any performance benefit of the technique overall. A better way would be to find a native FileMaker function that is non-recursive and fast.
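
To illustrate the rejected idea (the "|" delimiter is hypothetical): wrapping the key in an identifying character makes a single value easy to parse, but cleaning an entire list still requires recursion:

// the calculation field would be easy enough to build:
Weight & "|" & Contact ID

// and parsing ONE value back out is simple:
Left ( value ; Position ( value ; "|" ; 1 ; 1 ) - 1 )

// but stripping the "|..." section from every line of a 10,000-value
// list needs a recursive custom function, and that overhead adds up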

For this particular situation, we realised that the weight field will never contain any non-numeric characters, and the same applies to the Contact ID. Perhaps we could simply turn the Contact ID into an alphabetical string rather than a numerical one.

For example, if we had a Weight of 100.5, and our contact ID was 1234, our calculation might begin looking something like:

Weight & Contact ID

which would give us:

100.51234

Not very nice. But what if we did a simple number-to-character translation and made it:

100.5BCDE

and then after we obtain the value list items, we could strip out the Contact ID section simply by using the Filter function:

Filter ( value ; "0123456789.-¶" )

This would leave us with just the numerical component, minus our Contact ID, whose only purpose was to make the entry in the value list unique. This is exactly what we did.
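
To make that concrete with our example value from above:

Filter ( "100.5BCDE" ; "0123456789.-¶" )  // returns "100.5"

Note that because ¶ is included in the filter text, the line breaks survive when Filter is applied to the entire return-delimited value list in one pass - no recursion required.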

The first step was to build a straightforward custom function to carry out the translation of the Contact ID into an alphabetical string:

function: numToText ( Value )

Substitute (
Value ;
[ "0" ; "A" ] ;
[ "1" ; "B" ] ;
[ "2" ; "C" ] ;
[ "3" ; "D" ] ;
[ "4" ; "E" ] ;
[ "5" ; "F" ] ;
[ "6" ; "G" ] ;
[ "7" ; "H" ] ;
[ "8" ; "I" ] ;
[ "9" ; "J" ]
)
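
A quick sanity check of the translation:

numToText ( 1234 )  // returns "BCDE", matching the example above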

Then, our value list calculation field looks something like:

Weight & numToText ( Contact ID )

We can now be sure that every record, given it has a unique Contact ID, will also have a unique result for the above calculation. This field provided the basis of our single-field "all values" value list!

[Image: fastaggregates1 5]

[Image: fastaggregates1 6]

Note that if your primary key is a combination of numeric and alphabetic characters, you will need to pay more attention when converting it into something non-numeric. You may need to translate the numeric components of the primary key into characters you know do not currently exist anywhere in your primary keys; that way you can be sure the conversion will not result in two records having the same converted value for this calculation.
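
As a purely hypothetical illustration: if your serials contained only digits and the letters A to F (hex-style), you could map the digits onto G through P instead - characters which never appear in a key - so uniqueness is preserved (the Filter step later strips all letters regardless):

Substitute (
Value ;
[ "0" ; "G" ] ;
[ "1" ; "H" ] ;
[ "2" ; "I" ] ;
[ "3" ; "J" ] ;
[ "4" ; "K" ] ;
[ "5" ; "L" ] ;
[ "6" ; "M" ] ;
[ "7" ; "N" ] ;
[ "8" ; "O" ] ;
[ "9" ; "P" ]
)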

A good time to summarize what we have done so far

This basically concludes the first part of this two-part series. So far we have identified that the slowness of aggregates in FileMaker comes down to a number of factors, the main one being that any aggregate operation on a set of records requires all of those records to be sent from host to client. Number of records, size of records, bandwidth and latency all tie into this to impede performance.

We then sought to identify something in FileMaker that would let us obtain the values from the records we wish to aggregate, without sending us the entire record contents. We found that value lists were a key candidate for this.

Of the value lists we tested, one particular configuration transmitted a much smaller amount of data between host and client when the value list was evaluated: a value list that uses a field value from a table and is set to use "All Values" from that table.

We identified an issue with this particular value list configuration: all common values across records are combined into a single value in the value list. Because our goal is to aggregate over all records, we needed each record in the table to be represented in the value list by its weight value, duplicates included.

We then tried to achieve this using a value list's "show second field" option, but found that enabling it meant FileMaker transmitted entire record data - something we are trying to avoid.

The only option left to us was to build a calculation that was unique for every record but contained the weight value we required. The calculation needed not only to be unique, but also fast and easy to parse back to just the weight value after the value list was evaluated.

To do this we built the calculation as a combination of the weight value and the record's primary key. To prevent a mix-up of the two fields, we first converted the primary key from a numerical value into a strictly alphabetical text string. This later allowed us to use the Filter function to easily strip out any non-numeric characters - i.e. the converted primary key - leaving us with the original weight values.
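
Putting the pieces together, the client-side retrieval could look something like the sketch below. The value list name "All Weights" is hypothetical, and the Evaluate/Substitute summing trick is just one possible way to total the list - part two covers the actual implementation and its results:

Let ( [
// fetch the list without transferring full records
~raw = ValueListItems ( Get ( FileName ) ; "All Weights" ) ;
// strip the letter-encoded Contact IDs, keeping digits, ".", "-" and returns
~numbers = Filter ( ~raw ; "0123456789.-¶" )
] ;
// sum by rewriting the list as "100.5+82+..." and evaluating it
// (fine for modest lists; very large lists may warrant another method)
Evaluate ( Substitute ( ~numbers ; ¶ ; "+" ) )
)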

In part two:

This topic turned out to be a bit too large to cram into a single article, so we have split it into two. This first part identified the issue and constructed a path toward a solution.

In part two, we are going to illustrate the technique in action, introduce and solve a couple more issues, and give some performance results across both a locally hosted solution and a wide area network.

Link to Part Two - Implementation & Results

Example File

Please find attached an example file. It was used for all the screenshots in this article, and is provided to help you fully understand what is going on and to let you experiment with the technique in FileMaker.

Download Example File With 10,000 Records (3.5mb)

Download Example File With 100,000 Records (36.4mb)


Comments

  • Nate 05/01/2013 7:58pm (11 years ago)

    Why not use Filemaker Server Advanced XML Grammar Set and send a url requesting the totals? It always takes no time at all, and alleviates the client device from processing the data, so it works fast even on an FMGO.

    I use php in FM12 Server advanced to do this. It works out okay.

  • Brad Stanford 12/10/2011 9:29am (12 years ago)

    At MightyData, we were talking about fast WAN performance, and I too, was looking for a way around the record-centric nature of FileMaker. My solution was to use Dropbox to communicate business logic (calling scripts) and to return data. This technique would be perfect for grabbing aggregates: http://tranquilitybasics.wordpress.com/2011/10/08/filemaker-pro-runtime-thin-client-using-dropbox/

  • HOnza 22/08/2011 10:35pm (13 years ago)

    Daniel, your research is deep as always. Great resource!

  • Daniel Wood 16/08/2011 8:51am (13 years ago)

    Interesting idea. I guess it would be possible only if all the clients kept the database open overnight. Then a script would have to be triggered on the client machines overnight which goes to each data-table layout and grabs all the records, perhaps by doing a sort or something. Of course inevitably what will happen is the user will close the database at some stage and ruin all that hard work :)

  • Doug Gardner 16/08/2011 8:47am (13 years ago)

    Hmmm... GetAsNumber ( "100.5BCDE" ) = 100.5, and of course GetAsNumber ("100.5BCDE¶") removes the return character. I was thinking of stripping out the junk after getting the value, but you're right that it won't work as a single step for getting a return delimited list. Anyway, the real bummer, as you point out in the article, is that we're heading toward a solution that works on the entire data set, but not a subset, so the rest is like rearranging the deck chairs on the Titanic. Since FileMaker returns really fast results once the data is local to the machine, I wonder whether it might be worth focussing on "pre-caching" by working with a local data set. If your reports didn't need to include the very latest data, could you move data down to the client during a non-busy time (like at night, or before work in the morning), then run reports against that data? It's a totally situation-dependent solution, but sometimes that's a lot easier way to get a workable result than coming up with a universal solution.

  • Daniel Wood 16/08/2011 8:21am (13 years ago)

    Hi Doug, thank you for the comment. I just ran a quick test using GetAsNumber, but found that applying it to the return-delimited list actually removes the return delimiters too! So a list of 123 and 234 would yield 123234. It's a good thought though, were you thinking of using the function at some other point of the process perhaps? Cheers!

  • Doug Gardner 16/08/2011 3:46am (13 years ago)

    Good article, Daniel. I think you could make it slightly faster using GetAsNumber on the resulting number-letter string, rather than the Filter function. The real speed boost, of course, comes from moving less data across the network. Thanks for sharing your results!
