Optimizing Spotfire Over a Distributed Network

By May 3, 2016Analytics, Spotfire

I’ve been working on a project to create an analytical tool using Spotfire. My IT colleagues did a nice job creating a data model for me, and we worked through the bugs on loading the data into SQL Server. So the next step was to create the visualizations in Spotfire.

To my dismay I discovered load times of 30+ minutes for my 11 million row (and growing) dataset. That’s a decent size data set but it isn’t big by Big Data standards. Spotfire should eat that stuff up. Running an identical query closer to the metal in SQL Server Management Studio ran in about 4 minutes. Something was wrong. Here’s what I learned:

  • My support team suggested Data-On-Demand. This reduced load time but because I need around 3-4 million rows minimum I was still looking at 20 minute load times. Not good enough.
  • Next I tried switching from an in-memory Information Link to a in-database Data Connection. In the Spotfire Windows client this worked great, but there are catches. First, you lose Spotfire features. This seems fine until you try that one bloody thing that always worked for you and you can no longer do it. Annoying. The second and bigger catch was that I need my app to work in the Spotfire WebPlayer and the in-database links did not play nicely with our security system. We eventually found a solution to make this happen but if you work in an environment with certain security standards, what seems simple may not be.
  • While waiting for IT to investigate how to make in-database connections work with WebPlayer, I tried to optimize my Information Links. There isn’t a ton of configuration in the Information Link itself, but I discovered that my database and Spotfire server were in different cities and that led me to explore the configuration of the underlying DBConnection. I’m no DBA, so I used the Google and concluded that Fetch Size might be a critical parameter for me. I decided to experiment to see what would happen.
Fetch Size  Rows Retrieved  Time Elapsed

0 (let database decide)

3,780,604

32:05

10,000 (Spotfire default)

3,780,604

11:39

100,000

3,780,604

2:59

1,000,000

3,780,604

2:02

2,000,000

3,780,604

1:55

3,000,000

3,780,604

1:54

1,000,000

11,986,073

5:45

2,000,000

11,986,073

5:52

3,000,000

11,986,073

5:51

11,000,000

11,986,073

5:47

A pretty amazing difference. It’s not clear to me how effective this is if your Spotfire server and database are co-located but it seems to make a big difference when you have some distance between sites. I settled on a Fetch Size of 1,000,000 vs. the 10,000 default value.

  • Finally, after reading an article from Bear on Spotfire I thought I could reduce the amount of data I transmitted across the wire. Instead of one Information Link that joined together four tables into one big wide table, I decided that I could make it work with three separate Information Links sending me tall/skinny tables and then manipulating client-side instead of on the server. With this I was able to cut load times in half again, which meant that for a typical 3-4 MM row load, load times were approximately 1 minute.

The joys of business technology.