I'll describe how I fixed an insanely slow algorithm. In this part I'll cover the analysis phase.
First of all: I wrote the original algorithm, so when I complain about how stupid it was I'm only blaming myself. It was a mistake on my part and now I've finally gotten around to fixing it. I'm a bit ashamed of the whole thing, but at least I have learnt a few lessons, which I will come back to later.
This post got quite long. If you are not interested in the background of the problem, there are more tips at the end of the post, and part 2 will contain much more technical information. [Skip the background](#tldr)
I can't share much code here because it isn't open source and I'm not the owner. I will try to provide the most relevant examples and some diagrams explaining the problem, though.
The application
You can find the application at [http://eventor.orientering.se/Sverigelistan](http://eventor.orientering.se/Sverigelistan). It's a ranking list for all orienteers in Sweden. Don't know what orienteering is? Here is a 3-minute video from one of the best runners in the world.
The ranking list is part of a larger IT system and integrated as a module, which means there are some constraints on what I can access and how I access it. Actually, it's three different systems:
- My part, Sverigelistan (sverige = Sweden and listan = the list).
- The rest of the website, called Eventor, which shows all Swedish competitions and the interfaces for managing entries to them.
- A backend Java server, called OLA, which stores all the competition data.
Eventor and Sverigelistan can (and shall) only retrieve these competitions through a REST API against the server. Of course we can request new API methods, but generally we'd better stick with what we've got.
Your ranking is the average of your 6 best scores from the last year. There is a list for all disciplines combined and special lists for each discipline, and for each of these (I call them modules) the lists are divided by age to match our class system. There are 6 modules and each module has about 30 lists. In total there are around 30 000 runners in the database, and each person exists twice in each module, so we have around 360 000 list items. What's worse from a performance standpoint is that there are 600 000 results, and that number grows by about 250 000 each year.
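To make the scoring rule concrete, here is a small LINQ sketch of how such a ranking could be computed. The Result type and property names are my own illustrative assumptions, not the actual Sverigelistan model.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative model only, not the real Sverigelistan entities.
public class Result
{
    public DateTime Date { get; set; }
    public double Score { get; set; }
}

public static class RankingCalculator
{
    // The ranking is the average of the runner's 6 best scores from the last 12 months.
    public static double CalculateRanking(IEnumerable<Result> results, DateTime today)
    {
        return results
            .Where(r => r.Date > today.AddYears(-1)) // only results from the last year
            .OrderByDescending(r => r.Score)         // best scores first
            .Take(6)                                 // keep at most the 6 best
            .Select(r => r.Score)
            .DefaultIfEmpty(0)                       // a runner without recent results scores 0
            .Average();
    }
}
```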
The problematic algorithm
At the core of Sverigelistan is a nightly recalculation of the scores and the generation of ranking lists from them. The process is fairly complex; you can see an overview in the diagrams below.
**The observant reader notices that there is no save here**! That's because, when I was optimizing, I realized I have no clue where it saves. I used NHibernate and, to be honest, I never really managed to fully understand NHibernate the way I understand EF. Had this project used MS SQL Server I would probably have been able to figure it out using SQL Profiler, which I love. But it uses MySQL. I'm sure MySQL has great tools too; I just haven't found any for profiling.
The problems
The performance sucked! Having to wait 20 minutes to test a single run really hurts productivity. Because it only runs once a day it didn't seem to matter much at first, and in production it doesn't. But the cost of this problem is huge considering the time it steals from me during development. I can't even count the times I screamed in agony when there was an exception 10 minutes into the algorithm and I wasn't running the debugger. Rebuild with the debugger, wait 10 minutes without completely losing focus and then try to find the problem. Fix the problem. Build. Wait 10 minutes to see if the fix worked...
The other problem was that even though the algorithm was supposed to add missing persons to the database, it didn't. At least it didn't on the server; running on my computer with the exact same databases, it did save these persons. There were other things that didn't update properly as well. This is what actually made me rebuild the algorithm: there were bugs I just didn't have the knowledge to fix. I knew the code to add these persons ran, but they just weren't saved to the database. And because there were no explicit save points I couldn't debug it.
The first steps: Measure, measure, measure
**Note: In this case I'm only interested in execution time. In many cases memory and clock cycles are almost as interesting. For example, if you halve the execution time of an algorithm by running it on two cores you have probably used up more clock cycles, and if it runs on a server you might hurt scalability.**
How to measure? There are many ways. Personally I'm a strong proponent of MiniProfiler by StackExchange when doing web work; the first thing I do when I start a project is to add it. It's much easier to continuously monitor performance degradation, so you see when you cause a problem, than it is to backtrack later on.
There are also profilers that can measure every call in your code automatically, but I find that these sometimes aren't very accurate because they fail to compensate for their own overhead. They also look at too much detail; I'm more interested in the overall problems.
In this case I couldn't use MiniProfiler, so I decided to build a dead simple variant with similar functionality for myself. NOTE: My solution isn't thread-safe! Normally you can't use something like that for websites, but here it only profiles a very specific piece of code, and that code is never run in parallel.
See the code for this simple profiler at gist: [https://gist.github.com/3438763](https://gist.github.com/3438763)
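If you don't want to open the gist, here is a minimal sketch of the idea (simplified for this post, so not the gist's exact code): one stopwatch for the total time and one per step, logging a line every time a step finishes.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Dead simple step profiler: logs time-from-start and per-step duration.
// Not thread-safe, which is fine here since the algorithm never runs in parallel.
public class SimpleProfiler
{
    private readonly Stopwatch total = Stopwatch.StartNew();
    private readonly Stopwatch step = Stopwatch.StartNew();
    private readonly List<string> log = new List<string>();

    // Call when a named step finishes; restarts the per-step timer.
    public void Step(string name)
    {
        log.Add(string.Format("{0,8} ms  {1,-35} {2,8} ms",
            total.ElapsedMilliseconds, name, step.ElapsedMilliseconds));
        step.Restart();
    }

    public string Report()
    {
        return string.Join(Environment.NewLine, log);
    }
}
```

Usage is just a `profiler.Step("Get persons")` call after each phase and a dump of `Report()` at the end.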
My measurements (first column = time from start, last column = duration of that part), all times in ms:

| Time from start (ms) | Step | Duration (ms) |
| --- | --- | --- |
| 680 | Get persons | 1 193 |
| 1 873 | Get tickets | 724 |
| 2 598 | UpdateEvents | 127 149 |
| 129 747 | Reload persons | 5 552 |
| 135 301 | Get events to recalculate | 5 256 |
| 140 558 | Calculate scores | 262 414 |
| 403 033 | Run module:GeneralRankingList | 54 268 |
| 457 301 | Run module:SprintList | 115 875 |
| 573 176 | Run module:MiddleList | 97 699 |
| 670 875 | Run module:LongList | 102 987 |
| 773 862 | Run module:NightList | 93 776 |
| 867 639 | Run module:FilterList | 98 987 |
| 991 261 | Delete unused persons | 0 |

Total time: 991 s, or 16 min 31 s.
Before I look at the measurements I usually make a quick estimate of which parts will take the longest and what a reasonable time for each would be. I find this highly valuable because it allows me to get a sense of what each part needs to do. When I later compare my estimate with the measured results I can start by focusing on the inconsistencies between the two. For example, in this case "Calculate scores" is computationally heavy, but .NET can crunch numbers really fast, so 262 s seems very odd. I would have guessed less than 30 s.
Most likely there is something going on in that part of the code that wasn't my intention. It could be lazy loading, a query returning too much, using the wrong data structures, or maybe working with an IEnumerable that gets reevaluated. There are plenty of possible pitfalls. Later I will talk about how I go about finding out which problem(s) I have.
The suspects and how we can nail them
There are plenty of problems besides the ones I will go into here, but these are the three I find most common.
N + 1 problems
This one is easy if you have the correct tools. SQL Profiler, NHibernate Profiler, EF Profiler and even MiniProfiler will all show you this right away, BUT only if you use them. Sometimes we can't use them, and then an N + 1 problem caused by lazy loading can be fairly tricky to find. One trick is to kill the connection to the database once you have loaded all the stuff you think you need. Run with the debugger and you will see an exception thrown where the lazy loading occurs.
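As a sketch of the kill-the-connection trick (here with an EF DbContext; RankingContext and the entity names are made up for illustration):

```csharp
List<Person> persons;
using (var context = new RankingContext()) // hypothetical DbContext
{
    // Load everything we believe we need while the connection is still alive.
    persons = context.Persons.Include("Results").ToList();
} // Disposing the context is our way of "killing the connection".

// If anything below still triggers lazy loading, EF throws an exception
// (ObjectDisposedException) at the exact line that caused the hidden query.
foreach (var person in persons)
{
    Console.WriteLine(person.Results.Count);
}
```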
Using the wrong data structures
Using the wrong data structure usually causes O(higher than it should be) problems. The fast way to check for this is to change the problem size. For example, if I halved my input and saw that some of the times dropped to ¼ or maybe ⅛ of the original (not shrinking linearly with the problem size), it would be fairly likely that I have this problem in at least some of those parts. Then I would profile them more granularly until I pinpoint the problem.
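A typical example (with made-up `persons` and `results` collections): looking items up with a linear scan inside a loop is O(n · m), while a dictionary keyed on id makes each lookup O(1).

```csharp
// Slow: First() scans the whole person list for every single result.
foreach (var result in results)
{
    var person = persons.First(p => p.Id == result.PersonId);
}

// Fast: build a dictionary once, then every lookup is a hash lookup.
var personById = persons.ToDictionary(p => p.Id);
foreach (var result in results)
{
    var person = personById[result.PersonId];
}
```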
Reevaluation of IEnumerable
This is a tricky one. Basically the problem occurs when a method returns an IEnumerable. The execution of an IEnumerable is deferred so you can add more .Where, .Select etc. to it. Think of the IEnumerable as a specification of what's going to be done, being passed around. The IEnumerable is executed when we use it in a foreach or when we call .First(), .ToList(), .Count() or similar methods.
Code like this is the problem:
var persons = GetAllPersons(); // Returns IEnumerable<Person>
var women = persons.Where(p => p.Gender == Gender.Female).ToList();
var men = persons.Where(p => p.Gender == Gender.Male).ToList();
This will actually call GetAllPersons **twice**! If GetAllPersons is expensive, this is a real problem. It can be even worse though:
var persons = GetAllPersons(); // Returns IEnumerable<Person>
foreach (var result in results)
{
    // Every call to First re-enumerates persons, re-executing GetAllPersons.
    var person = persons.First(p => p.Id == result.PersonId);
}
If GetAllPersons calls the database, this is easy to find by using a profiler that can show the SQL calls. Otherwise I don't have a good way to find these. It helps to be aware of the issue, though, and if you suspect this problem you can place a breakpoint inside GetAllPersons to see if it gets hit often, or just call ToList or something similar on it and see if the code runs faster. For example, to fix the last example we would write:
var persons = GetAllPersons().ToList(); // Materialized once into a List<Person>
foreach (var result in results)
{
    // persons is now a list in memory, so this no longer re-runs GetAllPersons.
    var person = persons.First(p => p.Id == result.PersonId);
}
So what was wrong in my code?
Short answer: Everything! (Besides those IEnumerables, which I fixed long ago.)
The long answer is that I immediately suspected a lazy loading problem when I saw "Get persons" only taking around 1 s according to my measurements. The code was:
/// <summary>
/// Gets all persons. For every person that doesn't exist, one is created. Flushes the session to make sure all new persons are inserted.
/// The persons eagerly load the following path: "Results.RankingEventClass.RankingEvent".
/// </summary>
public virtual IEnumerable<Person> GetAllPersonsEagerlyLoaded()
{
    return this.personRepository.GetQuery("Results.RankingEventClass.RankingEvent").ToList().Distinct();
}
So I thought I had told it to eager load all results for all persons, along with the class and the event each result belonged to. But the measurements show there is no way that is what happened. This tweet sums up why 1 second isn't realistic:
When your ORM loads 27k entities joined three levels deep against 600k, 50k and 2k entities in 4s you should suspect cheating! #lazyloadwoes
— Mikael Eliasson (@MikaelEliasson) [August 17, 2012](https://twitter.com/MikaelEliasson/status/236452983116816385)
So the conclusion was that my eager loading didn't work as I expected, which probably meant I was accessing lazy loaded collections all throughout this algorithm.
This made me realize that my diagram above was a faulty view of my algorithm. Here is an updated view:
I can summarize it in one sentence too. **I have no clue whatsoever when my code is hitting the database and what calls it makes!**
This is priority one to fix! I knew saving would be slow because NHibernate wouldn't batch the calls but rather make an insane number of database calls, but obviously it's also making an insane number of calls just to read the data.
Because this post is already longer than I think most people will care to read I’m going to write about how I fixed these problems in another post coming in a few days. If you have any questions or want me to explain something more carefully, please make a comment and I will either respond in the comments or include it in the next post.
Lessons from this part
Measure right away and make sure you have the tools!
I would never have messed up this badly with my regular stack, where I have tools that reveal these problems. In my case those are MiniProfiler and SQL Profiler.
Don’t assume the code actually does what you think!
At least not until you have verified it. This mainly applies when you are using frameworks outside your control. It would have been dead easy for me to verify that NHibernate actually eager loaded that query. But I didn't. Unless you are much smarter than I am, you'd better keep in mind that things might not always work as expected.
When the complexity of the algorithm grows, favor simplicity and predictability
Simplicity is always good, but even more so when the code gets hard to overview. In unit tests there is the pattern Arrange -> Act -> Assert. Simple code should follow pretty much the same pattern: Load data -> Process data -> Save or return the processed data.
If we can make the code follow that pattern it’s much clearer where we should look for problems.
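A skeleton of what that shape might look like in this application (all method names here are purely illustrative, not the real code):

```csharp
public void RecalculateRankings()
{
    // Load: all database reads happen here, up front and explicitly.
    var persons = LoadPersonsWithResults();
    var events = LoadEventsToRecalculate();

    // Process: pure in-memory work, no hidden database calls.
    var scores = CalculateScores(persons, events);
    var rankingLists = BuildRankingLists(scores);

    // Save: one explicit, easy-to-find write phase at the end.
    SaveRankingLists(rankingLists);
}
```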
Lazy Loading is hell and worse!
There is no reason whatsoever that I would want to lazy load data, and this really comes back to predictability: I want to know when my code is going to hit the database. In EF I always disable lazy loading and explicitly eager load the data I want.
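As a concrete sketch of that last point (RankingContext and the entity set are hypothetical, but the two EF calls are the standard ones):

```csharp
using (var context = new RankingContext()) // hypothetical DbContext
{
    // Never let the context hit the database behind my back.
    context.Configuration.LazyLoadingEnabled = false;

    // Explicitly eager load exactly the object graph the algorithm needs.
    var persons = context.Persons
        .Include("Results.RankingEventClass.RankingEvent")
        .ToList();
}
```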