Eclipse mde: October 2011

During the sponsored work on EMF Compare, we have taken some time to measure the performance of EMF Compare and think about possible improvement axis. We meant to profile both the time needed to compare two models and the overall memory footprint of a comparison, this has been achieved through the use of the Yourkit Java profiler.

The first measures we took highlighted such a huge kludge that we took the time to better the comparison algorithms before taking any further action. Through the use of google's Guava and some rethinking, we improved the comparison time on big model comparison a great deal, dividing the total comparison time in half!

Here is a sample of the time it takes now to compare your models with the latest builds of EMF Compare :

Structure of the sample models used in these tests ("fragments" are the number of fragmented files, the rest are UML model elements contained by the samples) :


	Small	Nominal	Large
Fragments	99	399	947
Packages	97	389	880
Classes	140	578	2169
Primitive Types	581	5370	17152
Data Types	599	5781	18637
State Machines	55	209	1311
States	202	765	10156
Dependencies	235	2522	8681
Transitions	798	3106	49805
Operations	1183	5903	46029

Time and memory used required to compare each of the model sizes (the model is copied, randomly modified, then the copy is compared with its original) :


	Small	Nominal	Large
Time (seconds)	6	22	125
maximum Heap (Mo)	555	1019	2100

We initially thought that the time was mostly spent matching the elements together. It turns out that the real bottleneck of a comparison is the differencing phase (we know the two elements "match", now we need to know if, and how they differ). Just goes to show you that a profiler is mandatory in order to really know why your product might be slow ;).

Future development

These profiling sessions made clear the problems of EMF Compare on large models. We can properly handle "medium" models : EMF Compare is fast and its GUI can react at reasonable speed. However we cannot scale this behavior to large models : memory management is somewhat inefficient (more than 2Go of heap space to compare two models that both "weigh" 50Mo on disk...), the GUI is sluggish... We did isolate a number of potential improvements though. Stay tuned!

Note : the report is online and can be retrieved from here for those interested.

Eclipse mde

Thursday, October 27, 2011

EMF Compare scalability