Comparing Clones Revisited

Published 14 February 07 05:11 PM | andersnoras 

Update: Kjell-Sverre and Petter pointed out that I'd posted the wrong code. Further I was also made aware that Kjell-Sverre had posted an update to his original post. His updated version also accounts for issues with decimals and null strings.

One of my last posts to my previous blog was on using serialization to deep-clone objects. Kjell-Sverre posted a follow up to this on using a hash algorithm to compare two object graphs for equality. We’ve used Kjell-Sverre’s code in our project, but yesterday one of our developers asked if it would be more efficient to perform a bytewise comparison of the serialized streams, rather than using the MD5 cryptography provider. I reckoned that this was plausible, so I wrote a small test to verify the performance of the different approaches. The developer was right, doing a bytewise comparison was faster – but the performance difference depended much on the whether to two object graphs were equal.

Hash, cloned objects:        174809462 ticks, 48835 ms
Hash, different objects:      88705489 ticks, 24781 ms
Bytewise, cloned objects:     93802905 ticks, 26205 ms
Bytewise, different objects:  88564240 ticks, 24741 ms

The results above show the performance of comparing 500,000 complex object graphs for equality. If the two graphs are equal the bytewise comparison is much faster than the hashing algorithm, but the interesting thing is that there is virtually no performance overhead of using the crypto provider when the two graphs are different.
The bytewise equality algorithm is shown below, Kjell-Sverre’s hash-based algorithm can be found here.

if (obj1==null || obj2==null)
{
    return false
;
}
IFormatter
formatter = new BinaryFormatter();
Stream stream1 = new MemoryStream();
Stream stream2 = new MemoryStream();
formatter.Serialize(stream1, obj1);
formatter.Serialize(stream2, obj2);
if (stream1.Length != stream2.Length)
{
    return false;
}

byte[] buffer1 = stream1.GetBuffer();
byte[] buffer2 = stream2.GetBuffer();
for (int i = 0; i < buffer1.Length; i++)
{
    if (buffer1[i] != buffer2[i]) return false;
}
return true;

There were some reactions to my original post on using serialization to clone objects. I am fully aware that is far from the most efficient way of cloning objects – at least if you think about application performance. The technique is useful when you need developer performance – using a generic clone method enables you to clone an object without implementing the ICloneable interface and a custom deep-copy algorithm. If you find that the performance of the generic clone method is insufficient, you can always implement this at a later time.

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

No Comments

Leave a Comment

(required) 
(optional)
(required) 
Enter the code you see below