A few months ago I described how I handled the encoding bug of the WebBrowser control in WP7. Unfortunately, the problem affected not just the “smart quotes” but all non-Western characters, and the naïve solution caused a performance problem. Let’s see how I solved it.

The problem

I needed a more general-purpose character-replacement procedure, like the second one I mentioned in the previous post:

foreach (char value in strText)
{
    int decValue = int.Parse(string.Format("{0:x4}", (int)value), 
        System.Globalization.NumberStyles.HexNumber);
    txtUnicode = txtUnicode + "&#" + decValue+ ";";
}

But this had two problems (one “cosmetic”, the other a more serious performance issue):

  1. If the post is in a Western language, it replaces a lot of characters that don’t need to be replaced.
  2. It concatenates strings, and since a post can be quite big, this has a huge performance cost. I learned this the hard way by looking at how long FeedTso was taking to display long posts (a version of FeedTso that solves this problem has been submitted and will soon be available on the Marketplace).
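
To see why the second point hurts so much: every concatenation in the loop builds a brand-new string and copies everything accumulated so far. The sketch below is a simplified model of that cost, not a measurement. ConcatCostSketch and its method are hypothetical names, and the model assumes each input character becomes a fixed-size piece and ignores allocation and garbage-collection overhead.

```csharp
public static class ConcatCostSketch
{
    // Hypothetical helper: counts how many characters get copied when
    // `pieces` fragments of `pieceLength` characters are appended one by one
    // with plain string concatenation.
    public static long CharsCopiedByConcatenation(int pieces, int pieceLength)
    {
        long copied = 0;
        long currentLength = 0;
        for (int i = 0; i < pieces; i++)
        {
            // Each concatenation allocates a new string holding the old
            // content plus the new fragment, so all of it is copied again.
            currentLength += pieceLength;
            copied += currentLength;
        }
        return copied;
    }
}
```

For 6,000 characters that each expand to a 5-character entity, this model gives 90,015,000 characters copied to produce a 30,000-character result. The work grows with the square of the post length, which is why long posts suffer the most.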

The solution

So I created an optimized version that just replaces characters that cannot be displayed correctly by the WebBrowser control, and that uses a StringBuilder instead of direct concatenation of strings.

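// Requires: using System.Text; (for StringBuilder)
public static string EncodeToUnicodeBuilder(this string strText)
{
    // Characters the WebBrowser control displays correctly:
    // the space plus every printable ASCII character.
    string chararray =
        " `1234567890-=qwertyuiop[]\\asdfghjkl;'"+
        "zxcvbnm,./~!@#$%^&*()_+QWERTYUIO"+
        "P{}|ASDFGHJKL:\"ZXCVBNM<>?";
    StringBuilder builder = new StringBuilder();
    foreach (char value in strText)
    {
        if (chararray.IndexOf(value) >= 0)
        {
            // Safe character: copy it unchanged.
            builder.Append(value);
        }
        else
        {
            // Anything else becomes a decimal numeric entity (&#NNNN;).
            // The format/parse round trip yields the same number as (int)value.
            int decValue = int.Parse(string.Format("{0:x4}", (int)value),
                                        System.Globalization.NumberStyles.HexNumber);
            builder.Append("&#" + decValue.ToString() + ";");
        }
    }
    return builder.ToString();
}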
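
The whitelist string above contains exactly the printable ASCII characters (code points 32 to 126), so the lookup can also be written as a range check. The following is a sketch of that variation, not the code FeedTso ships and not the code the figures below were measured with. EncodingSketch and EncodeNonAscii are hypothetical names.

```csharp
using System.Text;

public static class EncodingSketch
{
    // Hypothetical variant of EncodeToUnicodeBuilder: produces the same
    // output, but tests the character range instead of scanning a whitelist.
    public static string EncodeNonAscii(string text)
    {
        StringBuilder builder = new StringBuilder(text.Length);
        foreach (char value in text)
        {
            if (value >= ' ' && value <= '~')
            {
                // Printable ASCII (32..126): copy unchanged.
                builder.Append(value);
            }
            else
            {
                // Everything else becomes a decimal numeric entity.
                builder.Append("&#").Append((int)value).Append(';');
            }
        }
        return builder.ToString();
    }
}
```

For example, EncodingSketch.EncodeNonAscii("naïve") returns "na&#239;ve", and the two characters of "日本" become "&#26085;&#26412;".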

The performance improvements

The speed improvement is impressive: on a standard English post of around 6k (which probably required very few replacements, if any) I got a 120x speed improvement.

And with a text of similar length that needs many more replacements (a Japanese post), the improvement is almost 20 times.

The Japanese text needs more replacements, so filtering out the ASCII characters alone gives only a 1.3x speed improvement. Adding the StringBuilder on top of that brings the overall gain to 19x.

Testing on a WP7

These tests were run on a console application running on my developer machine. I also ran the same tests both on the emulator and on a real device (a Samsung Omnia 7).
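
The test application itself is not shown here. As a rough idea of how such a measurement could be taken in a console application, here is a minimal sketch that assumes System.Diagnostics.Stopwatch is used for timing. TimingSketch and AverageMilliseconds are hypothetical names, not part of the downloadable test application.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper: runs `encode` on `input` several times and
    // returns the average elapsed time per call, in milliseconds.
    public static double AverageMilliseconds(
        Func<string, string> encode, string input, int iterations)
    {
        // Warm-up call so that JIT compilation is not included in the timing.
        encode(input);

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            encode(input);
        }
        watch.Stop();

        return watch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

A call such as TimingSketch.AverageMilliseconds(s => s.EncodeToUnicodeBuilder(), post, 100), where post holds the text of the article, would then give one cell of a comparison like the one below.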

As you might have guessed, the emulator is slower than the console application, and the real device is slower still. The results there are even more impressive: replacing just the characters that need it and using a StringBuilder gave a 130x performance improvement.

The figures

And here are the results of all the tests, using the naïve version (replace everything using concatenation) as the baseline:

                    Replace all    Replace just non-ASCII    Non-ASCII + StringBuilder
Console App (EN)    56 ms          6x (9.21 ms)              121x (0.46 ms)
Console App (JP)    45 ms          1.3x (34 ms)              19x (2.33 ms)
Emulator (EN)       267 ms         5.8x (46 ms)              73x (3.7 ms)
Emulator (JP)       215 ms         1.4x (154 ms)             15x (14 ms)
Device (EN)         1.45 s         5x (275 ms)               131x (11.1 ms)
Device (JP)         1.22 s         1.2x (1.02 s)             17x (69.4 ms)

Here you can download the test application (both console and WP7 versions).

What did I learn from this?

This is yet more proof that you should never use plain string concatenation when you have to do more than a few concatenations. Always use the StringBuilder. This is even more important on a WP7 device, which is about 25 times slower than a console application: there the non-optimized version takes more than a second, while the optimized one takes about 1/100th of a second.