While finalizing v1 of the WP7 news reader I’m working on, I noticed that the text in the WebBrowser control sometimes contained little squares. It was not a general encoding problem, because accented characters were displayed correctly.
“Smart Quotes” are the problem
After a while I understood that the squares were displayed whenever a post contained a “smart quote” (yes, those) that was not correctly escaped, that is, not written as the corresponding HTML entity. And this is more common than you think, because most Windows applications, like WLW and Word, automatically change plain quotes to the much nicer smart quotes.
I’m not really sure why this happens: if I use the browser to go to the original page, the smart quotes are displayed correctly.
Looking around on the Internet, I found that it is a problem with the “Silverlight” web browser, and that it can be solved by calling NavigateToStream after converting the string to an array of bytes. Unfortunately, NavigateToStream is not available on WP7.
HTML Entities to the rescue
The trick is to convert these weird characters into their corresponding HTML entities before calling the NavigateToString method.
string displayString = GetMyString();
// Replace each smart quote with its named HTML entity.
displayString = displayString.Replace("\u2018", "&lsquo;"); // left single quote (8216)
displayString = displayString.Replace("\u2019", "&rsquo;"); // right single quote (8217)
displayString = displayString.Replace("\u201C", "&ldquo;"); // left double quote (8220)
displayString = displayString.Replace("\u201D", "&rdquo;"); // right double quote (8221)
web.NavigateToString(displayString);
I haven’t tried this with Eastern European or East Asian languages yet, but I hope they will be handled correctly by the WebBrowser. Otherwise I’d probably have to convert all non-ASCII characters into their numeric HTML entities, just to play it safe, like this:
// Requires: using System.Text;
StringBuilder encoded = new StringBuilder();
foreach (char value in strText)
{
    if (value > 127)
        encoded.Append("&#").Append((int)value).Append(';'); // non-ASCII: decimal entity
    else
        encoded.Append(value); // ASCII, including HTML markup, is left untouched
}
string txtUnicode = encoded.ToString();
Is there a better way to do it?
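One caveat with a per-char loop like the one above: a .NET char is a UTF-16 code unit, so characters outside the Basic Multilingual Plane arrive as two surrogate chars and would each get their own (invalid) entity. Here is a minimal sketch of a helper that combines surrogate pairs into a single code point. The class and method names (HtmlEntityEncoder, EscapeNonAscii) are made up for illustration, and I am assuming that char.ConvertToUtf32 is available in the WP7 profile, which I have not verified.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the WP7 SDK.
public static class HtmlEntityEncoder
{
    // Replaces every non-ASCII character with a decimal numeric entity
    // (for example U+2019 becomes "&#8217;") and leaves ASCII untouched,
    // so the HTML markup itself is preserved.
    public static string EscapeNonAscii(string html)
    {
        if (string.IsNullOrEmpty(html))
            return html;

        StringBuilder sb = new StringBuilder(html.Length);
        for (int i = 0; i < html.Length; i++)
        {
            char c = html[i];
            if (c < 128)
            {
                sb.Append(c);
            }
            else if (char.IsHighSurrogate(c)
                     && i + 1 < html.Length
                     && char.IsLowSurrogate(html[i + 1]))
            {
                // Surrogate pair: emit one entity for the full code point.
                // Assumption: char.ConvertToUtf32 exists on the target platform.
                int codePoint = char.ConvertToUtf32(c, html[i + 1]);
                sb.Append("&#").Append(codePoint).Append(';');
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                // Any other non-ASCII code unit (a lone surrogate would
                // also end up here and is passed through as-is).
                sb.Append("&#").Append((int)c).Append(';');
            }
        }
        return sb.ToString();
    }
}
```

With that in place, the call site would shrink to something like web.NavigateToString(HtmlEntityEncoder.EscapeNonAscii(GetMyString()));. I have not tested whether the WebBrowser control renders every such entity correctly.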
This seems like yet another encoding bug in WP7.
I hope a future version of the NavigateToString method will handle those characters correctly. In the meantime, if you know of a better workaround, please post it in the comments.