Monday, May 20, 2013

Quick Tip: Processing HTML Content in Windows Store Apps

The WebView control allows you to display content from sites in your app using a small window that renders the HTML with the same rendering engine as Internet Explorer. It does have some limitations, and if you are providing web content in your app, your goal is most likely to augment your app with fresh data rather than to superimpose a full-blown web application on your native Windows Store app.

Stripping content down can be quite cumbersome once you wade through the myriad regular expressions and other utilities available. Here's a simple trick that works with most content-oriented sites like blogs and online magazines. It gives you a more basic view of the content that you can present without all of the bells and whistles you would otherwise pull down with the full page.

Step 1: Be Mobile

When you are loading content, make the server believe you are a mobile client. This often results in a simpler page, frequently without the heavy script tags, headers, tables of contents, and other sections of the full site. Here's a mobile user agent string that impersonates one of the most popular mobile clients out there, an iPhone:

private const string MobileUserAgent = "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1A543a Safari/419.3";

With the mobile user agent string you can override the User-Agent header when requesting the content. Instead of having the WebView control load the page directly, use HttpClient to grab the content like this:

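// Requires: using System; using System.Net.Http; (inside an async method)
var handler = new HttpClientHandler { AllowAutoRedirect = true };
var client = new HttpClient(handler);
// Identify as the iPhone client defined above
client.DefaultRequestHeaders.Add("user-agent", MobileUserAgent);
var response = await client.GetAsync(new Uri(InsertSuperAwesomeUrlHere));
response.EnsureSuccessStatusCode(); // throws HttpRequestException on a non-success status
var html = await response.Content.ReadAsStringAsync();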

Allowing redirects is important because many sites detect the mobile client and, rather than serving mobile content at the same address, redirect you to a separate mobile page. Following the redirect lets the client pull down the content tailored to mobile devices. At this stage you may think you have what you need, but if you pass this HTML to the WebView control, you'll often find script content trying to make updates that throws exceptions and generally causes the WebView to choke. Next you must cleanse the data.
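The redirect behavior described above can be made explicit. The sketch below assumes plain System.Net.Http (as available to both Windows Store apps and desktop .NET); it builds the mobile client once and also reports the address the request finally landed on, which is handy for confirming that a site really did send you to its mobile page. The class name MobileFetcher and the cap of 10 redirects are illustrative choices, not part of the original tip. That RequestMessage.RequestUri reflects the post-redirect address is the behavior of the .NET handlers I'm aware of, so it is worth verifying on your target platform.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper class; the name is illustrative only.
public static class MobileFetcher
{
    // Same iPhone user agent string used in Step 1.
    public const string MobileUserAgent =
        "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ " +
        "(KHTML, like Gecko) Version/3.0 Mobile/1A543a Safari/419.3";

    // Builds a client that follows redirects (capped at an arbitrary
    // 10 hops) and identifies itself with the mobile user agent.
    public static HttpClient CreateMobileClient()
    {
        var handler = new HttpClientHandler
        {
            AllowAutoRedirect = true,
            MaxAutomaticRedirections = 10
        };
        var client = new HttpClient(handler);
        client.DefaultRequestHeaders.Add("User-Agent", MobileUserAgent);
        return client;
    }

    // Downloads the page and returns the HTML together with the URI
    // the request ended on after any automatic redirects.
    public static async Task<Tuple<string, Uri>> FetchAsync(HttpClient client, Uri url)
    {
        var response = await client.GetAsync(url);
        response.EnsureSuccessStatusCode();
        var html = await response.Content.ReadAsStringAsync();
        // After automatic redirects, RequestMessage describes the final request.
        return Tuple.Create(html, response.RequestMessage.RequestUri);
    }
}
```

A caller could compare result.Item2 with the URI it asked for to see whether the site redirected to a mobile page before moving on to cleansing the HTML.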

Step 2: Cleanse the Content

Cleansing the content may be easier than you think. It turns out that HTML you share with other Windows Store apps needs to be clean, too. In fact, the sharing mechanism provides a helpful HtmlFormatHelper class (located in the Windows.ApplicationModel.DataTransfer namespace) designed to package your HTML so it is ready for sharing. It also contains a useful GetStaticFragment method that strips out the dynamic code so you are left with nice, clean content. The trick is to prepare the HTML as if you were going to share it (CreateHtmlFormat), then extract the static fragment (GetStaticFragment) so you have plain HTML you can load.

var fragment = HtmlFormatHelper.GetStaticFragment(HtmlFormatHelper.CreateHtmlFormat(html));

Now you are ready to show it to the user.

Step 3: Show it to the User

Now that you have clean HTML, you can ask the WebView control to navigate to the string like this:

WebViewControl.NavigateToString(fragment);             

That’s all there is to it, and you should get a nice, clean page of data without dynamic tags.
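For reference, the three steps can be wired into a single method. This is a minimal sketch assuming a XAML Windows Store app (the Windows.UI.Xaml.Controls.WebView control) and the same iPhone user agent string; the class name CleanPageLoader, the method name LoadCleanPageAsync, and the choice to return false when the download fails are hypothetical, not part of the original tip. It only compiles against the Windows Runtime APIs, so it will not build in a plain desktop or server project.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Windows.ApplicationModel.DataTransfer; // HtmlFormatHelper
using Windows.UI.Xaml.Controls;              // WebView

// Hypothetical helper; names are illustrative, not part of any SDK.
public static class CleanPageLoader
{
    private const string MobileUserAgent =
        "Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ " +
        "(KHTML, like Gecko) Version/3.0 Mobile/1A543a Safari/419.3";

    // Fetches the mobile version of a page, strips dynamic content,
    // and shows the result. Returns false if the download fails.
    // Call from the UI thread: the awaits resume on the captured UI
    // context, which NavigateToString requires.
    public static async Task<bool> LoadCleanPageAsync(WebView webView, Uri url)
    {
        string html;
        var handler = new HttpClientHandler { AllowAutoRedirect = true };
        using (var client = new HttpClient(handler))
        {
            // Step 1: request the page as a mobile client.
            client.DefaultRequestHeaders.Add("User-Agent", MobileUserAgent);
            try
            {
                var response = await client.GetAsync(url);
                response.EnsureSuccessStatusCode();
                html = await response.Content.ReadAsStringAsync();
            }
            catch (HttpRequestException)
            {
                // Network failure or non-success status code.
                return false;
            }
        }

        // Step 2: package as share-ready HTML, then keep only the static fragment.
        var fragment = HtmlFormatHelper.GetStaticFragment(
            HtmlFormatHelper.CreateHtmlFormat(html));

        // Step 3: show it to the user.
        webView.NavigateToString(fragment);
        return true;
    }
}
```

From a page's code-behind, a call such as await CleanPageLoader.LoadCleanPageAsync(WebViewControl, new Uri(InsertSuperAwesomeUrlHere)) would then take the place of the separate fragments.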