
Creating the SilverlightShow Windows Phone App: part 2

Peter Kuhn
Posted on Apr 11, 2012
Categories:   Windows Phone

In the first part of this mini series I was mostly talking about the process of developing the SilverlightShow Windows Phone application, and didn't discuss any of the technical specifics. Today I want to dive into some of the details around downloading and optimizing RSS feeds. I not only want to give you insight into the particular problems we were facing, but also some hopefully useful advice and guidelines should you ever want to develop a similar feature for your own application. If you haven't used or seen the app yet, you can watch a quick intro video here and find it on the Marketplace here.

It's all about content

As described before, the SilverlightShow portal has already established a nice infrastructure for creating, maintaining and accessing its rich content, and that includes multiple RSS feeds which seamlessly cover all the data we need in the mobile app.

So the first step was to prototype an app that could download and display the content provided by these feeds, to evaluate whether going down that road is feasible.

When you search for samples and tutorials on RSS feed reading on Windows Phone, you will inevitably find articles that recommend using the Syndication assembly that comes with Silverlight, which is indeed safe to use on the phone too. I won't post any links here, because I strongly advise against doing this. If you try it out yourself anyway, you will realize that the parser included with this library is really strict, which unfortunately means it struggles badly with real-world RSS feeds. A single unexpected space character somewhere in the data of a feed item can result in an exception and the inability to parse the feed at all. As an old Connect bug on the topic shows, people continue to complain about this and other issues with the Syndication framework even though the bug entry itself is "closed as fixed".

Don't use System.ServiceModel.Syndication.dll

In an environment where you have full control over every detail of the feeds you may be able to work around this problem, but most of the time you don't have that control. For example, here at SilverlightShow we weren't able to fix some of the glitches that make the Syndication framework fail, because the feeds are not created manually, but handled by a third-party framework for generating RSS feeds from existing data. So what's the alternative? Fortunately, parsing RSS feeds manually isn't such a big deal, because in the end it all comes down to walking an XML tree and extracting the data you want from its elements. When you do that, you can handle the specific problems of the feed you're dealing with yourself, or create a parsing algorithm for, say, date values (these are often problematic) that is more forgiving and flexible than the built-in parser of the Syndication framework.

If you're looking for a full example, I'm pleased to announce that the official RSS starter kit for Windows Phone switched to a more robust way of parsing the feeds just a few days ago, and can now be recommended as a great starting point for these kinds of applications. It used to use the Syndication framework but Chris Koenig decided to change the implementation to manual processing recently. You can find its web site here, and its GitHub page here. The code for downloading and parsing essentially looks like this:

// retrieve the feed and its items from the internet
var request = WebRequest.CreateHttp(SelectedFeed.RssUrl);
request.BeginGetResponse((token) =>
{
  // process the response
  using (var response = request.EndGetResponse(token) as HttpWebResponse)
  using (var stream = response.GetResponseStream())
  using (var reader = XmlReader.Create(stream))
  {
    var doc = XDocument.Load(reader);

    // ...
    feed.Description = doc.Root.GetSafeElementString("summary");

    // ...
    foreach (var item in doc.Root.Elements(doc.Root.GetDefaultNamespace() + "entry"))
    {
      var newItem = new RssItem()
      {
        Title = item.GetSafeElementString("title"),
        // ...
      };
    }
  }
}, null); // no state object is needed, the lambda captures the request

In the SilverlightShow app, we're using a slightly different approach to squeeze out the last bit of performance :). Instead of working with LINQ to XML, the code uses the XmlReader object directly to sequentially extract all the required information from the feed, like this:

private void ReadFeed(XmlReader xmlReader, FeedData result)
{
  try
  {
    xmlReader.ReadStartElement("rss");
    xmlReader.ReadStartElement("channel");
    while (xmlReader.IsStartElement())
    {
      if (xmlReader.IsStartElement("title"))
      {
        result.Title = xmlReader.ReadElementContentAsString();
      }
 
      // ...
    }  
  }   
 
  // ...
}

Working this way, you are flexible enough to e.g. apply a more relaxed date parsing, or process the content in any way you want. You'll also be able to do your own custom error handling, for example to only skip a single item if you're unable to parse it, instead of throwing away the whole feed content when you run into a small error.
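To make the two ideas above more concrete, here is a minimal sketch of forgiving date parsing and per-item error handling on top of XmlReader. It is not the code used in the SilverlightShow app: the FeedItem and LenientFeedReader types, the list of fallback date formats, and the rule that an item without a title is invalid are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml;

// Hypothetical item type, for illustration only.
public sealed class FeedItem
{
    public string Title { get; set; }
    public DateTimeOffset? Published { get; set; }
}

public static class LenientFeedReader
{
    // Extra formats tried when the general parser gives up (illustrative list).
    private static readonly string[] FallbackFormats =
    {
        "yyyyMMdd'T'HHmmss'Z'",
        "ddd MMM d HH:mm:ss yyyy"
    };

    // Returns null instead of throwing when the text cannot be understood.
    public static DateTimeOffset? ParseDate(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return null;
        text = text.Trim(); // stray whitespace is a common real-world glitch

        DateTimeOffset value;
        if (DateTimeOffset.TryParse(text, CultureInfo.InvariantCulture,
                DateTimeStyles.AssumeUniversal, out value))
            return value;
        if (DateTimeOffset.TryParseExact(text, FallbackFormats, CultureInfo.InvariantCulture,
                DateTimeStyles.AssumeUniversal, out value))
            return value;
        return null;
    }

    // Reads all <item> elements; an item that fails validation is skipped,
    // the rest of the feed is kept.
    public static List<FeedItem> ReadItems(XmlReader reader)
    {
        var items = new List<FeedItem>();
        while (reader.ReadToFollowing("item"))
        {
            // The subtree reader confines parsing to this one <item>. Disposing it
            // moves the outer reader to the item's end tag, even after an error.
            using (var itemReader = reader.ReadSubtree())
            {
                try
                {
                    items.Add(ReadItem(itemReader));
                }
                catch (FormatException)
                {
                    // Only content problems are skipped here; XML that is not
                    // well-formed still surfaces as an XmlException.
                }
            }
        }
        return items;
    }

    private static FeedItem ReadItem(XmlReader itemReader)
    {
        var item = new FeedItem();
        itemReader.MoveToContent();    // position on <item>
        itemReader.ReadStartElement(); // step inside it
        while (itemReader.MoveToContent() == XmlNodeType.Element)
        {
            switch (itemReader.LocalName)
            {
                case "title":
                    item.Title = itemReader.ReadElementContentAsString().Trim();
                    break;
                case "pubDate":
                    item.Published = ParseDate(itemReader.ReadElementContentAsString());
                    break;
                default:
                    itemReader.Skip(); // ignore elements we do not care about
                    break;
            }
        }
        if (string.IsNullOrEmpty(item.Title))
            throw new FormatException("Item has no title.");
        return item;
    }
}
```

Called as LenientFeedReader.ReadItems(XmlReader.Create(stream)), this keeps every item it can read. An unparseable date only leaves Published empty rather than discarding the item, which is the kind of decision the Syndication framework does not let you make.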

Optimizations, optimizations, optimizations

Now that we were able to correctly load the content in question onto the phone, it was important to evaluate whether doing this was a practical solution. One of the invaluable helpers in the tool box of every developer to analyze these scenarios is a network monitoring tool. I generally start with Fiddler because it's so simple and convenient to use, and only switch to more advanced tools when Fiddler isn't able to capture the traffic or provide the information I need. For developing with the Windows Phone emulator, Fiddler is perfectly fine as long as you remember to first start Fiddler and then launch the emulator afterwards (or you won't be able to capture any of the emulator's traffic). You can find more information and download Fiddler for free here.

In the case of analyzing the RSS feeds of SilverlightShow you don't even have to go through the Windows Phone emulator, but can access the feed directly in your browser and watch the traffic in Fiddler that way. One of the particular feeds we are interested in is the one that retrieves article content from the site:

http://www.silverlightshow.net/FeedGenerator.ashx?type=Article

At the time of writing this article, the 25 items returned by this feed add up to a whopping 1.5 MiB.

Obviously that amount of data is unsuitable for frequent transfer to a mobile device, where people often use slow connections or, even worse, have to pay for transferred data by the byte. It was clear that we needed to introduce a whole lot of optimizations and improvements to make this work. Fortunately, we had already anticipated this situation and collected a bunch of ideas during the design phase.

Squashing the Content

One detail we quickly identified for optimization is the code snippets that are included with most of the articles on SilverlightShow. Authors can use a nice plug-in to Windows Live Writer to beautify the code they paste. Behind the scenes, the coloring and formatting of the code is achieved by applying CSS styles to it. This can tremendously increase the size of the HTML necessary to display the code. For example, this small snippet of code (147 characters) taken from a recent article…

private void button1_Click(object sender, RoutedEventArgs e)
{
    MyBeep myBeep = new MyBeep();
    myBeep.PlaySound(BeepTypes.Information);
}

… is turned into a block of 3313 characters in the RSS feed. In addition to the CSS styling information, the required encoding of HTML entities also adds quite a lot to this size. This means that we see the size increase by a factor of more than 22 here, just for the added benefit of having colored code snippets. Stripping this formatting from the content when it is requested by the phone was a natural consequence. We still have some code formatting in the app (like using a fixed-width font), but without the syntax highlighting. This was one of the areas that required some work on the existing services. A newly added parameter for the feed generator solved this problem nicely. You can see the result using the following URL:

http://www.silverlightshow.net/FeedGenerator.ashx?type=Article&stripCodeFormatting=true

This change alone shrinks the feed by roughly 60%, depending on the actual content and the percentage of code in the current set of articles. At the moment, the new size is ~625 KiB, down from 1.5 MiB.

The content on the phone after that looks similar to this…

private void button1_Click(object sender, RoutedEventArgs e)
{
    MyBeep myBeep = new MyBeep();
    myBeep.PlaySound(BeepTypes.Information);
}

… and only requires 118 additional characters per document, and these are generated on the device itself (more on that later).

In addition to this, we also reduced the maximum number of articles returned to the phone from 25 to 10, as we felt that this is sufficient for the mobile device where you are more interested in being notified about new content than reading the archives. This of course also reduces the total size a lot.

Parsing HTML

A side note on stripping the code tags and CSS from the original content: this actually is a very interesting topic that we unfortunately do not have the time and space to cover in more detail here. Real-world HTML often does not follow the strict rules of normal XML content; especially when you're working with WYSIWYG editors, and when multiple authors are responsible for a merged final output, the result might look significantly different from what you would expect. In most cases, trying to treat HTML as XML and assuming that it follows the same strict rules is a dangerous thing to do. Therefore, I generally recommend against the usual approaches people attempt to follow when they work with that kind of data. In particular, do not:

  • Use regular expressions
  • Use string replacement
  • Use XML readers or documents

All of these fail quickly with faulty HTML, and are not flexible enough to handle the situations you will come across with reasonable effort. So what is the alternative here? There are some projects out there that have put quite some work into creating more tolerant parsers for HTML that are capable of handling malformed tags and content better than probably any custom solution you (and I) would be able to create within the limits of a small phone project. One of these projects is the Html Agility Pack hosted on CodePlex. It has built-in support for Windows Phone and not only helped us strip the code formatting from the articles, but also improve some other details of the content, like dealing with multi-column layouts and turning them into something more suitable for the small phone screen.

If you are ever in the situation where you need to parse HTML, I recommend you take a look at this or similar projects to make your life easier, instead of trying to reinvent the wheel.

Local Caching

Another important feature of the SilverlightShow app is that we do not pull all the RSS items from the feeds whenever you hit the refresh button or start the app on your device. Instead, we store the items locally and only fetch those that we haven't retrieved yet. To achieve this, another small extension of the existing service was implemented that allows you to restrict the returned data by date, by appending an "after" parameter to the URL:

http://www.silverlightshow.net/FeedGenerator.ashx?type=Article&stripCodeFormatting=true&after=2012-04-03%2014:57:57Z

Retrieving a single new article then results in a really small transferred size, depending of course on the actual article content. At the moment, for example, retrieving only the newest article requires transmitting just 15,428 bytes.

This means that after the initial download of the full articles list, successive downloads will be much faster.
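As a rough sketch of how a client could assemble such an incremental request, the helper below builds the feed URL from the timestamp of the newest locally cached item. The FeedUrlBuilder class and its parameter names are hypothetical; only the base URL and the "type", "stripCodeFormatting" and "after" query parameters are taken from the examples above, and the timestamp format is inferred from the sample URL.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the SilverlightShow app.
public static class FeedUrlBuilder
{
    private const string BaseUrl = "http://www.silverlightshow.net/FeedGenerator.ashx";

    // newestCachedItemUtc: UTC timestamp of the newest item already stored on the
    // device, or null on the very first download (which then fetches the full list).
    public static string Build(string type, bool stripCodeFormatting, DateTime? newestCachedItemUtc)
    {
        var url = BaseUrl + "?type=" + Uri.EscapeDataString(type);

        if (stripCodeFormatting)
        {
            url += "&stripCodeFormatting=true";
        }

        if (newestCachedItemUtc.HasValue)
        {
            // Same shape as the sample URL: "2012-04-03 14:57:57Z" with the space encoded.
            var stamp = newestCachedItemUtc.Value.ToString(
                "yyyy-MM-dd HH:mm:ss'Z'", CultureInfo.InvariantCulture);
            url += "&after=" + stamp.Replace(" ", "%20");
        }

        return url;
    }
}
```

After each successful refresh, the app would remember the timestamp of the newest item it received and pass it to the next call, so that only items published after it are transferred.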

Mother of Optimization: Dynamic Data Compression

Surprisingly few people know that IIS can compress dynamic content on the fly. By default this feature is disabled, and you may even have to add it manually using the "Turn Windows features on or off" dialog or the server roles dialog of the operating system.

This is a feature that comes at virtually no cost for normal application developers. In my experience, it also has little to no impact on existing clients and applications, as you are able to configure its details per application, and clients explicitly have to opt in to make use of it (meaning if they don't support it, turning it on shouldn't break anything for them). It does have some implications for your server, however, because it now needs to compress the content it delivers on the fly, which increases CPU load; if you're in doubt, talk to your system administrator or provider.

Unfortunately, Windows Phone developers do not fall into the "normal application developers" category: even the most recent version of the SDK does not have built-in support for compressed content, which means the available networking classes neither opt in to the feature explicitly nor decompress the content for you transparently. However, the benefit you get from implementing this manually is tremendous for text-based content (like RSS feeds), so it's something you should always consider. As is often the case, other people have already done most of the work for you. Morten Nielsen, for example, has created an implementation that you can use with just a few lines of code, and the package is also conveniently available on NuGet.

I've used a very similar technique in all of my Windows Phone development that required downloading text-based content; for the SilverlightShow app, the code looks like this:

var request = WebRequest.CreateHttp(uri);
// opt in to compressed content
request.Headers[HttpRequestHeader.AcceptEncoding] = "gzip";
var ar = request.BeginGetResponse(GetResponse_Callback, feedResult);

// ...

using (var response = feedResult.Request.EndGetResponse(ar))
{
  using (var stream = response.GetResponseStream())
  {
    // decide what to do
    Stream decodedStream;
    // Content-Encoding is a response header, so look it up by name
    if (string.Equals(response.Headers["Content-Encoding"], "gzip", StringComparison.OrdinalIgnoreCase))
    {
      decodedStream = new GZipStream(stream, CompressionMode.Decompress);
    }
    else
    {
      // simply use the original
      decodedStream = stream;
    }

    // ...
  }
}

Here, "GZipStream" is my own custom implementation, similar to what Morten has developed. The savings are substantial: the original feed, which currently has 1,591,787 bytes without any of the other phone optimizations applied, shrinks to 302,537 bytes, a reduction of about 81%.
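If you want to get a feeling for how well markup-heavy text compresses before touching the phone code, a small desktop experiment is enough. The sketch below is an assumption-laden stand-in: it uses System.IO.Compression.GZipStream from the desktop .NET Framework (not the custom phone implementation mentioned above), and the GzipDemo class name is made up for this example.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Desktop-only sketch using the framework's GZipStream as a stand-in.
public static class GzipDemo
{
    // Compresses a string (encoded as UTF-8) into a gzip byte array.
    public static byte[] Compress(string text)
    {
        var raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            // The gzip stream must be disposed before reading the result,
            // otherwise the trailing bytes are not flushed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return output.ToArray();
        }
    }

    // Reverses Compress: the same steps a client performs when the
    // response arrives with "Content-Encoding: gzip".
    public static string Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Feeding it a saved copy of a feed and comparing GzipDemo.Compress(feedText).Length with the UTF-8 byte count of the input gives a quick estimate of what dynamic compression would save for your own content. Repetitive markup, such as the inline CSS described earlier, tends to compress particularly well.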

Final Comparison

Let's take a final look at the before and after situation for the articles RSS feed with all the optimizations in place:

                                  Original RSS      Optimized for the phone + gzipped   Saved
Retrieving the full list (25/10)  1,591,787 bytes   91,399 bytes                        94%
Update (1 new item)               1,591,787 bytes   ~5,000 – 15,000 bytes               >99%

This is a very reasonable amount of data to handle for the phone. Also please remember that we are only looking at the article feed here. All the other feeds we're using in the app (news, events etc.) are much smaller in their nature anyway, but have the same optimizations applied. This is one of the reasons why the app works well even with not so fast internet connections, and updating or refreshing the list typically only takes a few seconds, at the most.
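The "Saved" figures are simply the relative size reduction, and you can reproduce them for your own feeds with a one-line calculation. The Savings helper below is a hypothetical name used only for this illustration; the byte counts in the usage note are the ones quoted in this article.

```csharp
using System;

// Hypothetical helper for reproducing the "Saved" column.
public static class Savings
{
    // Percentage of bytes saved when optimizedBytes are transferred
    // instead of originalBytes.
    public static double PercentSaved(long originalBytes, long optimizedBytes)
    {
        if (originalBytes <= 0)
        {
            throw new ArgumentOutOfRangeException("originalBytes");
        }
        return 100.0 * (originalBytes - optimizedBytes) / originalBytes;
    }
}
```

For example, Savings.PercentSaved(1591787, 91399) evaluates to a little over 94, and even the upper end of the update range, Savings.PercentSaved(1591787, 15000), stays above 99.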

Now that we have the possibility to pull content into the app, and also optimized the amount of data that has to be transferred for this, what do we do with that content? It's all HTML formatted text, images, and tables – we need to find a way to nicely display that content to the user. This topic will be covered in the next part of the series.

