A lot has been said and written about the features of the upcoming Mango update for Windows Phone 7.
One particularly interesting improvement to the tooling is the addition of a built-in profiler for performance analysis, which you can read more about here on MSDN, or in my XNA series here. However, the platform had, and still has, a set of additional features that can be very helpful for monitoring application performance, sometimes in ways that are not possible with the profiler.
In this article I want to talk about three of these features and how to make use of them during development.
The Frame Rate Counters
When you create a new Windows Phone 7 Silverlight application in Visual Studio, the frame rate counters are enabled automatically as soon as a debugger is attached. This is accomplished by a snippet of code in the auto-generated App.xaml.cs file, specifically in the constructor of the App class (shortened for simplicity):
// Show graphics profiling information while debugging.
if (System.Diagnostics.Debugger.IsAttached)
{
    // Display the current frame rate counters.
    Application.Current.Host.Settings.EnableFrameRateCounter = true;
}
The documentation of this switch can be found here. When you run the application, the result of this code is a bunch of seemingly cryptic numbers displayed at the top of the right edge of the emulator or device:
The problem with these numbers probably is not that developers don't know what they stand for – that is something that can easily be looked up in the above linked documentation or in further MSDN articles like this one. The problematic part is how to make sense of the values, and how to determine whether the values shown are reasonably good or indicate that you should do something to improve them.
Let's go through them one by one and talk a bit about each individually. Please note that, as with all performance analysis, these numbers make little sense in the emulator. Since I'm using the emulator to take screenshots, you should always keep in mind that what you see here likely does not reflect what you will see on a real device.
Composition Thread Frame Rate
The first number indicates the number of frames per second rendered by the composition thread. On the phone, a separate thread is responsible for rendering. Once created and laid out (on the UI thread), elements are handed off to this thread for rendering as bitmaps (textures). The benefit of having a separate thread for this is that certain operations, for example ones that do not change the layout of elements, can be performed by the composition thread independently, without putting additional strain on the UI thread. This frees the UI thread for other tasks and improves the overall responsiveness of the application. The operations that can be performed on this special thread are:
- Scale transforms (as long as they are less than 50 percent of the original size)
- Translate transforms
- Rotate transforms
- Plane projections
- Changing the opacity
- Clipping
In a lot of cases the operating system is able to automatically determine when to run operations on this thread, for example when you're using storyboards for your animations. You can, however, explicitly state that you want to use this feature for certain UI elements by setting their CacheMode property to a BitmapCache. The runtime then tries to create a texture for the element and skip future rendering passes on the UI thread if possible, so the composition thread can take care of the element instead. This can yield a significant performance increase if it's used wisely (more on this below).
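As a minimal sketch (the element name movingRectangle is just a placeholder for one of your own frequently animated elements), enabling bitmap caching from code-behind could look like this:

// movingRectangle is a hypothetical, frequently animated element in the visual tree.
// Caching it as a bitmap lets the composition thread reuse the rendered texture.
movingRectangle.CacheMode = new BitmapCache();

The equivalent in XAML is to set CacheMode="BitmapCache" directly on the element.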
The target value for the composition thread frame rate of a Silverlight application is 60 fps. Lower frame rates are tolerable, but as soon as you see this number drop below 30, you know that there is a problem with the rendering performance of your application, and you should start looking for optimizations.
Note: the number you see here will often be zero or simply not update, for example when no animations are running and there is nothing new to render. This is normal and expected behavior. If you want to continuously see a reported frame rate here, you can add a dummy animation to your application that only runs during development (all the time, for example in an infinite loop); you will then see the current frame numbers at any time.
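A debug-only dummy animation of that kind could be sketched like this; dummyElement is assumed to be some small, visually insignificant element in your visual tree:

#if DEBUG
// Animate a dummy element forever so the frame rate counters keep reporting
// values even when the application is otherwise idle (development only).
var animation = new DoubleAnimation
{
    From = 0.0,
    To = 1.0,
    Duration = new Duration(TimeSpan.FromSeconds(1)),
    RepeatBehavior = RepeatBehavior.Forever
};
Storyboard.SetTarget(animation, dummyElement);
Storyboard.SetTargetProperty(animation, new PropertyPath("Opacity"));

var storyboard = new Storyboard();
storyboard.Children.Add(animation);
storyboard.Begin();
#endif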
UI Thread Frame Rate
The second number displayed is the frames per second the UI thread runs at. The UI thread has some major tasks to perform, for example:
- Handle all rendering/animations that cannot be handled by the composition thread (see above)
- Perform all runtime operations that require access to UI elements (e.g. handling data bindings)
- Run all of your custom code that requires access to UI elements, or that you have put on the UI thread deliberately or by circumstance (e.g. code located in handlers of UI element events)
As you can see, the UI thread has a lot to do in most applications, and hence one of the primary goals should be to avoid performing unnecessary additional work on it - for example by using background threads for data processing, or by actively using the above-mentioned bitmap caching to hand off work to the composition thread. One way to keep an eye on this is to watch the number of frames per second marked above. It should not drop below 15, or the user experience is likely to degrade significantly.
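As a rough sketch of the first suggestion (LoadData and resultTextBlock are hypothetical placeholders for your own long-running operation and the UI element to update afterwards):

// Do the expensive work on a thread pool thread instead of the UI thread...
System.Threading.ThreadPool.QueueUserWorkItem(state =>
{
    string result = LoadData(); // hypothetical long-running operation

    // ...and marshal back to the UI thread only for the actual UI update.
    Deployment.Current.Dispatcher.BeginInvoke(() =>
    {
        resultTextBlock.Text = result;
    });
});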
Note: again, if there is no work to do, the number here may not be updated. Another detail is that the UI thread will never run faster than the composition thread (the UI thread hands its work off to the composition thread, so it depends on that thread being at least as fast as itself). This shouldn't be a problem most of the time, but it's another thing to keep in mind.
Texture Memory Usage
The third number, according to the documentation, tells you
"The video memory and system memory copies of textures being used in the application."
This is the first value that needs additional explanation to make sense of. As I wrote above, after an element has been created on the UI thread ("rasterized"), the resulting texture is handed off to the composition thread in the form of a so-called surface for the actual rendering to the screen. The amount of memory displayed here is not a general memory counter for your application, but only the amount of memory used by those textures.
The question you probably have is whether you should worry once some threshold is exceeded. The answer is most likely no. As long as you are able to keep the frame rates of both the composition thread and the UI thread in the recommended ranges, and as long as you don't run into problems with the overall memory consumption of your application, there is nothing to worry about here.
Surface Counters
The next two counters are described by the documentation in the following way:
"Surface Counter: The number of explicit surfaces being passed to the GPU for processing.
Intermediate Surface Counter: The number of implicit surfaces generated as a result of cached surfaces."
We learned about surfaces in the last paragraph in terms of memory consumption, and these two numbers give us more details about them. The left one tells you how many surfaces are passed from the UI thread to the composition thread explicitly. As you can see, the UI thread can and will partition the elements on the screen into several sub-sections (in this case, 6) that are then rasterized individually and handed off to the composition thread as separate surfaces. The right number is the number of implicitly created surfaces that result from the caching feature. One of the tasks of the composition thread, as its name says, is to compose elements for the final rendering. It takes these surfaces and combines them, according to their correct z-ordering, into the final result, which is then shown on the screen.
When do these numbers become a problem? General advice is hard to give here. It can be interesting to watch these numbers and compare them to what you consciously do on the screen. If there is a large mismatch between, for example, the number of elements you are moving around or changing on the screen and the number of surfaces that are created, this can be an indication of a problem. However, there are also situations where the runtime optimizes what happens here and the differences you see are not a problem at all. Again, the more important values are the frame rates and the fill rate that we will discuss next; you can, however, use the techniques described later in this article to further analyze the situation when in doubt.
Fill Rate
The last counter marked in the screenshot above is the fill rate of the current rendering, given as the relative number of phone screens rendered in each frame:
"The number of pixels being painted per frame in terms of screens. A value of 1 represents 480 x 800 pixels."
This is one of the most important indicators of application performance, and yet one of the most confusing ones for many: "the number of pixels being painted in terms of screens"? To understand this you have to realize that the composition of the final screen may require some pixels to be rendered multiple times. Imagine two elements that are animated, for example moved around on the screen or rotated. When everything is set up correctly, the composition thread is able to handle those two animations without the UI thread, by working on each of the resulting textures separately first. But overloading the UI thread is not the only limitation. In the example, at some point the result of these animations needs to be drawn to the screen; to do this, the composition thread combines all the surfaces, according to their z-order, into the final result. Now when those two bitmaps partly overlap, the overlapping pixels are drawn twice. In cases where both pixels are opaque, one of the drawing operations was unnecessary; but when we are working with transparency, both pixels actively contribute to the final result: they are blended. In fact, they may even be blended with an already existing background texture if both are not fully opaque, which already brings the number of source pixels involved in producing the final color of the target pixel to three.
So in situations like these, even though the screen only has 480 x 800 visible pixels, a lot more need to be drawn to produce the final output. On the phone, this ratio of drawn pixels to the number of native pixels on the screen is called the fill rate (which is different from how the same term is used for desktop GPUs). The theoretical ideal value for this ratio is 1.0 when the whole screen is drawn, meaning each visible pixel on the screen only needed to be rendered once; it cannot get any lower than that. Of course, the displayed fill rate value can be less than 1 in cases where not the whole screen is redrawn, like in the screenshot above. In any case, you should keep an eye on this number and try to stay below 2.5, which means that on average each pixel needs to be drawn 2.5 times to produce the final frame. The user will see a significant and noticeable performance drop if the fill rate exceeds 3.5, and the recommended maximum threshold is 3.0.
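To make the arithmetic concrete, consider a hypothetical page composed of a full-screen background (1.0 screens worth of pixels), a full-screen semi-transparent overlay (another 1.0), and an animated element covering about half the screen (0.5). Even though only 480 x 800 pixels end up visible, roughly 2.5 screens worth of pixels have to be drawn each frame, so the counter would read about 2.5, which is right at the recommended limit.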
To further analyze and determine how to optimize fill rates that are too high, you can make use of the cache visualization feature described below.
Redraw Regions
The redraw regions diagnostic feature can help you determine what you can do to optimize your application's performance with regard to the areas that need to be drawn every frame by the UI thread, for example when you have animations running. The following is a screenshot of the application named "PerFrameCallback", available from a package of performance analysis samples from Microsoft which can be found here. It moves some rectangles and a square around the screen for demonstration, and in the image it has the redraw regions feature enabled:
What "redraw regions" does is tint all the areas that are redrawn on the UI thread in a frame in random colors to make it obvious for you to see what is going on. When contents are redrawn every frame like in the sample above this will result in a lot of flickering, as the colors are switched and cycled through on each frame. The effect is hard to describe, and the screenshot doesn't show what exactly happens in the demonstration application in a clear way, so I very much encourage you to go ahead and play with the sample yourself.
You can enable and disable redraw regions by using the same settings that allow you to display the frame rate counters (code taken from the above sample):
private void redrawBtn_Click(object sender, RoutedEventArgs e)
{
    _isRedraw = !_isRedraw;
    Application.Current.Host.Settings.EnableRedrawRegions = _isRedraw;
}
How can this help you? By analyzing which areas of the application are constantly redrawn that way, you can identify elements that could be optimized by using the bitmap caching feature I mentioned at the beginning of the article. It also helps you identify cases where the runtime's automatic optimization falls short. In this particular sample application there really is no reason why the rectangles shouldn't benefit from caching, and to demonstrate that, the application lets you toggle the cache mode for all of the rectangles on the screen on and off using the respective button. The effect of doing that becomes apparent instantly when you look at the redraw regions diagnosis: the caching is successfully applied to these elements, and the flickering stops immediately, indicating that all rendering now happens on the composition thread. This also results in much better performance, especially once you have added a few more rectangles in the demo.
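The sample's actual code is not reproduced here, but such a toggle could be sketched roughly as follows, assuming a hypothetical collection named rectangles that holds the animated elements and a flag _isCached:

private void cacheBtn_Click(object sender, RoutedEventArgs e)
{
    _isCached = !_isCached;

    // Switch bitmap caching on or off for all animated rectangles at once.
    foreach (UIElement rectangle in rectangles)
    {
        rectangle.CacheMode = _isCached ? new BitmapCache() : null;
    }
}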
Despite the fact that it looks a bit weird and distorted when you turn on this diagnostic feature, I think it is really intuitive to use; when you have animations going on on the screen, you have a certain picture in your head of what the device is doing in terms of rendering and caching anyway (or rather, what it should be doing). By turning on redraw regions you can very quickly verify whether that really is what happens in reality, or whether and where you could potentially improve things. If, for example, overly large areas are redrawn in each frame, the visual effect will immediately catch your eye and you can start thinking about how and where enabling said caching for certain elements would make sense.
Note: redraw regions is a feature that does not depend on running on a real device. You can use it perfectly well in the emulator, because you only analyze the behavior of the application with regard to rendering and bitmap caching, not its performance.
Cache Visualization
This is the last helper I want to briefly talk about. Similar to how the redraw regions feature works, this also tints parts of the screen. The documentation can be found here; when you read through that documentation, make sure you understand the different behavior of the feature on the phone compared to Silverlight on the desktop (where it works the other way round). Once again this can be turned on and off by using the host settings:
Application.Current.Host.Settings.EnableCacheVisualization = true;
What it does is give all the textures that are passed to the GPU a blue tint with a certain amount of transparency. When multiple textures overlap, the half-transparent colored tints add up to a more opaque color. This allows you to quickly see where a lot of textures are drawn on top of each other, and to identify optimization potential by combining multiple elements into one.
An example: say you have received several static images from your designer that should be combined into a background image for your application. Designers like to work with layers like that when they create graphics, but on the phone, creating a single background out of two or more individual, probably partly transparent images quickly becomes a problem. Using the caching alone does not solve this, because those layers will increase the fill rate dramatically (see above). With cache visualization you can quickly identify the spots where you could perhaps combine parts of multiple textures, or even whole textures, into a single resulting image, which will result in an instant and significant decrease of your application's fill rate.
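One possible way to do such a combination at runtime (rather than asking your designer for a flattened asset) is sketched below; backgroundLayer, overlayLayer and backgroundImage are hypothetical Image elements, and the layers are assumed to already be sized and laid out:

// Flatten two layered images into a single 480 x 800 bitmap once, at startup,
// so only one opaque texture has to be drawn per frame instead of two blended ones.
var flattened = new WriteableBitmap(480, 800);
flattened.Render(backgroundLayer, new TranslateTransform());
flattened.Render(overlayLayer, new TranslateTransform());
flattened.Invalidate();

// Use the combined bitmap as the single background of the page.
backgroundImage.Source = flattened;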
Note: just like the redraw regions feature, the cache visualization can be safely used in the emulator too.
Conclusion
Like almost everything, these monitoring and diagnostic features are best learned by actively using them. Let me once again point you to the entry page on the topic on MSDN, and in particular to the package of sample applications available for download there. These samples will tremendously help you understand the correlation between caching, the composition thread, fill rate and the other terms, and how the indicators mentioned above can be used to visualize and analyze the involved details. If you have any questions or want to provide feedback, please let me know in the comments below or contact me directly.