Weekly report 6

July 5th, 2010 topfs2 Comments off

Last week has been a busy week indeed, and I have focused on getting dirty region based rendering to work. I am glad I followed my mentor’s advice and did dirty regions before moving to event based rendering, especially considering that once dirty region rendering works we effectively have event based rendering, though not event based processing. So while CPU use could be limited further, this should show whether it is worth doing.

Status

  • Many of the more common controls work and produce fully workable dirty regions; amongst these are MultiImage, Image, Label, Button and Groups (Windows and Dialogs). A lot of other controls work somewhat, with some artifacts. Here is a video of the working controls and their respective dirty regions.
  • As can be seen in this video, where I have enabled dirty region rendering so that only what changes on screen is rendered on my workstation, Confluence is almost fully usable. Note that my workstation thankfully copies the backbuffer to the frontbuffer, so I can assume the backbuffer is defined after the flip. Sadly this is nonstandard, which is why I have no video from the BeagleBoard yet.
  • Experimented with a more scalable algorithm for handling the dirty regions after they have been generated; it is described here.

Plan

  • Continue to fix up controls so that they create workable dirty regions.
  • While the dirty regions are created and the clipping works, I need to make the backbuffer defined on the BeagleBoard to have it working without flicker. This is a vital goal for this week. We have three options:
    • Define the backbuffer via EGL_BUFFER_PRESERVED. This is the preferable option and I have added code for it. EGL seems to state that the buffer should be preserved, but my initial tests on the BeagleBoard indicate that it is not.
    • Render the entire interface to a framebuffer object, then render the framebuffer object to the backbuffer before the flip. This is probably the most useful option outside embedded since it limits the needed fragment operations more, but it bumps the GL driver requirements a bit (not a concern in GSoC). For this to be an option on the BeagleBoard I need to make sure it is OK to create 720p framebuffer objects, but I would assume so since the max texture size is 2048×2048.
    • Render as usual to the backbuffer, copy the content of the backbuffer to a texture before the flip, and render that texture to the backbuffer at the start of the next scene. This is more of a fallback as it is a bit wasteful, but it is useful on older graphics drivers in the desktop segment, although I would guess that copying the frontbuffer to the backbuffer through glCopy, without an intermediate texture, is more proper.

Risks

  • The biggest problem for next week will be getting a defined backbuffer, which is essential for the success of the project.
  • A risk worth mentioning is that dirty regions might not be as beneficial for the BeagleBoard as anticipated. This is doubtful, however, since SGX seems to have software fallbacks in certain rendering stages, so any limitation of the rendered area should bring down CPU usage. My workstation had a significantly lighter CPU load with dirty region rendering enabled; on average it was around half. Note that since not all controls are working (the RSS control, for example), these numbers should be taken lightly.

What to do when you have the dirty regions?

June 30th, 2010 topfs2 Comments off

Those who have followed the weekly updates might know that my branch can now generate somewhat correct dirty regions for many controls. This means that most of the GUI is usable with dirty region rendering.

I searched quite a lot on the internet for a good way to handle the generated regions and found very little information beyond the simple case, so I decided to show my solution to spur some discussion and perhaps to help someone faced with the same problem I was.

While we wander down the tree of controls, each control marks a dirty region if it has changed. This means that when we are ready to render we might have lots of regions which must be rendered. A simple way to render all these regions is to take the union of them all, which lets us render just the part of the screen that contains the changes. While this surely is more efficient than rendering the entire screen, it is still likely that we render an area which has not changed. Say you change one pixel in each screen corner: the union would envelop the entire screen, which would lead to unnecessary bandwidth use. On the other hand, rendering each and every region separately makes the per-pass overhead more noticeable. Each render pass will, while iterating through the tree of controls, do a lot of matrix operations. Each matrix operation leads to a lot of floating point calculations, which is something CPUs normally aren’t suited for. Intuitively one would say that a better solution is a middle ground: create unions of those regions which are close together and add separate rendering passes for those that are far apart, as seen in figure 1. The question is, how should one solve this problem?

Figure 1 – Two ways one could create the render passes
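To make the corner-pixel example concrete, here is a minimal Python sketch. The `Rect` type, the `union` helper and the 1280×720 screen size are assumptions made for illustration and are not taken from the XBMC code.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle; x2 and y2 are exclusive (hypothetical helper type)."""
    x1: int
    y1: int
    x2: int
    y2: int

    def area(self) -> int:
        return max(0, self.x2 - self.x1) * max(0, self.y2 - self.y1)


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle that encloses both a and b."""
    return Rect(min(a.x1, b.x1), min(a.y1, b.y1),
                max(a.x2, b.x2), max(a.y2, b.y2))


# Assumed 720p screen with one changed pixel in each corner.
WIDTH, HEIGHT = 1280, 720
corners = [
    Rect(0, 0, 1, 1),
    Rect(WIDTH - 1, 0, WIDTH, 1),
    Rect(0, HEIGHT - 1, 1, HEIGHT),
    Rect(WIDTH - 1, HEIGHT - 1, WIDTH, HEIGHT),
]

# Fold all dirty regions into a single bounding rectangle.
bounding = corners[0]
for rect in corners[1:]:
    bounding = union(bounding, rect)

changed_pixels = sum(rect.area() for rect in corners)
print("pixels that changed:", changed_pixels)
print("pixels rendered with a single union:", bounding.area())
```

With these assumed numbers, four changed pixels make the single union cover the whole screen, which is the waste described above.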

To create a greedy algorithm that solves this particular problem we first need to define it. The problem we are faced with is this: given X regions, does there exist a solution with Y regions that has a lower rendering cost than the X regions?

Once the problem is defined, one can see that taking the union of the regions is one solution, and as a matter of fact many applications solve it this way due to its simplicity. A well known example is the Android operating system, which renders in one pass and as such renders the union of the changed regions.

The union solution is great when you have a limited number of pixels or a well structured layout where changes are likely to be confined, which is the case on Android. XBMC, on the other hand, wants changes in many places throughout the screen. An example is RSS scrolling while a button is pulsating; in PM3.HD this would mean rendering half the screen if the union solution is used.

To find a more optimal solution we first define two costs: a cost per area that needs to be rendered, and a cost for doing one rendering pass regardless of the size of the area. This reduces the problem to a cost minimization problem which can be solved rather simply. While it can be solved by trying all possible solutions, that is far from efficient, so I have composed a greedy algorithm to solve it fast and hopefully often optimally.

The pseudo code would be:

For every marked dirty region:

    Find the cheapest union cost against the already created rendering passes.

    If the cheapest union is cheaper than adding the marked region as a new rendering pass, use the union; otherwise create a new rendering pass.
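The pseudo code above can be sketched in Python as follows. This is only one reading of it, under stated assumptions: rectangles are `(x1, y1, x2, y2)` tuples, the "cost of a union" is taken to be the extra area it adds to an existing pass, and the two cost constants are made-up values, not figures from my branch.

```python
# Rectangles are (x1, y1, x2, y2) tuples with exclusive x2/y2 (assumed layout).

def area(r):
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])


def union(a, b):
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def greedy_passes(regions, cost_per_pixel=1.0, cost_per_pass=5000.0):
    """Assign each dirty region to the cheapest existing pass or to a new pass."""
    passes = []
    for region in regions:
        # Cost of giving this region a rendering pass of its own.
        best_index = None
        best_extra = cost_per_pass + cost_per_pixel * area(region)
        # Cost of growing each existing pass so that it covers the region.
        for i, existing in enumerate(passes):
            extra = cost_per_pixel * (area(union(existing, region)) - area(existing))
            if extra < best_extra:
                best_index, best_extra = i, extra
        if best_index is None:
            passes.append(region)
        else:
            passes[best_index] = union(passes[best_index], region)
    return passes


def total_cost(passes, cost_per_pixel=1.0, cost_per_pass=5000.0):
    return sum(cost_per_pass + cost_per_pixel * area(p) for p in passes)


# Two regions close together in the top left and one far away in the bottom right.
dirty = [(0, 0, 100, 50), (110, 0, 200, 50), (1100, 650, 1280, 720)]
result = greedy_passes(dirty)
print(result, total_cost(result))
```

With these assumed costs the two neighbouring regions are merged into one pass while the distant one gets a pass of its own, which is the middle ground described earlier. Raising `cost_per_pass` pushes the result towards a single union; lowering it pushes towards one pass per region.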

Depending on the costs, this solution can end up as either one unified region or several small ones, which makes it very flexible, and it takes almost the same time to compute as a plain union. One thing to note is that the order of the marked dirty regions might affect the solution found. Since I am no algorithm expert I can’t show mathematically how well the solution will perform, so I have tried it instead. To see how often it finds the cheapest solution, I randomized a number of regions and then ran the algorithm on all possible orders to find the one with the lowest cost. It turns out that the result depends quite a lot on the number of regions and their spread. With 10-20 regions, the first order tried was the one with the lowest cost in 30% of the cases, while with 5 regions that number was 70%. While these numbers might sound small, especially with more regions, the cost of the first solution found was usually about 108% of the lowest cost in my tests. With only 5 dirty regions it went down to 103%. So while it might not be the cheapest, it still produces a rather good approximation. It is noteworthy that the union solution comes in at about 112% with 20 regions and 150% with 5 regions.

Figure 2 – Union against the cheapest solution (blue is cheapest and red is union)

Figure 3 – Greedy algorithm of cost reduction against cheapest solution (blue is cheapest and red is the greedy solution)
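The order experiment described above could be reproduced along these lines. This is a rough sketch and not the code I actually used: the cost constants, the screen size, the region sizes and the random seed are all assumptions, and the greedy step is repeated here only to keep the example self-contained.

```python
import itertools
import random

COST_PER_PIXEL = 1.0   # assumed cost per rendered pixel
COST_PER_PASS = 5000.0  # assumed fixed cost per rendering pass


def area(r):
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])


def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def greedy_cost(regions):
    """Run the greedy merge on regions in the given order and return the total cost."""
    passes = []
    for region in regions:
        best_index = None
        best_extra = COST_PER_PASS + COST_PER_PIXEL * area(region)
        for i, existing in enumerate(passes):
            extra = COST_PER_PIXEL * (area(union(existing, region)) - area(existing))
            if extra < best_extra:
                best_index, best_extra = i, extra
        if best_index is None:
            passes.append(region)
        else:
            passes[best_index] = union(passes[best_index], region)
    return sum(COST_PER_PASS + COST_PER_PIXEL * area(p) for p in passes)


def random_region(rng, width=1280, height=720, max_size=200):
    """Random rectangle that fits inside an assumed 720p screen."""
    w = rng.randint(1, max_size)
    h = rng.randint(1, max_size)
    x = rng.randint(0, width - w)
    y = rng.randint(0, height - h)
    return (x, y, x + w, y + h)


rng = random.Random(0)  # fixed seed so the run is repeatable
regions = [random_region(rng) for _ in range(5)]

first_cost = greedy_cost(regions)
# Try every order of the same regions and keep the cheapest greedy result.
best_cost = min(greedy_cost(order) for order in itertools.permutations(regions))
ratio = first_cost / best_cost
print(f"first order costs {ratio:.0%} of the best order found")
```

Note that trying every order grows factorially with the number of regions, so this is only practical for small counts such as the 5 used here.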

Weekly report 5

June 28th, 2010 topfs2 8 comments

Status

  • Finally XBMC runs on Ångström; it turns out it was the optical code paths deadlocking while loading. Committed a --disable-optical-drive option to the gsoc branch.
  • XBMC runs on C4 at about 10-15 fps (15 fps in 640×480 and 10 fps in 720p), but koen has tried it on the new BeagleBoard xM, which does an amazing 38 fps in 720p! http://www.youtube.com/watch?v=80Uia6FkvnA
  • Committed the initial playground patches for the split of processing and rendering.

Plan

  • Since C4 runs at 10-15 fps and 100% CPU we need to triple performance, and since the CPU seems to be the limiting factor here (unless the buffer flip does a busy wait), the next stop will be limiting the amount of processing. First will be event based rendering, since this will allow creating a skin that is extremely light on processing resources.
  • Find out which paths eat the most CPU time (it should be somewhere in font rendering according to other tests) and try to set up a proper plan for how to limit the CPU usage.
  • Fix up the window and list issues with the processing and rendering split.
  • Backport a few changes to trunk so that XBMC can be built on Ångström from trunk.

Risks

  • Given that the CPU looks like it might be the limiting factor, getting XBMC to lower its resource use to a third might be hard without limiting the skin.

Weekly report 4

June 21st, 2010 topfs2 5 comments

After discussion with my mentor we decided not to use the EVAS model suggested in my last weekly report. While it might have worked, it would have taken too much time to realize, which is a bad idea given the limited time of GSoC. Hence I have started altering guilib itself to do the event based rendering and dirty region creation, and with rather great results for little work!

Status

  • Initial split of the rendering methods in controls into one processing stage and one rendering stage. Most controls seem to work except the containers, which I have yet to fully understand :)
  • Initial calculation of dirty regions based on what skinners provide. See figure 1. Looking at the figure we can see that even the simple generation of render regions works reasonably well!
  • Controls mark a dirty region if their animation transformation has changed. This works surprisingly well on control groups but much more can change than just the animation transformation (moving in a list etc.)
  • Since the code for this is still crude I haven’t committed it. I have however added it to ticket #9448 so it can be discussed.

Plan

  • Fix up the processing stage in the containers
  • Allow controls to mark dirty regions based on other changes than animation
  • While processing is done separately from rendering, it is still done every frame. Create process scheduling and render scheduling as a first step towards event based rendering / updating.
  • The generated dirty regions do not fully confine rotated controls or controls with altered perspective. This must be fixed, otherwise coverflow or other 3D types of effects cannot be used in skinning.
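For the rotated-control item in the plan, one common way to get a region that fully confines a rotated control is to transform its four corners and take their axis-aligned bounding box. The sketch below is plain Python under stated assumptions: rotation is around the control’s centre, `rotated_bounds` is a hypothetical helper rather than a guilib function, and perspective changes are not covered.

```python
import math


def rotated_bounds(x, y, width, height, angle_degrees):
    """Axis-aligned box (x1, y1, x2, y2) enclosing a rectangle rotated about its centre."""
    cx, cy = x + width / 2, y + height / 2
    cos_a = math.cos(math.radians(angle_degrees))
    sin_a = math.sin(math.radians(angle_degrees))

    xs, ys = [], []
    for px, py in ((x, y), (x + width, y), (x + width, y + height), (x, y + height)):
        dx, dy = px - cx, py - cy
        # Standard 2D rotation of the corner around the centre; rounding to six
        # decimals hides floating point noise such as 74.99999999999999.
        xs.append(round(cx + dx * cos_a - dy * sin_a, 6))
        ys.append(round(cy + dx * sin_a + dy * cos_a, 6))

    # Round outwards so the dirty region never cuts off part of the control.
    return (math.floor(min(xs)), math.floor(min(ys)),
            math.ceil(max(xs)), math.ceil(max(ys)))


print(rotated_bounds(0, 0, 100, 50, 0))
print(rotated_bounds(0, 0, 100, 50, 90))
print(rotated_bounds(0, 0, 100, 50, 45))
```

The same idea extends to any transform that maps corners to points, as long as all four transformed corners are included before taking the minimum and maximum.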

Risks

  • Calculating dirty regions in lists seems to be hard, for example when just one item in the list changes without the entire list changing. I doubt it is unsolvable, and in the meantime just marking the entire list should still be an OK workaround.

Figure 1 – Controls’ possible dirty regions in the Confluence settings screen*

*) The colors reflect the type of the control, mainly to ease viewing.

  • Red overlay – Button controls
  • Blue overlay – Image controls
  • Green overlay – Labels
  • Light green overlay – Other

Weekly report 3

June 14th, 2010 topfs2 Comments off

Since I’ve been without a computer for the last week, the status update will be slim.

Status

  • Committed the initial patches to get XBMC on Ångström building
  • Decided on an abstracted rendering model which works like EVAS. Basically guilib will add drawable elements to the render system. When an animation occurs or a control gets hidden, guilib will manipulate the needed elements and the rendering system will handle any needed rendering. This differs slightly from how it is done currently, where each control renders its own elements; in the new solution this happens outside control space. This is beneficial because it keeps any optimization out of XBMC and moves it into the new abstracted rendering system. Since there are roughly 20 controls, it is much simpler to leave the optimizations out of them and move them into the minimal set, 5 or so, of fundamental elements needed for guilib to be rendered. It will also be trivial to introduce a new element that controls could use if the old ones do not provide enough optimization. One example is fading between two images: this can be done much faster by blending the two images in a shader and then blending the resulting fragment to the back buffer, instead of blending the two images separately into the back buffer.

Plan

  • Finalize the higher levels of the new rendering system’s hierarchy, thus finalizing the API that will be used by XBMC’s guilib
  • Post the API somewhere for discussion
  • Move some of the current code of rendering into the new rendering system

Risks

  • Moving to the new rendering system might bump some of the requirements needed to render the GUI, but this is more of a risk after GSoC and outside embedded, since the embedded target meets the requirements.
  • Moving the current code might be harder than thought

Weekly report 2

June 7th, 2010 topfs2 Comments off

While I had exams last week, some major work was done on getting Ångström working as a development environment for XBMC, much thanks to the extremely helpful people over at #beagle!

The plan for next week will consist mostly of reading up on already created technologies, as I won’t be near my development machine.

Status

  • Thanks to, amongst many, koen from #beagle I finally got the Ångström distribution up and running and compiled XBMC. Koen has provided a few patches necessary to get XBMC to compile and has added a few missing libraries to the repository.
  • The Narcissus image creator for Ångström has gotten a “Beagleboard GSoC 2010 XBMC build dependencies” development option to help others easily create an image that can build XBMC. It is still missing some libraries, which will get added along the way.
  • While I have focused on Ångström for now, since more mentors use it and it is better supported, Ubuntu seems to be working with the latest SGX, which makes it a viable fallback once again!
  • Branch created gsoc-2010-beagleboard

Plan

  • Commit the patches needed to build XBMC on Ångström.
  • Read up on existing graphical libraries such as EVAS and on how they solve buffered fonts and event based rendering.
  • Sketch up a proposal of buffered font.
  • Finalize the event based rendering proposal.

Risks

  • No development environment will make it harder to create proposals.
  • The proposed event driven rendering will alter much of the internals of XBMC (more of a long term risk)
  • While buffering text seems viable, we generally use text in quite different ways, so a one-size-fits-all solution might be hard to find. Examples of uses:
    • “Streaming” text in inputfields and subtitles
    • Small text labels
    • Large scrolling text which might stretch beyond a texture limit. While it may not change much apart from scrolling, generating the text might take a significant time and would use a lot of memory. This might introduce lag to the GUI or just take up too much memory.

Weekly report 1

June 1st, 2010 topfs2 1 comment

Finally the GSoC has begun, and while I have been plagued with exams this week I have made some accomplishments. Sadly next week will also be filled with exams, but hopefully I will find some time during the weekend. First off, I planned to fix up and use Ångström, but since I just received kernel panics and had significant problems with g_ether, I decided to first do an Ubuntu image as a test (I’m much more comfortable in Ubuntu) and, once that worked, to continue creating the Ångström image.

Status

  • Have created an Ubuntu image which has all dependencies needed to build XBMC (except SGX for now). The most problematic part with Ubuntu was that USB networking did not want to work. Thanks to DanaG in #beagle I finally got it working by adding g_ether.use_eem=0 as a boot argument. I am not sure if Ångström needs this or if it is just Ubuntu related, but I have added it to the beaglebeginners wiki.
  • Thanks to maltanar, who discovered a cure for the kernel panics in Ångström, I have at least a running image I can continue to work on.
  • A very helpful person, Phaeodaria, in #xbmc-arm suggested a possible solution for the dirty region problem by the use of a stencil buffer. I think this is something we can use to relieve the CPU of the stress of generating the dirty regions, and as an added bonus the stencil buffer could probably give us an even more exact image of what needs to be rendered (fewer wasted cycles). Phaeodaria is currently testing how many cycles generating a stencil buffer will take.
  • Phaeodaria has also found a very interesting flag in EGL called EGL_BUFFER_PRESERVED, which would remove the need for a framebuffer object and instead let us assume that the backbuffer holds the last render. The obvious downside is that the driver cannot do a simple flip of the buffers but needs to copy them instead; however, it is unlikely to be slower than the framebuffer idea. Another downside is that it is EGL specific, so the desktop segment cannot use it and will likely have to go with the framebuffer idea (or omit dirty region rendering altogether). A benefit of using a framebuffer is that we can manipulate it with shaders, making post processing and effects like blur, saturation and really any color alteration possible. Of course, since the devices that cannot use this for lack of a framebuffer are the devices unlikely to have the power for such effects, it is probably no downside.
  • Since XBMC uses a recursive build and configure, cross compilation to ARM currently doesn’t work, and according to koen in #beagle scratchbox is a lot of trouble for Ångström. Thus I will probably build XBMC directly in Ångström on the BeagleBoard, however slow it might be. Thankfully most of the proposed solutions can probably be made for the desktop GL version, tested, implemented for GLES with some adjustments, and finally compiled and tested on the BeagleBoard. This will allow the bigger changes to be done on a fast workstation with a fast compile, and once the rough parts are done they can be compiled and tried on the BeagleBoard. Hopefully this won’t have too serious an impact on development speed.

Plan

  • Finalize the Ångström image.
  • Build XBMC on the BeagleBoard.
  • Read up on how the event based and dirty region solutions are done in Android, EVAS and Java Swing (all of these are very portable and seem simple).
  • Try to create a proposition for an improved font rendering that will be faster (mostly due to being buffered).

Risks

  • Meeting the dependencies for XBMC in Ångström might be hard. XBMC really is a big app with way too many dependencies and no easy way to scale it down. The solution would be to use Ubuntu to test and compile, and to continue working on the Ångström image as much as possible.
  • If Ångström turns out to be a problem, Ubuntu might be used instead; a risk with Ubuntu is that SGX might be problematic to install. At least the latest version is said not to work in Ubuntu.
  • Little gain from buffered font rendering. Before any implementation is done it is vital to try and see if it will be beneficial, although sandbox testing with alpha blended quads vs an FBO of the same boxes yielded a significant FPS increase.