Weekly report 12

August 16th, 2010 topfs2 2 comments

This is the last weekly report, and I would like to thank Google and the BeagleBoard community for the fun summer! It has been a wonderful opportunity; I’ve learned heaps and gotten extremely interested in embedded development.

Status

  • Investigated the performance bottleneck further and I am now reasonably sure it’s not the shaders any more, which is good. It seems likely that many skins suffer from texture cache issues: the textures are too large, which makes SGX do unnecessary work to handle the presentation. I verified this by splitting a texture into 4 pieces, which gave a 50% performance increase. The cost seems to depend on the texture size rather than on the size of the rendering area. This is of course only true if the USSE isn’t running at maximum capacity, since then a larger render area would make it slower. On most skins it is running at full capacity, which is where dirty region rendering is useful.
  • Since much of the performance increase has to come from skinners and is hard to fix magically in code, I’ve written up documentation of what I’ve learned, which can be used by skinners who want to make a lightweight skin. The documentation can be found on the eLinux wiki (http://elinux.org/BeagleBoard/GSoC/2010_Projects/XBMC#Documentation)
  • Added a background to the beagleskin which makes it look rather close to confluence while still running at 30fps in 720p on the BeagleBoard C4.
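
The texture-splitting experiment from the first bullet can be sketched roughly as follows. This is only an illustration in Python, not XBMC code: the `split_into_tiles` helper and the 2×2 grid are assumptions; the report only states that one texture was cut into 4 pieces.

```python
def split_into_tiles(width, height, cols=2, rows=2):
    """Return (x, y, w, h) rectangles that together cover a width x height texture.

    Hypothetical helper: each rectangle would be uploaded as its own,
    smaller texture and drawn at its (x, y) offset.
    """
    tiles = []
    for row in range(rows):
        y0 = row * height // rows
        y1 = (row + 1) * height // rows
        for col in range(cols):
            x0 = col * width // cols
            x1 = (col + 1) * width // cols
            tiles.append((x0, y0, x1 - x0, y1 - y0))
    return tiles


# A 720p background cut into four 640x360 pieces.
tiles = split_into_tiles(1280, 720)
```

The pieces cover exactly the same pixels as the original texture, so the rendered area stays the same and only the size of each individual texture changes.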

Plan

  • Need to google a bit for rebase in git so I can upload the patches done during GSoC to Google. Or just use format-patch; not sure which is best.
  • Take a few weeks off before going back to school :)
  • Need to find something suitable to hook the BeagleBoard up to when I arrive in Canada so I can continue to play with it :)
  • Continue with the lightweight skin; I very much like using it despite it being far from finished.

Weekly report 11

August 9th, 2010 topfs2 4 comments

Before we go into the actual status report I would like to be a bit nostalgic; this is the last week of feature addition, after all.

Before GSoC, XBMC didn’t even compile on Angstrom, and now I have just finished watching an SD video which ran without the BeagleBoard breaking much of a sweat! I have had a lot of help from the extremely talented BeagleBoard community and it would never have been possible without them. It is now almost possible to have a fully capable SD box made from the BeagleBoard with XBMC.

While not all of the original ideas made it into XBMC, we have still come a long way: XBMC has gone from not running to being rather usable on the BeagleBoard. There is still room for more improvement of course, but it is great progress IMO. Since the YUV transform is done by NEON and the hardware overlay, videos actually run well; before GSoC it couldn’t even present the lowest resolution video at anything beyond 10fps.

It has become clear what is slowing down GUI rendering, and much of it lies in the skinners’ hands, which is what next week’s documentation will be about. I have committed a minimal but still usable skin which might just hit 100fps ;). If we add backgrounds and keep it static but close to the look of ordinary skins, that number goes down to about 20fps. With the addition of dirty region rendering it gets back up to about 30fps, which was my goal for 720p!

Status

  • Finally got the overlay to be under the controls which need it.
  • Created a minimal skin which actually hits vsync when rendering.
  • Refactored the rendering passes and dirty region tracker to support N-buffering. Also added the new cost reduction algorithm. It uses static costs for now but could probably calculate dynamic costs if it got timing data back regarding its choices. I guess linear regression would work if we don’t use too old data points?
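
The idea of static costs that could later be refitted from timing data might look roughly like the sketch below. It is not the algorithm that was committed to XBMC: the `CostModel` class, the cost formula (a fixed cost per region plus a cost per pixel), the default values and the window size are all assumptions made for illustration. Old data points are dropped by keeping only a bounded window of recent samples.

```python
from collections import deque


def area(rect):
    """Area of an (x0, y0, x1, y1) rectangle."""
    x0, y0, x1, y1 = rect
    return max(0, x1 - x0) * max(0, y1 - y0)


def union(a, b):
    """Smallest rectangle that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


class CostModel:
    """Assumed model: cost(rect) = per_region + per_pixel * area(rect)."""

    def __init__(self, per_region=1000.0, per_pixel=1.0, window=64):
        self.per_region = per_region          # static cost of one render pass
        self.per_pixel = per_pixel            # static cost of one pixel
        self.samples = deque(maxlen=window)   # samples beyond the window fall out

    def cost(self, rect):
        return self.per_region + self.per_pixel * area(rect)

    def should_merge(self, a, b):
        """Merge two dirty rects when one pass over the union is cheaper."""
        return self.cost(union(a, b)) <= self.cost(a) + self.cost(b)

    def add_timing(self, pixels, measured_cost):
        """Feed back a measurement and refit the costs by least squares."""
        self.samples.append((pixels, measured_cost))
        n = len(self.samples)
        if n < 2:
            return
        mean_x = sum(x for x, _ in self.samples) / n
        mean_y = sum(y for _, y in self.samples) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in self.samples)
        if var_x == 0:
            return  # all samples have the same size, no slope to fit
        cov = sum((x - mean_x) * (y - mean_y) for x, y in self.samples)
        self.per_pixel = cov / var_x
        self.per_region = mean_y - self.per_pixel * mean_x


model = CostModel()
adjacent = model.should_merge((0, 0, 100, 100), (100, 0, 200, 100))
far_apart = model.should_merge((0, 0, 10, 10), (1000, 1000, 1010, 1010))
```

With the default static costs, two adjacent rectangles are merged because one pass over the union saves a per-region cost, while two small rectangles in opposite corners are kept separate because the union would cover far more pixels.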

Plan

  • Polish and add more dialogs and such to the minimal skin to make it a viable skin for day to day usage.
  • Document what a skinner can do to make a skin which runs fast on BeagleBoard

Risks

  • Can’t think of anything …

Weekly report 10

August 4th, 2010 topfs2 2 comments

Status

  • Overlay is scaled and positioned according to how the skin wants it.
  • Overlay playback is tear-free and rather fast. It runs Big Buck Bunny 480p almost perfectly, while before it was about 5fps.
  • Finally got cross-compiling working. Thanks again, koen, for all your help providing me with builds up to this point.

Plan

  • Still need to get the overlay under some of the GUI controls. I have tried rendering SGX with alpha bits on but haven’t gotten it to work; possibly I set the blending mode of the overlay improperly.
  • Find a low resource skin and strip it down as the start of a beagle skin.
  • Investigate what the performance hog is in video playback using oprofile; the memcpy should be gone but it’s still far behind omapfbplay.

Risks

  • My temporary apartment seems to lie in a wireless dead zone so my internet is incredibly slow. I will still be able to work (thanks git!) but my time on IRC will be less and I will probably group commits more.
  • A skin may hide the video playback temporarily, so I need to find a way to temporarily hide the overlay and possibly skip the yuv transforms during this time. Not sure if it’s possible without reconfiguring the overlay (on hide and on visible).

Weekly report 9

July 26th, 2010 topfs2 2 comments

Status

  • OMAP Overlay is hooked up to XBMC and works rather well (for a few seconds)
  • Switched the yuv transformation from swscale to the NEON method created by mru.
  • Have created a hack that should get rid of the memcpy. It needs a major redesign to get it into trunk, but it is probably OK to use for BeagleBoard and possibly Tegra to gain some speed while decoding video.

Plan

  • Get to the bottom of the deadlock which makes playback stop after a few seconds.
  • Position the overlay properly according to skinning specifications (rotation will not be possible, but nearly no skin uses that on the video control so no real biggie).
  • Getting stuff to render on top of the overlay is still needed, so the plan is to read through the documentation about the blending modes and how to set things up so SGX shines through as it is supposed to.

Risks

  • Time… next week will be my final moving week and I need to clear out the entire apartment. Hence I suspect it will be hard to get much work done in the following week. In the weeks after this I’ll be set up and free in a new apartment, so I’ll make up for the lost time then.

Weekly report 8

July 18th, 2010 topfs2 1 comment

Status

  • Understood how OMAP Overlay works and how it should be done, thanks to Måns’ awesome application omapfbplay.
  • Implemented a VideoRenderer in XBMC which transforms from yuv420p to yuv422p via swscale. I might use the NEON-optimized version from Måns at a later stage, but for now I want to isolate the unknowns to only be OMAP Overlay.
  • Adapted the overlay code from omapfbplay to fit into the video renderer I created; it starts the overlay but locks up somewhere.
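
To show what the yuv420p to yuv422p step from the second bullet involves, here is a minimal sketch in plain Python. It is not what swscale or the NEON routine does internally: it simply repeats every chroma row, and the function name and the `bytes`-per-plane layout are assumptions. In yuv420p the U and V planes are half width and half height; in yuv422p they are half width and full height, while the Y plane is unchanged.

```python
def yuv420p_to_yuv422p(y, u, v, width, height):
    """Convert planar 4:2:0 to planar 4:2:2 by repeating each chroma row.

    y, u and v are bytes objects; width and height must be even.
    """
    if width % 2 or height % 2:
        raise ValueError("width and height must be even")
    cw, ch = width // 2, height // 2
    if len(y) != width * height or len(u) != cw * ch or len(v) != cw * ch:
        raise ValueError("plane sizes do not match the frame size")

    def double_rows(plane):
        out = bytearray()
        for row in range(ch):
            line = plane[row * cw:(row + 1) * cw]
            out += line  # chroma for the even luma row
            out += line  # same chroma reused for the odd luma row
        return bytes(out)

    return y, double_rows(u), double_rows(v)


# Tiny 4x4 frame: 16 luma samples, 2x2 samples per chroma plane.
y = bytes(range(16))
u = bytes([1, 2, 3, 4])
v = bytes([5, 6, 7, 8])
y2, u2, v2 = yuv420p_to_yuv422p(y, u, v, 4, 4)
```

After the conversion each chroma plane has twice as many rows as before, one per luma row.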

Plan

  • Fix the remaining issues and actually get video displayed using OMAP Overlay
  • Make the overlay scale and position correctly in the GUI and with respect to the window underneath.
  • Try to get SGX to render to the topmost frame buffer to get the OSD over the overlay. Not sure how this should be done code-wise though; I guess opening a new EGL context or surface is needed?
  • Get rid of the unneeded memcpys (might be out of scope since it’s not problematic for 480p).

Risks

  • Won’t get SGX to render over the overlay.
  • While dvdplayer uses few resources, it does do some unnecessary memcpys which might take away the possibility of 720p, and getting rid of those memcpys will require refactoring a large portion of the video rendering.

Weekly report 7

July 12th, 2010 topfs2 Comments off

Status

  • Finally got the dirty region based rendering solution to work on the BeagleBoard. During the week I tried rendering to a framebuffer and then rendering this buffer to the backbuffer, but it was incredibly slow. Måns Rullgård (mru) then pointed me towards tracking dirty regions from 2 frames back, i.e. what is undefined for 2 frames. This works since a graphics driver normally just flips the pointers between the front and back buffers, so if we know what we have rendered on both, we know what we need to render to get a perfectly defined backbuffer. While confluence uses lots of fullscreen animations, it’s now getting to the point where it’s possible to create a snappy low resource skin, and when the old PM3 skin gets updated with the addon changes it might just be the perfect candidate.
  • While many controls are working and mark dirty regions properly, there are still a couple of vital ones left, most notably the lists.
  • Tracked down why XBMC in Angstrom used up the entire CPU time, thanks to the wonderful program oprofile and help from Måns. It seems like SDL Audio (which we use for GUI sounds) constantly poked the audio device. I would guess that this is coupled with the problem elupus found, where SDL Audio constantly feeds the audio device null data when unused instead of closing the device. For now I have thus disabled GUI sounds on Angstrom in my branch, and this dropped CPU usage from 100% to 10-20%.
  • I have profiled video playback and it seems that while playing 480p content (Big Buck Bunny) we spend most of the time just idling, thus the decoding is not CPU bound. Some more testing showed that the yuv to rgb transform makes the presentation slow, because this transform is done in a shader. If we disable the yuv to rgb transform in the shaders, presentation is almost watchable in terms of frames per second. As my mentor (Michael Zucchi) and Måns have suggested, it seems like OMAP overlay (which can do the yuv to rgb transform in hardware) will help a lot.
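
The two-frames-back idea from the first bullet can be sketched as a small tracker. This is an illustration in Python rather than the XBMC implementation: the `DirtyRegionTracker` class and its method name are assumptions, rectangles are plain tuples, and a real tracker would also have to redraw everything on the first frames, when the buffers have no defined content at all.

```python
from collections import deque


class DirtyRegionTracker:
    """Remember what changed in earlier frames so a flipped backbuffer can catch up."""

    def __init__(self, buffers=2):
        # With N buffers, the buffer we are about to draw into last held the
        # frame from N frames ago, so it misses the changes of the N - 1
        # frames in between. buffers=1 models a copy-on-flip setup.
        self.history = deque(maxlen=buffers - 1)

    def regions_to_render(self, dirty_now):
        """Return the rectangles that must be redrawn into the current backbuffer."""
        to_render = list(dirty_now)
        for older in self.history:
            for rect in older:
                if rect not in to_render:
                    to_render.append(rect)
        self.history.append(list(dirty_now))
        return to_render


a = (0, 0, 100, 50)
b = (200, 200, 300, 260)
tracker = DirtyRegionTracker(buffers=2)
frame1 = tracker.regions_to_render([a])   # only a changed
frame2 = tracker.regions_to_render([b])   # b changed, a is still missing in this buffer
frame3 = tracker.regions_to_render([])    # nothing changed, but this buffer lacks b
frame4 = tracker.regions_to_render([])    # both buffers are up to date
```

With double buffering each change is therefore rendered twice, once per buffer, which is still far less work than redrawing the whole screen every frame.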

Plan

  • Continue making controls mark their dirty regions. Most importantly make lists mark their region.
  • Make the marking handle the 3D case where controls are rotated.
  • Buffer control groups using frame buffer objects.

Risks

  • Making controls mark their regions is a daunting task and takes quite a while for each control. Lists will be a nightmare and might take lots of time.
  • Buffering control groups using frame buffer objects might not save all too much. I need to do some testing regarding this, but it should hopefully just be a question of moving the code I created for using a framebuffer for the entire dirty region solution to the control groups (as a test; it needs to be better abstracted later).
  • Using OMAP Overlays might be hard in XBMC? Probably low risk, and Måns has a sample app called omapfbplay from which code could be taken.

Weekly report 6

July 5th, 2010 topfs2 Comments off

Last week has been a busy week indeed and I have focused on getting the dirty region based rendering to work. I am glad I followed my mentor’s advice and did dirty regions before moving to event based rendering, especially considering that when dirty region rendering works we have effectively gotten event based rendering, but not event based processing. So while CPU usage could be limited further, this should show if it’s worth doing.

Status

  • Many of the more common controls work and produce fully workable dirty regions; amongst these are MultiImage, Image, Label, Button and Groups (Windows and Dialogs). There are a lot of other controls that work somewhat, with some artifacts. Here is a video of the working controls and their respective dirty regions.
  • As can be seen in this video, where I have enabled dirty regions and only render what changes on screen on my workstation, Confluence is almost fully usable. Note that my workstation thankfully copies the backbuffer to the frontbuffer, so I can assume the backbuffer to be defined after the flip. Sadly this is nonstandard, which is why I have no video from the BeagleBoard yet.
  • Experimented with a more scalable algorithm for handling the dirty regions after they have been generated here.

Plan

  • Continue to fixup controls to create workable dirty regions.
  • While the dirty regions are created and the clipping works, I need to make the backbuffer defined on the BeagleBoard to have it working without flicker. This is a vital goal for this week. We have 3 options. The first option is to define the backbuffer via EGL_BUFFER_PRESERVED. The second option is to render the entire interface to a framebuffer object and then, before the flip, render the framebuffer object to the backbuffer. The third option is to render as usual to the backbuffer, copy the content of the backbuffer to a texture before the flip, and render that texture to the backbuffer at the start of the next scene. The first option is preferable and I have added code for it; EGL seems to state that the buffer should be preserved, but my initial tests on the BeagleBoard indicate that it is not. The second option is probably the most useful on non-embedded platforms since it limits the needed fragment operations more, but it bumps the required GL driver version a bit (not of concern in GSoC). For this to be an option on the BeagleBoard I need to make sure it’s OK to create 720p framebuffer objects, but I would assume so since the max texture size is 2048×2048. The third is more of a fallback as it’s a bit wasteful, but it is useful on older graphics drivers in the desktop segment, although I would guess that copying the frontbuffer to the backbuffer through glCopy, without the need for an intermediate texture, is more proper.

Risks

  • The biggest problem for next week will be getting a defined backbuffer, which is essential for the success of the project.
  • A risk worth mentioning is that the dirty regions might not be as beneficial for the BeagleBoard as anticipated. This is doubtful, however, since SGX seems to have software fallbacks on certain rendering stages, so any limitation of the area should bring down CPU usage. My workstation had a significantly lighter CPU load with dirty region rendering enabled; on average it was around half. Note that since not all controls are working (the RSS control, for example), these numbers should be taken with a grain of salt.