OSX Gets H.264 Acceleration
Apple has long been known to tap the GPU for accelerated video decode under OSX in its own applications such as iTunes and QuickTime Player. Even on the AppleTV, there are hints of some sort of CPU/GPU decoding deep down inside the private OS frameworks. The release of Snow Leopard (10.6) showed even more use of the GPU for video decode, seen directly in the large reduction in CPU usage when playing video content with Apple’s players. But this magic API remained hidden and private, available only to Apple’s own applications.
Fast forward to about two weeks ago and the big news was the Apple vs Adobe verbal fight about Flash. Adobe Flash has always been a CPU sucker on OSX, even for SD Flash video content, and Apple made that point very clear. Then something mysterious happened. Very quietly, with zero fanfare, Apple posted Technical Note TN2267, carrying the title “Video Decode Acceleration Framework Reference”. Holy cow, the elusive accelerated video decode API just popped into the open. This API is called VDADecoder and is available on Mac models equipped with the NVIDIA GeForce 9400M, GeForce 320M or GeForce GT 330M running OSX 10.6.3. Here’s a list of supported models:
- MacBook (Aluminum) shipped between October, 2008 and June, 2009.
- MacBooks shipped after January 21st, 2009.
- Mac Minis shipped after March 3rd, 2009.
- MacBook Pros shipped after October 14th, 2008.
- iMacs which shipped after the first quarter of 2009.
We are very happy to announce that Apple’s new API (VDADecoder) for exposing GPU-accelerated H.264 decoding under OSX 10.6.3 is present and fully functional in svn trunk builds (r29729 and above) of XBMC for Mac, and will appear in the next stable version. Now, non-CrystalHD equipped XBMC for Mac users can enjoy the pure pleasure of accelerated H.264 decoding already enjoyed by XBMC for Linux (VDPAU/VAAPI/CrystalHD) and XBMC for Windows (DXVA/CrystalHD) users. The VDADecoder API will handle all H.264 video content, including content in m2ts containers. That last bit is very important, as it means accelerated video decoding of decrypted Blu-ray m2ts files that are in H.264 format. Blu-ray video can be in one of three formats, VC1, MPEG-2 or H.264, with about 75 percent being in H.264 format. There is no word yet on accelerated VC1 decoding, but since VDADecoder interfaces to low-level NVIDIA VP3 functions which can handle H.264 AND VC1/MPEG-2 content, we might see VC1/MPEG-2 support added to the VDADecoder API at a later date.
Also, while performance is fantastic with VDADecoder, there is still room for improvement. Running XBMC for Mac under Shark (Apple’s profiler) shows that about half of the CPU usage is now due to copying and converting the video frame from UYVY422 (VDADecoder’s native format) to YUV420P (XBMC’s internal format). Future work on the XBMC rendering path will allow passing decoded frames directly up to our renderer and thus skipping the copy/convert step. Stay tuned :)
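As a rough illustration of the copy/convert step described above, here is a minimal Python sketch of a UYVY422 to YUV420P conversion. It is not XBMC’s code (XBMC is written in C++ and would use an optimized routine). The function name, the tightly packed rows and the choice to average vertically adjacent chroma samples are all assumptions made for the example.

```python
def uyvy422_to_yuv420p(frame: bytes, width: int, height: int) -> bytes:
    """Convert one packed UYVY422 frame to planar YUV420P (planes in Y, U, V order).

    Hypothetical helper for illustration only; assumes rows are tightly
    packed (no padding) and that width and height are even.
    """
    if width % 2 or height % 2:
        raise ValueError("width and height must be even")
    stride = width * 2  # UYVY422 stores 2 bytes per pixel: U Y V Y ...
    if len(frame) != stride * height:
        raise ValueError("unexpected frame size")

    chroma_w, chroma_h = width // 2, height // 2
    y_plane = bytearray(width * height)
    u_plane = bytearray(chroma_w * chroma_h)
    v_plane = bytearray(chroma_w * chroma_h)

    # Luma: every odd byte of each packed row.
    for row in range(height):
        line = frame[row * stride:(row + 1) * stride]
        y_plane[row * width:(row + 1) * width] = line[1::2]

    # Chroma: 4:2:2 has one U/V pair per 2 pixels on every row, while
    # 4:2:0 keeps one pair per 2x2 block, so average two adjacent rows.
    for crow in range(chroma_h):
        top = frame[(2 * crow) * stride:(2 * crow + 1) * stride]
        bot = frame[(2 * crow + 1) * stride:(2 * crow + 2) * stride]
        base = crow * chroma_w
        for ccol in range(chroma_w):
            u_plane[base + ccol] = (top[4 * ccol] + bot[4 * ccol] + 1) // 2
            v_plane[base + ccol] = (top[4 * ccol + 2] + bot[4 * ccol + 2] + 1) // 2

    return bytes(y_plane + u_plane + v_plane)
```

Every byte of every frame is read and rewritten on the CPU here, which is why a step like this shows up so prominently in a profile and why skipping it altogether is the better fix.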
Will this work now with the Apple TV as well, or only on Snow Leopard equipped computers?
Either way this is definitely a major get.
Awesome work as ever guys. I hope Apple will add support for other formats like you said, but I’m not holding my breath.
You guys rock! My mini is eternally grateful!
@Joe Papps – No, this only applies to OSX 10.6.3 with the Apple devices listed above. For Apple TV you need to install the Broadcom CrystalHD card to get H.264 hardware decoding.
http://www.logicsupply.com/products/bcm970012
As of February 9th, Logic Supply was/is offering $10 off to XBMC users. Use the code: “XBMC10”.
Man I wish I had a newer Mini… this will do me no good. Super happy that you guys are making these improvements though! Some day, when I get my HTPC, I will benefit :D
@Joe Papps Sorry to say (and I’m not one of the devs, nor from Apple), but Apple TV is not supported. Apple TV is based on an old NVIDIA GPU, something like the 6000 or 7000 series, while Apple’s API only supports the NVIDIA 9400 and later.
Even the new iMacs with ATI GPUs are still not supported; Apple may need some more time to build a bridge from the NVIDIA instructions to the ATI instructions.
I really hope for a quick stable release. The feature is amazing.
YES! Finally……about time Apple…as a side note, I hope Adobe will use the library to kill the CPU suction process in Flash…
Great work team, just great work! Thank you.
Will this mean another method for OS X to perhaps more easily integrate CrystalHD support for AppleTV?
@peprasetya So I was about to purchase the CrystalHD decoder for my Apple TV. What do you recommend, should I go ahead and get it? Thank you all!
Never mind, for 10.6.3+ and Nvidia only.
This is great news. Now we only need adaptive refresh rate switching for OS X :) And an AppleTV with a video card that is supported by VDADecoder.
I second this!
The AppleTV uses an NVIDIA 7300. NVIDIA’s VP3 API does not support 7xxx series GPUs, so don’t look for VDADecoder appearing under the AppleTV even if you could possibly get 10.6.3 booted on it. VDADecoder requires a GeForce 9400M, GeForce 320M or GeForce GT 330M. This article clearly states that requirement.
third!
Great work! Just need a nightly build.
Wow, 2 weeks, that’s very very fast. So finally the Apple users join the ranks of the accelerated video watchers in XBMC. Now the only platform left out is the Xbox. :-)
For those who wanted to know, VDADecoder CAN be set to output YUV420P. The reason for not doing this is that we dynamically load the VDADecoder framework, and the setup to define the destinationImageBufferAttributes fails with an unresolved symbol at runtime. Since XBMC for Mac is compiled to run under 10.4/10.5/10.6/AppleTV platforms, we cannot statically link to the VDADecoder framework, which is only present in the 10.6 SDK. Also, by dynamically loading the framework at runtime we can determine whether VDADecoder is supported or not. The ability to run the same binary on 10.4/10.5/10.6/AppleTV platforms outweighs the performance gain. We don’t have to worry about users running the wrong binary under the wrong platform. And besides, the performance difference is less than 15 percent total CPU on direct output of YUV420P vs output and convert of UYVY422. In other words, not a whole lot.
I don’t run XBMC on a Mac, but this is exciting. I’m glad to hear things are progressing well :)
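The runtime check described above (load the framework dynamically, then see whether the decoder entry point is there) can be sketched in a few lines. This is only an illustration of the general technique using Python’s ctypes, not XBMC’s actual loader. The helper name and the framework path are assumptions for the example; VDADecoderCreate is the decoder creation call from Apple’s framework.

```python
import ctypes

# Assumed install location of the framework binary; adjust if it differs.
VDA_FRAMEWORK_PATH = (
    "/System/Library/Frameworks/"
    "VideoDecodeAcceleration.framework/VideoDecodeAcceleration"
)


def probe_symbol(library_path: str, symbol: str) -> bool:
    """Return True if the library loads and exports the given symbol.

    Hypothetical helper: it mirrors the dlopen()/dlsym() pattern a C or
    C++ program would use to detect an optional framework at runtime.
    """
    try:
        lib = ctypes.CDLL(library_path)  # raises OSError if it cannot be loaded
    except OSError:
        return False  # framework absent, e.g. an older OS release
    # Attribute access on a CDLL performs the symbol lookup and raises
    # AttributeError when the symbol is missing, which hasattr() catches.
    return hasattr(lib, symbol)


have_vda = probe_symbol(VDA_FRAMEWORK_PATH, "VDADecoderCreate")
print("hardware decode available" if have_vda else "falling back to software decode")
```

Because nothing is resolved at link time, the same program starts on systems without the framework and simply takes the software path there.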
@davilla Can we work around the unresolved symbol issue?
The link to the Technical Note doesn’t work, this one is working:
http://developer.apple.com/mac/library/technotes/tn2010/tn2267.html
Yes, I know… :) I was referring to the adaptive refresh rate switching… ;)
@jjgod
Possible, but you will still get hit with the CPU time consumed by the copyback of picture frame data from GPU space to CPU space. This copyback is about 1/2 the overhead of the copy/convert operation. The correct way to do this is to pass the reference to the picture frame data up to the renderer and use it there. Since it’s already in GPU space, the picture frame “upload” will be super fast.
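To get a feel for how much data the copyback moves, here is a back-of-the-envelope calculation in Python. The 1920x1080 resolution and 24 fps rate are assumptions chosen for illustration, and the figures ignore any row padding the decoder might add.

```python
WIDTH, HEIGHT = 1920, 1080  # assumed 1080p frame
FPS = 24                    # assumed film frame rate


def frame_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size in bytes of one unpadded frame at the given bit depth per pixel."""
    return width * height * bits_per_pixel // 8


uyvy422 = frame_bytes(WIDTH, HEIGHT, 16)  # packed 4:2:2, 16 bits per pixel
yuv420p = frame_bytes(WIDTH, HEIGHT, 12)  # planar 4:2:0, 12 bits per pixel
copyback_per_second = uyvy422 * FPS

print(f"UYVY422 frame: {uyvy422} bytes")
print(f"YUV420P frame: {yuv420p} bytes")
print(f"UYVY422 copyback at {FPS} fps: {copyback_per_second} bytes/s")
```

Under these assumptions each UYVY422 frame is about 4.1 MB, so the copyback alone moves roughly 100 MB every second before any conversion happens.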
Funny… my Mac Mini was ordered exactly on Mar 3, 2009, so I guess I’ll try it!
@davilla I see, so the optimum way will be to retain the CVImageBufferRef returned by the decode callback and use it directly for XBMC rendering?
BTW, there seem to be more A/V sync issues surfacing than with the software decoder, any word on this part?
Great Work! This works perfectly on one of my minis, but I can’t seem to get it right on the other.
They are 2 different versions: a 2.53 GHz with NVIDIA 9400 and 2 GB RAM, and a 2.00 GHz with NVIDIA 9400 and 1 GB RAM. VDAD seems to be fine on the faster one, never really going above about 50% CPU usage, and not dropping frames (though it struggled with the killa_sample). But when I play 1080p content on the slower mini, CPU usage is around 75% and it drops a whack load of frames (in Avatar 1080p, Galapagos 1080p, and Planet Earth 1080p). I tried every SVN build since May 1st and I get the same problem… Is the hardware just too slow? Am I doing something wrong?? I JUST upgraded to Snow Leopard on the slower mini. Could that be the reason? Do I perhaps need to do a clean install?
Thanks in advance for any advice. I’ve got the mini running in dual boot now with Ubuntu.. VDPAU works exceptionally well and XBMC looks absolutely amazing! Would love to get ‘er going in OSX too!
@jjgod
I don’t see any A/V sync issues with XBMC’s implementation of VDADecoder, nor have I heard of any. If you are seeing these, please make a post to the forums with details so I can investigate it. I do see that Plex is suffering from them, but that’s because they have not kept up to date with our DVDPlayer/DVDPlayerVideo/Renderer changes and are using a source code base that is more than a year old.
@jfx
Make sure you are running 10.6.3; VDADecoder is not present before this version. Type an ‘o’ during playback, and you should see the decoder listed as ‘vda-h264’.
help me @davilla ! haha you may be the only one who can! :)
@davilla
Thanks for the reply! Just checked and I am indeed running 10.6.3, and when I hit “o” during 1080p playback I can see the “vda-h264”. So it would appear that everything is working! Yet it still drops a lot of frames, CPU usage is up around 75%… Hmmm… and it’s very jerky in spots.
Any other ideas?
@jfx
please take this to the forums and create a post, I’m not about to start debugging this here :)
Davilla: Ok, will do.. only trouble is.. the mini is at a different location. I don’t have a debug log as of now :(
@davilla
There it is! Thanks again for your help! :)
http://forum.xbmc.org/showthread.php?t=74020