XBMC Community Forum  

2008-10-16, 11:00   #1
journey4712
Junior Member
docuwiki.com scraper in development

Greetings.

I've started working on a docuwiki.com scraper. Even though it's a wiki, it has a fairly static structure that can be scraped most of the time. I've got the basics down: extracting the title, year, and narrator, plus the episode titles and plots (if they exist). I'm running into an issue, though: I'm unsure how the documentaries should be organized, since they fall somewhere between TV shows and movies.

They are like TV shows in that *some* of them come as multi-part series. They are like movies in that a good number are single-part only. What I'm not sure of is how they need to be organized so that XBMC scans the single-parters as single-part documentaries (movies, essentially) and the multi-parters as a show with episodes. Do I need to separate them into different source directories with different scrapers, a "movie" scraper for the single-part documentaries and a "tvshow" scraper for the multi-part ones? It would be essentially the same scraper, so that seems like a bad hack.

The next problem relates to advancedsettings.xml. When creating the regexp for <tvshowmatching>, it seems it needs to detect both the season and the episode from each name. The main problem is that documentaries don't have a season; for simplicity's sake my scraper currently outputs all episodes as part of season 1. Documentaries are usually named Name.XXofYY.EpisodeTitle.quality.ripgroup.avi. How can I recognize 2of3 or 5of9 as season 1 episode 2 or season 1 episode 5 based on those file names? It's escaping me because I need to capture a 1 for the season, but there is not reliably a "1" present in the names.
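To make the naming problem concrete, here is a minimal Python sketch of the parsing step. It is not an advancedsettings.xml expression and not how XBMC itself matches files; the pattern, the function name `parse_part`, and the example file name are made up for illustration. It shows that the episode number can be captured from XXofYY, while the season has to be supplied as a constant because nothing in the name provides it.

```python
import re

# Hypothetical pattern for names like Name.2of3.EpisodeTitle.quality.ripgroup.avi
PART_RE = re.compile(
    r"^(?P<name>.+?)\.(?P<part>\d{1,2})of(?P<total>\d{1,2})\.(?P<rest>.+)\.[^.]+$",
    re.IGNORECASE,
)

def parse_part(filename):
    """Return show name, season, episode and total parts, or None if no XXofYY marker."""
    match = PART_RE.match(filename)
    if match is None:
        return None
    return {
        "show": match.group("name"),
        "season": 1,  # assumption: the name carries no season, so it is hard-coded
        "episode": int(match.group("part")),
        "total": int(match.group("total")),
    }

print(parse_part("Example.Docu.2of3.Some.Title.xvid.grp.avi"))
```

The hard-coded season is exactly the part that a single capturing regexp cannot express on its own, which is the difficulty described above.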

Basically, it seems documentaries don't fit in very well as either movies or TV shows, so any pointers to getting this done would be much appreciated. Or should I be submitting a trac ticket for a third type of video file: movies, tvshows, and the new documentaries? I'm not particularly excited to submit a trac ticket for this, though, because I imagine it could be a few months before anything solid happens (if ever) with respect to a new type of video file.

journey4712

2008-10-16, 11:18   #2
journey4712
Junior Member

Another thing I forgot to mention regarding filenames: documentaries cataloged at docuwiki use the following naming convention for extras:

Name.1of3.title.avi
Name.2of3.title.avi
Name.3of3.title.avi
Name.Extras.1of2.title.avi (or sometimes Name.Extras.1.title.avi)
Name.Extras.2of2.title.avi (or sometimes Name.Extras.2.title.avi)

This also creates complications in matching if I try to use 1of3 as season 1 episode 1, because the extras get seen as being part of season 1 as well. I'm starting to wonder: is there any way I can reference episodes in the scraper by title instead of by episode number? Every documentary I've looked at on docuwiki follows that same naming format (the filename is available on docuwiki), always including the title of the episode. In most cases docuwiki doesn't even show episode numbers with the episode descriptions, just the title of the episode and then the plot, followed by the next episode.
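As a rough illustration of keeping extras apart from the main parts, the sketch below checks for the Extras form first and only then falls back to the plain XXofYY form. The patterns and the `classify` function are hypothetical, written against the five example names above; how XBMC would actually store extras is left open here.

```python
import re

# Hypothetical patterns based on the docuwiki naming convention listed above.
EXTRAS_RE = re.compile(
    r"^(?P<name>.+?)\.Extras\.(?P<part>\d{1,2})(?:of\d{1,2})?\.(?P<title>.+)\.[^.]+$",
    re.IGNORECASE,
)
PART_RE = re.compile(
    r"^(?P<name>.+?)\.(?P<part>\d{1,2})of\d{1,2}\.(?P<title>.+)\.[^.]+$",
    re.IGNORECASE,
)

def classify(filename):
    """Return (kind, name, number, title) where kind is 'extra' or 'episode', else None."""
    # Extras must be tested first, otherwise Name.Extras.1of2 looks like episode 1.
    match = EXTRAS_RE.match(filename)
    if match:
        return ("extra", match.group("name"), int(match.group("part")), match.group("title"))
    match = PART_RE.match(filename)
    if match:
        return ("episode", match.group("name"), int(match.group("part")), match.group("title"))
    return None

for name in ("Name.1of3.title.avi", "Name.Extras.1of2.title.avi", "Name.Extras.2.title.avi"):
    print(name, "->", classify(name))
```

With real file names the captured title would also contain the quality and rip group fields, so it would need trimming before it could be compared against episode titles from the site.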

For an example page, look at A History of Britain or Battleplan. Unfortunately they aren't exactly the same, as Battleplan uses episode numbers in the episode names, and A History of Britain (along with most of the site) does not.

journey4712

2008-10-27, 19:15   #3
spiff
Grumpy Bastard Developer

hi,

sorry for the late reply; this somehow slipped past my attention.

to support this properly we need to define some semantics and a new content type for documentaries. from what i can see we need added logic for:
1) supporting both 'movie' and 'tvshow' documentaries. this should be quite easy to add; it would just be some flags in the xml returned from GetSearchResults.
2) some additional rules for extras naming. i guess this should be done in a general way so that it covers movies and tvshows as well.
3) possibly support for scraping episodes by filename. this one we want to avoid if possible (but it is certainly doable if we deem it necessary).

but in any case, we will have to return to this after the atlantis release. nag me then.
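For point 3, a rough Python sketch of what matching an episode by filename could look like is given below. Nothing here exists in XBMC; the function names and the example titles are invented. The idea is to normalize both the title part of the file name and the episode titles scraped from the site, then pick the scraped title that forms the longest prefix of the file name's title part (which may be followed by quality and rip group fields).

```python
import re

def normalize(text):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", text.lower())

def find_episode(filename_title, episode_titles):
    """Return the 1-based position of the best matching scraped title, or None."""
    key = normalize(filename_title)
    best_number, best_length = None, 0
    for number, title in enumerate(episode_titles, start=1):
        candidate = normalize(title)
        # Prefer the longest title that prefixes the file name's title portion.
        if candidate and key.startswith(candidate) and len(candidate) > best_length:
            best_number, best_length = number, len(candidate)
    return best_number

# Made-up episode titles, in the order they would appear on the wiki page.
titles = ["The First Part", "The Second Part", "The Second Part Revisited"]
print(find_episode("The.Second.Part.Revisited.DVDRip.XviD-GRP", titles))
```

The position in the scraped list then stands in for the episode number, which is useful on pages that list only titles and plots.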

2008-12-05, 00:16   #4
voiddreamer
Junior Member
How is this going?

I'm very interested in this and had thought of making my own. Any update on this?

All times are GMT +2.
Copyright © 2008, XBMC Project