XBMC Community Forum  


Scraper Development: developers forum for metadata scrapers. Scraper developers only!
Not for posting feature requests, bugs, or end-user support requests!

2007-02-12, 11:43 #1
morte0815
Member
 
Join Date: Sep 2004
Location: Germany
Posts: 61
OFDB scraper

Hi,

This is my first "release" of the OFDb (German version of IMDb) scraper.

The main features work, but there are some problems:

- Umlauts are not readable (site encoding issue?)

- I parse the genres into individual tags instead of a single genre tag. Maybe this could be changed in the scraper parser code, since in databases you should not store lists in attributes. If it is not changed, then I will change the parser...

- At the moment there is no original title, since there is no tag for it.
- MPAA is only fetched if the movie was in the cinemas. (Suggestion: maybe add some way to check whether a regex failed.) Example:

Code:
<regex name="theregextocheck" .../>
<regex name="aName" condition="theregextocheck"> <!-- this one will only be called if the condition matches -->
cheers morte
Attached file: ofdb.zip (15.6 KB)
2007-02-12, 12:29 #2
spiff
Grumpy Bastard Developer
 
Join Date: Nov 2003
Posts: 7,715

I have no idea how OFDb expects its URLs to be encoded. It certainly does not use normal URL encoding: umlauts SHOULD be percent-encoded, but it does not accept that.
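To make the encoding question concrete: percent-encoding first turns the text into bytes, so the result depends on which character set is used. The Python sketch below uses the standard `urllib.parse.quote_plus` to encode the same title as UTF-8 and as Latin-1. Which of the two (if either) OFDb accepts is an open assumption to test; `SEARCH_BASE` and `build_search_url` are hypothetical placeholders, not the real OFDb search URL.

```python
from urllib.parse import quote_plus

# Hypothetical placeholder; the real OFDb search URL is not given in this thread.
SEARCH_BASE = "https://example.invalid/search?q="


def build_search_url(title: str, charset: str) -> str:
    """Percent-encode `title` after converting it to bytes with `charset`."""
    return SEARCH_BASE + quote_plus(title, encoding=charset)


if __name__ == "__main__":
    title = "Männer"
    # UTF-8 turns "ä" into two bytes (%C3%A4), Latin-1 into a single byte (%E4).
    print(build_search_url(title, "utf-8"))    # https://example.invalid/search?q=M%C3%A4nner
    print(build_search_url(title, "latin-1"))  # https://example.invalid/search?q=M%E4nner
```

If the site rejects both variants, it may expect some other scheme entirely; trying each against the live search page is the quickest check.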

Several <genre> tags, sure. I don't see why this is easier to do than the current /-separated list, though. As for storing them like that in the DB, it's to speed up the queries: constructing the /-separated string each time takes time...
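The two representations being compared convert into each other cheaply, which is one way to look at the trade-off. A minimal Python sketch using the standard `xml.etree.ElementTree`, assuming a hypothetical scraper result with one `<genre>` element per genre and assuming `" / "` as the separator of the combined string:

```python
import xml.etree.ElementTree as ET

SEPARATOR = " / "  # assumed separator for the combined genre string


def genres_from_xml(details_xml: str) -> list[str]:
    """Collect the text of every <genre> child of the root element."""
    root = ET.fromstring(details_xml)
    return [g.text.strip() for g in root.findall("genre") if g.text]


def join_genres(genres: list[str]) -> str:
    """Build the single /-separated string from individual genres."""
    return SEPARATOR.join(genres)


def split_genres(joined: str) -> list[str]:
    """Recover individual genres from the /-separated string."""
    return [g.strip() for g in joined.split("/") if g.strip()]


if __name__ == "__main__":
    xml = "<details><genre>Action</genre><genre>Science-Fiction</genre></details>"
    genres = genres_from_xml(xml)
    joined = join_genres(genres)
    print(joined)                # Action / Science-Fiction
    print(split_genres(joined))  # ['Action', 'Science-Fiction']
```

Splitting on `/` silently breaks any genre name that itself contains a slash; separate tags avoid that case.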

Just add the original title as a tag; we will get it in there eventually.

As for those conditions, they are easily simulated using two regexps plus clear="#ofbuffer". Let one regexp grab the conditional block, clearing its buffer no matter what. The next expression will then either have nothing to look through (and hence fail), or the first expression did grab something, so it will succeed...
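The trick can be simulated outside the scraper engine to see why it works. The Python sketch below models numbered buffers as a dict; `run_regexp` and its `clear` flag are hypothetical stand-ins for a scraper RegExp step, not the actual XBMC implementation, and the `<kino>` markup and `Rating:` pattern are made up for illustration.

```python
import re


def run_regexp(buffers: dict, dest: int, pattern: str, source: str,
               template: str, clear: bool = True) -> str:
    """Hypothetical stand-in for one scraper RegExp step.

    Searches `source` for `pattern` and writes `template` (with \\1 etc.
    expanded from the match) into buffers[dest]. With clear=True the
    buffer is emptied first, so a failed match leaves it empty instead
    of holding stale data from an earlier step.
    """
    if clear:
        buffers[dest] = ""
    match = re.search(pattern, source, re.DOTALL)
    if match:
        buffers[dest] = buffers.get(dest, "") + match.expand(template)
    return buffers.get(dest, "")


def scrape_rating(page: str) -> str:
    buffers: dict = {}
    # Step 1: grab the conditional block, always clearing buffer 5.
    run_regexp(buffers, 5, r"<kino>(.*?)</kino>", page, r"\1")
    # Step 2: searches buffer 5 only, so it can match only if step 1 matched.
    run_regexp(buffers, 6, r"Rating: ([\w-]+)", buffers[5], r"<mpaa>\1</mpaa>")
    return buffers[6]


if __name__ == "__main__":
    print(scrape_rating("<kino>Rating: PG-13</kino>"))  # <mpaa>PG-13</mpaa>
    print(repr(scrape_rating("<p>Rating: PG-13</p>")))  # ''
```

In the second call the rating text is present on the page, but outside the conditional block, so step 2 sees an empty buffer and produces nothing.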
__________________
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.
2007-02-12, 12:55 #3
morte0815
Member
 
Join Date: Sep 2004
Location: Germany
Posts: 61

Encoding the URLs is one thing, but the other thing is that you can't read the results. In the list where you choose the movie, the umlauts work like a charm, but the plot, for example, is not readable if there are umlauts in it.
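Umlauts that are fine in one page but garbled in another often point to a page being decoded with a different charset than it was written in. Whether that is what happens with the OFDb detail pages is an assumption. The Python sketch below shows the two typical symptoms and one fix: reading the charset from the HTTP `Content-Type` header via the standard `email.message.Message.get_content_charset`. `charset_from_content_type` and `decode_page` are hypothetical helpers, and a page that declares its charset only in an HTML `<meta>` tag is not handled here.

```python
from email.message import Message


def charset_from_content_type(header: str, default: str = "utf-8") -> str:
    """Return the (lowercased) charset parameter of a Content-Type header, or `default`."""
    msg = Message()
    msg["Content-Type"] = header
    return msg.get_content_charset(default)


def decode_page(raw: bytes, content_type: str) -> str:
    """Decode raw page bytes with the charset the server declares."""
    return raw.decode(charset_from_content_type(content_type))


if __name__ == "__main__":
    word = "für"
    # Symptom 1: Latin-1 bytes decoded as UTF-8 -> 'f' + U+FFFD + 'r'.
    print(word.encode("latin-1").decode("utf-8", errors="replace"))
    # Symptom 2: UTF-8 bytes decoded as Latin-1 -> fÃ¼r
    print(word.encode("utf-8").decode("latin-1"))
    # Decoding with the declared charset gives the right text -> für
    print(decode_page(word.encode("latin-1"), "text/html; charset=ISO-8859-1"))
```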

The genre thing follows from a DB design rule: you normally do not store lists in an attribute (first normal form). It should also make genre queries easier, since you do not have to split the genre string; you only iterate through the genre entries to find the movies with the needed genre...
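The two designs under discussion can be compared in a few lines of SQL. This is a minimal `sqlite3` sketch with a made-up schema (the tables `movie`, `genre` and `movie_genre` are hypothetical, not XBMC's actual database layout): the link table answers "movies in genre X" with an exact join, while the cached `/`-separated column needs a padded string match.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE movie (id INTEGER PRIMARY KEY, title TEXT, genres TEXT);
        CREATE TABLE genre (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE movie_genre (movie_id INTEGER, genre_id INTEGER);
    """)
    movies = {
        "The Matrix": ["Action", "Science-Fiction"],
        "Heat": ["Action", "Crime"],
        "Amelie": ["Comedy"],
    }
    for title, genres in movies.items():
        # Denormalized: one cached "/"-separated string per movie.
        cur = db.execute("INSERT INTO movie (title, genres) VALUES (?, ?)",
                         (title, " / ".join(genres)))
        movie_id = cur.lastrowid
        # Normalized: one row per (movie, genre) pair.
        for name in genres:
            db.execute("INSERT OR IGNORE INTO genre (name) VALUES (?)", (name,))
            (genre_id,) = db.execute("SELECT id FROM genre WHERE name = ?",
                                     (name,)).fetchone()
            db.execute("INSERT INTO movie_genre VALUES (?, ?)", (movie_id, genre_id))
    return db


def titles_normalized(db: sqlite3.Connection, genre: str) -> list[str]:
    rows = db.execute("""
        SELECT m.title FROM movie m
        JOIN movie_genre mg ON mg.movie_id = m.id
        JOIN genre g ON g.id = mg.genre_id
        WHERE g.name = ? ORDER BY m.title""", (genre,))
    return [r[0] for r in rows]


def titles_denormalized(db: sqlite3.Connection, genre: str) -> list[str]:
    # Pad with separators so "Action" cannot match inside e.g. "Action Comedy".
    rows = db.execute("""
        SELECT title FROM movie
        WHERE ' / ' || genres || ' / ' LIKE '% / ' || ? || ' / %'
        ORDER BY title""", (genre,))
    return [r[0] for r in rows]


if __name__ == "__main__":
    db = build_db()
    print(titles_normalized(db, "Action"))    # ['Heat', 'The Matrix']
    print(titles_denormalized(db, "Action"))  # ['Heat', 'The Matrix']
```

Both can coexist: the link table for lookups and a cached combined string for display, which is roughly the speed argument spiff made above. Note that `LIKE` treats `%` and `_` in a genre name as wildcards, one more reason the join is the sturdier lookup.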

The original title will be added, and I will test the condition trick.

Thanks for the response.

morte
2007-02-12, 13:43 #4
sCAPe
Team-XBMC QA Specialist
 
Join Date: Mar 2005
Location: Germany
Posts: 442

spiff: Any chance of integrating this OFDb scraper into SVN when it's stable and finished? Or should users download it themselves?

How is this planned for future scrapers in general?

sCAPe
__________________
My XBOX built into a Sony Hifi CD-Player Case
XBOX Hifi Media Center Picture Gallery

2007-02-12, 13:45 #5
spiff
Grumpy Bastard Developer
 
Join Date: Nov 2003
Posts: 7,715

Yes, of course we will stick it in SVN.

Everything deemed to be of high enough quality will be stuck in SVN. The more the merrier.
2007-02-12, 15:48 #6
morte0815
Member
 
Join Date: Sep 2004
Location: Germany
Posts: 61

So here is the version with the original title (<originaltitle>)!

It also reorders the title: "Matrix, the" => "The Matrix".
If someone does not want this behaviour, tell me and I will write another version.
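The reordering itself can be expressed as one regular expression. A minimal Python sketch, not the scraper's actual RegExp; the list of articles is an assumption (the post only shows "the"), and the German articles in particular could misfire on unusual titles.

```python
import re

# Assumed list of trailing articles; the thread only shows "the".
ARTICLES = ("the", "a", "an", "der", "die", "das")
_TRAILING_ARTICLE = re.compile(
    r"^(.+?),\s*(" + "|".join(ARTICLES) + r")$", re.IGNORECASE
)


def reorder_title(title: str) -> str:
    """Turn 'Matrix, the' into 'The Matrix'; leave other titles unchanged."""
    match = _TRAILING_ARTICLE.match(title.strip())
    if not match:
        return title
    rest, article = match.groups()
    return f"{article.capitalize()} {rest}"


if __name__ == "__main__":
    for t in ("Matrix, the", "Untergang, Der", "Stirb langsam"):
        print(reorder_title(t))
```

Making the article list a parameter would also be a simple way to offer the "another version" without the reordering: pass an empty list.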

CYA Morte
Attached file: ofdb.zip (15.8 KB)
2007-02-12, 15:54 #7
asg
Member
 
Join Date: Nov 2005
Posts: 80

Quote:
<RegExp input="$$1" output="&lt;horst&gt;\1&lt;/horst&gt;" dest="8">
.. horst

Thanks
asciii
2007-02-12, 15:56 #8
morte0815
Member
 
Join Date: Sep 2004
Location: Germany
Posts: 61

Well, that's a debugging tag... forgot to delete it. But as long as it works...

AND NO: MY NAME IS NOT HORST!

cheers morte
2007-02-12, 17:31 #9
Solo0815
Senior Member
 
Join Date: Sep 2004
Posts: 214

THX!
I'll try it later this evening and report back!

BTW: Horst is a nice name for a debugging tag. Are there Karl and Hans in the script, too? *LOL* jk
__________________
there are only 10 sorts of people: those who understand binary and those who don't
2007-02-12, 18:06 #10
Solo0815
Senior Member
 
Join Date: Sep 2004
Posts: 214

Quote:
Encoding the URLs is one thing, but the other thing is that you can't read the results. In the list where you choose the movie, the umlauts work like a charm, but the plot, for example, is not readable if there are umlauts in it.
That's the only "bug" I noticed so far. Nice scraper!
All times are GMT +2.


Copyright © 2008, XBMC Project