XBMC Community Forum  

Old 2007-07-11, 00:26   #1
blaize
Member
 
AsianDB.com scraper

Hi, i'm working on an AsianDB.com scraper, continuing the work of 'esd'.
so far quite a few things work, but i've still got a few small problems.

- The plot outline is loading what should be the plot; asiandb.com doesn't really have a plot outline section.

- with some movies it does load the plot outline (plot), but then it doesn't load the actors, even though they are available on the site and load fine with other movies.

i'll be pretty much happy when i get those to work. i also haven't got the MPAA rating to work, but i don't really care that much about that one atm.

any help would be greatly appreciated.

here is the current XML.

AsianDB.com Scraper
Old 2007-07-14, 17:41   #2
blaize
Member
 

ok, too bad no-one has offered any help so far.

i didn't work on this for a few days, but today i picked it up again.
there still seems to be a problem somewhere.

i got the plot to appear in the plot section (as it should, of course).

i'm mainly testing this scraper with 2 movies, "The Classic" and "Peppermint Candy", since those 2 movies have all their info filled in on the site.

but even though they both have all the info, with Peppermint Candy the plot gets downloaded, but the actors don't.
and with The Classic, i get the actors, but no plot.

i went through the script quite a few times but can't seem to find what causes this.

any help, ideas, suggestions etc. would be great, even if it's just a guess.
you can check my previous post for the script, thanks.

-Blaize
Old 2007-07-14, 22:11   #3
blaize
Member
 

ok, again a small update.

right now everything except plot works.
would be great if someone else could fix this since i'm pretty much lost.
here's what i've got right now, but i'm not really sure if i'll continue it since no-one has offered any help, and i'm stuck myself.

hopefully it can be useful to people:

http://pastebin.com/md64eb9f
Old 2007-07-15, 13:40   #4
spiff
Grumpy Bastard Developer
 

i'd suggest

1) putting the plot grabbing regexp inside the outermost one - stuffing things into buffer 5 after it has already been transferred into the final buffer (3) is your main issue.

2) this version instead
Code:
<RegExp input="$$1" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="5+">
 <expression trim="1">Synopsis&lt;/td&gt;&lt;/table&gt;&lt;div[^&gt;]*&gt;&lt;table[^&gt;]*&gt;&lt;td[^&gt;]*&gt;&lt;img[^&gt;]*&gt;(.*)&lt;/td&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;</expression>
</RegExp>
note that i grab the synopsis and not the introduction.

oh, and please do not pm me asking for help - i read the forums quite frequently and will respond when i have the time / if i feel like it. i get a lot of these and they bump my grump factor
__________________
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.

Last edited by spiff; 2007-07-15 at 13:49.
Old 2007-07-15, 14:38   #5
blaize
Member
 

thanks spiff, but i don't understand point 1.

i've put in your version of the plot regexp, and it did the trick with 1 or 2 movies.
the rest still don't get the plot.

if it's not too much to ask, could you perhaps make the required changes to the scraper version posted above?

thanks

-blaize
Old 2007-07-15, 16:43   #6
spiff
Grumpy Bastard Developer
 

http://pastebin.ca/620400

notice that i moved the plot inside the
<RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
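
A rough sketch of that nesting, using the buffer numbers mentioned in this thread (page HTML in $$1, collected tags in buffer 5, final result in buffer 3). The title expression is a made-up placeholder and is not taken from the actual scraper. The empty outer expression and the noclean attribute follow what the bundled scrapers of this era appear to do, so treat both as assumptions and compare against the pastebin version above:

```xml
<!-- Sketch only. Nested RegExp elements run before the outer one, so
     buffer 5 is fully populated when the outer expression copies it
     into buffer 3 wrapped in <details>. -->
<RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
 <!-- Placeholder title expression (hypothetical pattern); dest="5"
      without "+" overwrites buffer 5 and so starts it fresh. -->
 <RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;" dest="5">
  <expression trim="1">&lt;title&gt;([^&lt;]*)&lt;/title&gt;</expression>
 </RegExp>
 <!-- Plot expression from post #4; dest="5+" appends to buffer 5. -->
 <RegExp input="$$1" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="5+">
  <expression trim="1">Synopsis&lt;/td&gt;&lt;/table&gt;&lt;div[^&gt;]*&gt;&lt;table[^&gt;]*&gt;&lt;td[^&gt;]*&gt;&lt;img[^&gt;]*&gt;(.*)&lt;/td&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;</expression>
 </RegExp>
 <!-- Empty outer expression: take the whole of buffer 5 as \1. -->
 <expression noclean="1"></expression>
</RegExp>
```

A RegExp placed after the closing tag of the outer one would still append to buffer 5, but by then buffer 3 has already been written, so its output never reaches the result.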
Old 2007-07-15, 18:17   #7
blaize
Member
 

thanks Spiff, but i'm sorry to say that the problem isn't fixed.
still only a few (3/4) movies get their plot; the rest don't, even though for quite a few of them the info is available on the website.

for example, the movie "April Snow".
Old 2007-07-15, 18:27   #8
spiff
Grumpy Bastard Developer
 

i never said i fixed your expressions - i fixed what you had in there.

anyways, the problem is obvious. some pages have a synopsis, some have a synopsis and an introduction, and some have only an introduction.

solution: add another expression after the synopsis one that grabs the introduction block. if the page has a synopsis the first <plot> tag will be used. if it doesn't, we'll use the introduction text.
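
A sketch of that fallback, placed inside the outer details RegExp. It assumes the Introduction block uses the same table markup as the Synopsis block, which is an unverified guess: the second pattern is hypothetical and has to be checked against the real page source. Both expressions append to buffer 5, so when a synopsis exists its <plot> tag comes first:

```xml
<!-- 1) Synopsis expression, unchanged from post #4. -->
<RegExp input="$$1" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="5+">
 <expression trim="1">Synopsis&lt;/td&gt;&lt;/table&gt;&lt;div[^&gt;]*&gt;&lt;table[^&gt;]*&gt;&lt;td[^&gt;]*&gt;&lt;img[^&gt;]*&gt;(.*)&lt;/td&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;</expression>
</RegExp>
<!-- 2) Fallback for pages with only an introduction.
     HYPOTHETICAL pattern: assumes the block is labelled "Introduction"
     and otherwise mirrors the Synopsis markup. -->
<RegExp input="$$1" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="5+">
 <expression trim="1">Introduction&lt;/td&gt;&lt;/table&gt;&lt;div[^&gt;]*&gt;&lt;table[^&gt;]*&gt;&lt;td[^&gt;]*&gt;&lt;img[^&gt;]*&gt;(.*)&lt;/td&gt;&lt;/table&gt;&lt;/div&gt;&lt;p&gt;</expression>
</RegExp>
```

The order matters: swapping the two blocks would make the introduction text win on pages that have both.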
Old 2007-07-15, 19:16   #9
blaize
Member
 

Great, it works!

thanks for the help Spiff, couldn't have finished it without you of course
Old 2007-07-15, 19:18   #10
spiff
Grumpy Bastard Developer
 

cool - now tidy it up (for starters you should include the year in the returned search results, as the imdb scraper does). when you think it's ready, submit a patch at sf. cheers
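
One way the year could be added to the search results, sketched under heavy assumptions: the link/title/year pattern and the URL path are hypothetical, since the markup of the AsianDB search page is not shown in this thread. The only point illustrated is capturing the year as its own group and appending it to the title in the output:

```xml
<!-- Sketch only. Goes inside the search-results RegExp that wraps the
     collected entities. HYPOTHETICAL groups: \1 = link path,
     \2 = title, \3 = four-digit year. -->
<RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2 (\3)&lt;/title&gt;&lt;url&gt;http://www.asiandb.com/\1&lt;/url&gt;&lt;/entity&gt;" dest="5+">
 <!-- repeat="yes" emits one entity per match on the results page. -->
 <expression repeat="yes">&lt;a href="/([^"]*)"&gt;([^&lt;]*)&lt;/a&gt; \(([0-9]{4})\)</expression>
</RegExp>
```

With the year in the title, two films that share a name can be told apart in the selection list.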
All times are GMT +2.


Copyright © 2008, XBMC Project