Scraping inconsistency scrap.exe/xbmc?


ezd
2007-06-16, 19:29
I'm making a scraper for AsianDB.com. It seems to work flawlessly under scrap.exe, but XBMC misses a lot of info it retrieves. Here's an example details XML output:

<details>
<title>Violent Cop</title>
<year>1989</year>
<director>Takeshi Kitano</director>
<runtime>103mins</runtime>
<thumb>http://www.asiandb.com/data/title/mini/4141.jpg</thumb>
<rating>7</rating>
<votes>3</votes>
<genre>Action</genre>
<genre>Crime</genre>
<credits>Takeshi Kitano</credits>
<credits>Hisashi Nozawa</credits>
<actor>
<name>Takeshi Kitano</name>
</actor>
</details>


XBMC doesn't extract the director, genre, credits (correct way to enter writers?) and actors, but does get all other items.

Is there a bug in my XML output? (Note: pretty-printed for readability, no extra whitespace in actual XML)

Also, pressing X+Y during boot did get me into debug mode, but that didn't tell me much about the scraping process. Is there a method (like in the old days :grin: ) to set the debuglevel to 'insane' or similar?

Thanks for any help you can give,

ezd

ezd
2007-06-16, 20:24
For reference, I've upped the current asiandb.xml (http://pastebin.com/930316) to pastebin.

spiff
2007-06-18, 08:41
i'm at a conference this week, so this post is only to say that i cannot see anything wrong at first glimpse. i hardly have inet access so i have to wait until i get back home to investigate.

ezd
2007-06-18, 19:57
Thanks for the heads up, no hurry here, mostly did this for the Greater Xbmc Good :)

Enjoy your conference!

spiff
2007-06-26, 16:40
before each of those you have regexps that grab the relevant pieces of the html. on those you don't specify 'noclean="1"' and hence all html tags are stripped off. i guess scrap.exe doesn't honor this.
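
a rough python sketch of the effect, for anyone who wants to see it. this is only a model of the idea and not the actual xbmc or scrap.exe code: `clean` and `run_regexp` are made-up helpers, and the html snippet is invented. the assumption is that a captured group gets its tags stripped unless noclean is set, so tags built by an earlier step vanish when a later step captures them.

```python
import re

def clean(text):
    # Rough stand-in for the cleaning step: drop anything tag-like, trim whitespace.
    return re.sub(r"<[^>]+>", "", text).strip()

def run_regexp(pattern, template, source, noclean=False):
    # Hypothetical helper: for every match, optionally clean group 1,
    # then substitute it for \1 in the output template.
    pieces = []
    for match in re.finditer(pattern, source, re.S):
        value = match.group(1) if noclean else clean(match.group(1))
        pieces.append(template.replace(r"\1", value))
    return "".join(pieces)

# Invented page fragment, not real AsianDB markup.
html = '<a href="/genre/1">Action</a><a href="/genre/2">Crime</a>'

# Inner step: grab the genre names and wrap each one in <genre> tags.
buffer = run_regexp(r"<a[^>]*>(.*?)</a>", r"<genre>\1</genre>", html)

# Outer step: wrap the whole buffer in <details>.
stripped = run_regexp(r"(.+)", r"<details>\1</details>", buffer)
kept = run_regexp(r"(.+)", r"<details>\1</details>", buffer, noclean=True)

print(stripped)  # the <genre> tags are gone
print(kept)      # the <genre> tags survive
```

without noclean the outer step sees the freshly built `<genre>` tags as html and strips them, which would match the missing fields described above.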

blaize
2007-07-08, 15:13
any progress on this scraper ?
i really need this one. :laugh:

spiff
2007-07-09, 10:56
then i suggest you finish it

blaize
2007-07-09, 11:34
wise-ass... if i could, don't you think i would ?
some people have taught themselves programming skills, others artistic skills.

spiff
2007-07-09, 11:38
it doesn't take programming skills. that's the whole reason i created the scraper system. it only takes some logic and reading a 10 min regexp guide.

blaize
2007-07-09, 11:57
if you think it's that easy for everyone, then why is 'ezd' having problems with it ?
I'm pretty much code-blind, but if you (or anyone else) could give a little help i might give it (another) try.

spiff
2007-07-09, 12:07
ezd had done a simple screwup which i explained.

i'll answer specifics.

blaize
2007-07-10, 23:08
sorry for the triple post, editing posts doesn't seem to work for me for some reason.
a mod can combine/delete the posts if they feel the need.

the changes i made were pretty much only adding noclean="1" in the right places.
i also tried that with the stuff that's still not working (tagline, plot, cast, MPAA rating) but that didn't change anything.
so either i edited those wrong or something else is wrong that i'm missing.

-blaize

pike
2007-07-10, 23:24
how the heck did you manage ? the 2 posts are 10 (TEN) minutes apart!

blaize
2007-07-10, 23:27
i know, i went back a page (history) but because i'm walking back and forth between my PC and box i got confused and thought i pressed edit (still can't find that button >_>)
that's how i reposted it.

Actors are working now, a stupid typo :sniffle:

ezd
2007-07-22, 16:59
Sorry for the slow reply, I've been away for a while. Thanks Spiff for your reply. I still had a strange problem with XBMC hanging when I enabled plot extraction, but it doesn't occur in the new build, so I've upped the scraper to SourceForge for inclusion.

For those in a hurry to enable Asiandb:

https://sourceforge.net/tracker/?func=detail&aid=1758452&group_id=87054&atid=581840

Cheers,

ezd

blaize
2007-07-22, 21:54
then grab the plot extraction from my scraper (check the SVN)

it works fine.
(remember that with some movies it's called synopsis, and others introduction, hence the double expressions)
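
the "double expressions" idea, sketched in python rather than scraper XML. everything here is an assumption for illustration: the heading markup, the patterns and the `extract_plot` name are invented, since the real AsianDB html isn't shown in this thread. the point is only that you try one pattern per heading and take the first that matches.

```python
import re

# Hypothetical patterns: one per heading the site might use for the plot text.
PLOT_PATTERNS = [
    r"Synopsis</h3>\s*<p>(.*?)</p>",
    r"Introduction</h3>\s*<p>(.*?)</p>",
]

def extract_plot(html):
    # Try each pattern in order and return the first captured plot text.
    for pattern in PLOT_PATTERNS:
        match = re.search(pattern, html, re.S | re.I)
        if match:
            return match.group(1).strip()
    return None  # neither heading was found

print(extract_plot("<h3>Synopsis</h3><p>A violent detective.</p>"))
print(extract_plot("<h3>Introduction</h3>\n<p>Another film.</p>"))
```

in a scraper the same thing is done with two separate expressions writing to the same field, only one of which will match on any given page.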

ezd
2007-07-23, 22:52
The plot issue had something to do with the build, as a later build worked fine with the same scraper.

Glad you corrected the open issues in my scraper, the actors and cast were working already though? Anyway, if the scraper in SVN is working all is well...