XBMC Community Forum  

2008-12-04, 00:59   #1
Bleckshire
AdultDVDEmpire Scraper

After helping artik with his Excalibur Films scraper, I learned a lot more about regexp coding and scrapers in general, so I was able to finish my AdultDVDEmpire scraper. This scraper will retrieve the following info:

- Film title along with box cover.
- Production year and film studio.
- Rating (which should always be XXX, but I set it to pull anyway).
- Film director.
- Film genres/categories (all categories are pulled if a film fits into multiple ones).
- Film actors/actresses, along with thumbnails for each star if available.
- Film runtime.
- Film plot/tagline. *

* About the plot and tagline: some films have just a plot, some have just a tagline, and some have both. Since the tagline comes first in the page source, the scraper first tries to pull just a tagline, then just a plot, and then a plot that has a tagline before it. This should work most of the time, but there are a few films it fails on. It will also fail to pull the complete plot if the plot itself contains a '<' bracket. For instance: "This here is a plot about an <b>AWESOME</b> movie!" There are a few plots like that, and in that case it will pull everything up to 'an'. I tried to figure out a way around this but couldn't, and finally settled on having it pull as much as it can in a case like that. If it fails otherwise, it's most likely because that particular movie's page has weird markup (which I've also run into during testing).
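The truncation described above can be reproduced outside XBMC. The following is only a sketch in Python: the HTML fragment is made up for illustration, and the closing `</div>` used by the workaround is an assumption about the page, not something taken from the real ADE markup. It shows why a `[^<]*` capture stops at the first inline tag, and one possible way around it (capture lazily up to a known closing tag, then strip inline tags). Whether the same pattern behaves identically inside the scraper XML depends on the regex engine XBMC uses, which this sketch does not test.

```python
import re

# Hypothetical page fragment; the real ADE markup may differ.
html = ('<div class="Item_InfoContainer">'
        'This here is a plot about an <b>AWESOME</b> movie!</div>')

# The scraper's approach: capture everything up to the first '<'.
truncated = re.search(r'Item_InfoContainer">([^<]*)', html).group(1)

# Possible workaround (assumes the plot ends at a closing </div>):
# capture lazily up to that tag, then remove any inline tags.
raw = re.search(r'Item_InfoContainer">(.*?)</div>', html, re.S).group(1)
full = re.sub(r'<[^>]+>', '', raw)

print(repr(truncated))
print(repr(full))
```

The first capture ends right before `<b>`, which matches the "everything up to 'an'" behaviour; the second keeps the whole sentence with the tags removed.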

This script pulled a good majority of my collection on the first try. I do have a few low-budget films that it couldn't find, but those were hard to find even with Google, so I'm happy either way. Here are some shots taken off my Xbox with the MediaStream skin and the script:

[Screenshots]

Code:
<scraper name="Adult DVD Empire" content="movies" thumb="adultdvdempire.jpg">
<NfoUrl dest="3">
<RegExp input="$$1" output="&lt;url&gt;http://www.adultdvdempire.com/itempage.aspx?item_id=\1&lt;/url&gt;" dest="3">
<expression noclean="1">adultdvdempire\.com/itempage\.aspx\?item_id=([0-9]*)</expression>
</RegExp>
</NfoUrl>

<CreateSearchUrl dest="3">
<RegExp input="$$1" output="&lt;url&gt;http://www.adultdvdempire.com/SearchTitlesPage.aspx?SearchString=\1&lt;/url&gt;" dest="3">
<expression noclean="1"></expression>
</RegExp>

</CreateSearchUrl>

<GetSearchResults dest="6">
<RegExp input="$$5" output="&lt;?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; standalone=&quot;yes&quot;?&gt;&lt;results&gt;\1&lt;/results&gt;" dest="6">
<RegExp input="$$1" output="\1" dest="4">
<expression>&lt;a href=&quot;itempage\.aspx\?item_id=([0-9]*)[^&gt;]&gt;</expression>
</RegExp>
<RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;http://www.adultdvdempire.com/itempage.aspx?item_id=\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
<expression repeat="yes">ListItem_ItemTitle&quot;&gt;&lt;a href=[^=]*=([0-9]*)[^&gt;]*&gt;([^&lt;]*)</expression>
</RegExp>
<expression noclean="1"></expression>
</RegExp>
</GetSearchResults>

<GetDetails dest="3">
<RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
<RegExp input="$$1" output="&lt;thumb&gt;http://images2.dvdempire.com/res/movies/\1h.jpg&lt;/thumb&gt;" dest="5">
<expression>BoxCover_Container&quot;&gt;[^&gt;]*&gt;&lt;img src=&quot;http://images2.dvdempire.com/res/movies/([^m]*)</expression>
</RegExp>

<RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;" dest="5+">
<expression>Item_Title&quot;&gt;([^&lt;]*)</expression>
</RegExp>

<RegExp input="$$1" output="&lt;studio&gt;\1&lt;/studio&gt;" dest="5+">
<expression>StudioProductionRating&quot;&gt;([^&lt;]*)</expression>
</RegExp>

<RegExp input="$$1" output="&lt;year&gt;\1&lt;/year&gt;" dest="5+">

<expression>Year: ([0-9]*)</expression>
</RegExp>

<RegExp input="$$1" output="&lt;tagline&gt;\1&lt;/tagline&gt;" dest="5+">
<expression>InfoTagLine&quot;&gt;([^&lt;]*)</expression>
</RegExp>

<RegExp input="$$1" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="7">
<expression clear="yes">Item_InfoContainer&quot;&gt;[^ ]*([^&lt;]*)&lt;</expression>
</RegExp>

<RegExp input="$$1" output="&lt;plot&gt;\1&lt;/plot&gt;" dest="5+">
<expression>Item_InfoContainer&quot;&gt;[^&gt;]*&gt;[^&lt;]*&lt;/span&gt;[^ ]*([^&lt;]*)&lt;</expression>
</RegExp>

<RegExp input="$$1" output="&lt;actor&gt;&lt;name&gt;\2&lt;/name&gt;&lt;thumb&gt;http://images.dvdempire.com/pornstar/actors/\1.jpg&lt;/thumb&gt;&lt;/actor&gt;" dest="5+">
<expression repeat="yes">cast_id=([0-9]*)[^t]*type=1&quot;[^&gt;]*&gt;([^&lt;]*)</expression>
</RegExp>


<RegExp input="$$1" output="&lt;genre&gt;\1&lt;/genre&gt;" dest="5+">
<expression repeat="yes">media_id=[^i]*item_id=[^&gt;]*&gt;([^&lt;]*)</expression>
</RegExp>


<RegExp input="$$1" output="&lt;runtime&gt;\1&lt;/runtime&gt;" dest="5+">
<expression>&gt;Length: ([^&lt;]*)&lt;</expression>
</RegExp>

<RegExp input="$$1" output="&lt;mpaa&gt;\1&lt;/mpaa&gt;" dest="5+">
<expression>&gt;Rating: ([^&lt;]*)</expression>
</RegExp>

<RegExp input="$$1" output="&lt;director&gt;\1&lt;/director&gt;" dest="5+">
<expression repeat="yes">type=4&quot;&gt;([^&lt;]*)</expression>
</RegExp>
<expression noclean="1"></expression>
</RegExp>
</GetDetails>
</scraper>
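A note on patterns that contain a literal URL, such as the item page address used in NfoUrl: in a regular expression an unescaped `?` makes the preceding character optional instead of matching a question mark, and an unescaped `.` matches any character. The sketch below checks this with Python's `re` module; the item ID 12345 is made up, and it is an assumption that XBMC's regex engine treats `?` the same way.

```python
import re

# Made-up item ID, for illustration only.
url = "http://www.adultdvdempire.com/itempage.aspx?item_id=12345"

# Unescaped: "x?" means "optional x", so the literal "?" in the URL
# is never consumed and the pattern does not match.
unescaped = re.search(
    r"adultdvdempire.com/itempage.aspx?item_id=([0-9]*)", url)

# Escaped: "\." and "\?" match the literal characters.
escaped = re.search(
    r"adultdvdempire\.com/itempage\.aspx\?item_id=([0-9]*)", url)

print(unescaped)
print(escaped.group(1))
```

Only the escaped pattern finds the ID here.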
To spiff, if you read this: I submitted a ticket for this already.

Last edited by Bleckshire; 2008-12-04 at 01:15.
2008-12-06, 03:09   #2
lovedaddy

Nice!
2009-01-24, 18:55   #3
NotShorty

Thanks, man! The JadedVideo scraper lacked plots the last time I used it. And yes, plots/summaries can be a very useful feature of an adult library (though not as cool as cross-referencing by actress).

NS
2009-02-09, 12:31   #4
Anacotic

I don't get any info from this scraper: no images, no plot, no actors, nothing...
2009-03-09, 20:39   #5
gongloo

Quote:
Originally Posted by Anacotic
I don't get any info from this scraper: no images, no plot, no actors, nothing...
I found the same as well. I've fixed the issue on my end and submitted a patch[1]. Hopefully this will make it to SVN soon!

[1] http://xbmc.org/trac/ticket/6047
2009-03-09, 20:54   #6
vdrfan
Team-XBMC Developer

Quote:
Originally Posted by gongloo
I found the same as well. I've fixed the issue on my end and submitted a patch[1]. Hopefully this will make it to SVN soon!

[1] http://xbmc.org/trac/ticket/6047
It's in r18354. Cheers!
__________________
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.


2009-03-30, 21:29   #7
Bleckshire

Quote:
Originally Posted by gongloo
I found the same as well. I've fixed the issue on my end and submitted a patch[1]. Hopefully this will make it to SVN soon!

[1] http://xbmc.org/trac/ticket/6047
ADE must have altered their layout a bit and I didn't notice (not to mention I haven't been on the forums for a while). Thanks for the fix, gongloo.
2009-03-31, 14:10   #8
Bleckshire
Update

I noticed that after gongloo fixed the script for the changes ADE made, plot detection was broken. I've been messing around with it all night and changed a few things:
- The script still pulls the same info as before (title, front cover, production year, studio, director, all actresses and actors, all genres and categories, runtime, and plot + tagline) and now pulls the back cover as well. Having all stars and categories is great for sorting by star or style of porn.

As before, not every film has both a plot and a tagline, and the way ADE coded it was a little difficult to scrape. It now pulls both tagline and plot if a film has both, or just the plot if the film has only a plot. I haven't run across any films that have ONLY a tagline; I've seen both, just a plot, or nothing. If you run across one that has only a tagline, nothing will be pulled. That seems rare, if not non-existent. Also, just like before, plots are scraped completely unless the plot uses HTML tags, in which case it is scraped up to the first HTML tag it hits. Taglines should always be complete.
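The fallback order described above (tagline plus plot, then plot alone, otherwise nothing) can be sketched like this. This is Python rather than scraper XML, and the markup in the examples is hypothetical: the class names `Item_InfoContainer` and `InfoTagLine` come from the scraper's expressions, but the exact layout around them is an assumption.

```python
import re

# Tried in order: tagline followed by plot, then plot alone.
BOTH = re.compile(
    r'Item_InfoContainer">\s*<span class="InfoTagLine">([^<]*)</span>'
    r'\s*([^<]+)')
PLOT_ONLY = re.compile(r'Item_InfoContainer">\s*([^<]+)')

def extract(html):
    """Return whichever of tagline/plot can be pulled from the page."""
    m = BOTH.search(html)
    if m and m.group(2).strip():
        return {"tagline": m.group(1).strip(), "plot": m.group(2).strip()}
    m = PLOT_ONLY.search(html)
    if m and m.group(1).strip():
        return {"plot": m.group(1).strip()}
    # Tagline-only pages (or pages with neither) yield nothing.
    return {}

# Hypothetical fragments covering the three cases.
both = ('<div class="Item_InfoContainer">'
        '<span class="InfoTagLine">A tagline.</span> The plot text.</div>')
plot_only = '<div class="Item_InfoContainer">Just a plot.</div>'
tagline_only = ('<div class="Item_InfoContainer">'
                '<span class="InfoTagLine">Only a tagline.</span></div>')

for page in (both, plot_only, tagline_only):
    print(extract(page))
```

As in the scraper, the plot capture uses `[^<]`, so a plot containing HTML tags would still be cut at the first tag.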

http://xbmc.org/trac/ticket/6215


Last edited by Bleckshire; 2009-03-31 at 14:19. Reason: Added ticket.
2009-08-05, 16:54   #9
nc88keyz

Box covers are broken.

Looks like the URL might have changed to:

Code:
http://images2.dvdempire.com/res/movies/1/
I checked all r/w permissions.

You might want to check it out.

r21936 / all skins.

If you rescrape, it loses the cover as well.
2009-08-05, 17:02   #10
spiff
Grumpy Bastard Developer

See the sticky in this very forum.

All times are GMT +2.


Copyright © 2008, XBMC Project