XBMC Community Forum  

2008-05-28, 21:30   #1
floohh
Junior Member
Join Date: May 2008
Posts: 5
filmstarts.de scraper development - help needed

Hi guys,

I'm currently developing a scraper for http://filmstarts.de. After hours of work I managed to get XBMC to recognize the filmstarts search results, but when I try to get the details page, XBMC requests an empty URL. I think there is an error in my RegExp code.

And here we go:

The Filmstarts HTML looks like this:

HTML Code:
<li><a href="/kritiken/35848-Der-Fluch-von-Darkness-Falls.html">
<img alt="" src="http://thumbs.filmstarts.de/nano/DerFluchVonDarknessFalls_poster_1.jpg">
<span class="r"> <img src="/designs/default/images/ratings/310er.gif" alt="Wertung: 3 / 10">

</span>
<span class="t">Der Fluch von Darkness Falls</span>
<span class="g">Teenie-Horror</span> </a></li>


<li><a href="/kritiken/36232-Fluch-der-Karibik.html">
<img alt="" src="http://thumbs.filmstarts.de/nano/fluchderkaribik-poster1.jpg">
<span class="r"> <img src="/designs/default/images/ratings/910er.gif" alt="Wertung: 9 / 10">
</span>
<span class="t">Fluch der Karibik</span>
<span class="g">Abenteuer</span> </a></li>


<li><a href="/kritiken/37419-Blueberry-und-der-Fluch-der-D%E4monen.html">

<img alt="Blueberry und der Fluch der Dämonen" src="/designs/default//images/no_film_small.gif" height="44" width="30">
<span class="r"> <img src="/designs/default/images/ratings/610er.gif" alt="Wertung: 6 / 10">
</span>
<span class="t">Blueberry und der Fluch der Dämonen</span>
<span class="g">Fantasy-Action</span> </a></li>
And my RegExp is:

Code:
<GetSearchResults dest="3">
<RegExp input="$$5" output="<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?><results>\1</results>" dest="3">
<RegExp input="$$1" output="<entity><title>\2</title><url>http://www.filmstarts.de/\1</url><id>\1</id></entity>" dest="5">
<expression repeat="yes"><a href="/kritiken/([-.%\w]+)">[^<]|[\n]<span class="t">([-%. \w]+)</span></expression>
</RegExp>
<expression noclean="1"></expression>
</RegExp>
</GetSearchResults>
(I know the entities aren't escaped here; I decoded them for better readability.)
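To make the buffer chain in this GetSearchResults easier to follow, here is a simplified Python model of it. It is only a sketch under assumptions: `run_regexp` and `get_search_results` are hypothetical helpers, not XBMC code; Python's `re` (with `re.DOTALL`) stands in for XBMC's regex engine; and the empty outer `<expression>` is assumed to behave like `(.*)`. The page is a shortened one-line stand-in and the expression is a placeholder. The model also shows one detail worth checking in the template: group 1 captures only what follows `/kritiken/`, so `http://www.filmstarts.de/\1` drops that path segment.

```python
import re

# Hypothetical, simplified model of the nested <RegExp> elements in
# GetSearchResults. This is NOT XBMC's implementation; Python's re module
# stands in for XBMC's regex engine.
def run_regexp(buffer_in, expression, output, repeat=False):
    # Fill the output template (\1, \2 = capture groups) once per match and
    # concatenate; without repeat only the first match is used.
    matches = list(re.finditer(expression, buffer_in, re.DOTALL))
    if not repeat:
        matches = matches[:1]
    return "".join(m.expand(output) for m in matches)

# Entity template from post #1, and a variant that re-adds the /kritiken/ path.
ENTITY = (r"<entity><title>\2</title>"
          r"<url>http://www.filmstarts.de/\1</url><id>\1</id></entity>")
ENTITY_FIXED = (r"<entity><title>\2</title>"
                r"<url>http://www.filmstarts.de/kritiken/\1</url><id>\1</id></entity>")
# Outer template (XML declaration omitted for brevity).
RESULTS = r"<results>\1</results>"

def get_search_results(page, expression, entity=ENTITY):
    buffer5 = run_regexp(page, expression, entity, repeat=True)  # inner RegExp, dest="5"
    # Assumption: the empty outer <expression> acts like (.*) over $$5.
    return run_regexp(buffer5, r"(.*)", RESULTS)                  # outer RegExp, dest="3"

# Shortened one-line stand-in for the page and a placeholder expression.
PAGE = '<a href="/kritiken/36232-Fluch-der-Karibik.html"><span class="t">Fluch der Karibik</span></a>'
EXPR = r'<a href="/kritiken/([^"]+)"><span class="t">([^<]+)</span>'

print(get_search_results(PAGE, EXPR))
print(get_search_results(PAGE, EXPR, ENTITY_FIXED))
```

With `ENTITY` the URL comes out as `http://www.filmstarts.de/36232-Fluch-der-Karibik.html`; `ENTITY_FIXED` puts `/kritiken/` back in.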

The most important line is:

Code:
<expression repeat="yes"><a href="/kritiken/([-.%\w]+)">[^<]|[\n]<span class="t">([-%. \w]+)</span></expression>
which is meant to match:

HTML Code:
<a href="/kritiken/35848-Der-Fluch-von-Darkness-Falls.html">
<img alt="" src="http://thumbs.filmstarts.de/nano/DerFluchVonDarknessFalls_poster_1.jpg">
<span class="r"> <img src="/designs/default/images/ratings/310er.gif" alt="Wertung: 3 / 10">

</span>
<span class="t">Der Fluch von Darkness Falls</span>
The only thing XBMC does is request "/".


Can somebody help me?
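The symptom can be reproduced outside XBMC with Python's `re` module (an approximation; XBMC's own regex engine may differ in details). In the expression above the `|` is not inside any group, so it splits the whole pattern into two unrelated alternatives, `<a href="/kritiken/([-.%\w]+)">[^<]` and `[\n]<span class="t">([-%. \w]+)</span>`. Each match therefore fills only one of the two groups. The HTML below is an abbreviated copy of the first two results quoted above.

```python
import re

# Abbreviated copy of the first two search results quoted above.
HTML = (
    '<li><a href="/kritiken/35848-Der-Fluch-von-Darkness-Falls.html">\n'
    '<img alt="" src="http://thumbs.filmstarts.de/nano/DerFluchVonDarknessFalls_poster_1.jpg">\n'
    '<span class="r"> <img src="/designs/default/images/ratings/310er.gif" alt="Wertung: 3 / 10">\n'
    '\n'
    '</span>\n'
    '<span class="t">Der Fluch von Darkness Falls</span>\n'
    '<span class="g">Teenie-Horror</span> </a></li>\n'
    '<li><a href="/kritiken/36232-Fluch-der-Karibik.html">\n'
    '<img alt="" src="http://thumbs.filmstarts.de/nano/fluchderkaribik-poster1.jpg">\n'
    '<span class="r"> <img src="/designs/default/images/ratings/910er.gif" alt="Wertung: 9 / 10">\n'
    '</span>\n'
    '<span class="t">Fluch der Karibik</span>\n'
    '<span class="g">Abenteuer</span> </a></li>\n'
)

# The expression from the question: the unparenthesised "|" splits the WHOLE
# pattern into two independent alternatives.
QUESTION_EXPR = r'<a href="/kritiken/([-.%\w]+)">[^<]|[\n]<span class="t">([-%. \w]+)</span>'

for url_part, title in re.findall(QUESTION_EXPR, HTML):
    # findall reports a group that did not take part in the match as "",
    # so each match has either the URL part or the title, never both.
    print(repr(url_part), repr(title))
```

Fed through the `<url>http://www.filmstarts.de/\1</url>` template, every match with an empty group 1 would yield `http://www.filmstarts.de/`, which would plausibly explain XBMC requesting "/". The part that skips from the link to the title needs to be a repeated construct inside the main pattern (for example a lazy `.*?`), not a top-level alternative.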

2008-05-28, 22:59   #2
floohh
Junior Member
Join Date: May 2008
Posts: 5

Okay, I changed the part of the expression that skips the unnecessary text, but now I only catch the first match. Any idea how to solve this?

Code:
<li><a href="/kritiken/([-.a-z0-9A-Z]+)">.*<span class="t">([0-9a-zA-Z .]+).*</li>
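The "only the first match" effect most likely comes from the greedy `.*`: it runs to the last `<span class="t">` and the last `</li>` on the page, so all results collapse into one match, which even pairs the first URL with the last title. The sketch below compares that expression with a lazy `.*?` version in Python's `re`. Assumptions: `re.DOTALL` stands in for `.` crossing line breaks, the HTML is shortened, and whether XBMC's regex engine of that time supported lazy quantifiers is not confirmed here. Note also that `[-.a-z0-9A-Z]+` excludes `%`, so the third result (`D%E4monen`) could not match that part of the expression; the lazy version uses `[-.%\w]+` instead.

```python
import re

# Two search results in the shape quoted in post #1 (poster lines dropped).
HTML = (
    '<li><a href="/kritiken/35848-Der-Fluch-von-Darkness-Falls.html">\n'
    '<span class="r"> <img src="/designs/default/images/ratings/310er.gif" alt="Wertung: 3 / 10">\n'
    '</span>\n'
    '<span class="t">Der Fluch von Darkness Falls</span>\n'
    '<span class="g">Teenie-Horror</span> </a></li>\n'
    '<li><a href="/kritiken/36232-Fluch-der-Karibik.html">\n'
    '<span class="r"> <img src="/designs/default/images/ratings/910er.gif" alt="Wertung: 9 / 10">\n'
    '</span>\n'
    '<span class="t">Fluch der Karibik</span>\n'
    '<span class="g">Abenteuer</span> </a></li>\n'
)

# Expression from post #2: greedy ".*" runs to the LAST <span class="t"> and the
# LAST </li>, so the whole page collapses into a single match.
GREEDY = r'<li><a href="/kritiken/([-.a-z0-9A-Z]+)">.*<span class="t">([0-9a-zA-Z .]+).*</li>'

# Lazy ".*?" stops at the FIRST following <span class="t">, giving one match per <li>.
LAZY = r'<li><a href="/kritiken/([-.%\w]+)">.*?<span class="t">([^<]+)</span>'

print(re.findall(GREEDY, HTML, re.DOTALL))
print(re.findall(LAZY, HTML, re.DOTALL))
```

The greedy version returns one tuple with the first URL and "Fluch der Karibik"; the lazy one returns one correctly paired tuple per result.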

2008-05-29, 03:07   #3
floohh
Junior Member
Join Date: May 2008
Posts: 5

After hours of hard work, it finally worked.

2008-05-29, 11:36   #4
spiff
Grumpy Bastard Developer
Join Date: Nov 2003
Posts: 7,715

I assume your issue was that you didn't realize you are writing XML, so you need to escape special characters such as ", i.e. write &quot; instead.
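This can be checked with Python's standard library: `xml.sax.saxutils.escape` converts `&`, `<` and `>`, and with an extra mapping also `"` to `&quot;`, which is what a template inside the `output` attribute needs. The `RegExp` element below reuses attributes from post #1 and is only a test string, not a full scraper file.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Decoded output template of the outer RegExp from post #1.
OUTPUT = r'<?xml version="1.0" encoding="iso-8859-1" standalone="yes"?><results>\1</results>'

# escape() always handles &, < and >; the extra mapping also turns " into &quot;.
escaped = escape(OUTPUT, {'"': '&quot;'})
element_xml = '<RegExp input="$$5" output="' + escaped + '" dest="3"/>'
print(element_xml)

# Pasting the unescaped template instead produces XML the parser rejects.
broken_xml = '<RegExp input="$$5" output="' + OUTPUT + '" dest="3"/>'
try:
    ET.fromstring(broken_xml)
except ET.ParseError as err:
    print("unescaped version fails to parse:", err)

# The escaped version parses, and the parser hands back the original template.
assert ET.fromstring(element_xml).get("output") == OUTPUT
```

The unescaped variant is not well-formed XML at all, because the first `"` inside the template ends the attribute and the `<` that follows is not allowed there.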

Sorry I didn't see your inquiry earlier. Feel free to ask again; I will try to help when I see it.
__________________
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.

2008-06-02, 16:14   #5
floohh
Junior Member
Join Date: May 2008
Posts: 5

You can find the solution here:
Link

2008-06-02, 16:30   #6
spiff
Grumpy Bastard Developer
Join Date: Nov 2003
Posts: 7,715

Great, the more the merrier. I will add it to SVN, cheers.

2009-06-29, 09:46   #7
tatoosh
Member
Join Date: Mar 2008
Posts: 48

Hey,

I can't download your filmstarts.de scraper. Can you give me a link?
It would be nice to use this great website.

2009-06-29, 11:13   #8
w00dst0ck
Junior Member
Join Date: Aug 2008
Posts: 25

@Tatoosh: Just update your XBMC version.
Alternatively, you can download the current version from SVN via http://xbmc.org/trac.

All times are GMT +2.

Copyright © 2008, XBMC Project