XBMC Community Forum  

2007-03-23, 17:25   #1
The_Dogg
Allocine.fr (TV Shows) scraper

I'm working on a TV Show scraper for allocine.fr.

I'm down to the episode list, but I have a little problem:

I use the scrap.exe tool to test it, and when the tool gets the links for the episode list, an "&" sign gets lost. Let me show you:

Code:
</status><premiered>
 7 Août 2005</premiered><episodeguide><url>http://www.allocine.fr/series/episodes_gen_csaison=1511&cserie=513.html</url>
<url>http://www.allocine.fr/series/episodes_gen_csaison=2450&cserie=513.html</url></episodeguide></details>
Episodelist URL 1:http://www.allocine.fr/series/episodes_gen_csaison=1511cserie=513.html
Episodelist URL 2:http://www.allocine.fr/series/episodes_gen_csaison=2450cserie=513.html
GetEpisodeListInternal 2 returned :
GetEpisodeList returned :
Error: Unable to parse episodelist.xml
This is the output of the scrap.exe tool.

You can see that inside the <details> tag the URLs are OK:
Code:
<url>http://www.allocine.fr/series/episodes_gen_csaison=1511&cserie=513.html</url>
But where the tool prints "Episodelist URL", the & sign is missing from the link, which leads to a nearly empty page on the website.
Code:
Episodelist URL 1:http://www.allocine.fr/series/episodes_gen_csaison=1511cserie=513.html
And here is the code from scraper.xml:
Code:
<RegExp input="$$8" output="&lt;episodeguide&gt;\1&lt;/episodeguide&gt;" dest="5+">
	<RegExp input="$$2" output="&lt;url&gt;http://www.allocine.fr/series/episodes_gen_csaison=\1&amp;cserie=$$4.html&lt;/url&gt;" dest="8">
		<expression repeat="yes">&quot;/series/casting_gen_csaison=([0-9]*)&amp;cserie=$$4.html&quot; class=&quot;link1&quot;>[0-9]&lt;/a&gt;</expression>
	</RegExp>
	<expression noclean="1"></expression>
</RegExp>
I tried replacing the &amp; with a bare &, and I tried doubling it (&amp;&amp; and &&), but the & sign never shows up. When I replace the &amp; with &quot;, the " sign appears where I need it; only &amp; doesn't seem to work.

Any help would be appreciated.

The_Dogg
2007-03-23, 18:08   #2
The_Dogg

After a little more research I found a way to make the missing & show up.

I had to write it as
Code:
&amp;amp;
so the resulting scraper code is:

Code:
<RegExp input="$$8" output="&lt;episodeguide&gt;\1&lt;/episodeguide&gt;" dest="5+">
	<RegExp input="$$2" output="&lt;url&gt;http://www.allocine.fr/series/episodes_gen_csaison=\1&amp;amp;cserie=$$4.html&lt;/url&gt;" dest="8">
		<expression repeat="yes">&quot;/series/casting_gen_csaison=([0-9]*)&amp;cserie=$$4.html&quot; class=&quot;link1&quot;>[0-9]&lt;/a&gt;</expression>
	</RegExp>
	<expression noclean="1"></expression>
</RegExp>
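For anyone who wants to check how many levels of escaping a value needs, here is a minimal Python sketch. It is not part of the scraper or of scrap.exe; the helper names escape_levels and simulate_parses are made up for illustration, and the standard library's escape/unescape stand in for what an XML parser does to entities on each pass.

```python
from xml.sax.saxutils import escape, unescape


def escape_levels(text: str, levels: int) -> str:
    """Apply XML escaping once per parse pass the text has to survive."""
    for _ in range(levels):
        text = escape(text)
    return text


def simulate_parses(text: str, passes: int) -> str:
    """Undo one level of escaping per simulated XML parse pass."""
    for _ in range(passes):
        text = unescape(text)
    return text


# URL taken from the scrap.exe output earlier in the thread.
url = "http://www.allocine.fr/series/episodes_gen_csaison=1511&cserie=513.html"

# Two passes (scraper.xml, then the returned <details> XML) need two levels.
twice = escape_levels(url, 2)
print(twice)                            # contains &amp;amp;
print(simulate_parses(twice, 2) == url)  # the & survives both passes
```

With only one level of escaping, the first pass already yields a bare &, which is what the second pass then chokes on.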
2007-03-23, 18:37   #3
spiff

The reason for this is that you are in an XML document, and you return XML. Each time XML is parsed, one level of escaping is consumed, so you need one &amp; per parse; otherwise the & ends up bare and is stripped, because a bare & is not valid XML.
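The two parse passes can be reproduced outside XBMC with a short Python sketch. This is only an illustration under assumptions: resolve_url is a hypothetical helper, the RegExp element is cut down to its output attribute, and Python's ElementTree is used in place of XBMC's own parser. ElementTree is stricter than what is described above: it rejects a bare & with an error instead of stripping it.

```python
import xml.etree.ElementTree as ET


def resolve_url(scraper_xml: str) -> str:
    """Parse a scraper element, then parse its output string as XML again."""
    # First parse: the scraper file. The attribute value loses one escaping level.
    output = ET.fromstring(scraper_xml).get("output")
    # Second parse: the string the scraper returned is itself XML.
    return ET.fromstring(output).text


# Double-escaped ampersand, as in the fixed scraper.
double = '<RegExp output="&lt;url&gt;episodes_gen_csaison=1511&amp;amp;cserie=513.html&lt;/url&gt;"/>'
# Single-escaped ampersand, as in the original scraper.
single = '<RegExp output="&lt;url&gt;episodes_gen_csaison=1511&amp;cserie=513.html&lt;/url&gt;"/>'

print(resolve_url(double))  # the & is intact after both parses

try:
    resolve_url(single)
except ET.ParseError as err:
    # After the first parse the & is bare, so the second parse is not well-formed.
    print("second parse failed:", err)
```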
All times are GMT +2.

Copyright © 2008, XBMC Project