Scraper Development: Developers forum for meta data scrapers. Scraper developers only! Not for posting feature requests, bugs, or end-user support requests!
#1
Senior Member
Join Date: Dec 2006
Posts: 109
I think I've found a bug in scrap.exe. It may be in XBMC's parsing of scrapers, but I think it is in scrap.exe: the two behave differently when dealing with "cleaning" expressions (those where you do not specify noclean="1"). It was driving me crazy...
I was trying to extract <genre> for the example scraper in "scraper for dummies". The interesting part of $$1 is:
Code:
...<font class = 'titulo3'>Género:</font><br>Terror / Thriller<br><br><font class = 'titulo3'>Nacionalidad:</font>...
regexp1:
Code:
<RegExp input="$$1" output="\1" dest="9">
  <expression noclean="1">G.nero:(.[^:]*)Nacionalidad:</expression>
</RegExp>
Result:
Code:
</font><br>Terror / Thriller<br><br><font class = 'titulo3'>
regexp2:
Code:
<RegExp input="$$9" output="\1/" dest="7">
  <expression noclean="1">>(.[^<>]*)<</expression>
</RegExp>
Result:
Code:
Terror / Thriller/
regexp3:
Code:
<RegExp input="$$7" output="<genre>\1</genre>" dest="8+">
  <expression repeat="yes" trim="1">([^/]*)/</expression>
</RegExp>
Result:
Code:
<genre>Terror</genre><genre>Thriller</genre>
But in my first attempt I forgot to add noclean="1" to both regexp1 and regexp2. That should not work: the expression in regexp2 then matches nothing and, since $$7 is not cleared, regexp3 uses its previous content and generates some random <genre> or nothing. That is what happens in XBMC, but in scrap.exe it actually worked and gave me correct results!
It then occurred to me that, since cleaning should strip all HTML content, leaving noclean="1" out of regexp1 should directly return "Terror / Thriller", and so this shorter version (dropping regexp2) should do the same:
Code:
<RegExp input="$$7" output="<genre>\1</genre>" dest="8+">
  <RegExp input="$$1" output="\1/" dest="7">
    <expression>G.nero:(.[^:]*)Nacionalidad:</expression>
  </RegExp>
  <expression repeat="yes" trim="1">([^/]*)/</expression>
</RegExp>
Code:
<genre><</genre><genre>font><br>Terror</genre><genre>Thriller<br><br><font class = 'titulo3'></genre>
In "scraper for dummies" I'm using the first, longer version with regexp 1, 2 and 3 because it works in both cases, so it is less likely to confuse people who try it by hand.
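For anyone who wants to check the expressions without either tool, here is a rough Python sketch of the chain. It only replays the regular expressions from the post on the quoted fragment of $$1; it is not how XBMC or scrap.exe work internally. The helper `strip_tags` is a hypothetical stand-in for the "cleaning" step (I am assuming cleaning simply removes HTML tags), and `genres` imitates regexp3 with repeat="yes" and trim="1".

```python
import re

# Fragment of $$1 as quoted in the post (the real buffer is the whole page).
PAGE = ("<font class = 'titulo3'>Género:</font><br>Terror / Thriller<br><br>"
        "<font class = 'titulo3'>Nacionalidad:</font>")


def strip_tags(text):
    """Hypothetical stand-in for "cleaning": assume it just removes HTML tags."""
    return re.sub(r"<[^>]*>", "", text).strip()


def genres(buffer):
    """Imitate regexp3: capture every piece before a "/" and trim it."""
    return [piece.strip() for piece in re.findall(r"([^/]*)/", buffer)]


# regexp1 with noclean="1": the capture keeps its HTML.
raw = re.search(r"G.nero:(.[^:]*)Nacionalidad:", PAGE).group(1)

# regexp2: take the text between ">" and "<", then append "/" (output="\1/").
step2 = re.search(r">(.[^<>]*)<", raw).group(1) + "/"

# Long version: regexp1 -> regexp2 -> regexp3.
long_version = genres(step2)

# Short version, if cleaning strips the HTML from the regexp1 capture.
short_cleaned = genres(strip_tags(raw) + "/")

# Short version, if the HTML is left in the capture.
short_uncleaned = genres(raw + "/")

print(long_version)
print(short_cleaned)
print(short_uncleaned)
```

Under these assumptions the long version and the cleaned short version both give Terror and Thriller, while the uncleaned short version splits the raw HTML on "/" and reproduces the broken <genre> pieces quoted above.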
#2
Team-XBMC Developer
Join Date: Oct 2003
Posts: 15,076
Unfortunately, scrap.exe is out of date, and no longer maintained. The original author lost the sources to his updated build.
This means that the only way to test it reliably at this point is directly from XBMC.
#3
Senior Member
Join Date: Dec 2006
Posts: 109
OK, I will add that info to the wiki so everyone knows.
scrap.exe is still useful enough for some quick tests; I haven't found any other bugs apart from that noclean issue.