Can XBMC or scrapers pre-process file names before performing lookups?
Hi,
XBMC has trouble finding my films on IMDb or with other scrapers, because it gets confused by the file naming convention I use.
Let me explain: I name my files like this:
Le Fabuleux destin d'Amélie Poulain (Amelie of Montmartre) - Jean-Pierre Jeunet (2001).avi
That is: original title (English title) - director (year).
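To make the convention concrete, here is a minimal Python sketch that splits such a name into its four fields. The pattern and `parse_movie_filename` are hypothetical illustrations, not part of XBMC or any scraper, and they assume exactly the layout above, with the English title optional (as for films that are already in English).

```python
import re

# Hypothetical pattern for the naming convention above:
#   original title (English title) - director (year).ext
# The " (English title)" part is optional.
FILENAME_RE = re.compile(
    r"^(?P<original>.+?)"            # original title, as short as possible
    r"(?: \((?P<english>[^)]*)\))?"  # optional " (English title)"
    r" - (?P<director>.+?)"          # " - director"
    r" \((?P<year>\d{4})\)"          # " (year)"
    r"\.[^.]+$"                      # file extension
)

def parse_movie_filename(name):
    """Return the four fields as a dict, or None if the name doesn't fit."""
    m = FILENAME_RE.match(name)
    return m.groupdict() if m else None

info = parse_movie_filename(
    "Le Fabuleux destin d'Amélie Poulain (Amelie of Montmartre) - Jean-Pierre Jeunet (2001).avi"
)
print(info)
print(parse_movie_filename("The Rock - Michael Bay (1996).avi"))
```

Names that don't follow the layout simply return None.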
The thing is, when I send this info to a scraper, it gets confused. Is there a way I can "help" the scraper by telling it which piece of information is which?
Couldn't find anything in the Wiki.
V.
I mean: is there any solution other than editing the scraper XML file (http://www.xboxmediacenter.com/wiki/index.php?title=How_To_Write_Media_Info_Scrapers)?
V.
OK, I think I'm lost. I tried to modify the scraper as follows:
Original:
<CreateSearchUrl dest="3">
<RegExp input="$$1" output="http://www.allocine.fr/recherche/?motcle=\1/" dest="3">
<expression></expression>
</RegExp>
</CreateSearchUrl>
Modified:
<CreateSearchUrl dest="3">
<RegExp input="$$1" output="http://www.allocine.fr/recherche/?motcle=\1/" dest="3">
<expression trim="1" noclean="1">[^\(-]*</expression>
</RegExp>
</CreateSearchUrl>
But that does not seem to help. I assumed the scraper engine would take my file name "original title (English title) - director (year).avi", apply the above regex, which should clean it up into "original title ", and then send that to the search engine.
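The expectation can be checked outside XBMC. This is a Python sketch, assuming Python's `re` treats this simple pattern the way XBMC's engine does: against the raw filename, the modified expression does match "Le Fabuleux destin d'Amélie Poulain " as a whole, but it has no capturing group, so in Python a `\1` reference has nothing to substitute.

```python
import re

filename = "Le Fabuleux destin d'Amélie Poulain (Amelie of Montmartre) - Jean-Pierre Jeunet (2001).avi"

pattern = re.compile(r"[^\(-]*")  # the modified <expression>, unchanged
m = pattern.match(filename)

print(repr(m.group(0)))  # whole match: everything before the first '(' or '-'
print(pattern.groups)    # number of capturing groups: 0

try:
    m.expand(r"\1")      # what the output template's \1 would need
except (re.error, IndexError) as exc:
    print("no group 1 to substitute:", exc)
```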
It does not seem to work, though... :blush:
Any idea?
V.
Isn't this what the RegEx in Advanced Settings is for, or does that only apply to TV shows?
Well, given the name of the property in AdvancedSettings.xml (http://www.xboxmediacenter.com/wiki/index.php?title=AdvancedSettings.xml), I doubt it...
<tvshowmatching>
Contains regular expression to match the season and episode numbers in filenames.
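To show why that setting is unlikely to help here: a pattern of that kind only captures a season and an episode number. The pattern below is a hypothetical example in that style, written in Python. It is not a copy of XBMC's defaults.

```python
import re

# Hypothetical pattern in the style of a <tvshowmatching> entry:
# group 1 is the season number, group 2 the episode number.
TV_RE = re.compile(r"[Ss]([0-9]+)[ ._-]*[Ee]([0-9]+)")

def season_episode(filename):
    m = TV_RE.search(filename)
    return (int(m.group(1)), int(m.group(2))) if m else None

print(season_episode("Some.Show.S02E05.avi"))
print(season_episode("The Rock - Michael Bay (1996).avi"))
```

A pattern like this has no slot for a title, director or year.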
V.
OK, I've spent some time learning more about regular expressions (I never thought I'd need to someday), but now I'm really confused as to why the following does not work, because it should do exactly what I want:
<CreateSearchUrl dest="3">
<RegExp input="$$1" output="http://www.allocine.fr/recherche/?motcle=\1" dest="3">
<expression repeat="no" trim="1" noclean="1" clear="no">([^(]*)</expression>
</RegExp>
</CreateSearchUrl>
([^(]*) should match every character up to the first '(' and capture it as group 1; the \1 in the output then inserts that group into the search URL.
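Read literally, the expression does what is described. This is a Python sketch of one `<RegExp>` step, assuming the engine searches `input` and substitutes group 1 into `output`. That is a simplification: `apply_regexp` is hypothetical, and the `trim`, `noclean`, `repeat` and `clear` attributes are ignored.

```python
import re

def apply_regexp(input_text, expression, output):
    """Hypothetical stand-in for one <RegExp> element: search `expression`
    in `input_text` and substitute the captured groups into `output`."""
    m = re.search(expression, input_text)
    return m.expand(output) if m else ""

url = apply_regexp(
    "Le Fabuleux destin d'Amélie Poulain (Amelie of Montmartre) - Jean-Pierre Jeunet (2001).avi",
    r"([^(]*)",
    r"http://www.allocine.fr/recherche/?motcle=\1",
)
print(url)
```

This assumes `$$1` holds the raw filename.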
Any ideas, anybody?
V.
The search query input is URL-encoded.
Thank you for your reply, but I am not sure I understand what you mean.
Do you mean I should replace ([^(]*) with (%5B%5E(%5D*)?
Because according to the Scraper.xml (http://www.xboxmediacenter.com/wiki/index.php?title=Scraper.xml) page, none of the characters above need encoding. Also, the reference imdb.xml (http://xbmc.svn.sourceforge.net/viewvc/xbmc/trunk/XBMC/system/scrapers/video/imdb.xml?view=markup) file in SVN does contain unencoded regexps (though those are in other functions, not CreateSearchUrl).
I'll try this tonight, but if it works, I'm not sure I'll understand why... :sad:
V.
Unfortunately, no luck :sad:
Does anybody have any idea?
V.
It's the INPUT that is URL-encoded, i.e. the contents of $$1.
Thank you for taking time to reply, spiff.
Even if $$1 is URL-encoded, the parentheses are left as they are, so the file name "Le Fabuleux destin d'Amélie Poulain (Amelie of Montmartre) - Jean-Pierre Jeunet (2001).avi" becomes Le%20Fabuleux%20destin%20d'Am%E9lie%20Poulain%20(Amelie%20of%20Montmartre)%20-%20Jean-Pierre%20Jeunet%20(2001).avi
If I apply the ([^(]*) regex to this string, I still get Le%20Fabuleux%20destin%20d'Am%E9lie%20Poulain%20, which should return the right movie in the search (the resulting URL would be http://www.allocine.fr/recherche/?motcle=Le%20Fabuleux%20destin%20d'Am%E9lie%20Poulain%20).
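The encoding and the regex step can be reproduced in Python. The percent-encoding used here is an assumption (Latin-1, leaving ' ( ) unescaped), chosen only because it yields exactly the string quoted above. It is not necessarily what XBMC does.

```python
import re
from urllib.parse import quote

filename = "Le Fabuleux destin d'Amélie Poulain (Amelie of Montmartre) - Jean-Pierre Jeunet (2001).avi"

# Assumption: Latin-1 percent-encoding that leaves ' ( ) unescaped, picked only
# because it reproduces the encoded string quoted in the post.
encoded = quote(filename, safe="'()", encoding="latin-1")
print(encoded)

# Apply the ([^(]*) expression and build the search URL from group 1.
m = re.search(r"([^(]*)", encoded)
print("http://www.allocine.fr/recherche/?motcle=" + m.group(1))
```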
By the way, is there any way to run the scraper engine in debug mode so that I can understand what it does and log the different values?
Again, thank you for your patience.
V.
There is a scraper development environment in tools/scrap.
However, it's broken (there's a binary that works), and we have been shouting at the author for months, but he just doesn't respond :/
Otherwise, looking at the source itself is your best bet. This processing would be taking place in CIMDB::GetURL().
Issue resolved!
After looking at the source as you suggested, I noticed that the contents of $$1 are heavily pre-processed in the code before they even reach the scraper.
So I started XBMC in debug mode and found that the query URL is actually printed, which finally let me figure it out.
In fact, the file name "Le Fabuleux destin d'Amélie Poulain (Amelie of Montmartre) - Jean-Pierre Jeunet (2001).avi" is transformed into "Le Fabuleux destin d'Amélie Poulain (Amelie of Montmartre)   Jean-Pierre Jeunet". Note that the <space><minus sign><space> in the middle becomes <space><space><space>, so once the string is escaped, it becomes %20%20%20.
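The observed transformation can be re-created for experimentation. This Python sketch is hypothetical: `observed_cleanup` is not XBMC's code, and it reproduces only the changes visible in the debug output. The extension and the trailing "(year)" are dropped, and " - " becomes three spaces.

```python
import re

def observed_cleanup(filename):
    """Hypothetical re-creation of the clean-up seen in the debug output;
    not XBMC's actual implementation."""
    name = re.sub(r"\.[^.]+$", "", filename)      # drop the extension
    name = re.sub(r"\s*\(\d{4}\)\s*$", "", name)  # drop a trailing "(year)"
    return name.replace(" - ", "   ")             # " - " -> three spaces

print(observed_cleanup(
    "Le Fabuleux destin d'Amélie Poulain (Amelie of Montmartre) - Jean-Pierre Jeunet (2001).avi"
))
print(observed_cleanup("The Rock - Michael Bay (1996).avi"))
```

Hyphens without surrounding spaces, as in "Jean-Pierre", are left alone.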
Finally, I managed to get what I want with the following regular expression:
<expression>([^(]+)(%20%20%20|%20%28)</expression>
The %20%28 is there to strip out everything after the first round bracket. When the movie is already in English (like "The Rock - Michael Bay (1996).avi"), there is no first bracket, so I use %20%20%20 to get rid of the director's name.
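This is a Python sketch of the final expression applied to both examples. It assumes the cleaned strings described above and full percent-encoding: Latin-1, with everything except letters, digits and `_.-~` escaped, so "(" arrives as %28. `search_term` and `LAZY_EXPR` are hypothetical. In Python's `re`, the greedy `([^(]+)` backtracks only as far as the last place where either alternative matches, so for the Amélie example the captured term still contains the English title. The lazy `([^(]+?)` variant stops at the first delimiter instead. XBMC's engine may behave differently.

```python
import re
from urllib.parse import quote

FINAL_EXPR = r"([^(]+)(%20%20%20|%20%28)"  # the expression from the post
LAZY_EXPR = r"([^(]+?)(%20%20%20|%20%28)"  # hypothetical variant: first delimiter wins

def search_term(cleaned, expression):
    # Assumption: everything except letters, digits and "_.-~" is
    # percent-encoded (Latin-1), so "(" arrives as %28 and a space as %20.
    encoded = quote(cleaned, safe="", encoding="latin-1")
    m = re.search(expression, encoded)
    return m.group(1) if m else None

for cleaned in (
    "Le Fabuleux destin d'Amélie Poulain (Amelie of Montmartre)   Jean-Pierre Jeunet",
    "The Rock   Michael Bay",
):
    print(search_term(cleaned, FINAL_EXPR))
    print(search_term(cleaned, LAZY_EXPR))
```

For "The Rock" both versions give the same term, because only one delimiter is present.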
V.