Scraper functions question


ababak
2009-04-09, 22:54
Hello!

I am trying to merge a locally acquired movie thumb with data returned by a function. It looks like I can't send plain text through the $$ buffers? Sending <thumbs><thumb>url</thumb></thumbs> works only when it is enclosed in <details></details> tags. Is there any way I can merge this returned data with local <thumb>url</thumb> tags? I am also wondering how it is merged with the documented <GetDetails> data, which is already in <details></details> format, when I collect it in a buffer using "4+"?

Thank you!

spiff
2009-04-09, 23:17
have a look at how the allmusic scraper passes the thumb tags (i chose this one for clarity). the trick is the usage of the clearbuffers parameter on the functions. (if i understand your question correctly).

after every function we call the load() on the returned xml. this is an additive procedure, i.e. we keep what has been added before. but that does not mean that we can load several <thumbs> tags, hence the trick with clearbuffers

ababak
2009-04-09, 23:57
Hello spiff! Thank you for your reply.

Are you talking about bundled /Applications/Plex.app/Contents/Resources/Plex/system/scrapers/music/allmusic.xml ? I can't see any clearbuffers there...

Could you confirm whether it is true that arbitrary text can't be passed as the return value of a function?

How is the <details></details> data returned by a custom function merged with the <details></details> data from GetDetails?

spiff
2009-04-10, 00:01
plex? this is xbmc.. they apparently haven't updated the scrapers then.

http://xbmc.org/trac/browser/branches/linuxport/XBMC/system/scrapers/music/allmusic.xml


<RegExp input="$$1" output="&lt;thumb&gt;http://image.allmusic.com/00/amg/pic200/dr\1\200/\1\2\3\4/\1\2\3\4\5.jpg&lt;/thumb&gt;" dest="7+">
<expression noclean="1" repeat="yes">&quot;([A-Z^])([0-9^])([0-9^])([0-9^])([^&quot;]*)&quot;</expression>
</RegExp>

this builds a list of the artist thumbs available on allmusic in buffer 7. we want to add htbackdrop thumbs, and that requires another scrape, the GetThumbs function. now, we flag GetArtistDetails with clearbuffers="no". this means that when that function is finished we do NOT clear the contents of the buffers. we then enter GetThumbs:

<RegExp input="$$13" output="&lt;details&gt;&lt;thumbs&gt;\1$$7&lt;/thumbs&gt;&lt;/details&gt;"

note the usage of $$7 here - this is the list that was built earlier in GetArtistDetails. since we do not clear the buffers after GetArtistDetails has run, it is still available.

hope that explains it.

no, you cannot pass arbitrary data as the result of a function. once you see the allmusic code you'll get the point. as i already explained, after every function we call load() on the returned string (xml). this is an additive procedure, i.e. any new tags present will get loaded. if you return a tag that has been returned earlier, it's overridden. hence the need to do the clearbuffers trick.
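
To make the buffer hand-off concrete, here is a minimal sketch of the pattern described above. It is not the real allmusic.xml: the expressions, the example.invalid URL, and the choice of buffers 3, 5 and 6 are placeholders for illustration. It also assumes that a follow-up function is chained by returning a <url function="..."> tag inside <details>, and that an empty <expression> passes the whole input through as \1. Only clearbuffers="no", the "7+" append and the $$7 reference are taken from the thread.

```xml
<!-- Sketch only; see the assumptions stated in the paragraph above. -->
<GetArtistDetails dest="3" clearbuffers="no">
  <RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
    <!-- Build the thumb list in buffer 7; "7+" appends each match. -->
    <RegExp input="$$1" output="&lt;thumb&gt;\1&lt;/thumb&gt;" dest="7+">
      <expression repeat="yes">data-thumb="([^"]*)"</expression>
    </RegExp>
    <!-- Request a second scrape (placeholder URL) handled by GetThumbs. -->
    <RegExp input="$$1" output="&lt;url function=&quot;GetThumbs&quot;&gt;http://example.invalid/backdrops/\1&lt;/url&gt;" dest="5+">
      <expression>artist-id="([0-9]+)"</expression>
    </RegExp>
    <expression noclean="1"></expression>
  </RegExp>
</GetArtistDetails>

<GetThumbs dest="5">
  <!-- Buffer 7 survives because GetArtistDetails did not clear the buffers,
       so a single <thumbs> tag can carry the new and the earlier thumbs.
       Buffer 6 is used for collecting because buffer 5 was not cleared. -->
  <RegExp input="$$6" output="&lt;details&gt;&lt;thumbs&gt;\1$$7&lt;/thumbs&gt;&lt;/details&gt;" dest="5">
    <RegExp input="$$1" output="&lt;thumb&gt;\1&lt;/thumb&gt;" dest="6+">
      <expression repeat="yes">backdrop="([^"]*)"</expression>
    </RegExp>
    <expression noclean="1"></expression>
  </RegExp>
</GetThumbs>
```

The point of the design is that <thumbs> is returned exactly once, by the last function, so nothing returned earlier gets overridden by load().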

ababak
2009-04-10, 00:06
Ah, ok, sorry, I didn't realize the scrapers' code base isn't shared between these projects. I'll have a look. Thank you very much!

ababak
2009-04-10, 09:03
Thanks, spiff! That explained everything I asked!

ababak
2009-04-10, 09:36
Two more questions.

1. What is the best way to replace strings in a buffer? For example, I need to replace &nbsp; with spaces.
2. How do I use the "cache" parameter in the url tag? I've seen it in several scrapers but can't understand what it does or how it works.

spiff
2009-04-12, 21:47
1) optimality can be discussed but this works

<RegExp input="$$2" output="\1&amp;amp;\2" dest="3">
<expression noclean="1,2" repeat="yes">(.*?)&amp;(.+)</expression>
</RegExp>

2) cache is just a local file name. that way you can run several functions on the same page without redownloading it.
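
A hedged illustration of that idea, assuming the cache attribute is written on the <url> tags a function returns, as the question describes. The function names GetBio and GetThumbs, the example.invalid URL, the artist-page.html file name and the expressions are all hypothetical.

```xml
<!-- Sketch only: two chained functions request the same page and name the
     same cache file, so the second one can reuse the downloaded copy. -->
<RegExp input="$$1" output="&lt;url cache=&quot;artist-page.html&quot; function=&quot;GetBio&quot;&gt;http://example.invalid/artist/\1&lt;/url&gt;" dest="5+">
  <expression>artist-id="([0-9]+)"</expression>
</RegExp>
<RegExp input="$$1" output="&lt;url cache=&quot;artist-page.html&quot; function=&quot;GetThumbs&quot;&gt;http://example.invalid/artist/\1&lt;/url&gt;" dest="5+">
  <expression>artist-id="([0-9]+)"</expression>
</RegExp>
```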