
New to scraper development, help please


Gangsta
2009-03-21, 23:03
Hi

I have installed Apache/MySQL/PHP on my media server (192.168.0.10), and have a script that responds to (for example)

http://192.168.0.10/search.php?videoID=tt4638525

with XML like this:


<?xml version="1.0" encoding="UTF-8"?>
<movie>
<details>
<title></title>
<year></year>
<director></director>
<top250></top250>
<mpaa></mpaa>
<tagline></tagline>
<runtime></runtime>
<thumb></thumb>
<credits></credits>
<rating></rating>
<votes></votes>
<genre></genre>
<actor>
<name></name>
<role></role>
</actor>
<outline></outline>
<plot></plot>
</details>
</movie>


This is how far I have got with the scraper, but I don't think it's right. I can't get my head around regular expressions at all.


<scraper name="LocalMedia" content="movies" thumb="LocalMedia.gif">

<NfoUrl dest="3">
<RegExp input="$$1" output="http://192.168.0.10/search.php?videoID=/1" dest="3">
<expression noclean="1">192.168.0.10/(.*)</expression>
</RegExp>
</NfoUrl>

<CreateSearchUrl>
<RegExp>
<expression></expression>
</RegExp>
</CreateSearchUrl>

<GetSearchResults>
<RegExp>
<expression></expression>
</RegExp>
</GetSearchResults>

<GetDetails>
<RegExp>
<expression></expression>
</RegExp>
</GetDetails>

</scraper>


Any help appreciated, please.

kimp93
2009-03-22, 00:32
I don't know why you want to do it this way, and I'm not really good at regex either.
It should be really simple to do, though.


<GetDetails dest="3">
<RegExp input="$$5" output="&lt;details&gt;\1&lt;/details&gt;" dest="3">
<RegExp input="$$1" output="&lt;title&gt;\1&lt;/title&gt;" dest="5">
<expression noclean="1">&lt;title&gt;(.[^&lt;]*)</expression>
</RegExp>
<RegExp input="$$1" output="&lt;year&gt;\1&lt;/year&gt;" dest="5+">
<expression noclean="1">&lt;year&gt;(.[^&lt;]*)</expression>
</RegExp>
<expression noclean="1"></expression>
</RegExp>
</GetDetails>

Just copy this and change the rest of the fields the same way.
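
To see what the per-field pattern above captures, here is a rough Python sketch of the same idea. It is not scraper code: the sample response, its field values, and the `extract_fields` helper are all made up for illustration, and Python's `re` module may not behave exactly like the scraper's regex engine.

```python
import re

# Hypothetical sample response; the field values are invented for illustration.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<movie>
<details>
<title>Example Movie</title>
<year>2009</year>
</details>
</movie>"""


def extract_fields(page, fields):
    """Imitate the nested RegExp blocks: capture each field and append it to a buffer."""
    buffer5 = ""  # stands in for scraper buffer $$5 (an analogy, not the real mechanism)
    for field in fields:
        # Same shape as <title>(.[^<]*): one character, then everything up to the next '<'.
        match = re.search(r"<%s>(.[^<]*)" % field, page)
        if match:
            buffer5 += "<%s>%s</%s>" % (field, match.group(1), field)
    # The outer RegExp wraps the accumulated buffer in <details>.
    return "<details>%s</details>" % buffer5


result = extract_fields(SAMPLE, ["title", "year"])
print(result)
```

One caveat, at least in Python: because the pattern starts with `.`, an empty element such as `<title></title>` makes the capture begin at the `<` of the closing tag, so `([^<]*)` would be a safer capture group if fields can be empty.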


I don't know how your search.php outputs search results, so I can't help with that.

spiff
2009-03-22, 00:49
afaict the response is almost the xbmc format. if so;


<GetDetails dest="3">
<RegExp input="$$1" output="\1\2" dest="3">
<expression noclean="1">(.*)&lt;movie&gt;(.*)&lt;/movie&gt;</expression>
</RegExp>
</GetDetails>

does the job. if you ditch the movie tag in the output you can do

<GetDetails dest="3">
<RegExp input="$$1" output="\1" dest="3">
<expression noclean="1"/>
</RegExp>
</GetDetails>
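
As a quick check of what the `(.*)<movie>(.*)</movie>` expression with output `\1\2` does, here is a small Python sketch. The sample response and the `strip_movie_wrapper` name are hypothetical, and `re.DOTALL` is used on the assumption that `.` needs to span newlines; whether the scraper engine treats newlines the same way is not confirmed here.

```python
import re

# Hypothetical response from search.php; the title value is invented.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<movie>\n<details>\n<title>Example Movie</title>\n</details>\n</movie>"
)


def strip_movie_wrapper(page):
    """Keep what precedes <movie> plus what is inside it, dropping the tag pair."""
    # DOTALL lets '.' match newlines so the groups can span the whole response.
    match = re.search(r"(.*)<movie>(.*)</movie>", page, re.DOTALL)
    if match is None:
        return page  # no wrapper found: pass the input through unchanged
    return match.group(1) + match.group(2)  # equivalent of output="\1\2"


stripped = strip_movie_wrapper(SAMPLE)
print(stripped)
```

If the PHP script stops emitting the `<movie>` wrapper, there is nothing left to strip and the whole response can be passed through, which is what the empty expression variant does.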


as for nfourl that is for recognizing a url in a .nfo file, something like

<NfoUrl dest="3">
<RegExp input="$$1" output="\1" dest="3">
<expression>(http://192.168.0.10/.*)</expression>
</RegExp>
</NfoUrl>

does the job
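
To illustrate what the NfoUrl expression is meant to pick up, here is a minimal Python sketch that looks for the server URL inside the text of a .nfo file. The .nfo contents and the `find_scraper_url` helper are invented for the example, and the dots in the IP address are escaped here, which the expression in the thread does not do.

```python
import re

# Escaping the dots is a small tightening over the thread's expression (an assumption
# about what is wanted, not something the scraper requires).
NFO_URL_PATTERN = re.compile(r"(http://192\.168\.0\.10/.*)")


def find_scraper_url(nfo_text):
    """Return the first URL on the media server found in the .nfo text, or None."""
    match = NFO_URL_PATTERN.search(nfo_text)
    return match.group(1) if match else None


# Hypothetical .nfo contents for illustration.
nfo = "Some Movie (2009)\nhttp://192.168.0.10/search.php?videoID=tt4638525\nsome other line"
url = find_scraper_url(nfo)
print(url)
```

Without a flag like DOTALL, `.*` stops at the end of the line in Python, so only the URL line itself is captured.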

kimp93
2009-03-22, 00:52
Wow, way better and simpler ;P

Gangsta
2009-03-22, 01:20
Wow, thanks for the quick responses.

The reason I want to do it this way is to keep all my Xboxes synchronised.

I've got a scraper that I made in VB, which adds the scraped details to the MySQL database. My reasoning is that my 100 Mbps home network is WAAAAAAAAY faster than my broadband (0.5 Mbps), so my scraper examines my media folders and then scrapes anything that is needed.

Now the files and data are stored locally on my high-speed network and available to all the Xboxes when they run their update, and hopefully they will all update much quicker.