
Help with a new Korean Music Scraper?


kimp93
2009-03-20, 07:16
This is my third scraper. This time I'm trying to make a music scraper.
However, I don't even get search results.
Here is a debug log that may be relevant.

23:50:03 T:668 M:154071040 DEBUG: FileCurl::Open(0012D844) http://music.search.cyworld.com/cymusic/search.html?query=Gee%20%28The%20First%20Mini%20Album%29&v=1
23:50:03 T:668 M:154066944 INFO: XCURL::DllLibCurlGlobal::easy_aquire - Created session to http://music.search.cyworld.com
23:50:04 T:668 M:149823488 DEBUG: Curl::Debug About to connect() to music.search.cyworld.com port 80 (#0)
23:50:04 T:668 M:149823488 DEBUG: Curl::Debug Trying 117.53.105.15...
23:50:04 T:668 M:174813184 DEBUG: Curl::Debug Connected to music.search.cyworld.com (117.53.105.15) port 80 (#0)
23:50:04 T:668 M:174813184 DEBUG: Curl::Debug GET /cymusic/search.html?query=Gee%20%28The%20First%20Mini%20Album%29&v=1 HTTP/1.1
23:50:04 T:668 M:174813184 DEBUG: Curl::Debug User-Agent: XBMC/pre-9.04 r18650 (Windows; Windows XP Professional Service Pack 2 build 2600; http://www.xbmc.org)
23:50:04 T:668 M:174813184 DEBUG: Curl::Debug Host: music.search.cyworld.com
23:50:04 T:668 M:174813184 DEBUG: Curl::Debug Accept: */*
23:50:04 T:668 M:174813184 DEBUG: Curl::Debug Connection: keep-alive
23:50:04 T:668 M:175108096 DEBUG: Curl::Debug HTTP/1.1 200 OK
23:50:04 T:668 M:175108096 DEBUG: Curl::Debug Date: Fri, 20 Mar 2009 03:50:02 GMT
23:50:04 T:668 M:175108096 DEBUG: Curl::Debug Server: Apache
23:50:04 T:668 M:175108096 DEBUG: Curl::Debug Connection: close
23:50:04 T:668 M:175108096 DEBUG: Curl::Debug Transfer-Encoding: chunked
23:50:04 T:668 M:175108096 DEBUG: Curl::Debug Content-Type: text/html
23:50:05 T:668 M:174272512 DEBUG: Curl::Debug Expire cleared
23:50:05 T:668 M:174272512 DEBUG: Curl::Debug Closing connection #0
23:50:05 T:668 M:174272512 DEBUG: FileCurl::Close(0012D844) http://music.search.cyworld.com/cymusic/search.html?query=Gee%20%28The%20First%20Mini%20Album%29&v=1

Full debug log (http://pastebin.com/m715dc02f)

link to music scraper (http://dl.getdropbox.com/u/237787/cyworld.xml)

If I put the same address in a web browser, it works fine. XBMC doesn't seem to get anything after that.
I tried scrap.exe after changing a couple of tags to make it run, and it seems to work OK. It generates all the XML properly. I don't know why XBMC doesn't.
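
One way to check whether the server answers XBMC differently from a browser is to replay the request outside XBMC with the same headers. A minimal Python sketch, assuming only the URL, query parameters and User-Agent that appear in the debug log; the function name `build_search_request` is made up for illustration, and the request is built but not sent:

```python
from urllib.parse import quote
from urllib.request import Request

SEARCH_URL = "http://music.search.cyworld.com/cymusic/search.html"

# User-Agent string copied from the debug log.
XBMC_USER_AGENT = (
    "XBMC/pre-9.04 r18650 (Windows; Windows XP Professional "
    "Service Pack 2 build 2600; http://www.xbmc.org)"
)


def build_search_request(album: str) -> Request:
    """Build (but do not send) a GET request like the one in the log."""
    # quote() percent-encodes spaces and parentheses, as XBMC did.
    url = f"{SEARCH_URL}?query={quote(album)}&v=1"
    headers = {"User-Agent": XBMC_USER_AGENT, "Accept": "*/*"}
    return Request(url, headers=headers)


request = build_search_request("Gee (The First Mini Album)")
print(request.full_url)
# To send it: urllib.request.urlopen(request).read()
```

Comparing the body returned for this request with what the browser shows would tell whether the difference is on the server side or in how XBMC handles the response.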

I don't think it is a bug in XBMC, since other music scrapers work. Please give me some ideas.

Thanks
Young-cho

spiff
2009-03-20, 14:01
hi,

since i don't read korean it's hard for me to locate the source where you found the form. i suspect you might need to post the form, not submit

kimp93
2009-03-20, 18:28
I really appreciate your comment. I was totally lost.
I did try "post", and it does not seem to work.
After your comment, I looked at the HTML more carefully.

From http://music.search.cyworld.com/cymusic/search.html
<script type="text/javascript" src="http://music.cyworld.com/common/cybgm_snb_script.asp?query=&v=1"></script>


From http://music.cyworld.com/common/cybgm_snb_script.asp

+ ' <form name="search" id="search" autocomplete="off" action="http://music.search.cyworld.com/cymusic/search.html">'
+ ' <p id="selectTxt" onclick="ct_toggle();">전체</p>'
+ " <input type=\"text\" class=\"text\" name=\"query\" id=\"query\" onclick=\"ac_toggle(this);\" maxlength=\"100\" onKeyPress=\"if( event.keyCode == 13 ) { go_search(); return false; }\" title=\"검색어 입력\" value=\"\" onblur=\"toggleSearchBar(0);\" onfocus=\"toggleSearchBar(1);\" />"
+ ' <input type="button" class="btn" title="검색" onclick="go_search();" />'
+ ' </form>

I don't know much about JavaScript. Based on this, they don't seem to use POST.
After the form there is more JavaScript, probably for AJAX auto-completion, though I'm not sure whether it matters.
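
For reference, an HTML form with no method attribute is submitted with GET by default, which fits the plain query-string URL in the log. The difference between the two submission styles can be sketched in Python as follows; the field name `query` and the action URL come from the form above, while the value "Gee" is only an example:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Action URL and field name taken from the quoted <form>.
ACTION = "http://music.search.cyworld.com/cymusic/search.html"
fields = {"query": "Gee"}  # example search term

# GET: the fields are appended to the URL and there is no request body.
get_request = Request(f"{ACTION}?{urlencode(fields)}")

# POST: the same fields are sent as a URL-encoded request body instead.
post_request = Request(ACTION, data=urlencode(fields).encode("ascii"))

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```

Neither request is actually sent here; passing one to `urllib.request.urlopen` would do that.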

kimp93
2009-05-15, 21:46
I have given up on the previous music scraper, so I made a new one for a site that I am familiar with.

I'm trying to scrape the same site (DAUM) as the movie scraper that I made before.
However, the same thing is happening as before: I cannot get search results.

Here is part of the scraper:

<CreateAlbumSearchUrl dest="3">
    <RegExp input="$$1" output="&lt;url&gt;http://music.daum.net/search/album.do?query=\1&lt;/url&gt;" dest="3">
        <expression noclean="1"></expression>
    </RegExp>
</CreateAlbumSearchUrl>
<GetAlbumSearchResults dest="8" SearchStringEncoding="UTF-8">
    <RegExp input="$$5" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
        <RegExp input="$$1" output="&lt;entity&gt;&lt;artist&gt;\3&lt;/artist&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
            <expression repeat="yes">&lt;a href=&quot;(.[^&quot;]*)&quot; class=&quot;fl&quot;&gt;(.[^\n]*)\n[^\:]*\:[^\:]*\:[^&gt;]*&gt;(.[^&lt;]*)&lt;</expression>
        </RegExp>
        <expression noclean="1"></expression>
    </RegExp>
</GetAlbumSearchResults>
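
The inner expression can also be tried outside XBMC to rule out the regular expression itself. A rough Python sketch follows, with two caveats: Python's `re` module is not the regex engine XBMC uses, so a match here is only a hint, and `SAMPLE_HTML` is invented markup shaped to fit the pattern, not the real DAUM page. It should be replaced with the saved search page.

```python
import html
import re

# The <expression> text exactly as written in the scraper XML.
XML_EXPRESSION = (
    r'&lt;a href=&quot;(.[^&quot;]*)&quot; class=&quot;fl&quot;&gt;'
    r'(.[^\n]*)\n[^\:]*\:[^\:]*\:[^&gt;]*&gt;(.[^&lt;]*)&lt;'
)

# Undo the XML escaping (&lt; &gt; &quot;) to get the actual pattern.
pattern = re.compile(html.unescape(XML_EXPRESSION))

# Hypothetical markup, NOT the real search result page.
SAMPLE_HTML = (
    '<a href="/album/1" class="fl">Gee (The First Mini Album)\n'
    "</a> label one: label two:<b>Girls' Generation</b>\n"
)

# Each match yields (url, title, artist), i.e. groups \1, \2 and \3.
for url, title, artist in pattern.findall(SAMPLE_HTML):
    print(url, "|", title, "|", artist)
```

If the pattern matches the saved page here but XBMC still shows no results, the problem is more likely in what XBMC receives or how the buffers are wired than in the expression.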


Here is a debug log


14:22:09 T:3828 M:232128512 DEBUG: thread start, auto delete: 0
14:22:09 T:3828 M:232022016 DEBUG: FileCurl::Open(0012D364) http://music.daum.net/search/album.do?query=Gee%20%28The%20First%20Mini%20Album%29
14:22:09 T:3828 M:232017920 INFO: XCURL::DllLibCurlGlobal::easy_aquire - Created session to http://music.daum.net
14:22:11 T:3828 M:231038976 DEBUG: FileCurl::Close(0012D364) http://music.daum.net/search/album.do?query=Gee%20%28The%20First%20Mini%20Album%29
14:22:11 T:3828 M:230985728 DEBUG: Thread 3828 terminating
14:22:11 T:3372 M:231096320 INFO: Loading skin file: DialogOK.xml
14:22:11 T:3372 M:231092224 DEBUG: Load DialogOK.xml: 2.17ms



Once again, I can get search results in Firefox with the URL from the debug log.

Since the same kind of movie scraper, daum.xml, works OK, I don't know why the same site with a similar search URL doesn't work in a music scraper.

http://music.daum.net/search/album.do?query=Gee%20%28The%20First%20Mini%20Album%29


If anything else is needed to solve the problem, please let me know.

Here is a link to the scraper and a sample.
download (http://dl.getdropbox.com/u/237787/daumMusic.rar)