XBMC Community Forum > Development > Scraper Development

2009-03-20, 07:16   #1
kimp93
Aeon Group
Join Date: Mar 2004
Posts: 111
Help with a new Korean Music Scraper?

This is my third scraper; this time I'm trying to make a music scraper.
However, I don't get any search results at all.
Here is the part of the debug log that may be relevant:

Code:
23:50:03 T:668 M:154071040   DEBUG: FileCurl::Open(0012D844) http://music.search.cyworld.com/cymusic/search.html?query=Gee%20%28The%20First%20Mini%20Album%29&v=1
23:50:03 T:668 M:154066944    INFO: XCURL::DllLibCurlGlobal::easy_aquire - Created session to http://music.search.cyworld.com
23:50:04 T:668 M:149823488   DEBUG: Curl::Debug About to connect() to music.search.cyworld.com port 80 (#0)
23:50:04 T:668 M:149823488   DEBUG: Curl::Debug   Trying 117.53.105.15...
23:50:04 T:668 M:174813184   DEBUG: Curl::Debug Connected to music.search.cyworld.com (117.53.105.15) port 80 (#0)
23:50:04 T:668 M:174813184   DEBUG: Curl::Debug GET /cymusic/search.html?query=Gee%20%28The%20First%20Mini%20Album%29&v=1 HTTP/1.1
23:50:04 T:668 M:174813184   DEBUG: Curl::Debug User-Agent: XBMC/pre-9.04 r18650 (Windows; Windows XP Professional Service Pack 2 build 2600; http://www.xbmc.org)
23:50:04 T:668 M:174813184   DEBUG: Curl::Debug Host: music.search.cyworld.com
23:50:04 T:668 M:174813184   DEBUG: Curl::Debug Accept: */*
23:50:04 T:668 M:174813184   DEBUG: Curl::Debug Connection: keep-alive
23:50:04 T:668 M:175108096   DEBUG: Curl::Debug HTTP/1.1 200 OK
23:50:04 T:668 M:175108096   DEBUG: Curl::Debug Date: Fri, 20 Mar 2009 03:50:02 GMT
23:50:04 T:668 M:175108096   DEBUG: Curl::Debug Server: Apache
23:50:04 T:668 M:175108096   DEBUG: Curl::Debug Connection: close
23:50:04 T:668 M:175108096   DEBUG: Curl::Debug Transfer-Encoding: chunked
23:50:04 T:668 M:175108096   DEBUG: Curl::Debug Content-Type: text/html
23:50:05 T:668 M:174272512   DEBUG: Curl::Debug Expire cleared
23:50:05 T:668 M:174272512   DEBUG: Curl::Debug Closing connection #0
23:50:05 T:668 M:174272512   DEBUG: FileCurl::Close(0012D844) http://music.search.cyworld.com/cymusic/search.html?query=Gee%20%28The%20First%20Mini%20Album%29&v=1
Full debug log


Link to the music scraper

If I put the same address into a web browser, it works fine, but XBMC doesn't seem to get anything after that request.
I also tried scrap.exe (after changing a couple of tags so it would run). It seems to work OK and produces all the XML properly, so I don't know why XBMC doesn't.

I don't think it's a bug in XBMC, since other music scrapers work. Please give me some ideas.
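One way to narrow this down is to send exactly the request XBMC logged (same URL, same User-Agent and Accept headers) from a small script, save the response body, and compare it with what the browser receives. A difference would suggest the server treats XBMC's request differently. The Python sketch below only builds that request from the values in the log above. The helper name `build_xbmc_request` is hypothetical, and actually sending the request (for example with `urllib.request.urlopen`) needs network access.

```python
from urllib.parse import quote
from urllib.request import Request

# Values copied from the debug log above.
SEARCH_PAGE = "http://music.search.cyworld.com/cymusic/search.html"
XBMC_USER_AGENT = ("XBMC/pre-9.04 r18650 (Windows; Windows XP Professional "
                   "Service Pack 2 build 2600; http://www.xbmc.org)")


def build_xbmc_request(album: str) -> Request:
    """Hypothetical helper: rebuild the GET request XBMC logged for `album`."""
    # quote() turns spaces into %20 and parentheses into %28/%29,
    # which matches the query string in the log.
    url = f"{SEARCH_PAGE}?query={quote(album)}&v=1"
    return Request(url, headers={"User-Agent": XBMC_USER_AGENT, "Accept": "*/*"})


if __name__ == "__main__":
    req = build_xbmc_request("Gee (The First Mini Album)")
    print(req.get_method(), req.full_url)
    # To fetch the page (needs network access):
    #   from urllib.request import urlopen
    #   body = urlopen(req).read()
```

Saving `body` to a file and diffing it against the browser's "save page" output shows whether both requests receive the same HTML.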

Thanks
Young-cho

2009-03-20, 14:01   #2
spiff
Grumpy Bastard Developer
Join Date: Nov 2003
Posts: 7,715

Hi,

Since I don't read Korean, it's hard for me to locate the source where you found the form. I suspect you might need to POST the form rather than submit it as a GET request.
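The difference spiff is pointing at works like this. A GET submission puts the form fields into the URL's query string. A POST sends the same fields, URL-encoded, in the request body to the form's action URL. The minimal Python sketch below shows both request shapes, using the `query` field and action URL from this thread. The helper names `as_get` and `as_post` are hypothetical, and this thread does not establish whether the site accepts POST at all.

```python
from urllib.parse import urlencode
from urllib.request import Request

ACTION = "http://music.search.cyworld.com/cymusic/search.html"


def as_get(fields: dict) -> Request:
    """Hypothetical helper: form fields travel in the query string."""
    return Request(f"{ACTION}?{urlencode(fields)}")


def as_post(fields: dict) -> Request:
    """Hypothetical helper: form fields travel URL-encoded in the request body."""
    body = urlencode(fields).encode("ascii")
    return Request(ACTION, data=body,
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


if __name__ == "__main__":
    fields = {"query": "Gee (The First Mini Album)", "v": "1"}
    for req in (as_get(fields), as_post(fields)):
        # get_method() reports POST whenever a request body is present.
        print(req.get_method(), req.full_url, req.data)
```

`urlencode` writes spaces as `+` rather than `%20`. Both forms are valid in form-encoded data.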

2009-03-20, 18:28   #3
kimp93

I really appreciate your comment; I was totally lost.
I did try "post", and it doesn't seem to work.
After your comment, I looked at the HTML more carefully.

From http://music.search.cyworld.com/cymusic/search.html
HTML Code:
<script type="text/javascript" src="http://music.cyworld.com/common/cybgm_snb_script.asp?query=&v=1"></script>
From http://music.cyworld.com/common/cybgm_snb_script.asp

HTML Code:
	+ '	<form name="search" id="search" autocomplete="off" action="http://music.search.cyworld.com/cymusic/search.html">'
	+ '	<p id="selectTxt" onclick="ct_toggle();">All</p>'
	+ "	<input type=\"text\" class=\"text\" name=\"query\" id=\"query\" onclick=\"ac_toggle(this);\" maxlength=\"100\" onKeyPress=\"if( event.keyCode == 13 ) { go_search(); return false; }\" title=\"Enter search term\" value=\"\" onblur=\"toggleSearchBar(0);\" onfocus=\"toggleSearchBar(1);\" />"
	+ '	<input type="button" class="btn" title="Search" onclick="go_search();" />'
	+ '	</form>'
I don't know much about JavaScript, but based on this they don't seem to use POST (the form has no method attribute).
After the form there is more JavaScript, probably for Ajax auto-completion, though I'm not sure whether that matters.
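This conclusion can be checked mechanically. In HTML, a `<form>` without a `method` attribute is submitted with GET, and each named `<input>` becomes a query-string field appended to the `action` URL. The Python sketch below reads the form markup quoted above, trimmed to the relevant attributes, with the standard `html.parser` module and reports what a plain submission would look like. The class name `FormInspector` is hypothetical. The search button calls `go_search()`, whose code is not shown here, so that script may build its URL differently.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Hypothetical helper: records a form's action, method and named inputs."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = None
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
            # HTML submits with GET when the `method` attribute is absent.
            self.method = (attrs.get("method") or "get").upper()
        elif tag == "input" and attrs.get("name"):
            # Only named inputs are sent; the unnamed button is skipped.
            self.fields.append(attrs["name"])


# The form from cybgm_snb_script.asp, reduced to the attributes that matter.
FORM_HTML = """
<form name="search" id="search" autocomplete="off"
      action="http://music.search.cyworld.com/cymusic/search.html">
  <input type="text" class="text" name="query" id="query" maxlength="100" />
  <input type="button" class="btn" title="Search" onclick="go_search();" />
</form>
"""

if __name__ == "__main__":
    inspector = FormInspector()
    inspector.feed(FORM_HTML)
    print(inspector.method, inspector.action, inspector.fields)
    # -> GET http://music.search.cyworld.com/cymusic/search.html ['query']
```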

2009-05-15, 21:46   #4
kimp93

I have now given up on the previous music scraper and made a new one for a site I'm familiar with.

I'm trying to scrape the same site (Daum) as the movie scraper I made before.
However, the same thing is happening as before: I cannot get any search results.

Here is part of the scraper:

Code:
	<CreateAlbumSearchUrl dest="3">
		<RegExp input="$$1" output="&lt;url&gt;http://music.daum.net/search/album.do?query=\1&lt;/url&gt;" dest="3">
			<expression noclean="1"></expression>
		</RegExp>
	</CreateAlbumSearchUrl>
	<GetAlbumSearchResults dest="8" SearchStringEncoding="UTF-8">
		<RegExp input="$$5" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
			<RegExp input="$$1" output="&lt;entity&gt;&lt;artist&gt;\3&lt;/artist&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
				<expression repeat="yes">&lt;a href=&quot;(.[^&quot;]*)&quot; class=&quot;fl&quot;&gt;(.[^\n]*)\n[^\:]*\:[^\:]*\:[^&gt;]*&gt;(.[^&lt;]*)&lt;</expression>
			</RegExp>
			<expression noclean="1"></expression>
		</RegExp>		
	</GetAlbumSearchResults>
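To check whether `GetAlbumSearchResults` can match the page at all, the regular expression can be tried outside XBMC against saved HTML. The Python sketch below uses the same expression with the XML escapes (`&lt;`, `&gt;`, `&quot;`) decoded. It roughly emulates the nested `RegExp` blocks: the inner `repeat="yes"` block emits one `<entity>` per match, and the outer block, whose expression is empty, wraps the result in `<results>`. The HTML in `SAMPLE` is a made-up fragment shaped to fit the pattern, not Daum's real markup. Running `album_search_results` on a saved copy of the real page would show whether the expression is the problem.

```python
import re

# Expression from GetAlbumSearchResults with the XML entities decoded.
ALBUM_RE = re.compile(
    r'<a href="(.[^"]*)" class="fl">(.[^\n]*)\n[^\:]*\:[^\:]*\:[^>]*>(.[^<]*)<'
)

# Hypothetical HTML fragment, NOT taken from music.daum.net.
SAMPLE = (
    '<a href="http://music.daum.net/album/1" class="fl">Gee\n'
    '</a><br/>Artist : <a href="http://music.daum.net/artist/2">SNSD</a>'
)


def album_search_results(page: str) -> str:
    """Roughly emulate the nested RegExp blocks: one <entity> per match."""
    entities = "".join(
        f"<entity><artist>{m.group(3)}</artist><title>{m.group(2)}</title>"
        f"<url>{m.group(1)}</url></entity>"
        for m in ALBUM_RE.finditer(page)
    )
    return f"<results>{entities}</results>"


if __name__ == "__main__":
    print(album_search_results(SAMPLE))
```

If the real page uses Windows line endings, `(.[^\n]*)` would also capture the trailing `\r` into the title.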

Here is the debug log:


Code:
14:22:09 T:3828 M:232128512   DEBUG: thread start, auto delete: 0
14:22:09 T:3828 M:232022016   DEBUG: FileCurl::Open(0012D364) http://music.daum.net/search/album.do?query=Gee%20%28The%20First%20Mini%20Album%29
14:22:09 T:3828 M:232017920    INFO: XCURL::DllLibCurlGlobal::easy_aquire - Created session to http://music.daum.net
14:22:11 T:3828 M:231038976   DEBUG: FileCurl::Close(0012D364) http://music.daum.net/search/album.do?query=Gee%20%28The%20First%20Mini%20Album%29
14:22:11 T:3828 M:230985728   DEBUG: Thread 3828 terminating
14:22:11 T:3372 M:231096320    INFO: Loading skin file: DialogOK.xml
14:22:11 T:3372 M:231092224   DEBUG: Load DialogOK.xml: 2.17ms


Once again, I can get search results in Firefox with the URL from the debug log.

Since the equivalent movie scraper (daum.xml) for the same site works OK, I don't know why the same site, with a similar search URL, doesn't work in the music scraper.

http://music.daum.net/search/album.d...ini%20Album%29


If you need anything else to solve the problem, please let me know.

Here is a link to the scraper and a sample.
download

Last edited by kimp93; 2009-05-15 at 21:58. Reason: add scraper and sample

All times are GMT +2.

Copyright © 2008, XBMC Project