XBMC Community Forum  

2009-02-27, 11:58   #1
kimp93
Aeon Group

Join Date: Mar 2004
Posts: 111
Help on Unicode string with white spaces

I have a problem matching a Unicode (Korean) string that is surrounded by lots of tabs and spaces.


Code:
<strong>등급</strong></dt>
<dd>
				
																																		 청소년관람불가(한국)			</dd>
What I am trying to get is the text between <dd> and </dd>.

Code:
<RegExp input="$$7" output="&lt;mpaa&gt;\1&lt;/mpaa&gt;" dest="8+">
     <RegExp input="$$1" output="\1" dest="7">
               <expression noclean="1">&lt;strong&gt;등급&lt;/strong&gt;&lt;/dt&gt;[^&gt;]*&gt;(.[^&lt;]*)&lt;/dd&gt;</expression>
     </RegExp>
     <expression trim="1"></expression>
</RegExp>
With this, I can get whatever is between <dd> and </dd>.
The problem is that I cannot get rid of the white space around the words.
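
For illustration, the behaviour can be reproduced outside XBMC. The sketch below uses Python's `re` module as a stand-in for the scraper's regex engine, which is an assumption, and applies the same expression with the XML entities decoded. The names `HTML` and `PATTERN` are made up for the example, the run of tabs is shortened, and `re.DOTALL` is assumed so that `.` can match the newline right after <dd>.

```python
import re

# Sample markup from the post; the exact run of tabs is shortened here.
HTML = "<strong>등급</strong></dt>\n<dd>\n\t\t\t\t\n\t\t\t\t\t 청소년관람불가(한국)\t\t\t</dd>"

# Same expression as in the scraper, with &lt; and &gt; decoded.
# re.DOTALL is an assumption so that "." matches the newline after <dd>.
PATTERN = re.compile(r"<strong>등급</strong></dt>[^>]*>(.[^<]*)</dd>", re.DOTALL)

match = PATTERN.search(HTML)
captured = match.group(1)

# repr() makes the captured tabs and newlines visible.
print(repr(captured))
```

The capture group takes everything up to the next `<`, so the newlines, tabs and spaces on both sides of the Korean text end up in `\1`.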

I tried removing "noclean", adding "trim", and using \s and \t, none of which helped.
If I use \b, it drops the whole string. The regex engine does not seem to support \p. I looked at the PCRE documentation, which says that \p support is optional.
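
One possible way to drop the padding is to keep the white space outside the capture group. The sketch below shows the idea with Python's `re` module as a stand-in for the scraper's engine. This is an assumption, and it is not necessarily how the problem was solved in the end. The names `HTML`, `TRIMMED` and `extract_rating` are hypothetical, and the run of tabs in the sample is shortened.

```python
import re

# Sample markup from the post; the exact run of tabs is shortened here.
HTML = "<strong>등급</strong></dt>\n<dd>\n\t\t\t\t\n\t\t\t\t\t 청소년관람불가(한국)\t\t\t</dd>"

# The \s* before the group consumes the leading newlines, tabs and spaces.
# The lazy [^<]*? stops as early as possible, so the trailing white space
# is left for the second \s* instead of being captured.
TRIMMED = re.compile(r"<strong>등급</strong></dt>\s*<dd>\s*([^<]*?)\s*</dd>")


def extract_rating(html):
    """Return the text inside <dd>...</dd> without surrounding white space, or None."""
    match = TRIMMED.search(html)
    return match.group(1) if match else None


print(extract_rating(HTML))
```

White space inside the captured text is kept, only the padding on both sides is dropped. Whether the scraper's own engine treats `\s` the same way is not confirmed by this thread.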

Please guide me on this.
2009-03-06, 00:30   #2
kimp93


Never mind. I solved the problem.
All times are GMT +2.