| Scraper Development Developers forum for meta data scrapers. Scraper developers only! Not for posting feature requests, bugs, or end-user support requests! |
#1 |
Aeon Group
Join Date: Mar 2004
Posts: 111
I have a problem matching a Unicode (Korean) string that is surrounded by lots of tabs and spaces.
Code:
<strong>등급</strong></dt> <dd> 청소년관람불가(한국) </dd>
Code:
<RegExp input="$$7" output="<mpaa>\1</mpaa>" dest="8+">
  <RegExp input="$$1" output="\1" dest="7">
    <expression noclean="1"><strong>등급</strong></dt>[^>]*>(.[^<]*)</dd></expression>
  </RegExp>
  <expression trim="1"></expression>
</RegExp>
The problem is that I cannot get rid of the whitespace around the words. I tried removing "noclean", adding "trim", and using \s and \t, none of which helps. If I use \b, it gets rid of the whole string. The regex engine does not seem to support \p. I looked at the PCRE documentation, which says that \p support is optional. Please guide me on this.
#2 |
Aeon Group
Join Date: Mar 2004
Posts: 111
Never mind. I solved the problem.
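
The thread does not say what the fix was. As an illustration only, here is a minimal sketch of one common way to drop whitespace around a captured value: put `\s*` outside the capture group and make the group lazy. It uses Python's `re` module as a stand-in, not the scraper's own regex engine, so whether the same pattern behaves identically inside an `<expression>` is an assumption. The sample input's tabs and newlines and the helper name `extract_rating` are also hypothetical.

```python
import re

# Sample input modeled on the HTML in the first post. The exact tabs and
# newlines on the real page are unknown, so these are an assumption.
html = "<strong>등급</strong></dt>\n\t\t<dd>\n\t\t\t청소년관람불가(한국)\n\t\t</dd>"

# \s* outside the group consumes the leading and trailing whitespace.
# The lazy (.+?) stops as soon as the rest of the pattern can match,
# so the trailing whitespace run is left out of the capture.
pattern = re.compile(
    r"<strong>등급</strong></dt>\s*<dd>\s*(.+?)\s*</dd>",
    re.DOTALL,
)


def extract_rating(page):
    """Return the rating text without surrounding whitespace, or None."""
    match = pattern.search(page)
    return match.group(1) if match else None


print(extract_rating(html))
```

The original expression `[^>]*>(.[^<]*)` captures everything up to the next `<`, which includes the whitespace on both sides of the text. Moving that whitespace outside the group avoids needing a separate trim step.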