XBMC Community Forum  

2009-02-11, 22:43   #1
Maxim
Developing a Regex for Anime TV Series

I thought this might be the best place to post this question.

I'm looking to develop a regex, or a few, for the anime TV series I have. I plan on doing one for movies and OVAs too, but that's for later.

Here is what I have now.
Code:
[-_ .]([0-9]+)[-_ \[\].v(]
I have excluded the "Season" part and am just focusing on the file name for now. As far as I understand, this expression will match almost all of the files I have, but a few files are giving me trouble. I'm also not sure I'm writing the expression properly, since the overall match includes the character to the left and to the right of the number I actually want to capture. Here is a list of files that are giving me issues:
Code:
[Bleach-Society]Bleach_-_73-74[Xvid][C03A425E].avi
[Lunar] Bleach - 52-53 [B937F496].avi
[Lunar] Bleach - 67 v2 [A1C97A64].avi
[Lunar] Bleach - 68-69 [C23724B5].avi
[m.3.3.w] Chaos Head - 01v2 (H.264) [094A3E22].mkv
[a4e]Get_Backers_20[divx5.2.1].mkv
[a4e]Get_Backers_21v2[h.264].mkv
[a4e]Mahoromatic_Summer_Special[divx5.1.1].mkv
[B-G_&_w.0.0.f]_Shigofumi_Opening.DVD(H.264_DD2.0)_[91FDE4D2].mkv
[Exiled-Destiny]_Wolfs_Rain_Ep01_(6F7967EA).mkv
To test the expression I've been using this site: http://regexpal.com/

The issues I'm having are: double episodes, where it doesn't pick up the second episode; release group names abbreviated with characters similar to those around the episode numbers; codec names that look like episode numbers; and the "Ep" that appears in front of the episode number in the Wolf's Rain file.
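
The problems listed above can be reproduced outside XBMC. The following is a minimal Python sketch, assuming Python's `re` module treats this pattern the same way the online tester does (XBMC's own engine may differ). The helper name `episode_candidates` is made up for illustration.

```python
import re

# First attempt from the post. The parentheses capture only the digits,
# while the delimiters on either side are still part of the overall match.
PATTERN = re.compile(r"[-_ .]([0-9]+)[-_ \[\].v(]")

def episode_candidates(filename):
    """Return every group-1 capture found in the filename (hypothetical helper)."""
    return PATTERN.findall(filename)

samples = [
    "[Lunar] Bleach - 52-53 [B937F496].avi",               # double episode
    "[m.3.3.w] Chaos Head - 01v2 (H.264) [094A3E22].mkv",  # group name resembles an episode
    "[a4e]Get_Backers_20[divx5.2.1].mkv",                  # codec version resembles an episode
    "[Exiled-Destiny]_Wolfs_Rain_Ep01_(6F7967EA).mkv",     # "Ep" prefix
]

for name in samples:
    print(episode_candidates(name), name)

# group(0) is the whole match including delimiters, group(1) only the digits.
first = PATTERN.search(samples[0])
print(repr(first.group(0)), repr(first.group(1)))
```

Reading `group(1)` rather than `group(0)` is what keeps the neighbouring characters out of the result, so matching them is not a mistake in itself.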

I'm very new to regular expressions, so please forgive my ignorance.
2009-02-12, 20:44   #2
Maxim

Ok. This string here:
Code:
[-_ p.](\d{2})[-_ (v.\[]
gets almost all of the episodes except for the second episode in these two:
Code:
[Bleach-Society]Bleach_-_73-74[Xvid][C03A425E].avi
[Lunar] Bleach - 52-53 [B937F496].avi
It's not able to get the second number because the leading part "[-_ p.]" has nothing left to match: the "-" in front of the second episode number was already consumed as the trailing delimiter of the first match. I'm not sure where to go from here.
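
The consumed-delimiter problem can be seen directly in Python. This is only a sketch: the lookahead variant at the end is a suggestion that does not appear in this thread, and whether XBMC's regex engine accepts lookahead was not checked.

```python
import re

name = "[Bleach-Society]Bleach_-_73-74[Xvid][C03A425E].avi"

# Pattern from the post: the trailing class consumes the "-" after 73,
# so nothing is left to serve as the leading delimiter for 74.
consuming = re.compile(r"[-_ p.](\d{2})[-_ (v.\[]")
m = consuming.search(name)
print(repr(m.group(0)), consuming.findall(name))

# Assumed alternative (not from the thread): a lookahead checks the
# trailing delimiter without consuming it, so the same character can
# also start the next match.
lookahead = re.compile(r"[-_ p.](\d{2})(?=[-_ (v.\[])")
print(lookahead.findall(name))
```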
2009-02-12, 21:35   #3
Maxim

Oh man. Totally got it. So exciting.

Here is the string in all its glory:
Code:
[-_ p.](\d{2})[-_ (v.\[](\d{2})?
This regexp is able to find the episode numbers in 930 files from various release groups and several varying filename formats. I must say that Regex is simply awesome.
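
A quick way to check the expression against the problem files from the first post is a small Python loop. This is a sketch that assumes Python's `re` treats the pattern the same way as the online testers. It has not been run inside XBMC.

```python
import re

EPISODE = re.compile(r"[-_ p.](\d{2})[-_ (v.\[](\d{2})?")

files = [
    "[Bleach-Society]Bleach_-_73-74[Xvid][C03A425E].avi",
    "[Lunar] Bleach - 67 v2 [A1C97A64].avi",
    "[m.3.3.w] Chaos Head - 01v2 (H.264) [094A3E22].mkv",
    "[a4e]Get_Backers_21v2[h.264].mkv",
    "[a4e]Mahoromatic_Summer_Special[divx5.1.1].mkv",
    "[Exiled-Destiny]_Wolfs_Rain_Ep01_(6F7967EA).mkv",
]

for name in files:
    m = EPISODE.search(name)
    # group(1) is the first (or only) episode; group(2) stays None unless
    # a second two-digit number follows the delimiter immediately.
    print(m.groups() if m else None, name)
```

Requiring exactly two digits is what stops the single-digit pieces of group names and codec versions from matching. It also means a three-digit episode number would not be read as a whole.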

Last edited by Maxim; 2009-02-12 at 21:38.
2009-02-12, 22:28   #4
althekiller
Team-XBMC Developer

That last expression group seems to be ignored? You probably want (?:expr) in that case.
2009-02-12, 22:47   #5
Maxim

Well, I haven't tested it in XBMC yet. I was using this webpage to act as my test bed:

http://www.fileformat.info/tool/regex.htm

It was able to isolate group 1, which would be the first or only episode, and group 2, which would be picked up as the second episode in a ##-## sequence.

I'll have to do some testing when I get a chance to see if it actually works in XBMC.

I also used this page here:

http://www.gskinner.com/RegExr/

It is by far the best regular expression page I've seen.

I'm not familiar with that (?:) sequence; I'll take a look at that too. From what I understand, putting a ? after a token makes that token optional.

Last edited by Maxim; 2009-02-12 at 22:51.
2009-02-12, 22:53   #6
althekiller
Team-XBMC Developer

(?:expr) groups an expression but doesn't create an output (a captured group) for it. Have a look at the wiki for how we currently handle multiple eps.
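
The difference is easiest to see by counting groups. A minimal Python sketch, reusing the expression from post #3 (Python's `re` is used here only as a stand-in for XBMC's engine):

```python
import re

name = "[Lunar] Bleach - 52-53 [B937F496].avi"

# Same expression twice; only the last group differs.
capturing = re.compile(r"[-_ p.](\d{2})[-_ (v.\[](\d{2})?")
non_capturing = re.compile(r"[-_ p.](\d{2})[-_ (v.\[](?:\d{2})?")

# .groups on a compiled pattern is the number of capturing groups.
print(capturing.groups, capturing.search(name).groups())
print(non_capturing.groups, non_capturing.search(name).groups())
```

Both patterns match the same text. The non-capturing version simply reports one group instead of two.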
2009-02-13, 00:02   #7
Maxim

Hmm, after looking at the wiki I've come away confused.

On this page:

http://xbmc.org/wiki/?title=Advanced...howmatching.3E

It has a note regarding multi part episode files:

NOTE: for multi-episode matching to work, there needs to be a third set of parentheses on the end. This part is fed back into the regexp engine.

I wasn't really sure what that meant, but I added a second episode grouping, so that together with the Season grouping (excluded above) there are three sets of parentheses, with the second episode in the filename being the third grouping.

However, this approach is limited to two-episode files; with three-episode files it wouldn't work, or at least probably wouldn't return the same results.

Then there is this page here:

http://xbmc.org/wiki/index.php?title...-part_Episodes

It describes multi-part episodes and says that the regex will get applied multiple times, but it is not clear on whether the entire expression gets applied to the same string (the filename) again or just the episode portion. If the latter is true, how does it differentiate between the season portion and the episode portion?

The regex given in that section is this:
Code:
[-EeXx]+([0-9]+)
However, testing that expression against the examples given on that page gives unexpected results: the 201 is picked up as an episode, whereas it should be a season. Then again, that expression has no season part, just an episode part.
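
One possible reading of the wiki note, sketched in Python: the main expression captures season, episode, and the rest of the filename in a third group, and the episode-only expression is then run over that remainder. Everything here other than `[-EeXx]+([0-9]+)` is an assumption for illustration. The main pattern, the sample filename, and the helper `split_episodes` are hypothetical, and none of this is taken from XBMC's source.

```python
import re

# Hypothetical main pattern: season, first episode, then the remainder.
MAIN = re.compile(r"[Ss]([0-9]+)[Ee]([0-9]+)([^\\/]*)$")
# Multi-part expression quoted from the wiki page.
MULTI = re.compile(r"[-EeXx]+([0-9]+)")

def split_episodes(filename):
    """Return (season, [episodes]) or None (hypothetical helper)."""
    m = MAIN.search(filename)
    if m is None:
        return None
    season, first, remainder = m.groups()
    # Only the remainder is rescanned, so the season digits are never
    # offered to the episode-only expression.
    extra = MULTI.findall(remainder)
    return int(season), [int(first)] + [int(e) for e in extra]

print(split_episodes("show.s02e01-e02-e03.avi"))
```

Under this reading the season is only ever taken from the first pass, which would explain why the multi-part expression has no season part of its own.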

All in all I'm pretty confused over the matter, but I really won't be able to do anything until I get home and poke around in advancedsettings and the debug log.
2009-02-13, 01:26   #8
Maxim

It seems that it's not picking up the regex at all.

I have a DEBUG log here:

http://pastebin.com/m7bb94dff

I can see in the log that it loads advancedsettings.xml properly. However, when it gathers the files it doesn't say that it's checking them against the regex, as I've seen in log files in other regex threads. An example of what appears to be missing from my log file:

DEBUG: running expression \[[Ss]([0-9]+)\]_\[[Ee]([0-9]+)\]?([^\\/]*)$ on label m:\tv\30 rock\season 01\1 - pilot.avi

Here are the contents of my advancedsettings.xml:
Code:
<advancedsettings>
  <loglevel>3</loglevel>

  <tvshowmatching>
    <regexp>Season ([0-9]+)[\\/][-_ p.](\d{2})[-_ (v.\[](\d{2})?[^\\/]*</regexp>
  </tvshowmatching>
</advancedsettings>

Last edited by Maxim; 2009-02-13 at 01:40.
2009-02-15, 17:32   #9
Maxim

It seems my regex was yet again malformed. The latest incarnation which works with XBMC is:
Code:
Season ([0-9]{2}).*[\\/].*[-_ p.]([0-9]{2})[-_ (v.\[]([0-9]{2})?[^\]\\/]*
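
Comparing the two expressions in Python suggests one reason the earlier one could fail even when it is loaded: it requires the episode delimiter immediately after the path separator, while these filenames start with a release-group tag. The path below is hypothetical (only the filename comes from this thread), and Python's `re` is only a stand-in for XBMC's engine.

```python
import re

# Hypothetical full path; only the filename is taken from the thread.
path = r"M:\Anime\Bleach\Season 03\[Lunar] Bleach - 52-53 [B937F496].avi"

# Expression from the advancedsettings.xml in post #8.
earlier = re.compile(
    r"Season ([0-9]+)[\\/][-_ p.](\d{2})[-_ (v.\[](\d{2})?[^\\/]*"
)
# Expression from post #9, with .* on both sides of the path separator.
latest = re.compile(
    r"Season ([0-9]{2}).*[\\/].*[-_ p.]([0-9]{2})[-_ (v.\[]([0-9]{2})?[^\]\\/]*"
)

print(earlier.search(path))
m = latest.search(path)
print(m.groups())
```

In Python the greedy `.*` settles on the last candidate in the filename, so for this double-episode file the second group comes back as 53 and the third group is empty. Whether XBMC resolves it the same way is not something this sketch can confirm.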

Last edited by Maxim; 2009-02-15 at 20:53.