TheTVDB.com Scraper absolute_number for Anime?


acaranta
2009-01-30, 18:29
Hi everyone ...


As I posted elsewhere on this forum, I tried to code a scraper for anidb.net, but I couldn't find an easy way to do it...

But I found that TheTVDB had nearly everything I needed for my anime... except that, for example, Naruto is split into around 10 seasons, whereas my files are numbered as a single season of 220 episodes...

I found (and read here) that TVDB provides an "absolute_number" field, which fits my needs nicely...

So I modified thetvdb.xml to add a parameter "Force Single Season (use absolute_number)", which tells XBMC that all episodes gathered from TVDB are in season 1 and numbers them by absolute_number...

And it works fine ;)

Shall I post it here, or give it to someone to commit the changes after testing?

Or anything else ? ;)

PS: I just need some final changes so that single-season anime still work normally when the "Force Single Season" switch is active, but that's just for lazy people like me :p

https://dav.caranta.com/public/forcesingleseasontvdb.jpg

NB: Sorry for the certificate warning if you get one for the picture ;)

acaranta
2009-01-30, 18:32
Of course this means your anime episodes should be named "S01Exx" ;)

But that is really easy! Or at least easier than renaming everything to match every season on TVDB.

spiff
2009-01-30, 18:32
trac please. and this is awesome :)

the option should be named 'Use Absolute Ordering' though.

also it should be fairly simple to hack episode-only expressions in there (to avoid the renaming)

acaranta
2009-01-30, 18:34
Please pardon my ignorance... what's the URL of the XBMC trac? ;)

spiff
2009-01-30, 18:34
cleverly hidden as

xbmc.org/trac

;P

acaranta
2009-01-30, 18:36
ok, found the URL :p so dumb on a Friday evening ;)

acaranta
2009-01-30, 18:40
Wow... err... I'll look into how to add it asap... I have to go to a concert right now, but it should be available before tomorrow (as soon as I know how to use trac... I use Subversion a lot, but have never really used trac :p)

spiff
2009-01-30, 18:50
basically CVideoInfoScanner::EnumerateSeriesFolder

you need a new set of expressions that is to be run if all others fail
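To make this suggestion concrete, here is a rough C++ sketch of the idea: try the usual season+episode expressions first, and only if none of them match, fall back to episode-only expressions and file the result under season 1. This is not XBMC's actual CVideoInfoScanner::EnumerateSeriesFolder code; the function name matchFilename, the EpisodeMatch struct and the example expressions are assumptions made up for illustration.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical result of matching one filename.
struct EpisodeMatch {
  int season;
  int episode;
};

// Sketch: season+episode expressions first, episode-only expressions as a fallback.
// The expression lists are illustrative, not XBMC's defaults.
std::optional<EpisodeMatch> matchFilename(const std::string& name) {
  // Group 1 = season, group 2 = episode.
  static const std::vector<std::regex> seasonEpisode = {
      std::regex(R"([Ss]([0-9]+)[Ee]([0-9]+))"),
      std::regex(R"(([0-9]+)x([0-9]+))"),
  };
  // Group 1 = episode only; these run only if everything above failed.
  static const std::vector<std::regex> episodeOnly = {
      std::regex(R"(Ep([0-9]+))"),
  };

  std::smatch m;
  for (const auto& re : seasonEpisode)
    if (std::regex_search(name, m, re))
      return EpisodeMatch{std::stoi(m[1].str()), std::stoi(m[2].str())};

  for (const auto& re : episodeOnly)
    if (std::regex_search(name, m, re))
      return EpisodeMatch{1, std::stoi(m[1].str())};  // assumed season for episode-only names

  return std::nullopt;  // no expression matched
}
```

With this ordering, "Naruto 0x04 - Movie 1.avi" still lands in season 0, while "Bleach Ep123.avi" only matches the fallback and becomes season 1, episode 123.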

acaranta
2009-01-31, 03:56
Ok... Patch submitted to trac:
http://xbmc.org/trac/ticket/5804

I merged in the changes from revisions I had missed
I changed the option name to "Use Absolute Numbering"
And commented the code within the XML.

;)

spiff
2009-01-31, 14:11
r17482

goku31640
2009-02-02, 21:05
Quick question: how does this affect "Specials" episodes that are named S0Exx? Those do not have absolute numbers, but are in the same folder.

spiff
2009-02-02, 21:29
afaik nothing changes wrt specials...

goku31640
2009-02-03, 01:48
I just checked: special episodes are ignored, so when adding by absolute number, season 0 episodes are not added to the library. Is there any way to add them?

acaranta
2009-02-03, 07:57
Well... this is the next part of "the plan" :p

When in absolute_number mode, I want episodes without an absolute_number to be returned with their usual season numbers instead of being ignored.

Which should mean both S0Exx and S01Exxx are available...

I'll keep you posted asap!

goku31640
2009-02-03, 08:05
Sweet. I don't know if you read my comment before you posted that, but if you did then... alright, I contributed a little to XBMC :)

If not then :(

But thanks for the great work; absolute numbering was a much-needed addition to the scraper (I'm surprised it wasn't in there before)

sstarcher
2009-02-05, 16:51
Great idea, I did not think of going that route. Over the last month I have been working on a script to rename TV series using TheTVDB. It is working out very well, but there are still a few cases it does not catch.

What problems did you have with anidb.net? I was considering looking into writing a scraper for it as well as using it as the source for my renaming program.

Maxim
2009-02-05, 17:30
Keep in mind also that renaming files is often not an option for some users. Torrents require the file names in a release to stay exactly as they are. Also, some torrent sites require file names to stay true to the originals released by the group. I wouldn't mind "renaming" the entry that is shown by XBMC, but I would definitely need to either be able to select multiple episodes at once, or select a folder and apply the change recursively.

I would use this; it would also be nice to create thumbnails for each episode at the same time.

sstarcher
2009-02-05, 19:15
I can understand that some users may not want to rename the files. I also think any torrent group "requiring" torrent names to stay the same is laughable. Once I have the file downloaded I will do as I please and they have no clue what's going on.

If it were possible to give XBMC this information through a property setting or some other means, that would be a great idea and easy to implement.

Sorry, I did not mean to take this thread away from its original intent, but to finish answering your question: the project I'm working on currently looks into a folder, parses data out of each file name and then looks that data up. Anything named like [group] Series name # (info).avi, as long as the series name is followed by a number, is detected automatically, renamed, and moved to a folder named after the series. For anything in another format, say... let me find a real example... "[Eclipse] Code Geass - Lelouch of the Rebellion R2 - 01 (XviD) [43EEA862].avi", the program brings you to an interface where it parses out and separates all the data and presents it as a list, like this:
Eclipse
Code
Geass
... etc.

You would then select Code Geass from the list, and it would rename all episodes in the same format as that episode and grab the episode numbers. So if I have the entire season of Code Geass, I only have to tell the parser once which words make up the name, unless of course each episode came from a different source and the name formatting is different.

Now to return you to your scheduled program.

I love the automated absolute numbering idea.

acaranta
2009-02-07, 02:56
Right... to follow up on what I said previously...

I modified the tvdb.xml scraper 'again' ;)

Previously, my changes to use absolute_numbers with a single season (1) ignored season 00 (Specials) and the other seasons...

With tonight's changes, when the "Absolute_Numbering" option is checked:
-All episodes with an 'absolute_number' are returned to XBMC as S01Exxx
-AND episodes WITHOUT an 'absolute_number' are also returned, as SxxxEyyy (which allows single-season series and specials to be tagged too)
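A minimal C++ sketch of the mapping described above, assuming a hypothetical GuideEntry struct in which a missing absolute_number is an empty std::optional. The real logic lives in the tvdb.xml scraper's regular expressions, not in C++; this only illustrates which season/episode key each guide entry would end up with.

```cpp
#include <optional>
#include <vector>

// Hypothetical guide entry; absoluteNumber is empty when TVDB has no absolute_number.
struct GuideEntry {
  int season;
  int episode;
  std::optional<int> absoluteNumber;
};

struct EpisodeKey {
  int season;
  int episode;
};

// Episodes with an absolute number become S01E<absolute>;
// the rest keep their normal SxxEyy key, so specials stay in season 0.
std::vector<EpisodeKey> absoluteModeKeys(const std::vector<GuideEntry>& guide) {
  std::vector<EpisodeKey> keys;
  keys.reserve(guide.size());
  for (const auto& e : guide) {
    if (e.absoluteNumber)
      keys.push_back({1, *e.absoluteNumber});
    else
      keys.push_back({e.season, e.episode});
  }
  return keys;
}
```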

The patch ticket has been created ;)

Now to answer things that were said...
*Concerning an anidb.net scraper... I have a fully working scraper at home... BUT they do not allow scraping their website, so you can use it, but after a certain number of accesses you get banned for 24h, which I wouldn't call practical for every user. Moreover, AniDB does not provide an easy API... from my point of view, I would need to code an XML-API-to-AniDB-UDP-API bridge... but I would be the only entry point, therefore a nice SPOF, and under the AniDB rules (once again) I would be banned every day... :( Finally, AniDB does not provide the marvellous icons, banners, posters and fanart that TVDB provides.

*The problem of renaming files... well, I got over it using the magical "rename" command under Linux and its wonderful regexp support ;) Renaming all my Naruto files took me about 30 seconds... and that is priceless ;)
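For anyone without the Linux rename tool, the same kind of regexp rename can be sketched in C++. This assumes files named like "Naruto - 050.avi" (a hypothetical pattern; adjust it to your own release naming) and only computes the new name; the file itself could then be moved with std::filesystem::rename.

```cpp
#include <regex>
#include <string>

// Turn "Name - 050.ext" into "Name S01E050.ext", the single-season form the
// absolute-numbering scraper expects. Names that don't match are returned unchanged.
std::string toAbsoluteName(const std::string& name) {
  // Group 1 = series name, group 2 = absolute episode number, group 3 = extension.
  static const std::regex pattern(R"(^(.*?) - ([0-9]+)(\.[A-Za-z0-9]+)$)");
  return std::regex_replace(name, pattern, "$1 S01E$2$3");
}
```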

*I'm glad my changes can be useful to people... This is actually my first commit to an open source project... and it feels great ;)

acaranta
2009-02-07, 03:04
For example... the changes allow the Cowboy Bebop anime to be tagged nicely even with the "absolute_numbering" option active ;)

I actually wonder whether this option is really needed... what happens if a scraper returns the same episode several times?

What would happen if absolute-numbered episodes AND normal episodes were returned all the time?

I guess (ok, it's 2am here in France) it shouldn't be a problem... I have to test this ;)
It would be really classy to have such an automatic thing! no?

acaranta
2009-02-07, 03:07
Sweet. I don't know if you read my comment before you posted that, but if you did then... alright, I contributed a little to XBMC :)

If not then :(

But thanks for the great work; absolute numbering was a much-needed addition to the scraper (I'm surprised it wasn't in there before)

Sorry... I was planning this after releasing the first absolute_numbering system ;) I felt bad having to check/uncheck the option for some anime that had only one season and were not tagged with absolute_numbers.

But hey... I'm really glad to see that I'm not the only one thinking this way :D

goku31640
2009-02-07, 04:41
I love this addition. A couple of my anime shows did not have absolute numbers on TVDB, but they make it so easy to add them that I was up and running in about 10 minutes. I was able to add all of the absolute numbers to Initial D in about 5 minutes, and GTO in about 3 minutes.

acaranta
2009-02-07, 12:24
By the way, do you know if there is an easier way to add absolute_number than clicking every episode via the web interface?

For Dragon Ball, for instance, that will take incredibly long... ;)

goku31640
2009-02-07, 19:30
No, I don't. The reason it's a little bit easier for me is that my mouse has auto scroll and click, so I set it to automatically scroll down the page and click in a certain spot; all I have to do is click, enter the number, click, enter the number, etc.

I have Dragon Ball and Dragon Ball Z, but I decided to leave those as is, because first of all there are way too many episodes, and secondly I believe the Japanese version has more episodes (I could be wrong though).

goku31640
2009-02-07, 19:32
If you really want to, there are 15 seasons. I could knock out about 3 seasons, maybe you could do a few, and if someone else lends a hand we could get them all done a lot quicker :)

goku31640
2009-02-13, 20:11
Any update on this at all?

acaranta
2009-02-14, 20:21
Well, the patch was submitted last week... but the team, I guess, has a social life... so it will be applied one day ;)

http://xbmc.org/trac/ticket/5860

You just have to wait, or try the scraper attached to the patch ticket.

goku31640
2009-02-15, 03:36
I just downloaded the tvdb.xml, but it's not working the way it should for me. It works exactly like before and still doesn't get my season 0 episodes.

They are named
Naruto 0x04 - It's the Snow Princess Ninja Arts' Book! (Movie 1)
etc.
Any ideas?

acaranta
2009-02-15, 12:04
Ah, I had this problem too on my "production" XBMC, while I wasn't getting this error on the "development" XBMC...

The only thing I can think of is that I internally renamed the variable used to switch between absolute and normal numbering...

I managed to fix it by re-scanning the whole source.

However, it might lose some info, such as which episodes were watched... but I'm not sure about this.

tripe
2009-02-19, 05:48
How difficult would it be to alter absolute numbering so that it simply used the absolute number to find the episode, and then put the actual season numbers etc. into the library? This would save scrolling in some of the longer series.
I'm willing to figure it out if it's actually possible.

Maxim
2009-02-19, 17:32
I was curious about this idea myself. Theoretically it should be possible, since the information in the library is arbitrary and based on nothing but the information that is put into it. It doesn't really have anything to do with the filename of the episode itself. I'll keep an eye on this thread in case this gets implemented.

acaranta
2009-02-21, 08:22
I'm not sure I understand what you're saying... anyway, what you could try with my latest version of the tvdb scraper (which has been committed to svn ;) ) is to activate absolute numbering...

In this mode, the scraper should return episodes both as S01E<absolutenumber> and as SxxExx with their normal seasons... so it *should* always work, absolute or not

But I may not have understood your request :p

sho
2009-02-21, 12:00
I think they mean they want to skip the season identifier altogether.
If I understand things correctly that would require an XBMC code change and no developer has sprung out of the woodwork to volunteer to do that.

goku31640
2009-02-22, 08:26
I'm not sure, but I think what they are trying to say is that it would be cool if you could turn on absolute numbering and name all your episodes s01e54, s01e192, etc., but they would still show up in the library as seasons 1, 2, 3, 4, etc.

The filenames are in absolute format, but once entered into the library, they show up as they normally would without absolute ordering.

Am I making sense?

tripe
2009-02-22, 21:43
^^ Basically I meant what he said.

acaranta
2009-02-23, 08:19
OK ;)

got it... but yes, it would need changes in the XBMC code... and I'm not really good enough to do such a thing :p

WorldWide01
2009-03-22, 20:00
When I do a scan, all of my episodes seem to come back as specials. The number appears in the filename as " - 050 -", for example. Can you point me in the right direction?

Update:
Okay, I think I see what's going on now. Please correct me if I'm wrong in any of these statements:

1) You're naming the files in the format "S01Exxx", where "xxx" is the absolute number
2) Using the same format, you're designating specials as season "0" in the format S0Exxx (with "xxx" being the special's number)

If this is correct, I see why some people are raising concerns. You can't really rename the files if they're from a torrent and you'd like to keep seeding to share with others. For example, most people who downloaded DBZ probably got it from the old [AHQ] release, which numbers its episodes as I described above.

I've only been using XBMC since Thursday, so please forgive my lack of knowledge. So are you saying that XBMC itself handles parsing the filenames to determine a number, and then the scraper takes that and uses it against the database? I guess where I'm a little lost is what the scraper handles and what XBMC does.

To the leech who scoffed at leaving the names alone for the sake of a torrent: I guess I'm crazy for wanting to keep seeding so others can enjoy the same file set? If nobody were seeding, you never would have gotten yours in the first place...

spiff
2009-03-22, 20:54
xbmc enumerates a series. the scraper returns an episode guide (i.e. a map with season and episode numbers as key, url as data). we match those, and run the scraper on the urls in question. there is absolutely no problem handling this if only you are willing to do some stuff yourself.
stick the files in a numbered season folder (season 1). use a regexp to grab the season number from the folder name and the episode number from the files. voila
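The matching step described here can be sketched roughly as follows: the episode guide is a map keyed by (season, episode) with the episode URL as the value, and each file found on disk is looked up by the key its filename produced. The type and function names are made up for illustration and are not XBMC's internals. It also hints at why specials went missing earlier in the thread: presumably a file whose key is not in the guide simply finds no match.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// (season, episode) -> URL of the episode details, as in the description above.
using EpisodeKey = std::pair<int, int>;
using EpisodeGuide = std::map<EpisodeKey, std::string>;

// A file found while enumerating the series, with the key its filename produced.
struct ScannedFile {
  std::string path;
  EpisodeKey key;
};

// Return (file path, URL) for every file whose key exists in the guide.
// Files without a guide entry are skipped and never get scraped.
std::vector<std::pair<std::string, std::string>> matchToGuide(
    const EpisodeGuide& guide, const std::vector<ScannedFile>& files) {
  std::vector<std::pair<std::string, std::string>> matched;
  for (const auto& f : files) {
    auto it = guide.find(f.key);
    if (it != guide.end())
      matched.emplace_back(f.path, it->second);
  }
  return matched;
}
```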

WorldWide01
2009-03-24, 03:37
I think I get what the issue is with what I'm trying to do. I'm trying to have TheTVDB actually work out the seasons and episodes. This is difficult (impossible?) because that's done by XBMC before the query is ever sent. Is there any way to do a reverse lookup during the scrape, or possibly a second pass? Is the following even possible?

XBMC enumerates the files -> scrape is done -> data is examined -> re-enumerated by what was returned from the scrape.

I think that's the only way to leave the absolute numbers in the file names and still sort them into manageable seasons (DBZ, I blame you) without renaming any files.

spiff
2009-03-24, 11:10
of course it's possible. totally unwanted ofc, but possible. also it doesn't make sense at all; why would we need to fetch anything?

WorldWide01
2009-03-24, 17:30
LOL! Okay. It's obvious I'm not making any sense, so scrap what I've said and let me know how you'd solve the following:

Without altering the filenames or separating them out into different folders, I'd like to have the anime series split into seasons as according to theTVDB.com.

The query to TheTVDB would be by absolute episode number (177, for example). Once it finds the episode, it also has the season and corresponding episode number. I'd then like to use the season and episode number returned by the scrape to sort the series into my library.

Does that make any more sense? If so, would you mind giving me the junior dev treatment and laying out the process for me?


Thanks for your continued responsiveness.

spiff
2009-03-24, 17:42
hi,

what is needed is a change in the enumeration code itself. currently, as i said, it uses a map with a (season, episode number) pair as key. what would be needed is

1) a set of regexps to enumerate episode-only filenames
2) code in the enumerator that uses these regexps and sets the season number to some arbitrary value (1 being the logical choice)
3) code in the scraper to return the episode guide using absolute numbering, with the season set to the one chosen above
4) backend support for translation - currently there is none. if you request absolute numbering you only get those returned iirc.
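Step 4, the translation that doesn't exist yet, could look roughly like this: build an index from absolute number to (season, episode) out of the episode guide, then look up the absolute number parsed from the filename. This is only a sketch under the assumption that every guide entry carries an absolute number; GuideEntry, buildAbsoluteIndex and toSeasonEpisode are hypothetical names.

```cpp
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical guide entry carrying both numbering schemes from TVDB.
struct GuideEntry {
  int season;
  int episode;
  int absoluteNumber;  // assumed present for every entry in this sketch
};

using AbsoluteIndex = std::map<int, std::pair<int, int>>;

// absolute number -> (season, episode)
AbsoluteIndex buildAbsoluteIndex(const std::vector<GuideEntry>& guide) {
  AbsoluteIndex index;
  for (const auto& e : guide)
    index.emplace(e.absoluteNumber, std::make_pair(e.season, e.episode));
  return index;
}

// Season/episode under which the library would file an absolute number,
// or empty if the guide has no such episode.
std::optional<std::pair<int, int>> toSeasonEpisode(const AbsoluteIndex& index, int absolute) {
  auto it = index.find(absolute);
  if (it == index.end())
    return std::nullopt;
  return it->second;
}
```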

edward.81
2009-05-03, 17:39
I can't get absolute numbering to work.
I'm trying to scrape Bleach.
In the log the episode number is detected correctly [series 01, episode 001] (s01e001),

but XBMC detects it as special episode 01.
I tried putting 2 files in the folder with the following names:
xxx_s01e001.avi
xxx_s00e001.avi
When I scan, I see the message in the top right corner saying it is searching for episodes 0x01 and 1x01, but in the library both have the title of special 01.

edward.81
2009-05-03, 20:31
Ok, I found why this happens.
The problem is the regexp in tvdb.xml (line 189)
<Episode>.*?<id>([0-9]*)</id>.*?<EpisodeName>([^<]*)</EpisodeName>.*?<absolute_number>([0-9]+)</absolute_number>.*?</Episode>

If the special episodes are at the top of the XML file downloaded from TVDB, this regexp picks up the absolute number of the next item that is not empty and ignores the empty <absolute_number></absolute_number>

This regexp works instead... for now:
<Episode>.*?<id>([0-9]*)</id>.*?<EpisodeName>([^<]*)</EpisodeName>.*?<absolute_number>((?:[^<]?)+)

Encoded version:

&lt;Episode&gt;.*?&lt;id&gt;([0-9]*)&lt;/id&gt;.*?&lt;EpisodeName&gt;([^&lt;]*)&lt;/EpisodeName&gt;.*?&lt;absolute_number&gt;((?:[^&lt;]?)+)
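To see why the original expression misbehaves, here is a small C++ std::regex reproduction on a made-up, single-line two-episode fragment (not real TVDB output). ([0-9]+) cannot match an empty <absolute_number></absolute_number>, so the lazy .*? keeps going into the next <Episode> and takes that episode's number. The fixed version uses ([^<]*), which is equivalent to the ((?:[^<]?)+) above and easier to read. The scraper's own regexp engine may differ in details from std::regex, so treat this as an illustration of the matching logic only.

```cpp
#include <regex>
#include <string>
#include <vector>

// Captures from one match: id, EpisodeName and absolute_number.
struct ScrapedEpisode {
  std::string id;
  std::string name;
  std::string absolute;  // empty when the tag was empty
};

// Apply an expression repeatedly over the XML and collect the three groups.
std::vector<ScrapedEpisode> scrapeEpisodes(const std::string& xml, const std::regex& re) {
  std::vector<ScrapedEpisode> out;
  for (std::sregex_iterator it(xml.begin(), xml.end(), re), end; it != end; ++it) {
    const std::smatch& m = *it;
    out.push_back({m[1].str(), m[2].str(), m[3].str()});
  }
  return out;
}

// Original expression: ([0-9]+) needs at least one digit, so an empty tag is
// skipped and the match runs on into the following <Episode>.
const std::regex kOriginal(
    R"(<Episode>.*?<id>([0-9]*)</id>.*?<EpisodeName>([^<]*)</EpisodeName>)"
    R"(.*?<absolute_number>([0-9]+)</absolute_number>.*?</Episode>)");

// Fixed expression: ([^<]*) also accepts an empty tag.
const std::regex kFixed(
    R"(<Episode>.*?<id>([0-9]*)</id>.*?<EpisodeName>([^<]*)</EpisodeName>)"
    R"(.*?<absolute_number>([^<]*))");
```

On a special with an empty tag followed by an episode numbered 5, kOriginal returns a single match that pairs the special's id with absolute number 5, while kFixed returns two matches, the first with an empty absolute number.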

spiff
2009-05-05, 14:11
do NOT abuse the report post feature. this is NOT what it is for. you just spammed ALL the forum operators and all of them thought there was something wrong with a post.

if you want our attention you do things the proper way and submit a diff on trac.

psorcerer
2009-05-08, 01:35
hi,

what is needed is a change in the enumeration code itself. currently, as i said, it uses a map with a (season, episode number) pair as key. what would be needed is

1) a set of regexps to enumerate episode-only filenames
2) code in the enumerator that uses these regexps and sets the season number to some arbitrary value (1 being the logical choice)
3) code in the scraper to return the episode guide using absolute numbering, with the season set to the one chosen above
4) backend support for translation - currently there is none. if you request absolute numbering you only get those returned iirc.

Sounds quite complicated to me.
Meanwhile, I've done a quick hack to make it work, using a season group that can match empty (e.g. something like ([0-9]*)) and append="yes" in advanced settings

@@ -830,7 +829,12 @@
if (season && episode)
{
CLog::Log(LOGDEBUG,"found match %s (s%se%s) [%s]",strLabel.c_str(),season,episode,expression[j].c_str());
- myEpisode.iSeason = atoi(season);
+ if (season[0] != '\0')
+ {
+ myEpisode.iSeason = atoi(season);
+ }
+ else
+ { myEpisode.iSeason = 1; }
myEpisode.iEpisode = atoi(episode);
episodeList.push_back(myEpisode);
bMatched = true;
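The core of this patch, pulled out into a standalone function for clarity (the function name is hypothetical, not part of XBMC). Without the patch, an empty season capture goes straight into atoi(""), which returns 0, so such episodes would presumably be filed under season 0, the specials.

```cpp
#include <cstdlib>
#include <string>

// Mirrors the patched branch: an empty season capture (e.g. from ([0-9]*))
// is treated as season 1; otherwise the captured digits go through atoi.
int seasonFromCapture(const std::string& season) {
  if (season.empty())
    return 1;  // patched behaviour; atoi("") would give 0 (specials)
  return std::atoi(season.c_str());
}
```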

spiff
2009-05-08, 01:39
hmm, that isn't a bad idea at all :) it solves part of the puzzle in a not too shabby way. you still only have a single season of course, but it sure scratches the anime itch.

trac it please.

psorcerer
2009-05-08, 02:13
http://xbmc.org/trac/ticket/6508

I think it's good enough if combined with the absolute numbering hack for thetvdb.com. You get all the episodes inside one season, but with the correct info and numbering.

Lithochasm
2009-06-23, 21:46
Hello everyone. I'm new here, but I've been using XBMC for a while, and I can't get absolute numbering to identify any of my anime.

I enabled absolute numbering in the TVDB options, and I also tried the hack psorcerer posted, since none of my anime has a season number (the files look like Darker Than Black 01[001A3f5].mkv)

Edit: Didn't see that I needed the <advancedsettings> tag. Now it finds only one episode each for three of my 15 anime series; e.g. it found episode two of Cowboy Bebop. This sounds like a regex problem. Let me see if I can pinpoint what's going on.

Yeah, it looks like it's matching season 0 for all of the series and only returning the specials. I guess the only fix so far that doesn't require renaming everything is to use an .nfo file. Any other ideas?

Lithochasm
2009-11-04, 18:46
Sorry to beat a dead horse, but has there been any progress with anime scraping and TheTVDB? It's been five months and I still haven't gotten my anime to scrape correctly.

spiff
2009-11-04, 18:58
that's only due to you sitting on your ass doing nothing. this has been supported for, uhm, 8 months maybe? add expressions matching only episode numbers.

Lithochasm
2009-11-04, 19:27
I've been running the dev builds and I have this:

<tvshowmatching action="append">
<regexp>[\._ \-]([0-9]*)([0-9][0-9])([\._ \-][^\\/]*)</regexp>
</tvshowmatching>

in my advanced settings, and it is not working; I was just asking for clarification. I also tried wiping my existing dev build and going back to 9.04, and it still doesn't work.

spiff
2009-11-04, 19:40
because that expression matches season and episode number. exactly NOT what i said ;)

also i assume you have stripped the <advancedsettings> tag there
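A quick way to see this point, sketched with C++ std::regex: the expression from the trac ticket has separate season and episode groups, so a three-digit absolute number like 123 gets split into season "1" and episode "23", while a two-digit number leaves the season group empty. An episode-only expression such as Ep([0-9]+) captures a single number. The filenames are invented examples, and XBMC's regexp engine may differ in details from std::regex.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Expression quoted from the trac ticket: group 1 = season, group 2 = episode.
const std::regex kSeasonEpisode(R"([\._ \-]([0-9]*)([0-9][0-9])([\._ \-][^\\/]*))");
// Episode-only expression of the kind suggested in the thread: one number.
const std::regex kEpisodeOnly(R"(Ep([0-9]+))");

// (season, episode) strings captured by the season+episode expression.
std::optional<std::pair<std::string, std::string>> seasonAndEpisode(const std::string& name) {
  std::smatch m;
  if (!std::regex_search(name, m, kSeasonEpisode))
    return std::nullopt;
  return std::make_pair(m[1].str(), m[2].str());
}

// Episode string captured by the episode-only expression.
std::optional<std::string> episodeOnly(const std::string& name) {
  std::smatch m;
  if (!std::regex_search(name, m, kEpisodeOnly))
    return std::nullopt;
  return m[1].str();
}
```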

Lithochasm
2009-11-04, 19:53
Ahh, that's the expression listed in the main body of the trac ticket; I thought it had been updated, oops.

Also, I did have the advancedsettings tag. What is the correct regex? Just match on an episode number, like:

<regexp>Ep([0-9]+)</regexp> ?

spiff
2009-11-04, 20:02
yeah

Lithochasm
2009-11-04, 20:09
Awesome thanks!

tbob19
2009-11-22, 12:05
So, do I have to build it myself (and edit VideoInfoScanner.cpp) to get this to work properly?

With Absolute Ordering on, newer shows that have no absolute number do not find any episodes; but with Absolute Ordering off, the longer shows that use the absolute number (no season, or just 1x(episode) in the filename) find no episodes. I'm not sure if there is a way to have it look for both at the same time; I may have missed something.

This is what I added to my Advanced settings:

<advancedsettings>
<tvshowmatching action="append">
<regexp>Ep([0-9]+)</regexp>
</tvshowmatching>
</advancedsettings>

And I downloaded the tvdb.xml from ticket #5860.

Another option would be to just add the absolute number to the newer shows.

Thanks :grin: