View Full Version : Last.fm scraper in development - help wanted
http://xbmc.org/forum/showthread.php?t=38378
i have considered it but i do not feel comfortable scraping a site that provides open APIs. that being said, anyone else is ofc free to do it
Hi spiff!
I've already tried to begin with the last.fm scraper, but I have some problems understanding how the flow and interaction between the scraper and xbmc works...
I started by modifying your allmusic scraper, just to have something to begin with.
I would like to fully understand the flow.
First, the scraper creates the album search url; I managed to get that working. After a working url has been created, xbmc makes the request to it and then applies the regexp to parse the results. What follows is what I don't fully understand: once I have the basic information (album title and url), how is the request for the album information url done? I never get a list of albums or anything.
This is my GetAlbumSearchResults:
<GetAlbumSearchResults dest="8">
  <RegExp input="$$5" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
    <RegExp input="$$1" output="&lt;entity&gt;&lt;year&gt;2000&lt;/year&gt;&lt;genre&gt;test&lt;/genre&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;http://www.last.fm/music\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
      <expression repeat="yes">&lt;a href="(.*)"&gt;(.*)&lt;/a&gt; &lt;span</expression>
    </RegExp>
    <expression noclean="1"></expression>
  </RegExp>
</GetAlbumSearchResults>
I don't know if all fields (year, genre, title, etc.) need to exist in the result of the first request. I don't have them available at first, but I would from the url fetched in the regexp...
Perhaps there is something wrong with the regexp. I've tried many simple ones with no result. Is there any way to force a valid result, so I can get to the next step in the scraper?
Well, i don't really know if i'm actually making any sense here.. but thank you very much in advance for your help.
Pyro-X
okay, i'm in a bit of a hurry so i'll just go fast.
createalbumsearchurl - creates the url
getalbumsearchresults - returns a list of possible matches. here you only fill title, artist, year, genre (actually only title & artist are required). xbmc picks the match the user wants (either letting the user choose from a list or scoring the matches and taking the one with the highest score), fetches its url, then runs the scraper function GetAlbumDetails on the contents of that url. THIS is where you will fill ALL details.
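The flow described above can be sketched in Python, purely as an illustration. This is not XBMC's code: the function names, the sample markup, and the search URL layout are all hypothetical stand-ins, and `pick_best` only approximates the "user chooses or highest score" step.

```python
import re
from urllib.parse import quote_plus

# Hypothetical search page markup, standing in for what the site returns.
SAMPLE_SEARCH_PAGE = (
    '<a href="/Artist+A/Album+One">Album One</a> <span>\n'
    '<a href="/Artist+A/Album+Two">Album Two</a> <span>\n'
)

def create_album_search_url(artist, album):
    # Step 1: build the search URL (this URL layout is an assumption).
    return "http://www.last.fm/music/?q=" + quote_plus(artist + " " + album)

def get_album_search_results(page, artist):
    # Step 2: return candidate matches; only title and artist are required.
    pattern = re.compile(r'<a href="([^"]*)">([^<]*)</a> <span')
    return [
        {"title": title, "artist": artist, "url": "http://www.last.fm/music" + path}
        for path, title in pattern.findall(page)
    ]

def pick_best(results, album):
    # Stand-in for the user's choice or the scoring step: exact title match first.
    for result in results:
        if result["title"].lower() == album.lower():
            return result
    return results[0] if results else None

url = create_album_search_url("Artist A", "Album Two")
results = get_album_search_results(SAMPLE_SEARCH_PAGE, "Artist A")
chosen = pick_best(results, "Album Two")
# Step 3 would fetch chosen["url"] and run the album-details parsing on that page.
```

The point of the sketch is the split: the search step only needs enough to identify each candidate, and the full details come from a second request to the chosen URL.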
Thank you very much for your explanation.. finally I got it to work :). My blocking step was that I didn't know why two regexps are needed in the GetAlbumSearchResults section. I still don't know, but I wrote two regexps, one for the "search title" and another one for the actual results, and then finally got to the next step :).
Anyway, what fields are mandatory for the album details? There isn't much information on last.fm. I don't know what I should do with the "tags", because a tag here can be a genre, a mood, and sometimes any other crazy thing the last.fm user decides xD.
Anyway thank you very much, i'm already making progress with it :). I even can get the cover !! :))
Oh, one more thing I'm thinking about... if mp3s are already tagged with genre and year of release, and the scraper then gets some of that information again from the web, which one is used for the xbmc db and library mode: the values from the id3 tags, or the ones from the scraping result?
Thanks,
Pyro-X
I would love to be able to use tags as genres so I could go through my library by last.fm tags. It would be nice if tags such as "seen live" and "awesome", etc. were filtered out though.
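A minimal sketch of the filtering idea, assuming a hand-maintained blocklist. The names `NON_GENRE_TAGS` and `tags_to_genres` and the list contents are hypothetical, not part of any scraper:

```python
# Hypothetical blocklist of tags that describe the listener, not the music.
NON_GENRE_TAGS = {"seen live", "awesome", "favorites"}

def tags_to_genres(tags, limit=3):
    # Keep the order the tags arrived in; drop blocklisted tags and duplicates.
    genres = []
    for tag in tags:
        key = tag.strip().lower()
        if key and key not in NON_GENRE_TAGS and key not in genres:
            genres.append(key)
    return genres[:limit]

print(tags_to_genres(["Seen Live", "trip-hop", "awesome", "electronic", "Trip-Hop"]))
```

A blocklist is the simplest option, but it needs upkeep as new non-genre tags appear; an allowlist of known genres would be stricter.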
Keep up the good work, Last.fm is much better than allmusic.
Gamester17
2008-10-15, 19:28
See:
http://xbmc.org/wiki/?title=Category:Scraper
and:
http://xbmc.org/forum/showthread.php?t=38379
and:
http://xbmc.org/forum/showthread.php?t=38378
looking good pyro-x, keep up the hard work, last.fm will be a real nice repo to have as a scraper.
spyrojyros_tail
2008-11-21, 03:30
Hey guys, a last.fm scraper would be brilliant!! I would love to have my collection updated.
Pyro-x are you and rwparris2 going to be working on this together ???
rwparris2
2008-11-21, 05:30
I pretty much gave up on it -- couldn't get my head to wrap around what was going on, and decided to waste my time on things that didn't frustrate me.
Buut I really want it so if Pyro-x or anyone else wants to email me / PM me feel free.
(Any emails asking for whatever I have so far will get " " as a reply, because that's basically all I managed)
TechLife
2008-11-22, 01:42
...either letting the user choose from a list or scoring the matches and taking the one with the highest score...
How are the returned results scored? If the site supports it, can I return a relevance value so that the proper entry from the returned list is chosen?
usually it's an equally weighted fuzzy string match of artist and album.
with 16266 add the following in the return xml
<relevance scale="yy">x.xx</relevance>
where x.xx is a number between 0 and 1 and yy is an optional scale (if your number is scaled otherwise) and your wish is granted.
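As a rough illustration of the two scoring paths, here is a Python sketch. It is not XBMC's implementation: `difflib.SequenceMatcher` is swapped in as the fuzzy matcher, the dictionary layout is hypothetical, and the rule that a supplied relevance replaces the fuzzy score is an assumption.

```python
from difflib import SequenceMatcher

def fuzzy(a, b):
    # Similarity in [0, 1]; difflib is a stand-in, not XBMC's own matcher.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def score(wanted_artist, wanted_album, result):
    # Assumption: a scraper-supplied relevance, divided by its scale, is used as-is.
    if "relevance" in result:
        return result["relevance"] / result.get("scale", 1.0)
    # Otherwise weight artist and album similarity equally.
    return 0.5 * fuzzy(wanted_artist, result["artist"]) + 0.5 * fuzzy(wanted_album, result["title"])

# Hypothetical search results.
results = [
    {"artist": "Artist A", "title": "Album One"},
    {"artist": "Artist A", "title": "Other Record"},
    {"artist": "Artist A", "title": "Album One (Live)", "relevance": 40, "scale": 100},
]
best = max(results, key=lambda r: score("Artist A", "Album One", r))
```

Here the third entry reports 40 on a scale of 100, which is treated as 0.4, so the exact match in the first entry still wins.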
TechLife
2008-11-22, 02:55
luv u spiff :D
hey guys, its good to find out somebody is working on more music scrapers. Most of my albums come up blank on allmusic.com. An alternative scraper to work around this issue would be great! Last.FM and discogs.com are my favorite sites, and i'd be happy to help out with scraper development for either site.
Is there any public place where I can find the current development version of your scraper? Of course I'd like to know where the development of the last.fm scraper is at. I have a lot of scripting experience (all sorts of stuff: PHP, Perl, PL/pgSQL, Lua, etc.) and have constructed a lot of complex regular expressions in the past, so if any help is needed, let me know.. :)
Aron Parsons
2008-12-24, 04:39
@pyro-x
Are you still working on this? Do you want any assistance? If so, post your latest revision and I'll see where I can help out.
kastrolis
2008-12-29, 16:35
Pyro-x is either so busy with his work on the scraper that he doesn't even have time to check this message board, or he has given up on it altogether. If the second option is true, and also taking into account that work on the discogs.com scraper seems to have been discontinued, there's no doubt that some other people should take over the alternative scraper development. Personally I'm ready to start working on this scraper, however I have serious doubts that last.fm really is the best source of information, as the only real thing it provides is album covers. discogs.com seems slightly better, but I don't like it listing all the different issues of a record (Canadian vinyl editions and stuff like that).
Aron Parsons
2008-12-29, 20:40
last.fm is sometimes good for more obscure artists and that's the main reason I want to scrape from it. Not all of the groups I listen to have entries at the more popular sites (e.g. AllMusic).
It's not currently possible to cascade scrapers, is it? For example, it can't find an artist with scraper 1, so it tries #2, then #3. Perhaps that is another bit of functionality that I can work on if others would find it useful as well.
that would be neat, and some system to cross reference several scrapers in order to make matches more reliable
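The cascading idea could look something like this in Python. It is only a sketch of the proposal, not an existing XBMC feature; `cascade` and the two lookup functions are hypothetical.

```python
def cascade(scrapers, artist, album):
    # Try each scraper in order and return the first non-empty result.
    for name, lookup in scrapers:
        info = lookup(artist, album)
        if info:
            return name, info
    return None, None

# Hypothetical lookups standing in for real scrapers.
def scraper_one(artist, album):
    return None  # pretend this site has no entry

def scraper_two(artist, album):
    return {"artist": artist, "title": album, "year": "2000"}

source, info = cascade([("one", scraper_one), ("two", scraper_two)], "Artist A", "Album One")
```

Cross-referencing would go a step further: query every scraper instead of stopping at the first hit, then compare the answers.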
AllMusic.com isn't updated on a regular basis and doesn't list many relatively unknown artists. I'd expect a boost in search results from almost any new scraper, whether it's last.fm or discogs. These two sites are equally good, both provide a decent API, etc. One might be better than the other when it comes to specific musical niches. Most people are better off using their favorite scraper for regular searches. We might want to make it a bit easier to switch scrapers, from the Album/Movie info screen or the manual search screen maybe?
Anyway, in the meantime I did some testing on a discogs.com scraper. This scraper script is just a few lines; it has CreateAlbumSearchUrl and GetAlbumSearchResults nodes and that's about it. Where I get stuck is testing the thing: XBMC simply crashes if I use it (no messages in xbmc.log).
The 'Scrap' test tool looks promising, however it is built to handle video scrapers, not music. Tried to fix that as well, but got caught up in C++ syntax :S Scraper development would be much easier if anyone could fix that Scrap tool. Doesn't look like an easy job, though ..
Am I missing a development tool here, should I be debugging from Visual C++ etc. ?
Aron Parsons
2008-12-30, 17:24
I was fighting with the same thing last night. The 'Scrap' tool doesn't work correctly. Getting it to compile and run on Linux will be the first step, so maybe I'll try to get that sorted out over the upcoming weekend. Trying to debug scrapers through XBMC's GUI doesn't sound like fun.
scrap is broken as we lost the source (blame donj for bad committing practices).
the way i do scrapers is using a regexp tool + printf. it works just fine for me, but an updated scrap tool would be invaluable for sure
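In the same spirit as "a regexp tool + printf", a scraper expression can be tried against saved page text outside XBMC. The Python sketch below uses a made-up HTML snippet and Python's `re` module, which may not behave identically to the regexp engine XBMC uses. It compares a greedy `(.*)` expression with a non-greedy variant:

```python
import re

# Hypothetical one-line snippet of a search results page.
html = '<a href="/A/One">One</a> <span>x</span><a href="/A/Two">Two</a> <span>y</span>'

greedy = re.compile(r'<a href="(.*)">(.*)</a> <span')
lazy = re.compile(r'<a href="(.*?)">(.*?)</a> <span')

# Print what each expression captures so the difference is visible.
for label, pattern in (("greedy", greedy), ("lazy", lazy)):
    print(label, pattern.findall(html))
```

When several links share one line, the greedy form swallows everything up to the last link and yields a single match, while the non-greedy form yields one match per link.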
oh damn. kriziz, i have a semi-done discogs scraper (only album part)! we should try not to dupe work :)
if you want i'll gladly leave the artist part to you :)
basic scraper added to svn
not even a 'it does not work you silly sod'?
eh :)
i tried it but it didn't retrieve much... does it need a last.fm account set up?
(this was a politically correct version for 'it does not work you silly sod' :p )
shouldn't require anything like that, no.
last.fm offers very little info. just artist, album name, year, releasedate, review and track listing.
even less for artists, although some of the biographies are good.
you did not get any info at all?
really i didn't try it much... i'm going to try again and see, thanks :)
bashflyng
2009-01-13, 01:02
Thanks a lot spiff, I don't have the time to check it now, but this is one of the features I was really looking forward for XBMC to have :)
Loto_Bak
2009-01-23, 04:08
Would you be able to scrape each track's 'listeners' and apply it to the 'ratings' field of the database?
each album might need to be calculated relative to itself
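One possible reading of "relative to itself" is that each track's listener count is scaled against the most-listened track on the same album. The Python sketch below assumes that reading; the function name, the 0 to 5 rating range, and the listener counts are all hypothetical.

```python
def ratings_from_listeners(listeners, max_rating=5):
    # Scale each track against the most-listened track on the same album.
    top = max(listeners.values(), default=0)
    if top == 0:
        return {track: 0 for track in listeners}
    return {track: round(max_rating * count / top) for track, count in listeners.items()}

# Made-up listener counts for one album.
album = {"Track 1": 12000, "Track 2": 7200, "Track 3": 300}
print(ratings_from_listeners(album))
```

Scaling per album keeps an obscure album's best track from being rated near zero just because the whole album has few listeners.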
i have absolutely no idea what that last sentence is supposed to mean.
http://www.last.fm/api
as you can see there is no way to search for tracks from a specific album
TheNME123
2009-02-17, 19:55
Hey spiff,
I tried out your last.fm scraper and must say it works brilliantly. The only downside is the crappy artist/album pics from last.fm, but that is not your fault.
It would be nice if some kind of language selection were possible, perhaps with a fallback to English if no information is available for the selected language.
Thanks for your great work
TheNME123