XBMC Community Forum  

2008-10-13, 17:01   #1
pyro-x
Last.fm scraper in development - help wanted

http://xbmc.org/forum/showthread.php?t=38378
Quote:
Originally Posted by spiff
i have considered it but i do not feel comfortable scraping a site that provides open api's. that being said, anyone else is ofc free to do it
Hi spiff!

I've already made a start on the last.fm scraper, but I have some problems understanding how the flow and interaction between the scraper and XBMC work.

I started by modifying your allmusic scraper, just to have something to begin with.

I would like to fully understand the flow.

First, the scraper creates the album search URL; I managed to get that working. Once a working URL has been created, XBMC requests it and then runs the regexp to parse the results. What follows is what I don't fully understand: once I have the basic information (album title and URL), how is the request for the album information URL made? I never get a list of albums or anything.

This is my GetAlbumSearchResults:
<GetAlbumSearchResults dest="8">
    <RegExp input="$$5" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
        <RegExp input="$$1" output="&lt;entity&gt;&lt;year&gt;2000&lt;/year&gt;&lt;genre&gt;test&lt;/genre&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;http://www.last.fm/music\1&lt;/url&gt;&lt;/entity&gt;" dest="5">
            <expression repeat="yes">&lt;a href=&quot;(.*)&quot;&gt;(.*)&lt;/a&gt; &lt;span</expression>
        </RegExp>
        <expression noclean="1"></expression>
    </RegExp>
</GetAlbumSearchResults>

I don't know whether all the fields (year, genre, title, etc.) need to exist in the result of the first request. I don't have them available at first, but I would from the URL fetched in the regexp.

Perhaps there is something wrong with the regexp. I've tried many simple ones with no result. Is there any way to force a valid result, so I can get to the next step in the scraper?
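
One way to check an expression like this outside XBMC is to run it against a saved copy of the search page. The sketch below is only an illustration: the HTML fragment is made up (the real last.fm markup may differ), Python's `re` module stands in for XBMC's regexp engine, and the tightened pattern is a suggestion rather than anything from the post. With the XML entities decoded, the expression above reads `<a href="(.*)">(.*)</a> <span`.

```python
import re

# Hypothetical search-result markup; the real last.fm page may look different.
SAMPLE_HTML = (
    '<li><a href="/Radiohead/Kid+A">Kid A</a> <span class="x">album</span> '
    '<a href="/Radiohead/Amnesiac">Amnesiac</a> <span class="x">album</span></li>'
)

# The expression from the post, with XML entities decoded.
GREEDY = r'<a href="(.*)">(.*)</a> <span'
# A tightened variant (an assumption, not from the post):
# stop the first group at the next quote and the second at the next tag.
TIGHT = r'<a href="([^"]*)">([^<]*)</a> <span'


def find_albums(pattern, html):
    """Return every (url_path, title) pair, roughly what repeat="yes" asks for."""
    return re.findall(pattern, html)


greedy_hits = find_albums(GREEDY, SAMPLE_HTML)
tight_hits = find_albums(TIGHT, SAMPLE_HTML)
print(len(greedy_hits), greedy_hits)
print(len(tight_hits), tight_hits)
```

On this sample the greedy `(.*)` swallows both links into a single match when they sit on the same line, while the tightened pattern returns one pair per album. Whether that is what happens on the real page depends on its actual markup.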

Well, I don't really know if I'm making any sense here, but thank you very much in advance for your help.

Pyro-X

Last edited by Gamester17; 2008-10-15 at 19:20.

2008-10-13, 17:12   #2
spiff

okay, i'm in a bit of a hurry so i'll just go fast.

CreateAlbumSearchUrl - creates the url
GetAlbumSearchResults - returns a list of possible matches. here you only fill title, artist, year, genre (actually only title & artist are required). xbmc chooses which of these the user wants (either letting the user choose from a list or scoring the matches and taking the one with the highest score), fetches its url, then runs the scraper function GetAlbumDetails on the contents of that url. THIS is where you will fill ALL details.
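
The flow described above can be sketched in Python. This is only a rough model: the three functions are hypothetical stand-ins for the scraper's XML sections, `fetch` replaces XBMC's own HTTP request, and the URLs and sample data are invented.

```python
from urllib.parse import quote_plus


def create_album_search_url(artist, album):
    """Stand-in for CreateAlbumSearchUrl: build the search URL (format is made up)."""
    return "http://example.invalid/search?q=" + quote_plus(artist + " " + album)


def get_album_search_results(search_page):
    """Stand-in for GetAlbumSearchResults: list candidates with title, artist and url."""
    return [
        {"title": "Kid A", "artist": "Radiohead", "url": "http://example.invalid/kid-a"},
        {"title": "Kid B", "artist": "Radiohead", "url": "http://example.invalid/kid-b"},
    ]


def get_album_details(album_page):
    """Stand-in for GetAlbumDetails: the step that fills in all details."""
    return {"title": "Kid A", "artist": "Radiohead", "source": album_page}


def fetch(url):
    """Placeholder for XBMC's HTTP request; returns a fake page body."""
    return "<html>page for %s</html>" % url


def scrape_album(artist, album, choose):
    """Run the three steps in order; `choose` models the user or the scoring."""
    search_url = create_album_search_url(artist, album)
    candidates = get_album_search_results(fetch(search_url))
    chosen = choose(candidates)
    # Only the chosen candidate's url is fetched and passed to the details step.
    return get_album_details(fetch(chosen["url"]))


details = scrape_album("Radiohead", "Kid A", choose=lambda found: found[0])
print(details["title"], details["source"])
```

The point of the sketch is the ordering: the search step only has to identify candidates, and the details step runs once, on the page behind the chosen candidate's url.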
__________________
Always read the XBMC online-manual, FAQ and search the forum before posting.
Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules.
For troubleshooting and bug reporting please make sure you read this first.

2008-10-14, 12:19   #3
pyro-x

Quote:
Originally Posted by spiff
okay, i'm in a bit of a hurry so i'll just go fast.

CreateAlbumSearchUrl - creates the url
GetAlbumSearchResults - returns a list of possible matches. here you only fill title, artist, year, genre (actually only title & artist are required). xbmc chooses which of these the user wants (either letting the user choose from a list or scoring the matches and taking the one with the highest score), fetches its url, then runs the scraper function GetAlbumDetails on the contents of that url. THIS is where you will fill ALL details.

Thank you very much for your explanation. I finally got it to work. What blocked me was that I didn't know why two regexps are needed in the GetAlbumSearchResults section. I still don't know, but I wrote two regexps, one for the "search title" and another for the actual results, and finally got to the next step.
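
One reading of the GetAlbumSearchResults block from the first post is that its two regexps form a pipeline: the inner one turns every match into an `<entity>` and stores the result in buffer 5, and the outer one, whose expression is empty, wraps the whole of buffer 5 in `<results>`. The Python sketch below mimics that reading. The `buffers` dict, the `run_regexp` helper and the sample HTML are hypothetical, and treating an empty expression as "take the whole input" is inferred from the thread, not checked against XBMC's source.

```python
import re


def run_regexp(buffers, source, expression, output, dest, repeat=False):
    """Mimic one <RegExp>: match `expression` against buffer `source`,
    fill the `output` template once per match, and store the result in `dest`."""
    text = buffers.get(source, "")
    if expression == "":
        # Assumption: an empty expression captures the whole input as \1.
        matches = [(text,)] if text else []
    else:
        found = [m.groups() for m in re.finditer(expression, text)]
        matches = found if repeat else found[:1]
    pieces = []
    for groups in matches:
        piece = output
        for index, value in enumerate(groups, start=1):
            piece = piece.replace("\\%d" % index, value)
        pieces.append(piece)
    buffers[dest] = "".join(pieces)


# Buffer 1 holds the fetched page (made-up markup).
buffers = {1: '<a href="/Radiohead/Kid+A">Kid A</a> <span>'}

# Inner RegExp: input $$1, dest 5, one <entity> per match.
run_regexp(
    buffers, 1, r'<a href="([^"]*)">([^<]*)</a> <span',
    r"<entity><title>\2</title><url>http://www.last.fm/music\1</url></entity>",
    dest=5, repeat=True,
)
# Outer RegExp: input $$5, dest 8, wraps everything collected in buffer 5.
run_regexp(buffers, 5, "", r"<results>\1</results>", dest=8)
print(buffers[8])
```

In this model, an inner expression that never matches leaves buffer 5 empty, so the outer step has nothing to wrap and no list of albums appears.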

Anyway, which fields are mandatory for the album details? There isn't much information on last.fm. I don't know what I should do with the "tags", because a tag here can be a genre, a mood, or sometimes any other crazy thing the last.fm user decides xD.

Anyway, thank you very much, I'm already making progress with it. I can even get the cover!

Oh, one more thing I'm wondering about: if the MP3s are already tagged with genre and year of release, and the scraper then gets some of that information again from the web, which one is used for the XBMC database and library mode? The values from the ID3 tags, or the ones from the scraping result?

Thanks,

Pyro-X

Last edited by pyro-x; 2008-10-14 at 12:34.

2008-10-15, 00:29   #4
v0lrath

I would love to be able to use tags as genres so I could go through my library by last.fm tags. It would be nice if tags such as "seen live" and "awesome", etc. were filtered out though.
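
The filtering idea could look roughly like the Python sketch below. The blocklist contents, the `limit` of three genres and the function name are all assumptions for illustration; which tags count as "not a genre" is a judgment call.

```python
# Hypothetical blocklist of tags that describe the listener, not the music.
NON_GENRE_TAGS = {"seen live", "awesome", "favorites"}


def tags_to_genres(tags, blocklist=NON_GENRE_TAGS, limit=3):
    """Keep the first `limit` tags that are not on the blocklist (case-insensitive)."""
    genres = []
    for tag in tags:
        cleaned = tag.strip()
        if cleaned and cleaned.lower() not in blocklist and cleaned not in genres:
            genres.append(cleaned)
    return genres[:limit]


print(tags_to_genres(["electronic", "Seen Live", "alternative", "awesome", "rock", "idm"]))
```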

Keep up the good work, Last.fm is much better than allmusic.

2008-10-15, 19:28   #5
Gamester17
Tips!

See:
http://xbmc.org/wiki/?title=Category:Scraper
and:
http://xbmc.org/forum/showthread.php?t=38379
and:
http://xbmc.org/forum/showthread.php?t=38378

2008-11-18, 01:13   #6
DuMbGuM

looking good pyro-x, keep up the hard work, last.fm will be a real nice repo to have as a scraper.

2008-11-21, 03:30   #7
spyrojyros_tail

Hey guys, a last.fm scraper would be brilliant!! I would love to have my collection updated.

Pyro-x, are you and rwparris2 going to be working on this together?

2008-11-21, 05:30   #8
rwparris2

Quote:
Originally Posted by spyrojyros_tail
Hey guys, a last.fm scraper would be brilliant!! I would love to have my collection updated.

Pyro-x, are you and rwparris2 going to be working on this together?
I pretty much gave up on it -- couldn't wrap my head around what was going on, and decided to waste my time on things that didn't frustrate me.

But I really want it, so if Pyro-x or anyone else wants to email me / PM me, feel free.
(Any emails asking for whatever I have so far will get " " as a reply, because that's basically all I managed.)
__________________

Always read the XBMC online-manual, FAQ and search the forum before posting.
For troubleshooting and bug reporting please read how to submit a proper bug report.

If you're interested in writing addons for xbmc, read docs and how-to for plugins and scripts ||| http://code.google.com/p/xbmc-addons/


2008-11-22, 01:42   #9
TechLife

Quote:
Originally Posted by spiff
...either letting the user choose from a list or scoring the matches and taking the one with the highest score...
How are the returned results scored? If the site supports it, can I return a relevance value so that the proper entry from the returned list is chosen?
__________________
HTPC: AMD 3.2GHz Dual-Core / 2GB RAM (1.5G System, 512M allocated to IGP) / Gigabyte GA-MA78GM-S2H w/ ATI Radeon HD3200 IGP / Catalyst 8.11 / XP MCE SP3 / XBMC r17272
Development Platform: AMD 2.4GHz Dual-Core / 2GB RAM / Asus A8N32-SLI Deluxe / EVGA 7950GX2 / GeForce 178.24 / Vista SP1 32-bit / XBMC r17272

Last edited by TechLife; 2008-11-22 at 01:51.

2008-11-22, 02:51   #10
spiff

usually it's an equally weighted fuzzy string match of artist and album.
with r16266, add the following to the returned xml:

<relevance scale="yy">x.xx</relevance>

where x.xx is a number between 0 and 1 and yy is an optional scale (for when your number uses a different range), and your wish is granted.
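
A rough Python sketch of both ideas, under stated assumptions: `difflib.SequenceMatcher` is only a stand-in for whatever fuzzy matcher xbmc actually uses, the function names are hypothetical, and the reading of `scale` (the range the site's own number is expressed in) is an interpretation of the post.

```python
from difflib import SequenceMatcher


def fuzzy(a, b):
    """Similarity in [0, 1]; SequenceMatcher is a stand-in, not xbmc's matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def default_score(wanted_artist, wanted_album, artist, album):
    """Equally weighted artist/album match, as described in the post."""
    return 0.5 * fuzzy(wanted_artist, artist) + 0.5 * fuzzy(wanted_album, album)


def relevance_element(value, scale=None):
    """Build the <relevance> element; the scale attribute is written only when given."""
    if scale is None:
        return "<relevance>%.2f</relevance>" % value
    return '<relevance scale="%s">%s</relevance>' % (scale, value)


exact = default_score("Radiohead", "Kid A", "Radiohead", "Kid A")
worse = default_score("Radiohead", "Kid A", "Radiohead", "Amnesiac")
print(exact, worse)
print(relevance_element(0.87))
print(relevance_element(87, scale=100))
```

A scraper whose site already reports a match quality, say a percentage, could emit that number with `scale="100"` inside each `<entity>` instead of relying on the default string match.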

Tags: discogs, last.fm, lastfm, scraper, scrapers

All times are GMT +2.
Copyright © 2008, XBMC Project