discogs.com music scraper - development help and bug reports wanted!
hey,
after seeing all the polite and constructive requests, as well as some not so constructive whining, for a better music scraper, i finally gave in and did it myself.
r16773
let the bug reporting commence
Wow nice, thank you for the work you did!
I had a quick go with it and it seems to work.. 4 new albums got scraped nicely using 'Update library'. 1 album came back with false results, though. I'll try to figure out what went wrong later today and report back.
Aahh, excellent! I was hoping for a discogs scraper :)
Working great so far, much much better results than with allmusic.
Still need to do a full scan, will report on how that goes.
Thanks a lot spiff!
nekrosoft13
2009-01-06, 15:25
nice, just tried it today on a rather small collection of music, just one thing to report.
discography doesn't work for any artist.
artist scrapes i have hardly worked on, personally i still use allmusic for artists
Finally had some time to do more testing; I cleared my music library today and ran album lookups using the discogs scraper on 50+ albums. All mp3s have proper id3 tags, i always use musicbrainz picard before anything is added to the library.
Think I've found one bug, which looks fixable. Not sure though if it's the scraper or XBMC's scraper framework which is broken. The details for Albums with a single quote in their name are never fetched from discogs. That is: the GetAlbumSearchResults thingy is executed (i have wireshark and strace logs that show the discogs HTTP request and response), but the album details lookup is never executed.
When looking up the album details XBMC briefly shows the 'Looking up album names..' dialog, which then disappears without further notice.
I'll try to create a patch and submit it.
Where can I get the file ??
CONKER UNIT1
2009-01-22, 07:32
i'm not clear on how to get this or how to install it...
can someone please help...thanks
no. we won't help lazy ppl. everything you need is in the manual, and i bet google is available at your location.
besides, this is a DEV forum.
Scraper Development: Developers forum for meta data scrapers. Scraper developers only!
Not for posting feature requests or end-user support requests!
yeah spiff.. once again you've made my day. this is a real reason to update to the latest svn - thanks! i'll report asap.
hey spiff,
not sure if you're still interested in maintaining this scraper,
but fetching album info / album artwork does not work anymore.
grabbing artist info / artist thumb still works without a problem.
create a ticket please and we'll see if it pops up on top
change the <GetAlbumSearchResults> expression to
\n<a href="([^"]*/release/[^"]*)".*?>(.*?) - (.*?)</a>
thanx to the both of you for the quick replies.
i grabbed the modified scraper from svn, but no luck.
to make it work, i had to change the regex to:
<a href="([^"]*/release/[^"]*)".*?>(.*?) - (.*?)</a>
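The two expressions above differ only in the leading \n. A minimal sketch of what that changes, using Python's re module as a stand-in for the scraper's regex engine (an assumption) and made-up markup, since the thread does not show the actual discogs.com HTML or say why the \n version failed:

```python
import re

# Hypothetical anchor; the real discogs.com search markup is not shown in the thread.
ANCHOR = '<a href="/Art-Of-Noise-Daft/release/12345" class="x">Art Of Noise - Daft</a>'
INLINE = '<li>' + ANCHOR + '</li>'        # anchor preceded by other markup
OWN_LINE = '<ul>\n' + ANCHOR + '\n</ul>'  # anchor at the start of a line

# The expression suggested first: requires a newline right before the anchor.
WITH_NEWLINE = re.compile(r'\n<a href="([^"]*/release/[^"]*)".*?>(.*?) - (.*?)</a>')
# The expression that worked for the poster: no leading newline.
WITHOUT_NEWLINE = re.compile(r'<a href="([^"]*/release/[^"]*)".*?>(.*?) - (.*?)</a>')


def find_releases(pattern, html):
    """Return (url, artist, title) tuples for every release link matched."""
    return pattern.findall(html)


if __name__ == "__main__":
    for name, html in (("inline", INLINE), ("own line", OWN_LINE)):
        print(name, "with \\n:", find_releases(WITH_NEWLINE, html))
        print(name, "without \\n:", find_releases(WITHOUT_NEWLINE, html))
```

The version without \n matches in both layouts; the version with \n only matches when the anchor starts a line.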
Hi,
Just found the Discogs Scraper and run it on my Library.
I tried to add ANV (artist name variations) so it would get more hits when looking for artists. This seems to work quite well.
Here is the code I've modified:
<GetArtistSearchResults dest="8">
    <RegExp input="$$5" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
        <!-- artist name variation -->
        <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;http://www.discogs.com\1&lt;/url&gt;&lt;/entity&gt;" dest="5+">
            <expression repeat="yes" clear="yes">&lt;a class="rollover_link" href="(/artist[^"]*anv=[^"]*)"&gt;(.+)&lt;/a&gt;</expression>
        </RegExp>
        <!-- exact match -->
        <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;http://www.discogs.com\1&lt;/url&gt;&lt;/entity&gt;" dest="5+">
            <expression repeat="yes" clear="no">&lt;a class="rollover_link" href="(/artist[^"]*)"&gt;&lt;span style="font-size:11pt;"&gt;&lt;em&gt;([^&lt;]*)&lt;</expression>
        </RegExp>
        <expression noclean="1"/>
    </RegExp>
</GetArtistSearchResults>
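For illustration, here is roughly what the artist name variation expression picks out, sketched with Python's re module rather than the scraper engine itself (an assumption). The two sample links are invented; only the class name and the anv= parameter come from the expressions above.

```python
import re

# Same pattern as the "artist name variation" expression above.
ANV = re.compile(r'<a class="rollover_link" href="(/artist[^"]*anv=[^"]*)">(.+)</a>')

# Hypothetical search-result markup, one link per line.
SAMPLE_HTML = "\n".join([
    '<a class="rollover_link" href="/artist/Art+Of+Noise,+The?anv=Art+Of+Noise">Art Of Noise</a>',
    '<a class="rollover_link" href="/artist/Art+Of+Noise,+The">'
    '<span style="font-size:11pt;"><em>Art Of Noise, The</em></span></a>',
])


def anv_entities(html):
    """Build one <entity> string per name-variation link, like the RegExp output."""
    template = "<entity><title>%s</title><url>http://www.discogs.com%s</url></entity>"
    return [template % (title, path) for path, title in ANV.findall(html)]


if __name__ == "__main__":
    for entity in anv_entities(SAMPLE_HTML):
        print(entity)
```

Note that (.+) is greedy, so if several links ever shared one line, the title group would run to the last </a> on that line.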
Then I found that the scraper doesn't find artists whose names start with "The ", e.g. "The Art of Noise" is not found since discogs expects "Art of Noise, The".
I tried to add this, but couldn't get it to work. Maybe somebody can help.
<CreateArtistSearchUrl dest="3">
    <RegExp input="$$2" output="http://www.discogs.com/search?type=artists&amp;q=&quot;\1&quot;&amp;btn=Search" dest="3">
        <RegExp input="$$2" output="\1,%20The" dest="2">
            <RegExp input="$$1" output="\1" dest="2">
                <expression noclean="1"/>
            </RegExp>
            <expression noclean="1" clear="no" repeat="no" trim="1">[Tt]he[ ](.+)</expression>
        </RegExp>
        <expression noclean="1"/>
    </RegExp>
</CreateArtistSearchUrl>
The problem seems to be the blank after "The". I tried:
"[Tt]he (.+)"
"[Tt]he\s(.+)"
"[Tt]he[ ](.+)"
but none of them matched.
When I change it to:
"[Tt]he(.+)" it works, but of course \1 then has a leading blank and the resulting string is: " Art of Noise, The".
Any ideas?
Bernd
you are passed an url encoded version of the name, i.e. that whitespace is a + (or is it %20 can't recall).
updates as diffs on trac please
%20 worked!
Thanks for the hint.
Now I need to find out how to create a diff so I can post it on trac.
Bernd
PS: It would be nice if the debug log contained the inputs and outputs to the scraper. This would make debugging easier.
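A minimal sketch of why %20 worked, assuming the name reaches the scraper percent-encoded as described above. It uses Python's urllib.parse.quote and re as stand-ins for XBMC's own encoding and regex engine; the function name is hypothetical, and anchoring the match at the start of the name is a deliberate tightening that the posted expression does not have.

```python
import re
from urllib.parse import quote

# The space after "The" arrives as %20, so the pattern must match %20, not " ".
THE_PREFIX = re.compile(r'[Tt]he%20(.+)')


def to_discogs_sort_name(artist):
    """Percent-encode the name and move a leading "The" to the end."""
    encoded = quote(artist)             # "The Art of Noise" -> "The%20Art%20of%20Noise"
    match = THE_PREFIX.match(encoded)   # match() only looks at the start of the string
    if match is None:
        return encoded                  # no leading "The": leave the name alone
    return match.group(1) + ",%20The"   # same shape as output="\1,%20The"


if __name__ == "__main__":
    print(to_discogs_sort_name("The Art of Noise"))
    print(to_discogs_sort_name("Kraftwerk"))
    # A literal-space pattern never sees a space in the encoded input:
    print(re.match(r'[Tt]he (.+)', quote("The Art of Noise")))
```

If the engine turned out to encode spaces as + instead, the same idea would apply with \+ in place of %20.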
Ticket #6316 (http://xbmc.org/trac/ticket/6316) added and attached the patch.
Bernd
gaborlazar
2009-04-29, 11:58
Hi there,
There is an artist on discogs whose name contains an error: the last character is lowercase, but it should be capitalized, just like on all the releases.
http://www.discogs.com/artist/APh
can you/we fix this bug?
reply to aph01@freemail.hu please, please, please
I'm sorry, I posted a bug that is already fixed; I didn't see that I was using an old version.