discogs.com music scraper - development help and bug reports wanted!


spiff
2008-12-31, 02:06
hey,

after seeing all the polite and constructive requests, as well as some not so constructive whining, for a better music scraper, i finally gave in and did it myself.

r16773

let the bug reporting commence

kriziz
2009-01-02, 12:32
Wow nice, thank you for the work you did!
I had a quick go with it and it seems to work.. 4 new albums got scraped nicely using 'Update library'. 1 album came back with false results, though. I'll try to figure out what went wrong later today and report back.

Jeroen
2009-01-03, 09:16
Aahh, excellent! I was hoping for a discogs scraper :)
Working great so far, much much better results than with allmusic.
Still need to do a full scan, will report on how that goes.
Thanks a lot spiff!

nekrosoft13
2009-01-06, 15:25
nice, just tried it today on a rather small collection of music, just one thing to report.

discography doesn't work for any artist.

spiff
2009-01-06, 15:32
artist scrapes i have hardly worked on, personally i still use allmusic for artists

kriziz
2009-01-10, 13:22
Finally had some time to do more testing; I cleared my music library today and ran album lookups using the discogs scraper on 50+ albums. All mp3s have proper id3 tags, i always use musicbrainz picard before anything is added to the library.

Think I've found one bug, which looks fixable. Not sure though if it's the scraper or XBMC's scraper framework which is broken. The details for Albums with a single quote in their name are never fetched from discogs. That is: the GetAlbumSearchResults thingy is executed (i have wireshark and strace logs that show the discogs HTTP request and response), but the album details lookup is never executed.

When looking up the album details XBMC briefly shows the 'Looking up album names..' dialog, which then disappears without further notice.

I'll try to create a patch and submit it.

Roborob
2009-01-20, 16:10
Where can I get the file ??

spiff
2009-01-20, 17:26
it's in svn

CONKER UNIT1
2009-01-22, 07:32
im not clear on how to get this or how to install...
can someone please help...thanks

spiff
2009-01-22, 09:43
no. we won't help lazy ppl. everything you need is in the manual, and i bet google is available at your location.

besides, this is a DEV forum.

Scraper Development Developers forum for meta data scrapers. Scraper developers only!
Not for posting feature requests or end-user support requests!

azido
2009-01-22, 11:25
yeah spiff.. once again you've made my day. this is a real reason to update to the latest svn - thanks! i'll report asap.

ronie
2009-03-26, 21:37
hey spiff,

not sure if you're still interested in maintaining this scraper,
but fetching album info / album artwork does not work anymore.

grabbing artist info / artist thumb still works without a problem.

spiff
2009-03-26, 21:45
create a ticket please and we'll see if it pops up on top

C-Quel
2009-03-26, 22:30
change the <GetAlbumSearchResults> expression to

\n&lt;a href=&quot;([^&quot;]*/release/[^&quot;]*)&quot;.*?&gt;(.*?) - (.*?)&lt;/a&gt;

ronie
2009-03-27, 02:50
thanx to the both of you for the quick replies.
i grabbed the modified scraper from svn, but no luck.

to make it work, i had to change the regex to:
&lt;a href=&quot;([^&quot;]*/release/[^&quot;]*)&quot;.*?&gt;(.*?) - (.*?)&lt;/a&gt;
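The two expressions above differ only in the leading `\n`. A minimal sketch of what that difference means, assuming the entities are decoded before matching and using made-up result markup (the real discogs page may differ); the names `ALBUM_LINK`, `ALBUM_LINK_NL`, `find_albums` and `sample` are hypothetical:

```python
import re

# Expression from the post above, with the XML entities decoded.
ALBUM_LINK = re.compile(r'<a href="([^"]*/release/[^"]*)".*?>(.*?) - (.*?)</a>')

# Same expression with the leading newline from the earlier suggestion.
ALBUM_LINK_NL = re.compile(r'\n' + ALBUM_LINK.pattern)

# Hypothetical search-result markup, for illustration only.
sample = ('<li><a href="/Some-Artist-Some-Album/release/12345" class="x">'
          'Some Artist - Some Album</a></li>')


def find_albums(html):
    """Return one (url, left, right) tuple per release link in html.

    'left' and 'right' are the two texts around ' - ' in the link text,
    presumably artist and album title.
    """
    return ALBUM_LINK.findall(html)


print(find_albums(sample))
print(ALBUM_LINK_NL.findall(sample))
```

The variant with `\n` only matches when the anchor tag directly follows a line break, so it finds nothing in the sample. That is one plausible reason why dropping the `\n` helped, not a confirmed diagnosis.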

Bernd
2009-04-10, 00:30
Hi,

Just found the Discogs Scraper and ran it on my Library.
I tried to add ANV (artist name variations) so it would get more hits when looking for artists. This seems to work quite well.
Here is the code I've modified:
<GetArtistSearchResults dest="8">
    <RegExp input="$$5" output="&lt;results&gt;\1&lt;/results&gt;" dest="8">
        <!-- artist name variation -->
        <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;http://www.discogs.com\1&lt;/url&gt;&lt;/entity&gt;" dest="5+">
            <expression repeat="yes" clear="yes">&lt;a class=&quot;rollover_link&quot; href=&quot;(/artist[^&quot;]*anv=[^&quot;]*)&quot;&gt;(.+)&lt;/a&gt;</expression>
        </RegExp>
        <!-- exact match -->
        <RegExp input="$$1" output="&lt;entity&gt;&lt;title&gt;\2&lt;/title&gt;&lt;url&gt;http://www.discogs.com\1&lt;/url&gt;&lt;/entity&gt;" dest="5+">
            <expression repeat="yes" clear="no">&lt;a class=&quot;rollover_link&quot; href=&quot;(/artist[^&quot;]*)&quot;&gt;&lt;span style=&quot;font-size:11pt;&quot;&gt;&lt;em&gt;([^&lt;]*)&lt;</expression>
        </RegExp>
        <expression noclean="1"/>
    </RegExp>
</GetArtistSearchResults>
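A rough illustration of what the artist name variation expression captures, written as a Python sketch. It assumes the entities are decoded before matching; the sample markup and the names `ANV_LINK`, `anv_entities` and `sample` are hypothetical and not taken from the scraper or from discogs:

```python
import re

# ANV expression from the scraper above, with the XML entities decoded.
ANV_LINK = re.compile(
    r'<a class="rollover_link" href="(/artist[^"]*anv=[^"]*)">(.+)</a>'
)

# Same shape as the scraper's output attribute: \2 is the title, \1 the path.
TEMPLATE = ('<entity><title>{title}</title>'
            '<url>http://www.discogs.com{path}</url></entity>')


def anv_entities(html):
    """Build one <entity> string per artist-name-variation link in html."""
    return [TEMPLATE.format(title=title, path=path)
            for path, title in ANV_LINK.findall(html)]


# Made-up markup, for illustration only.
sample = ('<a class="rollover_link" '
          'href="/artist/Example+Band?anv=Example+Bnd">Example Bnd</a>')

print(anv_entities(sample))
```

A link without `anv=` in its href produces no entity here; in the scraper those links are left to the separate exact match expression.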

Then I found that the scraper doesn't find artists whose names start with "The ", e.g. "The Art of Noise" is not found since discogs expects it to be "Art of Noise, The".
I tried to add this, but couldn't get it to work. Maybe somebody can help.

<CreateArtistSearchUrl dest="3">
    <RegExp input="$$2" output="http://www.discogs.com/search?type=artists&amp;q=&quot;\1&quot;&amp;btn=Search" dest="3">
        <RegExp input="$$2" output="\1,%20The" dest="2">
            <RegExp input="$$1" output="\1" dest="2">
                <expression noclean="1"/>
            </RegExp>
            <expression noclean="1" clear="no" repeat="no" trim="1">[Tt]he[ ](.+)</expression>
        </RegExp>
        <expression noclean="1"/>
    </RegExp>
</CreateArtistSearchUrl>

The problem seems to be the blank after "The". I tried:
"[Tt]he (.+)"
"[Tt]he\s(.+)"
"[Tt]he[ ](.+)"
but none of them matched.

When I change it to:
"[Tt]he(.+)" it works, but of course \1 then has a leading blank and the resulting string is: " Art of Noise, The".

Any ideas?

Bernd

spiff
2009-04-10, 00:39
you are passed a url-encoded version of the name, i.e. that whitespace is a + (or is it %20, can't recall).

updates as diffs on trac please

Bernd
2009-04-10, 17:38
%20 worked!
Thanks for the hint.
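The reason `%20` matches where a literal blank does not can be sketched in Python. This assumes the scraper receives the artist name percent-encoded, as described above; `move_leading_the` is a hypothetical helper for illustration, not part of the scraper:

```python
import re
from urllib.parse import quote, unquote


def move_leading_the(encoded_name):
    """Rewrite 'The%20X' as 'X,%20The'; return other names unchanged."""
    match = re.fullmatch(r'[Tt]he%20(.+)', encoded_name)
    if match is None:
        return encoded_name
    return match.group(1) + ',%20The'


# quote() encodes the blank as %20, so no literal space is left to match.
encoded = quote('The Art of Noise')

print(encoded)
print(re.match(r'[Tt]he (.+)', encoded))
print(move_leading_the(encoded))
print(unquote(move_leading_the(encoded)))
```

The pattern with a literal blank finds nothing in the encoded string, while the `%20` pattern captures the rest of the name without a leading blank.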

Now I need to find out how to create a diff so I can post it on trac.

Bernd

PS: It would be nice if the debug log contained the inputs and outputs of the scraper. This would make debugging easier.

Bernd
2009-04-11, 00:42
Ticket #6316 (http://xbmc.org/trac/ticket/6316) added and attached the patch.

Bernd

gaborlazar
2009-04-29, 11:58
Hi there,

There is an artist on discogs whose name contains an error: the last character is lowercase, but it should be capitalized, as it is on all the releases.

http://www.discogs.com/artist/APh

can you/we fix this bug?

reply to aph01@freemail.hu pleasepleaseplese

flobbes
2009-09-11, 14:17
I'm sorry, I posted a bug that was already fixed; I didn't see that I was using an old version.