Scraper Development: Developers forum for meta data scrapers. Scraper developers only! Not for posting feature requests, bugs, or end-user support requests!
#1
Team Arcade
Join Date: Nov 2008
Location: Finland
Posts: 231
This might sound like a very uneducated question, but how, and with what markup language, should one write a website so that it's:
a) easy to scrape and well organized, and b) still possible to style, whether with CSS or XSL or something else? I don't need or expect anyone to give me a complete tutorial on how to build it, just some pointers as to which markup I should look into. I've tried looking at thetvdb.com and themoviedb.org, but I epically fail to understand what they've been written with.
__________________
Fiinix Design presents: Posters, for Movies, TV Shows, Games, Arcade etc. » Latest Poster-Pack: The Silhouettes, TV Shows » Upcoming Poster-Pack: To Be Announced » Game & Emulator Poster, request here » Movie/TV Genre Poster, here
#2
Grumpy Bastard Developer
Join Date: Nov 2003
Posts: 7,715
Both of those offer XML-based APIs.
The only thing needed to make a site scrapeable is a pattern that can be described using regular expressions. Repeatability is the key.
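As a rough illustration of the point about repeatable patterns, here is a minimal sketch in Python. It assumes a hypothetical page where every game is listed with exactly the same markup; the `div`/`span` layout, the class names, and the `scrape_games` helper are all made up for this example, and Python's `re` module stands in for whatever regular-expression engine a real scraper would use.

```python
import re

# Hypothetical page fragment: every game uses exactly the same markup shape.
PAGE = """
<div class="game"><span class="title">Halo: Combat Evolved</span><span class="year">2001</span></div>
<div class="game"><span class="title">Fable</span><span class="year">2004</span></div>
"""

# One expression describes the repeating pattern; each match is one game.
GAME_RE = re.compile(
    r'<div class="game">'
    r'<span class="title">(?P<title>[^<]+)</span>'
    r'<span class="year">(?P<year>\d{4})</span>'
    r'</div>'
)


def scrape_games(html):
    """Return a list of (title, year) tuples, one per matched game entry."""
    return [
        (m.group("title"), int(m.group("year")))
        for m in GAME_RE.finditer(html)
    ]


games = scrape_games(PAGE)
print(games)
```

Because each entry has the same shape, a single expression picks up every game. If one entry deviated from that shape (say, a missing year), the pattern would silently skip it, which is why repeatability matters more than the choice of markup.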
__________________
Always read the XBMC online-manual, FAQ and search the forum before posting. Do not e-mail XBMC-Team members directly asking for support. Read/follow the forum rules. For troubleshooting and bug reporting please make sure you read this first.
#3
Team Arcade
Join Date: Nov 2008
Location: Finland
Posts: 231
Thank you for the swift reply.
OK, I've already tried to make a mockup of the structure using XML (not final). So my follow-up question is simply: how does this look from a scraping viewpoint? (I'm thinking about creating a gamedb.)
Code:
```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="simple.xsl" ?>
<GAME_LIBRARY>
  <GAME>
    <GAME_ID>id of current game in list</GAME_ID>
    <!-- Could possibly use the XBE ID tag??? or just a general id to tag everything for easy scraping -->
    <GAME_TITLE>
      <HEADER>Title</HEADER>
      <NAME>Halo: Combat Evolved</NAME>
      <IMG_URL>path to poster image</IMG_URL>
    </GAME_TITLE>
    <GAME_DEVELOPER>
      <HEADER>Developer(s)</HEADER>
      <NAME>Bungie Studios</NAME>
      <IMG_URL>path to logo</IMG_URL>
    </GAME_DEVELOPER>
    <GAME_PUBLISHER>
      <HEADER>Publisher(s)</HEADER>
      <NAME>Microsoft Game Studios</NAME>
      <IMG_URL>path to logo</IMG_URL>
    </GAME_PUBLISHER>
    <GAME_PLATFORM>
      <HEADER>Platform(s)</HEADER>
      <NAME>Xbox</NAME>
      <IMG_URL>path to logo</IMG_URL>
      <NAME>PC</NAME>
      <IMG_URL>path to logo</IMG_URL>
    </GAME_PLATFORM>
    <GAME_RELEASED>
      <HEADER>Released</HEADER>
      <YEAR>2001</YEAR>
      <MONTH>November</MONTH>
      <DATE>15</DATE>
    </GAME_RELEASED>
    <GAME_GENRE>
      <HEADER>Genre</HEADER>
      <SHORTHAND>FPS</SHORTHAND>
      <LONG>First-person Shooter</LONG>
    </GAME_GENRE>
    <GAME_SYNOPSIS>
      <HEADER>Synopsis</HEADER>
      <SYNOPSIS>
        Enter the mysterious world of Halo, an alien planet shaped like a ring.
        As mankind's super soldier Master Chief, you must uncover the secrets of
        Halo and fend off the attacking Covenant. During your missions, you'll
        battle on foot, in vehicles, inside, and outside with alien and human
        weaponry. Your objectives include attacking enemy outposts, raiding
        underground labs for advanced technology, rescuing fallen comrades, and
        sniping enemy forces. Halo also lets you battle three other players via
        intense split screen combat or fight cooperatively with a friend through
        the single-player missions.
      </SYNOPSIS>
    </GAME_SYNOPSIS>
    <!-- COMMENT//Possibility of dynamically acquiring from gamerankings.com? -->
    <GAME_RATING>
      <TEXT>95</TEXT>
      <IMG_URL>path to maybe generated img file</IMG_URL>
    </GAME_RATING>
    <!-- COMMENT//From gamerankings.com, percent rating from 0-100% -->
    <GAME_VGCRS>
      <HEADER>Rated</HEADER>
      <ESRB>M</ESRB>
      <ESRB_URL>path to image of rating</ESRB_URL>
      <BBFC>15</BBFC>
      <BBFC_URL>path to image of rating</BBFC_URL>
      <PEGI>16+</PEGI>
      <PEGI_URL>path to image of rating</PEGI_URL>
      <USK>16</USK>
      <USK_URL>path to image of rating</USK_URL>
      <OFLC>MA15+</OFLC>
      <OFLC_URL>path to image of rating</OFLC_URL>
    </GAME_VGCRS>
    <GAME_VGCRS_DESC>
      <HEADER>not sure why this would be required</HEADER>
      <TEXT_1>Blood and gore</TEXT_1>
      <IMG_1>path to blood and gore image</IMG_1>
      <TEXT_2>Violence</TEXT_2>
      <IMG_2>path to violence image</IMG_2>
    </GAME_VGCRS_DESC>
  </GAME>
</GAME_LIBRARY>
```
#4
Grumpy Bastard Developer
Join Date: Nov 2003
Posts: 7,715
My eyes hurt, please drop the upper case. I don't see the point of the <header> entries, not that it matters since they can easily be skipped.
The platform tags should be XML'ized, i.e. just have multiple <platform> .. </platform> <platform> .. </platform> elements instead of using img_url then name then img_url then name... Much easier to parse and much more XML'ish. Brief look only, mind you.
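To show why one `<platform>` element per platform is easier to parse, here is a small sketch using Python's standard `xml.etree.ElementTree`. The lower-case element names and the `read_platforms` helper are assumptions made for illustration; they are not a layout anyone in this thread has settled on.

```python
import xml.etree.ElementTree as ET

# Hypothetical lower-case layout: one <platform> element per platform,
# each carrying its own name and logo URL.
GAME_XML = """
<game>
  <title>Halo: Combat Evolved</title>
  <platform>
    <name>Xbox</name>
    <img_url>path to logo</img_url>
  </platform>
  <platform>
    <name>PC</name>
    <img_url>path to logo</img_url>
  </platform>
</game>
"""


def read_platforms(xml_text):
    """Return a list of (name, img_url) pairs, one per <platform> element."""
    game = ET.fromstring(xml_text)
    return [
        (p.findtext("name"), p.findtext("img_url"))
        for p in game.findall("platform")
    ]


platforms = read_platforms(GAME_XML)
print(platforms)
```

With the flat name/img_url/name/img_url layout, the reader would have to pair siblings up by position. Here each name and its logo share a parent, so adding a third platform needs no change to the parsing code.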
#5
Team Arcade
Join Date: Nov 2008
Location: Finland
Posts: 231
Dropping the upper case now.
The header tags are for display purposes; I thought it would be better if they were also scrapeable. But if they are unnecessary, then by all means they will be cut out. No need to have data that isn't going to be scraped anyway. (Now to just figure out how to display images in the browser based on URL data only.) I'm going to review the platform tag, and consequently the game_vgcrs tag, for better formatting. Thank you! Other than that we're good, and this is scrapeable with ease?
#6
Grumpy Bastard Developer
Join Date: Nov 2003
Posts: 7,715
Yeah, XML is a piece of cake.
#7
Team Arcade
Join Date: Nov 2008
Location: Finland
Posts: 231
Maybe for you. Mind you, I have never done anything with XML, so this is a first for me. I like a challenge though, so...
A final quickie, if I may? What formatting do you need to use so that the URL path in the XML is easy to scrape but also displays in a browser? I can't wrap my head around it: it seems easy to make it scrapeable, but in the browser view it just displays the path, not the actual image (I'm styling with CSS).
#8
Grumpy Bastard Developer
Join Date: Nov 2003
Posts: 7,715
URL format does not matter. You just need to remember that you are storing XML, so you need to escape special chars, in particular:
& -> &amp;
" -> &quot; (prob not relevant in a URL)
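A short sketch of that escaping step, using Python's standard library. The URL and the `img_url` element name are hypothetical, chosen only to show a query string with an ampersand in it.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical image URL with a query string; a raw "&" is not allowed
# inside XML text content.
raw_url = "http://example.com/poster.php?id=42&size=large"

# escape() replaces &, < and >; the extra mapping also covers double quotes.
escaped = escape(raw_url, {'"': "&quot;"})
document = "<img_url>" + escaped + "</img_url>"

# An XML parser turns the entity back into the plain character.
parsed_url = ET.fromstring(document).text

print(escaped)
print(parsed_url)
```

The escaped form only exists in the stored file. Anything that reads the file through an XML parser gets the original URL back, so the scraper side does not need to undo the escaping by hand.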
#9
Team Arcade
Join Date: Nov 2008
Location: Finland
Posts: 231
I'll keep that in mind. Yes, I'm aware that formatting doesn't matter when storing URL data, but it matters when I want to be able to display it in a browser as well. That's what I'm having issues with currently. Then again, I should probably ask somewhere else about that...
#10
Team-XBMC Project Manager
Join Date: Sep 2003
Location: Sweden
Posts: 10,582
Quote:
Quote:
Last edited by Gamester17; 2009-01-12 at 18:49.