KinoPoisk2 (Russian Movies) Scraper


ababak
2009-04-13, 18:03
Hi,

Let me present another KinoPoisk.ru (http://xbmc.org/forum/showthread.php?t=45404) scraper. It is a completely reworked Kinopoisk.ru scraper with the following features:


Optimized regexps
Low-res cover if no poster present (really helpful on some old movies)
Artists' roles
Can fetch movie stills as fanart, wallpapers as fanart, or both
Fixed incorrect parsing of outline/plot




Download version 1.0 of KinoPoisk2 from here:
http://files.me.com/andrey_babak/gtxbcl

P.S. I'd like to thank spiff for his help!

spiff
2009-04-13, 18:35
awesome for you russians :)

one question though; what does that ServerEncoding tag do?

diemos
2009-04-13, 22:43
You are the man! Thank you very much. I was waiting for this.

Fellow countryman, I'm from Kiev too, now in NY.

ababak
2009-04-14, 01:30
awesome for you russians :)

one question though; what does that ServerEncoding tag do?

I didn't check the source of the parser but as far as I can tell looking at the original scraper, it defines how the external URLs are parsed. Maybe it just does nothing though ;-) (or works in Plex only)

spiff
2009-04-14, 01:38
i know that it must be a plex thing as i wrote the scraper parser and most of the surrounding code :)

ababak
2009-04-14, 01:41
By the way, does the parser handle the server encoding returned in the headers? It would be great to make the scraper completely UTF-8.

spiff
2009-04-14, 01:45
scraper code does honor the encoding you set on the returned xml.

i guess the ServerContentEncoding is used to convert the html pages to utf-8 prior to passing them to the scrapers. i will dig in the plex git

edit: dug a bit. it's nonsense from the plex devs. the servercontentencoding is just a dupe of the encoding set on the returned xml
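
For readers following the encoding question: a minimal sketch, in Python rather than XBMC's own C++, of reading the charset a server declares in its Content-Type header and decoding the page with it. The function names are hypothetical and this is not how ScraperParser.cpp works; it only illustrates the idea being discussed.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset named in a Content-Type header, or the default."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset() or default


def decode_page(body, content_type):
    """Decode raw page bytes using the charset the server declared."""
    charset = charset_from_content_type(content_type)
    # Undecodable bytes are replaced instead of raising an exception.
    return body.decode(charset, errors="replace")
```

A page served as "text/html; charset=windows-1251" would be decoded as Windows-1251; a header without a charset falls back to the assumed default of UTF-8.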

hamp
2009-05-14, 17:02
When XBMC loads info from the site, Kinopoisk.ru bans me for about 30 minutes. Why? This does not happen with Plex.

TigerHeart
2009-05-14, 18:24
I am trying to get info about the movie The Butterfly Effect (I type the movie name in Russian: "Эффект бабочки"), but the scraper returns the following list of movies:
==============
Интервью с вампиром
Сделка с дьяволом
Мадагаскар 2
Ирония судьбы. Продолжение.
Загадочная история Бенджамина Баттона
Суини Тодд,демон+парикмахер с Флит-стрит
==============
And I see the same list every time I try to get info about any movie. What's wrong?
Thanks.
PS. I made screenshots, but I can't figure out how to attach them here. I can send them to anyone by e-mail.

GooglieS
2009-05-14, 20:42
This script does not load any information or art from KinoPoisk! Is something broken?
//It doesn't work! It finds the film in the list but loads no info from KinoPoisk :( What should I do?

hamp
2009-05-14, 21:48
Attempts to fix it have gone nowhere so far. We are waiting for the XBMC developer gurus. It has been fixed only for Plex (http://forums.plexapp.com/index.php?showtopic=3861&st=20&start=20), see the forum link. And a very interesting note there: KinoPoisk itself bans by IP. It is exactly the same as what TigerHeart sees.

GooglieS
2009-05-14, 23:25
Banned how? I don't get banned over HTTP!

TigerHeart
2009-05-15, 09:38
Please bring back the old version!!! We don't need your version 2!!! Nobody needs it. It doesn't work at all!!! Version 1 is the best!!!

TigerHeart
2009-05-15, 11:11
Does anybody know where I can download the old version of kinopoisk.xml?

hamp
2009-05-15, 14:11
Does anybody know where I can download the old version of kinopoisk.xml?

kinopoisk.xml works fine, but ScraperParser.cpp does not.
The problem is not in the KinoPoisk scraper but in the code that processes it, namely ScraperParser.cpp.

Here is its history: http://xbmc.org/trac/log/branches/linuxport/XBMC/xbmc/utils/ScraperParser.cpp?rev=10815

Try this one. If it works, post here. 86

TigerHeart
2009-05-15, 15:31
Try this one. If it works, post here. 86

Wow! It works! Thanks!!!

Oops! I celebrated too early. Now it finds the movie title correctly, but when I open "Movie information" there is no information about the movie. Instead of the movie title there is only one line: "Кинопоиск.ru - Все фильмы планеты" ("KinoPoisk.ru - All the films of the planet"). Maybe I have simply been banned on KinoPoisk itself?

hamp
2009-05-15, 19:44
Exactly, that's what I'm talking about!

althekiller
2009-05-15, 19:49
Please keep discussion in the XBMC forums in English, thanks.

GooglieS
2009-05-15, 20:30
It does not work... XBMC hangs for a while when fetching film info.

kerber
2009-05-25, 23:31
Hi all.
Sorry for my bad English ))))
Can a scraper send a User-Agent header?
e.g. request.UserAgent = "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)";
request.Accept = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";

When MediaPortal had a similar problem, it was solved by tricking KinoPoisk: passing it a browser User-Agent.
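
To make the question concrete: a minimal sketch of the same two headers attached to an HTTP request, written in Python with the standard urllib module. It shows only the general technique, not XBMC scraper functionality. The helper name build_request is hypothetical, and the header values are copied from the post above.

```python
import urllib.request

# Header values copied from the post above.
USER_AGENT = "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)"
ACCEPT = (
    "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,"
    "text/plain;q=0.8,image/png,*/*;q=0.5"
)


def build_request(url):
    """Build (but do not send) a GET request carrying browser-like headers."""
    headers = {"User-Agent": USER_AGENT, "Accept": ACCEPT}
    return urllib.request.Request(url, headers=headers)
```

Passing the returned object to urllib.request.urlopen() would send the request with those headers in place of the library defaults.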

hamp
2009-05-27, 09:57
Hi all.
Sorry for my bad English ))))
Can a scraper send a User-Agent header?
e.g. request.UserAgent = "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)";
request.Accept = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";

When MediaPortal had a similar problem, it was solved by tricking KinoPoisk: passing it a browser User-Agent.

I could try it, but I don't know where to insert this code...

vdrfan
2009-05-27, 12:21
English please.

spiff
2009-05-27, 12:35
uhm, what's that about user agent? i stopped reading this thread pages back. if you want our attention

1) stick to the forum rules - english only
2) stay out of the dev forum unless you have something to contribute

hamp
2009-05-27, 16:48
uhm, what's that about user agent? i stopped reading this thread pages back. if you want our attention

1) stick to the forum rules - english only
2) stay out of the dev forum unless you have something to contribute

Help...)

spiff
2009-05-27, 16:59
with what?

hamp
2009-06-11, 00:14
Can a scraper send a User-Agent header?
request.UserAgent = "Mozilla/5.0 (Windows; U; MSIE 7.0; Windows NT 6.0; en-US)";
request.Accept = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";

spiff
2009-06-11, 00:48
hmm, currently not. but it seems like something i would consider adding. ticket please

hamp
2009-06-11, 11:54
Sorry, no time for ticket. Help.

hamp
2009-06-11, 18:39
What are the differences between version 8.10 and version 9.04 of the scraper source?
http://xbmc.org/trac/browser/branches/8.10_Atlantis-linux-osx-win32/XBMC/xbmc/utils/ScraperParser.cpp
http://xbmc.org/trac/browser/branches/9.04_Babylon-linux-osx-win32/XBMC/xbmc/utils/ScraperParser.cpp
http://xbmc.org/trac/browser/branches/8.10_Atlantis-linux-osx-win32/XBMC/xbmc/utils/ScraperUrl.cpp
http://xbmc.org/trac/browser/branches/9.04_Babylon-linux-osx-win32/XBMC/xbmc/utils/ScraperUrl.cpp

hamp
2009-06-21, 14:19
It works now. It is being tested and I will upload it later. It turned out the bug was in the scraper code.

kozovoy
2009-06-22, 07:58
It works now. It is being tested and I will upload it later. It turned out the bug was in the scraper code.

Wow! That's cool! I'm looking forward to it.

goodwill
2009-07-05, 12:09
Hamp, could you post the fix you made for this somewhere? Either a patch or the file itself would do.

Komandor
2009-07-28, 22:26
Hello. Thank you for the scraper. It works well, but there is one small problem: genres are not scraped. I don't see this field on the information screen. I hope this problem will be solved. Good luck.

vlavrinenko
2009-08-02, 17:20
When I use this scraper in XBMC, it gets information in Windows-1251 encoding, and the text is displayed incorrectly. It seems it has to be converted to UTF-8. Is that possible?
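
As a general illustration of the conversion being asked about (not a fix for the scraper itself): a minimal Python sketch that re-encodes Windows-1251 bytes as UTF-8 and shows the kind of garbling that results when the bytes are read with the wrong codec. The function name is hypothetical and the sample title is taken from earlier in the thread.

```python
def cp1251_to_utf8(raw):
    """Re-encode Windows-1251 bytes as UTF-8 bytes."""
    return raw.decode("cp1251").encode("utf-8")


# Sample title as a Windows-1251 server would send it: one byte per character.
title_1251 = "Эффект бабочки".encode("cp1251")

# Reading those bytes with the wrong codec produces unreadable text.
garbled = title_1251.decode("latin-1")

# Transcoding first and then reading as UTF-8 recovers the title.
fixed = cp1251_to_utf8(title_1251).decode("utf-8")
```

Whether XBMC applies such a conversion depends on the encoding declared on the XML the scraper returns, as discussed earlier in the thread.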