Crawling the Kyobo Book Centre Bestseller List

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Kyobo Book Centre bestseller page
html = urlopen("http://www.kyobobook.co.kr/bestSellerNew/bestseller.laf")
bsObject = BeautifulSoup(html, "html.parser")

# Extract each book's detail page URL and store it in a list.
book_page_urls = []
for cover in bsObject.find_all('div', {'class': 'detail'}):   # a dict, not the set {'class','detail'}; still unclear why the class is 'detail' rather than 'cover'
    link = cover.select('a')[0].get('href')		# same as link = cover.select_one('a').get('href')
    book_page_urls.append(link)


# Extract the needed fields from the meta tags; the author is taken separately.
for index, book_page_url in enumerate(book_page_urls):  # enumerate yields (index, element) tuples by default
    html = urlopen(book_page_url)
    bsObject = BeautifulSoup(html, "html.parser")
    title = bsObject.find('meta',{'property':'eg:itemName'}).get('content')
    author = bsObject.select('span.name a')[0].text
    image = bsObject.select('div.cover img')[0].get('src')
    #image = bsObject.find('meta', {'property':'eg:itemImage'}).get('content')
    url = bsObject.find('meta',{'property':'eg:itemUrl'}).get('content')
    origin_price = bsObject.find('meta',{'property':'eg:originalPrice'}).get('content')
    sale_price = bsObject.find('meta',{'property':'eg:salePrice'}).get('content')
    print(index+1 , title, author, image, url, origin_price, sale_price)
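
The loop above calls `.get('content')` and `[0]` directly on whatever `find` and `select` return, so a single page that lacks one of the tags would stop the whole crawl with an error. Below is a minimal sketch of one way to guard against that. The helper names `meta_content` and `first_text` and the inline `SAMPLE_HTML` are made up for illustration; the real detail page markup may well differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a book detail page (assumption: real markup may differ).
SAMPLE_HTML = """
<html><head>
<meta property="eg:itemName" content="Sample Book">
<meta property="eg:originalPrice" content="15000">
</head><body>
<span class="name"><a href="#">Sample Author</a></span>
</body></html>
"""


def meta_content(soup, prop, default=None):
    """Return the content of <meta property=prop>, or default if the tag is absent."""
    tag = soup.find("meta", {"property": prop})
    return tag.get("content", default) if tag is not None else default


def first_text(soup, selector, default=None):
    """Return the stripped text of the first element matching selector, or default."""
    tag = soup.select_one(selector)
    return tag.get_text(strip=True) if tag is not None else default


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
title = meta_content(soup, "eg:itemName")
origin_price = meta_content(soup, "eg:originalPrice")
sale_price = meta_content(soup, "eg:salePrice", default="N/A")  # missing in the sample
author = first_text(soup, "span.name a")
print(title, author, origin_price, sale_price)
```

In the crawl loop, the same helpers could replace the chained `find(...).get(...)` calls, so a missing tag gives a placeholder value instead of an `AttributeError` or `IndexError`.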
