DON’T PANIC

BeautifulSoup Notes

2019-03-06

Notes on the Python package Beautiful Soup.

Import

from bs4 import BeautifulSoup

Fetching HTML from a URL

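import requests
url = "https://example.com"  # placeholder; substitute the page you want to scrape
se = requests.Session()      # a session reuses connections and cookies across requests
r = se.get(url)
soup = BeautifulSoup(r.text, 'html.parser')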
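
The fetch step can be wrapped in a small helper that fails loudly on bad responses. This is only a sketch: the name `fetch_soup`, its `session` parameter, and the 10-second default timeout are assumptions made for illustration, not part of either library.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, session=None, timeout=10):
    """Download `url` and return the parsed BeautifulSoup tree.

    `fetch_soup` is a hypothetical helper, not a bs4 or requests function.
    """
    # Reuse the caller's session if one is given, otherwise create a new one.
    se = session or requests.Session()
    # timeout is in seconds, so an unresponsive server cannot hang the script.
    r = se.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    r.raise_for_status()
    return BeautifulSoup(r.text, "html.parser")
```

Usage would look like `soup = fetch_soup("https://example.com")`, where the URL is a placeholder. Passing in an existing `requests.Session()` lets several pages share the same connection and cookies.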

Finding tags

# <div class="hello"><ul><li>item 1</li><li>item 2</li></ul></div>

## By tag name
soup.find("li")     # first match: <li>item 1</li>
soup.find_all("li") # all matches: [<li>item 1</li>, <li>item 2</li>]
soup("li")          # shorthand for find_all: [<li>item 1</li>, <li>item 2</li>]

## By class
soup.find_all(class_="hello")
# [<div class="hello"><ul><li>item 1</li><li>item 2</li></ul></div>]
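
A minimal runnable version of the lookups above, assuming the list items are standard `<li>` tags and the HTML is parsed from a literal string rather than fetched. It also shows that `find` returns `None` when nothing matches, which is worth checking before using the result.

```python
from bs4 import BeautifulSoup

html = '<div class="hello"><ul><li>item 1</li><li>item 2</li></ul></div>'
soup = BeautifulSoup(html, "html.parser")

first = soup.find("li")                   # first matching tag, or None
items = soup.find_all("li")               # list of every matching tag
texts = [li.get_text() for li in items]   # text content of each tag
boxes = soup.find_all(class_="hello")     # class_ avoids the reserved word `class`
missing = soup.find("table")              # no <table> in this document

print(first)       # <li>item 1</li>
print(texts)       # ['item 1', 'item 2']
print(len(boxes))  # 1
print(missing)     # None
```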

Extracting attributes from tags (e.g. hyperlinks, class names)

# <a href="https://www.google.com" tag="myLink"></a>
soup.find("a")["href"]      # 'https://www.google.com'
soup.find("a")["tag"]       # 'myLink'
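
A short sketch of attribute access on a self-contained sample. The second `<a>` without an `href`, the two-class `div`, and the `title` lookup are made up here to show the edge cases: indexing a missing attribute raises `KeyError`, while `.get()` returns `None`.

```python
from bs4 import BeautifulSoup

html = (
    '<div class="hello world">'
    '<a href="https://www.google.com" tag="myLink">Google</a>'
    '<a>no link here</a>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

link = soup.find("a")
href = link["href"]              # raises KeyError if the attribute is absent
title = link.get("title")        # returns None if the attribute is absent
classes = soup.find("div")["class"]  # class is multi-valued, so this is a list

# href=True keeps only the <a> tags that actually have an href attribute.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]

print(href)    # https://www.google.com
print(title)   # None
print(hrefs)   # ['https://www.google.com']
```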