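DC Academy Data Analysis Study Notes (3): HTML-Based Web Scraping
I finally get to practice HTML scraping with Python. I had picked up bits and pieces before, and this time I hope the DC Academy course will help me gradually build a deeper understanding of the theory behind web scraping.
OK, on to today's data analysis study notes!
Hopefully I'll get something out of it ( ̄︶ ̄)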
from bs4 import BeautifulSoup
html_doc = """ <html><head><title>The Dormouse's story</title></head> <body> <p class="title"><b>The Dormouse's story</b></p> <p class="story">Once upon a time there were three little sisters; and their names were <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>, <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p> <p class="story">...</p> """
Example: parsing an HTML document with BeautifulSoup
soup = BeautifulSoup(html_doc,'html.parser')
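"html_doc" is the variable that holds the HTML document, defined in the code above, and 'html.parser' is the parser used to parse the page. The general form for parsing an HTML document with BeautifulSoup is therefore soup = BeautifulSoup(markup, 'html.parser'), where markup is the HTML string.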
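As a minimal sketch of that general form: the first argument is the markup itself (a plain string here) and the second names the parser. The markup in tiny_html is made up purely for illustration and is not part of the course material.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for this illustration.
tiny_html = "<html><head><title>Demo page</title></head><body><p>Hello</p></body></html>"

# 'html.parser' is the parser bundled with Python's standard library,
# so no extra install is needed beyond bs4 itself.
tiny_soup = BeautifulSoup(tiny_html, 'html.parser')

print(tiny_soup.title.string)  # Demo page
print(tiny_soup.p.get_text())  # Hello
```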
Print the page with soup.prettify()
print(soup.prettify())
<html>
<head>
<title>
The Dormouse's story
</title>
</head>
<body>
<p class="title">
<b>
The Dormouse's story
</b>
</p>
<p class="story">
Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">
Elsie
</a>
,
<a class="sister" href="http://example.com/lacie" id="link2">
Lacie
</a>
and
<a class="sister" href="http://example.com/tillie" id="link3">
Tillie
</a>
; and they lived at the bottom of a well.
</p>
<p class="story">
...
</p>
</body>
</html>
Some basic BeautifulSoup operations on the parsed page
soup.title
<title>The Dormouse's story</title>
soup.title.name
'title'
soup.title.string
"The Dormouse's story"
soup.find_all("a")
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
soup.find(id="link3")
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
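Building on the operations above, a common next step is to pull the text and the href attribute out of every link. The sketch below reuses a trimmed copy of the sample document; the variable names (links_html, link_pairs) are mine. It also shows that find() returns None when nothing matches, which is worth checking before calling methods on the result.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the three links from the sample document above.
links_html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""
links_soup = BeautifulSoup(links_html, 'html.parser')

# Collect (link text, href) pairs; Tag.get() reads an attribute
# and returns None if the attribute is absent.
link_pairs = [(a.get_text(), a.get('href')) for a in links_soup.find_all('a')]
for text, href in link_pairs:
    print(text, href)

# find() returns None when no element matches.
missing = links_soup.find(id='link4')
print(missing is None)  # True
```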
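Scraping weather data from the National Weather Service (forecast.weather.gov)
The example page provided by DC Academy is the San Francisco forecast page: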
http://forecast.weather.gov/MapClick.php?lat=37.77492773500046&lon=-122.41941932299972#.WUnSFhN95E4
Tip: you can use your browser's developer tools to inspect the page's HTML.
1. Fetch the page content with urllib.request
import urllib.request as urlrequest
weather_url='http://forecast.weather.gov/MapClick.php?lat=37.7771&lon=-122.4196'
web_page=urlrequest.urlopen(weather_url).read()
# print(web_page)  # the output is far too long to show, so it is omitted here
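The call above assumes the request always succeeds. A slightly more defensive version might look like the sketch below. The helper name fetch_page, the 10-second timeout, and the User-Agent string are all my own assumptions rather than anything from the course; some sites reject requests that lack a User-Agent header, but whether this one does is not something these notes establish.

```python
import urllib.request as urlrequest


def fetch_page(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    # The User-Agent value is an arbitrary placeholder for illustration.
    request = urlrequest.Request(
        url, headers={'User-Agent': 'Mozilla/5.0 (study-notes example)'}
    )
    try:
        with urlrequest.urlopen(request, timeout=timeout) as response:
            return response.read()
    except OSError as err:
        # URLError, its subclass HTTPError, and socket timeouts
        # are all subclasses of OSError.
        print("Request failed:", err)
        return None
```

With this helper, web_page = fetch_page(weather_url) would stand in for the urlopen(...).read() line, followed by a check that web_page is not None before parsing.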
2. Extract the weather information from the page with BeautifulSoup
from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-body').get_text())
Today
SunnyHigh: 74 °F
Tonight
ClearLow: 52 °F
Thursday
SunnyHigh: 73 °F
ThursdayNight
ClearLow: 51 °F
Friday
SunnyHigh: 68 °F
FridayNight
Mostly ClearLow: 50 °F
Saturday
SunnyHigh: 64 °F
SaturdayNight
Mostly ClearLow: 50 °F
Sunday
SunnyHigh: 66 °F
// equalize forecast heights
$(function () {
var maxh = 0;
$(".forecast-tombstone .short-desc").each(function () {
var h = $(this).height();
if (h > maxh) { maxh = h; }
});
$(".forecast-tombstone .short-desc").height(maxh);
});
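The first part of the printed output looks perfect, but some JavaScript code has been appended at the end. How do we get rid of it?
Let's print the whole div again: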
from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-body'))
<div class="panel-body" id="seven-day-forecast-body">
<div id="seven-day-forecast-container"><ul class="list-unstyled" id="seven-day-forecast-list"><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Today<br/><br/></p>
<p><img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Tonight<br/><br/></p>
<p><img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 52 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/><br/></p>
<p><img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 73 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/>Night</p>
<p><img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/><br/></p>
<p><img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 68 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/>Night</p>
<p><img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/><br/></p>
<p><img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 64 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/>Night</p>
<p><img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Sunday<br/><br/></p>
<p><img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 66 °F</p></div></li></ul></div>
<script type="text/javascript">
// equalize forecast heights
$(function () {
var maxh = 0;
$(".forecast-tombstone .short-desc").each(function () {
var h = $(this).height();
if (h > maxh) { maxh = h; }
});
$(".forecast-tombstone .short-desc").height(maxh);
});
</script> </div>
At the very end of the output above, the unwanted JavaScript sits inside the outermost div, that is, inside div class="panel-body" id="seven-day-forecast-body", while div id="seven-day-forecast-container" does not contain it. That makes the fix simple: change id="seven-day-forecast-body" to id="seven-day-forecast-container".
from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-container'))
<div id="seven-day-forecast-container"><ul class="list-unstyled" id="seven-day-forecast-list"><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Today<br/><br/></p>
<p><img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Tonight<br/><br/></p>
<p><img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 52 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/><br/></p>
<p><img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 73 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Thursday<br/>Night</p>
<p><img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/><br/></p>
<p><img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 68 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Friday<br/>Night</p>
<p><img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/><br/></p>
<p><img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 64 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Saturday<br/>Night</p>
<p><img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">Sunday<br/><br/></p>
<p><img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 66 °F</p></div></li></ul></div>
That looks much cleaner, and the JavaScript is finally gone. Now let's rerun the earlier get_text() call:
from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-container').get_text())
Today
SunnyHigh: 74 °F
Tonight
ClearLow: 52 °F
Thursday
SunnyHigh: 73 °F
ThursdayNight
ClearLow: 51 °F
Friday
SunnyHigh: 68 °F
FridayNight
Mostly ClearLow: 50 °F
Saturday
SunnyHigh: 64 °F
SaturdayNight
Mostly ClearLow: 50 °F
Sunday
SunnyHigh: 66 °F
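Narrowing the search to the inner container works here because the script happens to sit outside it. When that is not the case, another option (not from the course) is to remove the script tags from the parsed tree before calling get_text(). The sketch below uses a cut-down stand-in for the real page. According to the Beautiful Soup documentation, releases from 4.9.0 onward already leave script contents out of get_text() with html.parser, so the explicit removal matters mainly on older versions.

```python
from bs4 import BeautifulSoup

# Cut-down stand-in for the real forecast markup, for illustration only.
body_html = """
<div id="seven-day-forecast-body">
<div id="seven-day-forecast-container"><p class="short-desc">Sunny</p></div>
<script type="text/javascript">
var maxh = 0;
</script>
</div>
"""
body_soup = BeautifulSoup(body_html, 'html.parser')
forecast_body = body_soup.find(id='seven-day-forecast-body')

# decompose() removes a tag and everything inside it from the tree.
for script in forecast_body.find_all('script'):
    script.decompose()

body_text = forecast_body.get_text()
print(body_text.strip())  # Sunny
```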
This is still not easy to extract from, so let's tidy it up with prettify() and then work out how to pull out the information we need:
from bs4 import BeautifulSoup
soup=BeautifulSoup(web_page,'html.parser')
print(soup.find(id='seven-day-forecast-container').prettify())
<div id="seven-day-forecast-container">
<ul class="list-unstyled" id="seven-day-forecast-list">
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Today
<br/>
<br/>
</p>
<p>
<img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/>
</p>
<p class="short-desc">
Sunny
</p>
<p class="temp temp-high">
High: 74 °F
</p>
</div>
</li>
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Tonight
<br/>
<br/>
</p>
<p>
<img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/>
</p>
<p class="short-desc">
Clear
</p>
<p class="temp temp-low">
Low: 52 °F
</p>
</div>
</li>
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Thursday
<br/>
<br/>
</p>
<p>
<img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/>
</p>
<p class="short-desc">
Sunny
</p>
<p class="temp temp-high">
High: 73 °F
</p>
</div>
</li>
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Thursday
<br/>
Night
</p>
<p>
<img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/>
</p>
<p class="short-desc">
Clear
</p>
<p class="temp temp-low">
Low: 51 °F
</p>
</div>
</li>
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Friday
<br/>
<br/>
</p>
<p>
<img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/>
</p>
<p class="short-desc">
Sunny
</p>
<p class="temp temp-high">
High: 68 °F
</p>
</div>
</li>
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Friday
<br/>
Night
</p>
<p>
<img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/>
</p>
<p class="short-desc">
Mostly Clear
</p>
<p class="temp temp-low">
Low: 50 °F
</p>
</div>
</li>
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Saturday
<br/>
<br/>
</p>
<p>
<img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/>
</p>
<p class="short-desc">
Sunny
</p>
<p class="temp temp-high">
High: 64 °F
</p>
</div>
</li>
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Saturday
<br/>
Night
</p>
<p>
<img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/>
</p>
<p class="short-desc">
Mostly Clear
</p>
<p class="temp temp-low">
Low: 50 °F
</p>
</div>
</li>
<li class="forecast-tombstone">
<div class="tombstone-container">
<p class="period-name">
Sunday
<br/>
<br/>
</p>
<p>
<img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/>
</p>
<p class="short-desc">
Sunny
</p>
<p class="temp temp-high">
High: 66 °F
</p>
</div>
</li>
</ul>
</div>
From the HTML above, the information we need corresponds to three classes: period-name, short-desc, and temp.
soup_forecast = soup.find(id='seven-day-forecast-container')
soup_forecast.find_all(class_='period-name')
[<p class="period-name">Today<br/><br/></p>,
<p class="period-name">Tonight<br/><br/></p>,
<p class="period-name">Thursday<br/><br/></p>,
<p class="period-name">Thursday<br/>Night</p>,
<p class="period-name">Friday<br/><br/></p>,
<p class="period-name">Friday<br/>Night</p>,
<p class="period-name">Saturday<br/><br/></p>,
<p class="period-name">Saturday<br/>Night</p>,
<p class="period-name">Sunday<br/><br/></p>]
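One detail worth noting: in the page, the temperature paragraphs carry two classes, "temp temp-high" or "temp temp-low". Searching with class_='temp' still finds them because Beautiful Soup treats class as a multi-valued attribute and matches against any single class. A small sketch with made-up markup in the same shape:

```python
from bs4 import BeautifulSoup

# Markup in the same shape as the forecast page, for illustration only.
temps_html = """
<p class="temp temp-high">High: 74 °F</p>
<p class="temp temp-low">Low: 52 °F</p>
<p class="short-desc">Sunny</p>
"""
temps_soup = BeautifulSoup(temps_html, 'html.parser')

# class_='temp' matches any element whose class list contains 'temp'.
all_temps = temps_soup.find_all(class_='temp')
# A more specific class narrows the result to daytime highs only.
highs = temps_soup.find_all(class_='temp-high')

print([t.get_text() for t in all_temps])  # ['High: 74 °F', 'Low: 52 °F']
print([t.get_text() for t in highs])      # ['High: 74 °F']
```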
3. Finally, print all the information we need
soup_forecast=soup.find(id='seven-day-forecast-container')
date_list=soup_forecast.find_all(class_='period-name')
desc_list=soup_forecast.find_all(class_='short-desc')
temp_list=soup_forecast.find_all(class_='temp')
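# zip() walks the three lists in parallel, so the number of periods is not hard-coded
for date_tag,desc_tag,temp_tag in zip(date_list,desc_list,temp_list):
    date=date_tag.get_text()
    desc=desc_tag.get_text()
    temp=temp_tag.get_text()
    print("{} {} {}".format(date,desc,temp))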
Today Sunny High: 74 °F
Tonight Clear Low: 52 °F
Thursday Sunny High: 73 °F
ThursdayNight Clear Low: 51 °F
Friday Sunny High: 68 °F
FridayNight Mostly Clear Low: 50 °F
Saturday Sunny High: 64 °F
SaturdayNight Mostly Clear Low: 50 °F
Sunday Sunny High: 66 °F
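In the output, "ThursdayNight" runs together because the two words are separated only by a br tag, which contributes no text. get_text() accepts separator and strip arguments that handle this; the sketch below applies them to two period-name paragraphs copied from the page.

```python
from bs4 import BeautifulSoup

# Two period-name paragraphs as they appear in the forecast HTML.
names_html = (
    '<p class="period-name">Thursday<br/>Night</p>'
    '<p class="period-name">Today<br/><br/></p>'
)
names_soup = BeautifulSoup(names_html, 'html.parser')

# separator=' ' joins the text pieces with a space;
# strip=True trims each piece and drops empty ones.
period_names = [
    p.get_text(separator=' ', strip=True)
    for p in names_soup.find_all(class_='period-name')
]
print(period_names)  # ['Thursday Night', 'Today']
```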
Complete code:
# Import the packages and modules we need: urllib.request and BeautifulSoup
import urllib.request as urlrequest
from bs4 import BeautifulSoup
# Fetch the page we want to scrape with urllib
weather_url='http://forecast.weather.gov/MapClick.php?lat=37.77492773500046&lon=-122.41941932299972'
web_page=urlrequest.urlopen(weather_url).read()
# Parse the page with BeautifulSoup and grab the block we care about
soup=BeautifulSoup(web_page,'html.parser')
soup_forecast=soup.find(id='seven-day-forecast-container')
# Find the parts we want
date_list=soup_forecast.find_all(class_='period-name')
desc_list=soup_forecast.find_all(class_='short-desc')
temp_list=soup_forecast.find_all(class_='temp')
# Print the results in a readable form, one forecast period per line
for date_tag,desc_tag,temp_tag in zip(date_list,desc_list,temp_list):
    date=date_tag.get_text()
    desc=desc_tag.get_text()
    temp=temp_tag.get_text()
    print("{} {} {}".format(date,desc,temp))
Today Sunny High: 74 °F
Tonight Clear Low: 52 °F
Thursday Sunny High: 73 °F
ThursdayNight Clear Low: 51 °F
Friday Sunny High: 68 °F
FridayNight Mostly Clear Low: 50 °F
Saturday Sunny High: 64 °F
SaturdayNight Mostly Clear Low: 50 °F
Sunday Sunny High: 66 °F
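The three parallel lists work as long as every forecast period has all three elements. A variation that does not rely on this, sketched here as my own alternative and not the course's method, walks each li class="forecast-tombstone" item and collects its fields into a dictionary, also keeping the longer description stored in the img title attribute. The sample markup is a shortened copy of two items from the page, and the names parse_forecast and text_or_none are hypothetical.

```python
from bs4 import BeautifulSoup

# Shortened copy of two forecast items; the title text is abbreviated.
sample_html = """
<ul id="seven-day-forecast-list">
<li class="forecast-tombstone"><div class="tombstone-container">
<p class="period-name">Today<br/><br/></p>
<p><img class="forecast-icon" title="Today: Sunny, with a high near 74."/></p>
<p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></div></li>
<li class="forecast-tombstone"><div class="tombstone-container">
<p class="period-name">Thursday<br/>Night</p>
<p><img class="forecast-icon" title="Thursday Night: Clear, with a low around 51."/></p>
<p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></div></li>
</ul>
"""


def text_or_none(tag):
    """Return the tag's cleaned text, or None if the tag was not found."""
    if tag is None:
        return None
    return tag.get_text(separator=' ', strip=True)


def parse_forecast(markup):
    """Build one dict per forecast item found in the markup."""
    soup = BeautifulSoup(markup, 'html.parser')
    records = []
    for item in soup.find_all('li', class_='forecast-tombstone'):
        icon = item.find('img')
        records.append({
            'period': text_or_none(item.find(class_='period-name')),
            'desc': text_or_none(item.find(class_='short-desc')),
            'temp': text_or_none(item.find(class_='temp')),
            'detail': icon.get('title') if icon is not None else None,
        })
    return records


forecast_records = parse_forecast(sample_html)
for record in forecast_records:
    print(record)
```

Because each record is built from a single item, a period that lacks one field ends up with None in that slot and the remaining rows stay aligned.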