DC學院 (DC Academy) Data Analysis Study Notes (3): HTML-Based Web Scraping

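    I finally get to practice HTML scraping in Python. I had picked up bits and pieces before; this time I hope the DC Academy course will help me build a deeper understanding of the theory behind web scraping, step by step.
    OK, on to today's data analysis study notes!

    Hope you find something useful ( ̄︶ ̄)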

    from bs4 import BeautifulSoup
    
    html_doc = """ <html><head><title>The Dormouse's story</title></head> <body> <p class="title"><b>The Dormouse's story</b></p> <p class="story">Once upon a time there were three little sisters; and their names were <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>, <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>; and they lived at the bottom of a well.</p> <p class="story">...</p> """

    Example: parsing an HTML document with BeautifulSoup

    soup = BeautifulSoup(html_doc,'html.parser') 

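    “html_doc” is the variable holding the HTML string defined in the code above, and 'html.parser' is the parser used to parse the page (the HTML parser that ships with Python's standard library). So the general form for parsing an HTML document with BeautifulSoup is soup = BeautifulSoup(markup, 'html.parser'), where markup is the page's HTML.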
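The first argument to BeautifulSoup does not have to be a str. Later in these notes it is the bytes returned by urlopen(...).read(), and Beautiful Soup then works out the encoding itself. A minimal sketch showing that both inputs give the same tree; the markup is a shortened copy of html_doc, and passing from_encoding="utf-8" is an optional extra, not something the original code does:

```python
from bs4 import BeautifulSoup

# The same small document, once as a str and once as UTF-8 bytes
# (urlopen(...).read(), used later in these notes, returns bytes).
markup_str = "<html><head><title>The Dormouse's story</title></head><body></body></html>"
markup_bytes = markup_str.encode("utf-8")

# 'html.parser' is Python's built-in HTML parser, so no extra install is needed.
soup_from_str = BeautifulSoup(markup_str, "html.parser")

# For bytes, Beautiful Soup detects the encoding itself; from_encoding
# can be passed when the encoding is known in advance.
soup_from_bytes = BeautifulSoup(markup_bytes, "html.parser", from_encoding="utf-8")

print(soup_from_str.title.string)
print(soup_from_bytes.title.string)
```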

    Print the page with soup.prettify()

    print(soup.prettify()) 
    
    <html>
     <head>
      <title>
       The Dormouse's story
      </title>
     </head>
     <body>
      <p class="title">
       <b>
        The Dormouse's story
       </b>
      </p>
      <p class="story">
       Once upon a time there were three little sisters; and their names were
       <a class="sister" href="http://example.com/elsie" id="link1">
        Elsie
       </a>
       ,
       <a class="sister" href="http://example.com/lacie" id="link2">
        Lacie
       </a>
       and
       <a class="sister" href="http://example.com/tillie" id="link3">
        Tillie
       </a>
       ; and they lived at the bottom of a well.
      </p>
      <p class="story">
       ...
      </p>
     </body>
    </html>
    
    

    Some basic operations for parsing a page with BeautifulSoup

    soup.title
    <title>The Dormouse's story</title>
    
    soup.title.name
    'title'
    
    soup.title.string
    "The Dormouse's story"
    
    soup.find_all("a")
    [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
     <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
     <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
    
    soup.find(id="link3")
    <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
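A few more operations in the same spirit, as a small self-contained sketch on a shortened copy of the sample document (the name sample_doc is only for illustration): reading attribute values, filtering find_all by class, and checking for None, because find() returns None rather than raising an error when nothing matches.

```python
from bs4 import BeautifulSoup

# Shortened copy of the html_doc sample used above.
sample_doc = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(sample_doc, "html.parser")

# class_ has a trailing underscore because "class" is a Python keyword.
# tag.get("href") returns None for a missing attribute; tag["href"] raises KeyError.
links = [a.get("href") for a in soup.find_all("a", class_="sister")]
names = [a.get_text() for a in soup.find_all("a")]

# find() returns None when nothing matches, so check before calling methods on it.
missing = soup.find(id="link4")
if missing is None:
    print("no element with id='link4'")

print(links)
print(names)
```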
    

    Scraping weather data from “NATIONAL WEATHER” (the National Weather Service site)

    The example page provided by DC Academy is the San Francisco forecast page:
    http://forecast.weather.gov/MapClick.php?lat=37.77492773500046&lon=-122.41941932299972#.WUnSFhN95E4

    Tip: you can use the browser's developer tools to inspect the page's source code.


    1. Fetch the page content with urllib.request

    import urllib.request as urlrequest
    weather_url='http://forecast.weather.gov/MapClick.php?lat=37.7771&lon=-122.4196'
    web_page=urlrequest.urlopen(weather_url).read()
    # print(web_page)  # the raw page is far too long to show here
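The one-liner above is enough for this page. As a hedged variant, the sketch below wraps the URL in a urllib.request.Request with an explicit User-Agent header, adds a timeout, and closes the response with a with-block. Sending a User-Agent is an assumption on my part (some sites reject the default Python-urllib agent); the header value and the helper names build_request and fetch_page are hypothetical.

```python
import urllib.request

# Hypothetical User-Agent string: replace it with something that identifies your script.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; weather-notes-demo)"}

WEATHER_URL = "http://forecast.weather.gov/MapClick.php?lat=37.7771&lon=-122.4196"


def build_request(url, headers=None):
    """Wrap the URL in a Request that carries explicit headers."""
    return urllib.request.Request(url, headers=DEFAULT_HEADERS if headers is None else headers)


def fetch_page(url, timeout=10):
    """Download the page and return its raw bytes, like urlopen(url).read() above.

    The timeout (in seconds) keeps the script from hanging on a slow server,
    and the with-block closes the connection once reading is done.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

Calling web_page = fetch_page(WEATHER_URL) would then stand in for the urlopen(...).read() line; the result is the same kind of bytes object.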

    2. Extract the weather information from the page with BeautifulSoup

    from bs4 import BeautifulSoup
    soup=BeautifulSoup(web_page,'html.parser')
    print(soup.find(id='seven-day-forecast-body').get_text())
    
    
    
    Today
    SunnyHigh: 74 °F
    
    Tonight
    ClearLow: 52 °F
    
    Thursday
    SunnyHigh: 73 °F
    
    ThursdayNight
    ClearLow: 51 °F
    
    Friday
    SunnyHigh: 68 °F
    
    FridayNight
    Mostly ClearLow: 50 °F
    
    Saturday
    SunnyHigh: 64 °F
    
    SaturdayNight
    Mostly ClearLow: 50 °F
    
    Sunday
    SunnyHigh: 66 °F
    
    // equalize forecast heights
    $(function () {
        var maxh = 0;
        $(".forecast-tombstone .short-desc").each(function () {
            var h = $(this).height();
            if (h > maxh) { maxh = h; }
        });
        $(".forecast-tombstone .short-desc").height(maxh);
    });
     
    
    

    The first part of the printout looks perfect, but there is extra JavaScript code at the end. So how do we get rid of it?

    Print the whole div again:

    from bs4 import BeautifulSoup
    soup=BeautifulSoup(web_page,'html.parser')
    print(soup.find(id='seven-day-forecast-body'))
    <div class="panel-body" id="seven-day-forecast-body">
    <div id="seven-day-forecast-container"><ul class="list-unstyled" id="seven-day-forecast-list"><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Today<br/><br/></p>
    <p><img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></div></li><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Tonight<br/><br/></p>
    <p><img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 52 °F</p></div></li><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Thursday<br/><br/></p>
    <p><img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 73 °F</p></div></li><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Thursday<br/>Night</p>
    <p><img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></div></li><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Friday<br/><br/></p>
    <p><img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 68 °F</p></div></li><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Friday<br/>Night</p>
    <p><img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Saturday<br/><br/></p>
    <p><img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 64 °F</p></div></li><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Saturday<br/>Night</p>
    <p><img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Sunday<br/><br/></p>
    <p><img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 66 °F</p></div></li></ul></div>
    <script type="text/javascript">
    // equalize forecast heights
    $(function () {
        var maxh = 0;
        $(".forecast-tombstone .short-desc").each(function () {
            var h = $(this).height();
            if (h > maxh) { maxh = h; }
        });
        $(".forecast-tombstone .short-desc").height(maxh);
    });
    </script> </div>
    
    

    At the end of this output we can see that the extra JavaScript sits inside the outer div, i.e. div class="panel-body" id="seven-day-forecast-body", while the inner div id="seven-day-forecast-container" does not contain it. That makes the fix easy: change id="seven-day-forecast-body" to id="seven-day-forecast-container".
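Switching to the inner container works because the script is a sibling of that div, not a child of it. A more general alternative, which is not what these notes go on to do, is to delete every script tag from the tree with Tag.decompose() before extracting text. The sample below is a hypothetical miniature of the forecast markup. Recent Beautiful Soup versions may already leave script contents out of get_text(), but decompose() also removes the script from str() and prettify() output.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature of the forecast markup: the <script> sits inside the
# outer "seven-day-forecast-body" div, next to the inner container.
sample = """
<div class="panel-body" id="seven-day-forecast-body">
  <div id="seven-day-forecast-container">
    <p class="period-name">Today</p><p class="short-desc">Sunny</p>
  </div>
  <script type="text/javascript">// equalize forecast heights</script>
</div>
"""
soup = BeautifulSoup(sample, "html.parser")
body = soup.find(id="seven-day-forecast-body")

# Remove every <script> element, together with its contents, from the tree in place.
for script in body.find_all("script"):
    script.decompose()

text = body.get_text()
print(text)
```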

    from bs4 import BeautifulSoup
    soup=BeautifulSoup(web_page,'html.parser')
    print(soup.find(id='seven-day-forecast-container'))
    <div id="seven-day-forecast-container"><ul class="list-unstyled" id="seven-day-forecast-list"><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Today<br/><br/></p>
    <p><img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 74 °F</p></div></li><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Tonight<br/><br/></p>
    <p><img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 52 °F</p></div></li><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Thursday<br/><br/></p>
    <p><img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 73 °F</p></div></li><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Thursday<br/>Night</p>
    <p><img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/></p><p class="short-desc">Clear</p><p class="temp temp-low">Low: 51 °F</p></div></li><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Friday<br/><br/></p>
    <p><img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 68 °F</p></div></li><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Friday<br/>Night</p>
    <p><img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Saturday<br/><br/></p>
    <p><img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 64 °F</p></div></li><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Saturday<br/>Night</p>
    <p><img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/></p><p class="short-desc">Mostly Clear</p><p class="temp temp-low">Low: 50 °F</p></div></li><li class="forecast-tombstone">
    <div class="tombstone-container">
    <p class="period-name">Sunday<br/><br/></p>
    <p><img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/></p><p class="short-desc">Sunny</p><p class="temp temp-high">High: 66 °F</p></div></li></ul></div>
    
    

    That looks much cleaner: the JavaScript is finally gone. Let's rerun the earlier step:

    from bs4 import BeautifulSoup
    soup=BeautifulSoup(web_page,'html.parser')
    print(soup.find(id='seven-day-forecast-container').get_text())
    
    
    Today
    SunnyHigh: 74 °F
    
    Tonight
    ClearLow: 52 °F
    
    Thursday
    SunnyHigh: 73 °F
    
    ThursdayNight
    ClearLow: 51 °F
    
    Friday
    SunnyHigh: 68 °F
    
    FridayNight
    Mostly ClearLow: 50 °F
    
    Saturday
    SunnyHigh: 64 °F
    
    SaturdayNight
    Mostly ClearLow: 50 °F
    
    Sunday
    SunnyHigh: 66 °F
    
    

    This is still not easy to pull individual fields out of, so let's prettify it and see how to extract the information we need:

    from bs4 import BeautifulSoup
    soup=BeautifulSoup(web_page,'html.parser')
    print(soup.find(id='seven-day-forecast-container').prettify())
    <div id="seven-day-forecast-container">
     <ul class="list-unstyled" id="seven-day-forecast-list">
      <li class="forecast-tombstone">
       <div class="tombstone-container">
        <p class="period-name">
         Today
         <br/>
         <br/>
        </p>
        <p>
         <img alt="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. " class="forecast-icon" src="newimages/medium/skc.png" title="Today: Sunny, with a high near 74. Light and variable wind becoming north northeast 5 to 7 mph in the morning. "/>
        </p>
        <p class="short-desc">
         Sunny
        </p>
        <p class="temp temp-high">
         High: 74 °F
        </p>
       </div>
      </li>
      <li class="forecast-tombstone">
       <div class="tombstone-container">
        <p class="period-name">
         Tonight
         <br/>
         <br/>
        </p>
        <p>
         <img alt="Tonight: Clear, with a low around 52. North wind around 6 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Tonight: Clear, with a low around 52. North wind around 6 mph. "/>
        </p>
        <p class="short-desc">
         Clear
        </p>
        <p class="temp temp-low">
         Low: 52 °F
        </p>
       </div>
      </li>
      <li class="forecast-tombstone">
       <div class="tombstone-container">
        <p class="period-name">
         Thursday
         <br/>
         <br/>
        </p>
        <p>
         <img alt="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. " class="forecast-icon" src="newimages/medium/skc.png" title="Thursday: Sunny, with a high near 73. Light and variable wind becoming west 5 to 10 mph in the afternoon. "/>
        </p>
        <p class="short-desc">
         Sunny
        </p>
        <p class="temp temp-high">
         High: 73 °F
        </p>
       </div>
      </li>
      <li class="forecast-tombstone">
       <div class="tombstone-container">
        <p class="period-name">
         Thursday
         <br/>
         Night
        </p>
        <p>
         <img alt="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. " class="forecast-icon" src="newimages/medium/nskc.png" title="Thursday Night: Clear, with a low around 51. West southwest wind 6 to 11 mph. "/>
        </p>
        <p class="short-desc">
         Clear
        </p>
        <p class="temp temp-low">
         Low: 51 °F
        </p>
       </div>
      </li>
      <li class="forecast-tombstone">
       <div class="tombstone-container">
        <p class="period-name">
         Friday
         <br/>
         <br/>
        </p>
        <p>
         <img alt="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. " class="forecast-icon" src="newimages/medium/few.png" title="Friday: Sunny, with a high near 68. Light west northwest wind becoming west 9 to 14 mph in the morning. Winds could gust as high as 18 mph. "/>
        </p>
        <p class="short-desc">
         Sunny
        </p>
        <p class="temp temp-high">
         High: 68 °F
        </p>
       </div>
      </li>
      <li class="forecast-tombstone">
       <div class="tombstone-container">
        <p class="period-name">
         Friday
         <br/>
         Night
        </p>
        <p>
         <img alt="Friday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Friday Night: Mostly clear, with a low around 50."/>
        </p>
        <p class="short-desc">
         Mostly Clear
        </p>
        <p class="temp temp-low">
         Low: 50 °F
        </p>
       </div>
      </li>
      <li class="forecast-tombstone">
       <div class="tombstone-container">
        <p class="period-name">
         Saturday
         <br/>
         <br/>
        </p>
        <p>
         <img alt="Saturday: Sunny, with a high near 64." class="forecast-icon" src="newimages/medium/few.png" title="Saturday: Sunny, with a high near 64."/>
        </p>
        <p class="short-desc">
         Sunny
        </p>
        <p class="temp temp-high">
         High: 64 °F
        </p>
       </div>
      </li>
      <li class="forecast-tombstone">
       <div class="tombstone-container">
        <p class="period-name">
         Saturday
         <br/>
         Night
        </p>
        <p>
         <img alt="Saturday Night: Mostly clear, with a low around 50." class="forecast-icon" src="newimages/medium/nfew.png" title="Saturday Night: Mostly clear, with a low around 50."/>
        </p>
        <p class="short-desc">
         Mostly Clear
        </p>
        <p class="temp temp-low">
         Low: 50 °F
        </p>
       </div>
      </li>
      <li class="forecast-tombstone">
       <div class="tombstone-container">
        <p class="period-name">
         Sunday
         <br/>
         <br/>
        </p>
        <p>
         <img alt="Sunday: Sunny, with a high near 66." class="forecast-icon" src="newimages/medium/few.png" title="Sunday: Sunny, with a high near 66."/>
        </p>
        <p class="short-desc">
         Sunny
        </p>
        <p class="temp temp-high">
         High: 66 °F
        </p>
       </div>
      </li>
     </ul>
    </div>
    
    
    

    From the HTML above, the information we need sits in three classes: period-name, short-desc and temp.

    soup_forecast = soup.find(id='seven-day-forecast-container')
    soup_forecast.find_all(class_='period-name')
    [<p class="period-name">Today<br/><br/></p>,
     <p class="period-name">Tonight<br/><br/></p>,
     <p class="period-name">Thursday<br/><br/></p>,
     <p class="period-name">Thursday<br/>Night</p>,
     <p class="period-name">Friday<br/><br/></p>,
     <p class="period-name">Friday<br/>Night</p>,
     <p class="period-name">Saturday<br/><br/></p>,
     <p class="period-name">Saturday<br/>Night</p>,
     <p class="period-name">Sunday<br/><br/></p>]
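The three find_all calls return three separate lists that only line up as long as every period has all three fields. A sketch of another way to walk the same structure: loop over each li.forecast-tombstone and read its three fields from inside it. It also shows that class_='temp' matches both "temp temp-high" and "temp temp-low", because Beautiful Soup treats class as a multi-valued attribute. The sample markup is a hypothetical two-period excerpt shaped like the prettified output above.

```python
from bs4 import BeautifulSoup

# Hypothetical two-period excerpt shaped like the prettified forecast markup.
sample = """
<ul id="seven-day-forecast-list">
  <li class="forecast-tombstone"><div class="tombstone-container">
    <p class="period-name">Today<br/><br/></p>
    <p class="short-desc">Sunny</p>
    <p class="temp temp-high">High: 74 °F</p>
  </div></li>
  <li class="forecast-tombstone"><div class="tombstone-container">
    <p class="period-name">Thursday<br/>Night</p>
    <p class="short-desc">Clear</p>
    <p class="temp temp-low">Low: 51 °F</p>
  </div></li>
</ul>
"""
soup = BeautifulSoup(sample, "html.parser")

rows = []
for tomb in soup.find_all("li", class_="forecast-tombstone"):
    # class_="temp" matches "temp temp-high" and "temp temp-low": class is
    # multi-valued, and a match on any single class counts.
    period = tomb.find(class_="period-name")
    desc = tomb.find(class_="short-desc")
    temp = tomb.find(class_="temp")
    # Skip a period with a missing field instead of shifting every later row.
    if period is None or desc is None or temp is None:
        continue
    rows.append((period.get_text(strip=True), desc.get_text(strip=True), temp.get_text(strip=True)))

for row in rows:
    print(row)
```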
    

    3. Finally, print all the information we need

    soup_forecast=soup.find(id='seven-day-forecast-container')
    date_list=soup_forecast.find_all(class_='period-name')
    desc_list=soup_forecast.find_all(class_='short-desc')
    temp_list=soup_forecast.find_all(class_='temp')
    # zip walks the three lists in step, so the loop does not depend on
    # the page showing exactly 9 periods
    for date_tag, desc_tag, temp_tag in zip(date_list, desc_list, temp_list):
        date=date_tag.get_text()
        desc=desc_tag.get_text()
        temp=temp_tag.get_text()
        print("{} {} {}".format(date,desc,temp))
    Today Sunny High: 74 °F
    Tonight Clear Low: 52 °F
    Thursday Sunny High: 73 °F
    ThursdayNight Clear Low: 51 °F
    Friday Sunny High: 68 °F
    FridayNight Mostly Clear Low: 50 °F
    Saturday Sunny High: 64 °F
    SaturdayNight Mostly Clear Low: 50 °F
    Sunday Sunny High: 66 °F
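In the output above, get_text() joins the text nodes on either side of br with nothing in between, which is why Thursday<br/>Night prints as ThursdayNight, and why the earlier get_text() on the whole container gave SunnyHigh: 74 °F. get_text() accepts a separator and a strip flag; a small sketch on cells copied from the page markup:

```python
from bs4 import BeautifulSoup

# One period-name cell and one tombstone, copied from the markup printed earlier.
cell = BeautifulSoup('<p class="period-name">Thursday<br/>Night</p>', "html.parser").p
tomb = BeautifulSoup(
    '<div class="tombstone-container"><p class="short-desc">Sunny</p>'
    '<p class="temp temp-high">High: 74 °F</p></div>',
    "html.parser",
).div

# Default: text nodes are concatenated with nothing in between.
joined = cell.get_text()
# The separator goes between text nodes; strip=True trims each node and drops empty ones.
spaced = cell.get_text(" ", strip=True)
tomb_text = tomb.get_text(" | ", strip=True)

print(joined)     # ThursdayNight
print(spaced)     # Thursday Night
print(tomb_text)  # Sunny | High: 74 °F
```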
    
    

    Complete code:

    # Import the packages and modules we need: urllib.request and BeautifulSoup
    import urllib.request as urlrequest
    from bs4 import BeautifulSoup
    
    # Fetch the page we want to scrape with urllib
    weather_url='http://forecast.weather.gov/MapClick.php?lat=37.77492773500046&lon=-122.41941932299972'
    web_page=urlrequest.urlopen(weather_url).read()
    
    # Parse the page with BeautifulSoup and grab the content block we want
    soup=BeautifulSoup(web_page,'html.parser')
    soup_forecast=soup.find(id='seven-day-forecast-container')
    
    # Find the parts of the content we want
    date_list=soup_forecast.find_all(class_='period-name')
    desc_list=soup_forecast.find_all(class_='short-desc')
    temp_list=soup_forecast.find_all(class_='temp')
    
    # Display the results more readably with a for loop; zip walks the
    # three lists in step instead of assuming exactly 9 periods
    for date_tag, desc_tag, temp_tag in zip(date_list, desc_list, temp_list):
        date=date_tag.get_text()
        desc=desc_tag.get_text()
        temp=temp_tag.get_text()
        print("{} {} {}".format(date,desc,temp))
    Today Sunny High: 74 °F
    Tonight Clear Low: 52 °F
    Thursday Sunny High: 73 °F
    ThursdayNight Clear Low: 51 °F
    Friday Sunny High: 68 °F
    FridayNight Mostly Clear Low: 50 °F
    Saturday Sunny High: 64 °F
    SaturdayNight Mostly Clear Low: 50 °F
    Sunday Sunny High: 66 °F
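If these rows are going to feed later analysis, the temperature strings can be split into a label and a number. This is an extra step beyond the original notes: a sketch that assumes every temperature looks like the "High: 74 °F" / "Low: 52 °F" strings printed above and parses them with a regular expression; parse_temp is a hypothetical helper name.

```python
import re

# Assumes temperatures look like the strings printed above, e.g. "High: 74 °F".
TEMP_PATTERN = re.compile(r"(High|Low):\s*(-?\d+)")


def parse_temp(text):
    """Split a forecast temperature string into (label, degrees Fahrenheit).

    Returns None when the text does not look like "High: NN" or "Low: NN".
    """
    match = TEMP_PATTERN.search(text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


readings = [parse_temp(s) for s in ["High: 74 °F", "Low: 52 °F", "Low: 50 °F"]]
print(readings)
```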
    
    
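    Copyright notice: This is an original article by weixin_33810006, published under the CC 4.0 BY-SA license. If you repost it, please include a link to the original source and this notice.
    Article link: https://blog.csdn.net/weixin_33810006/article/details/89935401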
