Scraping multiple web pages with a while loop

To complete this tutorial, we’ll need to use the same libraries from the previous article, so don’t forget to import them: `from bs4 import BeautifulSoup as bs` (Remember: `%matplotlib inline` is necessary for the later data visualizations to appear if you write your code in Jupyter Notebook.)

What we’ll do in this article will be very similar to what we’ve already accomplished so far, but with more data: we’ll analyze not 30, but 1020 books. For this reason we’ll reuse (with some small modifications) the code we’ve already written to get the titles, formats, publication years and prices of the bestseller books.

To scrape multiple pages, we’ll use a while loop and the page parameters in the URLs. Keep in mind that the bestsellers list is updated daily, so don’t freak out if you don’t get the same data that is shown in this tutorial.

For starters, it’s always a good idea to build your code up step by step, so if you run into an error, you’ll immediately know which part of your code needs some rethinking.
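Here is a minimal sketch of the while loop idea, not the tutorial’s final code. The base URL is a placeholder, and `fetch_page` and `fake_fetch` are hypothetical names introduced only for illustration: `fetch_page` stands for whatever function downloads one page and returns the items found on it, and `fake_fetch` is a stand-in that lets the sketch run without touching the network.

```python
# Placeholder base URL for illustration only (an assumption, not the real site address).
BASE_URL = "https://example.com/bestsellers"
LAST_PAGE = 34  # number of bestseller pages mentioned in this tutorial


def scrape_all_pages(fetch_page, last_page=LAST_PAGE):
    """Collect the items from pages 1..last_page with a while loop.

    fetch_page is a hypothetical callable: it receives a page URL and
    returns the list of items scraped from that page.
    """
    results = []
    page = 1
    while page <= last_page:
        url = f"{BASE_URL}?page={page}"   # the page parameter selects the page
        results.extend(fetch_page(url))   # add this page's items to the total
        page += 1                         # move on to the next page
    return results


def fake_fetch(url):
    """Stand-in fetcher: pretends every page holds 30 books."""
    return [f"{url} book {i}" for i in range(30)]


books = scrape_all_pages(fake_fetch)
print(len(books))  # 34 pages * 30 books = 1020
```

In a real run, `fetch_page` would request the URL and parse the response with BeautifulSoup, reusing the code from the previous article. Keeping the loop separate from the fetching makes it easy to test the loop on a couple of pages before scraping all of them.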
By assigning a certain number to page, we are able to request the bestsellers page corresponding to that number. Now, let’s put this knowledge to good use.
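Assigning a number to page can be sketched as a small helper that builds the URL for any page number. The base URL below is a placeholder and `page_url` is a hypothetical helper name, both assumptions made for this example.

```python
from urllib.parse import urlencode

# Placeholder base URL for illustration only (an assumption, not the real site address).
BASE_URL = "https://example.com/bestsellers"


def page_url(page_number, base_url=BASE_URL):
    """Return the URL of the bestsellers page with the given number."""
    query = urlencode({"page": page_number})  # builds "page=<number>"
    return f"{base_url}?{query}"


print(page_url(28))  # https://example.com/bestsellers?page=28
```

Using `urlencode` instead of plain string concatenation is a small design choice: it keeps working if more key-value pairs are added to the query string later.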
Truth is, there are actually 34 pages of bestseller books that we can scrape. How do we get to the rest of them? By first inspecting what’s happening in the URL when we switch pages.

By going to the second page, you’ll notice that the URL changes: the only difference is that ?page=2 has been appended to the base URL. Now let’s check out what happens if we visit the third page: ?page=2 turns into ?page=3. Can you see where I’m going with this? It seems that by changing the number after page=, we can go to whichever page we want to. Let’s try this out real quick by replacing 3 with 28.

But wait… what about the first page? It had no ?page=number in it! Lucky for us, the first page shows the same book results with or without the page parameter, so it seems that we’ve found a reliable solution that we can use to navigate between web pages by changing the URL.

Shortly I’ll show you how you can bring this knowledge over to web scraping, but first a quick explanation to the curious minds out there as to what the heck this ?page=number thing is exactly. The ? part of a URL signifies the start of the so-called query string. Anything that comes after the ? is the query string itself, which contains key-value pairs. In our case page is the key and the number we assign to it is its value.
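To make the key-value idea concrete, here is a short sketch that splits a URL and reads its query string with Python’s standard library. The URL is a placeholder chosen for illustration, not the site’s real address.

```python
from urllib.parse import urlsplit, parse_qs

# Placeholder URL for illustration only (an assumption).
url = "https://example.com/bestsellers?page=3"

parts = urlsplit(url)
print(parts.query)   # page=3  (everything after the "?")

# parse_qs maps every key to a list of values, because a key may repeat.
params = parse_qs(parts.query)
print(params)        # {'page': ['3']}

page_number = int(params["page"][0])
print(page_number)   # 3
```

Note that the value arrives as a string, so it has to be converted with `int()` before doing any arithmetic on the page number.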
While in the previous article you learned to crawl, now it’s time for you to stand up and learn to walk. If you recall, in the previous part of this tutorial series we scraped only the first bestsellers page of Book Depository.
Scraping one web page is fun, but scraping more web pages is more fun. In this tutorial you’ll learn how to do just that; along the way you’ll also make good use of your collected data by doing some visualizations and analyses.