Stuck with Tutorial "web scraping and BeautifulSoup"

My Code:

    # import libraries
    import pandas as pd
    import re
    import requests
    from bs4 import BeautifulSoup
    from time import sleep
    from random import randint
    from time import time
    from warnings import warn 


    # Redeclaring the lists to store data in
    names = []
    years = []
    imdb_ratings = []
    metascores = []
    votes = []
    pages = [str(i) for i in range(1,3)]
    years_url = [str(i) for i in range(2010,2018)]


    # Preparing the monitoring of the loop
    start_time = time()
    requests = 0


    # For every year in the interval 2010-2017
    for year_url in years_url:
        
        # For every page in the interval 1-2
        for page in pages:

            # Make a get request
            response = get('http://www.imdb.com/search/title?release_date=' + year_url + '&sort=num_votes,desc&page=' + page, headers = headers)
            # Pause the loop
            sleep(randint(8,15))

            # Monitor the requests
            requests += 1
            elapsed_time = time() - start_time
            print('Request:{}; Frequency: {} requests/s'.format(requests, requests/elapsed_time))
            clear_output(wait = True)

            # Throw a warning for non-200 status codes
            if response.status_code != 200:
                warn('Request: {}; Status code: {}'.format(requests, response.status_code))

            # Break the loop if the number of requests is greater than expected
            if requests > 10:
                warn('Number of requests was greater than expected.')
                break

            # Parse the content of the request with BeautifulSoup
            page_html = BeautifulSoup(response.text, 'html.parser')

            # Select all the 50 movie containers from a single page
            mv_containers = page_html.find_all('div', class_ = 'lister-item mode-advanced')

            # For every movie of these 50
            for container in mv_containers:
                # If the movie has a Metascore, then:
                if container.find('div', class_ = 'ratings-metascore') is not None:

                    # Scrape the name
                    name = container.h3.a.text
                    names.append(name)

                    # Scrape the year
                    year = container.h3.find('span', class_ = 'lister-item-year').text
                    years.append(year)

                    # Scrape the IMDB rating
                    imdb = float(container.strong.text)
                    imdb_ratings.append(imdb)

                    # Scrape the Metascore
                    m_score = container.find('span', class_ = 'metascore').text
                    metascores.append(int(m_score))

                    # Scrape the number of votes
                    vote = container.find('span', attrs = {'name':'nv'})['data-value']
                    votes.append(int(vote))

What I expected to happen:
As per the tutorial:

Request:72; Frequency: 0.07928964663062842 requests/s

What actually happened:

    NameError                                 Traceback (most recent call last)

    <ipython-input-96-383b3ebfb335> in <module>()
         46 
         47         # Make a get request
    ---> 48         response = get('http://www.imdb.com/search/title?release_date=' + year_url +
         49         '&sort=num_votes,desc&page=' + page, headers = headers)
         50         # Pause the loop

    NameError: name 'get' is not defined

I tried changing response = get() to response = requests.get() and tinkering with the rest of the code, but nothing seems to work.

Everything before this part worked as expected.

I am learning little by little.
Thanks for your help.

Regards

Hello @EricGon, welcome to the community!

As you can see in the error message, get is not defined.

 NameError: name 'get' is not defined

You need to either call requests.get() or replace import requests with from requests import get.

Another problem, however, is the use of the name requests as a variable:

 # Monitor the requests
 requests += 1

If you use requests.get(), it still won't work: requests = 0 (and later requests += 1) rebinds the name requests, so by the time the loop runs it no longer refers to the requests library but to a number, and requests.get() raises an AttributeError.
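The name clash can be reproduced in isolation. Here is a minimal sketch that uses the standard-library math module as a stand-in for requests (an assumption made only so the example runs without a network connection or extra packages); the variable error_message is likewise just illustrative:

```python
import math  # stands in for `requests` in this sketch

# At this point the name `math` still refers to the module
print(math.sqrt(16))

# Rebinding the name, like `requests = 0` in the question
math = 0

try:
    # The name now refers to an int, which has no `sqrt` attribute
    math.sqrt(16)
except AttributeError as err:
    error_message = str(err)
    print(error_message)
```

The same thing happens to requests.get() once requests has been reassigned to a number.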

Try using from requests import get or renaming the requests variable in the code and it should work.
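As a rough sketch of the renaming option, the monitoring part of the loop could look like the code below. The function name monitored_fetch, its fetch parameter, and the counter name request_count are hypothetical names chosen for illustration; they are not from the tutorial. A stub replaces the real HTTP call so the example runs offline.

```python
from time import time


def monitored_fetch(urls, fetch, max_requests=10):
    """Call fetch(url) for each URL, counting calls in `request_count`.

    `fetch` is a hypothetical stand-in for `requests.get`.
    """
    start_time = time()
    request_count = 0  # renamed counter: does not shadow the `requests` module
    responses = []

    for url in urls:
        responses.append(fetch(url))

        # Monitor the requests
        request_count += 1
        # Guard against a zero elapsed time on very fast loops
        elapsed_time = max(time() - start_time, 1e-9)
        print('Request:{}; Frequency: {} requests/s'.format(
            request_count, request_count / elapsed_time))

        # Stop once the number of requests exceeds the expected maximum
        if request_count > max_requests:
            break

    return request_count, responses


# Offline demonstration with a stub instead of a real HTTP call
count, results = monitored_fetch(['page-1', 'page-2'], fetch=str.upper)
```

With the real library installed, the call would presumably be monitored_fetch(urls, fetch=requests.get) after import requests, since the module name is now left untouched.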


@Sahil, I believe that’s a problem in this tutorial. It may not cause an error in the tutorial if you use from requests import get, but it’s not good practice to overwrite a library name with a variable. This, as you can see, can confuse new programmers.

I’m not sure if you’re the one I should report this to, but I don’t know who else I could report to :sweat_smile:


Hi guys,

Thanks for your help.

I sorted it out an hour or two after posting my question.
The issue is in the tutorial.

As @otavios.s pointed out, the issue is the use of ‘requests’ as a variable.
I changed it to ‘requ’ (together with all its other occurrences) and it ran without an issue.

Thanks again for your help.


Totally agree with you @otavios.s, I will contact the author for correction.

Best,
Sahil
