I tried to scrape data from multiple pages of an online shopping website in one run. Below is my code.
PS: This is my personal project
My Code:
import requests
from bs4 import BeautifulSoup

# Setup not shown in the original snippet (assumed): 21 result pages and empty lists
pages = range(1, 22)
descriptions, processors, ram, os, storage = [], [], [], [], []
inches, warranty, prices, ratings, exchange_off = [], [], [], [], []

for page in pages:
    # print(page)
    response = requests.get("https://www.flipkart.com/search?q=laptops&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page={}".format(page)).text  # URL to scrape
    # content = response.content  # To get the content
    soup = BeautifulSoup(response, 'html.parser')
    # print(soup.prettify())
    # The logic below scrapes the data for each container we want from all the pages
    desc = soup.find_all('div', class_='_3wU53n')  # Extracting descriptions of each laptop
    for i in range(len(desc)):
        descriptions.append(desc[i].text)
len(descriptions)
commonclass = soup.find_all('li', class_='tVe95H')  # This class applies to all the features listed below
for i in range(0, len(commonclass)):
    p = commonclass[i].text  # Extracting the text from the tags
    if 'Core' in p:
        processors.append(p)
        # print(processors)
    elif 'RAM' in p:
        ram.append(p)
        # print(ram)
    elif 'Operating' in p:
        os.append(p)
        # print(os)
    elif 'HDD' in p or 'SSD' in p:
        storage.append(p)
        # print(storage)
    elif 'Display' in p:
        inches.append(p)
        # print(inches)
    elif 'Warranty' in p:
        warranty.append(p)
        # print(warranty)
price = soup.find_all('div', class_='_1vC4OE _2rQ-NK')  # Extracting price of each laptop
for i in range(len(price)):
    prices.append(price[i].text)
len(prices)
rating = soup.find_all('div', class_='hGSR34')  # Extracting rating of each laptop
for i in range(len(rating)):
    ratings.append(rating[i].text)
len(ratings)
exchange = soup.find_all('div', class_='_3_G5Wj')  # Extracting exchange offer for each laptop
for i in range(len(exchange)):
    exchange_off.append(exchange[i].text)
len(exchange_off)
print(len(descriptions))
print(len(processors))
print(len(ram))
print(len(os))
print(len(storage))
print(len(inches))
print(len(warranty))
print(len(prices))
print(len(ratings))
print(len(exchange_off))
What I expected to happen:
I expected the printed lengths of all the feature lists to be roughly the same.
What actually happened:
The printed lengths vary widely from one feature to another:
504
23
24
25
24
25
22
24
35
59
My concern is that the lengths printed for the features vary widely.
For example, there are 504 descriptions in total from all 21 pages, while the other features such as processors, RAM and storage have lengths like 23, 24 and 25. I would expect the other features to be in the 400+ range as well, which is not the case. What is the reason for this difference compared with the descriptions? Is this expected behaviour, or am I missing something in my code?
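One way to narrow this down might be to tally the features page by page instead of only printing the totals at the end. The sketch below is not the scraping code itself and makes no network requests: `KEYWORDS`, `classify`, `count_features` and the sample `pages` dictionary are hypothetical names and made-up data standing in for the `li` texts of each page. Only the keyword order is taken from the if/elif chain in my code.

```python
from collections import Counter

# Keyword -> feature name, in the same order as the if/elif chain in the post.
KEYWORDS = [
    ("Core", "processors"),
    ("RAM", "ram"),
    ("Operating", "os"),
    ("HDD", "storage"),
    ("SSD", "storage"),
    ("Display", "inches"),
    ("Warranty", "warranty"),
]


def classify(text):
    """Return the feature name for one <li> text, or None if nothing matches."""
    for keyword, feature in KEYWORDS:
        if keyword in text:
            return feature
    return None


def count_features(li_texts):
    """Count how many texts fall into each feature bucket."""
    counts = Counter()
    for text in li_texts:
        feature = classify(text)
        if feature is not None:
            counts[feature] += 1
    return counts


# Made-up sample data: page number -> texts of the feature <li> tags on that page.
pages = {
    1: ["Intel Core i5 Processor", "8 GB DDR4 RAM", "1 TB HDD"],
    2: ["AMD Ryzen 5 Quad Core Processor", "4 GB DDR4 RAM",
        "256 GB SSD", "1 Year Warranty"],
}

totals = Counter()
for page, li_texts in pages.items():
    per_page = count_features(li_texts)
    totals.update(per_page)  # accumulate across pages
    print(page, dict(per_page))
print("total", dict(totals))
```

If a feature's total equals the count of a single page, that feature was probably collected for one page only rather than for every page.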
This is my first time exploring web scraping. My objective in this project is to scrape all of these pages and collect the laptop information so that it can be used for further analysis.
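For the later analysis, one possible layout is a single record per laptop instead of ten separate lists, so that a laptop with a missing feature does not shift the other values out of line. This is only a sketch under assumptions: `FIELD_KEYWORDS`, `build_record` and the sample laptops are hypothetical, and it assumes the `li` texts can be collected separately for each product, which my code does not do yet.

```python
# Field name -> keywords that identify it, in the same order as the if/elif chain.
FIELD_KEYWORDS = [
    ("processor", ("Core",)),
    ("ram", ("RAM",)),
    ("os", ("Operating",)),
    ("storage", ("HDD", "SSD")),
    ("display", ("Display",)),
    ("warranty", ("Warranty",)),
]


def build_record(description, li_texts):
    """Build one dict per laptop; features that are not listed stay None."""
    record = {"description": description}
    for field, _ in FIELD_KEYWORDS:
        record[field] = None
    for text in li_texts:
        for field, keywords in FIELD_KEYWORDS:
            if any(keyword in text for keyword in keywords):
                if record[field] is None:  # keep the first value found
                    record[field] = text
                break  # first matching field wins, like if/elif
    return record


# Made-up sample data for two laptops; the second one lists fewer features.
laptops = [
    build_record("Laptop A", ["Intel Core i3 Processor (7th Gen)",
                              "4 GB DDR4 RAM", "1 TB HDD", "1 Year Warranty"]),
    build_record("Laptop B", ["8 GB DDR4 RAM", "256 GB SSD"]),
]
for laptop in laptops:
    print(laptop)
```

Every record has the same keys, so the list of dicts can be turned into a table later without the columns having different lengths.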
I am really looking forward to your help, community. Let me know if you need any further information.
Thanks in advance
Best
K!