2
votes

AttributeError: 'list' object has no attribute 'h3' (BeautifulSoup)

I am a beginner at web scraping and I am following this tutorial (https://www.dataquest.io/blog/web-scraping-beautifulsoup/) to extract movie data. I think I have defined "first_movie" incorrectly.

Here is the code:

XXX

I get this error:

Traceback (most recent call last):
  File "mov1.py", line 13, in <module>
    first_name = first_movie.h3.a.text
AttributeError: 'list' object has no attribute 'h3'

python html web-scraping beautifulsoup

3 comments

What do you want to do with h3?

@Jeppe That would not work, because first_movie has no elements; it is an empty list.

@MatiasCicero Sorry, I misread. html_soup.find_all returns a list. Each of its elements can contain an h3, for example movie_containers[0].h3.a.text. See the documentation.

4 Answers:

2
votes

find_all always returns a list, even when only one element matches.

Instead of accessing .h3 on the list itself, loop over its elements:

for movie in movie_containers:
    print(movie.find("h3").find("a").text)

This prints the movie titles, for example:

Valerian and the City of a Thousand Planets
Baywatch
Darkest Hour
American Made
La Casa de Papel
Mindhunter
Transformers: The Last Knight
The Handmaid's Tale
The Lego Batman Movie
The Disaster Artist
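To make the point about find_all concrete, here is a minimal offline sketch. The HTML string is hypothetical markup that only imitates the class names used in this thread (it is not the real IMDb page), and it assumes bs4 is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on the class names used in this thread.
HTML = """
<div class="lister-item mode-advanced"><h3><a href="/t1">Logan</a></h3></div>
<div class="lister-item mode-advanced"><h3><a href="/t2">Wonder Woman</a></h3></div>
"""

soup = BeautifulSoup(HTML, "html.parser")
movie_containers = soup.find_all("div", class_="lister-item mode-advanced")

# find_all gives back a list-like collection of Tag objects, never a single Tag.
print(isinstance(movie_containers, list))

# Attribute access such as .h3 fails on the collection itself.
try:
    movie_containers.h3
    raised = False
except AttributeError as exc:
    raised = True
    print("AttributeError:", exc)

# It works on each element of the collection.
titles = [movie.find("h3").find("a").text for movie in movie_containers]
print(titles)
```

The same loop works unchanged on the result of a real request, as long as the page actually contains elements with that class.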

0 comments

1
votes

Try the following code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.imdb.com/search/title?release_date=2017&sort=num_votes,desc&page=1'
# Send a browser-like User-Agent header with the request
r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'})
soup = BeautifulSoup(r.content, 'html.parser')
# Each title link sits inside an <h3 class="lister-item-header">
items = soup.find_all('h3', class_='lister-item-header')
for item in items:
    print(item.find('a').text)

Result:

Logan
Wonder Woman
Guardians of the Galaxy: Vol. 2
Thor: Ragnarok
Dunkirk
Star Wars: Episode VIII - The Last Jedi
Spider-Man: Homecoming
Get Out
Blade Runner 2049
Baby Driver
It
Three Billboards Outside Ebbing, Missouri
Justice League
The Shape of Water
John Wick: Chapter 2
Coco
Jumanji: Welcome to the Jungle
Beauty and the Beast
Kong: Skull Island
Kingsman: The Golden Circle
Pirates of the Caribbean: Salazar's Revenge
Alien: Covenant
13 Reasons Why
War for the Planet of the Apes
The Greatest Showman
Life
Fast & Furious 8
Murder on the Orient Express
Lady Bird
Ghost in the Shell
King Arthur: Legend of the Sword
Wind River
The Hitman's Bodyguard
Mother!
The Mummy
Call Me by Your Name
Atomic Blonde
The Punisher
Bright
I, Tonya
Valerian and the City of a Thousand Planets
Baywatch
Darkest Hour
American Made
La Casa de Papel
Mindhunter
Transformers: The Last Knight
The Handmaid's Tale
The Lego Batman Movie
The Disaster Artist

0 comments

1
votes

first_movie is not assigned a single element; replace the movie_containers assignment with it. Use find() to select the first element:

first_movie = html_soup.find('div', class_='lister-item mode-advanced')
first_name = first_movie.h3.a.text

or use find_all() with an index:

first_movie = html_soup.find_all('div', class_='lister-item mode-advanced')[0]
first_name = first_movie.h3.a.text
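The two variants behave differently when nothing matches, which is the situation raised in the comments about an empty list: find() returns None, while find_all()[0] raises IndexError. The following is a rough sketch of a guard for that case; first_title is a hypothetical helper name and the HTML is made-up sample markup, not the real page.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; the real page may be structured differently.
HTML = '<div class="lister-item mode-advanced"><h3><a href="/t1">Logan</a></h3></div>'
soup = BeautifulSoup(HTML, "html.parser")


def first_title(soup, css_class):
    """Return the title of the first matching container, or None if absent."""
    container = soup.find("div", class_=css_class)
    # find() returns None when nothing matches, so check before chaining.
    if container is None or container.h3 is None or container.h3.a is None:
        return None
    return container.h3.a.text


found = first_title(soup, "lister-item mode-advanced")
missing = first_title(soup, "no-such-class")
print(found, missing)

# find_all() returns an empty list when nothing matches, so [0] raises IndexError.
try:
    soup.find_all("div", class_="no-such-class")[0]
    index_error = False
except IndexError:
    index_error = True
print(index_error)
```

Checking for None (or for an empty list) before chaining .h3.a.text turns a confusing traceback into an explicit "nothing matched" result.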

0 comments

1
votes

A nice short selector that uses the adjacent sibling combinator to get the a tag immediately following the element with class lister-item-index:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.imdb.com/search/title?release_date=2017&sort=num_votes,desc&page=1')
soup = bs(r.content, 'lxml')
titles = [item.text for item in soup.select('.lister-item-index + a')]
print(titles)
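For readers unfamiliar with the adjacent sibling combinator (+), here is a small offline illustration. The markup is a hypothetical imitation of the list header structure, and html.parser is used instead of lxml only so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the header structure; not the real page.
HTML = """
<h3 class="lister-item-header">
  <span class="lister-item-index">1.</span>
  <a href="/t1">Logan</a>
  <span class="lister-item-year">(2017)</span>
  <a href="/x">Other link</a>
</h3>
<h3 class="lister-item-header">
  <span class="lister-item-index">2.</span>
  <a href="/t2">Wonder Woman</a>
</h3>
"""

soup = BeautifulSoup(HTML, "html.parser")

# "X + a" matches an <a> only when it is the element directly after X,
# so the second link in the first header is not selected.
titles = [a.text for a in soup.select(".lister-item-index + a")]
print(titles)
```

Whitespace between the tags does not matter here, because the combinator looks at the next sibling element rather than the next text node.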

0 comments