I'm new to web scraping and wanted to loop through all the champion portraits on the League of Legends site. When I inspected one of the images in the browser, it was in a tag of the form <img src="url">, and I want to get that URL so I can download the image. But when I call soup.select('img[src]') or soup.select('img'), it returns an empty list and I don't know why.
Here is the code:
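    import bs4
    import requests

    # champions page (the URL used in the answers)
    website = "https://na.leagueoflegends.com/en/game-info/champions/"

    data = requests.get(website)
    data.raise_for_status()
    soup = bs4.BeautifulSoup(data.text, "lxml")
    print(soup)   # soup returns html
    elems = soup.select('img[src]')
    print(elems)  # elems returns an empty list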
3 Answers:
Here is your answer
OUTPUT:

    soup = BeautifulSoup(requests.get("https://na.leagueoflegends.com/en/game-info/champions/").text, 'lxml')
    soup.find_all('a')
    Out[21]:
    [<a href="/en/game-info/get-started/">Get Started</a>,
     <a href="/en/game-info/get-started/what-is-lol/">What is League of Legends?</a>,
     <a href="https://na.leagueoflegends.com/en/site/guide/index.html">New Player Guide</a>,
     <a href="/en/game-info/get-started/chat-commands/">Chat Commands</a>,
     <a href="/en/game-info/get-started/community-interaction/">Community Interaction</a>,
     <a href="/en/featured/summoners-code">The Summoner's Code</a>,
     <a href="/en/game-info/champions/">Champions</a>,
     <a href="/en/game-info/items/">Items</a>,
     <a href="/en/game-info/summoners/">Summoners</a>,
     <a href="/en/game-info/summoners/spells/">Summoner Spells</a>,
     <a href="/en/game-info/game-modes/">Game Modes</a>,
     <a href="/en/game-info/game-modes/summoners-rift/">Summoner's Rift</a>,
     <a href="/en/game-info/game-modes/the-twisted-treeline/">The Twisted Treeline</a>,
     <a href="/en/game-info/game-modes/howling-abyss/">Howling Abyss</a>,
     <a href="//na.leagueoflegends.com/en/">Home</a>,
     <a href="/en/game-info/">Game Info</a>]

    soup = BeautifulSoup(requests.get("https://na.leagueoflegends.com/en/game-info/champions/").text, 'lxml')
    soup.find_all('script')
    Out[22]:
    [<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='//www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-N98J');</script>,
     <script>window.ga = window.ga || function(){(ga.q=ga.q||[]).push(arguments)};ga.l = +new Date;</script>,
     <script src="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/modernizr.js" type="text/javascript"></script>,
     <script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>,
     <script src="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/riot-all.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/riot-kit-all.js" type="text/javascript"></script>,
     <script type="text/javascript">rg_force_language = 'en_US';rg_force_manifest = 'https://ddragon.leagueoflegends.com/realms/na.js';rg_assets = 'https://lolstatic-a.akamaihd.net/game-info/1.1.9';</script>,
     <script type="text/javascript">window.riotBarConfig = {touchpoints: {activeTouchpoint: 'game'},locale: {landingUrlPattern : 'https://na.leagueoflegends.com//game-info/'},footer: {enabled: true,container: {renderFooterInto: '#footer'}}};</script>,
     <script async="" src="https://lolstatic-a.akamaihd.net/riotbar/prod/latest/en_US.js"></script>,
     <script src="https://ddragon.leagueoflegends.com/cdn/dragonhead.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/riot-dd-utils.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/riot-dd-i18n.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/external/jquery.lazy-load.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/DDFilterApp.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/DDMarkupItem.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/DDMarkupContainer.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/champions/ChampionsListGridItem.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/champions/ChampionsListGridView.js" type="text/javascript"></script>,
     <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/champions/ChampionsListApp.js" type="text/javascript"></script>]

    soup = BeautifulSoup(requests.get("https://na.leagueoflegends.com/en/game-info/champions/").text, 'lxml')
    soup.find_all('link')
    Out[23]:
    [<link href="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/lol-kit.css" rel="stylesheet"/>,
     <link href="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/css/base-styles.css" rel="stylesheet"/>,
     <link href="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/resources/images/favicon.ico" rel="SHORTCUT ICON"/>]
    import requests
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(requests.get("https://na.leagueoflegends.com/en/game-info/champions/").text, 'lxml')
    soup.find_all('link')  # pass any tag name here, e.g. 'a', 'script', or 'link'
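This may be possible with requests alone, but it looks like your GET request is not receiving the full page source. Most likely the champion images are added by JavaScript after the page loads, so they are not in the HTML that requests downloads.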
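To see why the selector comes back empty, here is a minimal sketch that compares the HTML a plain GET request would receive with the HTML a browser holds after the scripts have run. Both HTML strings are hypothetical stand-ins, not the real page, and the helper names (ImgSrcCollector, img_sources) are made up for illustration. It uses only the standard library's html.parser so it runs without extra dependencies; soup.select('img[src]') would behave the same way on both strings.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def img_sources(html):
    """Return the src values of all <img> tags found in `html`."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources


# Hypothetical markup: what the server sends before any JavaScript runs.
static_html = """
<div id="champion-grid"></div>
<script src="ChampionsListApp.js"></script>
"""

# Hypothetical markup: the same page after a browser has run the script.
rendered_html = """
<div id="champion-grid">
  <img src="https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Aatrox.png">
</div>
"""

print(img_sources(static_html))    # no <img> tags in the raw response
print(img_sources(rendered_html))  # the tag exists only after rendering
```

A quick check on the real response is to test whether '<img' appears in data.text at all; if it does not, no selector will find the images.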
You can solve this by using Selenium to load the page in a real browser and read the rendered content.
    from selenium import webdriver
    import bs4

    driver = webdriver.Chrome()
    driver.get('https://na.leagueoflegends.com/en/game-info/champions/')
    page_source = driver.page_source  # HTML after JavaScript has run
    driver.quit()  # shut down the browser and the driver process

    soup = bs4.BeautifulSoup(page_source, "lxml")
    for elem in soup.find_all('img'):
        src = elem.get('src')  # None when an <img> tag has no src attribute
        if src:
            print(src)

Result:

    https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Aatrox.png
    https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Ahri.png
    https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Akali.png
    https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Alistar.png
    https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Amumu.png
    https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Anivia.png
    ...
Use the same endpoint that the page itself uses. You can find it in the Network tab of your browser's developer tools.
    import requests

    base = 'https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/'
    # champion.json lists every champion together with its image file name
    r = requests.get('https://ddragon.leagueoflegends.com/cdn/9.11.1/data/en_US/champion.json').json()
    images = [base + r['data'][item]['image']['full'] for item in r['data']]
    print(images)
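Since the goal in the question was to download the portraits, here is one possible sketch of that last step. It assumes the JSON has the shape used above (data, then champion key, then image, then full). The function names build_image_urls and download, the sample dictionary, and the portraits folder are all hypothetical. It uses urllib from the standard library to stay dependency-free; requests.get(url).content would work just as well.

```python
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen

# Base URL taken from the answer above; the version segment (9.11.1) will age.
BASE = "https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/"


def build_image_urls(champion_json, base=BASE):
    """Return one portrait URL per champion, sorted by champion key."""
    champions = champion_json["data"]
    return [base + champions[key]["image"]["full"] for key in sorted(champions)]


def download(url, dest_dir):
    """Save the file at `url` into `dest_dir` and return the local path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last path segment of the URL.
    target = dest_dir / Path(urlsplit(url).path).name
    with urlopen(url, timeout=30) as response:
        target.write_bytes(response.read())
    return target


# Hypothetical sample shaped like the "data" section of champion.json.
sample = {
    "data": {
        "Ahri": {"image": {"full": "Ahri.png"}},
        "Aatrox": {"image": {"full": "Aatrox.png"}},
    }
}
urls = build_image_urls(sample)
print(urls)

# Uncomment to fetch the files (needs network access):
# for url in urls:
#     print(download(url, "portraits"))
```

To run it against the live data, pass the parsed champion.json response (the r variable above) to build_image_urls instead of the sample.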