1
votes

Je fais du web scraping avec bs4 et les URL n'apparaissent pas

Je suis novice en matière de webscaping et je souhaitais parcourir tous les portraits de personnages du site lol et quand j'ai examiné l'une des images dans le navigateur, c'était dans une balise "img scr =" url "et je veux obtenir l'URL pour télécharger l'image, mais quand je fais de la soupe.select ('img [src ] ') ​​ou soup.select (' img ') il renvoie une liste vide et je ne sais pas pourquoi

​​voici le code:

data=requests.get(website)
data.raise_for_status()


soup = bs4.BeautifulSoup(data.text,"lxml")
print(soup)
#soup returns html    


elems = soup.select('img[src]')
print(elems)
#elems returns an empty list


0 commentaires

3 Réponses :


-1
votes

Voici votre réponse

OUTPUT:
Out[21]: 
[<a href="/en/game-info/get-started/">Get Started</a>,
 <a href="/en/game-info/get-started/what-is-lol/">What is League of Legends?</a>,
 <a href="https://na.leagueoflegends.com/en/site/guide/index.html">New Player Guide</a>,
 <a href="/en/game-info/get-started/chat-commands/">Chat Commands</a>,
 <a href="/en/game-info/get-started/community-interaction/">Community Interaction</a>,
 <a href="/en/featured/summoners-code">The Summoner's Code</a>,
 <a href="/en/game-info/champions/">Champions</a>,
 <a href="/en/game-info/items/">Items</a>,
 <a href="/en/game-info/summoners/">Summoners</a>,
 <a href="/en/game-info/summoners/spells/">Summoner Spells</a>,
 <a href="/en/game-info/game-modes/">Game Modes</a>,
 <a href="/en/game-info/game-modes/summoners-rift/">Summoner's Rift</a>,
 <a href="/en/game-info/game-modes/the-twisted-treeline/">The Twisted Treeline</a>,
 <a href="/en/game-info/game-modes/howling-abyss/">Howling Abyss</a>,
 <a href="//na.leagueoflegends.com/en/">Home</a>,
 <a href="/en/game-info/">Game Info</a>]

soup = BeautifulSoup(requests.get("https://na.leagueoflegends.com/en/game-info/champions/").text, 'lxml')
soup.find_all('script')
Out[22]: 
soup = BeautifulSoup(requests.get("https://na.leagueoflegends.com/en/game-info/champions/").text, 'lxml')
soup.find_all('a')
[<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='//www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-N98J');</script>,
 <script>window.ga = window.ga || function(){(ga.q=ga.q||[]).push(arguments)};ga.l = +new Date;</script>,
 <script src="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/modernizr.js" type="text/javascript"></script>,
 <script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>,
 <script src="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/riot-all.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/riot-kit-all.js" type="text/javascript"></script>,
 <script type="text/javascript">rg_force_language = 'en_US';rg_force_manifest = 'https://ddragon.leagueoflegends.com/realms/na.js';rg_assets = 'https://lolstatic-a.akamaihd.net/game-info/1.1.9';</script>,
 <script type="text/javascript">window.riotBarConfig = {touchpoints: {activeTouchpoint: 'game'},locale: {landingUrlPattern : 'https://na.leagueoflegends.com//game-info/'},footer: {enabled: true,container: {renderFooterInto: '#footer'}}};</script>,
 <script async="" src="https://lolstatic-a.akamaihd.net/riotbar/prod/latest/en_US.js"></script>,
 <script src="https://ddragon.leagueoflegends.com/cdn/dragonhead.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/riot-dd-utils.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/riot-dd-i18n.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/external/jquery.lazy-load.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/DDFilterApp.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/DDMarkupItem.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/DDMarkupContainer.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/champions/ChampionsListGridItem.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/champions/ChampionsListGridView.js" type="text/javascript"></script>,
 <script src="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/js/champions/ChampionsListApp.js" type="text/javascript"></script>]

soup = BeautifulSoup(requests.get("https://na.leagueoflegends.com/en/game-info/champions/").text, 'lxml')
soup.find_all('link')
Out[23]: 
[<link href="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/lol-kit.css" rel="stylesheet"/>,
 <link href="https://lolstatic-a.akamaihd.net/frontpage/apps/prod/LolGameInfo-Harbinger/en_US/0d258ed5be6806b967afaf2b4b9817406912a7ac/assets/assets/css/base-styles.css" rel="stylesheet"/>,
 <link href="https://lolstatic-a.akamaihd.net/lolkit/1.1.6/resources/images/favicon.ico" rel="SHORTCUT ICON"/>]
import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(requests.get("https://na.leagueoflegends.com/en/game-info/champions/").text, 'lxml')
soup.find_all('link')    #these are your tags eg: a , script link 


0 commentaires

3
votes

Cela peut être possible avec request, mais il semble que votre requête get n'obtienne pas la pleine pageSource.

Vous pouvez résoudre ce problème en utilisant du sélénium pour obtenir simplement le contenu.

https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Aatrox.png
https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Ahri.png
https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Akali.png
https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Alistar.png
https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Amumu.png
https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/Anivia.png
...

Résultat:

from selenium import webdriver
import bs4

driver = webdriver.Chrome()
driver.get('https://na.leagueoflegends.com/en/game-info/champions/')
page_source = driver.page_source
driver.close()
soup = bs4.BeautifulSoup(page_source, "lxml")
print(soup)

elems = soup.find_all('img')
for elem in elems:
    print(elem.attrs['src'])


0 commentaires

1
votes

Utilisez le même point de terminaison que la page. Trouvez-le dans l'onglet réseau

import requests 

base = 'https://ddragon.leagueoflegends.com/cdn/9.11.1/img/champion/'
r = requests.get('https://ddragon.leagueoflegends.com/cdn/9.11.1/data/en_US/champion.json').json()
images = [base + r['data'][item]['image']['full'] for item in r['data']]
print(images)


0 commentaires