1
Dudas y pedidos generales / como es para seguir el enlace usando Scapy y Python
« en: Septiembre 03, 2020, 06:13:27 pm »Código: (python) [Seleccionar]
import scrapy
class WitsiSpider(scrapy.Spider):
name = 'witsi'
allowed_domains = ['www.quotes.toscrape.com']
start_urls = ['http://quotes.toscrape.com']
def parse(self, response):
citas = response.xpath('//*[@class="quote"]')
for cita in citas:
texto = cita.xpath('.//*[@class="text"]/text()').extract_first()
autor = cita.xpath('.//*[@class="author"]/text()').extract_first()
palabras_claves = cita.xpath('.//*[@itemprop="keywords"]/@content').extract_first()
yield{ 'Texto' : texto,
'Autor' : autor,
'Palabras Claves' : palabras_claves }
url_a_continuar = response.xpath('//ul[@class="pager"]/li[@class="next"]/a/@href').extract()
url_siguiente = response.urljoin(url_a_continuar)
yield scrapy.Request(url_siguiente, callback = self.parse)
estoy aprendiendo un poco de esto y el problema que tengo es que no puedo hacer que mi arañita siga el enlace y lo unico que hace es repetirme los datos.
Como seria para que la araña siga el enlace y pueda continuar sacando la informacion?
estoy practicando con la siguiente pagina de internet
http://quotes.toscrape.com
Bueno e editado mi araña despues de investigar y ya consegui seguir los link pero el problema es que me repite la informacion. que estare haciendo mal?
Nueva Version
Código: (python) [Seleccionar]
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
class WitsiSpider(CrawlSpider):
name = 'witsi'
allowed_domains = ['quotes.toscrape.com']
start_urls = ['http://quotes.toscrape.com']
rules = (
Rule(LinkExtractor(allow=r'page/'),callback = 'parse', follow=True ),
)
def parse(self, response):
citas = response.xpath('//*[@class="quote"]')
for cita in citas:
texto = cita.xpath('.//*[@class="text"]/text()').extract_first()
autor = cita.xpath('.//*[@class="author"]/text()').extract_first()
palabras_claves = cita.xpath('.//*[@itemprop="keywords"]/@content').extract_first()
yield{ 'Texto' : texto,
'Autor' : autor,
'Palabras Claves' : palabras_claves }
yield
Esto es una parte de la salida de mi araña y como ven en este caso son citas de poemas me los repite y a si con otros
Código: (text) [Seleccionar]
{"Texto": "\u201cA woman is like a tea bag; you never know how strong it is until it's in hot water.\u201d", "Autor": "Eleanor Roosevelt", "Palabras Claves": "misattributed-eleanor-roosevelt"},
{"Texto": "\u201cA day without sunshine is like, you know, night.\u201d", "Autor": "Steve Martin", "Palabras Claves": "humor,obvious,simile"},
{"Texto": "\u201cLife is what happens to us while we are making other plans.\u201d", "Autor": "Allen Saunders", "Palabras Claves": "fate,life,misattributed-john-lennon,planning,plans"},
{"Texto": "\u201cLife is what happens to us while we are making other plans.\u201d", "Autor": "Allen Saunders", "Palabras Claves": "fate,life,misattributed-john-lennon,planning,plans"},
{"Texto": "\u201cLife is what happens to us while we are making other plans.\u201d", "Autor": "Allen Saunders", "Palabras Claves": "fate,life,misattributed-john-lennon,planning,plans"},
{"Texto": "\u201c... a mind needs books as a sword needs a whetstone, if it is to keep its edge.\u201d", "Autor": "George R.R. Martin", "Palabras Claves": "books,mind"},
{"Texto": "\u201cYou have to write the book that wants to be written. And if the book will be too difficult for grown-ups, then you write it for children.\u201d", "Autor": "Madeleine L'Engle", "Palabras Claves": "books,children,difficult,grown-ups,write,writers,writing"},
{"Texto": "\u201cYou have to write the book that wants to be written. And if the book will be too difficult for grown-ups, then you write it for children.\u201d", "Autor": "Madeleine L'Engle", "Palabras Claves": "books,children,difficult,grown-ups,write,writers,writing"},
{"Texto": "\u201cYou have to write the book that wants to be written. And if the book will be too difficult for grown-ups, then you write it for children.\u201d", "Autor": "Madeleine L'Engle", "Palabras Claves": "books,children,difficult,grown-ups,write,writers,writing"},
{"Texto": "\u201cYou have to write the book that wants to be written. And i