General questions and requests / How to follow the link using Scrapy and Python
« on: September 03, 2020, 06:13:27 pm »
Code: Python
import scrapy


class WitsiSpider(scrapy.Spider):
    name = 'witsi'
    allowed_domains = ['www.quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com']

    def parse(self, response):
        citas = response.xpath('//*[@class="quote"]')
        for cita in citas:
            texto = cita.xpath('.//*[@class="text"]/text()').extract_first()
            autor = cita.xpath('.//*[@class="author"]/text()').extract_first()
            palabras_claves = cita.xpath('.//*[@itemprop="keywords"]/@content').extract_first()
            yield {
                'Texto': texto,
                'Autor': autor,
                'Palabras Claves': palabras_claves,
            }
        url_a_continuar = response.xpath('//ul[@class="pager"]/li[@class="next"]/a/@href').extract()
        url_siguiente = response.urljoin(url_a_continuar)
        yield scrapy.Request(url_siguiente, callback=self.parse)
I'm learning a bit about this, and the problem I have is that I can't get my little spider to follow the link; all it does is repeat the data.
How can I make the spider follow the link and keep extracting the information?
I'm practicing with the following website:
http://quotes.toscrape.com
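One possible explanation, sketched under assumptions rather than confirmed against this exact setup: `extract()` returns a list of strings, while `response.urljoin()` expects a single string, and an `allowed_domains` entry starting with `www.` does not cover `quotes.toscrape.com`, so follow-up requests may be dropped as offsite. The helper below is hypothetical (`next_page_url` is not part of Scrapy). It uses only the standard library's `urljoin`, which Scrapy's `response.urljoin()` builds on, to show the difference between joining a list and joining one string.

```python
from urllib.parse import urljoin


def next_page_url(page_url, hrefs):
    """Return the absolute URL of the next page, or None on the last page.

    `hrefs` stands in for what `.extract()` returns: a list of strings.
    """
    if not hrefs:  # no "next" link was found, so stop paginating
        return None
    # Join the base URL with one string, not with the whole list.
    return urljoin(page_url, hrefs[0])


print(next_page_url('http://quotes.toscrape.com/', ['/page/2/']))
print(next_page_url('http://quotes.toscrape.com/page/10/', []))

# Passing the list itself, as the spider above does, raises TypeError.
try:
    urljoin('http://quotes.toscrape.com/', ['/page/2/'])
except TypeError as exc:
    print('TypeError:', exc)
```

In the spider, the same idea would be `href = response.xpath(...).extract_first()` followed by `if href: yield scrapy.Request(response.urljoin(href), callback=self.parse)`, with `allowed_domains = ['quotes.toscrape.com']`.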
Well, after some research I edited my spider and managed to get it to follow the links, but now the problem is that it repeats the information. What am I doing wrong?
New version
Code: Python
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class WitsiSpider(CrawlSpider):
    name = 'witsi'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com']
    rules = (
        Rule(LinkExtractor(allow=r'page/'), callback='parse', follow=True),
    )

    def parse(self, response):
        citas = response.xpath('//*[@class="quote"]')
        for cita in citas:
            texto = cita.xpath('.//*[@class="text"]/text()').extract_first()
            autor = cita.xpath('.//*[@class="author"]/text()').extract_first()
            palabras_claves = cita.xpath('.//*[@itemprop="keywords"]/@content').extract_first()
            yield {
                'Texto': texto,
                'Autor': autor,
                'Palabras Claves': palabras_claves,
            }
        yield
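A likely cause of the repeats, assuming the site's tag links look like `/tag/<name>/page/1/` (an assumption about the page markup, not something stated in this post): the pattern `page/` also matches those tag listings, so each quote would be scraped once per tag page in addition to the main listing. `LinkExtractor` applies `allow` as a regular-expression search on the absolute URL. The sketch below mimics that with `re.search` on sample URLs to compare the pattern from the rule above with a stricter, hypothetical one.

```python
import re

# Sample URLs; the tag URLs are assumed for illustration.
urls = [
    'http://quotes.toscrape.com/page/2/',
    'http://quotes.toscrape.com/tag/life/page/1/',
    'http://quotes.toscrape.com/tag/plans/page/1/',
    'http://quotes.toscrape.com/author/Allen-Saunders',
]

LOOSE = r'page/'                      # pattern used in the rule above
STRICT = r'toscrape\.com/page/\d+/$'  # hypothetical: main listing pages only


def allowed(pattern, candidates):
    """Keep the URLs that a regex search with `pattern` would accept."""
    return [u for u in candidates if re.search(pattern, u)]


print(allowed(LOOSE, urls))   # main page plus both tag pages
print(allowed(STRICT, urls))  # only the main pagination URL
```

Passing the stricter pattern as `allow`, or limiting extraction with `restrict_xpaths='//li[@class="next"]'`, should keep the crawl on the main pagination. The Scrapy documentation also advises against using `parse` as a rule callback in a `CrawlSpider`; a different name such as `parse_item` avoids that conflict.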
This is part of my spider's output. As you can see, the quotes (from poems, in this case) are repeated, and the same happens with others.
Code: Text
- {"Texto": "\u201cA woman is like a tea bag; you never know how strong it is until it's in hot water.\u201d", "Autor": "Eleanor Roosevelt", "Palabras Claves": "misattributed-eleanor-roosevelt"},
- {"Texto": "\u201cA day without sunshine is like, you know, night.\u201d", "Autor": "Steve Martin", "Palabras Claves": "humor,obvious,simile"},
- {"Texto": "\u201cLife is what happens to us while we are making other plans.\u201d", "Autor": "Allen Saunders", "Palabras Claves": "fate,life,misattributed-john-lennon,planning,plans"},
- {"Texto": "\u201cLife is what happens to us while we are making other plans.\u201d", "Autor": "Allen Saunders", "Palabras Claves": "fate,life,misattributed-john-lennon,planning,plans"},
- {"Texto": "\u201cLife is what happens to us while we are making other plans.\u201d", "Autor": "Allen Saunders", "Palabras Claves": "fate,life,misattributed-john-lennon,planning,plans"},
- {"Texto": "\u201c... a mind needs books as a sword needs a whetstone, if it is to keep its edge.\u201d", "Autor": "George R.R. Martin", "Palabras Claves": "books,mind"},
- {"Texto": "\u201cYou have to write the book that wants to be written. And if the book will be too difficult for grown-ups, then you write it for children.\u201d", "Autor": "Madeleine L'Engle", "Palabras Claves": "books,children,difficult,grown-ups,write,writers,writing"},
- {"Texto": "\u201cYou have to write the book that wants to be written. And if the book will be too difficult for grown-ups, then you write it for children.\u201d", "Autor": "Madeleine L'Engle", "Palabras Claves": "books,children,difficult,grown-ups,write,writers,writing"},
- {"Texto": "\u201cYou have to write the book that wants to be written. And if the book will be too difficult for grown-ups, then you write it for children.\u201d", "Autor": "Madeleine L'Engle", "Palabras Claves": "books,children,difficult,grown-ups,write,writers,writing"},
- {"Texto": "\u201cYou have to write the book that wants to be written. And i