Scrapy cannot scrape the next page

I want to scrape information from the subsequent pages, but my code only scrapes information from the first page.

My code is as follows:

# -*- coding: utf-8 -*-
import scrapy
from ..items import PropertyItem

class Starprop(scrapy.Spider):
name = 'starprop'
allowed_domains = ['starproperty.com']
start_urls = ['https://www.starproperty.my/to-buy/search?max_price=1000000%2B&new_launch_checkbox=on&sub_sales_checkbox=on&auction_checkbox=on&listing=For%20Sale&sort=latest&page=1']


def parse(self, response):
    item = PropertyItem ()
    property_list = response.css('.mb-4 div')

    for property in property_list:
        property_name = property.css ('.property__name::text').extract()
        property_price = property.css('.property__price::text').extract()
        property_location = property.css ('.property__location::text').extract()
        property_agent = property.css('.property__agentdetails .property__agentdetails span:nth-child(1)::text').extract()
        property_phone = property.css ('.property__agentcontacts a span::text').extract()

        item['property_name']= property_name
        item['property_price']= property_price
        item['property_location'] = property_location
        item['property_agent'] = property_agent
        item['property_phone'] = property_phone

        yield item

        next_page = response.css('.page-item:nth-child(10) .page-link::attr(href)').get()

    if next_page is not None:
        yield response.follow(next_page, callback = self.parse)

KY Lee 13.12.2020

Answers (2)

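It's all about your allowed_domains (but you need to fix your indentation too). Your spider allows 'starproperty.com', while the site is served from starproperty.my, so Scrapy filters the request for the next page as offsite. I'm also pretty sure you want to create your item inside the loop: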

import scrapy
from ..items import PropertyItem


class Starprop(scrapy.Spider):
    name = 'starprop'
    # Must match the domain used in start_urls, otherwise follow-up requests are filtered as offsite
    allowed_domains = ['starproperty.my']
    start_urls = ['https://www.starproperty.my/to-buy/search?max_price=1000000%2B&new_launch_checkbox=on&sub_sales_checkbox=on&auction_checkbox=on&listing=For%20Sale&sort=latest&page=1']

    def parse(self, response):
        property_list = response.css('.mb-4 div')

        for property in property_list:
            property_name = property.css('.property__name::text').extract()
            property_price = property.css('.property__price::text').extract()
            property_location = property.css('.property__location::text').extract()
            property_agent = property.css('.property__agentdetails .property__agentdetails span:nth-child(1)::text').extract()
            property_phone = property.css('.property__agentcontacts a span::text').extract()

            # Create a new item for every listing instead of reusing a single instance
            item = PropertyItem()
            item['property_name'] = property_name
            item['property_price'] = property_price
            item['property_location'] = property_location
            item['property_agent'] = property_agent
            item['property_phone'] = property_phone

            yield item

        # Look up the next-page link once per response, after all items have been yielded
        next_page = response.css('.page-item:nth-child(10) .page-link::attr(href)').get()

        if next_page:
            yield response.follow(next_page, callback=self.parse)
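
To illustrate why the allowed_domains value matters, here is a rough, standalone sketch of a host check. It is not Scrapy's actual implementation: is_offsite is a hypothetical helper written for this example, and the URL is shortened from the one in the question. It only shows the idea that a request to www.starproperty.my does not match 'starproperty.com' but does match 'starproperty.my'.

```python
from urllib.parse import urlparse


def is_offsite(url, allowed_domains):
    """Return True if the URL's host is outside every allowed domain.

    Simplified illustration only; Scrapy's own offsite filtering is
    implemented differently. This helper is hypothetical.
    """
    host = (urlparse(url).hostname or "").lower()
    for domain in allowed_domains:
        domain = domain.lower()
        # A host matches if it equals the domain or is a subdomain of it.
        if host == domain or host.endswith("." + domain):
            return False
    return True


# Shortened version of the search URL, assumed here for readability.
next_url = "https://www.starproperty.my/to-buy/search?page=2"

print(is_offsite(next_url, ["starproperty.com"]))  # True: the request would be dropped
print(is_offsite(next_url, ["starproperty.my"]))   # False: the request goes through
```

When a request is dropped for this reason, the crawl log at DEBUG level normally contains a "Filtered offsite request" message, which is a quick way to confirm the diagnosis.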

gangabass 13.12.2020

Maybe it's because of the indentation? Try changing:

    yield item

    next_page = response.css('.page-item:nth-child(10) .page-link::attr(href)').get()

if next_page is not None:
    yield response.follow(next_page, callback = self.parse)

to:

    yield item

    next_page = response.css('.page-item:nth-child(10) .page-link::attr(href)').get()

    if next_page is not None:
        yield response.follow(next_page, callback = self.parse)

Yu Jiaao 13.12.2020

Hi, I tried your suggestion but still could not scrape the second page. Is the selector for my next page wrong, or am I missing some other code? - KY Lee; 13.12.2020

Please print everything and check what next_page actually contains - Yu Jiaao; 13.12.2020
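
If the printed next_page turns out to be None, the positional selector (.page-item:nth-child(10)) may simply not match the pagination markup. One possible workaround, sketched below, is to build the next URL from the page query parameter that already appears in start_urls. This is an assumption about how the site paginates, not something confirmed in this thread, and next_page_url is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def next_page_url(url, param="page"):
    """Return url with its numeric page query parameter increased by one.

    Hypothetical helper: assumes the site paginates through a query
    parameter named "page", as the start URL in the question suggests.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    bumped = []
    found = False
    for key, value in pairs:
        if key == param and value.isdigit():
            value = str(int(value) + 1)
            found = True
        bumped.append((key, value))
    if not found:
        # No page parameter present: treat the URL as page 1 and ask for page 2.
        bumped.append((param, "2"))
    # quote_via=quote keeps spaces encoded as %20 rather than "+".
    query = urlencode(bumped, quote_via=quote)
    return urlunsplit(parts._replace(query=query))


print(next_page_url("https://www.starproperty.my/to-buy/search?sort=latest&page=1"))
# https://www.starproperty.my/to-buy/search?sort=latest&page=2
```

Inside parse, this could be called as next_page_url(response.url) and passed to response.follow. The spider would then need its own stop condition, for example not following the next URL when property_list is empty, because this approach never sees the site's real "last page" marker.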
