Python Web Scraping in Practice: Using the requests Module to Scrape Images of Chinese Domestic-Brand Cars

他说Python, December 29, 2021

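All the data collected here covers Chinese domestic brands only; foreign and joint-venture brands are left out. Start with the Chinese government's website for this year's first-half sales of domestic-brand cars. Those figures are not limited to passenger cars. So you will notice that BYD, the leader in new-energy vehicles, Yutong, the leader in buses, Beiqi Foton, the leader in commercial vehicles, and Zotye, the "tape-measure department", are not on the list.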

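New-energy vehicles and fuel vehicles differ only in the powertrain; the models themselves are much the same, so this article scrapes fuel models as the example. As an aside, developing a new model is not easy for a carmaker: it typically costs upward of 100 million yuan, and fuel, hybrid, and pure-electric variants are all considered during development. Many of today's so-called electric cars are just a carmaker's best-selling model with a different powertrain (electric cars sell poorly, so nobody is going to develop one from scratch and lose money on it). Now let's go through the carmakers one by one.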

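1. SAIC

SAIC's own brands are Roewe, MG, Maxus, Wuling, and Baojun. SAIC is the career goal of many people in the auto industry, thanks to its prime location and relatively good pay. Compared with the internet industry, though, it still falls short.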

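Consider the following figures. In 2017 SAIC's full-year revenue was 857.978 billion yuan, with a net profit of 34.41 billion yuan. Tencent's 2017 full-year revenue was 237.76 billion yuan, with a net profit of 71.5 billion yuan. That is a profit margin of about 4% for SAIC and about 30% for Tencent. Both are giants of their industries, so why is the gap so large?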
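The two margins are just net profit divided by revenue. Here is a minimal sketch of that arithmetic, using only the 2017 figures quoted above; the function name `net_margin` and the dictionary layout are made up for illustration.

```python
def net_margin(net_profit, revenue):
    """Return net profit as a fraction of revenue."""
    return net_profit / revenue


# 2017 figures quoted in the text, in units of 100 million yuan
companies = {
    "SAIC": {"revenue": 8579.78, "net_profit": 344.1},
    "Tencent": {"revenue": 2377.6, "net_profit": 715.0},
}

for name, figures in companies.items():
    margin = net_margin(figures["net_profit"], figures["revenue"])
    # Format as a percentage with one decimal place
    print(f"{name}: {margin:.1%}")
```

Run as written, this prints 4.0% for SAIC and 30.1% for Tencent, matching the rounded figures in the text.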

SAIC Roewe

import os
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent sent with every request
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
url = 'http://www.roewe.com.cn/htmlinclude/header.html'
response = requests.get(url=url, headers=headers)
html = response.text
soup = BeautifulSoup(html, 'html.parser')
Car_Type = ['Car','SUV']
# The header lists are tagged "clearfix ul1" and "clearfix ul2", one per category
for i in [1, 2]:
    folder_path = "F:/Car/SAIC Motor/roewe/" + Car_Type[i - 1] + "/"
    # exist_ok=True lets the script be re-run without failing on an existing folder
    os.makedirs(folder_path, exist_ok=True)
    ul = (soup.find_all(class_='clearfix ul' + str(i)))[0]
    img = ul.find_all(name='img')
    for item in img:
        url = 'http://www.roewe.com.cn' + item['src']
        r = requests.get(url, headers=headers)
        picture_name = url.replace('http://www.roewe.com.cn/images/headernav/', '')
        # The with block closes the file, so no explicit f.close() is needed
        with open(folder_path + picture_name, 'wb') as f:
            f.write(r.content)
        print(url)
    print('\n\n')

SAIC MG

import os
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent sent with every request
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
url = 'http://www.saicmg.com/'
response = requests.get(url=url, headers=headers)
html = response.text
soup = BeautifulSoup(html, 'html.parser')
# Take the first six "img100" images inside the first "se_tu" block
ul = soup.find_all(class_='se_tu')[0]
img = ul.find_all(class_='img100')[0:6]
folder_path = "F:/Car/SAIC Motor/mg/"
# exist_ok=True lets the script be re-run without failing on an existing folder
os.makedirs(folder_path, exist_ok=True)
for item in img:
    url = 'http://www.saicmg.com/' + item['src']
    r = requests.get(url, headers=headers)
    picture_name = url.replace('http://www.saicmg.com/images/', '')
    # The with block closes the file, so no explicit f.close() is needed
    with open(folder_path + picture_name, 'wb') as f:
        f.write(r.content)
    print(url)

SAIC Maxus

import os
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent sent with every request
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
url = 'https://www.saicmaxus.com/'
response = requests.get(url=url, headers=headers)
html = response.text
soup = BeautifulSoup(html, 'html.parser')
ul = soup.find_all(class_='item show clearfix')
Car_Type = ['MPV', 'SUV', 'PICK UP', 'MPV-1', 'MPV-2']
# Pair each of the first five series blocks with its category folder
for car_type, a in zip(Car_Type, ul[:5]):
    folder_path = "F:/Car/SAIC Motor/maxus/" + car_type + "/"
    # exist_ok=True lets the script be re-run without failing on an existing folder
    os.makedirs(folder_path, exist_ok=True)
    img = a.find_all(name='img')
    for item in img:
        url = 'https://www.saicmaxus.com/' + item['src']
        r = requests.get(url, headers=headers)
        picture_name = url.replace('https://www.saicmaxus.com//static/series/', '').replace('https://www.saicmaxus.com//uploads/month_1712/20171229075', '')
        # The with block closes the file, so no explicit f.close() is needed
        with open(folder_path + picture_name, 'wb') as f:
            f.write(r.content)
        print(url)

SAIC Baojun and Wuling

import os
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent sent with every request
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
url = 'https://www.sgmw.com.cn/'
response = requests.get(url=url, headers=headers)
html = response.text
soup = BeautifulSoup(html, 'html.parser')
ul = soup.find_all(class_='det_box')
Car_Type = ['SUV', 'MPV', 'Car', 'Mini-Car']
# zip stops at the shorter sequence, so extra "det_box" blocks cannot raise IndexError
for car_type, p in zip(Car_Type, ul):
    folder_path = "F:/Car/SAIC Motor/sgmw/" + car_type + "/"
    # exist_ok=True lets the script be re-run without failing on an existing folder
    os.makedirs(folder_path, exist_ok=True)
    for box in p.find_all(class_='itembox'):
        # Each item box holds one "item_img" wrapper with the model image inside
        g = (box.find_all(class_='item_img'))[0]
        item = (g.find_all(name='img'))[0]
        url = 'https://www.sgmw.com.cn/' + item['src']
        r = requests.get(url, headers=headers)
        picture_name = url.replace('https://www.sgmw.com.cn/images/childnav/', '').replace('https://www.sgmw.com.cn/images/', '').replace('https://www.sgmw.com.cn/hy310w/images/310w/', '').replace('510/', '').replace('s3/', '')
        # The with block closes the file, so no explicit f.close() is needed
        with open(folder_path + picture_name, 'wb') as f:
            f.write(r.content)
        print(url)
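
The chain of replace calls that builds picture_name has to list every directory prefix the site happens to use. One possible alternative, sketched here and not taken from the original script, is to keep only the last path segment of the URL with the standard library. The helper name `picture_name_from_url` and the example URLs are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def picture_name_from_url(url):
    """Return the last path segment of a URL, ignoring any query string."""
    path = urlsplit(url).path
    return posixpath.basename(path)


# Hypothetical image URLs shaped like the ones the scraper builds
examples = [
    "https://www.sgmw.com.cn/images/childnav/example_a.png",
    "https://www.sgmw.com.cn/hy310w/images/310w/example_b.jpg?v=2",
]
for url in examples:
    print(picture_name_from_url(url))
```

This would not reproduce the Maxus script exactly, since that one also strips part of the file name itself, so treat it as a starting point and not a drop-in replacement.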

2. Changan

Changan is the big brother of the domestic brands. With Chery now on the verge of being sold, where does this big brother go from here?

import os
import re
import requests

# Browser-like User-Agent sent with every request
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
url = 'http://www.changan.com.cn/cache/car_json.js'
response = requests.get(url=url, headers=headers)
res = response.text
# Capture the photo path that sits between the two JSON keys
result = re.findall('"car_model_photo":"(.*?)","car_model_price_name"', res, re.S)
Car_Type = ['Car', 'SUV', 'MPV']
for i in range(3):
    folder_path = "F:/Car/CHANGAN/" + Car_Type[i] + "/"
    # exist_ok=True lets the script be re-run without failing on an existing folder
    os.makedirs(folder_path, exist_ok=True)
for j in range(16):
    # The paths are protocol-relative with escaped slashes, so strip the backslashes
    url = 'http:' + result[j].replace('\\', '')
    r = requests.get(url, headers=headers)
    picture_name = url.replace('http://www.changan.com.cn/uploads/car_model_photo/', '')
    # Indices 0-8 are cars, 9-14 are SUVs, and 15 is the MPV
    if j < 9:
        car_type = 'Car'
    elif j < 15:
        car_type = 'SUV'
    else:
        car_type = 'MPV'
    # The with block closes the file, so no explicit f.close() is needed
    with open('F:/Car/CHANGAN/' + car_type + '/' + picture_name, 'wb') as f:
        f.write(r.content)
    print(url)
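
To see what the regular expression in the Changan script pulls out, it can be tried on a small string without touching the network. The sample below is a made-up excerpt shaped like what the script appears to expect from car_json.js (escaped slashes, protocol-relative paths); the file names and the function name `extract_photo_urls` are assumptions for illustration.

```python
import re

# Hypothetical excerpt: each slash in the photo path is escaped as "\/"
sample = (
    '[{"car_model_name":"A","car_model_photo":"\\/\\/www.changan.com.cn'
    '\\/uploads\\/car_model_photo\\/example_a.png","car_model_price_name":"x"},'
    '{"car_model_name":"B","car_model_photo":"\\/\\/www.changan.com.cn'
    '\\/uploads\\/car_model_photo\\/example_b.png","car_model_price_name":"y"}]'
)


def extract_photo_urls(text):
    """Find every photo path between the two keys and turn it into a full URL."""
    pattern = r'"car_model_photo":"(.*?)","car_model_price_name"'
    raw_paths = re.findall(pattern, text, re.S)
    # Drop the escaping backslashes, then add the scheme to the "//host/..." path
    return ['http:' + path.replace('\\', '') for path in raw_paths]


for photo_url in extract_photo_urls(sample):
    print(photo_url)
```

The non-greedy `(.*?)` matters here: a greedy `(.*)` would swallow everything from the first photo key to the last price key and return a single, useless match.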

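3. Geely Auto

Since acquiring Volvo, Geely has grown rapidly and become a Fortune Global 500 company. On one hand, it learned a great deal from Volvo: design, manufacturing, purchasing, marketing, and more. On the other hand, policy support played its part. Geely is now Daimler's largest shareholder, and where all that money came from is plain to see.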

import os
import requests
from lxml import etree

# Browser-like User-Agent sent with every request
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
url = 'http://www.geely.com/?mz_ca=2071413&mz_sp=7D3ws&mz_kw=8398784&mz_sb=1'
response = requests.get(url=url, headers=headers)
html = etree.HTML(response.text)
# Collect the src attribute of every image directly inside a div with class "car"
result = html.xpath('//div[@class="car"]/img/@src')
Car_Type = ['Car', 'SUV']
for i in range(2):
    folder_path = "F:/Car/GEELY_AUTO/" + Car_Type[i] + "/"
    # exist_ok=True lets the script be re-run without failing on an existing folder
    os.makedirs(folder_path, exist_ok=True)
for j in range(17):
    # Sort each image into a folder by its position on the page; index 0 is skipped
    if 0 < j < 4 or 6 < j < 12 or j == 16:
        car_type = 'Car'
    elif 3 < j < 7 or 11 < j < 16:
        car_type = 'SUV'
    else:
        continue
    url = result[j]
    r = requests.get(url, headers=headers)
    picture_name = url.replace('https://dm30webimages.geely.com/GeelyOfficial/Files/Car/CarType/', '')
    # The with block closes the file, so no explicit f.close() is needed
    with open('F:/Car/GEELY_AUTO/' + car_type + '/' + picture_name, 'wb') as f:
        f.write(r.content)
    print(url)
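
The chained comparisons that sort each image index into Car or SUV are easy to get wrong by one. As a rough sketch, the same mapping can be written as a small table of inclusive index ranges, which is easier to check by eye. The ranges are copied from the conditions in the script; the names `INDEX_RANGES` and `category_for_index` are made up here.

```python
# (first index, last index, category), both ends inclusive.
# Index 0 is deliberately absent: the script skips that image.
INDEX_RANGES = [
    (1, 3, 'Car'),
    (4, 6, 'SUV'),
    (7, 11, 'Car'),
    (12, 15, 'SUV'),
    (16, 16, 'Car'),
]


def category_for_index(j):
    """Return 'Car' or 'SUV' for an image index, or None if it should be skipped."""
    for first, last, category in INDEX_RANGES:
        if first <= j <= last:
            return category
    return None


for j in range(17):
    print(j, category_for_index(j))
```

Either way the mapping depends on the order of images on the page, so it has to be updated by hand whenever the site's lineup changes.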