[Solved] 100 yuan bounty: extracting region names from a web page and processing the data - Paid Help - 批处理之家 (BAT, CMD, batch, PowerShell, VBS, DOS)

buyiyang

Second Lieutenant (少尉)

Rank: 5

Posts: 341
Points: 649
Tech: 96
Donations: 0
Registered: 2022-3-26

Post #1

Posted 2023-12-19 20:51

This is easier to do in Python.

import time

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
from pypinyin import lazy_pinyin, Style


def get_lower_pinyin(string):
    # Join the tone-less pinyin of each character, e.g. "北京市" -> "beijingshi"
    return ''.join(lazy_pinyin(string, style=Style.NORMAL))


def get_link_text(url):
    # Recursively walk the region pages, appending "name,pinyin" rows to file_path
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }
    response = requests.get(url, headers=headers, timeout=30)
    if response.status_code != 200:
        return
    response.encoding = 'utf-8'  # Set the correct encoding
    soup = BeautifulSoup(response.text, 'html.parser')
    for td in soup.find_all('td'):
        # Region links sit in <td> cells that carry no attributes
        if td.attrs:
            continue
        for link in td.find_all('a', href=True):
            link_text = link.get_text()
            if link_text.isdigit():
                continue  # skip the numeric region-code column
            print(link_text)
            # Explicit encoding so the Chinese names are written the same way on every system
            with open(file_path, 'a', encoding='utf-8') as file:
                file.write(link_text + "," + get_lower_pinyin(link_text) + "\n")
            time.sleep(0.5)  # pause between requests
            get_link_text(urljoin(url, link['href']))


url = "https://www.stats.gov.cn/sj/tjbz/tjyqhdmhcxhfdm/2023/"
file_path = r"r:\2.csv"
get_link_text(url)
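
The filtering rule in the script above (take links only from `<td>` cells that have no attributes, and skip links whose text is all digits) can be tried offline before crawling the real site. The following is only a rough sketch under stated assumptions: it swaps BeautifulSoup for the standard-library `html.parser` so that it runs without third-party packages, the names `RegionLinkParser` and `extract_region_links` are hypothetical, and `SAMPLE_HTML` is a made-up fragment, not the actual markup served by stats.gov.cn. Unlike the original, it also strips surrounding whitespace from the link text.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RegionLinkParser(HTMLParser):
    """Collect (text, absolute_url) for links inside attribute-free <td> cells."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.in_plain_td = False  # True while inside a <td> with no attributes
        self.href = None          # href of the <a> currently open, if any
        self.text_parts = []      # text fragments seen inside the open <a>
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_plain_td = not attrs
        elif tag == "a" and self.in_plain_td:
            self.href = dict(attrs).get("href")
            self.text_parts = []

    def handle_data(self, data):
        if self.href is not None:
            self.text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_plain_td = False
        elif tag == "a" and self.href is not None:
            text = "".join(self.text_parts).strip()
            # Skip the numeric region-code column, keep the region names
            if text and not text.isdigit():
                self.results.append((text, urljoin(self.base_url, self.href)))
            self.href = None


def extract_region_links(html, base_url):
    parser = RegionLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical sample fragment (an assumption, not the site's real markup)
SAMPLE_HTML = """
<table>
  <tr><td><a href="11.html">北京市<br/></a></td><td><a href="12.html">天津市<br/></a></td></tr>
  <tr><td><a href="11/1101.html">110100000000</a></td><td><a href="11/1101.html">市辖区</a></td></tr>
  <tr><td class="nav"><a href="/">首页</a></td></tr>
</table>
"""
BASE = "https://www.stats.gov.cn/sj/tjbz/tjyqhdmhcxhfdm/2023/"

for name, link in extract_region_links(SAMPLE_HTML, BASE):
    print(name, link)
```

On this sample the numeric code link and the link inside the `<td class="nav">` cell are dropped, and the three region names are kept with their relative `href` values resolved against `BASE`.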

buyiyang

Post #2

Posted 2023-12-20 00:01

https://pan.baidu.com/s/1MPrNXYHGcxRn0Q1fF1p8ow?pwd=s9wi

buyiyang

Post #3

Posted 2023-12-20 17:11

The updated version has been uploaded to the cloud-drive link in post #6.
