PowerShell web scraping: fetching weather data (to be updated)

2018-01-12 11:16:46 | Source: oschina | Author: 超速_蜗牛人


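I recently saw someone scrape weather information with Python, and it took quite a lot of code. I use PowerShell at work, and it has a built-in command for fetching web pages (Invoke-WebRequest), so the code is very short. Personally I don't think this really counts as a crawler: it just downloads a page's HTML and pulls data out of it. Invoke-WebRequest returns garbled text for some sites. There is an analysis of why here: https://www.cnblogs.com/piapia/p/5452448.html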
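The script below works around the garbling by saving the page to a file and re-reading it with `-Encoding UTF8`. Another option is to skip the temp file and decode the raw response bytes yourself. Here is a minimal sketch of that approach. It assumes the page body is UTF-8 and that the server does not declare a charset Invoke-WebRequest can use. `Get-Utf8Page` and `ConvertFrom-Utf8Bytes` are hypothetical helper names, not part of the original script.

```powershell
# Assumption: the page is UTF-8 but its response headers don't declare a charset,
# so the decoded .Content property may be garbled.

function ConvertFrom-Utf8Bytes {
    param([Parameter(Mandatory)][byte[]]$Bytes)
    # Decode the bytes explicitly as UTF-8 instead of trusting the response headers
    [System.Text.Encoding]::UTF8.GetString($Bytes)
}

function Get-Utf8Page {
    param([Parameter(Mandatory)][string]$Uri)
    # -UseBasicParsing skips the IE-based parser in Windows PowerShell 5.1 (no effect in 7+)
    $response = Invoke-WebRequest -Uri $Uri -UseBasicParsing
    # RawContentStream holds the undecoded response body
    ConvertFrom-Utf8Bytes -Bytes $response.RawContentStream.ToArray()
}
```

With this helper, `Get-Utf8Page "http://www.weather.com.cn/weather/101020100.shtml"` would return the page as a string that can be matched directly.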


Gitee repository: https://gitee.com/chaoyuew/powershell/tree/feature/webauto/WebAuto/Get-Weather


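[CmdletBinding()]
Param(
    [Parameter(Mandatory=$false)]
    $city_code = 101020100  # Defaults to Shanghai; a later update will cover how to find each city's code with Fiddler
)
$start_date = Get-Date
$tmp = Join-Path $env:TEMP 'test.txt'
# Save the raw page to a temp file (deleted at the end) and re-read it as UTF-8 below, which avoids garbled Chinese text
Invoke-WebRequest "http://www.weather.com.cn/weather/$city_code.shtml" -UseBasicParsing -OutFile $tmp
# The original pattern was damaged when this post was republished. This one assumes each day's forecast
# starts with an <h1> heading (今天 = today, 明天 = tomorrow) and captures only today's block
if ((Get-Content $tmp -Raw -Encoding UTF8) -imatch '(?s)<h1>[^<]*今天.*?<h1>[^<]*明天') {
    $shtwhc = $Matches[0]  # shtwhc is short for "shanghai today weather html content"
    $shanghai_today_weather = [ordered]@{
        "城市" = "上海"  # City: hard-coded, so it does not follow $city_code
        "日期" = (Get-Date).ToString("yyyy/MM/dd")  # Date
        "天气" = (($shtwhc -split ('wea">'))[1].Split("<")[0])  # Weather: text after wea"> up to the next <
        "温度" = ((($shtwhc -split ("℃<"))[0] -split ("i>"))[-1] + "℃")  # Temperature: text between the last i> and the first ℃<
        "风力" = (($shtwhc -split ('span title="'))[1].Split('"')[0] + ": " + ($shtwhc.Split("级")[0].Split(">")[-1] + "级"))  # Wind: direction from span title="...", then force up to 级
    }
    $shanghai_today_weather | Out-GridView  # Out-GridView is only available on Windows
}
Remove-Item $tmp -Force -ErrorAction SilentlyContinue
((Get-Date) - $start_date).TotalSeconds  # Elapsed time of the query in seconds (.Seconds would return only the seconds component)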
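The chained `-split` calls above throw "You cannot call a method on a null-valued expression" whenever the page markup shifts and an anchor is missing. Named-capture regexes can do the same extraction and return `$null` for a missing field instead. Here is a minimal sketch. The patterns mirror the split anchors in the script (`wea">`, `℃<`, `span title="`, `级`). `ConvertFrom-WeatherBlock` and `Get-RegexGroup` are hypothetical names, and the temperature pattern anchors on any `>` rather than specifically `i>`, which is a small assumption.

```powershell
function Get-RegexGroup {
    param([string]$Text, [string]$Pattern)
    # Return the named group 'v' of the first match, or $null when nothing matches
    $m = [regex]::Match($Text, $Pattern)
    if ($m.Success) { $m.Groups['v'].Value } else { $null }
}

function ConvertFrom-WeatherBlock {
    # $Html is the "today" fragment the script stores in $shtwhc
    param([Parameter(Mandatory)][string]$Html)

    # Temperature: text between a '>' and the first '℃<'
    $temp = Get-RegexGroup -Text $Html -Pattern '>(?<v>[^<>]+)℃<'
    if ($temp) { $temp += '℃' }

    # Wind force: text between a '>' and the first '级'
    $force = Get-RegexGroup -Text $Html -Pattern '>(?<v>[^<>]+)级'
    if ($force) { $force += '级' }

    [pscustomobject]@{
        Weather     = Get-RegexGroup -Text $Html -Pattern 'wea">(?<v>[^<]+)<'
        Temperature = $temp
        WindDir     = Get-RegexGroup -Text $Html -Pattern 'span title="(?<v>[^"]+)"'
        WindForce   = $force
    }
}
```

Inside the script's `if` block, `ConvertFrom-WeatherBlock -Html $shtwhc` could replace the four split expressions. Any field whose anchor is missing would then simply come back empty.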

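PS: The site's code-language picker has no PowerShell option yet (it's a niche language, after all), and I searched for a long time without finding where to send that feedback to the staff.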
