While looking into cross-region medical care recently, I came across this site:
http://www.ahga.gov.cn:8087/was2/was/hall/V1/jsp/cz/cyc.jsp?type=hz&flag=2
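It can check how many newborns already share a given name.
But it has a CAPTCHA.
Typing a CAPTCHA for every query is tedious, so it was time to bring out tesseract-ocr.
First, a quick look with F12. It really is the official query service, and the way it validates requests left me speechless:
the CAPTCHA value is generated in the front end and sent to the server in a GET request.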
http://www.ahga.gov.cn:8087/was2//core/checkCode/write_img.jsp?rand=avus
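The rand parameter is the random four-character CAPTCHA value generated by the front end (avus in the URL above); the server simply renders it and returns the CAPTCHA image.
I had planned to use tesseract-ocr to recognize CAPTCHAs in this format, but then found that the CAPTCHA is both generated and validated in the front end, and the query endpoint performs no check at all. Ha.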
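To make the flow above concrete, here is a minimal sketch of what the page effectively does: the client picks the code itself and embeds it in the image URL. The function name make_captcha_request is hypothetical, and the lowercase-letter alphabet is only an assumption based on the avus example; I did not check which characters the real page uses.

```python
import random
import string
from urllib.parse import urlencode

# Image endpoint as seen in the browser's network panel.
CAPTCHA_URL = "http://www.ahga.gov.cn:8087/was2//core/checkCode/write_img.jsp"


def make_captcha_request(rng=random):
    """Return (code, url): a client-chosen code and the image URL that embeds it."""
    # Assumption: four lowercase letters, as in the "avus" example.
    code = "".join(rng.choice(string.ascii_lowercase) for _ in range(4))
    return code, CAPTCHA_URL + "?" + urlencode({"rand": code})


code, url = make_captcha_request()
print(code)
print(url)
```

Since the client already knows the code before the image even exists, there is nothing left for OCR to recover.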
Time to hack something together in Python.
# Newborn duplicate-name lookup
# 2019-07-16 07:22:12
# sencom1997@outlook.com
import requests

q_url = "http://www.ahga.gov.cn:8087/was2/ahsga/queryNewName.action"
# Regions to query: 全省 is the whole province, the rest are Anhui's 16 cities.
# This is a set, so the iteration (and output) order is arbitrary.
SJLYSD = {'全省','合肥','淮北','亳州','宿州','蚌埠','阜阳','淮南','滁州','六安','马鞍山','芜湖','宣城','铜陵','池州','安庆','黄山'}
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE'
}

if __name__ == "__main__":
    name = input("请输入姓名>")  # prompt text: "Enter a name>"
    for city in SJLYSD:
        data = {
            "fn":"query",
            "XM":name,        # name to look up
            "SJLYSD":city,    # region
            "rand":"aaaa"     # placeholder CAPTCHA value; the query endpoint does not check it
        }
        rq = requests.post(q_url, headers=headers, data=data)
        print(city+":"+rq.text)
Python 3.7.1rc1 (v3.7.1rc1:2064bcf6ce, Sep 26 2018, 15:15:36) [MSC v.1914 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>>
========== RESTART: C:\Users\Michael Jiang\Desktop\新生儿重名查询\main.py ==========
请输入姓名>李杰英
淮北:{"result":"5"}
阜阳:{"result":"86"}
芜湖:{"result":"0"}
蚌埠:{"result":"4"}
宣城:{"result":"0"}
亳州:{"result":"52"}
合肥:{"result":"3"}
黄山:{"result":"3"}
马鞍山:{"result":"0"}
池州:{"result":"1"}
滁州:{"result":"1"}
宿州:{"result":"4"}
铜陵:{"result":"0"}
全省:{"result":"174"}
六安:{"result":"9"}
安庆:{"result":"2"}
淮南:{"result":"4"}
>>>
So 李杰英 is a fairly common name.
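As a quick sanity check on the output above, the sketch below parses the same response bodies, ranks the cities, and compares the sum of the 16 city counts with the province-wide figure. The data is copied from the run shown; the variable names are just for illustration.

```python
import json

# Response bodies copied from the run above, keyed by region.
raw = {
    "淮北": '{"result":"5"}', "阜阳": '{"result":"86"}', "芜湖": '{"result":"0"}',
    "蚌埠": '{"result":"4"}', "宣城": '{"result":"0"}', "亳州": '{"result":"52"}',
    "合肥": '{"result":"3"}', "黄山": '{"result":"3"}', "马鞍山": '{"result":"0"}',
    "池州": '{"result":"1"}', "滁州": '{"result":"1"}', "宿州": '{"result":"4"}',
    "铜陵": '{"result":"0"}', "全省": '{"result":"174"}', "六安": '{"result":"9"}',
    "安庆": '{"result":"2"}', "淮南": '{"result":"4"}',
}

# The counts arrive as strings, so convert them before doing arithmetic.
counts = {city: int(json.loads(body)["result"]) for city, body in raw.items()}
province_total = counts.pop("全省")  # whole-province figure

ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
print(ranked[:3])                               # [('阜阳', 86), ('亳州', 52), ('六安', 9)]
print(sum(counts.values()) == province_total)   # True: the cities add up to 174
```

For this name at least, the per-city numbers add up exactly to the provincial total, which suggests the 全省 query is a plain sum over the cities.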
How about repackaging this interface?
http://sencom.top/queryNewName/
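I originally meant to send the POST request directly from JavaScript, but my JS is weak and it felt awkward to write, so I fell back on the old approach: PHP receives the form data and hands it to Python for processing.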
index.php
<html lang="zh-CN">
<head>
<title>新生儿重名查询</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<!-- Fit the screen on mobile devices -->
<meta name="viewport" content="width=device-width, initial-scale=1">
<!-- Bootstrap core CSS -->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@3.3.7/dist/css/bootstrap.min.css" integrity="sha384-BVYiiSIFeK1dGmJRAkycuHAHRg32OmUcww7on3RYdg4Va+PmSTsz/K68vbdEjh4u" crossorigin="anonymous">
<!-- Optional Bootstrap theme (usually not needed) -->
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@3.3.7/dist/css/bootstrap-theme.min.css" integrity="sha384-rHyoN1iRsVXV4nD0JutlnGaslCJuC7uwjduW9SVrLvRYooPp2bWYgmgJQIXwl/Sp" crossorigin="anonymous">
<!-- Bootstrap core JavaScript -->
<script src="https://cdn.jsdelivr.net/npm/bootstrap@3.3.7/dist/js/bootstrap.min.js" integrity="sha384-Tc5IQib027qvyjSMfHjOMaLkfuWVxZxUPnCJA7l2mCWNIpG9mGCD8wGNIcPD7Txa" crossorigin="anonymous"></script>
</head>
<body>
<h1>新生儿重名查询-安徽</h1>
<form class="form-horizontal" action="index.php" method="post">
<div class="col-xs-4">
<label for="name" class="col-sm-2 control-label">姓名</label>
<div class="col-sm-10">
<input type="text" class="form-control" id="name" name="name" placeholder="姓名">
</div>
</div>
<div class="col-xs-4">
<div class="col-sm-offset-2 col-sm-10">
<input type="submit" class="btn btn-primary">
</div>
</div>
</form>
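<?php
// 17 regions in the same order queryNewName.py reports them.
// The trailing 'NULL' entry is never displayed because the loop stops at 17.
$cityArr = array('全省','合肥','淮北','亳州','宿州','蚌埠','阜阳','淮南','滁州','六安','马鞍山','芜湖','宣城','铜陵','池州','安庆','黄山','NULL');
// Force a UTF-8 locale so the Chinese name survives exec() and Python does not raise an error.
$locale='en_US.UTF-8'; // or $locale='zh_CN.UTF-8';
setlocale(LC_ALL,$locale);
putenv('LC_ALL='.$locale);
$name = isset($_POST["name"]) ? trim($_POST["name"]) : "";
echo "<br/><br/><br/><br/>";
// Only run the query once the form has actually been submitted.
if ($name !== "") {
    // escapeshellarg() quotes the user input so it cannot inject extra shell commands.
    $res = exec("python3 queryNewName.py ".escapeshellarg($name));
    $numArr = explode(",",$res);
    for($i = 0; $i < 17; $i++){
        // Show "?" if the Python script failed and returned fewer values than expected.
        echo $cityArr[$i].":".(isset($numArr[$i]) ? htmlspecialchars($numArr[$i]) : "?")."<br/>";
    }
}
?>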
</body>
</html>
queryNewName.py
# -*- coding: UTF-8 -*-
# Newborn duplicate-name lookup
# 2019-07-16 07:22:12
# sencom1997@outlook.com
import requests
import json
import sys

name = sys.argv[1]  # name passed in by index.php
q_url = "http://www.ahga.gov.cn:8087/was2/ahsga/queryNewName.action"
# A list this time: the order must match $cityArr in index.php.
SJLYSD = ['全省','合肥','淮北','亳州','宿州','蚌埠','阜阳','淮南','滁州','六安','马鞍山','芜湖','宣城','铜陵','池州','安庆','黄山']
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36 QIHU 360SE'
}

if __name__ == "__main__":
    res = []
    out = ""
    for city in SJLYSD:
        data = {
            "fn":"query",
            "XM":name,
            "SJLYSD":city,
            "rand":"aaaa"
        }
        rq = requests.post(q_url, headers=headers, data=data)
        j = json.loads(rq.text)
        res.append(j['result'])
    # print(res)
    # Emit a single comma-separated line; PHP's exec() returns the last line of output.
    for each in res:
        out = out + each + ','
    print(out)
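The script above assumes every response is well-formed JSON with a result field. A slightly more defensive variant, sketched here under the assumption that the PHP side keeps splitting on commas, could map bad responses to a placeholder instead of crashing. The helpers parse_count and to_csv_line are hypothetical names, not part of the deployed script.

```python
import json


def parse_count(body):
    """Return the count from a body like '{"result":"5"}', or None if it is malformed."""
    try:
        return int(json.loads(body)["result"])
    except (ValueError, KeyError, TypeError):
        # ValueError covers invalid JSON and non-numeric results.
        return None


def to_csv_line(bodies):
    """Join the counts into one comma-separated line, using '?' for failed lookups."""
    counts = [parse_count(b) for b in bodies]
    return ",".join("?" if c is None else str(c) for c in counts)


# Two good responses around one error page.
print(to_csv_line(['{"result":"5"}', '<html>error</html>', '{"result":"86"}']))  # 5,?,86
```

Keeping one slot per region matters here, because index.php pairs the values with city names purely by position.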
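A few pitfalls to watch out for:
On this server PHP runs exec() under the apache account, so apache has to be added to sudoers.
When PHP passes a Chinese argument through exec(), the text gets garbled. The error log at /var/log/error_log shows: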
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-5: surrogates not allowed
Traceback (most recent call last):
File "/var/www/html/queryNewName/queryNewName.py", line 26, in <module>
rq = requests.post(q_url, headers=headers, data=data)
File "/usr/local/lib/python3.6/site-packages/requests/api.py", line 116, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 519, in request
prep = self.prepare_request(req)
File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 462, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 316, in prepare
self.prepare_body(data, files, json)
File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 507, in prepare_body
body = self._encode_params(data)
File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 104, in _encode_params
v.encode('utf-8') if isinstance(v, str) else v))
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-5: surrogates not allowed
In that case the locale has to be set in PHP:
$locale='en_US.UTF-8'; // or $locale='zh_CN.UTF-8'; fixes the Python error when exec() passes $name
setlocale(LC_ALL,$locale);
putenv('LC_ALL='.$locale);
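My understanding of the error, offered as a sketch rather than a diagnosis of this exact server: when Python 3 starts under a non-UTF-8 locale, it decodes argv bytes it cannot interpret with the surrogateescape error handler, so every UTF-8 byte of the name becomes a lone surrogate character. The snippet below simulates that by decoding as ASCII, reproduces the same error text, and shows that the bytes can be recovered. The two-character name is only an illustrative value.

```python
# A two-character Chinese name is 6 bytes in UTF-8.
name_bytes = "李杰".encode("utf-8")

# Simulate argv decoding under an ASCII-only locale: each byte turns into a lone surrogate.
broken = name_bytes.decode("ascii", errors="surrogateescape")
print(len(broken))  # 6

# Encoding that string as UTF-8 (which requests does internally) fails.
try:
    broken.encode("utf-8")
except UnicodeEncodeError as exc:
    print(exc.reason)  # surrogates not allowed

# The original bytes are still recoverable by reversing the escape.
repaired = broken.encode("ascii", errors="surrogateescape").decode("utf-8")
print(repaired == "李杰")  # True
```

Setting LC_ALL to a UTF-8 locale before exec() avoids the problem at the source, which is why the PHP fix above works.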
Maybe a nationwide version when I find the time?