Using the pytesseract library to bypass a simple image captcha with OCR for library seat booking


Abstract: Bored in quarantine, so I wrote something just for fun.

0x01 Preface

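My university library limits how many people can be inside, so you have to book a seat before going in. I wanted a script that books a seat automatically at 8:00 every morning (when booking opens), so I don't have to open the website myself (mostly, I'm quarantined with nothing to do and felt like writing a toy). The scheduling part can run on a VPS or on your own PC. There is plenty of material online; these two articles are enough:
Running a Python script on a schedule with Windows Task Scheduler
Linux–CentOS: running a Python script on a schedule
I won't repeat them here.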
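If you would rather not touch the system scheduler, a minimal pure-Python alternative is to sleep until the next 08:00 and then run the booking. The sketch below only computes the wait. `seconds_until` is a hypothetical helper, not part of the original script, and it assumes the machine's local clock is in the same time zone as the library.

```python
from datetime import datetime, timedelta


def seconds_until(hour: int, now: datetime) -> float:
    """Seconds from `now` until the next occurrence of hour:00:00 (local time)."""
    target = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already started or passed: aim for tomorrow.
        target += timedelta(days=1)
    return (target - now).total_seconds()
```

Usage: `time.sleep(seconds_until(8, datetime.now()))` followed by `book()` from the script in 0x03. An OS scheduler is still more robust, because this process has to stay alive until 08:00.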

0x02 Tools and environment

  • PyCharm 2019.1 x64
  • Python third-party libraries: requests, pytesseract
  • Python 3.6
  • Windows 10 64-bit

0x03 The script and how it works

```python
# -*- coding: utf-8 -*-
# @Author : Hn13
# @Blog : https://www.hn13.top
import time

# pytesseract needs the Tesseract OCR engine installed; if tesseract.exe is not on PATH,
# set pytesseract.pytesseract.tesseract_cmd to its full path.
import pytesseract
import requests
from PIL import Image, ImageFile

ImageFile.LOAD_TRUNCATED_IMAGES = True
'''
What you need to change:
1. your mobile phone number;
2. your student ID and password;
3. the file path where the captcha image is saved.
'''
STUDENT_ID = "your_student_id"   # change to your student ID
PASSWORD = "your_password"       # change to your password
PHONE_NUM = "your_phone_number"  # change to your mobile number
CAPTCHA_PATH = "D:\\test.png"    # change to where the captcha image should be saved

USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36")
REFERER = "http://booking.zstu.edu.cn/book/notice/act_id/256/type/4/lib/11"

# Server messages exactly as they appear (JSON-escaped) in the raw response body.
# "Wrong verification code, please re-enter"
WRONG_CODE_MSG = "\\u9a8c\\u8bc1\\u7801\\u9519\\u8bef\\uff0c\\u8bf7\\u91cd\\u65b0\\u8f93\\u5165"
# "Booking failed: the time conflicts with an activity you have already booked!"
CONFLICT_MSG = ("\\u6d3b\\u52a8\\u7533\\u8bf7\\u5931\\u8d25\\uff0c\\u5df2\\u7533\\u8bf7"
                "\\u7684\\u6d3b\\u52a8\\u65f6\\u95f4\\u51b2\\u7a81\\uff01")


def cookie_header(jar):
    """Join the cookies of a response into a 'name=value; name=value' header string."""
    return "; ".join(c.name + "=" + c.value for c in jar)


def bypass():
    """Download the captcha image and read it with OCR. Returns (code, cookies)."""
    url = "http://booking.zstu.edu.cn/api.php/check"
    cookies = getstartcookies()  # the captcha is bound to this session cookie
    header = {
        "Accept": "image/webp,image/apng,image/*,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9,zh-TW;q=0.8,en;q=0.7",
        "Connection": "keep-alive",
        "Cookie": cookies,
        "Host": "booking.zstu.edu.cn",
        "Referer": REFERER,
        "User-Agent": USER_AGENT,
    }
    r = requests.get(url, headers=header)
    with open(CAPTCHA_PATH, "wb") as f:
        if f.write(r.content):
            print("got the verification code!")
    im = Image.open(CAPTCHA_PATH)
    code = pytesseract.image_to_string(im)
    # OCR output can contain spaces, newlines and a trailing form feed: drop all whitespace.
    code = "".join(code.split())
    print("the verification code is: " + code)
    return code, cookies


def gettime():
    """Today's activity ID: act_id 256 was the 12th and the ID grows by 1 per day.
    Only valid within a single month."""
    now = time.localtime()
    start = 12
    return 256 + now.tm_mday - start


def getstartcookies():
    """Visit today's notice page first to obtain the session cookie."""
    url = "http://booking.zstu.edu.cn/book/notice/act_id/" + str(gettime()) + "/type/4/lib/11"
    s = requests.Session()
    return cookie_header(s.get(url).cookies)


def getlogincookies():
    """Log in with the OCR'd captcha; start over with a fresh captcha if it is rejected."""
    code, cookies = bypass()
    print(cookies)
    url = "http://booking.zstu.edu.cn/api.php/login"
    data = {
        "username": STUDENT_ID,
        "password": PASSWORD,
        "verify": code,
    }
    header = {
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "Accept-Encoding": "gzip, deflate",
        "Connection": "keep-alive",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "Cookie": cookies,
        "Host": "booking.zstu.edu.cn",
        "Origin": "http://booking.zstu.edu.cn",
        "Referer": REFERER,
        "User-Agent": USER_AGENT,
        "X-Requested-With": "XMLHttpRequest",
    }
    s = requests.Session()
    resp = s.post(url, data=data, headers=header)
    body = resp.content.decode("utf-8")
    print(body)
    if WRONG_CODE_MSG in body:
        return getlogincookies()  # OCR misread the captcha: try again
    return cookie_header(resp.cookies), cookies


def book():
    url = ("http://booking.zstu.edu.cn/api.php/activities/256/application2"
           "?mobile={}&id={}".format(PHONE_NUM, gettime()))
    login_cookies, cookies = getlogincookies()
    header = {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9,zh-TW;q=0.8,en;q=0.7",
        "Connection": "keep-alive",
        # Every cookie must be separated by "; ".
        "Cookie": "; ".join([
            cookies,
            "redirect_url=%2Fbook%2Fnotice%2Fact_id%2F256%2Ftype%2F4%2Flib%2F11",
            "userid=" + STUDENT_ID,
            login_cookies,
        ]),
        "Host": "booking.zstu.edu.cn",
        "Referer": REFERER,
        "User-Agent": USER_AGENT,
        "X-Requested-With": "XMLHttpRequest",
    }
    r = requests.get(url, headers=header)
    if CONFLICT_MSG in r.content.decode("utf-8"):
        print("You have booked the seat!")
    else:
        print("Success!")


def main():
    book()


if __name__ == '__main__':
    main()
```
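`gettime()` above assumes act_id 256 is the 12th and adds the day of the month, so it breaks as soon as the month changes (on the 1st it would return 245). A sketch that counts days from a fixed anchor date instead. The anchor 2020-05-12 is an assumption (the script only implies that 256 was "the 12th" of some month), as is the rule that the ID grows by exactly one per calendar day, so adjust `BASE_DATE` to the real date.

```python
from datetime import date

# Assumption: act_id 256 was issued for this date; the script only tells us "the 12th".
BASE_DATE = date(2020, 5, 12)
BASE_ID = 256


def activity_id(day: date, base_date: date = BASE_DATE, base_id: int = BASE_ID) -> int:
    """Activity ID for `day`, assuming the ID increases by exactly 1 per calendar day."""
    return base_id + (day - base_date).days
```

Usage: `activity_id(date.today())` can stand in for `gettime()`; subtracting two `date` objects handles month and year boundaries automatically.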

A quick analysis of the booking requests:
Opening http://booking.zstu.edu.cn/book/notice/act_id/256/type/4/lib/11 brings up the seat-booking notice page.

Clicking “我要预约” (“Book now”) brings up a login form that requires a verification code.

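The captcha is fairly simple, so plain Python OCR is enough to get past it. I used pytesseract; see the bypass() function for how it is called. Note that a cookie has to be obtained before the captcha is fetched, so the script first visits the URL above; that is what the getstartcookies() function does.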
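Because the captcha is bound to the session cookie, one way to avoid copying cookies into headers by hand is to let a single `requests.Session` carry them: visit the notice page, then request the captcha with the same session. A sketch: `fetch_captcha` and `BASE` are hypothetical names, and the function only relies on the session object having a `get(url, timeout=...)` method, so any `requests.Session()` can be passed in.

```python
BASE = "http://booking.zstu.edu.cn"


def fetch_captcha(session, act_id: int) -> bytes:
    """Return the captcha image bytes, fetched within `session` (e.g. requests.Session())."""
    # 1. Visit the notice page first: the server sets its session cookie here
    #    and the session object stores it.
    notice = "{}/book/notice/act_id/{}/type/4/lib/11".format(BASE, act_id)
    session.get(notice, timeout=10)
    # 2. The same session sends that cookie back, so this captcha belongs to it.
    resp = session.get(BASE + "/api.php/check", timeout=10)
    resp.raise_for_status()
    return resp.content
```

Usage: `s = requests.Session(); img = fetch_captcha(s, gettime())`. The bytes can go straight to OCR with `Image.open(io.BytesIO(img))` without saving a file, and the same `s` can then be reused for the login POST so its cookies travel along automatically.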
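Once logged in, the rest is easy: keep the cookies returned during login, capture the booking request in the browser, and replay it with your own parameters. I ran the final script in PyCharm.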
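The script recognises the server's replies by searching the raw body for JSON-escaped strings (`\uXXXX` sequences). A sketch of a more readable check, assuming the responses are JSON (the login request does ask for `application/json`): decode the body, re-serialise it without ASCII escaping, and search for the plain message. `response_says` is a hypothetical helper; it falls back to the raw text when the body is not JSON.

```python
import json


def response_says(raw_body: str, message: str) -> bool:
    """True if `message` appears in the response, whether or not the JSON escapes non-ASCII."""
    try:
        # json.loads turns \uXXXX escapes back into characters;
        # ensure_ascii=False keeps them readable when re-serialised.
        text = json.dumps(json.loads(raw_body), ensure_ascii=False)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        text = raw_body  # not JSON: search the raw text as-is
    return message in text
```

Usage: `response_says(resp.text, "验证码错误")` is true when the server rejected the captcha ("wrong verification code"), and `"活动申请失败"` ("booking failed") can be checked the same way for the booking response.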

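Two things to note about that run:
1. OCR is not always accurate, so the output shows the login request being sent several times before a captcha is read correctly;
2. I had already booked a seat, so the script reported "You have booked the seat!"; a fresh successful booking normally prints "Success!".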
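Since OCR can misread the captcha, `getlogincookies()` simply calls itself again until the login succeeds, which in the worst case keeps hitting the server (and eventually Python's recursion limit). A sketch of a bounded alternative that tries at most `max_attempts` times with a short pause in between. `retry_login` and its default values are hypothetical; `attempt` would wrap one captcha fetch, OCR and login round and return `None` when the captcha was rejected.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_login(attempt: Callable[[], Optional[T]],
                max_attempts: int = 10, delay: float = 1.0) -> T:
    """Call `attempt` until it returns a non-None result, at most `max_attempts` times."""
    for i in range(1, max_attempts + 1):
        result = attempt()
        if result is not None:
            return result
        print("attempt {} failed (captcha probably misread)".format(i))
        if i < max_attempts:
            time.sleep(delay)  # pause between attempts to go easy on the server
    raise RuntimeError("login failed after {} attempts".format(max_attempts))
```

A loop makes the upper bound explicit and fails loudly with an exception instead of retrying forever.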

0x04 Afterword

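Being quarantined is really no fun... I wrote this just to play around, and the code is messy: to save effort I reused the browser's original headers wholesale. Just take it as a bit of fun.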

