Tags: Python web scraping, reverse engineering, RSA encryption, Requests, BeautifulSoup
Environment: Windows 10 x64 + Python 3.11 + requests 2.32 + beautifulsoup4 4.12
Example site: All domain names and Cookie names mentioned in this article have been anonymized and replaced with placeholders such as `example.edu.cn` and `learning.example.edu.cn`, for technical research purposes only.
1 Background: Why Write This Script?
Many universities and enterprises put their online teaching and academic platforms behind CAS (Central Authentication Service) single sign-on. When we want to periodically scrape schedules, download materials, or automate check-ins, typing the student ID and password by hand every time is extremely inconvenient. That is why we want a "headless" script that handles login automatically.
2 Step 1: Define the Goal and Break Down the Tasks
| Sub-task | Description | Success Criterion |
|---|---|---|
| ① Get `execution` | Dynamic hidden field on the CAS login page | Printed value looks like `e1s1 …` |
| ② Fetch the public key | /cas/v2/getPubKey returns modulus and exponent | Can be parsed as large integers |
| ③ Replicate front-end encryption | security.js → encryptedString() | Matches browser output |
| ④ Submit the form | Contains username, encrypted password, and execution | Server returns 302, Location carries ticket= |
| ⑤ Follow the ticket | .../fromcas?ticket=ST-... | Server sends Set-Cookie: AUTHORIZATION= |
| ⑥ Print the Cookie | `sess.cookies.get("AUTHORIZATION")` | Terminal outputs a 32-character hex string |
3 Step 2: Capture Traffic & Reconstruct the Login Flow
- Tools: Browser DevTools (Network), Fiddler, or Charles all work.
- Overview of steps
  - Visit `https://auth.example.edu.cn/cas/login?service=https://learning.example.edu.cn/api/fromcas`.
  - Fill in the username and password and submit. The server returns 302 with `ticket=ST-...` in the `Location` header.
  - The browser automatically GETs the ticket URL. The server redirects again to the home page and issues a business-domain Cookie:
    `Set-Cookie: AUTHORIZATION=3EE2248EABJH8CBB920881E67729341A; Domain=.example.edu.cn; Path=/`
- Key findings
  - `execution` changes on every page refresh, and an old value triggers a "session idle timeout" error.
  - The password is neither plaintext nor Base64. It is RSA-encrypted and output as hex blocks separated by spaces.
4 Step 3: Identify the Dynamic Parameter execution
Searching for `execution` in the login page source reveals a hidden form field along the lines of `<input type="hidden" name="execution" value="...">`.
In the script, extract it with a regex or BeautifulSoup:
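The regex route only needs the standard-library `re` module. Here is a minimal sketch of it. The sample HTML is hypothetical, so copy the real markup from your own login page. The pattern does not depend on attribute order, because that order can differ between CAS deployments. With BeautifulSoup, the equivalent one-liner is `BeautifulSoup(html, "html.parser").find("input", {"name": "execution"})["value"]`.

```python
import re

# Grab every <input ...> tag, then pick the one whose name is "execution".
_INPUT_TAG = re.compile(r"<input\b[^>]*>", re.IGNORECASE)
_NAME_EXEC = re.compile(r"""\sname\s*=\s*["']execution["']""", re.IGNORECASE)
_VALUE = re.compile(r"""\svalue\s*=\s*["']([^"']*)["']""", re.IGNORECASE)


def extract_execution(html: str) -> str:
    """Return the value of the hidden `execution` field on the CAS login page."""
    for tag in _INPUT_TAG.findall(html):
        if _NAME_EXEC.search(tag):
            match = _VALUE.search(tag)
            if match:
                return match.group(1)
    raise ValueError("execution field not found; did the login page load correctly?")


# Hypothetical snippet standing in for the real login page.
SAMPLE = '<form><input type="hidden" name="execution" value="e1s1"/></form>'
print(extract_execution(SAMPLE))  # e1s1
```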
5 Step 4: Diving into security.js — Reverse-Engineering the RSA Encryption
The core front-end logic (simplified) boils down to three details:
- Reversal: The plaintext is reversed as a whole before encryption.
- Little-Endian: `encryptedString()` packs two bytes into each 16-bit digit of the big integer, with the low byte first.
- Output: Each ciphertext block is converted to lowercase hex, and blocks are separated by spaces. There is no Base64 encoding and no PKCS#1 padding.
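The little-endian detail is easy to get wrong. The check below mirrors the JS digit loop in Python and confirms that it gives the same integer as reading the reversed, padded bytes with `int.from_bytes(..., "little")`. The helper names are hypothetical.

```python
def js_digits(chunk: bytes) -> list:
    """Mirror the JS loop: digits[j] = a[k++]; digits[j] += a[k++] << 8."""
    return [chunk[k] | (chunk[k + 1] << 8) for k in range(0, len(chunk), 2)]


def digits_to_int(digits: list) -> int:
    """Big-integer digits are 16-bit and stored least-significant first."""
    return sum(d << (16 * j) for j, d in enumerate(digits))


# "abc" is reversed to "cba" and zero-padded to an even length.
chunk = "abc"[::-1].encode("ascii") + b"\x00"
print([hex(d) for d in js_digits(chunk)])                                  # ['0x6263', '0x61']
print(hex(digits_to_int(js_digits(chunk))))                                # 0x616263
print(digits_to_int(js_digits(chunk)) == int.from_bytes(chunk, "little"))  # True
```

The upshot is that the JS packing is simply a little-endian integer. The Python port therefore does not need to reproduce the digit loop at all.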
6 Step 5: Replicating the Front-End Encryption in Python
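Below is a minimal pure-Python sketch of the port. It assumes the page uses the common BigInt/Barrett-based `security.js` unchanged, apart from the reversal step described above. Three details come from that library rather than from this article, so check them against your copy:

- The block size is `chunkSize = 2 * biHighIndex(m)`, which is one 16-bit digit shorter than the modulus.
- The last block is zero-padded.
- `biToHex` prints 4 hex characters per 16-bit digit.

The function also assumes `modulus_hex` and `exponent_hex` are the hex strings returned by `getPubKey`. This is a reasonable guess because `security.js` builds its key with `biFromHex`.

```python
def encrypted_string(password: str, modulus_hex: str, exponent_hex: str) -> str:
    """Python port of security.js encryptedString(), plus the site's reversal step."""
    m = int(modulus_hex, 16)
    e = int(exponent_hex, 16)

    # chunkSize = 2 * biHighIndex(m): the bytes of all but the top 16-bit digit
    # of the modulus, so every block integer is guaranteed to be smaller than m.
    chunk_size = 2 * ((m.bit_length() + 15) // 16 - 1)
    if chunk_size <= 0:
        raise ValueError("modulus too small for this scheme")

    # Reverse the whole plaintext. Assumption: ASCII/Latin-1 passwords; latin-1
    # maps each char to one byte and raises on anything wider.
    data = password[::-1].encode("latin-1")

    # Zero-pad to a multiple of chunk_size (JS: while (a.length % chunkSize) a[i++] = 0).
    if len(data) % chunk_size:
        data += b"\x00" * (chunk_size - len(data) % chunk_size)

    blocks = []
    for i in range(0, len(data), chunk_size):
        # Two bytes per 16-bit digit, low byte first == little-endian integer.
        block = int.from_bytes(data[i:i + chunk_size], "little")
        c = pow(block, e, m)  # textbook RSA: no random padding
        h = format(c, "x")    # lowercase hex
        # biToHex emits 4 hex chars per 16-bit digit, so pad to a multiple of 4.
        blocks.append(h.zfill(-(-len(h) // 4) * 4))
    return " ".join(blocks)
```

Usage: `encrypted_string(password, key["modulus"], key["exponent"])`, where `key` is the parsed JSON from `/cas/v2/getPubKey`. Fetch that key with the same session as the login page.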
7 Step 6: Assembling the First POST Request
Success indicator: `resp.status_code == 302`, and the `Location` header contains `ticket=`.
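Here is a minimal sketch of the POST that enforces exactly this check. It is written against a `requests.Session` supplied by the caller. `allow_redirects=False` keeps the 302 visible, so the `Location` header can be inspected. The helper names are hypothetical. The field names `username`, `password` and `execution`, plus the extra `_eventId=submit` (a standard field in Apereo CAS login forms), are assumptions; copy the exact field list from the request you captured in DevTools.

```python
from urllib.parse import parse_qs, urlparse


def submit_login(session, login_url: str, username: str,
                 encrypted_pwd: str, execution: str) -> str:
    """POST the CAS form and return the Location URL carrying ticket=ST-...

    `session` is expected to be the requests.Session that just fetched the
    login page, so the execution value and any session cookie belong to it.
    """
    form = {
        "username": username,
        "password": encrypted_pwd,  # assumption: field name as captured in DevTools
        "execution": execution,
        "_eventId": "submit",       # assumption: standard Apereo CAS form field
    }
    # Do not follow the redirect: we need the raw 302 and its Location header.
    resp = session.post(login_url, data=form, allow_redirects=False)
    location = resp.headers.get("Location", "")
    if resp.status_code != 302 or "ticket=" not in location:
        raise RuntimeError(f"login failed: HTTP {resp.status_code}, Location={location!r}")
    return location


def ticket_of(location: str) -> str:
    """Extract the service ticket (ST-...) from the redirect URL, e.g. for logging."""
    return parse_qs(urlparse(location).query)["ticket"][0]
```

`login_url` should be the same login URL, including its `service` parameter, that was fetched to read `execution`.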
8 Step 7: Following the 302 to Obtain the Business Cookie
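Following the ticket is a plain GET on the same session. `requests` follows redirects for GET by default and stores the cookies set by every hop in `session.cookies`. That includes the `AUTHORIZATION` cookie scoped to `.example.edu.cn`. Here is a minimal sketch; the helper name and the default cookie name are taken as assumptions from the capture above.

```python
def follow_ticket(session, ticket_url: str, cookie_name: str = "AUTHORIZATION") -> str:
    """GET the ticket URL on the same session and return the business cookie it sets."""
    # Redirects are followed and each response's Set-Cookie lands in session.cookies.
    resp = session.get(ticket_url, allow_redirects=True)
    token = session.cookies.get(cookie_name)
    if token is None:
        raise RuntimeError(
            f"{cookie_name} not set (HTTP {resp.status_code}); "
            "the ticket may be used or expired, so log in again"
        )
    return token
```

Service tickets are single-use. Call this once with the `Location` value returned by the login POST, and do not replay an old ticket URL.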
9 Step 8: Putting It All Together as a Complete Script
Full anonymized example script:
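The script below is one way to wire the pieces together. Treat it as a sketch rather than a drop-in tool. Everything site-specific is an assumption to verify against your own capture:

- The JSON keys `modulus` and `exponent`, assumed to be hex-encoded.
- The form fields, including `_eventId=submit`.
- The hypothetical environment variables `CAS_USERNAME` and `CAS_PASSWORD`, which keep credentials out of the source.

`requests` is imported inside `login()`, so the pure helpers can be reused and tested without it.

```python
"""Log in to an (anonymized) CAS and print the AUTHORIZATION cookie.

Assumptions: getPubKey returns JSON with hex "modulus"/"exponent"; the form
fields are username/password/execution/_eventId; credentials come from the
hypothetical env vars CAS_USERNAME / CAS_PASSWORD.
"""
import os
import re
import sys

CAS = "https://auth.example.edu.cn/cas"
SERVICE = "https://learning.example.edu.cn/api/fromcas"
LOGIN_URL = f"{CAS}/login?service={SERVICE}"
PUBKEY_URL = f"{CAS}/v2/getPubKey"


def extract_execution(html: str) -> str:
    """Find the hidden execution field regardless of attribute order."""
    for tag in re.findall(r"<input\b[^>]*>", html, re.I):
        if re.search(r"""\sname\s*=\s*["']execution["']""", tag, re.I):
            m = re.search(r"""\svalue\s*=\s*["']([^"']*)["']""", tag, re.I)
            if m:
                return m.group(1)
    raise ValueError("execution field not found")


def encrypted_string(password: str, modulus_hex: str, exponent_hex: str) -> str:
    """Port of security.js encryptedString() with the plaintext reversed first."""
    m, e = int(modulus_hex, 16), int(exponent_hex, 16)
    chunk = 2 * ((m.bit_length() + 15) // 16 - 1)      # 2 * biHighIndex(m)
    data = password[::-1].encode("latin-1")            # reverse the whole plaintext
    if len(data) % chunk:
        data += b"\x00" * (chunk - len(data) % chunk)  # zero-pad the last block
    out = []
    for i in range(0, len(data), chunk):
        c = pow(int.from_bytes(data[i:i + chunk], "little"), e, m)
        h = format(c, "x")
        out.append(h.zfill(-(-len(h) // 4) * 4))       # 4 hex chars per 16-bit digit
    return " ".join(out)


def login(username: str, password: str) -> str:
    import requests  # third-party; imported lazily so the helpers above work without it

    sess = requests.Session()
    # 1) Fresh execution value (a stale one causes "session idle timeout").
    execution = extract_execution(sess.get(LOGIN_URL, timeout=10).text)
    # 2) Public key, fetched in the same session as the login page.
    key = sess.get(PUBKEY_URL, timeout=10).json()
    cipher = encrypted_string(password, key["modulus"], key["exponent"])
    # 3) Submit the form and keep the 302 instead of following it.
    resp = sess.post(LOGIN_URL, data={
        "username": username, "password": cipher,
        "execution": execution, "_eventId": "submit",
    }, allow_redirects=False, timeout=10)
    location = resp.headers.get("Location", "")
    if resp.status_code != 302 or "ticket=" not in location:
        raise RuntimeError(f"login failed: HTTP {resp.status_code}")
    # 4) Follow the ticket; cookies from every redirect hop land in sess.cookies.
    sess.get(location, timeout=10)
    token = sess.cookies.get("AUTHORIZATION")
    if token is None:
        raise RuntimeError("AUTHORIZATION cookie not set; ticket expired?")
    return token


def main() -> None:
    user, pwd = os.environ.get("CAS_USERNAME"), os.environ.get("CAS_PASSWORD")
    if not user or not pwd:
        print("set CAS_USERNAME and CAS_PASSWORD first", file=sys.stderr)
        return
    print("AUTHORIZATION =", login(user, pwd))


if __name__ == "__main__":
    main()
```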
10 Step 9: Common Pitfalls & Debugging Tips
| Symptom | Cause | Solution |
|---|---|---|
| "Session idle timeout" error | execution has expired | Always GET the login page first, then submit immediately |
| Returns 200 instead of 302 | Password encryption mismatch | Compare against the browser's ciphertext; check reversal & Little-Endian |
| `AUTHORIZATION` is `None` | Ticket URL not visited, or ticket expired | Follow the redirect or log in again |
| CAPTCHA appears | Too many failures triggered risk control | Enter manually or integrate a CAPTCHA-solving service |
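The advice to compare against the browser's ciphertext works because this scheme applies RSA with no random padding. The same public key and the same password always produce the same ciphertext. Take the modulus and exponent from the exact `getPubKey` response the browser received, then encrypt the password you typed. Your output should match the `password` form field in DevTools character for character. Below is a small hypothetical helper that reports where the two differ. It also URL-decodes the browser value, because raw form data shows spaces as `+`.

```python
from urllib.parse import unquote_plus


def diff_ciphertexts(mine: str, browser: str) -> list:
    """List differences between two space-separated hex ciphertexts."""
    a = unquote_plus(mine).strip().lower().split()
    b = unquote_plus(browser).strip().lower().split()
    problems = []
    if len(a) != len(b):
        problems.append(f"block count differs: {len(a)} vs {len(b)} (check chunk size / padding)")
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            problems.append(f"block {i} differs (check reversal and little-endian packing)")
    return problems


print(diff_ciphertexts("00ab 12cd", "00ab+12cd"))  # []
```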
11 Closing: Possible Extensions
- Scheduled tasks: Use cron or Windows Task Scheduler to run the script on a schedule.
- Bulk material downloads: Scrape the course file list API and download in a loop.
- Email/message notifications: Send SMTP alerts when new announcements are detected.
- GUI tool: Wrap it in a PyQt or Tkinter interface to share with classmates.
Disclaimer
All techniques in this article are for personal learning and research only. The example domain names and Cookies are fictional and do not correspond to any real system. Do not use this script for unauthorized bulk access; you are solely responsible for any consequences.