Technical Disclaimer: The content of this article is intended solely for technical research and learning purposes. Any data requests should comply with the target website's robots.txt protocol and terms of service.
Part 1: Protocol Reverse Engineering Analysis
1. Background
While developing a Sky: Children of the Light daily task push plugin, I encountered a tricky problem: the original solution relied on users manually filling in Weibo Cookies (SUB and XSRF-TOKEN), which caused two pain points:
- Poor user experience: Regular users don't know how to extract Cookies from their browser
- High maintenance cost: Cookies have a short validity period and need to be updated frequently
Requesting the Weibo API directly (e.g., /api/container/getIndex) returns HTTP status 432, which indicates that the server enforces strict session management even in "visitor" mode.
Technical Goal: Reverse engineer the Weibo H5 visitor authentication protocol to automatically obtain temporary visitor credentials, allowing the plugin to access Weibo data without requiring the user to provide Cookies.
2. Protocol Analysis: Three-Phase Authentication Chain
By capturing traffic with Edge DevTools and comparing the requests made before and after clearing Cookies, I confirmed that Weibo H5 visitor authentication is a three-phase closed loop involving cross-domain interaction.
2.1 Authentication Architecture Diagram
2.2 Phase 1: Identity Token Issuance
Endpoint: POST https://visitor.passport.weibo.cn/visitor/genvisitor2
Key Parameters:
Technical Challenges:
- JSONP response parsing: The response format is visitor_gray_callback({...}) rather than standard JSON, requiring regex extraction
- Manual Cookie injection: The server does not issue SUB via Set-Cookie; it must be extracted from the response payload and manually written into the CookieJar
Core Code:
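A minimal sketch of both steps, using only the Python standard library. The payload path `data.sub` and the `.weibo.cn` cookie domain are assumptions made for illustration, and `parse_jsonp` and `inject_cookie` are hypothetical helper names rather than the plugin's actual functions.

```python
import json
import re
from http.cookiejar import Cookie, CookieJar

# Matches callback({...}) with an optional trailing semicolon.
_JSONP_RE = re.compile(r"^\s*[\w.$]+\s*\(\s*(\{.*\})\s*\)\s*;?\s*$", re.DOTALL)


def parse_jsonp(body: str) -> dict:
    """Strip the JSONP callback wrapper and decode the JSON object inside."""
    match = _JSONP_RE.match(body)
    if match is None:
        raise ValueError("response is not in callback({...}) form")
    return json.loads(match.group(1))


def inject_cookie(jar: CookieJar, name: str, value: str, domain: str) -> None:
    """Write a cookie into the jar by hand, since no Set-Cookie header carries it."""
    jar.set_cookie(Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=True, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    ))


if __name__ == "__main__":
    # Hypothetical response body; the real payload carries more fields.
    sample = 'visitor_gray_callback({"data": {"sub": "example-sub"}});'
    jar = CookieJar()
    inject_cookie(jar, "SUB", parse_jsonp(sample)["data"]["sub"], ".weibo.cn")
    print([cookie.name for cookie in jar])
```

Anchoring the pattern at both ends makes a malformed body fail loudly instead of yielding a partial match.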
2.3 Phase 2: Session Initialization
Endpoint: GET https://m.weibo.cn/
Function: Access the main site with the SUB Cookie to activate the session and obtain the XSRF-TOKEN
Key Points:
- Must carry the SUB Cookie obtained in Phase 1
- The server issues XSRF-TOKEN via the Set-Cookie response header
- XSRF-TOKEN is a short-lived, session-level token
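A cookie-aware HTTP client normally stores the token automatically. For cases where the raw Set-Cookie headers are handled by hand, the extraction could look roughly like the sketch below. `extract_cookie` is a hypothetical helper, and the header values in the usage example are invented.

```python
from http.cookies import SimpleCookie
from typing import List, Optional


def extract_cookie(set_cookie_headers: List[str], name: str) -> Optional[str]:
    """Return the value of the named cookie from raw Set-Cookie header lines."""
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)
        if name in parsed:
            return parsed[name].value
    return None


if __name__ == "__main__":
    # Invented header lines, shaped like a typical Set-Cookie response.
    headers = [
        "OTHER=1; Path=/",
        "XSRF-TOKEN=1a2b3c; Path=/; Max-Age=1200",
    ]
    print(extract_cookie(headers, "XSRF-TOKEN"))
```

Because the token is short-lived, a caller would typically repeat Phase 2 when the token is rejected rather than cache it indefinitely.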
2.4 Phase 3: Protected Resource Access
Security Mechanism: Double Submit Cookie pattern for CSRF defense
Client Requirements:
- Cookie level: Carry SUB and XSRF-TOKEN
- Header level: Write the value of XSRF-TOKEN into the x-xsrf-token request header
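The two client requirements can be sketched as a single header builder. `build_auth_headers` is a hypothetical name, and the cookies are passed as a plain dict to keep the example independent of any HTTP library.

```python
from typing import Dict


def build_auth_headers(cookies: Dict[str, str]) -> Dict[str, str]:
    """Send both cookies, and mirror XSRF-TOKEN into the x-xsrf-token header."""
    missing = [name for name in ("SUB", "XSRF-TOKEN") if not cookies.get(name)]
    if missing:
        raise ValueError("missing cookies: " + ", ".join(missing))
    return {
        "cookie": "; ".join(f"{key}={value}" for key, value in cookies.items()),
        "x-xsrf-token": cookies["XSRF-TOKEN"],
    }


if __name__ == "__main__":
    print(build_auth_headers({"SUB": "example-sub", "XSRF-TOKEN": "example-token"}))
```

Reading the header value from the same dict that produces the Cookie header guarantees the two copies of the token cannot drift apart.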
Server-Side Validation:
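Weibo's server code is not observable from the client, so the following is only the generic Double Submit Cookie check, written as a hypothetical function to show what the client must satisfy: the header token has to be present and equal to the cookie token.

```python
import hmac
from typing import Optional


def passes_double_submit(cookie_token: Optional[str], header_token: Optional[str]) -> bool:
    """Generic double-submit check: both tokens present and identical."""
    if not cookie_token or not header_token:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(cookie_token.encode(), header_token.encode())


if __name__ == "__main__":
    print(passes_double_submit("example-token", "example-token"))
    print(passes_double_submit("example-token", None))
```

A request that carries the cookie but omits the header fails this check, which is consistent with the requirement to send the token at both levels.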
3. PoC Implementation
Complete proof-of-concept code:
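The sketch below chains the three phases with an injectable transport so that it runs offline. It is an illustration under stated assumptions, not the published PoC: the `Transport` signature, the `data.sub` payload path, the API host, and the `fake_weibo` stand-in are all hypothetical, and the Phase 1 form parameters are left to the transport.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Callable, Dict

GEN_VISITOR_URL = "https://visitor.passport.weibo.cn/visitor/genvisitor2"
HOME_URL = "https://m.weibo.cn/"
API_URL = "https://m.weibo.cn/api/container/getIndex"  # host is an assumption


@dataclass
class Response:
    status: int
    body: str = ""
    set_cookies: Dict[str, str] = field(default_factory=dict)


# Hypothetical transport: (method, url, cookies, headers) -> Response.
Transport = Callable[[str, str, Dict[str, str], Dict[str, str]], Response]


def obtain_visitor_cookies(send: Transport) -> Dict[str, str]:
    cookies: Dict[str, str] = {}
    # Phase 1: request a visitor identity and pull SUB out of the JSONP body.
    resp = send("POST", GEN_VISITOR_URL, cookies, {})
    match = re.search(r"\((\{.*\})\)", resp.body, re.DOTALL)
    if match is None:
        raise RuntimeError("phase 1: no JSONP payload in response")
    cookies["SUB"] = json.loads(match.group(1))["data"]["sub"]  # assumed path
    # Phase 2: visit the main site with SUB to receive XSRF-TOKEN.
    resp = send("GET", HOME_URL, cookies, {})
    token = resp.set_cookies.get("XSRF-TOKEN")
    if not token:
        raise RuntimeError("phase 2: XSRF-TOKEN was not issued")
    cookies["XSRF-TOKEN"] = token
    return cookies


def fetch_protected(send: Transport, cookies: Dict[str, str]) -> Response:
    # Phase 3: send both cookies and repeat the token in the request header.
    return send("GET", API_URL, cookies, {"x-xsrf-token": cookies["XSRF-TOKEN"]})


def fake_weibo(method: str, url: str, cookies: Dict[str, str],
               headers: Dict[str, str]) -> Response:
    """Offline stand-in for the real servers, used only to exercise the flow."""
    if url == GEN_VISITOR_URL:
        return Response(200, 'visitor_gray_callback({"data": {"sub": "demo-sub"}})')
    if url == HOME_URL and cookies.get("SUB"):
        return Response(200, set_cookies={"XSRF-TOKEN": "demo-token"})
    token_ok = headers.get("x-xsrf-token") == cookies.get("XSRF-TOKEN")
    if url == API_URL and cookies.get("SUB") and token_ok:
        return Response(200, '{"ok": 1}')
    return Response(432)


if __name__ == "__main__":
    session = obtain_visitor_cookies(fake_weibo)
    print(fetch_protected(fake_weibo, session).status)
```

Swapping `fake_weibo` for a function backed by a real HTTP client would be the step that turns this into a live test, subject to the site's robots.txt and terms of service.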
Part 2: Engineering Practice - Integration into AstrBot Plugin
4. Real-World Challenges
When integrating the PoC into a production environment, the following challenges were encountered:
- API differences: The response format of the mobile API differs from the PC API
- Performance optimization: Avoid repeated authentication by reusing sessions
- Error handling: Fallback mechanisms for network errors and authentication failures
- Data adaptation: HTML tag cleanup and newline preservation
5. Architecture Design
5.1 Dual-Scheme Architecture
A smart fallback mechanism was designed:
Workflow:
- Prioritize using the user-configured Cookie (PC API)
- If the Cookie is not configured or has expired, automatically switch to the Cookie-free scheme (mobile API)
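The workflow above can be sketched as a small dispatcher. `CookieExpiredError` and the two fetcher callables are hypothetical names standing in for the PC-API and mobile-API code paths.

```python
from typing import Callable, List


class CookieExpiredError(Exception):
    """Raised by the Cookie-based fetcher when the server rejects the Cookie."""


def fetch_posts(
    cookie_enabled: bool,
    cookie: str,
    fetch_with_cookie: Callable[[str], List[dict]],
    fetch_as_visitor: Callable[[], List[dict]],
) -> List[dict]:
    """Prefer the user's Cookie (PC API); otherwise use the Cookie-free scheme."""
    if cookie_enabled and cookie.strip():
        try:
            return fetch_with_cookie(cookie)
        except CookieExpiredError:
            pass  # fall through to the visitor scheme
    return fetch_as_visitor()


if __name__ == "__main__":
    def expired(_cookie: str) -> List[dict]:
        raise CookieExpiredError()

    print(fetch_posts(True, "SUB=stale", expired, lambda: [{"via": "mobile"}]))
```

Only the expiry error triggers the switch, so unrelated failures in the Cookie path still surface to the caller.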
5.2 Spider Class Refactoring
6. Key Technical Points
6.1 API Response Difference Adaptation
PC API (/ajax/statuses/mymblog):
Mobile API (/api/container/getIndex):
Adaptation Solution:
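One way to hide the difference is to normalize both payloads into a common list of posts. The field names used here (`data.list` and `text_raw` for the PC API, `data.cards[].mblog` and `text` for the mobile API) are assumptions for illustration and should be checked against real responses.

```python
from typing import Dict, List


def extract_posts(payload: dict, source: str) -> List[Dict[str, str]]:
    """Normalize a PC or mobile API payload into [{'id': ..., 'text': ...}]."""
    data = payload.get("data") or {}
    if source == "pc":
        raw = data.get("list") or []  # assumed PC layout
    elif source == "mobile":
        cards = data.get("cards") or []  # assumed mobile layout
        raw = [card["mblog"] for card in cards if isinstance(card.get("mblog"), dict)]
    else:
        raise ValueError(f"unknown source: {source!r}")
    return [
        {"id": str(post.get("id", "")), "text": post.get("text_raw") or post.get("text", "")}
        for post in raw
    ]


if __name__ == "__main__":
    mobile = {"data": {"cards": [{"mblog": {"id": 1, "text": "<b>hi</b>"}}, {"card_type": 11}]}}
    print(extract_posts(mobile, "mobile"))
```

Downstream code then works with one shape regardless of which scheme produced the data.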
6.2 Long Text Retrieval Optimization
Text returned by the mobile API is truncated; a long-text API call is needed:
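A hedged sketch of that lookup, with the long-text request injected as a callable so no endpoint has to be assumed. The `isLongText` flag name is an assumption, and `resolve_full_text` is a hypothetical helper.

```python
from typing import Callable, Optional


def resolve_full_text(post: dict, fetch_long_text: Callable[[str], Optional[str]]) -> str:
    """Return the full text, calling the long-text API only for truncated posts."""
    truncated = post.get("text", "")
    if not post.get("isLongText"):  # assumed flag name
        return truncated
    full = fetch_long_text(str(post["id"]))
    # Keep the truncated text if the long-text call yields nothing.
    return full if full else truncated


if __name__ == "__main__":
    post = {"id": 42, "text": "Daily quests: 1. ...", "isLongText": True}
    print(resolve_full_text(post, lambda post_id: "Daily quests: 1. Light candles"))
```

Falling back to the truncated text keeps the push usable even when the extra request fails.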
6.3 Performance Optimization: Session Reuse
Problem: Each data source performs its own visitor authentication, and the long-text fetch authenticates again with a fresh client, resulting in redundant requests
Before optimization (2 data sources):
- Data source 1: 1 auth + 1 list + 1 long-text auth + 1 long-text = 4 requests
- Data source 2: 1 auth + 1 list + 1 long-text auth + 1 long-text = 4 requests
- Total: 8 requests, 4 authentications
Optimization Solution:
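The idea is to pass the already-authenticated client from the list fetch to the long-text fetch. The simulation below only counts requests to make the effect visible; `VisitorClient` and `crawl` are hypothetical, and authentication is counted as a single request to match the tallies in this section.

```python
from collections import Counter
from typing import List


class VisitorClient:
    """Counts requests; authenticates lazily, once per client instance."""

    def __init__(self, counter: Counter) -> None:
        self.counter = counter
        self._authenticated = False

    def get(self, kind: str) -> None:
        if not self._authenticated:
            self.counter["auth"] += 1
            self._authenticated = True
        self.counter[kind] += 1


def crawl(sources: List[str], reuse_client: bool) -> Counter:
    counter: Counter = Counter()
    for _source in sources:
        client = VisitorClient(counter)
        client.get("list")
        if not reuse_client:
            client = VisitorClient(counter)  # old behavior: fresh client, fresh auth
        client.get("long_text")
    return counter


if __name__ == "__main__":
    for reuse in (False, True):
        counts = crawl(["source-1", "source-2"], reuse_client=reuse)
        print(reuse, dict(counts), sum(counts.values()))
```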
After optimization (2 data sources):
- Data source 1: 1 auth + 1 list + 1 long-text (reused client) = 3 requests
- Data source 2: 1 auth + 1 list + 1 long-text (reused client) = 3 requests
- Total: 6 requests, 2 authentications
6.4 Regex Adaptation
Problem: Once HTML is stripped from the mobile API text, supertopic tags no longer appear in their original format, so the existing regex fails to match
Original HTML:
After cleanup:
Solution: Change the regex to match on keywords instead of relying on a specific tag format
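A minimal sketch of the cleanup plus keyword matching. The sample HTML and the keyword pattern are invented for illustration; the plugin's real keywords are not reproduced here.

```python
import html
import re

# Hypothetical keyword pattern; it does not depend on any surrounding markup.
KEYWORD_RE = re.compile(r"daily\s+(?:quests?|tasks?)", re.IGNORECASE)


def strip_html(fragment: str) -> str:
    """Remove tags while turning <br> into newlines so line structure survives."""
    text = re.sub(r"<br\s*/?>", "\n", fragment, flags=re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", text)
    return html.unescape(text).strip()


def is_daily_task_post(text: str) -> bool:
    return KEYWORD_RE.search(text) is not None


if __name__ == "__main__":
    raw = ('<a href="#"><span class="surl-text">Sky Supertopic</span></a><br />'
           "Daily quests:<br/>1. Light candles &amp; relive a spirit")
    cleaned = strip_html(raw)
    print(cleaned)
    print(is_daily_task_post(cleaned))
```

Converting `<br>` before removing the remaining tags is what preserves the newlines; stripping all tags in one pass would merge the lines.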
7. Configuration Design
Logic:
- cookies.enabled = true and Cookie is filled in → Use PC API
- cookies.enabled = false or Cookie is empty → Automatically use the Cookie-free scheme
8. Error Handling
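A retry-then-degrade wrapper is one common way to cover both network errors and authentication failures. The sketch below is an assumption about the general shape, not the plugin's actual handler; `with_retry`, the attempt count, and the delays are illustrative.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retry(
    action: Callable[[], T],
    fallback: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry with exponential backoff, then degrade gracefully via fallback."""
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return fallback()


if __name__ == "__main__":
    def always_down() -> list:
        raise ConnectionError("network unreachable")

    # The sleep is stubbed out so the demo finishes immediately.
    print(with_retry(always_down, fallback=list, sleep=lambda _seconds: None))
```

Python's built-in `ConnectionError` and `TimeoutError` are subclasses of `OSError`, so the default `retry_on` covers typical socket-level failures; exceptions from a specific HTTP library would need to be added explicitly.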
9. Results
Before optimization:
- Users needed to manually fill in Cookies
- Plugin stopped working after Cookie expiration
- Only PC API was supported
After optimization:
- Zero-configuration, ready to use out of the box
- Automatically obtains temporary visitor credentials
- Dual-scheme automatic switching
- 25% fewer requests with two data sources (8 down to 6, by removing 2 redundant authentications)
Log Example:
Summary
Key Takeaways
- JSONP response handling: Use regex to extract JSON and manually inject Cookies
- Double Submit CSRF: Read the token from the Cookie and write it into the Header
- Session reuse: Avoid repeated authentication to reduce network requests
- HTML cleanup: Preserve newlines to improve text readability
- Error handling: Implement retry mechanisms and graceful degradation
Applicable Scenarios
This solution is suitable for:
- Automation tools that need to access public Weibo data
- Applications that don't want users to manually provide Cookies
- Crawler services that need to run stably over the long term
Notes
- Comply with protocols: Strictly follow robots.txt and terms of service
- Rate limiting: Avoid high-frequency requests that trigger risk control
- User-Agent: Use a real browser fingerprint
- Error handling: Implement proper exception catching and fallback strategies
Project Repository: GitHub - Sky Daily Plugin
The complete code for this article has been open-sourced. Stars and PRs are welcome. If you have any questions, please open an Issue for discussion.