Background
While developing the AstrBot Douyin parsing plugin, I encountered a common but tricky issue: the server's IP was blocked by Douyin's risk control, causing API requests to return empty data. This article records the complete process, from a simple retry mechanism to the final solution based on a Cloudflare Workers reverse proxy.
Problem Analysis
Initial Issue
When the plugin parses Douyin links, dysk.py returns None, manifesting as:
- Videos can occasionally be parsed successfully
- Images and live photos fail to parse completely
- Logs show the API returned an empty response
Root Cause
Douyin's risk control mechanism detects the request source IP. When an abnormal request pattern is detected, it returns an empty response (HTTP 200 but the body is empty).
Solution Evolution
Phase 1: Implementing Retry Mechanism
Idea: When parsing fails, recreate the DouyinDownloader instance and retry.
Implementation Points:
- Max retries: 5
- Interval between retries: 5 seconds
- Force instance recreation (to get a new Cookie) during retry
Code Snippet:
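A minimal sketch of such a retry loop, assuming a hypothetical parse_once callable that creates a fresh DouyinDownloader and returns the parsed data or None. The function name and the reading of "max retries" as the total number of attempts are assumptions, and the plugin's real code may differ.

```python
import time
from typing import Callable, Optional


def parse_with_retry(
    parse_once: Callable[[], Optional[dict]],
    max_retries: int = 5,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Call parse_once up to max_retries times, pausing between attempts.

    parse_once is a hypothetical stand-in: it is expected to build a new
    downloader instance (and so obtain a new Cookie) on every call.
    """
    for attempt in range(1, max_retries + 1):
        result = parse_once()
        if result is not None:
            return result
        # Wait only if another attempt will follow.
        if attempt < max_retries:
            sleep(interval)
    return None
```

Passing sleep as a parameter keeps the loop easy to exercise without waiting for the real 5-second interval.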
Effect: The retry mechanism improves the success rate, but it treats the symptoms rather than the root cause. Once the IP is flagged by risk control, parsing still fails.
Phase 2: Introducing Cloudflare Workers Reverse Proxy
Idea:
- Proxy API requests through CF Workers, utilizing CF's IP pool to avoid risk control.
- Direct connection to CDN for video/image downloads to save CF traffic.
Expected Success Rate: 75-93%
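The split between proxied API calls and direct downloads can be sketched as a small routing helper. The function name, the example hosts, and the convention of passing the target as a url query parameter are all assumptions made for illustration; the actual Worker may expect a different format.

```python
from typing import Optional
from urllib.parse import quote


def build_request_url(target_url: str, proxy_base: Optional[str], is_api: bool) -> str:
    """Route API requests through the proxy and leave downloads untouched.

    Assumption: the Worker reads the real target from a `url` query parameter.
    """
    if is_api and proxy_base:
        # Percent-encode the whole target so its own query string survives.
        return f"{proxy_base.rstrip('/')}/?url={quote(target_url, safe='')}"
    # Video/image downloads go straight to the CDN to save CF traffic.
    return target_url
```

With this shape, disabling the proxy in the plugin configuration amounts to passing proxy_base=None.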
Phase 3: Pitfalls During Implementation
Pitfall 1: CF Workers Response Body Empty
Phenomenon:
Cause: CF Workers' automatic gzip compression caused the response body transmission to fail.
Attempted Solutions:
- ❌ Return the fetch() result directly: response body lost
- ❌ Use response.arrayBuffer(): returned an empty ArrayBuffer
- ❌ Set the Content-Length header: CF still forced compression
- ✅ Base64 encoding transmission: successfully bypassed the compression issue
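On the Python side, the Base64 workaround means unwrapping the response before parsing it. The sketch below assumes the envelope format mentioned in the troubleshooting section, {"data": "...", "encoding": "base64"}, and that the wrapped payload is UTF-8 text; decode_proxy_body is a hypothetical name.

```python
import base64
import json


def decode_proxy_body(raw: str) -> str:
    """Unwrap a Base64 envelope returned by the proxy.

    Falls back to the raw text when the body is JSON but not an envelope.
    """
    envelope = json.loads(raw)
    if isinstance(envelope, dict) and envelope.get("encoding") == "base64":
        # Assumption: the original API response was UTF-8 encoded text.
        return base64.b64decode(envelope["data"]).decode("utf-8")
    return raw
```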
Pitfall 2: Cookie Not Passed Correctly
Phenomenon:
This is an empty gzip file, indicating the Douyin server returned an empty response.
Cause Analysis:
- ttwid API request successful (Response text length: 205)
- Douyin detail API request failed (Response text length: 0)
- CF Workers log shows: Request Cookie: No Cookie
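Root Cause: Python's requests.Session() manages Cookies automatically, but its cookie jar matches Cookies by domain. When requests are addressed to the CF Workers host instead of Douyin's own domain, the stored Cookies are most likely not considered a match, so they were never sent to CF Workers.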
Solution: Manually build the Cookie request header.
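Building the header by hand can be as simple as joining name/value pairs. This is only a sketch: build_cookie_header is a hypothetical helper, and the cookie values shown in the usage are placeholders.

```python
from typing import Dict, Mapping


def build_cookie_header(cookies: Mapping[str, str]) -> str:
    """Join cookies into a single Cookie request header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Placeholder values; real ones come from earlier responses.
headers: Dict[str, str] = {
    "Cookie": build_cookie_header({"ttwid": "abc", "msToken": "xyz"}),
}
```

The resulting headers dict would then be passed explicitly on each proxied request rather than relying on the session's cookie jar.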
Final Solution
Architecture Design
Complete Code
1. Cloudflare Workers Reverse Proxy Script
2. Python Side Key Code (dysk.py snippet)
3. Plugin Configuration File (confschema.json)
Technical Key Points Summary
1. CF Workers Auto-compression Issue
Problem: CF Workers automatically gzip-compresses the response, and setting the Content-Length header does not disable this.
Solution: Use Base64 encoding to transmit data.
Principle:
- CF's auto-compression targets text content.
- Base64 encoded data is wrapped in JSON.
- Although the JSON format response will also be compressed, it avoids the issue of lost response body.
2. Cookie Propagation Issue
Problem: requests.Session() Cookie management fails in proxy scenarios.
Solution:
- Manually extract Set-Cookie from response headers.
- Manually build the Cookie request header.
Key Code:
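A sketch of the extraction step using the standard library's http.cookies module. It assumes the raw Set-Cookie values are already available as a list of strings, however the proxy exposes them; extract_cookies is a hypothetical name, not the plugin's actual function.

```python
from http.cookies import SimpleCookie
from typing import Dict, Iterable


def extract_cookies(set_cookie_headers: Iterable[str]) -> Dict[str, str]:
    """Collect name/value pairs from raw Set-Cookie header values.

    Attributes such as Path, Domain, and HttpOnly are parsed but dropped,
    since only the name and value are needed for the Cookie request header.
    """
    cookies: Dict[str, str] = {}
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)
        for name, morsel in parsed.items():
            cookies[name] = morsel.value
    return cookies
```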
Performance and Success Rate
Test Results
| Scenario | Without CF Proxy | With CF Proxy |
|---|---|---|
| Video Parsing | 60-70% | 95%+ |
| Image Parsing | 0-10% | 95%+ |
| Live Photo Parsing | 0-10% | 95%+ |
Advantages
- High Success Rate: Utilizes CF's IP pool to avoid single IP risk control.
- Save Traffic: Only proxies API requests; downloads connect directly to CDN.
- Free Quota: CF Workers free tier offers 100,000 requests per day.
- Global Acceleration: CF edge nodes provide low-latency access.
Considerations
CF Workers Limits:
- Free tier: 100,000 requests/day
- CPU time limit: 10-50ms/request
- Response body size: Unlimited (but recommended < 10MB)
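As a rough way to relate the daily quota to plugin usage, the arithmetic can be sketched as below. The assumption that each parse attempt sends two proxied requests (one for ttwid, one for the detail API) is inferred from the logs in Pitfall 2 and may not match the real request count.

```python
def parses_per_day(
    daily_quota: int = 100_000,
    requests_per_attempt: int = 2,
    attempts_per_parse: int = 1,
) -> int:
    """Estimate how many links can be parsed per day within the quota.

    requests_per_attempt=2 is an assumption (ttwid + detail API).
    """
    return daily_quota // (requests_per_attempt * attempts_per_parse)


best_case = parses_per_day()                       # every parse succeeds first try
worst_case = parses_per_day(attempts_per_parse=5)  # every parse uses all 5 retries
```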
Base64 Encoding Overhead:
- Data size increases by about 33%
- Encoding/decoding has CPU overhead
- Negligible impact for small data (< 100KB)
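The 33% figure follows from Base64 mapping every 3 input bytes to 4 output characters. A short check, using a zero-filled 100 KB payload as an arbitrary stand-in for a real API response:

```python
import base64


def base64_size(raw_size: int) -> int:
    """Length of padded Base64 output for raw_size input bytes."""
    return 4 * ((raw_size + 2) // 3)


payload = bytes(100_000)  # arbitrary stand-in payload
encoded = base64.b64encode(payload)
overhead = len(encoded) / len(payload) - 1  # fractional size increase
```

The JSON wrapper adds a few dozen bytes on top of this, which is negligible at these sizes.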
Cookie Management:
- Requires manual handling of Cookie extraction and sending
- Pay attention to Cookie domain and path settings
Deployment Guide
1. Deploy CF Workers
2. Configure Plugin
In AstrBot WebUI:
- Find the Douyin parsing plugin
- Click "Configure"
- Enable "Whether to enable Cloudflare proxy"
- Fill in "Cloudflare Workers Proxy Address"
- Save and reload the plugin
3. Test
Send a Douyin link to the bot and observe the log output:
- Success: You should see the parsing result
- Failure: Check CF Workers logs and Python logs
Troubleshooting
Issue 1: Empty Response Body
Symptom: Response text length: 0
Troubleshooting Steps:
- Check Request Cookie in CF Workers logs.
- Confirm that the Cookie contains ttwid and msToken.
- Check that the Python side correctly built the Cookie header.
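The Cookie check in these steps can be automated before the request is sent. A minimal sketch, with missing_cookie_names as a hypothetical helper:

```python
from typing import Iterable, List


def missing_cookie_names(
    cookie_header: str,
    required: Iterable[str] = ("ttwid", "msToken"),
) -> List[str]:
    """Return the required cookie names absent from a Cookie header value."""
    present = set()
    for pair in cookie_header.split(";"):
        name, sep, _value = pair.strip().partition("=")
        # Ignore fragments that are not name=value pairs.
        if sep and name:
            present.add(name)
    return [name for name in required if name not in present]
```

Logging the returned list on the Python side makes it easier to tell a missing Cookie apart from a risk control rejection.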
Issue 2: Base64 Decode Failed
Symptom: JSON parse failed: Expecting value
Troubleshooting Steps:
- Check that the response format is {"data": "...", "encoding": "base64"}.
- Confirm that CF Workers correctly encoded the response.
- Check whether the response was truncated.
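These checks can be folded into one diagnostic helper that reports which step failed. This is a sketch under the assumption that the envelope uses exactly the data and encoding fields shown above; diagnose_proxy_body and its messages are hypothetical.

```python
import base64
import binascii
import json


def diagnose_proxy_body(raw: str) -> str:
    """Describe what is wrong with a proxy response body, or return 'ok'."""
    try:
        envelope = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"not valid JSON: {exc.msg}"
    if not isinstance(envelope, dict) or envelope.get("encoding") != "base64":
        return "JSON is not a base64 envelope"
    data = envelope.get("data")
    if not isinstance(data, str):
        return "envelope has no string 'data' field"
    try:
        # validate=True rejects stray characters; bad padding also raises.
        decoded = base64.b64decode(data, validate=True)
    except binascii.Error:
        return "data is not valid Base64 (possibly truncated)"
    if not decoded:
        return "envelope decoded to an empty body"
    return "ok"
```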
Issue 3: CF Workers Timeout
Symptom: Proxy request failed: timeout
Cause: The Douyin API responds slowly, or the CF Workers CPU time limit was exceeded.
Solution:
- Increase timeout setting on Python side.
- Optimize CF Workers code (reduce unnecessary operations).
- Consider using CF Workers paid plan.
Summary
Using a Cloudflare Workers reverse proxy solved the Douyin API risk control issue. Key technical points:
- Base64 Encoding: Bypass CF auto-compression.
- Manual Cookie Management: Ensure authentication information is passed correctly.
- Retry Mechanism: Improve fault tolerance.
- Split Architecture: API via proxy, downloads via direct connection.
This solution is not only applicable to Douyin but can also be extended to other platforms with risk control mechanisms (such as Xiaohongshu, Bilibili, etc.).