A Xiaohongshu share link stopped parsing. Digging into it, I discovered the anti-scraping measures were far more complex than I expected. After trying several approaches, the final solution turned out to be surprisingly simple.
The Beginning
I have a Xiaohongshu link parser that extracts titles, authors, images, and other information from notes. One day someone reported that a share link failed to parse:
When I made the request with aiohttp and followed redirects, it ended up at /404?errorCode=-510001. Opening the same link in a browser worked perfectly fine.
What was going on?
Short Link Redirects and xsec_token
Tracing the redirect chain with curl:
The short link first 302s to the note page, but the note page then 302s to a 404. The key parameter is xsec_token in the URL—this is Xiaohongshu's anti-hotlinking token, and the server validates its association with the current session.
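To make each hop of the chain easier to read, the Location header can be broken apart with the standard library. This is a minimal sketch, not the parser's actual code: the function name `inspect_redirect` is made up, and the note URL used in the usage lines is a hypothetical placeholder.

```python
from urllib.parse import parse_qs, urlsplit


def inspect_redirect(location: str) -> dict:
    """Summarize one Location header from the redirect chain."""
    parts = urlsplit(location)
    query = parse_qs(parts.query)
    return {
        "path": parts.path,
        # xsec_token is the anti-hotlinking token carried on the note URL.
        "xsec_token": query.get("xsec_token", [None])[0],
        # The 404 page reports the failure reason in errorCode.
        "error_code": query.get("errorCode", [None])[0],
        # True when the hop landed on the 404 page.
        "blocked": parts.path == "/404",
    }


# Hypothetical note URL, only to show the shape of the result.
ok = inspect_redirect("https://www.xiaohongshu.com/explore/NOTE_ID?xsec_token=TOKEN")
bad = inspect_redirect("/404?errorCode=-510001")
print(ok["xsec_token"], bad["blocked"], bad["error_code"])
```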
I observed an interesting behavior in the browser: in incognito mode, the first visit also returns 404, but refreshing once makes it work normally. Comparing the two requests, the second one carried an additional batch of cookies:
These cookies aren't issued by the server via Set-Cookie—they're generated by JavaScript on the page. A pure HTTP client can't obtain them.
Trying a Different UA
Since the desktop version requires JS-generated cookies to pass validation, I took a different approach and disguised the request as coming from a mobile device:
It returned HTTP 200 directly with the complete page, no extra cookies needed.
This makes sense when you think about it. The WebView inside the app may not be able to run the full anti-scraping JS, so mobile share links (app_platform=ios) take a more lenient path that doesn't validate xsec-related cookies.
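A minimal sketch of the mobile-UA request follows. It uses the standard library's urllib instead of aiohttp so that it has no dependencies, and the UA string is an assumption: any recent iPhone Safari UA should do, and the exact value I used is not what matters here. The function names are illustrative.

```python
import urllib.request

# Assumption: a generic iPhone Safari UA string, not a value required by the site.
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_6 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.6 Mobile/15E148 Safari/604.1"
)


def build_mobile_request(url: str) -> urllib.request.Request:
    """Build a GET request that presents itself as a mobile browser."""
    return urllib.request.Request(url, headers={"User-Agent": MOBILE_UA})


def fetch_mobile_html(url: str, timeout: float = 10.0) -> str:
    """Fetch the page; urllib follows the 302 redirects on its own."""
    with urllib.request.urlopen(build_mobile_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Keeping the request builder separate from the network call makes the header logic checkable without touching the site.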
The parsing succeeded, but all images had watermarks.
The URLs looked like this:
!h5_1080jpg is an image processing directive: the CDN itself renders the watermark. Removing the suffix doesn't help either, because the request then fails signature validation and returns a 403. The watermark is part of the image processing pipeline, not added by the client.
Brute-Forcing the API Protocol
Since the mobile page has watermarks, I tried calling the API directly to get the raw data.
I found an open-source project, RedCrack, that implements Xiaohongshu's Web API protocol: it fetches note JSON through the /api/sns/web/v1/feed endpoint on edith.xiaohongshu.com, and the returned image URLs have no watermarks.
Sounds great. But to make this work, you need to pass several hurdles:
Cookie Generation Chain
Request Signing
Each API request also requires 5 signature headers:
| Header | Generation Method |
|---|---|
| x-s | MD5(url+body) → XOR → Base58 encoding, wrapped in custom Base64 |
| x-s-common | ARC4 encrypt fingerprint subset → custom Base64 |
| x-b3-traceid | 16-character random hex |
| x-xray-traceid | Timestamp left-shift + sequence number + random number |
| x-t | Current millisecond timestamp |
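The two simplest headers in the table can be sketched directly from their descriptions. This is only an illustration with made-up function names. The other three headers are left out because x-s and x-s-common depend on the encryption logic, and the exact bit layout of x-xray-traceid is not spelled out here.

```python
import secrets
import time


def make_x_t() -> str:
    """x-t: the current Unix timestamp in milliseconds, as a string."""
    return str(int(time.time() * 1000))


def make_x_b3_traceid() -> str:
    """x-b3-traceid: 16 random hex characters (8 random bytes)."""
    return secrets.token_hex(8)


headers = {"x-t": make_x_t(), "x-b3-traceid": make_x_b3_traceid()}
print(headers)
```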
The core of x-s is _encrypt_x3, which corresponds to the output of window.mnsv2() in the browser. This function itself runs inside a JSVMP (JS Virtual Machine Protection)—it's not plain JS that you can read directly.
Hitting the Wall
I extracted RedCrack's encryption logic into a standalone module xhs_encrypt_helper.py (about 325 lines) and wrote xhs_api_client.py for complete session initialization and API calls.
The cookie generation chain actually worked: a1, webId, websectiga, gid, and web_session were all obtained. But when I called the feed endpoint:
461 is Xiaohongshu's "abnormal access" status code, accompanied by Verifytype: 216.
Further testing showed that the original RedCrack project itself also fails at the same step. The webprofile endpoint returns 471, and base64-decoding Verifymsg reveals:
Analyzing the live vendor-dynamic.js showed that ARTIFACT_VERSION had changed from 4.83.1 to 6.3.0. After updating the version number, webprofile recovered, but the feed endpoint still returned 461.
The root cause: the JSVMP bytecode for mnsv2 had been updated server-side. Xiaohongshu periodically rotates the VM instruction set, and the _encrypt_x3 function from previous reverse engineering produces signatures that are no longer accepted.
The API route was a dead end.
CDN URL Rewriting
Back to the mobile HTML approach. The images have watermarks, but let's break down the URL carefully:
202604070308 is a timestamp, the hash after it is the anti-hotlinking signature, and !h5_1080jpg is the image processing directive (resize + watermark).
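The same breakdown can be written as a small parser. This is a sketch under assumptions: the regex treats the timestamp as 12 digits, the signature as a single path segment, and everything up to the "!" as the image ID. The names `WebpicUrl` and `parse_webpic_url` and the sample URL are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class WebpicUrl(NamedTuple):
    timestamp: str            # e.g. 202604070308
    signature: str            # anti-hotlinking hash
    image_id: str             # path between the signature and the "!" directive
    directive: Optional[str]  # e.g. h5_1080jpg, or None when absent


_WEBPIC_RE = re.compile(
    r"^https?://sns-webpic-qc\.xhscdn\.com/"
    r"(?P<ts>\d{12})/(?P<sign>[^/]+)/(?P<id>[^!?#]+)"
    r"(?:!(?P<directive>[^?#]+))?"
)


def parse_webpic_url(url: str) -> WebpicUrl:
    """Split a watermarked CDN URL into its four parts."""
    m = _WEBPIC_RE.match(url)
    if m is None:
        raise ValueError(f"not a sns-webpic-qc URL: {url!r}")
    return WebpicUrl(m["ts"], m["sign"], m["id"], m["directive"])


# Hypothetical URL that only follows the DATE/SIGN/ID!directive shape.
sample = "https://sns-webpic-qc.xhscdn.com/202604070308/0123abcd/IMAGE_ID!h5_1080jpg"
print(parse_webpic_url(sample))
```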
Then I discovered that Xiaohongshu has another CDN domain sns-img-qc.xhscdn.com that serves original images directly by image ID—no signature required, no watermark:
I tried several combinations:
| URL Pattern | Status | Watermark | Requires Signature |
|---|---|---|---|
| sns-webpic-qc.xhscdn.com/DATE/SIGN/ID!h5_1080jpg | 200 | Yes | Yes |
| sns-webpic-qc.xhscdn.com/DATE/SIGN/ID (suffix removed) | 403 | - | - |
| sns-img-qc.xhscdn.com/ID | 200 | No | No |
| ci.xiaohongshu.com/ID | 200 | No | No |
So the final solution is quite simple. Extract image URLs from the carousel in the mobile HTML, parse out the IMAGE_ID, and swap the domain:
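A minimal sketch of that rewrite, assuming the watermarked URLs appear unescaped in the HTML and always end in a "!" directive. It scans the whole page rather than a specific carousel element, since the exact markup may change. The function name and the sample HTML are made up for illustration.

```python
import re
from typing import List

_WEBPIC_IN_HTML = re.compile(
    r"https?://sns-webpic-qc\.xhscdn\.com/"
    r"\d+/[^/\s\"'<>]+/"       # DATE / SIGN
    r"([^!\s\"'<>?]+)"         # IMAGE_ID (captured)
    r"![A-Za-z0-9_]+"          # processing directive such as !h5_1080jpg
)


def rewrite_image_urls(html: str, host: str = "sns-img-qc.xhscdn.com") -> List[str]:
    """Return de-duplicated, order-preserving URLs on the unsigned CDN host."""
    urls: List[str] = []
    for match in _WEBPIC_IN_HTML.finditer(html):
        url = f"https://{host}/{match.group(1)}"
        if url not in urls:
            urls.append(url)
    return urls


# Hypothetical HTML fragment with two slides, one of them repeated.
html = (
    '<img src="https://sns-webpic-qc.xhscdn.com/202604070308/aaa/id1!h5_1080jpg">'
    '<img src="https://sns-webpic-qc.xhscdn.com/202604070308/bbb/id2!h5_1080jpg">'
    '<img src="https://sns-webpic-qc.xhscdn.com/202604070308/aaa/id1!h5_1080jpg">'
)
print(rewrite_image_urls(html))
```

Passing `host="ci.xiaohongshu.com"` produces the other unsigned variant from the table.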
All images came back watermark-free, and all were directly accessible.
What Xiaohongshu's Anti-Scraping Looks Like
After all this tinkering, I can roughly map out Xiaohongshu's defense layers:
| Layer | Mechanism | Description |
|---|---|---|
| 1 | xsec_token | URL-level anti-hotlinking token, bound to session |
| 2 | JS Cookie generation | a1/webId/websectiga/gid etc. generated by JS runtime |
| 3 | Browser fingerprinting | 80+ field device fingerprint, DES-encrypted and uploaded |
| 4 | Request signing (x-s) | Executed in JSVMP virtual machine, bytecode rotated periodically |
| 5 | TLS fingerprinting | Detects non-browser TLS ClientHello characteristics |
| 6 | Version validation | Expired ARTIFACT_VERSION results in immediate 471 |
| 7 | Behavioral analysis | Comprehensive risk control based on frequency, trajectory, timing, etc. |
Layer 4 is the real tough nut to crack. mnsv2 compiles the signing algorithm into custom bytecode and runs it inside a JS virtual machine. To reverse-engineer it, you need to:
- Find the VM interpreter in vendor-dynamic.xxx.js
- Extract the bytecode array
- Simulate execution instruction by instruction to reconstruct the algorithm logic
- Rewrite it in Python
But whenever Xiaohongshu updates the bytecode array (while keeping the VM structure unchanged), all previous reverse engineering becomes useless. Attackers have to start from scratch every time, while defenders just need to change an array. The costs are asymmetric.
Comparison of Bypass Approaches
| Approach | Difficulty | Stability | Watermark |
|---|---|---|---|
| Mobile UA + CDN rewriting | Low | High | None |
| Playwright/Puppeteer running a real browser | Medium | High | None |
| Reverse-engineering JSVMP signing algorithm | Extremely high | Low (can break anytime) | None |
| Using third-party parsing API | Low | Depends on third party | None |
I went with the first option: mobile UA to fetch HTML + CDN URL rewriting. It has no dependency on JS execution, needs no reverse-engineered signatures, and is unaffected by version updates. It only breaks if Xiaohongshu shuts down the sns-img-qc CDN or substantially overhauls the mobile page.
Technical Details
This section is notes for myself (or whoever comes after)—feel free to skip if you're not interested.
How Cookies Are Generated
a1: hex(timestamp_ms) + 30 random characters + platform code "50000" + CRC32 checksum, truncated to the first 52 characters.
websectiga: POST /api/sec/v1/scripting to get an obfuscated JS snippet containing base64-encoded lookup tables and index arrays, decrypted at specific offsets to produce a 64-character key string.
gid: Pack 80+ browser fingerprint fields into JSON → Base64 → DES-ECB encryption (key zbp30y86) → hex, POST to the webprofile endpoint.
x-s-common: Take a fingerprint subset → JSON → ARC4 encryption (key xhswebmplfbt) → URL encoding → custom Base64 (alphabet ZmserbBoHQtNP+wOcza/...).
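Two of the recipes above, a1 and x-s-common, are simple enough to sketch in pure Python. Treat this as an illustration of the steps as described, not a working bypass. Several details are assumptions: the random-character alphabet, the CRC32 being taken over the preceding string and appended in decimal, and the compact JSON encoding. The custom Base64 alphabet is a required parameter because only its truncated form is shown above.

```python
import base64
import json
import random
import string
import time
import zlib
from typing import Optional
from urllib.parse import quote

STD_B64 = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def generate_a1(timestamp_ms: Optional[int] = None,
                rng: Optional[random.Random] = None) -> str:
    """hex(timestamp_ms) + 30 random chars + "50000" + CRC32, cut to 52 chars."""
    rng = rng or random.Random()
    ts = int(time.time() * 1000) if timestamp_ms is None else timestamp_ms
    alphabet = string.ascii_lowercase + string.digits  # assumption
    body = format(ts, "x") + "".join(rng.choice(alphabet) for _ in range(30)) + "50000"
    checksum = str(zlib.crc32(body.encode("utf-8")))   # assumption: decimal digits
    return (body + checksum)[:52]


def arc4(key: bytes, data: bytes) -> bytes:
    """Plain ARC4: key scheduling followed by keystream XOR."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def custom_b64encode(data: bytes, alphabet: bytes) -> str:
    """Standard Base64, then remap each symbol onto a 64-character alphabet."""
    if len(alphabet) != 64 or len(set(alphabet)) != 64:
        raise ValueError("alphabet must contain 64 distinct characters")
    table = bytes.maketrans(STD_B64, alphabet)
    return base64.b64encode(data).translate(table).decode("ascii")


def encode_x_s_common(fingerprint: dict, alphabet: bytes,
                      key: bytes = b"xhswebmplfbt") -> str:
    """JSON -> ARC4 -> URL encoding -> custom Base64, in the order listed above."""
    plaintext = json.dumps(fingerprint, separators=(",", ":")).encode("utf-8")
    url_encoded = quote(arc4(key, plaintext)).encode("ascii")
    return custom_b64encode(url_encoded, alphabet)
```

Passing `STD_B64` as the alphabet gives ordinary Base64, which is handy for checking the pipeline before swapping in the real table.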
Version Numbers
The getArtifactInfo function in vendor-dynamic.js has the version number hardcoded. It can be extracted from the live JS using the regex artifactVersion.*?(\d+\.\d+\.\d+).
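A minimal sketch of that extraction, with the dots escaped so they only match literal periods. The sample JS string is invented to exercise the pattern and is not the real file's content.

```python
import re
from typing import Optional

_VERSION_RE = re.compile(r"artifactVersion.*?(\d+\.\d+\.\d+)")


def extract_artifact_version(js_source: str) -> Optional[str]:
    """Return the first x.y.z version following artifactVersion, or None."""
    match = _VERSION_RE.search(js_source)
    return match.group(1) if match else None


# Hypothetical snippet standing in for vendor-dynamic.js.
sample_js = 'function getArtifactInfo(){return{artifactVersion:"6.3.0"}}'
print(extract_artifact_version(sample_js))
```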
Code Structure
Call flow of async_xhs.py:
Disclaimer
This article only documents personal exploration and thinking during the technical learning process, intended for security research and educational purposes. All technical analysis discussed pertains to publicly accessible pages and network requests, and does not involve any unauthorized access, bulk data scraping, or commercial use. Please comply with the relevant platform's terms of service and local laws and regulations, and do not use the content herein for scenarios that infringe upon others' legitimate rights and interests.
Afterword
The biggest lesson from this whole ordeal: don't dive straight into the most complex approach from the start.
My thought process was "desktop requires cookies → so generate cookies → cookies obtained, now sign requests → signing is wrong, so reverse-engineer JSVMP"—a straight path into a dead end. The final solution was in a completely different direction: switch the UA, switch the CDN domain.
sns-webpic-qc (with signature, watermarked) and sns-img-qc (bare ID, no watermark) are probably meant for different business purposes. The former serves user browsing, while the latter is likely for internal services or native App rendering. The latter doesn't require signatures, probably because it was never intended to be publicly exposed. But the image IDs in the mobile HTML connect these two systems.
This door may close in the future. But at least today, it's still open.