Install the required Python modules and ChromeDriver manually before running the script.
selenium: This is a library used for automating web browser interactions.
pip install selenium
bs4 (BeautifulSoup4): This is a library used for parsing HTML and XML documents, often used for web scraping.
pip install beautifulsoup4
Steps to install ChromeDriver are as follows:
Download ChromeDriver
First, you need to know your Chrome browser version. You can check this by selecting "Help" -> "About Google Chrome" in the Chrome browser menu. Then, go to https://sites.google.com/a/chromium.org/chromedriver/downloads to select the ChromeDriver version that matches your Chrome browser version.
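The version check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original script: the helper names `major_version` and `versions_match` are hypothetical, the version strings are made up, and comparing only the major version number is an assumption about what counts as a "matching" release.

```python
def major_version(version: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version: str, driver_version: str) -> bool:
    """Assumption: the two are compatible when their major versions are equal."""
    return major_version(chrome_version) == major_version(driver_version)


# Hypothetical version strings, for illustration only
print(versions_match("114.0.5735.199", "114.0.5735.90"))
print(versions_match("115.0.5790.102", "114.0.5735.90"))
```

The first string would be copied from the "About Google Chrome" page and the second from the name of the ChromeDriver release you intend to download.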
Unzip ChromeDriver
The download is a zip archive that must be extracted first. You can right-click the downloaded file and select "Extract".
Add ChromeDriver path to system environment variables
This step is what lets Selenium find ChromeDriver. First, note the folder that contains your chromedriver.exe file.
On Windows, you can add environment variables via the following steps:
Right-click "This PC" or "Computer", then select "Properties".
Click "Advanced system settings".
Click the "Environment Variables" button in the pop-up window.
Find and select "Path" in the "System variables" area, then click "Edit".
In the new window, click "New", then enter the path of the folder that contains chromedriver.exe (the folder, not the file itself).
Click "OK" to save your changes.
Verify Installation
Open a new command prompt window and enter chromedriver. If the installation is successful, you should see a message saying ChromeDriver is running.
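The same check can also be done from Python. The sketch below uses only the standard library function `shutil.which`; the helper name `find_driver` is hypothetical and is not part of the original script.

```python
import shutil


def find_driver(name: str = "chromedriver"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(name)


path = find_driver()
if path:
    print(f"chromedriver found at: {path}")
else:
    print("chromedriver was not found on PATH; re-check the environment variable step.")
```

If nothing is found, open a new command prompt before retrying, since windows that were already open do not pick up changes to the environment variables.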
Run
Before running the code, please ensure all modules have been successfully installed and Selenium can correctly find ChromeDriver.
The code is as follows:
# Import
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import time
import random
from bs4 import BeautifulSoup
from selenium.common.exceptions import NoSuchElementException
chrome_options = webdriver.ChromeOptions()
# Headless mode
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--disable-dev-shm-usage')
# Open login page
driver = webdriver.Chrome(options=chrome_options)
driver.get('http://passport2.chaoxing.com/login?fid=&newversion=true&refer=http://i.chaoxing.com')
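# Sleep for a random duration between a and b seconds (1-2 seconds by default)
def sleep(a=1, b=2):
    time.sleep(random.uniform(a, b))

# Give the login page time to load; with a=10 and the default b=2 this waits between 2 and 10 seconds
sleep(10)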
# Login below
a=input("Enter account:\n")
b=input("Enter password:\n")
sjh=driver.find_element(By.XPATH, "/html/body/div[1]/div/div[1]/div[2]/form/div[1]/input")
sjh.send_keys(a)
mima=driver.find_element(By.XPATH, "/html/body/div[1]/div/div[1]/div[2]/form/div[2]/input")
mima.send_keys(b)
login=driver.find_element(By.XPATH, "/html/body/div[1]/div/div[1]/div[2]/form/div[3]/button")
login.click()
# Enter exam link
shijuan=input("Enter the exam link you want to export:\n")
driver.get(shijuan)
# Get HTML
html = driver.page_source
# Parse HTML via BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
# Find all elements containing questions
questions = soup.find_all('div', class_='Sub_tit_box')
# Open a file named "QuestionBank.txt" on drive G for writing
with open("G:\\QuestionBank.txt", "w", encoding="utf-8") as file:
    for question in questions:
        # Write question
        file.write(question.h3.text.strip() + "\n")
        # Try to find and write choices
        choices = question.find_next_sibling('ul', class_='mark_letter colorDeep')
        if choices:
            for li in choices.find_all('li'):
                file.write(li.text.strip() + "\n")
        else:
            file.write("No Choices for this question.\n")
        # Try to find and write answer
        answer = question.find_next_sibling('div', class_='mark_answer')
        if answer:
            correct_answer = answer.find('span', class_='colorGreen marginRight40 fl')
            fill_answer = answer.find('dl', class_='mark_fill colorGreen')
            if correct_answer:
                # Strip the page's "正确答案:" ("Correct answer:") label and keep only the answer text
                file.write(correct_answer.text.strip().replace("正确答案:", "").strip() + "\n\n")
            elif fill_answer:
                file.write(fill_answer.dd.text.strip() + "\n\n")
            else:
                file.write("No Answer for this question.\n\n")
        else:
            file.write("No Answer for this question.\n\n")
print("Export successful! Saved path is: G:\\QuestionBank.txt")
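To make the layout of QuestionBank.txt easier to see, the sketch below rebuilds the text written for a single question without touching the browser or the parser. It is an illustration only: `format_question` is a hypothetical helper that does not appear in the script above, and the sample question is made up.

```python
def format_question(title, choices=None, answer=None):
    """Build the text block for one question in the layout the script writes."""
    lines = [title.strip()]
    if choices:
        lines.extend(choice.strip() for choice in choices)
    else:
        lines.append("No Choices for this question.")
    if answer:
        lines.append(answer.strip())
    else:
        lines.append("No Answer for this question.")
    # One item per line, followed by a blank line that separates questions
    return "\n".join(lines) + "\n\n"


# Made-up sample data, for illustration only
print(format_question("1. What is 2 + 2?", ["A. 3", "B. 4"], "B"), end="")
print(format_question("2. Name the capital of France."), end="")
```

Each question therefore occupies one line for the title, one line per choice (or a placeholder), one line for the answer (or a placeholder), and a trailing blank line.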