Background

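While preparing for the IELTS reading section, it is important to record unfamiliar words as soon as you meet them. Pen and paper is too slow, so I prefer to keep new words in a text file on my computer. That revealed a major pain point: copying and pasting each word means constantly switching between windows, which breaks concentration and slows down studying. To solve this, I wrote three Python scripts. The first removes the window switching, the second cleans and normalizes the collected words, and the third translates them in bulk.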

A quick overview:

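Copy-and-paste script. Once running, it watches the clipboard in the background. Whenever you select and copy a word, the word is automatically appended to the file set in the script.
Cleanup script. Normalizes the words collected in step 1: converts them to lowercase, removes duplicates and blank lines, and drops entries that fail an English spell check.
Batch translation script. Translates the words from step 2 in bulk and writes each translation on the same line as its word.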

Usage (read the comments carefully before running!)

Copy and paste

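Before running the script, change the output file path in it to your own. If the file does not exist, the script creates it (the folder itself must already exist).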

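import time

import pyperclip

# Path of the word file: change this to your own.
# The file is created if it does not exist (the folder must exist).
OUTPUT_PATH = r'D:\Files\python_projects\copywords\copy.txt'

# Remember what is on the clipboard at startup so it is not recorded
clipboard_content = pyperclip.paste()

while True:
    # Read the current clipboard content
    new_clipboard_content = pyperclip.paste()

    # Only write when the clipboard has changed since the last check
    if new_clipboard_content != clipboard_content:
        clipboard_content = new_clipboard_content  # remember the latest content

        # Append the new content on its own line; the very first entry leaves
        # a blank first line, which the cleanup script removes later
        with open(OUTPUT_PATH, 'a', encoding='utf-8') as file:
            file.write('\n' + clipboard_content)

    # Check the clipboard again after 0.5 s
    time.sleep(0.5)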
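
The loop above mixes clipboard polling with change detection and file writing, which makes it hard to try out without a real clipboard. The sketch below is a hypothetical refactor, not part of the original script: it pulls the change-detection step into a helper called `record_if_changed` (a name introduced here) and feeds it simulated clipboard snapshots so the behavior can be checked offline.

```python
import tempfile
from pathlib import Path


def record_if_changed(previous: str, current: str, path: Path) -> str:
    """Append `current` to `path` if it differs from `previous`.

    Hypothetical helper: the original script inlines this logic.
    Returns the value to compare against on the next poll.
    """
    if current != previous:
        # Same layout as the original: a newline first, then the copied text
        with open(path, 'a', encoding='utf-8') as f:
            f.write('\n' + current)
    return current


# Simulate a few clipboard polls without touching the real clipboard
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / 'copy.txt'
    last = 'ignored'  # whatever was on the clipboard at startup
    for snapshot in ['ignored', 'ubiquitous', 'ubiquitous', 'mitigate']:
        last = record_if_changed(last, snapshot, out)
    lines = out.read_text(encoding='utf-8').splitlines()

print(lines)  # ['', 'ubiquitous', 'mitigate']
```

In the real script, `snapshot` would come from `pyperclip.paste()` inside the `while True` loop. Repeated copies of the same word are skipped, and the leading empty line is left for the cleanup step to remove.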

Cleanup

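Note that the cleaned words are written to a new file, so, as in the previous step, set the file paths in the script. The input path should point to the file the clipboard script writes to. Besides lowercasing, de-duplicating, and removing blank lines, the script keeps only words that pass a US English spell check (via pyenchant), so misspelled entries are dropped.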

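from collections import OrderedDict

import enchant  # provided by the pyenchant package (pip install pyenchant)

# Input: should be the file the clipboard script writes to.
# Output: the cleaned word list. Change both paths to your own.
input_file_path = r'D:\Files\python_projects\copywords\copyWords.txt'
output_file_path = r'D:\Files\python_projects\copywords\words.txt'

with open(input_file_path, 'r', encoding='utf-8') as file:
    lines = file.readlines()

# Lowercase, de-duplicate (keeping first-seen order), drop blank lines,
# and keep only entries that pass an English spell check
unique_words = OrderedDict()
spell_checker = enchant.Dict("en_US")  # US English dictionary

for line in lines:
    word = line.strip().lower()
    if word and word not in unique_words and spell_checker.check(word):
        unique_words[word] = True

# Write the cleaned words to the new file, one per line
with open(output_file_path, 'w', encoding='utf-8') as output_file:
    output_file.write('\n'.join(unique_words.keys()))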
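
The cleanup rules can also be expressed as a pure function, which is handy for checking them on a few sample lines before touching real files. This is a minimal sketch: `normalize_words` and its `is_valid` parameter are hypothetical names, and a plain `dict` replaces `OrderedDict` because dicts keep insertion order from Python 3.7 onward.

```python
from typing import Callable, Iterable, List, Optional


def normalize_words(lines: Iterable[str],
                    is_valid: Optional[Callable[[str], bool]] = None) -> List[str]:
    """Strip, lowercase, drop blanks, and de-duplicate in first-seen order.

    `is_valid` stands in for a spell checker; when it is None,
    every non-empty word is kept.
    """
    seen = {}  # plain dicts preserve insertion order (Python 3.7+)
    for line in lines:
        word = line.strip().lower()
        if not word or word in seen:
            continue
        if is_valid is not None and not is_valid(word):
            continue
        seen[word] = True
    return list(seen)


raw = ['\n', 'Ubiquitous\n', 'mitigate\n', 'ubiquitous\n', '  \n', 'teh\n']
print(normalize_words(raw))                                # ['ubiquitous', 'mitigate', 'teh']
print(normalize_words(raw, is_valid=lambda w: w != 'teh'))  # ['ubiquitous', 'mitigate']
```

To reproduce the original script's spell check, pass `enchant.Dict("en_US").check` as `is_valid`; the lambda here only simulates a checker that rejects one typo.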

Batch translation

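This step requires Baidu Translate API credentials (an appid and a key). If you don't have them, register for an account; it's free. After registering, fill your credentials into the script.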

import random
import time
from hashlib import md5

import requests

# Baidu Translate API credentials: fill in your own
appid = 'your_appid'
appkey = 'your_appkey'

# Input file (the cleaned word list): change to your own path
file_path = r'D:\Files\python_projects\copywords\copyAnyWords\itlesReading\3_4chapter'

# Return the hex MD5 digest of a string
def make_md5(s, encoding='utf-8'):
    return md5(s.encode(encoding)).hexdigest()

# Compute the sign parameter: MD5(appid + q + salt + appkey)
def calculate_sign(appid, query, salt, appkey):
    return make_md5(appid + query + str(salt) + appkey)

# Translate one word with the Baidu Translate API; returns '' on failure
def translate_word(word):
    endpoint = 'http://api.fanyi.baidu.com'
    path = '/api/trans/vip/translate'
    url = endpoint + path
    from_lang = 'en'
    to_lang = 'zh'
    salt = random.randint(32768, 65536)
    sign = calculate_sign(appid, word, salt, appkey)
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    payload = {'appid': appid, 'q': word, 'from': from_lang, 'to': to_lang, 'salt': salt, 'sign': sign}
    response = requests.post(url, data=payload, headers=headers, timeout=10)
    translation = response.json()
    if 'trans_result' in translation:
        return translation['trans_result'][0]['dst']
    return ''  # error response, e.g. wrong credentials or too many requests

# Read the file and translate each word (one word per line)
def translate_file(file_path):
    translated_content = []
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            word = line.strip()
            if not word:
                continue  # skip blank lines instead of sending empty queries
            translation = translate_word(word)
            if translation:
                translated_content.append(f'{word}  {translation}\n')
            else:
                translated_content.append(f'{word}\n')
            # Pause between requests to stay within the API's rate limit
            # (assumed to be about one request per second on the free tier)
            time.sleep(1)
    return translated_content

# Overwrite the input file with "word  translation" lines
def write_back_translation(file_path, translated_content):
    with open(file_path, 'w', encoding='utf-8') as file:
        file.writelines(translated_content)

# Entry point
def main():
    translated_content = translate_file(file_path)
    write_back_translation(file_path, translated_content)
    print("Translation finished and written back to the original file!")

if __name__ == '__main__':
    main()
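
Because every call to `translate_word` hits the network, it can help to check the signing and response-parsing logic offline first. The sketch below assumes the same form fields and the same `trans_result`/`dst` keys the script above reads; `build_payload` and `extract_translation` are hypothetical helpers, and the response dictionaries are hand-written stand-ins, not real API output.

```python
import random
from hashlib import md5


def build_payload(word: str, appid: str, appkey: str, salt: int,
                  from_lang: str = 'en', to_lang: str = 'zh') -> dict:
    """Build the same form fields as translate_word().

    sign = MD5(appid + q + salt + appkey), hex-encoded, as in calculate_sign().
    """
    sign = md5((appid + word + str(salt) + appkey).encode('utf-8')).hexdigest()
    return {'appid': appid, 'q': word, 'from': from_lang,
            'to': to_lang, 'salt': salt, 'sign': sign}


def extract_translation(response_json: dict) -> str:
    """Return the first 'dst' if present, otherwise '' (mirrors translate_word)."""
    results = response_json.get('trans_result') or []
    return results[0].get('dst', '') if results else ''


payload = build_payload('ubiquitous', 'your_appid', 'your_appkey',
                        salt=random.randint(32768, 65536))
print(len(payload['sign']))  # 32 (an MD5 hex digest)

# Hand-written stand-ins shaped like the keys the script reads
print(extract_translation({'trans_result': [{'src': 'ubiquitous', 'dst': '无处不在的'}]}))
print(repr(extract_translation({})))  # '' for a response without trans_result
```

Splitting request building from parsing this way also makes it easy to see which field is wrong when the API returns an error instead of `trans_result`.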

Notes

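The three scripts above are most likely to fail in three places:

A required library may be missing. Install it with pip (pyperclip, pyenchant, requests) and run the script again.
Pay attention to how file paths are written; this is probably the most error-prone part. In a normal Python string a backslash starts an escape sequence, so use raw strings (r'...') or double backslashes.
The translation script needs your own Baidu Translate credentials; without them, nothing gets translated.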
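
The first two pitfalls can be checked in a few lines. This sketch is illustrative only: `missing_modules` is a hypothetical helper built on `importlib.util.find_spec`, which reports whether a module can be located, not whether it imports cleanly. It checks import names, which for enchant differ from the pip package name.

```python
import importlib.util
from pathlib import PureWindowsPath

# A raw string and a double-backslash string spell the same path
raw = r'D:\Files\python_projects\copywords\copy.txt'
escaped = 'D:\\Files\\python_projects\\copywords\\copy.txt'
forward = 'D:/Files/python_projects/copywords/copy.txt'
print(raw == escaped)  # True
# PureWindowsPath treats / and \ as separators, so these compare equal too
print(PureWindowsPath(raw) == PureWindowsPath(forward))  # True

# Without r'', "\n" in this path silently becomes a newline character
broken = 'D:\new_words.txt'
print('\n' in broken)  # True


def missing_modules(names):
    """Return the import names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used by the scripts (enchant is installed as pyenchant)
print(missing_modules(['pyperclip', 'enchant', 'requests']))
```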

New: phrase extraction script

Sometimes you come across phrases in an article that are worth collecting too; this script handles that.

Two phrase forms are supported:

words joined by a hyphen (-);
words separated by spaces.

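To collect phrases, use the clipboard script to copy both new words and phrases into the same file, then run the phrase extraction script below. It moves the phrases into a separate phrase file and removes them from the original file, which is left with single words only. Run it before the cleanup script, since that script's spell check will likely discard multi-word entries. Combined with the three scripts above, this makes the whole workflow faster.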

# Read the file written by the clipboard script (change to your own path)
input_file_path = r'D:\Files\python_projects\copywords\copy.txt'
with open(input_file_path, 'r', encoding='utf-8') as file:
    content = file.readlines()

# Collect phrases: any line containing a hyphen or a space
phrases = set()  # a set keeps each phrase only once
for line in content:
    line = line.strip()  # remove leading/trailing whitespace
    if '-' in line or ' ' in line:
        phrases.add(line)

# Append the extracted phrases to the phrase file
output_file_path = r'D:\Files\python_projects\copywords\phrase.txt'
with open(output_file_path, 'a', encoding='utf-8') as output_file:  # append mode
    for phrase in phrases:
        output_file.write(phrase + '\n')

# Rewrite the original file without the extracted phrases, leaving only single words
with open(input_file_path, 'w', encoding='utf-8') as file:
    for line in content:
        line = line.strip()
        if line not in phrases:
            file.write(line + '\n')
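
The phrase rule (a hyphen or a space means "phrase") can be isolated into a function that returns both lists, which makes the split easy to test before the script overwrites your file. `split_words_and_phrases` is a hypothetical name; unlike the script, this sketch keeps phrases in first-seen order instead of collecting them in a set, and it drops blank lines.

```python
from typing import Iterable, List, Tuple


def split_words_and_phrases(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split entries into single words and phrases.

    Same rule as the script: a line containing '-' or a space is a phrase.
    Phrases are de-duplicated; words are left for the cleanup step to de-duplicate.
    """
    words: List[str] = []
    phrases: List[str] = []
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        if '-' in entry or ' ' in entry:
            if entry not in phrases:
                phrases.append(entry)
        else:
            words.append(entry)
    return words, phrases


words, phrases = split_words_and_phrases(
    ['mitigate\n', 'take into account\n', 'well-being\n', '\n',
     'ubiquitous\n', 'take into account\n'])
print(words)    # ['mitigate', 'ubiquitous']
print(phrases)  # ['take into account', 'well-being']
```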