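I have two UTF-8 encoded text files, a.txt and b.txt, each containing English words and phrases I have collected, one per line. I want to find the words and phrases that appear in both files (ignoring case) and write them to a new file, C.txt.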
I found this batch script online:
@echo off&cd/d "%~dp0"
echo Please wait...

for /f "usebackq delims=" %%i in ("a.txt")do (
    for %%j in (%%i)do set %%j=yes)
for /f "usebackq delims=" %%i in ("b.txt")do (
    for %%j in (%%i)do if defined %%j echo %%j>>"c.txt")
set/p=Done, exiting... >nul
ping /n 3 127.1>nul
However, it ran into several problems when processing my files. For example:
Contents of a.txt:
aerobic
aerobic activity
aerobic-arrest
aerobic bacteria
affinity
affinity absorbent
affinity adsorption
affinity attraction
affinity banding agent
Contents of b.txt:
aerobic
aerobic activity
aerobic-arrest
aerobic bacteria
affinity
affinity absorbent
affinity-adsorption
affinity attraction
affinity binding
affinity banding agent
affinity choline transport
affinity-coefficient
affinity coelectrophoresis
affinity column
affinity constant
But the resulting c.txt is:
aerobic
aerobic
activity
aerobic-arrest
aerobic
bacteria
affinity
affinity
absorbent
affinity
attraction
affinity
affinity
banding
agent
affinity
affinity
affinity
affinity
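The problems:
1. Space-separated phrases in a.txt are split into individual words when compared against b.txt, and each word is written to c.txt on its own line. For example, affinity banding agent is split into three words.
2. When a single word in a.txt is also the first word of a space-separated phrase in b.txt, that first word is extracted from the phrase and written to c.txt. For example, affinity from a.txt matches the b.txt phrases affinity coelectrophoresis, affinity column, and affinity constant, and affinity is written out once for each of them.
3. When a single word in a.txt is the start of a hyphenated phrase in b.txt, the command prompt window appears to hang and the script does not proceed.
Could someone please provide a reliable batch script that solves these problems? Many thanks!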
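
For reference, the behavior I am after is a comparison of whole lines, not of individual words. A minimal sketch of that idea, using the built-in findstr command with a.txt as a list of search strings, might look like the following. It is untested against my real data and rests on several assumptions: both files sit in the same folder as the script, contain only ASCII characters (UTF-8 without a BOM), use CRLF line endings, and contain no backslashes or double quotes.

```bat
@echo off
rem Sketch only: compares whole lines instead of splitting them into words.
rem Assumes a.txt and b.txt are next to this script, are plain ASCII
rem (UTF-8 without a BOM), and use CRLF line endings.
cd /d "%~dp0"

rem /i   ignore case
rem /x   print only lines that match a search string in full
rem /l   treat search strings literally, not as regular expressions
rem /g:  read the search strings from a.txt, one per line
findstr /i /x /l /g:a.txt "b.txt" >"c.txt"

echo Done. Matching lines were written to c.txt.
```

With this approach, a phrase such as affinity banding agent should be kept as one line, and affinity on its own should not match affinity column, because /x requires the entire line to match. The output follows the order and letter case of b.txt, and c.txt is overwritten on each run, not appended to.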