Reply to 11# 77七

    Thanks

Reply to 14# hfxiang

    Thanks

Reply to 13# qixiaobin0715
After splitting, the output files contain unpredictable garbled characters. Can this be fixed?

    Module4 unit1
Doctor: How can I help you?
Daming: I fell ill. I鈥檝e got a stomach ache and my head hurts.
Doctor: How long have you been like this?
Daming: Since Friday. I've been ill for about three days!
Doctor: I see. Have you caught a cold?
Daming: I don't think so.
Doctor: Let me take your temperature鈥mm, there's no fever. What kind of food do you eat?
Daming: Usually fast food.
Doctor: Do you have breakfast?
Daming: No, not usually.
Doctor: 鈥淭hat's the problem! Fast food and no breakfast.鈥?That's why you've got a stomach ache.
Daming: What about the headache?
Doctor: Do you do any exercise?
Daming: Not really. I haven't done much exercise since I got my computer last year.
Doctor: 鈥淵ou spend too much time in front of the computer.鈥?It can be very harmful to your health.
Daming: OK, so what should I do?
Doctor: Well, don't worry. It's not serious. First, stop eating fast food and have breakfast every day. Second, get some exercise, such as running. And I'll give you some medicine. Take it three times a day.
Daming: Thank you, doctor.

====

Module3 unit2
鈥淪cientists think that there has been life on the earth for hundreds of millions of years.鈥?However, we have not found life on any other planets yet.
The earth is a planet and it goes around the sun. Seven other planets also go around the sun. None of them has an environment like that of the earth, so scientists do not think they will find life on them. The sun and its planets are called the solar system, and our solar system is a small part of a much larger group of stars and planets, called the Galaxy or the Milky Way. There are billions of stars in the Galaxy, and our sun is only one of them.
Scientists have also discovered many other galaxies in the universe. They are very far away and their light has to travel for many years to reach us. So how large is the universe? It is impossible to imagine.
Scientists have sent spaceships to the planet Mars to take photos. They have even sent spaceships to travel outside the solar system. However, no spaceship has travelled far enough to reach other stars in our Galaxy.
Scientists have always asked the questions: with so many stars in the universe, are we alone, or is there life out there in space? Have there been visitors to the earth from other planets? Why has no one communicated with us? We do not know the answers... yet.



=====

Module3 unit1
Daming: Hi, Tony. What are you up to?
Tony: Hi Daming. I've just made a model spaceship for our school project.
Daming: I haven't started yet because I'm not sure how to make it. Can you help me?
Tony: Sure, no problem. Have you heard the latest news? Scientists have sent a spaceship to Mars. The journey has taken several months.
Daming: Has it arrived yet?
Tony: Yes, it has arrived already. That's why it's on the news.
Daming: So have they discovered life on the Mars?
Tony: No, they haven't yet.
Daming: Are there any astronauts in the spaceship?
Tony: No, there aren't.
Daming: 鈥淲hy not? Astronauts have already been to the moon.鈥?
Tony: Yes, but no one has been to Mars yet, because Mars is very far away, much farther than the moon. Lots of scientists are working hard in order to send astronauts to Mars one day.
Daming: That's interesting! How can I get information on space travel?
Tony: You can go online to search for information.
Daming: I will. Thank you, Tony!
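
The garbling in the sample above, such as `I鈥檝e` where the text should read `I’ve`, looks like UTF-8 bytes that were decoded as GBK. This diagnosis is a guess based on the sample, not something the poster confirmed. A minimal Python sketch of that assumption:

```python
# Assumption: the split file holds UTF-8 bytes but is read back as GBK
# (the ANSI code page on Simplified Chinese Windows).
original = "I\u2019ve"  # "I’ve" with a curly apostrophe (U+2019)

utf8_bytes = original.encode("utf-8")
garbled = utf8_bytes.decode("gbk")  # wrong decoder: produces the mojibake
print(garbled)

# Reversing the mistake recovers the text, as long as no bytes were lost.
repaired = garbled.encode("gbk").decode("utf-8")
print(repaired == original)
```

If this is what happened, the fix is to make the writer and every later reader agree on one encoding, which matches the poster's finding that only UTF-8 works.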

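Reply to 13# qixiaobin0715

    The split TXT files go through further processing afterwards. After a lot of testing, I found that only UTF-8 works without garbled characters. Thanks, please take a look.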

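1. The OP can first use Notepad to save the source file in ANSI encoding.
2. Save the following batch script in ANSI encoding as well.
  @set @v=1 /*
  @echo off
  rem Batch part: ask for the source file, then run this same file as JScript.
  set "tF=" &set/p "tF=Source file: "
  if not defined tF exit/b
  rem Drop any quotes from a dragged-in path so it can be re-quoted safely.
  set "tF=%tF:"=%"
  if not defined tF exit/b
  cscript.exe -e:jscript "%~f0" "%tF%"
  exit/b
  */
  // JScript part: each line containing the ★ marker starts a new output file.
  var v=WScript.arguments;
  var fso=new ActiveXObject('scripting.filesystemobject');
  var fr=fso.opentextfile(v(0));
  var alllines=fr.readall().split('\r\n'); fr.close();
  var n, nL=alllines.length, outF='', fw=null;
  for (n=0; n<nL; ++n) {
    if (alllines[n].indexOf('★') != -1) {
      if (fw) fw.close();
      // The marker line, minus ★ and ?, becomes the output file name.
      outF=alllines[n].replace(/[★\?]/g, '')+'.txt';
      fw=fso.opentextfile(outF, 2, true);
    }
    else if (fw) fw.write(alllines[n]+'\r\n'); // lines before the first marker are skipped
  }
  if (fw) fw.close();
  WSH.quit(0);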

Reply to 21# aloha20200628

    The split TXT files go through further processing afterwards. After a lot of testing,
    only UTF-8 encoding works.

Thanks

Save both the source file and the batch file in UTF-8 encoding:
  @echo off &@cls&chcp>nul 65001
  rem Record the line number N of every heading line (e.g. "Module4 unit1") as _N=true.
  findstr /n /rb "Module[0-9]*.unit[0-9]" 1.txt>1.log
  for /f "delims=:" %%a in (1.log) do set _%%a=true
  del 1.log
  rem Re-read the file with line numbers; a heading line switches the output file.
  for /f "tokens=1* delims=:" %%i in ('findstr /n .* 1.txt') do (
      if defined _%%i set "filename=%%j.txt"
      set "str=%%j"
      setlocal enabledelayedexpansion
      echo,!str!>>"!filename!"
      endlocal
  )
  pause
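
For comparison, the same line-by-line approach can be sketched in Python, where the output encoding is stated explicitly instead of depending on the console code page. This is only an illustrative sketch: `split_by_heading`, `write_sections` and the `HEADING` pattern are hypothetical names modelled on the findstr expression above, not code from the thread.

```python
import re
from pathlib import Path

# Hypothetical heading pattern, modelled on the findstr expression above.
HEADING = re.compile(r"^Module[0-9]+.unit[0-9]+")


def split_by_heading(text):
    """Map each heading line to the text of its section (heading included)."""
    sections = {}
    current = None
    for line in text.splitlines():
        if HEADING.match(line):
            current = line.strip()
            sections.setdefault(current, [])
        if current is not None:  # lines before the first heading are skipped
            sections[current].append(line)
    return {name: "\n".join(lines) + "\n" for name, lines in sections.items()}


def write_sections(sections, out_dir):
    """Write each section to '<heading>.txt', always encoded as UTF-8."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, body in sections.items():
        (out_dir / f"{name}.txt").write_text(body, encoding="utf-8")
```

A call such as `write_sections(split_by_heading(Path("1.txt").read_text(encoding="utf-8")), "out")` would then produce one UTF-8 file per heading, assuming `1.txt` is itself UTF-8.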

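Let me share my debugging process with the OP:
1. The system is Windows 8.1, Simplified Chinese edition.
2. Copy the OP's original text into Notepad and save it as a.txt in ANSI encoding.
3. Save my batch script as a.cmd with Notepad, also in ANSI encoding.
4. Put a.txt and a.cmd in the same directory.
5. Run a.cmd, then drag in or type a.txt.
6. The result is 4 *.txt files generated in a.txt's directory, reproducing exactly what the OP asked for (passages of the form !...! in the source are not lost).
      How does the OP's testing procedure differ from the above?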

Reply to 24# aloha20200628
The OP's requirement is that the split files be encoded in UTF-8.
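
One way to reconcile the two approaches would be to let an ANSI-based script do the splitting and convert the resulting files to UTF-8 afterwards. A minimal Python sketch, assuming "ANSI" means code page 936 (GBK) as on Simplified Chinese Windows; `reencode` is a hypothetical helper name, not something used in the thread:

```python
from pathlib import Path


def reencode(path, src="gbk", dst="utf-8"):
    """Rewrite a text file in place, decoding as `src` and encoding as `dst`."""
    path = Path(path)
    # decode() raises UnicodeDecodeError if the file is not really `src`,
    # which prevents silently writing out garbage.
    text = path.read_bytes().decode(src)
    path.write_bytes(text.encode(dst))
```

Working on bytes rather than text mode keeps the line endings exactly as the splitting script wrote them.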

Reply to 22# qd2024
What further processing is still needed afterwards?

Last edited by hfxiang on 2023-2-5 12:30

Reply to 10# qd2024

Save the Word document as "最新八年级外研版英语下册课文.txt" in GB2312 encoding. After repeated testing on Windows 10, the following gawk ( http://bcn.bathome.net/tool/4.1.3/gawk.exe ) script does the job (no garbled characters):
  gawk -vRS="Module[0-9]+ unit[0-9]+" "F_n{print F_n\"\n\"$0>(F_n\".txt\")}{F_n=RT}" 最新八年级外研版英语下册课文.txt
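
The gawk one-liner treats each heading as the record separator (`RS`) and remembers the matched separator (`RT`) as the file name for the record that follows it. The same pairing can be sketched in Python with `re.split` and a capturing group. The function name `records` is hypothetical, and this is an illustration of the idea rather than a port of the script:

```python
import re

# Same heading pattern as the RS value in the gawk command above; the capturing
# group makes re.split keep the headings in its result.
RS = re.compile(r"(Module[0-9]+ unit[0-9]+)")


def records(text):
    """Return (heading, body) pairs; text before the first heading is dropped."""
    parts = RS.split(text)
    # parts = [preamble, heading1, body1, heading2, body2, ...]
    return list(zip(parts[1::2], parts[2::2]))
```

Each body still begins with the newline that followed its heading, just as `$0` does in the gawk version.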

PowerShell: export the text directly from the Word document. Here the file is named a.docx.
  <# : batch portion (begins PowerShell multi-line comment block)
  @echo off & setlocal
  powershell -noprofile -NoLogo "iex (${%~f0} | out-string)"
  pause
  exit
  #>
  # Read the whole document text through Word's COM automation interface.
  $word = New-Object -ComObject Word.Application
  $file = (ls a.docx).FullName
  $doc = $word.Documents.Open($file)
  $text = $doc.Content.Text
  # Group 1 is the heading, group 2 is everything up to the next heading or the end.
  $pattern =[regex] '(?i)(Module\d+\s+unit\d+)[\r\n]*(.+?)(?=Module\d+\s+unit\d+|$)'
  $paragraphs = [regex]::matches($text,$pattern)
  $doc.Close()
  $word.Quit()
  # [Text.Encoding]::Default is the system ANSI code page in Windows PowerShell;
  # use [Text.Encoding]::UTF8 instead if the output files must be UTF-8.
  $paragraphs.ForEach({[IO.File]::WriteAllText( $_.Groups[1].Value+ '.txt',$_.Groups[2].Value,[Text.Encoding]::Default)})

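Reply to 26# qixiaobin0715

    I use another script to add Chinese translations to the words, and that one requires UTF-8.

It works now.

Reply to 28# terse

    Thanks, I'll test it. Much appreciated.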