
[Text processing] How can a batch script extract every line of several specified fields from a txt file?

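The text looks like the sample at the end of this post. The sample is one record, and the file repeats about 300,000 of them.
Each line starts with a field name. I need to extract everything under the five fields CT, CY, CL, WC and C1 from every record, and the five fields must stay aligned with one another, because now and then some of them may be empty.
The final result should look like this: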
  CT CY CL WC C1
1** ** ** ** **
2** ** ** ** **
3** ** ** ** **
.....

FN Clarivate Analytics Web of Science
VR 1.0
PT C
AU Si, D
   Cheng, SC
   Xing, RW
   Liu, C
   Wu, OY
AF Si, Dong
   Cheng, Sunny Chieh
   Xing, Ruiwen
   Liu, Chang
   Wu, Hoi Yan
GP IEEE
TI Scaling up Prediction of Psychosis by Natural Language Processing
SO 2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL
   INTELLIGENCE (ICTAI 2019)
SE Proceedings-International Conference on Tools With Artificial
   Intelligence
LA English
DT Proceedings Paper
CT 31st IEEE International Conference on Tools with Artificial Intelligence
   (ICTAI)
CY NOV 04-06, 2019
CL Portland, OR
SP IEEE, IEEE Comp Soc
DE Machine learning; Natural language processing; Text classification;
   Prediction of psychosis; Schizophrenia; Word embeddings; Convolutional
   neural networks
ID HIGH-RISK; SCHIZOPHRENIA; PREVALENCE
AB Mental health professionals currently diagnose and treat mental disorders, such as schizophrenia, mainly by analyzing the language and speech of their patients, a method that maybe improved with the usage of artificial intelligence. This study aims to use machine learning to distinguish between the speech of patients who suffer from mental disorders which cause psychosis from that of healthy individuals to improve early detection of schizophrenia. We analyzed forty interview transcripts from patients who have been diagnosed with first episode psychosis. Word embeddings and convolutional neural network were utilized for the classification of patients from healthy individuals. The preliminary test results achieved a prediction rate of 99%, which indicated that our speech classifier was able to discriminate speech in patients from healthy individuals' daily conversations. This suggested that machine learning models can learn and train upon features of natural languages to predict whether or not an individual is beginning to show the first signs of early psychosis based on their speech. This line of inquiry will contribute to the improved identification of individuals at risk for psychiatric symptoms and lead to the development of targeted therapies.
C1 [Si, Dong; Xing, Ruiwen; Liu, Chang; Wu, Hoi Yan] Univ Washington, Comp & Software Syst, Bothell, WA 98011 USA.
   [Cheng, Sunny Chieh] Univ Washington, Nursing & Healthcare Leadership, Tacoma, WA USA.
RP Si, D (corresponding author), Univ Washington, Comp & Software Syst, Bothell, WA 98011 USA.
EM dongsi@uw.edu; ccsunny@uw.edu; ruiwen@uw.edu; chang15@uw.edu;
   hoiyanwu@uw.edu
FU Graduate Research Award of Computing and Software Systems division;
   University of Washington BothellUniversity of Washington [74-0525];
   NVIDIA Corporation (Santa Clara, CA, USA)
FX This research was funded by the Graduate Research Award of Computing and
   Software Systems division and the startup fund 74-0525 of the University
   of Washington Bothell.; We gratefully acknowledge the support of NVIDIA
   Corporation (Santa Clara, CA, USA) with the donation of the GPU used for
   this research.
NR 31
TC 1
Z9 1
U1 0
U2 0
PU IEEE COMPUTER SOC
PI LOS ALAMITOS
PA 10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1264 USA
SN 1082-3409
BN 978-1-7281-3798-8
J9 PROC INT C TOOLS ART
PY 2019
BP 339
EP 347
DI 10.1109/ICTAI.2019.00055
PG 9
WC Computer Science, Artificial Intelligence; Computer Science, Theory &
   Methods
SC Computer Science
GA BP4NY
UT WOS:000553441500046
DA 2021-09-15
ER
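
For cross-checking a batch result on a small sample, the extraction asked for above can be sketched in Python (not batch; the language is swapped in here only because it is easy to test). This is a minimal sketch under assumptions the thread does not confirm: a tag is two characters followed by a space, continuation lines begin with spaces, and a line reading ER ends a record. `parse_records`, `WANTED` and `TAG_LINE` are illustrative names.

```python
import re

WANTED = ("CT", "CY", "CL", "WC", "C1")
# Assumption: a tag is two characters followed by one space.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9]) (.*)$")


def parse_records(lines, wanted=WANTED):
    """Yield one dict per record, with "" for any wanted tag that is missing."""
    record, current = {}, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.rstrip() == "ER":  # assumed end-of-record marker
            yield {tag: record.get(tag, "") for tag in wanted}
            record, current = {}, None
            continue
        match = TAG_LINE.match(line)
        if match:
            current, text = match.groups()
            if current in wanted:
                record[current] = text.strip()
        elif line.startswith(" ") and line.strip() and current in wanted:
            # Continuation line: append it to the field that is still open.
            record[current] += " " + line.strip()


sample = """\
DT Proceedings Paper
CT 31st IEEE International Conference on Tools with Artificial Intelligence
   (ICTAI)
CY NOV 04-06, 2019
CL Portland, OR
SP IEEE, IEEE Comp Soc
C1 [Si, Dong; Xing, Ruiwen; Liu, Chang; Wu, Hoi Yan] Univ Washington, Comp & Software Syst, Bothell, WA 98011 USA.
   [Cheng, Sunny Chieh] Univ Washington, Nursing & Healthcare Leadership, Tacoma, WA USA.
WC Computer Science, Artificial Intelligence; Computer Science, Theory &
   Methods
SC Computer Science
ER
"""

rows = list(parse_records(sample.splitlines()))
for number, row in enumerate(rows, start=1):
    print(number, *(row[tag] for tag in WANTED), sep="\t")
```

A field that is missing from a record comes out as an empty string, so the five columns stay aligned row by row.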

Are WC and C1 swapped in the expected output?
And are the fields separated by spaces?

Reply to 2# qixiaobin0715


    The order can be changed freely. There don't seem to be spaces between the fields; each one starts on a new line.

I mean between CT CY CL WC C1 on the same row of the output.

In the source file, CT CY CL C1 WC always appear in that fixed order, right?

Reply to 5# qixiaobin0715


    Yes.

Reply to 4# qixiaobin0715


    Yes.

Last edited by qixiaobin0715 on 2021-9-26 17:48

What does "now and then some of them may be empty" mean?
Can CT be empty too?

Link: https://pan.baidu.com/s/1QX4H6uUy41_ezGPwQuVszw
Extraction code: 1x4z

Attachment link

Reply to 8# qixiaobin0715


Link: https://pan.baidu.com/s/1QX4H6uUy41_ezGPwQuVszw
Extraction code: 1x4z

See the attachment for details. Many thanks!

Last edited by idwma on 2021-9-26 22:22
@echo off
setlocal enabledelayedexpansion
rem Tags still to be found in the current record. A tag is removed once seen.
set "str=CT CY CL C1 WC"
rem f marks that a wanted field is still open, ff holds the name of that field.
(for /f "delims=" %%a in (111.txt) do (
    set "strr=%%a"
    if defined f (
        set ccc=
        rem A line that does not start with two spaces begins a new field.
        if not "!strr:~0,2!"=="  " (
            set f=
            set ccc=1
        )
        rem Otherwise it is a continuation line, so append it to the open field.
        if not defined ccc (
            call set "!ff!=%%!ff!%%!strr:~3!"
        )
    )
    for %%c in (!str!) do (
        if "!strr:~0,2!"=="%%c" (
            set str=!str:%%c=!
            set "ff=%%c"
            set "!ff!=!strr:~3! "
            set f=1
        )
    )
    rem A row is written only after all five fields were seen and the last one is closed.
    if defined CT if defined CY if defined CL if defined WC if defined C1 (
        if not defined f (
            set /a n+=1
            echo;!n!##!CT!##!CY!##!CL!##!C1!##!WC!
            for %%c in (CT CY CL C1 WC) do set %%c=
            set "str=CT CY CL C1 WC"
        )
    )
))>222.txt
pause
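
If 222.txt is to be processed further, each line can be split back into its fields on the `##` separator. Below is a minimal Python sketch (Python rather than batch, for ease of testing), assuming no field value itself contains `##`. `split_row` and `FIELDS` are illustrative names, and the example line is assembled by hand from the sample record rather than taken from a real run.

```python
FIELDS = ("CT", "CY", "CL", "C1", "WC")  # column order used by the echo line


def split_row(line):
    """Split one 'n##CT##CY##CL##C1##WC' line into (n, {tag: value})."""
    parts = line.rstrip("\r\n").split("##")
    if len(parts) != len(FIELDS) + 1:
        raise ValueError(f"expected {len(FIELDS) + 1} parts, got {len(parts)}")
    number, values = int(parts[0]), parts[1:]
    # The script stores each value with a trailing space, so strip it here.
    return number, {tag: value.strip() for tag, value in zip(FIELDS, values)}


# Hand-built example values in the shape the script writes.
values = [
    "31st IEEE International Conference on Tools with Artificial Intelligence (ICTAI)",
    "NOV 04-06, 2019 ",
    "Portland, OR ",
    "[Si, Dong; Xing, Ruiwen; Liu, Chang; Wu, Hoi Yan] Univ Washington, Comp & Software Syst, Bothell, WA 98011 USA. "
    "[Cheng, Sunny Chieh] Univ Washington, Nursing & Healthcare Leadership, Tacoma, WA USA.",
    "Computer Science, Artificial Intelligence; Computer Science, Theory & Methods",
]
line = "1##" + "##".join(values)

number, row = split_row(line)
print(number, row["CY"], row["CL"], sep=" | ")
```

Raising on a wrong part count is a deliberate choice here: a value that happens to contain `##` would otherwise shift every later column without any warning.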


Last edited by qixiaobin0715 on 2021-9-27 13:58

Reply to 1# WILSONMAO
The file is large, so please be patient:
@echo off &@cls&chcp>nul 65001
set var=CT CY CL WC C1
setlocal enabledelayedexpansion
rem findstr /b keeps only lines that begin with one of the tags, so continuation lines are skipped.
(echo,CT,CY,CL,WC,C1
for /f "tokens=1*" %%a in ('findstr /br "%var%" 2019') do (
    rem A new CT line closes the previous record, so write the stored values first.
    if "%%a"=="CT" if defined _CT (
        echo,"!_CT!","!_CY!","!_CL!","!_WC!","!_C1!"
        for %%i in (%var%) do set "_%%i="
    )
    set "_%%a=%%b"
)
rem Write the last record.
echo,"!_CT!","!_CY!","!_CL!","!_WC!","!_C1!"
)>test.csv
pause
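
One consequence of filtering with `findstr /b` is that continuation lines, which begin with spaces, never reach the loop, so a value that wraps onto a second line is cut at the line break. Below is a rough Python sketch of that filter (Python rather than batch, for ease of testing). `tagged_lines` is an illustrative name, and the function only approximates what `findstr /b` plus `tokens=1*` do.

```python
TAGS = ("CT", "CY", "CL", "WC", "C1")


def tagged_lines(lines, tags=TAGS):
    """Keep lines that begin with one of the tags and split off the first token."""
    for line in lines:
        if line.startswith(tags):  # case-sensitive, anchored at the line start
            tag, _, rest = line.partition(" ")
            yield tag, rest.lstrip()


sample = [
    "DT Proceedings Paper",
    "CT 31st IEEE International Conference on Tools with Artificial Intelligence",
    "   (ICTAI)",
    "CY NOV 04-06, 2019",
    "CL Portland, OR",
    "WC Computer Science, Artificial Intelligence; Computer Science, Theory &",
    "   Methods",
]

kept = list(tagged_lines(sample))
for tag, rest in kept:
    print(tag, "->", rest)
```

In this sample the `(ICTAI)` and `Methods` lines are filtered out, so the CT and WC values stop at the end of their first line.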


Last edited by qixiaobin0715 on 2021-9-27 12:18

Reply to 1# WILSONMAO
This version is more accurate and considerably faster:
@echo off &@cls&chcp>nul 65001
set var=CT CY CL WC C1
rem Filter the tagged lines into a temporary file first.
findstr /br "%var%" 2019>a.txt
setlocal enabledelayedexpansion
(echo,CT,CY,CL,WC,C1
for /f "tokens=1*" %%a in (a.txt) do (
    rem WC is the last of the five tags in each record, so it closes the row.
    if "%%a"=="WC" (
        echo,"!_CT!","!_CY!","!_CL!","%%b","!_C1!"
        for %%i in (%var%) do set "_%%i="
    )
    set "_%%a=%%b"
))>test.csv
del a.txt
pause
The resulting CSV file can be opened in Excel.
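
The `echo,"..."` lines wrap every value in double quotes but do not escape a double quote that occurs inside a value. Where that matters, the same close-the-row-at-WC grouping can be sketched in Python with the standard `csv` module, which doubles embedded quotes. This is an illustration under assumptions, not a replacement for the script: `rows_from_pairs` is a made-up name, and the second record in `pairs` is invented to show an embedded quote and missing fields.

```python
import csv
import io

COLUMNS = ("CT", "CY", "CL", "WC", "C1")


def rows_from_pairs(pairs, last_tag="WC"):
    """Group (tag, value) pairs into rows; a row is closed when last_tag arrives."""
    current = {}
    for tag, value in pairs:
        current[tag] = value
        if tag == last_tag:
            yield [current.get(column, "") for column in COLUMNS]
            current = {}


pairs = [
    ("CT", "31st IEEE International Conference on Tools with Artificial Intelligence"),
    ("CY", "NOV 04-06, 2019"),
    ("CL", "Portland, OR"),
    ("C1", "[Si, Dong; Xing, Ruiwen; Liu, Chang; Wu, Hoi Yan] Univ Washington, Comp & Software Syst, Bothell, WA 98011 USA."),
    ("WC", "Computer Science, Artificial Intelligence; Computer Science, Theory &"),
    # Invented record: embedded double quotes, and no CY, CL or C1.
    ("CT", 'Hypothetical "Quoted" Workshop'),
    ("WC", "Computer Science"),
]

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
writer.writerow(COLUMNS)
writer.writerows(rows_from_pairs(pairs))
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file instead of the in-memory buffer, open it with `newline=""`, as the `csv` module documentation recommends.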


Reply to 13# qixiaobin0715


    Many thanks!
