
[Text Processing] [Solved] Extracting the content between two specified lines of a text file with a batch script

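A folder contains many text files, and I want to extract parts of their content in bulk. Each file is fairly large, about 10 MB.
Requirements (the numbering only separates the two tasks; it does not imply an order):
First, extract everything from "UNITED STATES OF AMERICA (US)
PATENT (Number; Kind; Date): United States of America (US) " [inclusive] up to the next "PATENT (Number; Kind; Date): " [exclusive].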

Second, extract every line that contains any of the following:
BASIC-PATENT:
PATENT (Number; Kind; Date): European Patent Office (EP)
PATENT (Number; Kind; Date): United States of America (US)
PATENT (Number; Kind; Date): World Intellectual Property Organisation (WO)
PATENT (Number; Kind; Date): Canada (CA)
PATENT (Number; Kind; Date): People's Republic of China (CN)
PATENT (Number; Kind; Date): Japan (JP)
PATENT (Number; Kind; Date): Republic of Korea (KR)
PATENT (Number; Kind; Date): United Kingdom (GB)
PATENT (Number; Kind; Date): Germany (DE)
PATENT (Number; Kind; Date): France (FR)
PATENT (Number; Kind; Date): Russian Federation (RU)

Part of a sample file follows. (One file may contain around a hundred sections like this, separated by "BASIC-PATENT:".)

BASIC-PATENT:
European Patent Office (EP) 277,004; A1; August 03, 1988

PATENT FAMILY
Number of Patents: 276
TAIWAN (TW)

PATENT (Number; Kind; Date): Taiwan (TW) 464,511; B; November 21, 2001

TITLE: Pressure-sensitive adhesive composition suitable for use in a transdermal drug delivery system and preparation method therefor
INVENTOR: MIRANDA JESUS, United States of America (US); SABLOTSKY STEVEN, United States of America (US)
PRIORITY (Number; Kind; Date):
United States of America (US) 1994-178558; A; January 07, 1994
PATENT ASSIGNEE: NOVEN PHARMA, United States of America (US)
APPLICATION (Number; Kind; Date): Taiwan (TW) 19958410044; A; January 19, 1995
INT-CL: A61K9/00 (Section A, Class 61, Sub-class K, Group 9, Sub-group 00)

A61K31/74 (Section A, Class 61, Sub-class K, Group 31, Sub-group 74)


ABST:
A blend of at least three polymers, including a soluble polyvinylpyrrolidone, in combination with a drug provides a pressure-sensitive adhesive composition for a transdermal drug delivery system in which the drug is delivered from the pressure-sensitive adhesive composition and through dermis when the pressure-sensitive adhesive composition is in contact with human skin. Soluble polyvinylpyrrolidone increases the solubility of drug without negatively affecting the adhesivity of the composition or the rate of drug delivery from the pressure-sensitive adhesive composition.

UNITED STATES OF AMERICA (US)

PATENT (Number; Kind; Date): United States of America (US) 5,958,446; A; September 28, 1999

TITLE: SOLUBILITY PARAMETER BASED DRUG DELIVERY SYSTEM AND METHOD FOR ALTERING DRUG SATURATION CONCENTRATION
INVENTOR: MIRANDA JESUS, United States of America (US); SABLOTSKY STEVEN, United States of America (US)
PRIORITY (Number; Kind; Date):
United States of America (US) 1995-433754; A; May 04, 1995
United States of America (US) 1991-722342; A1; June 27, 1991
United States of America (US) 1989-295847; A2; January 11, 1989
United States of America (US) 1988-164482; A2; March 04, 1988
United States of America (US) 1991-671709; A2; April 02, 1991
World Intellectual Property Organisation (WO) 1990US9001750; W; March 28, 1990
PATENT ASSIGNEE: NOVEN PHARMA, United States of America (US)
APPLICATION (Number; Kind; Date): United States of America (US) 1995433754; A; May 04, 1995
INT-CL: A61F13/02 (Section A, Class 61, Sub-class F, Group 13, Sub-group 02)


NAT-CL: 424448; X426449
EURO-CL: A61F13/02M; A61K9/70E; A61L15/18; A61L15/58; A61L15/58M+C08L33/00; A61L15/58M+C08L31/04
DERWENT NUMBER: C1989-106432; C1990-225696; C1991-230072; C1991-310376; C1993-036110; C1994-109332; C1995-044946; C1997-558092
CHEMICAL ABSTRACT NUMBER: 111(10)084137W; 114(04)030158X; 116(10)091389M; 118(16)154566F; 120(26)331144F; 128(15)184708C
ABST:
The method of adjusting the saturation concentration of a drug in a transdermal composition for application to the dermis, which comprises mixing polymers having differing solubility parameters, so as to modulate the delivery of the drug. This results in the ability to achieve a predetermined permeation rate of the drug into and through the dermis. In one embodiment, a dermal composition of the present invention comprises a drug, an acrylate polymer, and a polysiloxane. The dermal compositions can be produced by a variety of methods known in the preparation of drug-containing adhesive preparations, including the mixing of the polymers, drug, and additional ingredients in solution, followed by removal of the processing solvents. The method and composition of this invention permit selectable loading of the drug into the dermal formulation and adjustment of the delivery rate of the drug from the composition through the dermis, while maintaining acceptable shear, tack, and peel adhesive properties.

PATENT (Number; Kind; Date): United States of America (US) 5,300,291; A; April 05, 1994


Thank you!

[ Last edited by lxh623 on 2009-4-13 21:39 ]

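  Forum software can re-encode posted text and may mangle the original data. I suggest posting only a representative excerpt in the body and uploading part of the raw data as an attachment, so that the data format is preserved and the thread stays compact.

  Because the files are large, pure batch will probably struggle. sed is likely the best tool for this job, but I have not yet read its documentation closely, so I cannot help there. Calling on anyone who knows sed.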

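Originally posted by namejm on 2009-4-12 09:35:
  Forum software can re-encode posted text and may mangle the original data. I suggest posting only a representative excerpt in the body and uploading part of the raw data as an attachment, so that the data format is preserved and the thread stays compact.

  Because the files are large, pure batch will probably ...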


Thank you!
I have edited my post and uploaded some of the txt files.


I have since solved this by other means. Thanks to everyone here!

[ Last edited by lxh623 on 2009-4-13 21:40 ]


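Reply to post #3

Did you use the method you got from the CN-DOS forum? Could you post the final code here for discussion? Someone may be able to come up with something better ^_^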

Batch version:
@echo off&setlocal enabledelayedexpansion

set ho=UNITED STATES OF AMERICA (US)
set en=PATENT (Number; Kind; Date):
set bg=United States of America (US)
set li2=BASIC-PATENT:


set li10=European Patent Office (EP)
set li11=Russian Federation (RU)
set li12=World Intellectual Property Organisation (WO)
set li13=Canada (CA)
set li14=People's Republic of China (CN)
set li15=Japan (JP)
set li16=Republic of Korea (KR)
set li17=United Kingdom (GB)
set li18=Germany (DE)
set li19=France (FR)
set li20=United States of America (US)
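:: Country matching compares only the first 10 characters, which should be enough.
:: Caveat: an unlisted country that shares those 10 characters with a listed one would also match.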

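rem "delims=" and the quoted paths keep file names and lines that contain spaces intact.
rem Note: on a second run, dir also lists the *_dest.txt files written by the first run.
for /f "delims=" %%a in ('dir /b *.txt') do (
set "ver="
set "vho="
(for /f "usebackq delims=" %%d in ("%%a") do (set "str=%%d"&call :sub))>"%%~na_dest.txt"

start "" "%%~na_dest.txt"

)
echo Done.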
pause
goto :eof

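:sub
rem Inside a US block, copy every line until the next PATENT line.
rem That PATENT line closes the block and is then handled like any other line.
if defined ver (
if "!str:%en%=!"=="!str!" (echo.!str!&goto :eof)
set "ver="
)
rem Always keep the BASIC-PATENT: marker line.
if not "!str:%li2%=!"=="!str!" echo !str!&goto :eof

rem A line without the PATENT prefix only matters as the US section header.
if "!str:%en%=!"=="!str!" (
if "!str!"=="!ho!" (set vho=y) else (set vho=)
goto :eof
)
rem A PATENT line: coc is the text after the prefix, starting with the country name.
set "coc=!str:*%en% =!"
if defined vho (
set "vho="
if "!bg:~0,10!"=="!coc:~0,10!" (set ver=y&echo !ho!&echo.!str!&goto :eof)
)
rem Otherwise print the line only if it names one of the listed countries.
for /l %%a in (10,1,20) do (if "!li%%a:~0,10!"=="!coc:~0,10!" echo !str!&goto :eof)
goto :eof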
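
The batch script above is essentially a small line-by-line state machine. As a rough sketch of that logic (not the poster's code), the JavaScript below mirrors it so that it can be run and checked outside cmd.exe, for example with Node.js. The names extractLines, startsWith, WANTED and sample are made up for illustration. It assumes the marker lines carry no trailing spaces, and it compares full country names instead of only the first 10 characters.

```javascript
// Marker strings come from the thread; all function and variable names are hypothetical.
var HEADER = "UNITED STATES OF AMERICA (US)";
var PATENT = "PATENT (Number; Kind; Date): ";
var US = "United States of America (US)";
var WANTED = [
  "European Patent Office (EP)", US,
  "World Intellectual Property Organisation (WO)", "Canada (CA)",
  "People's Republic of China (CN)", "Japan (JP)", "Republic of Korea (KR)",
  "United Kingdom (GB)", "Germany (DE)", "France (FR)", "Russian Federation (RU)"
];

// True when s begins with prefix (works in old and new JavaScript engines).
function startsWith(s, prefix) {
  return s.lastIndexOf(prefix, 0) === 0;
}

function extractLines(lines) {
  var out = [];
  var sawHeader = false; // the previous non-empty line was the US section header
  var inBlock = false;   // currently copying a US patent block
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i];
    if (line === "") continue; // the batch "for /f" loop skips empty lines as well
    var isPatent = startsWith(line, PATENT);
    // Inside a block, copy everything up to (not including) the next PATENT line.
    if (inBlock && !isPatent) { out.push(line); continue; }
    inBlock = false;
    // Always keep the BASIC-PATENT: marker.
    if (line.indexOf("BASIC-PATENT:") !== -1) { out.push(line); sawHeader = false; continue; }
    // A non-PATENT line only matters as a possible section header.
    if (!isPatent) { sawHeader = (line === HEADER); continue; }
    var country = line.slice(PATENT.length);
    // Header followed by a US PATENT line opens a block.
    if (sawHeader && startsWith(country, US)) {
      out.push(HEADER);
      out.push(line);
      inBlock = true;
      sawHeader = false;
      continue;
    }
    sawHeader = false;
    // Otherwise keep the line only for listed countries.
    for (var k = 0; k < WANTED.length; k++) {
      if (startsWith(country, WANTED[k])) { out.push(line); break; }
    }
  }
  return out;
}

// A shortened excerpt of the sample data posted in the thread.
var sample = [
  "BASIC-PATENT:",
  "European Patent Office (EP) 277,004; A1; August 03, 1988",
  "TAIWAN (TW)",
  "",
  "PATENT (Number; Kind; Date): Taiwan (TW) 464,511; B; November 21, 2001",
  "UNITED STATES OF AMERICA (US)",
  "",
  "PATENT (Number; Kind; Date): United States of America (US) 5,958,446; A; September 28, 1999",
  "INVENTOR: MIRANDA JESUS, United States of America (US); SABLOTSKY STEVEN, United States of America (US)",
  "PATENT (Number; Kind; Date): United States of America (US) 5,300,291; A; April 05, 1994"
];
console.log(extractLines(sample).join("\n"));
```

On this excerpt the sketch prints five lines: the BASIC-PATENT: marker, the US header, the first US patent line with its INVENTOR line, and the second US patent line. The Taiwan line is dropped because Taiwan is not in the list.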


JavaScript (JScript, run with Windows Script Host) version:

// The input file path is passed as the first command-line argument.
var File_Path=WScript.arguments(0);
var sss,arr,osss="";
var fso=new ActiveXObject("scripting.filesystemobject");
// Read the whole file into memory (1 = open for reading).
var fl=fso.opentextfile(File_Path,1);sss=fl.readall();fl.close();
// Create or overwrite the output file (2 = open for writing, true = create); "_转换后" means "converted".
fl=fso.opentextfile(File_Path+"_转换后.txt",2,true);
// Alternatives, in order: a BASIC-PATENT: marker; a US block from its header up to (not including) the next PATENT line; a single PATENT line for a listed country.
var re=/(?:^|\r\n) ?BASIC-PATENT:|\r\nUNITED STATES OF AMERICA \(US\)\s*PATENT \(Number; Kind; Date\): United States of America \(US\)[\s\S]*?(?=\r\nPATENT \(Number; Kind; Date\):)|PATENT \(Number; Kind; Date\): (?:European Patent Office \(EP\)|United States of America \(US\)|World Intellectual Property Organisation \(WO\)|Canada \(CA\)|People's Republic of China \(CN\)|Japan \(JP\)|Republic of Korea \(KR\)|United Kingdom \(GB\)|Germany \(DE\)|France \(FR\)|Russian Federation \(RU\)).*/g;
// Collect every match, one per output line.
while ((arr=re.exec(sss))!=null)osss=osss+arr[0]+"\r\n";
fl.write(osss);fl.close();
WScript.echo("ok")
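
A long hand-written alternation like the one above is easy to get subtly wrong. As a possible variation on the same idea (a sketch, not the poster's code), the pattern can be assembled from a plain list of country names and tried on an in-memory string, for example under Node.js. The names escapeRegExp, buildPattern and extractWithRegex are made up for illustration. The sketch accepts both \r\n and \n line endings, lets a US block run to the end of the text when no further PATENT line follows, and drops the leading line-break match before BASIC-PATENT:.

```javascript
// Country names as listed in the first post; helper names are hypothetical.
var COUNTRIES = [
  "European Patent Office (EP)", "United States of America (US)",
  "World Intellectual Property Organisation (WO)", "Canada (CA)",
  "People's Republic of China (CN)", "Japan (JP)", "Republic of Korea (KR)",
  "United Kingdom (GB)", "Germany (DE)", "France (FR)", "Russian Federation (RU)"
];

// Escape regex metacharacters so that "(EP)" is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the pattern source: marker | whole US block | single listed PATENT line.
function buildPattern(countries) {
  var head = escapeRegExp("PATENT (Number; Kind; Date):");
  var names = [];
  for (var i = 0; i < countries.length; i++) {
    names.push(escapeRegExp(countries[i]));
  }
  // The lookahead stops the block before the next PATENT line (or at end of text)
  // without consuming that line, so it can still match as a single line afterwards.
  var usBlock = escapeRegExp("UNITED STATES OF AMERICA (US)") + "\\s*" + head + " " +
    escapeRegExp("United States of America (US)") +
    "[\\s\\S]*?(?=\\r?\\n" + head + "|$)";
  return "BASIC-PATENT:|" + usBlock + "|" + head + " (?:" + names.join("|") + ").*";
}

// Return every match in document order.
function extractWithRegex(text) {
  var re = new RegExp(buildPattern(COUNTRIES), "g");
  var found = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    found.push(m[0]);
  }
  return found;
}

// A shortened excerpt of the sample data posted in the thread.
var text = [
  "BASIC-PATENT:",
  "European Patent Office (EP) 277,004; A1; August 03, 1988",
  "PATENT (Number; Kind; Date): Taiwan (TW) 464,511; B; November 21, 2001",
  "UNITED STATES OF AMERICA (US)",
  "",
  "PATENT (Number; Kind; Date): United States of America (US) 5,958,446; A; September 28, 1999",
  "PATENT ASSIGNEE: NOVEN PHARMA, United States of America (US)",
  "PATENT (Number; Kind; Date): United States of America (US) 5,300,291; A; April 05, 1994"
].join("\r\n");
console.log(extractWithRegex(text).join("\r\n"));
```

Keeping the country list as data means that adding or removing a country is a one-line change, and every alternative is built the same way.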
