
[Repost] How the find command counts lines of text

If your English is shaky, Google Translate can help.

On Unix, you can use wc -l to count the number of lines in stdin. Windows doesn't come with wc, but there's a sneaky way to count the number of lines anyway:
    some-command-that-generates-output | find /c /v ""
It is a special quirk of the find command that the null string is treated as never matching. The /v flag reverses the sense of the test, so now it matches everything. And the /c flag returns the count.

It's pretty convoluted, but it does work.
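
To make the interplay of the three pieces concrete, here is a minimal C sketch of the counting logic described above. It is only an illustration, not the actual find.exe code: the names line_contains and count_lines are made up for this example, the invert parameter stands in for /v, and the returned total stands in for /c. The one behavior taken from the text is that an empty search string never matches.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the line (len bytes) contains needle.
   An empty needle never matches, mirroring the find quirk described above. */
static int line_contains(const char *line, size_t len, const char *needle)
{
    size_t n = strlen(needle);
    if (n == 0)
        return 0;                       /* "" matches nothing */
    for (size_t i = 0; i + n <= len; i++) {
        if (memcmp(line + i, needle, n) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical stand-in for "find /c" (invert == 0) and "find /c /v"
   (invert != 0): counts the lines of text that pass the test. */
size_t count_lines(const char *text, const char *needle, int invert)
{
    size_t count = 0;
    const char *p = text;
    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        int hit = line_contains(p, len, needle);
        if (invert ? !hit : hit)
            count++;
        p += len;                       /* move to the newline or the end */
        if (*p == '\n')
            p++;                        /* skip the line terminator */
    }
    return count;
}
```

With these definitions, count_lines(text, "", 1) counts every line, because no line contains the empty string and the inverted test therefore passes each time.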

(Remember, I provide the occasional tip on batch file programming as a public service to those forced to endure it, not as an endorsement of batch file programming.)

Now come da history: Why does the find command say that a null string matches nothing? Mathematically, the null string is a substring of every string, so it should be that if you search for the null string, it matches everything. The reason dates back to the original MS-DOS version of find.exe, which according to the comments appears to have been written in 1982. And back then, pretty much all of MS-DOS was written in assembly language. (If you look at your old MS-DOS floppies, you'll find that find.exe is under 7KB in size.) Here is the relevant code, though I've done some editing to get rid of distractions like DBCS support.
        mov     dx,st_length            ;length of the string arg.
        dec     dx                      ;adjust for later use
        mov     di, line_buffer
lop:
        inc     dx
        mov     si,offset st_buffer     ;pointer to beg. of string argument
comp_next_char:
        lodsb
        cmp     al,byte ptr [di]
        jnz     no_match
        dec     dx
        jz      a_matchk                ; no chars left: a match!
        call    next_char               ; updates di
        jc      no_match                ; end of line reached
        jmp     comp_next_char          ; loop if chars left in arg.
If you're rusty on your 8086 assembly language, here's how it goes in pseudocode:
    int dx = st_length - 1;
    char *di = line_buffer;
lop:
    dx++;
    char *si = st_buffer;
comp_next_char:
    char al = *si++;
    if (al != *di) goto no_match;
    if (--dx == 0) goto a_matchk;
    if (!next_char(&di)) goto no_match;
    goto comp_next_char;
In sort-of-C, the code looks like this:
    int l = st_length - 1;
    char *line = line_buffer;
    l++;
    char *string = st_buffer;
    while (*string++ == *line && --l && next_char(&line)) {}
The weird - 1 followed by l++ is an artifact of code that I deleted, which needed the decremented value. If you prefer, you can look at the code this way:
    int l = st_length;
    char *line = line_buffer;
    char *string = st_buffer;
    while (*string++ == *line && --l && next_char(&line)) {}
Notice that if the string length is zero, there is an integer underflow, and we end up reading off the end of the buffers. The comparison loop does stop, because we eventually hit bytes that don't match. (No virtual memory here, so there is no page fault when you run off the end of a buffer; you just keep going and reading from other parts of your data segment.)

In other words, due to an integer underflow bug, a string of length zero was treated as if it were a string of length 65536, which doesn't match anywhere in the file.
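
The arithmetic behind that 65536 can be checked with a small C sketch. This is an illustration under stated assumptions, not the original code: the function name comparisons_until_match is invented, a uint16_t stands in for the 16-bit dx register, and every compared byte is assumed to match (the worst case). In the real loop a mismatch or end of line would bail out much earlier.

```c
#include <stdint.h>

/* Hypothetical model of the dx counter in the comparison loop.
   Returns how many byte comparisons happen before the counter reaches
   zero and a match is declared, assuming no mismatch ever occurs. */
unsigned long comparisons_until_match(uint16_t st_length)
{
    uint16_t dx = st_length;            /* 16-bit counter, like the dx register */
    unsigned long comparisons = 0;
    do {
        comparisons++;                  /* models: cmp al, byte ptr [di] */
        dx--;                           /* models: dec dx (0 wraps to 0xFFFF) */
    } while (dx != 0);                  /* models: jz a_matchk */
    return comparisons;
}
```

For a length of 3 the counter reaches zero after 3 comparisons, as intended. For a length of 0 the first decrement wraps to 0xFFFF, so the counter only returns to zero after 65536 comparisons.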

This bug couldn't be fixed, because by the time you got around to trying, there were already people who discovered this behavior and wrote batch files that relied on it. The bug became a feature.

The integer underflow was fixed, but the code is careful to treat null strings as never matching, in order to preserve existing behavior.

Exercise: Why is the loop label called lop instead of loop?

Source: http://blogs.msdn.com/b/oldnewthing/archive/2011/08/25/10200026.aspx
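The code I help write is free of charge. If you insist on paying, please send it to everyone in the WeChat or QQ group instead.
[WeChat official account, WeChat group, QQ group] http://bbs.bathome.net/thread-3473-1-1.html
[Support 批处理之家 and become a VIP member!] http://bbs.bathome.net/thread-67716-1-1.html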

This reads like hieroglyphics to me!


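Reply to 1# Batcher

Is Batcher Chinese? That's impressive. What level is your English?
I only finished junior high school. How can I get my English to your level?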


A quick question, please. In
Now come da history: Why does the find....

what does "da" mean? (Is it "the"?)


This is advanced material from MSDN. My English isn't good enough, so I can't really follow it.


Reply to 5# explorer093

You can use Google Translate to help work out the meaning.

