
[Repost] docx2html.vbs

Posted on 2013-3-31 05:10:05
Code source: http://svn.greenstone.org/main/t ... s/bin/docx2html.vbs

Usage:
CScript //Nologo docx2html.vbs c:\test.docx c:\test.html
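The script writes its error messages to STDERR, which is why it is meant to be launched through CScript //Nologo. Here is a minimal sketch of how another VBScript could launch the conversion and collect that error output. The wrapper and its names (QuoteArg, BuildCommand, RunConversion) are hypothetical and not part of the original post; it only assumes that docx2html.vbs can be found from the current directory.

```vbscript
' Hypothetical wrapper (not part of the original post): run docx2html.vbs
' under CScript and collect whatever it writes to STDERR.

' Wrap a path in double quotes so that spaces survive the command line.
Function QuoteArg( strPath )
    QuoteArg = Chr(34) & strPath & Chr(34)
End Function

' Build the CScript command line for one conversion.
Function BuildCommand( strScript, strIn, strOut )
    BuildCommand = "CScript //Nologo " & QuoteArg( strScript ) & " " & _
                   QuoteArg( strIn ) & " " & QuoteArg( strOut )
End Function

' Run the conversion and return the text it wrote to STDERR.
' An empty string means the script reported no error.
Function RunConversion( strScript, strIn, strOut )
    Dim objShell, objExec
    Set objShell = CreateObject( "WScript.Shell" )
    Set objExec = objShell.Exec( BuildCommand( strScript, strIn, strOut ) )
    ' ReadAll blocks until the child process closes its STDERR stream
    RunConversion = objExec.StdErr.ReadAll
End Function
```

A caller could then check the result, for example: If RunConversion( "docx2html.vbs", "C:\test.docx", "C:\test.html" ) <> "" Then the conversion reported a problem. Only STDERR is captured here; anything CScript itself prints to STDOUT is ignored.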
Option Explicit

' http://www.robvanderwoude.com/vbstech_automation_word.php
' http://www.nilpo.com/2008/06/windows-scripting/reading-word-documents-in-wsh/ - for grabbing just the text (cleaned of Word mark-up) from a doc(x)
' http://msdn.microsoft.com/en-us/library/3ca8tfek%28v=VS.85%29.aspx - VBScript Functions (CreateObject etc)
' http://msdn.microsoft.com/en-us/library/aa220734%28v=office.11%29.aspx - SaveAs Method. Expand "WdSaveFormat" section to see all the default filetypes Office 2003+ can save as

' Error Handling:
' http://blogs.msdn.com/b/ericlippert/archive/2004/08/19/error-handling-in-vbscript-part-one.aspx
' http://msdn.microsoft.com/en-us/library/53f3k80h%28v=VS.85%29.aspx


' To Do:
' +1. error output on bad input to this file. And commit.
' +1b. Active X error msg when trying to convert normal *.doc: only when windows scripting is on and Word not installed.
' +1c. Make docx accepted by default as well. Changed WordPlugin.
' 2. Try converting from other office types (xlsx, pptx) to html. They may use other constants for conversion filetypes
' 3. gsConvert.pl's any_to_txt can be implemented for docx by getting all the text contents. Use a separate subroutine for this. Or use wdFormatUnicodeText as outputformat.
' 4. Try out this script on Windows 7 to see whether WSH is active by default, as it is on XP and Vista.
' 5. What kind of error occurs if any when user tries to convert docx on a machine with an old version of Word (pre-docx/pre-Word 2007)?
' 6. Ask Dr Bainbridge whether this script can or shouldn't replace word2html, since this can launch all versions of word (not just 2007) I think.
' Unless some commands have changed? Including for other Office apps, in which case word2html would remain the correct program to use for those cases.


' gsConvert.pl expects error output to go to the console's STDERR
' for which we need to launch this vbs with "CScript //Nologo" (cannot use WScript if using StdErr
' and //Nologo is needed to suppress the Microsoft logo text output, which messes up error reporting)
' http://www.devguru.com/technologies/wsh/quickref/wscript_StdErr.html
Dim objStdErr, args
Set objStdErr = WScript.StdErr

args = WScript.Arguments.Count
If args < 2 Then
    'WScript.Echo Usage: args.vbs argument [input docx path] [output html path]
    objStdErr.Write ("ERROR. Usage: CScript //Nologo " & WScript.ScriptName & " [input office doc path] [output html path]" & vbCrLf)
    WScript.Quit
End If

' Now run the conversion subroutine
Doc2HTML WScript.Arguments.Item(0), WScript.Arguments.Item(1)
    ' In terminal, run as: > CScript //Nologo docx2html.vbs C:\fullpath\to\input.docx C:\fullpath\to\output.html
    ' so that error output goes to the console (instead of creating a popup) and to avoid 2 lines of MS logo.
    ' Running it as plain > docx2html.vbs C:\fullpath\to\input.docx C:\fullpath\to\output.html uses the default host, which is normally WScript.
    ' The script uses the WScript.StdErr object to make error output go to stderr of the CScript console (can't launch with WScript).
    ' http://www.devguru.com/technologies/wsh/quickref/wscript_StdErr.html


Sub Doc2HTML( inFile, outHTML )
' This subroutine opens a Word document,
' then saves it as HTML, and closes Word.
' If the HTML file exists, it is overwritten.
' If Word was already active, the subroutine
' will leave the other document(s) alone and
' close only its "own" document.
'
' Written by Rob van der Woude
' http://www.robvanderwoude.com
    ' Standard housekeeping
    Dim objDoc, objFile, objFSO, objWord, strFile

    Const wdFormatDocument                    =  0
    Const wdFormatDocument97                  =  0
    Const wdFormatDocumentDefault             = 16
    Const wdFormatDOSText                     =  4
    Const wdFormatDOSTextLineBreaks           =  5
    Const wdFormatEncodedText                 =  7
    Const wdFormatFilteredHTML                = 10
    Const wdFormatFlatXML                     = 19
    Const wdFormatFlatXMLMacroEnabled         = 20
    Const wdFormatFlatXMLTemplate             = 21
    Const wdFormatFlatXMLTemplateMacroEnabled = 22
    Const wdFormatHTML                        =  8
    Const wdFormatPDF                         = 17
    Const wdFormatRTF                         =  6
    Const wdFormatTemplate                    =  1
    Const wdFormatTemplate97                  =  1
    Const wdFormatText                        =  2
    Const wdFormatTextLineBreaks              =  3
    Const wdFormatUnicodeText                 =  7
    Const wdFormatWebArchive                  =  9
    Const wdFormatXML                         = 11
    Const wdFormatXMLDocument                 = 12
    Const wdFormatXMLDocumentMacroEnabled     = 13
    Const wdFormatXMLTemplate                 = 14
    Const wdFormatXMLTemplateMacroEnabled     = 15
    Const wdFormatXPS                         = 18

    ' Create a File System object
    Set objFSO = CreateObject( "Scripting.FileSystemObject" )

    ' Create a Word object. Exit with error msg if not possible (such as when Word is not installed)
    On Error Resume Next
    Set objWord = CreateObject( "Word.Application" )
    If Err.Number <> 0 Then     ' Typically 429, the error code for "ActiveX component can't create object"
                                ' http://msdn.microsoft.com/en-us/library/xe43cc8d%28v=VS.85%29.aspx
        'WScript.Echo "Microsoft Word cannot be found -- document conversion cannot take place. Error #" & CStr(Err.Number) & ": " & Err.Description & "." & vbCrLf
        objStdErr.Write ("ERROR: Windows-scripting failed. Document conversion cannot take place:" & vbCrLf)
        objStdErr.Write ("   Microsoft Word cannot be found or cannot be launched. (Error #" & CStr(Err.Number) & ": " & Err.Description & "). " & vbCrLf)
        objStdErr.Write ("   For converting the latest Office documents, install OpenOffice and Greenstone's OpenOffice extension. (Turn it on and turn off windows-scripting.)" & vbCrLf)
        Exit Sub
    End If

    With objWord
        ' True: make Word visible; False: invisible
        .Visible = False

        ' Check if the Word document exists
        If objFSO.FileExists( inFile ) Then
            Set objFile = objFSO.GetFile( inFile )
            strFile = objFile.Path
        Else
            'WScript.Echo "FILE OPEN ERROR: The file does not exist" & vbCrLf
            objStdErr.Write ("ERROR: Windows-scripting failed. Cannot open " & inFile & ". The file does not exist." & vbCrLf)
            ' Close Word
            .Quit
            Exit Sub
        End If

        'outHTML = objFSO.BuildPath( objFile.ParentFolder, _
        '          objFSO.GetBaseName( objFile ) & ".html" )

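        ' Open the Word document.
        ' On Error Resume Next is still in effect here, so failures must be checked explicitly.
        Err.Clear
        .Documents.Open strFile
        If Err.Number <> 0 Then
            objStdErr.Write ("ERROR: Windows-scripting failed. Cannot open " & inFile & ". (Error #" & CStr(Err.Number) & ": " & Err.Description & ")." & vbCrLf)
            .Quit
            Exit Sub
        End If

        ' Make the opened file the active document
        Set objDoc = .ActiveDocument

        ' Save as HTML -- http://msdn.microsoft.com/en-us/library/aa220734%28v=office.11%29.aspx
        objDoc.SaveAs outHTML, wdFormatFilteredHTML
        If Err.Number <> 0 Then
            objStdErr.Write ("ERROR: Windows-scripting failed. Cannot save " & outHTML & ". (Error #" & CStr(Err.Number) & ": " & Err.Description & ")." & vbCrLf)
        End If

        ' Close the active document
        objDoc.Close

        ' Close Word
        .Quit
    End With
End Sub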
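To Do item 3 in the script suggests using wdFormatUnicodeText as the output format to get plain text instead of HTML. One way to try that without a second subroutine is to pick the save format from the output file's extension. This is only a sketch: SaveFormatFor is a hypothetical helper that does not exist in the original script, and the choice to map .txt to Unicode text and everything else to filtered HTML is an assumption.

```vbscript
' Hypothetical helper (not part of the original script): choose a Word
' WdSaveFormat value from the extension of the output path.
Function SaveFormatFor( outPath )
    Const wdFormatUnicodeText  =  7
    Const wdFormatFilteredHTML = 10
    Dim objFSO
    Set objFSO = CreateObject( "Scripting.FileSystemObject" )

    ' GetExtensionName returns the extension without the dot, e.g. "txt"
    Select Case LCase( objFSO.GetExtensionName( outPath ) )
        Case "txt"
            SaveFormatFor = wdFormatUnicodeText
        Case Else
            ' Assumption: anything else keeps the script's current behaviour
            SaveFormatFor = wdFormatFilteredHTML
    End Select
End Function
```

Inside Doc2HTML the save call would then read: objDoc.SaveAs outHTML, SaveFormatFor( outHTML ). The rest of the subroutine stays as it is.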
Posted on 2013-3-31 10:13:28
Thanks to tmplinshi for collecting and sharing this.