Extract text from PDF, DOC, HTML, CHM, and RTF files

Posted on January 19, 2008 at 7:31 am

Do you have a document in PDF format that you would like to convert to a text document? Or maybe an HTML or CHM (Windows Help) file that you need to convert into plain text? Why might this be useful, you ask? Most PDF documents are not editable, and selecting the text manually can be a tedious process.

You can use Text Mining Tool to automatically extract text from a PDF file so that you can use it freely in any program. Or, if you cannot open a PDF file because you do not have a PDF viewer installed, you can use this tool to extract the text and read the document.

Text Mining Tool is completely free and does not even require installation: simply unzip it and run the program.


Click the Open button and choose the file that you want to convert to text. Click OK, and the large window below the buttons will eventually fill with all of the text extracted from the document.


Click Save to save the extracted text to your computer. You can also click Clipboard to copy the mined text to the Windows clipboard.

For convenience, the following hotkeys can be used to perform the operations:

  • Open - F3 or O
  • Save - F2 or S
  • Clipboard - F5 or C
  • Exit - F10 or Escape

You can also use the minetext console tool in a batch script to extract text from multiple files. This is useful if you have a directory with a large number of files that need their text extracted.

The included console tool minetext has the following syntax:

 minetext <input file>
 minetext <input file> <output file>

 where:
   <input file>  - any file with one of the following extensions:
                   pdf, doc, rtf, chm, htm, html
   <output file> - file you want to write text mined from input file
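
As a rough sketch of the batch idea, the Python script below builds one minetext call per supported file in a directory, using the two-argument form from the syntax above. The function names, the `.txt` output naming scheme, and the assumption that minetext signals failure with a non-zero exit code are all illustrative and not taken from the tool's documentation. It also assumes minetext is on your PATH (otherwise pass its full path as `exe`).

```python
import subprocess
from pathlib import Path

# Extensions listed in the minetext usage text.
SUPPORTED = {".pdf", ".doc", ".rtf", ".chm", ".htm", ".html"}


def build_commands(directory, exe="minetext"):
    """Return one [exe, input, output] command per supported file."""
    commands = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED:
            # Hypothetical naming scheme: report.pdf -> report.pdf.txt
            output = path.with_name(path.name + ".txt")
            commands.append([exe, str(path), str(output)])
    return commands


def run_all(directory, exe="minetext"):
    """Run each command in turn; return the input files that failed.

    Assumption: a non-zero exit code means extraction failed.
    """
    failed = []
    for command in build_commands(directory, exe):
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(command[1])
    return failed
```

Calling `run_all(r"C:\docs")` would then process every supported file in that folder. Keeping the command building separate from the execution lets you print the list from `build_commands` first and check it before anything is written.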

If you’re a web designer, this program can be very useful for grabbing the text from a Word document without all of the extra Microsoft Office styling code that usually comes with it.

This is a very simple program that is just as easy to use! It has one basic purpose, and it does that job well. Enjoy!


» Filed Under Free Software Downloads
