Extract Text From Corrupt Office 2010 Documents

If you have corrupt Office 2010 documents (docx, xlsx, or pptx), try extracting their text before you delete them. The content can often still be recovered because these Office formats are actually zipped collections of XML files, so the text may remain readable even when Office itself refuses to open the document.
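To illustrate the "zipped collection of XML files" idea, here is a minimal Python sketch that pulls the text out of a docx file. It is not how Corrupt Office 2007 Extractor works internally; the function name extract_docx_text and the file name in the usage comment are assumptions made for this example. It strips tags with a regular expression rather than an XML parser, so a truncated document.xml can still yield text. It does assume the zip container itself can be opened; if the archive's central directory is damaged, zipfile raises BadZipFile and a dedicated recovery tool is needed.

```python
import re
import zipfile
from xml.sax.saxutils import unescape


def extract_docx_text(source):
    """Return the plain text stored in word/document.xml of a .docx file.

    `source` may be a file path or a binary file-like object.
    (Hypothetical helper written for this illustration.)
    """
    with zipfile.ZipFile(source) as archive:
        raw = archive.read("word/document.xml").decode("utf-8", errors="replace")

    # Turn each paragraph end (</w:p>) into a newline before removing tags.
    raw = raw.replace("</w:p>", "\n")

    # Remove every remaining tag with a regex instead of an XML parser,
    # so malformed or truncated XML does not stop the extraction.
    text = re.sub(r"<[^>]+>", "", raw)

    # Decode the basic XML entities (&amp;, &lt;, &gt;, &quot;, &apos;).
    return unescape(text, {"&quot;": '"', "&apos;": "'"}).strip()


# Usage (the file name is only an example):
# print(extract_docx_text("report.docx"))
```

The regex approach trades accuracy for tolerance: it ignores formatting, tables, and headers, but it keeps going where a strict parser would give up.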

Corrupt Office 2007 Extractor is a command-line tool that extracts content from docx, xlsx, and pptx documents. Office 2007 and Office 2010 share these formats, so the tool has no difficulty with Office 2010 files.

The tool has only two switches, -t and -x. The former extracts the text from a docx document and also converts an xlsx spreadsheet to CSV format, while the latter extracts the XML files from a docx document.

Note: When you use the -t switch, the extracted text is also displayed in the command-line window.
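The spreadsheet-to-CSV conversion can be sketched in the same spirit. The Python example below is an illustration only, not the tool's own implementation: the function name xlsx_sheet_to_csv and the default part name xl/worksheets/sheet1.xml are assumptions, and it needs both the zip container and the XML parts to be well-formed. It also simplifies heavily: empty cells that the sheet omits are not padded, and inline strings and formulas are not handled specially.

```python
import csv
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML parts inside an .xlsx file.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def xlsx_sheet_to_csv(source, sheet_part="xl/worksheets/sheet1.xml"):
    """Return one worksheet of an .xlsx file as CSV text.

    Hypothetical helper for illustration; `sheet_part` is an assumed default.
    """
    with zipfile.ZipFile(source) as archive:
        shared = []
        # Text cells usually hold an index into the shared string table.
        if "xl/sharedStrings.xml" in archive.namelist():
            table = ET.fromstring(archive.read("xl/sharedStrings.xml"))
            for item in table.findall("m:si", NS):
                parts = item.iter("{%s}t" % NS["m"])
                shared.append("".join(p.text or "" for p in parts))
        sheet = ET.fromstring(archive.read(sheet_part))

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in sheet.iterfind(".//m:row", NS):
        cells = []
        for cell in row.findall("m:c", NS):
            value_node = cell.find("m:v", NS)
            value = value_node.text if value_node is not None and value_node.text else ""
            # t="s" marks a shared-string cell; resolve the index to its text.
            if cell.get("t") == "s" and value:
                value = shared[int(value)]
            cells.append(value)
        writer.writerow(cells)
    return out.getvalue()


# Usage (the file name is only an example):
# print(xlsx_sheet_to_csv("budget.xlsx"))
```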

To begin, unpack the tool and move the corrupt documents into the same folder. The output, whether it is the extracted text or the original XML files, is saved in that same folder alongside the tool and the corrupt documents. The text is saved in RTF format.

If the XML file in the output is too large to work with, you can split it using OOXP Splitter.

