Preventing and Fixing Corruption in Word Documents

Microsoft Word is the best word-processor in existence, for most people.

The only other product that comes close is Adobe FrameMaker. But FrameMaker is expensive, and requires serious effort and study to learn. Even an experienced FrameMaker pilot will be nowhere near as quick as an experienced Word user on long and complex documents (although arguably, the FrameMaker pilot can create reliable longer and more complex documents, if they know what they are doing!).

That said, Microsoft Word is commercial end-user software built down to a price: it has limits! Some of these limits are published; most you will find out about the hard way. The 64-bit versions of Word 2016 and 2019 will exceed 5,000 pages in a single document, given excellent technique and a computer of adequate power to handle a file that big.


How Much is Enough?

To paraphrase Cecil B. DeMille, “Nothing succeeds like excess!” Word will happily go to 5,000 pages, but not on a laptop! The further you go beyond 200 pages, the more important good working technique becomes.

I currently run Word on an Apple iMac Pro: a 10-core Xeon with 64 GB of RAM and 4 TB of SSD storage. An equivalent PC would be the Dell Precision Workstation AIO 5720.

For normal office documents, you don’t need anywhere near that much: Word will “run” in 4 GB of RAM, it will run very well in 8 GB, and 16 GB is a nice number if you’re handling more complex stuff.

But if you can afford more, have at it! A computer that is struggling to cope will be treacle-in-winter slow: it will make mistakes, and so will you while waiting for it. Eventually you will find yourself reading this article with a profound sense of regret!

Some pointers for the knowledgeable:

  • Word is not especially CPU-intensive, but a high CPU clock speed produces a much more responsive computer when you’re working in it.

  • More cores won’t help: Word is largely a single-threaded application, even today (Excel will benefit from extra cores, but Word not so much). I bought the 10-core machine because above that, Intel has to slow the chip down to prevent overheating. It’s the clock speed that works the magic in Word.

  • RAM is well ahead of cleanliness, right up there with Godliness as a virtue. I have rarely seen Word gobble more than 100 MB of RAM; but everything else that’s running wants some. If your PC can keep everything resident in memory while you’re working, switching between programs will be instant. As soon as it starts having to swap applications out to disk, it will get maddeningly slow, and the errors and corruptions will start.

  • It all adds up. Every web page will gobble 40 megs, and then there’s your music program (of course you don’t run music while you work, but I do…), various social media apps (hey, we’re human, right?), and the favourite graphics app that likes a huge feed (Adobe Photoshop will take over your world…).

  • SSD storage is the best speed improvement you can make. Word is a chatty beastie to and from the disk. Better these days, but it still makes a huge number of disk transactions. If you have a spinning hard disk or a slow network, you’ll do a lot of waiting. It’s not the number of “pages” in a document that slows Word down, it’s the file size and complexity. Solid state storage makes a huge difference as Word’s file size climbs.

  • A hot graphics card is a waste of money (and electricity — unless you also play computer games). As far as I know, Word never touches the graphics card, for anything. Come to think of it, Word never displays anything: it calls the operating system to draw whatever appears on screen: the OS makes calls to the graphics subsystem on Word’s behalf, but the cheapest graphics card you can find is more than enough, for non-gaming use.


Avoiding Corruption

If you do the following as a matter of routine, breakages will be much less common:

Use only the modern .docx format, and save older .doc files to .docx. There’s no excuse for using .doc these days, and it’s almost guaranteed to break, especially if you convert between the formats.
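If you have years of files lying around, a short script can round up the stragglers that are still in the old format. This is only a minimal sketch in Python: the function name find_legacy_docs is made up for illustration, and it goes purely by the file extension. It lists the files and nothing more; the conversion itself is still a job for Word’s Save As.

```python
from pathlib import Path


def find_legacy_docs(root):
    """Return every legacy .doc file under root, sorted by path.

    Matches on the file extension only (case-insensitive). It does not
    look inside the files, and it skips .docx/.docm files as well as
    the "~$" lock files Word leaves next to open documents.
    """
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and p.suffix.lower() == ".doc"
        and not p.name.startswith("~$")
    )
```

Point it at whichever folder holds your documents, print the result, and work through the list in Word.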

Do not use the Master Documents feature. One of these days, I will get around to writing a feature on how to use Master Documents safely. But it would be a huge article, and you need to do every part of it precisely and correctly or the result is practically instant death. Since Word will now cope up to and beyond 5,000 pages in a single file, the need has pretty much disappeared.

Save every time you stop to think! Make it a reflex action to hit Ctrl + S (Cmd + S on a Mac) every time you pause. No amount of rescuing will get back what you have not saved.

Turn Automatic Backup on:

Automatic-backup-option.png

Backup Copy: is a little “hidden”. Go to File > Options > Advanced, then scroll down to the Save segment. Backup Copy saves a complete copy of the document as it was immediately before Save. So, if you press Ctrl + S (Cmd + S on a Mac), your current version is renamed to be the backup file, and the latest version saved as the original. If you want the backup back, it’s in the same folder as the original (in a subfolder, for Mac Word). Just open it, re-save it as a .docx, and away you go again.

GOT-UR-BACK: is a free add-in from Great Circle Learning and provides a more robust backup copy solution than Office’s built-in function mentioned above. It will keep up to five previously saved versions of your file and it works with Word, Excel and PowerPoint. It runs on a Mac and a PC.
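The idea of keeping several earlier versions is easy to sketch in a script, if you would rather roll your own. What follows is a rough Python illustration only, not how Word or GOT-UR-BACK actually work: the function name rotate_backup and the “.bak1” to “.bak5” naming scheme are assumptions invented for this example.

```python
import shutil
from pathlib import Path


def rotate_backup(doc_path, keep=5):
    """Copy doc_path to a numbered backup, keeping at most `keep` versions.

    Backups sit next to the original as "<name>.bak1<ext>" (newest)
    through "<name>.bak<keep><ext>" (oldest). The naming scheme is an
    assumption made for this sketch.
    """
    doc = Path(doc_path)

    def backup(n):
        return doc.with_name(f"{doc.stem}.bak{n}{doc.suffix}")

    # Drop the oldest version, then shift each remaining backup up by one.
    backup(keep).unlink(missing_ok=True)
    for n in range(keep - 1, 0, -1):
        if backup(n).exists():
            backup(n).rename(backup(n + 1))

    # copy2 also preserves the file's timestamps.
    shutil.copy2(doc, backup(1))
    return backup(1)
```

Run it on the closed document before each editing session; the newest copy is always “.bak1”, and anything older than the fifth version falls off the end.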

AutoRecover (File > Options > Save > “Save AutoRecover information every 10 minutes”) usually does nothing for you. It should be ON, for those rare occasions when it’s useful. But it can get the file back ONLY if Word knows that it has crashed, AND it can still read the original file. AutoRecover saves only the changes: if Word can’t read the original, or if you manually stopped Word, there’s nothing to apply the changes to, and the document is lost.

Always run the latest version of Microsoft Office, but it may be worth waiting a week or so on updates, since quick follow-ups to fix newly introduced bugs in large software packages are becoming all the more common. Do NOT put “Office Insider” or “Targeted” software on a machine you use in production (unless you have a spare!). It’s beta software, trying out new and sometimes untested features; there is no guarantee of performance or stability.

Never use Track Changes. Instead, rely on Compare Documents to mark the changes after the fact.

Compare.png

A “few” tracked changes will make little difference: but when you get lots of changes in a document, and changes upon changes within changes, the code in the document becomes unbelievably complex. It also means your numbering will be “wrong” until you resolve all the changes, because numbered paragraphs that are marked for deletion have not yet actually been deleted. To display the document, Word has to work out what’s in and what’s out, on the fly at very high speed. One mistake, and “boom” you lose it.

Don’t apply direct formatting (bold, italic, font changes, etc.). Instead, define named character and paragraph styles and rely entirely on them. This also makes your document MUCH easier to work with, especially for other people. A handy tool to help you manage styles is the free app AuthorTec Quick Styles from Great Circle Learning. It runs on a Mac and a PC.

Never use drag-and-drop for editing; rely on cut and paste instead. I have trouble avoiding drag-and-drop editing, since it can be extremely convenient, but it’s a killer, especially around tables or lists.

Avoid merged cells (rows or columns) in tables. Tables are a little fragile, especially in long documents: don’t push your luck!


OK, It’s Broken

Once a document begins behaving weirdly, NOW is the time to fix it. The corruption will only get worse, and soon (maybe next save…) the thing won’t open at all.

You do have a backup, right?

The only thing we can say for certain about a computer is “Eventually, it WILL fail!” And when it does, you will lose everything it contains.

You can buy a perfectly adequate backup drive that will last for your entire University course for less than 200 bucks. “I can’t afford it” begins to sound a bit lame if you have to repeat a year because you didn’t have time to re-type your assignment after some low-life jumped through the window and stole your laptop!

Resolve Tracked Changes

It’s pretty much a waste of time trying to repair a document that has tracked changes in it. Chances are, the changes, or some of them, are the source of the bother in the first place.

A mistake many users make is turning off the “display” of tracked changes. They go to the Review tab, in the Tracking chunk, and turn “All Markup” to “No Markup”. Then they carry on working, thinking they have turned change tracking off. They haven’t: changes are still being tracked, you just can’t see them.

In a month or two of daily editing, the inside of the document looks a bit like this:

bandaides.png

It takes only ONE of those stickies to let go, and your whole document comes tumbling down.

One of the first signs of this condition is that all attempts to fix your numbering fail: whatever you do, the numbering won’t go “right”. That’s because the deleted paragraphs are not removed while changes are being tracked; they are simply marked as “to be” deleted.
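If you are not sure whether a document is quietly carrying tracked changes, you can look inside the file itself. A .docx is a ZIP package, and tracked insertions and deletions are stored in word/document.xml as w:ins and w:del elements. Here is a minimal Python sketch under those assumptions; the function name tracked_change_summary is invented for this example, and it ignores other revision types such as moves and formatting changes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside .docx files.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def tracked_change_summary(docx_path):
    """Count tracked insertions/deletions and report whether tracking is on."""
    with zipfile.ZipFile(docx_path) as z:
        body = ET.fromstring(z.read("word/document.xml"))
        try:
            settings = ET.fromstring(z.read("word/settings.xml"))
        except KeyError:  # the settings part is missing from the package
            settings = None

    tracking_on = False
    if settings is not None:
        flag = settings.find(W + "trackRevisions")
        # The element switches tracking on unless it is explicitly set off.
        if flag is not None:
            tracking_on = flag.get(W + "val", "true") not in ("0", "false", "off")

    return {
        "insertions": sum(1 for _ in body.iter(W + "ins")),
        "deletions": sum(1 for _ in body.iter(W + "del")),
        "tracking_on": tracking_on,
    }
```

Non-zero counts mean there are still revisions in the file, whatever the screen is showing you. Run it on a copy, with the document closed in Word.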

The fix is very simple: On the Review tab, drop down the disclosure triangle under the Accept button in the Changes chunk, and choose Accept All Changes and Stop Tracking:

Accept-tracked-changes.png

Now, save and close the document (Word does not actually remove deleted material from the file until the document is closed, in case you want to “Undo” something). Chances are quite high that whatever was wrong with the document has now been fixed.

No, you “can’t” leave tracked changes in place and fix a document; if you try, you will probably find the document doesn’t get fixed. What you “can” do is keep a copy of the bad document, and use Compare Documents to re-insert the changes to the fixed one. To do this, you need to REJECT all changes in the broken document (because you Accepted them all in the fixed version). The Comparison result will then show you all the differences. (It may also bring back the problem, so make a copy first…)

Doing a Maggie

The next rescue to try is a technique called “doing a Maggie” (named after Margaret Secara from the TECHWR-L mailing list, who first publicized the technique invented by Woody Leonhard in his book “Word 97 Annoyances”). Follow these steps (carefully!):

  1. Create a new, empty document in the .docx format.

  2. In your corrupted document, display the paragraph marks (¶); there’s a button you can click in the Paragraph chunk of the Home tab to do so.

  3. Click at the very beginning of the corrupted document to set the insertion point there,

  4. Scroll to the end of the document, just before the last paragraph mark in the document,

  5. Hold down the Shift key, and click again. Hundreds of document attributes are stored below that last paragraph mark, so it’s usually the place where corruption is stored.

  6. Copy the selected text,

  7. Switch to your new document,

  8. Paste the text, and

  9. Save the file with a new name.

If you do this carefully, chances are you will get a perfect recovery of your document, without the corrupted bit. Do a quick eye-ball to make sure no text is missing (Word simply discards any code it can’t understand: there may be the odd paragraph missing).

If you copied that last paragraph mark, you likely copied the corruption. If so, start again. Turn your paragraph marks on, so you can see what you are doing.

Save to a Simpler Format

If a Maggie doesn’t work, the next thing to try is saving out to a simpler file format. This works by forcing Word to re-interpret all the code in the document, to express it in a different format.

  • Microsoft suggests RTF as the format to use. I generally find that RTF is “too good” for this purpose: the RTF format accurately describes nearly all of what can be in a document, including the corruption; so it stays in there. Worth a try, especially for fixing numbering, but don’t hope for too much.

  • I recommend “Web Page” format. This simplifies the internals of the document, discarding things like comments and tracked changes, and in doing so, often removes the corruption.  Numbering may be converted to typed characters, depending. You keep most of your formatting.

  • Note: Don’t use “Web Page (Filtered)”: that strips the document back to basic HTML, and you lose most formatting.

Having saved to a different format, close and then re-open the new version (to clear the old version from memory).

If the corruption is gone, re-save the new version as .docx with a new file name. Don’t save over the original: during the conversion you may have lost some of the text, and you may need to go back to the original.

Binary Search

If you are dealing with a long document, make a backup copy, and then try copying just the first half of the corrupted document out to a new document.

  1. Strip the section breaks before you do this: many corruptions occur inside section breaks.

  2. If that new document seems fine, copy half of what remains in the corrupted document, and keep copying halves until you isolate the problem. (If the new document shows the problem, the corruption is in the half you just copied: split that half instead.)

  3. Be very suspicious of text in or immediately adjacent to tables, lists, or graphics. Be aware that a document may have more than one corruption in it (if it has ever been a master document, it may have many…).

  4. Once you have narrowed down the corruption to just a paragraph or so, or to a single page, begin again with a fresh blank document.

  5. First copy the known good text to the new document, then either type the bad text in again, or copy it and paste it first into a text editor (Notepad), then copy from Notepad and paste into the new document.

Remember: If you copy bad text, you will copy the corruption: start again! Don’t trust the first document you used in your binary search: corruption can be very small, like a virus: a single bad space copied over can corrupt the whole document.
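For the programmers in the audience, the halving procedure above is an ordinary binary search. The Python sketch below is just an illustration of the logic: find_bad_chunk and looks_corrupt are hypothetical names, and in real life the “test” is you, pasting a chunk into a fresh document and seeing whether it misbehaves. It assumes there is at least one bad chunk, and that a bad chunk misbehaves on its own.

```python
def find_bad_chunk(chunks, looks_corrupt):
    """Return the index of the first chunk that triggers the corruption.

    chunks: the document split into pieces (paragraphs, pages, ...).
    looks_corrupt: callable taking a list of chunks and returning True
        if a document built from just those chunks misbehaves.
    """
    lo, hi = 0, len(chunks)      # the culprit is somewhere in chunks[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if looks_corrupt(chunks[lo:mid]):
            hi = mid             # the problem is in the first half
        else:
            lo = mid             # first half is clean, so look in the second
    return lo
```

Each round halves the suspect text, so even 1,000 paragraphs need only about ten tests. If the document has more than one corruption, remove the chunk you found and run the search again.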

Recover Text From Any File

This is your absolute last resort. It’s the only thing you can try if the file won’t even open.

Recover Text recovers just the text of the file: all tables, graphics, formatting, numbering etc. are abandoned. The file opens as a plain text file. Bits may be missing if they cannot be read.

  1. Go to the File menu,

  2. Choose Open

  3. Choose Browse

  4. Change “All Word Documents” to “Recover Text from Any File (*.*)”.

  5. Select the corrupted file

  6. Click OK

Enjoy! You will get a plain text version of any recognisable text in the file, no matter what it is or what’s wrong with it. All formatting, tables, images, numbering, comments etc. will be lost.

Save it as a new .docx file, then re-format it from scratch.
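If Word itself is not to hand, you can sometimes pull the text out of a .docx with a few lines of script, because the file is a ZIP package with the body text in word/document.xml. This is a rough Python sketch, not what Word’s Recover Text feature does internally, and it only works while the ZIP container and that XML part are still readable. The function name extract_docx_text is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside .docx files.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(docx_path):
    """Return the body text of a .docx, one line per paragraph.

    Only w:t text runs are collected, so text that was deleted with
    tracked changes (stored as w:delText) is left out. Headers,
    footers, footnotes and comments live in other parts of the package
    and are not read. Text inside text boxes may appear twice, because
    those paragraphs are nested inside another paragraph.
    """
    with zipfile.ZipFile(docx_path) as z:
        root = ET.fromstring(z.read("word/document.xml"))

    lines = []
    for para in root.iter(W + "p"):
        lines.append("".join(t.text or "" for t in para.iter(W + "t")))
    return "\n".join(lines)
```

Write the result to a .txt file, open that in Word, and carry on as above. As always, work on a copy of the damaged file.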


This blog post has been graciously provided by our Contributor at Large:

John McGhie.

Among other fine accomplishments, including his work as a Microsoft MVP (Word, Mac Word), John is a Consultant Technical Writer at McGhie Information Engineering Pty Ltd in Sydney, Australia.