
Why one should only put PDFs and not Word docs online: yet another Microsoft gotcha   25 Sep 04

(Source: coredump.cx) This is not an exciting story: I happened to be browsing aimlessly through case studies and other publications released by Microsoft as part of their "Get the facts" initiative. At one point, I stumbled upon a Word file I wanted to read - and as soon as I ran it through wvWare, I noticed a good deal of amusing change tracking information still recorded within the document. Naturally, publishing documents with "collaboration" data is not unheard of in the corporate world, but the fact that Microsoft had become a victim of their own technology, and had failed to run their own tools against these publications, makes it more entertaining.

A pointless idea came to my mind that instant: why not run a gentle web spider against all Microsoft sites in English, specifically looking for other instances of tracking data not removed from documents? I coded a bunch of scripts and let them run through the night, fetching approximately 10,000 unique documents; over 10% were identified as containing change tracking records. I decided to collect only those with deleted text still present, yielding a crop of over 5% of all documents. Quite impressive. Below, you will find a brief (and rest assured, incomplete) list of the most entertaining samples I’ve run into, along with some speculation (and only speculation) as to the reasons we see them.
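To make the "deleted text still present" check concrete, here is a minimal sketch of the same idea. It is not the author's scripts: those ran wvWare over legacy binary .doc files, whereas this sketch assumes the newer .docx format, where tracked deletions are stored as `w:delText` elements inside `word/document.xml`. The function name `deleted_text` is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx files (assumption: the input is
# a .docx, not the legacy .doc format discussed in the post).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def deleted_text(docx):
    """Return text fragments that are still recorded as tracked deletions.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    # Tracked deletions keep their removed text in <w:delText> elements.
    return [
        node.text
        for node in root.iter(f"{{{W_NS}}}delText")
        if node.text
    ]
```

An empty list means no deleted text was left behind in the main document body; headers, footers, and comments live in other parts of the archive and are not inspected here.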
