Why one should only put PDFs, not Word docs, online: yet another Microsoft gotcha
25 Sep 04
(Source: coredump.cx) This is not
an exciting story: I happened to be browsing aimlessly through case studies
and other publications released by Microsoft as part of their "Get
the facts" initiative. At one point, I stumbled upon a Word file I
wanted to read, and as soon as I ran it through wvWare, I noticed there was
a good deal of amusing change tracking information still recorded within
the document. Naturally, publishing documents with
"collaboration" data is not unheard of in the corporate world,
but the fact that Microsoft had become a victim of their own technology, and had
failed to run their own tools against these publications, makes it more
entertaining.
A pointless idea came to mind that instant: why not run a gentle web
spider against all Microsoft sites in English, specifically looking for
other instances of tracking data not removed from documents? I coded a
bunch of scripts and let them run through the night, fetching approximately
10,000 unique documents; over 10% were identified as containing change
tracking records. I decided to collect only those with deleted text still
present, yielding a crop of over 5% of all documents. Quite impressive.
Below, you will find a brief (and rest assured, incomplete) list of the
most entertaining samples I’ve run into, along with some speculation
(and only speculation) as to the reasons we see them.
link The tool used
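For anyone who wants to run a similar check on their own documents before publishing, here is a minimal sketch in Python. It is not the set of scripts used for the survey described above, which worked on binary Word files through wvWare. Instead it assumes the newer .docx format, where tracked insertions and deletions are stored as `w:ins` and `w:del` elements in `word/document.xml` and deleted text sits in `w:delText`. The function names `tracked_changes`, `survey`, and `sample_docx` are made up for illustration, and `sample_docx` only builds a stripped-down test archive, not a file Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside .docx archives.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tracked_changes(docx_file):
    """Return (change_count, deleted_texts) for one .docx path or file object.

    Assumption: only word/document.xml is inspected; headers, footers,
    and comments are ignored in this sketch.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    changes = root.findall(f".//{{{W_NS}}}ins") + root.findall(f".//{{{W_NS}}}del")
    deleted = [node.text or "" for node in root.iter(f"{{{W_NS}}}delText")]
    return len(changes), deleted


def survey(documents):
    """Tally (total, with_tracking, with_deleted_text) over many documents."""
    total = with_tracking = with_deleted = 0
    for doc in documents:
        total += 1
        change_count, deleted = tracked_changes(doc)
        if change_count:
            with_tracking += 1
        if any(text.strip() for text in deleted):
            with_deleted += 1
    return total, with_tracking, with_deleted


def sample_docx(body_xml):
    """Hypothetical fixture: an in-memory archive holding only word/document.xml."""
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        f"{body_xml}"
        "</w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    edited = sample_docx(
        '<w:del w:id="1"><w:r><w:delText>old wording</w:delText></w:r></w:del>'
        '<w:ins w:id="2"><w:r><w:t>new wording</w:t></w:r></w:ins>'
    )
    clean = sample_docx("<w:r><w:t>plain text</w:t></w:r>")
    print(survey([edited, clean]))
```

The two counters in `survey` mirror the distinction made above between documents that contain any change tracking records and the smaller group where deleted text is still present.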