PHP and COM - Keeping It Simple
As I'm sure you've figured out by now, converting a Word document to
HTML with PHP isn't really as difficult as it sounds. I have no intention of
wasting my weekend writing complex search-replace algorithms to perform this
task. Instead, I'm going to make my life (and yours) a whole lot simpler by
having a Microsoft Word COM object (and its built-in methods) take care of it
for me.
<?php
// htmlviewer.php
// convert a Word doc to an HTML file
// read the document path from the query string (?DocumentPath=...)
if (!isset($_GET['DocumentPath'])) {
    die("No document specified");
}
$DocumentPath = $_GET['DocumentPath'];
// collapse doubled backslashes into single ones to get a valid Windows path
$DocumentPath = str_replace("\\\\", "\\", $DocumentPath);
// create an instance of the Word application
$word = new COM("word.application") or die("Unable to instantiate application object");
// open the specified document
$wordDocument = $word->Documents->Open($DocumentPath);
// create the filename for the HTML version
$HTMLPath = substr_replace($DocumentPath, 'html', -3, 3);
// save the document as HTML (8 is Word's code for the HTML format)
$wordDocument->SaveAs($HTMLPath, 8);
// clean up
$wordDocument = null;
$word->Quit();
$word = null;
// redirect the browser to the newly-created document
// (this assumes $HTMLPath is somewhere the web server can serve)
header("Location: " . $HTMLPath);
?>
As you can see, this is fairly simple, and quite similar to
the script I wrote a few pages back for Microsoft Excel. Again, the first step
is to use the COM extension to create an instance of the Microsoft Word
application object:
<?php
// create an instance of the Word application
$word = new COM("word.application") or die("Unable to instantiate application object");
?>
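In PHP 5 and later, the COM constructor reports a failure by throwing a com_exception rather than by returning false, so the or die() clause may never actually get a chance to run. On more recent PHP releases the COM class is also only available when the com_dotnet extension is enabled in php.ini. Here is a minimal sketch of a more defensive way to create the object; the function name create_word_app() is hypothetical, and returning null on failure is just one possible design.

```php
<?php
// Hypothetical helper: returns a Word.Application COM object, or null if
// COM is unavailable (com_dotnet not enabled, or not running on Windows)
// or Word could not be started.
function create_word_app(): ?COM
{
    if (!class_exists('COM')) {
        // The COM class only exists when the COM extension is loaded.
        return null;
    }
    try {
        return new COM("word.application");
    } catch (com_exception $e) {
        // Instantiation failures arrive as exceptions, not as a false return value.
        error_log("Unable to instantiate Word: " . $e->getMessage());
        return null;
    }
}
```

The calling script can then test the result for null and die() with the same "Unable to instantiate application object" message as before.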
Once that's done, the next step is to open up the specified
document in Word and use the object's SaveAs() method to save it as HTML.
<?php
// open the specified document
$wordDocument = $word->Documents->Open($DocumentPath);
// create the filename for the HTML version
$HTMLPath = substr_replace($DocumentPath, 'html', -3, 3);
// save the document as HTML
$wordDocument->SaveAs($HTMLPath, 8);
?>
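The substr_replace() call simply swaps the last three characters of the path for "html", so it assumes a three-letter extension: report.doc becomes report.html, but a report.docx would come out as report.dhtml. Below is a minimal sketch of a more general alternative that replaces whatever extension the file has; the function name html_path_for() is hypothetical.

```php
<?php
// Hypothetical helper: derive the HTML output path from a Word document path
// by replacing its extension (doc, docx, rtf, ...) with "html".
// Handles both Windows (\) and Unix (/) separators.
function html_path_for(string $docPath): string
{
    // Strip a trailing ".ext" only if that dot belongs to the file name,
    // i.e. no further dot, backslash or slash follows it.
    $base = preg_replace('/\.[^.\\\\\/]*$/', '', $docPath);
    return $base . '.html';
}
```

With this in place, the line in the script becomes $HTMLPath = html_path_for($DocumentPath);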
Note the second argument passed to the SaveAs() method, the
integer 8. This is a numeric code which tells Word to save the document as
HTML. Feel free to experiment with this number to create other file formats;
the Web page at
http://msdn.microsoft.com/library/en-us/modcore/html/deovrWorkingWithMicrosoftWordObjects.asp
has more information on the API for this object.
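Rather than hard-coding the number, you could give the codes names. The values below are taken from Word's WdSaveFormat enumeration as I understand it (PDF output is only available in newer Word versions), so treat this as a sketch and double-check them against the documentation for your copy of Word; the constant array WD_SAVE_FORMATS and the function format_code() are hypothetical.

```php
<?php
// A few WdSaveFormat values from Word's object model, keyed by their
// Word constant names. Only a subset; check Microsoft's documentation.
const WD_SAVE_FORMATS = [
    'wdFormatDocument'     => 0,  // native Word document
    'wdFormatText'         => 2,  // plain text
    'wdFormatRTF'          => 6,  // Rich Text Format
    'wdFormatHTML'         => 8,  // HTML, as used in this script
    'wdFormatFilteredHTML' => 10, // HTML without Office-specific markup
    'wdFormatPDF'          => 17, // PDF (newer Word versions only)
];

// Hypothetical helper: look up a format code by name, failing loudly on typos.
function format_code(string $name): int
{
    if (!array_key_exists($name, WD_SAVE_FORMATS)) {
        throw new InvalidArgumentException("Unknown Word save format: $name");
    }
    return WD_SAVE_FORMATS[$name];
}
```

The save call would then read $wordDocument->SaveAs($HTMLPath, format_code('wdFormatHTML'));, which documents itself.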
Once that's done, all
that's left is to clean up and redirect the Web browser to the specified HTML
file via a call to header().
<?php
// clean up
$wordDocument = null;
$word->Quit();
$word = null;
// redirect the browser to the newly-created document
header("Location: " . $HTMLPath);
?>
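One thing to watch for: header() sends the browser to whatever string you give it, and a Windows file path such as C:\Inetpub\wwwroot\docs\report.html is not a URL the browser can fetch. The redirect only works if the HTML file lands somewhere under the web server's document root and the path is translated into the matching URL first. Here is a minimal sketch of that translation; the function path_to_url() and the document root and base URL used with it are hypothetical.

```php
<?php
// Hypothetical helper: map a file-system path inside the web server's
// document root to a URL the browser can request.
// Returns null if the file lies outside the document root.
function path_to_url(string $path, string $docRoot, string $baseUrl = '/'): ?string
{
    // Normalize Windows backslashes to forward slashes.
    $path = str_replace('\\', '/', $path);
    $root = rtrim(str_replace('\\', '/', $docRoot), '/') . '/';

    // Windows paths are case-insensitive, so compare the prefix that way.
    if (strncasecmp($path, $root, strlen($root)) !== 0) {
        return null;
    }

    // Percent-encode each path segment (spaces etc.) but keep the slashes.
    $relative = substr($path, strlen($root));
    $segments = array_map('rawurlencode', explode('/', $relative));
    return rtrim($baseUrl, '/') . '/' . implode('/', $segments);
}
```

The redirect would then become something like header("Location: " . path_to_url($HTMLPath, 'C:\\Inetpub\\wwwroot'));, with a check for a null result first.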
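Note also the call to str_replace() at the top of the script.
It is needed to produce a valid Windows file path: when PHP's magic_quotes_gpc
setting is enabled (common on older PHP versions; the feature was removed in
PHP 5.4), every backslash in a GET variable arrives doubled, and str_replace()
turns each pair back into a single backslash.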
<?php
$DocumentPath = str_replace("\\\\", "\\", $DocumentPath);
?>
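To see why str_replace() is used here, it helps to compare it with stripslashes(), the usual way of undoing magic quotes. In the sketch below, built around a hypothetical function named undo_doubled_backslashes(), str_replace() leaves an already-clean path alone, whereas stripslashes() mangles it when magic quotes are off. One caveat: a UNC path such as \\server\share would also lose one of its two leading backslashes.

```php
<?php
// Hypothetical helper wrapping the article's str_replace() call:
// collapse every pair of backslashes into a single backslash.
function undo_doubled_backslashes(string $path): string
{
    return str_replace('\\\\', '\\', $path);
}

// A path as magic_quotes_gpc would have delivered it (each backslash doubled).
$quoted = 'C:\\\\My Documents\\\\report.doc';
// The same path as it arrives when magic quotes are off.
$clean = 'C:\\My Documents\\report.doc';

echo undo_doubled_backslashes($quoted), "\n"; // C:\My Documents\report.doc
echo undo_doubled_backslashes($clean), "\n";  // C:\My Documents\report.doc
echo stripslashes($clean), "\n";              // C:My Documentsreport.doc
```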