You've already seen how PHP can be used to interface with Java components and JavaBeans. But here's something you didn't know - PHP can (shock shock! horror horror!) even be used to interface with Microsoft COM objects on the Windows platform. Will this be a happy marriage? Read on to find out.
As I'm sure you've figured out by now, converting a Word document to HTML with PHP isn't really as difficult as it sounds. I have no intention of wasting my weekend writing complex search-replace algorithms to perform this task. Instead, I'm going to make my life (and yours) a whole lot simpler by having a Microsoft Word COM object (and its built-in methods) take care of it for me.
<?php
// htmlviewer.php
// convert a Word doc to an HTML file
$DocumentPath = str_replace("\\\\", "\\", $DocumentPath);
// create an instance of the Word application
$word = new COM("word.application") or die("Unable to instantiate application object");
// create an instance of the Word document object
$wordDocument = new COM("word.document") or die("Unable to instantiate document object");
// open the specified document in Word
$wordDocument = $word->Documents->Open($DocumentPath);
// create the filename for the HTML version
$HTMLPath = substr_replace($DocumentPath, 'html', -3, 3);
// save the document as HTML
$wordDocument->SaveAs($HTMLPath, 8);
// clean up
$wordDocument = null;
$word->Quit();
$word = null;
// redirect the browser to the newly-created document
header("Location: " . $HTMLPath);
?>
As you can see, this is fairly simple, and quite similar to the script I wrote a few pages back for Microsoft Excel. Again, the first step is to use the COM extension to create an instance of the Microsoft Word application object, followed by an instance of the Word document object.
<?php
// create an instance of the Word application
$word = new COM("word.application") or die("Unable to instantiate application object");
?>
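On more recent PHP versions, a failed call to new COM() throws an exception instead of returning false, so the or die() idiom may never get a chance to fire. Here's a minimal sketch of a more defensive approach. The helper name createWordApplication() is my own invention for illustration and is not part of the script above, and the whole thing assumes a Windows machine with PHP's COM extension loaded.

```php
<?php
// Hypothetical helper: returns a Word application COM object, or null on failure.
function createWordApplication()
{
    // The COM class only exists when PHP's COM extension is available (Windows only).
    if (!class_exists('COM')) {
        return null;
    }
    try {
        return new COM("word.application");
    } catch (Exception $e) {
        // Instantiation failed (for example, Word is not installed).
        return null;
    }
}

$word = createWordApplication();
if ($word === null) {
    echo "Unable to instantiate application object";
}
```

Returning null keeps the decision about what to do next (die, log, show a friendly error page) with the calling script.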
Once that's done, the next step is to open up the specified document in Word and use the object's SaveAs() method to save it as HTML.
<?php
// open the specified document in Word
$wordDocument = $word->Documents->Open($DocumentPath);
// create the filename for the HTML version
$HTMLPath = substr_replace($DocumentPath, 'html', -3, 3);
// save the document as HTML
$wordDocument->SaveAs($HTMLPath, 8);
?>
Note the second argument passed to the SaveAs() method, the integer 8. This is the numeric code (the wdFormatHTML constant in Word's object model) that tells Word to save the document as HTML. Feel free to experiment with this number and create different file formats - the Web page at http://msdn.microsoft.com/library/en-us/modcore/html/deovrWorkingWithMicrosoftWordObjects.asp has more information on the API for this object.
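To make that experimentation a little easier, the numeric codes can be given names. Here's a minimal sketch, assuming a handful of the WdSaveFormat values from Word's object model. The constant names, the function saveFormatForExtension() and the extension-to-format mapping are illustrative choices of mine, not part of the original script.

```php
<?php
// A few WdSaveFormat codes from Word's object model.
define('WD_FORMAT_DOCUMENT', 0); // Word document
define('WD_FORMAT_TEXT', 2);     // plain text
define('WD_FORMAT_RTF', 6);      // Rich Text Format
define('WD_FORMAT_HTML', 8);     // HTML

// Hypothetical helper: pick a save format from the target file extension.
function saveFormatForExtension($extension)
{
    $formats = array(
        'doc'  => WD_FORMAT_DOCUMENT,
        'txt'  => WD_FORMAT_TEXT,
        'rtf'  => WD_FORMAT_RTF,
        'htm'  => WD_FORMAT_HTML,
        'html' => WD_FORMAT_HTML,
    );
    $key = strtolower($extension);
    // Return null for unknown extensions so the caller can decide what to do.
    return isset($formats[$key]) ? $formats[$key] : null;
}

echo saveFormatForExtension('html'); // prints 8
```

The result could then be handed to Word as the second argument, as in $wordDocument->SaveAs($HTMLPath, saveFormatForExtension('html')).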
Once that's done, all that's left is to clean up and redirect the Web browser to the specified HTML file via a call to header().
<?php
// clean up
$wordDocument = null;
$word->Quit();
$word = null;
// redirect the browser to the newly-created document
header("Location: " . $HTMLPath);
?>
Note also the call to str_replace() at the top of the script; this is needed in order to create a valid Windows file path, by collapsing the doubled backslashes that result when PHP escapes the path passed in the GET query string.
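One more thing: the substr_replace() call used to build the HTML filename assumes a three-letter extension such as doc. Here's a minimal sketch of a slightly more general approach. The function names normalizeWindowsPath() and htmlPathFor() are hypothetical, and the regular expression simply swaps whatever final extension happens to be present.

```php
<?php
// Hypothetical helper: collapse doubled backslashes into single ones.
function normalizeWindowsPath($path)
{
    return str_replace("\\\\", "\\", $path);
}

// Hypothetical helper: replace the final extension (of any length) with "html".
function htmlPathFor($documentPath)
{
    // Match a dot followed by characters that are not dots or path separators, at the end.
    return preg_replace('/\.[^.\\\\\/]+$/', '.html', $documentPath);
}

$path = normalizeWindowsPath('C:\\\\docs\\\\report.docx');
echo htmlPathFor($path); // prints C:\docs\report.html
```

A path with no extension at all passes through htmlPathFor() unchanged, so a real script would probably want to check for that case before calling SaveAs().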