Neil Day
Hi All :)

Would be grateful to hear from anyone who has successfully managed to convert Word/RTF files into HTML without using the 'Save as Web Page' option provided by MS Office.

Or maybe you have used that option but then successfully managed to use a piece of software to easily clean up the terrible HTML coding that it seems to produce?

Eagerly waiting ;)

Cheers

Neil D

George Chapin
Hi Neil,

If you don't have many to do, I would try opening the Word doc, selecting the entire document, and copying and pasting it into FrontPage.

FrontPage will keep the same formatting, but the HTML will be cleaner than if you converted to HTML using Word. That said, FrontPage hardly produces clean HTML either.

If you don't have FrontPage and would like me to convert some files for you, let me know.

Take care,

Neil Day
Cheers George

DUH!!! Don't know why I didn't think of that. Thanks for the info. I will go and have a go at that now. It has got to produce cleaner coding than MS Office so shouldn't take long to re-edit if needed.

Later

Neil

George Chapin
I have to admit I didn't try it so let me know how it goes!

Louis
Hi Neil

What I do is a bit of a tricky process, and it does lose some formatting from Word (so it's best not to apply too much formatting in the first place):

From the "File" menu in Word I choose "Save As Web Page" to save the document in HTML format. (This is in Word 2000.)

However, the HTML that Word saves the document as isn't clean AT ALL.

I then use the Dreamweaver utility "Clean Up Word HTML" from the "Commands" menu, which cleans up a lot of the excess HTML.

Please note however that I am using an older version of Dreamweaver - Version 4. So perhaps this function is even more powerful now.

I then sometimes go through the outputted HTML, and any messy, unnecessary code still in there I remove with a global "Search & Replace", replacing the text with nothing and so deleting it.
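
For anyone who would rather script that search-and-replace pass, here is a minimal Python sketch of the same idea. The patterns are only assumptions about the kind of markup Word typically leaves behind (conditional comments, `<o:p>` tags, `Mso...` classes, inline styles); they are not an exhaustive list and not what Dreamweaver itself does, and `strip_word_markup` is a made-up name for illustration.

```python
import re

# Illustrative patterns only: assumptions about typical Word-exported markup.
WORD_PATTERNS = [
    # Conditional comments such as <!--[if gte mso 9]> ... <![endif]-->
    re.compile(r"<!--\[if.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE),
    # Office namespace tags such as <o:p> and </o:p>
    re.compile(r"</?o:[^>]*>", re.IGNORECASE),
    # class attributes whose value starts with "Mso" (quoted or unquoted)
    re.compile(r"\s+class=(\"Mso[^\"]*\"|'Mso[^']*'|Mso\w*)", re.IGNORECASE),
    # Quoted inline style attributes
    re.compile(r"\s+style=(\"[^\"]*\"|'[^']*')", re.IGNORECASE),
]


def strip_word_markup(html: str) -> str:
    """Replace every match of every pattern with nothing, i.e. delete it."""
    for pattern in WORD_PATTERNS:
        html = pattern.sub("", html)
    return html


if __name__ == "__main__":
    sample = '<p class=MsoNormal style="margin:0">Hi <b>there</b><o:p></o:p></p>'
    print(strip_word_markup(sample))
```

Regular expressions are a blunt tool for HTML, so it is worth checking the result in a browser before relying on it.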

What I then have left is very bland HTML, but all the text is there and the original bold, underline, italic, etc. formatting should remain.

I then simply apply my usual formatting to the document: choosing the font style, adding tables, etc.

It's a bit of a convoluted process but it works well. This is in fact what I'm going to do for my RREM March product - I've been writing it in Word applying little formatting, and then I'll go through this process to convert it to HTML so that the brandable PDF can be created.

I have also spent time looking at download.com for other utilities for cleaning "Word HTML" but haven't yet found anything close to the functionality Dreamweaver offers.

Sincerely,
Louis

Neil Day
Hi Louis

Thanks for your input.

Eventually I went with what George suggested, copying and pasting the original RTF document into FrontPage.

It still gives a bit of excess HTML coding, but nowhere near as much as Office does when you save as a web page. All I did then was use the search and replace facility within Notepad to get rid of the unwanted coding.

This appears to have worked OK but I wouldn't want to do it with a lot of pages :eek2:
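
With a lot of pages, the Notepad step could probably be scripted instead. The sketch below is only an illustration: the strings in `UNWANTED` are hypothetical examples rather than what FrontPage actually produced here, the function names are made up, and the `utf-8` encoding is an assumption (older pages may well be saved as `cp1252`).

```python
from pathlib import Path

# Hypothetical leftovers; substitute whatever unwanted strings your pages contain.
UNWANTED = ['<font face="Times New Roman">', "</font>"]


def remove_unwanted(text: str, unwanted=UNWANTED) -> str:
    """Plain search and replace: each unwanted string is replaced with nothing."""
    for fragment in unwanted:
        text = text.replace(fragment, "")
    return text


def clean_folder(folder: str, pattern: str = "*.htm*", encoding: str = "utf-8") -> int:
    """Clean every matching file in the folder in place; return how many changed."""
    changed = 0
    for path in Path(folder).glob(pattern):
        original = path.read_text(encoding=encoding)
        cleaned = remove_unwanted(original)
        if cleaned != original:
            path.write_text(cleaned, encoding=encoding)
            changed += 1
    return changed
```

Because the files are overwritten in place, it is safest to run this on a copy of the folder first.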

Cheers

Neil

Louis
Hi Neil

If you're not too worried about losing the formatting during the conversion from RTF to HTML, would saving the document from Word as "Text Only" and then opening it in FrontPage or Dreamweaver perhaps work for you?

You'll then have all the text and paragraphs laid out, and will just need to add bold, italic, font style, etc. to your liking.
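
If you would rather skip the editor for that first step, a rough Python sketch of turning the text file into bare paragraphs might look like this. It assumes paragraphs are separated by blank lines; if your export puts each paragraph on a single line with no blank line in between, split on single newlines instead. `text_to_html` is just an illustrative name.

```python
import html
import re


def text_to_html(text: str) -> str:
    """Wrap each blank-line-separated paragraph of plain text in <p> tags."""
    # Normalise Windows line endings, then split on blank lines.
    text = text.replace("\r\n", "\n")
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Join wrapped lines within a paragraph and escape &, < and >.
        line = " ".join(part.strip() for part in block.splitlines())
        if line:
            paragraphs.append(f"<p>{html.escape(line, quote=False)}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    print(text_to_html("First line\r\nwrapped\r\n\r\nTom & Jerry"))
```

The output is deliberately bland, paragraphs only, so bold, italic and fonts can be added afterwards in whichever editor you prefer.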

Sincerely,
Louis

George Chapin
If you are using the HTML in the PDF creator, Louis's last suggestion is the way to go.

g

Copyright © InfoProfitsTalk.com. All Rights Reserved.