1. .doc file to .txt ?

Does anyone know how Eu can transform a .doc file (like 
http://www.nwo.usace.army.mil/html/od-tl/pn/200380625t.doc) into a plain 
ASCII .txt file? Will different .doc versions require different decoders, and
will the version be identified in the .doc file itself? My Win95b box won't read
some .doc files that WinXP will read, but I prefer them as .txt anyhow, for
various reasons.

Kat


2. Re: .doc file to .txt ?

Forgive me if this is a very stupid way to do it.  I know it's not clean or
pretty, but it will leave you with plain text.  Some of it will literally be
garbage, but at least you can open it in Notepad, delete the garbage, and be
left with the body of the document, wrappable and readable.

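atom char
integer fn
integer of
sequence buffer

-- open the Word file in binary mode so bytes are read untranslated
fn = open("c:\\my download files\\200380625t.doc","rb")
if fn = -1 then
    puts(1, "Couldn't open the .doc file\n")
    abort(1)
end if
of = open("c:\\luxor options\\doc.txt","w")
if of = -1 then
    puts(1, "Couldn't create the .txt file\n")
    abort(1)
end if
buffer = {}

char = getc(fn)
while char > -1 do   -- -1 is EOF
    if char > 31 and char < 127 then
        -- printable ASCII: keep it
        buffer = append(buffer,char)
    elsif char = 10 or char = 13 then
        -- keep line-break bytes, and add a newline after each CR
        buffer = append(buffer,char)
        if char = 13 then
            buffer = append(buffer,'\n')
        end if
    end if
    -- all other bytes (binary structure, formatting) are dropped
    char = getc(fn)
end while

puts(of,buffer)
close(fn)
close(of)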

I know you probably will find this really dumb, but I didn't want you to think
nobody was looking :)

Ward


3. Re: .doc file to .txt ?

Ward Turner wrote:
> 
> Forgive me if this is a very stupid way to do it.  I know it's not clean and
> pretty, but it will leave you with plain text, some of which will literally be
> garbage, but at least you can open it in notepad and just delete the stuff
> that's garbage, leaving the body of the document wrappable and readable.
> 
> I know you probably will find this really dumb, but I didn't want you to think
> nobody was looking :)
> 
> Ward

Hi Ward,

I tried this and it worked pretty well.  I'm sure Kat can tweak your code to get
the line breaks ("returns") right, something like ignoring the 10s (LF) and
substituting '\n' for the 13s (CR).  But basically it does the trick.  Nice of
you to take the time amid all the sound and fury of Linux vs. M$, etc.

Now, if you could just figure a way to extract the maps too..:^D

--Quark
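Quark's suggested tweak (drop the 10s, turn each 13 into '\n') could be sketched in Euphoria roughly as below. This is a minimal sketch, not Ward's code: the names doc_bytes_to_text and read_bytes and the file names are hypothetical, and it assumes Word marks paragraph ends with CR (13) in the text stream.

```euphoria
-- Sketch of the suggested tweak: keep printable ASCII (32..126),
-- turn each CR (13) into '\n', and drop LF (10) and every other byte.
-- doc_bytes_to_text and read_bytes are hypothetical helper names.

function doc_bytes_to_text(sequence bytes)
    sequence text
    integer c
    text = {}
    for i = 1 to length(bytes) do
        c = bytes[i]
        if c >= 32 and c <= 126 then
            text = append(text, c)        -- printable ASCII: keep
        elsif c = 13 then
            text = append(text, '\n')     -- CR (assumed paragraph end)
        end if                            -- anything else is skipped
    end for
    return text
end function

-- Read a whole file into a sequence of byte values,
-- or return -1 if it can't be opened.
function read_bytes(sequence path)
    integer fn, c
    sequence data
    fn = open(path, "rb")
    if fn = -1 then
        return -1
    end if
    data = {}
    c = getc(fn)
    while c != -1 do
        data = append(data, c)
        c = getc(fn)
    end while
    close(fn)
    return data
end function

object raw
integer out

raw = read_bytes("200380625t.doc")       -- hypothetical input file
if atom(raw) then
    puts(1, "Couldn't open the .doc file\n")
else
    out = open("doc.txt", "w")           -- text mode output
    if out = -1 then
        puts(1, "Couldn't create doc.txt\n")
    else
        puts(out, doc_bytes_to_text(raw))
        close(out)
    end if
end if
```

Keeping the filtering in its own function makes it easy to try on a short sequence such as {'H','i',13,10,'x'}. Since doc.txt is opened in text mode, each '\n' should come out as CR LF on DOS/Windows, so Notepad shows one line break per paragraph. Ward's loop, which filters as it reads, holds only the kept characters in memory, while this version holds the whole raw file first.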


4. Re: .doc file to .txt ?

DB James wrote:
> 
> I tried this and it worked pretty well.  I'm sure Kat can tweak your code to
> get a good "returns" response, something like ignoring 10's and substituting
> '\n's for 13's or something.  But basically it does the trick.  Nice of you to
> take the time amid all the sound and fury of Linux vs. M$, etc.
> 
> Now, if you could just figure a way to extract the maps too..:^D

Speaking of Linux vs M$, have you looked at antiword? 

http://www.winfield.demon.nl/

I'm not sure whether you want to depend on an external program (or even whether
antiword will do what you want regarding images), but it is available on many,
many platforms, even 16-bit MS-DOS.

Gary


5. Re: .doc file to .txt ?

On 6 Aug 2005, at 6:24, ags wrote:

> 
> 
> posted by: ags <eu at 531pi.co.nz>
> 
> DB James wrote:
> > 
> > I tried this and it worked pretty well.  I'm sure Kat can tweak your code to
> > get a good "returns" response, something like ignoring 10's and substituting
> > '\n's for 13's or something.  But basically it does the trick.  Nice of you
> > to take the time amid all the sound and fury of Linux vs. M$, etc.
> > 
> > Now, if you could just figure a way to extract the maps too..:^D
> 
> Speaking of Linux vs M$, have you looked at antiword? 
> 
> http://www.winfield.demon.nl/
> 
> I'm not sure if you want to be dependant on an external program (nor even if
> antiword will do what you want, re images) but it is available on many, many
> platforms even 16-bit MS-DOS.

Kool:

Save the text version of the Word document in Latin2, in a file:
    antiword -m cp852.txt filename.doc > filename.txt

Save the PostScript version of the Word document in Latin1, in a file,
generating PostScript for printing on European A4 size paper:
    antiword -p a4 -m 8859-1.txt filename.doc > filename.ps

Save the PostScript version of the Word document in Latin2, in a file,
generating PostScript for printing on American letter size paper:
    antiword -p letter -m 8859-2.txt filename.doc > filename.ps

Thanks, ags!

Kat
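To drive antiword from a Euphoria program instead of typing the commands by hand, one minimal sketch is to build the command line and pass it to system(). It assumes antiword is installed and on the PATH, and that the shell system() runs under handles the > redirection and quoted file names (older COMMAND.COM may not cope with quoted long names). antiword_command and the script name doc2txt.ex are hypothetical.

```euphoria
-- Build an antiword command line. mapping may be "" (antiword's default)
-- or a mapping file such as "cp852.txt", passed with -m.
-- antiword_command is a hypothetical helper, not part of any library.
function antiword_command(sequence doc_name, sequence txt_name, sequence mapping)
    sequence cmd
    cmd = "antiword "
    if length(mapping) > 0 then
        cmd = cmd & "-m " & mapping & " "
    end if
    -- quote the names so paths with spaces survive the shell
    cmd = cmd & '"' & doc_name & "\" > \"" & txt_name & '"'
    return cmd
end function

sequence args
args = command_line()   -- {interpreter, script, arg1, arg2, ...}
if length(args) >= 4 then
    -- e.g.  ex doc2txt.ex report.doc report.txt
    system(antiword_command(args[3], args[4], ""), 2)
else
    puts(1, "usage: doc2txt file.doc file.txt\n")
end if
```

Passing "cp852.txt" as the third argument reproduces the Latin2 text command above. Note that system() does not report whether antiword actually succeeded, so it is worth checking that the .txt file exists afterwards.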


6. Re: .doc file to .txt ?

On 4 Aug 2005, at 17:13, Ward Turner wrote:

> 
> 
> posted by: Ward Turner <captaincorc at isp.com>
> 
> Forgive me if this is a very stupid way to do it.  I know it's not clean and
> pretty, but it will leave you with plain text, some of which will literally be
> garbage, but at least you can open it in notepad and just delete the stuff
> that's garbage, leaving the body of the document wrappable and readable.
> 
> I know you probably will find this really dumb, but I didn't want you to think
> nobody was looking :)

Thanks. I wondered if there was some undocumented WinAPI call in 
someone's library to do this all prettily, with pictures. Well, hmm, yes, ok.

Kat


7. Re: .doc file to .txt ?

Kat wrote:
> > http://www.winfield.demon.nl/
> > 
> > I'm not sure if you want to be dependant on an external program (nor even if
> > antiword will do what you want, re images) but it is available on many, many
> > platforms even 16-bit MS-DOS.
> 
> Thanks, ags!

De nada, nichts zu danken (you're welcome, no thanks needed).  OpenWebMail uses
antiword to give fast previews of Word docs, which is what impressed me about
it.  I don't think it handles tables too well, though.

Gary
