1. Could this be done?

Hi all,

Some time ago Ward Turner replied to a query of Kat's with some code to extract
the text from an MS Word file.  I liked the idea, so I wrote a simple program that
uses his idea.  It works, so I began thinking of other files to extract text
from, such as Outlook Express .dbx files, and others.  That led to thinking about
text extraction in general, and to the next step: file conversion.

And that led to the idea of "Vertex", from conVERT EXtract, which is merely an
idea at the moment (and if I have to code it, it will probably remain an idea).

Is it possible to write a Windows program that could have a conversion or
extraction feature added just by plopping an include file in its directory?  If
so, then anyone could add a specialized function to the "shell program", either
for personal use, or to share with everyone.  This would amount to a
collaboration without the need to communicate about it.  The author of the
"shell" would give a format for the include files that everyone could follow. 
Painless "add-ins" or "plug-ins", is the idea.

A way of thinking about this is to imagine a FROM and a TO menu item.  The TO
would show options according to what was chosen in the FROM, e.g. if a user chose
JPG in FROM, then BMP, PNG, etc. would show in the TO.  If TEXT were chosen in
FROM, then HTML would be an option in TO.
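
The FROM/TO idea can be sketched as a simple lookup. This is only a minimal illustration, assuming a hypothetical table of {FROM, TO} pairs that the plug-ins would fill in; the table contents and the name to_options are made up for the example.

```euphoria
-- Hypothetical table of available conversions: {FROM, TO} pairs.
-- In a real shell this would be filled in by the plug-ins.
constant conversions = {
    {"JPG",  "BMP"},
    {"JPG",  "PNG"},
    {"BMP",  "JPG"},
    {"TEXT", "HTML"}
}

-- Return the list of TO formats available for a given FROM format.
function to_options(sequence from_format)
    sequence result
    result = {}
    for i = 1 to length(conversions) do
        if equal(conversions[i][1], from_format) then
            result = append(result, conversions[i][2])
        end if
    end for
    return result
end function

-- Show what the TO menu would list when JPG is chosen in FROM.
sequence options
options = to_options("JPG")
for i = 1 to length(options) do
    puts(1, options[i] & "\n")
end for
```

With the sample table, choosing JPG lists BMP and PNG, and choosing TEXT would list only HTML.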

I like the idea of a program's capability growing constantly as people have new
ideas and needs.

--Quark


2. Re: Could this be done?

Yes, it's possible. However, you'd need to have a documented internal
structure/format for the program so that everything can translate to a universal
format that the other translators can use to output their files. You would
probably also need to make an API for the converter includes to make it usable.
This would be a very useful program if you could get it to work, and I've thought
about writing one many times but don't have the time.


The Euphoria Standard Library project :
    http://esl.sourceforge.net/
The Euphoria Standard Library mailing list :
    https://lists.sourceforge.net/lists/listinfo/esl-discussion


3. Re: Could this be done?

Hello, and thanks for the reply.  I was beginning to wonder if I hadn't
suggested something so loony that no one was going to respond.

D. Newhall wrote:
> Yes, it's possible. However, you'd need to have a documented internal
> structure/format for the program so that everything can translate to a
> universal format that the other translators can use to output their files.

I don't quite follow this.  I would understand and agree that it would be useful
with image files to be able to translate every other type to, say, BMP format,
and from there to other types.  This would reduce the number of translations. 
Does the same idea apply to other types of extractions or conversions?

> You would probably also need to make an API for the converter includes to
> make it usable.

Perhaps you could elaborate on this.  I was imagining a limited conformity with
the needs of the shell program that would be explicit and be followed by the
authors of the includes.  For example, there might be rigidly formatted comments
in the include that would be readable by the shell to establish what capability
the include offers.  Perhaps the only call would have the same name as the name
of the include: text2html.e, so the call would be text2html(fullPath) or
text2html(lines) or whatever.
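
To make the "rigidly formatted comments" idea concrete, here is a minimal sketch of how a shell might read them. The header convention (lines beginning with `--@FROM ` and `--@TO `) and the function name header_value are assumptions invented for this example; nothing in the thread fixes a format. The sample lines are held in memory, where a real shell would read them from the include file.

```euphoria
-- Assumed header convention (hypothetical): a plug-in starts with
-- comment lines of the form
--     --@FROM TEXT
--     --@TO HTML

-- Return the text following 'tag' on the first line that starts with it,
-- or "" if no line starts with that tag.
function header_value(sequence lines, sequence tag)
    sequence line
    for i = 1 to length(lines) do
        line = lines[i]
        if length(line) >= length(tag) and equal(line[1..length(tag)], tag) then
            return line[length(tag)+1..length(line)]
        end if
    end for
    return ""
end function

-- In-memory stand-in for the first lines of a file such as text2html.e.
constant sample = {
    "--@FROM TEXT",
    "--@TO HTML",
    "global function text2html(sequence lines)"
}

puts(1, header_value(sample, "--@FROM ") & " -> " &
        header_value(sample, "--@TO ") & "\n")
```

Lines read with gets() would still carry their trailing newline, so a real reader would need to trim it before comparing values.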

> This would be a very useful program if you could get it to work and I've
> thought about writing one many times but don't have the time.

Yes, it might be useful in that it would grow with the efforts of each
individual who has a good idea and adds that functionality.  Perhaps the same
idea could be used for a tutorial program where an author lays out the shell and
sets up the general format for the presentation of the lessons.  Then anyone,
whether an intermediate or advanced level programmer, could write a module for
it.  It does seem obvious that a lot of good ideas have not seen the light of day
because too much work was needed to do the whole job.  But doing just one part
is something that might get done.

--Quark


4. Re: Could this be done?

DB James wrote:

> I don't quite follow this.  I would understand and agree that it would be
> useful with image files to be able to translate every other type to, say,
> BMP format, and from there to other types.  This would reduce the number of
> translations.  Does the same idea apply to other types of extractions or
> conversions?

What I meant was that when loaded by the program all files would be the same
internally. For example, it would read a .DOC file converting it to an
intermediate format inside the program. Then the ASCII text converter takes the
intermediate format and returns the .TXT file.

To get around the .BMP to .TXT problem you could have different classes of files
and you can only convert filetypes that are in the same class (for the most
part). For example, JPEGs, bitmaps, PNGs, etc. are all classified as type Image
and can be converted to and from one another; ASCII text files, .DOC files,
RichText files, etc. would be of type TextDocument or something. This is needed
because you can't convert a Bitmap to a text file (although vice-versa could be
done) so the converters need a way to check for limitations but without
sacrificing flexibility.


> Perhaps you could elaborate on this.  I was imagining a limited conformity
> with the needs of the shell program that would be explicit and be followed
> by the authors of the includes.  For example, there might be rigidly
> formatted comments in the include that would be readable by the shell to
> establish what capability the include offers.  Perhaps the only call would
> have the same name as the name of the include: text2html.e, so the call
> would be text2html(fullPath) or text2html(lines) or whatever.

Here's an example of a possible API:

-- Include API functions and constants
include vertex.e


integer converter_rid, -- The routine ID number for the converter function
        extractor_rid  -- The routine ID number for the extractor function

-- This is the actual extractor function.
-- It takes the file number for the opened file to extract the info from.
-- text_data is assumed to be a global sequence declared in vertex.e that
-- holds the intermediary text.
function txt_extractor(integer file)
    object line
    text_data = {}
    line = gets(file)
    while sequence(line) do

        -- This goes through every line and converts the plain ASCII
        -- text to an intermediary format for text. As a placeholder,
        -- each line is stored unchanged.
        text_data = append(text_data, line)

        line = gets(file)
    end while
    return 1 -- a function must return a value; 1 here means success
end function

extractor_rid = routine_id("txt_extractor")


-- This is the actual converter function.
-- It takes the file number for the opened file and writes the data to it,
-- converted from the intermediary format to the new one.
function txt_converter(integer file)
    object converted_data
    -- Loop through internal text data
    for i=1 to length(text_data) do

        -- Take the internal representation for text and convert it to the new
        -- format storing it in converted_data. As a placeholder, the
        -- line is copied unchanged.
        converted_data = text_data[i]

        puts(file, converted_data)
    end for
    return 1 -- success
end function

converter_rid = routine_id("txt_converter")


-- Holds the ID for the .TXT file type
integer text_file_id

-- This declares the type of files that it can work on
-- Arguments are: extension, description, file class
text_file_id = declare_filetype(".txt", "ASCII text file", CLASS_TEXTDOC)
declare_convert(text_file_id, converter_rid)
declare_extract(text_file_id, extractor_rid)
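
The example above relies on a vertex.e that does not exist yet. Purely as an illustration, here is a minimal sketch of what such a file might contain, assuming a simple in-memory registry. The names declare_filetype, declare_convert, declare_extract, CLASS_TEXTDOC and text_data come from the example; CLASS_IMAGE, can_convert and the layout of the registry are assumptions added here. The last few lines are a quick self-check using dummy routine IDs.

```euphoria
-- vertex.e (hypothetical sketch): registry of file types and plug-in routines.

global constant CLASS_IMAGE   = 1,
                CLASS_TEXTDOC = 2

-- Shared intermediary text data used by the text plug-ins (assumed global).
global sequence text_data
text_data = {}

-- Each entry is {extension, description, class, converter_rid, extractor_rid}.
sequence filetypes
filetypes = {}

-- Register a file type and return its ID (its index in the registry).
global function declare_filetype(sequence ext, sequence desc, integer file_class)
    filetypes = append(filetypes, {ext, desc, file_class, -1, -1})
    return length(filetypes)
end function

-- Attach the converter routine (intermediary format -> file).
global procedure declare_convert(integer id, integer rid)
    filetypes[id][4] = rid
end procedure

-- Attach the extractor routine (file -> intermediary format).
global procedure declare_extract(integer id, integer rid)
    filetypes[id][5] = rid
end procedure

-- A conversion is allowed when both types are in the same class, the
-- source type has an extractor and the target type has a converter.
global function can_convert(integer from_id, integer to_id)
    return filetypes[from_id][3] = filetypes[to_id][3]
       and filetypes[from_id][5] >= 0
       and filetypes[to_id][4] >= 0
end function

-- Quick check with two made-up types; 0 stands in for real routine IDs.
integer txt_id, bmp_id
txt_id = declare_filetype(".txt", "ASCII text file", CLASS_TEXTDOC)
bmp_id = declare_filetype(".bmp", "Windows bitmap", CLASS_IMAGE)
declare_extract(txt_id, 0)
declare_convert(txt_id, 0)
declare_convert(bmp_id, 0)
printf(1, "%d %d\n", {can_convert(txt_id, txt_id), can_convert(txt_id, bmp_id)})
```

The strict same-class rule is the simplest reading of the "file class" idea. Allowing a cross-class case such as text to bitmap would need an extra table of permitted class pairs.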





5. Re: Could this be done?

D. Newhall wrote:
> Here's an example of a possible API:

Hello,

With a beginning like that I'm guessing it would not take as much of your time
to write the program as you think.  If you do it, I hope I'd be able to add a
module or two.

A question though, whether it is useful or not, is this: is it really necessary
to assume the main program must create an intermediate version of whatever the
original file contains?  If that assumption were dropped, then the implementing
would be much easier, as far as the "shell" program is concerned.  It could be,
in effect, ignorant of the functionalities that the include files offered.  It
would not have to know how to handle the many different types of files that it
would be called upon to deal with.

If it were decided that some intermediate capability is necessary, such as that
which you describe, or with image-files-to-BMP or whatever, then some clever
programmer could write intermed1.e.  Another could write intermed2.e, etc. and
then the convert or extract functions would refer to those "standard" functions
before doing their thing.  At no time (ideally) would the shell program have a
clue about any of this.
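
A shell that stays ignorant of what its plug-ins do could be sketched roughly as follows. This is only an illustration under assumptions: register_plugin and run_plugin are hypothetical names, and the text2html routine is a toy stand-in for a real plug-in. Because Euphoria processes include statements when the program is loaded, the shell would presumably also need some way to pick up newly added files, for example a generated file of include lines; that part is not shown.

```euphoria
-- Registry kept by the shell: each entry is {from, to, routine id}.
sequence plugins
plugins = {}

-- Called by a plug-in to announce one capability (hypothetical API).
global procedure register_plugin(sequence from_format, sequence to_format,
                                 integer rid)
    plugins = append(plugins, {from_format, to_format, rid})
end procedure

-- The shell only looks up a routine ID and calls it.
-- It never inspects or interprets the data itself.
global function run_plugin(sequence from_format, sequence to_format, object data)
    for i = 1 to length(plugins) do
        if equal(plugins[i][1], from_format)
        and equal(plugins[i][2], to_format) then
            return call_func(plugins[i][3], {data})
        end if
    end for
    return -1  -- no plug-in offers this conversion
end function

-- What a plug-in such as text2html.e might contain (toy conversion only).
function text2html(sequence text)
    return "<pre>" & text & "</pre>"
end function
register_plugin("TEXT", "HTML", routine_id("text2html"))

puts(1, run_plugin("TEXT", "HTML", "hello") & "\n")
```

A plug-in that wants an intermediate step, such as the intermed1.e mentioned above, could call it inside its own routine without the shell changing at all.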

And, to repeat a previous point, it seems to me that a project like David Gay's
new tutorial could be done this way.  Consider all the different topics that such
a tutorial series might contain: language-specific items, OS difference-items,
graphics, sound, GUI, database, HTML, and on and on.  Wouldn't it be much easier,
and faster, if many brains worked on areas they are interested in?  One big
advantage would be the lack of a need to coordinate the programming, beyond a
minimal level.  If two people tackled the same topic, what harm would that do?

--Quark


6. Re: Could this be done?

Hi Quark

First of all, the person you mentioned is not a relation to me or in
my family. I asked around, which was why I did not answer your question
regarding TinyOS et al. Sorry for the delay. 

I agree with your suggestion about a collaborative effort on creating the
tutorial. Other projects like Win32lib, wxEuphoria, where you have a lot
of contributors working together have produced stellar results. The only
reason why I initially did not go for a collaboration was because I was not
sure if anyone wanted to do a tutorial with me. Tutorials are not as
exciting as games or IDEs. Even I admit writing a game would be more fun
than a tutorial.

That does not mean I will not accept help if offered. If anyone has any
ideas or assistance to offer, please let me know. I haven't even started on
the text of the tutorial yet, nor do I know what it will look like. However,
it would be great if I could get platform-specific programming concepts and
newbie primers to all the great Euphoria wrappers out there as a part of
the tutorial.

However, I promise there will be no remote in this tutorial :P

Regards

David Gay


7. Re: Could this be done?

Hi David,

As to the person I asked you about, I found a David Gay on the web who had been
at Berkeley from about the time you went off to do other things, and he had to do
with TinyOS and nesC, etc.  He is at Intel now.  But after I posted to you, I did
some more searching and finally found his middle initial is E.  So scratch that, I
thought.

As to the collaborative effort, I was thinking what a massive job you had in the
past on the DOS-based tutorial, of your own DOS interest (as it has been mine),
the Windows learning-curve, and how the tutorial possibilities have grown, and I
thought you could use some help.

I suppose tutorials can be coherent and "book-like" (simple to complex) or they
can be modules that have a narrow focus and be complete in themselves.  If you
choose the former, you will have a big job ahead of you.  If the latter, then you
could create the main interface program, set the general pattern, develop a list
of needed modules, and generally oversee things, but rely on other knowledgeable
people to do many of the modules.  Ideally, the modules would plug in to the main
program easily, be listed coherently, and adding to the tutorial might be just a
matter of downloading a new module and placing it in the tutorial folder, or
another step might be needed, don't know.  Anyway it can be allowed to grow over
time, rather than needing to be complete to be worthwhile.

But, however you decide to do it, if I can help, I will, because I think this is
a very worthwhile project.

--Quark
