1. Could this be done?
- Posted by DB James <larch at adelphia.net> Sep 01, 2005
- 514 views
Hi all,

Some time ago Ward Turner replied to a query of Kat's with some code to extract the text from an MS Word file. I liked the idea, so I wrote a simple program that uses it. It works, so I began thinking of other files to extract text from, such as Outlook Express .dbx files. That led to thinking about text extraction in general, and to the next step: file conversion.

And that led to the idea of "Vertex", from conVERT EXtract, which is merely an idea at the moment (and if I have to code it, it will probably remain an idea).

Is it possible to write a Windows program that could have a conversion or extraction feature added just by plopping an include file in its directory? If so, then anyone could add a specialized function to the "shell program", either for personal use or to share with everyone. This would amount to a collaboration without the need to communicate about it. The author of the "shell" would give a format for the include files that everyone could follow. Painless "add-ins" or "plug-ins" is the idea.

A way of thinking about this is to imagine a FROM and a TO menu item. The TO would show options according to what was chosen in the FROM, e.g. if a user chose JPG in FROM, then BMP, PNG, etc. would show in the TO. If TEXT were chosen in FROM, then HTML would be an option in TO.

I like the idea of a program's capability growing constantly as people have new ideas and needs.

--Quark
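The FROM/TO menu described above could be driven by a simple table of registered conversions. The following is only a minimal sketch in Euphoria, not part of the original post; the names `register_conversion` and `to_options` are hypothetical, and a real shell would fill the table from its plug-ins rather than from hard-coded calls.

```euphoria
-- Hypothetical sketch: a table of registered conversions.
-- Each entry is a pair {from_format, to_format}.
sequence conversions
conversions = {}

-- Record that a plug-in can convert from_fmt into to_fmt.
procedure register_conversion(sequence from_fmt, sequence to_fmt)
    conversions = append(conversions, {from_fmt, to_fmt})
end procedure

-- Return every TO format available for the chosen FROM format.
function to_options(sequence from_fmt)
    sequence result
    result = {}
    for i = 1 to length(conversions) do
        if equal(conversions[i][1], from_fmt) then
            result = append(result, conversions[i][2])
        end if
    end for
    return result
end function

-- Example registrations taken from the formats named in the post.
register_conversion("JPG", "BMP")
register_conversion("JPG", "PNG")
register_conversion("TEXT", "HTML")

-- List the TO menu entries for a FROM choice of JPG.
sequence opts
opts = to_options("JPG")
for i = 1 to length(opts) do
    puts(1, opts[i] & "\n")
end for
```

With the three registrations shown, choosing JPG lists BMP and PNG, and choosing TEXT would list only HTML.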
2. Re: Could this be done?
- Posted by D. Newhall <derek_newhall at yahoo.com> Sep 03, 2005
- 483 views
DB James wrote:
> Is it possible to write a Windows program that could have a conversion or extraction feature added just by plopping an include file in its directory?

Yes, it's possible. However, you'd need to have a documented internal structure/format for the program so that everything can translate to a universal format that the other translators can use to output their files. You would probably also need to define an API for the converter includes to make it usable.

This would be a very useful program if you could get it to work, and I've thought about writing one many times but don't have the time.
The Euphoria Standard Library project : http://esl.sourceforge.net/ The Euphoria Standard Library mailing list : https://lists.sourceforge.net/lists/listinfo/esl-discussion
3. Re: Could this be done?
- Posted by DB James <larch at adelphia.net> Sep 03, 2005
- 454 views
Hello, and thanks for the reply. I was beginning to wonder if I had suggested something so loony that no one was going to respond.

D. Newhall wrote:
> Yes, it's possible. However, you'd need to have a documented internal structure/format for the program so that everything can translate to a universal format that the other translators can use to output their files.

I don't quite follow this. I would understand and agree that it would be useful with image files to be able to translate every other type to, say, BMP format, and from there to other types. This would reduce the number of translations. Does the same idea apply to other types of extractions or conversions?

> You would probably also need to make an API for the converter includes to make it usable.

Perhaps you could elaborate on this. I was imagining a limited conformity with the needs of the shell program that would be explicit and be followed by the authors of the includes. For example, there might be rigidly formatted comments in the include that would be readable by the shell to establish what capability the include offers. Perhaps the only call would have the same name as the include: for text2html.e, the call would be text2html(fullPath) or text2html(lines) or whatever.

> This would be a very useful program if you could get it to work and I've thought about writing one many times but don't have the time.

Yes, it might be useful in that it would grow with the efforts of each individual who has a good idea and adds that functionality. Perhaps the same idea could be used for a tutorial program where an author lays out the shell and sets up the general format for the presentation of the lessons. Then anyone, whether an intermediate or advanced level programmer, could write a module for it.

It does seem obvious that a lot of good ideas have not seen the light of day because too much work was needed to do the whole job. But just one part might get done.

--Quark
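The naming convention suggested above (an include called text2html.e offering a call named text2html) would let the shell read a plug-in's capability straight from its file name. Here is a minimal sketch under that assumption; `parse_plugin_name` is a hypothetical helper, and the from2to.e pattern is only the convention floated in the post, not an established format.

```euphoria
-- Split a plug-in file name such as "text2html.e" into {"text", "html"}.
-- Returns {} if the name does not follow the assumed from2to.e pattern.
function parse_plugin_name(sequence filename)
    integer dot, two
    sequence base
    dot = find('.', filename)
    if dot = 0 then
        return {}
    end if
    -- Only files ending in ".e" are treated as plug-ins.
    if not equal(filename[dot..length(filename)], ".e") then
        return {}
    end if
    base = filename[1..dot-1]
    -- The first '2' separates the FROM part from the TO part.
    two = find('2', base)
    if two <= 1 or two = length(base) then
        return {}
    end if
    return {base[1..two-1], base[two+1..length(base)]}
end function

-- Demonstration on a hand-written list of file names.
sequence names, parts
names = {"text2html.e", "jpg2bmp.e", "readme.txt"}
for i = 1 to length(names) do
    parts = parse_plugin_name(names[i])
    if length(parts) = 2 then
        printf(1, "FROM %s TO %s\n", parts)
    end if
end for
```

Splitting on the first '2' is a simplification: a format name that itself contains the digit 2 would be split in the wrong place, which is one argument for the explicit declaration calls discussed later in the thread.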
4. Re: Could this be done?
- Posted by D. Newhall <derek_newhall at yahoo.com> Sep 03, 2005
- 481 views
DB James wrote:
> I don't quite follow this. I would understand and agree that it would be useful with image files to be able to translate every other type to, say, BMP format, and from there to other types. This would reduce the number of translations. Does the same idea apply to other types of extractions or conversions?

What I meant was that when loaded by the program all files would be the same internally. For example, it would read a .DOC file, converting it to an intermediate format inside the program. Then the ASCII text converter takes the intermediate format and returns the .TXT file.

To get around the .BMP to .TXT problem you could have different classes of files, and you can only convert filetypes that are in the same class (for the most part). For example, JPEGs, bitmaps, PNGs, etc. are all classified as type Image and can be converted to and from one another; ASCII text files, .DOC files, RichText files, etc. would be of type TextDocument or something. This is needed because you can't convert a Bitmap to a text file (although vice versa could be done), so the converters need a way to check for limitations but without sacrificing flexibility.

> Perhaps you could elaborate on this. I was imagining a limited conformity with the needs of the shell program that would be explicit and be followed by the authors of the includes. For example, there might be rigidly formatted comments in the include that would be readable by the shell to establish what capability the include offers. Perhaps the only call would have the same name as the name of the include: text2html.e, so the call would be text2html(fullPath) or text2html(lines) or whatever.

Here's an example of a possible API:
<eucode>
-- Include API functions and constants
include vertex.e

integer converter_rid, -- The routine ID number for the converter function
        extractor_rid  -- The routine ID number for the extractor function

-- This is the actual extractor function.
-- It takes the file number for the opened file to extract the info from.
function txt_extractor(integer file)
    object line
    line = gets(file)
    while sequence(line) do

        -- This goes through every line and converts the plain ASCII
        -- text to an intermediary format for text.

        line = gets(file)
    end while
    return 0 -- placeholder: a Euphoria function must return a value
end function

extractor_rid = routine_id("txt_extractor")

-- This is the actual converter function.
-- It takes the file number for the opened file to convert the info from
-- the intermediary format to the new one.
function txt_converter(integer file)
    object converted_data
    -- Loop through internal text data
    -- (text_data is assumed to be supplied by vertex.e)
    for i=1 to length(text_data) do

        -- Take the internal representation for text and convert it to the new
        -- format storing it in converted_data

        puts(file, converted_data)
    end for
    return 0 -- placeholder: a Euphoria function must return a value
end function

converter_rid = routine_id("txt_converter")

-- Holds the ID for the .TXT file type
integer text_file_id

-- This declares the type of files that it can work on
-- Arguments are: extension, description, file class
text_file_id = declare_filetype(".txt", "ASCII text file", CLASS_TEXTDOC)
declare_convert(text_file_id, converter_rid)
declare_extract(text_file_id, extractor_rid)
</eucode>
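For illustration only, here is one way the shell side of that API might look. This is a sketch, not the author's design: `declare_filetype`, `declare_convert`, `declare_extract` and `CLASS_TEXTDOC` are the names used in the post, while `CLASS_IMAGE`, `can_convert`, `run_conversion` and the layout of the `filetypes` table are assumptions made here to show the same-class check and the `routine_id`/`call_func` dispatch.

```euphoria
-- Hypothetical sketch of what vertex.e might contain.
global constant CLASS_IMAGE = 1, CLASS_TEXTDOC = 2

-- One entry per file type:
-- {extension, description, file class, extractor rid, converter rid}
sequence filetypes
filetypes = {}

-- Register a file type and return its ID (its index in the table).
global function declare_filetype(sequence ext, sequence desc, integer file_class)
    filetypes = append(filetypes, {ext, desc, file_class, -1, -1})
    return length(filetypes)
end function

-- Attach an extractor routine (file -> intermediate format) to a type.
global procedure declare_extract(integer id, integer rid)
    filetypes[id][4] = rid
end procedure

-- Attach a converter routine (intermediate format -> file) to a type.
global procedure declare_convert(integer id, integer rid)
    filetypes[id][5] = rid
end procedure

-- A conversion is allowed only within one file class, and only when the
-- source type has an extractor and the target type has a converter.
global function can_convert(integer from_id, integer to_id)
    return filetypes[from_id][3] = filetypes[to_id][3]
       and filetypes[from_id][4] != -1
       and filetypes[to_id][5] != -1
end function

-- Run the two plug-in routines on already opened file numbers.
global procedure run_conversion(integer from_id, integer to_id,
                                integer in_fn, integer out_fn)
    object ignored
    if can_convert(from_id, to_id) then
        ignored = call_func(filetypes[from_id][4], {in_fn})
        ignored = call_func(filetypes[to_id][5], {out_fn})
    end if
end procedure
```

Because every plug-in includes the same file, they would all share one table, so a shell could build its FROM and TO lists by walking `filetypes` and calling `can_convert` for each pair.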
5. Re: Could this be done?
- Posted by DB James <larch at adelphia.net> Sep 03, 2005
- 464 views
- Last edited Sep 04, 2005
D. Newhall wrote:
> Here's an example of a possible API:

Hello,

With a beginning like that I'm guessing it would not take as much of your time to write the program as you think. If you do it, I hope I'd be able to add a module or two.

A question though, whether it is useful or not, is this: is it really necessary to assume the main program must create an intermediate version of whatever the original file contains? If that assumption were dropped, the implementation would be much easier as far as the "shell" program is concerned. It could be, in effect, ignorant of the functionality that the include files offer. It would not have to know how to handle the many different types of files that it would be called upon to deal with.

If it were decided that some intermediate capability is necessary, such as the one you describe, or image-files-to-BMP or whatever, then some clever programmer could write intermed1.e. Another could write intermed2.e, etc., and then the convert or extract functions would refer to those "standard" functions before doing their thing. At no time (ideally) would the shell program have a clue about any of this.

And, to repeat a previous point, it seems to me that a project like David Gay's new tutorial could be done this way. Consider all the different topics that such a tutorial series might contain: language-specific items, OS differences, graphics, sound, GUI, database, HTML, and on and on. Wouldn't it be much easier, and faster, if many brains worked on areas they are interested in? One big advantage would be the lack of a need to coordinate the programming beyond a minimal level. If two people tackled the same topic, what harm would that do?

--Quark
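The "ignorant shell" alternative can be sketched as follows: the shell keeps only a plug-in name and a routine ID, and passes data through without knowing anything about the formats. This is a toy illustration, not a proposal from the thread; `add_plugin` and `run_plugin` are hypothetical names, and the `text2html` routine here is a deliberately naive stand-in that does not escape HTML special characters.

```euphoria
-- Hypothetical plug-in: the only public call shares the include's name.
-- It wraps each input line in a paragraph tag (no HTML escaping).
function text2html(sequence lines)
    sequence html
    html = "<html><body>\n"
    for i = 1 to length(lines) do
        html = html & "<p>" & lines[i] & "</p>\n"
    end for
    return html & "</body></html>\n"
end function

-- Shell side: it stores only {name, routine id} pairs.
sequence plugins
plugins = {}

procedure add_plugin(sequence name, integer rid)
    plugins = append(plugins, {name, rid})
end procedure

-- Call the named plug-in with the given input; return -1 if it is unknown.
function run_plugin(sequence name, object input)
    for i = 1 to length(plugins) do
        if equal(plugins[i][1], name) then
            return call_func(plugins[i][2], {input})
        end if
    end for
    return -1
end function

add_plugin("text2html", routine_id("text2html"))
puts(1, run_plugin("text2html", {"Hello", "World"}))
```

Any shared intermediate step (the intermed1.e idea) would then live entirely inside the plug-ins, which could include and call it before returning their result.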
6. Re: Could this be done?
- Posted by David Gay <davidalangay at hotmail.com> Sep 03, 2005
- 475 views
- Last edited Sep 04, 2005
Hi Quark

First of all, the person you mentioned is not related to me or in my family. I asked around, which was why I did not answer your question regarding TinyOS et al. Sorry for the delay.

I agree with your suggestion about a collaborative effort on creating the tutorial. Other projects like Win32lib and wxEuphoria, where a lot of contributors work together, have produced stellar results. The only reason I did not initially go for a collaboration was that I was not sure if anyone wanted to do a tutorial with me. Tutorials are not as exciting as games or IDEs. Even I admit writing a game would be more fun than a tutorial.

That does not mean I will not accept help if offered. If anyone has any ideas or assistance to offer, please let me know. I haven't even started on the text of the tutorial yet, nor do I know what it will look like. However, it would be great if I could get platform-specific programming concepts and newbie primers to all the great Euphoria wrappers out there as a part of the tutorial.

However, I promise there will be no remote in this tutorial :P

Regards

David Gay
7. Re: Could this be done?
- Posted by DB James <larch at adelphia.net> Sep 04, 2005
- 471 views
David Gay wrote:
> I agree with your suggestion about a collaborative effort on creating the tutorial.

Hi David,

As to the person I asked you about, I found a David Gay on the web who had been at Berkeley from about the time you went off to do other things, and he had to do with TinyOS and nesC, etc. He is at Intel now. But after I posted to you, I did some more searching and finally found that his middle initial is E. So scratch that, I thought.

As to the collaborative effort, I was thinking of what a massive job you had in the past with the DOS-based tutorial, of your own interest in DOS (which I share), of the Windows learning curve, and of how the tutorial possibilities have grown, and I thought you could use some help.

I suppose tutorials can be coherent and "book-like" (simple to complex), or they can be modules that have a narrow focus and are complete in themselves. If you choose the former, you will have a big job ahead of you. If the latter, then you could create the main interface program, set the general pattern, develop a list of needed modules, and generally oversee things, but rely on other knowledgeable people to do many of the modules.

Ideally, the modules would plug in to the main program easily and be listed coherently, and adding to the tutorial might be just a matter of downloading a new module and placing it in the tutorial folder, though another step might be needed; I don't know. Anyway, it can be allowed to grow over time, rather than needing to be complete to be worthwhile.

But, however you decide to do it, if I can help, I will, because I think this is a very worthwhile project.

--Quark