OpenEuphoria: Forum: PDF reader

1. PDF reader

Posted by gwalters Nov 23, 2013
2958 views

Do we have a PDF reader somewhere? I would like to be able to read a PDF version of a simple text printout. It happens that I cannot save the printout as a simple txt file, only as a pdf file. I don't see anything in the archives that could do that. Suggestions would be appreciated.

2. Re: PDF reader

Posted by Selgor Nov 24, 2013
2914 views

Hello.

What exactly do you want?

Is it a pdf reader?

Is it a text-to-pdf converter?

Is it a pdf-to-text converter?

I am assuming you need a pdf reader.

These are free readers:

www.foxitsoftware.com/Secure_PDF_Reader/

geniuspdf.com/

adobe.com/reader/

and there are heaps more!

HTH.

Cheers

Selgor

3. Re: PDF reader

Posted by gwalters Nov 24, 2013
2855 views

Sorry, I was not clear. I want to read, using a Euphoria program I write, a pdf file so I can strip out information needed. I can do that with a txt file but not a pdf file.

4. Re: PDF reader

Posted by EUWX Nov 24, 2013
2839 views

gwalters said...

Sorry, I was not clear. I want to read, using a Euphoria program I write, a pdf file so I can strip out information needed. I can do that with a txt file but not a pdf file.

My intention is not to discourage you but to make you aware of the difficulties.

Text to PDF conversion, and vice versa, is a major undertaking. Even with a team effort using Euphoria, you will need 3-5 people collaborating for over a year to come up with two-way conversion, extraction of text, extraction of text and graphics, insertion of the same, etc. And, of course, you will need third-party libraries.

You can take a shortcut by using the Euphoria system command to invoke third-party software, take the result into Euphoria, do the extraction using Euphoria, and then reconvert using third-party software. One to two months is what you will need working single-handed or with one collaborator.

5. Re: PDF reader

Posted by jimcbrown (admin) Nov 24, 2013
2845 views

EUWX said...

You can take a shortcut by using the Euphoria system command to invoke third-party software, take the result into Euphoria, do the extraction using Euphoria, and then reconvert using third-party software. One to two months is what you will need working single-handed or with one collaborator.

Err - by using pdftohtml, it's a few minutes, not one or two months. Single-handedly.

6. Re: PDF reader

Posted by gwalters Nov 24, 2013
2826 views

jimcbrown said...

EUWX said...

You can take a shortcut by using the Euphoria system command to invoke third-party software, take the result into Euphoria, do the extraction using Euphoria, and then reconvert using third-party software. One to two months is what you will need working single-handed or with one collaborator.

Err - by using pdftohtml, it's a few minutes, not one or two months. Single-handedly.

Seems like this approach is workable (something I'm capable of doing). I'll give it a try. Thanks.

7. Re: PDF reader

Posted by EUWX Nov 24, 2013
2835 views

jimcbrown said...

EUWX said...

You can take a shortcut by using the Euphoria system command to invoke third-party software, take the result into Euphoria, do the extraction using Euphoria, and then reconvert using third-party software. One to two months is what you will need working single-handed or with one collaborator.

Err - by using pdftohtml, it's a few minutes, not one or two months. Single-handedly.

Anybody can use preexisting software to convert in "a few minutes", "single-handedly". I do it all the time.
For example, I use "Able2Extract Professional", and only 2 months ago I helped my local religious organization convert and extract a lot of names and addresses from PDF to Word/Excel. This had to be done in stages. Allowing for switching between applications, correcting mistakes in the PDF/Word conversion, and so on, it took about 2 hours to convert a 30-page list into a usable Excel spreadsheet.

When I talked about "one or two months", I was talking about programmatically, in Euphoria, getting a PDF converted, then saving it as text, then, still within the program, loading it into a Euphoria editor to correct the mistakes, then programmatically extracting and rearranging further, then programmatically displaying the result, then programmatically calling the same or another 3rd party program to convert it back to PDF, and then programmatically saving it. All of this is to be done within a single Euphoria program that is also interactive enough to let one input the name of the file to be assaulted.

If you are genius enough to anticipate and spot the mistakes that occur in the best of PDF-to-Word converters (and automatically correct them), and to use extractors without using cut and paste, then you should be exalted to the Guinness Book of World Records.

8. Re: PDF reader

Posted by jimcbrown (admin) Nov 24, 2013
2837 views

EUWX said...

jimcbrown said...

EUWX said...

You can take a shortcut by using the Euphoria system command to invoke third-party software, take the result into Euphoria, do the extraction using Euphoria, and then reconvert using third-party software. One to two months is what you will need working single-handed or with one collaborator.

Err - by using pdftohtml, it's a few minutes, not one or two months. Single-handedly.

Anybody can use preexisting software to convert in "a few minutes", "single-handedly".

Agreed.

EUWX said...

When I talked about "one or two months", I was talking about programmatically ... to correct the mistakes,

You did not mention this in your original quote, reproduced below. If this is what you meant, then you should say so.

EUWX said...

You can take a shortcut by using the Euphoria system command to invoke third-party software, take the result into Euphoria, do the extraction using Euphoria, and then reconvert using third-party software. One to two months is what you will need working single-handed or with one collaborator.

But I would agree - oftentimes errors in the text of a PDF are hidden by the font being used, and can be a real pain to fix by hand after the text is extracted. If you use OCR to pull text out of an embedded image, you more or less have to deal with the same issue. Dealing with this without human intervention is not an easy task. It is probably not something gwalters needs to do either.

9. Re: PDF reader

Posted by EUWX Nov 24, 2013
2801 views

jimcbrown: Your approach to the problem posed by this questioner and mine are different. You like to make quick replies to problems without fully understanding the questioner's needs and the implication of your answer.
I come with experience of converting ALL THE TIME between PDF, Word and text, and of extracting, correcting and editing. That is part of my daily life, as people submit documents to me written in all shades of text editors.
I also know what PDF-to-text converters are on the market, and on and off I have to recommend a free one to somebody because he is a contractor, and the next minute recommend a much better paid application to a large company for its staff.

So, for God's sake, do not sermonise to me. For anybody to write a full converter and extractor in the Euphoria language is a huge task, even using pre-existing C and C++ libraries. For this questioner and his need, a simpler method is to use 2-3 preexisting application software distributions which are in the public domain. They will often be somewhat deficient compared to the high-quality paid software. He will therefore be forced to recognise those shortcomings in the free software and write Euphoria software to minimise their bad effect. That will involve presenting mistakes for editing, and also interactively identifying changing pieces of text. He has to be aware that ALL application software that converts from PDF to text is prone to mistakes because you are doing an image-to-text conversion.

He needs a month or two to write an application program that will do a to-and-fro conversion, extraction and editing using Euphoria and the 3rd-party open-source application software, and that is that. If you can do it in 2 minutes in Euphoria, you will essentially be claiming to be an unmatched genius.

10. Re: PDF reader

Posted by jimcbrown (admin) Nov 24, 2013
2772 views

EUWX said...

jimcbrown: Your approach to the problem posed by this questioner and mine are different. You like to make quick replies to problems without fully understanding the questioner's needs and the implication of your answer.

I think that accusation makes more sense when leveled against you.

Look at the original request:

gwalters said...

Sorry, I was not clear. I want to read, using a Euphoria program I write, a pdf file so I can strip out information needed. I can do that with a txt file but not a pdf file.

EUWX said...

He needs a month or two to write an application program that will do a to-and-fro conversion, extraction and editing using Euphoria and the 3rd-party open-source application software, and that is that.

Again, that's not the original request.

EUWX said...

If you can do it in 2 minutes in Euphoria, you will essentially be claiming to be an unmatched genius.

I don't claim to be an unmatched genius, but ... with the caveat that I'm using system() to call non-Euphorian 3rd party applications to do all the hard work, I claim that I can write an application that will do a to-and-fro conversion and extraction, and even allow a human to manually perform some editing, in less than 2 minutes with Euphoria. It's simple really - just call pdftohtml to convert the PDF file into an html file, open the whole html file with LibreOffice and let a human do some editing, then manually invoke LibreOffice's print-to-file functionality to convert it back to a PDF file again...

11. Re: PDF reader

Posted by EUWX Nov 24, 2013
2774 views

The words you have to really think about in the questioner's request are "so I can strip out information needed". That is called extraction from a PDF file. Try it under Euphoria - you will not do it programmatically in one month.
The next lot of words are also very clear and explicit - "I can do that with a txt file but not a pdf file".
That is exactly the case for at least one month of programming, using a ready-made application, for him to do it conveniently as often as he needs to.
Many minutes and hours have passed and you have not come up with anything so far.

12. Re: PDF reader

Posted by EUWX Nov 24, 2013
2768 views

EUWX said...

If you can do it in 2 minutes in Euphoria, you will essentially be claiming to be an unmatched genius.

jimcbrown said...

I don't claim to be an unmatched genius, but ... with the caveat that I'm using system() to call non-Euphorian 3rd party applications to do all the hard work, I claim that I can write an application that will do a to-and-fro conversion and extraction, and even allow a human to manually perform some editing, in less than 2 minutes with Euphoria. It's simple really - just call pdftohtml to convert the PDF file into an html file, open the whole html file with LibreOffice and let a human do some editing, then manually invoke LibreOffice's print-to-file functionality to convert it back to a PDF file again...

You still show ignorance. If you have to keep on switching between 3rd party applications, everybody knows they can do it. WRITE a full application in Euphoria, call your pdftohtml then call other editors and search and replace, and give it to any user to do comfortably without keeping on explicitly calling 3rd party application. THAT is what the guy wants, and you have not done it yet, given the hours you have spent talking half-truths.

13. Re: PDF reader

Posted by useless_ Nov 24, 2013
2782 views

EUWX said...

If you can do it in 2 minutes in Euphoria, you will essentially be claiming to be an unmatched genius.

jimcbrown said...

I don't claim to be an unmatched genius, but ... with the caveat that I'm using system() to call non-Euphorian 3rd party applications to do all the hard work, I claim that I can write an application that will do a to-and-fro conversion and extraction, and even allow a human to manually perform some editing, in less than 2 minutes with Euphoria. It's simple really - just call pdftohtml to convert the PDF file into an html file, open the whole html file with LibreOffice and let a human do some editing, then manually invoke LibreOffice's print-to-file functionality to convert it back to a PDF file again...

You still show ignorance. If you have to keep on switching between 3rd party applications, everybody knows they can do it. WRITE a full application in Euphoria, call your pdftohtml then call other editors and search and replace, and give it to any user to do comfortably without keeping on explicitly calling 3rd party application. THAT is what the guy wants, and you have not done it yet, given the hours you have spent talking half-truths.

I agree with EUWX.

useless

14. Re: PDF reader

Posted by ne1uno Nov 24, 2013
2764 views

Maybe you can print to a text printer? In Windows I think it's called generic text. It should work if there are no images to render.

You didn't mention why you are stuck with generating pdf as the only option. There are online converters if that is a one-time thing, or you can call some of them by changing POST requests to GET requests in the URL.

15. Re: PDF reader

Posted by jimcbrown (admin) Nov 25, 2013
2740 views

EUWX said...

The words you have to really think about in the questioner's request are "so I can strip out information needed". That is called extraction from a PDF file. Try it under Euphoria - you will not do it programmatically in one month.

This is why a 3rd party tool is used.

EUWX said...

The next lot of words are also very clear and explicit - "I can do that with a txt file but not a pdf file".

This is why a 3rd party tool is used to convert it to a text-based format first.

EUWX said...

That is exactly the case for at least one month of programming, using a ready-made application, for him to do it conveniently as often as he needs to.
Many minutes and hours have passed and you have not come up with anything so far.

Actually, I wrote a utility with Euphoria several years ago that did the same thing - it converted a pdf file to an html file and then converted that to a plain text file and then grepped out a set of lines that I was interested in.
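For anyone wanting to try the same pdf -> html -> plain text -> grep pipeline, here is a minimal sketch. It is written in Python rather than Euphoria so that it can be run and tested as-is, and it is not the utility mentioned above. The helper names (build_pdftohtml_command, html_to_text, grep_lines, convert) are hypothetical, and it assumes Poppler's pdftohtml is installed and on the PATH; the -noframes option is used on the assumption that a single html file is wanted. In Euphoria the same command line would be handed to system().

```python
import re
import subprocess
from html.parser import HTMLParser


def build_pdftohtml_command(pdf_path, html_path):
    # Assumption: Poppler's pdftohtml is on the PATH. It takes the input
    # PDF followed by the output html name; -noframes asks for one file.
    return ["pdftohtml", "-noframes", pdf_path, html_path]


class _TextExtractor(HTMLParser):
    """Collects text content and drops tags, scripts and styles."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in ("br", "p", "div"):
            self.parts.append("\n")  # treat these tags as line breaks

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag in ("p", "div"):
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Reduce an html document to plain text, one line per <br>/<p>/<div>."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def grep_lines(text, pattern):
    """Return the stripped lines of text that match the regular expression."""
    return [line.strip() for line in text.splitlines() if re.search(pattern, line)]


def convert(pdf_path, html_path):
    """Run the external converter, then return the plain text of its output."""
    subprocess.run(build_pdftohtml_command(pdf_path, html_path), check=True)
    with open(html_path, encoding="utf-8", errors="replace") as handle:
        return html_to_text(handle.read())
```

Typical use would be something like grep_lines(convert("report.pdf", "report.html"), r"^Total"), where the file names and the pattern are made up for illustration. All the hard work stays in the external tool; the program only strips the markup and filters lines.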

I think this argument makes more sense against you - you've posted among the most on this thread, but have failed to come up with any helpful ideas.

EUWX said...

If you can do it in 2 minutes in Euphoria, you will essentially be claiming to be an unmatched genius.

jimcbrown said...

I don't claim to be an unmatched genius, but ... with the caveat that I'm using system() to call non-Euphorian 3rd party applications to do all the hard work, I claim that I can write an application that will do a to-and-fro conversion and extraction, and even allow a human to manually perform some editing, in less than 2 minutes with Euphoria. It's simple really - just call pdftohtml to convert the PDF file into an html file, open the whole html file with LibreOffice and let a human do some editing, then manually invoke LibreOffice's print-to-file functionality to convert it back to a PDF file again...

You still show ignorance. If you have to keep on switching between 3rd party applications, everybody knows they can do it. WRITE a full application in Euphoria, call your pdftohtml then call other editors and search and replace, and give it to any user to do comfortably without keeping on explicitly calling 3rd party application. THAT is what the guy wants, and you have not done it yet, given the hours you have spent talking half-truths.

You have contradictory requirements here. I'm supposed to "call your pdftohtml then call other editors" but at the same time avoid "explicitly calling 3rd party application[s]"?

Enough nonsense. I think it's time to get this thread back on track with helpful information.

16. Re: PDF reader

Posted by mattlewis (admin) Nov 25, 2013
2736 views

EUWX said...

The words you have to really think about in the questioner's request are "so I can strip out information needed". That is called extraction from a PDF file. Try it under Euphoria - you will not do it programmatically in one month.

It really depends on the PDF files you're dealing with. Some are extremely easy to deal with. Others can be extremely complex. I once wrote some code to convert some PDF files (it was years ago, and I don't know what happened to the code). They weren't too terrible, though I had to be flexible with the coordinates in order to get everything put together correctly. It was a bunch of maintenance information and schedules for submarines, so the documents had important fields all over the place.

Most of the work was figuring out where everything was in the documents. But if you have simpler documents to deal with, it might be a lot easier. The spec is open, so you should be able to read something from PDFs in a day or less. Again, without knowing the details of your source material, a month might not be a bad estimate, but it could also be a lot quicker than that.
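To illustrate what being flexible with the coordinates can look like, here is a small Python sketch. It is not the lost code mentioned above. It assumes the text fragments have already been pulled out of the PDF as (x, y, text) tuples, with y growing towards the top of the page as in PDF user space; the tuple layout, the function name group_into_lines and the tolerance value are all hypothetical.

```python
def group_into_lines(fragments, y_tolerance=2.0):
    """Group (x, y, text) fragments into lines of text.

    A fragment joins the current line when its y value is within
    y_tolerance of the y value of that line's first fragment.
    """
    lines = []  # each entry: [reference_y, [(x, text), ...]]
    # Assumption: larger y means higher on the page, so sort top to bottom.
    for x, y, text in sorted(fragments, key=lambda frag: -frag[1]):
        if lines and abs(lines[-1][0] - y) <= y_tolerance:
            lines[-1][1].append((x, text))
        else:
            lines.append([y, [(x, text)]])
    # Within each line, order the fragments left to right and join them.
    return [" ".join(text for _, text in sorted(items)) for _, items in lines]
```

With made-up fragments such as (72.0, 700.0, "Date:") and (200.0, 699.2, "2013-11-25"), the two pieces end up on the same line even though their y values differ slightly, which is the kind of slack that documents with fields all over the place tend to need.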

Matt

17. Re: PDF reader

Posted by gwalters Nov 25, 2013
2648 views

mattlewis said...

Most of the work was figuring out where everything was in the documents. But if you have simpler documents to deal with, it might be a lot easier. The spec is open, so you should be able to read something from PDFs in a day or less. Again, without knowing the details of your source material, a month might not be a bad estimate, but it could also be a lot quicker than that.

Matt

Well, there is a lot of discussion here, and writing a converter now seems over my head. So what I did was purchase a command-line pdf-to-text converter to execute from EU, one that allows me to name the output and put it where I want it.

Thanks, all, for the comments and help.
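For readers following the same route, a rough sketch of driving a command-line converter and choosing where the output goes might look like this. It is in Python so it can be run and tested directly; from Euphoria the same command line would go through system(). The purchased converter is not named in this thread, so the free pdftotext tool (with its -layout option) stands in for it as an assumption, and the helper names output_path_for, build_command and convert are hypothetical.

```python
import subprocess
from pathlib import Path


def output_path_for(pdf_path, out_dir):
    """Name the text file after the PDF and place it in out_dir."""
    return Path(out_dir) / (Path(pdf_path).stem + ".txt")


def build_command(pdf_path, txt_path, converter="pdftotext"):
    # Assumption: the converter takes the input PDF and then the output
    # text file, as pdftotext does. -layout asks pdftotext to keep the
    # original column layout as far as it can.
    return [converter, "-layout", str(pdf_path), str(txt_path)]


def convert(pdf_path, out_dir):
    """Run the converter and return the extracted text."""
    txt_path = output_path_for(pdf_path, out_dir)
    txt_path.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises an error if the converter reports a failure.
    subprocess.run(build_command(pdf_path, txt_path), check=True)
    return txt_path.read_text(encoding="utf-8", errors="replace")
```

Keeping the command building separate from the call makes it easy to swap in a different converter without touching the rest of the program.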

18. Re: PDF reader

Posted by petersalvatore Sep 29, 2015
2165 views

Hi, thanks for sharing. Did you ever work it out? Do I need another 3rd party toolkit? On the subject of PDF conversion, I have another question: have you ever tried to convert pdf to image files? I am testing PDF to PNG, PDF to BMP and PDF to JPG conversion programs these days. Do you have any experience with that? Any suggestions will be appreciated. Thanks in advance.

Best regards, Peter

Tags: PDF conversion; PDF to image conversion

19. Re: PDF reader

Posted by Spock Sep 29, 2015
2108 views

gwalters said...

Do we have a PDF reader somewhere? I would like to be able to read a PDF version of a simple text printout. It happens that I cannot save the printout as a simple txt file, only as a pdf file. I don't see anything in the archives that could do that. Suggestions would be appreciated.

I use pdftotext for exactly this sort of task. In my office it gets a heavy workout - up to 1000 pdfs processed each day. Once the text is extracted I run a regular expressions library (my own, of course) over the data to pull out what I need. I used to try to work out the coordinates of certain fields etc., but in the end I found a context-based approach much better.
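As a rough illustration of the context-based idea (find a value by the label next to it rather than by its position on the page), here is a short sketch. It uses Python's standard re module, not the regular expressions library mentioned above, and the field names, labels and patterns are invented for the example.

```python
import re

# Hypothetical fields: each value is located by the label in front of it.
FIELD_PATTERNS = {
    "invoice": re.compile(r"Invoice\s+No\.?\s*:?\s*(\S+)", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_fields(text, patterns=FIELD_PATTERNS):
    """Return {field name: first captured value, or None if not found}."""
    result = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result
```

Because the patterns key off the surrounding words, they keep working when a field moves around the page from one document to the next, as long as its label stays the same.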

EDIT: Whoa! Didn't see the date: 2013

Spock
