1. PDF reader
- Posted by gwalters Nov 23, 2013
- 2958 views
Do we have a PDF reader somewhere? I would like to be able to read a PDF version of a simple text printout. It happens that I cannot save the printout as a simple txt file, only as a pdf file. I don't see anything in the archives that could do that. Suggestions would be appreciated.
2. Re: PDF reader
- Posted by Selgor Nov 24, 2013
- 2914 views
Hello.
What exactly do you want?
Is it a pdf reader?
Is it a text to pdf converter?
Is it a pdf to text converter?
I am assuming you need a pdf reader.
These are free readers.
www.foxitsoftware.com/Secure_PDF_Reader/
geniuspdf.com/
adobe.com/reader/
and there are heaps more!
HTH.
Cheers
Selgor
3. Re: PDF reader
- Posted by gwalters Nov 24, 2013
- 2855 views
Sorry I was not clear. I want to read, using a euphoria program I write, a pdf file so I can strip out information needed. I can do that with a txt file but not a pdf file.
4. Re: PDF reader
- Posted by EUWX Nov 24, 2013
- 2839 views
Sorry I was not clear. I want to read, using a euphoria program I write, a pdf file so I can strip out information needed. I can do that with a txt file but not a pdf file.
My intention is not to discourage you but to make you aware of the difficulties.
Text to PDF conversion and vice versa is a major undertaking. Even with a team effort using Euphoria, you will need 3-5 people collaborating for over a year to come up with a two way conversion, extraction of text, extraction of text and graphics, insertion of the same, etc. And, of course, you will need third party libraries.
You can take a short cut by using the Euphoria system command to invoke a third party software, take the result into Euphoria, do a software extraction using Euphoria, and then reconvert using third party software. One to two months is what you will need working single-handed or with one collaborator.
5. Re: PDF reader
- Posted by jimcbrown (admin) Nov 24, 2013
- 2845 views
You can take a short cut by using the Euphoria system command to invoke a third party software, take the result into Euphoria, do a software extraction using Euphoria, and then reconvert using third party software. One to two months is what you will need working single-handed or with one collaborator.
Err - by using pdftohtml, it's a few minutes, not one or two months. Single-handedly.
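A minimal sketch of that shortcut, written in Python for illustration (a Euphoria program would do the equivalent through system()). It assumes poppler's pdftohtml is installed and on the PATH; the function names pdftohtml_command and pdf_to_html are made up for this example.

```python
import shutil
import subprocess


def pdftohtml_command(pdf_path):
    """Build the argument list for poppler's pdftohtml."""
    # -i skips images, -noframes asks for a single HTML document,
    # -stdout writes the HTML to standard output instead of a file.
    return ["pdftohtml", "-i", "-noframes", "-stdout", pdf_path]


def pdf_to_html(pdf_path):
    """Run pdftohtml and return the HTML as text.

    Raises FileNotFoundError if the tool is not installed and
    subprocess.CalledProcessError if the conversion fails.
    """
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml not found on PATH")
    result = subprocess.run(
        pdftohtml_command(pdf_path), capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")
```

The returned HTML can then be searched or stripped down like any other text, which is the part the calling program has to do itself.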
6. Re: PDF reader
- Posted by gwalters Nov 24, 2013
- 2826 views
You can take a short cut by using the Euphoria system command to invoke a third party software, take the result into Euphoria, do a software extraction using Euphoria, and then reconvert using third party software. One to two months is what you will need working single-handed or with one collaborator.
Err - by using pdftohtml, it's a few minutes, not one or two months. Single-handedly.
Seems like this approach is workable (something I'm capable of doing). I'll give it a try. Thanks.
7. Re: PDF reader
- Posted by EUWX Nov 24, 2013
- 2835 views
You can take a short cut by using the Euphoria system command to invoke a third party software, take the result into Euphoria, do a software extraction using Euphoria, and then reconvert using third party software. One to two months is what you will need working single-handed or with one collaborator.
Err - by using pdftohtml, it's a few minutes, not one or two months. Single-handedly.
Anybody can use a preexisting software to convert "a few minutes" "Single-handedly". I do it all the time.
For example, I use "Able2Extract Professional", and only 2 months ago I helped my local religious organization convert and extract a lot of names and addresses from PDF to Word/Excel. This had to be done in stages. Allowing for switching between applications, correcting mistakes in the PDF/Word conversion, etc., it took about 2 hours to convert a 30 page list into a usable Excel spreadsheet.
When I talked about "one or two months", I was talking about programmatically, in Euphoria, getting a PDF converted, then saving it as text, then, still within the program, loading it in a Euphoria editor to correct the mistakes, then programmatically further extracting and rearranging, then programmatically displaying the result, then programmatically calling another or the same 3rd party software to convert it to PDF, and then programmatically saving it. All this is to be done within a single Euphoria program that would also be interactive enough to allow one to input the name of the file to be assaulted.
If you are genius enough to anticipate and spot the mistakes that occur in the best of PDF to Word converters (and automatically correct them), and use extractors without using cut and paste, then you should be exalted to the Guinness Book of World Records.
8. Re: PDF reader
- Posted by jimcbrown (admin) Nov 24, 2013
- 2837 views
You can take a short cut by using the Euphoria system command to invoke a third party software, take the result into Euphoria, do a software extraction using Euphoria, and then reconvert using third party software. One to two months is what you will need working single-handed or with one collaborator.
Err - by using pdftohtml, it's a few minutes, not one or two months. Single-handedly.
Anybody can use a preexisting software to convert "a few minutes" "Single-handedly".
Agreed.
When I talked about "one or two months", I was talking about programmatically ... to correct the mistakes,
You did not mention this in your original quote, reproduced below. If this is what you meant, then you should say so.
You can take a short cut by using the Euphoria system command to invoke a third party software, take the result into Euphoria, do a software extraction using Euphoria, and then reconvert using third party software. One to two months is what you will need working single-handed or with one collaborator.
But I would agree - often times errors in the text of a PDF are hidden by the font being used, and can be a real pain to fix by hand after the text is extracted. If using OCR to pull text out of an embedded image, you more-or-less have to deal with the same issue. Dealing with this without human intervention is not an easy task. Probably not something gwalters needs to do either.
9. Re: PDF reader
- Posted by EUWX Nov 24, 2013
- 2801 views
jimcbrown: Your approach to the problem posed by this questioner and mine are different. You like to make quick replies to problems without fully understanding the questioner's needs and the implication of your answer.
I come with an experience of converting ALL THE TIME between PDF and Word and Text and extracting and correcting and editing. That is part of my daily life as people submit to me documents written in all shades of text editors.
I also know what PDF to text converters are in the market place, and off and on I have to recommend a free one to somebody because he is a contractor, and the next minute recommend a much better paid application to a large company for their staff.
So, for God's sake, do not sermonise to me. For anybody to write a full converter and extractor in the Euphoria language is a huge task, even using pre-existing C and C++ libraries. For this questioner and his need, a simpler method is to use 2-3 preexisting application software distributions which are in the public domain. They would often be somewhat deficient compared to the high quality paid software. He would therefore be forced to recognise those shortcomings in free software and write Euphoria software to minimise that bad effect. That would involve presenting mistakes for editing, and also interactively identifying changing pieces of text. He has to be aware that ALL application software that converts from PDF to text is prone to mistakes because you are doing an image to text conversion.
He needs a month or two to write an application program that will do a to and fro conversion and extraction and editing using Euphoria and the 3rd party open source application software and that is that. If you can do it in 2 minutes in Euphoria, you will be essentially claiming to be an unmatched genius.
10. Re: PDF reader
- Posted by jimcbrown (admin) Nov 24, 2013
- 2772 views
jimcbrown: Your approach to the problem posed by this questioner and mine are different. You like to make quick replies to problems without fully understanding the questioner's needs and the implication of your answer.
I think that accusation makes more sense when leveled against you.
Look at the original request:
Sorry I was not clear. I want to read, using a euphoria program I write, a pdf file so I can strip out information needed. I can do that with a txt file but not a pdf file.
He needs a month or two to write an application program that will do a to and fro conversion and extraction and editing using Euphoria and the 3rd party open source application software and that is that.
Again, that's not the original request.
If you can do it in 2 minutes in Euphoria, you will be essentially claiming to be an unmatched genius
I don't claim to be an unmatched genius, but ... with the caveat that I'm using system() to call non-Euphorian 3rd party applications to do all the hard work, I claim that I can write an application that will do a to and fro conversion and extraction, and even allow a human to manually perform some editing, in less than 2 minutes with Euphoria. It's simple really - just call pdftohtml to convert the PDF file into an html file, open the whole html file with LibreOffice and let a human do some editing, then manually invoke LibreOffice's print-to-file functionality to convert it back to a PDF file again...
11. Re: PDF reader
- Posted by EUWX Nov 24, 2013
- 2774 views
The words you have to really think about in the questioner's request are "so I can strip out information needed". That is called extraction from a PDF file. Try it under Euphoria - you will not do it programmatically in one month.
The next lot of words are also very clear and explicit - "I can do that with a txt file but not a pdf file".
That exactly is the case for at least one month of programming using ready made applications, for him to do it conveniently as often as he needs to.
Many minutes and hours have passed and you have not come up with anything so far.
12. Re: PDF reader
- Posted by EUWX Nov 24, 2013
- 2768 views
If you can do it in 2 minutes in Euphoria, you will be essentially claiming to be an unmatched genius
I don't claim to be an unmatched genius, but ... with the caveat that I'm using system() to call non-Euphorian 3rd party applications to do all the hard work, I claim that I can write an application that will do a to and fro conversion and extraction, and even allow a human to manually perform some editing, in less than 2 minutes with Euphoria. It's simple really - just call pdftohtml to convert the PDF file into an html file, open the whole html file with LibreOffice and let a human do some editing, then manually invoke LibreOffice's print-to-file functionality to convert it back to a PDF file again...
You still show ignorance. If you have to keep on switching between 3rd party applications, everybody knows they can do it. WRITE a full application in Euphoria, call your pdftohtml then call other editors and search and replace and give it to any user to do comfortably without keeping on explicitly calling 3rd party application. THAT is what the guy wants, and you have not done it yet, given the hours you have spent talking half truths.
13. Re: PDF reader
- Posted by useless_ Nov 24, 2013
- 2782 views
If you can do it in 2 minutes in Euphoria, you will be essentially claiming to be an unmatched genius
I don't claim to be an unmatched genius, but ... with the caveat that I'm using system() to call non-Euphorian 3rd party applications to do all the hard work, I claim that I can write an application that will do a to and fro conversion and extraction, and even allow a human to manually perform some editing, in less than 2 minutes with Euphoria. It's simple really - just call pdftohtml to convert the PDF file into an html file, open the whole html file with LibreOffice and let a human do some editing, then manually invoke LibreOffice's print-to-file functionality to convert it back to a PDF file again...
You still show ignorance. If you have to keep on switching between 3rd party applications, everybody knows they can do it. WRITE a full application in Euphoria, call your pdftohtml then call other editors and search and replace and give it to any user to do comfortably without keeping on explicitly calling 3rd party application. THAT is what the guy wants, and you have not done it yet, given the hours you have spent talking half truths.
I agree with EUWX.
useless
14. Re: PDF reader
- Posted by ne1uno Nov 24, 2013
- 2764 views
Maybe you can print to a text printer? In Windows I think it's called generic text. It should work if there are no images to render.
You didn't mention how you are stuck generating pdf as the only option. There are online converters if that is a one-time thing, or you can call some of them by changing POST requests to GET requests in the URL.
15. Re: PDF reader
- Posted by jimcbrown (admin) Nov 25, 2013
- 2740 views
The words you have to really think about in the questioner's request are "so I can strip out information needed". That is called extraction from a PDF file. Try it under Euphoria - you will not do it programmatically in one month.
This is why a 3rd party tool is used.
The next lot of words are also very clear and explicit - " I can do that with a txt file but not a pdf file"
This is why a 3rd party tool is used to convert it to a text-based format first.
That exactly is the case for at least one month of programming using ready made applications, for him to do it conveniently as often as he needs to.
Many minutes and hours have passed and you have not come up with anything so far.
Actually, I wrote a utility with Euphoria several years ago that did the same thing - it converted a pdf file to an html file and then converted that to a plain text file and then grepped out a set of lines that I was interested in.
I think this argument makes more sense against you - you've posted among the most on this thread, but have failed to come up with any helpful ideas.
If you can do it in 2 minutes in Euphoria, you will be essentially claiming to be an unmatched genius
I don't claim to be an unmatched genius, but ... with the caveat that I'm using system() to call non-Euphorian 3rd party applications to do all the hard work, I claim that I can write an application that will do a to and fro conversion and extraction, and even allow a human to manually perform some editing, in less than 2 minutes with Euphoria. It's simple really - just call pdftohtml to convert the PDF file into an html file, open the whole html file with LibreOffice and let a human do some editing, then manually invoke LibreOffice's print-to-file functionality to convert it back to a PDF file again...
You still show ignorance. If you have to keep on switching between 3rd party applications, everybody knows they can do it. WRITE a full application in Euphoria, call your pdftohtml then call other editors and search and replace and give it to any user to do comfortably without keeping on explicitly calling 3rd party application. THAT is what the guy wants, and you have not done it yet, given the hours you have spent talking half truths.
You have contradicting requirements here. I'm supposed to "call your pdftohtml then call other editors" but at the same time avoid "explicitly calling 3rd party application[s]"?
Enough nonsense. I think it's time to get this thread back on track with helpful information.
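The pipeline described earlier in this post (PDF to HTML, HTML to plain text, then grep out the lines of interest) might look roughly like this once the HTML is in hand. This is a sketch in Python using only its standard library, not the original Euphoria utility; html_to_text and grep_lines are hypothetical names, and the sample lines in the usage note are invented.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text content and turns block-level tags into line breaks."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("br", "p", "div"):
            self.chunks.append("\n")

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Strip tags from HTML, keeping one line per <br>, <p> or <div>."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    # Converters often emit non-breaking spaces; treat them as plain spaces.
    return "".join(parser.chunks).replace("\xa0", " ")


def grep_lines(text, pattern):
    """Return the stripped lines of text that match the regular expression."""
    rx = re.compile(pattern)
    return [line.strip() for line in text.splitlines() if rx.search(line)]
```

For example, grep_lines(html_to_text(page), r"^(Invoice|Total)") keeps only the lines that start with those two words. Script and style contents are not filtered out here, which is fine for converter output but not for arbitrary web pages.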
16. Re: PDF reader
- Posted by mattlewis (admin) Nov 25, 2013
- 2736 views
The words you have to really think about in the questioner's request are "so I can strip out information needed". That is called extraction from a PDF file. Try it under Euphoria - you will not do it programmatically in one month.
It really depends on the PDF files you're dealing with. Some are extremely easy to deal with. Others can be extremely complex. I once wrote some code to convert some PDF files (it was years ago, and I don't know what happened to the code). They weren't too terrible, though I had to be flexible with the coordinates in order to get everything put together correctly. It was a bunch of maintenance information and schedules for submarines, so the documents had important fields all over the place.
Most of the work was figuring out where everything was in the documents. But if you have simpler documents to deal with, it might be a lot easier. The spec is open, so you should be able to read something from PDFs in a day or less. Again, without knowing the details of your source material, a month might not be a bad estimate, but it could also be a lot quicker than that.
Matt
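To give a feel for what reading text and coordinates straight from the spec involves, here is a toy sketch in Python, not the code described above. It assumes you already have one uncompressed page content stream as bytes (real files usually compress these with Flate and wrap them in objects, which is not handled here). It only understands the BT, Td/TD, Tj and TJ operators, literal strings without nested parentheses, backslash escapes treated as "take the next character literally", and Latin-1 text. The name text_with_positions is made up.

```python
import re

TOKEN = re.compile(rb"""
    \((?:\\.|[^\\()])*\)        # literal string; escapes allowed, nested parens not handled
  | /[^\s/\[\]()<>]+            # name, e.g. /F1
  | [-+]?(?:\d+\.?\d*|\.\d+)    # number
  | [A-Za-z'"*]+                # operator, e.g. BT, Td, Tj, TJ
""", re.VERBOSE | re.DOTALL)


def decode_string(token):
    """Drop the surrounding parentheses and the backslash of each escape."""
    body = re.sub(rb"\\(.)", rb"\1", token[1:-1], flags=re.DOTALL)
    return body.decode("latin-1")


def text_with_positions(content):
    """Return (x, y, text) for each Tj/TJ, using Td offsets as the position."""
    x = y = 0.0
    operands, found = [], []
    for match in TOKEN.finditer(content):
        tok = match.group(0)
        is_operator = tok[:1].isalpha() or tok[:1] in b"'\"*"
        if not is_operator:
            operands.append(tok)
            continue
        if tok == b"BT":                       # begin text: back to the origin
            x = y = 0.0
        elif tok in (b"Td", b"TD") and len(operands) >= 2:
            x += float(operands[-2].decode("ascii"))   # offset from the start
            y += float(operands[-1].decode("ascii"))   # of the current line
        elif tok in (b"Tj", b"TJ"):            # show string / array of strings
            strings = [decode_string(o) for o in operands if o.startswith(b"(")]
            if strings:
                found.append((x, y, "".join(strings)))
        operands.clear()
    return found
```

With positions attached to each string, being "flexible with the coordinates" amounts to grouping strings whose y values are close and sorting each group by x.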
17. Re: PDF reader
- Posted by gwalters Nov 25, 2013
- 2648 views
Most of the work was figuring out where everything was in the documents. But if you have simpler documents to deal with, it might be a lot easier. The spec is open, so you should be able to read something from PDFs in a day or less. Again, without knowing the details of your source material, a month might not be a bad estimate, but it could also be a lot quicker than that.
Matt
Well, there is a lot of discussion here, and writing a converter now seems over my head. So what I did was purchase a command line pdf to text converter that I can execute from EU and that allows me to name the output and put it where I want it.
Thanks all for the comments and help.
18. Re: PDF reader
- Posted by petersalvatore Sep 29, 2015
- 2165 views
Hi, thanks for sharing. Did you ever work it out? Do I need another 3rd party toolkit? On the subject of PDF conversion, I have another question: have you ever tried converting PDF to image files? I am currently testing PDF to PNG, PDF to BMP, and PDF to JPG conversion programs. Do you have any experience with this? Any suggestions will be appreciated. Thanks in advance.
Best regards, Peter
Tags: PDF conversion; PDF to image conversion
19. Re: PDF reader
- Posted by Spock Sep 29, 2015
- 2108 views
Do we have somewhere a PDF reader?. I would like to be able to read a PDF version of a simple text printout. It happens that I cannot save the printout as a simple txt file but only a pdf file. I don't seem to see anything in the archives that could do that. Suggestions would be appreciated.
I use pdftotext for exactly this sort of task. In my office it gets a heavy workout - up to 1000 pdfs processed each day. Once the text is extracted, I run a regular expressions library (my own, of course) over the data to pull out what I need. I used to try to work out the coordinates of certain fields etc., but in the end I found a context-based approach much better.
EDIT: Whoa! Didn't see the date: 2013
Spock
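As an illustration of the context-based idea (find a value by the label next to it rather than by where it sits on the page), here is a small Python sketch. Python's re module stands in for the regex library mentioned above, and field_after_label plus the sample labels in the usage note are invented for the example.

```python
import re


def field_after_label(text, label):
    """Return what follows 'label:' on the same line, or None if absent."""
    pattern = rf"^[ \t]*{re.escape(label)}[ \t]*:[ \t]*(.+?)[ \t]*$"
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1) if match else None
```

Given extracted text containing a line such as "  Invoice No:   A-1042", field_after_label(text, "Invoice No") returns "A-1042" however far the line has shifted across or down the page.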