Re: PDF reader
- Posted by jimcbrown (admin) Nov 24, 2013
You can take a short cut by using the Euphoria system command to invoke a third party software, take the result into Euphoria, do a software extraction using Euphoria, and then reconvert using third party software. One to two months is what you will need working single-handed or with one collaborator.
Err - by using pdftohtml, it's a few minutes, not one or two months. Single-handedly.
Anybody can use preexisting software to convert in "a few minutes", "single-handedly".
Agreed.
When I talked about "one or two months", I was talking about programmatically ... to correct the mistakes,
You did not mention this in your original quote, reproduced below. If this is what you meant, then you should say so.
You can take a short cut by using the Euphoria system command to invoke a third party software, take the result into Euphoria, do a software extraction using Euphoria, and then reconvert using third party software. One to two months is what you will need working single-handed or with one collaborator.
But I would agree - errors in the text of a PDF are often hidden by the font being used, and can be a real pain to fix by hand after the text is extracted. If you use OCR to pull text out of an embedded image, you more or less have to deal with the same issue. Dealing with this without human intervention is not an easy task. It's probably not something gwalters needs to do either.
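For anyone wondering what the "few minutes" route with pdftohtml might look like, here is a minimal sketch in Euphoria 4. It is not a tested solution. The file names input.pdf and output are hypothetical, it assumes Poppler's pdftohtml is installed and on the PATH, and it assumes that with -noframes the tool writes a single file named output.html. It only gets the converted HTML into a sequence; the extraction and error correction discussed above are not attempted.

```euphoria
include std/io.e

-- Hypothetical file names, for illustration only.
constant PDF_FILE = "input.pdf"
constant OUT_BASE = "output"

-- -noframes asks pdftohtml for a single HTML file; -i skips image extraction.
sequence cmd = "pdftohtml -noframes -i " & PDF_FILE & " " & OUT_BASE

-- Mode 2: neither wait for a keypress nor restore the graphics mode.
integer rc = system_exec(cmd, 2)
if rc != 0 then
    puts(2, "pdftohtml failed or is not installed\n")
    abort(1)
end if

-- read_file returns -1 (an atom) on failure, otherwise the file contents.
-- Assumption: the converted file is named OUT_BASE & ".html".
object html = read_file(OUT_BASE & ".html", TEXT_MODE)
if atom(html) then
    puts(2, "could not read the converted file\n")
    abort(1)
end if

printf(1, "read %d characters of HTML\n", length(html))
```

system_exec is used here rather than system because it returns the exit code, so a failed conversion can be caught before trying to read the output.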