Re: PDF reader

EUWX said...
jimcbrown said...
EUWX said...

You can take a short cut by using the Euphoria system command to invoke a third party software, take the result into Euphoria, do a software extraction using Euphoria, and then reconvert using third party software. One to two months is what you will need working single-handed or with one collaborator.

Err - by using pdftohtml, it's a few minutes, not one or two months. Single-handedly.

Anybody can use preexisting software to do the conversion in "a few minutes", "single-handedly".

Agreed.

EUWX said...

When I talked about "one or two months", I was talking about programmatically ... to correct the mistakes,

You did not mention this in your original quote, reproduced below. If this is what you meant, then you should say so.

EUWX said...

You can take a short cut by using the Euphoria system command to invoke a third party software, take the result into Euphoria, do a software extraction using Euphoria, and then reconvert using third party software. One to two months is what you will need working single-handed or with one collaborator.

But I would agree: errors in the text of a PDF are often hidden by the font being used, and can be a real pain to fix by hand after the text is extracted. If you use OCR to pull text out of an embedded image, you more or less have to deal with the same issue. Dealing with this without human intervention is not an easy task. It is probably not something gwalters needs to do, either.
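
For what it's worth, here is a rough sketch of the "invoke third-party software, then take the result into your own program" shortcut discussed above. It is written in Python rather than Euphoria purely for illustration; in Euphoria the external call would go through the system command instead. The function names (`pdftohtml_command`, `html_to_text`, `extract_text`) are made up for this example, and the pdftohtml flags used (`-i`, `-noframes`, `-stdout`) are assumed to behave as in the poppler build of the tool, so check them against your installed version.

```python
import shutil
import subprocess
from html.parser import HTMLParser


def pdftohtml_command(pdf_path):
    """Build the argument list for the external converter (hypothetical helper)."""
    # Assumed flags: -i skips images, -noframes emits one HTML document,
    # -stdout writes the HTML to standard output instead of a file.
    return ["pdftohtml", "-i", "-noframes", "-stdout", pdf_path]


class _TextCollector(HTMLParser):
    """Collects visible text, ignoring <style> and <script> bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def html_to_text(html):
    """Reduce converter output to plain text, one text fragment per line."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "\n".join(collector.parts)


def extract_text(pdf_path):
    """Run the converter and return the extracted text."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml was not found on PATH")
    result = subprocess.run(
        pdftohtml_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,  # raise if the converter reports a failure
    )
    return html_to_text(result.stdout)
```

Calling `extract_text("input.pdf")` (with a placeholder file name) covers only the "few minutes" part of the job. It does nothing about wrong characters hidden by the font or introduced by OCR, which is the part that remains hard to automate.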
