Re: RegEx with phix - howto

_tom said...

First rule of regex: do not use regex.

You can take my regular expressions when you pry them from my cold dead hands.

begin said...

The text to match is: 10.1016.12.31/nature.S0735%20-1097(98)2000/12/31/34:7-7

It always starts with 10. and ends with a space, CR, or TAB. There is no length limit; it is just a DOI number, and 10.1016.12.31/nature.S0735%20-1097(98)2000/12/31/34:7-7 is a good example of one. Usually they are embedded in text, at the top or the bottom of a page. The %20 stands for a hex number: a % followed by at most 2 characters.

Are you trying to extract those hexadecimal values, or do you just want to validate that the string is a DOI number?

I found this page via a quick Google search: https://www.crossref.org/blog/dois-and-matching-regular-expressions/

The author implies that there may be several formats of DOI, each with their own unique contents.

I guess if you're looking for a quick-and-dirty DOI validation expression, use something like this: ^10\.\d{4,9}[^\s]+$

Here's how I'd do it in Euphoria. (Sorry, I don't really use Phix.)

include std/regex.e 
include std/text.e 
 
-- the dot after 10 is escaped so it only matches a literal "." 
constant re_doi = regex:new( `^10\.\d{4,9}[^\s]+$` ) 
 
public function valid_doi( sequence doi_number ) 
    return regex:is_match( re_doi, text:trim(doi_number) ) 
end function 
 
? valid_doi( "10.1016.12.31/nature.S0735%20-1097(98)2000/12/31/34:7-7" ) 
-- prints 1 
 
? valid_doi( "help me, obi-wan kenobi" ) 
-- prints 0 

Edit: I should note that it may not be helpful to consider a string valid if it ends in whitespace, so you might want to trim() it first (as valid_doi() does above).
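Since the DOIs are described as embedded in page text and terminated by a space, CR, or TAB, extraction may be closer to what is needed than validation. Below is a minimal sketch of that idea, assuming the same quick-and-dirty pattern with the ^ and $ anchors removed. The names re_doi_in_text and find_dois are made up for illustration, and the pattern will also pick up trailing punctuation (such as a sentence-ending period) or other DOI-like tokens, so treat the results as candidates rather than confirmed DOIs.

```euphoria
include std/regex.e

-- Unanchored variant of the validation pattern (an assumption, not an
-- official DOI grammar): matches a DOI-like token anywhere in the text.
constant re_doi_in_text = regex:new( `10\.\d{4,9}[^\s]+` )

-- Hypothetical helper: returns a sequence of DOI-like strings found in
-- page_text, or {} when there are none.
public function find_dois( sequence page_text )
    object found = regex:all_matches( re_doi_in_text, page_text )
    if atom( found ) then
        -- all_matches() reports "no match" with an atom (ERROR_NOMATCH)
        return {}
    end if

    sequence hits = {}
    for i = 1 to length( found ) do
        -- element 1 of each result is the text of the whole match
        hits = append( hits, found[i][1] )
    end for
    return hits
end function

sequence dois = find_dois( "See 10.1016.12.31/nature.S0735%20-1097(98)2000/12/31/34:7-7 for details." )
for i = 1 to length( dois ) do
    puts( 1, dois[i] & "\n" )
end for
```

Because [^\s]+ stops at the first whitespace character, the match ends at the space, CR, or TAB that follows the DOI, which lines up with the description in the question.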

-Greg
