Re: unicode length


And here is a UTF-8-compliant head() function:

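include std/sequence.e  -- head()

-- Return the first n UTF-8 characters of s (one byte per element).
function uhead(sequence s, integer n)
  integer i, lg, ul

  i = 1
  ul = 0
  lg = length(s)
  if n <= 0 then return {} end if
  if lg < 2 then return head(s, n) end if
  while i <= lg do
    if and_bits(s[i], #80) = #00 then      -- 0xxxxxxx: 1 byte (ASCII)
      i += 1
    elsif and_bits(s[i], #E0) = #C0 then   -- 110xxxxx: 2-byte sequence
      i += 2
    elsif and_bits(s[i], #F0) = #E0 then   -- 1110xxxx: 3-byte sequence
      i += 3
    elsif and_bits(s[i], #F8) = #F0 then   -- 11110xxx: 4-byte sequence
      i += 4
    else                                   -- stray continuation or invalid byte
      i += 1
    end if
    if i > lg + 1 then i = lg + 1 end if   -- truncated final sequence: stay in bounds
    ul += 1
    if ul = n then return s[1..i-1] end if
  end while
  return s
end function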

It also works with plain ASCII strings.
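
As a quick check, here is a minimal usage sketch. It assumes uhead() from above is already in scope and that head() comes from std/sequence.e. The test string is written out as raw bytes (an assumption made for the example, so the result does not depend on the encoding of the source file): "héllo" is 6 bytes but 5 characters in UTF-8.

```euphoria
-- Usage sketch only: assumes uhead() is defined earlier in the same file.
include std/sequence.e  -- head()

sequence s
s = "h" & {#C3, #A9} & "llo"   -- UTF-8 bytes of "héllo"

? length(s)             -- 6: byte count, not character count
? length(uhead(s, 2))   -- 3: 'h' plus both bytes of 'é'
? length(head(s, 2))    -- 2: plain head() cuts 'é' in half
```

The last line shows why a byte-based head() is not enough for UTF-8 text: it can return a slice that ends in the middle of a multi-byte character.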

Jean-Marc
