unicode length

Here is a little function that reports the length of a UTF-8 string in characters (code points) rather than in bytes.

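public function ulength(sequence s)
  -- Count characters (code points) in a UTF-8 byte string by reading
  -- each lead byte and skipping over its continuation bytes.
  integer res
  integer i, lg

  i = 1
  res = 0
  lg = length(s)
  if lg < 2 then return lg end if   -- 0 or 1 byte: count equals byte count
  while i <= lg do
    if and_bits(s[i], #80) = #00 then
      i += 1      -- 0xxxxxxx: ASCII, 1 byte
    elsif and_bits(s[i], #E0) = #C0 then
      i += 2      -- 110xxxxx: lead byte of a 2-byte sequence
    elsif and_bits(s[i], #F0) = #E0 then
      i += 3      -- 1110xxxx: lead byte of a 3-byte sequence
    elsif and_bits(s[i], #F8) = #F0 then
      i += 4      -- 11110xxx: lead byte of a 4-byte sequence
    else
      i += 1      -- stray continuation byte or invalid lead byte
    end if
    res += 1
  end while
  return res
end function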

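It also works with plain ASCII strings, where it returns the same value as length(), so it could replace length() for byte strings. It is not a drop-in replacement for arbitrary sequences, though: if an element of s is itself a sequence, the if conditions fail at run time.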
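One limitation: ulength() trusts the lead byte and never checks that the following bytes really are continuation bytes (10xxxxxx), so malformed or truncated input can swallow valid characters. A stricter variant could look like this sketch; the name ulength_strict and the policy of counting each malformed byte as one character are my own assumptions, not part of the function above.

```euphoria
-- Hypothetical stricter variant (name and error policy are assumptions):
-- a multi-byte sequence counts as one character only if it fits in the
-- string and every tail byte matches 10xxxxxx; otherwise the lead byte
-- is counted on its own and scanning resumes at the next byte.
function ulength_strict(sequence s)
  integer res, i, n, lg
  res = 0
  i = 1
  lg = length(s)
  while i <= lg do
    -- number of bytes announced by the lead byte s[i]
    if and_bits(s[i], #80) = #00 then
      n = 1
    elsif and_bits(s[i], #E0) = #C0 then
      n = 2
    elsif and_bits(s[i], #F0) = #E0 then
      n = 3
    elsif and_bits(s[i], #F8) = #F0 then
      n = 4
    else
      n = 1   -- stray continuation byte or invalid lead byte
    end if
    if i + n - 1 > lg then
      n = 1   -- truncated sequence at the end of the string
    else
      for j = i + 1 to i + n - 1 do
        if and_bits(s[j], #C0) != #80 then
          n = 1   -- tail byte is not a continuation byte
          exit
        end if
      end for
    end if
    i += n
    res += 1
  end while
  return res
end function
```

For example, ulength_strict({#C3, #41}) returns 2, whereas ulength() returns 1 for the same bytes because it skips #41 as if it were a continuation byte.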

Jean-Marc