UTF-8 Chinese string truncation function

In PHP, using substr() to truncate a string that contains Chinese can produce garbled output, because Chinese and Western characters occupy different numbers of bytes and the length parameter of substr() counts bytes, not characters. In GB2312, a Chinese character takes 2 bytes and an English character takes 1 byte. In UTF-8, a Chinese character normally takes 3 bytes (some other non-ASCII characters take 2), while English letters and half-width punctuation take 1 byte.
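
As a small illustration of the byte-counting problem, the sketch below cuts a two-character Chinese string at a byte offset that falls inside a character. The string is written with explicit byte escapes so that it is UTF-8 no matter how the file is saved. The helper name is_valid_utf8 is made up for this example; it relies on preg_match() with the /u modifier returning false for malformed UTF-8.

```php
<?php
// Hypothetical helper for this illustration: preg_match() with the /u
// modifier fails (returns false) when the subject is not valid UTF-8.
function is_valid_utf8($s) {
    return preg_match('//u', $s) === 1;
}

$str = "\xE4\xB8\xAD\xE6\x96\x87"; // the two characters "中文" in UTF-8

$byteCount = strlen($str);    // strlen() counts bytes, so this is 6, not 2
$broken = substr($str, 0, 4); // ends one byte into the second character
$clean  = substr($str, 0, 3); // happens to end on a character boundary

var_dump($byteCount, is_valid_utf8($broken), is_valid_utf8($clean));
```

Only the cut that lands on a character boundary leaves a valid string; the other one is what shows up as garbled text in a browser.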

To solve this problem I looked through a lot of material and eventually found this passage:

A UTF-8 encoded character may take 1 to 3 bytes; the exact number can be determined from the first byte. (In theory a character may be longer, but this assumes no more than 3 bytes.)
If the first byte is 224 or greater, it and the 2 bytes after it together form one 3-byte UTF-8 character.
If the first byte is 192 or greater but less than 224, it and the 1 byte after it form one 2-byte UTF-8 character.
Otherwise, the first byte is itself a single-byte English character (including digits and a small number of punctuation marks).
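
The first-byte rule can be sketched as a small function. This is only an illustration: the name utf8_char_bytes is hypothetical, and it keeps the same assumption as the passage, namely that no character is longer than 3 bytes.

```php
<?php
// Hypothetical helper: given the ordinal value of the FIRST byte of a
// UTF-8 character, return how many bytes that character occupies.
// Assumption carried over from the text: nothing is longer than 3 bytes.
// The result is only meaningful for a first byte, not a continuation byte.
function utf8_char_bytes($first) {
    if ($first >= 224) {
        return 3; // 1110xxxx: lead byte plus two continuation bytes
    }
    if ($first >= 192) {
        return 2; // 110xxxxx: lead byte plus one continuation byte
    }
    return 1;     // 0xxxxxxx: a single-byte (ASCII) character
}

// ord() looks only at the first byte of the string it is given.
$lenAscii   = utf8_char_bytes(ord('A'));            // ASCII letter
$lenTwo     = utf8_char_bytes(ord("\xC3\xA9"));     // "é" is C3 A9
$lenChinese = utf8_char_bytes(ord("\xE4\xB8\xAD")); // "中" is E4 B8 AD

var_dump($lenAscii, $lenTwo, $lenChinese);
```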

With the principle understood, the solution is straightforward. I wrote the following function to truncate UTF-8 strings:

PHP Code

    // Truncate a string to a display width; UTF-8 input only.
    // $len is a width in which one Chinese character counts as two English ones.
    // Assumes no character is longer than 3 bytes.
    function cut_str($str, $len) {
        $size = strlen($str);
        if ($size <= $len) return $str;
        $n = 0;        // current byte offset into $str
        $tempstr = '';
        for ($i = 0; $i < $len && $n < $size; $i++) {
            $byte = ord($str[$n]);
            if ($byte >= 224) {
                // first byte of a 3-byte character
                $tempstr .= substr($str, $n, 3);
                $n += 3;
                $i++; // count one Chinese character as two English characters
            } elseif ($byte >= 192) {
                // first byte of a 2-byte character
                $tempstr .= substr($str, $n, 2);
                $n += 2;
                $i++; // count one Chinese character as two English characters
            } else {
                // single-byte character
                $tempstr .= $str[$n];
                $n++;
            }
        }
        // append the ellipsis only if something was actually cut off
        return $n < $size ? $tempstr . '...' : $tempstr;
    }

PHP also has built-in functions for truncating strings in different encodings, such as mb_substr(), but the corresponding extension (mbstring) must be enabled for them to be available.
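
For comparison, here is a minimal sketch of character-based truncation with mb_substr(). The wrapper name cut_chars is hypothetical. The fallback branch for installations without mbstring is my own assumption, not something from the text above: it uses a PCRE pattern in UTF-8 mode to take the first few characters.

```php
<?php
// Hypothetical wrapper: return the first $count characters (not bytes)
// of a UTF-8 string.
function cut_chars($str, $count) {
    $count = max(0, (int) $count);
    if (function_exists('mb_substr')) {
        // mbstring is available: start and length are counted in characters
        return mb_substr($str, 0, $count, 'UTF-8');
    }
    // Assumed fallback without mbstring: with the /u modifier, "." matches
    // one whole UTF-8 character; /s lets it match newlines as well.
    if (preg_match('/^.{0,' . $count . '}/us', $str, $m)) {
        return $m[0];
    }
    return '';
}

$str = "\xE4\xB8\xAD\xE6\x96\x87abc"; // "中文abc" in UTF-8
$firstThree = cut_chars($str, 3);     // two Chinese characters plus "a"

var_dump($firstThree === "\xE4\xB8\xAD\xE6\x96\x87a");
```

Unlike cut_str, this counts every character as 1 regardless of its display width. If a width-based cut with a trailing marker is needed, mbstring's mb_strimwidth() may be worth a look.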

August 10th, 2010 in Technology
