UTF-8 Chinese string truncation function
In PHP, truncating a string that contains Chinese with substr() can produce garbled text, because Chinese and Western characters occupy different numbers of bytes and the length parameter of substr() is counted in bytes. In GB2312 encoding, a Chinese character takes 2 bytes and an English character takes 1 byte. In UTF-8 encoding, a Chinese character usually takes 3 bytes (some other non-ASCII characters take 2), while an English letter or half-width punctuation mark takes 1 byte.
To solve this problem I looked through a lot of material and eventually found the following passage:
A UTF-8 encoded character may take from 1 to 3 bytes, and the exact number can be determined from the first byte. (In theory a character may be longer, but this assumes no more than 3 bytes.)
If the first byte is greater than or equal to 224, it and the two bytes after it form one 3-byte UTF-8 character.
If the first byte is greater than or equal to 192 and less than 224, it and the one byte after it form one 2-byte UTF-8 character.
Otherwise, the first byte is itself a single character (English letters, digits, and a small number of punctuation marks).
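The first-byte rule can be sketched as a small helper. The name utf8_char_bytes is hypothetical, not a built-in PHP function. The boundaries used here are inclusive (192 and 224 are themselves valid lead-byte values), and the 4-byte branch goes beyond the 3-byte assumption of the passage; it is included only so the sketch does not misjudge characters such as emoji.

```php
<?php
// Hypothetical helper: how many bytes the UTF-8 character starting
// with the first byte of $char occupies, judged from that byte alone.
function utf8_char_bytes(string $char): int
{
    $b = ord($char);      // ord() looks only at the first byte
    if ($b >= 240) {
        return 4;         // 4-byte lead byte (outside the passage's assumption)
    }
    if ($b >= 224) {
        return 3;         // 3-byte lead byte, e.g. common Chinese characters
    }
    if ($b >= 192) {
        return 2;         // 2-byte lead byte, e.g. accented Latin letters
    }
    return 1;             // single-byte (ASCII) character
}
```

For example, utf8_char_bytes("a") gives 1 and utf8_char_bytes("中") gives 3. The helper does not validate its input; a continuation byte passed in by mistake is simply reported as 1.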
With the principle understood, the problem becomes much easier to solve, so I wrote the following function for truncating UTF-8 strings:
PHP Code
// Truncate a string by display width; UTF-8 input only.
// A multi-byte character counts as 2 units of width, a single-byte character as 1.
// Assumes characters of at most 3 bytes, as described above.
function cut_str($str, $len) {
    $total = strlen($str);
    if ($total <= $len) return $str;
    $n = 0;        // current byte offset into $str
    $tempstr = '';
    for ($i = 0; $i < $len && $n < $total; $i++) {
        $byte = ord(substr($str, $n, 1));
        if ($byte >= 224) {
            // first byte of a 3-byte character
            $tempstr .= substr($str, $n, 3);
            $n += 3;
            $i++; // one Chinese character counts as the length of two English characters
        } elseif ($byte >= 192) {
            // first byte of a 2-byte character
            $tempstr .= substr($str, $n, 2);
            $n += 2;
            $i++; // one Chinese character counts as the length of two English characters
        } else {
            $tempstr .= substr($str, $n, 1);
            $n++;
        }
    }
    // Append the ellipsis only when something was actually cut off.
    return $n < $total ? $tempstr . '...' : $tempstr;
}
PHP also has built-in functions for truncating strings in different encodings, such as mb_substr(), but they require the corresponding extension (mbstring) to be enabled in php.ini.