utf8.next()

Type Function

Library utf8.*

Return value Numbers or iterator

Revision Release 2024.3703

Keywords utf8, UTF-8, Unicode, string, next

Overview

Examines or iterates through a UTF-8 string, depending on usage:

If only charpos is specified, returns the next byte offset in the string.
If charpos and offset are specified, a new charpos is calculated by adding the UTF-8 character offset to, or subtracting it from, the current charpos.
If the function is passed without being called (the parentheses are omitted), it can be used directly as an iterator in a generic for loop:

for charpos, codepoint in utf8.next, "UTF8-string" do
    print( charpos, codepoint )
end

In all cases, this function returns two values: a new character position (in bytes) and the code point (number) at that position.
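
To make the iterator form more concrete, the following is a minimal pure-Lua sketch of the same stateless-iterator pattern. It is not the plugin's implementation: nextChar and isContinuation are hypothetical helpers written for illustration, and the sketch assumes well-formed UTF-8 input and a source file saved as UTF-8.

```lua
-- Hypothetical helpers (not part of plugin.utf8), assuming well-formed UTF-8.

-- Continuation bytes have the bit pattern 10xxxxxx (0x80 to 0xBF)
local function isContinuation( byte )
    return byte >= 0x80 and byte <= 0xBF
end

-- Returns the byte position and code point of the character after "pos",
-- or nil at the end of the string. A generic "for" passes nil on the first call.
local function nextChar( s, pos )
    pos = ( pos or 0 ) + 1

    -- Skip the remaining bytes of the character we started in
    while pos <= #s and isContinuation( s:byte( pos ) ) do
        pos = pos + 1
    end
    if pos > #s then return nil end

    -- The lead byte determines the sequence length and the high bits
    local lead = s:byte( pos )
    local length, code
    if lead < 0x80 then
        length, code = 1, lead
    elseif lead < 0xE0 then
        length, code = 2, lead - 0xC0
    elseif lead < 0xF0 then
        length, code = 3, lead - 0xE0
    else
        length, code = 4, lead - 0xF0
    end

    -- Each continuation byte contributes six more bits
    for i = pos + 1, pos + length - 1 do
        local b = s:byte( i )
        if not b then break end
        code = code * 64 + ( b - 0x80 )
    end

    return pos, code
end

for charpos, codepoint in nextChar, "♡ A" do
    print( charpos, codepoint )
end

--> 1   9825
--> 4   32
--> 5   65
```

Because the helper takes the string and the previous position and returns the next position first, the generic for loop can drive it without any closure, which is the same shape as passing utf8.next without parentheses.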

Syntax

utf8.next( s [, charpos [, offset]] )

s (required)

String. The UTF-8 string to examine or iterate through.

charpos (optional)

Number. The character position (in bytes) to start at.

offset (optional)

Number. The character offset, counted in UTF-8 characters rather than bytes, to add to or subtract from charpos.
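
The difference between a byte position and a character offset can be sketched in plain Lua. The advance function below is a hypothetical helper, not part of plugin.utf8, and it does not claim to reproduce the plugin's exact return values or edge-case behavior. It only shows how moving by a number of characters translates into a variable number of bytes, assuming well-formed UTF-8 and a charpos that points at the first byte of a character.

```lua
-- Hypothetical helper (not part of plugin.utf8), assuming well-formed UTF-8.
-- Moves "offset" characters forward (positive) or backward (negative) from
-- byte position "pos" and returns the new byte position, or nil if the move
-- would leave the string.
local function advance( s, pos, offset )
    local step = ( offset > 0 ) and 1 or -1
    local remaining = math.abs( offset )

    while remaining > 0 do
        pos = pos + step
        if pos < 1 or pos > #s then return nil end

        -- Only bytes outside 0x80..0xBF start a character, so only those
        -- count toward the character offset
        local b = s:byte( pos )
        if b < 0x80 or b > 0xBF then
            remaining = remaining - 1
        end
    end

    return pos
end

local testStr = "♡ A"               -- bytes: E2 99 A1, 20, 41

print( advance( testStr, 1, 2 ) )   --> 5
print( advance( testStr, 5, -2 ) )  --> 1
```

Moving two characters forward from byte 1 lands on byte 5 because the first character occupies three bytes, which is why an offset cannot simply be added to charpos.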

Examples

Next Offset

local utf8 = require( "plugin.utf8" )

local testStr = "♡ 你好，世界 ♡"

print( utf8.next( testStr, 2 ) )  --> 3  161

Iterator

local utf8 = require( "plugin.utf8" )

local testStr = "♡ 你好，世界 ♡"

for charpos, codepoint in utf8.next, testStr do
    print( charpos, codepoint )
end

--> 1   9825
--> 4   32
--> 5   20320
--> 8   22909
--> 11  65292
--> 14  19990
--> 17  30028
--> 20  32
--> 21  9825
