utf8.*

Type Library
Revision Release 2024.3703
Keywords utf8, UTF-8, Unicode, string
Platforms Android, iOS, macOS, tvOS, Windows

Overview

The UTF-8 plugin, based on the luautf8 module, provides functions to manipulate UTF-8 strings in Corona.

UTF-8 strings mix single-byte and multi-byte Unicode characters, so simple byte counting can't be used to determine things like string length or substring positions. A Lua string can store UTF-8 text because Lua treats a string as a plain sequence of bytes. However, any operation that depends on the concept of a "character" gives wrong results on multi-byte text, because the standard string library treats every byte as one character.
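The mismatch between bytes and characters can be seen without the plugin. The following is a minimal sketch in plain Lua; the sample string and the variable names are illustrative only, and the character count uses a common byte-pattern idiom rather than anything provided by the plugin.

```lua
-- "héllo" written with decimal byte escapes so the example does not depend
-- on the encoding of the source file: é is the two bytes 195, 169 in UTF-8.
local s = "h\195\169llo"

-- The length operator counts bytes, not characters.
local byteLength = #s

-- Count characters by counting every byte that is NOT a UTF-8 continuation
-- byte (continuation bytes fall in the range 128-191).
local _, charLength = string.gsub( s, "[^\128-\191]", "" )

print( byteLength )  -- 6
print( charLength )  -- 5
```

The idiom only counts characters in well-formed UTF-8. It does not help with case conversion, pattern matching, or substring extraction, which is where the plugin functions come in.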

Use this plugin when operations such as upper-casing or string substitution must work correctly on accented or other non-ASCII characters.

Concepts

The UTF-8 plugin relies on a few concepts that you should understand before using it.

Syntax

local utf8 = require( "plugin.utf8" )

Functions

utf8.width()


In addition, the following functions mimic their string library counterparts, except that the UTF-8 plugin versions handle multi-byte characters:

UTF-8 Plugin      String Library Equivalent
utf8.byte()       string.byte()
utf8.char()       string.char()
utf8.find()       string.find()
utf8.gmatch()     string.gmatch()
utf8.gsub()       string.gsub()
utf8.len()        string.len()
utf8.lower()      string.lower()
utf8.match()      string.match()
utf8.reverse()    string.reverse()
utf8.sub()        string.sub()
utf8.upper()      string.upper()
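As a rough illustration of how these counterparts differ from the string library, the sketch below compares a few of them on one accented string. It assumes the plugin is available (it is only loadable inside a Corona project with the build.settings entry in place), so the require call is wrapped in pcall and the comparison is skipped elsewhere. The sample string and variable names are hypothetical.

```lua
-- Load the plugin defensively so the snippet also runs outside Corona.
local ok, utf8 = pcall( require, "plugin.utf8" )

-- "héllo": é is encoded as the two bytes 195, 169.
local s = "h\195\169llo"

local byteCount = string.len( s )   -- counts bytes: 6
local charCount, firstTwo, upper

if ok then
    charCount = utf8.len( s )       -- counts characters: 5
    firstTwo = utf8.sub( s, 1, 2 )  -- first two characters, "hé" (3 bytes)
    upper = utf8.upper( s )         -- upper-cases the accented letter too
    print( byteCount, charCount, firstTwo, upper )
else
    print( "plugin.utf8 is not available; run this inside a Corona project" )
end
```

By contrast, string.sub( s, 1, 2 ) would cut the é in half and return an invalid UTF-8 sequence, which is the kind of error the plugin functions are meant to avoid.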

Project Settings

To use this plugin, add an entry to the plugins table of build.settings. The build server then integrates the plugin during the build phase.

settings =
{
    plugins =
    {
        ["plugin.utf8"] =
        {
            publisherId = "com.coronalabs"
        },
    },
}