Database Character Set Versus National Language Character Set

Every Oracle database has not one, but two character sets associated with it:

 

A database character set

Used for CHAR and VARCHAR2 columns; for table names, column names, PL/SQL variable names, and other such identifiers; and for SQL statements and most string literals.

 

A national language character set

Used for NCHAR and NVARCHAR2 columns, and for string literals prefixed with an N (see Section 8.3.3 later in this chapter).

Why two character sets? Historically, this came about largely for performance reasons. With two character sets, it became possible to use the database character set for single-byte ASCII characters, and to use the national character set for fixed-width multibyte characters. With this scenario, your database character set still supports ASCII, you can still support multibyte characters, and you don't take a performance hit from having to deal with multibyte characters all of the time.

If you're not certain which two character sets your database supports, you can query the NLS_DATABASE_PARAMETERS view in the Oracle data dictionary to find out:

SQL> SELECT * 2 FROM nls_database_parameters 3 WHERE parameter IN ('NLS_CHARACTERSET','NLS_NCHAR_CHARACTERSET'); PARAMETER VALUE------------------------------ ----------------------------------------NLS_CHARACTERSET WE8MSWIN1252NLS_NCHAR_CHARACTERSET AL16UTF16

The NLS_CHARACTERSET parameter returns Oracle's name for the database character set. The NLS_NCHAR_CHARACTERSET parameter returns Oracle's name for the national character set used for NCHAR and NVARCHAR2 types.

The character set "names" that Oracle uses are really convenient abbreviations recognized by the Oracle software. These abbreviations are not necessarily recognized by standards bodies, and they are not usually official character set names. For example,AL16UTF16 is Oracle's abbreviation for Unicode UTF-16. With some exceptions, you can often learn useful information about a character set just from the name Oracle uses for it. Figure 8-1 illustrates Oracle's character set naming convention.

Figure 8-1. Oracle's character set naming convention

The key item in the naming convention is the second element, indicating the number of bits per character. If it's 7 or 8, you can be confident that you are dealing with a single-byte, fixed-width character set. If it is 16, 32, or any other value higher than 8, you are dealing with a multibyte character set. Most multibyte character sets are variable-length, but the name alone won't tell you that.

Sometimes you will see an S or a C on the end of a character set name. Such a suffix indicates a character set that can be used only on the server (S) or only on the client (C).

 

 

The naming convention shown in Figure 8-1 isn't universally applied to all Oracle character sets. For example, UTF8, Oracle's name for the UTF-8 character set, does not conform to the naming convention just described.