Figure 8-2. Forward and reverse searches with INSTR

We have found INSTR to be a very handy function, especially when used to the fullest extent possible. Most programmers make use of (and may only be aware of) only the first two parameters. Use INSTR to search from the end of the string? Search for the nth appearance as opposed to just the first appearance? "Wow!" many programmers would say, "I didn't know it could do that." Take the time to get familiar with INSTR and use all of its power.

In Oracle7, if nth_appearance is not positive (i.e., if it is 0 or negative), then INSTR always returns 1. In Oracle8, a value of 0 or a negative number for nth_appearance causes INSTR to raise the VALUE_ERROR exception.

 

 

Let's look at some examples of INSTR. In these examples, you will see all four parameters used in all their permutations. As you write your own programs, keep in mind the different ways in which INSTR can be used to extract information from a string; it can greatly simplify the code you write to parse and analyze character data.

· Find the first occurrence of "archie" in "bug-or-tv-character?archie":

INSTR ('bug-or-tv-character?archie', 'archie') --> 21

The starting position and the nth appearance both defaulted to 1.

· Find the last occurrence of "ar" in "bug-or-tv-character?archie".

INSTR ('bug-or-tv-character?archie', 'ar', -1) --> 21

Were you thinking that the answer might be 6? Remember that the character position returned by INSTR is always calculated from the leftmost character of the string as position 1. The easiest way to find the last of anything in a string is to specify a negative number for the starting position. I did not have to specify the nth appearance (leaving me with a default value of 1), because the last occurrence is also the first when searching backwards.

· Find the second-to-last occurrence of "a" in "bug-or-tv-character?archie":

INSTR ('bug-or-tv-character?archie', 'a', -1, 2) --> 15

No surprises here. Counting from the back of the string, INSTR passes over the "a" in archie because that is the last occurrence, and searches for the next occurrence. Again, the character position is counted from the leftmost character, not the rightmost character, in the string.

· Find the position of the letter "t" closest to (but not past) the question mark in the string "bug-or-tv-character?archie tophat":

· search_string := 'bug-or-tv-character?archie tophat';· tee_loc :=· INSTR (search_string, 't', -1 * (LENGTH (search_string) - INSTR (search_string, '?') +1));

I dynamically calculate the location of the question mark (actually, the first question mark in the string; I assume that there is only one). Then I subtract that from the full length of the string and multiply by -1 because I need to count the number of characters from the end of the string. I then use that value to kick off the search for the closest prior "t".

This example is a good reminder that any of the parameters to INSTR can be complex expressions that call other functions or perform their own calculations. This fact is also highlighted in the next INSTR example.

· Use INSTR to confirm that a user entry is valid. In the following code, I check to see if the command selected by the user is found in the list of valid commands. If so, I execute that command:

· IF INSTR ('|ADD|DELETE|CHANGE|VIEW|CALC|', '|' || cmd || '|') > 0· THEN· execute_command (cmd);· ELSE· DBMS_OUTPUT.PUT_LINE· (' You entered an invalid command. Please try again...');END IF;

In this case, I use the concatenation operator to construct the string that I will search for in the command list. I have to append a vertical bar (|) to the selected command because it is used as a delimiter in the command list. I also use the call to INSTR in a Boolean expression. If INSTR finds a match in the string, it returns a nonzero value; the Boolean expression therefore evaluates to TRUE, and I can go on with my processing. Otherwise, I display an error message.

The following code example, generated using Unicode UTF-8 as the database character set, illustrates the difference in semantics between INSTR and INSTRB, the two variations of INSTR that you are most likely to use:

DECLARE --The underlying database datatype for this example is Unicode UTF-8 x CHAR(30 CHAR) := 'The character ã is two-bytes.';BEGIN --Find the location of "is" in terms of characters DBMS_OUTPUT.PUT_LINE(INSTR(x,'is')); --Find the location of "is" in terms of bytes DBMS_OUTPUT.PUT_LINE(INSTRB(x,'is'));END;

The output is:

1718

The difference in the location of the word "is" comes about because the character ã is represented in Unicode UTF-8 using two bytes. Thus, while "is" is 17 characters into the string, it is at the same time 18 bytes into the string.

The INSTRC function is capable of recognizing decomposed characters. As described in the section on COMPOSE, an alternate representation of ã is a\0303. The following example demonstrates that INSTRC recognizes this alternate representation, whereas INSTR does not:

DECLARE --The underlying database datatype for this example is Unicode UTF-8 x CHAR(40 CHAR) := UNISTR('The character a\0303 could be composed.');BEGIN --INSTR won't see that a\0303 is the same as ã DBMS_OUTPUT.PUT_LINE(INSTR(x,'ã')); --INSTRC, however, will recognize that a\0303 = ã DBMS_OUTPUT.PUT_LINE(INSTRC(x,'ã'));END;

The output is:

015

According to INSTR, the string x does not contain ã at all. INSTRC, on the other hand, recognizes that a\0303 is an alternate representation for the same character. The UNISTR function is used in the declaration of x to convert the ASCII string that we can read into a Unicode string for the example.

The INSTR2 and INSTR4 functions allow you to search for code units and code points respectively, which may not correspond to complete characters. For the following example, AL16UTF16 is used as the national language character set. The character ã is represented in the string x as two code points: one for "a", and the other (\0303) for the tilde that goes above the "a". INSTRC and INSTR4 are then used to search for the location of the \0303 code point:

DECLARE --The underlying database datatype for this example is Unicode UTF-16 x NCHAR(40) := UNISTR('The character a\0303 could be composed.');BEGIN --Find the location of "\0303" using INSTRC DBMS_OUTPUT.PUT_LINE(INSTRC(x,UNISTR('\0303'))); --Find the location of "\0303" using INSTR4 DBMS_OUTPUT.PUT_LINE(INSTR4(x,UNISTR('\0303')));END;

The output is:

016

The INSTRC function works with full characters, and is of no use when you need to search for a code point that does not represent a complete character. INSTR4, on the other hand, is able to locate the \0303 code point. The return value of 16 indicates that \0303 is the 16th code point in the string.

INSTR2 works like INSTR4, but allows you to search for UCS-4 code units. Look at the following example:

DECLARE x NCHAR(40) := UNISTR('This is a \D834\DD1E test');BEGIN DBMS_OUTPUT.PUT_LINE (INSTR2(x, UNISTR('\D834'))); DBMS_OUTPUT.PUT_LINE (INSTR4(x, UNISTR('\D834')));END; 110

\D834\DD1E (the musical G clef) is a surrogate pair; each value in a surrogate pair is a code unit. Together, the two code units represent a single code point. This example shows how INSTR2 matches on just the high surrogate, whereas INSTR4 does not. That's because INSTR2 matches in terms of code units, whereas INSTR4 matches in terms of code points. Matching on just one code unit of a surrogate is sort of equivalent to searching and matching on the first byte of a multibyte character. Normally, you don't want to do this.

LEAST  
   

LEAST takes one or more strings as input and returns the string that would come first (i.e., that is the least) if the inputs were to be sorted in ascending order. The ordering of the strings is based on character code in the database character set. LEAST has the following specification:

FUNCTION LEAST (string1 IN VARCHAR2, string2 IN VARCHAR2,...) RETURN VARCHAR2

Following is an example:

BEGIN DBMS_OUTPUT.PUT_LINE(LEAST('Gennick','Pribyl','Feuerstein'));END; Feuerstein

Also see the GREATEST function, which is the opposite of LEAST.

LENGTH, LENGTHB, LENGTHC, LENGTH2, and LENGTH4  
   

The LENGTH family of functions returns the length of a string. The length can be returned in any of the following units:

 

LENGTH

Characters

 

LENGTHB

Bytes

 

LENGTHC

Unicode characters, normalizing where possible

 

LENGTH2

Code units

 

LENGTH4

Code points

The same pattern is used for the specification of all LENGTH functions:

FUNCTION LENGTH (string1 VARCHAR2) RETURN NUMBER

If string1 is NULL, then LENGTH returns NULL—not zero! Remember that a NULL string is a "nonvalue." Therefore, it cannot have a length, even a zero length.

The LENGTH function, in fact, will never return zero; it will always return either NULL or a positive number.

An exception is when LENGTH is used against a CLOB. It is possible for a CLOB to hold zero bytes and yet not be NULL. In that one case, LENGTH will return zero.

 

 

Here are some examples of LENGTH:

LENGTH (NULL) --> NULLLENGTH ('') --> NULL -- Same as a NULL string.LENGTH ('abcd') --> 4LENGTH ('abcd ') --> 5

If string1 is a fixed-length CHAR datatype, then LENGTH counts the trailing blanks in its calculation. So the LENGTH of a fixed-length string is always the declared length of the string. If you want to compute the length of the nonblank characters in string1, you will need to use the RTRIM function to remove the trailing blanks (RTRIM is discussed later in this chapter). In the following example, length1 is set to 60 and length2 is set to 14.

DECLARE company_name CHAR(60) := 'ACME PLUMBING'; length1 NUMBER; length2 NUMBER;BEGIN length1 := LENGTH (company_name); length2 := LENGTH (RTRIM (company_name));END;

The following example uses an NVARCHAR2 variable and the AL16UTF16 character set to highlight the differences between LENGTH, LENGTHB, and LENGTHC. Remember that AL16UTF16 is Oracle's name for UTF-16, a character set in which each character is represented using two bytes.

DECLARE --NVARCHAR2 = UTF-16 in this example. x NVARCHAR2(50) := UNISTR('The character ã and its decomposed equivalent: a\0303');BEGIN DBMS_OUTPUT.PUT_LINE(LENGTH(x)); DBMS_OUTPUT.PUT_LINE(LENGTHB(x)); DBMS_OUTPUT.PUT_LINE(LENGTHC(x));END;

The output is:

499848

LENGTH counts characters, but sees a\0303 as two separate, two-byte characters. LENGTHB counts bytes, returning a value that is twice the number of two-byte characters reported by LENGTH.LENGTHC counts Unicode characters, and recognizes that a\0303 really represents the single character ã.

LOWER  
   

The LOWERfunction converts all letters in the specified string to lowercase. The specifications for the LOWER function are:

FUNCTION LOWER (string1 IN CHAR) RETURN CHARFUNCTION LOWER (string1 IN VARCHAR2) RETURN VARCHAR2

As noted earlier, LOWER and UPPER return a fixed-length string if the incoming string is fixed-length. LOWER will not change any characters in the string that are not letters, as case is irrelevant for numbers and special characters.

Here are some examples of the effect of LOWER:

LOWER ('BIG FAT LETTERS') --> 'big fat letters'LOWER ('123ABC') --> '123abc'

LOWER and UPPER are useful for guaranteeing a consistent case when comparing strings. PL/SQL is not a case-sensitive language with regard to its own syntax and names of identifiers, but it is sensitive to case in character strings, whether they are found in named constants, literals, or variables. The string "ABC" is not the same as "abc", and this can cause problems in your programs if you are not careful and consistent in your handling of such values.

LPAD  
   

By default, PL/SQL strips all trailing blanks from a character string unless it is declared with a fixed-length CHAR datatype. There are occasions, however, when you really want some spaces (or even some other characters) added to the front or back of your string. LPAD (and its right-leaning cousin, RPAD) give you this capability. The LPAD function returns a string padded to the left (hence the "L") to a specified length and with a specified pad string. The specification of the LPAD function is:

FUNCTION LPAD (string1 IN VARCHAR2, padded_length IN NUMBER [, pad_string IN VARCHAR2])RETURN VARCHAR2

LPAD returns string1 padded on the left to length padded_length with the optional character string pad_string. If you do not specify pad_string, then string1 is padded on the left with spaces. You must specify the padded_length. If string1 already has a length equal to padded_length, then LPAD returns string1 without any additional characters. If padded_length is smaller than the length of string1, LPAD effectively truncates string1—it returns only the first padded_length characters of the incoming string1.

As you can see, LPAD can do a lot more than just add spaces to the left of a string. Let's look at some examples of how useful LPAD can be.

· Display the number padded left with zeros to a length of 10:

LPAD ('55', 10, '0') --> '0000000055'

· Display the number padded left with zeros to a length of 5:

LPAD ('12345678', 5, '0') --> '12345'

LPAD interprets its padded_length as the maximum length of the string that it may return. As a result, it counts padded_length number of characters from the left (start of the string) and then simply returns that substring of the incoming value.

· Place the phrase "sell!" in front of the names of selected stocks, up to a string length of 43.

· LPAD ('HITOP TIES', 43, 'sell!') · --> 'sell!sell!sell!sell!sell!sell!selHITOP TIES'

Because the length of "HITOP TIES" is 10 and the length of "sell!" is 5, there is no room for seven full repetitions of the pad string. As a result, the seventh repetition (counting from the left) of "sell!" lost its last two characters. So you can see that LPAD does not pad by adding to the left of the original string until it runs out of room. Instead, it figures out how many characters it must pad by to reach the total, constructs that full padded fragment, and finally appends the original string to this fragment.

LTRIM  
   

The LTRIM function is the opposite of LPAD. Whereas LPAD adds characters to the left of a string, LTRIM removes, or trims, characters from the left of the string. And just like LPAD, LTRIM offers much more flexibility than simply removing leading blanks. The specification of the LTRIM function is:

FUNCTION LTRIM (string1 IN VARCHAR2 [, trim_string IN VARCHAR2])RETURN VARCHAR2

LTRIM returns string1 with all leading characters removed up to the first character not found in the trim_string. The second parameter is optional and defaults to a single space.

There is one important difference between LTRIM and LPAD. LPAD pads to the left with the specified string, and repeats that string (or pattern of characters) until there is no more room. LTRIM, on the other hand, removes all leading characters that appear in the trim string, not as a pattern, but as individual candidates for trimming.

Here are some examples:

· Trim all leading blanks from " Way Out in Right Field":

LTRIM (' Way Out in Right Field') --> 'Way Out in Right Field'

Because we did not specify a trim string, it defaults to a single space, so all leading spaces are removed.

· Remove all numbers from the front of the string:

· my_string := '70756234LotsaLuck';· LTRIM (my_string, '0987612345') --> 'LotsaLuck'

By specifying every possible digit in the trim string, we ensure that any and all numbers will be trimmed, regardless of the order in which they occur (and the order in which we place them in the trim string).

What if we wanted to remove only a specific pattern, say the letters "abc", from the front of the string? We couldn't use LTRIM because it trims off any matching individual characters. To remove a leading pattern from a string—or to replace one pattern with another—you will want to make use of the REPLACE function, which is discussed next.

The TRIM function implements ANSI-standard trim functionality.

 

 

REPLACE  
   

The REPLACE function returns a string in which all occurrences of a specifiedmatch string are replaced with a replacement string. REPLACE is useful for searching a pattern of characters, and then changing all instances of that pattern in a single function call. The specification of the REPLACE function is:

FUNCTION REPLACE (string1 IN VARCHAR2, match_string IN VARCHAR2 [, replace_string IN VARCHAR2])RETURN VARCHAR2

If you do not specify the replacement string, then REPLACE simply removes all occurrences of the match_string in string1. If you specify neither a match string nor a replacement string, REPLACE returns NULL.

Here are several examples using REPLACE:

· Remove all instances of the letter "C" in the string "CAT CALL":

REPLACE ('CAT CALL', 'C') --> 'AT ALL'

Because we did not specify a replacement string, REPLACE changed all occurrences of "C" to NULL.

· Replace all occurrences of "99" with "100" in the following string:

· REPLACE ('Zero defects in period 99 reached 99%!', '99', '100') · --> 'Zero defects in period 100 reached 100%!'

· Handle occurrences of a single quote within a query criteria string. The single quote is a string terminator symbol, indicating the start and/or end of the literal string. I once ran into this requirement when building query-by-example strings in Oracle Forms. If the user enters a string with a single quote in it, such as:

Customer didn't have change.

and then the program concatenates that string into a larger string, the resulting SQL statement (created dynamically by Oracle Forms in Query Mode) fails because there are unbalanced single quotes in the string. You can resolve this problem by converting that single quote into two single quotes in a row, thereby telling SQL that you really intend for one quote to be part of the string. Use the following REPLACE to do this:

criteria_string := REPLACE (criteria_string, '''', '''''');

The four quotes in the second parameter resolve to a string containing one single quote. The six quotes in the third parameter resolve to a string containing two single quotes. In each case, the outer pair of quotes delimits the string, and each pair of single quotes inside that outer pair are interpreted as just one single quote within the string.

· Remove all leading instances of "abc" from the string:

"abcabcccccI LOVE CHILIabc"

This is the behavior we were looking at in the previous section on LTRIM. We want to remove all instances of the pattern "abc" from the beginning of the string, but we do not want to remove that pattern throughout the rest of the string. In addition, we want to remove "abc" only as a pattern; if we encounter three contiguous c's ("ccc"), on the other hand, they should not be removed from the string. This task is less straightforward than it might at first seem. If we simply apply REPLACE to the string, it will remove all occurrences of "abc", instead of just the leading instances. For example:

REPLACE ('abcabccccI LOVE CHILIabc', 'abc') --> 'cccI LOVE CHILI'

That is not what we want in this case. If we use LTRIM, on the other hand, we will be left with none of the leading c's, as demonstrated in a previous example:

LTRIM ('abcabcccccI LOVE CHILIabc', 'abc') --> 'I LOVE CHILIabc'

And this is not quite right either. We want to be left with "cccI LOVE CHILIabc" (please do not ask why), and it turns out that the way to get it is to use a combination of LTRIM and REPLACE. Suppose that we create a local variable as follows:

my_string := 'abcabccccI LOVE CHILIabc';

Then the following statement will achieve the desired effect by nesting calls to REPLACE and LTRIM within one another:

REPLACE (LTRIM (REPLACE (my_string, 'abc', '@'), '@'), '@', 'abc') --> 'cccI LOVE CHILIabc'

Here is how we would describe in English what the above statement does:

First replace all occurrences of "abc" with the special character "@" (which we are assuming does not otherwise appear in the string). Then trim off all leading instances of "@". Finally, replace all remaining occurrences of "@" with "abc".

Voilà, as they say in many of the finer restaurants in Paris. Now let's pull apart this single, rather complex statement into separate PL/SQL steps corresponding to the "natural language" description:

Replace all occurrences of "abc" with the special character "@" (which we are assuming does not otherwise appear in the string). Notice that this only affects the "abc" pattern, and not any individual appearances of "a", "b", or "c".

REPLACE ('abcabccccI LOVE CHILIabc', 'abc', '@') --> '@@cccI LOVE CHILI@'

Trim off all leading instances of "@".

LTRIM ('@@cccI LOVE CHILI@', '@') --> 'cccI LOVE CHILI@'

Notice that LTRIM now leaves the c's in place, because we didn't ask it to remove "a" or "b" or "c"—just "@". In addition, it left the trailing "@" in the string because LTRIM deals only with characters on the leading end of the string.

Replace all remaining occurrences of "@" with "abc".

REPLACE ('cccI LOVE CHILI@', '@', 'abc') --> 'cccI LOVE CHILIabc'

And we are done. We used the first REPLACE to temporarily change the occurrences of "abc" so that LTRIM could distinguish between the pattern we wanted to get rid of and the extra characters that needed to be preserved. Then a final call to REPLACE restored the pattern in the string.

RPAD  
   

The RPAD function adds characters to the end of a character string. It returns a string padded to the right (hence the "R") to a specified length and with an optional pad string. The specification of the RPAD function is:

FUNCTION RPAD (string1 IN VARCHAR2, padded_length IN NUMBER [, pad_string IN VARCHAR2])RETURN VARCHAR2

RPAD returns string1 padded on the right to length padded_length with the optional character string pad_string. If you do not specify pad_string, then string1 is padded on the right with spaces. You must specify the padded_length. If string1 already has a length equal to padded_length, then RPAD returns string1 without any additional characters. If padded_length is smaller than the length of string1, RPAD effectively truncates string1, returning only the first padded_length characters of the incoming string1.

Let's look at some examples of RPAD:

· Display the number padded right with zeros to a length of 10:

RPAD ('55', 10, '0') --> '5500000000'

I could also use TO_CHAR to convert from a number to a character (I don't know off-hand why you would do this, but it's good to remember that there are usually at least two or three ways to solve any problem):

TO_CHAR (55 * 10000000) --> '5500000000'

· Display the number padded right with zeros to a length of 5:

RPAD ('12345678', 5) --> '12345'

RPAD interprets its padded_length as the maximum length of the string that it may return. As a result, it counts padded_length number of characters from the left (start of the string) and then simply returns that substring of the incoming value. This is the same behavior as that found with LPAD, described earlier in this chapter. Remember that RPAD does not return the rightmost five characters (in the above case "45678").

· Place the phrase "sell!" after the names of selected stocks, up to a string length of 43:

· RPAD ('HITOP TIES', 43, 'sell!') · --> 'HITOP TIESsell!sell!sell!sell!sell!sell!sel'

Because the length of "HITOP TIES" is 10 and the length of "sell!" is 5, there is no room for seven full repetitions of the pad string. As a result, the seventh repetition of "sell!" loses its last two characters.

You can use RPAD (and LPAD) to generate repetitive sequences of characters. For example, you can create a string of 60 dashes to use as a border in a report:

RPAD ('-', 60, '-') --> '------------------------------------------------------------'

I have used this technique in SQL*Reportwriter V1.1 where graphical objects like boxes are not really available. I can include the RPAD in a SELECT statement in the report, and then use the corresponding field in text elements to provide lines to break up the data in a report.

RTRIM  
   

The RTRIM function is the opposite of RPAD and the companion to LTRIM. While RPAD adds characters to the right of a string, RTRIM removes, or trims, characters from the end portion of the string. Just as with RPAD, RTRIM offers much more flexibility than simply removing trailing blanks. The specification of the RTRIM function is:

FUNCTION RTRIM (string1 IN VARCHAR2 [, trim_string IN VARCHAR2])RETURN VARCHAR2

RTRIM returns string1 with all trailing characters removed up to the first character not found in the trim_string. The second parameter is optional and defaults to a single space.

Here are some examples of RTRIM:

· Trim all trailing blanks from a string:

· RTRIM ('Way Out in Right Field ') --> 'Way Out in Right Field'

Because I did not specify a trim string, it defaults to a single space, so all trailing spaces are removed.

· Trim all the characters in "BAM! ARGH!" from the end of a string:

· my_string := 'Sound effects: BAM!ARGH!BAM!HAM';RTRIM (my_string, 'BAM! ARGH!') --> 'Sound effects:'

This use of RTRIM strips off all the letters at the end of the string that are found in "BAM!ARGH!". This includes "BAM" and "HAM", so those words too are removed from the string even though "HAM" is not listed explicitly as a "word" in the trim string. Also, the inclusion of two exclamation marks in the trim string is unnecessary, because RTRIM is not looking for the word "ARGH!", but for each of the letters in "ARGH!".

The TRIM function implements ANSI-standard trim functionality.

 

 

SOUNDEX  
   

The SOUNDEX function allows you to perform string comparisons based on phonetics (the way a word sounds) as opposed to semantics (the way a word is spelled).[1]

[1] Oracle Corporation uses the algorithm in Donald Knuth's The Art of Computer Programming, Volume 3, to generate the phonetic representation.

SOUNDEX returns a character string that is the "phonetic representation" of the argument. The specification of the SOUNDEX function is as follows:

FUNCTION SOUNDEX (string1 IN VARCHAR2) RETURN VARCHAR2

Here are some of the values SOUNDEX generated, and their variations according to the input string:

SOUNDEX ('smith') --> 'S530'SOUNDEX ('SMYTHE') --> ''S530'SOUNDEX ('smith smith') --> 'S532'SOUNDEX ('smith z') --> 'S532'SOUNDEX ('feuerstein') --> 'F623'SOUNDEX ('feuerst') --> 'F623'

Keep the following SOUNDEX rules in mind when using this function:

· The SOUNDEX value always begins with the first letter in the input string.

· SOUNDEX uses only the first five consonants in the string to generate the return value.

· Only consonants are used to compute the numeric portion of the SOUNDEX value. Except for leading vowels, all vowels are ignored.

· SOUNDEX is not case-sensitive. Upper- and lowercase letters return the same SOUNDEX value.

The SOUNDEX function is useful for ad hoc queries, and any other kinds of searches where the exact spelling of a database value is not known or easily determined.

The SOUNDEX algorithm is English-centric and may not work well (or at all) for other languages.

 

 

SUBSTR, SUBSTRB, SUBSTRC, SUBSTR2, and SUBSTR4  
   

The SUBSTR family of functions is one of the most common and useful set of character functions. The SUBSTR functions allow you to extract a subset of contiguous characters from a string. The substring is specified by starting position and length, and the functions differ in the units they use:

 

SUBSTR

Starting position and length are in terms of characters.

 

SUBSTRB

Starting position and length are in terms of bytes. When you use a single-byte character set, SUBSTRB and SUBSTR will return the same results.

 

SUBSTRC

Starting position and length are in terms of Unicode characters, after any decomposed characters have been composed.

 

SUBSTR2

Starting position and length are in terms of code units.

 

SUBSTR4

Starting position and length are in terms of code points.

All of the function specifications follow the same pattern:

FUNCTION SUBSTR (string_in IN VARCHAR2, start_position_in IN NUMBER [, substr_length_in IN NUMBER])RETURN VARCHAR2

where the arguments are as follows:

 

string_in

The source string

 

start_position_in

The starting position of the substring in string_in

 

substr_length_in

The length of the substring desired (the number of characters to be returned in the substring)

The last parameter, substr_length_in, is optional. If you do not specify a substring length, then SUBSTR returns all the characters to the end of string_in (from the starting position specified). The substr_length_in argument, if present, must be greater than zero.

The starting position cannot be zero. If it is less than zero, then the substring is retrieved from the back of the string. SUBSTR counts backwards substr_length_in number of characters from the end of string_in. In this case, however, the characters that are extracted are still to the right of the starting position. See Figure 8-3 for an illustration of how the different arguments are used by SUBSTR.