String manipulation

Strings are a packed UTF-8 representation of text, stored as null-terminated buffers of u8 bytes. For efficiency, strings should be indexed by byte offset, but operations that can break UTF-8 validity should be avoided. For some heavy-duty uses, try std::rope.
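To make the byte-versus-character distinction concrete, here is a minimal sketch written in present-day Rust syntax rather than the API listed in this document. The helper name `byte_and_char_len` is hypothetical; it only contrasts the size in bytes with the number of Unicode characters.

```rust
/// Returns (length in bytes, number of Unicode scalar values) for `s`.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}
```

For a string of ten characters in which three of them occupy three bytes each, the two numbers differ (16 bytes, 10 characters), which is why byte offsets and character counts must not be mixed.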

Implementation extensions for str

Extension methods for strings

Method trim

fn trim() -> str

Returns a string with leading and trailing whitespace removed

Method trim_left

fn trim_left() -> str

Returns a string with leading whitespace removed

Method trim_right

fn trim_right() -> str

Returns a string with trailing whitespace removed

Method +

pure fn +(rhs: str/& ) -> str

Concatenate two strings: operator version

Implementation extensions for str/&

Extension methods for strings

Method all

fn all(it: fn(char) -> bool) -> bool

Return true if a predicate matches all characters or if the string contains no characters

Method any

fn any(it: fn(char) -> bool) -> bool

Return true if a predicate matches any character (and false if it matches none or there are no characters)
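The empty-string behaviour of all and any is easy to get backwards. The following sketch uses present-day Rust iterators, not the methods documented here; `all_chars` and `any_char` are hypothetical names chosen for illustration.

```rust
/// True if `pred` holds for every character, and also for the empty string.
fn all_chars(s: &str, pred: impl Fn(char) -> bool) -> bool {
    s.chars().all(|c| pred(c))
}

/// True if `pred` holds for at least one character; false for the empty string.
fn any_char(s: &str, pred: impl Fn(char) -> bool) -> bool {
    s.chars().any(|c| pred(c))
}
```

With an empty input, `all_chars` is vacuously true while `any_char` is false, matching the descriptions above.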

Method contains

fn contains(needle: str/&a ) -> bool

Returns true if one string contains another

Method contains_char

fn contains_char(needle: char) -> bool

Returns true if a string contains a char

Method each

fn each(it: fn(u8) -> bool)

Iterate over the bytes in a string

Method eachi

fn eachi(it: fn(uint, u8) -> bool)

Iterate over the bytes in a string, with indices

Method each_char

fn each_char(it: fn(char) -> bool)

Iterate over the chars in a string

Method each_chari

fn each_chari(it: fn(uint, char) -> bool)

Iterate over the chars in a string, with indices

Method ends_with

fn ends_with(needle: str/& ) -> bool

Returns true if one string ends with another

Method is_empty

fn is_empty() -> bool

Returns true if the string has length 0

Method is_not_empty

fn is_not_empty() -> bool

Returns true if the string has length greater than 0

Method is_whitespace

fn is_whitespace() -> bool

Returns true if the string contains only whitespace

Whitespace characters are determined by char::is_whitespace

Method is_alphanumeric

fn is_alphanumeric() -> bool

Returns true if the string contains only alphanumerics

Alphanumeric characters are determined by char::is_alphanumeric

Method len

pure fn len() -> uint

Returns the size in bytes not counting the null terminator

Method slice

fn slice(begin: uint, end: uint) -> str

Returns a slice of the given string from the byte range [begin..end)

Fails when begin or end does not point to a character boundary, or points beyond the last character of the string

Method split

fn split(sepfn: fn(char) -> bool) -> ~[str]

Splits a string into substrings using a character function

Method split_char

fn split_char(sep: char) -> ~[str]

Splits a string into substrings at each occurrence of a given character

Method split_str

fn split_str(sep: str/&a ) -> ~[str]

Splits a string into a vector of the substrings separated by a given string

Method starts_with

fn starts_with(needle: str/&a ) -> bool

Returns true if one string starts with another

Method substr

fn substr(begin: uint, n: uint) -> str

Take a substring of another.

Returns a string containing n characters starting at byte offset begin.

Method to_lower

fn to_lower() -> str

Convert a string to lowercase

Method to_upper

fn to_upper() -> str

Convert a string to uppercase

Method escape_default

fn escape_default() -> str

Escape each char in s with char::escape_default.

Method escape_unicode

fn escape_unicode() -> str

Escape each char in s with char::escape_unicode.

Function all

pure fn all(s: str/& , it: fn(char) -> bool) -> bool

Return true if a predicate matches all characters or if the string contains no characters

Function all_between

pure fn all_between(s: str/& , start: uint, end: uint, it: fn(char) -> bool)
        -> bool

Loop through a substring, char by char

Safety note

Arguments

Return value

true if execution proceeded correctly, false if it was interrupted, that is, if it returned false at any point.

Function any

pure fn any(ss: str/& , pred: fn(char) -> bool) -> bool

Return true if a predicate matches any character (and false if it matches none or there are no characters)

Function any_between

pure fn any_between(s: str/& , start: uint, end: uint, it: fn(char) -> bool)
        -> bool

Loop through a substring, char by char

Safety note

Arguments

Return value

true if it returns true for any character

Function append

pure fn append(+lhs: str, rhs: str/& ) -> str

Concatenate two strings together

Function as_buf

pure fn as_buf<T>(s: str, f: fn(*u8) -> T) -> T

Work with the byte buffer of a string.

Allows for unsafe manipulation of strings, which is useful for foreign interop.

Function as_bytes

pure fn as_bytes<T>(s: str, f: fn(~[u8]) -> T) -> T

Work with the byte buffer of a string.

Allows for unsafe manipulation of strings, which is useful for foreign interop.

Example

let i = str::as_bytes("Hello World") { |bytes| vec::len(bytes) };

Function as_c_str

pure fn as_c_str<T>(s: str, f: fn(*libc::c_char) -> T) -> T

Work with the byte buffer of a string as a null-terminated C string.

Allows for unsafe manipulation of strings, which is useful for foreign interop, without copying the original string.

Example

let s = str::as_c_str("PATH", { |path_buf| libc::getenv(path_buf) });

Function byte_slice

pure fn byte_slice<T>(s: str/& , f: fn(v: & [u8]) -> T) -> T

Work with the string as a byte slice, not including trailing null.

Function bytes

pure fn bytes(s: str) -> ~[u8]

Converts a string to a vector of bytes

The result vector is not null-terminated.

Function bytes_iter

pure fn bytes_iter(ss: str/& , it: fn(u8))

Iterate over the bytes in a string

Function capacity

pure fn capacity(&&s: str) -> uint

Returns the number of single-byte characters the string can hold without reallocating

Function char_at

pure fn char_at(s: str/& , i: uint) -> char

Pluck a character out of a string

Function char_len

pure fn char_len(s: str/& ) -> uint

Returns the number of characters that a string holds

Function char_range_at

pure fn char_range_at(s: str/& , i: uint) -> {ch: char, next: uint,}

Pluck a character out of a string and return the index of the next character.

This function can be used to iterate over the unicode characters of a string.

Example

let s = "中华Việt Nam";
let mut i = 0u;
while i < str::len(s) {
    let {ch, next} = str::char_range_at(s, i);
    std::io::println(#fmt("%u: %c",i,ch));
    i = next;
}

Example output

0: 中
3: 华
6: V
7: i
8: ệ
11: t
12:
13: N
14: a
15: m

Arguments

Return value

A record {ch: char, next: uint} containing the char value and the byte index of the next unicode character.

Failure

Fails if i is greater than or equal to the length of the string, or if i is not the index of the beginning of a valid UTF-8 character.
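For readers more familiar with current Rust, the same iteration pattern can be sketched as follows. This is an approximation in present-day syntax, returning a tuple instead of a record; it is not the implementation documented here, and `char_offsets` is a hypothetical helper.

```rust
/// Decodes the character starting at byte offset `i` and returns it together
/// with the byte offset of the next character.
/// Panics if `i` is past the last character or not on a character boundary.
fn char_range_at(s: &str, i: usize) -> (char, usize) {
    let ch = s[i..].chars().next().expect("index past end of string");
    (ch, i + ch.len_utf8())
}

/// Walks the whole string and collects (byte offset, char) pairs.
fn char_offsets(s: &str) -> Vec<(usize, char)> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < s.len() {
        let (ch, next) = char_range_at(s, i);
        out.push((i, ch));
        i = next;
    }
    out
}
```

Applied to the example string above, `char_offsets` yields the same byte offsets as the example output (0, 3, 6, 7, 8, 11, ...).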

Function chars

pure fn chars(s: str/& ) -> ~[char]

Convert a string to a vector of characters

Function chars_iter

pure fn chars_iter(s: str/& , it: fn(char))

Iterate over the characters in a string

Function concat

pure fn concat(v: & [const str]) -> str

Concatenate a vector of strings

Function connect

pure fn connect(v: & [const str], sep: str) -> str

Concatenate a vector of strings, placing a given separator between each
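As a point of comparison, concat and connect correspond roughly to the slice methods `concat` and `join` in present-day Rust. The wrappers below are a hypothetical sketch in modern syntax, not the functions documented here.

```rust
/// Concatenates all pieces with nothing between them.
fn concat_all(v: &[&str]) -> String {
    v.concat()
}

/// Concatenates all pieces, placing `sep` between each adjacent pair.
fn connect(v: &[&str], sep: &str) -> String {
    v.join(sep)
}
```

The separator appears only between elements, so a single-element input is returned unchanged and an empty input gives an empty string.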

Function contains

pure fn contains(haystack: str/&a , needle: str/&b ) -> bool

Returns true if one string contains another

Arguments

Function contains_char

pure fn contains_char(haystack: str/& , needle: char) -> bool

Returns true if a string contains a char.

Arguments

Function count_bytes

pure fn count_bytes(s: str/&b , start: uint, n: uint) -> uint

Counts the number of bytes taken by the first n characters in s, starting from byte offset start.

Function count_chars

pure fn count_chars(s: str/& , start: uint, end: uint) -> uint

As char_len but for a slice of a string

Arguments

Return value

The number of Unicode characters in s between the given indices.
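count_bytes and count_chars convert in opposite directions between character counts and byte counts. The sketch below shows one way to express both in present-day Rust; it assumes `start` and `end` are valid character boundaries and is not the documented implementation.

```rust
/// Number of bytes occupied by the first `n` characters at or after byte
/// offset `start`. If fewer than `n` characters remain, counts what is there.
fn count_bytes(s: &str, start: usize, n: usize) -> usize {
    s[start..].chars().take(n).map(char::len_utf8).sum()
}

/// Number of characters in the byte range [start, end).
fn count_chars(s: &str, start: usize, end: usize) -> usize {
    s[start..end].chars().count()
}
```

For a string that begins with two three-byte characters, `count_bytes(s, 0, 2)` is 6 and `count_chars(s, 0, 6)` is 2.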

Function each

pure fn each(s: str/& , it: fn(u8) -> bool)

Iterate over the bytes in a string

Function each_char

pure fn each_char(s: str/& , it: fn(char) -> bool)

Iterates over the chars in a string

Function each_chari

pure fn each_chari(s: str/& , it: fn(uint, char) -> bool)

Iterates over the chars in a string, with indices

Function eachi

pure fn eachi(s: str/& , it: fn(uint, u8) -> bool)

Iterate over the bytes in a string, with indices

Function ends_with

pure fn ends_with(haystack: str/&a , needle: str/&b ) -> bool

Returns true if one string ends with another

Arguments

Function eq

pure fn eq(&&a: str, &&b: str) -> bool

Bytewise string equality

Function escape_default

pure fn escape_default(s: str/& ) -> str

Escape each char in s with char::escape_default.

Function escape_unicode

pure fn escape_unicode(s: str/& ) -> str

Escape each char in s with char::escape_unicode.

Function find

pure fn find(s: str/& , f: fn(char) -> bool) -> option<uint>

Returns the byte index of the first character that satisfies the given predicate

Arguments

Return value

An option containing the byte index of the first matching character or none if there is no match
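Note that the result is a byte index, not a character count. A small sketch in present-day Rust (where the optional result is spelled `Option<usize>`) illustrates this; `find_first` is a hypothetical wrapper, not the function documented here.

```rust
/// Byte index of the first character satisfying `pred`, or None.
fn find_first(s: &str, pred: impl Fn(char) -> bool) -> Option<usize> {
    s.find(|c: char| pred(c))
}
```

In a string whose first six characters occupy twelve bytes, the first whitespace character is the seventh character but is reported at byte index 12.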

Function find_between

pure fn find_between(s: str/& , start: uint, end: uint, f: fn(char) -> bool)
        -> option<uint>

Returns the byte index of the first character that satisfies the given predicate, within a given range

Arguments

Return value

An option containing the byte index of the first matching character or none if there is no match

Failure

start must be less than or equal to end and end must be less than or equal to len(s). start must be the index of a character boundary, as defined by is_char_boundary.

Function find_char

pure fn find_char(s: str/& , c: char) -> option<uint>

Returns the byte index of the first matching character

Arguments

Return value

An option containing the byte index of the first matching character or none if there is no match

Function find_char_between

pure fn find_char_between(s: str/& , c: char, start: uint, end: uint) ->
        option<uint>

Returns the byte index of the first matching character within a given range

Arguments

Return value

An option containing the byte index of the first matching character or none if there is no match

Failure

start must be less than or equal to end and end must be less than or equal to len(s). start must be the index of a character boundary, as defined by is_char_boundary.

Function find_char_from

pure fn find_char_from(s: str/& , c: char, start: uint) -> option<uint>

Returns the byte index of the first matching character beginning from a given byte offset

Arguments

Return value

An option containing the byte index of the first matching character or none if there is no match

Failure

start must be less than or equal to len(s). start must be the index of a character boundary, as defined by is_char_boundary.

Function find_from

pure fn find_from(s: str/& , start: uint, f: fn(char) -> bool) -> option<uint>

Returns the byte index of the first character that satisfies the given predicate, beginning from a given byte offset

Arguments

Return value

An option containing the byte index of the first matching character or none if there is no match

Failure

start must be less than or equal to len(s). start must be the index of a character boundary, as defined by is_char_boundary.

Function find_str

pure fn find_str(haystack: str/&a , needle: str/&b ) -> option<uint>

Returns the byte index of the first matching substring

Arguments

Return value

An option containing the byte index of the first matching substring or none if there is no match

Function find_str_between

pure fn find_str_between(haystack: str/&a , needle: str/&b , start: uint,
                         end: uint) -> option<uint>

Returns the byte index of the first matching substring within a given range

Arguments

Return value

An option containing the byte index of the first matching substring or none if there is no match

Failure

start must be less than or equal to end and end must be less than or equal to len(haystack).

Function find_str_from

pure fn find_str_from(haystack: str/&a , needle: str/&b , start: uint) ->
        option<uint>

Returns the byte index of the first matching substring beginning from a given byte offset

Arguments

Return value

An option containing the byte index of the first matching substring or none if there is no match

Failure

start must be less than or equal to len(haystack).

Function from_byte

pure fn from_byte(b: u8) -> str

Convert a byte to a UTF-8 string

Failure

Fails if invalid UTF-8

Function from_bytes

pure fn from_bytes(+vv: ~[u8]) -> str

Convert a vector of bytes to a UTF-8 string

Failure

Fails if invalid UTF-8

Function from_char

pure fn from_char(ch: char) -> str

Convert a char to a string

Function from_chars

pure fn from_chars(chs: & [const char]) -> str

Convert a vector of chars to a string

Function from_utf16

pure fn from_utf16(v: & [const u16]) -> str

Function hash

pure fn hash(&&s: str) -> uint

String hash function

Function is_ascii

pure fn is_ascii(s: str/& ) -> bool

Determines if a string contains only ASCII characters

Function is_char_boundary

pure fn is_char_boundary(s: str/& , index: uint) -> bool

Returns false if the index points into the middle of a multi-byte character sequence.
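To see which byte offsets count as boundaries, the following sketch lists them using the `is_char_boundary` method of present-day Rust. Treating the end of the string as a boundary is the behaviour of the modern standard library; whether the function documented here does the same is an assumption. `char_boundaries` is a hypothetical helper.

```rust
/// All byte offsets in 0..=len that do not fall inside a multi-byte character.
fn char_boundaries(s: &str) -> Vec<usize> {
    (0..=s.len()).filter(|&i| s.is_char_boundary(i)).collect()
}
```

For a string made of a one-byte character, a two-byte character and another one-byte character, offset 2 is skipped because it lies in the middle of the two-byte sequence.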

Function is_empty

pure fn is_empty(s: str/& ) -> bool

Returns true if the string has length 0

Function is_not_empty

pure fn is_not_empty(s: str/& ) -> bool

Returns true if the string has length greater than 0

Function is_utf16

pure fn is_utf16(v: & [const u16]) -> bool

Determines if a vector of u16 contains valid UTF-16

Function is_utf8

pure fn is_utf8(v: & [const u8]) -> bool

Determines if a vector of bytes contains valid UTF-8

Function is_whitespace

pure fn is_whitespace(s: str/& ) -> bool

Returns true if the string contains only whitespace

Whitespace characters are determined by char::is_whitespace

Function le

pure fn le(&&a: str, &&b: str) -> bool

Bytewise less than or equal

Function len

pure fn len(s: str/& ) -> uint

Returns the string length/size in bytes not counting the null terminator

Function lines

pure fn lines(s: str/& ) -> ~[str]

Splits a string into a vector of the substrings separated by LF ('\n')

Function lines_any

pure fn lines_any(s: str/& ) -> ~[str]

Splits a string into a vector of the substrings separated by LF ('\n') and/or CR LF ("\r\n")
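The practical difference between lines and lines_any shows up with Windows-style line endings. The sketch below approximates both in present-day Rust; the helper names are hypothetical, and the modern `lines` iterator also drops a trailing empty line, which may not match the functions documented here.

```rust
/// Splits on LF only; a CR before the LF stays attached to the line.
fn split_lf(s: &str) -> Vec<&str> {
    s.split('\n').collect()
}

/// Splits on LF or CR LF; the line terminator is removed either way.
fn split_any_newline(s: &str) -> Vec<&str> {
    s.lines().collect()
}
```

Splitting on LF alone leaves a stray '\r' at the end of any line that was terminated by CR LF.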

Function lines_iter

pure fn lines_iter(ss: str/& , ff: fn(&&str))

Apply a function to each line (by '\n')

Function map

pure fn map(ss: str/& , ff: fn(char) -> char) -> str

Apply a function to each character

Function pop_char

fn pop_char(&s: str) -> char

Remove the final character from a string and return it

Failure

If the string does not contain any characters

Function push_char

fn push_char(&s: str, ch: char)

Appends a character at the end of a string

Function push_str

fn push_str(&lhs: str, rhs: str/& )

Appends a string slice to the back of a string

Function replace

pure fn replace(s: str, from: str, to: str) -> str

Replace all occurrences of one string with another

Arguments

Return value

The original string with all occurrences of from replaced with to

Function reserve

fn reserve(&s: str, n: uint)

Reserves capacity for exactly n bytes in the given string, not including the null terminator.

Assuming single-byte characters, the resulting string will be large enough to hold a string of length n. To account for the null terminator, the underlying buffer will have the size n + 1.

If the capacity for s is already equal to or greater than the requested capacity, then no action is taken.

Arguments

Function reserve_at_least

fn reserve_at_least(&s: str, n: uint)

Reserves capacity for at least n bytes in the given string, not including the null terminator.

Assuming single-byte characters, the resulting string will be large enough to hold a string of length n. To account for the null terminator, the underlying buffer will have the size n + 1.

This function will over-allocate in order to amortize the allocation costs in scenarios where the caller may need to repeatedly reserve additional space.

If the capacity for s is already equal to or greater than the requested capacity, then no action is taken.

Arguments
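The contrast between reserve and reserve_at_least resembles `String::reserve_exact` versus `String::reserve` in present-day Rust. The sketch below is only an analogy: the modern methods take the number of additional bytes beyond the current length (the two readings coincide for an empty string), and modern strings carry no null terminator. The helper names are hypothetical.

```rust
/// Builds an empty string whose buffer can hold at least `n` bytes,
/// without deliberately over-allocating.
fn reserved_exact(n: usize) -> String {
    let mut s = String::new();
    s.reserve_exact(n);
    s
}

/// Same, but lets the standard library over-allocate to amortize growth.
fn reserved_at_least(n: usize) -> String {
    let mut s = String::new();
    s.reserve(n);
    s
}
```

In both cases, asking for less than the existing capacity leaves the buffer untouched.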

Function rfind

pure fn rfind(s: str/& , f: fn(char) -> bool) -> option<uint>

Returns the byte index of the last character that satisfies the given predicate

Arguments

Return value

An option containing the byte index of the last matching character or none if there is no match

Function rfind_between

pure fn rfind_between(s: str/& , start: uint, end: uint, f: fn(char) -> bool)
        -> option<uint>

Returns the byte index of the last character that satisfies the given predicate, within a given range

Arguments

Return value

An option containing the byte index of the last matching character or none if there is no match

Failure

end must be less than or equal to start and start must be less than or equal to len(s). start must be the index of a character boundary, as defined by is_char_boundary

Function rfind_char

pure fn rfind_char(s: str/& , c: char) -> option<uint>

Returns the byte index of the last matching character

Arguments

Return value

An option containing the byte index of the last matching character or none if there is no match

Function rfind_char_between

pure fn rfind_char_between(s: str/& , c: char, start: uint, end: uint) ->
        option<uint>

Returns the byte index of the last matching character within a given range

Arguments

Return value

An option containing the byte index of the last matching character or none if there is no match

Failure

end must be less than or equal to start and start must be less than or equal to len(s). start must be the index of a character boundary, as defined by is_char_boundary.

Function rfind_char_from

pure fn rfind_char_from(s: str/& , c: char, start: uint) -> option<uint>

Returns the byte index of the last matching character beginning from a given byte offset

Arguments

Return value

An option containing the byte index of the last matching character or none if there is no match

Failure

start must be less than or equal to len(s). start must be the index of a character boundary, as defined by is_char_boundary.

Function rfind_from

pure fn rfind_from(s: str/& , start: uint, f: fn(char) -> bool) ->
        option<uint>

Returns the byte index of the last character that satisfies the given predicate, beginning from a given byte offset

Arguments

Return value

An option containing the byte index of the last matching character or none if there is no match

Failure

start must be less than or equal to len(s). start must be the index of a character boundary, as defined by is_char_boundary.

Function shift_char

fn shift_char(&s: str) -> char

Remove the first character from a string and return it

Failure

If the string does not contain any characters

Function slice

pure fn slice(s: str/& , begin: uint, end: uint) -> str

Returns a slice of the given string from the byte range [begin..end)

Fails when begin or end does not point to a character boundary, or points beyond the last character of the string
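The failure conditions can be explored safely with a checked variant. This sketch uses `str::get` from present-day Rust and returns None where the function documented here would fail; `checked_slice` is a hypothetical name.

```rust
/// Returns the substring in the byte range [begin, end), or None when the
/// range is out of bounds or does not fall on character boundaries.
fn checked_slice(s: &str, begin: usize, end: usize) -> Option<&str> {
    s.get(begin..end)
}
```

With a two-byte character at byte offsets 1 and 2, a range ending at offset 2 is rejected, while a range ending at offset 3 succeeds.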

Function split

pure fn split(s: str/& , sepfn: fn(char) -> bool) -> ~[str]

Splits a string into substrings using a character function

Function split_char

pure fn split_char(s: str/& , sep: char) -> ~[str]

Splits a string into substrings at each occurrence of a given character

Function split_char_iter

pure fn split_char_iter(ss: str/& , cc: char, ff: fn(&&str))

Apply a function to each substring after splitting by character

Function split_char_nonempty

pure fn split_char_nonempty(s: str/& , sep: char) -> ~[str]

Like split_char, but omits empty strings from the returned vector

Function split_nonempty

pure fn split_nonempty(s: str/& , sepfn: fn(char) -> bool) -> ~[str]

Like split, but omits empty strings from the returned vector

Function split_str

pure fn split_str(s: str/&a , sep: str/&b ) -> ~[str]

Splits a string into a vector of the substrings separated by a given string

Example

assert ["", "XXX", "YYY", ""] == split_str(".XXX.YYY.", ".")
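The empty strings at both ends of that result come from the leading and trailing separators. A sketch in present-day Rust reproduces the example and shows what a nonempty variant would drop; it borrows the documented names for readability but is not the documented implementation, and it assumes a non-empty separator.

```rust
/// Splits `s` at every occurrence of `sep`, keeping empty pieces.
fn split_str<'a>(s: &'a str, sep: &str) -> Vec<&'a str> {
    s.split(sep).collect()
}

/// Like `split_str`, but omits empty pieces from the result.
fn split_str_nonempty<'a>(s: &'a str, sep: &str) -> Vec<&'a str> {
    s.split(sep).filter(|piece| !piece.is_empty()).collect()
}
```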

Function split_str_nonempty

pure fn split_str_nonempty(s: str/&a , sep: str/&b ) -> ~[str]

Function splitn

pure fn splitn(s: str/& , sepfn: fn(char) -> bool, count: uint) -> ~[str]

Splits a string into substrings using a character function, cutting at most count times.

Function splitn_char

pure fn splitn_char(s: str/& , sep: char, count: uint) -> ~[str]

Splits a string into substrings at each occurrence of a given character up to 'count' times

The byte must be a valid UTF-8/ASCII byte

Function splitn_char_iter

pure fn splitn_char_iter(ss: str/& , sep: char, count: uint, ff: fn(&&str))

Apply a function to each substring after splitting by character, up to count times

Function starts_with

pure fn starts_with(haystack: str/&a , needle: str/&b ) -> bool

Returns true if one string starts with another

Arguments

Function substr

pure fn substr(s: str/& , begin: uint, n: uint) -> str

Take a substring of another.

Returns a string containing n characters starting at byte offset begin.

Function to_lower

pure fn to_lower(s: str/& ) -> str

Convert a string to lowercase. ASCII only

Function to_upper

pure fn to_upper(s: str/& ) -> str

Convert a string to uppercase. ASCII only

Function to_utf16

pure fn to_utf16(s: str/& ) -> ~[u16]

Converts to a vector of u16 encoded as UTF-16
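Characters outside the Basic Multilingual Plane become surrogate pairs in UTF-16, so the output can be longer than the character count. The sketch below shows the round trip in present-day Rust; returning an `Option` from `from_utf16` is a choice made for this illustration, not the documented signature.

```rust
/// Encodes `s` as UTF-16 code units.
fn to_utf16(s: &str) -> Vec<u16> {
    s.encode_utf16().collect()
}

/// Decodes UTF-16 code units, or returns None for invalid input
/// such as an unpaired surrogate.
fn from_utf16(v: &[u16]) -> Option<String> {
    String::from_utf16(v).ok()
}
```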

Function trim

pure fn trim(+s: str) -> str

Returns a string with leading and trailing whitespace removed

Function trim_left

pure fn trim_left(+s: str) -> str

Returns a string with leading whitespace removed

Function trim_right

pure fn trim_right(+s: str) -> str

Returns a string with trailing whitespace removed

Function unpack_slice

pure fn unpack_slice<T>(s: str/& , f: fn(*u8, uint) -> T) -> T

Work with the byte buffer and length of a slice.

The unpacked length is one byte longer than the 'official' indexable length of the string. This is to permit probing the byte past the indexable area for a null byte, as is the case in slices pointing to full strings, or suffixes of them.

Function unshift_char

fn unshift_char(&s: str, ch: char)

Prepend a char to a string

Function utf16_chars

pure fn utf16_chars(v: & [const u16], f: fn(char))

Function utf8_char_width

pure fn utf8_char_width(b: u8) -> uint

Given a first byte, determine how many bytes are in this UTF-8 character
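The width can be read directly from the leading bits of the first byte, as defined by UTF-8. The following is a sketch in present-day Rust; returning 0 for continuation bytes and other invalid lead bytes is an assumption made here, since the behaviour of the documented function in that case is not stated.

```rust
/// Number of bytes in the UTF-8 sequence that starts with byte `b`,
/// or 0 if `b` cannot start a valid sequence.
fn utf8_char_width(b: u8) -> usize {
    match b {
        0x00..=0x7F => 1, // ASCII
        0xC2..=0xDF => 2, // two-byte lead
        0xE0..=0xEF => 3, // three-byte lead
        0xF0..=0xF4 => 4, // four-byte lead
        _ => 0,           // continuation byte or invalid lead
    }
}
```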

Function words

pure fn words(s: str/& ) -> ~[str]

Splits a string into a vector of the substrings separated by whitespace

Function words_iter

pure fn words_iter(ss: str/& , ff: fn(&&str))

Apply a function to each word