Module std::str

Unicode string manipulation (str type)

Basic Usage

Rust's string type is one of the core primitive types of the language. Although it goes by the name str, str on its own is not a valid type in Rust: each string must also be decorated with its ownership. This means that there are two common kinds of strings in Rust: ~str, an owned string, and &str, a borrowed string.

As an example, here are a few different kinds of strings.

fn main() {
    let owned_string = ~"I am an owned string";
    let borrowed_string1 = "This string is borrowed with the 'static lifetime";
    let borrowed_string2: &str = owned_string;   // owned strings can be borrowed
}

From the example above, you can see that Rust has two different kinds of string literals. The owned literals correspond to the owned string types, but the "borrowed literal" is actually more akin to C's concept of a static string.

When a string literal is written without the ~ sigil, the string is allocated statically in the rodata section of the executable or library. The string then has the type &'static str, meaning that it is valid for the 'static lifetime, otherwise known as the lifetime of the entire program. As can be inferred from the type, these static strings are not mutable.
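For readers on a current toolchain, where the ~ sigil no longer exists, the same distinction can be sketched roughly as follows. This assumes that `String` plays the role of ~str, which is a mapping chosen for illustration and not something this page states; the function names are hypothetical.

```rust
/// Returns a string literal. The literal lives in the binary's read-only
/// data, so the reference is valid for the 'static lifetime.
fn static_greeting() -> &'static str {
    "This string is borrowed with the 'static lifetime"
}

/// Builds a heap-allocated string with exactly one owner.
/// Assumption: `String` stands in for the `~str` used on this page.
fn owned_greeting() -> String {
    String::from("I am an owned string")
}

/// Accepts any borrowed string, whether it points into static data
/// or into a buffer that is owned elsewhere.
fn byte_len(s: &str) -> usize {
    s.len()
}
```

Calling `byte_len(&owned_greeting())` borrows the owned string for the duration of the call, while `byte_len(static_greeting())` passes a reference that never expires.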

Mutability

Many languages have immutable strings by default, and Rust has its own take on this idea. As with the rest of Rust's types, strings are immutable by default. If a string is declared as mut, however, it may be mutated. This works the same way as the rest of Rust's type system: if there is a mutable reference to a string, there may be only one mutable reference to that string. With these guarantees, strings can easily transition between being mutable and immutable, with the same benefits as mutable strings in other languages.

let mut buf = ~"testing";
buf.push_char(' ');
buf.push_str("123");
assert_eq!(buf, ~"testing 123");
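The transition between mutable and immutable described above can be sketched in present-day Rust. This is an illustration under the assumption that `String`, `push` and `push_str` are the modern counterparts of ~str, push_char and push_str; the helper names are hypothetical.

```rust
/// Appends to a string through an exclusive (`&mut`) borrow.
/// While this borrow is live, no other reference to the string may exist.
fn append_suffix(buf: &mut String) {
    buf.push(' ');
    buf.push_str("123");
}

/// Builds a string mutably, then hands it back as an immutable binding.
fn build() -> String {
    let mut buf = String::from("testing");
    append_suffix(&mut buf); // the exclusive borrow ends when the call returns
    let frozen = buf; // rebinding without `mut` makes the string immutable again
    frozen
}
```

Here `build()` yields "testing 123", matching the example above.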

Representation

Rust's string type, str, is a sequence of Unicode codepoints encoded as a stream of UTF-8 bytes. All safely-created strings are guaranteed to be validly encoded UTF-8 sequences. Additionally, strings are not null-terminated and can contain null codepoints.

The actual representation of strings maps directly onto vectors: a ~str has the same representation as a ~[u8], and a &str the same as a &[u8].
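A small sketch of what this representation implies, written for a current toolchain. It assumes `String` and `Vec<u8>` as stand-ins for ~str and ~[u8]; the helper names are hypothetical.

```rust
/// Returns (length in bytes, length in codepoints) for a string.
/// The two differ whenever a codepoint needs more than one UTF-8 byte.
fn lengths(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Borrowed view of the UTF-8 bytes behind a string slice; nothing is copied.
fn borrowed_bytes(s: &str) -> &[u8] {
    s.as_bytes()
}

/// Consumes an owned string and returns its buffer as an owned byte vector.
/// Assumption: `String`/`Vec<u8>` stand in for `~str`/`~[u8]`.
fn owned_bytes(s: String) -> Vec<u8> {
    s.into_bytes()
}
```

For instance, `lengths("h\u{e9}llo")` reports 6 bytes but 5 codepoints, and a string such as "a\0b" keeps its embedded null as an ordinary byte, so its length is 3.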

raw

Unsafe operations

CharOffsets

External iterator for a string's characters and their byte offsets. Use with the std::iter module.

CharRange

Struct that contains a char and the index of the first byte of the next char in a string. This can be used as a data structure for iterating over the UTF-8 bytes of a string.
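The idea behind CharRange, a character paired with the byte index where the next character begins, can be sketched in present-day Rust. This does not use CharRange itself; it assumes the modern `char_indices` and `len_utf8` methods, and the function name is hypothetical.

```rust
/// For each character, yields the character together with the byte index
/// at which the next character starts.
fn char_ranges(s: &str) -> Vec<(char, usize)> {
    s.char_indices()
        .map(|(start, ch)| (ch, start + ch.len_utf8()))
        .collect()
}
```

The final pair's index always equals the byte length of the string, which is what makes the value usable as a loop cursor.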

CharSplits

An iterator over the substrings of a string, separated by sep.

CharSplitsN

An iterator over the substrings of a string, separated by sep, splitting at most count times.

Chars

External iterator for a string's characters. Use with the std::iter module.

MatchIndices

An iterator over the start and end indices of the matches of a substring within a larger string

Normalizations

External iterator for the characters of a string's normalization. Use with the std::iter module.

StrSplits

An iterator over the substrings of a string separated by a given search string

UTF16Items

An iterator that decodes UTF-16 encoded codepoints from a vector of u16s.

MaybeOwned

A MaybeOwned is a string that can hold either a ~str or a &str. This can be useful as an optimization when an allocation is sometimes needed but not always.
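The "allocate only when needed" optimization can be sketched on a current toolchain with `std::borrow::Cow<str>`. Treating `Cow<str>` as a counterpart of MaybeOwned is an assumption made for illustration, and the function name is hypothetical.

```rust
use std::borrow::Cow;

/// Replaces spaces with underscores, allocating a new string only when
/// the input actually contains a space.
/// Assumption: `Cow<str>` stands in for the idea of `MaybeOwned`.
fn underscore_spaces(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        Cow::Owned(input.replace(' ', "_")) // allocation needed
    } else {
        Cow::Borrowed(input) // no allocation: reuse the caller's slice
    }
}
```

Callers can treat the result as a plain string slice in either case, so the allocation decision stays hidden inside the function.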

UTF16Item

The possibilities for values decoded from a u16 stream.

CharEq

Something that can be used to compare against a character

IntoMaybeOwned

Trait for moving into a MaybeOwned

OwnedStr

Methods for owned strings

Str

Any string that can be represented as a slice

StrSlice

Methods for string slices

StrVector

Methods for vectors of strings

eq

Bytewise string equality

eq_slice

Bytewise slice equality

from_byte

Convert a byte to a UTF-8 string

from_char

Convert a char to a string

from_chars

Convert a vector of chars to a string

from_utf16

Decode a UTF-16 encoded vector v into a string, returning None if v contains any invalid data.

from_utf16_lossy

Decode a UTF-16 encoded vector v into a string, replacing invalid data with the replacement character (U+FFFD).
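The difference between the strict and lossy decoders can be sketched as follows. The sketch assumes the present-day `String::from_utf16` and `String::from_utf16_lossy` rather than the free functions listed here; the wrapper names are hypothetical.

```rust
/// Strict decoding: `None` if `v` contains any invalid data,
/// such as a lone surrogate.
fn decode_strict(v: &[u16]) -> Option<String> {
    String::from_utf16(v).ok()
}

/// Lossy decoding: each piece of invalid data becomes U+FFFD.
fn decode_lossy(v: &[u16]) -> String {
    String::from_utf16_lossy(v)
}
```

Given the units `[0x0068, 0xD800, 0x0069]`, where 0xD800 is an unpaired surrogate, the strict variant returns `None` while the lossy variant keeps the valid characters on either side of a replacement character.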

from_utf8

Converts a vector to a string slice without performing any allocations.

from_utf8_lossy

Converts a vector of bytes to a new UTF-8 string. Any invalid UTF-8 sequences are replaced with U+FFFD REPLACEMENT CHARACTER.

from_utf8_owned

Consumes a vector of bytes to create a new UTF-8 string. Returns None if the vector contains invalid UTF-8.
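The three byte-to-string conversions above differ in who owns the result and how invalid input is handled. A sketch on a current toolchain, assuming `std::str::from_utf8`, `String::from_utf8_lossy` and `String::from_utf8` as modern counterparts; the wrapper names are hypothetical.

```rust
/// Borrowing conversion: validates the bytes and reinterprets them in place,
/// without allocating.
fn view(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// Lossy conversion: invalid sequences are replaced with U+FFFD.
/// This wrapper always returns an owned copy to keep the sketch simple.
fn lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Consuming conversion: takes over the vector's buffer if it is valid UTF-8.
fn take(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}
```

The borrowing form suits read-only inspection of a buffer, the consuming form avoids a copy when the bytes are no longer needed as a vector, and the lossy form is for input that must be displayed even if it is damaged.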

is_utf16

Determines if a vector of u16 contains valid UTF-16

is_utf8

Determines if a vector of bytes contains valid UTF-8.

replace

Replace all occurrences of one string with another

truncate_utf16_at_nul

Return a slice of v ending at (and not including) the first NUL (0).
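A minimal sketch of this truncation, written from the description above and not taken from the library source. It assumes that a vector with no NUL is returned whole; the function name is hypothetical.

```rust
/// Returns the prefix of `v` that ends just before the first NUL (0).
/// Assumption: if `v` contains no NUL, the whole slice is returned.
fn truncate_at_nul(v: &[u16]) -> &[u16] {
    match v.iter().position(|&unit| unit == 0) {
        Some(i) => &v[..i], // stop before the NUL, excluding it
        None => v,
    }
}
```

This is handy for fixed-size UTF-16 buffers filled by a foreign API, where everything after the terminator is padding.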

utf16_items

Create an iterator over the UTF-16 encoded codepoints in v, returning invalid surrogates as LoneSurrogates.

utf8_char_width

Given a first byte, determine how many bytes are in this UTF-8 character
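How the width follows from the first byte can be sketched with a plain match on the lead-byte ranges of well-formed UTF-8. This is an independent illustration, not the library's implementation; returning 0 for bytes that cannot start a sequence is an assumption, and the function name is hypothetical.

```rust
/// Number of bytes in the UTF-8 sequence that starts with `first`.
/// Assumption: 0 is returned when `first` cannot start a sequence.
fn char_width_from_first_byte(first: u8) -> usize {
    match first {
        0x00..=0x7F => 1, // ASCII
        0xC2..=0xDF => 2, // lead byte of a two-byte sequence
        0xE0..=0xEF => 3, // lead byte of a three-byte sequence
        0xF0..=0xF4 => 4, // lead byte of a four-byte sequence
        _ => 0, // continuation byte (0x80..=0xBF) or a byte never valid as a lead
    }
}
```

For example, the first byte of U+20AC (0xE2) yields 3, which matches the encoded length of that character.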

with_capacity

Allocates a new string with the specified capacity. The string returned is the empty string, but has capacity for much more.

AnyLines

An iterator over the lines of a string, separated by either \n or \r\n.

Bytes

External iterator for a string's bytes. Use with the std::iter module.

RevBytes

External iterator for a string's bytes in reverse order. Use with the std::iter module.

RevCharOffsets

External iterator for a string's characters and their byte offsets in reverse order. Use with the std::iter module.

RevCharSplits

An iterator over the substrings of a string, separated by sep, starting from the back of the string.

RevChars

External iterator for a string's characters in reverse order. Use with the std::iter module.

SendStr

SendStr is a specialization of MaybeOwned that is sendable.

Words

An iterator over the words of a string, separated by a sequence of whitespace
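Splitting on runs of whitespace, as the Words iterator does, can be sketched on a current toolchain. The sketch assumes the modern `split_whitespace` method as a counterpart and does not use the Words type itself; the function name is hypothetical.

```rust
/// Collects the words of a string, treating any run of whitespace
/// (spaces, tabs, newlines) as a single separator.
fn words(s: &str) -> Vec<&str> {
    s.split_whitespace().collect()
}
```

Leading and trailing whitespace produce no empty words, so the result contains only non-empty substrings borrowed from the input.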