Module std::str
Unicode string manipulation (str type)
Basic Usage
Rust's string type is one of the core primitive types of the language. While
represented by the name str, the name str is not actually a valid type in
Rust. Each string must also be decorated with its ownership. This means that
there are two common kinds of strings in Rust:
~str - This is an owned string. This type obeys all of the normal semantics of the ~T types, meaning that it has one, and only one, owner. This type cannot be implicitly copied, and is moved out of when passed to other functions.
&str - This is the borrowed string type. Other than string literals, this type of string is normally created by borrowing from the other kind of string. As the name "borrowed" implies, this type of string is owned elsewhere, and this string cannot be moved out of.
As an example, here are a few different kinds of strings.
    fn main() {
        let owned_string = ~"I am an owned string";
        let borrowed_string1 = "This string is borrowed with the 'static lifetime";
        let borrowed_string2: &str = owned_string; // owned strings can be borrowed
    }
From the example above, you can see that Rust has two different kinds of string literals. The owned literals correspond to the owned string types, but the "borrowed literal" is actually more akin to C's concept of a static string.
When a string is declared without a ~ sigil, then the string is allocated
statically in the rodata of the executable/library. The string then has the
type &'static str, meaning that the string is valid for the 'static
lifetime, otherwise known as the lifetime of the entire program. As can be
inferred from the type, these static strings are not mutable.
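The ownership rules in this section can be sketched in current Rust syntax. This is an illustration under a stated assumption: String stands in for the ~str of this page, and the function names (consume, inspect, greeting, demo) are hypothetical, not part of this module.

```rust
// Sketch in current Rust syntax; `String` stands in for the `~str` of this page.

/// Takes ownership: the caller's string is moved in and cannot be used afterwards.
fn consume(owned: String) -> usize {
    owned.len()
}

/// Borrows: the caller keeps ownership and can keep using the string.
fn inspect(borrowed: &str) -> usize {
    borrowed.len()
}

/// A literal lives for the whole program, so it can be returned as `&'static str`.
fn greeting() -> &'static str {
    "I am borrowed with the 'static lifetime"
}

/// Hypothetical driver showing the three cases together.
fn demo() -> (usize, usize, usize) {
    let owned_string = String::from("I am an owned string");
    let a = inspect(&owned_string); // borrow: `owned_string` is still usable
    let b = inspect(greeting());    // a static string is already borrowed
    let c = consume(owned_string);  // move: `owned_string` is gone after this line
    (a, b, c)
}
```

Using owned_string after the call to consume would be rejected at compile time, which is the "moved out of" behavior described above.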
Mutability
Many languages have immutable strings by default, and Rust has its own take
on this idea. As with other Rust types, strings are immutable by default. If
a string is declared as mut, however, it may be mutated. This works the same
way as the rest of Rust's type system: if there is a mutable reference to a
string, it is the only mutable reference to that string. With these
guarantees, strings can easily transition between being mutable and immutable
while keeping the benefits that mutable strings have in other languages.
    let mut buf = ~"testing";
    buf.push_char(' ');
    buf.push_str("123");
    assert_eq!(buf, ~"testing 123");
Representation
Rust's string type, str, is a sequence of Unicode code points encoded as a
stream of UTF-8 bytes. All safely-created strings are guaranteed to be validly
encoded UTF-8 sequences. Additionally, strings are not null-terminated
and can contain null code points.
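A small sketch of what this representation implies, written in current Rust syntax (an assumption; method names in the Rust version this page documents may differ). The helper names byte_len and code_point_count are hypothetical.

```rust
/// Length in bytes of the string's UTF-8 encoding.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode code points in the string.
fn code_point_count(s: &str) -> usize {
    s.chars().count()
}
```

For "h\u{e9}llo" the two helpers disagree (6 bytes, 5 code points) because U+00E9 takes two bytes in UTF-8. For "a\0b" both return 3: the embedded null is an ordinary code point and does not end the string.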
The actual representation of strings has a direct mapping to vectors:
~str is the same as ~[u8]
&str is the same as &[u8]
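The mapping to byte vectors can be observed directly. The sketch below uses current Rust syntax, where Vec<u8> and String play the roles of ~[u8] and ~str (an assumption of this example); utf8_bytes and into_utf8_bytes are hypothetical helper names.

```rust
/// Views a string slice as the bytes of its UTF-8 encoding; no copy is made.
fn utf8_bytes(s: &str) -> &[u8] {
    s.as_bytes()
}

/// Consumes an owned string and returns its underlying byte vector.
fn into_utf8_bytes(s: String) -> Vec<u8> {
    s.into_bytes()
}
```

Going in this direction is always safe. The reverse direction needs a validity check, because not every byte sequence is valid UTF-8.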
Modules
raw | Unsafe operations |
Structs
CharOffsets | External iterator for a string's characters and their byte offsets. Use with the |
CharRange | Struct that contains a |
CharSplits | An iterator over the substrings of a string, separated by |
CharSplitsN | An iterator over the substrings of a string, separated by |
Chars | External iterator for a string's characters. Use with the |
MatchIndices | An iterator over the start and end indices of the matches of a substring within a larger string |
Normalizations | External iterator for a string's normalization's characters. Use with the |
StrSplits | An iterator over the substrings of a string separated by a given search string |
UTF16Items | An iterator that decodes UTF-16 encoded codepoints from a vector of |
Enums
MaybeOwned | A MaybeOwned is a string that can hold either a ~str or a &str. This can be useful as an optimization when an allocation is sometimes needed but not always. |
UTF16Item | The possibilities for values decoded from a |
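The "allocate only when needed" optimization described for MaybeOwned can be sketched with std::borrow::Cow from current Rust. This is a stand-in chosen for illustration, not the MaybeOwned API itself, and expand_tabs is a hypothetical helper.

```rust
use std::borrow::Cow;

/// Hypothetical helper: replaces tabs with spaces, allocating a new string
/// only when the input actually contains a tab.
fn expand_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        // Something has to change, so an owned copy is built.
        Cow::Owned(input.replace('\t', " "))
    } else {
        // Nothing to change: hand back the borrowed input untouched.
        Cow::Borrowed(input)
    }
}
```

Callers can treat the result as a string slice in both cases and only pay for an allocation on inputs that need rewriting.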
Traits
CharEq | Something that can be used to compare against a character |
IntoMaybeOwned | Trait for moving into a |
OwnedStr | Methods for owned strings |
Str | Any string that can be represented as a slice |
StrSlice | Methods for string slices |
StrVector | Methods for vectors of strings |
Functions
eq | Bytewise string equality |
eq_slice | Bytewise slice equality |
from_byte | Convert a byte to a UTF-8 string |
from_char | Convert a char to a string |
from_chars | Convert a vector of chars to a string |
from_utf16 | Decode a UTF-16 encoded vector |
from_utf16_lossy | Decode a UTF-16 encoded vector |
from_utf8 | Converts a vector to a string slice without performing any allocations. |
from_utf8_lossy | Converts a vector of bytes to a new UTF-8 string. Any invalid UTF-8 sequences are replaced with U+FFFD REPLACEMENT CHARACTER. |
from_utf8_owned | Consumes a vector of bytes to create a new UTF-8 string. Returns None if the vector contains invalid UTF-8. |
is_utf16 | Determines if a vector of |
is_utf8 | Determines if a vector of bytes contains valid UTF-8. |
replace | Replace all occurrences of one string with another |
truncate_utf16_at_nul | Return a slice of |
utf16_items | Create an iterator over the UTF-16 encoded codepoints in |
utf8_char_width | Given a first byte, determine how many bytes are in this UTF-8 character |
with_capacity | Allocates a new string with the specified capacity. The string returned is the empty string, but has capacity for much more. |
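The validating and lossy conversions listed above have rough counterparts in current Rust, sketched below. The signatures differ from the functions in this table (an assumption of this example: std::str::from_utf8, String::from_utf8, and String::from_utf8_lossy from today's standard library are used), and the wrapper names are hypothetical.

```rust
use std::borrow::Cow;

/// Rough counterpart of `is_utf8`: true if the bytes are valid UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

/// Rough counterpart of `from_utf8_owned`: consumes the bytes and
/// returns None if they are not valid UTF-8.
fn owned_from_utf8(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

/// Rough counterpart of `from_utf8_lossy`: invalid sequences are
/// replaced with U+FFFD REPLACEMENT CHARACTER.
fn lossy_from_utf8(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}
```

The strict variants suit input that must be rejected when malformed; the lossy variant suits display or logging, where a best-effort string is preferable to a failure.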
Type Definitions
AnyLines | An iterator over the lines of a string, separated by either |
Bytes | External iterator for a string's bytes. Use with the |
RevBytes | External iterator for a string's bytes in reverse order. Use with the |
RevCharOffsets | External iterator for a string's characters and their byte offsets in reverse order. Use with the |
RevCharSplits | An iterator over the substrings of a string, separated by |
RevChars | External iterator for a string's characters in reverse order. Use with the |
SendStr | SendStr is a specialization of |
Words | An iterator over the words of a string, separated by a sequence of whitespace |