Module std::str
Unicode string manipulation (str type)
Basic Usage
Rust's string type is one of the core primitive types of the language. While
represented by the name str, the name str is not actually a valid type in
Rust. Each string must also be decorated with its ownership. This means that
there are three common kinds of strings in Rust:
~str - This is an owned string. This type obeys all of the normal semantics of the ~T types, meaning that it has one, and only one, owner. This type cannot be implicitly copied, and is moved out of when passed to other functions.
@str - This is a managed string. Similarly to @T, this type can be implicitly copied, and each implicit copy will increment the reference count to the string. This means that there is no "true owner" of the string, and the string will be deallocated when the reference count reaches 0.
&str - Finally, this is the borrowed string type. This type of string can only be created from one of the other two kinds of strings. As the name "borrowed" implies, this type of string is owned elsewhere, and this string cannot be moved out of.
As an example, here are a few different kinds of strings.
#[feature(managed_boxes)];
fn main() {
    let owned_string = ~"I am an owned string";
    let managed_string = @"This string is garbage-collected";
    let borrowed_string1 = "This string is borrowed with the 'static lifetime";
    let borrowed_string2: &str = owned_string;   // owned strings can be borrowed
    let borrowed_string3: &str = managed_string; // managed strings can also be borrowed
}
From the example above, you can see that Rust has three different kinds of string literals. The owned and managed literals correspond to the owned and managed string types, but the "borrowed literal" is actually more akin to C's concept of a static string.
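The three ownership kinds can also be sketched on a current Rust toolchain, where the ~ and @ sigils are no longer accepted. As an assumption made only for this illustration, String stands in for ~str and Rc<str> stands in for @str; the helper names take_owned and borrowed_len are hypothetical and do not come from this module.

```rust
use std::rc::Rc;

// Hypothetical helper: takes ownership of the string, so the caller's
// binding is moved out of and cannot be used afterwards.
fn take_owned(s: String) -> usize {
    s.len()
}

// Hypothetical helper: only borrows the string; ownership stays with the caller.
fn borrowed_len(s: &str) -> usize {
    s.len()
}

fn main() {
    // Assumption: `String` plays the role of the owned `~str`.
    let owned = String::from("I am an owned string");
    assert_eq!(borrowed_len(&owned), 20); // borrowing leaves `owned` usable
    assert_eq!(take_owned(owned), 20); // `owned` is moved here

    // Assumption: `Rc<str>` plays the role of the managed `@str`.
    let managed: Rc<str> = Rc::from("shared");
    let copy = Rc::clone(&managed); // explicit here, implicit for `@str`
    assert_eq!(Rc::strong_count(&managed), 2);
    drop(copy);
    assert_eq!(Rc::strong_count(&managed), 1);

    // A borrowed string can be created from either of the other kinds.
    let borrowed: &str = &managed;
    assert_eq!(borrowed, "shared");
}
```

Using `owned` after the call to take_owned would be rejected at compile time, which is the "moved out of" behaviour described for owned strings.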
When a string is declared without a ~ or @ sigil, then the string is
allocated statically in the rodata of the executable/library. The string then
has the type &'static str, meaning that the string is valid for the 'static
lifetime, otherwise known as the lifetime of the entire program. As can be
inferred from the type, these static strings are not mutable.
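The static-lifetime behaviour of unsigiled literals can be illustrated with a small sketch for a current toolchain. The function name greeting is hypothetical; the point is only that a literal may be returned from a function because it is valid for the whole program, and that changing its text requires making a separate mutable copy.

```rust
// Hypothetical function: the literal has type `&'static str`, so it can be
// returned without borrowing from any local variable.
fn greeting() -> &'static str {
    "This string is borrowed with the 'static lifetime"
}

fn main() {
    let s: &'static str = greeting();
    // The literal outlives the call that produced it.
    assert!(s.starts_with("This string"));

    // A literal is only handed out as a shared `&str`, so the text itself
    // is never modified; a heap-allocated copy is mutated instead.
    let mut copy = s.to_string();
    copy.push('!');
    assert!(copy.ends_with("lifetime!"));
    assert!(!s.ends_with('!'));
}
```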
Mutability
Many languages have immutable strings by default, and Rust has its own take on
this idea. As with the rest of Rust's types, strings are immutable by
default. If a string is declared as mut, however, it may be mutated. This
works the same way as the rest of Rust's type system in the sense that if
there's a mutable reference to a string, there may only be one mutable reference
to that string. With these guarantees, strings can easily transition between
being mutable and immutable, with the same benefits of having mutable strings in
other languages.
let mut buf = ~"testing";
buf.push_char(' ');
buf.push_str("123");
assert_eq!(buf, ~"testing 123");
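For readers on a current toolchain, the same idea can be sketched as follows. This assumes String in place of ~str and String::push in place of push_char; the helpers append_suffix and build are hypothetical names used only here.

```rust
// Hypothetical helper: mutates the string through an exclusive reference.
fn append_suffix(buf: &mut String) {
    buf.push(' ');
    buf.push_str("123");
}

// Hypothetical helper: the string is mutable while it is being built and is
// handed back by value once it is complete.
fn build() -> String {
    let mut buf = String::from("testing");
    append_suffix(&mut buf); // only one `&mut` borrow may exist at a time
    buf
}

fn main() {
    // Binding the result without `mut` makes the finished string immutable.
    let done = build();
    assert_eq!(done, "testing 123");
}
```

Calling `done.push('x')` in main would not compile, since `done` is not declared as mut. This is the transition from mutable to immutable that the paragraph above describes.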
Representation
Rust's string type, str, is a sequence of Unicode codepoints encoded as a
stream of UTF-8 bytes. All safely-created strings are guaranteed to be validly
encoded UTF-8 sequences. Additionally, strings are not null-terminated
and can contain null codepoints.
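A short sketch of what these guarantees look like in practice on a current toolchain. The helper byte_and_char_len is a hypothetical name; std::str::from_utf8 is the present-day validating conversion and may differ in signature from the from_utf8 listed on this page.

```rust
// Hypothetical helper: returns (length in bytes, number of codepoints).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // U+00E9 takes two bytes in UTF-8, so the two counts differ.
    assert_eq!(byte_and_char_len("h\u{e9}llo"), (6, 5));

    // A null codepoint is ordinary data, not a terminator.
    assert_eq!(byte_and_char_len("a\0b"), (3, 3));

    // Safe construction validates the encoding: 0xFF never occurs in UTF-8.
    assert!(std::str::from_utf8(&[0x68, 0x69]).is_ok());
    assert!(std::str::from_utf8(&[0xFF]).is_err());
}
```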
The actual representation of strings has direct mappings to vectors:
~str is the same as ~[u8]
&str is the same as &[u8]
@str is the same as @[u8]
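The correspondence between strings and byte vectors can be sketched on a current toolchain as follows. This assumes String for ~str and Vec<u8> for ~[u8]; to_bytes and from_bytes are hypothetical helper names.

```rust
// Hypothetical helper: owned string to owned byte vector, reusing the buffer.
fn to_bytes(s: String) -> Vec<u8> {
    s.into_bytes()
}

// Hypothetical helper: owned byte vector to owned string, or None when the
// bytes are not valid UTF-8.
fn from_bytes(v: Vec<u8>) -> Option<String> {
    String::from_utf8(v).ok()
}

fn main() {
    let owned = String::from("hi!");

    // Borrowed view of the same bytes, no copy: `&str` viewed as `&[u8]`.
    let view: &[u8] = owned.as_bytes();
    assert_eq!(view, &b"hi!"[..]);

    // Owned conversion in both directions.
    let raw = to_bytes(owned);
    assert_eq!(from_bytes(raw), Some(String::from("hi!")));

    // The reverse direction is checked, since not every byte vector is UTF-8.
    assert_eq!(from_bytes(vec![0xFF]), None);
}
```

The string-to-bytes direction needs no check, while the bytes-to-string direction does, because only strings carry the UTF-8 guarantee.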
Modules
not_utf8 | |
raw | Unsafe operations |
Structs
CharIterator | External iterator for a string's characters. Use with the |
CharOffsetIterator | External iterator for a string's characters and their byte offsets. Use with the |
CharRange | Struct that contains a |
CharSplitIterator | An iterator over the substrings of a string, separated by |
CharSplitNIterator | An iterator over the substrings of a string, separated by |
MatchesIndexIterator | An iterator over the start and end indices of the matches of a substring within a larger string |
StrSplitIterator | An iterator over the substrings of a string separated by a given search string |
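The iterator types above have no direct counterparts by name in current Rust, but the same kinds of traversal can be sketched with present-day str methods. Which method corresponds to which struct is an assumption of this sketch, and char_offsets and match_starts are hypothetical helper names. Note that the current match_indices yields a start index together with the matched slice, not a start/end pair.

```rust
// Hypothetical helper: characters paired with their byte offsets.
fn char_offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}

// Hypothetical helper: byte offsets at which `needle` starts within `haystack`.
fn match_starts(haystack: &str, needle: &str) -> Vec<usize> {
    haystack.match_indices(needle).map(|(start, _)| start).collect()
}

fn main() {
    // U+00E9 occupies bytes 1 and 2, so the next character starts at byte 3.
    let expected: Vec<(usize, char)> = vec![(0, 'a'), (1, '\u{e9}'), (3, 'b')];
    assert_eq!(char_offsets("a\u{e9}b"), expected);

    // Characters only.
    let chars: Vec<char> = "abc".chars().collect();
    assert_eq!(chars, vec!['a', 'b', 'c']);

    // Substrings separated by a character.
    let parts: Vec<&str> = "k=v=w".split('=').collect();
    assert_eq!(parts, vec!["k", "v", "w"]);

    // Substrings separated by a search string.
    let items: Vec<&str> = "a, b, c".split(", ").collect();
    assert_eq!(items, vec!["a", "b", "c"]);

    // Positions of a substring within a larger string.
    let starts: Vec<usize> = vec![1, 4];
    assert_eq!(match_starts("abcabc", "bc"), starts);
}
```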
Traits
CharEq | Something that can be used to compare against a character |
OwnedStr | Methods for owned strings |
Str | Any string that can be represented as a slice |
StrSlice | Methods for string slices |
StrVector | Methods for vectors of strings |
Functions
eq | Bytewise string equality |
eq_slice | Bytewise slice equality |
from_byte | Convert a byte to a UTF-8 string |
from_char | Convert a char to a string |
from_chars | Convert a vector of chars to a string |
from_utf16 | Allocates a new string from the utf-16 slice provided |
from_utf8 | Converts a vector to a string slice without performing any allocations. |
from_utf8_opt | Converts a vector to a string slice without performing any allocations. |
from_utf8_owned | Consumes a vector of bytes to create a new utf-8 string |
from_utf8_owned_opt | Consumes a vector of bytes to create a new utf-8 string. Returns None if the vector contains invalid UTF-8. |
is_utf16 | Determines if a vector of |
is_utf8 | Determines if a vector of bytes contains valid UTF-8 |
replace | Replace all occurrences of one string with another |
utf16_chars | Iterates over the utf-16 characters in the specified slice, yielding each decoded unicode character to the function provided. |
utf8_char_width | Given a first byte, determine how many bytes are in this UTF-8 character |
with_capacity | Allocates a new string with the specified capacity. The string returned is the empty string, but has capacity for much more. |
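To make the idea behind utf8_char_width concrete, here is a minimal sketch for a current toolchain. It is a hypothetical re-implementation written for illustration, not this module's code: the name char_width_from_first_byte and the choice to return 0 for bytes that cannot start a sequence are assumptions of the sketch.

```rust
// Hypothetical re-implementation: number of bytes in the UTF-8 sequence that
// begins with `first`, or 0 if `first` cannot begin a well-formed sequence.
fn char_width_from_first_byte(first: u8) -> usize {
    match first {
        0x00..=0x7F => 1, // ASCII
        0xC2..=0xDF => 2, // 0xC0 and 0xC1 could only start overlong forms
        0xE0..=0xEF => 3,
        0xF0..=0xF4 => 4, // lead bytes above 0xF4 would exceed U+10FFFF
        _ => 0,           // continuation byte or invalid lead byte
    }
}

fn main() {
    // Cross-check against the encoder for one character of each width.
    for &c in &['a', '\u{e9}', '\u{20ac}', '\u{1f600}'] {
        let mut buf = [0u8; 4];
        let encoded = c.encode_utf8(&mut buf);
        let first = encoded.as_bytes()[0];
        assert_eq!(char_width_from_first_byte(first), c.len_utf8());
    }
    // A continuation byte cannot start a character.
    assert_eq!(char_width_from_first_byte(0x80), 0);
}
```

Because the width is determined by the first byte alone, a decoder can skip from one character to the next without inspecting the continuation bytes.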
Type Definitions
AnyLineIterator | An iterator over the lines of a string, separated by either |
ByteIterator | External iterator for a string's bytes. Use with the |
ByteRevIterator | External iterator for a string's bytes in reverse order. Use with the |
CharOffsetRevIterator | External iterator for a string's characters and their byte offsets in reverse order. Use with the |
CharRSplitIterator | An iterator over the substrings of a string, separated by |
CharRevIterator | External iterator for a string's characters in reverse order. Use with the |
WordIterator | An iterator over the words of a string, separated by a sequence of whitespace |