If you want a quick run-down of the string types in Windows and MSVC++, I’ve put together the following snippet of code. Its comments explain what the string type names that MSVC++ exposes to developers mean and how to program with them, and give a brief introduction to character encodings and their effect on this mysterious but wonderful game of Windows string types:
#include "stdafx.h"   // Precompiled header created by the Visual Studio project template.
#include <Windows.h>  // LPSTR, LPWSTR, LPTSTR, TEXT, ...
#include <tchar.h>    // _tmain, _TCHAR, _T

int _tmain(int argc, _TCHAR* argv[])
{
    /*
        Quick Tutorial on Strings in Microsoft Visual C++

        The Unicode Character Set and Multibyte Character Set options in MSVC++
        provide a project with two flavours of string encodings. They select
        which character type and encoding the generic (T) string types in your
        project use.

        Here are the two main character types in MSVC++ that you should be
        concerned about:

        1. char    <-- 8 bits (1 byte) wide, according to MSDN.
        2. wchar_t <-- 16 bits (2 bytes) wide in MSVC++, according to MSDN.

        From above, we can see that the size of each unit in our strings
        changes depending on our chosen character set.

        WARNING: Do NOT assume that any given character you append to either a
        Multibyte or a Unicode string will always take up exactly one char or
        one wchar_t! That is up to the encoding used. Sometimes several units
        have to be combined to produce the one character the user wants in
        their string.

        For example, a Multibyte string is stored as a sequence of bytes, but
        that does not mean the byte at a particular position is a complete
        character, because a multibyte character may take up more than a
        single byte. MSDN says it may take TWO bytes to produce a single
        multibyte-encoded character:

        "A multibyte-character string may contain a mixture of single-byte and
        double-byte characters. A two-byte multibyte character has a lead byte
        and a trail byte."

        WARNING: Do NOT assume that Unicode contains every character for every
        language. For more information, please see
        http://stackoverflow.com/questions/5290182/how-many-bytes-takes-one-unicode-character.

        Note: The ASCII Character Set is a subset of both the Multibyte and
        Unicode Character Sets (in other words, both of these flavours
        encompass ASCII characters).

        Note: You should always use Unicode for new development, according to
        MSDN. For more information, please see
        http://msdn.microsoft.com/en-us/library/ey142t48.aspx.
    */

    // Strings that are Multibyte.
    LPSTR a = NULL;    // Regular Multibyte string (synonymous with char *).
    LPCSTR b = NULL;   // Constant Multibyte string (synonymous with const char *).

    // Strings that are Unicode.
    LPWSTR c = NULL;   // Regular Unicode string (synonymous with wchar_t *).
    LPCWSTR d = NULL;  // Constant Unicode string (synonymous with const wchar_t *).

    // Strings that take on either Multibyte or Unicode depending on project settings.
    LPTSTR e = NULL;   // Multibyte or Unicode string (can be either char * or wchar_t *).
    LPCTSTR f = NULL;  // Constant Multibyte or Unicode string (can be either const char * or const wchar_t *).

    /*
        From above, it is safe to assume that the pattern is as follows:

        LP:  Specifies a long pointer type (a historical name; today this is
             simply synonymous with suffixing the type with a *).
        W:   Specifies that the type uses wide (Unicode) characters.
        C:   Specifies that the type is constant.
        T:   Specifies that the encoding depends on project settings.
        STR: Specifies that the type is a string type.
    */

    // String literal prefixes and macros. String literals are constant, so
    // they are assigned to the constant (C) string types here.
    f = _T("Example.");    // Multibyte or Unicode literal depending on project settings.
    f = TEXT("Example.");  // Multibyte or Unicode literal depending on project settings (same as _T).
    d = L"Example.";       // Unicode (wide) literal.
    b = "Example.";        // Multibyte (narrow) literal.

    return 0;
}
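To make the two warnings in the comments above concrete, here is a minimal sketch that counts storage units and actual characters (code points) separately. It rests on a few assumptions that are not part of the snippet above: `char16_t` stands in for MSVC++'s 16-bit `wchar_t` so the code compiles on any platform, UTF-8 stands in for a multibyte encoding (it is not one of the lead-byte/trail-byte code pages the MSDN quote describes, but it shows the same effect), the input is assumed to be well-formed, and all four function names are hypothetical helpers made up for this illustration.

```cpp
#include <cstddef>

// Hypothetical helper: number of 16-bit units in a null-terminated UTF-16 string.
constexpr std::size_t utf16_code_units(const char16_t* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// Hypothetical helper: number of code points in a well-formed UTF-16 string.
// A character outside the Basic Multilingual Plane is stored as a surrogate
// pair, so trailing (low) surrogates 0xDC00..0xDFFF are not counted.
constexpr std::size_t utf16_code_points(const char16_t* s) {
    std::size_t n = 0;
    for (; *s != 0; ++s) {
        if (*s < 0xDC00 || *s > 0xDFFF) ++n;
    }
    return n;
}

// Hypothetical helper: number of bytes in a null-terminated narrow string.
constexpr std::size_t utf8_bytes(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') ++n;
    return n;
}

// Hypothetical helper: number of code points in a well-formed UTF-8 string.
// Continuation bytes have the bit pattern 10xxxxxx and are not counted.
constexpr std::size_t utf8_code_points(const char* s) {
    std::size_t n = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For the wide string `u"A\U0001F600"`, `utf16_code_units` gives 3 while `utf16_code_points` gives 2, because the second character needs two 16-bit units. For the narrow string `"caf\xC3\xA9"`, `utf8_bytes` gives 5 while `utf8_code_points` gives 4, because the last character needs two bytes. On Windows itself you would more likely reach for the platform's own string functions than hand-rolled loops like these; the point here is only that "number of units" and "number of characters" are different questions.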