/* This file is (c) 2008-2010 Konstantin Isakov <ikm@users.berlios.de>
 * Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */

#ifndef __WSTRING_HH_INCLUDED__
#define __WSTRING_HH_INCLUDED__

#include <string>

/// While most systems feature a 4-byte wchar_t and a UCS-4 Unicode
/// character representation for it, Windows uses a 2-byte wchar_t and a UTF-16
/// encoding. The use of UTF-16 on Windows is most probably a homage to ancient
/// history, dating back to when there was nothing but the BMP and all Unicode
/// characters were 2 bytes long. After Unicode grew past a two-byte
/// representation, the guys at Microsoft probably decided that the least
/// painful way forward was to just switch to UTF-16. Or so goes the theory.
///
/// Now, the UTF family is a set of encodings made for transit purposes -- it
/// is not a representation. While it's good for passthrough, it's not directly
/// suitable for manipulating Unicode characters. It must first be decoded to
/// plain UCS. Think of it like this: UTF is to UCS what Base64 is to ASCII.
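///
/// For example, the code point U+1D11E (MUSICAL SYMBOL G CLEF) is the single
/// 32-bit unit 0x0001D11E in UCS-4, but travels as the surrogate pair
/// 0xD834 0xDD1E in UTF-16 and as the four bytes 0xF0 0x9D 0x84 0x9E in UTF-8
/// -- either encoded form has to be decoded before the character itself can
/// be examined.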
///
/// The distinction between the Microsoft platform and all the other ones is
/// that while the latter are stuck in the 8-bit era and use UTF-8 to pass
/// Unicode around through their venerable interfaces, the former is stuck in
/// the 16-bit era and uses UTF-16 instead. Neither solution allows for direct
/// processing of the characters in those strings without decoding them first.
/// And the 16-bit solution is even uglier than the 8-bit one, because it
/// doesn't have the benefit of ASCII compatibility, offering a much less
/// useful UCS-2 compatibility instead. It's stuck in the middle of nowhere,
/// really.
///
/// The question is, what are we going to do with all this? When we do Unicode
/// processing in GoldenDict, we want to use real Unicode characters, not some
/// UTF-16 encoded ones. To that end, we have two options under Windows: first,
/// use QString, and second, use basic_string< unsigned int >.
/// While we use QStrings for the GUI and other non-critical code, there is
/// serious doubt about the efficiency of QStrings for bulk text processing.
/// And since a lot of code uses wstring already, it would be much easier to
/// convert it to use basic_string< unsigned int > instead, since it shares the
/// same template, and therefore the same interface, with wstring. That's why
/// we introduce our own gd::wstring and gd::wchar types here. On all systems
/// but Windows, they are equivalent to std::wstring and wchar_t. On Windows,
/// they are basic_string< unsigned int > and unsigned int.

namespace gd
{
#ifdef __WIN32

typedef unsigned int wchar;
typedef std::basic_string< wchar > wstring;

// GD_NATIVE_TO_WS is used to convert L"" strings to a const pointer to
// wchar.
wstring __nativeToWs( wchar_t const * );
#define GD_NATIVE_TO_WS( str ) ( gd::__nativeToWs( ( str ) ).c_str() )

#else

typedef wchar_t wchar;
using std::wstring;
#define GD_NATIVE_TO_WS( str ) ( str )
#endif
}
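// A minimal usage sketch (illustrative only, not part of the interface above):
// GD_NATIVE_TO_WS turns a native L"" literal into gd::wchar data on either
// platform. On Windows the macro yields a pointer into a temporary string, so
// the result should be consumed within the same expression, e.g. by copying
// it into a gd::wstring:
//
//   gd::wstring word( GD_NATIVE_TO_WS( L"word" ) );
//
//   for( gd::wstring::size_type i = 0; i < word.size(); ++i )
//   {
//     gd::wchar c = word[ i ]; // a whole code point on every platform
//   }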
#endif