goldendict-ng/wstring.hh

/* This file is (c) 2008-2012 Konstantin Isakov <ikm@goldendict.org>
 * Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */

#ifndef __WSTRING_HH_INCLUDED__
#define __WSTRING_HH_INCLUDED__

#include <string>

/// While most systems feature a 4-byte wchar_t holding UCS-4 (UTF-32) code
/// points, Windows uses a 2-byte wchar_t with the UTF-16 encoding. The use of
/// UTF-16 on Windows is most probably a homage to ancient history, dating
/// back to when there was nothing but the BMP and all Unicode characters were
/// 2 bytes long. After Unicode got expanded past the two-byte representation,
/// the guys at Microsoft probably decided that the least painful way to go
/// was to just switch to UTF-16. Or so the theory goes.
/// 
/// Now, the variable-width UTF encodings (UTF-8 and UTF-16) are made for
/// transit purposes -- they are not a representation. While they are good for
/// passthrough, they are not directly suitable for manipulating Unicode
/// symbols. The text must first be decoded to plain UCS-4. Think of it like
/// this: UTF to UCS is something like Base64 to ASCII.
/// 
/// The distinction between the Microsoft platform and all the other ones is
/// that while the latter are stuck in an 8-bit era and use UTF-8 to pass
/// Unicode around through their venerable interfaces, the former is stuck in
/// a 16-bit era and uses UTF-16 instead. Neither solution allows for direct
/// processing of the symbols in those strings without decoding them first.
/// And the 16-bit solution is even uglier than the 8-bit one, because it
/// doesn't have the benefit of ASCII compatibility, having a much less useful
/// UCS-2 compatibility instead. It's stuck in the middle of nowhere, really.
/// 
/// The question is, what are we going to do with all this? When we do Unicode
/// processing in GoldenDict, we want to use real Unicode characters, not some
/// UTF-16 encoded ones. To that end, we have two options under Windows: first,
/// use QString, and second, use a basic_string of 32-bit characters.
/// While we use QStrings for the GUI and other non-critical code, there is
/// serious doubt about the efficiency of QStrings for bulk text processing.
/// And since a lot of code uses wstring already, it is much easier to convert
/// it to a 32-bit basic_string instead, since it shares the same template,
/// and therefore the interface too, with wstring. That's why we introduce our
/// own gd::wstring and gd::wchar types here. Originally they were equivalent
/// to std::wstring and wchar_t on all systems but Windows, where they were
/// basic_string< unsigned int > and unsigned int. They are now std::u32string
/// and char32_t on all systems.

namespace gd
{
  #ifdef __WIN32

  typedef char32_t wchar;
  typedef std::u32string wstring;

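  // GD_NATIVE_TO_WS converts an L"" string to a const pointer to gd::wchar.
  // The pointer refers to a temporary gd::wstring, so it is valid only until
  // the end of the full expression that contains the macro.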
  wstring __nativeToWs( wchar_t const * );
  #define GD_NATIVE_TO_WS( str ) ( gd::__nativeToWs( ( str ) ).c_str() )

  #else

  typedef char32_t wchar;
  typedef std::u32string wstring;
  #define GD_NATIVE_TO_WS( str ) ( str )
  #endif
}

#endif
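
The header comment says UTF-16 text has to be decoded before individual Unicode characters can be processed. The sketch below shows roughly what such a decoding step looks like. It is an illustration only: `utf16ToUtf32` is a hypothetical name and not GoldenDict's `__nativeToWs`, whose definition is not shown in this file. It takes `char16_t` rather than `wchar_t` so that it behaves the same on every platform, and it assumes that replacing unpaired surrogates with U+FFFD is acceptable.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of GoldenDict): decodes a NUL-terminated
// UTF-16 string into UTF-32 code points. Unpaired surrogates are replaced
// with U+FFFD, which is an assumption made for this sketch.
std::u32string utf16ToUtf32( char16_t const * in )
{
  std::u32string out;

  while ( *in ) {
    char32_t c = *in++;

    if ( c >= 0xD800 && c <= 0xDBFF ) {
      // High surrogate: it must be followed by a low surrogate.
      char32_t lo = *in;

      if ( lo >= 0xDC00 && lo <= 0xDFFF ) {
        // Combine the two 10-bit halves into one code point above the BMP.
        c = 0x10000 + ( ( c - 0xD800 ) << 10 ) + ( lo - 0xDC00 );
        ++in;
      }
      else
        c = 0xFFFD; // High surrogate without its pair
    }
    else if ( c >= 0xDC00 && c <= 0xDFFF )
      c = 0xFFFD; // Stray low surrogate

    out.push_back( c );
  }

  return out;
}
```

With this helper, a string such as `u"a\U0001D11E"` occupies three UTF-16 code units but decodes to two UTF-32 elements, so indexing the result addresses whole characters, which is the property the comment above argues for.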

History (from the blame view, oldest first):
2009-04-18 17:20:12 +00:00  *! Introduce gd::wstring and gd:wchar and switch to them from std::wstring and wchar_t. This changes nothing on Linux and most other systems, but on Win32 it causes to use normal UCS-4 strings instead of Win32's usual UTF-16.
2009-04-19 23:34:49 +00:00  * Comments edited a bit.
2012-02-20 21:47:14 +00:00  Update year in copyright notices.
2018-07-07 09:33:15 +00:00  Fix typos found by codespell
2021-10-18 16:19:25 +00:00  fix dictionary parse error: 1,mdx dictionary load error in windows. 2,dsl dictionary load error in windows.