Created /trunk/src and moved everything there.
src/LICENSE.txt
@@ -0,0 +1,691 @@
GoldenDict, a dictionary lookup program.
Copyright (C) 2008-2009 Konstantin Isakov <ikm@users.sf.net>

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

The text of the license follows.


GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007

Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

Preamble

The GNU General Public License is a free, copyleft license for
software and other kinds of works.

The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users. We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors. You can apply it to
your programs, too.

When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.

To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.

For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.

Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.

For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software. For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.

Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so. This is fundamentally incompatible with the aim of
protecting users' freedom to change the software. The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we
have designed this version of the GPL to prohibit the practice for those
products. If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.

Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary. To prevent this, the GPL assures that
patents cannot be used to render the program non-free.

The precise terms and conditions for copying, distribution and
modification follow.

TERMS AND CONDITIONS

0. Definitions.

"This License" refers to version 3 of the GNU General Public License.

"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.

"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.

To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.

A "covered work" means either the unmodified Program or a work based
on the Program.

To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.

To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.

An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.

1. Source Code.

The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.

A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.

The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.

The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.

The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.

The Corresponding Source for a work in source code form is that
same work.

2. Basic Permissions.

All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.

You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.

Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.

3. Protecting Users' Legal Rights From Anti-Circumvention Law.

No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.

When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.

4. Conveying Verbatim Copies.

You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.

You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.

5. Conveying Modified Source Versions.

You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:

a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.

b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".

c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.

d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.

A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.

6. Conveying Non-Source Forms.

You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:

a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.

b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.

c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.

d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.

e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.

A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.

A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.

"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.

If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).

The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.

Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.

7. Additional Terms.

"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.

When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.

Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:

a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or

b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or

c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or

d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or

e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or

f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.

All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.

If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.

Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.

8. Termination.

You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).

However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.

Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.

Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.

9. Acceptance Not Required for Having Copies.

You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.

10. Automatic Licensing of Downstream Recipients.

Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.

An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.

You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.

11. Patents.

A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".

A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.

Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.

In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.

If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.

If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.

A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.

Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.

12. No Surrender of Others' Freedom.
|
||||
|
||||
If conditions are imposed on you (whether by court order, agreement or
|
||||
otherwise) that contradict the conditions of this License, they do not
|
||||
excuse you from the conditions of this License. If you cannot convey a
|
||||
covered work so as to satisfy simultaneously your obligations under this
|
||||
License and any other pertinent obligations, then as a consequence you may
|
||||
not convey it at all. For example, if you agree to terms that obligate you
|
||||
to collect a royalty for further conveying from those to whom you convey
|
||||
the Program, the only way you could satisfy both those terms and this
|
||||
License would be to refrain entirely from conveying the Program.
|
||||
|
||||
13. Use with the GNU Affero General Public License.
|
||||
|
||||
Notwithstanding any other provision of this License, you have
|
||||
permission to link or combine any covered work with a work licensed
|
||||
under version 3 of the GNU Affero General Public License into a single
|
||||
combined work, and to convey the resulting work. The terms of this
|
||||
License will continue to apply to the part which is the covered work,
|
||||
but the special requirements of the GNU Affero General Public License,
|
||||
section 13, concerning interaction through a network will apply to the
|
||||
combination as such.
|
||||
|
||||
14. Revised Versions of this License.
|
||||
|
||||
The Free Software Foundation may publish revised and/or new versions of
|
||||
the GNU General Public License from time to time. Such new versions will
|
||||
be similar in spirit to the present version, but may differ in detail to
|
||||
address new problems or concerns.
|
||||
|
||||
Each version is given a distinguishing version number. If the
|
||||
Program specifies that a certain numbered version of the GNU General
|
||||
Public License "or any later version" applies to it, you have the
|
||||
option of following the terms and conditions either of that numbered
|
||||
version or of any later version published by the Free Software
|
||||
Foundation. If the Program does not specify a version number of the
|
||||
GNU General Public License, you may choose any version ever published
|
||||
by the Free Software Foundation.
|
||||
|
||||
If the Program specifies that a proxy can decide which future
|
||||
versions of the GNU General Public License can be used, that proxy's
|
||||
public statement of acceptance of a version permanently authorizes you
|
||||
to choose that version for the Program.
|
||||
|
||||
Later license versions may give you additional or different
|
||||
permissions. However, no additional obligations are imposed on any
|
||||
author or copyright holder as a result of your choosing to follow a
|
||||
later version.
|
||||
|
||||
15. Disclaimer of Warranty.
|
||||
|
||||
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
|
||||
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
|
||||
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
|
||||
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
|
||||
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
|
||||
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
|
||||
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
|
||||
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
|
||||
|
||||
16. Limitation of Liability.
|
||||
|
||||
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
|
||||
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
|
||||
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
|
||||
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
|
||||
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
|
||||
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
|
||||
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
|
||||
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
|
||||
SUCH DAMAGES.
|
||||
|
||||
17. Interpretation of Sections 15 and 16.
|
||||
|
||||
If the disclaimer of warranty and limitation of liability provided
|
||||
above cannot be given local legal effect according to their terms,
|
||||
reviewing courts shall apply local law that most closely approximates
|
||||
an absolute waiver of all civil liability in connection with the
|
||||
Program, unless a warranty or assumption of liability accompanies a
|
||||
copy of the Program in return for a fee.
|
||||
|
||||
END OF TERMS AND CONDITIONS
|
||||
|
||||
How to Apply These Terms to Your New Programs
|
||||
|
||||
If you develop a new program, and you want it to be of the greatest
|
||||
possible use to the public, the best way to achieve this is to make it
|
||||
free software which everyone can redistribute and change under these terms.
|
||||
|
||||
To do so, attach the following notices to the program. It is safest
|
||||
to attach them to the start of each source file to most effectively
|
||||
state the exclusion of warranty; and each file should have at least
|
||||
the "copyright" line and a pointer to where the full notice is found.
|
||||
|
||||
<one line to give the program's name and a brief idea of what it does.>
|
||||
Copyright (C) <year> <name of author>
|
||||
|
||||
This program is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License as published by
|
||||
the Free Software Foundation, either version 3 of the License, or
|
||||
(at your option) any later version.
|
||||
|
||||
This program is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with this program. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
Also add information on how to contact you by electronic and paper mail.
|
||||
|
||||
If the program does terminal interaction, make it output a short
|
||||
notice like this when it starts in an interactive mode:
|
||||
|
||||
<program> Copyright (C) <year> <name of author>
|
||||
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
|
||||
This is free software, and you are welcome to redistribute it
|
||||
under certain conditions; type `show c' for details.
|
||||
|
||||
The hypothetical commands `show w' and `show c' should show the appropriate
|
||||
parts of the General Public License. Of course, your program's commands
|
||||
might be different; for a GUI interface, you would use an "about box".
|
||||
|
||||
You should also get your employer (if you work as a programmer) or school,
|
||||
if any, to sign a "copyright disclaimer" for the program, if necessary.
|
||||
For more information on this, and how to apply and follow the GNU GPL, see
|
||||
<http://www.gnu.org/licenses/>.
|
||||
|
||||
The GNU General Public License does not permit incorporating your program
|
||||
into proprietary programs. If your program is a subroutine library, you
|
||||
may consider it more useful to permit linking proprietary applications with
|
||||
the library. If this is what you want to do, use the GNU Lesser General
|
||||
Public License instead of this License. But first, please read
|
||||
<http://www.gnu.org/philosophy/why-not-lgpl.html>.
|
||||
|
116
src/article_maker.cc
Normal file
|
@ -0,0 +1,116 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#include "article_maker.hh"
|
||||
#include "config.hh"
|
||||
#include "htmlescape.hh"
|
||||
#include "utf8.hh"
|
||||
#include <QFile>
|
||||
#include <set>
|
||||
|
||||
|
||||
using std::vector;
|
||||
using std::string;
|
||||
using std::wstring;
|
||||
using std::set;
|
||||
|
||||
ArticleMaker::ArticleMaker( vector< sptr< Dictionary::Class > > const & dictionaries_,
|
||||
vector< Instances::Group > const & groups_ ):
|
||||
dictionaries( dictionaries_ ),
|
||||
groups( groups_ )
|
||||
{
|
||||
}
|
||||
|
||||
string ArticleMaker::makeDefinitionFor( QString const & inWord,
|
||||
QString const & group ) const
|
||||
{
|
||||
printf( "group = %ls\n", group.toStdWString().c_str() );
|
||||
|
||||
wstring word = inWord.trimmed().toStdWString();
|
||||
|
||||
string result =
|
||||
"<html><head>"
|
||||
"<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">";
|
||||
|
||||
QFile cssFile( Config::getUserCssFileName() );
|
||||
|
||||
if ( cssFile.open( QFile::ReadOnly ) )
|
||||
{
|
||||
result += "<style type=\"text/css\">\n";
|
||||
result += cssFile.readAll().data();
|
||||
result += "</style>\n";
|
||||
}
|
||||
|
||||
result += "<title>" + Html::escape( Utf8::encode( word ) ) + "</title>";
|
||||
|
||||
// Find the given group
|
||||
|
||||
Instances::Group const * activeGroup = 0;
|
||||
|
||||
for( unsigned x = 0; x < groups.size(); ++x )
|
||||
if ( groups[ x ].name == group )
|
||||
{
|
||||
activeGroup = &groups[ x ];
|
||||
break;
|
||||
}
|
||||
|
||||
// If we've found a group, use its dictionaries; otherwise, use the global
|
||||
// heap.
|
||||
std::vector< sptr< Dictionary::Class > > const & activeDicts =
|
||||
activeGroup ? activeGroup->dictionaries : dictionaries;
|
||||
|
||||
if ( activeGroup && activeGroup->icon.size() )
|
||||
{
|
||||
// This doesn't seem to have much influence right now, but we'll keep
|
||||
// it anyway.
|
||||
result += "<link rel=\"icon\" type=\"image/png\" href=\"qrcx://localhost/flags/" + Html::escape( activeGroup->icon.toUtf8().data() ) + "\" />\n";
|
||||
}
|
||||
|
||||
result += "</head><body>";
|
||||
|
||||
// Accumulate main forms
|
||||
|
||||
vector< wstring > alts;
|
||||
|
||||
{
|
||||
set< wstring > altsSet;
|
||||
|
||||
for( unsigned x = 0; x < activeDicts.size(); ++x )
|
||||
{
|
||||
vector< wstring > found = activeDicts[ x ]->findHeadwordsForSynonym( word );
|
||||
|
||||
altsSet.insert( found.begin(), found.end() );
|
||||
}
|
||||
|
||||
alts.insert( alts.begin(), altsSet.begin(), altsSet.end() );
|
||||
}
|
||||
|
||||
for( unsigned x = 0; x < alts.size(); ++x )
|
||||
{
|
||||
printf( "Alt: %ls\n", alts[ x ].c_str() );
|
||||
}
|
||||
|
||||
for( unsigned x = 0; x < activeDicts.size(); ++x )
|
||||
{
|
||||
try
|
||||
{
|
||||
string body = activeDicts[ x ]->getArticle( word, alts );
|
||||
|
||||
printf( "From %s: %s\n", activeDicts[ x ]->getName().c_str(), body.c_str() );
|
||||
|
||||
result += "<div class=\"gddictname\">From " + Html::escape( activeDicts[ x ]->getName() ) + "</div>" + body;
|
||||
}
|
||||
catch( Dictionary::exNoSuchWord & )
|
||||
{
|
||||
continue;
|
||||
}
|
||||
}
|
||||
|
||||
result += "</body></html>";
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
30
src/article_maker.hh
Normal file
|
@ -0,0 +1,30 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#ifndef __ARTICLE_MAKER_HH_INCLUDED__
|
||||
#define __ARTICLE_MAKER_HH_INCLUDED__
|
||||
|
||||
#include "dictionary.hh"
|
||||
#include "instances.hh"
|
||||
|
||||
/// This class generates the article's body for the given lookup request
|
||||
class ArticleMaker
|
||||
{
|
||||
std::vector< sptr< Dictionary::Class > > const & dictionaries;
|
||||
std::vector< Instances::Group > const & groups;
|
||||
|
||||
public:
|
||||
|
||||
/// On construction, a reference to all dictionaries and a reference to all
|
||||
/// groups' instances are to be passed. Those references are kept stored as
|
||||
/// references, and as such, any changes to them would reflect on the results
|
||||
/// of the inquiries, although those changes are perfectly legal.
|
||||
ArticleMaker( std::vector< sptr< Dictionary::Class > > const & dictionaries,
|
||||
std::vector< Instances::Group > const & groups );
|
||||
|
||||
/// Looks up the given word within the given group, and creates a full html
|
||||
/// page text containing its definition.
|
||||
std::string makeDefinitionFor( QString const & word, QString const & group ) const;
|
||||
};
|
||||
|
||||
#endif
|
146
src/article_netmgr.cc
Normal file
|
@ -0,0 +1,146 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#include "article_netmgr.hh"
|
||||
|
||||
using std::string;
|
||||
|
||||
QNetworkReply * ArticleNetworkAccessManager::createRequest( Operation op,
|
||||
QNetworkRequest const & req,
|
||||
QIODevice * outgoingData )
|
||||
{
|
||||
if ( op == GetOperation )
|
||||
{
|
||||
if ( req.url().scheme() == "qrcx" )
|
||||
{
|
||||
// We have to override the local load policy for the qrc scheme, hence
|
||||
// we use qrcx and redirect it here back to qrc
|
||||
QUrl newUrl( req.url() );
|
||||
|
||||
newUrl.setScheme( "qrc" );
|
||||
newUrl.setHost( "" );
|
||||
|
||||
QNetworkRequest newReq( req );
|
||||
newReq.setUrl( newUrl );
|
||||
|
||||
return QNetworkAccessManager::createRequest( op, newReq, outgoingData );
|
||||
}
|
||||
|
||||
vector< char > data;
|
||||
QString contentType;
|
||||
|
||||
if ( getResource( req.url(), data, contentType ) )
|
||||
return new ArticleResourceReply( this, req, data, contentType );
|
||||
}
|
||||
|
||||
return QNetworkAccessManager::createRequest( op, req, outgoingData );
|
||||
}
|
||||
|
||||
bool ArticleNetworkAccessManager::getResource( QUrl const & url,
|
||||
vector< char > & data,
|
||||
QString & contentType )
|
||||
{
|
||||
//printf( "getResource: %ls\n", url.toString().toStdWString().c_str() );
|
||||
//printf( "scheme: %ls\n", url.scheme().toStdWString().c_str() );
|
||||
//printf( "host: %ls\n", url.host().toStdWString().c_str() );
|
||||
|
||||
if ( url.scheme() == "gdlookup" )
|
||||
{
|
||||
string result = articleMaker.makeDefinitionFor( url.queryItemValue( "word" ),
|
||||
url.queryItemValue( "group" ) );
|
||||
|
||||
data.resize( result.size() );
|
||||
|
||||
memcpy( &data.front(), result.data(), data.size() );
|
||||
|
||||
contentType = "text/html";
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
if ( ( url.scheme() == "bres" || url.scheme() == "gdau" ) &&
|
||||
url.path().size() )
|
||||
{
|
||||
//printf( "Get %s\n", req.url().host().toLocal8Bit().data() );
|
||||
//printf( "Get %s\n", req.url().path().toLocal8Bit().data() );
|
||||
|
||||
string id = url.host().toStdString();
|
||||
|
||||
bool search = ( id == "search" );
|
||||
|
||||
for( unsigned x = 0; x < dictionaries.size(); ++x )
|
||||
{
|
||||
if ( search || dictionaries[ x ]->getId() == id )
|
||||
{
|
||||
try
|
||||
{
|
||||
dictionaries[ x ]->getResource( url.path().mid( 1 ).toUtf8().data(),
|
||||
data );
|
||||
|
||||
return true;
|
||||
}
|
||||
catch( Dictionary::exNoSuchResource & )
|
||||
{
|
||||
if ( !search )
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
ArticleResourceReply::ArticleResourceReply( QObject * parent,
|
||||
QNetworkRequest const & req,
|
||||
vector< char > const & data_,
|
||||
QString const & contentType ):
|
||||
QNetworkReply( parent ), data( data_ ), left( data.size() )
|
||||
{
|
||||
setRequest( req );
|
||||
|
||||
setOpenMode( ReadOnly );
|
||||
|
||||
if ( contentType.size() )
|
||||
setHeader( QNetworkRequest::ContentTypeHeader, contentType );
|
||||
|
||||
connect( this, SIGNAL( readyReadSignal() ),
|
||||
this, SLOT( readyReadSlot() ), Qt::QueuedConnection );
|
||||
connect( this, SIGNAL( finishedSignal() ),
|
||||
this, SLOT( finishedSlot() ), Qt::QueuedConnection );
|
||||
|
||||
emit readyReadSignal();
|
||||
emit finishedSignal();
|
||||
}
|
||||
|
||||
qint64 ArticleResourceReply::bytesAvailable() const
|
||||
{
|
||||
return left + QNetworkReply::bytesAvailable();
|
||||
}
|
||||
|
||||
qint64 ArticleResourceReply::readData( char * out, qint64 maxSize )
|
||||
{
|
||||
printf( "====reading %d bytes\n", (int)maxSize );
|
||||
|
||||
size_t toRead = maxSize < left ? maxSize : left;
|
||||
|
||||
memcpy( out, &data[ data.size() - left ], toRead );
|
||||
|
||||
left -= toRead;
|
||||
|
||||
if ( !toRead )
|
||||
return -1;
|
||||
else
|
||||
return toRead;
|
||||
}
|
||||
|
||||
void ArticleResourceReply::readyReadSlot()
|
||||
{
|
||||
readyRead();
|
||||
}
|
||||
|
||||
void ArticleResourceReply::finishedSlot()
|
||||
{
|
||||
finished();
|
||||
}
|
||||
|
81
src/article_netmgr.hh
Normal file
|
@ -0,0 +1,81 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#ifndef __ARTICLE_NETMGR_HH_INCLUDED__
|
||||
#define __ARTICLE_NETMGR_HH_INCLUDED__
|
||||
|
||||
#include <QtNetwork>
|
||||
#include "dictionary.hh"
|
||||
#include "article_maker.hh"
|
||||
|
||||
using std::vector;
|
||||
|
||||
/// A custom QNetworkAccessManager version which fetches images from the
|
||||
/// dictionaries when requested.
|
||||
|
||||
class ArticleNetworkAccessManager: public QNetworkAccessManager
|
||||
{
|
||||
vector< sptr< Dictionary::Class > > const & dictionaries;
|
||||
ArticleMaker const & articleMaker;
|
||||
|
||||
public:
|
||||
|
||||
ArticleNetworkAccessManager( QObject * parent,
|
||||
vector< sptr< Dictionary::Class > > const &
|
||||
dictionaries_,
|
||||
ArticleMaker const & articleMaker_ ):
|
||||
QNetworkAccessManager( parent ), dictionaries( dictionaries_ ),
|
||||
articleMaker( articleMaker_ )
|
||||
{}
|
||||
|
||||
/// Tries reading a resource referenced by a "gdlookup://", "bres://" or "gdau://" url. If it succeeds,
|
||||
/// the vector is filled with data, and true is returned. If it doesn't
|
||||
/// succeed, it returns false. The function can optionally set the Content-Type
|
||||
/// header correspondingly.
|
||||
bool getResource( QUrl const & url, vector< char > & data,
|
||||
QString & contentType );
|
||||
|
||||
protected:
|
||||
|
||||
virtual QNetworkReply * createRequest( Operation op,
|
||||
QNetworkRequest const & req,
|
||||
QIODevice * outgoingData );
|
||||
};
|
||||
|
||||
class ArticleResourceReply: public QNetworkReply
|
||||
{
|
||||
Q_OBJECT
|
||||
|
||||
vector< char > data;
|
||||
|
||||
size_t left;
|
||||
|
||||
public:
|
||||
|
||||
ArticleResourceReply( QObject * parent,
|
||||
QNetworkRequest const &,
|
||||
vector< char > const & data,
|
||||
QString const & contentType );
|
||||
|
||||
protected:
|
||||
|
||||
virtual qint64 bytesAvailable() const;
|
||||
|
||||
virtual void abort()
|
||||
{}
|
||||
virtual qint64 readData( char * data, qint64 maxSize );
|
||||
|
||||
// We use the hackery below to work around the fact that we need to emit
|
||||
// ready/finish signals after we've been constructed.
|
||||
signals:
|
||||
|
||||
void readyReadSignal();
|
||||
void finishedSignal();
|
||||
|
||||
private slots:
|
||||
|
||||
void readyReadSlot();
|
||||
void finishedSlot();
|
||||
};
|
||||
|
||||
#endif
|
216
src/articleview.cc
Normal file
|
@ -0,0 +1,216 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#include "articleview.hh"
|
||||
#include "externalviewer.hh"
|
||||
#include <QMessageBox>
|
||||
#include <QWebHitTestResult>
|
||||
#include <QMenu>
|
||||
|
||||
|
||||
ArticleView::ArticleView( QWidget * parent, ArticleNetworkAccessManager & nm,
|
||||
Instances::Groups const & groups_, bool popupView_ ):
|
||||
QFrame( parent ),
|
||||
articleNetMgr( nm ),
|
||||
groups( groups_ ),
|
||||
popupView( popupView_ )
|
||||
{
|
||||
ui.setupUi( this );
|
||||
|
||||
ui.definition->setContextMenuPolicy( Qt::CustomContextMenu );
|
||||
|
||||
ui.definition->page()->setLinkDelegationPolicy( QWebPage::DelegateAllLinks );
|
||||
|
||||
ui.definition->page()->setNetworkAccessManager( &articleNetMgr );
|
||||
|
||||
connect( ui.definition, SIGNAL( titleChanged( QString const & ) ),
|
||||
this, SLOT( handleTitleChanged( QString const & ) ) );
|
||||
|
||||
connect( ui.definition, SIGNAL( urlChanged( QUrl const & ) ),
|
||||
this, SLOT( handleUrlChanged( QUrl const & ) ) );
|
||||
|
||||
connect( ui.definition, SIGNAL( customContextMenuRequested( QPoint const & ) ),
|
||||
this, SLOT( contextMenuRequested( QPoint const & ) ) );
|
||||
|
||||
connect( ui.definition, SIGNAL( linkClicked( QUrl const & ) ),
|
||||
this, SLOT( linkClicked( QUrl const & ) ) );
|
||||
}
|
||||
|
||||
void ArticleView::showDefinition( QString const & word, QString const & group )
|
||||
{
|
||||
QUrl req;
|
||||
|
||||
req.setScheme( "gdlookup" );
|
||||
req.setHost( "localhost" );
|
||||
req.addQueryItem( "word", word );
|
||||
req.addQueryItem( "group", group );
|
||||
|
||||
ui.definition->load( req );
|
||||
}
|
||||
|
||||
void ArticleView::handleTitleChanged( QString const & title )
|
||||
{
|
||||
emit titleChanged( this, title );
|
||||
}
|
||||
|
||||
void ArticleView::handleUrlChanged( QUrl const & url )
|
||||
{
|
||||
QIcon icon;
|
||||
|
||||
QString group = getGroup( url );
|
||||
|
||||
if ( group.size() )
|
||||
{
|
||||
// Find the group's instance corresponding to the group named in the url
|
||||
for( unsigned x = 0; x < groups.size(); ++x )
|
||||
if ( groups[ x ].name == group )
|
||||
{
|
||||
// Found it
|
||||
|
||||
if ( groups[ x ].icon.size() )
|
||||
icon = QIcon( ":/flags/" + groups[ x ].icon );
|
||||
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
emit iconChanged( this, icon );
|
||||
}
|
||||
|
||||
QString ArticleView::getGroup( QUrl const & url )
|
||||
{
|
||||
if ( url.scheme() == "gdlookup" && url.hasQueryItem( "group" ) )
|
||||
return url.queryItemValue( "group" );
|
||||
|
||||
return QString();
|
||||
}
|
||||
|
||||
|
||||
void ArticleView::linkClicked( QUrl const & url )
|
||||
{
|
||||
printf( "clicked %s\n", url.toString().toLocal8Bit().data() );
|
||||
|
||||
if ( url.scheme() == "bword" )
|
||||
showDefinition( url.host().startsWith( "xn--" ) ?
|
||||
QUrl::fromPunycode( url.host().toLatin1() ) :
|
||||
url.host(),
|
||||
getGroup( ui.definition->url() ) );
|
||||
else
|
||||
if ( url.scheme() == "gdlookup" ) // Plain html links inherit gdlookup scheme
|
||||
showDefinition( url.path().mid( 1 ),
|
||||
getGroup( ui.definition->url() ) );
|
||||
else
|
||||
if ( url.scheme() == "bres" || url.scheme() == "gdau" )
|
||||
{
|
||||
vector< char > data;
|
||||
|
||||
// Download it
|
||||
|
||||
QString contentType;
|
||||
|
||||
if ( !articleNetMgr.getResource( url, data, contentType ) )
|
||||
{
|
||||
QMessageBox::critical( this, tr( "GoldenDict" ), tr( "The referenced resource doesn't exist." ) );
|
||||
return;
|
||||
}
|
||||
|
||||
// Decide the viewer
|
||||
|
||||
QString program, extension;
|
||||
|
||||
if ( url.scheme() == "gdau" )
|
||||
{
|
||||
program = "mplayer";
|
||||
extension = "wav";
|
||||
}
|
||||
else
|
||||
if ( url.path().endsWith( ".pdf", Qt::CaseInsensitive ) )
|
||||
{
|
||||
program = "evince";
|
||||
extension = "pdf";
|
||||
}
|
||||
else
|
||||
if ( url.path().endsWith( ".rtf", Qt::CaseInsensitive ) )
|
||||
{
|
||||
program = "oowriter";
|
||||
extension = "rtf";
|
||||
}
|
||||
else
|
||||
{
|
||||
QMessageBox::critical( this, tr( "GoldenDict" ), tr( "Don't know how to handle the specified resource." ) );
|
||||
return;
|
||||
}
|
||||
|
||||
try
|
||||
{
|
||||
ExternalViewer * viewer = new ExternalViewer( this, data, extension, program );
|
||||
|
||||
try
|
||||
{
|
||||
viewer->start();
|
||||
|
||||
// Once started, it will erase itself
|
||||
}
|
||||
catch( ... )
|
||||
{
|
||||
delete viewer;
|
||||
throw;
|
||||
}
|
||||
}
|
||||
catch( ExternalViewer::Ex & e )
|
||||
{
|
||||
printf( "%s\n", e.what() );
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
void ArticleView::contextMenuRequested( QPoint const & pos )
|
||||
{
|
||||
// Is that a link? Is there a selection?
|
||||
|
||||
QWebHitTestResult r = ui.definition->page()->currentFrame()->
|
||||
hitTestContent( pos );
|
||||
|
||||
QMenu menu( this );
|
||||
|
||||
|
||||
QAction * followLink = 0;
|
||||
QAction * lookupSelection = 0;
|
||||
|
||||
if ( !r.linkUrl().isEmpty() )
|
||||
{
|
||||
followLink = new QAction( tr( "Open the link" ), &menu );
|
||||
menu.addAction( followLink );
|
||||
}
|
||||
|
||||
QString selectedText = ui.definition->selectedText();
|
||||
if ( selectedText.size() )
|
||||
{
|
||||
lookupSelection = new QAction( tr( "Look up \"%1\"" ).arg( selectedText ), &menu );
|
||||
menu.addAction( lookupSelection );
|
||||
}
|
||||
|
||||
if ( !menu.isEmpty() )
|
||||
{
|
||||
QAction * result = menu.exec( ui.definition->mapToGlobal( pos ) );
|
||||
|
||||
if ( result == followLink )
|
||||
linkClicked( r.linkUrl() );
|
||||
else
|
||||
if ( result == lookupSelection )
|
||||
showDefinition( selectedText, getGroup( ui.definition->url() ) );
|
||||
}
|
||||
#if 0
|
||||
printf( "%s\n", r.linkUrl().isEmpty() ? "null" : "not null" );
|
||||
|
||||
printf( "url = %s\n", r.linkUrl().toString().toLocal8Bit().data() );
|
||||
printf( "title = %s\n", r.title().toLocal8Bit().data() );
|
||||
#endif
|
||||
}
|
||||
|
||||
void ArticleView::showEvent( QShowEvent * ev )
|
||||
{
|
||||
QFrame::showEvent( ev );
|
||||
|
||||
ui.searchFrame->hide();
|
||||
}
|
70
src/articleview.hh
Normal file
|
@ -0,0 +1,70 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#ifndef __ARTICLEVIEW_HH_INCLUDED__
|
||||
#define __ARTICLEVIEW_HH_INCLUDED__
|
||||
|
||||
#include <QWebView>
|
||||
#include <QUrl>
|
||||
#include "article_netmgr.hh"
|
||||
#include "instances.hh"
|
||||
#include "ui_articleview.h"
|
||||
|
||||
/// A widget with the web view tailored to view and handle articles -- it
|
||||
/// uses the appropriate netmgr, handles link clicks, right mouse button clicks, etc.
|
||||
class ArticleView: public QFrame
|
||||
{
|
||||
Q_OBJECT
|
||||
|
||||
ArticleNetworkAccessManager & articleNetMgr;
|
||||
Instances::Groups const & groups;
|
||||
bool popupView;
|
||||
|
||||
Ui::ArticleView ui;
|
||||
|
||||
public:
|
||||
/// The popupView flag influences contents of the context menus to be
|
||||
/// appropriate to the context of the view.
|
||||
/// The groups aren't copied; instead, a reference to them is kept.
|
||||
ArticleView( QWidget * parent,
|
||||
ArticleNetworkAccessManager &,
|
||||
Instances::Groups const &,
|
||||
bool popupView );
|
||||
|
||||
/// Shows the definition of the given word with the given group
|
||||
void showDefinition( QString const & word, QString const & group );
|
||||
|
||||
/// Goes back in history
|
||||
void back()
|
||||
{ ui.definition->back(); }
|
||||
|
||||
/// Goes forward in history
|
||||
void forward()
|
||||
{ ui.definition->forward(); }
|
||||
|
||||
signals:
|
||||
|
||||
void iconChanged( ArticleView *, QIcon const & icon );
|
||||
|
||||
void titleChanged( ArticleView *, QString const & title );
|
||||
|
||||
private slots:
|
||||
|
||||
void handleTitleChanged( QString const & title );
|
||||
void handleUrlChanged( QUrl const & url );
|
||||
void linkClicked( QUrl const & );
|
||||
void contextMenuRequested( QPoint const & );
|
||||
|
||||
private:
|
||||
|
||||
/// Deduces group from the url. If there doesn't seem to be any group,
|
||||
/// returns empty string.
|
||||
QString getGroup( QUrl const & );
|
||||
|
||||
protected:
|
||||
|
||||
// We need this to hide the search bar when we're shown
|
||||
void showEvent( QShowEvent * );
|
||||
};
|
||||
|
||||
#endif
|
163
src/articleview.ui
Normal file
|
@ -0,0 +1,163 @@
|
|||
<ui version="4.0" >
|
||||
<class>ArticleView</class>
|
||||
<widget class="QWidget" name="ArticleView" >
|
||||
<property name="geometry" >
|
||||
<rect>
|
||||
<x>0</x>
|
||||
<y>0</y>
|
||||
<width>833</width>
|
||||
<height>634</height>
|
||||
</rect>
|
||||
</property>
|
||||
<property name="windowTitle" >
|
||||
<string>Form</string>
|
||||
</property>
|
||||
<layout class="QVBoxLayout" name="verticalLayout_2" >
|
||||
<property name="margin" >
|
||||
<number>0</number>
|
||||
</property>
|
||||
<item>
|
||||
<widget class="QFrame" name="frame" >
|
||||
<property name="frameShape" >
|
||||
<enum>QFrame::StyledPanel</enum>
|
||||
</property>
|
||||
<property name="frameShadow" >
|
||||
<enum>QFrame::Raised</enum>
|
||||
</property>
|
||||
<layout class="QVBoxLayout" name="verticalLayout" >
|
||||
<property name="margin" >
|
||||
<number>0</number>
|
||||
</property>
|
||||
<item>
|
||||
<widget class="QWebView" name="definition" >
|
||||
<property name="palette" >
|
||||
<palette>
|
||||
<active>
|
||||
<colorrole role="Base" >
|
||||
<brush brushstyle="SolidPattern" >
|
||||
<color alpha="255" >
|
||||
<red>254</red>
|
||||
<green>253</green>
|
||||
<blue>235</blue>
|
||||
</color>
|
||||
</brush>
|
||||
</colorrole>
|
||||
<colorrole role="Window" >
|
||||
<brush brushstyle="SolidPattern" >
|
||||
<color alpha="255" >
|
||||
<red>255</red>
|
||||
<green>255</green>
|
||||
<blue>255</blue>
|
||||
</color>
|
||||
</brush>
|
||||
</colorrole>
|
||||
</active>
|
||||
<inactive>
|
||||
<colorrole role="Base" >
|
||||
<brush brushstyle="SolidPattern" >
|
||||
<color alpha="255" >
|
||||
<red>254</red>
|
||||
<green>253</green>
|
||||
<blue>235</blue>
|
||||
</color>
|
||||
</brush>
|
||||
</colorrole>
|
||||
<colorrole role="Window" >
|
||||
<brush brushstyle="SolidPattern" >
|
||||
<color alpha="255" >
|
||||
<red>255</red>
|
||||
<green>255</green>
|
||||
<blue>255</blue>
|
||||
</color>
|
||||
</brush>
|
||||
</colorrole>
|
||||
</inactive>
|
||||
<disabled>
|
||||
<colorrole role="Base" >
|
||||
<brush brushstyle="SolidPattern" >
|
||||
<color alpha="255" >
|
||||
<red>255</red>
|
||||
<green>255</green>
|
||||
<blue>255</blue>
|
||||
</color>
|
||||
</brush>
|
||||
</colorrole>
|
||||
<colorrole role="Window" >
|
||||
<brush brushstyle="SolidPattern" >
|
||||
<color alpha="255" >
|
||||
<red>255</red>
|
||||
<green>255</green>
|
||||
<blue>255</blue>
|
||||
</color>
|
||||
</brush>
|
||||
</colorrole>
|
||||
</disabled>
|
||||
</palette>
|
||||
</property>
|
||||
<property name="url" >
|
||||
<url>
|
||||
<string>about:blank</string>
|
||||
</url>
|
||||
</property>
|
||||
</widget>
|
||||
</item>
|
||||
</layout>
|
||||
</widget>
|
||||
</item>
|
||||
<item>
|
||||
<widget class="QFrame" name="searchFrame" >
|
||||
<property name="frameShape" >
|
||||
<enum>QFrame::NoFrame</enum>
|
||||
</property>
|
||||
<property name="frameShadow" >
|
||||
<enum>QFrame::Raised</enum>
|
||||
</property>
|
||||
<layout class="QHBoxLayout" name="horizontalLayout" >
|
||||
<property name="margin" >
|
||||
<number>0</number>
|
||||
</property>
|
||||
<item>
|
||||
<widget class="QToolButton" name="searchCloseButton" >
|
||||
<property name="text" >
|
||||
<string>x</string>
|
||||
</property>
|
||||
</widget>
|
||||
</item>
|
||||
<item>
|
||||
<widget class="QLabel" name="label" >
|
||||
<property name="text" >
|
||||
<string>Find:</string>
|
||||
</property>
|
||||
</widget>
|
||||
</item>
|
||||
<item>
|
||||
<widget class="QLineEdit" name="searchText" />
|
||||
</item>
|
||||
<item>
|
||||
<spacer name="horizontalSpacer" >
|
||||
<property name="orientation" >
|
||||
<enum>Qt::Horizontal</enum>
|
||||
</property>
|
||||
<property name="sizeHint" stdset="0" >
|
||||
<size>
|
||||
<width>40</width>
|
||||
<height>20</height>
|
||||
</size>
|
||||
</property>
|
||||
</spacer>
|
||||
</item>
|
||||
</layout>
|
||||
</widget>
|
||||
</item>
|
||||
</layout>
|
||||
</widget>
|
||||
<customwidgets>
|
||||
<customwidget>
|
||||
<class>QWebView</class>
|
||||
<extends>QWidget</extends>
|
||||
<header>QtWebKit/QWebView</header>
|
||||
</customwidget>
|
||||
</customwidgets>
|
||||
<resources/>
|
||||
<connections/>
|
||||
</ui>
|
758
src/bgl.cc
Normal file
|
@ -0,0 +1,758 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#include "bgl.hh"
|
||||
#include "btreeidx.hh"
|
||||
#include "bgl_babylon.hh"
|
||||
#include "file.hh"
|
||||
#include "folding.hh"
|
||||
#include "utf8.hh"
|
||||
#include "chunkedstorage.hh"
|
||||
#include <map>
|
||||
#include <set>
|
||||
#include <list>
|
||||
#include <zlib.h>
|
||||
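#include <ctype.h>
#include <stdio.h>   // fprintf, printf
#include <string.h>  // memset
#include <strings.h> // strcasecmp (POSIX)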
|
||||
|
||||
namespace Bgl {
|
||||
|
||||
using std::map;
|
||||
using std::multimap;
|
||||
using std::set;
|
||||
using std::wstring;
|
||||
using std::list;
|
||||
using std::pair;
|
||||
|
||||
using BtreeIndexing::WordArticleLink;
|
||||
using BtreeIndexing::IndexedWords;
|
||||
|
||||
namespace
|
||||
{
|
||||
enum
|
||||
{
|
||||
Signature = 0x584c4742, // BGLX on little-endian, XLGB on big-endian
|
||||
CurrentFormatVersion = 12 + BtreeIndexing::FormatVersion
|
||||
};
|
||||
|
||||
struct IdxHeader
|
||||
{
|
||||
uint32_t signature; // First comes the signature, BGLX
|
||||
uint32_t formatVersion; // File format version; must match CurrentFormatVersion.
|
||||
uint32_t parserVersion; // Version of the parser used to parse the BGL file.
|
||||
// If it's lower than the current one, the file is to
|
||||
// be re-parsed.
|
||||
uint32_t foldingVersion; // Version of the folding algorithm used when building
|
||||
// index. If it's different from the current one,
|
||||
// the file is to be rebuilt.
|
||||
uint32_t articleCount; // Total number of articles, for informative purposes only
|
||||
uint32_t wordCount; // Total number of words, for informative purposes only
|
||||
/// Add more fields here, like name, description, author and such.
|
||||
uint32_t chunksOffset; // The offset to chunks' storage
|
||||
uint32_t indexOffset; // The offset of the index in the file.
|
||||
uint32_t resourceListOffset; // The offset of the list of resources
|
||||
uint32_t resourcesCount; // Number of resources stored
|
||||
} __attribute__((packed));
|
||||
|
||||
bool indexIsOldOrBad( string const & indexFile )
|
||||
{
|
||||
File::Class idx( indexFile, "rb" );
|
||||
|
||||
IdxHeader header;
|
||||
|
||||
return idx.readRecords( &header, sizeof( header ), 1 ) != 1 ||
|
||||
header.signature != Signature ||
|
||||
header.formatVersion != CurrentFormatVersion ||
|
||||
header.parserVersion != Babylon::ParserVersion ||
|
||||
header.foldingVersion != Folding::Version;
|
||||
}
|
||||
|
||||
// Removes the $1$-like postfix
|
||||
string removePostfix( string const & in )
|
||||
{
|
||||
if ( in.size() && in[ in.size() - 1 ] == '$' )
|
||||
{
|
||||
// Find the end of it and cut it, barring any unexpectedness
|
||||
for( long x = in.size() - 2; x >= 0; x-- )
|
||||
{
|
||||
if ( in[ x ] == '$' )
|
||||
return in.substr( 0, x );
|
||||
else
|
||||
if ( !isdigit( (unsigned char) in[ x ] ) )
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
return in;
|
||||
}
|
||||
|
||||
// Removes any leading or trailing whitespace
|
||||
void trimWs( string & word )
|
||||
{
|
||||
if ( word.size() )
|
||||
{
|
||||
unsigned begin = 0;
|
||||
|
||||
while( begin < word.size() && isspace( (unsigned char) word[ begin ] ) )
|
||||
++begin;
|
||||
|
||||
if ( begin == word.size() ) // Consists of ws entirely?
|
||||
word.clear();
|
||||
else
|
||||
{
|
||||
unsigned end = word.size();
|
||||
|
||||
// The word isn't all whitespace, so this loop is guaranteed to stop at a
|
||||
// non-space character before end reaches begin.
|
||||
while( isspace( (unsigned char) word[ end - 1 ] ) )
|
||||
--end;
|
||||
|
||||
if ( end != word.size() || begin )
|
||||
word = string( word, begin, end - begin );
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
void addEntryToIndex( string & word,
|
||||
uint32_t articleOffset,
|
||||
IndexedWords & indexedWords,
|
||||
vector< wchar_t > & wcharBuffer )
|
||||
{
|
||||
// Strip any leading or trailing whitespace
|
||||
trimWs( word );
|
||||
|
||||
// Check the input word for a superscript postfix ($1$, $2$ etc), which
|
||||
// signifies different meanings in Bgl files. We emit different meanings
|
||||
// as different articles, but they appear in the index as the same word.
|
||||
|
||||
if ( word.size() && word[ word.size() - 1 ] == '$' )
|
||||
{
|
||||
word = removePostfix( word );
|
||||
trimWs( word );
|
||||
}
|
||||
|
||||
// Convert the word from utf8 to wide chars
|
||||
|
||||
if ( wcharBuffer.size() <= word.size() )
|
||||
wcharBuffer.resize( word.size() + 1 );
|
||||
|
||||
long result = Utf8::decode( word.c_str(), word.size(),
|
||||
&wcharBuffer.front() );
|
||||
|
||||
if ( result < 0 )
|
||||
{
|
||||
fprintf( stderr, "Failed to decode utf8 of headword %s, skipping it.\n",
|
||||
word.c_str() );
|
||||
return;
|
||||
}
|
||||
|
||||
wcharBuffer[ result ] = 0;
|
||||
|
||||
// Now make its folded version
|
||||
|
||||
wstring folded = Folding::apply( &wcharBuffer.front() );
|
||||
|
||||
/// Try to conserve the memory usage of the string
|
||||
folded.reserve( folded.size() );
|
||||
|
||||
// Insert new entry into an index
|
||||
|
||||
IndexedWords::iterator i = indexedWords.insert(
|
||||
IndexedWords::value_type( folded, vector< WordArticleLink >() ) ).first;
|
||||
|
||||
// Try to conserve memory somewhat -- slow insertions are ok
|
||||
i->second.reserve( i->second.size() + 1 );
|
||||
|
||||
i->second.push_back( WordArticleLink( word, articleOffset ) );
|
||||
}
|
||||
|
||||
|
||||
DEF_EX( exFailedToDecompressArticle, "Failed to decompress article's body", Dictionary::Ex )
|
||||
DEF_EX( exChunkIndexOutOfRange, "Chunk index is out of range", Dictionary::Ex )
|
||||
|
||||
class BglDictionary: public BtreeIndexing::BtreeDictionary
|
||||
{
|
||||
File::Class idx;
|
||||
IdxHeader idxHeader;
|
||||
string dictionaryName;
|
||||
ChunkedStorage::Reader chunks;
|
||||
|
||||
public:
|
||||
|
||||
BglDictionary( string const & id, string const & indexFile,
|
||||
string const & dictionaryFile );
|
||||
|
||||
virtual string getName() throw()
|
||||
{ return dictionaryName; }
|
||||
|
||||
virtual map< Dictionary::Property, string > getProperties() throw()
|
||||
{ return map< Dictionary::Property, string >(); }
|
||||
|
||||
virtual unsigned long getArticleCount() throw()
|
||||
{ return idxHeader.articleCount; }
|
||||
|
||||
virtual unsigned long getWordCount() throw()
|
||||
{ return idxHeader.wordCount; }
|
||||
|
||||
virtual vector< wstring > findHeadwordsForSynonym( wstring const & )
|
||||
throw( std::exception );
|
||||
|
||||
virtual string getArticle( wstring const &, vector< wstring > const & alts )
|
||||
throw( Dictionary::exNoSuchWord, std::exception );
|
||||
|
||||
virtual void getResource( string const & name,
|
||||
vector< char > & data ) throw( Dictionary::exNoSuchResource,
|
||||
std::exception );
|
||||
|
||||
private:
|
||||
|
||||
|
||||
/// Loads an article with the given offset, filling the given strings.
|
||||
void loadArticle( uint32_t offset, string & headword,
|
||||
string & displayedHeadword, string & articleText );
|
||||
|
||||
void replaceCharsetEntities( string & );
|
||||
};
|
||||
|
||||
BglDictionary::BglDictionary( string const & id, string const & indexFile,
|
||||
string const & dictionaryFile ):
|
||||
BtreeDictionary( id, vector< string >( 1, dictionaryFile ) ),
|
||||
idx( indexFile, "rb" ),
|
||||
idxHeader( idx.read< IdxHeader >() ),
|
||||
chunks( idx, idxHeader.chunksOffset )
|
||||
{
|
||||
idx.seek( sizeof( idxHeader ) );
|
||||
|
||||
// Read the dictionary's name
|
||||
|
||||
size_t len = idx.read< uint32_t >();
|
||||
|
||||
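vector< char > nameBuf( len );
|
||||
|
||||
// front() is undefined on an empty vector, so read only if there is a name
|
||||
if ( len )
|
||||
idx.read( &nameBuf.front(), len );
|
||||
|
||||
dictionaryName = string( nameBuf.begin(), nameBuf.end() );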
|
||||
|
||||
// Initialize the index
|
||||
|
||||
idx.seek( idxHeader.indexOffset );
|
||||
|
||||
openIndex( idx );
|
||||
}
|
||||
|
||||
|
||||
void BglDictionary::loadArticle( uint32_t offset, string & headword,
|
||||
string & displayedHeadword,
|
||||
string & articleText )
|
||||
{
|
||||
vector< char > chunk;
|
||||
|
||||
char * articleData = chunks.getBlock( offset, chunk );
|
||||
|
||||
headword = articleData;
|
||||
|
||||
displayedHeadword = articleData + headword.size() + 1;
|
||||
|
||||
articleText =
|
||||
string( articleData + headword.size() +
|
||||
displayedHeadword.size() + 2 );
|
||||
}
|
||||
|
||||
vector< wstring > BglDictionary::findHeadwordsForSynonym( wstring const & str )
|
||||
throw( std::exception )
|
||||
{
|
||||
vector< wstring > result;
|
||||
|
||||
vector< WordArticleLink > chain = findArticles( str );
|
||||
|
||||
wstring caseFolded = Folding::applySimpleCaseOnly( str );
|
||||
|
||||
for( unsigned x = 0; x < chain.size(); ++x )
|
||||
{
|
||||
string headword, displayedHeadword, articleText;
|
||||
|
||||
loadArticle( chain[ x ].articleOffset,
|
||||
headword, displayedHeadword, articleText );
|
||||
|
||||
wstring headwordDecoded = Utf8::decode( removePostfix( headword ) );
|
||||
|
||||
if ( caseFolded != Folding::applySimpleCaseOnly( headwordDecoded ) )
|
||||
{
|
||||
// The headword seems to differ from the input word, which makes the
|
||||
// input word its synonym.
|
||||
result.push_back( headwordDecoded );
|
||||
}
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
// Converts a $1$-like postfix to a <sup>1</sup> one
|
||||
string postfixToSuperscript( string const & in )
|
||||
{
|
||||
if ( !in.size() || in[ in.size() - 1 ] != '$' )
|
||||
return in;
|
||||
|
||||
for( long x = in.size() - 2; x >= 0; x-- )
|
||||
{
|
||||
if ( in[ x ] == '$' )
|
||||
{
|
||||
if ( in.size() - x - 2 > 2 )
|
||||
{
|
||||
// Large postfixes seem like something we wouldn't want to show --
|
||||
// some dictionaries seem to have each word numbered using the
|
||||
// postfix.
|
||||
return in.substr( 0, x );
|
||||
}
|
||||
else
|
||||
return in.substr( 0, x ) + "<sup>" + in.substr( x + 1, in.size() - x - 2 ) + "</sup>";
|
||||
}
|
||||
else
|
||||
if ( !isdigit( (unsigned char) in[ x ] ) )
|
||||
break;
|
||||
}
|
||||
|
||||
return in;
|
||||
}
|
||||
|
||||
string BglDictionary::getArticle( wstring const & word,
|
||||
vector< wstring > const & alts )
|
||||
throw( Dictionary::exNoSuchWord, std::exception )
|
||||
{
|
||||
vector< WordArticleLink > chain = findArticles( word );
|
||||
|
||||
for( unsigned x = 0; x < alts.size(); ++x )
|
||||
{
|
||||
/// Make an additional query for each alt
|
||||
|
||||
vector< WordArticleLink > altChain = findArticles( alts[ x ] );
|
||||
|
||||
chain.insert( chain.end(), altChain.begin(), altChain.end() );
|
||||
}
|
||||
|
||||
multimap< wstring, pair< string, string > > mainArticles, alternateArticles;
|
||||
|
||||
set< uint32_t > articlesIncluded; // Some synonyms cause the same article to
|
||||
// appear several times. We combat this
|
||||
// by only allowing each to appear once.
|
||||
|
||||
wstring wordCaseFolded = Folding::applySimpleCaseOnly( word );
|
||||
|
||||
for( unsigned x = 0; x < chain.size(); ++x )
|
||||
{
|
||||
if ( articlesIncluded.find( chain[ x ].articleOffset ) != articlesIncluded.end() )
|
||||
continue; // We already have this article in the body.
|
||||
|
||||
// Now grab that article
|
||||
|
||||
string headword, displayedHeadword, articleText;
|
||||
|
||||
loadArticle( chain[ x ].articleOffset,
|
||||
headword, displayedHeadword, articleText );
|
||||
|
||||
// Ok. Now, does it go to main articles, or to alternate ones? We list
|
||||
// main ones first, and alternates after.
|
||||
|
||||
// We do the case-folded and postfix-less comparison here.
|
||||
|
||||
wstring headwordStripped =
|
||||
Folding::applySimpleCaseOnly( Utf8::decode( removePostfix( headword ) ) );
|
||||
|
||||
multimap< wstring, pair< string, string > > & mapToUse =
|
||||
( wordCaseFolded == headwordStripped ) ?
|
||||
mainArticles : alternateArticles;
|
||||
|
||||
mapToUse.insert( pair< wstring, pair< string, string > >(
|
||||
Folding::applySimpleCaseOnly( Utf8::decode( headword ) ),
|
||||
pair< string, string >(
|
||||
displayedHeadword.size() ? displayedHeadword : headword,
|
||||
articleText ) ) );
|
||||
|
||||
articlesIncluded.insert( chain[ x ].articleOffset );
|
||||
}
|
||||
|
||||
if ( mainArticles.empty() && alternateArticles.empty() )
|
||||
throw Dictionary::exNoSuchWord();
|
||||
|
||||
string result;
|
||||
|
||||
multimap< wstring, pair< string, string > >::const_iterator i;
|
||||
|
||||
string cleaner = "</font>""</font>""</font>""</font>""</font>""</font>"
|
||||
"</font>""</font>""</font>""</font>""</font>""</font>"
|
||||
"</b></b></b></b></b></b></b></b>"
|
||||
"</i></i></i></i></i></i></i></i>";
|
||||
|
||||
for( i = mainArticles.begin(); i != mainArticles.end(); ++i )
|
||||
{
|
||||
result += "<h3>";
|
||||
result += postfixToSuperscript( i->second.first );
|
||||
result += "</h3>";
|
||||
result += i->second.second;
|
||||
result += cleaner;
|
||||
}
|
||||
|
||||
for( i = alternateArticles.begin(); i != alternateArticles.end(); ++i )
|
||||
{
|
||||
result += "<h3>";
|
||||
result += postfixToSuperscript( i->second.first );
|
||||
result += "</h3>";
|
||||
result += i->second.second;
|
||||
result += cleaner;
|
||||
}
|
||||
// Do some cleanups in the text
|
||||
|
||||
replaceCharsetEntities( result );
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
void BglDictionary::getResource( string const & name,
|
||||
vector< char > & data )
|
||||
throw( Dictionary::exNoSuchResource, std::exception )
|
||||
{
|
||||
string nameLowercased = name;
|
||||
|
||||
for( string::iterator i = nameLowercased.begin(); i != nameLowercased.end();
|
||||
++i )
|
||||
*i = tolower( (unsigned char) *i );
|
||||
|
||||
idx.seek( idxHeader.resourceListOffset );
|
||||
|
||||
for( size_t count = idxHeader.resourcesCount; count--; )
|
||||
{
|
||||
vector< char > nameData( idx.read< uint32_t >() );
|
||||
idx.read( &nameData.front(), nameData.size() );
|
||||
|
||||
for( size_t x = nameData.size(); x--; )
|
||||
nameData[ x ] = tolower( (unsigned char) nameData[ x ] );
|
||||
|
||||
uint32_t offset = idx.read< uint32_t >();
|
||||
|
||||
if ( string( &nameData.front(), nameData.size() ) == nameLowercased )
|
||||
{
|
||||
// We have a match.
|
||||
|
||||
idx.seek( offset );
|
||||
|
||||
data.resize( idx.read< uint32_t >() );
|
||||
|
||||
vector< unsigned char > compressedData( idx.read< uint32_t >() );
|
||||
|
||||
idx.read( &compressedData.front(), compressedData.size() );
|
||||
|
||||
unsigned long decompressedLength = data.size();
|
||||
|
||||
if ( uncompress( (unsigned char *)&data.front(),
|
||||
&decompressedLength,
|
||||
&compressedData.front(),
|
||||
compressedData.size() ) != Z_OK ||
|
||||
decompressedLength != data.size() )
|
||||
{
|
||||
fprintf( stderr, "Failed to decompress resource %s, ignoring it.\n",
|
||||
name.c_str() );
|
||||
throw Dictionary::exNoSuchResource();
|
||||
}
|
||||
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
throw Dictionary::exNoSuchResource();
|
||||
}
|
||||
|
||||
/// Replaces <CHARSET c="t">1234;</CHARSET> occurrences with &#x1234;
|
||||
void BglDictionary::replaceCharsetEntities( string & text )
|
||||
{
|
||||
string lowercased = text;
|
||||
|
||||
// Make a lowercased version of text, used for searching only. Only touch
|
||||
// symbols < 0x80 to avoid any weird results.
|
||||
for( unsigned x = lowercased.size(); x--; )
|
||||
if ( (unsigned char )lowercased[ x ] < 0x80 )
|
||||
lowercased[ x ] = tolower( lowercased[ x ] );
|
||||
|
||||
size_t prevPos = 0;
|
||||
|
||||
for( ;; )
|
||||
{
|
||||
size_t pos = lowercased.find( "<charset c=\"t\">", prevPos );
|
||||
|
||||
if ( pos == string::npos )
|
||||
break;
|
||||
|
||||
if ( lowercased.size() - pos < 30 )
|
||||
{
|
||||
// This is not right, the string is too short, leave it alone
|
||||
break;
|
||||
}
|
||||
|
||||
prevPos = pos + 1;
|
||||
|
||||
if ( lowercased.substr( pos + 15 + 4, 11 ) != ";</charset>" )
|
||||
{
|
||||
// The ending doesn't match
|
||||
printf( "!!!!!!ending mismatch\n" );
|
||||
continue;
|
||||
}
|
||||
|
||||
// Check if digits are all hex
|
||||
|
||||
if ( !isxdigit( (unsigned char) lowercased[ pos + 15 ] ) ||
|
||||
!isxdigit( (unsigned char) lowercased[ pos + 16 ] ) ||
|
||||
!isxdigit( (unsigned char) lowercased[ pos + 17 ] ) ||
|
||||
!isxdigit( (unsigned char) lowercased[ pos + 18 ] ) )
|
||||
{
|
||||
printf( "!!!!!!!!not hex digits\n" );
|
||||
continue;
|
||||
}
|
||||
|
||||
// Ok, replace now.
|
||||
|
||||
lowercased.replace( pos, 15, "&#x" );
|
||||
lowercased.erase( pos + 8, 10 );
|
||||
|
||||
text.replace( pos, 15, "&#x" );
|
||||
text.erase( pos + 8, 10 );
|
||||
}
|
||||
|
||||
prevPos = 0;
|
||||
|
||||
// Copy-pasted version for <charset c=t>. This should all be replaced
|
||||
// by regexps.
|
||||
for( ;; )
|
||||
{
|
||||
size_t pos = lowercased.find( "<charset c=t>", prevPos );
|
||||
|
||||
if ( pos == string::npos )
|
||||
break;
|
||||
|
||||
if ( lowercased.size() - pos < 28 )
|
||||
{
|
||||
// This is not right, the string is too short, leave it alone
|
||||
break;
|
||||
}
|
||||
|
||||
prevPos = pos + 1;
|
||||
|
||||
if ( lowercased.substr( pos + 13 + 4, 11 ) != ";</charset>" )
|
||||
{
|
||||
// The ending doesn't match
|
||||
printf( "!!!!!!ending mismatch\n" );
|
||||
continue;
|
||||
}
|
||||
|
||||
// Check if digits are all hex
|
||||
|
||||
if ( !isxdigit( (unsigned char) lowercased[ pos + 13 ] ) ||
|
||||
!isxdigit( (unsigned char) lowercased[ pos + 14 ] ) ||
|
||||
!isxdigit( (unsigned char) lowercased[ pos + 15 ] ) ||
|
||||
!isxdigit( (unsigned char) lowercased[ pos + 16 ] ) )
|
||||
{
|
||||
printf( "!!!!!!!!not hex digits\n" );
|
||||
continue;
|
||||
}
|
||||
|
||||
// Ok, replace now.
|
||||
|
||||
lowercased.replace( pos, 13, "&#x" );
|
||||
lowercased.erase( pos + 8, 10 );
|
||||
|
||||
text.replace( pos, 13, "&#x" );
|
||||
text.erase( pos + 8, 10 );
|
||||
}
|
||||
}
|
||||
|
||||
class ResourceHandler: public Babylon::ResourceHandler
|
||||
{
|
||||
File::Class & idxFile;
|
||||
list< pair< string, uint32_t > > resources;
|
||||
|
||||
public:
|
||||
|
||||
ResourceHandler( File::Class & idxFile_ ): idxFile( idxFile_ )
|
||||
{}
|
||||
|
||||
list< pair< string, uint32_t > > const & getResources() const
|
||||
{ return resources; }
|
||||
|
||||
protected:
|
||||
virtual void handleBabylonResource( string const & filename,
|
||||
char const * data, size_t size );
|
||||
};
|
||||
|
||||
void ResourceHandler::handleBabylonResource( string const & filename,
|
||||
char const * data, size_t size )
|
||||
{
|
||||
//printf( "Handling resource file %s (%u bytes)\n", filename.c_str(), size );
|
||||
|
||||
vector< unsigned char > compressedData( compressBound( size ) );
|
||||
|
||||
unsigned long compressedSize = compressedData.size();
|
||||
|
||||
if ( compress( &compressedData.front(), &compressedSize,
|
||||
(unsigned char const *) data, size ) != Z_OK )
|
||||
{
|
||||
fprintf( stderr, "Failed to compress the body of resource %s, dropping it.\n",
|
||||
filename.c_str() );
|
||||
return;
|
||||
}
|
||||
|
||||
resources.push_back( pair< string, uint32_t >( filename, idxFile.tell() ) );
|
||||
|
||||
idxFile.write< uint32_t >( size );
|
||||
idxFile.write< uint32_t >( compressedSize );
|
||||
idxFile.write( &compressedData.front(), compressedSize );
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
vector< sptr< Dictionary::Class > > Format::makeDictionaries(
|
||||
vector< string > const & fileNames,
|
||||
string const & indicesDir,
|
||||
Dictionary::Initializing & initializing )
|
||||
throw( std::exception )
|
||||
{
|
||||
vector< sptr< Dictionary::Class > > dictionaries;
|
||||
|
||||
for( vector< string >::const_iterator i = fileNames.begin(); i != fileNames.end();
|
||||
++i )
|
||||
{
|
||||
// Skip files whose extension is not .bgl to speed up the
|
||||
// scanning
|
||||
if ( i->size() < 4 ||
|
||||
strcasecmp( i->c_str() + ( i->size() - 4 ), ".bgl" ) != 0 )
|
||||
continue;
|
||||
|
||||
Babylon b( *i );
|
||||
|
||||
if ( !b.open() )
|
||||
continue;
|
||||
|
||||
// Got the file -- check if we need to rebuild the index
|
||||
|
||||
vector< string > dictFiles( 1, *i );
|
||||
|
||||
string dictId = makeDictionaryId( dictFiles );
|
||||
|
||||
string indexFile = indicesDir + dictId;
|
||||
|
||||
if ( needToRebuildIndex( dictFiles, indexFile ) || indexIsOldOrBad( indexFile ) )
|
||||
{
|
||||
// Building the index
|
||||
|
||||
std::string sourceCharset, targetCharset;
|
||||
|
||||
if ( !b.read( sourceCharset, targetCharset ) )
|
||||
{
|
||||
fprintf( stderr, "Failed to start reading from %s, skipping it\n", i->c_str() );
|
||||
continue;
|
||||
}
|
||||
|
||||
initializing.indexingDictionary( b.title() );
|
||||
|
||||
File::Class idx( indexFile, "wb" );
|
||||
|
||||
IdxHeader idxHeader;
|
||||
|
||||
memset( &idxHeader, 0, sizeof( idxHeader ) );
|
||||
|
||||
// We write a dummy header first. At the end of the process the header
|
||||
// will be rewritten with the right values.
|
||||
|
||||
idx.write( idxHeader );
|
||||
|
||||
idx.write< uint32_t >( b.title().size() );
|
||||
idx.write( b.title().data(), b.title().size() );
|
||||
|
||||
// This is our index data that we accumulate during the loading process.
|
||||
// For each new word encountered, we emit the article's body to the file
|
||||
// immediately, inserting the word itself and its offset in this map.
|
||||
// This map maps folded words to the original words and the corresponding
|
||||
// articles' offsets.
|
||||
IndexedWords indexedWords;
|
||||
|
||||
// We use this buffer to decode utf8 into it.
|
||||
vector< wchar_t > wcharBuffer;
|
||||
|
||||
ChunkedStorage::Writer chunks( idx );
|
||||
|
||||
uint32_t articleCount = 0, wordCount = 0;
|
||||
|
||||
ResourceHandler resourceHandler( idx );
|
||||
|
||||
b.setResourcePrefix( string( "bres://" ) + dictId + "/" );
|
||||
|
||||
for( ; ; )
|
||||
{
|
||||
bgl_entry e = b.readEntry( &resourceHandler );
|
||||
|
||||
if ( e.headword.empty() )
|
||||
break;
|
||||
|
||||
// Save the article's body itself first
|
||||
|
||||
uint32_t articleAddress = chunks.startNewBlock();
|
||||
|
||||
chunks.addToBlock( e.headword.c_str(), e.headword.size() + 1 );
|
||||
chunks.addToBlock( e.displayedHeadword.c_str(), e.displayedHeadword.size() + 1 );
|
||||
chunks.addToBlock( e.definition.c_str(), e.definition.size() + 1 );
|
||||
|
||||
// Add entries to the index
|
||||
|
||||
addEntryToIndex( e.headword, articleAddress, indexedWords, wcharBuffer );
|
||||
|
||||
for( unsigned x = 0; x < e.alternates.size(); ++x )
|
||||
addEntryToIndex( e.alternates[ x ], articleAddress, indexedWords, wcharBuffer );
|
||||
|
||||
wordCount += 1 + e.alternates.size();
|
||||
++articleCount;
|
||||
}
|
||||
|
||||
// Finish with the chunks
|
||||
|
||||
idxHeader.chunksOffset = chunks.finish();
|
||||
|
||||
printf( "Writing index...\n" );
|
||||
|
||||
// Good. Now build the index
|
||||
|
||||
idxHeader.indexOffset = BtreeIndexing::buildIndex( indexedWords, idx );
|
||||
|
||||
// Save the list of resources.
|
||||
|
||||
idxHeader.resourceListOffset = idx.tell();
|
||||
idxHeader.resourcesCount = resourceHandler.getResources().size();
|
||||
|
||||
// The iterator is named r so that it doesn't shadow the outer file iterator i
for( list< pair< string, uint32_t > >::const_iterator r =
|
||||
resourceHandler.getResources().begin();
|
||||
r != resourceHandler.getResources().end(); ++r )
|
||||
{
|
||||
idx.write< uint32_t >( r->first.size() );
|
||||
idx.write( r->first.data(), r->first.size() );
|
||||
idx.write< uint32_t >( r->second );
|
||||
}
|
||||
|
||||
// That concludes it. Update the header.
|
||||
|
||||
idxHeader.signature = Signature;
|
||||
idxHeader.formatVersion = CurrentFormatVersion;
|
||||
idxHeader.parserVersion = Babylon::ParserVersion;
|
||||
idxHeader.foldingVersion = Folding::Version;
|
||||
idxHeader.articleCount = articleCount;
|
||||
idxHeader.wordCount = wordCount;
|
||||
|
||||
idx.rewind();
|
||||
|
||||
idx.write( &idxHeader, sizeof( idxHeader ) );
|
||||
}
|
||||
|
||||
dictionaries.push_back( new BglDictionary( dictId,
|
||||
indexFile,
|
||||
*i ) );
|
||||
}
|
||||
|
||||
return dictionaries;
|
||||
}
|
||||
|
||||
}
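
The comment in `replaceCharsetEntities` notes that the two copy-pasted search loops "should all be replaced by regexps". A minimal sketch of that idea follows. It assumes a C++11 compiler with `std::regex`, which the original code does not rely on, and the name `replaceCharsetEntitiesRegex` is hypothetical. It is not a drop-in replacement: unlike the loops, it returns a new string instead of editing in place and prints no diagnostics.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not part of GoldenDict: rewrites
// <CHARSET c="t">XXXX;</CHARSET> and <CHARSET c=t>XXXX;</CHARSET>
// (XXXX = exactly four hex digits) into the HTML entity &#xXXXX;.
std::string replaceCharsetEntitiesRegex( std::string const & text )
{
  // icase plays the role of the lowercased search copy in the original loops.
  // The hex class lists both cases explicitly so it does not depend on how
  // a given library applies icase to bracket ranges.
  static std::regex const pattern(
    "<charset c=(?:\"t\"|t)>([0-9a-fA-F]{4});</charset>",
    std::regex::ECMAScript | std::regex::icase );

  // $1 is the captured group of four hex digits, kept in its original case.
  return std::regex_replace( text, pattern, "&#x$1;" );
}

// Small self-check showing the intended behaviour.
void demoReplaceCharsetEntities()
{
  assert( replaceCharsetEntitiesRegex( "a<CHARSET c=\"t\">1234;</CHARSET>b" )
          == "a&#x1234;b" );
  assert( replaceCharsetEntitiesRegex( "<charset c=t>00E9;</charset>" )
          == "&#x00E9;" );
  // Payloads that are not four hex digits are left untouched.
  assert( replaceCharsetEntitiesRegex( "<charset c=t>zzzz;</charset>" )
          == "<charset c=t>zzzz;</charset>" );
}
```

A single pattern covers both the quoted and unquoted attribute forms, which removes the need to keep two sets of hard-coded offsets (15/13, 30/28) in sync.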
|
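
`removePostfix` and `postfixToSuperscript` in bgl.cc both scan backwards for a trailing `$<digits>$` marker. The sketch below shows how that scan could be shared. The helper `findPostfixStart` and the `...Sketch` names are hypothetical and exist only for illustration; the behaviour is intended to match the two functions above, under the assumption that headwords are plain byte strings.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: returns the index of the '$' that opens a trailing
// "$<digits>$" postfix, or std::string::npos if there is no such postfix.
std::string::size_type findPostfixStart( std::string const & in )
{
  if ( in.size() < 2 || in[ in.size() - 1 ] != '$' )
    return std::string::npos;

  // Walk backwards from the character just before the closing '$'.
  for ( std::string::size_type x = in.size() - 1; x-- > 0; )
  {
    if ( in[ x ] == '$' )
      return x;
    // Cast first: passing a negative char to isdigit() is undefined.
    if ( !std::isdigit( static_cast< unsigned char >( in[ x ] ) ) )
      break;
  }
  return std::string::npos;
}

std::string removePostfixSketch( std::string const & in )
{
  std::string::size_type start = findPostfixStart( in );
  return start == std::string::npos ? in : in.substr( 0, start );
}

std::string postfixToSuperscriptSketch( std::string const & in )
{
  std::string::size_type start = findPostfixStart( in );
  if ( start == std::string::npos )
    return in;

  std::string digits = in.substr( start + 1, in.size() - start - 2 );
  // Postfixes longer than two digits are dropped rather than displayed.
  if ( digits.size() > 2 )
    return in.substr( 0, start );
  return in.substr( 0, start ) + "<sup>" + digits + "</sup>";
}

// Small self-check showing the intended behaviour.
void demoPostfix()
{
  assert( removePostfixSketch( "word$1$" ) == "word" );
  assert( postfixToSuperscriptSketch( "word$2$" ) == "word<sup>2</sup>" );
  assert( postfixToSuperscriptSketch( "word$12345$" ) == "word" );
  assert( removePostfixSketch( "cost$" ) == "cost$" ); // not a postfix
}
```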
28
src/bgl.hh
Normal file
|
@ -0,0 +1,28 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#ifndef __BGL_HH_INCLUDED__
|
||||
#define __BGL_HH_INCLUDED__
|
||||
|
||||
#include "dictionary.hh"
|
||||
|
||||
/// Support for the Babylon's .bgl dictionaries.
|
||||
namespace Bgl {
|
||||
|
||||
using std::vector;
|
||||
using std::string;
|
||||
|
||||
class Format: public Dictionary::Format
|
||||
{
|
||||
public:
|
||||
|
||||
virtual vector< sptr< Dictionary::Class > > makeDictionaries(
|
||||
vector< string > const & fileNames,
|
||||
string const & indicesDir,
|
||||
Dictionary::Initializing & )
|
||||
throw( std::exception );
|
||||
};
|
||||
|
||||
}
|
||||
|
||||
#endif
|
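
The resource list that `makeDictionaries` appends to the index in bgl.cc, and that `getResource` scans, is a run of records of the form: name length (`uint32_t`), name bytes, data offset (`uint32_t`). The sketch below models that layout in memory. `Buffer`, `putU32`, `getU32`, `putResourceEntry` and `findResource` are hypothetical names, and the host byte order is an assumption about what `File::Class::write` does. It performs no bounds checking, so it is only suitable for trusted data.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <stdint.h>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for the index file.
typedef std::vector< char > Buffer;

void putU32( Buffer & buf, uint32_t v )
{
  char raw[ sizeof( v ) ];
  std::memcpy( raw, &v, sizeof( v ) ); // host byte order (assumption)
  buf.insert( buf.end(), raw, raw + sizeof( v ) );
}

uint32_t getU32( Buffer const & buf, std::size_t & pos )
{
  uint32_t v;
  std::memcpy( &v, &buf[ pos ], sizeof( v ) );
  pos += sizeof( v );
  return v;
}

// Appends one record: name length, name bytes, offset of the resource data.
void putResourceEntry( Buffer & buf, std::string const & name, uint32_t offset )
{
  putU32( buf, static_cast< uint32_t >( name.size() ) );
  buf.insert( buf.end(), name.begin(), name.end() );
  putU32( buf, offset );
}

std::string lowercase( std::string s )
{
  for ( std::size_t x = 0; x < s.size(); ++x )
    s[ x ] = static_cast< char >(
      std::tolower( static_cast< unsigned char >( s[ x ] ) ) );
  return s;
}

// Linear, case-insensitive scan over `count` records, as getResource does.
// Returns true and sets `offset` when a record with the given name exists.
bool findResource( Buffer const & buf, uint32_t count,
                   std::string const & name, uint32_t & offset )
{
  std::string wanted = lowercase( name );
  std::size_t pos = 0;

  while ( count-- )
  {
    uint32_t len = getU32( buf, pos );
    std::string entry( buf.begin() + pos, buf.begin() + pos + len );
    pos += len;
    uint32_t entryOffset = getU32( buf, pos );

    if ( lowercase( entry ) == wanted )
    {
      offset = entryOffset;
      return true;
    }
  }
  return false;
}
```

Because the records are variable-length, lookup has to walk the list from the start; that matches the linear scan in `getResource` and is acceptable as long as dictionaries carry few resources.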
547
src/bgl_babylon.cc
Normal file
|
@ -0,0 +1,547 @@
|
|||
/***************************************************************************
|
||||
* Copyright (C) 2007 by Raul Fernandes and Karl Grill *
|
||||
* rgbr@yahoo.com.br *
|
||||
* *
|
||||
* This program is free software; you can redistribute it and/or modify *
|
||||
* it under the terms of the GNU General Public License as published by *
|
||||
* the Free Software Foundation; either version 2 of the License, or *
|
||||
* (at your option) any later version. *
|
||||
* *
|
||||
* This program is distributed in the hope that it will be useful, *
|
||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of *
|
||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
|
||||
* GNU General Public License for more details. *
|
||||
* *
|
||||
* You should have received a copy of the GNU General Public License *
|
||||
* along with this program; if not, write to the *
|
||||
* Free Software Foundation, Inc., *
|
||||
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. *
|
||||
***************************************************************************/
|
||||
|
||||
/* Various improvements were made by Konstantin Isakov for the GoldenDict
|
||||
* program. */
|
||||
|
||||
#include "bgl_babylon.hh"
|
||||
|
||||
#include <stdlib.h>
|
||||
#include <stdio.h>
|
||||
#include <string.h> // memcmp
|
||||
#include <iconv.h>
|
||||
|
||||
#ifdef _WIN32
|
||||
#include <io.h>
|
||||
#define DUP _dup
|
||||
#else
#include <unistd.h> // dup
|
||||
#define DUP dup
|
||||
#endif
|
||||
|
||||
Babylon::Babylon( std::string filename )
|
||||
{
|
||||
m_filename = filename;
|
||||
file = NULL;
|
||||
}
|
||||
|
||||
|
||||
Babylon::~Babylon()
|
||||
{
|
||||
close();
|
||||
}
|
||||
|
||||
|
||||
bool Babylon::open()
|
||||
{
|
||||
FILE *f;
|
||||
unsigned char buf[6];
|
||||
int i;
|
||||
|
||||
f = fopen( m_filename.c_str(), "rb" );
|
||||
if( f == NULL )
|
||||
return false;
|
||||
|
||||
i = fread( buf, 1, 6, f );
|
||||
|
||||
/* First four bytes: BGL signature 0x12340001 or 0x12340002 (big-endian) */
|
||||
if( i < 6 || memcmp( buf, "\x12\x34\x00", 3 ) || buf[3] == 0 || buf[3] > 2 )
|
||||
{
|
||||
fclose( f );
|
||||
return false;
|
||||
}
|
||||
|
||||
/* Calculate position of gz header */
|
||||
|
||||
i = buf[4] << 8 | buf[5];
|
||||
|
||||
if( i < 6 )
|
||||
{
|
||||
fclose( f );
|
||||
return false;
|
||||
}
|
||||
|
||||
if( fseek( f, i, SEEK_SET ) ) /* can't seek - emulate */
|
||||
for(int j=0;j < i - 6;j++) fgetc( f );
|
||||
|
||||
if( ferror( f ) || feof( f ) )
|
||||
{
|
||||
fclose( f );
|
||||
return false;
|
||||
}
|
||||
|
||||
/* we need to flush the file because otherwise some nfs mounts don't seem
|
||||
* to properly update the file position for the following reopen */
|
||||
|
||||
fflush( f );
|
||||
|
||||
file = gzdopen( DUP( fileno( f ) ), "r" );
|
||||
|
||||
fclose( f );
|
||||
|
||||
if( file == NULL )
|
||||
return false;
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
void Babylon::close()
|
||||
{
|
||||
if ( file )
|
||||
{
|
||||
gzclose( file );
|
||||
file = 0;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
bool Babylon::readBlock( bgl_block &block )
|
||||
{
|
||||
if( file == NULL || gzeof( file ) )
|
||||
return false;
|
||||
|
||||
block.length = bgl_readnum( 1 );
|
||||
block.type = block.length & 0xf;
|
||||
if( block.type == 4 ) return false; // end of file marker
|
||||
block.length >>= 4;
|
||||
block.length = block.length < 4 ? bgl_readnum( block.length + 1 ) : block.length - 4 ;
|
||||
if( block.length )
|
||||
{
|
||||
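block.data = (char *)malloc( block.length );
|
||||
if( block.data == NULL )
return false;
// Treat a short read (truncated or corrupt file) as the end of input
if( gzread( file, block.data, block.length ) != (int)block.length )
{
free( block.data );
return false;
}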
|
||||
}
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
unsigned int Babylon::bgl_readnum( int bytes )
|
||||
{
|
||||
unsigned char buf[4] = { 0, 0, 0, 0 }; // zeroed in case gzread() comes up short
|
||||
unsigned val = 0;
|
||||
|
||||
if ( bytes < 1 || bytes > 4 ) return (0);
|
||||
|
||||
gzread( file, buf, bytes );
|
||||
for(int i=0;i<bytes;i++) val= (val << 8) | buf[i];
|
||||
return val;
|
||||
}
|
||||
|
||||
|
||||
bool Babylon::read(std::string &source_charset, std::string &target_charset)
|
||||
{
|
||||
if( file == NULL ) return false;
|
||||
|
||||
bgl_block block;
|
||||
unsigned int pos;
|
||||
unsigned int type;
|
||||
std::string headword;
|
||||
std::string definition;
|
||||
|
||||
m_sourceCharset = source_charset;
|
||||
m_targetCharset = target_charset;
|
||||
m_numEntries = 0;
|
||||
while( readBlock( block ) )
|
||||
{
|
||||
headword.clear();
|
||||
definition.clear();
|
||||
switch( block.type )
|
||||
{
|
||||
case 0:
|
||||
switch( block.data[0] )
|
||||
{
|
||||
case 8:
|
||||
type = (unsigned int)block.data[2];
|
||||
if( type > 64 ) type -= 65;
|
||||
|
||||
if ( type >= 14 )
|
||||
type = 0;
|
||||
|
||||
m_defaultCharset = bgl_charset[type];
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
break;
|
||||
case 1:
|
||||
case 10:
|
||||
// Only count entries
|
||||
m_numEntries++;
|
||||
break;
|
||||
case 3:
|
||||
pos = 2;
|
||||
switch( block.data[1] )
|
||||
{
|
||||
case 1:
|
||||
headword.reserve( block.length - 2 );
|
||||
for(unsigned int a=0;a<block.length-2;a++) headword += block.data[pos++];
|
||||
m_title = headword;
|
||||
break;
|
||||
case 2:
|
||||
headword.reserve( block.length - 2 );
|
||||
for(unsigned int a=0;a<block.length-2;a++) headword += block.data[pos++];
|
||||
m_author = headword;
|
||||
break;
|
||||
case 3:
|
||||
headword.reserve( block.length - 2 );
|
||||
for(unsigned int a=0;a<block.length-2;a++) headword += block.data[pos++];
|
||||
m_email = headword;
|
||||
break;
|
||||
case 4:
|
||||
headword.reserve( block.length - 2 );
|
||||
for(unsigned int a=0;a<block.length-2;a++) headword += block.data[pos++];
|
||||
m_copyright = headword;
|
||||
break;
|
||||
case 7:
|
||||
headword = bgl_language[(unsigned char)(block.data[5])];
|
||||
m_sourceLang = headword;
|
||||
break;
|
||||
case 8:
|
||||
headword = bgl_language[(unsigned char)(block.data[5])];
|
||||
m_targetLang = headword;
|
||||
break;
|
||||
case 9:
|
||||
headword.reserve( block.length - 2 );
|
||||
for(unsigned int a=0;a<block.length-2;a++) {
|
||||
if (block.data[pos] == '\r') {
|
||||
} else if (block.data[pos] == '\n') {
|
||||
headword += "<br>";
|
||||
} else {
|
||||
headword += block.data[pos];
|
||||
}
|
||||
pos++;
|
||||
}
|
||||
m_description = headword;
|
||||
break;
|
||||
case 26:
|
||||
type = (unsigned int)block.data[2];
|
||||
if( type > 64 ) type -= 65;
|
||||
if ( type >= 14 )
|
||||
type = 0;
|
||||
if (m_sourceCharset.empty())
|
||||
m_sourceCharset = bgl_charset[type];
|
||||
break;
|
||||
case 27:
|
||||
type = (unsigned int)block.data[2];
|
||||
if( type > 64 ) type -= 65;
|
||||
if ( type >= 14 )
|
||||
type = 0;
|
||||
if (m_targetCharset.empty())
|
||||
m_targetCharset = bgl_charset[type];
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
break;
|
||||
default:
|
||||
;
|
||||
}
|
||||
if( block.length ) free( block.data );
|
||||
}
|
||||
gzseek( file, 0, SEEK_SET );
|
||||
|
||||
convertToUtf8( m_title, TARGET_CHARSET );
|
||||
convertToUtf8( m_author, TARGET_CHARSET );
|
||||
convertToUtf8( m_email, TARGET_CHARSET );
|
||||
convertToUtf8( m_copyright, TARGET_CHARSET );
|
||||
convertToUtf8( m_description, TARGET_CHARSET );
|
||||
printf("Default charset: %s\nSource charset: %s\nTarget charset: %s\n", m_defaultCharset.c_str(), m_sourceCharset.c_str(), m_targetCharset.c_str());
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
bgl_entry Babylon::readEntry( ResourceHandler * resourceHandler )
|
||||
{
|
||||
bgl_entry entry;
|
||||
|
||||
if( file == NULL )
|
||||
{
|
||||
entry.headword = "";
|
||||
return entry;
|
||||
}
|
||||
|
||||
bgl_block block;
|
||||
unsigned int len, pos;
|
||||
std::string headword, displayedHeadword;
|
||||
std::string definition;
|
||||
std::string temp;
|
||||
std::vector<std::string> alternates;
|
||||
std::string alternate;
|
||||
|
||||
while( readBlock( block ) )
|
||||
{
|
||||
switch( block.type )
|
||||
{
|
||||
case 2:
|
||||
{
|
||||
pos = 0;
|
||||
len = (unsigned char)block.data[pos++];
|
||||
std::string filename( block.data+pos, len );
|
||||
//if (filename != "8EAF66FD.bmp" && filename != "C2EEF3F6.html") {
|
||||
pos += len;
|
||||
if ( resourceHandler )
|
||||
resourceHandler->handleBabylonResource( filename,
|
||||
block.data + pos,
|
||||
block.length - pos );
|
||||
#if 0
|
||||
FILE *ifile = fopen(filename.c_str(), "w");
|
||||
fwrite(block.data + pos, 1, block.length -pos, ifile);
|
||||
fclose(ifile);
|
||||
#endif
|
||||
break;
|
||||
}
|
||||
case 1:
|
||||
case 10:
|
||||
alternate.clear();
|
||||
headword.clear();
|
||||
displayedHeadword.clear();
|
||||
definition.clear();
|
||||
temp.clear();
|
||||
pos = 0;
|
||||
|
||||
// Headword
|
||||
len = 0;
|
||||
len = (unsigned char)block.data[pos++];
|
||||
|
||||
headword.reserve( len );
|
||||
for(unsigned int a=0;a<len;a++)
|
||||
headword += block.data[pos++];
|
||||
|
||||
convertToUtf8( headword, SOURCE_CHARSET );
|
||||
|
||||
// Definition
|
||||
len = 0;
|
||||
len = (unsigned char)block.data[pos++] << 8;
|
||||
len |= (unsigned char)block.data[pos++];
|
||||
definition.reserve( len );
|
||||
|
||||
for(unsigned int a=0;a<len;a++)
|
||||
{
|
||||
if( (unsigned char)block.data[pos] == 0x0a )
|
||||
{
|
||||
definition += "<br>";
|
||||
pos++;
|
||||
}
|
||||
else if ( (unsigned char)block.data[pos] == 6 )
|
||||
{
|
||||
// Unidentified two-byte sequence starting with 0x06; skip it
|
||||
pos += 2;
|
||||
++a;
|
||||
}
|
||||
else if ( (unsigned char)block.data[pos] >= 0x40 &&
|
||||
len - a >= 2 &&
|
||||
(unsigned char)block.data[pos + 1 ] == 0x18 )
|
||||
{
|
||||
// Hidden displayed headword (a displayed headword which
|
||||
// contains some garbage and probably shouldn't be visible).
|
||||
unsigned length = (unsigned char)block.data[ pos ] - 0x3F;
|
||||
|
||||
if ( length > len - a - 2 )
|
||||
{
|
||||
fprintf( stderr, "Hidden displayed headword's length is too large for headword %s\n", headword.c_str() );
|
||||
pos += len - a;
|
||||
break;
|
||||
}
|
||||
|
||||
pos += length + 2;
|
||||
a += length + 1;
|
||||
}
|
||||
else if ( (unsigned char)block.data[pos] == 0x18 )
|
||||
{
|
||||
// Displayed headword
|
||||
unsigned length = (unsigned char)block.data[ pos + 1 ];
|
||||
|
||||
if ( len - a < 2 || length > len - a - 2 ) // Guard against unsigned wrap-around when only one byte is left
|
||||
{
|
||||
fprintf( stderr, "Displayed headword's length is too large for headword %s\n", headword.c_str() );
|
||||
pos += len - a;
|
||||
break;
|
||||
}
|
||||
|
||||
displayedHeadword = std::string( block.data + pos + 2, length );
|
||||
pos += length + 2;
|
||||
a += length + 1;
|
||||
}
|
||||
else if ( (unsigned char)block.data[pos] == 0x50 && len - a - 1 >= 2 &&
|
||||
(unsigned char)block.data[pos + 1 ] == 0x1B )
|
||||
{
|
||||
// 1-byte-sized transcription
|
||||
unsigned length = (unsigned char)block.data[pos + 2 ];
|
||||
|
||||
if ( length > len - a - 3 )
|
||||
{
|
||||
fprintf( stderr, "1-byte-sized transcription's length is too large for headword %s\n", headword.c_str() );
|
||||
pos += len - a;
|
||||
break;
|
||||
}
|
||||
|
||||
std::string transcription( block.data + pos + 3, length );
|
||||
|
||||
definition = std::string( "<span class=\"bgltrn\">" ) + transcription + "</span>" + definition;
|
||||
|
||||
pos += length + 3;
|
||||
a += length + 2;
|
||||
}
|
||||
else if ( (unsigned char)block.data[pos] == 0x60 && len - a - 1 >= 3 &&
|
||||
(unsigned char)block.data[pos + 1 ] == 0x1B )
|
||||
{
|
||||
// 2-byte-sized transcription
|
||||
unsigned length = (unsigned char)block.data[pos + 2 ];
|
||||
length <<= 8;
|
||||
length += (unsigned char)block.data[pos + 3 ];
|
||||
|
||||
if ( length > len - a - 4)
|
||||
{
|
||||
fprintf( stderr, "2-byte-sized transcription's length is too large for headword %s\n", headword.c_str() );
|
||||
pos += len - a;
|
||||
break;
|
||||
}
|
||||
|
||||
std::string transcription( block.data + pos + 4, length );
|
||||
|
||||
definition = std::string( "<span class=\"bgltrn\">" ) + transcription + "</span>" + definition;
|
||||
|
||||
pos += length + 4;
|
||||
a += length + 3;
|
||||
}
|
||||
else if ( (unsigned char)block.data[pos] == 0x1E )
|
||||
{
|
||||
// Resource reference begin marker
|
||||
definition += m_resourcePrefix;
|
||||
++pos;
|
||||
}
|
||||
else if ( (unsigned char)block.data[pos] == 0x1F )
|
||||
{
|
||||
// Resource reference end marker
|
||||
++pos;
|
||||
}
|
||||
else if( (unsigned char)block.data[pos] < 0x20 )
|
||||
{
|
||||
if( a + 3 <= len && block.data[pos] == 0x14 && block.data[pos+1] == 0x02 ) { // a + 3 <= len avoids unsigned wrap-around when len < 3
|
||||
int index = (unsigned char)block.data[pos+2] - 0x30;
|
||||
if (index >= 0 && index <= 10) {
|
||||
definition = "<span class=\"bglpos\">" + partOfSpeech[index] + "</span> " + definition;
|
||||
}
|
||||
pos += 3;
|
||||
a += 2;
|
||||
//pos += len - a;
|
||||
//break;
|
||||
} else if (block.data[pos] == 0x14) {
|
||||
pos++;
|
||||
} else {
|
||||
definition += block.data[pos++];
|
||||
}
|
||||
}else definition += block.data[pos++];
|
||||
}
|
||||
convertToUtf8( definition, TARGET_CHARSET );
|
||||
|
||||
if ( displayedHeadword.size() )
|
||||
convertToUtf8( displayedHeadword, TARGET_CHARSET );
|
||||
|
||||
// Alternate forms
|
||||
while( pos != block.length )
|
||||
{
|
||||
len = (unsigned char)block.data[pos++];
|
||||
alternate.reserve( len );
|
||||
for(unsigned int a=0;a<len;a++) alternate += block.data[pos++];
|
||||
convertToUtf8( alternate, SOURCE_CHARSET );
|
||||
alternates.push_back( alternate );
|
||||
alternate.clear();
|
||||
}
|
||||
|
||||
entry.headword = headword;
|
||||
entry.displayedHeadword = displayedHeadword;
|
||||
entry.definition = definition;
|
||||
entry.alternates = alternates;
|
||||
|
||||
if( block.length ) free( block.data );
|
||||
|
||||
return entry;
|
||||
|
||||
break;
|
||||
default:
|
||||
;
|
||||
}
|
||||
if( block.length ) free( block.data );
|
||||
}
|
||||
entry.headword = "";
|
||||
return entry;
|
||||
}
|
||||
|
||||
|
||||
|
||||
void Babylon::convertToUtf8( std::string &s, unsigned int type )
|
||||
{
|
||||
if( s.size() < 1 ) return;
|
||||
if( type > 2 ) return;
|
||||
|
||||
std::string charset;
|
||||
switch( type )
|
||||
{
|
||||
case DEFAULT_CHARSET:
|
||||
if(!m_defaultCharset.empty()) charset = m_defaultCharset;
|
||||
else charset = m_sourceCharset;
|
||||
break;
|
||||
case SOURCE_CHARSET:
|
||||
if(!m_sourceCharset.empty()) charset = m_sourceCharset;
|
||||
else charset = m_defaultCharset;
|
||||
break;
|
||||
case TARGET_CHARSET:
|
||||
if(!m_targetCharset.empty()) charset = m_targetCharset;
|
||||
else charset = m_defaultCharset;
|
||||
break;
|
||||
default:
|
||||
;
|
||||
}
|
||||
|
||||
iconv_t cd = iconv_open( "UTF-8", charset.c_str() );
|
||||
if( cd == (iconv_t)(-1) )
|
||||
{
|
||||
printf( "Error opening iconv library\n" );
|
||||
exit(1);
|
||||
}
|
||||
|
||||
char *outbuf, *defbuf;
|
||||
size_t inbufbytes, outbufbytes;
|
||||
|
||||
inbufbytes = s.size();
|
||||
outbufbytes = s.size() * 6;
|
||||
#ifdef _WIN32
|
||||
const char *inbuf;
|
||||
inbuf = s.data();
|
||||
#else
|
||||
char *inbuf;
|
||||
inbuf = (char *)s.data();
|
||||
#endif
|
||||
outbuf = (char*)malloc( outbufbytes + 1 );
|
||||
memset( outbuf, '\0', outbufbytes + 1 );
|
||||
defbuf = outbuf;
|
||||
while (inbufbytes) {
|
||||
if (iconv(cd, &inbuf, &inbufbytes, &outbuf, &outbufbytes) == (size_t)-1) {
|
||||
printf( "\n%s\n", inbuf );
|
||||
printf( "Error in iconv conversion\n" );
|
||||
inbuf++;
|
||||
inbufbytes--;
|
||||
}
|
||||
}
|
||||
s = std::string( defbuf );
|
||||
|
||||
free( defbuf );
|
||||
iconv_close( cd );
|
||||
}
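The definition length in readEntry() above is read as two bytes, high byte first, without checking that two bytes are actually left in the block. A minimal sketch of a bounds-checked reader for such a field follows. The name readBigEndian16 and its signature are hypothetical and do not exist in the original parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the original parser: reads a big-endian
// 16-bit value at data[ pos ]. On success it stores the value in out,
// advances pos by two and returns true. If fewer than two bytes are left,
// it returns false and leaves pos and out untouched.
bool readBigEndian16( std::string const & data, std::size_t & pos, unsigned & out )
{
  // The caller must never pass a position beyond the end of the buffer.
  assert( pos <= data.size() );

  if ( data.size() - pos < 2 )
    return false;

  // Go through unsigned char first so that bytes >= 0x80 don't sign-extend.
  out = ( (unsigned)(unsigned char) data[ pos ] << 8 ) |
        (unsigned)(unsigned char) data[ pos + 1 ];

  pos += 2;

  return true;
}
```

A caller would treat a false return as a truncated block and skip the entry instead of reading past the end of the data.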
|
220
src/bgl_babylon.hh
Normal file
|
@ -0,0 +1,220 @@
|
|||
/***************************************************************************
|
||||
* Copyright (C) 2007 by Raul Fernandes and Karl Grill *
|
||||
* rgbr@yahoo.com.br *
|
||||
* *
|
||||
* This program is free software; you can redistribute it and/or modify *
|
||||
* it under the terms of the GNU General Public License as published by *
|
||||
* the Free Software Foundation; either version 2 of the License, or *
|
||||
* (at your option) any later version. *
|
||||
* *
|
||||
* This program is distributed in the hope that it will be useful, *
|
||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of *
|
||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
|
||||
* GNU General Public License for more details. *
|
||||
* *
|
||||
* You should have received a copy of the GNU General Public License *
|
||||
* along with this program; if not, write to the *
|
||||
* Free Software Foundation, Inc., *
|
||||
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. *
|
||||
***************************************************************************/
|
||||
|
||||
#ifndef BABYLON_H
|
||||
#define BABYLON_H
|
||||
|
||||
#include <stdlib.h>
|
||||
#include <zlib.h>
|
||||
|
||||
#include <string>
|
||||
#include <vector>
|
||||
|
||||
const std::string bgl_language[] = {
|
||||
"English",
|
||||
"French",
|
||||
"Italian",
|
||||
"Spanish",
|
||||
"Dutch",
|
||||
"Portuguese",
|
||||
"German",
|
||||
"Russian",
|
||||
"Japanese",
|
||||
"Traditional Chinese",
|
||||
"Simplified Chinese",
|
||||
"Greek",
|
||||
"Korean",
|
||||
"Turkish",
|
||||
"Hebrew",
|
||||
"Arabic",
|
||||
"Thai",
|
||||
"Other",
|
||||
"Other Simplified Chinese dialects",
|
||||
"Other Traditional Chinese dialects",
|
||||
"Other Eastern-European languages",
|
||||
"Other Western-European languages",
|
||||
"Other Russian languages",
|
||||
"Other Japanese languages",
|
||||
"Other Baltic languages",
|
||||
"Other Greek languages",
|
||||
"Other Korean dialects",
|
||||
"Other Turkish dialects",
|
||||
"Other Thai dialects",
|
||||
"Polish",
|
||||
"Hungarian",
|
||||
"Czech",
|
||||
"Lithuanian",
|
||||
"Latvian",
|
||||
"Catalan",
|
||||
"Croatian",
|
||||
"Serbian",
|
||||
"Slovak",
|
||||
"Albanian",
|
||||
"Urdu",
|
||||
"Slovenian",
|
||||
"Estonian",
|
||||
"Bulgarian",
|
||||
"Danish",
|
||||
"Finnish",
|
||||
"Icelandic",
|
||||
"Norwegian",
|
||||
"Romanian",
|
||||
"Swedish",
|
||||
"Ukrainian",
|
||||
"Belarusian",
|
||||
"Farsi",
|
||||
"Basque",
|
||||
"Macedonian",
|
||||
"Afrikaans",
|
||||
"Faeroese",
|
||||
"Latin",
|
||||
"Esperanto",
|
||||
"Tamazight",
|
||||
"Armenian"};
|
||||
|
||||
|
||||
const std::string bgl_charsetname[] = {
|
||||
"Default" ,
|
||||
"Latin",
|
||||
"Eastern European",
|
||||
"Cyrillic",
|
||||
"Japanese",
|
||||
"Traditional Chinese",
|
||||
"Simplified Chinese",
|
||||
"Baltic",
|
||||
"Greek",
|
||||
"Korean",
|
||||
"Turkish",
|
||||
"Hebrew",
|
||||
"Arabic",
|
||||
"Thai" };
|
||||
|
||||
const std::string bgl_charset[] = {
|
||||
"ISO-8859-1", /*Default*/
|
||||
"ISO-8859-1", /*Latin*/
|
||||
"ISO-8859-2", /*Eastern European*/
|
||||
"WINDOWS-1251", /*Cyrillic*/
|
||||
"SJIS-WIN", /*Japanese*/
|
||||
"BIG5", /*Traditional Chinese*/
|
||||
"GB18030", /*Simplified Chinese*/
|
||||
"CP1257", /*Baltic*/
|
||||
"CP1253", /*Greek*/
|
||||
"EUC-KR", /*Korean*/
|
||||
"ISO-8859-9", /*Turkish*/
|
||||
"WINDOWS-1255", /*Hebrew*/
|
||||
"CP1256", /*Arabic*/
|
||||
"CP874" /*Thai*/ };
|
||||
|
||||
const std::string partOfSpeech[] = {
|
||||
"n.",
|
||||
"adj.",
|
||||
"v.",
|
||||
"adv.",
|
||||
"interj.",
|
||||
"pron.",
|
||||
"prep.",
|
||||
"conj.",
|
||||
"suff.",
|
||||
"pref.",
|
||||
"art." };
|
||||
|
||||
typedef struct {
|
||||
unsigned type;
|
||||
unsigned length;
|
||||
char * data;
|
||||
} bgl_block;
|
||||
|
||||
typedef struct {
|
||||
std::string headword;
|
||||
std::string definition;
|
||||
std::string displayedHeadword;
|
||||
std::vector<std::string> alternates;
|
||||
} bgl_entry;
|
||||
|
||||
class Babylon
|
||||
{
|
||||
public:
|
||||
Babylon( std::string );
|
||||
~Babylon();
|
||||
|
||||
// Subclass this to store resources
|
||||
class ResourceHandler
|
||||
{
|
||||
public:
|
||||
|
||||
virtual void handleBabylonResource( std::string const & filename,
|
||||
char const * data, size_t size )=0;
|
||||
|
||||
virtual ~ResourceHandler()
|
||||
{}
|
||||
};
|
||||
|
||||
/// Sets a prefix string to prepend to each resource reference in hyperlinks.
|
||||
void setResourcePrefix( std::string const & prefix )
|
||||
{ m_resourcePrefix = prefix; }
|
||||
|
||||
bool open();
|
||||
void close();
|
||||
bool readBlock( bgl_block& );
|
||||
bool read(std::string &source_charset, std::string &target_charset);
|
||||
bgl_entry readEntry( ResourceHandler * = 0 );
|
||||
|
||||
inline std::string title() const { return m_title; };
|
||||
inline std::string author() const { return m_author; };
|
||||
inline std::string email() const { return m_email; };
|
||||
inline std::string description() const { return m_description; };
|
||||
inline std::string copyright() const { return m_copyright; };
|
||||
inline std::string sourceLang() const { return m_sourceLang; };
|
||||
inline std::string targetLang() const { return m_targetLang; };
|
||||
inline unsigned int numEntries() const { return m_numEntries; };
|
||||
inline std::string charset() const { return m_defaultCharset; };
|
||||
|
||||
inline std::string filename() const { return m_filename; };
|
||||
|
||||
enum
|
||||
{
|
||||
ParserVersion = 2
|
||||
};
|
||||
|
||||
private:
|
||||
unsigned int bgl_readnum( int );
|
||||
void convertToUtf8( std::string &, unsigned int = 0 );
|
||||
|
||||
std::string m_filename;
|
||||
gzFile file;
|
||||
|
||||
std::string m_title;
|
||||
std::string m_author;
|
||||
std::string m_email;
|
||||
std::string m_description;
|
||||
std::string m_copyright;
|
||||
std::string m_sourceLang;
|
||||
std::string m_targetLang;
|
||||
unsigned int m_numEntries;
|
||||
std::string m_defaultCharset;
|
||||
std::string m_sourceCharset;
|
||||
std::string m_targetCharset;
|
||||
|
||||
std::string m_resourcePrefix;
|
||||
|
||||
enum CHARSET { DEFAULT_CHARSET, SOURCE_CHARSET, TARGET_CHARSET };
|
||||
};
|
||||
|
||||
#endif // BABYLON_H
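The header says to subclass ResourceHandler to store resources. A minimal sketch of such a subclass is given below. To keep it compilable on its own it declares a stand-in copy of the interface. In the real code one would presumably derive from Babylon::ResourceHandler instead. The class InMemoryResources and its member resources are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Stand-in for Babylon::ResourceHandler so that this sketch compiles without
// the rest of the project. It repeats the interface declared in the header.
class ResourceHandler
{
public:

  virtual void handleBabylonResource( std::string const & filename,
                                      char const * data, std::size_t size ) = 0;

  virtual ~ResourceHandler()
  {}
};

// Hypothetical handler which keeps every resource in memory, keyed by its
// file name. A later resource with the same name replaces the earlier one.
class InMemoryResources: public ResourceHandler
{
public:

  virtual void handleBabylonResource( std::string const & filename,
                                      char const * data, std::size_t size )
  {
    // A null pointer is only acceptable for an empty resource.
    assert( data != 0 || size == 0 );

    // Resources are binary, so copy by explicit size, not up to a zero byte.
    resources[ filename ] = size ? std::string( data, size ) : std::string();
  }

  std::map< std::string, std::string > resources;
};
```

An instance would be passed by pointer to readEntry(), which calls the handler once per resource block it encounters.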
|
643
src/btreeidx.cc
Normal file
|
@ -0,0 +1,643 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#include "btreeidx.hh"
|
||||
#include "folding.hh"
|
||||
#include "utf8.hh"
|
||||
#include <math.h>
|
||||
|
||||
//#define __BTREE_USE_LZO
|
||||
// LZO mode is experimental and unsupported. Tests didn't show any substantial
|
||||
// speed improvements.
|
||||
|
||||
#ifdef __BTREE_USE_LZO
|
||||
#include <lzo/lzo1x.h>
|
||||
|
||||
namespace {
|
||||
struct __LzoInit
|
||||
{
|
||||
__LzoInit()
|
||||
{
|
||||
lzo_init();
|
||||
}
|
||||
} __lzoInit;
|
||||
}
|
||||
|
||||
#else
|
||||
#include <zlib.h>
|
||||
#endif
|
||||
|
||||
namespace BtreeIndexing {
|
||||
|
||||
enum
|
||||
{
|
||||
BtreeMinElements = 64,
|
||||
BtreeMaxElements = 2048
|
||||
};
|
||||
|
||||
BtreeDictionary::BtreeDictionary( string const & id,
|
||||
vector< string > const & dictionaryFiles ):
|
||||
Dictionary::Class( id, dictionaryFiles ), idxFile( 0 )
|
||||
{
|
||||
}
|
||||
|
||||
void BtreeDictionary::openIndex( File::Class & file )
|
||||
{
|
||||
indexNodeSize = file.read< uint32_t >();
|
||||
rootOffset = file.read< uint32_t >();
|
||||
|
||||
idxFile = &file;
|
||||
}
|
||||
|
||||
vector< WordArticleLink > BtreeDictionary::findArticles( wstring const & str )
|
||||
{
|
||||
vector< WordArticleLink > result;
|
||||
|
||||
wstring folded = Folding::apply( str );
|
||||
|
||||
bool exactMatch;
|
||||
|
||||
vector< char > leaf;
|
||||
uint32_t nextLeaf;
|
||||
|
||||
char const * chainOffset = findChainOffsetExactOrPrefix( folded, exactMatch,
|
||||
leaf, nextLeaf );
|
||||
|
||||
if ( chainOffset && exactMatch )
|
||||
{
|
||||
result = readChain( chainOffset );
|
||||
|
||||
antialias( str, result );
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
|
||||
void BtreeDictionary::findExact( wstring const & str,
|
||||
vector< wstring > & exactMatches,
|
||||
vector< wstring > & prefixMatches,
|
||||
unsigned long maxPrefixResults )
|
||||
throw( std::exception )
|
||||
{
|
||||
exactMatches.clear();
|
||||
prefixMatches.clear();
|
||||
|
||||
wstring folded = Folding::apply( str );
|
||||
|
||||
bool exactMatch;
|
||||
|
||||
vector< char > leaf;
|
||||
uint32_t nextLeaf;
|
||||
|
||||
char const * chainOffset = findChainOffsetExactOrPrefix( folded, exactMatch,
|
||||
leaf, nextLeaf );
|
||||
|
||||
if ( !chainOffset )
|
||||
return;
|
||||
|
||||
for( ; ; )
|
||||
{
|
||||
//printf( "offset = %u, size = %u\n", chainOffset - &leaf.front(), leaf.size() );
|
||||
|
||||
vector< WordArticleLink > chain = readChain( chainOffset );
|
||||
vector< wstring > wstrings = convertChainToWstrings( chain );
|
||||
|
||||
wstring resultFolded = Folding::apply( wstrings[ 0 ] );
|
||||
|
||||
if ( resultFolded == folded )
|
||||
// Exact match
|
||||
exactMatches.insert( exactMatches.end(), wstrings.begin(), wstrings.end() );
|
||||
else
|
||||
if ( resultFolded.size() > folded.size() && !resultFolded.compare( 0, folded.size(), folded ) )
|
||||
{
|
||||
// Prefix match
|
||||
prefixMatches.insert( prefixMatches.end(), wstrings.begin(), wstrings.end() );
|
||||
|
||||
if ( prefixMatches.size() >= maxPrefixResults )
|
||||
{
|
||||
// For now we actually allow more than maxPrefixResults if the last
|
||||
// chain yields more than one result. That's ok and maybe even more
|
||||
// desirable.
|
||||
break;
|
||||
}
|
||||
}
|
||||
else
|
||||
// No match at all, end this
|
||||
break;
|
||||
|
||||
// Fetch new leaf if we're out of chains here
|
||||
|
||||
if ( chainOffset > &leaf.back() )
|
||||
{
|
||||
// We're past the current leaf, fetch the next one
|
||||
|
||||
//printf( "advancing\n" );
|
||||
|
||||
if ( nextLeaf )
|
||||
{
|
||||
readNode( nextLeaf, leaf );
|
||||
nextLeaf = idxFile->read< uint32_t >();
|
||||
chainOffset = &leaf.front() + sizeof( uint32_t );
|
||||
|
||||
uint32_t leafEntries = *(uint32_t *)&leaf.front();
|
||||
|
||||
if ( leafEntries == 0xffffFFFF )
|
||||
{
|
||||
//printf( "bah!\n" );
|
||||
exit( 1 );
|
||||
}
|
||||
}
|
||||
else
|
||||
break; // That was the last leaf
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
void BtreeDictionary::readNode( uint32_t offset, vector< char > & out )
|
||||
{
|
||||
idxFile->seek( offset );
|
||||
|
||||
uint32_t uncompressedSize = idxFile->read< uint32_t >();
|
||||
uint32_t compressedSize = idxFile->read< uint32_t >();
|
||||
|
||||
//printf( "%x,%x\n", uncompressedSize, compressedSize );
|
||||
|
||||
out.resize( uncompressedSize );
|
||||
|
||||
vector< unsigned char > compressedData( compressedSize );
|
||||
|
||||
idxFile->read( &compressedData.front(), compressedData.size() );
|
||||
|
||||
#ifdef __BTREE_USE_LZO
|
||||
|
||||
lzo_uint decompressedLength = out.size();
|
||||
|
||||
if ( lzo1x_decompress( &compressedData.front(), compressedData.size(),
|
||||
(unsigned char *)&out.front(), &decompressedLength, 0 )
|
||||
!= LZO_E_OK || decompressedLength != out.size() )
|
||||
throw exFailedToDecompressNode();
|
||||
|
||||
#else
|
||||
|
||||
unsigned long decompressedLength = out.size();
|
||||
|
||||
if ( uncompress( (unsigned char *)&out.front(),
|
||||
&decompressedLength,
|
||||
&compressedData.front(),
|
||||
compressedData.size() ) != Z_OK ||
|
||||
decompressedLength != out.size() )
|
||||
throw exFailedToDecompressNode();
|
||||
#endif
|
||||
}
|
||||
|
||||
char const * BtreeDictionary::findChainOffsetExactOrPrefix( wstring const & target,
|
||||
bool & exactMatch,
|
||||
vector< char > & leaf,
|
||||
uint32_t & nextLeaf )
|
||||
{
|
||||
if ( !idxFile )
|
||||
throw exIndexWasNotOpened();
|
||||
|
||||
// Lookup the index by traversing the index btree
|
||||
|
||||
vector< char > charBuffer;
|
||||
vector< wchar_t > wcharBuffer;
|
||||
vector< char > wordsBuffer;
|
||||
|
||||
exactMatch = false;
|
||||
|
||||
// Read a node
|
||||
|
||||
uint32_t currentNodeOffset = rootOffset;
|
||||
|
||||
for( ; ; )
|
||||
{
|
||||
//printf( "reading node at %x\n", currentNodeOffset );
|
||||
readNode( currentNodeOffset, leaf );
|
||||
|
||||
// Is it a leaf or a node?
|
||||
|
||||
uint32_t leafEntries = *(uint32_t *)&leaf.front();
|
||||
|
||||
if ( leafEntries == 0xffffFFFF )
|
||||
{
|
||||
// A node
|
||||
|
||||
//printf( "=>a node\n" );
|
||||
|
||||
uint32_t const * offsets = (uint32_t *)&leaf.front() + 1;
|
||||
|
||||
char const * ptr = &leaf.front() + sizeof( uint32_t ) +
|
||||
( indexNodeSize + 1 ) * sizeof( uint32_t );
|
||||
|
||||
unsigned entry;
|
||||
|
||||
for( entry = 0; entry < indexNodeSize; ++entry )
|
||||
{
|
||||
//printf( "checking node against word %s\n", ptr );
|
||||
size_t wordSize = strlen( ptr );
|
||||
|
||||
if ( wcharBuffer.size() <= wordSize )
|
||||
wcharBuffer.resize( wordSize + 1 );
|
||||
|
||||
long result = Utf8::decode( ptr, wordSize, &wcharBuffer.front() );
|
||||
|
||||
if ( result < 0 )
|
||||
throw Utf8::exCantDecode( ptr );
|
||||
|
||||
wcharBuffer[ result ] = 0;
|
||||
|
||||
int compareResult = target.compare( &wcharBuffer.front() );
|
||||
|
||||
if ( !compareResult )
|
||||
{
|
||||
// The target string matches the current one.
|
||||
// Go to the right, since it's there where we store such results.
|
||||
currentNodeOffset = offsets[ entry + 1 ];
|
||||
break;
|
||||
}
|
||||
if ( compareResult < 0 )
|
||||
{
|
||||
// The target string is smaller than the current one.
|
||||
// Go to the left.
|
||||
currentNodeOffset = offsets[ entry ];
|
||||
break;
|
||||
}
|
||||
|
||||
ptr += wordSize + 1;
|
||||
}
|
||||
|
||||
if ( entry == indexNodeSize )
|
||||
{
|
||||
// We iterated through all entries, but our string is larger than
|
||||
// all of them. Go to the rightmost node.
|
||||
currentNodeOffset = offsets[ entry ];
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
//printf( "=>a leaf\n" );
|
||||
// A leaf
|
||||
nextLeaf = idxFile->read< uint32_t >();
|
||||
|
||||
// Iterate through chains until we find one that matches
|
||||
|
||||
char const * ptr = &leaf.front() + sizeof( uint32_t );
|
||||
|
||||
uint32_t chainSize;
|
||||
|
||||
while( leafEntries-- )
|
||||
{
|
||||
memcpy( &chainSize, ptr, sizeof( uint32_t ) );
|
||||
ptr += sizeof( uint32_t );
|
||||
|
||||
if( chainSize )
|
||||
{
|
||||
size_t wordSize = strlen( ptr );
|
||||
|
||||
if ( wcharBuffer.size() <= wordSize )
|
||||
wcharBuffer.resize( wordSize + 1 );
|
||||
|
||||
//printf( "checking against word %s, left = %u\n", ptr, leafEntries );
|
||||
|
||||
long result = Utf8::decode( ptr, wordSize, &wcharBuffer.front() );
|
||||
|
||||
if ( result < 0 )
|
||||
throw Utf8::exCantDecode( ptr );
|
||||
|
||||
wcharBuffer[ result ] = 0;
|
||||
|
||||
wstring foldedWord = Folding::apply( &wcharBuffer.front() );
|
||||
|
||||
int compareResult = target.compare( foldedWord );
|
||||
|
||||
if ( !compareResult )
|
||||
{
|
||||
// Exact match -- return and be done
|
||||
exactMatch = true;
|
||||
|
||||
return ptr - sizeof( uint32_t );
|
||||
}
|
||||
else
|
||||
if ( compareResult < 0 )
|
||||
{
|
||||
// The target string is smaller than the current one.
|
||||
// No point in traversing further, return this result.
|
||||
|
||||
return ptr - sizeof( uint32_t );
|
||||
}
|
||||
ptr += chainSize;
|
||||
}
|
||||
}
|
||||
|
||||
// Well, our target is larger than all the chains here. This would mean
|
||||
// that the next leaf is the right one.
|
||||
|
||||
if ( nextLeaf )
|
||||
{
|
||||
readNode( nextLeaf, leaf );
|
||||
|
||||
nextLeaf = idxFile->read< uint32_t >();
|
||||
|
||||
return &leaf.front() + sizeof( uint32_t );
|
||||
}
|
||||
else
|
||||
return 0; // This was the last leaf
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
vector< WordArticleLink > BtreeDictionary::readChain( char const * & ptr )
|
||||
{
|
||||
uint32_t chainSize;
|
||||
|
||||
memcpy( &chainSize, ptr, sizeof( uint32_t ) );
|
||||
|
||||
ptr += sizeof( uint32_t );
|
||||
|
||||
vector< WordArticleLink > result;
|
||||
|
||||
vector< char > charBuffer;
|
||||
|
||||
while( chainSize )
|
||||
{
|
||||
string str = ptr;
|
||||
ptr += str.size() + 1;
|
||||
|
||||
uint32_t articleOffset;
|
||||
|
||||
memcpy( &articleOffset, ptr, sizeof( uint32_t ) );
|
||||
|
||||
ptr += sizeof( uint32_t );
|
||||
|
||||
result.push_back( WordArticleLink( str, articleOffset ) );
|
||||
|
||||
if ( chainSize < str.size() + 1 + sizeof( uint32_t ) )
|
||||
throw exCorruptedChainData();
|
||||
else
|
||||
chainSize -= str.size() + 1 + sizeof( uint32_t );
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
vector< wstring > BtreeDictionary::convertChainToWstrings(
|
||||
vector< WordArticleLink > const & chain )
|
||||
{
|
||||
vector< wchar_t > wcharBuffer;
|
||||
|
||||
vector< wstring > result;
|
||||
|
||||
for( unsigned x = 0; x < chain.size(); ++x )
|
||||
{
|
||||
unsigned wordSize = chain[ x ].word.size();
|
||||
|
||||
if ( wcharBuffer.size() <= wordSize )
|
||||
wcharBuffer.resize( wordSize + 1 );
|
||||
|
||||
long len = Utf8::decode( chain[ x ].word.data(), wordSize,
|
||||
&wcharBuffer.front() );
|
||||
|
||||
if ( len < 0 )
|
||||
{
|
||||
fprintf( stderr, "Failed to decode utf8 of a word %s, skipping it.\n",
|
||||
chain[ x ].word.c_str() );
|
||||
continue;
|
||||
}
|
||||
|
||||
wcharBuffer[ len ] = 0;
|
||||
|
||||
result.push_back( &wcharBuffer.front() );
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
void BtreeDictionary::antialias( wstring const & str,
|
||||
vector< WordArticleLink > & chain )
|
||||
{
|
||||
wstring caseFolded = Folding::applySimpleCaseOnly( str );
|
||||
|
||||
for( unsigned x = chain.size(); x--; )
|
||||
{
|
||||
// If after applying case folding to each word they wouldn't match, we
|
||||
// drop the entry.
|
||||
if ( Folding::applySimpleCaseOnly( Utf8::decode( chain[ x ].word ) ) !=
|
||||
caseFolded )
|
||||
chain.erase( chain.begin() + x );
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/// A function which recursively creates a btree node.
|
||||
/// The nextIndex iterator is being iterated over and increased when building
|
||||
/// leaf nodes.
|
||||
static uint32_t buildBtreeNode( IndexedWords::const_iterator & nextIndex,
|
||||
size_t indexSize,
|
||||
File::Class & file, size_t maxElements,
|
||||
uint32_t & lastLeafLinkOffset )
|
||||
{
|
||||
// We compress all the node data. This buffer would hold it.
|
||||
vector< unsigned char > uncompressedData;
|
||||
|
||||
bool isLeaf = indexSize <= maxElements;
|
||||
|
||||
if ( isLeaf )
|
||||
{
|
||||
// A leaf.
|
||||
|
||||
uint32_t totalChainsLength = 0;
|
||||
|
||||
IndexedWords::const_iterator nextWord = nextIndex;
|
||||
|
||||
for( unsigned x = indexSize; x--; ++nextWord )
|
||||
{
|
||||
totalChainsLength += sizeof( uint32_t );
|
||||
|
||||
vector< WordArticleLink > const & chain = nextWord->second;
|
||||
|
||||
for( unsigned y = 0; y < chain.size(); ++y )
|
||||
totalChainsLength += chain[ y ].word.size() + 1 + sizeof( uint32_t );
|
||||
}
|
||||
|
||||
uncompressedData.resize( sizeof( uint32_t ) + totalChainsLength );
|
||||
|
||||
// First uint32_t indicates that this is a leaf.
|
||||
*(uint32_t *)&uncompressedData.front() = indexSize;
|
||||
|
||||
unsigned char * ptr = &uncompressedData.front() + sizeof( uint32_t );
|
||||
|
||||
for( unsigned x = indexSize; x--; ++nextIndex )
|
||||
{
|
||||
vector< WordArticleLink > const & chain = nextIndex->second;
|
||||
|
||||
unsigned char * saveSizeHere = ptr;
|
||||
|
||||
ptr += sizeof( uint32_t );
|
||||
|
||||
uint32_t size = 0;
|
||||
|
||||
for( unsigned y = 0; y < chain.size(); ++y )
|
||||
{
|
||||
memcpy( ptr, chain[ y ].word.c_str(), chain[ y ].word.size() + 1 );
|
||||
ptr += chain[ y ].word.size() + 1;
|
||||
|
||||
memcpy( ptr, &(chain[ y ].articleOffset), sizeof( uint32_t ) );
|
||||
ptr += sizeof( uint32_t );
|
||||
|
||||
size += chain[ y ].word.size() + 1 + sizeof( uint32_t );
|
||||
}
|
||||
|
||||
memcpy( saveSizeHere, &size, sizeof( uint32_t ) );
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
// A node which will have children.
|
||||
|
||||
uncompressedData.resize( sizeof( uint32_t ) + ( maxElements + 1 ) * sizeof( uint32_t ) );
|
||||
|
||||
// First uint32_t indicates that this is a node.
|
||||
*(uint32_t *)&uncompressedData.front() = 0xffffFFFF;
|
||||
|
||||
unsigned prevEntry = 0;
|
||||
|
||||
vector< char > charBuffer;
|
||||
|
||||
for( unsigned x = 0; x < maxElements; ++x )
|
||||
{
|
||||
unsigned curEntry = (uint64_t) indexSize * ( x + 1 ) / ( maxElements + 1 );
|
||||
|
||||
uint32_t offset = buildBtreeNode( nextIndex,
|
||||
curEntry - prevEntry,
|
||||
file, maxElements,
|
||||
lastLeafLinkOffset );
|
||||
|
||||
memcpy( &uncompressedData.front() + sizeof( uint32_t ) + x * sizeof( uint32_t ), &offset, sizeof( uint32_t ) );
|
||||
|
||||
if ( charBuffer.size() < nextIndex->first.size() * 4 )
|
||||
charBuffer.resize( nextIndex->first.size() * 4 );
|
||||
|
||||
size_t sz = Utf8::encode( nextIndex->first.data(), nextIndex->first.size(),
|
||||
&charBuffer.front() );
|
||||
|
||||
size_t prevSize = uncompressedData.size();
|
||||
uncompressedData.resize( prevSize + sz + 1 );
|
||||
|
||||
memcpy( &uncompressedData.front() + prevSize, &charBuffer.front(), sz );
|
||||
|
||||
uncompressedData.back() = 0;
|
||||
|
||||
prevEntry = curEntry;
|
||||
}
|
||||
|
||||
// Rightmost child
|
||||
uint32_t offset = buildBtreeNode( nextIndex,
|
||||
indexSize - prevEntry,
|
||||
file, maxElements,
|
||||
lastLeafLinkOffset );
|
||||
memcpy( &uncompressedData.front() + sizeof( uint32_t ) +
|
||||
maxElements * sizeof( uint32_t ), &offset, sizeof( offset ) );
|
||||
}
|
||||
|
||||
// Save the result.
|
||||
|
||||
#ifdef __BTREE_USE_LZO
|
||||
|
||||
vector< unsigned char > compressedData( uncompressedData.size() + uncompressedData.size() / 16 + 64 + 3 );
|
||||
|
||||
char workMem[ LZO1X_1_MEM_COMPRESS ];
|
||||
|
||||
lzo_uint compressedSize;
|
||||
|
||||
if ( lzo1x_1_compress( &uncompressedData.front(), uncompressedData.size(),
|
||||
&compressedData.front(), &compressedSize, workMem )
|
||||
!= LZO_E_OK )
|
||||
{
|
||||
fprintf( stderr, "Failed to compress btree node.\n" );
|
||||
abort();
|
||||
}
|
||||
|
||||
#else
|
||||
|
||||
vector< unsigned char > compressedData( compressBound( uncompressedData.size() ) );
|
||||
|
||||
unsigned long compressedSize = compressedData.size();
|
||||
|
||||
if ( compress( &compressedData.front(), &compressedSize,
|
||||
&uncompressedData.front(), uncompressedData.size() ) != Z_OK )
|
||||
{
|
||||
fprintf( stderr, "Failed to compress btree node.\n" );
|
||||
abort();
|
||||
}
|
||||
|
||||
#endif
|
||||
|
||||
uint32_t offset = file.tell();
|
||||
|
||||
file.write< uint32_t >( uncompressedData.size() );
|
||||
file.write< uint32_t >( compressedSize );
|
||||
file.write( &compressedData.front(), compressedSize );
|
||||
|
||||
if ( isLeaf )
|
||||
{
|
||||
// A link to the next leaf, which is zero and which will be updated
|
||||
// should we happen to have another leaf.
|
||||
|
||||
file.write( ( uint32_t ) 0 );
|
||||
|
||||
uint32_t here = file.tell();
|
||||
|
||||
if ( lastLeafLinkOffset )
|
||||
{
|
||||
// Update the previous leaf to have the offset of this one.
|
||||
file.seek( lastLeafLinkOffset );
|
||||
file.write( offset );
|
||||
file.seek( here );
|
||||
}
|
||||
|
||||
// Make sure next leaf knows where to write its offset for us.
|
||||
lastLeafLinkOffset = here - sizeof( uint32_t );
|
||||
}
|
||||
|
||||
return offset;
|
||||
}
|
||||
|
||||
uint32_t buildIndex( IndexedWords const & indexedWords, File::Class & file )
|
||||
{
|
||||
// We try to stick to two-level tree for most dictionaries. Try finding
|
||||
// the right size for it.
|
||||
|
||||
size_t btreeMaxElements = ( (size_t) sqrt( indexedWords.size() ) ) + 1;
|
||||
|
||||
if ( btreeMaxElements < BtreeMinElements )
|
||||
btreeMaxElements = BtreeMinElements;
|
||||
else
|
||||
if ( btreeMaxElements > BtreeMaxElements )
|
||||
btreeMaxElements = BtreeMaxElements;
|
||||
|
||||
printf( "Building a tree of %u elements\n", (unsigned) btreeMaxElements );
|
||||
|
||||
IndexedWords::const_iterator nextIndex = indexedWords.begin();
|
||||
|
||||
uint32_t lastLeafOffset = 0;
|
||||
|
||||
uint32_t rootOffset = buildBtreeNode( nextIndex, indexedWords.size(),
|
||||
file, btreeMaxElements,
|
||||
lastLeafOffset );
|
||||
|
||||
// We need to save btreeMaxElements. For simplicity, we just save it here
|
||||
// along with root offset, and then return that record's offset as the
|
||||
// offset of the index itself.
|
||||
|
||||
uint32_t indexOffset = file.tell();
|
||||
|
||||
file.write( (uint32_t) btreeMaxElements ); // stored as 32 bits to match indexNodeSize
|
||||
file.write( rootOffset );
|
||||
|
||||
return indexOffset;
|
||||
}
|
||||
|
||||
|
||||
}
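
The node-size choice made in buildIndex() can be sketched in isolation. This is only an illustration: `pickBtreeMaxElements` is a hypothetical helper, and the `minElements`/`maxElements` parameters stand in for BtreeMinElements and BtreeMaxElements, whose actual values are not shown in this part of the change.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper: picks the per-node element limit for a tree that
// should stay about two levels deep. The bounds are passed in because the
// real BtreeMinElements/BtreeMaxElements values are not visible here.
std::size_t pickBtreeMaxElements( std::size_t wordCount,
                                  std::size_t minElements,
                                  std::size_t maxElements )
{
  assert( minElements <= maxElements );

  // With about sqrt(N) entries per node, a root with about sqrt(N) children,
  // each a leaf of about sqrt(N) entries, covers roughly N words.
  std::size_t result =
    static_cast< std::size_t >( std::sqrt( static_cast< double >( wordCount ) ) ) + 1;

  if ( result < minElements )
    return minElements;

  if ( result > maxElements )
    return maxElements;

  return result;
}
```

With bounds of, say, 64 and 2048 (made-up numbers), an index of 10000 words would get nodes of 101 elements, while very small or very large indices are clamped to the bounds and may end up with fewer or more than two levels.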
|
128
src/btreeidx.hh
Normal file
|
@ -0,0 +1,128 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#ifndef __BTREEIDX_HH_INCLUDED__
|
||||
#define __BTREEIDX_HH_INCLUDED__
|
||||
|
||||
#include "dictionary.hh"
|
||||
#include "file.hh"
|
||||
#include <string>
|
||||
#include <vector>
|
||||
#include <map>
|
||||
|
||||
/// A base for the dictionary which creates a btree index to look up
|
||||
/// the words.
|
||||
namespace BtreeIndexing {
|
||||
|
||||
using std::string;
|
||||
using std::wstring;
|
||||
using std::vector;
|
||||
using std::map;
|
||||
|
||||
enum
|
||||
{
|
||||
/// This is to be bumped up each time the internal format changes.
|
||||
/// The value isn't used here by itself, it is supposed to be added
|
||||
/// to each dictionary's internal format version.
|
||||
FormatVersion = 1
|
||||
};
|
||||
|
||||
// These exceptions might be thrown during the index traversal
|
||||
|
||||
DEF_EX( exIndexWasNotOpened, "The index wasn't opened", Dictionary::Ex )
|
||||
DEF_EX( exFailedToDecompressNode, "Failed to decompress a btree's node", Dictionary::Ex )
|
||||
DEF_EX( exCorruptedChainData, "Corrupted chain data in the leaf of a btree encountered", Dictionary::Ex )
|
||||
|
||||
/// This structure describes a word linked to its translation. The
|
||||
/// translation is represented as an abstract 32-bit offset.
|
||||
struct WordArticleLink
|
||||
{
|
||||
string word; // in utf8
|
||||
uint32_t articleOffset;
|
||||
|
||||
WordArticleLink()
|
||||
{}
|
||||
|
||||
WordArticleLink( string const & word_, uint32_t articleOffset_ ):
|
||||
word( word_ ), articleOffset( articleOffset_ )
|
||||
{}
|
||||
};
|
||||
|
||||
/// A base for the dictionary that utilizes a btree index built using the
|
||||
/// buildIndex() function declared below.
|
||||
class BtreeDictionary: public Dictionary::Class
|
||||
{
|
||||
public:
|
||||
|
||||
BtreeDictionary( string const & id, vector< string > const & dictionaryFiles );
|
||||
|
||||
/// This function does the search using the btree index. Derivatives
|
||||
/// need not implement this function.
|
||||
virtual void findExact( wstring const &,
|
||||
vector< wstring > &,
|
||||
vector< wstring > &,
|
||||
unsigned long ) throw( std::exception );
|
||||
|
||||
protected:
|
||||
|
||||
/// Opens the index. The file must be positioned at the offset previously
|
||||
/// returned by buildIndex(). The file reference is saved to be used for
|
||||
/// subsequent lookups.
|
||||
void openIndex( File::Class & );
|
||||
|
||||
/// Finds articles that match the given string. A case-insensitive search
|
||||
/// is performed.
|
||||
vector< WordArticleLink > findArticles( wstring const & );
|
||||
|
||||
private:
|
||||
|
||||
File::Class * idxFile;
|
||||
uint32_t indexNodeSize;
|
||||
uint32_t rootOffset;
|
||||
|
||||
/// Finds the offset in the btree leaf for the given word, either matching
|
||||
/// by an exact match, or by finding the smallest entry that might match
|
||||
/// by prefix. It can return zero if there isn't even a possible prefix
|
||||
/// match. The input string must already be folded. The exactMatch is set
|
||||
/// to true when an exact match is located, and to false otherwise.
|
||||
/// The located leaf is loaded to 'leaf', and the pointer to the next
|
||||
/// leaf is saved to 'nextLeaf'.
|
||||
char const * findChainOffsetExactOrPrefix( wstring const & target,
|
||||
bool & exactMatch,
|
||||
vector< char > & leaf,
|
||||
uint32_t & nextLeaf );
|
||||
|
||||
/// Reads a node or leaf at the given offset. Just uncompresses its data
|
||||
/// to the given vector and does nothing more.
|
||||
void readNode( uint32_t offset, vector< char > & out );
|
||||
|
||||
/// Reads the word-article links' chain at the given offset. The pointer
|
||||
/// is updated to point to the next chain, if there's any.
|
||||
vector< WordArticleLink > readChain( char const * & );
|
||||
|
||||
/// Converts words in a chain to a vector of wide strings. The article
|
||||
/// offsets don't get used.
|
||||
vector< wstring > convertChainToWstrings( vector< WordArticleLink > const & );
|
||||
|
||||
/// Drops any aliases which arose due to folding. Only case-folded aliases
|
||||
/// are left.
|
||||
void antialias( wstring const &, vector< WordArticleLink > & );
|
||||
};
|
||||
|
||||
// Everything below is for building the index data.
|
||||
|
||||
/// This represents the index in its source form, as a map which binds folded
|
||||
/// words to sequences of their unfolded source forms and the corresponding
|
||||
/// article offsets.
|
||||
typedef map< wstring, vector< WordArticleLink > > IndexedWords;
|
||||
|
||||
|
||||
/// Builds the index, as a compressed btree. Returns the offset of the index record (node size and root offset).
|
||||
/// All the data is stored to the given file, beginning from its current
|
||||
/// position.
|
||||
uint32_t buildIndex( IndexedWords const &, File::Class & file );
|
||||
|
||||
}
|
||||
|
||||
#endif
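
A rough sketch of how a caller might fill an IndexedWords map before handing it to buildIndex(). The `addWord` and `foldForDemo` helpers are hypothetical, and plain lowercasing is only an assumed stand-in for the real folding routine, which is not part of this header. WordArticleLink is redeclared in simplified form so the sketch is self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <cwctype>
#include <map>
#include <string>
#include <vector>

// Simplified copy of the structure declared in btreeidx.hh.
struct WordArticleLink
{
  std::string word; // in utf8
  std::uint32_t articleOffset;
};

typedef std::map< std::wstring, std::vector< WordArticleLink > > IndexedWords;

// Assumption: lowercasing stands in for the real folding.
std::wstring foldForDemo( std::wstring const & in )
{
  std::wstring out( in );

  for ( std::wstring::size_type i = 0; i < out.size(); ++i )
    out[ i ] = static_cast< wchar_t >(
      std::towlower( static_cast< std::wint_t >( out[ i ] ) ) );

  return out;
}

// Hypothetical helper: files the unfolded word and its article offset under
// the folded key, so that all variants of a word share one chain.
void addWord( IndexedWords & index, std::wstring const & word,
              std::string const & utf8Word, std::uint32_t articleOffset )
{
  WordArticleLink link = { utf8Word, articleOffset };

  index[ foldForDemo( word ) ].push_back( link );
}
```

Under these assumptions, adding "Cat" and "cat" yields a single map key with a chain of two links, which matches the "folded words to sequences of their unfolded source forms" description above.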
|
||||
|
136
src/chunkedstorage.cc
Normal file
|
@ -0,0 +1,136 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#include "chunkedstorage.hh"
|
||||
#include <zlib.h>
|
||||
|
||||
namespace ChunkedStorage {
|
||||
|
||||
enum
|
||||
{
|
||||
ChunkMaxSize = 65536 // Can't be more since it would overflow the address
|
||||
};
|
||||
|
||||
Writer::Writer( File::Class & f ):
|
||||
file( f ), chunkStarted( false ), bufferUsed( 0 )
|
||||
{
|
||||
}
|
||||
|
||||
uint32_t Writer::startNewBlock()
|
||||
{
|
||||
if ( bufferUsed >= ChunkMaxSize )
|
||||
{
|
||||
// Need to flush first.
|
||||
saveCurrentChunk();
|
||||
}
|
||||
|
||||
chunkStarted = true;
|
||||
|
||||
// The address is comprised of the offset within the chunk (in lower
|
||||
// 16 bits, always fits there since ChunkMaxSize-1 does) and the
|
||||
// number of the chunk, which is therefore limited to be 65535 max.
|
||||
return bufferUsed | ( (uint32_t)offsets.size() << 16 );
|
||||
}
|
||||
|
||||
void Writer::addToBlock( void const * data, size_t size )
|
||||
{
|
||||
if ( !size )
|
||||
return;
|
||||
|
||||
if ( buffer.size() - bufferUsed < size )
|
||||
buffer.resize( bufferUsed + size );
|
||||
|
||||
memcpy( &buffer.front() + bufferUsed, data, size );
|
||||
|
||||
bufferUsed += size;
|
||||
|
||||
chunkStarted = false;
|
||||
}
|
||||
|
||||
void Writer::saveCurrentChunk()
|
||||
{
|
||||
size_t maxCompressedSize = compressBound( bufferUsed );
|
||||
|
||||
if ( bufferCompressed.size() < maxCompressedSize )
|
||||
bufferCompressed.resize( maxCompressedSize );
|
||||
|
||||
unsigned long compressedSize = bufferCompressed.size();
|
||||
|
||||
if ( compress( &bufferCompressed.front(), &compressedSize,
|
||||
&buffer.front(), bufferUsed ) != Z_OK )
|
||||
throw exFailedToCompressChunk();
|
||||
|
||||
offsets.push_back( file.tell() );
|
||||
|
||||
file.write( (uint32_t) bufferUsed );
|
||||
file.write( (uint32_t) compressedSize );
|
||||
file.write( &bufferCompressed.front(), compressedSize );
|
||||
|
||||
bufferUsed = 0;
|
||||
|
||||
chunkStarted = false;
|
||||
}
|
||||
|
||||
uint32_t Writer::finish()
|
||||
{
|
||||
if ( bufferUsed || chunkStarted )
|
||||
saveCurrentChunk();
|
||||
|
||||
uint32_t offset = file.tell();
|
||||
|
||||
file.write( (uint32_t) offsets.size() );
|
||||
if ( !offsets.empty() ) // front() on an empty vector is undefined
  file.write( &offsets.front(), offsets.size() * sizeof( uint32_t ) );
|
||||
|
||||
offsets.clear();
|
||||
chunkStarted = false;
|
||||
|
||||
return offset;
|
||||
}
|
||||
|
||||
Reader::Reader( File::Class & f, uint32_t offset ): file( f )
|
||||
{
|
||||
file.seek( offset );
|
||||
|
||||
offsets.resize( file.read< uint32_t >() );
|
||||
if ( !offsets.empty() ) // front() on an empty vector is undefined
  file.read( &offsets.front(), offsets.size() * sizeof( uint32_t ) );
|
||||
}
|
||||
|
||||
char * Reader::getBlock( uint32_t address, vector< char > & chunk )
|
||||
{
|
||||
size_t chunkIdx = address >> 16;
|
||||
|
||||
if ( chunkIdx >= offsets.size() )
|
||||
throw exAddressOutOfRange();
|
||||
|
||||
// Read and decompress the chunk
|
||||
{
|
||||
file.seek( offsets[ chunkIdx ] );
|
||||
|
||||
uint32_t uncompressedSize = file.read< uint32_t >();
|
||||
uint32_t compressedSize = file.read< uint32_t >();
|
||||
|
||||
chunk.resize( uncompressedSize );
|
||||
|
||||
vector< unsigned char > compressedData( compressedSize );
|
||||
|
||||
file.read( &compressedData.front(), compressedData.size() );
|
||||
|
||||
unsigned long decompressedLength = chunk.size();
|
||||
|
||||
if ( uncompress( (unsigned char *)&chunk.front(),
|
||||
&decompressedLength,
|
||||
&compressedData.front(),
|
||||
compressedData.size() ) != Z_OK ||
|
||||
decompressedLength != chunk.size() )
|
||||
throw exFailedToDecompressChunk();
|
||||
}
|
||||
|
||||
size_t offsetInChunk = address & 0xffFF;
|
||||
|
||||
if ( offsetInChunk > chunk.size() ) // It can be equal for 0-sized blocks
|
||||
throw exAddressOutOfRange();
|
||||
|
||||
return &chunk.front() + offsetInChunk;
|
||||
}
|
||||
|
||||
}
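
The 32-bit block address used by Writer::startNewBlock() and Reader::getBlock() can be illustrated with a few standalone helpers. The function names below are hypothetical and not part of ChunkedStorage; only the bit layout (chunk number in the upper 16 bits, offset within the chunk in the lower 16) is taken from the code above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: builds an address from a chunk number and an offset
// within that chunk. Both must fit in 16 bits.
std::uint32_t packAddress( std::uint32_t chunkIndex, std::uint32_t offsetInChunk )
{
  assert( chunkIndex <= 0xFFFF && offsetInChunk <= 0xFFFF );

  return offsetInChunk | ( chunkIndex << 16 );
}

// Hypothetical helper: the chunk number is kept in the upper 16 bits.
std::uint32_t chunkIndexOf( std::uint32_t address )
{
  return address >> 16;
}

// Hypothetical helper: the offset is kept in the lower 16 bits.
std::uint32_t offsetInChunkOf( std::uint32_t address )
{
  return address & 0xFFFF;
}
```

For example, a block starting at offset 0x1234 of chunk 3 gets the address 0x00031234. The 16-bit offset field is why a chunk is flushed before a new block starts once 65536 bytes have accumulated.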
|
87
src/chunkedstorage.hh
Normal file
|
@ -0,0 +1,87 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#ifndef __CHUNKEDSTORAGE_HH_INCLUDED__
|
||||
#define __CHUNKEDSTORAGE_HH_INCLUDED__
|
||||
|
||||
#include "ex.hh"
|
||||
#include "file.hh"
|
||||
#include <vector>
|
||||
|
||||
/// A chunked compression storage. We use this for articles' bodies. The idea
|
||||
/// is to store data in separately-compressed chunks, much like in dictzip,
|
||||
/// but without any fancy gzip-compatibility or whatever. Another difference
|
||||
/// is that any block of data saved is always contained within one chunk,
|
||||
/// even if the block's size exceeds the chunk's maximum allowed size. This is very
|
||||
/// handy since we're retrieving the data by the same blocks we used to save
|
||||
/// it as; that's the only kind of seek we support, really.
|
||||
namespace ChunkedStorage {
|
||||
|
||||
using std::vector;
|
||||
|
||||
DEF_EX( Ex, "Chunked storage exception", std::exception )
|
||||
DEF_EX( exFailedToCompressChunk, "Failed to compress a chunk", Ex )
|
||||
DEF_EX( exAddressOutOfRange, "The given chunked address is out of range", Ex )
|
||||
DEF_EX( exFailedToDecompressChunk, "Failed to decompress a chunk", Ex )
|
||||
|
||||
/// This class writes data blocks in chunks.
|
||||
class Writer
|
||||
{
|
||||
vector< uint32_t > offsets;
|
||||
File::Class & file;
|
||||
|
||||
public:
|
||||
|
||||
Writer( File::Class & );
|
||||
|
||||
/// Starts new block. Returns its address.
|
||||
uint32_t startNewBlock();
|
||||
|
||||
/// Add data to the previously started block.
|
||||
void addToBlock( void const * data, size_t size );
|
||||
|
||||
/// Finishes writing chunks and returns the offset to the chunk table which
|
||||
/// gets written at the moment of finishing.
|
||||
uint32_t finish();
|
||||
|
||||
private:
|
||||
|
||||
/// Indicates that an address was allocated, which would mean the writeout
|
||||
/// of the pending chunk is required even if its size is zero.
|
||||
bool chunkStarted;
|
||||
|
||||
// This buffer accumulates the chunk data until either enough data is
|
||||
// stored (>=ChunkMaxSize), or there's no more data left to store.
|
||||
vector< unsigned char > buffer;
|
||||
|
||||
// Here we compress the chunk before writing it out to file.
|
||||
vector< unsigned char > bufferCompressed;
|
||||
|
||||
// The amount of data stored in buffer so far. We keep it separate
|
||||
// from buffer.size() for performance reasons; the latter one only
|
||||
// grows, but never shrinks.
|
||||
size_t bufferUsed;
|
||||
|
||||
void saveCurrentChunk();
|
||||
};
|
||||
|
||||
/// This class reads data blocks previously written by Writer.
|
||||
class Reader
|
||||
{
|
||||
vector< uint32_t > offsets;
|
||||
File::Class & file;
|
||||
|
||||
public:
|
||||
/// Creates reader by giving it a file to read from and the offset returned
|
||||
/// by Writer::finish().
|
||||
Reader( File::Class &, uint32_t );
|
||||
|
||||
/// Reads the block previously written by Writer, identified by its address.
|
||||
/// Uses the user-provided storage to load the entire chunk, and then to
|
||||
/// return a pointer to the requested block inside it.
|
||||
char * getBlock( uint32_t address, vector< char > & );
|
||||
};
|
||||
|
||||
}
|
||||
|
||||
#endif
|
203
src/config.cc
Normal file
|
@ -0,0 +1,203 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#include "config.hh"
|
||||
#include <QDir>
|
||||
#include <QFile>
|
||||
#include <QtXml>
|
||||
|
||||
namespace Config {
|
||||
|
||||
namespace
|
||||
{
|
||||
QDir getHomeDir()
|
||||
{
|
||||
QDir result = QDir::home();
|
||||
|
||||
char const * pathInHome =
|
||||
#ifdef Q_OS_WIN32
|
||||
"Application Data/GoldenDict"
|
||||
#else
|
||||
".goldendict"
|
||||
#endif
|
||||
;
|
||||
|
||||
result.mkpath( pathInHome );
|
||||
|
||||
if ( !result.cd( pathInHome ) )
|
||||
throw exCantUseHomeDir();
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
QString getConfigFileName()
|
||||
{
|
||||
return getHomeDir().absoluteFilePath( "config" );
|
||||
}
|
||||
}
|
||||
|
||||
Class load() throw( exError )
|
||||
{
|
||||
QString configName = getConfigFileName();
|
||||
|
||||
if ( !QFile::exists( configName ) )
|
||||
{
|
||||
// Make the default config, save it and return it
|
||||
Class c;
|
||||
|
||||
#ifdef Q_OS_LINUX
|
||||
|
||||
if ( QDir( "/usr/share/stardict/dic" ).exists() )
|
||||
c.paths.push_back( "/usr/share/stardict/dic" );
|
||||
|
||||
#endif
|
||||
|
||||
save( c );
|
||||
|
||||
return c;
|
||||
}
|
||||
|
||||
QFile configFile( configName );
|
||||
|
||||
if ( !configFile.open( QFile::ReadOnly ) )
|
||||
throw exCantReadConfigFile();
|
||||
|
||||
QDomDocument dd;
|
||||
|
||||
QString errorStr;
|
||||
int errorLine, errorColumn;
|
||||
|
||||
if ( !dd.setContent( &configFile, false, &errorStr, &errorLine, &errorColumn ) )
|
||||
{
|
||||
printf( "Error: %s at %d,%d\n", errorStr.toLocal8Bit().constData(), errorLine, errorColumn );
|
||||
throw exMalformedConfigFile();
|
||||
}
|
||||
|
||||
configFile.close();
|
||||
|
||||
QDomNode root = dd.namedItem( "config" );
|
||||
|
||||
Class c;
|
||||
|
||||
QDomNode paths = root.namedItem( "paths" );
|
||||
|
||||
if ( !paths.isNull() )
|
||||
{
|
||||
QDomNodeList nl = paths.toElement().elementsByTagName( "path" );
|
||||
|
||||
for( unsigned x = 0; x < nl.length(); ++x )
|
||||
c.paths.push_back( nl.item( x ).toElement().text() );
|
||||
}
|
||||
|
||||
QDomNode groups = root.namedItem( "groups" );
|
||||
|
||||
if ( !groups.isNull() )
|
||||
{
|
||||
QDomNodeList nl = groups.toElement().elementsByTagName( "group" );
|
||||
|
||||
for( unsigned x = 0; x < nl.length(); ++x )
|
||||
{
|
||||
QDomElement grp = nl.item( x ).toElement();
|
||||
|
||||
Group g;
|
||||
|
||||
g.name = grp.attribute( "name" );
|
||||
g.icon = grp.attribute( "icon" );
|
||||
|
||||
QDomNodeList dicts = grp.elementsByTagName( "dictionary" );
|
||||
|
||||
for( unsigned y = 0; y < dicts.length(); ++y )
|
||||
g.dictionaries.push_back( dicts.item( y ).toElement().text() );
|
||||
|
||||
c.groups.push_back( g );
|
||||
}
|
||||
}
|
||||
|
||||
return c;
|
||||
}
|
||||
|
||||
void save( Class const & c ) throw( exError )
|
||||
{
|
||||
QFile configFile( getConfigFileName() );
|
||||
|
||||
if ( !configFile.open( QFile::WriteOnly ) )
|
||||
throw exCantWriteConfigFile();
|
||||
|
||||
QDomDocument dd;
|
||||
|
||||
QDomElement root = dd.createElement( "config" );
|
||||
dd.appendChild( root );
|
||||
|
||||
{
|
||||
QDomElement paths = dd.createElement( "paths" );
|
||||
root.appendChild( paths );
|
||||
|
||||
for( Paths::const_iterator i = c.paths.begin(); i != c.paths.end(); ++i )
|
||||
{
|
||||
QDomElement path = dd.createElement( "path" );
|
||||
paths.appendChild( path );
|
||||
|
||||
QDomText value = dd.createTextNode( *i );
|
||||
|
||||
path.appendChild( value );
|
||||
}
|
||||
}
|
||||
|
||||
{
|
||||
QDomElement groups = dd.createElement( "groups" );
|
||||
root.appendChild( groups );
|
||||
|
||||
for( Groups::const_iterator i = c.groups.begin(); i != c.groups.end(); ++i )
|
||||
{
|
||||
QDomElement group = dd.createElement( "group" );
|
||||
groups.appendChild( group );
|
||||
|
||||
QDomAttr name = dd.createAttribute( "name" );
|
||||
|
||||
name.setValue( i->name );
|
||||
|
||||
group.setAttributeNode( name );
|
||||
|
||||
if ( i->icon.size() )
|
||||
{
|
||||
QDomAttr icon = dd.createAttribute( "icon" );
|
||||
|
||||
icon.setValue( i->icon );
|
||||
|
||||
group.setAttributeNode( icon );
|
||||
}
|
||||
|
||||
for( vector< QString >::const_iterator j = i->dictionaries.begin(); j != i->dictionaries.end(); ++j )
|
||||
{
|
||||
QDomElement dictionary = dd.createElement( "dictionary" );
|
||||
|
||||
group.appendChild( dictionary );
|
||||
|
||||
QDomText value = dd.createTextNode( *j );
|
||||
|
||||
dictionary.appendChild( value );
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
configFile.write( dd.toByteArray() );
|
||||
}
|
||||
|
||||
QString getIndexDir() throw( exError )
|
||||
{
|
||||
QDir result = getHomeDir();
|
||||
|
||||
result.mkpath( "index" );
|
||||
|
||||
if ( !result.cd( "index" ) )
|
||||
throw exCantUseIndexDir();
|
||||
|
||||
return result.path() + QDir::separator();
|
||||
}
|
||||
|
||||
QString getUserCssFileName() throw( exError )
|
||||
{
|
||||
return getHomeDir().filePath( "style.css" );
|
||||
}
|
||||
|
||||
}
|
57
src/config.hh
Normal file
|
@ -0,0 +1,57 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#ifndef __CONFIG_HH_INCLUDED__
|
||||
#define __CONFIG_HH_INCLUDED__
|
||||
|
||||
#include <vector>
|
||||
#include <QString>
|
||||
#include "ex.hh"
|
||||
|
||||
/// GoldenDict's configuration
|
||||
namespace Config {
|
||||
|
||||
using std::vector;
|
||||
|
||||
/// A list of paths where to search for the dictionaries
|
||||
typedef vector< QString > Paths;
|
||||
|
||||
/// A dictionary group
|
||||
struct Group
|
||||
{
|
||||
QString name, icon;
|
||||
vector< QString > dictionaries; // consists of dictionary's ids
|
||||
};
|
||||
|
||||
/// All the groups
|
||||
typedef vector< Group > Groups;
|
||||
|
||||
struct Class
|
||||
{
|
||||
Paths paths;
|
||||
Groups groups;
|
||||
};
|
||||
|
||||
DEF_EX( exError, "Error with the program's configuration", std::exception )
|
||||
DEF_EX( exCantUseHomeDir, "Can't use home directory to store GoldenDict preferences", exError )
|
||||
DEF_EX( exCantUseIndexDir, "Can't use index directory to store GoldenDict index files", exError )
|
||||
DEF_EX( exCantReadConfigFile, "Can't read the configuration file", exError )
|
||||
DEF_EX( exCantWriteConfigFile, "Can't write the configuration file", exError )
|
||||
DEF_EX( exMalformedConfigFile, "The configuration file is malformed", exError )
|
||||
|
||||
/// Loads the configuration, or creates the default one if none is present
|
||||
Class load() throw( exError );
|
||||
|
||||
/// Saves the configuration
|
||||
void save( Class const & ) throw( exError );
|
||||
|
||||
/// Returns the index directory, where the indices are to be stored.
|
||||
QString getIndexDir() throw( exError );
|
||||
|
||||
/// Returns the user .css file name.
|
||||
QString getUserCssFileName() throw( exError );
|
||||
|
||||
}
|
||||
|
||||
#endif
|
||||
|
61
src/dictgroupwidget.ui
Normal file
|
@ -0,0 +1,61 @@
|
|||
<ui version="4.0" >
|
||||
<class>DictGroupWidget</class>
|
||||
<widget class="QWidget" name="DictGroupWidget" >
|
||||
<property name="geometry" >
|
||||
<rect>
|
||||
<x>0</x>
|
||||
<y>0</y>
|
||||
<width>403</width>
|
||||
<height>333</height>
|
||||
</rect>
|
||||
</property>
|
||||
<property name="windowTitle" >
|
||||
<string>Form</string>
|
||||
</property>
|
||||
<layout class="QVBoxLayout" name="verticalLayout" >
|
||||
<item>
|
||||
<widget class="DictListWidget" name="dictionaries" />
|
||||
</item>
|
||||
<item>
|
||||
<layout class="QHBoxLayout" name="horizontalLayout" >
|
||||
<item>
|
||||
<widget class="QLabel" name="label" >
|
||||
<property name="text" >
|
||||
<string>Group icon:</string>
|
||||
</property>
|
||||
</widget>
|
||||
</item>
|
||||
<item>
|
||||
<widget class="QComboBox" name="groupIcon" >
|
||||
<property name="sizeAdjustPolicy" >
|
||||
<enum>QComboBox::AdjustToContents</enum>
|
||||
</property>
|
||||
</widget>
|
||||
</item>
|
||||
<item>
|
||||
<spacer name="horizontalSpacer" >
|
||||
<property name="orientation" >
|
||||
<enum>Qt::Horizontal</enum>
|
||||
</property>
|
||||
<property name="sizeHint" stdset="0" >
|
||||
<size>
|
||||
<width>40</width>
|
||||
<height>20</height>
|
||||
</size>
|
||||
</property>
|
||||
</spacer>
|
||||
</item>
|
||||
</layout>
|
||||
</item>
|
||||
</layout>
|
||||
</widget>
|
||||
<customwidgets>
|
||||
<customwidget>
|
||||
<class>DictListWidget</class>
|
||||
<extends>QListWidget</extends>
|
||||
<header>groups_widgets.hh</header>
|
||||
</customwidget>
|
||||
</customwidgets>
|
||||
<resources/>
|
||||
<connections/>
|
||||
</ui>
|
77
src/dictionary.cc
Normal file
|
@ -0,0 +1,77 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#include <vector>
|
||||
#include <algorithm>
|
||||
#include <cstdio>
|
||||
#include "dictionary.hh"
|
||||
#include "md5.h"
|
||||
|
||||
// For needToRebuildIndex(), read below
|
||||
#include <QFileInfo>
|
||||
#include <QDateTime>
|
||||
|
||||
namespace Dictionary {
|
||||
|
||||
Class::Class( string const & id_, vector< string > const & dictionaryFiles_ ):
|
||||
id( id_ ), dictionaryFiles( dictionaryFiles_ )
|
||||
{
|
||||
}
|
||||
|
||||
string Format::makeDictionaryId( vector< string > const & dictionaryFiles ) throw()
|
||||
{
|
||||
std::vector< string > sortedList( dictionaryFiles );
|
||||
|
||||
std::sort( sortedList.begin(), sortedList.end() );
|
||||
|
||||
md5_state_t context;
|
||||
|
||||
md5_init( &context );
|
||||
for( std::vector< string >::const_iterator i = sortedList.begin();
|
||||
i != sortedList.end(); ++i )
|
||||
md5_append( &context, (unsigned char const *)i->c_str(), i->size() + 1 );
|
||||
|
||||
unsigned char digest[ 16 ];
|
||||
|
||||
md5_finish( &context, digest );
|
||||
|
||||
char result[ sizeof( digest ) * 2 + 1 ];
|
||||
|
||||
for( unsigned x = 0; x < sizeof( digest ); ++x )
|
||||
sprintf( result + x * 2, "%02x", digest[ x ] );
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
// While this file is not supposed to have any Qt stuff since it's used by
|
||||
// the dictionary backends, there's no platform-independent way to get hold
|
||||
// of a timestamp of the file, so we use Qt here anyway. It is supposed to
|
||||
// be fixed in the future when it's needed.
|
||||
bool Format::needToRebuildIndex( vector< string > const & dictionaryFiles,
|
||||
string const & indexFile ) throw()
|
||||
{
|
||||
unsigned long lastModified = 0;
|
||||
|
||||
for( std::vector< string >::const_iterator i = dictionaryFiles.begin();
|
||||
i != dictionaryFiles.end(); ++i )
|
||||
{
|
||||
QFileInfo fileInfo( QString::fromStdString( *i ) );
|
||||
|
||||
if ( !fileInfo.exists() )
|
||||
return true;
|
||||
|
||||
unsigned long ts = fileInfo.lastModified().toTime_t();
|
||||
|
||||
if ( ts > lastModified )
|
||||
lastModified = ts;
|
||||
}
|
||||
|
||||
QFileInfo fileInfo( QString::fromStdString( indexFile ) );
|
||||
|
||||
if ( !fileInfo.exists() )
|
||||
return true;
|
||||
|
||||
return fileInfo.lastModified().toTime_t() < lastModified;
|
||||
}
|
||||
|
||||
}
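
The decision made by Format::needToRebuildIndex() can be sketched without Qt by passing the file states in directly. `FileStamp` and `indexIsStale` are hypothetical names used only for this illustration; in the real function the existence flags and timestamps come from QFileInfo.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a file's state: whether it exists and its
// modification time in seconds since the epoch.
struct FileStamp
{
  bool exists;
  unsigned long modified;
};

// Returns true when the index has to be regenerated.
bool indexIsStale( std::vector< FileStamp > const & dictionaryFiles,
                   FileStamp const & indexFile )
{
  unsigned long newest = 0;

  for ( FileStamp const & f : dictionaryFiles )
  {
    if ( !f.exists )
      return true; // A missing dictionary file forces a rebuild

    if ( f.modified > newest )
      newest = f.modified;
  }

  if ( !indexFile.exists )
    return true; // No index yet

  // Stale only if some dictionary file is strictly newer than the index
  return indexFile.modified < newest;
}
```

Note that equal timestamps count as up to date, which mirrors the strict `<` comparison in the original code.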
|
163
src/dictionary.hh
Normal file
|
@ -0,0 +1,163 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#ifndef __DICTIONARY_HH_INCLUDED__
|
||||
#define __DICTIONARY_HH_INCLUDED__
|
||||
|
||||
#include <vector>
|
||||
#include <string>
|
||||
#include <map>
|
||||
#include "sptr.hh"
|
||||
#include "ex.hh"
|
||||
|
||||
/// Abstract dictionary-related stuff
|
||||
namespace Dictionary {
|
||||
|
||||
using std::vector;
|
||||
using std::string;
|
||||
using std::wstring;
|
||||
using std::map;
|
||||
|
||||
enum Property
|
||||
{
|
||||
Author,
|
||||
Copyright,
|
||||
Description,
|
||||
Email
|
||||
};
|
||||
|
||||
DEF_EX( Ex, "Dictionary error", std::exception )
|
||||
DEF_EX( exNoSuchWord, "The given word does not exist", Ex )
|
||||
DEF_EX( exNoSuchResource, "The given resource does not exist", Ex )
|
||||
|
||||
/// A dictionary. Can be used to query words.
|
||||
class Class
|
||||
{
|
||||
string id;
|
||||
vector< string > dictionaryFiles;
|
||||
|
||||
public:
|
||||
|
||||
/// Creates a dictionary. The id should be made using
|
||||
/// Format::makeDictionaryId(); dictionaryFiles is the list of file names the
|
||||
/// dictionary consists of.
|
||||
Class( string const & id, vector< string > const & dictionaryFiles );
|
||||
|
||||
/// Returns the dictionary's id.
|
||||
string getId() throw()
|
||||
{ return id; }
|
||||
|
||||
/// Returns the list of file names the dictionary consists of.
|
||||
vector< string > const & getDictionaryFilenames() throw()
|
||||
{ return dictionaryFiles; }
|
||||
|
||||
|
||||
/// Returns the dictionary's full name, utf8.
|
||||
virtual string getName() throw()=0;
|
||||
|
||||
/// Returns all the available properties, like the author's name, copyright,
|
||||
/// description etc. All strings are in utf8.
|
||||
virtual map< Property, string > getProperties() throw()=0;
|
||||
|
||||
/// Returns the number of articles in the dictionary.
|
||||
virtual unsigned long getArticleCount() throw()=0;
|
||||
|
||||
/// Returns the number of words in the dictionary. This can be equal to
|
||||
/// the number of articles, or can be larger if some synonyms are present.
|
||||
virtual unsigned long getWordCount() throw()=0;
|
||||
|
||||
/// Looks up a given word in the dictionary, aiming for exact matches. The
|
||||
/// result is a list of such matches. If it is possible to also look up words
|
||||
/// that begin with the given substring without much expense, they should be
|
||||
/// put into the prefix results (if not, it should be left empty). Not more
|
||||
/// than maxPrefixResults prefix results should be stored. The whole
|
||||
/// operation is supposed to be fast and is executed in a GUI thread.
|
||||
virtual void findExact( wstring const &,
|
||||
vector< wstring > & exactMatches,
|
||||
vector< wstring > & prefixMatches,
|
||||
unsigned long maxPrefixResults ) throw( std::exception )=0;
|
||||
|
||||
/// Finds known headwords for the given word, that is, the words for which
|
||||
/// the given word is a synonym. If a dictionary can't perform this operation,
|
||||
/// it should leave the default implementation which always returns an empty
|
||||
/// vector.
|
||||
virtual vector< wstring > findHeadwordsForSynonym( wstring const & )
|
||||
throw( std::exception )
|
||||
{ return vector< wstring >(); }
|
||||
|
||||
/// Returns a definition for the given word. The definition should
|
||||
/// be an html fragment (without html/head/body tags) in utf8 encoding.
|
||||
/// The 'alts' vector could contain a list of words the definitions of which
|
||||
/// should be included in the output as well, being treated as additional
|
||||
/// synonyms for the main word.
|
||||
virtual string getArticle( wstring const &, vector< wstring > const & alts )
|
||||
throw( exNoSuchWord, std::exception )=0;
|
||||
|
||||
/// Loads contents of a resource named 'name' into the 'data' vector. This is
|
||||
/// usually a picture file referenced in the article or something like that.
|
||||
virtual void getResource( string const & name,
|
||||
vector< char > & data ) throw( exNoSuchResource,
|
||||
std::exception )
|
||||
{ throw exNoSuchResource(); }
|
||||
|
||||
virtual ~Class()
|
||||
{}
|
||||
};
|
||||
|
||||
/// Callbacks to be used when the dictionaries are being initialized.
|
||||
class Initializing
|
||||
{
|
||||
public:
|
||||
|
||||
/// Called by the Format instance to notify the caller that the given
|
||||
/// dictionary is being indexed. Since indexing can take some time, this
|
||||
/// is useful to show in some kind of a splash screen.
|
||||
/// The dictionaryName is in utf8.
|
||||
virtual void indexingDictionary( string const & dictionaryName ) throw()=0;
|
||||
|
||||
virtual ~Initializing()
|
||||
{}
|
||||
};
|
||||
|
||||
/// A dictionary format. This is a factory to create dictionaries' instances.
|
||||
/// It is fed filenames to check if they are dictionaries, and it creates
|
||||
/// instances when they are.
|
||||
class Format
|
||||
{
|
||||
public:
|
||||
|
||||
/// Should go through the given list of file names, trying each one as a
|
||||
/// possible dictionary of the supported format. Upon finding one, creates a
|
||||
/// corresponding dictionary instance. As a result, a list of dictionaries
|
||||
/// is created.
|
||||
/// indicesDir indicates a directory where index files can be created, should
|
||||
/// there be need for them. The index file name must be the same as the
|
||||
/// dictionary's id, made by makeDictionaryId() from the list of file names.
|
||||
/// Any exception thrown would terminate the program with an error.
|
||||
virtual vector< sptr< Class > > makeDictionaries( vector< string > const & fileNames,
|
||||
string const & indicesDir,
|
||||
Initializing & )
|
||||
throw( std::exception )=0;
|
||||
|
||||
virtual ~Format()
|
||||
{}
|
||||
|
||||
public://protected:
|
||||
|
||||
/// Generates an id based on the set of file names which the dictionary
|
||||
/// consists of. The resulting id is an alphanumeric hex value made by
|
||||
/// hashing the file names. This id should be used to identify dictionary
|
||||
/// and for the index file name, if one is needed.
|
||||
static string makeDictionaryId( vector< string > const & dictionaryFiles ) throw();
|
||||
/// Checks if it is needed to regenerate index file based on its timestamp
|
||||
/// and the timestamps of the dictionary files. If some files are newer than
|
||||
/// the index file, or the index file doesn't exist, returns true. If some
|
||||
/// dictionary files don't exist, returns true, too.
|
||||
static bool needToRebuildIndex( vector< string > const & dictionaryFiles,
|
||||
string const & indexFile ) throw();
|
||||
};
|
||||
|
||||
}
|
||||
|
||||
#endif
|
||||
|
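The timestamp rule described in the needToRebuildIndex() comment can be sketched in plain C with stat(). This is only an illustration under stated assumptions, not GoldenDict's implementation: the name need_to_rebuild_index and its array-plus-count signature are hypothetical, and whole-second st_mtime comparison is assumed to be precise enough.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper (not part of GoldenDict): returns 1 when the index
 * file has to be rebuilt and 0 when it is still up to date. */
int need_to_rebuild_index( const char * const * dictionary_files,
                           size_t file_count,
                           const char * index_file )
{
  struct stat index_sb;
  size_t i;

  assert( index_file != NULL );

  /* No index file at all: it has to be built. */
  if ( stat( index_file, &index_sb ) != 0 )
    return 1;

  for ( i = 0; i < file_count; ++i )
  {
    struct stat dict_sb;

    /* A missing dictionary file also counts as "rebuild". */
    if ( stat( dictionary_files[ i ], &dict_sb ) != 0 )
      return 1;

    /* A dictionary file newer than the index invalidates the index. */
    if ( dict_sb.st_mtime > index_sb.st_mtime )
      return 1;
  }

  return 0;
}
```

A caller would pass the same list of file names it used to derive the dictionary id, together with the path of the index file inside indicesDir.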
679
src/dictzip.c
Normal file
|
@ -0,0 +1,679 @@
|
|||
/* Made up from data.c and other supplementary files of dictd-1.0.11 for the
|
||||
* GoldenDict program.
|
||||
*/
|
||||
|
||||
/* data.c --
|
||||
* Created: Tue Jul 16 12:45:41 1996 by faith@dict.org
|
||||
* Revised: Sat Mar 30 10:46:06 2002 by faith@dict.org
|
||||
* Copyright 1996, 1997, 1998, 2000, 2002 Rickard E. Faith (faith@dict.org)
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it
|
||||
* under the terms of the GNU General Public License as published by the
|
||||
* Free Software Foundation; either version 1, or (at your option) any
|
||||
* later version.
|
||||
*
|
||||
* This program is distributed in the hope that it will be useful, but
|
||||
* WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
* General Public License for more details.
|
||||
*
|
||||
* You should have received a copy of the GNU General Public License along
|
||||
* with this program; if not, write to the Free Software Foundation, Inc.,
|
||||
* 675 Mass Ave, Cambridge, MA 02139, USA.
|
||||
*/
|
||||
|
||||
#include "dictzip.h"
|
||||
#include <limits.h>
|
||||
#include <stdarg.h>
|
||||
#include <errno.h>
|
||||
#include <stdlib.h>
|
||||
#include <string.h>
|
||||
|
||||
#define BUFFERSIZE 10240
|
||||
|
||||
#define OUT_BUFFER_SIZE 0xffffL
|
||||
|
||||
#define IN_BUFFER_SIZE ((unsigned long)((double)(OUT_BUFFER_SIZE - 12) * 0.89))
|
||||
|
||||
/* For gzip-compatible header, as defined in RFC 1952 */
|
||||
|
||||
/* Magic for GZIP (rfc1952) */
|
||||
#define GZ_MAGIC1 0x1f /* First magic byte */
|
||||
#define GZ_MAGIC2 0x8b /* Second magic byte */
|
||||
|
||||
/* FLaGs (bitmapped), from rfc1952 */
|
||||
#define GZ_FTEXT 0x01 /* Set for ASCII text */
|
||||
#define GZ_FHCRC 0x02 /* Header CRC16 */
|
||||
#define GZ_FEXTRA 0x04 /* Optional field (random access index) */
|
||||
#define GZ_FNAME 0x08 /* Original name */
|
||||
#define GZ_COMMENT 0x10 /* Zero-terminated, human-readable comment */
|
||||
#define GZ_MAX 2 /* Maximum compression */
|
||||
#define GZ_FAST 4 /* Fastest compression */
|
||||
|
||||
/* These are from rfc1952 */
|
||||
#define GZ_OS_FAT 0 /* FAT filesystem (MS-DOS, OS/2, NT/Win32) */
|
||||
#define GZ_OS_AMIGA 1 /* Amiga */
|
||||
#define GZ_OS_VMS 2 /* VMS (or OpenVMS) */
|
||||
#define GZ_OS_UNIX 3 /* Unix */
|
||||
#define GZ_OS_VMCMS 4 /* VM/CMS */
|
||||
#define GZ_OS_ATARI 5 /* Atari TOS */
|
||||
#define GZ_OS_HPFS 6 /* HPFS filesystem (OS/2, NT) */
|
||||
#define GZ_OS_MAC 7 /* Macintosh */
|
||||
#define GZ_OS_Z 8 /* Z-System */
|
||||
#define GZ_OS_CPM 9 /* CP/M */
|
||||
#define GZ_OS_TOPS20 10 /* TOPS-20 */
|
||||
#define GZ_OS_NTFS 11 /* NTFS filesystem (NT) */
|
||||
#define GZ_OS_QDOS 12 /* QDOS */
|
||||
#define GZ_OS_ACORN 13 /* Acorn RISCOS */
|
||||
#define GZ_OS_UNKNOWN 255 /* unknown */
|
||||
|
||||
#define GZ_RND_S1 'R' /* First magic for random access format */
|
||||
#define GZ_RND_S2 'A' /* Second magic for random access format */
|
||||
|
||||
#define GZ_ID1 0 /* GZ_MAGIC1 */
|
||||
#define GZ_ID2 1 /* GZ_MAGIC2 */
|
||||
#define GZ_CM 2 /* Compression Method (Z_DEFLATED) */
|
||||
#define GZ_FLG 3 /* FLaGs (see above) */
|
||||
#define GZ_MTIME 4 /* Modification TIME */
|
||||
#define GZ_XFL 8 /* eXtra FLags (GZ_MAX or GZ_FAST) */
|
||||
#define GZ_OS 9 /* Operating System */
|
||||
#define GZ_XLEN 10 /* eXtra LENgth (16bit) */
|
||||
#define GZ_FEXTRA_START 12 /* Start of extra fields */
|
||||
#define GZ_SI1 12 /* Subfield ID1 */
|
||||
#define GZ_SI2 13 /* Subfield ID2 */
|
||||
#define GZ_SUBLEN 14 /* Subfield length (16bit) */
|
||||
#define GZ_VERSION 16 /* Version for subfield format */
|
||||
#define GZ_CHUNKLEN 18 /* Chunk length (16bit) */
|
||||
#define GZ_CHUNKCNT 20 /* Number of chunks (16bit) */
|
||||
#define GZ_RNDDATA 22 /* Random access data (16bit) */
|
||||
|
||||
|
||||
#define DBG_VERBOSE (0<<30|1<< 0) /* Verbose */
|
||||
#define DBG_ZIP (0<<30|1<< 1) /* Zip */
|
||||
#define DBG_UNZIP (0<<30|1<< 2) /* Unzip */
|
||||
#define DBG_SEARCH (0<<30|1<< 3) /* Search */
|
||||
#define DBG_SCAN (0<<30|1<< 4) /* Config file scan */
|
||||
#define DBG_PARSE (0<<30|1<< 5) /* Config file parse */
|
||||
#define DBG_INIT (0<<30|1<< 6) /* Database initialization */
|
||||
#define DBG_PORT (0<<30|1<< 7) /* Log port number for connections */
|
||||
#define DBG_LEV (0<<30|1<< 8) /* Levenshtein matching */
|
||||
#define DBG_AUTH (0<<30|1<< 9) /* Debug authentication */
|
||||
#define DBG_NODETACH (0<<30|1<<10) /* Don't detach as a background proc. */
|
||||
#define DBG_NOFORK (0<<30|1<<11) /* Don't fork (single threaded) */
|
||||
#define DBG_ALT (0<<30|1<<12) /* altcompare() */
|
||||
|
||||
#define LOG_SERVER (0<<30|1<< 0) /* Log server diagnostics */
|
||||
#define LOG_CONNECT (0<<30|1<< 1) /* Log connection information */
|
||||
#define LOG_STATS (0<<30|1<< 2) /* Log termination information */
|
||||
#define LOG_COMMAND (0<<30|1<< 3) /* Log commands */
|
||||
#define LOG_FOUND (0<<30|1<< 4) /* Log words found */
|
||||
#define LOG_NOTFOUND (0<<30|1<< 5) /* Log words not found */
|
||||
#define LOG_CLIENT (0<<30|1<< 6) /* Log client */
|
||||
#define LOG_HOST (0<<30|1<< 7) /* Log remote host name */
|
||||
#define LOG_TIMESTAMP (0<<30|1<< 8) /* Log with timestamps */
|
||||
#define LOG_MIN (0<<30|1<< 9) /* Log a few minimal things */
|
||||
#define LOG_AUTH (0<<30|1<<10) /* Log authentication denials */
|
||||
|
||||
#define DICT_LOG_TERM 0
|
||||
#define DICT_LOG_DEFINE 1
|
||||
#define DICT_LOG_MATCH 2
|
||||
#define DICT_LOG_NOMATCH 3
|
||||
#define DICT_LOG_CLIENT 4
|
||||
#define DICT_LOG_TRACE 5
|
||||
#define DICT_LOG_COMMAND 6
|
||||
#define DICT_LOG_AUTH 7
|
||||
#define DICT_LOG_CONNECT 8
|
||||
|
||||
#define DICT_UNKNOWN 0
|
||||
#define DICT_TEXT 1
|
||||
#define DICT_GZIP 2
|
||||
#define DICT_DZIP 3
|
||||
|
||||
/* For now, just always enable the mmap mode */
|
||||
#define HAVE_MMAP
|
||||
|
||||
#include <sys/stat.h>
|
||||
#ifdef HAVE_MMAP
|
||||
#include <sys/mman.h>
|
||||
#endif
|
||||
#include <ctype.h>
|
||||
#include <fcntl.h>
#include <unistd.h> /* read(), close() */
|
||||
#include <assert.h>
|
||||
#ifdef HAVE_MMAP
|
||||
#include <sys/mman.h>
|
||||
#endif
|
||||
|
||||
#include <sys/stat.h>
|
||||
|
||||
#define USE_CACHE 1
|
||||
|
||||
#ifdef HAVE_MMAP
|
||||
int mmap_mode = 1; /* dictd uses mmap() function (the default) */
|
||||
#else
|
||||
int mmap_mode = 0;
|
||||
#endif
|
||||
|
||||
#define dict_data_filter( ... )
|
||||
#define PRINTF( ... )
|
||||
|
||||
#define xmalloc malloc
|
||||
#define xfree free
|
||||
|
||||
static const char * _err_programName = "GoldenDict";
|
||||
|
||||
#define log_error( ... )
|
||||
#define log_error_va( ... )
|
||||
|
||||
static void err_fatal( const char *routine, const char *format, ... )
|
||||
{
|
||||
va_list ap;
|
||||
|
||||
fflush( stdout );
|
||||
if (_err_programName) {
|
||||
if (routine)
|
||||
fprintf( stderr, "%s (%s): ", _err_programName, routine );
|
||||
else
|
||||
fprintf( stderr, "%s: ", _err_programName );
|
||||
} else {
|
||||
if (routine) fprintf( stderr, "%s: ", routine );
|
||||
}
|
||||
|
||||
va_start( ap, format );
|
||||
vfprintf( stderr, format, ap );
|
||||
log_error_va( routine, format, ap );
|
||||
va_end( ap );
|
||||
|
||||
fflush( stderr );
|
||||
fflush( stdout );
|
||||
exit ( 1 );
|
||||
}
|
||||
|
||||
/* \doc |err_fatal_errno| flushes "stdout", prints a fatal error report on
|
||||
"stderr", prints the system error corresponding to |errno|, flushes
|
||||
"stderr" and "stdout", and calls |exit|. |routine| is the name of the
|
||||
routine in which the error took place. */
|
||||
|
||||
static void err_fatal_errno( const char *routine, const char *format, ... )
|
||||
{
|
||||
va_list ap;
|
||||
int errorno = errno;
|
||||
|
||||
fflush( stdout );
|
||||
if (_err_programName) {
|
||||
if (routine)
|
||||
fprintf( stderr, "%s (%s): ", _err_programName, routine );
|
||||
else
|
||||
fprintf( stderr, "%s: ", _err_programName );
|
||||
} else {
|
||||
if (routine) fprintf( stderr, "%s: ", routine );
|
||||
}
|
||||
|
||||
va_start( ap, format );
|
||||
vfprintf( stderr, format, ap );
|
||||
log_error_va( routine, format, ap );
|
||||
va_end( ap );
|
||||
|
||||
#if HAVE_STRERROR
|
||||
fprintf( stderr, "%s: %s\n", routine, strerror( errorno ) );
|
||||
log_error( routine, "%s: %s\n", routine, strerror( errorno ) );
|
||||
#else
|
||||
errno = errorno;
|
||||
perror( routine );
|
||||
log_error( routine, "%s: errno = %d\n", routine, errorno );
|
||||
#endif
|
||||
|
||||
fflush( stderr );
|
||||
fflush( stdout );
|
||||
exit( 1 );
|
||||
}
|
||||
|
||||
/* \doc |err_internal| flushes "stdout", prints the fatal error message,
|
||||
flushes "stderr" and "stdout", and calls |abort| so that a core dump is
|
||||
generated. */
|
||||
|
||||
static void err_internal( const char *routine, const char *format, ... )
|
||||
{
|
||||
va_list ap;
|
||||
|
||||
fflush( stdout );
|
||||
if (_err_programName) {
|
||||
if (routine)
|
||||
fprintf( stderr, "%s (%s): Internal error\n ",
|
||||
_err_programName, routine );
|
||||
else
|
||||
fprintf( stderr, "%s: Internal error\n ", _err_programName );
|
||||
} else {
|
||||
if (routine) fprintf( stderr, "%s: Internal error\n ", routine );
|
||||
else fprintf( stderr, "Internal error\n " );
|
||||
}
|
||||
|
||||
va_start( ap, format );
|
||||
vfprintf( stderr, format, ap );
|
||||
log_error_va( routine, format, ap );
|
||||
va_end( ap );
|
||||
|
||||
if (_err_programName)
|
||||
fprintf( stderr, "Aborting %s...\n", _err_programName );
|
||||
else
|
||||
fprintf( stderr, "Aborting...\n" );
|
||||
fflush( stderr );
|
||||
fflush( stdout );
|
||||
abort();
|
||||
}
|
||||
|
||||
static int dict_read_header( const char *filename,
|
||||
dictData *header, int computeCRC )
|
||||
{
|
||||
FILE *str;
|
||||
int id1, id2, si1, si2;
|
||||
char buffer[BUFFERSIZE];
|
||||
int extraLength, subLength;
|
||||
int i;
|
||||
char *pt;
|
||||
int c;
|
||||
struct stat sb;
|
||||
unsigned long crc = crc32( 0L, Z_NULL, 0 );
|
||||
int count;
|
||||
unsigned long offset;
|
||||
|
||||
if (!(str = fopen( filename, "rb" )))
|
||||
err_fatal_errno( __func__,
|
||||
"Cannot open data file \"%s\" for read\n", filename );
|
||||
|
||||
header->filename = NULL;//str_find( filename );
|
||||
header->headerLength = GZ_XLEN - 1;
|
||||
header->type = DICT_UNKNOWN;
|
||||
|
||||
id1 = getc( str );
|
||||
id2 = getc( str );
|
||||
|
||||
if (id1 != GZ_MAGIC1 || id2 != GZ_MAGIC2) {
|
||||
header->type = DICT_TEXT;
|
||||
fstat( fileno( str ), &sb );
|
||||
header->compressedLength = header->length = sb.st_size;
|
||||
header->origFilename = NULL;//str_find( filename );
|
||||
header->mtime = sb.st_mtime;
|
||||
if (computeCRC) {
|
||||
rewind( str );
|
||||
while (!feof( str )) {
|
||||
if ((count = fread( buffer, 1, BUFFERSIZE, str ))) {
|
||||
crc = crc32( crc, buffer, count );
|
||||
}
|
||||
}
|
||||
}
|
||||
header->crc = crc;
|
||||
fclose( str );
|
||||
return 0;
|
||||
}
|
||||
header->type = DICT_GZIP;
|
||||
|
||||
header->method = getc( str );
|
||||
header->flags = getc( str );
|
||||
header->mtime = getc( str ) << 0;
|
||||
header->mtime |= getc( str ) << 8;
|
||||
header->mtime |= getc( str ) << 16;
|
||||
header->mtime |= getc( str ) << 24;
|
||||
header->extraFlags = getc( str );
|
||||
header->os = getc( str );
|
||||
|
||||
if (header->flags & GZ_FEXTRA) {
|
||||
extraLength = getc( str ) << 0;
|
||||
extraLength |= getc( str ) << 8;
|
||||
header->headerLength += extraLength + 2;
|
||||
si1 = getc( str );
|
||||
si2 = getc( str );
|
||||
|
||||
if (si1 == GZ_RND_S1 && si2 == GZ_RND_S2) {
|
||||
subLength = getc( str ) << 0;
|
||||
subLength |= getc( str ) << 8;
|
||||
header->version = getc( str ) << 0;
|
||||
header->version |= getc( str ) << 8;
|
||||
|
||||
if (header->version != 1)
|
||||
err_internal( __func__,
|
||||
"dzip header version %d not supported\n",
|
||||
header->version );
|
||||
|
||||
header->chunkLength = getc( str ) << 0;
|
||||
header->chunkLength |= getc( str ) << 8;
|
||||
header->chunkCount = getc( str ) << 0;
|
||||
header->chunkCount |= getc( str ) << 8;
|
||||
|
||||
if (header->chunkCount <= 0) {
|
||||
fclose( str );
|
||||
return 5;
|
||||
}
|
||||
header->chunks = xmalloc( sizeof( header->chunks[0] )
|
||||
* header->chunkCount );
|
||||
for (i = 0; i < header->chunkCount; i++) {
|
||||
header->chunks[i] = getc( str ) << 0;
|
||||
header->chunks[i] |= getc( str ) << 8;
|
||||
}
|
||||
header->type = DICT_DZIP;
|
||||
} else {
|
||||
fseek( str, header->headerLength, SEEK_SET );
|
||||
}
|
||||
}
|
||||
|
||||
if (header->flags & GZ_FNAME) { /* FIXME! Add checking against header len */
|
||||
pt = buffer;
|
||||
while ((c = getc( str )) && c != EOF){
|
||||
*pt++ = c;
|
||||
|
||||
if (pt == buffer + sizeof (buffer)){
|
||||
err_fatal (
|
||||
__func__,
|
||||
"too long FNAME field in dzip file \"%s\"\n", filename);
|
||||
}
|
||||
}
|
||||
|
||||
*pt = '\0';
|
||||
header->origFilename = NULL;//str_find( buffer );
|
||||
header->headerLength += strlen( buffer ) + 1;
|
||||
} else {
|
||||
header->origFilename = NULL;
|
||||
}
|
||||
|
||||
if (header->flags & GZ_COMMENT) { /* FIXME! Add checking for header len */
|
||||
pt = buffer;
|
||||
while ((c = getc( str )) && c != EOF){
|
||||
*pt++ = c;
|
||||
|
||||
if (pt == buffer + sizeof (buffer)){
|
||||
err_fatal (
|
||||
__func__,
|
||||
"too long COMMENT field in dzip file \"%s\"\n", filename);
|
||||
}
|
||||
}
|
||||
|
||||
*pt = '\0';
|
||||
header->comment = NULL;//str_find( buffer );
|
||||
header->headerLength += strlen( buffer ) + 1; /* header->comment is NULL here */
|
||||
} else {
|
||||
header->comment = NULL;
|
||||
}
|
||||
|
||||
if (header->flags & GZ_FHCRC) {
|
||||
getc( str );
|
||||
getc( str );
|
||||
header->headerLength += 2;
|
||||
}
|
||||
|
||||
if (ftell( str ) != header->headerLength + 1)
|
||||
err_internal( __func__,
|
||||
"File position (%lu) != header length + 1 (%d)\n",
|
||||
ftell( str ), header->headerLength + 1 );
|
||||
|
||||
fseek( str, -8, SEEK_END );
|
||||
header->crc = getc( str ) << 0;
|
||||
header->crc |= getc( str ) << 8;
|
||||
header->crc |= getc( str ) << 16;
|
||||
header->crc |= getc( str ) << 24;
|
||||
header->length = getc( str ) << 0;
|
||||
header->length |= getc( str ) << 8;
|
||||
header->length |= getc( str ) << 16;
|
||||
header->length |= getc( str ) << 24;
|
||||
header->compressedLength = ftell( str );
|
||||
|
||||
/* Compute offsets */
|
||||
header->offsets = xmalloc( sizeof( header->offsets[0] )
|
||||
* header->chunkCount );
|
||||
for (offset = header->headerLength + 1, i = 0;
|
||||
i < header->chunkCount;
|
||||
i++)
|
||||
{
|
||||
header->offsets[i] = offset;
|
||||
offset += header->chunks[i];
|
||||
}
|
||||
|
||||
fclose( str );
|
||||
return 0;
|
||||
}
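The header walk in dict_read_header() can also be expressed over an in-memory buffer, which makes the bounds checks explicit. The sketch below is an assumption-laden illustration, not the dictd code: struct ra_info, read_le16 and parse_ra_header are hypothetical names, the byte offsets are the GZ_* offsets defined at the top of this file, and the 'RA' subfield is assumed to be the first extra subfield.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch. */
struct ra_info {
  unsigned version;       /* subfield format version */
  unsigned chunk_length;  /* uncompressed bytes per chunk */
  unsigned chunk_count;   /* number of chunks */
  size_t   sizes_offset;  /* offset of the first 16-bit compressed size */
};

/* Read a 16-bit little-endian value. */
unsigned read_le16( const unsigned char * p )
{
  return (unsigned) p[ 0 ] | ( (unsigned) p[ 1 ] << 8 );
}

/* Returns 0 on success, -1 if buf does not start with a gzip header whose
 * first extra subfield is a complete 'RA' (random access) table. */
int parse_ra_header( const unsigned char * buf, size_t len,
                     struct ra_info * out )
{
  unsigned xlen, sublen;

  assert( buf != NULL && out != NULL );

  if ( len < 22 ) return -1;                              /* up to GZ_RNDDATA */
  if ( buf[ 0 ] != 0x1f || buf[ 1 ] != 0x8b ) return -1;  /* gzip magic */
  if ( !( buf[ 3 ] & 0x04 ) ) return -1;                  /* GZ_FEXTRA unset */

  xlen = read_le16( buf + 10 );                           /* GZ_XLEN */
  if ( buf[ 12 ] != 'R' || buf[ 13 ] != 'A' ) return -1;  /* GZ_SI1, GZ_SI2 */

  sublen = read_le16( buf + 14 );                         /* GZ_SUBLEN */
  out->version = read_le16( buf + 16 );                   /* GZ_VERSION */
  out->chunk_length = read_le16( buf + 18 );              /* GZ_CHUNKLEN */
  out->chunk_count = read_le16( buf + 20 );               /* GZ_CHUNKCNT */
  out->sizes_offset = 22;                                 /* GZ_RNDDATA */

  /* The subfield must hold the three 16-bit fields plus one size per chunk,
   * and both the extra field and the buffer must be large enough for it. */
  if ( sublen < 6 + 2 * out->chunk_count ) return -1;
  if ( xlen < 4 + sublen ) return -1;
  if ( len < 22 + 2 * (size_t) out->chunk_count ) return -1;

  return 0;
}
```

Returning an error code instead of calling exit() or abort() would let a caller skip a malformed file; whether that suits GoldenDict is a design choice the original code does not make.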
|
||||
|
||||
dictData *dict_data_open( const char *filename, int computeCRC )
|
||||
{
|
||||
dictData *h = NULL;
|
||||
struct stat sb;
|
||||
int j;
|
||||
|
||||
if (!filename)
|
||||
return NULL;
|
||||
|
||||
h = xmalloc( sizeof( struct dictData ) );
|
||||
|
||||
memset( h, 0, sizeof( struct dictData ) );
|
||||
h->initialized = 0;
|
||||
|
||||
if (dict_read_header( filename, h, computeCRC )) {
|
||||
err_fatal( __func__,
|
||||
"\"%s\" not in text or dzip format\n", filename );
|
||||
}
|
||||
|
||||
if ((h->fd = open( filename, O_RDONLY )) < 0)
|
||||
err_fatal_errno( __func__,
|
||||
"Cannot open data file \"%s\"\n", filename );
|
||||
if (fstat( h->fd, &sb ))
|
||||
err_fatal_errno( __func__,
|
||||
"Cannot stat data file \"%s\"\n", filename );
|
||||
h->size = sb.st_size;
|
||||
|
||||
if (mmap_mode){
|
||||
#ifdef HAVE_MMAP
|
||||
h->start = mmap( NULL, h->size, PROT_READ, MAP_SHARED, h->fd, 0 );
|
||||
if ((void *)h->start == (void *)(-1))
|
||||
err_fatal_errno(
|
||||
__func__,
|
||||
"Cannot mmap data file \"%s\"\n", filename );
|
||||
#else
|
||||
err_fatal (__func__, "This should not happen");
|
||||
#endif
|
||||
}else{
|
||||
h->start = xmalloc (h->size);
|
||||
if (-1 == read (h->fd, (char *) h->start, h->size))
|
||||
err_fatal_errno (
|
||||
__func__,
|
||||
"Cannot read data file \"%s\"\n", filename );
|
||||
|
||||
close (h -> fd);
|
||||
h -> fd = 0;
|
||||
}
|
||||
|
||||
h->end = h->start + h->size;
|
||||
|
||||
for (j = 0; j < DICT_CACHE_SIZE; j++) {
|
||||
h->cache[j].chunk = -1;
|
||||
h->cache[j].stamp = -1;
|
||||
h->cache[j].inBuffer = NULL;
|
||||
h->cache[j].count = 0;
|
||||
}
|
||||
|
||||
return h;
|
||||
}
|
||||
|
||||
void dict_data_close( dictData *header )
|
||||
{
|
||||
int i;
|
||||
|
||||
if (!header)
|
||||
return;
|
||||
|
||||
if (header->fd >= 0) {
|
||||
if (mmap_mode){
|
||||
#ifdef HAVE_MMAP
|
||||
munmap( (void *)header->start, header->size );
|
||||
close( header->fd );
|
||||
header->fd = 0;
|
||||
header->start = header->end = NULL;
|
||||
#else
|
||||
err_fatal (__func__, "This should not happen");
|
||||
#endif
|
||||
}else{
|
||||
if (header -> start)
|
||||
xfree ((char *) header -> start);
|
||||
}
|
||||
}
|
||||
|
||||
if (header->chunks) xfree( header->chunks );
|
||||
if (header->offsets) xfree( header->offsets );
|
||||
|
||||
if (header->initialized) {
|
||||
if (inflateEnd( &header->zStream ))
|
||||
err_internal( __func__,
|
||||
"Cannot shut down inflation engine: %s\n",
|
||||
header->zStream.msg );
|
||||
}
|
||||
|
||||
for (i = 0; i < DICT_CACHE_SIZE; ++i){
|
||||
if (header -> cache [i].inBuffer)
|
||||
xfree (header -> cache [i].inBuffer);
|
||||
}
|
||||
|
||||
memset( header, 0, sizeof( struct dictData ) );
|
||||
xfree( header );
|
||||
}
|
||||
|
||||
char *dict_data_read_ (
|
||||
dictData *h, unsigned long start, unsigned long size,
|
||||
const char *preFilter, const char *postFilter )
|
||||
{
|
||||
char *buffer, *pt;
|
||||
unsigned long end;
|
||||
int count;
|
||||
char *inBuffer;
|
||||
char outBuffer[OUT_BUFFER_SIZE];
|
||||
int firstChunk, lastChunk;
|
||||
int firstOffset, lastOffset;
|
||||
int i, j;
|
||||
int found, target, lastStamp;
|
||||
static int stamp = 0;
|
||||
|
||||
end = start + size;
|
||||
|
||||
buffer = xmalloc( size + 1 );
|
||||
|
||||
PRINTF(DBG_UNZIP,
|
||||
("dict_data_read( %p, %lu, %lu, %s, %s )\n",
|
||||
h, start, size, preFilter, postFilter ));
|
||||
|
||||
assert( h != NULL);
|
||||
switch (h->type) {
|
||||
case DICT_GZIP:
|
||||
err_fatal( __func__,
|
||||
"Cannot seek on pure gzip format files.\n"
|
||||
"Use plain text (for performance)"
|
||||
" or dzip format (for space savings).\n" );
|
||||
break;
|
||||
case DICT_TEXT:
|
||||
memcpy( buffer, h->start + start, size );
|
||||
buffer[size] = '\0';
|
||||
break;
|
||||
case DICT_DZIP:
|
||||
if (!h->initialized) {
|
||||
++h->initialized;
|
||||
h->zStream.zalloc = NULL;
|
||||
h->zStream.zfree = NULL;
|
||||
h->zStream.opaque = NULL;
|
||||
h->zStream.next_in = 0;
|
||||
h->zStream.avail_in = 0;
|
||||
h->zStream.next_out = NULL;
|
||||
h->zStream.avail_out = 0;
|
||||
if (inflateInit2( &h->zStream, -15 ) != Z_OK)
|
||||
err_internal( __func__,
|
||||
"Cannot initialize inflation engine: %s\n",
|
||||
h->zStream.msg );
|
||||
}
|
||||
firstChunk = start / h->chunkLength;
|
||||
firstOffset = start - firstChunk * h->chunkLength;
|
||||
lastChunk = end / h->chunkLength;
|
||||
lastOffset = end - lastChunk * h->chunkLength;
|
||||
PRINTF(DBG_UNZIP,
|
||||
(" start = %lu, end = %lu\n"
|
||||
"firstChunk = %d, firstOffset = %d,"
|
||||
" lastChunk = %d, lastOffset = %d\n",
|
||||
start, end, firstChunk, firstOffset, lastChunk, lastOffset ));
|
||||
for (pt = buffer, i = firstChunk; i <= lastChunk; i++) {
|
||||
|
||||
/* Access cache */
|
||||
found = 0;
|
||||
target = 0;
|
||||
lastStamp = INT_MAX;
|
||||
for (j = 0; j < DICT_CACHE_SIZE; j++) {
|
||||
#if USE_CACHE
|
||||
if (h->cache[j].chunk == i) {
|
||||
found = 1;
|
||||
target = j;
|
||||
break;
|
||||
}
|
||||
#endif
|
||||
if (h->cache[j].stamp < lastStamp) {
|
||||
lastStamp = h->cache[j].stamp;
|
||||
target = j;
|
||||
}
|
||||
}
|
||||
|
||||
h->cache[target].stamp = ++stamp;
|
||||
if (found) {
|
||||
count = h->cache[target].count;
|
||||
inBuffer = h->cache[target].inBuffer;
|
||||
} else {
|
||||
h->cache[target].chunk = i;
|
||||
if (!h->cache[target].inBuffer)
|
||||
h->cache[target].inBuffer = xmalloc( IN_BUFFER_SIZE );
|
||||
inBuffer = h->cache[target].inBuffer;
|
||||
|
||||
if (h->chunks[i] >= OUT_BUFFER_SIZE ) {
|
||||
err_internal( __func__,
|
||||
"h->chunks[%d] = %d >= %ld (OUT_BUFFER_SIZE)\n",
|
||||
i, h->chunks[i], OUT_BUFFER_SIZE );
|
||||
}
|
||||
memcpy( outBuffer, h->start + h->offsets[i], h->chunks[i] );
|
||||
dict_data_filter( outBuffer, &count, OUT_BUFFER_SIZE, preFilter );
|
||||
|
||||
h->zStream.next_in = outBuffer;
|
||||
h->zStream.avail_in = h->chunks[i];
|
||||
h->zStream.next_out = inBuffer;
|
||||
h->zStream.avail_out = IN_BUFFER_SIZE;
|
||||
if (inflate( &h->zStream, Z_PARTIAL_FLUSH ) != Z_OK)
|
||||
err_fatal( __func__, "inflate: %s\n", h->zStream.msg );
|
||||
if (h->zStream.avail_in)
|
||||
err_internal( __func__,
|
||||
"inflate did not flush (%d pending, %d avail)\n",
|
||||
h->zStream.avail_in, h->zStream.avail_out );
|
||||
|
||||
count = IN_BUFFER_SIZE - h->zStream.avail_out;
|
||||
dict_data_filter( inBuffer, &count, IN_BUFFER_SIZE, postFilter );
|
||||
|
||||
h->cache[target].count = count;
|
||||
}
|
||||
|
||||
if (i == firstChunk) {
|
||||
if (i == lastChunk) {
|
||||
memcpy( pt, inBuffer + firstOffset, lastOffset-firstOffset);
|
||||
pt += lastOffset - firstOffset;
|
||||
} else {
|
||||
if (count != h->chunkLength )
|
||||
err_internal( __func__,
|
||||
"Length = %d instead of %d\n",
|
||||
count, h->chunkLength );
|
||||
memcpy( pt, inBuffer + firstOffset,
|
||||
h->chunkLength - firstOffset );
|
||||
pt += h->chunkLength - firstOffset;
|
||||
}
|
||||
} else if (i == lastChunk) {
|
||||
memcpy( pt, inBuffer, lastOffset );
|
||||
pt += lastOffset;
|
||||
} else {
|
||||
assert( count == h->chunkLength );
|
||||
memcpy( pt, inBuffer, h->chunkLength );
|
||||
pt += h->chunkLength;
|
||||
}
|
||||
}
|
||||
*pt = '\0';
|
||||
break;
|
||||
case DICT_UNKNOWN:
|
||||
err_fatal( __func__, "Cannot read unknown file type\n" );
|
||||
break;
|
||||
}
|
||||
|
||||
return buffer;
|
||||
}
|
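The random-access arithmetic used by dict_data_read_() and the offset table built at the end of dict_read_header() can be isolated as two small functions. This is a sketch under assumptions: struct chunk_span, locate_range and compute_offsets are hypothetical names introduced for illustration, and every chunk except possibly the last is assumed to hold exactly chunk_length uncompressed bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: which chunks a byte range touches, and where. */
struct chunk_span {
  unsigned long first_chunk, first_offset;
  unsigned long last_chunk, last_offset;
};

/* Map the uncompressed byte range [start, start + size) onto chunks. */
struct chunk_span locate_range( unsigned long start, unsigned long size,
                                unsigned long chunk_length )
{
  struct chunk_span s;
  unsigned long end = start + size;

  assert( chunk_length != 0 );

  s.first_chunk = start / chunk_length;
  s.first_offset = start % chunk_length;
  s.last_chunk = end / chunk_length;
  s.last_offset = end % chunk_length;
  return s;
}

/* Turn per-chunk compressed sizes into file offsets (a running sum that
 * starts at the first byte after the gzip header). */
void compute_offsets( const unsigned * chunk_sizes, size_t count,
                      unsigned long data_start, unsigned long * offsets )
{
  unsigned long offset = data_start;
  size_t i;

  for ( i = 0; i < count; ++i )
  {
    offsets[ i ] = offset;
    offset += chunk_sizes[ i ];
  }
}
```

When the end of the range falls exactly on a chunk boundary, last_offset is 0 and the last chunk contributes no bytes to the result.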
97
src/dictzip.h
Normal file
|
@ -0,0 +1,97 @@
|
|||
/* Made up from data.h and other supplementary files of dictd-1.0.11 for the
|
||||
* GoldenDict program.
|
||||
*/
|
||||
|
||||
/* data.h --
|
||||
* Created: Sat Mar 15 18:04:25 2003 by Aleksey Cheusov <vle@gmx.net>
|
||||
* Copyright 1994-2003 Rickard E. Faith (faith@dict.org)
|
||||
*
|
||||
* This program is free software; you can redistribute it and/or modify it
|
||||
* under the terms of the GNU General Public License as published by the
|
||||
* Free Software Foundation; either version 1, or (at your option) any
|
||||
* later version.
|
||||
*
|
||||
* This program is distributed in the hope that it will be useful, but
|
||||
* WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
||||
* General Public License for more details.
|
||||
*
|
||||
* You should have received a copy of the GNU General Public License along
|
||||
* with this program; if not, write to the Free Software Foundation, Inc.,
|
||||
* 675 Mass Ave, Cambridge, MA 02139, USA.
|
||||
*/
|
||||
|
||||
#ifndef _DICTZIP_H_
|
||||
#define _DICTZIP_H_
|
||||
|
||||
#include <stdio.h>
|
||||
#include <zlib.h>
|
||||
|
||||
#ifdef __cplusplus
|
||||
extern "C"
|
||||
{
|
||||
#endif
|
||||
|
||||
|
||||
/* Excerpts from defs.h */
|
||||
|
||||
#define DICT_CACHE_SIZE 5
|
||||
|
||||
typedef struct dictCache {
|
||||
int chunk;
|
||||
char *inBuffer;
|
||||
int stamp;
|
||||
int count;
|
||||
} dictCache;
|
||||
|
||||
typedef struct dictData {
|
||||
int fd; /* file descriptor */
|
||||
const char *start; /* start of mmap'd area */
|
||||
const char *end; /* end of mmap'd area */
|
||||
unsigned long size; /* size of mmap */
|
||||
|
||||
int type;
|
||||
const char *filename;
|
||||
z_stream zStream;
|
||||
int initialized;
|
||||
|
||||
int headerLength;
|
||||
int method;
|
||||
int flags;
|
||||
time_t mtime;
|
||||
int extraFlags;
|
||||
int os;
|
||||
int version;
|
||||
int chunkLength;
|
||||
int chunkCount;
|
||||
int *chunks;
|
||||
unsigned long *offsets; /* Sum-scan of chunks. */
|
||||
const char *origFilename;
|
||||
const char *comment;
|
||||
unsigned long crc;
|
||||
unsigned long length;
|
||||
unsigned long compressedLength;
|
||||
dictCache cache[DICT_CACHE_SIZE];
|
||||
} dictData;
|
||||
|
||||
|
||||
/* initialize .data file */
|
||||
extern dictData *dict_data_open (
|
||||
const char *filename, int computeCRC);
|
||||
/* release the mapping, the file descriptor and all cached chunks */
|
||||
extern void dict_data_close (
|
||||
dictData *data);
|
||||
|
||||
extern char *dict_data_read_ (
|
||||
dictData *data,
|
||||
unsigned long start, unsigned long size,
|
||||
const char *preFilter,
|
||||
const char *postFilter );
|
||||
|
||||
extern int mmap_mode;
|
||||
|
||||
#ifdef __cplusplus
|
||||
} /* end extern "C" */
|
||||
#endif
|
||||
|
||||
#endif /* _DICTZIP_H_ */
|
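The dictCache array is used as a small least-recently-used cache: each slot carries the chunk number it holds and a stamp that grows on every access. Slot selection can be sketched as follows. The names CACHE_SLOTS, struct cache_slot and pick_slot are hypothetical and only mirror the chunk and stamp fields of dictCache.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define CACHE_SLOTS 5  /* mirrors DICT_CACHE_SIZE */

/* Hypothetical, reduced version of dictCache. */
struct cache_slot {
  int chunk;  /* chunk number held by the slot, -1 when empty */
  int stamp;  /* last-use counter, -1 when never used */
};

/* Returns the slot to use for `chunk`. *found is set to 1 when the chunk is
 * already cached; otherwise the slot with the smallest stamp is returned so
 * that the least recently used entry gets overwritten. */
int pick_slot( const struct cache_slot * cache, int chunk, int * found )
{
  int j, target = 0, oldest = INT_MAX;

  assert( cache != NULL && found != NULL );

  *found = 0;
  for ( j = 0; j < CACHE_SLOTS; j++ )
  {
    if ( cache[ j ].chunk == chunk )
    {
      *found = 1;
      return j;
    }
    if ( cache[ j ].stamp < oldest )
    {
      oldest = cache[ j ].stamp;
      target = j;
    }
  }
  return target;
}
```

The caller is expected to write a fresh, increasing stamp into the returned slot after every lookup, hit or miss.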
986
src/dsl.cc
Normal file
|
@ -0,0 +1,986 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#include "dsl.hh"
|
||||
#include "dsl_details.hh"
|
||||
#include "btreeidx.hh"
|
||||
#include "folding.hh"
|
||||
#include "utf8.hh"
|
||||
#include "chunkedstorage.hh"
|
||||
#include "dictzip.h"
|
||||
#include "htmlescape.hh"
|
||||
#include "iconv.hh"
|
||||
#include "filetype.hh"
|
||||
#include "fsencoding.hh"
|
||||
#include <zlib.h>
|
||||
#include <map>
|
||||
#include <set>
|
||||
#include <string>
|
||||
#include <vector>
|
||||
#include <list>
|
||||
#include <wctype.h>
|
||||
|
||||
// For TIFF conversion
|
||||
#include <QImage>
|
||||
#include <QByteArray>
|
||||
#include <QBuffer>
|
||||
|
||||
namespace Dsl {
|
||||
|
||||
using namespace Details;
|
||||
|
||||
using std::map;
|
||||
using std::multimap;
|
||||
using std::pair;
|
||||
using std::set;
|
||||
using std::string;
|
||||
using std::wstring;
|
||||
using std::vector;
|
||||
using std::list;
|
||||
|
||||
using BtreeIndexing::WordArticleLink;
|
||||
using BtreeIndexing::IndexedWords;
|
||||
|
||||
namespace {
|
||||
|
||||
DEF_EX_STR( exCantReadFile, "Can't read file", Dictionary::Ex )
|
||||
|
||||
enum
|
||||
{
|
||||
Signature = 0x584c5344, // DSLX on little-endian, XLSD on big-endian
|
||||
CurrentFormatVersion = 6 + BtreeIndexing::FormatVersion + Folding::Version
|
||||
};
|
||||
|
||||
struct IdxHeader
|
||||
{
|
||||
uint32_t signature; // First comes the signature, DSLX
|
||||
uint32_t formatVersion; // File format version (CurrentFormatVersion)
|
||||
int dslEncoding; // Which encoding is used for the file indexed
|
||||
uint32_t chunksOffset; // The offset to chunks' storage
|
||||
uint32_t hasAbrv; // Non-zero means file has abrvs at abrvAddress
|
||||
uint32_t abrvAddress; // Address of abrv map in the chunked storage
|
||||
uint32_t indexOffset; // The offset of the index in the file
|
||||
} __attribute__((packed));
|
||||
|
||||
bool indexIsOldOrBad( string const & indexFile )
|
||||
{
|
||||
File::Class idx( indexFile, "rb" );
|
||||
|
||||
IdxHeader header;
|
||||
|
||||
return idx.readRecords( &header, sizeof( header ), 1 ) != 1 ||
|
||||
header.signature != Signature ||
|
||||
header.formatVersion != CurrentFormatVersion;
|
||||
}
|
||||
|
||||
class DslDictionary: public BtreeIndexing::BtreeDictionary
|
||||
{
|
||||
File::Class idx;
|
||||
IdxHeader idxHeader;
|
||||
ChunkedStorage::Reader chunks;
|
||||
string dictionaryName;
|
||||
map< string, string > abrv;
|
||||
dictData * dz;
|
||||
|
||||
public:
|
||||
|
||||
DslDictionary( string const & id, string const & indexFile,
|
||||
vector< string > const & dictionaryFiles );
|
||||
|
||||
~DslDictionary();
|
||||
|
||||
virtual string getName() throw()
|
||||
{ return dictionaryName; }
|
||||
|
||||
virtual map< Dictionary::Property, string > getProperties() throw()
|
||||
{ return map< Dictionary::Property, string >(); }
|
||||
|
||||
virtual unsigned long getArticleCount() throw()
|
||||
{ return 0; }
|
||||
|
||||
virtual unsigned long getWordCount() throw()
|
||||
{ return 0; }
|
||||
|
||||
virtual vector< wstring > findHeadwordsForSynonym( wstring const & )
|
||||
throw( std::exception )
|
||||
{
|
||||
return vector< wstring >();
|
||||
}
|
||||
|
||||
virtual string getArticle( wstring const &, vector< wstring > const & alts )
|
||||
throw( Dictionary::exNoSuchWord, std::exception );
|
||||
|
||||
virtual void getResource( string const & name,
|
||||
vector< char > & data )
|
||||
throw( Dictionary::exNoSuchResource, std::exception );
|
||||
|
||||
private:
|
||||
|
||||
/// Loads the article. Does not process the DSL language.
|
||||
void loadArticle( uint32_t address,
|
||||
string & headword,
|
||||
list< wstring > & displayedHeadwords,
|
||||
wstring & articleText );
|
||||
|
||||
/// Converts DSL markup to HTML.
|
||||
string dslToHtml( wstring const & );
|
||||
|
||||
// Parts of dslToHtml()
|
||||
string nodeToHtml( ArticleDom::Node const & );
|
||||
string processNodeChildren( ArticleDom::Node const & node );
|
||||
};
|
||||
|
||||
DslDictionary::DslDictionary( string const & id,
|
||||
string const & indexFile,
|
||||
vector< string > const & dictionaryFiles ):
|
||||
BtreeDictionary( id, dictionaryFiles ),
|
||||
idx( indexFile, "rb" ),
|
||||
idxHeader( idx.read< IdxHeader >() ),
|
||||
chunks( idx, idxHeader.chunksOffset )
|
||||
{
|
||||
// Open the .dict file
|
||||
|
||||
dz = dict_data_open( dictionaryFiles[ 0 ].c_str(), 0 );
|
||||
|
||||
if ( !dz )
|
||||
throw exCantReadFile( dictionaryFiles[ 0 ] );
|
||||
|
||||
// Read the dictionary name
|
||||
|
||||
idx.seek( sizeof( idxHeader ) );
|
||||
|
||||
vector< char > dName( idx.read< uint32_t >() );
|
||||
idx.read( &dName.front(), dName.size() );
|
||||
dictionaryName = string( &dName.front(), dName.size() );
|
||||
|
||||
// Read the abrv, if any
|
||||
|
||||
if ( idxHeader.hasAbrv )
|
||||
{
|
||||
vector< char > chunk;
|
||||
|
||||
char * abrvBlock = chunks.getBlock( idxHeader.abrvAddress, chunk );
|
||||
|
||||
uint32_t total;
|
||||
memcpy( &total, abrvBlock, sizeof( uint32_t ) );
|
||||
abrvBlock += sizeof( uint32_t );
|
||||
|
||||
printf( "Loading %u abbrv\n", total );
|
||||
|
||||
while( total-- )
|
||||
{
|
||||
uint32_t keySz;
|
||||
memcpy( &keySz, abrvBlock, sizeof( uint32_t ) );
|
||||
abrvBlock += sizeof( uint32_t );
|
||||
|
||||
char * key = abrvBlock;
|
||||
|
||||
abrvBlock += keySz;
|
||||
|
||||
uint32_t valueSz;
|
||||
memcpy( &valueSz, abrvBlock, sizeof( uint32_t ) );
|
||||
abrvBlock += sizeof( uint32_t );
|
||||
|
||||
abrv[ string( key, keySz ) ] = string( abrvBlock, valueSz );
|
||||
|
||||
abrvBlock += valueSz;
|
||||
}
|
||||
}
|
||||
|
||||
// Initialize the index
|
||||
|
||||
idx.seek( idxHeader.indexOffset );
|
||||
|
||||
openIndex( idx );
|
||||
}
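The abbreviation block read by the constructor is a sequence of length-prefixed records: a 32-bit entry count, then for each entry a 32-bit key size, the key bytes, a 32-bit value size and the value bytes. A bounds-checked walk over such a block might look like the sketch below. It is written in C for illustration; struct abrv_entry and parse_abrv_block are hypothetical names, and the integers are assumed to be stored in host byte order, as the memcpy() calls in the constructor imply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of one abbreviation; pointers refer into the block. */
struct abrv_entry {
  const char * key;
  uint32_t key_len;
  const char * value;
  uint32_t value_len;
};

/* Parses the block into `entries`. Returns the number of entries, or -1 if
 * the block is truncated or holds more than max_entries entries. */
long parse_abrv_block( const char * block, size_t len,
                       struct abrv_entry * entries, size_t max_entries )
{
  size_t pos = 0;
  uint32_t total, i;

  assert( block != NULL && entries != NULL );

  if ( len < sizeof total ) return -1;
  memcpy( &total, block, sizeof total );
  pos += sizeof total;
  if ( total > max_entries ) return -1;

  for ( i = 0; i < total; ++i )
  {
    struct abrv_entry * e = &entries[ i ];

    /* Key: 32-bit size followed by that many bytes. */
    if ( len - pos < sizeof e->key_len ) return -1;
    memcpy( &e->key_len, block + pos, sizeof e->key_len );
    pos += sizeof e->key_len;
    if ( len - pos < e->key_len ) return -1;
    e->key = block + pos;
    pos += e->key_len;

    /* Value: same layout as the key. */
    if ( len - pos < sizeof e->value_len ) return -1;
    memcpy( &e->value_len, block + pos, sizeof e->value_len );
    pos += sizeof e->value_len;
    if ( len - pos < e->value_len ) return -1;
    e->value = block + pos;
    pos += e->value_len;
  }

  return (long) total;
}
```

Checking every size against the remaining length keeps a corrupted index file from driving the read pointer past the end of the block.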
|
||||
|
||||
DslDictionary::~DslDictionary()
|
||||
{
|
||||
if ( dz )
|
||||
dict_data_close( dz );
|
||||
}
|
||||
|
||||
void DslDictionary::loadArticle( uint32_t address,
|
||||
string & headword,
|
||||
list< wstring > & displayedHeadwords,
|
||||
wstring & articleText )
|
||||
{
|
||||
wstring articleData;
|
||||
|
||||
{
|
||||
vector< char > chunk;
|
||||
|
||||
char * articleProps = chunks.getBlock( address, chunk );
|
||||
|
||||
uint32_t articleOffset, articleSize;
|
||||
|
||||
memcpy( &articleOffset, articleProps, sizeof( articleOffset ) );
|
||||
memcpy( &articleSize, articleProps + sizeof( articleOffset ),
|
||||
sizeof( articleSize ) );
|
||||
|
||||
printf( "offset = %x\n", articleOffset );
|
||||
|
||||
char * articleBody = dict_data_read_( dz, articleOffset, articleSize, 0, 0 );
|
||||
|
||||
if ( !articleBody )
|
||||
throw exCantReadFile( getDictionaryFilenames()[ 0 ] );
|
||||
|
||||
try
|
||||
{
|
||||
articleData =
|
||||
DslIconv::toWstring(
|
||||
DslIconv::getEncodingNameFor( DslEncoding( idxHeader.dslEncoding ) ),
|
||||
articleBody, articleSize );
free( articleBody ); // Release the buffer on the success path as well
|
||||
}
|
||||
catch( ... )
|
||||
{
|
||||
free( articleBody );
|
||||
throw;
|
||||
}
|
||||
}
|
||||
|
||||
size_t pos = articleData.find_first_of( L"\n\r" );
|
||||
|
||||
if ( pos == wstring::npos )
|
||||
pos = articleData.size();
|
||||
|
||||
wstring firstHeadword( articleData, 0, pos );
|
||||
|
||||
printf( "first headword = %ls\n", firstHeadword.c_str() );
|
||||
|
||||
// Make a headword
|
||||
{
|
||||
wstring str( firstHeadword );
|
||||
list< wstring > lst;
|
||||
|
||||
processUnsortedParts( str, true );
|
||||
expandOptionalParts( str, lst );
|
||||
|
||||
headword = Utf8::encode( lst.front() );
|
||||
}
|
||||
|
||||
// Generate displayed headwords
|
||||
|
||||
displayedHeadwords.clear();
|
||||
|
||||
processUnsortedParts( firstHeadword, false );
|
||||
expandOptionalParts( firstHeadword, displayedHeadwords );
|
||||
|
||||
// Now skip alts until we reach the body itself
|
||||
while ( pos != articleData.size() )
|
||||
{
|
||||
if ( articleData[ pos ] == '\r' )
|
||||
++pos;
|
||||
|
||||
if ( pos != articleData.size() )
|
||||
{
|
||||
if ( articleData[ pos ] == '\n' )
|
||||
++pos;
|
||||
}
|
||||
|
||||
if ( pos != articleData.size() && !iswblank( articleData[ pos ] ) )
|
||||
{
|
||||
// Skip any alt headwords
|
||||
pos = articleData.find_first_of( L"\n\r", pos );
|
||||
|
||||
if ( pos == wstring::npos )
|
||||
pos = articleData.size();
|
||||
}
|
||||
else
|
||||
break;
|
||||
}
|
||||
|
||||
if ( pos != articleData.size() )
|
||||
articleText = wstring( articleData, pos );
|
||||
else
|
||||
articleText = L"";
|
||||
}
|
||||
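The tail of loadArticle() above separates an article into its first headword, any alternate headwords (further lines that do not start with a blank), and the body (starting at the first line that begins with a space or tab). A self-contained sketch of just that splitting step follows. `SplitArticle` and `splitArticle` are hypothetical names, and the DSL-specific processing of the headword is left out.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>

struct SplitArticle // hypothetical helper type
{
  std::wstring firstHeadword;
  std::wstring body;
};

SplitArticle splitArticle( std::wstring const & data )
{
  SplitArticle r;

  // The first line is the main headword.
  std::size_t pos = data.find_first_of( L"\n\r" );
  if ( pos == std::wstring::npos )
    pos = data.size();
  r.firstHeadword.assign( data, 0, pos );

  // Skip alternate headwords until a line starting with a blank is found.
  while ( pos != data.size() )
  {
    if ( data[ pos ] == L'\r' )
      ++pos;
    if ( pos != data.size() && data[ pos ] == L'\n' )
      ++pos;

    if ( pos != data.size() && !std::iswblank( data[ pos ] ) )
    {
      // An alt headword (or an empty line): jump to the end of this line.
      pos = data.find_first_of( L"\n\r", pos );
      if ( pos == std::wstring::npos )
        pos = data.size();
    }
    else
      break; // Either the body starts here, or the text has ended.
  }

  if ( pos != data.size() )
    r.body.assign( data, pos, std::wstring::npos );

  return r;
}
```

For the input `L"cat\nkitty\n\tfeline animal\n"` this yields the headword `cat` and the body `\tfeline animal\n`; the alternate headword `kitty` is skipped.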
|
||||
string DslDictionary::dslToHtml( wstring const & str )
|
||||
{
|
||||
ArticleDom dom( str );
|
||||
|
||||
string html = processNodeChildren( dom.root );
|
||||
|
||||
// Lines seem to indicate paragraphs in Dsls, so we enclose each line within
|
||||
// a <p></p>.
|
||||
|
||||
for( size_t x = html.size(); x--; )
|
||||
if ( html[ x ] == '\n' )
|
||||
html.insert( x + 1, "</p><p>" );
|
||||
|
||||
return "<!-- DSL Source:\n" + Utf8::encode( str ) + "\n-->"
|
||||
"<p>" + html + "</p>";
|
||||
}
|
||||
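The paragraph wrapping in dslToHtml() above walks the string backwards, so that each insertion only shifts characters that have already been visited. A minimal standalone sketch of that step, using the hypothetical name `wrapLinesInParagraphs` and omitting the DSL source comment that the real function prepends:

```cpp
#include <cstddef>
#include <string>

// Encloses every line of html in <p>...</p>. The newline itself is kept and
// ends up just before the closing tag of its paragraph.
std::string wrapLinesInParagraphs( std::string html )
{
  // Iterate from the end: inserting at x + 1 never moves the characters at
  // indices below x, which are the ones still to be examined.
  for ( std::size_t x = html.size(); x--; )
    if ( html[ x ] == '\n' )
      html.insert( x + 1, "</p><p>" );

  return "<p>" + html + "</p>";
}
```

Note that input ending in a newline produces a trailing empty paragraph, since the break is inserted after every newline regardless of what follows.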
|
||||
string DslDictionary::processNodeChildren( ArticleDom::Node const & node )
|
||||
{
|
||||
string result;
|
||||
|
||||
for( ArticleDom::Node::const_iterator i = node.begin(); i != node.end();
|
||||
++i )
|
||||
result += nodeToHtml( *i );
|
||||
|
||||
return result;
|
||||
}
|
||||
string DslDictionary::nodeToHtml( ArticleDom::Node const & node )
|
||||
{
|
||||
if ( !node.isTag )
|
||||
return Html::escape( Utf8::encode( node.text ) );
|
||||
|
||||
string result;
|
||||
|
||||
if ( node.tagName == L"b" )
|
||||
result += "<b class=\"dsl_b\">" + processNodeChildren( node ) + "</b>";
|
||||
else
|
||||
if ( node.tagName == L"i" )
|
||||
result += "<i class=\"dsl_i\">" + processNodeChildren( node ) + "</i>";
|
||||
else
|
||||
if ( node.tagName == L"u" )
|
||||
result += "<span class=\"dsl_u\">" + processNodeChildren( node ) + "</span>";
|
||||
else
|
||||
if ( node.tagName == L"c" )
|
||||
{
|
||||
result += "<font color=\"" + ( node.tagAttrs.size() ?
|
||||
Html::escape( Utf8::encode( node.tagAttrs ) ) : string( "c_default_color" ) )
|
||||
+ "\">" + processNodeChildren( node ) + "</font>";
|
||||
}
|
||||
else
|
||||
if ( node.tagName == L"*" )
|
||||
result += "<span class=\"dsl_opt\">" + processNodeChildren( node ) + "</span>";
|
||||
else
|
||||
if ( node.tagName.size() == 2 && node.tagName[ 0 ] == L'm' &&
|
||||
iswdigit( node.tagName[ 1 ] ) )
|
||||
result += "<div class=\"dsl_" + Utf8::encode( node.tagName ) + "\">" + processNodeChildren( node ) + "</div>";
|
||||
else
|
||||
if ( node.tagName == L"trn" )
|
||||
result += "<span class=\"dsl_trn\">" + processNodeChildren( node ) + "</span>";
|
||||
else
|
||||
if ( node.tagName == L"ex" )
|
||||
result += "<span class=\"dsl_ex\">" + processNodeChildren( node ) + "</span>";
|
||||
else
|
||||
if ( node.tagName == L"com" )
|
||||
result += "<span class=\"dsl_com\">" + processNodeChildren( node ) + "</span>";
|
||||
else
|
||||
if ( node.tagName == L"s" )
|
||||
{
|
||||
string filename = Utf8::encode( node.renderAsText() );
|
||||
|
||||
if ( Filetype::isNameOfSound( filename ) )
|
||||
{
|
||||
// If we have the file here, do the exact reference to this dictionary.
|
||||
// Otherwise, make a global 'search' one.
|
||||
|
||||
string n =
|
||||
FsEncoding::dirname( getDictionaryFilenames()[ 0 ] ) +
|
||||
FsEncoding::separator() +
|
||||
FsEncoding::encode( filename );
|
||||
|
||||
bool search = true;
|
||||
|
||||
try
|
||||
{
|
||||
File::Class f( n, "r" );
|
||||
|
||||
search = false;
|
||||
}
|
||||
catch( File::Ex & )
|
||||
{
|
||||
}
|
||||
|
||||
string ref = "\"gdau://" + ( search ? string( "search" ) : getId() ) +
|
||||
"/" + Html::escape( filename ) +"\"";
|
||||
|
||||
result += "<span class=\"dsl_s_wav\"><a href=" + ref
|
||||
+ "><img src=\"qrcx://localhost/icons/playsound.png\" border=\"0\" align=\"absmiddle\" alt=\"Play\"/></a></span>";
|
||||
}
|
||||
else
|
||||
if ( Filetype::isNameOfPicture( filename ) )
|
||||
{
|
||||
result += "<img src=\"bres://" + getId() + "/" + Html::escape( filename )
|
||||
+ "\" alt=\"" + Html::escape( filename ) + "\"/>";
|
||||
}
|
||||
else
|
||||
{
|
||||
// Unknown file type, downgrade to a hyperlink
|
||||
result += "<a class=\"dsl_s\" href=\"bres://" + getId() + "/" + Html::escape( filename )
|
||||
+ "\">" + processNodeChildren( node ) + "</a>";
|
||||
}
|
||||
}
|
||||
else
|
||||
if ( node.tagName == L"url" )
|
||||
result += "<a class=\"dsl_url\" href=\"" + Html::escape( Utf8::encode( node.renderAsText() ) ) +"\">" + processNodeChildren( node ) + "</a>";
|
||||
else
|
||||
if ( node.tagName == L"!trs" )
|
||||
result += "<span class=\"dsl_trs\">" + processNodeChildren( node ) + "</span>";
|
||||
else
|
||||
if ( node.tagName == L"p" )
|
||||
{
|
||||
result += "<span class=\"dsl_p\"";
|
||||
|
||||
string val = Utf8::encode( node.renderAsText() );
|
||||
|
||||
// If we have such a key, display a title
|
||||
|
||||
map< string, string >::const_iterator i = abrv.find( val );
|
||||
|
||||
if ( i != abrv.end() )
|
||||
result += " title=\"" + Html::escape( i->second ) + "\"";
|
||||
|
||||
result += ">" + processNodeChildren( node ) + "</span>";
|
||||
}
|
||||
else
|
||||
if ( node.tagName == L"'" )
|
||||
{
|
||||
result += "<span class=\"dsl_stress\">" + processNodeChildren( node ) + "<span class=\"dsl_stacc\">" + Utf8::encode( wstring( 1, 0x301 ) ) + "</span></span>";
|
||||
}
|
||||
else
|
||||
if ( node.tagName == L"lang" )
|
||||
{
|
||||
result += "<span class=\"dsl_lang\">" + processNodeChildren( node ) + "</span>";
|
||||
}
|
||||
else
|
||||
if ( node.tagName == L"ref" )
|
||||
{
|
||||
result += "<a class=\"dsl_ref\" href=\"bword://" + Html::escape( Utf8::encode( node.renderAsText() ) ) +"\">" + processNodeChildren( node ) + "</a>";
|
||||
}
|
||||
else
|
||||
if ( node.tagName == L"sub" )
|
||||
{
|
||||
result += "<sub>" + processNodeChildren( node ) + "</sub>";
|
||||
}
|
||||
else
|
||||
if ( node.tagName == L"sup" )
|
||||
{
|
||||
result += "<sup>" + processNodeChildren( node ) + "</sup>";
|
||||
}
|
||||
else
|
||||
if ( node.tagName == L"t" )
|
||||
{
|
||||
result += "<span class=\"dsl_t\">" + processNodeChildren( node ) + "</span>";
|
||||
}
|
||||
else
|
||||
result += "<span class=\"dsl_unknown\">" + processNodeChildren( node ) + "</span>";
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
#if 0
|
||||
vector< wstring > StardictDictionary::findHeadwordsForSynonym( wstring const & str )
|
||||
throw( std::exception )
|
||||
{
|
||||
vector< wstring > result;
|
||||
|
||||
vector< WordArticleLink > chain = findArticles( str );
|
||||
|
||||
wstring caseFolded = Folding::applySimpleCaseOnly( str );
|
||||
|
||||
for( unsigned x = 0; x < chain.size(); ++x )
|
||||
{
|
||||
string headword, articleText;
|
||||
|
||||
loadArticle( chain[ x ].articleOffset,
|
||||
headword, articleText );
|
||||
|
||||
wstring headwordDecoded = Utf8::decode( headword );
|
||||
|
||||
if ( caseFolded != Folding::applySimpleCaseOnly( headwordDecoded ) )
|
||||
{
|
||||
// The headword seems to differ from the input word, which makes the
|
||||
// input word its synonym.
|
||||
result.push_back( headwordDecoded );
|
||||
}
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
string DslDictionary::getArticle( wstring const & word,
|
||||
vector< wstring > const & alts )
|
||||
throw( Dictionary::exNoSuchWord, std::exception )
|
||||
{
|
||||
vector< WordArticleLink > chain = findArticles( word );
|
||||
|
||||
for( unsigned x = 0; x < alts.size(); ++x )
|
||||
{
|
||||
/// Make an additional query for each alt
|
||||
|
||||
vector< WordArticleLink > altChain = findArticles( alts[ x ] );
|
||||
|
||||
chain.insert( chain.end(), altChain.begin(), altChain.end() );
|
||||
}
|
||||
|
||||
multimap< wstring, string > mainArticles, alternateArticles;
|
||||
|
||||
set< uint32_t > articlesIncluded; // Some synonyms make the articles
|
||||
// appear several times. We combat this
|
||||
// by only allowing them to appear once.
|
||||
|
||||
wstring wordCaseFolded = Folding::applySimpleCaseOnly( word );
|
||||
|
||||
for( unsigned x = 0; x < chain.size(); ++x )
|
||||
{
|
||||
if ( articlesIncluded.find( chain[ x ].articleOffset ) != articlesIncluded.end() )
|
||||
continue; // We already have this article in the body.
|
||||
|
||||
// Now grab that article
|
||||
|
||||
string headword;
|
||||
|
||||
list< wstring > displayedHeadwords;
|
||||
wstring articleBody;
|
||||
|
||||
loadArticle( chain[ x ].articleOffset, headword, displayedHeadwords,
|
||||
articleBody );
|
||||
|
||||
string articleText;
|
||||
|
||||
articleText += "<span class=\"dsl_article\">";
|
||||
articleText += "<div class=\"dsl_headwords\">";
|
||||
|
||||
for( list< wstring >::const_iterator i = displayedHeadwords.begin();
|
||||
i != displayedHeadwords.end(); ++i )
|
||||
articleText += dslToHtml( *i );
|
||||
|
||||
articleText += "</div>";
|
||||
|
||||
if ( displayedHeadwords.size() )
|
||||
expandTildes( articleBody, displayedHeadwords.front() );
|
||||
|
||||
articleText += "<div class=\"dsl_definition\">";
|
||||
articleText += dslToHtml( articleBody );
|
||||
articleText += "</div>";
|
||||
articleText += "</span>";
|
||||
|
||||
// Ok. Now, does it go to main articles, or to alternate ones? We list
|
||||
// main ones first, and alternates after.
|
||||
|
||||
// We do the case-folded comparison here.
|
||||
|
||||
wstring headwordStripped =
|
||||
Folding::applySimpleCaseOnly( Utf8::decode( headword ) );
|
||||
|
||||
multimap< wstring, string > & mapToUse =
|
||||
( wordCaseFolded == headwordStripped ) ?
|
||||
mainArticles : alternateArticles;
|
||||
|
||||
mapToUse.insert( pair< wstring, string >(
|
||||
headwordStripped,
|
||||
articleText ) );
|
||||
|
||||
articlesIncluded.insert( chain[ x ].articleOffset );
|
||||
}
|
||||
|
||||
if ( mainArticles.empty() && alternateArticles.empty() )
|
||||
throw Dictionary::exNoSuchWord();
|
||||
|
||||
string result;
|
||||
|
||||
multimap< wstring, string >::const_iterator i;
|
||||
|
||||
for( i = mainArticles.begin(); i != mainArticles.end(); ++i )
|
||||
result += i->second;
|
||||
|
||||
for( i = alternateArticles.begin(); i != alternateArticles.end(); ++i )
|
||||
result += i->second;
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
void DslDictionary::getResource( string const & name,
|
||||
vector< char > & data )
|
||||
throw( Dictionary::exNoSuchResource, std::exception )
|
||||
{
|
||||
string n =
|
||||
FsEncoding::dirname( getDictionaryFilenames()[ 0 ] ) +
|
||||
FsEncoding::separator() +
|
||||
FsEncoding::encode( name );
|
||||
|
||||
printf( "n is %s\n", n.c_str() );
|
||||
|
||||
try
|
||||
{
|
||||
File::Class f( n, "rb" ); // Resources are binary data
|
||||
|
||||
f.seekEnd();
|
||||
|
||||
data.resize( f.tell() );
|
||||
|
||||
f.rewind();
|
||||
|
||||
f.read( &data.front(), data.size() );
|
||||
|
||||
if ( Filetype::isNameOfTiff( name ) )
|
||||
{
|
||||
// Convert it
|
||||
|
||||
QImage img = QImage::fromData( (unsigned char *) &data.front(),
|
||||
data.size() );
|
||||
|
||||
if ( img.isNull() )
|
||||
{
|
||||
// Failed to load, return data as is
|
||||
return;
|
||||
}
|
||||
|
||||
QByteArray ba;
|
||||
QBuffer buffer( &ba );
|
||||
buffer.open( QIODevice::WriteOnly );
|
||||
img.save( &buffer, "BMP" );
|
||||
|
||||
data.resize( buffer.size() );
|
||||
|
||||
memcpy( &data.front(), buffer.data(), data.size() );
|
||||
}
|
||||
}
|
||||
catch( File::Ex & )
|
||||
{
|
||||
throw Dictionary::exNoSuchResource();
|
||||
}
|
||||
}
|
||||
|
||||
} // anonymous namespace
|
||||
|
||||
static bool tryPossibleName( string const & name, string & copyTo )
|
||||
{
|
||||
try
|
||||
{
|
||||
File::Class f( name, "rb" );
|
||||
|
||||
copyTo = name;
|
||||
|
||||
return true;
|
||||
}
|
||||
catch( ... )
|
||||
{
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
#if 0
|
||||
static void findCorrespondingFiles( string const & ifo,
|
||||
string & idx, string & dict, string & syn,
|
||||
bool needSyn )
|
||||
{
|
||||
string base( ifo, 0, ifo.size() - 3 );
|
||||
|
||||
if ( !(
|
||||
tryPossibleName( base + "idx", idx ) ||
|
||||
tryPossibleName( base + "idx.gz", idx ) ||
|
||||
tryPossibleName( base + "idx.dz", idx ) ||
|
||||
tryPossibleName( base + "IDX", idx ) ||
|
||||
tryPossibleName( base + "IDX.GZ", idx ) ||
|
||||
tryPossibleName( base + "IDX.DZ", idx )
|
||||
) )
|
||||
throw exNoIdxFile( ifo );
|
||||
|
||||
if ( !(
|
||||
tryPossibleName( base + "dict", dict ) ||
|
||||
tryPossibleName( base + "dict.dz", dict ) ||
|
||||
tryPossibleName( base + "DICT", dict ) ||
|
||||
tryPossibleName( base + "dict.DZ", dict )
|
||||
) )
|
||||
throw exNoDictFile( ifo );
|
||||
|
||||
if ( needSyn && !(
|
||||
tryPossibleName( base + "syn", syn ) ||
|
||||
tryPossibleName( base + "syn.gz", syn ) ||
|
||||
tryPossibleName( base + "syn.dz", syn ) ||
|
||||
tryPossibleName( base + "SYN", syn ) ||
|
||||
tryPossibleName( base + "SYN.GZ", syn ) ||
|
||||
tryPossibleName( base + "SYN.DZ", syn )
|
||||
) )
|
||||
throw exNoSynFile( ifo );
|
||||
}
|
||||
#endif
|
||||
|
||||
vector< sptr< Dictionary::Class > > Format::makeDictionaries(
|
||||
vector< string > const & fileNames,
|
||||
string const & indicesDir,
|
||||
Dictionary::Initializing & initializing )
|
||||
throw( std::exception )
|
||||
{
|
||||
vector< sptr< Dictionary::Class > > dictionaries;
|
||||
|
||||
for( vector< string >::const_iterator i = fileNames.begin(); i != fileNames.end();
|
||||
++i )
|
||||
{
|
||||
// Try .dsl and .dsl.dz suffixes
|
||||
|
||||
if ( ( i->size() < 4 ||
|
||||
strcasecmp( i->c_str() + ( i->size() - 4 ), ".dsl" ) != 0 ) &&
|
||||
( i->size() < 7 ||
|
||||
strcasecmp( i->c_str() + ( i->size() - 7 ), ".dsl.dz" ) != 0 ) )
|
||||
continue;
|
||||
|
||||
try
|
||||
{
|
||||
vector< string > dictFiles( 1, *i );
|
||||
|
||||
// Check if there is an 'abrv' file present
|
||||
string baseName = ( (*i)[ i->size() - 4 ] == '.' ) ?
|
||||
string( *i, 0, i->size() - 4 ) : string( *i, 0, i->size() - 7 );
|
||||
|
||||
string abrvFileName;
|
||||
|
||||
if ( tryPossibleName( baseName + "_abrv.dsl", abrvFileName ) ||
|
||||
tryPossibleName( baseName + "_abrv.dsl.dz", abrvFileName ) ||
|
||||
tryPossibleName( baseName + "_ABRV.DSL", abrvFileName ) ||
|
||||
tryPossibleName( baseName + "_ABRV.DSL.DZ", abrvFileName ) ||
|
||||
tryPossibleName( baseName + "_ABRV.DSL.dz", abrvFileName ) )
|
||||
dictFiles.push_back( abrvFileName );
|
||||
|
||||
string dictId = makeDictionaryId( dictFiles );
|
||||
|
||||
string indexFile = indicesDir + dictId;
|
||||
|
||||
if ( needToRebuildIndex( dictFiles, indexFile ) ||
|
||||
indexIsOldOrBad( indexFile ) )
|
||||
{
|
||||
DslScanner scanner( *i );
|
||||
|
||||
if ( scanner.getDictionaryName() == L"Abbrev" )
|
||||
continue; // For now just skip abbreviations
|
||||
|
||||
// Building the index
|
||||
initializing.indexingDictionary( Utf8::encode( scanner.getDictionaryName() ) );
|
||||
|
||||
printf( "Dictionary name: %ls\n", scanner.getDictionaryName().c_str() );
|
||||
|
||||
File::Class idx( indexFile, "wb" );
|
||||
|
||||
IdxHeader idxHeader;
|
||||
|
||||
memset( &idxHeader, 0, sizeof( idxHeader ) );
|
||||
|
||||
// We write a dummy header first. At the end of the process the header
|
||||
// will be rewritten with the right values.
|
||||
|
||||
idx.write( idxHeader );
|
||||
|
||||
string dictionaryName = Utf8::encode( scanner.getDictionaryName() );
|
||||
|
||||
idx.write( (uint32_t) dictionaryName.size() );
|
||||
idx.write( dictionaryName.data(), dictionaryName.size() );
|
||||
|
||||
idxHeader.dslEncoding = scanner.getEncoding();
|
||||
|
||||
IndexedWords indexedWords;
|
||||
|
||||
ChunkedStorage::Writer chunks( idx );
|
||||
|
||||
// Read the abbreviations
|
||||
|
||||
if ( abrvFileName.size() )
|
||||
{
|
||||
try
|
||||
{
|
||||
DslScanner abrvScanner( abrvFileName );
|
||||
|
||||
map< string, string > abrv;
|
||||
|
||||
wstring curString;
|
||||
size_t curOffset;
|
||||
|
||||
for( ; ; )
|
||||
{
|
||||
// Skip any whitespace
|
||||
if ( !abrvScanner.readNextLine( curString, curOffset ) )
|
||||
break;
|
||||
if ( curString.empty() || iswblank( curString[ 0 ] ) )
|
||||
continue;
|
||||
|
||||
string key = Utf8::encode( curString );
|
||||
|
||||
if ( !abrvScanner.readNextLine( curString, curOffset ) )
|
||||
{
|
||||
fprintf( stderr, "Warning: premature end of file %s\n", abrvFileName.c_str() );
|
||||
break;
|
||||
}
|
||||
|
||||
if ( curString.empty() || !iswblank( curString[ 0 ] ) )
|
||||
{
|
||||
fprintf( stderr, "Warning: malformed file %s\n", abrvFileName.c_str() );
|
||||
break;
|
||||
}
|
||||
|
||||
curString.erase( 0, curString.find_first_not_of( L" \t" ) );
|
||||
|
||||
abrv[ key ] = Utf8::encode( curString );
|
||||
}
|
||||
|
||||
idxHeader.hasAbrv = 1;
|
||||
idxHeader.abrvAddress = chunks.startNewBlock();
|
||||
|
||||
uint32_t sz = abrv.size();
|
||||
|
||||
chunks.addToBlock( &sz, sizeof( uint32_t ) );
|
||||
|
||||
for( map< string, string >::const_iterator i = abrv.begin();
|
||||
i != abrv.end(); ++i )
|
||||
{
|
||||
printf( "%s:%s\n", i->first.c_str(), i->second.c_str() );
|
||||
|
||||
sz = i->first.size();
|
||||
chunks.addToBlock( &sz, sizeof( uint32_t ) );
|
||||
chunks.addToBlock( i->first.data(), sz );
|
||||
sz = i->second.size();
|
||||
chunks.addToBlock( &sz, sizeof( uint32_t ) );
|
||||
chunks.addToBlock( i->second.data(), sz );
|
||||
}
|
||||
}
|
||||
catch( std::exception & e )
|
||||
{
|
||||
fprintf( stderr, "Error reading abrv file %s: %s. Skipping it.\n",
|
||||
abrvFileName.c_str(), e.what() );
|
||||
}
|
||||
}
|
||||
|
||||
bool hasString = false;
|
||||
wstring curString;
|
||||
size_t curOffset;
|
||||
|
||||
for( ; ; )
|
||||
{
|
||||
// Find the main headword
|
||||
|
||||
if ( !hasString && !scanner.readNextLine( curString, curOffset ) )
|
||||
break; // Clean end of file
|
||||
|
||||
hasString = false;
|
||||
|
||||
// The line read should either consist of pure whitespace, or be a
|
||||
// headword
|
||||
|
||||
if ( curString.empty() )
|
||||
continue;
|
||||
|
||||
if ( iswblank( curString[ 0 ] ) )
|
||||
{
|
||||
// The first character is blank. Let's make sure that all other
|
||||
// characters are blank, too.
|
||||
for( size_t x = 1; x < curString.size(); ++x )
|
||||
{
|
||||
if ( !iswblank( curString[ x ] ) )
|
||||
{
|
||||
fprintf( stderr, "Warning: garbage string in %s at offset 0x%X\n", i->c_str(), (unsigned) curOffset );
|
||||
break;
|
||||
}
|
||||
}
|
||||
continue;
|
||||
}
|
||||
|
||||
// Ok, got the headword
|
||||
|
||||
list< wstring > allEntryWords;
|
||||
|
||||
processUnsortedParts( curString, true );
|
||||
expandOptionalParts( curString, allEntryWords );
|
||||
|
||||
uint32_t articleOffset = curOffset;
|
||||
|
||||
//printf( "Headword: %ls\n", curString.c_str() );
|
||||
|
||||
// More headwords may follow
|
||||
|
||||
for( ; ; )
|
||||
{
|
||||
if ( ! ( hasString = scanner.readNextLine( curString, curOffset ) ) )
|
||||
{
|
||||
fprintf( stderr, "Warning: premature end of file %s\n", i->c_str() );
|
||||
break;
|
||||
}
|
||||
|
||||
if ( curString.empty() || iswblank( curString[ 0 ] ) )
|
||||
break; // No more headwords
|
||||
|
||||
printf( "Alt headword: %ls\n", curString.c_str() );
|
||||
|
||||
processUnsortedParts( curString, true );
|
||||
expandTildes( curString, allEntryWords.front() );
|
||||
expandOptionalParts( curString, allEntryWords );
|
||||
}
|
||||
|
||||
if ( !hasString )
|
||||
break;
|
||||
|
||||
// Insert new entry
|
||||
|
||||
uint32_t descOffset = chunks.startNewBlock();
|
||||
|
||||
chunks.addToBlock( &articleOffset, sizeof( articleOffset ) );
|
||||
|
||||
for( list< wstring >::iterator j = allEntryWords.begin();
|
||||
j != allEntryWords.end(); ++j )
|
||||
{
|
||||
unescapeDsl( *j );
|
||||
wstring folded = Folding::apply( *j );
|
||||
|
||||
IndexedWords::iterator e = indexedWords.insert(
|
||||
IndexedWords::value_type( folded, vector< WordArticleLink >() ) ).first;
|
||||
|
||||
// Try to conserve memory somewhat -- slow insertions are ok
|
||||
e->second.reserve( e->second.size() + 1 );
|
||||
|
||||
e->second.push_back( WordArticleLink( Utf8::encode( *j ), descOffset ) );
|
||||
}
|
||||
|
||||
// Skip the article's body
|
||||
for( ; ; )
|
||||
{
|
||||
if ( ! ( hasString = scanner.readNextLine( curString, curOffset ) ) )
|
||||
break;
|
||||
|
||||
if ( curString.size() && !iswblank( curString[ 0 ] ) )
|
||||
break;
|
||||
}
|
||||
|
||||
// Now that we have read the first string after the article
|
||||
// itself, we can use its offset to calculate the article's size.
|
||||
// An end of file works here, too.
|
||||
|
||||
uint32_t articleSize = ( curOffset - articleOffset );
|
||||
|
||||
chunks.addToBlock( &articleSize, sizeof( articleSize ) );
|
||||
|
||||
if ( !hasString )
|
||||
break;
|
||||
}
|
||||
|
||||
// Finish with the chunks
|
||||
|
||||
idxHeader.chunksOffset = chunks.finish();
|
||||
|
||||
// Build index
|
||||
|
||||
idxHeader.indexOffset = BtreeIndexing::buildIndex( indexedWords, idx );
|
||||
|
||||
// That concludes it. Update the header.
|
||||
|
||||
idxHeader.signature = Signature;
|
||||
idxHeader.formatVersion = CurrentFormatVersion;
|
||||
|
||||
idx.rewind();
|
||||
|
||||
idx.write( &idxHeader, sizeof( idxHeader ) );
|
||||
}
|
||||
|
||||
dictionaries.push_back( new DslDictionary( dictId,
|
||||
indexFile,
|
||||
dictFiles ) );
|
||||
}
|
||||
catch( std::exception & e )
|
||||
{
|
||||
fprintf( stderr, "DSL dictionary reading failed: %s, error: %s\n",
|
||||
i->c_str(), e.what() );
|
||||
}
|
||||
}
|
||||
|
||||
return dictionaries;
|
||||
}
|
||||
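makeDictionaries() above accepts a file only if its name ends in `.dsl` or `.dsl.dz`, compared case-insensitively, and then strips that suffix to look for a companion `_abrv` file. The sketch below shows the same check without relying on `strcasecmp`. `endsWithNoCase` and `dslBaseName` are hypothetical helpers, not functions from the source.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive (ASCII) suffix test.
static bool endsWithNoCase( std::string const & s, std::string const & suffix )
{
  if ( s.size() < suffix.size() )
    return false;

  std::size_t off = s.size() - suffix.size();

  for ( std::size_t i = 0; i < suffix.size(); ++i )
    if ( std::tolower( static_cast< unsigned char >( s[ off + i ] ) ) !=
         std::tolower( static_cast< unsigned char >( suffix[ i ] ) ) )
      return false;

  return true;
}

// If name ends in .dsl or .dsl.dz, stores the name without that suffix in
// base and returns true. Otherwise returns false and leaves base untouched.
bool dslBaseName( std::string const & name, std::string & base )
{
  if ( endsWithNoCase( name, ".dsl" ) )
    base.assign( name, 0, name.size() - 4 );
  else
  if ( endsWithNoCase( name, ".dsl.dz" ) )
    base.assign( name, 0, name.size() - 7 );
  else
    return false;

  return true;
}
```

A caller could then probe `base + "_abrv.dsl"` and its variants, as the loop above does with tryPossibleName().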
|
||||
|
||||
}
|
28
src/dsl.hh
Normal file
|
@ -0,0 +1,28 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#ifndef __DSL_HH_INCLUDED__
|
||||
#define __DSL_HH_INCLUDED__
|
||||
|
||||
#include "dictionary.hh"
|
||||
|
||||
/// Support for the ABBYY Lingvo .DSL files.
|
||||
namespace Dsl {
|
||||
|
||||
using std::vector;
|
||||
using std::string;
|
||||
|
||||
class Format: public Dictionary::Format
|
||||
{
|
||||
public:
|
||||
|
||||
virtual vector< sptr< Dictionary::Class > > makeDictionaries(
|
||||
vector< string > const & fileNames,
|
||||
string const & indicesDir,
|
||||
Dictionary::Initializing & )
|
||||
throw( std::exception );
|
||||
};
|
||||
|
||||
}
|
||||
|
||||
#endif
|
750
src/dsl_details.cc
Normal file
|
@ -0,0 +1,750 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#include "dsl_details.hh"
|
||||
#include <wctype.h>
|
||||
|
||||
namespace Dsl {
|
||||
namespace Details {
|
||||
|
||||
using std::wstring;
|
||||
using std::list;
|
||||
|
||||
/////////////// ArticleDom
|
||||
|
||||
wstring ArticleDom::Node::renderAsText() const
|
||||
{
|
||||
if ( !isTag )
|
||||
return text;
|
||||
|
||||
wstring result;
|
||||
|
||||
for( list< Node >::const_iterator i = begin(); i != end(); ++i )
|
||||
result += i->renderAsText();
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
// Returns true if src == 'm' and dest is 'mX', where X is a digit
|
||||
static inline bool checkM( wstring const & dest, wstring const & src )
|
||||
{
|
||||
return ( src == L"m" && dest.size() == 2 &&
|
||||
dest[ 0 ] == L'm' && iswdigit( dest[ 1 ] ) );
|
||||
}
|
||||
|
||||
ArticleDom::ArticleDom( wstring const & str ):
|
||||
root( Node::Tag(), L"", L"" ), stringPos( str.c_str() )
|
||||
{
|
||||
list< Node * > stack; // Currently opened tags
|
||||
|
||||
Node * textNode = 0; // A leaf node which currently accumulates text.
|
||||
|
||||
try
|
||||
{
|
||||
for( ;; )
|
||||
{
|
||||
nextChar();
|
||||
|
||||
if ( ch == L'[' && !escaped )
|
||||
{
|
||||
// Beginning of a tag.
|
||||
do
|
||||
{
|
||||
nextChar();
|
||||
} while( iswblank( ch ) );
|
||||
|
||||
bool isClosing;
|
||||
|
||||
if ( ch == L'/' && !escaped )
|
||||
{
|
||||
// A closing tag.
|
||||
isClosing = true;
|
||||
nextChar();
|
||||
}
|
||||
else
|
||||
isClosing = false;
|
||||
|
||||
// Read tag's name
|
||||
wstring name;
|
||||
|
||||
while( ( ch != L']' || escaped ) && !iswblank( ch ) )
|
||||
{
|
||||
name.push_back( ch );
|
||||
nextChar();
|
||||
}
|
||||
|
||||
while( iswblank( ch ) )
|
||||
nextChar();
|
||||
|
||||
// Read attrs
|
||||
|
||||
wstring attrs;
|
||||
|
||||
while( ch != L']' || escaped )
|
||||
{
|
||||
attrs.push_back( ch );
|
||||
nextChar();
|
||||
}
|
||||
|
||||
// Add the tag, or close it
|
||||
|
||||
if ( textNode )
|
||||
{
|
||||
// Close the currently opened text node
|
||||
stack.pop_back();
|
||||
textNode = 0;
|
||||
}
|
||||
|
||||
if ( !isClosing )
|
||||
{
|
||||
Node node( Node::Tag(), name, attrs );
|
||||
|
||||
if ( stack.empty() )
|
||||
{
|
||||
root.push_back( node );
|
||||
stack.push_back( &root.back() );
|
||||
}
|
||||
else
|
||||
{
|
||||
stack.back()->push_back( node );
|
||||
stack.push_back( &stack.back()->back() );
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
// Find the tag which is to be closed
|
||||
|
||||
list< Node * >::reverse_iterator n;
|
||||
|
||||
for( n = stack.rbegin(); n != stack.rend(); ++n )
|
||||
{
|
||||
if ( (*n)->tagName == name || checkM( (*n)->tagName, name ) )
|
||||
{
|
||||
// Found it
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
if ( n != stack.rend() )
|
||||
{
|
||||
// If there is a corresponding tag, close all tags above it,
|
||||
// then close the tag itself, then reopen all the tags which got
|
||||
// closed.
|
||||
|
||||
list< Node > nodesToReopen;
|
||||
|
||||
while( stack.size() )
|
||||
{
|
||||
bool found = stack.back()->tagName == name ||
|
||||
checkM( stack.back()->tagName, name );
|
||||
|
||||
if ( !found )
|
||||
nodesToReopen.push_back( Node( Node::Tag(), stack.back()->tagName,
|
||||
stack.back()->tagAttrs ) );
|
||||
|
||||
if ( stack.back()->empty() )
|
||||
{
|
||||
// Empty nodes are deleted since they're of no use
|
||||
|
||||
stack.pop_back();
|
||||
|
||||
Node * parent = stack.size() ? stack.back() : &root;
|
||||
|
||||
parent->pop_back();
|
||||
}
|
||||
else
|
||||
stack.pop_back();
|
||||
|
||||
if ( found )
|
||||
break;
|
||||
}
|
||||
|
||||
while( nodesToReopen.size() )
|
||||
{
|
||||
if ( stack.empty() )
|
||||
{
|
||||
root.push_back( nodesToReopen.back() );
|
||||
stack.push_back( &root.back() );
|
||||
}
|
||||
else
|
||||
{
|
||||
stack.back()->push_back( nodesToReopen.back() );
|
||||
stack.push_back( &stack.back()->back() );
|
||||
}
|
||||
|
||||
nodesToReopen.pop_back();
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
fprintf( stderr, "Warning: no corresponding opening tag for closing tag \"/%ls\" found.\n",
|
||||
name.c_str() );
|
||||
}
|
||||
} // if ( isClosing )
|
||||
continue;
|
||||
} // if ( ch == '[' )
|
||||
|
||||
if ( ch == L'<' && !escaped )
|
||||
{
|
||||
// Special case: the <<name>> link
|
||||
|
||||
nextChar();
|
||||
|
||||
if ( ch != L'<' || escaped )
|
||||
{
|
||||
// Ok, it's not it.
|
||||
--stringPos;
|
||||
|
||||
if ( escaped )
|
||||
{
|
||||
--stringPos;
|
||||
escaped = false;
|
||||
}
|
||||
ch = L'<';
|
||||
}
|
||||
else
|
||||
{
|
||||
// Get the link's body
|
||||
do
|
||||
{
|
||||
nextChar();
|
||||
} while( iswblank( ch ) );
|
||||
|
||||
wstring linkTo;
|
||||
|
||||
for( ; ; nextChar() )
|
||||
{
|
||||
// Is it the end?
|
||||
if ( ch == L'>' && !escaped )
|
||||
{
|
||||
nextChar();
|
||||
|
||||
if ( ch == L'>' && !escaped )
|
||||
break;
|
||||
else
|
||||
{
|
||||
linkTo.push_back( L'>' );
|
||||
linkTo.push_back( ch );
|
||||
}
|
||||
}
|
||||
else
|
||||
linkTo.push_back( ch );
|
||||
}
|
||||
|
||||
// Add the corresponding node
|
||||
|
||||
if ( textNode )
|
||||
{
|
||||
// Close the currently opened text node
|
||||
stack.pop_back();
|
||||
textNode = 0;
|
||||
}
|
||||
|
||||
Node link( Node::Tag(), L"ref", L"" );
|
||||
link.push_back( Node( Node::Text(), linkTo ) );
|
||||
|
||||
if ( stack.empty() )
|
||||
root.push_back( link );
|
||||
else
|
||||
stack.back()->push_back( link );
|
||||
|
||||
continue;
|
||||
}
|
||||
} // if ( ch == '<' )
|
||||
|
||||
// If we're here, we've got a normal symbol, to be saved as text.
|
||||
|
||||
// If there's currently no text node, open one
|
||||
if ( !textNode )
|
||||
{
|
||||
Node text( Node::Text(), L"" );
|
||||
|
||||
if ( stack.empty() )
|
||||
{
|
||||
root.push_back( text );
|
||||
stack.push_back( &root.back() );
|
||||
}
|
||||
else
|
||||
{
|
||||
stack.back()->push_back( text );
|
||||
stack.push_back( &stack.back()->back() );
|
||||
}
|
||||
|
||||
textNode = stack.back();
|
||||
}
|
||||
|
||||
textNode->text.push_back( ch );
|
||||
} // for( ; ; )
|
||||
}
|
||||
catch( eot )
|
||||
{
|
||||
}
|
||||
|
||||
if ( textNode )
|
||||
stack.pop_back();
|
||||
|
||||
if ( stack.size() )
|
||||
fprintf( stderr, "Warning: %u tags were unclosed.\n", (unsigned) stack.size() );
|
||||
}
|
||||
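The closing-tag branch of the ArticleDom constructor above tolerates badly nested markup: when a closing tag matches an opening tag that is not the innermost one, every tag opened after it is closed too and then reopened in its original order. The sketch below models only that stack manipulation on tag names. `closeTag` is a hypothetical function, and both the `m`/`mX` matching and the removal of empty nodes are deliberately left out.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// open holds the currently opened tag names, innermost last. Closes the
// innermost tag called name, and reports in reopened the tags that were
// nested inside it and had to be reopened (outermost first). Returns false
// and changes nothing if there is no such open tag.
bool closeTag( std::vector< std::string > & open, std::string const & name,
               std::vector< std::string > & reopened )
{
  reopened.clear();

  // Search from the innermost tag outwards.
  std::size_t n = open.size();
  while ( n && open[ n - 1 ] != name )
    --n;

  if ( !n )
    return false; // No corresponding opening tag

  // Everything above the match gets closed and reopened in the same order.
  reopened.assign( open.begin() + n, open.end() );
  open.erase( open.begin() + ( n - 1 ), open.end() );
  open.insert( open.end(), reopened.begin(), reopened.end() );

  return true;
}
```

With `b`, `i`, `u` open and a closing `b` arriving, the stack becomes `i`, `u`: the text that follows keeps its italic and underline formatting but is no longer bold.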
|
||||
void ArticleDom::nextChar() throw( eot )
|
||||
{
|
||||
if ( !*stringPos )
|
||||
throw eot();
|
||||
|
||||
ch = *stringPos++;
|
||||
|
||||
if ( ch == L'\\' )
|
||||
{
|
||||
if ( !*stringPos )
|
||||
throw eot();
|
||||
|
||||
ch = *stringPos++;
|
||||
|
||||
escaped = true;
|
||||
}
|
||||
else
|
||||
escaped = false;
|
||||
}
|
||||
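nextChar() above folds a backslash and the character after it into a single character flagged as escaped, which is why the parser can test `ch == L'[' && !escaped` to tell markup from literal text. A small standalone sketch of that idea follows, returning the whole sequence at once rather than one character per call. `scanEscapes` is a hypothetical name, and a lone trailing backslash is treated as the end of the text, as in the original.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Returns each logical character of s together with a flag telling whether
// it was preceded by a backslash in the input.
std::vector< std::pair< wchar_t, bool > > scanEscapes( std::wstring const & s )
{
  std::vector< std::pair< wchar_t, bool > > out;

  for ( std::size_t i = 0; i < s.size(); ++i )
  {
    wchar_t ch = s[ i ];
    bool escaped = false;

    if ( ch == L'\\' )
    {
      if ( ++i == s.size() )
        break; // Nothing follows the backslash: end of text

      ch = s[ i ];
      escaped = true;
    }

    out.push_back( std::make_pair( ch, escaped ) );
  }

  return out;
}
```

For the input `a\[b` the result has three entries, with only the `[` flagged as escaped, so a parser built on it would treat that bracket as plain text.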
|
||||
|
||||
/////////////// DslScanner
|
||||
|
||||
DslScanner::DslScanner( string const & fileName ) throw( Ex, Iconv::Ex ):
|
||||
encoding( Windows1252 ), iconv( encoding ), readBufferPtr( readBuffer ),
|
||||
readBufferLeft( 0 )
|
||||
{
|
||||
// Since .dz is backwards-compatible with .gz, we use gz- functions to
|
||||
// read it -- they are much nicer than the dict_data- ones.
|
||||
f = gzopen( fileName.c_str(), "rb");
|
||||
|
||||
if ( !f )
|
||||
throw exCantOpen( fileName );
|
||||
|
||||
// Now try guessing the encoding by reading the first two bytes
|
||||
|
||||
unsigned char firstBytes[ 2 ];
|
||||
|
||||
if ( gzread( f, firstBytes, sizeof( firstBytes ) ) != sizeof( firstBytes ) )
|
||||
{
|
||||
// Apparently the file's too short
|
||||
gzclose( f );
|
||||
throw exMalformedDslFile( fileName );
|
||||
}
|
||||
|
||||
bool needExactEncoding = false;
|
||||
|
||||
|
||||
// If the file begins with the dedicated Unicode marker, we just consume
|
||||
// it. If, on the other hand, it doesn't, we rewind to return the bytes
|
||||
if ( firstBytes[ 0 ] == 0xFF && firstBytes[ 1 ] == 0xFE )
|
||||
encoding = Utf16LE;
|
||||
else
|
||||
if ( firstBytes[ 0 ] == 0xFE && firstBytes[ 1 ] == 0xFF )
|
||||
encoding = Utf16BE;
|
||||
else
|
||||
{
|
||||
if ( firstBytes[ 0 ] && !firstBytes[ 1 ] )
|
||||
encoding = Utf16LE;
|
||||
else
|
||||
if ( !firstBytes[ 0 ] && firstBytes[ 1 ] )
|
||||
encoding = Utf16BE;
|
||||
else
|
||||
{
|
||||
// Ok, this doesn't look like 16-bit Unicode. We will start with a
|
||||
// 8-bit encoding with an intent to find out the exact one from
|
||||
// the header.
|
||||
needExactEncoding = true;
|
||||
encoding = Windows1251;
|
||||
}
|
||||
|
||||
if ( gzrewind( f ) )
|
||||
{
|
||||
gzclose( f );
|
||||
throw exCantOpen( fileName );
|
||||
}
|
||||
}
|
||||
|
||||
iconv.reinit( encoding );
|
||||
|
||||
// We now can use our own readNextLine() function
|
||||
|
||||
wstring str;
|
||||
size_t offset;
|
||||
|
||||
for( ; ; )
|
||||
{
|
||||
if ( !readNextLine( str, offset ) )
|
||||
{
|
||||
gzclose( f );
|
||||
throw exMalformedDslFile( fileName );
|
||||
}
|
||||
|
||||
if ( str.empty() || str[ 0 ] != L'#' )
|
||||
break;
|
||||
|
||||
bool isName = false;
|
||||
|
||||
if ( !str.compare( 0, 5, L"#NAME", 5 ) )
|
||||
isName = true;
|
||||
else
|
||||
if ( str.compare( 0, 17, L"#SOURCE_CODE_PAGE", 17 ) )
|
||||
continue;
|
||||
|
||||
// Locate the argument
|
||||
|
||||
size_t beg = str.find_first_of( L'"' );
|
||||
|
||||
if ( beg == wstring::npos )
|
||||
{
// The destructor won't run if the constructor throws, so close the file here
gzclose( f );
throw exMalformedDslFile( fileName );
}
|
||||
|
||||
size_t end = str.find_last_of( L'"' );
|
||||
|
||||
if ( end == beg )
|
||||
{
// The destructor won't run if the constructor throws, so close the file here
gzclose( f );
throw exMalformedDslFile( fileName );
}
|
||||
|
||||
wstring arg( str, beg + 1, end - beg - 1 );
|
||||
|
||||
if ( isName )
|
||||
dictionaryName = arg;
|
||||
else
|
||||
{
|
||||
// The encoding
|
||||
if ( !needExactEncoding )
|
||||
{
|
||||
// We don't need that!
|
||||
fprintf( stderr, "Warning: encoding was specified in a Unicode file, ignoring.\n" );
|
||||
}
|
||||
else
|
||||
if ( !wcscasecmp( arg.c_str(), L"Latin" ) )
|
||||
encoding = Windows1252;
|
||||
else
|
||||
if ( !wcscasecmp( arg.c_str(), L"Cyrillic" ) )
|
||||
encoding = Windows1251;
|
||||
else
|
||||
if ( !wcscasecmp( arg.c_str(), L"EasternEuropean" ) )
|
||||
encoding = Windows1250;
|
||||
else
|
||||
{
|
||||
gzclose( f );
|
||||
throw exUnknownCodePage();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// The loop will always end up reading a line which was not a #-directive.
|
||||
// We need to rewind to that line so readNextLine() would return it again
|
||||
// next time it's called. To do that, we just use the slow gzseek() and
|
||||
// empty the read buffer.
|
||||
gzseek( f, offset, SEEK_SET );
|
||||
readBufferPtr = readBuffer;
|
||||
readBufferLeft = 0;
|
||||
|
||||
if ( needExactEncoding )
|
||||
iconv.reinit( encoding );
|
||||
}
|
||||
|
||||
DslScanner::~DslScanner() throw()
|
||||
{
|
||||
gzclose( f );
|
||||
}
|
||||
|
||||
bool DslScanner::readNextLine( wstring & out, size_t & offset ) throw( Ex,
|
||||
Iconv::Ex )
|
||||
{
|
||||
offset = (size_t)( gztell( f ) - readBufferLeft );
|
||||
|
||||
// For now we just read one char at a time
|
||||
size_t readMultiple = distanceToBytes( 1 );
|
||||
|
||||
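// &front() is undefined for an empty vector, so make sure there is storage
if ( wcharBuffer.empty() )
wcharBuffer.resize( 64 );

size_t leftInOut = wcharBuffer.size();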
|
||||
|
||||
wchar_t * outPtr = &wcharBuffer.front();
|
||||
|
||||
for( ; ; )
|
||||
{
|
||||
// Check that we have bytes to read
|
||||
if ( !readBufferLeft )
|
||||
{
|
||||
if ( gzeof( f ) )
|
||||
return false;
|
||||
|
||||
// Read some more bytes to readBuffer
|
||||
int result = gzread( f, readBuffer, sizeof( readBuffer ) );
|
||||
|
||||
if ( result == -1 )
|
||||
throw exCantReadDslFile();
|
||||
|
||||
readBufferPtr = readBuffer;
|
||||
readBufferLeft = (size_t) result;
|
||||
}
|
||||
|
||||
if ( readBufferLeft < readMultiple )
|
||||
{
|
||||
// No more data. Return what we've got so far, forget the last byte if
|
||||
// it was a 16-bit Unicode and a file had an odd number of bytes.
|
||||
readBufferLeft = 0;
|
||||
|
||||
if ( outPtr != &wcharBuffer.front() )
|
||||
{
|
||||
// If there was a stray \r, remove it
|
||||
if ( outPtr[ -1 ] == L'\r' )
|
||||
--outPtr;
|
||||
|
||||
out = wstring( &wcharBuffer.front(), outPtr - &wcharBuffer.front() );
|
||||
|
||||
return true;
|
||||
}
|
||||
else
|
||||
return false;
|
||||
}
|
||||
|
||||
// Check that we have chars to write
|
||||
if ( !leftInOut )
|
||||
{
|
||||
wcharBuffer.resize( wcharBuffer.size() + 64 );
|
||||
outPtr = &wcharBuffer.front() + wcharBuffer.size() - 64;
|
||||
leftInOut += 64;
|
||||
}
|
||||
|
||||
// Ok, now convert one char
|
||||
size_t inBytesLeft = readMultiple;
|
||||
size_t outBytesLeft = sizeof( wchar_t );
|
||||
|
||||
if ( iconv.convert( (void const *&)readBufferPtr, inBytesLeft,
|
||||
(void *&)outPtr, outBytesLeft ) !=
|
||||
Iconv::Success || inBytesLeft || outBytesLeft )
|
||||
throw exEncodingError();
|
||||
|
||||
readBufferLeft -= readMultiple;
|
||||
--leftInOut;
|
||||
|
||||
// Have we got \n?
|
||||
if ( outPtr[ -1 ] == L'\n' )
|
||||
{
|
||||
--outPtr;
|
||||
|
||||
// Now kill a \r if there is one, and return the result.
|
||||
if ( outPtr != &wcharBuffer.front() && outPtr[ -1 ] == L'\r' )
|
||||
--outPtr;
|
||||
|
||||
out = wstring( &wcharBuffer.front(), outPtr - &wcharBuffer.front() );
|
||||
|
||||
return true;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
/////////////// DslIconv
|
||||
|
||||
DslIconv::DslIconv( DslEncoding e ) throw( Iconv::Ex ):
|
||||
Iconv( Iconv::Wchar_t, getEncodingNameFor( e ) )
|
||||
{
|
||||
}
|
||||
|
||||
void DslIconv::reinit( DslEncoding e ) throw( Iconv::Ex )
|
||||
{
|
||||
Iconv::reinit( Iconv::Wchar_t, getEncodingNameFor( e ) );
|
||||
}
|
||||
|
||||
char const * DslIconv::getEncodingNameFor( DslEncoding e )
|
||||
{
|
||||
switch( e )
|
||||
{
|
||||
case Utf16LE:
|
||||
return "UTF-16LE";
|
||||
case Utf16BE:
|
||||
return "UTF-16BE";
|
||||
case Windows1252:
|
||||
return "WINDOWS-1252";
|
||||
case Windows1251:
|
||||
return "WINDOWS-1251";
|
||||
case Windows1250:
|
||||
default:
|
||||
return "WINDOWS-1250";
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
void processUnsortedParts( wstring & str, bool strip )
|
||||
{
|
||||
int refCount = 0;
|
||||
|
||||
size_t startPos = 0;
|
||||
|
||||
for( size_t x = 0; x < str.size(); )
|
||||
{
|
||||
wchar_t ch = str[ x ];
|
||||
|
||||
if ( ch == L'\\' )
|
||||
{
|
||||
// Escape code
|
||||
x += 2;
|
||||
continue;
|
||||
}
|
||||
|
||||
if ( ch == L'{' )
|
||||
{
|
||||
++refCount;
|
||||
|
||||
if ( !strip )
|
||||
{
|
||||
// Just remove it and continue
|
||||
str.erase( x, 1 );
|
||||
continue;
|
||||
}
|
||||
else
|
||||
if ( refCount == 1 )
|
||||
{
|
||||
// First opening brace. Save this position, we will be erasing the
|
||||
// whole range when we encounter the last closing brace.
|
||||
startPos = x;
|
||||
}
|
||||
}
|
||||
else
|
||||
if ( ch == L'}' )
|
||||
{
|
||||
--refCount;
|
||||
|
||||
if ( refCount < 0 )
|
||||
{
|
||||
fprintf( stderr, "Warning: an unmatched closing brace was encountered.\n" );
|
||||
refCount = 0;
|
||||
// But we remove that thing either way
|
||||
str.erase( x, 1 );
|
||||
continue;
|
||||
}
|
||||
|
||||
if ( !strip )
|
||||
{
|
||||
// Just remove it and continue
|
||||
str.erase( x, 1 );
|
||||
continue;
|
||||
}
|
||||
else
|
||||
if ( !refCount )
|
||||
{
|
||||
// The final closing brace -- we can erase the whole range now.
|
||||
str.erase( startPos, x - startPos + 1 );
|
||||
x = startPos;
|
||||
|
||||
continue;
|
||||
}
|
||||
}
|
||||
|
||||
++x;
|
||||
}
|
||||
|
||||
if ( strip && refCount )
|
||||
{
|
||||
fprintf( stderr, "Warning: unclosed brace(s) encountered.\n" );
|
||||
str.erase( startPos );
|
||||
}
|
||||
}
|
||||
|
||||
void expandOptionalParts( wstring & str, list< wstring > & result,
|
||||
size_t x )
|
||||
{
|
||||
for( ; x < str.size(); )
|
||||
{
|
||||
wchar_t ch = str[ x ];
|
||||
|
||||
if ( ch == L'\\' )
|
||||
{
|
||||
// Escape code
|
||||
x += 2;
|
||||
}
|
||||
else
|
||||
if ( ch == L'(' )
|
||||
{
|
||||
// First, handle the case where this block is removed
|
||||
|
||||
{
|
||||
int refCount = 1;
|
||||
|
||||
for( size_t y = x + 1; y < str.size(); ++y )
|
||||
{
|
||||
wchar_t ch = str[ y ];
|
||||
|
||||
if ( ch == L'\\' )
|
||||
{
|
||||
// Escape code
|
||||
++y;
|
||||
}
|
||||
else
|
||||
if ( ch == L'(' )
|
||||
++refCount;
|
||||
else
|
||||
if ( ch == L')' )
|
||||
{
|
||||
if ( !--refCount )
|
||||
{
|
||||
// Now that the closing parenthesis is found,
|
||||
// cut the whole thing out and be done.
|
||||
|
||||
if ( y != x + 1 ) // Only do for non-empty cases
|
||||
{
|
||||
wstring removed( str, 0, x );
|
||||
removed.append( str, y + 1, str.size() - y - 1 );
|
||||
|
||||
expandOptionalParts( removed, result, x );
|
||||
}
|
||||
|
||||
break;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if ( refCount && x != str.size() - 1 )
|
||||
{
|
||||
// Closing paren not found? Chop it.
|
||||
|
||||
wstring removed( str, 0, x );
|
||||
|
||||
result.push_back( removed );
|
||||
}
|
||||
}
|
||||
|
||||
// Now, handling the case where it is kept -- we just erase
|
||||
// the paren and go on
|
||||
|
||||
str.erase( x, 1 );
|
||||
}
|
||||
else
|
||||
if ( ch == L')' )
|
||||
{
|
||||
// Closing paren doesn't mean much -- just erase it
|
||||
str.erase( x, 1 );
|
||||
}
|
||||
else
|
||||
++x;
|
||||
}
|
||||
|
||||
result.push_back( str );
|
||||
}
|
||||
|
||||
void expandTildes( wstring & str, wstring const & tildeReplacement )
|
||||
{
|
||||
for( size_t x = 0; x < str.size(); )
|
||||
if ( str[ x ] == L'\\' )
|
||||
x+=2;
|
||||
else
|
||||
if ( str[ x ] == L'~' )
|
||||
{
|
||||
str.replace( x, 1, tildeReplacement );
|
||||
x += tildeReplacement.size();
|
||||
}
|
||||
else
|
||||
++x;
|
||||
}
|
||||
|
||||
void unescapeDsl( wstring & str )
|
||||
{
|
||||
for( size_t x = 0; x < str.size(); ++x )
|
||||
if ( str[ x ] == L'\\' )
|
||||
str.erase( x, 1 ); // The loop's ++x then steps over the escaped char, keeping it as is
|
||||
}
|
||||
|
||||
}
|
||||
}
|
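The two-byte encoding guess that the DslScanner constructor makes in dsl_details.cc can be read in isolation as a small pure function. The sketch below is only an illustration under assumptions: the names GuessedEncoding and guessEncoding are hypothetical and not part of GoldenDict, and the 8-bit case is collapsed into one value because the real code refines it later from the #SOURCE_CODE_PAGE header.

```cpp
#include <cassert>

// Hypothetical names; the real code uses the DslEncoding enum instead.
enum class GuessedEncoding { Utf16LE, Utf16BE, EightBit };

// Guesses the encoding from the first two bytes of a file.
// sawBom is set to true only when an explicit byte-order mark was found;
// in every other case the caller is expected to rewind over the two bytes.
inline GuessedEncoding guessEncoding( unsigned char b0, unsigned char b1,
                                      bool & sawBom )
{
  sawBom = true;

  if ( b0 == 0xFF && b1 == 0xFE )
    return GuessedEncoding::Utf16LE;

  if ( b0 == 0xFE && b1 == 0xFF )
    return GuessedEncoding::Utf16BE;

  sawBom = false;

  // No BOM. An ASCII character stored as UTF-16 has exactly one zero byte,
  // and its position tells the byte order.
  if ( b0 && !b1 )
    return GuessedEncoding::Utf16LE;

  if ( !b0 && b1 )
    return GuessedEncoding::Utf16BE;

  // Neither pattern matched: assume some 8-bit code page.
  return GuessedEncoding::EightBit;
}
```

A caller would pass the two bytes it has just read and rewind the file whenever sawBom comes back false, which is what the constructor does with gzrewind().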
171
src/dsl_details.hh
Normal file
|
@ -0,0 +1,171 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#ifndef __DSL_DETAILS_HH_INCLUDED__
|
||||
#define __DSL_DETAILS_HH_INCLUDED__
|
||||
|
||||
#include <string>
|
||||
#include <list>
|
||||
#include <vector>
|
||||
#include <zlib.h>
|
||||
#include "dictionary.hh"
|
||||
#include "iconv.hh"
|
||||
|
||||
// Implementation details for Dsl, not part of its interface
|
||||
namespace Dsl {
|
||||
namespace Details {
|
||||
|
||||
using std::string;
|
||||
using std::wstring;
|
||||
using std::list;
|
||||
using std::vector;
|
||||
|
||||
// These are the possible encodings for .dsl files
|
||||
enum DslEncoding
|
||||
{
|
||||
Utf16LE,
|
||||
Utf16BE,
|
||||
Windows1252,
|
||||
Windows1251,
|
||||
Windows1250
|
||||
};
|
||||
|
||||
|
||||
/// Parses the DSL language, representing it in its structural DOM form.
|
||||
struct ArticleDom
|
||||
{
|
||||
struct Node: public list< Node >
|
||||
{
|
||||
bool isTag; // true if it is a tag with subnodes, false if it's a leaf text
|
||||
// data.
|
||||
// These are only used if isTag is true
|
||||
wstring tagName;
|
||||
wstring tagAttrs;
|
||||
wstring text; // This is only used if isTag is false
|
||||
|
||||
class Text {};
|
||||
class Tag {};
|
||||
|
||||
Node( Tag, wstring const & name, wstring const & attrs ): isTag( true ),
|
||||
tagName( name ), tagAttrs( attrs )
|
||||
{}
|
||||
|
||||
Node( Text, wstring const & text_ ): isTag( false ), text( text_ )
|
||||
{}
|
||||
|
||||
/// Concatenates all children text nodes recursively to form all text
|
||||
/// the node contains stripped of any markup.
|
||||
wstring renderAsText() const;
|
||||
};
|
||||
|
||||
/// Does the parse at construction. Refer to the 'root' member variable
|
||||
/// afterwards.
|
||||
ArticleDom( wstring const & );
|
||||
|
||||
/// Root of DOM's tree
|
||||
Node root;
|
||||
|
||||
private:
|
||||
|
||||
wchar_t const * stringPos;
|
||||
|
||||
class eot {};
|
||||
|
||||
wchar_t ch;
|
||||
bool escaped;
|
||||
|
||||
void nextChar() throw( eot );
|
||||
};
|
||||
|
||||
/// An adapted version of Iconv which takes Dsl encoding and decodes to wchar_t.
|
||||
class DslIconv: public Iconv
|
||||
{
|
||||
public:
|
||||
DslIconv( DslEncoding ) throw( Iconv::Ex );
|
||||
void reinit( DslEncoding ) throw( Iconv::Ex );
|
||||
|
||||
/// Returns a name to be passed to iconv for the given dsl encoding.
|
||||
static char const * getEncodingNameFor( DslEncoding );
|
||||
};
|
||||
|
||||
/// Opens the .dsl or .dsl.dz file and allows line-by-line reading. Auto-detects
|
||||
/// the encoding, and reads all headers by itself.
|
||||
class DslScanner
|
||||
{
|
||||
gzFile f;
|
||||
DslEncoding encoding;
|
||||
DslIconv iconv;
|
||||
wstring dictionaryName;
|
||||
char readBuffer[ 65536 ];
|
||||
char * readBufferPtr;
|
||||
size_t readBufferLeft;
|
||||
vector< wchar_t > wcharBuffer;
|
||||
|
||||
public:
|
||||
|
||||
DEF_EX( Ex, "Dsl scanner exception", Dictionary::Ex )
|
||||
DEF_EX_STR( exCantOpen, "Can't open .dsl file", Ex )
|
||||
DEF_EX( exCantReadDslFile, "Can't read .dsl file", Ex )
|
||||
DEF_EX_STR( exMalformedDslFile, "The .dsl file is malformed:", Ex )
|
||||
DEF_EX( exUnknownCodePage, "The .dsl file specified an unknown code page", Ex )
|
||||
DEF_EX( exEncodingError, "Encoding error", Ex ) // Should never happen really
|
||||
|
||||
DslScanner( string const & fileName ) throw( Ex, Iconv::Ex );
|
||||
~DslScanner() throw();
|
||||
|
||||
/// Returns the detected encoding of this file.
|
||||
DslEncoding getEncoding() const
|
||||
{ return encoding; }
|
||||
|
||||
/// Returns the dictionary's name, as was read from file's headers.
|
||||
wstring const & getDictionaryName() const
|
||||
{ return dictionaryName; }
|
||||
|
||||
/// Reads next line from the file. Returns true if reading succeeded --
|
||||
/// the string gets stored in the one passed, along with its physical
|
||||
/// file offset in the file (the uncompressed one if the file is compressed).
|
||||
/// If end of file is reached, false is returned.
|
||||
/// Reading begins from the first line after the headers (ones which start
|
||||
/// with #).
|
||||
bool readNextLine( wstring &, size_t & offset ) throw( Ex, Iconv::Ex );
|
||||
|
||||
/// Converts the given number of characters to the number of bytes they
|
||||
/// would occupy in the file, knowing its encoding. It's possible to know
|
||||
/// that because no multibyte encodings are supported in .dsls.
|
||||
inline size_t distanceToBytes( size_t ) const;
|
||||
};
|
||||
|
||||
/// This function either removes parts of string enclosed in braces, or leaves
|
||||
/// them intact. The braces themselves are removed always, though.
|
||||
void processUnsortedParts( wstring & str, bool strip );
|
||||
|
||||
/// Expands optional parts of a headword (ones marked with parentheses),
|
||||
/// producing all possible combinations where they are present or absent.
|
||||
void expandOptionalParts( wstring & str, list< wstring > & result,
|
||||
size_t x = 0 );
|
||||
|
||||
/// Expands all unescaped tildes, inserting tildeReplacement text instead of
|
||||
/// them.
|
||||
void expandTildes( wstring & str, wstring const & tildeReplacement );
|
||||
|
||||
// Unescapes any escaped chars. Be sure to handle all their special meanings
|
||||
// before unescaping them.
|
||||
void unescapeDsl( wstring & str );
|
||||
|
||||
|
||||
inline size_t DslScanner::distanceToBytes( size_t x ) const
|
||||
{
|
||||
switch( encoding )
|
||||
{
|
||||
case Utf16LE:
|
||||
case Utf16BE:
|
||||
return x*2;
|
||||
default:
|
||||
return x;
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
|
||||
#endif
|
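The header comment for expandOptionalParts() says it produces every combination of a headword with its parenthesised parts present or absent. The following is a simplified sketch of that idea, not the function from dsl_details.cc: the name expandOptional is hypothetical, backslash escapes are ignored, and an unclosed parenthesis is simply kept rather than chopped.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical simplified variant: no escape handling, no special treatment
// of unbalanced parentheses.
inline void expandOptional( std::wstring str, std::list< std::wstring > & result,
                            std::size_t x = 0 )
{
  while ( x < str.size() )
  {
    if ( str[ x ] == L'(' )
    {
      // Find the matching closing parenthesis, honouring nesting.
      int depth = 1;
      std::size_t y = x + 1;

      for ( ; y < str.size(); ++y )
      {
        if ( str[ y ] == L'(' )
          ++depth;
        else if ( str[ y ] == L')' && !--depth )
          break;
      }

      // Variant 1: the optional block is absent (skipped for empty blocks).
      if ( y < str.size() && y != x + 1 )
        expandOptional( str.substr( 0, x ) + str.substr( y + 1 ), result, x );

      // Variant 2: the block is present; drop the paren and keep scanning.
      str.erase( x, 1 );
    }
    else if ( str[ x ] == L')' )
      str.erase( x, 1 ); // A closing paren carries no text of its own
    else
      ++x;
  }

  result.push_back( str );
}
```

For the headword colo(u)r this yields "color" followed by "colour"; each additional optional part doubles the number of results.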
37
src/ex.hh
Normal file
|
@ -0,0 +1,37 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#ifndef __EX_HH_INCLUDED__
|
||||
#define __EX_HH_INCLUDED__
|
||||
|
||||
#include <string>
|
||||
|
||||
/// A way to declare an exception class fast
|
||||
/// Use it like this:
|
||||
/// DEF_EX( exErrorInFoo, "An error in foo encountered", std::exception )
|
||||
/// DEF_EX( exFooNotFound, "Foo was not found", exErrorInFoo )
|
||||
|
||||
#define DEF_EX( exName, exDescription, exParent ) \
|
||||
class exName: public exParent { \
|
||||
public: \
|
||||
virtual const char * what() const throw() { return (exDescription); } \
|
||||
virtual ~exName() throw() {} };
|
||||
|
||||
/// Same as DEF_EX, but takes a runtime string argument, which gets concatenated
|
||||
/// with the description.
|
||||
///
|
||||
/// DEF_EX_STR( exCantOpen, "can't open file", std::exception )
|
||||
/// ...
|
||||
/// throw exCantOpen( "example.txt" );
|
||||
///
|
||||
/// what() would return "can't open file example.txt"
|
||||
|
||||
#define DEF_EX_STR( exName, exDescription, exParent ) \
|
||||
class exName: public exParent { \
|
||||
std::string value; \
|
||||
public: \
|
||||
exName( std::string const & value_ ): value( std::string( exDescription ) + " " + value_ ) {} \
|
||||
virtual const char * what() const throw() { return value.c_str(); } \
|
||||
virtual ~exName() throw() {} };
|
||||
|
||||
#endif
|
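To show how the two macros from ex.hh compose into an exception hierarchy, here is a self-contained sketch. It is an adaptation rather than the original: the macros are renamed SKETCH_DEF_EX and SKETCH_DEF_EX_STR (hypothetical names) and use noexcept instead of throw(), so that the example also compiles with newer C++ standards. The helper describeFailure is likewise an assumption made for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <exception>
#include <string>

#define SKETCH_DEF_EX( exName, exDescription, exParent ) \
class exName: public exParent { \
public: \
  const char * what() const noexcept override { return ( exDescription ); } \
  ~exName() noexcept override {} };

#define SKETCH_DEF_EX_STR( exName, exDescription, exParent ) \
class exName: public exParent { \
  std::string value; \
public: \
  explicit exName( std::string const & value_ ): \
    value( std::string( exDescription ) + " " + value_ ) {} \
  const char * what() const noexcept override { return value.c_str(); } \
  ~exName() noexcept override {} };

// A parent with a fixed description, and a child carrying a runtime string.
SKETCH_DEF_EX( exErrorInFoo, "An error in foo encountered", std::exception )
SKETCH_DEF_EX_STR( exCantOpen, "can't open file", exErrorInFoo )

// Throws the string-carrying exception and returns the message that a
// handler written for the parent type observes.
inline std::string describeFailure( std::string const & name )
{
  try
  {
    throw exCantOpen( name );
  }
  catch ( exErrorInFoo const & e )
  {
    return e.what();
  }

  return std::string(); // Not reached; keeps compilers quiet
}
```

Because each class derives from the parent named in the macro, a handler can catch a whole family of errors through the parent while what() still reports the most specific message.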
61
src/externalviewer.cc
Normal file
|
@ -0,0 +1,61 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#include "externalviewer.hh"
|
||||
#include <QDir>
|
||||
|
||||
using std::vector;
|
||||
|
||||
ExternalViewer::ExternalViewer( QObject * parent, vector< char > const & data,
|
||||
QString const & extension,
|
||||
QString const & viewerProgram_ )
|
||||
throw( exCantCreateTempFile ):
|
||||
QObject( parent ),
|
||||
tempFile( QDir::temp().filePath( QString( "gd-XXXXXXXX." ) + extension ) ),
|
||||
viewer( this ),
|
||||
viewerProgram( viewerProgram_ )
|
||||
{
|
||||
if ( !tempFile.open() || tempFile.write( &data.front(), data.size() ) != data.size() )
|
||||
throw exCantCreateTempFile();
|
||||
|
||||
tempFileName = tempFile.fileName(); // For some reason it loses it after it was closed()
|
||||
|
||||
tempFile.close();
|
||||
|
||||
printf( "%s\n", tempFileName.toLocal8Bit().data() );
|
||||
|
||||
connect( &viewer, SIGNAL( finished( int, QProcess::ExitStatus ) ),
|
||||
this, SLOT( viewerFinished( int, QProcess::ExitStatus ) ) );
|
||||
|
||||
connect( this, SIGNAL( finished( ExternalViewer * ) ),
|
||||
&ExternalViewerDeleter::instance(), SLOT( deleteExternalViewer( ExternalViewer * ) ),
|
||||
Qt::QueuedConnection );
|
||||
}
|
||||
|
||||
void ExternalViewer::start() throw( exCantRunViewer )
|
||||
{
|
||||
viewer.start( viewerProgram, QStringList( tempFileName ), QIODevice::NotOpen );
|
||||
|
||||
if ( !viewer.waitForStarted() )
|
||||
throw exCantRunViewer( viewerProgram.toStdString() );
|
||||
}
|
||||
|
||||
void ExternalViewer::viewerFinished( int, QProcess::ExitStatus )
|
||||
{
|
||||
emit finished( this );
|
||||
}
|
||||
|
||||
ExternalViewerDeleter & ExternalViewerDeleter::instance()
|
||||
{
|
||||
static ExternalViewerDeleter evd( 0 );
|
||||
|
||||
return evd;
|
||||
}
|
||||
|
||||
void ExternalViewerDeleter::deleteExternalViewer( ExternalViewer * e )
|
||||
{
|
||||
printf( "Deleting external viewer\n" );
|
||||
|
||||
delete e;
|
||||
}
|
||||
|
63
src/externalviewer.hh
Normal file
|
@ -0,0 +1,63 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#ifndef __EXTERNALVIEWER_HH_INCLUDED__
|
||||
#define __EXTERNALVIEWER_HH_INCLUDED__
|
||||
|
||||
#include <QObject>
|
||||
#include <QTemporaryFile>
|
||||
#include <QProcess>
|
||||
#include <vector>
|
||||
#include "ex.hh"
|
||||
|
||||
/// An external viewer, opens resources in other programs
|
||||
class ExternalViewer: public QObject
|
||||
{
|
||||
Q_OBJECT
|
||||
|
||||
QTemporaryFile tempFile;
|
||||
QProcess viewer;
|
||||
QString viewerProgram;
|
||||
QString tempFileName;
|
||||
|
||||
public:
|
||||
|
||||
DEF_EX( Ex, "External viewer exception", std::exception )
|
||||
DEF_EX( exCantCreateTempFile, "Couldn't create temporary file.", Ex )
|
||||
DEF_EX_STR( exCantRunViewer, "Couldn't run external viewer:", Ex )
|
||||
|
||||
ExternalViewer( QObject * parent, std::vector< char > const & data,
|
||||
QString const & extension, QString const & viewerProgram )
|
||||
throw( exCantCreateTempFile );
|
||||
|
||||
void start() throw( exCantRunViewer );
|
||||
|
||||
private slots:
|
||||
|
||||
void viewerFinished( int, QProcess::ExitStatus );
|
||||
|
||||
signals:
|
||||
|
||||
void finished( ExternalViewer * );
|
||||
};
|
||||
|
||||
class ExternalViewerDeleter: public QObject
|
||||
{
|
||||
Q_OBJECT
|
||||
|
||||
public:
|
||||
|
||||
static ExternalViewerDeleter & instance();
|
||||
|
||||
public slots:
|
||||
|
||||
void deleteExternalViewer( ExternalViewer * e );
|
||||
|
||||
private:
|
||||
|
||||
ExternalViewerDeleter( QObject * parent ): QObject( parent )
|
||||
{}
|
||||
};
|
||||
|
||||
#endif
|
||||
|
268
src/file.cc
Normal file
|
@ -0,0 +1,268 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#include "file.hh"
|
||||
|
||||
#include <cstring>
|
||||
#include <cerrno>
|
||||
|
||||
namespace File {
|
||||
|
||||
enum
|
||||
{
|
||||
// We employ a writing buffer to considerably speed up file operations when
|
||||
// they consist of many small writes. The default size for the buffer is 64k
|
||||
WriteBufferSize = 65536
|
||||
};
|
||||
|
||||
void Class::open( char const * filename, char const * mode ) throw( exCantOpen )
|
||||
{
|
||||
f = fopen( filename, mode );
|
||||
|
||||
if ( !f )
|
||||
throw exCantOpen( std::string( filename ) + ": " + strerror( errno ) );
|
||||
}
|
||||
|
||||
Class::Class( char const * filename, char const * mode ) throw( exCantOpen ):
|
||||
writeBuffer( 0 )
|
||||
{
|
||||
open( filename, mode );
|
||||
}
|
||||
|
||||
Class::Class( std::string const & filename, char const * mode )
|
||||
throw( exCantOpen ): writeBuffer( 0 )
|
||||
{
|
||||
open( filename.c_str(), mode );
|
||||
}
|
||||
|
||||
void Class::read( void * buf, size_t size ) throw( exReadError, exWriteError )
|
||||
{
|
||||
if ( !size )
|
||||
return;
|
||||
|
||||
if ( writeBuffer )
|
||||
flushWriteBuffer();
|
||||
|
||||
size_t result = fread( buf, size, 1, f );
|
||||
|
||||
if ( result != 1 )
|
||||
throw exReadError();
|
||||
}
|
||||
|
||||
size_t Class::readRecords( void * buf, size_t size, size_t count ) throw( exWriteError )
|
||||
{
|
||||
if ( writeBuffer )
|
||||
flushWriteBuffer();
|
||||
|
||||
return fread( buf, size, count, f );
|
||||
}
|
||||
|
||||
void Class::write( void const * buf, size_t size ) throw( exWriteError )
|
||||
{
|
||||
if ( !size )
|
||||
return;
|
||||
|
||||
if ( size >= WriteBufferSize )
|
||||
{
|
||||
// If the write is large, there's not much point in buffering
|
||||
flushWriteBuffer();
|
||||
|
||||
size_t result = fwrite( buf, size, 1, f );
|
||||
|
||||
if ( result != 1 )
|
||||
throw exWriteError();
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
if ( !writeBuffer )
|
||||
{
|
||||
// Allocate the writing buffer since we don't have any yet
|
||||
writeBuffer = new char[ WriteBufferSize ];
|
||||
writeBufferLeft = WriteBufferSize;
|
||||
}
|
||||
|
||||
size_t toAdd = size < writeBufferLeft ? size : writeBufferLeft;
|
||||
|
||||
memcpy( writeBuffer + ( WriteBufferSize - writeBufferLeft ),
|
||||
buf, toAdd );
|
||||
|
||||
size -= toAdd;
|
||||
writeBufferLeft -= toAdd;
|
||||
|
||||
if ( !writeBufferLeft ) // Out of buffer? Flush it.
|
||||
{
|
||||
flushWriteBuffer();
|
||||
|
||||
if ( size ) // Something's still left? Add to buffer.
|
||||
{
|
||||
memcpy( writeBuffer, (char const *)buf + toAdd, size );
|
||||
writeBufferLeft -= size;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
size_t Class::writeRecords( void const * buf, size_t size, size_t count )
|
||||
throw( exWriteError )
|
||||
{
|
||||
flushWriteBuffer();
|
||||
|
||||
return fwrite( buf, size, count, f );
|
||||
}
|
||||
|
||||
char * Class::gets( char * s, int size, bool stripNl )
|
||||
throw( exWriteError )
|
||||
{
|
||||
if ( writeBuffer )
|
||||
flushWriteBuffer();
|
||||
|
||||
char * result = fgets( s, size, f );
|
||||
|
||||
if ( result && stripNl )
|
||||
{
|
||||
size_t len = strlen( result );
|
||||
|
||||
char * last = result + len;
|
||||
|
||||
while( len-- )
|
||||
{
|
||||
--last;
|
||||
|
||||
if ( *last == '\n' || *last == '\r' )
|
||||
*last = 0;
|
||||
else
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
std::string Class::gets( bool stripNl ) throw( exReadError, exWriteError )
|
||||
{
|
||||
char buf[ 1024 ];
|
||||
|
||||
if ( !gets( buf, sizeof( buf ), stripNl ) )
|
||||
throw exReadError();
|
||||
|
||||
return std::string( buf );
|
||||
}
|
||||
|
||||
void Class::seek( long offset ) throw( exSeekError, exWriteError )
|
||||
{
|
||||
if ( writeBuffer )
|
||||
flushWriteBuffer();
|
||||
|
||||
if ( fseek( f, offset, SEEK_SET ) != 0 )
|
||||
throw exSeekError();
|
||||
}
|
||||
|
||||
void Class::seekCur( long offset ) throw( exSeekError, exWriteError )
|
||||
{
|
||||
if ( writeBuffer )
|
||||
flushWriteBuffer();
|
||||
|
||||
if ( fseek( f, offset, SEEK_CUR ) != 0 )
|
||||
throw exSeekError();
|
||||
}
|
||||
|
||||
void Class::seekEnd( long offset ) throw( exSeekError, exWriteError )
|
||||
{
|
||||
if ( writeBuffer )
|
||||
flushWriteBuffer();
|
||||
|
||||
if ( fseek( f, offset, SEEK_END ) != 0 )
|
||||
throw exSeekError();
|
||||
}
|
||||
|
||||
void Class::rewind() throw( exSeekError, exWriteError )
|
||||
{
|
||||
seek( 0 );
|
||||
}
|
||||
|
||||
size_t Class::tell() throw( exSeekError )
|
||||
{
|
||||
long result = ftell( f );
|
||||
|
||||
if ( result == -1 )
|
||||
throw exSeekError();
|
||||
|
||||
if ( writeBuffer )
|
||||
result += ( WriteBufferSize - writeBufferLeft );
|
||||
|
||||
return ( size_t ) result;
|
||||
}
|
||||
|
||||
bool Class::eof() throw( exWriteError )
|
||||
{
|
||||
if ( writeBuffer )
|
||||
flushWriteBuffer();
|
||||
|
||||
return feof( f );
|
||||
}
|
||||
|
||||
FILE * Class::file() throw( exWriteError )
|
||||
{
|
||||
flushWriteBuffer();
|
||||
|
||||
return f;
|
||||
}
|
||||
|
||||
FILE * Class::release() throw( exWriteError )
|
||||
{
|
||||
releaseWriteBuffer();
|
||||
|
||||
FILE * c = f;
|
||||
|
||||
f = 0;
|
||||
|
||||
return c;
|
||||
}
|
||||
|
||||
void Class::close() throw( exWriteError )
|
||||
{
|
||||
fclose( release() );
|
||||
}
|
||||
|
||||
Class::~Class() throw()
|
||||
{
|
||||
if ( f )
|
||||
{
|
||||
try
|
||||
{
|
||||
releaseWriteBuffer();
|
||||
}
|
||||
catch( exWriteError & )
|
||||
{
|
||||
}
|
||||
fclose( f );
|
||||
}
|
||||
}
|
||||
|
||||
void Class::flushWriteBuffer() throw( exWriteError )
|
||||
{
|
||||
if ( writeBuffer && writeBufferLeft != WriteBufferSize )
|
||||
{
|
||||
size_t result = fwrite( writeBuffer, WriteBufferSize - writeBufferLeft, 1, f );
|
||||
|
||||
if ( result != 1 )
|
||||
throw exWriteError();
|
||||
|
||||
writeBufferLeft = WriteBufferSize;
|
||||
}
|
||||
}
|
||||
|
||||
void Class::releaseWriteBuffer() throw( exWriteError )
|
||||
{
|
||||
flushWriteBuffer();
|
||||
|
||||
if ( writeBuffer )
|
||||
{
|
||||
delete [] writeBuffer;
|
||||
|
||||
writeBuffer = 0;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
}
|
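The write-buffering strategy in file.cc (small writes accumulate in memory, large writes bypass the buffer after it is drained) can be sketched in miniature as follows. This is a hedged illustration, not File::Class itself: the class name BufferedWriter, its bool return values and the configurable capacity are assumptions, and unlike the original it flushes before a write that does not fit instead of filling the buffer to the brim first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical miniature of the buffering strategy. The capacity is a
// constructor parameter only to make the behaviour easy to exercise.
class BufferedWriter
{
  std::FILE * f;
  std::vector< char > buffer;
  std::size_t used;

public:

  BufferedWriter( std::FILE * f_, std::size_t capacity ):
    f( f_ ), buffer( capacity ), used( 0 )
  {}

  // Writes out whatever is pending. Returns false on a short write.
  bool flush()
  {
    if ( used && std::fwrite( buffer.data(), used, 1, f ) != 1 )
      return false;

    used = 0;

    return true;
  }

  bool write( void const * data, std::size_t size )
  {
    if ( size >= buffer.size() )
    {
      // Large writes bypass the buffer, but only after the pending bytes
      // are drained, so the byte order in the file is preserved.
      return flush() && ( !size || std::fwrite( data, size, 1, f ) == 1 );
    }

    // Not enough room left? Drain the buffer first.
    if ( size > buffer.size() - used && !flush() )
      return false;

    std::memcpy( buffer.data() + used, data, size );
    used += size;

    return true;
  }

  // Number of bytes accepted but not yet handed to the FILE.
  std::size_t pending() const
  { return used; }
};
```

As with File::Class, anything that reads, seeks or hands out the underlying FILE * has to flush first, otherwise the buffered bytes would land in the file out of order.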
117
src/file.hh
Normal file
|
@ -0,0 +1,117 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#ifndef __FILE_HH_INCLUDED__
|
||||
#define __FILE_HH_INCLUDED__
|
||||
|
||||
#include <cstdio>
|
||||
#include <string>
|
||||
#include "ex.hh"
|
||||
|
||||
/// A simple wrapper over FILE * operations with added write-buffering,
|
||||
/// used for non-Qt parts of code.
|
||||
/// It is possible to ifdef implementation details for some platforms.
|
||||
namespace File {
|
||||
|
||||
DEF_EX( Ex, "File exception", std::exception )
|
||||
DEF_EX_STR( exCantOpen, "Can't open", Ex )
|
||||
DEF_EX( exReadError, "Error reading from the file", Ex )
|
||||
DEF_EX( exWriteError, "Error writing to the file", Ex )
|
||||
DEF_EX( exSeekError, "File seek error", Ex )
|
||||
|
||||
class Class
|
||||
{
|
||||
FILE * f;
|
||||
char * writeBuffer;
|
||||
size_t writeBufferLeft;
|
||||
|
||||
void open( char const * filename, char const * mode ) throw( exCantOpen );
|
||||
|
||||
public:
|
||||
|
||||
Class( char const * filename, char const * mode ) throw( exCantOpen );
|
||||
|
||||
Class( std::string const & filename, char const * mode ) throw( exCantOpen );
|
||||
|
||||
/// Reads the number of bytes to the buffer, throws an error if it
|
||||
/// failed to fill the whole buffer (short read, i/o error etc).
|
||||
void read( void * buf, size_t size ) throw( exReadError, exWriteError );
|
||||
|
||||
template< typename T >
|
||||
void read( T & value ) throw( exReadError, exWriteError )
|
||||
{ read( &value, sizeof( value ) ); }
|
||||
|
||||
template< typename T >
|
||||
T read() throw( exReadError, exWriteError )
|
||||
{ T value; read( value ); return value; }
|
||||
|
||||
/// Attempts reading at most 'count' records sized 'size'. Returns
|
||||
/// the number of records it managed to read, up to 'count'.
|
||||
size_t readRecords( void * buf, size_t size, size_t count ) throw( exWriteError );
|
||||
|
||||
/// Writes the number of bytes from the buffer, throws an error if it
|
||||
/// failed to write the whole buffer (short write, i/o error etc).
|
||||
/// This function employs write buffering, and as such, writes may not
|
||||
/// end up on disk immediately, or a short write may occur later
|
||||
/// than it really did. If you don't want write buffering, use
|
||||
/// writeRecords() function instead.
|
||||
void write( void const * buf, size_t size ) throw( exWriteError );
|
||||
|
||||
template< typename T >
|
||||
void write( T const & value ) throw( exWriteError )
|
||||
{ write( &value, sizeof( value ) ); }
|
||||
|
||||
/// Attempts writing at most 'count' records sized 'size'. Returns
|
||||
/// the number of records it managed to write, up to 'count'.
|
||||
/// This function does not employ buffering, but flushes the buffer if it
|
||||
/// was used before.
|
||||
size_t writeRecords( void const * buf, size_t size, size_t count )
|
||||
throw( exWriteError );
|
||||
|
||||
/// Reads a string from the file. Unlike the normal fgets(), this one
|
||||
/// can strip the trailing newline character, if this was requested.
|
||||
/// Returns either s or 0 if no characters were read.
|
||||
char * gets( char * s, int size, bool stripNl = false ) throw( exWriteError );
|
||||
|
||||
/// Like the above, but uses its own local internal buffer (1024 bytes
|
||||
/// currently), and strips newlines by default.
|
||||
std::string gets( bool stripNl = true ) throw( exReadError, exWriteError );
|
||||
|
||||
/// Seeks in the file, relative to its beginning.
|
||||
void seek( long offset ) throw( exSeekError, exWriteError );
|
||||
/// Seeks in the file, relative to the current position.
|
||||
void seekCur( long offset ) throw( exSeekError, exWriteError );
|
||||
/// Seeks in the file, relative to the end of file.
|
||||
void seekEnd( long offset = 0 ) throw( exSeekError, exWriteError );
|
||||
|
||||
/// Seeks to the beginning of file
|
||||
void rewind() throw( exSeekError, exWriteError );
|
||||
|
||||
/// Tells the current position within the file, relative to its beginning.
|
||||
size_t tell() throw( exSeekError );
|
||||
|
||||
/// Returns true if end-of-file condition is set.
|
||||
bool eof() throw( exWriteError );
|
||||
|
||||
/// Returns the underlying FILE * record, so other operations can be
|
||||
/// performed on it.
|
||||
FILE * file() throw( exWriteError );
|
||||
|
||||
/// Releases the file handle out of the control of the class. No further
|
||||
/// operations are valid. The file will not be closed on destruction.
|
||||
FILE * release() throw( exWriteError );
|
||||
|
||||
/// Closes the file. No further operations are valid.
|
||||
void close() throw( exWriteError );
|
||||
|
||||
~Class() throw();
|
||||
|
||||
private:
|
||||
|
||||
void flushWriteBuffer() throw( exWriteError );
|
||||
void releaseWriteBuffer() throw( exWriteError );
|
||||
};
|
||||
|
||||
}
|
||||
|
||||
#endif
|
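The comments on write() and writeRecords() above describe a trade-off: a buffered write may report an error later than the call that caused it, while writeRecords() bypasses the buffer after flushing it. The following is a minimal sketch of that contract, not the implementation of File::Class (which is not shown in this header). The class name `SketchWriter`, the 16-byte buffer, the `pending()` helper, and the use of `std::runtime_error` in place of exWriteError are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <stdexcept>

// Hypothetical illustration only; this is not GoldenDict's File::Class.
class SketchWriter
{
  std::FILE * f;
  char buf[ 16 ];      // deliberately tiny so that flushing is easy to observe
  std::size_t used;

public:

  explicit SketchWriter( std::FILE * file ): f( file ), used( 0 ) {}

  // Buffered write: the bytes may stay in 'buf', so an I/O error can
  // surface on a later call (or on flush) rather than on this one.
  void write( void const * data, std::size_t size )
  {
    char const * p = static_cast< char const * >( data );

    while( size )
    {
      if ( used == sizeof( buf ) )
        flush();

      std::size_t chunk = sizeof( buf ) - used;
      if ( chunk > size )
        chunk = size;

      std::memcpy( buf + used, p, chunk );
      used += chunk;
      p += chunk;
      size -= chunk;
    }
  }

  // Record write that skips the class-level buffer: pending bytes are
  // flushed first, then the number of whole records written is returned.
  std::size_t writeRecords( void const * data, std::size_t size,
                            std::size_t count )
  {
    flush();
    return std::fwrite( data, size, count, f );
  }

  // Pushes any buffered bytes to the underlying FILE *.
  void flush()
  {
    if ( used && std::fwrite( buf, 1, used, f ) != used )
      throw std::runtime_error( "write error" );
    used = 0;
  }

  // Number of bytes accepted by write() but not yet handed to stdio.
  std::size_t pending() const
  { return used; }
};
```

In this sketch a caller that needs to know immediately whether data was accepted would use writeRecords() and compare the return value with 'count'; a caller that only uses write() has to treat flush() as the point where errors become visible.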
85
src/filetype.cc
Normal file
|
@ -0,0 +1,85 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#include "filetype.hh"
|
||||
#include <ctype.h>
|
||||
|
||||
namespace Filetype {
|
||||
|
||||
namespace {
|
||||
|
||||
|
||||
/// Removes any trailing or leading spaces and lowercases the string.
|
||||
/// The lowercasing is done simplistically, but it is enough for file
|
||||
/// extensions.
|
||||
string simplifyString( string const & str )
|
||||
{
|
||||
string result;
|
||||
|
||||
size_t beginPos = 0;
|
||||
|
||||
while( beginPos < str.size() && isblank( str[ beginPos ] ) )
|
||||
++beginPos;
|
||||
|
||||
size_t endPos = str.size();
|
||||
|
||||
while( endPos > beginPos && isblank( str[ endPos - 1 ] ) )
|
||||
--endPos;
|
||||
|
||||
result.reserve( endPos - beginPos );
|
||||
|
||||
while( beginPos < endPos )
|
||||
result.push_back( tolower( str[ beginPos++ ] ) );
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
/// Checks if the given string ends with the given substring
|
||||
bool endsWith( string const & str, string const & tail )
|
||||
{
|
||||
return str.size() >= tail.size() &&
|
||||
str.compare( str.size() - tail.size(), tail.size(), tail ) == 0;
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
bool isNameOfSound( string const & name )
|
||||
{
|
||||
string s = simplifyString( name );
|
||||
|
||||
return
|
||||
endsWith( s, ".wav" ) ||
|
||||
endsWith( s, ".au" ) ||
|
||||
endsWith( s, ".voc" ) ||
|
||||
endsWith( s, ".ogg" ) ||
|
||||
endsWith( s, ".mp3" );
|
||||
}
|
||||
|
||||
bool isNameOfPicture( string const & name )
|
||||
{
|
||||
string s = simplifyString( name );
|
||||
|
||||
return
|
||||
endsWith( s, ".jpg" ) ||
|
||||
endsWith( s, ".jpeg" ) ||
|
||||
endsWith( s, ".jpe" ) ||
|
||||
endsWith( s, ".png" ) ||
|
||||
endsWith( s, ".gif" ) ||
|
||||
endsWith( s, ".bmp" ) ||
|
||||
endsWith( s, ".tif" ) ||
|
||||
endsWith( s, ".tiff" ) ||
|
||||
endsWith( s, ".tga" ) ||
|
||||
endsWith( s, ".pcx" ) ||
|
||||
endsWith( s, ".ico" ) ||
|
||||
endsWith( s, ".svg" );
|
||||
}
|
||||
|
||||
bool isNameOfTiff( string const & name )
|
||||
{
|
||||
string s = simplifyString( name );
|
||||
|
||||
return
|
||||
endsWith( s, ".tif" ) ||
|
||||
endsWith( s, ".tiff" );
|
||||
}
|
||||
|
||||
}
|
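The extension checks in filetype.cc can be exercised in isolation. What follows is a self-contained sketch rather than the code from this commit: `trimAndLower` and `hasSoundExtension` are hypothetical names standing in for simplifyString() and isNameOfSound(), and the casts to `unsigned char` before the `<cctype>` calls are an addition that the original does not have.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for simplifyString(): trims blanks at both ends
// and lowercases the rest (ASCII only, which suffices for extensions).
std::string trimAndLower( std::string const & str )
{
  std::size_t beginPos = 0;

  while( beginPos < str.size() &&
         std::isblank( static_cast< unsigned char >( str[ beginPos ] ) ) )
    ++beginPos;

  std::size_t endPos = str.size();

  // Stopping at beginPos keeps endPos - beginPos from underflowing when
  // the string consists of blanks only.
  while( endPos > beginPos &&
         std::isblank( static_cast< unsigned char >( str[ endPos - 1 ] ) ) )
    --endPos;

  std::string result;
  result.reserve( endPos - beginPos );

  for( std::size_t i = beginPos; i < endPos; ++i )
    result.push_back( static_cast< char >(
      std::tolower( static_cast< unsigned char >( str[ i ] ) ) ) );

  return result;
}

// Checks if the given string ends with the given substring.
bool endsWith( std::string const & str, std::string const & tail )
{
  return str.size() >= tail.size() &&
         str.compare( str.size() - tail.size(), tail.size(), tail ) == 0;
}

// Hypothetical stand-in for isNameOfSound(), using the same extension list.
bool hasSoundExtension( std::string const & name )
{
  std::string s = trimAndLower( name );

  return endsWith( s, ".wav" ) || endsWith( s, ".au" ) ||
         endsWith( s, ".voc" ) || endsWith( s, ".ogg" ) ||
         endsWith( s, ".mp3" );
}
```

Because the name is trimmed and lowercased first, a name such as " Track01.MP3 " is treated as a sound file, while "cover.png" is not.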
28
src/filetype.hh
Normal file
|
@ -0,0 +1,28 @@
|
|||
/* This file is (c) 2008-2009 Konstantin Isakov <ikm@users.sf.net>
|
||||
* Part of GoldenDict. Licensed under GPLv3 or later, see the LICENSE file */
|
||||
|
||||
#ifndef __FILETYPE_HH_INCLUDED__
|
||||
#define __FILETYPE_HH_INCLUDED__
|
||||
|
||||
#include <string>
|
||||
|
||||
/// Utilities to guess file types based on their names.
|
||||
namespace Filetype {
|
||||
|
||||
using std::string;
|
||||
|
||||
/// Returns true if the name resembles the one of a sound file (e.g. ends
|
||||
/// with .wav, .ogg and such).
|
||||
bool isNameOfSound( string const & );
|
||||
/// Returns true if the name resembles the one of a picture file (e.g. ends
|
||||
/// with .jpg, .png and such).
|
||||
bool isNameOfPicture( string const & );
|
||||
/// Returns true if the name resembles the one of a .tiff file (i.e. ends
|
||||
/// with .tif or .tiff). We have this one separately since we need to reconvert
|
||||
/// TIFF files as WebKit doesn't seem to support them.
|
||||
bool isNameOfTiff( string const & );
|
||||
|
||||
}
|
||||
|
||||
#endif
|
||||
|
251
src/flags.qrc
Normal file
|
@ -0,0 +1,251 @@
|
|||
<RCC>
|
||||
<qresource>
|
||||
<file>flags/ad.png</file>
|
||||
<file>flags/ae.png</file>
|
||||
<file>flags/af.png</file>
|
||||
<file>flags/ag.png</file>
|
||||
<file>flags/ai.png</file>
|
||||
<file>flags/al.png</file>
|
||||
<file>flags/am.png</file>
|
||||
<file>flags/an.png</file>
|
||||
<file>flags/ao.png</file>
|
||||
<file>flags/ar.png</file>
|
||||
<file>flags/as.png</file>
|
||||
<file>flags/at.png</file>
|
||||
<file>flags/au.png</file>
|
||||
<file>flags/aw.png</file>
|
||||
<file>flags/ax.png</file>
|
||||
<file>flags/az.png</file>
|
||||
<file>flags/ba.png</file>
|
||||
<file>flags/bb.png</file>
|
||||
<file>flags/bd.png</file>
|
||||
<file>flags/be.png</file>
|
||||
<file>flags/bf.png</file>
|
||||
<file>flags/bg.png</file>
|
||||
<file>flags/bh.png</file>
|
||||
<file>flags/bi.png</file>
|
||||
<file>flags/bj.png</file>
|
||||
<file>flags/bm.png</file>
|
||||
<file>flags/bn.png</file>
|
||||
<file>flags/bo.png</file>
|
||||
<file>flags/br.png</file>
|
||||
<file>flags/bs.png</file>
|
||||
<file>flags/bt.png</file>
|
||||
<file>flags/bv.png</file>
|
||||
<file>flags/bw.png</file>
|
||||
<file>flags/by.png</file>
|
||||
<file>flags/bz.png</file>
|
||||
<file>flags/ca.png</file>
|
||||
<file>flags/catalonia.png</file>
|
||||
<file>flags/cc.png</file>
|
||||
<file>flags/cd.png</file>
|
||||
<file>flags/cf.png</file>
|
||||
<file>flags/cg.png</file>
|
||||
<file>flags/ch.png</file>
|
||||
<file>flags/ci.png</file>
|
||||
<file>flags/ck.png</file>
|
||||
<file>flags/cl.png</file>
|
||||
<file>flags/cm.png</file>
|
||||
<file>flags/cn.png</file>
|
||||
<file>flags/co.png</file>
|
||||
<file>flags/cr.png</file>
|
||||
<file>flags/cs.png</file>
|
||||
<file>flags/cu.png</file>
|
||||
<file>flags/cv.png</file>
|
||||
<file>flags/cx.png</file>
|
||||
<file>flags/cy.png</file>
|
||||
<file>flags/cz.png</file>
|
||||
<file>flags/de.png</file>
|
||||
<file>flags/dj.png</file>
|
||||
<file>flags/dk.png</file>
|
||||
<file>flags/dm.png</file>
|
||||
<file>flags/do.png</file>
|
||||
<file>flags/dz.png</file>
|
||||
<file>flags/ec.png</file>
|
||||
<file>flags/ee.png</file>
|
||||
<file>flags/eg.png</file>
|
||||
<file>flags/eh.png</file>
|
||||
<file>flags/england.png</file>
|
||||
<file>flags/er.png</file>
|
||||
<file>flags/es.png</file>
|
||||
<file>flags/et.png</file>
|
||||
<file>flags/europeanunion.png</file>
|
||||
<file>flags/fam.png</file>
|
||||
<file>flags/fi.png</file>
|
||||
<file>flags/fj.png</file>
|
||||
<file>flags/fk.png</file>
|
||||
<file>flags/fm.png</file>
|
||||
<file>flags/fo.png</file>
|
||||
<file>flags/fr.png</file>
|
||||
<file>flags/ga.png</file>
|
||||
<file>flags/gb.png</file>
|
||||
<file>flags/gd.png</file>
|
||||
<file>flags/ge.png</file>
|
||||
<file>flags/gf.png</file>
|
||||
<file>flags/gh.png</file>
|
||||
<file>flags/gi.png</file>
|
||||
<file>flags/gl.png</file>
|
||||
<file>flags/gm.png</file>
|
||||
<file>flags/gn.png</file>
|
||||
<file>flags/gp.png</file>
|
||||
<file>flags/gq.png</file>
|
||||
<file>flags/gr.png</file>
|
||||
<file>flags/gs.png</file>
|
||||
<file>flags/gt.png</file>
|
||||
<file>flags/gu.png</file>
|
||||
<file>flags/gw.png</file>
|
||||
<file>flags/gy.png</file>
|
||||
<file>flags/hk.png</file>
|
||||
<file>flags/hm.png</file>
|
||||
<file>flags/hn.png</file>
|
||||
<file>flags/hr.png</file>
|
||||
<file>flags/ht.png</file>
|
||||
<file>flags/hu.png</file>
|
||||
<file>flags/id.png</file>
|
||||
<file>flags/ie.png</file>
|
||||
<file>flags/il.png</file>
|
||||
<file>flags/in.png</file>
|
||||
<file>flags/io.png</file>
|
||||
<file>flags/iq.png</file>
|
||||
<file>flags/ir.png</file>
|
||||
<file>flags/is.png</file>
|
||||
<file>flags/it.png</file>
|
||||
<file>flags/jm.png</file>
|
||||
<file>flags/jo.png</file>
|
||||
<file>flags/jp.png</file>
|
||||
<file>flags/ke.png</file>
|
||||
<file>flags/kg.png</file>
|
||||
<file>flags/kh.png</file>
|
||||
<file>flags/ki.png</file>
|
||||
<file>flags/km.png</file>
|
||||
<file>flags/kn.png</file>
|
||||
<file>flags/kp.png</file>
|
||||
<file>flags/kr.png</file>
|
||||
<file>flags/kw.png</file>
|
||||
<file>flags/ky.png</file>
|
||||
<file>flags/kz.png</file>
|
||||
<file>flags/la.png</file>
|
||||
<file>flags/lb.png</file>
|
||||
<file>flags/lc.png</file>
|
||||
<file>flags/li.png</file>
|
||||
<file>flags/lk.png</file>
|
||||
<file>flags/lr.png</file>
|
||||
<file>flags/ls.png</file>
|
||||
<file>flags/lt.png</file>
|
||||
<file>flags/lu.png</file>
|
||||
<file>flags/lv.png</file>
|
||||
<file>flags/ly.png</file>
|
||||
<file>flags/ma.png</file>
|
||||
<file>flags/mc.png</file>
|
||||
<file>flags/md.png</file>
|
||||
<file>flags/me.png</file>
|
||||
<file>flags/mg.png</file>
|
||||
<file>flags/mh.png</file>
|
||||
<file>flags/mk.png</file>
|
||||
<file>flags/ml.png</file>
|
||||
<file>flags/mm.png</file>
|
||||
<file>flags/mn.png</file>
|
||||
<file>flags/mo.png</file>
|
||||
<file>flags/mp.png</file>
|
||||
<file>flags/mq.png</file>
|
||||
<file>flags/mr.png</file>
|
||||
<file>flags/ms.png</file>
|
||||
<file>flags/mt.png</file>
|
||||
<file>flags/mu.png</file>
|
||||
<file>flags/mv.png</file>
|
||||
<file>flags/mw.png</file>
|
||||
<file>flags/mx.png</file>
|
||||
<file>flags/my.png</file>
|
||||
<file>flags/mz.png</file>
|
||||
<file>flags/na.png</file>
|
||||
<file>flags/nc.png</file>
|
||||
<file>flags/ne.png</file>
|
||||
<file>flags/nf.png</file>
|
||||
<file>flags/ng.png</file>
|
||||
<file>flags/ni.png</file>
|
||||
<file>flags/nl.png</file>
|
||||
<file>flags/no.png</file>
|
||||
<file>flags/np.png</file>
|
||||
<file>flags/nr.png</file>
|
||||
<file>flags/nu.png</file>
|
||||
<file>flags/nz.png</file>
|
||||
<file>flags/om.png</file>
|
||||
<file>flags/pa.png</file>
|
||||
<file>flags/pe.png</file>
|
||||
<file>flags/pf.png</file>
|
||||
<file>flags/pg.png</file>
|
||||
<file>flags/ph.png</file>
|
||||
<file>flags/pk.png</file>
|
||||
<file>flags/pl.png</file>
|
||||
<file>flags/pm.png</file>
|
||||
<file>flags/pn.png</file>
|
||||
<file>flags/pr.png</file>
|
||||
<file>flags/ps.png</file>
|
||||
<file>flags/pt.png</file>
|
||||
<file>flags/pw.png</file>
|
||||
<file>flags/py.png</file>
|
||||
<file>flags/qa.png</file>
|
||||
<file>flags/re.png</file>
|
||||
<file>flags/ro.png</file>
|
||||
<file>flags/rs.png</file>
|
||||
<file>flags/ru.png</file>
|
||||
<file>flags/rw.png</file>
|
||||
<file>flags/sa.png</file>
|
||||
<file>flags/sb.png</file>
|
||||
<file>flags/scotland.png</file>
|
||||
<file>flags/sc.png</file>
|
||||
<file>flags/sd.png</file>
|
||||
<file>flags/se.png</file>
|
||||
<file>flags/sg.png</file>
|
||||
<file>flags/sh.png</file>
|
||||
<file>flags/si.png</file>
|
||||
<file>flags/sj.png</file>
|
||||
<file>flags/sk.png</file>
|
||||
<file>flags/sl.png</file>
|
||||
<file>flags/sm.png</file>
|
||||
<file>flags/sn.png</file>
|
||||
<file>flags/so.png</file>
|
||||
<file>flags/sr.png</file>
|
||||
<file>flags/st.png</file>
|
||||
<file>flags/sv.png</file>
|
||||
<file>flags/sy.png</file>
|
||||
<file>flags/sz.png</file>
|
||||
<file>flags/tc.png</file>
|
||||
<file>flags/td.png</file>
|
||||
<file>flags/tf.png</file>
|
||||
<file>flags/tg.png</file>
|
||||
<file>flags/th.png</file>
|
||||
<file>flags/tj.png</file>
|
||||
<file>flags/tk.png</file>
|
||||
<file>flags/tl.png</file>
|
||||
<file>flags/tm.png</file>
|
||||
<file>flags/tn.png</file>
|
||||
<file>flags/to.png</file>
|
||||
<file>flags/tr.png</file>
|
||||
<file>flags/tt.png</file>
|
||||
<file>flags/tv.png</file>
|
||||
<file>flags/tw.png</file>
|
||||
<file>flags/tz.png</file>
|
||||
<file>flags/ua.png</file>
|
||||
<file>flags/ug.png</file>
|
||||
<file>flags/um.png</file>
|
||||
<file>flags/us.png</file>
|
||||
<file>flags/uy.png</file>
|
||||
<file>flags/uz.png</file>
|
||||
<file>flags/va.png</file>
|
||||
<file>flags/vc.png</file>
|
||||
<file>flags/ve.png</file>
|
||||
<file>flags/vg.png</file>
|
||||
<file>flags/vi.png</file>
|
||||
<file>flags/vn.png</file>
|
||||
<file>flags/vu.png</file>
|
||||
<file>flags/wales.png</file>
|
||||
<file>flags/wf.png</file>
|
||||
<file>flags/ws.png</file>
|
||||
<file>flags/ye.png</file>
|
||||
<file>flags/yt.png</file>
|
||||
<file>flags/za.png</file>
|
||||
<file>flags/zm.png</file>
|
||||
<file>flags/zw.png</file>
|
||||
</qresource>
|
||||
</RCC>
|
6
src/flags/00readme.txt
Normal file
|
@ -0,0 +1,6 @@
|
|||
These icons were taken from http://www.famfamfam.com/lab/icons/flags/
|
||||
|
||||
Licensed under the following conditions:
|
||||
|
||||
"These flag icons are available for free use for any purpose with no
|
||||
requirement for attribution."
|
BIN
src/flags/ad.png
Normal file
After Width: | Height: | Size: 643 B |
BIN
src/flags/ae.png
Normal file
After Width: | Height: | Size: 408 B |
BIN
src/flags/af.png
Normal file
After Width: | Height: | Size: 604 B |
BIN
src/flags/ag.png
Normal file
After Width: | Height: | Size: 591 B |
BIN
src/flags/ai.png
Normal file
After Width: | Height: | Size: 643 B |
BIN
src/flags/al.png
Normal file
After Width: | Height: | Size: 600 B |
BIN
src/flags/am.png
Normal file
After Width: | Height: | Size: 497 B |
BIN
src/flags/an.png
Normal file
After Width: | Height: | Size: 488 B |
BIN
src/flags/ao.png
Normal file
After Width: | Height: | Size: 428 B |
BIN
src/flags/ar.png
Normal file
After Width: | Height: | Size: 506 B |
BIN
src/flags/as.png
Normal file
After Width: | Height: | Size: 647 B |
BIN
src/flags/at.png
Normal file
After Width: | Height: | Size: 403 B |
BIN
src/flags/au.png
Normal file
After Width: | Height: | Size: 673 B |
BIN
src/flags/aw.png
Normal file
After Width: | Height: | Size: 524 B |
BIN
src/flags/ax.png
Normal file
After Width: | Height: | Size: 663 B |
BIN
src/flags/az.png
Normal file
After Width: | Height: | Size: 589 B |
BIN
src/flags/ba.png
Normal file
After Width: | Height: | Size: 593 B |
BIN
src/flags/bb.png
Normal file
After Width: | Height: | Size: 585 B |
BIN
src/flags/bd.png
Normal file
After Width: | Height: | Size: 504 B |
BIN
src/flags/be.png
Normal file
After Width: | Height: | Size: 449 B |
BIN
src/flags/bf.png
Normal file
After Width: | Height: | Size: 497 B |
BIN
src/flags/bg.png
Normal file
After Width: | Height: | Size: 462 B |
BIN
src/flags/bh.png
Normal file
After Width: | Height: | Size: 457 B |
BIN
src/flags/bi.png
Normal file
After Width: | Height: | Size: 675 B |
BIN
src/flags/bj.png
Normal file
After Width: | Height: | Size: 486 B |
BIN
src/flags/bm.png
Normal file
After Width: | Height: | Size: 611 B |
BIN
src/flags/bn.png
Normal file
After Width: | Height: | Size: 639 B |
BIN
src/flags/bo.png
Normal file
After Width: | Height: | Size: 500 B |
BIN
src/flags/br.png
Normal file
After Width: | Height: | Size: 593 B |
BIN
src/flags/bs.png
Normal file
After Width: | Height: | Size: 526 B |
BIN
src/flags/bt.png
Normal file
After Width: | Height: | Size: 631 B |
BIN
src/flags/bv.png
Normal file
After Width: | Height: | Size: 512 B |
BIN
src/flags/bw.png
Normal file
After Width: | Height: | Size: 443 B |
BIN
src/flags/by.png
Normal file
After Width: | Height: | Size: 514 B |
BIN
src/flags/bz.png
Normal file
After Width: | Height: | Size: 600 B |
BIN
src/flags/ca.png
Normal file
After Width: | Height: | Size: 628 B |
BIN
src/flags/catalonia.png
Normal file
After Width: | Height: | Size: 398 B |
BIN
src/flags/cc.png
Normal file
After Width: | Height: | Size: 625 B |
BIN
src/flags/cd.png
Normal file
After Width: | Height: | Size: 528 B |
BIN
src/flags/cf.png
Normal file
After Width: | Height: | Size: 614 B |
BIN
src/flags/cg.png
Normal file
After Width: | Height: | Size: 521 B |
BIN
src/flags/ch.png
Normal file
After Width: | Height: | Size: 367 B |
BIN
src/flags/ci.png
Normal file
After Width: | Height: | Size: 453 B |
BIN
src/flags/ck.png
Normal file
After Width: | Height: | Size: 586 B |
BIN
src/flags/cl.png
Normal file
After Width: | Height: | Size: 450 B |
BIN
src/flags/cm.png
Normal file
After Width: | Height: | Size: 525 B |
BIN
src/flags/cn.png
Normal file
After Width: | Height: | Size: 472 B |
BIN
src/flags/co.png
Normal file
After Width: | Height: | Size: 483 B |
BIN
src/flags/cr.png
Normal file
After Width: | Height: | Size: 477 B |
BIN
src/flags/cs.png
Normal file
After Width: | Height: | Size: 439 B |
BIN
src/flags/cu.png
Normal file
After Width: | Height: | Size: 563 B |
BIN
src/flags/cv.png
Normal file
After Width: | Height: | Size: 529 B |
BIN
src/flags/cx.png
Normal file
After Width: | Height: | Size: 608 B |
BIN
src/flags/cy.png
Normal file
After Width: | Height: | Size: 428 B |
BIN
src/flags/cz.png
Normal file
After Width: | Height: | Size: 476 B |
BIN
src/flags/de.png
Normal file
After Width: | Height: | Size: 545 B |
BIN
src/flags/dj.png
Normal file
After Width: | Height: | Size: 572 B |
BIN
src/flags/dk.png
Normal file
After Width: | Height: | Size: 495 B |
BIN
src/flags/dm.png
Normal file
After Width: | Height: | Size: 620 B |
BIN
src/flags/do.png
Normal file
After Width: | Height: | Size: 508 B |
BIN
src/flags/dz.png
Normal file
After Width: | Height: | Size: 582 B |
BIN
src/flags/ec.png
Normal file
After Width: | Height: | Size: 500 B |
BIN
src/flags/ee.png
Normal file
After Width: | Height: | Size: 429 B |
BIN
src/flags/eg.png
Normal file
After Width: | Height: | Size: 465 B |