The replacement command:
git grep -l 'qrcx://localhost' | xargs sed -i 's/qrcx:\/\/localhost/qrc:\/\//g'
The qrcx:// URL scheme was introduced in 2009 or earlier: it is present
in the first commit in GoldenDict's git history. Back then GoldenDict
supported Qt versions earlier than 4.6, the version that introduced
QWebSecurityOrigin::addLocalScheme(). Adding the qrc URL scheme as local
obsoletes the qrcx URL scheme. GoldenDict no longer compiles against Qt
versions earlier than 4.6, so there is no reason to use this custom URL
scheme anymore.
Co-authored-by: Igor Kushnir <igorkuo@gmail.com>
Duplicate articles can be shown when the alts collection is not empty
and a MediaWiki site redirects multiple words to a single page. The
alts collection can be populated when:
* the Preferences=>Advanced=>"Extra search via synonyms" option is enabled;
* a Morphology dictionary is active;
* a translation of a phrase is requested in a way that makes GoldenDict
pass the input phrase to Preferences::sanitizeInputPhrase().
Steps to reproduce 1:
1. Create and switch to a dictionary group with (1) "English Wikipedia"
and (2) "English (US) Morphology" dictionaries in it.
2. Request a translation of the word "plays" (without quotes).
Steps to reproduce 2:
1. Create a dictionary group with "English Wiktionary" dictionary in it;
switch to this group in the scan popup window (or in the main window
if the Preferences=>Scan Popup=>"Send translated word to main window"
option is enabled).
2. Select the word "i.e." (without quotes) and press Ctrl+C+C (or
whatever hotkey is configured to translate a word from clipboard).
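One way to avoid such duplicates, sketched under assumptions: remember the
title of each page that has already been shown and skip any reply whose
resolved title was seen before. The names Response and uniqueByTitle are
hypothetical, and plain C++ standard library types stand in for GoldenDict's
Qt-based code; this is not necessarily how the fix is implemented.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one MediaWiki reply: the word that was requested
// and the title of the page the site resolved it to after any redirect.
struct Response
{
  std::string requestedWord;
  std::string resolvedTitle;
};

// Keeps only the first reply for each resolved page title, preserving order.
std::vector< Response > uniqueByTitle( std::vector< Response > const & responses )
{
  std::set< std::string > seenTitles;
  std::vector< Response > result;
  for( Response const & r : responses )
    if( seenTitles.insert( r.resolvedTitle ).second ) // true only on first insertion
      result.push_back( r );
  return result;
}
```

Keying on the resolved title rather than on the requested word is the point
of the sketch: several entries of the alts collection can lead to one page.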
For example, the first audio link in "The United States" English
Wikipedia article - "The Star-Spangled Banner" - ends with ".oga".
Without this commit the audio link is not recognized by GoldenDict:
* it is not pronounced when a Preferences=>Audio=>"Auto-pronounce..."
option is enabled;
* clicking on the link opens it in the default browser instead of
playing inside GoldenDict.
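A minimal sketch of an extension-based audio link check follows. The
function name looksLikeAudioUrl and the extension list are assumptions made
for illustration; GoldenDict's real check and its list of extensions may
differ.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns true if the URL ends with one of the listed audio file extensions.
// The list is an illustrative assumption, not GoldenDict's actual list.
bool looksLikeAudioUrl( std::string url )
{
  // Compare case-insensitively by lowercasing a copy of the URL.
  std::transform( url.begin(), url.end(), url.begin(),
                  []( unsigned char c ) { return static_cast< char >( std::tolower( c ) ); } );

  static char const * const extensions[] = { ".ogg", ".oga", ".mp3", ".wav" };
  for( std::string ext : extensions )
    if( url.size() >= ext.size()
        && url.compare( url.size() - ext.size(), ext.size(), ext ) == 0 )
      return true;
  return false;
}
```

With a check of this shape, supporting one more format amounts to adding its
extension to the list.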
I have searched for the "<button" string and even for the "<\s*button"
pattern in tens of articles from all 5 default Wikipedia and all 5
default Wiktionary sites. Found none. I assume this pattern is obsolete.
Removing this useless code improves performance by doing less searching.
I have run the following command on directories that contained many
Wikipedia and Wiktionary articles received by GoldenDict:
pcregrep -MrI --buffer-size 20M '<\s*button' DIR-WITH-ARTICLES
This string replacement is 3-5 times faster than the QRegularExpression
replacement in "The United States" and "Paris" English Wikipedia
articles on my GNU/Linux system.
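For illustration, a literal replacement of this kind can be sketched with
std::string alone. The helper name replaceAll is hypothetical and the code
does not use Qt, so it only approximates what the commit does with QString.

```cpp
#include <string>

// Replaces every occurrence of `from` in `text` with `to`, scanning left to
// right with std::string::find instead of a regular expression engine.
std::string replaceAll( std::string text, std::string const & from, std::string const & to )
{
  if( from.empty() )
    return text; // nothing to search for; also avoids an endless loop

  std::string::size_type pos = 0;
  while( ( pos = text.find( from, pos ) ) != std::string::npos ) {
    text.replace( pos, from.size(), to );
    pos += to.size(); // continue after the inserted text
  }
  return text;
}
```

A fixed-string search does not have to compile or run a pattern, which is a
plausible reason for the speedup measured above.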
Before fe39fc8a05 the pattern started with
"<a\\shref=" instead of the current "<a\\s+href=", and no related bug
has been reported. I haven't encountered any whitespace character other
than a space in this position. I believe that neither a single tab nor a
single EOL character makes sense after "<a". So a regression is unlikely.
I have searched for a tab or a newline character after "<a" and for an
extra whitespace character after "<a " in tens of articles from all 5
default Wikipedia and all 5 default Wiktionary sites. Found none.
I have run the following command on directories that contained many
Wikipedia and Wiktionary articles received by GoldenDict:
pcregrep -MrI --buffer-size 20M "$PATTERN" DIR-WITH-ARTICLES
with PATTERN='<a(\t|\n)' and PATTERN='<a \s+href'.
I haven't encountered any prefix other than "/wiki/" that should be
discarded. If there are such other prefixes, I think they would conform
to some pattern, and so the replacement code could be adjusted to
accommodate them.
This commit fixes #813.
Examples of pages with subpage links in English Wikipedia that are fixed
by this commit: "Asio (disambiguation)", "Asio C plus plus library".
This issue is much more prevalent in Wookieepedia because it has
a two-tab link system with the patterns */Legends and */Canon.
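The idea of discarding only the "/wiki/" prefix, so that the slash inside a
subpage name survives, can be sketched as follows. The function name
stripWikiPrefix and the sample target "Asio/Legends" are hypothetical, and
the sketch uses std::string rather than GoldenDict's actual Qt code.

```cpp
#include <string>

// Removes a leading "/wiki/" from a link target and leaves the rest intact,
// so a hypothetical subpage name such as "Asio/Legends" keeps its slash.
// Targets without that prefix are returned unchanged.
std::string stripWikiPrefix( std::string const & href )
{
  static std::string const prefix = "/wiki/";
  if( href.compare( 0, prefix.size(), prefix ) == 0 )
    return href.substr( prefix.size() );
  return href;
}
```

Cutting everything up to the last slash instead would turn "Asio/Legends"
into "Legends", which would break subpage links.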