For example, the first audio link in "The United States" English
Wikipedia article - "The Star-Spangled Banner" - ends with ".oga".
Without this commit the audio link is not recognized by GoldenDict:
* it is not pronounced when the Preferences=>Audio=>"Auto-pronounce..."
option is enabled;
* clicking on the link opens it in the default browser instead of
playing inside GoldenDict.
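
A minimal sketch of the kind of suffix check involved, in standard C++ rather than the Qt code GoldenDict actually uses. The function names and the suffix list are assumptions for illustration; only ".oga" comes from this commit message.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: true if `text` ends with `suffix`.
bool endsWith(const std::string& text, const std::string& suffix)
{
    return text.size() >= suffix.size() &&
           text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: true if the link target ends with a known audio suffix.
// The list is an assumption, except for ".oga", the suffix discussed above.
bool looksLikeAudioLink(const std::string& url)
{
    static const char* const suffixes[] = {".ogg", ".oga", ".mp3", ".wav"};

    // Compare case-insensitively by lowercasing a copy of the URL.
    std::string lower(url);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    for (const char* suffix : suffixes) {
        if (endsWith(lower, suffix))
            return true;
    }
    return false;
}
```

With ".oga" in the list, a target such as "Star-Spangled_Banner.oga" is treated as audio. This sketch does not handle URLs that carry a query string after the file name.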
I have searched for the "<button" string and even for the "<\s*button"
pattern in tens of articles from all 5 default Wikipedia and all 5
default Wiktionary sites. Found none. I assume this pattern is obsolete.
Removing this useless code improves performance by doing less searching.
I have run the following command on directories that contained many
Wikipedia and Wiktionary articles received by GoldenDict:
    pcregrep -MrI --buffer-size 20M '<\s*button' DIR-WITH-ARTICLES
This string replacement is 3-5 times faster than the QRegularExpression
replacement in "The United States" and "Paris" English Wikipedia
articles on my GNU/Linux system.
Before fe39fc8a05 the pattern started with
"<a\\shref=" instead of the current "<a\\s+href=", and no related bug
has been reported. I haven't encountered any whitespace character other
than space in this position. I believe that a single tab or a single EOL
character does not make sense after "<a". So a regression is unlikely.
I have searched for a tab or a newline character after "<a" and for a
whitespace character after "<a " in tens of articles from all 5 default
Wikipedia and all 5 default Wiktionary sites. Found none.
I have run the following command on directories that contained many
Wikipedia and Wiktionary articles received by GoldenDict:
    pcregrep -MrI --buffer-size 20M "$PATTERN" DIR-WITH-ARTICLES
with PATTERN='<a(\t|\n)' and PATTERN='<a \s+href'.
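
The trade-off can be sketched in standard C++, with std::regex standing in for QRegularExpression. The function names, the "/wiki/" part of the search text and the "bword:" replacement text are assumptions for illustration, not the actual GoldenDict code. The two variants agree whenever "<a" is followed by exactly one space, and differ only for inputs such as "<a\thref=", which were not found in any of the articles searched.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Replace every literal occurrence of `from` in `text` with `to`.
std::string replaceAll(std::string text, const std::string& from, const std::string& to)
{
    if (from.empty())
        return text;
    std::size_t pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.size(), to);
        pos += to.size(); // continue after the inserted text
    }
    return text;
}

// Regex variant: accepts any run of whitespace after "<a".
std::string replaceWithRegex(const std::string& text)
{
    static const std::regex pattern("<a\\s+href=\"/wiki/");
    return std::regex_replace(text, pattern, "<a href=\"bword:");
}

// Literal variant: matches only a single space after "<a".
std::string replaceWithLiteral(const std::string& text)
{
    return replaceAll(text, "<a href=\"/wiki/", "<a href=\"bword:");
}
```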
I haven't encountered any prefix other than "/wiki/" that should be
discarded. If there are such other prefixes, I think they would conform
to some pattern, and so the replacement code could be adjusted to
accommodate them.
This commit fixes #813.
Examples of pages with subpage links in English Wikipedia that are fixed
by this commit: "Asio (disambiguation)", "Asio C plus plus library".
This issue is much more prevalent in Wookieepedia because it has
a two-tab link system with the patterns */Legends and */Canon.
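
A minimal sketch of the prefix handling described above, in standard C++. It assumes that the article title is everything after "/wiki/", slashes included, so that subpage targets survive intact. The function name and the sample targets are hypothetical.

```cpp
#include <string>

// Hypothetical helper: drop a leading "/wiki/" and keep the remainder,
// including any '/' that separates a page from its subpage.
std::string articleTitleFromHref(const std::string& href)
{
    static const std::string prefix = "/wiki/";
    if (href.compare(0, prefix.size(), prefix) == 0)
        return href.substr(prefix.size());
    return href; // other prefixes are left untouched in this sketch
}
```

For a hypothetical target "/wiki/Example/Legends" this yields "Example/Legends". If other prefixes turn up, the single prefix constant could become a short list.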