Mirror of https://github.com/xiaoyifang/goldendict-ng.git, synced 2024-11-24 00:14:06 +00:00
MediaWiki: remove the /wiki/ prefix from links w/o regexp
This plain string replacement is 3-5 times faster than the QRegularExpression
replacement when processing the "The United States" and "Paris" articles
from the English Wikipedia on my GNU/Linux system.
Before commit fe39fc8a05, the pattern started with "<a\\shref=" instead of
the current "<a\\s+href=", and no related bug was reported. I haven't
encountered any whitespace character other than a space in this position.
I believe that a single tab or a single EOL character does not make sense
after "<a", so a regression is unlikely.
I searched tens of articles from all 5 default Wikipedia sites and all 5
default Wiktionary sites for a tab or a newline character after "<a" and
for a whitespace character after "<a ". I found none.
I ran the following command on directories that contained many Wikipedia
and Wiktionary articles received by GoldenDict:

    pcregrep -MrI --buffer-size 20M "$PATTERN" DIR-WITH-ARTICLES

once with PATTERN='<a(\t|\n)' and once with PATTERN='<a \s+href'.
Parent: b7da546dd5
Commit: dec59439b9
@@ -493,11 +493,7 @@ void MediaWikiArticleRequest::requestFinished( QNetworkReply * r )
 articleString.replace( "src=\"/", "src=\"" + wikiUrl.toString() );
 
 // Remove the /wiki/ prefix from links
-#if QT_VERSION >= QT_VERSION_CHECK( 5, 0, 0 )
-articleString.replace( QRegularExpression( "<a\\s+href=\"/wiki/" ), "<a href=\"" );
-#else
-articleString.replace( QRegExp( "<a\\s+href=\"/wiki/" ), "<a href=\"" );
-#endif
+articleString.replace( "<a href=\"/wiki/", "<a href=\"" );
 
 //fix audio
 #if QT_VERSION >= QT_VERSION_CHECK( 5, 0, 0 )