Compare commits
1 commit
Author | SHA1 | Date
---|---|---
 | 9e0ff422d8 |

README.md (158 changes)
@@ -4,13 +4,12 @@ compiling the scraped data into compact dictionary file formats.

### Supported Dictionaries

* Web Dictionaries
* [国語辞典オンライン](https://kokugo.jitenon.jp/) (`jitenon-kokugo`)
* [四字熟語辞典オンライン](https://yoji.jitenon.jp/) (`jitenon-yoji`)
* [故事・ことわざ・慣用句オンライン](https://kotowaza.jitenon.jp/) (`jitenon-kotowaza`)
* Monokakido
* [新明解国語辞典 第八版](https://www.monokakido.jp/ja/dictionaries/smk8/index.html) (`smk8`)
* [大辞林 第四版](https://www.monokakido.jp/ja/dictionaries/daijirin2/index.html) (`daijirin2`)
* [三省堂国語辞典 第八版](https://www.monokakido.jp/ja/dictionaries/sankoku8/index.html) (`sankoku8`)
* [国語辞典オンライン](https://kokugo.jitenon.jp/) (Jitenon Kokugo)
* [四字熟語辞典オンライン](https://yoji.jitenon.jp/) (Jitenon Yoji)
* [故事・ことわざ・慣用句オンライン](https://kotowaza.jitenon.jp/) (Jitenon Kotowaza)
* Monokakido (["辞書 by 物書堂"](https://www.monokakido.jp/ja/dictionaries/app/))
* [新明解国語辞典 第八版](https://www.monokakido.jp/ja/dictionaries/smk8/index.html) (Shinmeikai 8e)
* [大辞林 第四版](https://www.monokakido.jp/ja/dictionaries/daijirin2/index.html) (Daijirin 4e)

### Supported Output Formats
@@ -49,12 +48,6 @@ compiling the scraped data into compact dictionary file formats.

![daijirin2](https://user-images.githubusercontent.com/8003332/235578700-9dbf4fb0-0154-48b5-817c-8fe75e442afc.png)
</details>

<details>
<summary>Sanseidō 8e (print | yomichan)</summary>

![sankoku8](https://github.com/stephenmk/jitenbot/assets/8003332/0358b3fc-71fb-4557-977c-1976a12229ec)
</details>

<details>
<summary>Various (GoldenDict)</summary>
@@ -64,14 +57,13 @@ compiling the scraped data into compact dictionary file formats.

# Usage
```
usage: jitenbot [-h] [-p PAGE_DIR] [-m MEDIA_DIR] [-i MDICT_ICON]
                [--no-mdict-export] [--no-yomichan-export]
                [--validate-yomichan-terms]
                {jitenon-kokugo,jitenon-yoji,jitenon-kotowaza,smk8,daijirin2,sankoku8}
                [--no-yomichan-export] [--no-mdict-export]
                {jitenon-kokugo,jitenon-yoji,jitenon-kotowaza,smk8,daijirin2}

Convert Japanese dictionary files to new formats.

positional arguments:
  {jitenon-kokugo,jitenon-yoji,jitenon-kotowaza,smk8,daijirin2,sankoku8}
  {jitenon-kokugo,jitenon-yoji,jitenon-kotowaza,smk8,daijirin2}
                        name of dictionary to convert

options:
@@ -83,14 +75,10 @@ options:
                        graphics, audio, etc.)
  -i MDICT_ICON, --mdict-icon MDICT_ICON
                        path to icon file to be used with MDict
  --no-mdict-export     skip export of dictionary data to MDict format
  --no-yomichan-export  skip export of dictionary data to Yomichan format
  --validate-yomichan-terms
                        validate JSON structure of exported Yomichan
                        dictionary terms
  --no-mdict-export     skip export of dictionary data to MDict format

See README.md for details regarding media directory structures
```
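A hypothetical example invocation, using the `-p` and `-m` flags from the usage text above (all paths are placeholders for directories you have prepared yourself):

```
jitenbot daijirin2 -p /path/to/daijirin2/pages -m /path/to/daijirin2/media
```

The same pattern applies to the other targets listed under the positional arguments.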
### Web Targets
Jitenbot will scrape the target website and save the pages to the [user cache directory](https://pypi.org/project/platformdirs/).

@@ -101,112 +89,58 @@ HTTP request headers (user agent string, etc.) may be customized by editing the
[user config directory](https://pypi.org/project/platformdirs/).
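A minimal sketch of where those platformdirs locations resolve on a given system; the `"jitenbot"` application name below is an assumption for illustration, not something confirmed by this diff:

```python
# Sketch only: print the cache/config locations platformdirs would use.
# The "jitenbot" app name is an assumed placeholder.
from platformdirs import user_cache_dir, user_config_dir

print(user_cache_dir("jitenbot"))   # where scraped pages would be cached
print(user_config_dir("jitenbot"))  # where the HTTP request header config would live
```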
### Monokakido Targets
These digital dictionaries are available for purchase through the [Monokakido Dictionaries app](https://www.monokakido.jp/ja/dictionaries/app/) on macOS/iOS. Under ideal circumstances, Jitenbot would be able to automatically fetch all the data it needs from this app's data directory[^1] on your system. In its current state of development, Jitenbot unfortunately requires you to find and assemble the necessary data yourself. The files must be organized into a particular folder structure (defined below) and then passed to Jitenbot via the corresponding command line arguments.
Page data and media data must be [procured by the user](https://github.com/golddranks/monokakido/) and passed to jitenbot via the appropriate command line flags. Additionally, the gaiji folder and the audio icon have to be manually copied from the original dictionary folder into the media folder.

Some of the folders in the app's data directory[^1] contain encoded files that must be decoded using [golddranks' monokakido tool](https://github.com/golddranks/monokakido/). These folders are indicated by a reference mark (※) in the notes below.

[^1]: `/Library/Application Support/AppStoreContent/jp.monokakido.Dictionaries/Products/`
Path:
```/YOUR_SAVE_PATH/jp.monokakido.Dictionaries.DICTIONARY_NAME/Contents/DICTIONARY_NAME/```.

<details>
<summary>smk8 files</summary>
<summary>smk8 media directory</summary>

Since Yomichan does not support audio files from imported dictionaries, the `audio/` directory may be omitted to save filesize space in the output ZIP file if desired.

```
.
├── media
│   ├── audio (※)
│   │   ├── 00001.aac
│   │   ├── 00002.aac
│   │   ├── 00003.aac
│   │   ├── ...
│   │   └── 82682.aac
│   ├── Audio.png
│   └── gaiji
│       ├── 1d110.svg
│       ├── 1d15d.svg
│       ├── 1d15e.svg
│       ├── ...
│       └── xbunnoa.svg
└── pages (※)
    ├── 0000000000.xml
    ├── 0000000001.xml
    ├── 0000000002.xml
    ├── ...
    └── 0000064581.xml

media
├── Audio.png
├── audio
│   ├── 00001.aac
│   ├── 00002.aac
│   ├── 00003.aac
│   │   ...
│   └── 82682.aac
└── gaiji
    ├── 1d110.svg
    ├── 1d15d.svg
    ├── 1d15e.svg
    │   ...
    └── xbunnoa.svg
```
</details>

<details>
<summary>daijirin2 files</summary>
<summary>daijirin2 media directory</summary>

The `graphics/` directory may be omitted to save space if desired.

```
.
├── media
│   ├── gaiji
│   │   ├── 1D10B.svg
│   │   ├── 1D110.svg
│   │   ├── 1D12A.svg
│   │   ├── ...
│   │   └── vectorOB.svg
│   └── graphics (※)
│       ├── 3djr_0002.png
│       ├── 3djr_0004.png
│       ├── 3djr_0005.png
│       ├── ...
│       └── 4djr_yahazu.png
└── pages (※)
    ├── 0000000001.xml
    ├── 0000000002.xml
    ├── 0000000003.xml
    ├── ...
    └── 0000182633.xml
```
</details>

<details>
<summary>sankoku8 files</summary>

```
.
├── media
│   ├── graphics
│   │   ├── 000chouchou.png
│   │   ├── ...
│   │   └── 888udatsu.png
│   ├── svg-accent
│   │   ├── アクセント.svg
│   │   └── 平板.svg
│   ├── svg-frac
│   │   ├── frac-1-2.svg
│   │   ├── ...
│   │   └── frac-a-b.svg
│   ├── svg-gaiji
│   │   ├── aiaigasa.svg
│   │   ├── ...
│   │   └── 異体字_西.svg
│   ├── svg-intonation
│   │   ├── 上昇下降.svg
│   │   ├── ...
│   │   └── 長.svg
│   ├── svg-logo
│   │   ├── denshi.svg
│   │   ├── ...
│   │   └── 重要語.svg
│   └── svg-special
│       └── 区切り線.svg
└── pages (※)
    ├── 0000000001.xml
    ├── ...
    └── 0000065457.xml

media
├── gaiji
│   ├── 1D10B.svg
│   ├── 1D110.svg
│   ├── 1D12A.svg
│   │   ...
│   └── vectorOB.svg
└── graphics
    ├── 3djr_0002.png
    ├── 3djr_0004.png
    ├── 3djr_0005.png
    │   ...
    └── 4djr_yahazu.png
```
</details>

# Attribution
`Adobe-Japan1_sequences.txt` is provided by [The Adobe-Japan1-7 Character Collection](https://github.com/adobe-type-tools/Adobe-Japan1).

The Yomichan term-bank schema definition `dictionary-term-bank-v3-schema.json` is provided by the [Yomichan](https://github.com/foosoft/yomichan) project.

Many thanks to [epistularum](https://github.com/epistularum) for providing thoughtful feedback regarding the implementation of the MDict export functionality.

TODO.md (7 changes)
@@ -1,13 +1,10 @@

### Todo

- [x] Add factory classes to reduce the amount of class import statements
- [x] Add dynamic import functionality to factory classes to reduce boilerplate
- [x] Support exporting to MDict (.MDX) dictionary format
- [x] Validate JSON schema of Yomichan terms during export
- [ ] Add support for monokakido search keys from index files
- [ ] Delete unneeded media from temp build directory before final export
- [ ] Add test suite
- [ ] Add documentation (docstrings, etc.)
- [ ] Validate JSON schema of Yomichan terms during export
- [ ] Add build scripts for producing program binaries
- [ ] Validate scraped webpages after downloading
- [ ] Log non-fatal failures to a log file instead of raising exceptions

@@ -16,7 +13,7 @@

- [ ] [Yoji-Jukugo.com](https://yoji-jukugo.com/)
- [ ] [実用日本語表現辞典](https://www.weblio.jp/cat/dictionary/jtnhj)
- [ ] Support more Monokakido dictionaries
  - [x] 三省堂国語辞典 第8版 (SANKOKU8)
  - [ ] 三省堂国語辞典 第8版 (SANKOKU8)
  - [ ] 精選版 日本国語大辞典 (NDS)
  - [ ] 大辞泉 第2版 (DAIJISEN2)
  - [ ] 明鏡国語辞典 第3版 (MK3)
|
|
@ -1,54 +0,0 @@
|
|||
import re
|
||||
from abc import ABC, abstractmethod
|
||||
|
||||
from bot.factory import new_entry
|
||||
from bot.factory import new_yomichan_exporter
|
||||
from bot.factory import new_mdict_exporter
|
||||
|
||||
|
||||
class BaseCrawler(ABC):
|
||||
def __init__(self, target):
|
||||
self._target = target
|
||||
self._page_map = {}
|
||||
self._entries = []
|
||||
self._page_id_pattern = None
|
||||
|
||||
@abstractmethod
|
||||
def collect_pages(self, page_dir):
|
||||
raise NotImplementedError
|
||||
|
||||
def read_pages(self):
|
||||
pages_len = len(self._page_map)
|
||||
items = self._page_map.items()
|
||||
for idx, (page_id, page_path) in enumerate(items):
|
||||
update = f"\tReading page {idx+1}/{pages_len}"
|
||||
print(update, end='\r', flush=True)
|
||||
entry = new_entry(self._target, page_id)
|
||||
with open(page_path, "r", encoding="utf-8") as f:
|
||||
page = f.read()
|
||||
try:
|
||||
entry.set_page(page)
|
||||
except ValueError as err:
|
||||
print(err)
|
||||
print("Try deleting and redownloading file:")
|
||||
print(f"\t{page_path}\n")
|
||||
continue
|
||||
self._entries.append(entry)
|
||||
print()
|
||||
|
||||
def make_yomichan_dictionary(self, media_dir, validate):
|
||||
exporter = new_yomichan_exporter(self._target)
|
||||
exporter.export(self._entries, media_dir, validate)
|
||||
|
||||
def make_mdict_dictionary(self, media_dir, icon_file):
|
||||
exporter = new_mdict_exporter(self._target)
|
||||
exporter.export(self._entries, media_dir, icon_file)
|
||||
|
||||
def _parse_page_id(self, page_link):
|
||||
m = re.search(self._page_id_pattern, page_link)
|
||||
if m is None:
|
||||
return None
|
||||
page_id = int(m.group(1))
|
||||
if page_id in self._page_map:
|
||||
return None
|
||||
return page_id
|
|
@ -1,30 +0,0 @@
|
|||
from bs4 import BeautifulSoup
|
||||
|
||||
from bot.time import timestamp
|
||||
from bot.crawlers.scrapers.jitenon import Jitenon as JitenonScraper
|
||||
from bot.crawlers.base.crawler import BaseCrawler
|
||||
|
||||
|
||||
class JitenonCrawler(BaseCrawler):
|
||||
def __init__(self, target):
|
||||
super().__init__(target)
|
||||
self._gojuon_url = None
|
||||
|
||||
def collect_pages(self, page_dir):
|
||||
print(f"{timestamp()} Scraping {self._gojuon_url}")
|
||||
jitenon = JitenonScraper()
|
||||
gojuon_doc, _ = jitenon.scrape(self._gojuon_url)
|
||||
gojuon_soup = BeautifulSoup(gojuon_doc, features="html.parser")
|
||||
for gojuon_a in gojuon_soup.select(".kana_area a", href=True):
|
||||
gojuon_href = gojuon_a['href']
|
||||
kana_doc, _ = jitenon.scrape(gojuon_href)
|
||||
kana_soup = BeautifulSoup(kana_doc, features="html.parser")
|
||||
for kana_a in kana_soup.select(".word_box a", href=True):
|
||||
page_link = kana_a['href']
|
||||
page_id = self._parse_page_id(page_link)
|
||||
if page_id is None:
|
||||
continue
|
||||
_, page_path = jitenon.scrape(page_link)
|
||||
self._page_map[page_id] = page_path
|
||||
pages_len = len(self._page_map)
|
||||
print(f"\n{timestamp()} Found {pages_len} entry pages")
|
|
@ -1,20 +0,0 @@
|
|||
import os
|
||||
from bot.time import timestamp
|
||||
from bot.crawlers.base.crawler import BaseCrawler
|
||||
|
||||
|
||||
class MonokakidoCrawler(BaseCrawler):
|
||||
def __init__(self, target):
|
||||
super().__init__(target)
|
||||
self._page_id_pattern = r"^([0-9]+)\.xml$"
|
||||
|
||||
def collect_pages(self, page_dir):
|
||||
print(f"{timestamp()} Searching for page files in `{page_dir}`")
|
||||
for pagefile in os.listdir(page_dir):
|
||||
page_id = self._parse_page_id(pagefile)
|
||||
if page_id is None or page_id == 0:
|
||||
continue
|
||||
path = os.path.join(page_dir, pagefile)
|
||||
self._page_map[page_id] = path
|
||||
pages_len = len(self._page_map)
|
||||
print(f"{timestamp()} Found {pages_len} page files for processing")
|
bot/crawlers/crawlers.py (new file, 154 lines)
|
@ -0,0 +1,154 @@
|
|||
import os
|
||||
import re
|
||||
from abc import ABC, abstractmethod
|
||||
from bs4 import BeautifulSoup
|
||||
|
||||
import bot.crawlers.scraper as Scraper
|
||||
from bot.entries.factory import new_entry
|
||||
from bot.yomichan.exporters.factory import new_yomi_exporter
|
||||
from bot.mdict.exporters.factory import new_mdict_exporter
|
||||
|
||||
|
||||
class Crawler(ABC):
|
||||
def __init__(self, target):
|
||||
self._target = target
|
||||
self._page_map = {}
|
||||
self._entries = []
|
||||
self._page_id_pattern = None
|
||||
|
||||
@abstractmethod
|
||||
def collect_pages(self, page_dir):
|
||||
pass
|
||||
|
||||
def read_pages(self):
|
||||
pages_len = len(self._page_map)
|
||||
items = self._page_map.items()
|
||||
for idx, (page_id, page_path) in enumerate(items):
|
||||
update = f"Reading page {idx+1}/{pages_len}"
|
||||
print(update, end='\r', flush=True)
|
||||
entry = new_entry(self._target, page_id)
|
||||
with open(page_path, "r", encoding="utf-8") as f:
|
||||
page = f.read()
|
||||
try:
|
||||
entry.set_page(page)
|
||||
except ValueError as err:
|
||||
print(err)
|
||||
print("Try deleting and redownloading file:")
|
||||
print(f"\t{page_path}\n")
|
||||
continue
|
||||
self._entries.append(entry)
|
||||
print()
|
||||
|
||||
def make_yomichan_dictionary(self, media_dir):
|
||||
exporter = new_yomi_exporter(self._target)
|
||||
exporter.export(self._entries, media_dir)
|
||||
|
||||
def make_mdict_dictionary(self, media_dir, icon_file):
|
||||
exporter = new_mdict_exporter(self._target)
|
||||
exporter.export(self._entries, media_dir, icon_file)
|
||||
|
||||
def _parse_page_id(self, page_link):
|
||||
m = re.search(self._page_id_pattern, page_link)
|
||||
if m is None:
|
||||
return None
|
||||
page_id = int(m.group(1))
|
||||
if page_id in self._page_map:
|
||||
return None
|
||||
return page_id
|
||||
|
||||
|
||||
class JitenonKokugoCrawler(Crawler):
|
||||
def __init__(self, target):
|
||||
super().__init__(target)
|
||||
self._gojuon_url = "https://kokugo.jitenon.jp/cat/gojuonindex.php"
|
||||
self._page_id_pattern = r"word/p([0-9]+)$"
|
||||
|
||||
def collect_pages(self, page_dir):
|
||||
jitenon = Scraper.Jitenon()
|
||||
gojuon_doc, _ = jitenon.scrape(self._gojuon_url)
|
||||
gojuon_soup = BeautifulSoup(gojuon_doc, features="html.parser")
|
||||
for gojuon_a in gojuon_soup.select(".kana_area a", href=True):
|
||||
gojuon_href = gojuon_a['href']
|
||||
max_kana_page = 1
|
||||
current_kana_page = 1
|
||||
while current_kana_page <= max_kana_page:
|
||||
kana_doc, _ = jitenon.scrape(f"{gojuon_href}&page={current_kana_page}")
|
||||
current_kana_page += 1
|
||||
kana_soup = BeautifulSoup(kana_doc, features="html.parser")
|
||||
page_total = kana_soup.find(class_="page_total").text
|
||||
m = re.search(r"全([0-9]+)件", page_total)
|
||||
if m:
|
||||
max_kana_page = int(m.group(1))
|
||||
for kana_a in kana_soup.select(".word_box a", href=True):
|
||||
page_link = kana_a['href']
|
||||
page_id = self._parse_page_id(page_link)
|
||||
if page_id is None:
|
||||
continue
|
||||
_, page_path = jitenon.scrape(page_link)
|
||||
self._page_map[page_id] = page_path
|
||||
pages_len = len(self._page_map)
|
||||
print(f"Finished scraping {pages_len} pages")
|
||||
|
||||
|
||||
class _JitenonCrawler(Crawler):
|
||||
def __init__(self, target):
|
||||
super().__init__(target)
|
||||
self._gojuon_url = None
|
||||
|
||||
def collect_pages(self, page_dir):
|
||||
print("Scraping jitenon.jp")
|
||||
jitenon = Scraper.Jitenon()
|
||||
gojuon_doc, _ = jitenon.scrape(self._gojuon_url)
|
||||
gojuon_soup = BeautifulSoup(gojuon_doc, features="html.parser")
|
||||
for gojuon_a in gojuon_soup.select(".kana_area a", href=True):
|
||||
gojuon_href = gojuon_a['href']
|
||||
kana_doc, _ = jitenon.scrape(gojuon_href)
|
||||
kana_soup = BeautifulSoup(kana_doc, features="html.parser")
|
||||
for kana_a in kana_soup.select(".word_box a", href=True):
|
||||
page_link = kana_a['href']
|
||||
page_id = self._parse_page_id(page_link)
|
||||
if page_id is None:
|
||||
continue
|
||||
_, page_path = jitenon.scrape(page_link)
|
||||
self._page_map[page_id] = page_path
|
||||
pages_len = len(self._page_map)
|
||||
print(f"Finished scraping {pages_len} pages")
|
||||
|
||||
|
||||
class JitenonYojiCrawler(_JitenonCrawler):
|
||||
def __init__(self, target):
|
||||
super().__init__(target)
|
||||
self._gojuon_url = "https://yoji.jitenon.jp/cat/gojuon.html"
|
||||
self._page_id_pattern = r"([0-9]+)\.html$"
|
||||
|
||||
|
||||
class JitenonKotowazaCrawler(_JitenonCrawler):
|
||||
def __init__(self, target):
|
||||
super().__init__(target)
|
||||
self._gojuon_url = "https://kotowaza.jitenon.jp/cat/gojuon.php"
|
||||
self._page_id_pattern = r"([0-9]+)\.php$"
|
||||
|
||||
|
||||
class _MonokakidoCrawler(Crawler):
|
||||
def __init__(self, target):
|
||||
super().__init__(target)
|
||||
self._page_id_pattern = r"^([0-9]+)\.xml$"
|
||||
|
||||
def collect_pages(self, page_dir):
|
||||
print(f"Searching for page files in `{page_dir}`")
|
||||
for pagefile in os.listdir(page_dir):
|
||||
page_id = self._parse_page_id(pagefile)
|
||||
if page_id is None or page_id == 0:
|
||||
continue
|
||||
path = os.path.join(page_dir, pagefile)
|
||||
self._page_map[page_id] = path
|
||||
pages_len = len(self._page_map)
|
||||
print(f"Found {pages_len} page files for processing")
|
||||
|
||||
|
||||
class Smk8Crawler(_MonokakidoCrawler):
|
||||
pass
|
||||
|
||||
|
||||
class Daijirin2Crawler(_MonokakidoCrawler):
|
||||
pass
|
|
@ -1,5 +0,0 @@
|
|||
from bot.crawlers.base.monokakido import MonokakidoCrawler
|
||||
|
||||
|
||||
class Crawler(MonokakidoCrawler):
|
||||
pass
|
bot/crawlers/factory.py (new file, 18 lines)
@@ -0,0 +1,18 @@
from bot.targets import Targets

from bot.crawlers.crawlers import JitenonKokugoCrawler
from bot.crawlers.crawlers import JitenonYojiCrawler
from bot.crawlers.crawlers import JitenonKotowazaCrawler
from bot.crawlers.crawlers import Smk8Crawler
from bot.crawlers.crawlers import Daijirin2Crawler


def new_crawler(target):
    crawler_map = {
        Targets.JITENON_KOKUGO: JitenonKokugoCrawler,
        Targets.JITENON_YOJI: JitenonYojiCrawler,
        Targets.JITENON_KOTOWAZA: JitenonKotowazaCrawler,
        Targets.SMK8: Smk8Crawler,
        Targets.DAIJIRIN2: Daijirin2Crawler,
    }
    return crawler_map[target](target)
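
A hypothetical usage sketch of this factory, assuming the consolidated `bot/crawlers/crawlers.py` module added in this diff (the target choice and directory paths are placeholders):

```python
# Sketch only: build a crawler for one target and run the full pipeline.
from bot.targets import Targets
from bot.crawlers.factory import new_crawler

crawler = new_crawler(Targets.SMK8)
crawler.collect_pages("/path/to/smk8/pages")             # placeholder directory
crawler.read_pages()
crawler.make_yomichan_dictionary("/path/to/smk8/media")  # media_dir, per crawlers.py
crawler.make_mdict_dictionary("/path/to/smk8/media", "/path/to/icon.png")
```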
|
|
@ -1,40 +0,0 @@
|
|||
import re
|
||||
from bs4 import BeautifulSoup
|
||||
|
||||
from bot.time import timestamp
|
||||
from bot.crawlers.base.crawler import BaseCrawler
|
||||
from bot.crawlers.scrapers.jitenon import Jitenon as JitenonScraper
|
||||
|
||||
|
||||
class Crawler(BaseCrawler):
|
||||
def __init__(self, target):
|
||||
super().__init__(target)
|
||||
self._gojuon_url = "https://kokugo.jitenon.jp/cat/gojuonindex.php"
|
||||
self._page_id_pattern = r"word/p([0-9]+)$"
|
||||
|
||||
def collect_pages(self, page_dir):
|
||||
print(f"{timestamp()} Scraping {self._gojuon_url}")
|
||||
jitenon = JitenonScraper()
|
||||
gojuon_doc, _ = jitenon.scrape(self._gojuon_url)
|
||||
gojuon_soup = BeautifulSoup(gojuon_doc, features="html.parser")
|
||||
for gojuon_a in gojuon_soup.select(".kana_area a", href=True):
|
||||
gojuon_href = gojuon_a['href']
|
||||
max_kana_page = 1
|
||||
current_kana_page = 1
|
||||
while current_kana_page <= max_kana_page:
|
||||
kana_doc, _ = jitenon.scrape(f"{gojuon_href}&page={current_kana_page}")
|
||||
current_kana_page += 1
|
||||
kana_soup = BeautifulSoup(kana_doc, features="html.parser")
|
||||
page_total = kana_soup.find(class_="page_total").text
|
||||
m = re.search(r"全([0-9]+)件", page_total)
|
||||
if m:
|
||||
max_kana_page = int(m.group(1))
|
||||
for kana_a in kana_soup.select(".word_box a", href=True):
|
||||
page_link = kana_a['href']
|
||||
page_id = self._parse_page_id(page_link)
|
||||
if page_id is None:
|
||||
continue
|
||||
_, page_path = jitenon.scrape(page_link)
|
||||
self._page_map[page_id] = page_path
|
||||
pages_len = len(self._page_map)
|
||||
print(f"\n{timestamp()} Found {pages_len} entry pages")
|
|
@ -1,8 +0,0 @@
|
|||
from bot.crawlers.base.jitenon import JitenonCrawler
|
||||
|
||||
|
||||
class Crawler(JitenonCrawler):
|
||||
def __init__(self, target):
|
||||
super().__init__(target)
|
||||
self._gojuon_url = "https://kotowaza.jitenon.jp/cat/gojuon.php"
|
||||
self._page_id_pattern = r"([0-9]+)\.php$"
|
|
@ -1,8 +0,0 @@
|
|||
from bot.crawlers.base.jitenon import JitenonCrawler
|
||||
|
||||
|
||||
class Crawler(JitenonCrawler):
|
||||
def __init__(self, target):
|
||||
super().__init__(target)
|
||||
self._gojuon_url = "https://yoji.jitenon.jp/cat/gojuon.html"
|
||||
self._page_id_pattern = r"([0-9]+)\.html$"
|
|
@ -1,5 +0,0 @@
|
|||
from bot.crawlers.base.monokakido import MonokakidoCrawler
|
||||
|
||||
|
||||
class Crawler(MonokakidoCrawler):
|
||||
pass
|
|
@ -1,28 +1,24 @@
|
|||
import time
|
||||
import requests
|
||||
import re
|
||||
import os
|
||||
import hashlib
|
||||
import random
|
||||
import math
|
||||
from datetime import datetime
|
||||
from urllib.parse import urlparse
|
||||
from pathlib import Path
|
||||
from abc import ABC, abstractmethod
|
||||
|
||||
import requests
|
||||
from platformdirs import user_cache_dir
|
||||
from urllib.parse import urlparse
|
||||
from requests.adapters import HTTPAdapter
|
||||
from requests.packages.urllib3.util.retry import Retry
|
||||
from platformdirs import user_cache_dir
|
||||
|
||||
from bot.time import timestamp
|
||||
from bot.data import load_config
|
||||
|
||||
|
||||
class BaseScraper(ABC):
|
||||
class Scraper():
|
||||
def __init__(self):
|
||||
self.cache_count = 0
|
||||
self._config = load_config()
|
||||
self.netloc_re = self._get_netloc_re()
|
||||
pattern = r"^(?:([A-Za-z0-9.\-]+)\.)?" + self.domain + r"$"
|
||||
self.netloc_re = re.compile(pattern)
|
||||
self.__set_session()
|
||||
|
||||
def scrape(self, urlstring):
|
||||
|
@ -35,14 +31,9 @@ class BaseScraper(ABC):
|
|||
with open(cache_path, "w", encoding="utf-8") as f:
|
||||
f.write(html)
|
||||
else:
|
||||
self.cache_count += 1
|
||||
print(f"\tDiscovering cached file {self.cache_count}", end='\r', flush=True)
|
||||
print("Discovering cached files...", end='\r', flush=True)
|
||||
return html, cache_path
|
||||
|
||||
@abstractmethod
|
||||
def _get_netloc_re(self):
|
||||
raise NotImplementedError
|
||||
|
||||
def __set_session(self):
|
||||
retry_strategy = Retry(
|
||||
total=3,
|
||||
|
@ -96,14 +87,21 @@ class BaseScraper(ABC):
|
|||
def __get(self, urlstring):
|
||||
delay = 10
|
||||
time.sleep(delay)
|
||||
print(f"{timestamp()} Scraping {urlstring} ...", end='')
|
||||
now = datetime.now().strftime("%H:%M:%S")
|
||||
print(f"{now} scraping {urlstring} ...", end='')
|
||||
try:
|
||||
response = self.session.get(urlstring, timeout=10)
|
||||
print(f"{timestamp()} OK")
|
||||
print("OK")
|
||||
return response.text
|
||||
except Exception as ex:
|
||||
print(f"\tFailed: {str(ex)}")
|
||||
print(f"{timestamp()} Resetting session and trying again")
|
||||
except Exception:
|
||||
print("failed")
|
||||
print("resetting session and trying again")
|
||||
self.__set_session()
|
||||
response = self.session.get(urlstring, timeout=10)
|
||||
return response.text
|
||||
|
||||
|
||||
class Jitenon(Scraper):
|
||||
def __init__(self):
|
||||
self.domain = r"jitenon\.jp"
|
||||
super().__init__()
|
|
@ -1,10 +0,0 @@
|
|||
import re
|
||||
from bot.crawlers.scrapers.scraper import BaseScraper
|
||||
|
||||
|
||||
class Jitenon(BaseScraper):
|
||||
def _get_netloc_re(self):
|
||||
domain = r"jitenon\.jp"
|
||||
pattern = r"^(?:([A-Za-z0-9.\-]+)\.)?" + domain + r"$"
|
||||
netloc_re = re.compile(pattern)
|
||||
return netloc_re
|
|
@ -1,5 +0,0 @@
|
|||
from bot.crawlers.base.monokakido import MonokakidoCrawler
|
||||
|
||||
|
||||
class Crawler(MonokakidoCrawler):
|
||||
pass
|
bot/data.py (49 changes)
|
@ -37,16 +37,14 @@ def load_config():
|
|||
|
||||
@cache
|
||||
def load_yomichan_inflection_categories():
|
||||
file_name = os.path.join(
|
||||
"yomichan", "inflection_categories.json")
|
||||
file_name = os.path.join("yomichan", "inflection_categories.json")
|
||||
data = __load_json(file_name)
|
||||
return data
|
||||
|
||||
|
||||
@cache
|
||||
def load_yomichan_metadata():
|
||||
file_name = os.path.join(
|
||||
"yomichan", "index.json")
|
||||
file_name = os.path.join("yomichan", "index.json")
|
||||
data = __load_json(file_name)
|
||||
return data
|
||||
|
||||
|
@ -55,21 +53,31 @@ def load_yomichan_metadata():
|
|||
def load_variant_kanji():
|
||||
def loader(data, row):
|
||||
data[row[0]] = row[1]
|
||||
file_name = os.path.join(
|
||||
"entries", "variant_kanji.csv")
|
||||
file_name = os.path.join("entries", "variant_kanji.csv")
|
||||
data = {}
|
||||
__load_csv(file_name, loader, data)
|
||||
return data
|
||||
|
||||
|
||||
@cache
|
||||
def load_phrase_readings(target):
|
||||
def load_smk8_phrase_readings():
|
||||
def loader(data, row):
|
||||
entry_id = (int(row[0]), int(row[1]))
|
||||
reading = row[2]
|
||||
data[entry_id] = reading
|
||||
file_name = os.path.join(
|
||||
"entries", target.value, "phrase_readings.csv")
|
||||
file_name = os.path.join("entries", "smk8", "phrase_readings.csv")
|
||||
data = {}
|
||||
__load_csv(file_name, loader, data)
|
||||
return data
|
||||
|
||||
|
||||
@cache
|
||||
def load_daijirin2_phrase_readings():
|
||||
def loader(data, row):
|
||||
entry_id = (int(row[0]), int(row[1]))
|
||||
reading = row[2]
|
||||
data[entry_id] = reading
|
||||
file_name = os.path.join("entries", "daijirin2", "phrase_readings.csv")
|
||||
data = {}
|
||||
__load_csv(file_name, loader, data)
|
||||
return data
|
||||
|
@ -84,8 +92,7 @@ def load_daijirin2_kana_abbreviations():
|
|||
if abbr.strip() != "":
|
||||
abbreviations.append(abbr)
|
||||
data[entry_id] = abbreviations
|
||||
file_name = os.path.join(
|
||||
"entries", "daijirin2", "kana_abbreviations.csv")
|
||||
file_name = os.path.join("entries", "daijirin2", "kana_abbreviations.csv")
|
||||
data = {}
|
||||
__load_csv(file_name, loader, data)
|
||||
return data
|
||||
|
@ -93,24 +100,14 @@ def load_daijirin2_kana_abbreviations():
|
|||
|
||||
@cache
|
||||
def load_yomichan_name_conversion(target):
|
||||
file_name = os.path.join(
|
||||
"yomichan", "name_conversion", f"{target.value}.json")
|
||||
file_name = os.path.join("yomichan", "name_conversion", f"{target.value}.json")
|
||||
data = __load_json(file_name)
|
||||
return data
|
||||
|
||||
|
||||
@cache
|
||||
def load_yomichan_term_schema():
|
||||
file_name = os.path.join(
|
||||
"yomichan", "dictionary-term-bank-v3-schema.json")
|
||||
schema = __load_json(file_name)
|
||||
return schema
|
||||
|
||||
|
||||
@cache
|
||||
def load_mdict_name_conversion(target):
|
||||
file_name = os.path.join(
|
||||
"mdict", "name_conversion", f"{target.value}.json")
|
||||
file_name = os.path.join("mdict", "name_conversion", f"{target.value}.json")
|
||||
data = __load_json(file_name)
|
||||
return data
|
||||
|
||||
|
@ -134,8 +131,7 @@ def __load_adobe_glyphs():
|
|||
data[code].append(character)
|
||||
else:
|
||||
data[code] = [character]
|
||||
file_name = os.path.join(
|
||||
"entries", "adobe", "Adobe-Japan1_sequences.txt")
|
||||
file_name = os.path.join("entries", "adobe", "Adobe-Japan1_sequences.txt")
|
||||
data = {}
|
||||
__load_csv(file_name, loader, data, delim=';')
|
||||
return data
|
||||
|
@ -143,8 +139,7 @@ def __load_adobe_glyphs():
|
|||
|
||||
@cache
|
||||
def __load_override_adobe_glyphs():
|
||||
file_name = os.path.join(
|
||||
"entries", "adobe", "override_glyphs.json")
|
||||
file_name = os.path.join("entries", "adobe", "override_glyphs.json")
|
||||
json_data = __load_json(file_name)
|
||||
data = {}
|
||||
for key, val in json_data.items():
|
||||
|
|
|
@ -1,60 +0,0 @@
|
|||
from abc import abstractmethod
|
||||
from bs4 import BeautifulSoup
|
||||
|
||||
from bot.entries.base.entry import Entry
|
||||
import bot.entries.base.expressions as Expressions
|
||||
|
||||
|
||||
class SanseidoEntry(Entry):
|
||||
def set_page(self, page):
|
||||
page = self._decompose_subentries(page)
|
||||
self._page = page
|
||||
|
||||
def get_page_soup(self):
|
||||
soup = BeautifulSoup(self._page, "xml")
|
||||
return soup
|
||||
|
||||
def get_global_identifier(self):
|
||||
parent_part = format(self.entry_id[0], '06')
|
||||
child_part = hex(self.entry_id[1]).lstrip('0x').zfill(4).upper()
|
||||
return f"@{self.target.value}-{parent_part}-{child_part}"
|
||||
|
||||
def _decompose_subentries(self, page):
|
||||
soup = BeautifulSoup(page, features="xml")
|
||||
for x in self._get_subentry_parameters():
|
||||
subentry_class, tags, subentry_list = x
|
||||
for tag in tags:
|
||||
tag_soup = soup.find(tag)
|
||||
while tag_soup is not None:
|
||||
tag_soup.name = "項目"
|
||||
subentry_id = self.id_string_to_entry_id(tag_soup.attrs["id"])
|
||||
self.SUBENTRY_ID_TO_ENTRY_ID[subentry_id] = self.entry_id
|
||||
subentry = subentry_class(self.target, subentry_id)
|
||||
page = tag_soup.decode()
|
||||
subentry.set_page(page)
|
||||
subentry_list.append(subentry)
|
||||
tag_soup.decompose()
|
||||
tag_soup = soup.find(tag)
|
||||
return soup.decode()
|
||||
|
||||
@abstractmethod
|
||||
def _get_subentry_parameters(self):
|
||||
raise NotImplementedError
|
||||
|
||||
def _add_variant_expressions(self, headwords):
|
||||
for expressions in headwords.values():
|
||||
Expressions.add_variant_kanji(expressions)
|
||||
Expressions.add_fullwidth(expressions)
|
||||
Expressions.remove_iteration_mark(expressions)
|
||||
Expressions.add_iteration_mark(expressions)
|
||||
|
||||
@staticmethod
|
||||
def id_string_to_entry_id(id_string):
|
||||
parts = id_string.split("-")
|
||||
if len(parts) == 1:
|
||||
return (int(parts[0]), 0)
|
||||
elif len(parts) == 2:
|
||||
# subentries have a hexadecimal part
|
||||
return (int(parts[0]), int(parts[1], 16))
|
||||
else:
|
||||
raise Exception(f"Invalid entry ID: {id_string}")
|
bot/entries/daijirin2.py (new file, 231 lines)
|
@ -0,0 +1,231 @@
|
|||
from bs4 import BeautifulSoup
|
||||
|
||||
import bot.entries.expressions as Expressions
|
||||
import bot.soup as Soup
|
||||
from bot.data import load_daijirin2_phrase_readings
|
||||
from bot.data import load_daijirin2_kana_abbreviations
|
||||
from bot.entries.entry import Entry
|
||||
from bot.entries.daijirin2_preprocess import preprocess_page
|
||||
|
||||
|
||||
class _BaseDaijirin2Entry(Entry):
|
||||
def __init__(self, target, entry_id):
|
||||
super().__init__(target, entry_id)
|
||||
self.children = []
|
||||
self.phrases = []
|
||||
self._kana_abbreviations = load_daijirin2_kana_abbreviations()
|
||||
|
||||
def get_global_identifier(self):
|
||||
parent_part = format(self.entry_id[0], '06')
|
||||
child_part = hex(self.entry_id[1]).lstrip('0x').zfill(4).upper()
|
||||
return f"@{self.target.value}-{parent_part}-{child_part}"
|
||||
|
||||
def set_page(self, page):
|
||||
page = self.__decompose_subentries(page)
|
||||
self._page = page
|
||||
|
||||
def get_page_soup(self):
|
||||
soup = BeautifulSoup(self._page, "xml")
|
||||
return soup
|
||||
|
||||
def get_part_of_speech_tags(self):
|
||||
if self._part_of_speech_tags is not None:
|
||||
return self._part_of_speech_tags
|
||||
self._part_of_speech_tags = []
|
||||
soup = self.get_page_soup()
|
||||
for pos_group in soup.find_all("品詞G"):
|
||||
if pos_group.parent.name == "大語義":
|
||||
self._set_part_of_speech_tags(pos_group)
|
||||
return self._part_of_speech_tags
|
||||
|
||||
def _set_part_of_speech_tags(self, el):
|
||||
pos_names = ["品詞", "品詞活用", "品詞行", "用法"]
|
||||
for child in el.children:
|
||||
if child.name is not None:
|
||||
self._set_part_of_speech_tags(child)
|
||||
continue
|
||||
pos = str(child)
|
||||
if el.name not in pos_names:
|
||||
continue
|
||||
elif pos in ["[", "]"]:
|
||||
continue
|
||||
elif pos in self._part_of_speech_tags:
|
||||
continue
|
||||
else:
|
||||
self._part_of_speech_tags.append(pos)
|
||||
|
||||
def _get_regular_headwords(self, soup):
|
||||
self._fill_alts(soup)
|
||||
reading = soup.find("見出仮名").text
|
||||
expressions = []
|
||||
for el in soup.find_all("標準表記"):
|
||||
expression = self._clean_expression(el.text)
|
||||
if "—" in expression:
|
||||
kana_abbrs = self._kana_abbreviations[self.entry_id]
|
||||
for abbr in kana_abbrs:
|
||||
expression = expression.replace("—", abbr, 1)
|
||||
expressions.append(expression)
|
||||
expressions = Expressions.expand_abbreviation_list(expressions)
|
||||
if len(expressions) == 0:
|
||||
expressions.append(reading)
|
||||
headwords = {reading: expressions}
|
||||
return headwords
|
||||
|
||||
def _add_variant_expressions(self, headwords):
|
||||
for expressions in headwords.values():
|
||||
Expressions.add_variant_kanji(expressions)
|
||||
Expressions.add_fullwidth(expressions)
|
||||
Expressions.remove_iteration_mark(expressions)
|
||||
Expressions.add_iteration_mark(expressions)
|
||||
|
||||
def __decompose_subentries(self, page):
|
||||
soup = BeautifulSoup(page, features="xml")
|
||||
subentry_parameters = [
|
||||
[Daijirin2ChildEntry, ["子項目"], self.children],
|
||||
[Daijirin2PhraseEntry, ["句項目"], self.phrases],
|
||||
]
|
||||
for x in subentry_parameters:
|
||||
subentry_class, tags, subentry_list = x
|
||||
for tag in tags:
|
||||
tag_soup = soup.find(tag)
|
||||
while tag_soup is not None:
|
||||
tag_soup.name = "項目"
|
||||
subentry_id = self.id_string_to_entry_id(tag_soup.attrs["id"])
|
||||
self.SUBENTRY_ID_TO_ENTRY_ID[subentry_id] = self.entry_id
|
||||
subentry = subentry_class(self.target, subentry_id)
|
||||
page = tag_soup.decode()
|
||||
subentry.set_page(page)
|
||||
subentry_list.append(subentry)
|
||||
tag_soup.decompose()
|
||||
tag_soup = soup.find(tag)
|
||||
return soup.decode()
|
||||
|
||||
@staticmethod
|
||||
def id_string_to_entry_id(id_string):
|
||||
parts = id_string.split("-")
|
||||
if len(parts) == 1:
|
||||
return (int(parts[0]), 0)
|
||||
elif len(parts) == 2:
|
||||
# subentries have a hexadecimal part
|
||||
return (int(parts[0]), int(parts[1], 16))
|
||||
else:
|
||||
raise Exception(f"Invalid entry ID: {id_string}")
|
||||
|
||||
@staticmethod
|
||||
def _delete_unused_nodes(soup):
|
||||
"""Remove extra markup elements that appear in the entry
|
||||
headword line which are not part of the entry headword"""
|
||||
unused_nodes = [
|
||||
"漢字音logo", "活用分節", "連語句活用分節", "語構成",
|
||||
"表外字マーク", "表外字マーク", "ルビG"
|
||||
]
|
||||
for name in unused_nodes:
|
||||
Soup.delete_soup_nodes(soup, name)
|
||||
|
||||
@staticmethod
|
||||
def _clean_expression(expression):
|
||||
for x in ["〈", "〉", "《", "》", " "]:
|
||||
expression = expression.replace(x, "")
|
||||
return expression
|
||||
|
||||
@staticmethod
|
||||
def _fill_alts(soup):
|
||||
for gaiji in soup.find_all(class_="gaiji"):
|
||||
if gaiji.name == "img" and gaiji.has_attr("alt"):
|
||||
gaiji.name = "span"
|
||||
gaiji.string = gaiji.attrs["alt"]
|
||||
|
||||
|
||||
class Daijirin2Entry(_BaseDaijirin2Entry):
|
||||
def __init__(self, target, page_id):
|
||||
entry_id = (page_id, 0)
|
||||
super().__init__(target, entry_id)
|
||||
|
||||
def set_page(self, page):
|
||||
page = preprocess_page(page)
|
||||
super().set_page(page)
|
||||
|
||||
def _get_headwords(self):
|
||||
soup = self.get_page_soup()
|
||||
self._delete_unused_nodes(soup)
|
||||
if soup.find("漢字見出") is not None:
|
||||
headwords = self._get_kanji_headwords(soup)
|
||||
elif soup.find("略語G") is not None:
|
||||
headwords = self._get_acronym_headwords(soup)
|
||||
else:
|
||||
headwords = self._get_regular_headwords(soup)
|
||||
return headwords
|
||||
|
||||
def _get_kanji_headwords(self, soup):
|
||||
readings = []
|
||||
for el in soup.find_all("漢字音"):
|
||||
hira = Expressions.kata_to_hira(el.text)
|
||||
readings.append(hira)
|
||||
if soup.find("漢字音") is None:
|
||||
readings.append("")
|
||||
expressions = []
|
||||
for el in soup.find_all("漢字見出"):
|
||||
expressions.append(el.text)
|
||||
headwords = {}
|
||||
for reading in readings:
|
||||
headwords[reading] = expressions
|
||||
return headwords
|
||||
|
||||
def _get_acronym_headwords(self, soup):
|
||||
expressions = []
|
||||
for el in soup.find_all("略語"):
|
||||
expression_parts = []
|
||||
for part in el.find_all(["欧字", "和字"]):
|
||||
expression_parts.append(part.text)
|
||||
expression = "".join(expression_parts)
|
||||
expressions.append(expression)
|
||||
headwords = {"": expressions}
|
||||
return headwords
|
||||
|
||||
|
||||
class Daijirin2ChildEntry(_BaseDaijirin2Entry):
|
||||
def _get_headwords(self):
|
||||
soup = self.get_page_soup()
|
||||
self._delete_unused_nodes(soup)
|
||||
headwords = self._get_regular_headwords(soup)
|
||||
return headwords
|
||||
|
||||
|
||||
class Daijirin2PhraseEntry(_BaseDaijirin2Entry):
|
||||
def get_part_of_speech_tags(self):
|
||||
# phrases do not contain these tags
|
||||
return []
|
||||
|
||||
def _get_headwords(self):
|
||||
soup = self.get_page_soup()
|
||||
headwords = {}
|
||||
expressions = self._find_expressions(soup)
|
||||
readings = self._find_readings()
|
||||
for idx, expression in enumerate(expressions):
|
||||
reading = readings[idx]
|
||||
if reading in headwords:
|
||||
headwords[reading].append(expression)
|
||||
else:
|
||||
headwords[reading] = [expression]
|
||||
return headwords
|
||||
|
||||
def _find_expressions(self, soup):
|
||||
self._delete_unused_nodes(soup)
|
||||
text = soup.find("句表記").text
|
||||
text = self._clean_expression(text)
|
||||
alternatives = Expressions.expand_daijirin_alternatives(text)
|
||||
expressions = []
|
||||
for alt in alternatives:
|
||||
for exp in Expressions.expand_abbreviation(alt):
|
||||
expressions.append(exp)
|
||||
return expressions
|
||||
|
||||
def _find_readings(self):
|
||||
phrase_readings = load_daijirin2_phrase_readings()
|
||||
text = phrase_readings[self.entry_id]
|
||||
alternatives = Expressions.expand_daijirin_alternatives(text)
|
||||
readings = []
|
||||
for alt in alternatives:
|
||||
for reading in Expressions.expand_abbreviation(alt):
|
||||
readings.append(reading)
|
||||
return readings
|
|
@ -1,88 +0,0 @@
|
|||
import bot.soup as Soup
|
||||
from bot.data import load_daijirin2_kana_abbreviations
|
||||
from bot.entries.base.sanseido_entry import SanseidoEntry
|
||||
import bot.entries.base.expressions as Expressions
|
||||
|
||||
|
||||
class BaseEntry(SanseidoEntry):
|
||||
def __init__(self, target, entry_id):
|
||||
super().__init__(target, entry_id)
|
||||
self.children = []
|
||||
self.phrases = []
|
||||
self._kana_abbreviations = load_daijirin2_kana_abbreviations()
|
||||
|
||||
def get_part_of_speech_tags(self):
|
||||
if self._part_of_speech_tags is not None:
|
||||
return self._part_of_speech_tags
|
||||
self._part_of_speech_tags = []
|
||||
soup = self.get_page_soup()
|
||||
for pos_group in soup.find_all("品詞G"):
|
||||
if pos_group.parent.name == "大語義":
|
||||
self._set_part_of_speech_tags(pos_group)
|
||||
return self._part_of_speech_tags
|
||||
|
||||
def _set_part_of_speech_tags(self, el):
|
||||
pos_names = ["品詞", "品詞活用", "品詞行", "用法"]
|
||||
for child in el.children:
|
||||
if child.name is not None:
|
||||
self._set_part_of_speech_tags(child)
|
||||
continue
|
||||
pos = str(child)
|
||||
if el.name not in pos_names:
|
||||
continue
|
||||
elif pos in ["[", "]"]:
|
||||
continue
|
||||
elif pos in self._part_of_speech_tags:
|
||||
continue
|
||||
else:
|
||||
self._part_of_speech_tags.append(pos)
|
||||
|
||||
def _get_regular_headwords(self, soup):
|
||||
self._fill_alts(soup)
|
||||
reading = soup.find("見出仮名").text
|
||||
expressions = []
|
||||
for el in soup.find_all("標準表記"):
|
||||
expression = self._clean_expression(el.text)
|
||||
if "—" in expression:
|
||||
kana_abbrs = self._kana_abbreviations[self.entry_id]
|
||||
for abbr in kana_abbrs:
|
||||
expression = expression.replace("—", abbr, 1)
|
||||
expressions.append(expression)
|
||||
expressions = Expressions.expand_abbreviation_list(expressions)
|
||||
if len(expressions) == 0:
|
||||
expressions.append(reading)
|
||||
headwords = {reading: expressions}
|
||||
return headwords
|
||||
|
||||
def _get_subentry_parameters(self):
|
||||
from bot.entries.daijirin2.child_entry import ChildEntry
|
||||
from bot.entries.daijirin2.phrase_entry import PhraseEntry
|
||||
subentry_parameters = [
|
||||
[ChildEntry, ["子項目"], self.children],
|
||||
[PhraseEntry, ["句項目"], self.phrases],
|
||||
]
|
||||
return subentry_parameters
|
||||
|
||||
@staticmethod
|
||||
def _delete_unused_nodes(soup):
|
||||
"""Remove extra markup elements that appear in the entry
|
||||
headword line which are not part of the entry headword"""
|
||||
unused_nodes = [
|
||||
"漢字音logo", "活用分節", "連語句活用分節", "語構成",
|
||||
"表外字マーク", "表外字マーク", "ルビG"
|
||||
]
|
||||
for name in unused_nodes:
|
||||
Soup.delete_soup_nodes(soup, name)
|
||||
|
||||
@staticmethod
|
||||
def _clean_expression(expression):
|
||||
for x in ["〈", "〉", "《", "》", " "]:
|
||||
expression = expression.replace(x, "")
|
||||
return expression
|
||||
|
||||
@staticmethod
|
||||
def _fill_alts(soup):
|
||||
for gaiji in soup.find_all(class_="gaiji"):
|
||||
if gaiji.name == "img" and gaiji.has_attr("alt"):
|
||||
gaiji.name = "span"
|
||||
gaiji.string = gaiji.attrs["alt"]
|
|
@ -1,9 +0,0 @@
|
|||
from bot.entries.daijirin2.base_entry import BaseEntry
|
||||
|
||||
|
||||
class ChildEntry(BaseEntry):
|
||||
def _get_headwords(self):
|
||||
soup = self.get_page_soup()
|
||||
self._delete_unused_nodes(soup)
|
||||
headwords = self._get_regular_headwords(soup)
|
||||
return headwords
|
|
@ -1,50 +0,0 @@
|
|||
import bot.entries.base.expressions as Expressions
|
||||
from bot.entries.daijirin2.base_entry import BaseEntry
|
||||
from bot.entries.daijirin2.preprocess import preprocess_page
|
||||
|
||||
|
||||
class Entry(BaseEntry):
|
||||
def __init__(self, target, page_id):
|
||||
entry_id = (page_id, 0)
|
||||
super().__init__(target, entry_id)
|
||||
|
||||
def set_page(self, page):
|
||||
page = preprocess_page(page)
|
||||
super().set_page(page)
|
||||
|
||||
def _get_headwords(self):
|
||||
soup = self.get_page_soup()
|
||||
self._delete_unused_nodes(soup)
|
||||
if soup.find("漢字見出") is not None:
|
||||
headwords = self._get_kanji_headwords(soup)
|
||||
elif soup.find("略語G") is not None:
|
||||
headwords = self._get_acronym_headwords(soup)
|
||||
else:
|
||||
headwords = self._get_regular_headwords(soup)
|
||||
return headwords
|
||||
|
||||
def _get_kanji_headwords(self, soup):
|
||||
readings = []
|
||||
for el in soup.find_all("漢字音"):
|
||||
hira = Expressions.kata_to_hira(el.text)
|
||||
readings.append(hira)
|
||||
if soup.find("漢字音") is None:
|
||||
readings.append("")
|
||||
expressions = []
|
||||
for el in soup.find_all("漢字見出"):
|
||||
expressions.append(el.text)
|
||||
headwords = {}
|
||||
for reading in readings:
|
||||
headwords[reading] = expressions
|
||||
return headwords
|
||||
|
||||
def _get_acronym_headwords(self, soup):
|
||||
expressions = []
|
||||
for el in soup.find_all("略語"):
|
||||
expression_parts = []
|
||||
for part in el.find_all(["欧字", "和字"]):
|
||||
expression_parts.append(part.text)
|
||||
expression = "".join(expression_parts)
|
||||
expressions.append(expression)
|
||||
headwords = {"": expressions}
|
||||
return headwords
|
|
@ -1,67 +0,0 @@
|
|||
import re
|
||||
|
||||
import bot.entries.base.expressions as Expressions
|
||||
from bot.data import load_phrase_readings
|
||||
from bot.entries.daijirin2.base_entry import BaseEntry
|
||||
|
||||
|
||||
class PhraseEntry(BaseEntry):
|
||||
def get_part_of_speech_tags(self):
|
||||
# phrases do not contain these tags
|
||||
return []
|
||||
|
||||
def _get_headwords(self):
|
||||
soup = self.get_page_soup()
|
||||
headwords = {}
|
||||
expressions = self._find_expressions(soup)
|
||||
readings = self._find_readings()
|
||||
for idx, expression in enumerate(expressions):
|
||||
reading = readings[idx]
|
||||
if reading in headwords:
|
||||
headwords[reading].append(expression)
|
||||
else:
|
||||
headwords[reading] = [expression]
|
||||
return headwords
|
||||
|
||||
def _find_expressions(self, soup):
|
||||
self._delete_unused_nodes(soup)
|
||||
text = soup.find("句表記").text
|
||||
text = self._clean_expression(text)
|
||||
alternatives = parse_phrase(text)
|
||||
expressions = []
|
||||
for alt in alternatives:
|
||||
for exp in Expressions.expand_abbreviation(alt):
|
||||
expressions.append(exp)
|
||||
return expressions
|
||||
|
||||
def _find_readings(self):
|
||||
phrase_readings = load_phrase_readings(self.target)
|
||||
text = phrase_readings[self.entry_id]
|
||||
alternatives = parse_phrase(text)
|
||||
readings = []
|
||||
for alt in alternatives:
|
||||
for reading in Expressions.expand_abbreviation(alt):
|
||||
readings.append(reading)
|
||||
return readings
|
||||
|
||||
|
||||
def parse_phrase(text):
|
||||
"""Return a list of strings described by = notation."""
|
||||
group_pattern = r"([^=]+)(=([^(]+)(=([^(]+)))?"
|
||||
groups = re.findall(group_pattern, text)
|
||||
expressions = [""]
|
||||
for group in groups:
|
||||
new_exps = []
|
||||
for expression in expressions:
|
||||
new_exps.append(expression + group[0])
|
||||
expressions = new_exps.copy()
|
||||
if group[1] == "":
|
||||
continue
|
||||
new_exps = []
|
||||
for expression in expressions:
|
||||
new_exps.append(expression + group[2])
|
||||
for expression in expressions:
|
||||
for alt in group[3].split("・"):
|
||||
new_exps.append(expression + alt)
|
||||
expressions = new_exps.copy()
|
||||
return expressions
|
|
@ -18,15 +18,15 @@ class Entry(ABC):
|
|||
|
||||
@abstractmethod
|
||||
def get_global_identifier(self):
|
||||
raise NotImplementedError
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def set_page(self, page):
|
||||
raise NotImplementedError
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def get_page_soup(self):
|
||||
raise NotImplementedError
|
||||
pass
|
||||
|
||||
def get_headwords(self):
|
||||
if self._headwords is not None:
|
||||
|
@ -38,15 +38,15 @@ class Entry(ABC):
|
|||
|
||||
@abstractmethod
|
||||
def _get_headwords(self):
|
||||
raise NotImplementedError
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def _add_variant_expressions(self, headwords):
|
||||
raise NotImplementedError
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def get_part_of_speech_tags(self):
|
||||
raise NotImplementedError
|
||||
pass
|
||||
|
||||
def get_parent(self):
|
||||
if self.entry_id in self.SUBENTRY_ID_TO_ENTRY_ID:
|
|
@ -31,14 +31,11 @@ def add_fullwidth(expressions):
|
|||
|
||||
def add_variant_kanji(expressions):
|
||||
variant_kanji = load_variant_kanji()
|
||||
for kyuuji, shinji in variant_kanji.items():
|
||||
for old_kanji, new_kanji in variant_kanji.items():
|
||||
new_exps = []
|
||||
for expression in expressions:
|
||||
if kyuuji in expression:
|
||||
new_exp = expression.replace(kyuuji, shinji)
|
||||
new_exps.append(new_exp)
|
||||
if shinji in expression:
|
||||
new_exp = expression.replace(shinji, kyuuji)
|
||||
if old_kanji in expression:
|
||||
new_exp = expression.replace(old_kanji, new_kanji)
|
||||
new_exps.append(new_exp)
|
||||
for new_exp in new_exps:
|
||||
if new_exp not in expressions:
|
||||
|
@ -88,3 +85,40 @@ def expand_abbreviation_list(expressions):
|
|||
if new_exp not in new_exps:
|
||||
new_exps.append(new_exp)
|
||||
return new_exps
|
||||
|
||||
|
||||
def expand_smk_alternatives(text):
|
||||
"""Return a list of strings described by △ notation."""
|
||||
m = re.search(r"△([^(]+)(([^(]+))", text)
|
||||
if m is None:
|
||||
return [text]
|
||||
alt_parts = [m.group(1)]
|
||||
for alt_part in m.group(2).split("・"):
|
||||
alt_parts.append(alt_part)
|
||||
alts = []
|
||||
for alt_part in alt_parts:
|
||||
alt_exp = re.sub(r"△[^(]+([^(]+)", alt_part, text)
|
||||
alts.append(alt_exp)
|
||||
return alts
|
||||
|
||||
|
||||
def expand_daijirin_alternatives(text):
|
||||
"""Return a list of strings described by = notation."""
|
||||
group_pattern = r"([^=]+)(=([^(]+)(=([^(]+)))?"
|
||||
groups = re.findall(group_pattern, text)
|
||||
expressions = [""]
|
||||
for group in groups:
|
||||
new_exps = []
|
||||
for expression in expressions:
|
||||
new_exps.append(expression + group[0])
|
||||
expressions = new_exps.copy()
|
||||
if group[1] == "":
|
||||
continue
|
||||
new_exps = []
|
||||
for expression in expressions:
|
||||
new_exps.append(expression + group[2])
|
||||
for expression in expressions:
|
||||
for alt in group[3].split("・"):
|
||||
new_exps.append(expression + alt)
|
||||
expressions = new_exps.copy()
|
||||
return expressions
|
bot/entries/factory.py (new file, 18 lines)
@@ -0,0 +1,18 @@
from bot.targets import Targets

from bot.entries.jitenon import JitenonKokugoEntry
from bot.entries.jitenon import JitenonYojiEntry
from bot.entries.jitenon import JitenonKotowazaEntry
from bot.entries.smk8 import Smk8Entry
from bot.entries.daijirin2 import Daijirin2Entry


def new_entry(target, page_id):
    entry_map = {
        Targets.JITENON_KOKUGO: JitenonKokugoEntry,
        Targets.JITENON_YOJI: JitenonYojiEntry,
        Targets.JITENON_KOTOWAZA: JitenonKotowazaEntry,
        Targets.SMK8: Smk8Entry,
        Targets.DAIJIRIN2: Daijirin2Entry,
    }
    return entry_map[target](target, page_id)
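
A hypothetical sketch of how an entry built by this factory might be used, based on the `Entry` interface shown elsewhere in this diff (the page ID and file path are placeholders):

```python
# Sketch only: construct an entry, feed it a page file, and read its headwords.
from bot.targets import Targets
from bot.entries.factory import new_entry

entry = new_entry(Targets.DAIJIRIN2, 1234)  # placeholder page ID
with open("pages/0000001234.xml", "r", encoding="utf-8") as f:
    entry.set_page(f.read())
print(entry.get_headwords())
```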
|
|
@ -3,11 +3,11 @@ from abc import abstractmethod
|
|||
from datetime import datetime, date
|
||||
from bs4 import BeautifulSoup
|
||||
|
||||
from bot.entries.base.entry import Entry
|
||||
import bot.entries.base.expressions as Expressions
|
||||
from bot.entries.entry import Entry
|
||||
import bot.entries.expressions as Expressions
|
||||
|
||||
|
||||
class JitenonEntry(Entry):
|
||||
class _JitenonEntry(Entry):
|
||||
def __init__(self, target, entry_id):
|
||||
super().__init__(target, entry_id)
|
||||
self.expression = ""
|
||||
|
@ -58,7 +58,7 @@ class JitenonEntry(Entry):
|
|||
|
||||
@abstractmethod
|
||||
def _get_column_map(self):
|
||||
raise NotImplementedError
|
||||
pass
|
||||
|
||||
def __set_modified_date(self, page):
|
||||
m = re.search(r"\"dateModified\": \"(\d{4}-\d{2}-\d{2})", page)
|
||||
|
@ -140,3 +140,104 @@ class JitenonEntry(Entry):
|
|||
elif isinstance(attr_val, list):
|
||||
colvals.append(";".join(attr_val))
|
||||
return ",".join(colvals)
|
||||
|
||||
|
||||
class JitenonYojiEntry(_JitenonEntry):
|
||||
def __init__(self, target, entry_id):
|
||||
super().__init__(target, entry_id)
|
||||
self.origin = ""
|
||||
self.kanken_level = ""
|
||||
self.category = ""
|
||||
self.related_expressions = []
|
||||
|
||||
def _get_column_map(self):
|
||||
return {
|
||||
"四字熟語": "expression",
|
||||
"読み方": "yomikata",
|
||||
"意味": "definition",
|
||||
"異形": "other_forms",
|
||||
"出典": "origin",
|
||||
"漢検級": "kanken_level",
|
||||
"場面用途": "category",
|
||||
"類義語": "related_expressions",
|
||||
}
|
||||
|
||||
def _add_variant_expressions(self, headwords):
|
||||
for expressions in headwords.values():
|
||||
Expressions.add_variant_kanji(expressions)
|
||||
|
||||
|
||||
class JitenonKotowazaEntry(_JitenonEntry):
|
||||
def __init__(self, target, entry_id):
|
||||
super().__init__(target, entry_id)
|
||||
self.origin = ""
|
||||
self.example = ""
|
||||
self.related_expressions = []
|
||||
|
||||
def _get_column_map(self):
|
||||
return {
|
||||
"言葉": "expression",
|
||||
"読み方": "yomikata",
|
||||
"意味": "definition",
|
||||
"異形": "other_forms",
|
||||
"出典": "origin",
|
||||
"例文": "example",
|
||||
"類句": "related_expressions",
|
||||
}
|
||||
|
||||
def _get_headwords(self):
|
||||
if self.expression == "金棒引き・鉄棒引き":
|
||||
headwords = {
|
||||
"かなぼうひき": ["金棒引き", "鉄棒引き"]
|
||||
}
|
||||
else:
|
||||
headwords = super()._get_headwords()
|
||||
return headwords
|
||||
|
||||
def _add_variant_expressions(self, headwords):
|
||||
for expressions in headwords.values():
|
||||
Expressions.add_variant_kanji(expressions)
|
||||
Expressions.add_fullwidth(expressions)
|
||||
|
||||
|
||||
class JitenonKokugoEntry(_JitenonEntry):
|
||||
def __init__(self, target, entry_id):
|
||||
super().__init__(target, entry_id)
|
||||
self.example = ""
|
||||
self.alt_expression = ""
|
||||
self.antonym = ""
|
||||
self.attachments = ""
|
||||
self.compounds = ""
|
||||
self.related_words = ""
|
||||
|
||||
def _get_column_map(self):
|
||||
return {
|
||||
"言葉": "expression",
|
||||
"読み方": "yomikata",
|
||||
"意味": "definition",
|
||||
"例文": "example",
|
||||
"別表記": "alt_expression",
|
||||
"対義語": "antonym",
|
||||
"活用": "attachments",
|
||||
"用例": "compounds",
|
||||
"類語": "related_words",
|
||||
}
|
||||
|
||||
def _get_headwords(self):
|
||||
headwords = {}
|
||||
for reading in self.yomikata.split("・"):
|
||||
if reading not in headwords:
|
||||
headwords[reading] = []
|
||||
for expression in self.expression.split("・"):
|
||||
headwords[reading].append(expression)
|
||||
if self.alt_expression.strip() != "":
|
||||
for expression in self.alt_expression.split("・"):
|
||||
headwords[reading].append(expression)
|
||||
return headwords
|
||||
|
||||
def _add_variant_expressions(self, headwords):
|
||||
for expressions in headwords.values():
|
||||
Expressions.add_variant_kanji(expressions)
|
||||
Expressions.add_fullwidth(expressions)
|
||||
Expressions.remove_iteration_mark(expressions)
|
||||
Expressions.add_iteration_mark(expressions)
|
|
@ -1,45 +0,0 @@
|
|||
from bot.entries.base.jitenon_entry import JitenonEntry
|
||||
import bot.entries.base.expressions as Expressions
|
||||
|
||||
|
||||
class Entry(JitenonEntry):
|
||||
def __init__(self, target, entry_id):
|
||||
super().__init__(target, entry_id)
|
||||
self.example = ""
|
||||
self.alt_expression = ""
|
||||
self.antonym = ""
|
||||
self.attachments = ""
|
||||
self.compounds = ""
|
||||
self.related_words = ""
|
||||
|
||||
def _get_column_map(self):
|
||||
return {
|
||||
"言葉": "expression",
|
||||
"読み方": "yomikata",
|
||||
"意味": "definition",
|
||||
"例文": "example",
|
||||
"別表記": "alt_expression",
|
||||
"対義語": "antonym",
|
||||
"活用": "attachments",
|
||||
"用例": "compounds",
|
||||
"類語": "related_words",
|
||||
}
|
||||
|
||||
def _get_headwords(self):
|
||||
headwords = {}
|
||||
for reading in self.yomikata.split("・"):
|
||||
if reading not in headwords:
|
||||
headwords[reading] = []
|
||||
for expression in self.expression.split("・"):
|
||||
headwords[reading].append(expression)
|
||||
if self.alt_expression.strip() != "":
|
||||
for expression in self.alt_expression.split("・"):
|
||||
headwords[reading].append(expression)
|
||||
return headwords
|
||||
|
||||
def _add_variant_expressions(self, headwords):
|
||||
for expressions in headwords.values():
|
||||
Expressions.add_variant_kanji(expressions)
|
||||
Expressions.add_fullwidth(expressions)
|
||||
Expressions.remove_iteration_mark(expressions)
|
||||
Expressions.add_iteration_mark(expressions)
|
|
@@ -1,35 +0,0 @@
from bot.entries.base.jitenon_entry import JitenonEntry
import bot.entries.base.expressions as Expressions


class Entry(JitenonEntry):
    def __init__(self, target, entry_id):
        super().__init__(target, entry_id)
        self.origin = ""
        self.example = ""
        self.related_expressions = []

    def _get_column_map(self):
        return {
            "言葉": "expression",
            "読み方": "yomikata",
            "意味": "definition",
            "異形": "other_forms",
            "出典": "origin",
            "例文": "example",
            "類句": "related_expressions",
        }

    def _get_headwords(self):
        if self.expression == "金棒引き・鉄棒引き":
            headwords = {
                "かなぼうひき": ["金棒引き", "鉄棒引き"]
            }
        else:
            headwords = super()._get_headwords()
        return headwords

    def _add_variant_expressions(self, headwords):
        for expressions in headwords.values():
            Expressions.add_variant_kanji(expressions)
            Expressions.add_fullwidth(expressions)
@@ -1,27 +0,0 @@
import bot.entries.base.expressions as Expressions
from bot.entries.base.jitenon_entry import JitenonEntry


class Entry(JitenonEntry):
    def __init__(self, target, entry_id):
        super().__init__(target, entry_id)
        self.origin = ""
        self.kanken_level = ""
        self.category = ""
        self.related_expressions = []

    def _get_column_map(self):
        return {
            "四字熟語": "expression",
            "読み方": "yomikata",
            "意味": "definition",
            "異形": "other_forms",
            "出典": "origin",
            "漢検級": "kanken_level",
            "場面用途": "category",
            "類義語": "related_expressions",
        }

    def _add_variant_expressions(self, headwords):
        for expressions in headwords.values():
            Expressions.add_variant_kanji(expressions)
@@ -1,104 +0,0 @@
import bot.soup as Soup
from bot.entries.base.sanseido_entry import SanseidoEntry
from bot.entries.sankoku8.parse import parse_hyouki_soup


class BaseEntry(SanseidoEntry):
    def __init__(self, target, entry_id):
        super().__init__(target, entry_id)
        self.children = []
        self.phrases = []
        self._hyouki_name = "表記"
        self._midashi_name = None
        self._midashi_kana_name = None

    def _get_headwords(self):
        soup = self.get_page_soup()
        self._delete_unused_nodes(soup)
        readings = self._find_readings(soup)
        expressions = self._find_expressions(soup)
        headwords = {}
        for reading in readings:
            headwords[reading] = []
        if len(readings) == 1:
            reading = readings[0]
            if soup.find(self._midashi_name).find(self._hyouki_name) is None:
                headwords[reading].append(reading)
            for exp in expressions:
                if exp not in headwords[reading]:
                    headwords[reading].append(exp)
        elif len(readings) > 1 and len(expressions) == 0:
            for reading in readings:
                headwords[reading].append(reading)
        elif len(readings) > 1 and len(expressions) == 1:
            if soup.find(self._midashi_name).find(self._hyouki_name) is None:
                for reading in readings:
                    headwords[reading].append(reading)
            expression = expressions[0]
            for reading in readings:
                if expression not in headwords[reading]:
                    headwords[reading].append(expression)
        elif len(readings) > 1 and len(expressions) == len(readings):
            if soup.find(self._midashi_name).find(self._hyouki_name) is None:
                for reading in readings:
                    headwords[reading].append(reading)
            for idx, reading in enumerate(readings):
                exp = expressions[idx]
                if exp not in headwords[reading]:
                    headwords[reading].append(exp)
        else:
            raise Exception()  # shouldn't happen
        return headwords

    def get_part_of_speech_tags(self):
        if self._part_of_speech_tags is not None:
            return self._part_of_speech_tags
        self._part_of_speech_tags = []
        soup = self.get_page_soup()
        for midashi in soup.find_all([self._midashi_name, "見出部要素"]):
            pos_group = midashi.find("品詞G")
            if pos_group is None:
                continue
            for tag in pos_group.find_all("a"):
                if tag.text not in self._part_of_speech_tags:
                    self._part_of_speech_tags.append(tag.text)
        return self._part_of_speech_tags

    def _find_expressions(self, soup):
        expressions = []
        for hyouki in soup.find_all(self._hyouki_name):
            self._fill_alts(hyouki)
            for expression in parse_hyouki_soup(hyouki, [""]):
                expressions.append(expression)
        return expressions

    def _find_readings(self, soup):
        midasi_kana = soup.find(self._midashi_kana_name)
        readings = parse_hyouki_soup(midasi_kana, [""])
        return readings

    def _get_subentry_parameters(self):
        from bot.entries.sankoku8.child_entry import ChildEntry
        from bot.entries.sankoku8.phrase_entry import PhraseEntry
        subentry_parameters = [
            [ChildEntry, ["子項目"], self.children],
            [PhraseEntry, ["句項目"], self.phrases],
        ]
        return subentry_parameters

    @staticmethod
    def _delete_unused_nodes(soup):
        """Remove extra markup elements that appear in the entry
        headword line which are not part of the entry headword"""
        unused_nodes = [
            "語構成", "平板", "アクセント", "表外字マーク", "表外音訓マーク",
            "アクセント分節", "活用分節", "ルビG", "分書"
        ]
        for name in unused_nodes:
            Soup.delete_soup_nodes(soup, name)

    @staticmethod
    def _fill_alts(soup):
        for img in soup.find_all("img"):
            if img.has_attr("alt"):
                img.string = img.attrs["alt"]
@@ -1,8 +0,0 @@
from bot.entries.sankoku8.base_entry import BaseEntry


class ChildEntry(BaseEntry):
    def __init__(self, target, page_id):
        super().__init__(target, page_id)
        self._midashi_name = "子見出部"
        self._midashi_kana_name = "子見出仮名"
@@ -1,14 +0,0 @@
from bot.entries.sankoku8.base_entry import BaseEntry
from bot.entries.sankoku8.preprocess import preprocess_page


class Entry(BaseEntry):
    def __init__(self, target, page_id):
        entry_id = (page_id, 0)
        super().__init__(target, entry_id)
        self._midashi_name = "見出部"
        self._midashi_kana_name = "見出仮名"

    def set_page(self, page):
        page = preprocess_page(page)
        super().set_page(page)
@@ -1,65 +0,0 @@
from bs4 import BeautifulSoup


def parse_hyouki_soup(soup, base_exps):
    omitted_characters = [
        "/", "〈", "〉", "(", ")", "⦅", "⦆", ":", "…"
    ]
    exps = base_exps.copy()
    for child in soup.children:
        new_exps = []
        if child.name == "言換G":
            for alt in child.find_all("言換"):
                parts = parse_hyouki_soup(alt, [""])
                for exp in exps:
                    for part in parts:
                        new_exps.append(exp + part)
        elif child.name == "補足表記":
            alt1 = child.find("表記対象")
            alt2 = child.find("表記内容G")
            parts1 = parse_hyouki_soup(alt1, [""])
            parts2 = parse_hyouki_soup(alt2, [""])
            for exp in exps:
                for part in parts1:
                    new_exps.append(exp + part)
                for part in parts2:
                    new_exps.append(exp + part)
        elif child.name == "省略":
            parts = parse_hyouki_soup(child, [""])
            for exp in exps:
                new_exps.append(exp)
                for part in parts:
                    new_exps.append(exp + part)
        elif child.name is not None:
            new_exps = parse_hyouki_soup(child, exps)
        else:
            text = child.text
            for char in omitted_characters:
                text = text.replace(char, "")
            for exp in exps:
                new_exps.append(exp + text)
        exps = new_exps.copy()
    return exps


def parse_hyouki_pattern(pattern):
    replacements = {
        "(": "<省略>(",
        ")": ")</省略>",
        "{": "<補足表記><表記対象>",
        "・": "</表記対象><表記内容G>(<表記内容>",
        "}": "</表記内容>)</表記内容G></補足表記>",
        "〈": "<言換G>〈<言換>",
        "/": "</言換>/<言換>",
        "〉": "</言換>〉</言換G>",
        "⦅": "<補足表記><表記対象>",
        "\\": "</表記対象><表記内容G>⦅<表記内容>",
        "⦆": "</表記内容>⦆</表記内容G></補足表記>",
    }
    markup = f"<span>{pattern}</span>"
    for key, val in replacements.items():
        markup = markup.replace(key, val)
    soup = BeautifulSoup(markup, "xml")
    hyouki_soup = soup.find("span")
    exps = parse_hyouki_soup(hyouki_soup, [""])
    return exps
@@ -1,37 +0,0 @@
from bot.data import load_phrase_readings
from bot.entries.sankoku8.base_entry import BaseEntry
from bot.entries.sankoku8.parse import parse_hyouki_soup
from bot.entries.sankoku8.parse import parse_hyouki_pattern


class PhraseEntry(BaseEntry):
    def get_part_of_speech_tags(self):
        # phrases do not contain these tags
        return []

    def _get_headwords(self):
        soup = self.get_page_soup()
        self._delete_unused_nodes(soup)
        expressions = self._find_expressions(soup)
        readings = self._find_readings(soup)
        headwords = {}
        if len(expressions) != len(readings):
            raise Exception(f"{self.entry_id[0]}-{self.entry_id[1]}")
        for idx, expression in enumerate(expressions):
            reading = readings[idx]
            if reading in headwords:
                headwords[reading].append(expression)
            else:
                headwords[reading] = [expression]
        return headwords

    def _find_expressions(self, soup):
        phrase_soup = soup.find("句表記")
        expressions = parse_hyouki_soup(phrase_soup, [""])
        return expressions

    def _find_readings(self, soup):
        reading_patterns = load_phrase_readings(self.target)
        reading_pattern = reading_patterns[self.entry_id]
        readings = parse_hyouki_pattern(reading_pattern)
        return readings
@@ -1,51 +0,0 @@
import re
from bs4 import BeautifulSoup

from bot.data import get_adobe_glyph


__GAIJI = {
    "svg-gaiji/byan.svg": "𰻞",
    "svg-gaiji/G16EF.svg": "篡",
}


def preprocess_page(page):
    soup = BeautifulSoup(page, features="xml")
    __replace_glyph_codes(soup)
    __add_image_alt_text(soup)
    __replace_tatehyphen(soup)
    page = __strip_page(soup)
    return page


def __replace_glyph_codes(soup):
    for el in soup.find_all("glyph"):
        m = re.search(r"^glyph:([0-9]+);?$", el.attrs["style"])
        code = int(m.group(1))
        for geta in el.find_all(string="〓"):
            glyph = get_adobe_glyph(code)
            geta.replace_with(glyph)


def __add_image_alt_text(soup):
    for img in soup.find_all("img"):
        if not img.has_attr("src"):
            continue
        src = img.attrs["src"]
        if src in __GAIJI:
            img.attrs["alt"] = __GAIJI[src]


def __replace_tatehyphen(soup):
    for img in soup.find_all("img", {"src": "svg-gaiji/tatehyphen.svg"}):
        img.string = "−"
        img.unwrap()


def __strip_page(soup):
    koumoku = soup.find(["項目"])
    if koumoku is not None:
        return koumoku.decode()
    else:
        raise Exception(f"Primary 項目 not found in page:\n{soup.prettify()}")
221
bot/entries/smk8.py
Normal file
221
bot/entries/smk8.py
Normal file
|
@ -0,0 +1,221 @@
|
|||
from bs4 import BeautifulSoup
|
||||
|
||||
import bot.entries.expressions as Expressions
|
||||
import bot.soup as Soup
|
||||
from bot.data import load_smk8_phrase_readings
|
||||
from bot.entries.entry import Entry
|
||||
from bot.entries.smk8_preprocess import preprocess_page
|
||||
|
||||
|
||||
class _BaseSmk8Entry(Entry):
|
||||
def __init__(self, target, entry_id):
|
||||
super().__init__(target, entry_id)
|
||||
self.children = []
|
||||
self.phrases = []
|
||||
self.kanjis = []
|
||||
|
||||
def get_global_identifier(self):
|
||||
parent_part = format(self.entry_id[0], '06')
|
||||
child_part = hex(self.entry_id[1]).lstrip('0x').zfill(4).upper()
|
||||
return f"@{self.target.value}-{parent_part}-{child_part}"
|
||||
|
||||
def set_page(self, page):
|
||||
page = self.__decompose_subentries(page)
|
||||
self._page = page
|
||||
|
||||
def get_page_soup(self):
|
||||
soup = BeautifulSoup(self._page, "xml")
|
||||
return soup
|
||||
|
||||
def get_part_of_speech_tags(self):
|
||||
if self._part_of_speech_tags is not None:
|
||||
return self._part_of_speech_tags
|
||||
self._part_of_speech_tags = []
|
||||
soup = self.get_page_soup()
|
||||
headword_info = soup.find("見出要素")
|
||||
if headword_info is None:
|
||||
return self._part_of_speech_tags
|
||||
for tag in headword_info.find_all("品詞M"):
|
||||
if tag.text not in self._part_of_speech_tags:
|
||||
self._part_of_speech_tags.append(tag.text)
|
||||
return self._part_of_speech_tags
|
||||
|
||||
def _add_variant_expressions(self, headwords):
|
||||
for expressions in headwords.values():
|
||||
Expressions.add_variant_kanji(expressions)
|
||||
Expressions.add_fullwidth(expressions)
|
||||
Expressions.remove_iteration_mark(expressions)
|
||||
Expressions.add_iteration_mark(expressions)
|
||||
|
||||
def _find_reading(self, soup):
|
||||
midasi_kana = soup.find("見出仮名")
|
||||
reading = midasi_kana.text
|
||||
for x in [" ", "・"]:
|
||||
reading = reading.replace(x, "")
|
||||
return reading
|
||||
|
||||
def _find_expressions(self, soup):
|
||||
clean_expressions = []
|
||||
for expression in soup.find_all("標準表記"):
|
||||
clean_expression = self._clean_expression(expression.text)
|
||||
clean_expressions.append(clean_expression)
|
||||
expressions = Expressions.expand_abbreviation_list(clean_expressions)
|
||||
return expressions
|
||||
|
||||
def __decompose_subentries(self, page):
|
||||
soup = BeautifulSoup(page, features="xml")
|
||||
subentry_parameters = [
|
||||
[Smk8ChildEntry, ["子項目F", "子項目"], self.children],
|
||||
[Smk8PhraseEntry, ["句項目F", "句項目"], self.phrases],
|
||||
[Smk8KanjiEntry, ["造語成分項目"], self.kanjis],
|
||||
]
|
||||
for x in subentry_parameters:
|
||||
subentry_class, tags, subentry_list = x
|
||||
for tag in tags:
|
||||
tag_soup = soup.find(tag)
|
||||
while tag_soup is not None:
|
||||
tag_soup.name = "項目"
|
||||
subentry_id = self.id_string_to_entry_id(tag_soup.attrs["id"])
|
||||
self.SUBENTRY_ID_TO_ENTRY_ID[subentry_id] = self.entry_id
|
||||
subentry = subentry_class(self.target, subentry_id)
|
||||
page = tag_soup.decode()
|
||||
subentry.set_page(page)
|
||||
subentry_list.append(subentry)
|
||||
tag_soup.decompose()
|
||||
tag_soup = soup.find(tag)
|
||||
return soup.decode()
|
||||
|
||||
@staticmethod
|
||||
def id_string_to_entry_id(id_string):
|
||||
parts = id_string.split("-")
|
||||
if len(parts) == 1:
|
||||
return (int(parts[0]), 0)
|
||||
elif len(parts) == 2:
|
||||
# subentries have a hexadecimal part
|
||||
return (int(parts[0]), int(parts[1], 16))
|
||||
else:
|
||||
raise Exception(f"Invalid entry ID: {id_string}")
|
||||
|
||||
@staticmethod
|
||||
def _delete_unused_nodes(soup):
|
||||
"""Remove extra markup elements that appear in the entry
|
||||
headword line which are not part of the entry headword"""
|
||||
unused_nodes = [
|
||||
"表音表記", "表外音訓マーク", "表外字マーク", "ルビG"
|
||||
]
|
||||
for name in unused_nodes:
|
||||
Soup.delete_soup_nodes(soup, name)
|
||||
|
||||
@staticmethod
|
||||
def _clean_expression(expression):
|
||||
for x in ["〈", "〉", "{", "}", "…", " "]:
|
||||
expression = expression.replace(x, "")
|
||||
return expression
|
||||
|
||||
@staticmethod
|
||||
def _fill_alts(soup):
|
||||
for el in soup.find_all(["親見出仮名", "親見出表記"]):
|
||||
el.string = el.attrs["alt"]
|
||||
for gaiji in soup.find_all("外字"):
|
||||
gaiji.string = gaiji.img.attrs["alt"]
|
||||
|
||||
|
||||
class Smk8Entry(_BaseSmk8Entry):
|
||||
def __init__(self, target, page_id):
|
||||
entry_id = (page_id, 0)
|
||||
super().__init__(target, entry_id)
|
||||
|
||||
def set_page(self, page):
|
||||
page = preprocess_page(page)
|
||||
super().set_page(page)
|
||||
|
||||
def _get_headwords(self):
|
||||
soup = self.get_page_soup()
|
||||
self._delete_unused_nodes(soup)
|
||||
self._fill_alts(soup)
|
||||
reading = self._find_reading(soup)
|
||||
expressions = []
|
||||
if soup.find("見出部").find("標準表記") is None:
|
||||
expressions.append(reading)
|
||||
for expression in self._find_expressions(soup):
|
||||
if expression not in expressions:
|
||||
expressions.append(expression)
|
||||
headwords = {reading: expressions}
|
||||
return headwords
|
||||
|
||||
|
||||
class Smk8ChildEntry(_BaseSmk8Entry):
|
||||
def _get_headwords(self):
|
||||
soup = self.get_page_soup()
|
||||
self._delete_unused_nodes(soup)
|
||||
self._fill_alts(soup)
|
||||
reading = self._find_reading(soup)
|
||||
expressions = []
|
||||
if soup.find("子見出部").find("標準表記") is None:
|
||||
expressions.append(reading)
|
||||
for expression in self._find_expressions(soup):
|
||||
if expression not in expressions:
|
||||
expressions.append(expression)
|
||||
headwords = {reading: expressions}
|
||||
return headwords
|
||||
|
||||
|
||||
class Smk8PhraseEntry(_BaseSmk8Entry):
|
||||
def __init__(self, target, entry_id):
|
||||
super().__init__(target, entry_id)
|
||||
self.__phrase_readings = load_smk8_phrase_readings()
|
||||
|
||||
def get_part_of_speech_tags(self):
|
||||
# phrases do not contain these tags
|
||||
return []
|
||||
|
||||
def _get_headwords(self):
|
||||
soup = self.get_page_soup()
|
||||
headwords = {}
|
||||
expressions = self._find_expressions(soup)
|
||||
readings = self._find_readings()
|
||||
for idx, expression in enumerate(expressions):
|
||||
reading = readings[idx]
|
||||
if reading in headwords:
|
||||
headwords[reading].append(expression)
|
||||
else:
|
||||
headwords[reading] = [expression]
|
||||
return headwords
|
||||
|
||||
def _find_expressions(self, soup):
|
||||
self._delete_unused_nodes(soup)
|
||||
self._fill_alts(soup)
|
||||
text = soup.find("標準表記").text
|
||||
text = self._clean_expression(text)
|
||||
alternatives = Expressions.expand_smk_alternatives(text)
|
||||
expressions = []
|
||||
for alt in alternatives:
|
||||
for exp in Expressions.expand_abbreviation(alt):
|
||||
expressions.append(exp)
|
||||
return expressions
|
||||
|
||||
def _find_readings(self):
|
||||
text = self.__phrase_readings[self.entry_id]
|
||||
alternatives = Expressions.expand_smk_alternatives(text)
|
||||
readings = []
|
||||
for alt in alternatives:
|
||||
for reading in Expressions.expand_abbreviation(alt):
|
||||
readings.append(reading)
|
||||
return readings
|
||||
|
||||
|
||||
class Smk8KanjiEntry(_BaseSmk8Entry):
|
||||
def _get_headwords(self):
|
||||
soup = self.get_page_soup()
|
||||
self._delete_unused_nodes(soup)
|
||||
self._fill_alts(soup)
|
||||
reading = self.__get_parent_reading()
|
||||
expressions = self._find_expressions(soup)
|
||||
headwords = {reading: expressions}
|
||||
return headwords
|
||||
|
||||
def __get_parent_reading(self):
|
||||
parent_id = self.SUBENTRY_ID_TO_ENTRY_ID[self.entry_id]
|
||||
parent = self.ID_TO_ENTRY[parent_id]
|
||||
reading = parent.get_first_reading()
|
||||
return reading
|
|
@@ -1,73 +0,0 @@
import bot.soup as Soup
import bot.entries.base.expressions as Expressions
from bot.entries.base.sanseido_entry import SanseidoEntry


class BaseEntry(SanseidoEntry):
    def __init__(self, target, entry_id):
        super().__init__(target, entry_id)
        self.children = []
        self.phrases = []
        self.kanjis = []

    def get_part_of_speech_tags(self):
        if self._part_of_speech_tags is not None:
            return self._part_of_speech_tags
        self._part_of_speech_tags = []
        soup = self.get_page_soup()
        headword_info = soup.find("見出要素")
        if headword_info is None:
            return self._part_of_speech_tags
        for tag in headword_info.find_all("品詞M"):
            if tag.text not in self._part_of_speech_tags:
                self._part_of_speech_tags.append(tag.text)
        return self._part_of_speech_tags

    def _find_reading(self, soup):
        midasi_kana = soup.find("見出仮名")
        reading = midasi_kana.text
        for x in [" ", "・"]:
            reading = reading.replace(x, "")
        return reading

    def _find_expressions(self, soup):
        clean_expressions = []
        for expression in soup.find_all("標準表記"):
            clean_expression = self._clean_expression(expression.text)
            clean_expressions.append(clean_expression)
        expressions = Expressions.expand_abbreviation_list(clean_expressions)
        return expressions

    def _get_subentry_parameters(self):
        from bot.entries.smk8.child_entry import ChildEntry
        from bot.entries.smk8.phrase_entry import PhraseEntry
        from bot.entries.smk8.kanji_entry import KanjiEntry
        subentry_parameters = [
            [ChildEntry, ["子項目F", "子項目"], self.children],
            [PhraseEntry, ["句項目F", "句項目"], self.phrases],
            [KanjiEntry, ["造語成分項目"], self.kanjis],
        ]
        return subentry_parameters

    @staticmethod
    def _delete_unused_nodes(soup):
        """Remove extra markup elements that appear in the entry
        headword line which are not part of the entry headword"""
        unused_nodes = [
            "表音表記", "表外音訓マーク", "表外字マーク", "ルビG"
        ]
        for name in unused_nodes:
            Soup.delete_soup_nodes(soup, name)

    @staticmethod
    def _clean_expression(expression):
        for x in ["〈", "〉", "{", "}", "…", " "]:
            expression = expression.replace(x, "")
        return expression

    @staticmethod
    def _fill_alts(soup):
        for elm in soup.find_all(["親見出仮名", "親見出表記"]):
            elm.string = elm.attrs["alt"]
        for gaiji in soup.find_all("外字"):
            gaiji.string = gaiji.img.attrs["alt"]
@@ -1,17 +0,0 @@
from bot.entries.smk8.base_entry import BaseEntry


class ChildEntry(BaseEntry):
    def _get_headwords(self):
        soup = self.get_page_soup()
        self._delete_unused_nodes(soup)
        self._fill_alts(soup)
        reading = self._find_reading(soup)
        expressions = []
        if soup.find("子見出部").find("標準表記") is None:
            expressions.append(reading)
        for expression in self._find_expressions(soup):
            if expression not in expressions:
                expressions.append(expression)
        headwords = {reading: expressions}
        return headwords
@@ -1,26 +0,0 @@
from bot.entries.smk8.base_entry import BaseEntry
from bot.entries.smk8.preprocess import preprocess_page


class Entry(BaseEntry):
    def __init__(self, target, page_id):
        entry_id = (page_id, 0)
        super().__init__(target, entry_id)

    def set_page(self, page):
        page = preprocess_page(page)
        super().set_page(page)

    def _get_headwords(self):
        soup = self.get_page_soup()
        self._delete_unused_nodes(soup)
        self._fill_alts(soup)
        reading = self._find_reading(soup)
        expressions = []
        if soup.find("見出部").find("標準表記") is None:
            expressions.append(reading)
        for expression in self._find_expressions(soup):
            if expression not in expressions:
                expressions.append(expression)
        headwords = {reading: expressions}
        return headwords
@@ -1,22 +0,0 @@
from bot.entries.smk8.base_entry import BaseEntry


class KanjiEntry(BaseEntry):
    def get_part_of_speech_tags(self):
        # kanji entries do not contain these tags
        return []

    def _get_headwords(self):
        soup = self.get_page_soup()
        self._delete_unused_nodes(soup)
        self._fill_alts(soup)
        reading = self.__get_parent_reading()
        expressions = self._find_expressions(soup)
        headwords = {reading: expressions}
        return headwords

    def __get_parent_reading(self):
        parent_id = self.SUBENTRY_ID_TO_ENTRY_ID[self.entry_id]
        parent = self.ID_TO_ENTRY[parent_id]
        reading = parent.get_first_reading()
        return reading
@@ -1,64 +0,0 @@
import re

import bot.entries.base.expressions as Expressions
from bot.data import load_phrase_readings
from bot.entries.smk8.base_entry import BaseEntry


class PhraseEntry(BaseEntry):
    def __init__(self, target, entry_id):
        super().__init__(target, entry_id)
        self.__phrase_readings = load_phrase_readings(self.target)

    def get_part_of_speech_tags(self):
        # phrase entries do not contain these tags
        return []

    def _get_headwords(self):
        soup = self.get_page_soup()
        headwords = {}
        expressions = self._find_expressions(soup)
        readings = self._find_readings()
        for idx, expression in enumerate(expressions):
            reading = readings[idx]
            if reading in headwords:
                headwords[reading].append(expression)
            else:
                headwords[reading] = [expression]
        return headwords

    def _find_expressions(self, soup):
        self._delete_unused_nodes(soup)
        self._fill_alts(soup)
        text = soup.find("標準表記").text
        text = self._clean_expression(text)
        alternatives = parse_phrase(text)
        expressions = []
        for alt in alternatives:
            for exp in Expressions.expand_abbreviation(alt):
                expressions.append(exp)
        return expressions

    def _find_readings(self):
        text = self.__phrase_readings[self.entry_id]
        alternatives = parse_phrase(text)
        readings = []
        for alt in alternatives:
            for reading in Expressions.expand_abbreviation(alt):
                readings.append(reading)
        return readings


def parse_phrase(text):
    """Return a list of strings described by △ notation."""
    match = re.search(r"△([^(]+)(([^(]+))", text)
    if match is None:
        return [text]
    alt_parts = [match.group(1)]
    for alt_part in match.group(2).split("・"):
        alt_parts.append(alt_part)
    alts = []
    for alt_part in alt_parts:
        alt_exp = re.sub(r"△[^(]+([^(]+)", alt_part, text)
        alts.append(alt_exp)
    return alts
@@ -6,8 +6,8 @@ from bot.data import get_adobe_glyph

 __GAIJI = {
     "gaiji/5350.svg": "卐",
-    "gaiji/62cb.svg": "拋",
-    "gaiji/7be1.svg": "篡",
+    "gaiji/62cb.svg": "抛",
+    "gaiji/7be1.svg": "簒",
 }
@@ -1,37 +0,0 @@
import importlib


def new_crawler(target):
    module_path = f"bot.crawlers.{target.name.lower()}"
    module = importlib.import_module(module_path)
    return module.Crawler(target)


def new_entry(target, page_id):
    module_path = f"bot.entries.{target.name.lower()}.entry"
    module = importlib.import_module(module_path)
    return module.Entry(target, page_id)


def new_yomichan_exporter(target):
    module_path = f"bot.yomichan.exporters.{target.name.lower()}"
    module = importlib.import_module(module_path)
    return module.Exporter(target)


def new_yomichan_terminator(target):
    module_path = f"bot.yomichan.terms.{target.name.lower()}"
    module = importlib.import_module(module_path)
    return module.Terminator(target)


def new_mdict_exporter(target):
    module_path = f"bot.mdict.exporters.{target.name.lower()}"
    module = importlib.import_module(module_path)
    return module.Exporter(target)


def new_mdict_terminator(target):
    module_path = f"bot.mdict.terms.{target.name.lower()}"
    module = importlib.import_module(module_path)
    return module.Terminator(target)
@@ -1,18 +0,0 @@
from bot.mdict.exporters.base.exporter import BaseExporter


class JitenonExporter(BaseExporter):
    def _get_revision(self, entries):
        modified_date = None
        for entry in entries:
            if modified_date is None or entry.modified_date > modified_date:
                modified_date = entry.modified_date
        revision = modified_date.strftime("%Y年%m月%d日閲覧")
        return revision

    def _get_attribution(self, entries):
        modified_date = None
        for entry in entries:
            if modified_date is None or entry.modified_date > modified_date:
                attribution = entry.attribution
        return attribution
@@ -1,8 +0,0 @@
from datetime import datetime
from bot.mdict.exporters.base.exporter import BaseExporter


class MonokakidoExporter(BaseExporter):
    def _get_revision(self, entries):
        timestamp = datetime.now().strftime("%Y年%m月%d日作成")
        return timestamp
@@ -1,6 +0,0 @@
from bot.mdict.exporters.base.monokakido import MonokakidoExporter


class Exporter(MonokakidoExporter):
    def _get_attribution(self, entries):
        return "© Sanseido Co., LTD. 2019"
@ -1,19 +1,21 @@
|
|||
# pylint: disable=too-few-public-methods
|
||||
|
||||
import subprocess
|
||||
import os
|
||||
import shutil
|
||||
import subprocess
|
||||
from abc import ABC, abstractmethod
|
||||
from pathlib import Path
|
||||
|
||||
from datetime import datetime
|
||||
from platformdirs import user_documents_dir, user_cache_dir
|
||||
|
||||
from bot.time import timestamp
|
||||
from bot.factory import new_mdict_terminator
|
||||
from bot.targets import Targets
|
||||
from bot.mdict.terms.factory import new_terminator
|
||||
|
||||
|
||||
class BaseExporter(ABC):
|
||||
class Exporter(ABC):
|
||||
def __init__(self, target):
|
||||
self._target = target
|
||||
self._terminator = new_mdict_terminator(target)
|
||||
self._terminator = new_terminator(target)
|
||||
self._build_dir = None
|
||||
self._build_media_dir = None
|
||||
self._description_file = None
|
||||
|
@ -22,10 +24,11 @@ class BaseExporter(ABC):
|
|||
def export(self, entries, media_dir, icon_file):
|
||||
self._init_build_media_dir(media_dir)
|
||||
self._init_description_file(entries)
|
||||
self._write_mdx_file(entries)
|
||||
terms = self._get_terms(entries)
|
||||
print(f"Exporting {len(terms)} Mdict keys...")
|
||||
self._write_mdx_file(terms)
|
||||
self._write_mdd_file()
|
||||
self._write_icon_file(icon_file)
|
||||
self._write_css_file()
|
||||
self._rm_build_dir()
|
||||
|
||||
def _get_build_dir(self):
|
||||
|
@ -33,7 +36,7 @@ class BaseExporter(ABC):
|
|||
return self._build_dir
|
||||
cache_dir = user_cache_dir("jitenbot")
|
||||
build_directory = os.path.join(cache_dir, "mdict_build")
|
||||
print(f"{timestamp()} Initializing build directory `{build_directory}`")
|
||||
print(f"Initializing build directory `{build_directory}`")
|
||||
if Path(build_directory).is_dir():
|
||||
shutil.rmtree(build_directory)
|
||||
os.makedirs(build_directory)
|
||||
|
@ -44,7 +47,7 @@ class BaseExporter(ABC):
|
|||
build_dir = self._get_build_dir()
|
||||
build_media_dir = os.path.join(build_dir, self._target.value)
|
||||
if media_dir is not None:
|
||||
print(f"{timestamp()} Copying media files to build directory...")
|
||||
print("Copying media files to build directory...")
|
||||
shutil.copytree(media_dir, build_media_dir)
|
||||
else:
|
||||
os.makedirs(build_media_dir)
|
||||
|
@ -54,23 +57,34 @@ class BaseExporter(ABC):
|
|||
self._build_media_dir = build_media_dir
|
||||
|
||||
def _init_description_file(self, entries):
|
||||
description_template_file = self._get_description_template_file()
|
||||
with open(description_template_file, "r", encoding="utf8") as f:
|
||||
filename = f"{self._target.value}.mdx.description.html"
|
||||
original_file = os.path.join(
|
||||
"data", "mdict", "description", filename)
|
||||
with open(original_file, "r", encoding="utf8") as f:
|
||||
description = f.read()
|
||||
description = description.replace(
|
||||
"{{revision}}", self._get_revision(entries))
|
||||
description = description.replace(
|
||||
"{{attribution}}", self._get_attribution(entries))
|
||||
build_dir = self._get_build_dir()
|
||||
description_file = os.path.join(
|
||||
build_dir, f"{self._target.value}.mdx.description.html")
|
||||
description_file = os.path.join(build_dir, filename)
|
||||
with open(description_file, "w", encoding="utf8") as f:
|
||||
f.write(description)
|
||||
self._description_file = description_file
|
||||
|
||||
def _write_mdx_file(self, entries):
|
||||
terms = self._get_terms(entries)
|
||||
print(f"{timestamp()} Exporting {len(terms)} Mdict keys...")
|
||||
def _get_terms(self, entries):
|
||||
terms = []
|
||||
entries_len = len(entries)
|
||||
for idx, entry in enumerate(entries):
|
||||
update = f"Creating Mdict terms for entry {idx+1}/{entries_len}"
|
||||
print(update, end='\r', flush=True)
|
||||
new_terms = self._terminator.make_terms(entry)
|
||||
for term in new_terms:
|
||||
terms.append(term)
|
||||
print()
|
||||
return terms
|
||||
|
||||
def _write_mdx_file(self, terms):
|
||||
out_dir = self._get_out_dir()
|
||||
out_file = os.path.join(out_dir, f"{self._target.value}.mdx")
|
||||
params = [
|
||||
|
@ -82,18 +96,6 @@ class BaseExporter(ABC):
|
|||
]
|
||||
subprocess.run(params, check=True)
|
||||
|
||||
def _get_terms(self, entries):
|
||||
terms = []
|
||||
entries_len = len(entries)
|
||||
for idx, entry in enumerate(entries):
|
||||
update = f"\tCreating MDict terms for entry {idx+1}/{entries_len}"
|
||||
print(update, end='\r', flush=True)
|
||||
new_terms = self._terminator.make_terms(entry)
|
||||
for term in new_terms:
|
||||
terms.append(term)
|
||||
print()
|
||||
return terms
|
||||
|
||||
def _write_mdd_file(self):
|
||||
out_dir = self._get_out_dir()
|
||||
out_file = os.path.join(out_dir, f"{self._target.value}.mdd")
|
||||
|
@ -107,7 +109,7 @@ class BaseExporter(ABC):
|
|||
subprocess.run(params, check=True)
|
||||
|
||||
def _write_icon_file(self, icon_file):
|
||||
premade_icon_file = self._get_premade_icon_file()
|
||||
premade_icon_file = f"data/mdict/icon/{self._target.value}.png"
|
||||
out_dir = self._get_out_dir()
|
||||
out_file = os.path.join(out_dir, f"{self._target.value}.png")
|
||||
if icon_file is not None and Path(icon_file).is_file():
|
||||
|
@ -115,17 +117,12 @@ class BaseExporter(ABC):
|
|||
elif Path(premade_icon_file).is_file():
|
||||
shutil.copy(premade_icon_file, out_file)
|
||||
|
||||
def _write_css_file(self):
|
||||
css_file = self._get_css_file()
|
||||
out_dir = self._get_out_dir()
|
||||
shutil.copy(css_file, out_dir)
|
||||
|
||||
def _get_out_dir(self):
|
||||
if self._out_dir is not None:
|
||||
return self._out_dir
|
||||
out_dir = os.path.join(
|
||||
user_documents_dir(), "jitenbot", "mdict", self._target.value)
|
||||
print(f"{timestamp()} Initializing output directory `{out_dir}`")
|
||||
print(f"Initializing output directory `{out_dir}`")
|
||||
if Path(out_dir).is_dir():
|
||||
shutil.rmtree(out_dir)
|
||||
os.makedirs(out_dir)
|
||||
|
@ -151,24 +148,59 @@ class BaseExporter(ABC):
|
|||
"data", "mdict", "css",
|
||||
f"{self._target.value}.css")
|
||||
|
||||
def _get_premade_icon_file(self):
|
||||
return os.path.join(
|
||||
"data", "mdict", "icon",
|
||||
f"{self._target.value}.png")
|
||||
|
||||
def _get_description_template_file(self):
|
||||
return os.path.join(
|
||||
"data", "mdict", "description",
|
||||
f"{self._target.value}.mdx.description.html")
|
||||
|
||||
def _rm_build_dir(self):
|
||||
build_dir = self._get_build_dir()
|
||||
shutil.rmtree(build_dir)
|
||||
|
||||
@abstractmethod
|
||||
def _get_revision(self, entries):
|
||||
raise NotImplementedError
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def _get_attribution(self, entries):
|
||||
raise NotImplementedError
|
||||
pass
|
||||
|
||||
|
||||
class _JitenonExporter(Exporter):
|
||||
def _get_revision(self, entries):
|
||||
modified_date = None
|
||||
for entry in entries:
|
||||
if modified_date is None or entry.modified_date > modified_date:
|
||||
modified_date = entry.modified_date
|
||||
revision = modified_date.strftime("%Y年%m月%d日閲覧")
|
||||
return revision
|
||||
|
||||
def _get_attribution(self, entries):
|
||||
modified_date = None
|
||||
for entry in entries:
|
||||
if modified_date is None or entry.modified_date > modified_date:
|
||||
attribution = entry.attribution
|
||||
return attribution
|
||||
|
||||
|
||||
class JitenonKokugoExporter(_JitenonExporter):
|
||||
pass
|
||||
|
||||
|
||||
class JitenonYojiExporter(_JitenonExporter):
|
||||
pass
|
||||
|
||||
|
||||
class JitenonKotowazaExporter(_JitenonExporter):
|
||||
pass
|
||||
|
||||
|
||||
class _MonokakidoExporter(Exporter):
|
||||
def _get_revision(self, entries):
|
||||
timestamp = datetime.now().strftime("%Y年%m月%d日作成")
|
||||
return timestamp
|
||||
|
||||
|
||||
class Smk8Exporter(_MonokakidoExporter):
|
||||
def _get_attribution(self, entries):
|
||||
return "© Sanseido Co., LTD. 2020"
|
||||
|
||||
|
||||
class Daijirin2Exporter(_MonokakidoExporter):
|
||||
def _get_attribution(self, entries):
|
||||
return "© Sanseido Co., LTD. 2019"
|
18 bot/mdict/exporters/factory.py Normal file
@@ -0,0 +1,18 @@
from bot.targets import Targets

from bot.mdict.exporters.export import JitenonKokugoExporter
from bot.mdict.exporters.export import JitenonYojiExporter
from bot.mdict.exporters.export import JitenonKotowazaExporter
from bot.mdict.exporters.export import Smk8Exporter
from bot.mdict.exporters.export import Daijirin2Exporter


def new_mdict_exporter(target):
    exporter_map = {
        Targets.JITENON_KOKUGO: JitenonKokugoExporter,
        Targets.JITENON_YOJI: JitenonYojiExporter,
        Targets.JITENON_KOTOWAZA: JitenonKotowazaExporter,
        Targets.SMK8: Smk8Exporter,
        Targets.DAIJIRIN2: Daijirin2Exporter,
    }
    return exporter_map[target](target)
@@ -1,5 +0,0 @@
from bot.mdict.exporters.base.jitenon import JitenonExporter


class Exporter(JitenonExporter):
    pass
@@ -1,5 +0,0 @@
from bot.mdict.exporters.base.jitenon import JitenonExporter


class Exporter(JitenonExporter):
    pass
@@ -1,5 +0,0 @@
from bot.mdict.exporters.base.jitenon import JitenonExporter


class Exporter(JitenonExporter):
    pass
@@ -1,6 +0,0 @@
from bot.mdict.exporters.base.monokakido import MonokakidoExporter


class Exporter(MonokakidoExporter):
    def _get_attribution(self, entries):
        return "© Sanseido Co., LTD. 2021"
@@ -1,6 +0,0 @@
from bot.mdict.exporters.base.monokakido import MonokakidoExporter


class Exporter(MonokakidoExporter):
    def _get_attribution(self, entries):
        return "© Sanseido Co., LTD. 2020"
@ -1,137 +0,0 @@
|
|||
import re
|
||||
from bs4 import BeautifulSoup
|
||||
from bot.data import load_mdict_name_conversion
|
||||
from bot.name_conversion import convert_names
|
||||
|
||||
|
||||
def make_glossary(entry, media_dir):
|
||||
soup = entry.get_page_soup()
|
||||
__reposition_marks(soup)
|
||||
__remove_appendix_links(soup)
|
||||
__convert_images(soup)
|
||||
__remove_links_without_href(soup)
|
||||
__convert_links(soup, entry)
|
||||
__add_parent_link(soup, entry)
|
||||
__add_homophone_links(soup, entry)
|
||||
|
||||
name_conversion = load_mdict_name_conversion(entry.target)
|
||||
convert_names(soup, name_conversion)
|
||||
|
||||
glossary = soup.span.decode()
|
||||
return glossary
|
||||
|
||||
|
||||
def __reposition_marks(soup):
|
||||
"""These 表外字マーク symbols will be converted to rubies later, so they need to
|
||||
be positioned after the corresponding text in order to appear correctly"""
|
||||
for elm in soup.find_all("表外字"):
|
||||
mark = elm.find("表外字マーク")
|
||||
elm.append(mark)
|
||||
for elm in soup.find_all("表外音訓"):
|
||||
mark = elm.find("表外音訓マーク")
|
||||
elm.append(mark)
|
||||
|
||||
|
||||
def __remove_appendix_links(soup):
|
||||
"""This info would be useful and nice to have, but jitenbot currently
|
||||
isn't designed to fetch and process these appendix files. It probably
|
||||
wouldn't be possible to include them in Yomichan, but it would definitely
|
||||
be possible for Mdict."""
|
||||
for elm in soup.find_all("a"):
|
||||
if not elm.has_attr("href"):
|
||||
continue
|
||||
if elm.attrs["href"].startswith("appendix"):
|
||||
elm.attrs["data-name"] = "a"
|
||||
elm.attrs["data-href"] = elm.attrs["href"]
|
||||
elm.name = "span"
|
||||
del elm.attrs["href"]
|
||||
|
||||
|
||||
def __convert_images(soup):
|
||||
conversions = [
|
||||
["svg-logo/重要語.svg", "*"],
|
||||
["svg-logo/最重要語.svg", "**"],
|
||||
["svg-logo/一般常識語.svg", "☆☆"],
|
||||
["svg-logo/追い込み.svg", ""],
|
||||
["svg-special/区切り線.svg", "|"],
|
||||
["svg-accent/平板.svg", "⎺"],
|
||||
["svg-accent/アクセント.svg", "⌝"],
|
||||
["svg-logo/アク.svg", "アク"],
|
||||
["svg-logo/丁寧.svg", "丁寧"],
|
||||
["svg-logo/可能.svg", "可能"],
|
||||
["svg-logo/尊敬.svg", "尊敬"],
|
||||
["svg-logo/接尾.svg", "接尾"],
|
||||
["svg-logo/接頭.svg", "接頭"],
|
||||
["svg-logo/表記.svg", "表記"],
|
||||
["svg-logo/謙譲.svg", "謙譲"],
|
||||
["svg-logo/区別.svg", "区別"],
|
||||
["svg-logo/由来.svg", "由来"],
|
||||
]
|
||||
for conversion in conversions:
|
||||
filename, text = conversion
|
||||
for elm in soup.find_all("img", attrs={"src": filename}):
|
||||
elm.attrs["data-name"] = elm.name
|
||||
elm.attrs["data-src"] = elm.attrs["src"]
|
||||
elm.name = "span"
|
||||
elm.string = text
|
||||
del elm.attrs["src"]
|
||||
|
||||
|
||||
def __remove_links_without_href(soup):
|
||||
for elm in soup.find_all("a"):
|
||||
if elm.has_attr("href"):
|
||||
continue
|
||||
elm.attrs["data-name"] = elm.name
|
||||
elm.name = "span"
|
||||
|
||||
|
||||
def __convert_links(soup, entry):
|
||||
for elm in soup.find_all("a"):
|
||||
href = elm.attrs["href"].split(" ")[0]
|
||||
if re.match(r"^#?[0-9]+(?:-[0-9A-F]{4})?$", href):
|
||||
href = href.removeprefix("#")
|
||||
ref_entry_id = entry.id_string_to_entry_id(href)
|
||||
if ref_entry_id in entry.ID_TO_ENTRY:
|
||||
ref_entry = entry.ID_TO_ENTRY[ref_entry_id]
|
||||
else:
|
||||
ref_entry = entry.ID_TO_ENTRY[(ref_entry_id[0], 0)]
|
||||
gid = ref_entry.get_global_identifier()
|
||||
elm.attrs["href"] = f"entry://{gid}"
|
||||
elif re.match(r"^entry:", href):
|
||||
pass
|
||||
elif re.match(r"^https?:[\w\W]*", href):
|
||||
pass
|
||||
else:
|
||||
raise Exception(f"Invalid href format: {href}")
|
||||
|
||||
|
||||
def __add_parent_link(soup, entry):
|
||||
elm = soup.find("親見出相当部")
|
||||
if elm is not None:
|
||||
parent_entry = entry.get_parent()
|
||||
gid = parent_entry.get_global_identifier()
|
||||
elm.attrs["href"] = f"entry://{gid}"
|
||||
elm.attrs["data-name"] = elm.name
|
||||
elm.name = "a"
|
||||
|
||||
|
||||
def __add_homophone_links(soup, entry):
|
||||
forward_link = ["←", entry.entry_id[0] + 1]
|
||||
backward_link = ["→", entry.entry_id[0] - 1]
|
||||
homophone_info_list = [
|
||||
["svg-logo/homophone1.svg", [forward_link]],
|
||||
["svg-logo/homophone2.svg", [forward_link, backward_link]],
|
||||
["svg-logo/homophone3.svg", [backward_link]],
|
||||
]
|
||||
for homophone_info in homophone_info_list:
|
||||
filename, link_info = homophone_info
|
||||
for elm in soup.find_all("img", attrs={"src": filename}):
|
||||
for info in link_info:
|
||||
text, link_id = info
|
||||
link_entry = entry.ID_TO_ENTRY[(link_id, 0)]
|
||||
gid = link_entry.get_global_identifier()
|
||||
link = BeautifulSoup("<a/>", "xml").a
|
||||
link.string = text
|
||||
link.attrs["href"] = f"entry://{gid}"
|
||||
elm.append(link)
|
||||
elm.unwrap()
|
|
@@ -1,20 +0,0 @@
from bot.mdict.terms.base.terminator import BaseTerminator


class JitenonTerminator(BaseTerminator):
    def __init__(self, target):
        super().__init__(target)
        self._glossary_maker = None

    def _glossary(self, entry):
        if entry.entry_id in self._glossary_cache:
            return self._glossary_cache[entry.entry_id]
        glossary = self._glossary_maker.make_glossary(entry, self._media_dir)
        self._glossary_cache[entry.entry_id] = glossary
        return glossary

    def _link_glossary_parameters(self, entry):
        return []

    def _subentry_lists(self, entry):
        return []
@@ -1,8 +1,8 @@
-from bot.mdict.terms.base.terminator import BaseTerminator
+from bot.mdict.terms.terminator import Terminator
 from bot.mdict.glossary.daijirin2 import make_glossary


-class Terminator(BaseTerminator):
+class Daijirin2Terminator(Terminator):
     def _glossary(self, entry):
         if entry.entry_id in self._glossary_cache:
             return self._glossary_cache[entry.entry_id]
18 bot/mdict/terms/factory.py Normal file
@@ -0,0 +1,18 @@
from bot.targets import Targets

from bot.mdict.terms.jitenon import JitenonKokugoTerminator
from bot.mdict.terms.jitenon import JitenonYojiTerminator
from bot.mdict.terms.jitenon import JitenonKotowazaTerminator
from bot.mdict.terms.smk8 import Smk8Terminator
from bot.mdict.terms.daijirin2 import Daijirin2Terminator


def new_terminator(target):
    terminator_map = {
        Targets.JITENON_KOKUGO: JitenonKokugoTerminator,
        Targets.JITENON_YOJI: JitenonYojiTerminator,
        Targets.JITENON_KOTOWAZA: JitenonKotowazaTerminator,
        Targets.SMK8: Smk8Terminator,
        Targets.DAIJIRIN2: Daijirin2Terminator,
    }
    return terminator_map[target](target)
42 bot/mdict/terms/jitenon.py Normal file
@@ -0,0 +1,42 @@
from bot.mdict.terms.terminator import Terminator

from bot.mdict.glossary.jitenon import JitenonKokugoGlossary
from bot.mdict.glossary.jitenon import JitenonYojiGlossary
from bot.mdict.glossary.jitenon import JitenonKotowazaGlossary


class JitenonTerminator(Terminator):
    def __init__(self, target):
        super().__init__(target)
        self._glossary_maker = None

    def _glossary(self, entry):
        if entry.entry_id in self._glossary_cache:
            return self._glossary_cache[entry.entry_id]
        glossary = self._glossary_maker.make_glossary(entry, self._media_dir)
        self._glossary_cache[entry.entry_id] = glossary
        return glossary

    def _link_glossary_parameters(self, entry):
        return []

    def _subentry_lists(self, entry):
        return []


class JitenonKokugoTerminator(JitenonTerminator):
    def __init__(self, target):
        super().__init__(target)
        self._glossary_maker = JitenonKokugoGlossary()


class JitenonYojiTerminator(JitenonTerminator):
    def __init__(self, target):
        super().__init__(target)
        self._glossary_maker = JitenonYojiGlossary()


class JitenonKotowazaTerminator(JitenonTerminator):
    def __init__(self, target):
        super().__init__(target)
        self._glossary_maker = JitenonKotowazaGlossary()
@@ -1,8 +0,0 @@
from bot.mdict.terms.base.jitenon import JitenonTerminator
from bot.mdict.glossary.jitenon import JitenonKokugoGlossary


class Terminator(JitenonTerminator):
    def __init__(self, target):
        super().__init__(target)
        self._glossary_maker = JitenonKokugoGlossary()
@@ -1,8 +0,0 @@
from bot.mdict.terms.base.jitenon import JitenonTerminator
from bot.mdict.glossary.jitenon import JitenonKotowazaGlossary


class Terminator(JitenonTerminator):
    def __init__(self, target):
        super().__init__(target)
        self._glossary_maker = JitenonKotowazaGlossary()
@@ -1,8 +0,0 @@
from bot.mdict.terms.base.jitenon import JitenonTerminator
from bot.mdict.glossary.jitenon import JitenonYojiGlossary


class Terminator(JitenonTerminator):
    def __init__(self, target):
        super().__init__(target)
        self._glossary_maker = JitenonYojiGlossary()
@@ -1,23 +0,0 @@
from bot.mdict.terms.base.terminator import BaseTerminator
from bot.mdict.glossary.sankoku8 import make_glossary


class Terminator(BaseTerminator):
    def _glossary(self, entry):
        if entry.entry_id in self._glossary_cache:
            return self._glossary_cache[entry.entry_id]
        glossary = make_glossary(entry, self._media_dir)
        self._glossary_cache[entry.entry_id] = glossary
        return glossary

    def _link_glossary_parameters(self, entry):
        return [
            [entry.children, "子項目"],
            [entry.phrases, "句項目"],
        ]

    def _subentry_lists(self, entry):
        return [
            entry.children,
            entry.phrases,
        ]
@@ -1,8 +1,8 @@
-from bot.mdict.terms.base.terminator import BaseTerminator
+from bot.mdict.terms.terminator import Terminator
 from bot.mdict.glossary.smk8 import make_glossary


-class Terminator(BaseTerminator):
+class Smk8Terminator(Terminator):
     def _glossary(self, entry):
         if entry.entry_id in self._glossary_cache:
             return self._glossary_cache[entry.entry_id]
@ -1,8 +1,7 @@
|
|||
import re
|
||||
from abc import abstractmethod, ABC
|
||||
|
||||
|
||||
class BaseTerminator(ABC):
|
||||
class Terminator(ABC):
|
||||
def __init__(self, target):
|
||||
self._target = target
|
||||
self._glossary_cache = {}
|
||||
|
@ -13,20 +12,35 @@ class BaseTerminator(ABC):
|
|||
|
||||
def make_terms(self, entry):
|
||||
gid = entry.get_global_identifier()
|
||||
glossary = self.__get_full_glossary(entry)
|
||||
glossary = self.__full_glossary(entry)
|
||||
terms = [[gid, glossary]]
|
||||
keys = self.__get_keys(entry)
|
||||
keys = set()
|
||||
headwords = entry.get_headwords()
|
||||
for reading, expressions in headwords.items():
|
||||
if len(expressions) == 0:
|
||||
keys.add(reading)
|
||||
for expression in expressions:
|
||||
if expression.strip() == "":
|
||||
keys.add(reading)
|
||||
continue
|
||||
keys.add(expression)
|
||||
if reading.strip() == "":
|
||||
continue
|
||||
if reading != expression:
|
||||
keys.add(f"{reading}【{expression}】")
|
||||
else:
|
||||
keys.add(reading)
|
||||
link = f"@@@LINK={gid}"
|
||||
for key in keys:
|
||||
if key.strip() != "":
|
||||
terms.append([key, link])
|
||||
for subentry_list in self._subentry_lists(entry):
|
||||
for subentry in subentry_list:
|
||||
for subentries in self._subentry_lists(entry):
|
||||
for subentry in subentries:
|
||||
for term in self.make_terms(subentry):
|
||||
terms.append(term)
|
||||
return terms
|
||||
|
||||
def __get_full_glossary(self, entry):
|
||||
def __full_glossary(self, entry):
|
||||
glossary = []
|
||||
style_link = f"<link rel='stylesheet' href='{self._target.value}.css' type='text/css'>"
|
||||
glossary.append(style_link)
|
||||
|
@ -46,38 +60,14 @@ class BaseTerminator(ABC):
|
|||
glossary.append(link_glossary)
|
||||
return "\n".join(glossary)
|
||||
|
||||
def __get_keys(self, entry):
|
||||
keys = set()
|
||||
headwords = entry.get_headwords()
|
||||
for reading, expressions in headwords.items():
|
||||
stripped_reading = reading.strip()
|
||||
keys.add(stripped_reading)
|
||||
if re.match(r"^[ぁ-ヿ、]+$", stripped_reading):
|
||||
kana_only_key = f"{stripped_reading}【∅】"
|
||||
else:
|
||||
kana_only_key = ""
|
||||
if len(expressions) == 0:
|
||||
keys.add(kana_only_key)
|
||||
for expression in expressions:
|
||||
stripped_expression = expression.strip()
|
||||
keys.add(stripped_expression)
|
||||
if stripped_expression == "":
|
||||
keys.add(kana_only_key)
|
||||
elif stripped_expression == stripped_reading:
|
||||
keys.add(kana_only_key)
|
||||
else:
|
||||
combo_key = f"{stripped_reading}【{stripped_expression}】"
|
||||
keys.add(combo_key)
|
||||
return keys
|
||||
|
||||
@abstractmethod
|
||||
def _glossary(self, entry):
|
||||
raise NotImplementedError
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def _link_glossary_parameters(self, entry):
|
||||
raise NotImplementedError
|
||||
pass
|
||||
|
||||
@abstractmethod
|
||||
def _subentry_lists(self, entry):
|
||||
raise NotImplementedError
|
||||
pass
|
|
@@ -7,4 +7,3 @@ class Targets(Enum):
     JITENON_KOTOWAZA = "jitenon-kotowaza"
     SMK8 = "smk8"
     DAIJIRIN2 = "daijirin2"
-    SANKOKU8 = "sankoku8"
@@ -1,5 +0,0 @@
import time


def timestamp():
    return time.strftime('%X')
@@ -1,18 +0,0 @@
from bot.yomichan.exporters.base.exporter import BaseExporter


class JitenonExporter(BaseExporter):
    def _get_revision(self, entries):
        modified_date = None
        for entry in entries:
            if modified_date is None or entry.modified_date > modified_date:
                modified_date = entry.modified_date
        revision = f"{self._target.value};{modified_date}"
        return revision

    def _get_attribution(self, entries):
        modified_date = None
        for entry in entries:
            if modified_date is None or entry.modified_date > modified_date:
                attribution = entry.attribution
        return attribution
@@ -1,8 +0,0 @@
from datetime import datetime
from bot.yomichan.exporters.base.exporter import BaseExporter


class MonokakidoExporter(BaseExporter):
    def _get_revision(self, entries):
        timestamp = datetime.now().strftime("%Y-%m-%d")
        return f"{self._target.value};{timestamp}"
@@ -1,6 +0,0 @@
from bot.yomichan.exporters.base.monokakido import MonokakidoExporter


class Exporter(MonokakidoExporter):
    def _get_attribution(self, entries):
        return "© Sanseido Co., LTD. 2019"
```diff
@@ -1,27 +1,25 @@
# pylint: disable=too-few-public-methods

import json
import os
import shutil
-import copy
from pathlib import Path
from datetime import datetime
from abc import ABC, abstractmethod

-import fastjsonschema
from platformdirs import user_documents_dir, user_cache_dir

-from bot.time import timestamp
from bot.data import load_yomichan_metadata
-from bot.data import load_yomichan_term_schema
-from bot.factory import new_yomichan_terminator
+from bot.yomichan.terms.factory import new_terminator


-class BaseExporter(ABC):
+class Exporter(ABC):
    def __init__(self, target):
        self._target = target
-        self._terminator = new_yomichan_terminator(target)
+        self._terminator = new_terminator(target)
        self._build_dir = None
        self._terms_per_file = 2000

-    def export(self, entries, image_dir, validate):
+    def export(self, entries, image_dir):
        self.__init_build_image_dir(image_dir)
        meta = load_yomichan_metadata()
        index = meta[self._target.value]["index"]
@@ -29,45 +27,34 @@ class BaseExporter(ABC):
        index["attribution"] = self._get_attribution(entries)
        tags = meta[self._target.value]["tags"]
        terms = self.__get_terms(entries)
-        if validate:
-            self.__validate_terms(terms)
        self.__make_dictionary(terms, index, tags)

    @abstractmethod
    def _get_revision(self, entries):
-        raise NotImplementedError
+        pass

    @abstractmethod
    def _get_attribution(self, entries):
-        raise NotImplementedError
+        pass

    def _get_build_dir(self):
        if self._build_dir is not None:
            return self._build_dir
        cache_dir = user_cache_dir("jitenbot")
        build_directory = os.path.join(cache_dir, "yomichan_build")
-        print(f"{timestamp()} Initializing build directory `{build_directory}`")
+        print(f"Initializing build directory `{build_directory}`")
        if Path(build_directory).is_dir():
            shutil.rmtree(build_directory)
        os.makedirs(build_directory)
        self._build_dir = build_directory
        return self._build_dir

-    def __get_invalid_term_dir(self):
-        cache_dir = user_cache_dir("jitenbot")
-        log_dir = os.path.join(cache_dir, "invalid_yomichan_terms")
-        if Path(log_dir).is_dir():
-            shutil.rmtree(log_dir)
-        os.makedirs(log_dir)
-        return log_dir
-
    def __init_build_image_dir(self, image_dir):
        build_dir = self._get_build_dir()
        build_img_dir = os.path.join(build_dir, self._target.value)
        if image_dir is not None:
-            print(f"{timestamp()} Copying media files to build directory...")
+            print("Copying media files to build directory...")
            shutil.copytree(image_dir, build_img_dir)
-            print(f"{timestamp()} Finished copying files")
        else:
            os.makedirs(build_img_dir)
        self._terminator.set_image_dir(build_img_dir)
@@ -76,7 +63,7 @@ class BaseExporter(ABC):
        terms = []
        entries_len = len(entries)
        for idx, entry in enumerate(entries):
-            update = f"\tCreating Yomichan terms for entry {idx+1}/{entries_len}"
+            update = f"Creating Yomichan terms for entry {idx+1}/{entries_len}"
            print(update, end='\r', flush=True)
            new_terms = self._terminator.make_terms(entry)
            for term in new_terms:
@@ -84,29 +71,8 @@ class BaseExporter(ABC):
        print()
        return terms

-    def __validate_terms(self, terms):
-        print(f"{timestamp()} Making a copy of term data for validation...")
-        terms_copy = copy.deepcopy(terms)  # because validator will alter data!
-        term_count = len(terms_copy)
-        log_dir = self.__get_invalid_term_dir()
-        schema = load_yomichan_term_schema()
-        validator = fastjsonschema.compile(schema)
-        failure_count = 0
-        for idx, term in enumerate(terms_copy):
-            update = f"\tValidating term {idx+1}/{term_count}"
-            print(update, end='\r', flush=True)
-            try:
-                validator([term])
-            except fastjsonschema.JsonSchemaException:
-                failure_count += 1
-                term_file = os.path.join(log_dir, f"{idx}.json")
-                with open(term_file, "w", encoding='utf8') as f:
-                    json.dump([term], f, indent=4, ensure_ascii=False)
-        print(f"\n{timestamp()} Finished validating with {failure_count} error{'' if failure_count == 1 else 's'}")
-        if failure_count > 0:
-            print(f"{timestamp()} Invalid terms saved to `{log_dir}` for debugging")
-
    def __make_dictionary(self, terms, index, tags):
+        print(f"Exporting {len(terms)} Yomichan terms...")
        self.__write_term_banks(terms)
        self.__write_index(index)
        self.__write_tag_bank(tags)
@@ -114,18 +80,14 @@ class BaseExporter(ABC):
        self.__rm_build_dir()

    def __write_term_banks(self, terms):
-        print(f"{timestamp()} Exporting {len(terms)} JSON terms")
        build_dir = self._get_build_dir()
        max_i = int(len(terms) / self._terms_per_file) + 1
        for i in range(max_i):
-            update = f"\tWriting terms to term bank {i+1}/{max_i}"
-            print(update, end='\r', flush=True)
-            start = self._terms_per_file * i
-            end = self._terms_per_file * (i + 1)
            term_file = os.path.join(build_dir, f"term_bank_{i+1}.json")
            with open(term_file, "w", encoding='utf8') as f:
+                start = self._terms_per_file * i
+                end = self._terms_per_file * (i + 1)
                json.dump(terms[start:end], f, indent=4, ensure_ascii=False)
-        print()

    def __write_index(self, index):
        build_dir = self._get_build_dir()
```
```diff
@@ -143,7 +105,6 @@ class BaseExporter(ABC):

    def __write_archive(self, filename):
        archive_format = "zip"
-        print(f"{timestamp()} Archiving data to {archive_format.upper()} file...")
        out_dir = os.path.join(user_documents_dir(), "jitenbot", "yomichan")
        if not Path(out_dir).is_dir():
            os.makedirs(out_dir)
@@ -154,8 +115,55 @@ class BaseExporter(ABC):
        base_filename = os.path.join(out_dir, filename)
        build_dir = self._get_build_dir()
        shutil.make_archive(base_filename, archive_format, build_dir)
-        print(f"{timestamp()} Dictionary file saved to `{out_filepath}`")
+        print(f"Dictionary file saved to {out_filepath}")

    def __rm_build_dir(self):
        build_dir = self._get_build_dir()
        shutil.rmtree(build_dir)
+
+
+class _JitenonExporter(Exporter):
+    def _get_revision(self, entries):
+        modified_date = None
+        for entry in entries:
+            if modified_date is None or entry.modified_date > modified_date:
+                modified_date = entry.modified_date
+        revision = f"{self._target.value};{modified_date}"
+        return revision
+
+    def _get_attribution(self, entries):
+        modified_date = None
+        for entry in entries:
+            if modified_date is None or entry.modified_date > modified_date:
+                attribution = entry.attribution
+        return attribution
+
+
+class JitenonKokugoExporter(_JitenonExporter):
+    pass
+
+
+class JitenonYojiExporter(_JitenonExporter):
+    pass
+
+
+class JitenonKotowazaExporter(_JitenonExporter):
+    pass
+
+
+class Smk8Exporter(Exporter):
+    def _get_revision(self, entries):
+        timestamp = datetime.now().strftime("%Y-%m-%d")
+        return f"{self._target.value};{timestamp}"
+
+    def _get_attribution(self, entries):
+        return "© Sanseido Co., LTD. 2020"
+
+
+class Daijirin2Exporter(Exporter):
+    def _get_revision(self, entries):
+        timestamp = datetime.now().strftime("%Y-%m-%d")
+        return f"{self._target.value};{timestamp}"
+
+    def _get_attribution(self, entries):
+        return "© Sanseido Co., LTD. 2019"
```
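As shown in the `__write_term_banks` hunk above, terms are written out in banks of `self._terms_per_file` (2,000) entries, with bank *k* saved as `term_bank_{k+1}.json` in the build directory. A minimal sketch of that slicing, using placeholder stand-ins rather than real Yomichan term entries:

```python
# Sketch of the bank slicing in __write_term_banks above; the strings here
# are placeholders for Yomichan term entries.
terms = ["term"] * 4500          # placeholder data
terms_per_file = 2000
banks = [terms[i:i + terms_per_file]
         for i in range(0, len(terms), terms_per_file)]
print([len(b) for b in banks])   # -> [2000, 2000, 500]
```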
`bot/yomichan/exporters/factory.py` (new file, 18 lines):

```python
from bot.targets import Targets

from bot.yomichan.exporters.export import JitenonKokugoExporter
from bot.yomichan.exporters.export import JitenonYojiExporter
from bot.yomichan.exporters.export import JitenonKotowazaExporter
from bot.yomichan.exporters.export import Smk8Exporter
from bot.yomichan.exporters.export import Daijirin2Exporter


def new_yomi_exporter(target):
    exporter_map = {
        Targets.JITENON_KOKUGO: JitenonKokugoExporter,
        Targets.JITENON_YOJI: JitenonYojiExporter,
        Targets.JITENON_KOTOWAZA: JitenonKotowazaExporter,
        Targets.SMK8: Smk8Exporter,
        Targets.DAIJIRIN2: Daijirin2Exporter,
    }
    return exporter_map[target](target)
```
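A possible way calling code could use this factory, following the `export(entries, image_dir)` signature on the right-hand side of this compare (an illustrative sketch, not code from the repository; the empty `entries` list is a placeholder for entries produced elsewhere in the project):

```python
from bot.targets import Targets
from bot.yomichan.exporters.factory import new_yomi_exporter

entries = []                                  # placeholder; normally built by the entry/crawler code
exporter = new_yomi_exporter(Targets.SMK8)    # returns Smk8Exporter per the map above
exporter.export(entries, image_dir=None)      # builds term banks, index, tags, then zips them
```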
Deleted file (5 lines):

```python
from bot.yomichan.exporters.base.jitenon import JitenonExporter


class Exporter(JitenonExporter):
    pass
```

Deleted file (5 lines):

```python
from bot.yomichan.exporters.base.jitenon import JitenonExporter


class Exporter(JitenonExporter):
    pass
```

Deleted file (5 lines):

```python
from bot.yomichan.exporters.base.jitenon import JitenonExporter


class Exporter(JitenonExporter):
    pass
```

Deleted file (6 lines):

```python
from bot.yomichan.exporters.base.monokakido import MonokakidoExporter


class Exporter(MonokakidoExporter):
    def _get_attribution(self, entries):
        return "© Sanseido Co., LTD. 2021"
```

Deleted file (6 lines):

```python
from bot.yomichan.exporters.base.monokakido import MonokakidoExporter


class Exporter(MonokakidoExporter):
    def _get_attribution(self, entries):
        return "© Sanseido Co., LTD. 2020"
```
```diff
@@ -1,10 +1,9 @@
import re
import os
from bs4 import BeautifulSoup
from functools import cache
from pathlib import Path

from bs4 import BeautifulSoup

import bot.yomichan.glossary.icons as Icons
from bot.soup import delete_soup_nodes
from bot.data import load_yomichan_name_conversion
@@ -112,8 +111,8 @@ def __convert_gaiji(soup, image_dir):
        ratio = Icons.calculate_ratio(path)
        img = BeautifulSoup("<img/>", "xml").img
        img.attrs = {
-            "height": 1.0,
-            "width": ratio,
+            "height": 1.0 if ratio > 1.0 else ratio,
+            "width": ratio if ratio > 1.0 else 1.0,
            "sizeUnits": "em",
            "collapsible": False,
            "collapsed": False,
@@ -151,8 +150,8 @@ def __convert_logos(soup, image_dir):
        ratio = Icons.calculate_ratio(path)
        img = BeautifulSoup("<img/>", "xml").img
        img.attrs = {
-            "height": 1.0,
-            "width": ratio,
+            "height": 1.0 if ratio > 1.0 else ratio,
+            "width": ratio if ratio > 1.0 else 1.0,
            "sizeUnits": "em",
            "collapsible": False,
            "collapsed": False,
@@ -175,8 +174,8 @@ def __convert_kanjion_logos(soup, image_dir):
        ratio = Icons.calculate_ratio(path)
        img = BeautifulSoup("<img/>", "xml").img
        img.attrs = {
-            "height": 1.0,
-            "width": ratio,
+            "height": 1.0 if ratio > 1.0 else ratio,
+            "width": ratio if ratio > 1.0 else 1.0,
            "sizeUnits": "em",
            "collapsible": False,
            "collapsed": False,
@@ -199,8 +198,8 @@ def __convert_daigoginum(soup, image_dir):
        ratio = Icons.calculate_ratio(path)
        img = BeautifulSoup("<img/>", "xml").img
        img.attrs = {
-            "height": 1.0,
-            "width": ratio,
+            "height": 1.0 if ratio > 1.0 else ratio,
+            "width": ratio if ratio > 1.0 else 1.0,
            "sizeUnits": "em",
            "collapsible": False,
            "collapsed": False,
@@ -223,8 +222,8 @@ def __convert_jundaigoginum(soup, image_dir):
        ratio = Icons.calculate_ratio(path)
        img = BeautifulSoup("<img/>", "xml").img
        img.attrs = {
-            "height": 1.0,
-            "width": ratio,
+            "height": 1.0 if ratio > 1.0 else ratio,
+            "width": ratio if ratio > 1.0 else 1.0,
            "sizeUnits": "em",
            "collapsible": False,
            "collapsed": False,
```
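Each of the hunks above swaps the fixed one-em height for a conditional pair: when `ratio` is greater than 1 the result is unchanged (height 1.0, width `ratio`), and when `ratio` is 1 or less the width becomes 1.0 and the height becomes `ratio`. A small sketch of the two rules side by side (assuming `ratio` is simply the number returned by `Icons.calculate_ratio(path)`):

```python
# The two attribute rules that differ in these hunks.
def fixed_height(ratio):
    return {"height": 1.0, "width": ratio}

def conditional(ratio):
    return {"height": 1.0 if ratio > 1.0 else ratio,
            "width": ratio if ratio > 1.0 else 1.0}

print(fixed_height(0.5), conditional(0.5))  # results differ only when ratio <= 1.0
print(fixed_height(2.0), conditional(2.0))  # identical when ratio > 1.0
```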
```diff
@@ -76,7 +76,6 @@ def __get_attributes(attrs):


def __get_style(inline_style_string):
-    # pylint: disable=no-member
    style = {}
    parsed_style = parseStyle(inline_style_string)
    if parsed_style.fontStyle != "":
@@ -101,7 +100,7 @@ def __get_style(inline_style_string):
        "marginLeft": parsed_style.marginLeft,
    }
    for key, val in margins.items():
-        m = re.search(r"(-?\d+(\.\d*)?|-?\.\d+)em", val)
+        m = re.search(r"(\d+(\.\d*)?|\.\d+)em", val)
        if m:
            style[key] = float(m.group(1))
```
```diff
@@ -26,27 +26,6 @@ def make_monochrome_fill_rectangle(path, text):
        f.write(svg)


-@cache
-def make_accent(path):
-    svg = __svg_accent()
-    with open(path, "w", encoding="utf-8") as f:
-        f.write(svg)
-
-
-@cache
-def make_heiban(path):
-    svg = __svg_heiban()
-    with open(path, "w", encoding="utf-8") as f:
-        f.write(svg)
-
-
-@cache
-def make_red_char(path, char):
-    svg = __svg_red_character(char)
-    with open(path, "w", encoding="utf-8") as f:
-        f.write(svg)
-
-
def __calculate_svg_ratio(path):
    with open(path, "r", encoding="utf-8") as f:
        xml = f.read()
@@ -103,30 +82,3 @@ def __svg_masked_rectangle(text):
             fill='black' mask='url(#a)'/>
    </svg>"""
    return svg.strip()
-
-
-def __svg_heiban():
-    svg = f"""
-    <svg viewBox='0 0 210 300' xmlns='http://www.w3.org/2000/svg' version='1.1'>
-    <rect width='210' height='30' fill='red'/>
-    </svg>"""
-    return svg.strip()
-
-
-def __svg_accent():
-    svg = f"""
-    <svg viewBox='0 0 150 300' xmlns='http://www.w3.org/2000/svg' version='1.1'>
-    <rect width='150' height='30' fill='red'/>
-    <rect width='30' height='150' x='120' fill='red'/>
-    </svg>"""
-    return svg.strip()
-
-
-def __svg_red_character(char):
-    svg = f"""
-    <svg viewBox='0 0 300 300' xmlns='http://www.w3.org/2000/svg' version='1.1'>
-    <text text-anchor='middle' x='50%' y='50%' dy='.37em'
-          font-family='sans-serif' font-size='300px'
-          fill='red'>{char}</text>
-    </svg>"""
-    return svg.strip()
```
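`Icons.calculate_ratio` itself is not shown in this diff. For orientation only, a width-to-height ratio of this kind could be read straight off the `viewBox` attributes visible in the SVG templates above; the following is a guess at the idea, not the repository's implementation:

```python
import re

def viewbox_ratio(svg_text):
    # Parse "viewBox='min-x min-y width height'" and return width / height.
    m = re.search(r"viewBox='([^']+)'", svg_text)
    _, _, width, height = (float(v) for v in m.group(1).split())
    return width / height

print(viewbox_ratio("<svg viewBox='0 0 210 300'></svg>"))  # 0.7 (平板 mark)
print(viewbox_ratio("<svg viewBox='0 0 150 300'></svg>"))  # 0.5 (アクセント mark)
```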
```diff
@@ -118,8 +118,8 @@ class JitenonKokugoGlossary(JitenonGlossary):
            ratio = Icons.calculate_ratio(path)
            img = BeautifulSoup("<img/>", "xml").img
            img.attrs = {
-                "height": 1.0,
-                "width": ratio,
+                "height": 1.0 if ratio > 1.0 else ratio,
+                "width": ratio if ratio > 1.0 else 1.0,
                "sizeUnits": "em",
                "collapsible": False,
                "collapsed": False,
```
Deleted file (344 lines):

```python
import re
import os
from bs4 import BeautifulSoup

import bot.yomichan.glossary.icons as Icons
from bot.data import load_yomichan_name_conversion
from bot.yomichan.glossary.gloss import make_gloss
from bot.name_conversion import convert_names


def make_glossary(entry, media_dir):
    soup = entry.get_page_soup()
    __remove_glyph_styles(soup)
    __reposition_marks(soup)
    __remove_links_without_href(soup)
    __remove_appendix_links(soup)
    __convert_links(soup, entry)
    __add_parent_link(soup, entry)
    __add_homophone_links(soup, entry)
    __convert_images_to_text(soup)
    __text_parens_to_images(soup, media_dir)
    __replace_icons(soup, media_dir)
    __replace_accent_symbols(soup, media_dir)
    __convert_gaiji(soup, media_dir)
    __convert_graphics(soup, media_dir)
    __convert_number_icons(soup, media_dir)

    name_conversion = load_yomichan_name_conversion(entry.target)
    convert_names(soup, name_conversion)

    gloss = make_gloss(soup.span)
    glossary = [gloss]
    return glossary


def __remove_glyph_styles(soup):
    """The css_parser library will emit annoying warning messages
    later if it sees these glyph character styles"""
    for elm in soup.find_all("glyph"):
        if elm.has_attr("style"):
            elm["data-style"] = elm.attrs["style"]
            del elm.attrs["style"]


def __reposition_marks(soup):
    """These マーク symbols will be converted to rubies later, so they need to
    be positioned after the corresponding text in order to appear correctly"""
    for elm in soup.find_all("表外字"):
        mark = elm.find("表外字マーク")
        elm.append(mark)
    for elm in soup.find_all("表外音訓"):
        mark = elm.find("表外音訓マーク")
        elm.append(mark)


def __remove_links_without_href(soup):
    for elm in soup.find_all("a"):
        if elm.has_attr("href"):
            continue
        elm.attrs["data-name"] = elm.name
        elm.name = "span"


def __remove_appendix_links(soup):
    for elm in soup.find_all("a"):
        if elm.attrs["href"].startswith("appendix"):
            elm.unwrap()


def __convert_links(soup, entry):
    for elm in soup.find_all("a"):
        href = elm.attrs["href"].split(" ")[0]
        href = href.removeprefix("#")
        if not re.match(r"^[0-9]+(?:-[0-9A-F]{4})?$", href):
            raise Exception(f"Invalid href format: {href}")
        ref_entry_id = entry.id_string_to_entry_id(href)
        if ref_entry_id in entry.ID_TO_ENTRY:
            ref_entry = entry.ID_TO_ENTRY[ref_entry_id]
        else:
            ref_entry = entry.ID_TO_ENTRY[(ref_entry_id[0], 0)]
        expression = ref_entry.get_first_expression()
        elm.attrs["href"] = f"?query={expression}&wildcards=off"


def __add_parent_link(soup, entry):
    elm = soup.find("親見出相当部")
    if elm is not None:
        parent_entry = entry.get_parent()
        expression = parent_entry.get_first_expression()
        elm.attrs["href"] = f"?query={expression}&wildcards=off"
        elm.name = "a"


def __add_homophone_links(soup, entry):
    forward_link = ["←", entry.entry_id[0] + 1]
    backward_link = ["→", entry.entry_id[0] - 1]
    homophone_info_list = [
        ["svg-logo/homophone1.svg", [forward_link]],
        ["svg-logo/homophone2.svg", [forward_link, backward_link]],
        ["svg-logo/homophone3.svg", [backward_link]],
    ]
    for homophone_info in homophone_info_list:
        filename, link_info = homophone_info
        for elm in soup.find_all("img", attrs={"src": filename}):
            for info in link_info:
                text, link_id = info
                link_entry = entry.ID_TO_ENTRY[(link_id, 0)]
                expression = link_entry.get_first_expression()
                link = BeautifulSoup("<a/>", "xml").a
                link.string = text
                link.attrs["href"] = f"?query={expression}&wildcards=off"
                elm.append(link)
            elm.unwrap()


def __convert_images_to_text(soup):
    conversions = [
        ["svg-logo/重要語.svg", "*", "vertical-align: super; font-size: 0.6em"],
        ["svg-logo/最重要語.svg", "**", "vertical-align: super; font-size: 0.6em"],
        ["svg-logo/一般常識語.svg", "☆☆", "vertical-align: super; font-size: 0.6em"],
        ["svg-logo/追い込み.svg", "", ""],
        ["svg-special/区切り線.svg", "|", ""],
    ]
    for conversion in conversions:
        filename, text, style = conversion
        for elm in soup.find_all("img", attrs={"src": filename}):
            if text == "":
                elm.unwrap()
                continue
            if style != "":
                elm.attrs["style"] = style
            elm.attrs["data-name"] = elm.name
            elm.attrs["data-src"] = elm.attrs["src"]
            elm.name = "span"
            elm.string = text
            del elm.attrs["src"]


def __text_parens_to_images(soup, media_dir):
    for elm in soup.find_all("red"):
        char = elm.text
        if char not in ["(", ")"]:
            continue
        filename = f"red_{char}.svg"
        path = os.path.join(media_dir, filename)
        Icons.make_red_char(path, char)
        ratio = Icons.calculate_ratio(path)
        img = BeautifulSoup("<img/>", "xml").img
        img.attrs = {
            "height": 1.0,
            "width": ratio,
            "sizeUnits": "em",
            "collapsible": False,
            "collapsed": False,
            "background": False,
            "appearance": "auto",
            "path": f"{os.path.basename(media_dir)}/{filename}",
        }
        elm.attrs["data-name"] = elm.name
        elm.name = "span"
        elm.string = ""
        elm.append(img)
        elm.attrs["style"] = "vertical-align: text-bottom;"


def __replace_icons(soup, media_dir):
    cls_to_appearance = {
        "default": "monochrome",
        "fill": "monochrome",
        "red": "auto",
        "redfill": "auto",
        "none": "monochrome",
    }
    icon_info_list = [
        ["svg-logo/アク.svg", "アク", "default"],
        ["svg-logo/丁寧.svg", "丁寧", "default"],
        ["svg-logo/可能.svg", "可能", "default"],
        ["svg-logo/尊敬.svg", "尊敬", "default"],
        ["svg-logo/接尾.svg", "接尾", "default"],
        ["svg-logo/接頭.svg", "接頭", "default"],
        ["svg-logo/表記.svg", "表記", "default"],
        ["svg-logo/謙譲.svg", "謙譲", "default"],
        ["svg-logo/区別.svg", "区別", "redfill"],
        ["svg-logo/由来.svg", "由来", "redfill"],
        ["svg-logo/人.svg", "", "none"],
        ["svg-logo/他.svg", "", "none"],
        ["svg-logo/動.svg", "", "none"],
        ["svg-logo/名.svg", "", "none"],
        ["svg-logo/句.svg", "", "none"],
        ["svg-logo/派.svg", "", "none"],
        ["svg-logo/自.svg", "", "none"],
        ["svg-logo/連.svg", "", "none"],
        ["svg-logo/造.svg", "", "none"],
        ["svg-logo/造2.svg", "", "none"],
        ["svg-logo/造3.svg", "", "none"],
        ["svg-logo/百科.svg", "", "none"],
    ]
    for icon_info in icon_info_list:
        src, text, cls = icon_info
        for elm in soup.find_all("img", attrs={"src": src}):
            path = media_dir
            for part in src.split("/"):
                path = os.path.join(path, part)
            __make_rectangle(path, text, cls)
            ratio = Icons.calculate_ratio(path)
            img = BeautifulSoup("<img/>", "xml").img
            img.attrs = {
                "height": 1.0,
                "width": ratio,
                "sizeUnits": "em",
                "collapsible": False,
                "collapsed": False,
                "background": False,
                "appearance": cls_to_appearance[cls],
                "title": elm.attrs["alt"] if elm.has_attr("alt") else "",
                "path": f"{os.path.basename(media_dir)}/{src}",
            }
            elm.name = "span"
            elm.clear()
            elm.append(img)
            elm.attrs["style"] = "vertical-align: text-bottom; margin-right: 0.25em;"


def __replace_accent_symbols(soup, media_dir):
    accent_info_list = [
        ["svg-accent/平板.svg", Icons.make_heiban],
        ["svg-accent/アクセント.svg", Icons.make_accent],
    ]
    for info in accent_info_list:
        src, write_svg_function = info
        for elm in soup.find_all("img", attrs={"src": src}):
            path = media_dir
            for part in src.split("/"):
                path = os.path.join(path, part)
            write_svg_function(path)
            ratio = Icons.calculate_ratio(path)
            img = BeautifulSoup("<img/>", "xml").img
            img.attrs = {
                "height": 1.0,
                "width": ratio,
                "sizeUnits": "em",
                "collapsible": False,
                "collapsed": False,
                "background": False,
                "appearance": "auto",
                "path": f"{os.path.basename(media_dir)}/{src}",
            }
            elm.name = "span"
            elm.clear()
            elm.append(img)
            elm.attrs["style"] = "vertical-align: super; margin-left: -0.25em;"


def __convert_gaiji(soup, media_dir):
    for elm in soup.find_all("img"):
        if not elm.has_attr("src"):
            continue
        src = elm.attrs["src"]
        if src.startswith("graphics"):
            continue
        path = media_dir
        for part in src.split("/"):
            if part.strip() == "":
                continue
            path = os.path.join(path, part)
        ratio = Icons.calculate_ratio(path)
        img = BeautifulSoup("<img/>", "xml").img
        img.attrs = {
            "height": 1.0,
            "width": ratio,
            "sizeUnits": "em",
            "collapsible": False,
            "collapsed": False,
            "background": False,
            "appearance": "monochrome",
            "title": elm.attrs["alt"] if elm.has_attr("alt") else "",
            "path": f"{os.path.basename(media_dir)}/{src}",
        }
        elm.name = "span"
        elm.clear()
        elm.append(img)
        elm.attrs["style"] = "vertical-align: text-bottom;"


def __convert_graphics(soup, media_dir):
    for elm in soup.find_all("img"):
        if not elm.has_attr("src"):
            continue
        src = elm.attrs["src"]
        if not src.startswith("graphics"):
            continue
        elm.attrs = {
            "collapsible": True,
            "collapsed": True,
            "title": elm.attrs["alt"] if elm.has_attr("alt") else "",
            "path": f"{os.path.basename(media_dir)}/{src}",
            "src": src,
        }


def __convert_number_icons(soup, media_dir):
    for elm in soup.find_all("大語義番号"):
        if elm.find_parent("a") is None:
            filename = f"{elm.text}-fill.svg"
            appearance = "monochrome"
            path = os.path.join(media_dir, filename)
            __make_rectangle(path, elm.text, "fill")
        else:
            filename = f"{elm.text}-bluefill.svg"
            appearance = "auto"
            path = os.path.join(media_dir, filename)
            __make_rectangle(path, elm.text, "bluefill")
        ratio = Icons.calculate_ratio(path)
        img = BeautifulSoup("<img/>", "xml").img
        img.attrs = {
            "height": 1.0,
            "width": ratio,
            "sizeUnits": "em",
            "collapsible": False,
            "collapsed": False,
            "background": False,
            "appearance": appearance,
            "title": elm.text,
            "path": f"{os.path.basename(media_dir)}/{filename}",
        }
        elm.name = "span"
        elm.clear()
        elm.append(img)
        elm.attrs["style"] = "vertical-align: text-bottom; margin-right: 0.25em;"


def __make_rectangle(path, text, cls):
    if cls == "none":
        pass
    elif cls == "fill":
        Icons.make_monochrome_fill_rectangle(path, text)
    elif cls == "red":
        Icons.make_rectangle(path, text, "red", "white", "red")
    elif cls == "redfill":
        Icons.make_rectangle(path, text, "red", "red", "white")
    elif cls == "bluefill":
        Icons.make_rectangle(path, text, "blue", "blue", "white")
    else:
        Icons.make_rectangle(path, text, "black", "transparent", "black")
```
```diff
@@ -92,8 +92,8 @@ def __convert_gaiji(soup, image_dir):
        ratio = Icons.calculate_ratio(path)
        img = BeautifulSoup("<img/>", "xml").img
        img.attrs = {
-            "height": 1.0,
-            "width": ratio,
+            "height": 1.0 if ratio > 1.0 else ratio,
+            "width": ratio if ratio > 1.0 else 1.0,
            "sizeUnits": "em",
            "collapsible": False,
            "collapsed": False,
@@ -124,8 +124,8 @@ def __convert_rectangles(soup, image_dir):
        ratio = Icons.calculate_ratio(path)
        img = BeautifulSoup("<img/>", "xml").img
        img.attrs = {
-            "height": 1.0,
-            "width": ratio,
+            "height": 1.0 if ratio > 1.0 else ratio,
+            "width": ratio if ratio > 1.0 else 1.0,
            "sizeUnits": "em",
            "collapsible": False,
            "collapsed": False,
```
Deleted file (26 lines):

```python
from bot.yomichan.terms.base.terminator import BaseTerminator


class JitenonTerminator(BaseTerminator):
    def __init__(self, target):
        super().__init__(target)
        self._glossary_maker = None

    def _definition_tags(self, entry):
        return None

    def _glossary(self, entry):
        if entry.entry_id in self._glossary_cache:
            return self._glossary_cache[entry.entry_id]
        glossary = self._glossary_maker.make_glossary(entry, self._image_dir)
        self._glossary_cache[entry.entry_id] = glossary
        return glossary

    def _sequence(self, entry):
        return entry.entry_id

    def _link_glossary_parameters(self, entry):
        return []

    def _subentry_lists(self, entry):
        return []
```
```diff
@@ -1,10 +1,14 @@
-from bot.entries.daijirin2.phrase_entry import PhraseEntry
-from bot.yomichan.terms.base.terminator import BaseTerminator
+from bot.entries.daijirin2 import Daijirin2PhraseEntry as PhraseEntry
+
+from bot.yomichan.terms.terminator import Terminator
from bot.yomichan.glossary.daijirin2 import make_glossary
from bot.yomichan.grammar import sudachi_rules, tags_to_rules


-class Terminator(BaseTerminator):
+class Daijirin2Terminator(Terminator):
    def __init__(self, target):
        super().__init__(target)

+    def _definition_tags(self, entry):
+        return ""
```
`bot/yomichan/terms/factory.py` (new file, 18 lines):

```python
from bot.targets import Targets

from bot.yomichan.terms.jitenon import JitenonKokugoTerminator
from bot.yomichan.terms.jitenon import JitenonYojiTerminator
from bot.yomichan.terms.jitenon import JitenonKotowazaTerminator
from bot.yomichan.terms.smk8 import Smk8Terminator
from bot.yomichan.terms.daijirin2 import Daijirin2Terminator


def new_terminator(target):
    terminator_map = {
        Targets.JITENON_KOKUGO: JitenonKokugoTerminator,
        Targets.JITENON_YOJI: JitenonYojiTerminator,
        Targets.JITENON_KOTOWAZA: JitenonKotowazaTerminator,
        Targets.SMK8: Smk8Terminator,
        Targets.DAIJIRIN2: Daijirin2Terminator,
    }
    return terminator_map[target](target)
```
`bot/yomichan/terms/jitenon.py` (new file, 68 lines):

```python
from bot.yomichan.grammar import sudachi_rules
from bot.yomichan.terms.terminator import Terminator

from bot.yomichan.glossary.jitenon import JitenonKokugoGlossary
from bot.yomichan.glossary.jitenon import JitenonYojiGlossary
from bot.yomichan.glossary.jitenon import JitenonKotowazaGlossary


class JitenonTerminator(Terminator):
    def __init__(self, target):
        super().__init__(target)
        self._glossary_maker = None

    def _definition_tags(self, entry):
        return None

    def _glossary(self, entry):
        if entry.entry_id in self._glossary_cache:
            return self._glossary_cache[entry.entry_id]
        glossary = self._glossary_maker.make_glossary(entry, self._image_dir)
        self._glossary_cache[entry.entry_id] = glossary
        return glossary

    def _sequence(self, entry):
        return entry.entry_id

    def _link_glossary_parameters(self, entry):
        return []

    def _subentry_lists(self, entry):
        return []


class JitenonKokugoTerminator(JitenonTerminator):
    def __init__(self, target):
        super().__init__(target)
        self._glossary_maker = JitenonKokugoGlossary()

    def _inflection_rules(self, entry, expression):
        return sudachi_rules(expression)

    def _term_tags(self, entry):
        return ""


class JitenonYojiTerminator(JitenonTerminator):
    def __init__(self, target):
        super().__init__(target)
        self._glossary_maker = JitenonYojiGlossary()

    def _inflection_rules(self, entry, expression):
        return ""

    def _term_tags(self, entry):
        tags = entry.kanken_level.split("/")
        return " ".join(tags)


class JitenonKotowazaTerminator(JitenonTerminator):
    def __init__(self, target):
        super().__init__(target)
        self._glossary_maker = JitenonKotowazaGlossary()

    def _inflection_rules(self, entry, expression):
        return sudachi_rules(expression)

    def _term_tags(self, entry):
        return ""
```
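For reference, `JitenonYojiTerminator._term_tags` above turns the slash-separated 漢字検定 level string into space-separated Yomichan term tags. A one-line illustration (the example level string is hypothetical):

```python
kanken_level = "準1級/1級"                      # hypothetical value of entry.kanken_level
print(" ".join(kanken_level.split("/")))        # -> 準1級 1級
```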
Deleted file (15 lines):

```python
from bot.yomichan.grammar import sudachi_rules
from bot.yomichan.glossary.jitenon import JitenonKokugoGlossary
from bot.yomichan.terms.base.jitenon import JitenonTerminator


class Terminator(JitenonTerminator):
    def __init__(self, target):
        super().__init__(target)
        self._glossary_maker = JitenonKokugoGlossary()

    def _inflection_rules(self, entry, expression):
        return sudachi_rules(expression)

    def _term_tags(self, entry):
        return ""
```

Deleted file (15 lines):

```python
from bot.yomichan.grammar import sudachi_rules
from bot.yomichan.glossary.jitenon import JitenonKotowazaGlossary
from bot.yomichan.terms.base.jitenon import JitenonTerminator


class Terminator(JitenonTerminator):
    def __init__(self, target):
        super().__init__(target)
        self._glossary_maker = JitenonKotowazaGlossary()

    def _inflection_rules(self, entry, expression):
        return sudachi_rules(expression)

    def _term_tags(self, entry):
        return ""
```

Deleted file (15 lines):

```python
from bot.yomichan.glossary.jitenon import JitenonYojiGlossary
from bot.yomichan.terms.base.jitenon import JitenonTerminator


class Terminator(JitenonTerminator):
    def __init__(self, target):
        super().__init__(target)
        self._glossary_maker = JitenonYojiGlossary()

    def _inflection_rules(self, entry, expression):
        return ""

    def _term_tags(self, entry):
        tags = entry.kanken_level.split("/")
        return " ".join(tags)
```

Deleted file (43 lines):

```python
from bot.entries.sankoku8.phrase_entry import PhraseEntry
from bot.yomichan.terms.base.terminator import BaseTerminator
from bot.yomichan.glossary.sankoku8 import make_glossary
from bot.yomichan.grammar import sudachi_rules, tags_to_rules


class Terminator(BaseTerminator):
    def _definition_tags(self, entry):
        return ""

    def _inflection_rules(self, entry, expression):
        if isinstance(entry, PhraseEntry):
            return sudachi_rules(expression)
        pos_tags = entry.get_part_of_speech_tags()
        if len(pos_tags) == 0:
            return sudachi_rules(expression)
        else:
            return tags_to_rules(expression, pos_tags, self._inflection_categories)

    def _glossary(self, entry):
        if entry.entry_id in self._glossary_cache:
            return self._glossary_cache[entry.entry_id]
        glossary = make_glossary(entry, self._image_dir)
        self._glossary_cache[entry.entry_id] = glossary
        return glossary

    def _sequence(self, entry):
        return entry.entry_id[0] * 100000 + entry.entry_id[1]

    def _term_tags(self, entry):
        return ""

    def _link_glossary_parameters(self, entry):
        return [
            [entry.children, "子"],
            [entry.phrases, "句"]
        ]

    def _subentry_lists(self, entry):
        return [
            entry.children,
            entry.phrases,
        ]
```
```diff
@@ -1,11 +1,12 @@
-from bot.entries.smk8.kanji_entry import KanjiEntry
-from bot.entries.smk8.phrase_entry import PhraseEntry
-from bot.yomichan.terms.base.terminator import BaseTerminator
+from bot.entries.smk8 import Smk8KanjiEntry as KanjiEntry
+from bot.entries.smk8 import Smk8PhraseEntry as PhraseEntry
+
+from bot.yomichan.terms.terminator import Terminator
from bot.yomichan.glossary.smk8 import make_glossary
from bot.yomichan.grammar import sudachi_rules, tags_to_rules


-class Terminator(BaseTerminator):
+class Smk8Terminator(Terminator):
    def __init__(self, target):
        super().__init__(target)
```
```diff
@@ -2,7 +2,7 @@ from abc import abstractmethod, ABC
from bot.data import load_yomichan_inflection_categories


-class BaseTerminator(ABC):
+class Terminator(ABC):
    def __init__(self, target):
        self._target = target
        self._glossary_cache = {}
@@ -66,28 +66,28 @@ class BaseTerminator(ABC):
    @abstractmethod
    def _definition_tags(self, entry):
-        raise NotImplementedError
+        pass

    @abstractmethod
    def _inflection_rules(self, entry, expression):
-        raise NotImplementedError
+        pass

    @abstractmethod
    def _glossary(self, entry):
-        raise NotImplementedError
+        pass

    @abstractmethod
    def _sequence(self, entry):
-        raise NotImplementedError
+        pass

    @abstractmethod
    def _term_tags(self, entry):
-        raise NotImplementedError
+        pass

    @abstractmethod
    def _link_glossary_parameters(self, entry):
-        raise NotImplementedError
+        pass

    @abstractmethod
    def _subentry_lists(self, entry):
-        raise NotImplementedError
+        pass
```
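Every abstract method in the hunk above has to be overridden by a concrete terminator. A minimal do-nothing subclass, shown only to make the required surface explicit (this class does not exist in the repository; the return values are neutral placeholders):

```python
class NullTerminator(Terminator):
    """Hypothetical stub implementing every abstract hook of Terminator."""
    def _definition_tags(self, entry):
        return ""

    def _inflection_rules(self, entry, expression):
        return ""

    def _glossary(self, entry):
        return []

    def _sequence(self, entry):
        return 0

    def _term_tags(self, entry):
        return ""

    def _link_glossary_parameters(self, entry):
        return []

    def _subentry_lists(self, entry):
        return []
```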
```diff
@@ -1391,7 +1391,7 @@
22544,16385,おもいこしをあげる
22634,16385,おもいたったがきちにち
22634,16386,おもいたつひがきちじつ
-22728,16385,おもうえに
+22728,16385,おもうゆえに
22728,16386,おもうこころ
22728,16387,おもうこといわねばはらふくる
22728,16388,おもうそら
@@ -5224,7 +5224,7 @@
111520,16385,てんちょうにたっする
111583,16385,てんどうぜかひか
111583,16386,てんどうひとをころさず
-111645,16385,てんばくうをゆく
+111645,16385,てんばくうをいく
111695,16385,てんびんにかける
111790,16385,てんめいをしる
111801,16385,てんもうかいかいそにしてもらさず
@@ -5713,7 +5713,7 @@
119456,16385,なまきにくぎ
119456,16386,なまきをさく
119472,16385,なまけもののあしからとりがたつ
-119472,16386,なまけもののせっくばたらき
+119472,16386,なまけもののせっくはたらき
119503,16385,なますにたたく
119503,16386,なますをふく
119507,16385,なまずをひょうたんでおさえる
@@ -7215,7 +7215,7 @@
154782,16388,みずがはいる
154782,16389,みずがひく
154782,16390,みずかる
-154782,16391,みずきよければうおすまず
+154782,16391,みずきょければうおすまず
154782,16392,みずすむ
154782,16393,みずでわる
154782,16394,みずとあぶら
```
File diff suppressed because it is too large.
```diff
@@ -1,61 +1,47 @@
𠮟,叱
吞,呑
靭,靱
臈,﨟
啞,唖
嚙,噛
屛,屏
幷,并
彎,弯
搔,掻
攪,撹
枡,桝
濾,沪
繡,繍
蔣,蒋
蠟,蝋
醬,醤
穎,頴
鷗,鴎
鹼,鹸
麴,麹
俠,侠
俱,倶
儘,侭
凜,凛
剝,剥
𠮟,叱
吞,呑
啞,唖
噓,嘘
嚙,噛
囊,嚢
塡,填
姸,妍
屛,屏
屢,屡
拋,抛
搔,掻
摑,掴
瀆,涜
攪,撹
潑,溌
瀆,涜
焰,焔
禱,祷
竜,龍
筓,笄
簞,箪
籠,篭
繡,繍
繫,繋
腁,胼
萊,莱
藪,薮
蟬,蝉
蠟,蝋
軀,躯
醬,醤
醱,醗
頰,頬
顚,顛
驒,騨
姸,妍
攢,攅
𣜜,杤
檔,档
槶,椢
櫳,槞
纊,絋
纘,纉
隯,陦
筓,笄
逬,迸
腁,胼
騈,駢
拋,抛
篡,簒
檜,桧
禰,祢
禱,祷
蘆,芦
凜,凛
鶯,鴬
鷗,鴎
鷽,鴬
鹼,鹸
麴,麹
靭,靱
靱,靭
```
```diff
@@ -1,19 +1,19 @@

@font-face {
    font-family: jpgothic;
-    src: local("Noto Sans CJK JP"), local("IPAexGothic"), local("Source Han Sans JP");
+    src: local("メイリオ"), local("ヒラギノ角ゴ Pro W3"), local("Hiragino Kaku Gothic Pro"), local("Meiryo"), local("Noto Sans CJK JP"), local("IPAexGothic"), local("Source Han Sans JP"), local("MS Pゴシック"), local("MS Pgothic"), local("sans-serif");
}

@font-face {
    font-family: jpmincho;
-    src: local("Noto Serif CJK JP"), local("IPAexMincho"), local("IPAmjMincho"), local("Source Han Serif JP"), local("HanaMinA"), local("HanaMinB");
+    src: local("Noto Serif CJK JP"), local("IPAexMincho"), local("Source Han Serif JP"), local("MS PMincho"), local("serif");
}

body {
-    /*margin: 0em 1em;*/
+    margin: 0em 1em;
    line-height: 1.5em;
-    font-family: jpmincho, serif;
-    /*font-size: 1.2em;*/
+    font-family: jpmincho;
+    font-size: 1.2em;
    color: black;
}
@@ -43,7 +43,7 @@ span[data-name="i"] {
}

span[data-name="h1"] {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    font-size: 1em;
    font-weight: bold;
}
@@ -134,7 +134,7 @@ span[data-name="キャプション"] {
}

span[data-name="ルビG"] {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    font-size: 0.7em;
    font-weight: normal;
    vertical-align: 0.35em;
@@ -142,7 +142,7 @@ span[data-name="ルビG"] {
}

.warichu span[data-name="ルビG"] {
-    font-family: jpmincho, serif;
+    font-family: jpmincho;
    font-size: 0.5em;
    font-weight: normal;
    vertical-align: 0em;
@@ -178,7 +178,7 @@ span[data-name="句仮名"] {
}

span[data-name="句表記"] {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    font-weight: bold;
}

@@ -189,7 +189,7 @@ span[data-name="句項目"] {
}

span[data-name="和字"] {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
}

span[data-name="品詞行"] {
@@ -209,7 +209,7 @@ span[data-name="大語義"] {
span[data-name="大語義num"] {
    margin: 0.025em;
    padding: 0.1em;
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    font-size: 0.8em;
    color: white;
    background-color: black;
@@ -227,7 +227,7 @@ span[data-name="慣用G"] {
}

span[data-name="欧字"] {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
}

span[data-name="歴史仮名"] {
@@ -248,7 +248,7 @@ span[data-name="準大語義"] {
span[data-name="準大語義num"] {
    margin: 0.025em;
    padding: 0.1em;
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    font-size: 0.8em;
    border: solid 1px black;
}
@@ -256,7 +256,7 @@ span[data-name="準大語義num"] {
span[data-name="漢字音logo"] {
    margin: 0.025em;
    padding: 0.1em;
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    font-size: 0.8em;
    border: solid 0.5px black;
    border-radius: 1em;
@@ -290,17 +290,17 @@ span[data-name="異字同訓"] {
}

span[data-name="異字同訓仮名"] {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    font-weight: bold;
}

span[data-name="異字同訓漢字"] {
-    font-family: jpmincho, serif;
+    font-family: jpmincho;
    font-weight: normal;
}

span[data-name="異字同訓表記"] {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    font-weight: bold;
}

@@ -321,12 +321,12 @@ rt {
}

span[data-name="見出仮名"] {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    font-weight: bold;
}

span[data-name="見出相当部"] {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    font-weight: bold;
}

@@ -371,7 +371,7 @@ span[data-name="logo"] {
}

.gothic {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    font-weight: bold;
}

@@ -407,7 +407,7 @@ span[data-name="付記"]:after {
}

div[data-child-links] {
-    padding-left: 1em;
+    padding-top: 1em;
}

div[data-child-links] ul {
@@ -417,7 +417,7 @@ div[data-child-links] ul {

div[data-child-links] span {
    padding: 0.1em;
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    font-size: 0.8em;
    color: white;
    border-width: 0.05em;
```
```diff
@@ -1,17 +1,20 @@

@font-face {
    font-family: jpgothic;
-    src: local("Noto Sans CJK JP"), local("IPAexGothic"), local("Source Han Sans JP");
+    src: local("メイリオ"), local("ヒラギノ角ゴ Pro W3"), local("Hiragino Kaku Gothic Pro"), local("Meiryo"), local("Noto Sans CJK JP"), local("IPAexGothic"), local("Source Han Sans JP"), local("MS Pゴシック"), local("MS Pgothic"), local("sans-serif");
}

@font-face {
    font-family: jpmincho;
-    src: local("Noto Serif CJK JP"), local("IPAexMincho"), local("IPAmjMincho"), local("Source Han Serif JP"), local("HanaMinA"), local("HanaMinB");
+    src: local("Noto Serif CJK JP"), local("IPAexMincho"), local("Source Han Serif JP"), local("MS PMincho"), local("serif");
}

body {
-    font-family: jpmincho, serif;
+    font-family: jpmincho;
    margin: 0em 1em;
    line-height: 1.5em;
    font-size: 1.2em;
    color: black;
}

table, th, td {
@@ -21,7 +24,7 @@ table, th, td {
}

th {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    color: black;
    background-color: lightgray;
    font-weight: normal;
@@ -40,18 +43,17 @@ td ul {
}

.読み方 {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    font-weight: bold;
}

-.意味,
-.kanjirighttb {
+.意味 {
    margin-left: 1.0em;
    margin-bottom: 0.5em;
}

.num_icon {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    padding-left: 0.25em;
    margin-right: 0.5em;
    font-size: 0.8em;
@@ -61,3 +63,4 @@
    border-style: none;
    -webkit-border-radius: 0.1em;
}
+
```
```diff
@@ -1,17 +1,20 @@

@font-face {
    font-family: jpgothic;
-    src: local("Noto Sans CJK JP"), local("IPAexGothic"), local("Source Han Sans JP");
+    src: local("メイリオ"), local("ヒラギノ角ゴ Pro W3"), local("Hiragino Kaku Gothic Pro"), local("Meiryo"), local("Noto Sans CJK JP"), local("IPAexGothic"), local("Source Han Sans JP"), local("MS Pゴシック"), local("MS Pgothic"), local("sans-serif");
}

@font-face {
    font-family: jpmincho;
-    src: local("Noto Serif CJK JP"), local("IPAexMincho"), local("IPAmjMincho"), local("Source Han Serif JP"), local("HanaMinA"), local("HanaMinB");
+    src: local("Noto Serif CJK JP"), local("IPAexMincho"), local("Source Han Serif JP"), local("MS PMincho"), local("serif");
}

body {
-    font-family: jpmincho, serif;
+    font-family: jpmincho;
    margin: 0em 1em;
    line-height: 1.5em;
    font-size: 1.2em;
    color: black;
}

table, th, td {
@@ -21,7 +24,7 @@ table, th, td {
}

th {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    color: black;
    background-color: lightgray;
    font-weight: normal;
@@ -36,12 +39,12 @@ a {
}

.読み方 {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    font-weight: bold;
}

-.意味,
-.kanjirighttb {
+.意味 {
    margin-left: 1.0em;
    margin-bottom: 0.5em;
}

```
```diff
@@ -1,17 +1,20 @@

@font-face {
    font-family: jpgothic;
-    src: local("Noto Sans CJK JP"), local("IPAexGothic"), local("Source Han Sans JP");
+    src: local("メイリオ"), local("ヒラギノ角ゴ Pro W3"), local("Hiragino Kaku Gothic Pro"), local("Meiryo"), local("Noto Sans CJK JP"), local("IPAexGothic"), local("Source Han Sans JP"), local("MS Pゴシック"), local("MS Pgothic"), local("sans-serif");
}

@font-face {
    font-family: jpmincho;
-    src: local("Noto Serif CJK JP"), local("IPAexMincho"), local("IPAmjMincho"), local("Source Han Serif JP"), local("HanaMinA"), local("HanaMinB");
+    src: local("Noto Serif CJK JP"), local("IPAexMincho"), local("Source Han Serif JP"), local("MS PMincho"), local("serif");
}

body {
-    font-family: jpmincho, serif;
+    font-family: jpmincho;
    margin: 0em 1em;
    line-height: 1.5em;
    font-size: 1.2em;
    color: black;
}

table, th, td {
@@ -21,7 +24,7 @@ table, th, td {
}

th {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    color: black;
    background-color: lightgray;
    font-weight: normal;
@@ -36,12 +39,12 @@ a {
}

.読み方 {
-    font-family: jpgothic, sans-serif;
+    font-family: jpgothic;
    font-weight: bold;
}

-.意味,
-.kanjirighttb {
+.意味 {
    margin-left: 1.0em;
    margin-bottom: 0.5em;
}

```
Some files were not shown because too many files have changed in this diff.