Update README.md

parent e85d0a1625
commit 14e50fb4f4

157 README.md

@@ -4,12 +4,13 @@ compiling the scraped data into compact dictionary file formats.
 
 ### Supported Dictionaries
 * Web Dictionaries
-    * [国語辞典オンライン](https://kokugo.jitenon.jp/) (Jitenon Kokugo)
-    * [四字熟語辞典オンライン](https://yoji.jitenon.jp/) (Jitenon Yoji)
-    * [故事・ことわざ・慣用句オンライン](https://kotowaza.jitenon.jp/) (Jitenon Kotowaza)
-* Monokakido (["辞書 by 物書堂"](https://www.monokakido.jp/ja/dictionaries/app/))
-    * [新明解国語辞典 第八版](https://www.monokakido.jp/ja/dictionaries/smk8/index.html) (Shinmeikai 8e)
-    * [大辞林 第四版](https://www.monokakido.jp/ja/dictionaries/daijirin2/index.html) (Daijirin 4e)
+    * [国語辞典オンライン](https://kokugo.jitenon.jp/) (`jitenon-kokugo`)
+    * [四字熟語辞典オンライン](https://yoji.jitenon.jp/) (`jitenon-yoji`)
+    * [故事・ことわざ・慣用句オンライン](https://kotowaza.jitenon.jp/) (`jitenon-kotowaza`)
+* Monokakido
+    * [新明解国語辞典 第八版](https://www.monokakido.jp/ja/dictionaries/smk8/index.html) (`smk8`)
+    * [大辞林 第四版](https://www.monokakido.jp/ja/dictionaries/daijirin2/index.html) (`daijirin2`)
+    * [三省堂国語辞典 第八版](https://www.monokakido.jp/ja/dictionaries/sankoku8/index.html) (`sankoku8`)
 
 ### Supported Output Formats
 
@@ -48,6 +49,12 @@ compiling the scraped data into compact dictionary file formats.
 ![daijirin2](https://user-images.githubusercontent.com/8003332/235578700-9dbf4fb0-0154-48b5-817c-8fe75e442afc.png)
 </details>
 
+<details>
+<summary>Sanseidō 8e (print | yomichan)</summary>
+
+![sankoku8](https://github.com/stephenmk/jitenbot/assets/8003332/0358b3fc-71fb-4557-977c-1976a12229ec)
+</details>
+
 <details>
 <summary>Various (GoldenDict)</summary>
 
@@ -57,13 +64,14 @@ compiling the scraped data into compact dictionary file formats.
 # Usage
 ```
 usage: jitenbot [-h] [-p PAGE_DIR] [-m MEDIA_DIR] [-i MDICT_ICON]
-                [--no-yomichan-export] [--no-mdict-export]
-                {jitenon-kokugo,jitenon-yoji,jitenon-kotowaza,smk8,daijirin2}
+                [--no-mdict-export] [--no-yomichan-export]
+                [--validate-yomichan-terms]
+                {jitenon-kokugo,jitenon-yoji,jitenon-kotowaza,smk8,daijirin2,sankoku8}
 
 Convert Japanese dictionary files to new formats.
 
 positional arguments:
-  {jitenon-kokugo,jitenon-yoji,jitenon-kotowaza,smk8,daijirin2}
+  {jitenon-kokugo,jitenon-yoji,jitenon-kotowaza,smk8,daijirin2,sankoku8}
                         name of dictionary to convert
 
 options:
@@ -75,10 +83,14 @@ options:
                         graphics, audio, etc.)
   -i MDICT_ICON, --mdict-icon MDICT_ICON
                         path to icon file to be used with MDict
-  --no-yomichan-export  skip export of dictionary data to Yomichan format
   --no-mdict-export     skip export of dictionary data to MDict format
+  --no-yomichan-export  skip export of dictionary data to Yomichan format
+  --validate-yomichan-terms
+                        validate JSON structure of exported Yomichan
+                        dictionary terms
 
 See README.md for details regarding media directory structures
+
 ```
 ### Web Targets
 Jitenbot will scrape the target website and save the pages to the [user cache directory](https://pypi.org/project/platformdirs/).
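
The updated usage text adds `sankoku8` as a target, swaps the order of the two export flags, and introduces `--validate-yomichan-terms`. The minimal Python sketch below shows how these options combine by assembling an argument list for the CLI. `build_jitenbot_command` is a hypothetical helper and not part of jitenbot. The sketch assumes the executable is named `jitenbot` and is on `PATH`, and it uses only the flags that appear in the usage text.

```python
from pathlib import Path
from typing import List, Optional

# Targets accepted by the positional argument in the updated usage text.
TARGETS = frozenset({
    "jitenon-kokugo", "jitenon-yoji", "jitenon-kotowaza",
    "smk8", "daijirin2", "sankoku8",
})


def build_jitenbot_command(
    target: str,
    page_dir: Optional[Path] = None,
    media_dir: Optional[Path] = None,
    mdict_icon: Optional[Path] = None,
    mdict_export: bool = True,
    yomichan_export: bool = True,
    validate_yomichan_terms: bool = False,
) -> List[str]:
    """Hypothetical helper: build an argv list for the jitenbot CLI.

    Assumes the executable is called "jitenbot" and is on PATH.
    """
    if target not in TARGETS:
        raise ValueError(f"unknown target: {target!r}")
    cmd = ["jitenbot"]
    if page_dir is not None:
        cmd += ["-p", str(page_dir)]
    if media_dir is not None:
        cmd += ["-m", str(media_dir)]
    if mdict_icon is not None:
        cmd += ["-i", str(mdict_icon)]
    if not mdict_export:
        cmd.append("--no-mdict-export")
    if not yomichan_export:
        cmd.append("--no-yomichan-export")
    if validate_yomichan_terms:
        cmd.append("--validate-yomichan-terms")
    # Same order as the usage line: options first, positional target last.
    cmd.append(target)
    return cmd
```

The returned list could then be passed to `subprocess.run(cmd, check=True)`. Keeping the command as a list rather than a shell string avoids quoting problems with paths that contain spaces, such as `/Library/Application Support/...`.
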
@@ -89,55 +101,112 @@ HTTP request headers (user agent string, etc.) may be customized by editing the
 [user config directory](https://pypi.org/project/platformdirs/).
 
 ### Monokakido Targets
-Page data and media data must be [procured by the user](https://github.com/golddranks/monokakido/)
-and passed to jitenbot via the appropriate command line flags.
+These digital dictionaries are available for purchase through the [Monokakido Dictionaries app](https://www.monokakido.jp/ja/dictionaries/app/) on MacOS/iOS. Under ideal circumstances, Jitenbot would be able to automatically fetch all the data it needs from this app's data directory[^1] on your system. In its current state of development, Jitenbot unfortunately requires you to find and assemble the necessary data yourself. The files must be organized into a particular folder structure (defined below) and then passed to Jitenbot via the corresponding command line arguments.
+
+Some of the files in the app's data directory[^1] are encoded and must be unencoded using [golddranks' monokakido tool](https://github.com/golddranks/monokakido/). Directories which contain these encoded files are indicated by a reference mark (※) in the notes below.
+
+[^1]: `/Library/Application Support/AppStoreContent/jp.monokakido.Dictionaries/Products/`
 
 <details>
-<summary>smk8 media directory</summary>
+<summary>smk8 files</summary>
 
-Since Yomichan does not support audio files from imported
-dictionaries, the `audio/` directory may be omitted to save filesize
-space in the output ZIP file if desired.
+Since Yomichan does not support audio files from imported dictionaries, the `audio/` directory may be omitted to save filesize space in the output ZIP file if desired.
 
 ```
-media
-├── Audio.png
-├── audio
-│   ├── 00001.aac
-│   ├── 00002.aac
-│   ├── 00003.aac
-│   │   ...
-│   └── 82682.aac
-└── gaiji
-    ├── 1d110.svg
-    ├── 1d15d.svg
-    ├── 1d15e.svg
-    │   ...
-    └── xbunnoa.svg
+.
+├── media
+│   ├── audio (※)
+│   │   ├── 00001.aac
+│   │   ├── 00002.aac
+│   │   ├── 00003.aac
+│   │   ├── ...
+│   │   └── 82682.aac
+│   ├── Audio.png
+│   └── gaiji
+│       ├── 1d110.svg
+│       ├── 1d15d.svg
+│       ├── 1d15e.svg
+│       ├── ...
+│       └── xbunnoa.svg
+└── pages (※)
+    ├── 0000000000.xml
+    ├── 0000000001.xml
+    ├── 0000000002.xml
+    ├── ...
+    └── 0000064581.xml
 ```
 </details>
 
 <details>
-<summary>daijirin2 media directory</summary>
+<summary>daijirin2 files</summary>
 
 The `graphics/` directory may be omitted to save space if desired.
 
 ```
-media
-├── gaiji
-│   ├── 1D10B.svg
-│   ├── 1D110.svg
-│   ├── 1D12A.svg
-│   │   ...
-│   └── vectorOB.svg
-└── graphics
-    ├── 3djr_0002.png
-    ├── 3djr_0004.png
-    ├── 3djr_0005.png
-    │   ...
-    └── 4djr_yahazu.png
+.
+├── media
+│   ├── gaiji
+│   │   ├── 1D10B.svg
+│   │   ├── 1D110.svg
+│   │   ├── 1D12A.svg
+│   │   ├── ...
+│   │   └── vectorOB.svg
+│   └── graphics (※)
+│       ├── 3djr_0002.png
+│       ├── 3djr_0004.png
+│       ├── 3djr_0005.png
+│       ├── ...
+│       └── 4djr_yahazu.png
+└── pages (※)
+    ├── 0000000001.xml
+    ├── 0000000002.xml
+    ├── 0000000003.xml
+    ├── ...
+    └── 0000182633.xml
+```
+</details>
+
+<details>
+<summary>sankoku8 files</summary>
+
+```
+.
+├── media
+│   ├── graphics
+│   │   ├── 000chouchou.png
+│   │   ├── ...
+│   │   └── 888udatsu.png
+│   ├── svg-accent
+│   │   ├── アクセント.svg
+│   │   └── 平板.svg
+│   ├── svg-frac
+│   │   ├── frac-1-2.svg
+│   │   ├── ...
+│   │   └── frac-a-b.svg
+│   ├── svg-gaiji
+│   │   ├── aiaigasa.svg
+│   │   ├── ...
+│   │   └── 異体字_西.svg
+│   ├── svg-intonation
+│   │   ├── 上昇下降.svg
+│   │   ├── ...
+│   │   └── 長.svg
+│   ├── svg-logo
+│   │   ├── denshi.svg
+│   │   ├── ...
+│   │   └── 重要語.svg
+│   └── svg-special
+│       └── 区切り線.svg
+└── pages (※)
+    ├── 0000000001.xml
+    ├── ...
+    └── 0000065457.xml
 ```
 </details>
 
 # Attribution
 `Adobe-Japan1_sequences.txt` is provided by [The Adobe-Japan1-7 Character Collection](https://github.com/adobe-type-tools/Adobe-Japan1).
+
+The Yomichan term-bank schema definition `dictionary-term-bank-v3-schema.json` is provided by the [Yomichan](https://github.com/foosoft/yomichan) project.
+
+Many thanks to [epistularum](https://github.com/epistularum) for providing thoughtful feedback regarding the implementation of the MDict export functionality.
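
The new Monokakido notes describe a fixed layout: a root folder containing `media/` and `pages/`, where directories marked (※) must first be decoded with golddranks' monokakido tool. A pre-flight check along these lines could catch a missing folder before a long conversion run starts. This is only a sketch under several assumptions:

* `check_monokakido_layout` and `REQUIRED_MEDIA` are hypothetical names, not jitenbot code.
* The required subdirectories are read off the README trees. `audio/` (smk8) and `graphics/` (daijirin2) are left out because the README says they may be omitted.
* `pages/` and `media/` are assumed to be what gets passed as `-p PAGE_DIR` and `-m MEDIA_DIR`.
* The check only tests that folders exist. It cannot tell whether the (※) files were actually decoded.

```python
from pathlib import Path
from typing import Dict, FrozenSet, List

# Media subdirectories each Monokakido target is expected to have, read off the
# README trees. Assumption: audio/ (smk8) and graphics/ (daijirin2) are left out
# because the README says they may be omitted; jitenbot may check differently.
REQUIRED_MEDIA: Dict[str, FrozenSet[str]] = {
    "smk8": frozenset({"gaiji"}),
    "daijirin2": frozenset({"gaiji"}),
    "sankoku8": frozenset({
        "graphics", "svg-accent", "svg-frac", "svg-gaiji",
        "svg-intonation", "svg-logo", "svg-special",
    }),
}


def check_monokakido_layout(root: Path, target: str) -> List[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if OK).

    Assumes root/pages would be passed as -p PAGE_DIR and root/media as
    -m MEDIA_DIR. Only existence is checked, not whether files were decoded.
    """
    if target not in REQUIRED_MEDIA:
        return [f"unknown Monokakido target: {target!r}"]
    problems: List[str] = []
    media = root / "media"
    for name in sorted(REQUIRED_MEDIA[target]):
        if not (media / name).is_dir():
            problems.append(f"missing directory: media/{name}")
    pages = root / "pages"
    if not pages.is_dir():
        problems.append("missing directory: pages")
    elif not any(pages.glob("*.xml")):
        # pages/ should hold the decoded entries, e.g. 0000000001.xml
        problems.append("pages/ contains no .xml files")
    return problems
```

If the README trees change, the sets in `REQUIRED_MEDIA` would need to be updated to match.
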