# jitenbot
Jitenbot is a program for scraping Japanese dictionary websites and
compiling the scraped data into compact dictionary file formats.

### Supported Dictionaries
* Online
  * [四字熟語辞典オンライン](https://yoji.jitenon.jp/)
  * [故事・ことわざ・慣用句オンライン](https://kotowaza.jitenon.jp/)
* Offline
  * [新明解国語辞典 第八版](https://www.monokakido.jp/ja/dictionaries/smk8/index.html)
  * [大辞林 第四版](https://www.monokakido.jp/ja/dictionaries/daijirin2/index.html)


### Supported Output Formats

* [Yomichan](https://github.com/foosoft/yomichan)

# Usage
```
usage: jitenbot [-h] [-p PAGE_DIR] [-i IMAGE_DIR]
                {jitenon-yoji,jitenon-kotowaza,smk8,daijirin2}

Convert Japanese dictionary files to new formats.

positional arguments:
  {jitenon-yoji,jitenon-kotowaza,smk8,daijirin2}
                        name of dictionary to convert

options:
  -h, --help            show this help message and exit
  -p PAGE_DIR, --page-dir PAGE_DIR
                        path to directory containing XML page files
  -i IMAGE_DIR, --image-dir IMAGE_DIR
                        path to directory containing image folders (gaiji,
                        graphics, etc.)

```
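
The help text above has the shape that Python's `argparse` module generates. As a rough illustration only (this is not jitenbot's actual source, and the `target` attribute name and `build_parser` helper are assumptions made for this sketch), a parser producing an equivalent interface could look like this:

```python
import argparse

# Target names are copied from the help text above; the rest is illustrative.
TARGETS = ["jitenon-yoji", "jitenon-kotowaza", "smk8", "daijirin2"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented command line (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="jitenbot",
        description="Convert Japanese dictionary files to new formats.",
    )
    # Positional argument restricted to the supported dictionary names.
    parser.add_argument(
        "target",
        choices=TARGETS,
        help="name of dictionary to convert",
    )
    parser.add_argument(
        "-p", "--page-dir",
        help="path to directory containing XML page files",
    )
    parser.add_argument(
        "-i", "--image-dir",
        help="path to directory containing image folders (gaiji, graphics, etc.)",
    )
    return parser


# Parse an explicit argument list, equivalent to:
#   jitenbot smk8 --page-dir pages --image-dir images
# ("pages" and "images" are placeholder directory names.)
args = build_parser().parse_args(["smk8", "--page-dir", "pages", "--image-dir", "images"])
print(args.target, args.page_dir, args.image_dir)
```

Online targets need only the dictionary name (for example `jitenbot jitenon-yoji`), since both flags are optional in the usage line.
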
### Online Targets
Jitenbot will scrape the target website and save the pages to the [user's cache directory](https://pypi.org/project/platformdirs/).
As a courtesy to the website owners, jitenbot pauses for 10 seconds between page requests. Consequently,
a complete crawl of a target website may take several hours.
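
The sketch below shows one way a rate-limited, cache-backed crawl of this kind could be written. It is not jitenbot's implementation: the `crawl` and `estimated_hours` helpers, the hashed file names, and the choice to skip pages that are already cached are all assumptions for illustration. The fetch function and the cache directory are passed in, so the example needs no network access. In practice the directory might come from `platformdirs.user_cache_dir`.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable, Iterable

REQUEST_DELAY_SECONDS = 10  # the pause described in this README


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    cache_dir: str,
    delay: float = REQUEST_DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Fetch every URL that is not cached yet and return the number fetched."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    fetched = 0
    for url in urls:
        # Hypothetical naming scheme: one file per URL, named by its hash.
        name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
        target = cache / name
        if target.exists():
            continue  # assumed behavior: cached pages are not requested again
        if fetched > 0:
            sleep(delay)  # pause between consecutive requests
        target.write_text(fetch(url), encoding="utf-8")
        fetched += 1
    return fetched


def estimated_hours(page_count: int, delay: float = REQUEST_DELAY_SECONDS) -> float:
    """Lower bound on crawl time from the pauses alone, ignoring download time."""
    return page_count * delay / 3600
```

For a sense of scale, a hypothetical site of 1,000 pages would spend about 10,000 seconds, or roughly 2.8 hours, on pauses alone.
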

### Offline Targets
Page data and image data must be supplied by the user and passed to jitenbot via the `--page-dir` and `--image-dir` command line flags.
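
Before starting a long conversion, it can help to confirm that both directories exist. The following is a minimal sketch under stated assumptions: `check_offline_inputs` is a hypothetical helper, not part of jitenbot, and whether jitenbot performs a similar check itself is not documented here.

```python
from pathlib import Path
from typing import List, Optional

# Offline dictionaries listed in this README.
OFFLINE_TARGETS = {"smk8", "daijirin2"}


def check_offline_inputs(
    target: str,
    page_dir: Optional[str],
    image_dir: Optional[str],
) -> List[str]:
    """Return a list of problems; an empty list means the inputs look usable."""
    if target not in OFFLINE_TARGETS:
        return []  # online targets are scraped, so no directories are needed
    problems = []
    for flag, value in (("--page-dir", page_dir), ("--image-dir", image_dir)):
        if value is None:
            problems.append(f"{flag} is required for offline target {target!r}")
        elif not Path(value).is_dir():
            problems.append(f"{flag} is not a directory: {value}")
    return problems
```

An empty result for an offline target means both paths were given and point to existing directories. It says nothing about whether their contents are valid dictionary data.
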

# Attribution
`Adobe-Japan1_sequences.txt` is provided by [The Adobe-Japan1-7 Character Collection](https://github.com/adobe-type-tools/Adobe-Japan1).

Create README.md 2023-05-01 23:23:05 +00:00