Create README.md

2023-05-01 18:23:05 -05:00 · 2023-05-01 18:23:05 -05:00 · c23db8c50e
parent 5aa954bf2d
commit c23db8c50e
1 changed file with 47 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,47 @@
+# jitenbot
+Jitenbot is a program for scraping Japanese dictionary websites and
+converting the scraped data, or dictionary data supplied by the user, into compact dictionary file formats.
+
+### Supported Dictionaries
+* Online
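+  * [四字熟語辞典オンライン](https://yoji.jitenon.jp/) (online dictionary of four-character idioms)
+  * [故事・ことわざ・慣用句オンライン](https://kotowaza.jitenon.jp/) (online dictionary of phrases of classical origin, proverbs, and idioms)
+* Offline
+  * [新明解国語辞典 第八版](https://www.monokakido.jp/ja/dictionaries/smk8/index.html) (Shin Meikai Japanese Dictionary, 8th edition)
+  * [大辞林 第四版](https://www.monokakido.jp/ja/dictionaries/daijirin2/index.html) (Daijirin, 4th edition)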
+
+
+### Supported Output Formats
+
+* [Yomichan](https://github.com/foosoft/yomichan)
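
Yomichan imports dictionaries as ZIP archives of JSON files. The sketch below illustrates that target format under an assumption: it follows Yomichan's version 3 layout, with an `index.json` metadata file plus `term_bank_N.json` files whose rows are `[expression, reading, definition_tags, rules, score, glossary, sequence, term_tags]`. It is not jitenbot's exporter; the helper names `make_term` and `build_yomichan_zip` and the sample entry are hypothetical, and the field order should be checked against the JSON schemas in the Yomichan repository.

```python
import io
import json
import zipfile


def make_term(expression, reading, glossary, sequence):
    """Build one term-bank row, assuming Yomichan's version 3 layout:
    [expression, reading, definition_tags, rules, score, glossary, sequence, term_tags]."""
    return [expression, reading, "", "", 0, glossary, sequence, ""]


def build_yomichan_zip(title, revision, terms):
    """Return the bytes of a minimal Yomichan dictionary archive (hypothetical helper)."""
    index = {"title": title, "revision": revision, "sequenced": True, "format": 3}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # writestr encodes str data as UTF-8, so Japanese text is stored as-is.
        archive.writestr("index.json", json.dumps(index, ensure_ascii=False))
        archive.writestr("term_bank_1.json", json.dumps(terms, ensure_ascii=False))
    return buffer.getvalue()


# Hypothetical sample entry; not real jitenbot output.
terms = [make_term("一期一会", "いちごいちえ", ["once-in-a-lifetime encounter"], 1)]
data = build_yomichan_zip("Example Dictionary", "example-1", terms)
```

Large dictionaries are usually split across several numbered term banks rather than one file; this sketch writes a single bank for brevity.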
+
+# Usage
+```
+usage: jitenbot [-h] [-p PAGE_DIR] [-i IMAGE_DIR]
+                {jitenon-yoji,jitenon-kotowaza,smk8,daijirin2}
+
+Convert Japanese dictionary files to new formats.
+
+positional arguments:
+  {jitenon-yoji,jitenon-kotowaza,smk8,daijirin2}
+                        name of dictionary to convert
+
+options:
+  -h, --help            show this help message and exit
+  -p PAGE_DIR, --page-dir PAGE_DIR
+                        path to directory containing XML page files
+  -i IMAGE_DIR, --image-dir IMAGE_DIR
+                        path to directory containing image folders (gaiji,
+                        graphics, etc.)
+
+```
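
The help text above has the shape of Python `argparse` output. A minimal parser that would accept the same arguments can be reconstructed from that text alone; it is not jitenbot's actual source, and the positional argument name `target` and the function name `build_parser` are assumptions.

```python
import argparse

# Dictionary names taken from the usage text above.
DICTIONARIES = ["jitenon-yoji", "jitenon-kotowaza", "smk8", "daijirin2"]


def build_parser():
    parser = argparse.ArgumentParser(
        prog="jitenbot",
        description="Convert Japanese dictionary files to new formats.",
    )
    # With `choices` and no metavar, argparse prints the choices in the usage line.
    parser.add_argument("target", choices=DICTIONARIES,
                        help="name of dictionary to convert")
    parser.add_argument("-p", "--page-dir",
                        help="path to directory containing XML page files")
    parser.add_argument("-i", "--image-dir",
                        help="path to directory containing image folders (gaiji, graphics, etc.)")
    return parser


args = build_parser().parse_args(["smk8", "--page-dir", "pages", "--image-dir", "images"])
```

In this sketch both flags are optional at the parser level, since the README only requires them for offline targets.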
+### Online Targets
+Jitenbot will scrape the target website and save the pages to the [user's cache directory](https://pypi.org/project/platformdirs/).
+As a courtesy to the website owners, jitenbot is configured to pause for 10 seconds between page requests. Consequently,
+a complete crawl of a target website may take several hours.
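
The crawl behavior described above, caching every page and waiting between requests, can be sketched as follows. This is a hypothetical helper, not jitenbot's crawler: `PoliteCachedFetcher` and its `fetch` callable are illustrative names, and the cache directory is passed in explicitly (in practice it could come from `platformdirs.user_cache_dir("jitenbot")`).

```python
import hashlib
import time
from pathlib import Path


class PoliteCachedFetcher:
    """Fetch pages at most once every `delay` seconds, caching results on disk.

    Hypothetical helper: `fetch` is any callable mapping a URL to page text,
    for example a thin wrapper around urllib.request.
    """

    def __init__(self, fetch, cache_dir, delay=10.0):
        self.fetch = fetch
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.delay = delay
        self._last_request = None  # time.monotonic() when the previous network request finished

    def _cache_path(self, url):
        # Hash the URL so it can be used safely as a file name.
        return self.cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")

    def get(self, url):
        path = self._cache_path(url)
        if path.exists():
            # Cached pages are served without contacting the website.
            return path.read_text(encoding="utf-8")
        if self._last_request is not None:
            # Sleep only for whatever part of the delay has not already elapsed.
            wait = self.delay - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)
        text = self.fetch(url)
        self._last_request = time.monotonic()
        path.write_text(text, encoding="utf-8")
        return text
```

Because cached pages are returned without a request, re-running an interrupted crawl in this design only pays the delay for pages that have not been saved yet.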
+
+### Offline Targets
+Page data and image data must be supplied by the user and passed to jitenbot with the `--page-dir` and `--image-dir` command-line flags, respectively.
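
For the offline targets, a wrapper script could check that both directories exist before calling jitenbot. The helpers `build_offline_command` and `convert_offline` below are hypothetical; they assume the `jitenbot` command is installed on `PATH` and use only the flags shown in the usage text.

```python
import subprocess
from pathlib import Path

# Offline dictionary names from the usage text above.
OFFLINE_TARGETS = {"smk8", "daijirin2"}


def build_offline_command(target, page_dir, image_dir):
    """Return an argv list for converting an offline dictionary.

    Hypothetical helper: it only checks that the user-supplied directories
    exist and then assembles the --page-dir and --image-dir flags.
    """
    if target not in OFFLINE_TARGETS:
        raise ValueError(f"{target!r} is not an offline target")
    for label, directory in (("page", page_dir), ("image", image_dir)):
        if not Path(directory).is_dir():
            raise FileNotFoundError(f"{label} directory not found: {directory}")
    return ["jitenbot", target, "--page-dir", str(page_dir), "--image-dir", str(image_dir)]


def convert_offline(target, page_dir, image_dir):
    # Requires the jitenbot command to be installed and on PATH.
    subprocess.run(build_offline_command(target, page_dir, image_dir), check=True)
```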
+
+# Attribution
+`Adobe-Japan1_sequences.txt` is provided by [The Adobe-Japan1-7 Character Collection](https://github.com/adobe-type-tools/Adobe-Japan1).