Create README.md

2023-04-11 14:12:55 -05:00 · 2023-04-11 14:12:55 -05:00 · bc692f6c5a
parent 1b89a3542c
commit bc692f6c5a
1 changed file with 35 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,35 @@
+# jitenbot
+Jitenbot is a program for scraping Japanese dictionary websites and converting the scraped data into structured dictionary files.
+
+### Target Websites
+
+* [四字熟語辞典オンライン](https://yoji.jitenon.jp/)
+* [故事・ことわざ・慣用句オンライン](https://kotowaza.jitenon.jp/)
+
+### Export Formats
+
+* [Yomichan](https://github.com/foosoft/yomichan)
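+
+A Yomichan dictionary is a zip archive that holds an `index.json` metadata file and numbered `term_bank_N.json` files. The minimal sketch below writes such an archive using only Python's standard library. It is not jitenbot's exporter. The helper name `write_yomichan_zip` and its `(expression, reading, glossary)` entry tuples are hypothetical.
+
+```python
+import json
+import zipfile
+from typing import List, Sequence, Tuple
+
+
+def write_yomichan_zip(path: str, title: str,
+                       entries: Sequence[Tuple[str, str, List[str]]]) -> None:
+    """Write a minimal Yomichan (format 3) term dictionary (hypothetical helper)."""
+    index = {"title": title, "revision": "1", "format": 3, "sequenced": True}
+    terms = []
+    for seq, (expression, reading, glossary) in enumerate(entries, start=1):
+        # Term fields: expression, reading, definition tags, deinflection
+        # rules, score, glossary list, sequence number, term tags.
+        terms.append([expression, reading, "", "", 0, list(glossary), seq, ""])
+    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
+        # Keep Japanese text readable in the JSON (no \u escapes).
+        zf.writestr("index.json", json.dumps(index, ensure_ascii=False))
+        zf.writestr("term_bank_1.json", json.dumps(terms, ensure_ascii=False))
+```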
+
+# Usage
+Add your desired HTTP request headers to [config.json](https://github.com/stephenmk/jitenbot/blob/main/config.json)
+and ensure that all [requirements](https://github.com/stephenmk/jitenbot/blob/main/requirements.txt)
+are installed.
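+
+This README does not document the layout of `config.json`. The following minimal sketch assumes the headers sit under a hypothetical `"http-request-headers"` key and shows how they could be loaded and attached to a request. Check the real file before relying on this.
+
+```python
+import json
+import urllib.request
+from typing import Dict
+
+
+def load_headers(config_path: str) -> Dict[str, str]:
+    # Assumption: headers are stored under a hypothetical
+    # "http-request-headers" key in config.json.
+    with open(config_path, encoding="utf-8") as f:
+        config = json.load(f)
+    return dict(config.get("http-request-headers", {}))
+
+
+def build_request(url: str, headers: Dict[str, str]) -> urllib.request.Request:
+    # Attach the configured headers (e.g. User-Agent) to a GET request.
+    return urllib.request.Request(url, headers=headers)
+```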
+
+```
+jitenbot [-h] {all,jitenon-yoji,jitenon-kotowaza}
+
+positional arguments:
+  {all,jitenon-yoji,jitenon-kotowaza}
+                        website to crawl
+
+options:
+  -h, --help            show this help message and exit
+```
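+
+The help text above has the shape Python's `argparse` prints for a single positional argument with fixed choices. The following is a hypothetical reconstruction, not jitenbot's source. The argument name `target` and the assumption that `all` expands to every listed site are illustrative.
+
+```python
+import argparse
+from typing import List
+
+# Hypothetical reconstruction of the interface shown above.
+TARGETS = ["all", "jitenon-yoji", "jitenon-kotowaza"]
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(prog="jitenbot")
+    # A positional argument restricted to the listed choices; argparse
+    # renders them as {all,jitenon-yoji,jitenon-kotowaza} in the usage line.
+    parser.add_argument("target", choices=TARGETS, help="website to crawl")
+    return parser
+
+
+def selected_targets(target: str) -> List[str]:
+    # Assumption: "all" stands for every individual website.
+    return TARGETS[1:] if target == "all" else [target]
+```
+
+With this parser, `build_parser().parse_args(["jitenon-yoji"]).target` returns `"jitenon-yoji"`. An unlisted name makes argparse print an error and exit.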
+
+Scraped webpages are written to a `webcache` directory. Each page may be as large as a megabyte,
+and a single dictionary may include thousands of pages, so the cache may grow to several gigabytes.
+Ensure that adequate disk space is available.
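+
+A cache like this lets repeated runs skip pages that have already been downloaded. The sketch below assumes a hypothetical layout with one file per URL, named by the URL's SHA-256 hash. Jitenbot's actual file naming may differ. The hypothetical `cache_size_bytes` helper reports how much disk space the cache currently uses.
+
+```python
+import hashlib
+import os
+from typing import Callable
+
+
+def cached_fetch(url: str, fetch: Callable[[str], str],
+                 cache_dir: str = "webcache") -> str:
+    # Hypothetical layout: one file per URL, named by the URL's SHA-256 hash.
+    os.makedirs(cache_dir, exist_ok=True)
+    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
+    path = os.path.join(cache_dir, name)
+    if os.path.exists(path):
+        # Reuse the copy saved by an earlier run instead of downloading again.
+        with open(path, encoding="utf-8") as f:
+            return f.read()
+    html = fetch(url)
+    with open(path, "w", encoding="utf-8") as f:
+        f.write(html)
+    return html
+
+
+def cache_size_bytes(cache_dir: str = "webcache") -> int:
+    # Total size of the files stored directly in the cache directory.
+    with os.scandir(cache_dir) as entries:
+        return sum(e.stat().st_size for e in entries if e.is_file())
+```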
+
+Jitenbot will pause for at least 10 seconds between web requests. Depending on the size of
+the target dictionary, it may take hours or days to finish scraping.
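+
+The following is a minimal throttle sketch. It assumes a hypothetical `Throttle` helper that is called before every request. The hypothetical `minimum_hours` function gives a rough lower bound from the pause alone. For example, 5,000 pages need 4,999 pauses of 10 seconds, which comes to about 13.9 hours before any download time is counted.
+
+```python
+import time
+from typing import Callable, Optional
+
+
+class Throttle:
+    """Enforce a minimum delay between consecutive requests (sketch)."""
+
+    def __init__(self, min_interval: float = 10.0,
+                 clock: Callable[[], float] = time.monotonic,
+                 sleep: Callable[[float], None] = time.sleep) -> None:
+        self.min_interval = min_interval
+        self.clock = clock
+        self.sleep = sleep
+        self.last: Optional[float] = None
+
+    def wait(self) -> None:
+        # Sleep only for the part of the interval that has not yet elapsed.
+        now = self.clock()
+        if self.last is not None:
+            remaining = self.min_interval - (now - self.last)
+            if remaining > 0:
+                self.sleep(remaining)
+                now = self.clock()
+        self.last = now
+
+
+def minimum_hours(pages: int, interval: float = 10.0) -> float:
+    # n requests have n - 1 gaps; download time itself is ignored.
+    return max(pages - 1, 0) * interval / 3600
+```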
+
+Exported dictionary files will be saved in an `output` directory.