* Add debug/trace logging to extract_items
* Handle invalid timestamps for livestreams extraction
* Make use of author_fallback in playlist extractor
* Don't use extract_text for video length extraction
The extract_text function attempts to extract from both the simpleText and
the runs route. This is typically what we'd want for text extraction as
it could appear in both locations. However, while this still holds true,
the thumbnailOverlayTimeStatusRenderer writes a numerical length (when
present on the video) to the simpleText route and uses runs for a
text overlay like "LIVE" or "PREMIERE".
Therefore, when a video has a text overlay instead of a numerical one,
Invidious still passes it onto decode_length_seconds, which obviously
raises since it cannot be converted into integers.
In the future, if more routes requires one text route over the other, we
should go ahead and add an argument to extract_text itself. Though for
now, this is sufficient.
* Handle unsupported "special" categories
- Add extract_text to simplify extraction of InnerTube texts
- Add helper extractor methods to reduce repetition in parsing InnerTube
- Change [] more than 2 blocks long to use #dig or #dig?
- Remove useless ?.try blocks for items that always exists
- Add (some) documentation to VideoRendererParser
This is a squash of a bunch of commits
cherry-picked commits
Fix category parse error on search
(cherry picked from commit cc02fed4e69f0eb5f19e017173632b3a3f20519f)
Fix category items not being extracted in search
(cherry picked from commit 2605b9c609ff217b5a6ae09d22450596dcad90fc)
Make search not include category items for now
(cherry picked from commit ca4afd59f46b595e3c339f31432cad98a5771ee1)
Change behavior of categories in search results
(cherry picked from commit cc1067561051b1c113b490e79c4a71cd346f7b3f)
Fix missing search results in extraction
(cherry picked from commit abda6840d5bfe58f845128bdd1a3f4916dd3bb84)
Fix miscount of search results
(cherry picked from commit 491e33450eb1300d0234bb33df0d0e78a027114f)
This commit adds a new parser for YT's shelfRenderers which are
typically used to denote different categories.The code for featured
channels parsing has also been moved to use the new parser but some
additional refactoring are needed there.
The ContinuationExtractor has also been improved and is now capable of
extraction continuation data that is packaged under
"appendContinuationItemsAction"
In additional this commit adds some useful helper functions to extract
the current selected tab the continuation token. This is to mainly
reduce code size and repetition.
--
This cherry-picked commit also removes the code for parsing featured
channels present on the original.
(cherry picked from commit 8000d538dbbf1eb9c78e000b1449926ba3b24da9)
This commit completely rewrites the extract_item and extract_items
function. Before this commit these two function were an unreadable
mess. The extract_item function was a lengthy if-elsif chain
while the extract_items function contained an incomprehensible
mess of .try, else and ||.
With this commit both of these functions have been pulled into a
separate file with the internal logic being moved to a few classes.
This significantly reduces the size of these two methods, enhances
readability and makes adding new extraction/parse rules much simpler.
See diff for details.
--
This cherry-picked commit also removes the code for parsing featured
channels present on the original.
(cherry picked from commit a027fbf7af1f96dc26fe5a610525ae52bcc40c28)
Video mimetype may contain code information between double quotes.
If not properly escaped, it breaks the browser's parser. E.g:
```
type="video/mp4; codecs=" avc1.64001f,="" mp4a.40.2""=""
```
Thank Robin for catching this!