merge 'master'

This commit is contained in:
Mozi 2025-01-29 05:49:12 +00:00
commit 85f96a8688
49 changed files with 1673 additions and 1507 deletions

1
.gitignore vendored
View File

@ -92,6 +92,7 @@ updates_key.pem
*.class *.class
*.isorted *.isorted
*.stackdump *.stackdump
uv.lock
# Generated # Generated
AUTHORS AUTHORS

View File

@ -715,3 +715,24 @@ Crypto90
MutantPiggieGolem1 MutantPiggieGolem1
Sanceilaks Sanceilaks
Strkmn Strkmn
0x9fff00
4ft35t
7x11x13
b5i
cotko
d3d9
Dioarya
finch71
hexahigh
InvalidUsernameException
jixunmoe
knackku
krandor
kvk-2015
lonble
msm595
n10dollar
NecroRomnt
pjrobertson
subsense
test20140

View File

@ -4,6 +4,65 @@
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master # To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
--> -->
### 2025.01.26
#### Core changes
- [Fix float comparison values in format filters](https://github.com/yt-dlp/yt-dlp/commit/f7d071e8aa3bf67ed7e0f881e749ca9ab50b3f8f) ([#11880](https://github.com/yt-dlp/yt-dlp/issues/11880)) by [bashonly](https://github.com/bashonly), [Dioarya](https://github.com/Dioarya)
- **utils**: `sanitize_path`: [Fix some incorrect behavior](https://github.com/yt-dlp/yt-dlp/commit/fc12e724a3b4988cfc467d2981887dde48c26b69) ([#11923](https://github.com/yt-dlp/yt-dlp/issues/11923)) by [Grub4K](https://github.com/Grub4K)
#### Extractor changes
- **1tv**: [Support sport1tv.ru domain](https://github.com/yt-dlp/yt-dlp/commit/61ae5dc34ac775d6c122575e21ef2153b1273a2b) ([#11889](https://github.com/yt-dlp/yt-dlp/issues/11889)) by [kvk-2015](https://github.com/kvk-2015)
- **abematv**: [Support season extraction](https://github.com/yt-dlp/yt-dlp/commit/c709cc41cbc16edc846e0a431cfa8508396d4cb6) ([#11771](https://github.com/yt-dlp/yt-dlp/issues/11771)) by [middlingphys](https://github.com/middlingphys)
- **bilibili**
- [Support space `/lists/` URLs](https://github.com/yt-dlp/yt-dlp/commit/465167910407449354eb48e9861efd0819f53eb5) ([#11964](https://github.com/yt-dlp/yt-dlp/issues/11964)) by [c-basalt](https://github.com/c-basalt)
- [Support space video list extraction without login](https://github.com/yt-dlp/yt-dlp/commit/78912ed9c81f109169b828c397294a6cf8eacf41) ([#12089](https://github.com/yt-dlp/yt-dlp/issues/12089)) by [grqz](https://github.com/grqz)
- **bilibilidynamic**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/9676b05715b61c8c5dd5598871e60d8807fb1a86) ([#11838](https://github.com/yt-dlp/yt-dlp/issues/11838)) by [finch71](https://github.com/finch71), [grqz](https://github.com/grqz)
- **bluesky**: [Prefer source format](https://github.com/yt-dlp/yt-dlp/commit/ccda63934df7de2823f0834218c4254c7c4d2e4c) ([#12154](https://github.com/yt-dlp/yt-dlp/issues/12154)) by [0x9fff00](https://github.com/0x9fff00)
- **crunchyroll**: [Remove extractors](https://github.com/yt-dlp/yt-dlp/commit/ff44ed53061e065804da6275d182d7928cc03a5e) ([#12195](https://github.com/yt-dlp/yt-dlp/issues/12195)) by [seproDev](https://github.com/seproDev)
- **dropout**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/164368610456e2d96b279f8b120dea08f7b1d74f) ([#12102](https://github.com/yt-dlp/yt-dlp/issues/12102)) by [bashonly](https://github.com/bashonly)
- **eggs**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/20c765d02385a105c8ef13b6f7a737491d29c19a) ([#11904](https://github.com/yt-dlp/yt-dlp/issues/11904)) by [seproDev](https://github.com/seproDev), [subsense](https://github.com/subsense)
- **funimation**: [Remove extractors](https://github.com/yt-dlp/yt-dlp/commit/cdcf1e86726b8fa44f7e7126bbf1c18e1798d25c) ([#12167](https://github.com/yt-dlp/yt-dlp/issues/12167)) by [doe1080](https://github.com/doe1080)
- **goodgame**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/e7cc02b14d8d323f805d14325a9c95593a170d28) ([#12173](https://github.com/yt-dlp/yt-dlp/issues/12173)) by [NecroRomnt](https://github.com/NecroRomnt)
- **lbry**: [Support signed URLs](https://github.com/yt-dlp/yt-dlp/commit/de30f652ffb7623500215f5906844f2ae0d92c7b) ([#12138](https://github.com/yt-dlp/yt-dlp/issues/12138)) by [seproDev](https://github.com/seproDev)
- **naver**: [Fix m3u8 formats extraction](https://github.com/yt-dlp/yt-dlp/commit/b3007c44cdac38187fc6600de76959a7079a44d1) ([#12037](https://github.com/yt-dlp/yt-dlp/issues/12037)) by [kclauhk](https://github.com/kclauhk)
- **nest**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/1ef3ee7500c4ab8c26f7fdc5b0ad1da4d16eec8e) ([#11747](https://github.com/yt-dlp/yt-dlp/issues/11747)) by [pabs3](https://github.com/pabs3), [seproDev](https://github.com/seproDev)
- **niconico**: series: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/bc88b904cd02314da41ce1b2fdf046d0680fe965) ([#11822](https://github.com/yt-dlp/yt-dlp/issues/11822)) by [test20140](https://github.com/test20140)
- **nrk**
- [Extract more formats](https://github.com/yt-dlp/yt-dlp/commit/89198bb23b4d03e0473ac408bfb50d67c2f71165) ([#12069](https://github.com/yt-dlp/yt-dlp/issues/12069)) by [hexahigh](https://github.com/hexahigh)
- [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/45732e2590a1bd0bc9608f5eb68c59341ca84f02) ([#12193](https://github.com/yt-dlp/yt-dlp/issues/12193)) by [hexahigh](https://github.com/hexahigh)
- **patreon**: [Extract attachment filename as `alt_title`](https://github.com/yt-dlp/yt-dlp/commit/e2e73b5c65593ec0a5e685663e6ec0f4aaffc1f1) ([#12000](https://github.com/yt-dlp/yt-dlp/issues/12000)) by [msm595](https://github.com/msm595)
- **pbs**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/13825ab77815ee6e1603abbecbb9f3795057b93c) ([#12024](https://github.com/yt-dlp/yt-dlp/issues/12024)) by [dirkf](https://github.com/dirkf), [krandor](https://github.com/krandor), [n10dollar](https://github.com/n10dollar)
- **piramidetv**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/af2c821d74049b519895288aca23cee81fc4b049) ([#10777](https://github.com/yt-dlp/yt-dlp/issues/10777)) by [HobbyistDev](https://github.com/HobbyistDev), [kclauhk](https://github.com/kclauhk), [seproDev](https://github.com/seproDev)
- **redgifs**: [Support `/ifr/` URLs](https://github.com/yt-dlp/yt-dlp/commit/4850ce91d163579fa615c3c0d44c9bd64682c22b) ([#11805](https://github.com/yt-dlp/yt-dlp/issues/11805)) by [invertico](https://github.com/invertico)
- **rtvslo.si**: show: [Extract more metadata](https://github.com/yt-dlp/yt-dlp/commit/3fc46086562857d5493cbcff687f76e4e4ed303f) ([#12136](https://github.com/yt-dlp/yt-dlp/issues/12136)) by [cotko](https://github.com/cotko)
- **senategov**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/68221ecc87c6a3f3515757bac2a0f9674a38e3f2) ([#9361](https://github.com/yt-dlp/yt-dlp/issues/9361)) by [Grabien](https://github.com/Grabien), [seproDev](https://github.com/seproDev)
- **soundcloud**
- [Extract more metadata](https://github.com/yt-dlp/yt-dlp/commit/6d304133ab32bcd1eb78ff1467f1a41dd9b66c33) ([#11945](https://github.com/yt-dlp/yt-dlp/issues/11945)) by [7x11x13](https://github.com/7x11x13)
- user: [Add `/comments` page support](https://github.com/yt-dlp/yt-dlp/commit/7bfb4f72e490310d2681c7f4815218a2ebbc73ee) ([#11999](https://github.com/yt-dlp/yt-dlp/issues/11999)) by [7x11x13](https://github.com/7x11x13)
- **subsplash**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/5d904b077d2f58ae44bdf208d2dcfcc3ff8347f5) ([#11054](https://github.com/yt-dlp/yt-dlp/issues/11054)) by [seproDev](https://github.com/seproDev), [subrat-lima](https://github.com/subrat-lima)
- **theatercomplextownppv**: [Support `live` URLs](https://github.com/yt-dlp/yt-dlp/commit/797d2472a299692e01ad1500e8c3b7bc1daa7fe4) ([#11720](https://github.com/yt-dlp/yt-dlp/issues/11720)) by [bashonly](https://github.com/bashonly)
- **vimeo**: [Fix thumbnail extraction](https://github.com/yt-dlp/yt-dlp/commit/9ff330948c92f6b2e1d9c928787362ab19cd6c62) ([#12142](https://github.com/yt-dlp/yt-dlp/issues/12142)) by [jixunmoe](https://github.com/jixunmoe)
- **vimp**: Playlist: [Add support for tags](https://github.com/yt-dlp/yt-dlp/commit/d4f5be1735c8feaeb3308666e0b878e9782f529d) ([#11688](https://github.com/yt-dlp/yt-dlp/issues/11688)) by [FestplattenSchnitzel](https://github.com/FestplattenSchnitzel)
- **weibo**: [Extend `_VALID_URL`](https://github.com/yt-dlp/yt-dlp/commit/a567f97b62ae9f6d6f5a9376c361512ab8dceda2) ([#12088](https://github.com/yt-dlp/yt-dlp/issues/12088)) by [4ft35t](https://github.com/4ft35t)
- **xhamster**: [Various improvements](https://github.com/yt-dlp/yt-dlp/commit/3b99a0f0e07f0120ab416f34a8f5ab75d4fdf1d1) ([#11738](https://github.com/yt-dlp/yt-dlp/issues/11738)) by [knackku](https://github.com/knackku)
- **xiaohongshu**: [Extract more formats](https://github.com/yt-dlp/yt-dlp/commit/f9f24ae376a9eaca777816479a4a29f6f0ce7681) ([#12147](https://github.com/yt-dlp/yt-dlp/issues/12147)) by [seproDev](https://github.com/seproDev)
- **youtube**
- [Download `tv` client Innertube config](https://github.com/yt-dlp/yt-dlp/commit/326fb1ffaf4e8349f1fe8ba2a81839652e044bff) ([#12168](https://github.com/yt-dlp/yt-dlp/issues/12168)) by [coletdjnz](https://github.com/coletdjnz)
- [Extract `media_type` for livestreams](https://github.com/yt-dlp/yt-dlp/commit/421bc72103d1faed473a451299cd17d6abb433bb) ([#11605](https://github.com/yt-dlp/yt-dlp/issues/11605)) by [nosoop](https://github.com/nosoop)
- [Restore convenience workarounds](https://github.com/yt-dlp/yt-dlp/commit/f0d4b8a5d6354b294bc9631cf15a7160b7bad5de) ([#12181](https://github.com/yt-dlp/yt-dlp/issues/12181)) by [bashonly](https://github.com/bashonly)
- [Update `ios` player client](https://github.com/yt-dlp/yt-dlp/commit/de82acf8769282ce321a86737ecc1d4bef0e82a7) ([#12155](https://github.com/yt-dlp/yt-dlp/issues/12155)) by [b5i](https://github.com/b5i)
- [Use different PO token for GVS and Player](https://github.com/yt-dlp/yt-dlp/commit/6b91d232e316efa406035915532eb126fbaeea38) ([#12090](https://github.com/yt-dlp/yt-dlp/issues/12090)) by [coletdjnz](https://github.com/coletdjnz)
- tab: [Improve shorts title extraction](https://github.com/yt-dlp/yt-dlp/commit/76ac023ff02f06e8c003d104f02a03deeddebdcd) ([#11997](https://github.com/yt-dlp/yt-dlp/issues/11997)) by [bashonly](https://github.com/bashonly), [d3d9](https://github.com/d3d9)
- **zdf**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/bb69f5dab79fb32c4ec0d50e05f7fa26d05d54ba) ([#11041](https://github.com/yt-dlp/yt-dlp/issues/11041)) by [InvalidUsernameException](https://github.com/InvalidUsernameException)
#### Misc. changes
- **cleanup**: Miscellaneous: [3b45319](https://github.com/yt-dlp/yt-dlp/commit/3b4531934465580be22937fecbb6e1a3a9e2334f) by [bashonly](https://github.com/bashonly), [lonble](https://github.com/lonble), [pjrobertson](https://github.com/pjrobertson), [seproDev](https://github.com/seproDev)
### 2025.01.15
#### Extractor changes
- **youtube**: [Do not use `web_creator` as a default client](https://github.com/yt-dlp/yt-dlp/commit/c8541f8b13e743fcfa06667530d13fee8686e22a) ([#12087](https://github.com/yt-dlp/yt-dlp/issues/12087)) by [bashonly](https://github.com/bashonly)
### 2025.01.12 ### 2025.01.12
#### Core changes #### Core changes

View File

@ -1760,7 +1760,7 @@ $ yt-dlp --replace-in-metadata "title,uploader" "[ _]" "-"
# EXTRACTOR ARGUMENTS # EXTRACTOR ARGUMENTS
Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. E.g. `--extractor-args "youtube:player-client=tv,mweb;formats=incomplete" --extractor-args "funimation:version=uncut"` Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. E.g. `--extractor-args "youtube:player-client=tv,mweb;formats=incomplete" --extractor-args "twitter:api=syndication"`
Note: In CLI, `ARG` can use `-` instead of `_`; e.g. `youtube:player-client"` becomes `youtube:player_client"` Note: In CLI, `ARG` can use `-` instead of `_`; e.g. `youtube:player-client"` becomes `youtube:player_client"`
@ -1769,7 +1769,7 @@ The following extractors use this feature:
#### youtube #### youtube
* `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes * `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes
* `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively * `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music` and `_creator` (e.g. `ios_creator`); and `mweb`, `android_vr`, `web_safari`, `web_embedded`, `tv` and `tv_embedded` with no variants. By default, `ios,tv` is used, or `web_creator,tv` is used when authenticating with cookies. The `_music` variants are added for `music.youtube.com` URLs. Some clients, such as `web` and `android`, require a `po_token` for their formats to be downloadable. Some clients, such as the `_creator` variants, will only work with authentication. Not all clients support authentication via cookies. You can use `all` to use all the clients, and `default` for the default clients. You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=all,-web` * `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music` and `_creator` (e.g. `ios_creator`); and `mweb`, `android_vr`, `web_safari`, `web_embedded`, `tv` and `tv_embedded` with no variants. By default, `tv,ios,web` is used, or `tv,web` is used when authenticating with cookies. The `web_music` client is added for `music.youtube.com` URLs when logged-in cookies are used. The `tv_embedded` and `web_creator` clients are added for age-restricted videos if account age-verification is required. Some clients, such as `web` and `web_music`, require a `po_token` for their formats to be downloadable. Some clients, such as the `_creator` variants, will only work with authentication. Not all clients support authentication via cookies. You can use `default` for the default clients, or you can use `all` for all clients (not recommended). You can prefix a client with `-` to exclude it, e.g. `youtube:player_client=default,-ios`
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details * `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp. * `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side) * `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
@ -1781,7 +1781,7 @@ The following extractors use this feature:
* `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning * `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning
* `data_sync_id`: Overrides the account Data Sync ID used in Innertube API requests. This may be needed if you are using an account with `youtube:player_skip=webpage,configs` or `youtubetab:skip=webpage` * `data_sync_id`: Overrides the account Data Sync ID used in Innertube API requests. This may be needed if you are using an account with `youtube:player_skip=webpage,configs` or `youtubetab:skip=webpage`
* `visitor_data`: Overrides the Visitor Data used in Innertube API requests. This should be used with `player_skip=webpage,configs` and without cookies. Note: this may have adverse effects if used improperly. If a session from a browser is wanted, you should pass cookies instead (which contain the Visitor ID) * `visitor_data`: Overrides the Visitor Data used in Innertube API requests. This should be used with `player_skip=webpage,configs` and without cookies. Note: this may have adverse effects if used improperly. If a session from a browser is wanted, you should pass cookies instead (which contain the Visitor ID)
* `po_token`: Proof of Origin (PO) Token(s) to use for requesting video playback. Comma seperated list of PO Tokens in the format `CLIENT+PO_TOKEN`, e.g. `youtube:po_token=web+XXX,android+YYY` * `po_token`: Proof of Origin (PO) Token(s) to use. Comma seperated list of PO Tokens in the format `CLIENT.CONTEXT+PO_TOKEN`, e.g. `youtube:po_token=web.gvs+XXX,web.player=XXX,web_safari.gvs+YYY`. Context can be either `gvs` (Google Video Server URLs) or `player` (Innertube player request)
#### youtubetab (YouTube playlists, channels, feeds, etc.) #### youtubetab (YouTube playlists, channels, feeds, etc.)
* `skip`: One or more of `webpage` (skip initial webpage download), `authcheck` (allow the download of playlists requiring authentication when no initial webpage is downloaded. This may cause unwanted behavior, see [#1122](https://github.com/yt-dlp/yt-dlp/pull/1122) for more details) * `skip`: One or more of `webpage` (skip initial webpage download), `authcheck` (allow the download of playlists requiring authentication when no initial webpage is downloaded. This may cause unwanted behavior, see [#1122](https://github.com/yt-dlp/yt-dlp/pull/1122) for more details)
@ -1795,13 +1795,6 @@ The following extractors use this feature:
* `is_live`: Bypass live HLS detection and manually set `live_status` - a value of `false` will set `not_live`, any other value (or no value) will set `is_live` * `is_live`: Bypass live HLS detection and manually set `live_status` - a value of `false` will set `not_live`, any other value (or no value) will set `is_live`
* `impersonate`: Target(s) to try and impersonate with the initial webpage request; e.g. `generic:impersonate=safari,chrome-110`. Use `generic:impersonate` to impersonate any available target, and use `generic:impersonate=false` to disable impersonation (default) * `impersonate`: Target(s) to try and impersonate with the initial webpage request; e.g. `generic:impersonate=safari,chrome-110`. Use `generic:impersonate` to impersonate any available target, and use `generic:impersonate=false` to disable impersonation (default)
#### funimation
* `language`: Audio languages to extract, e.g. `funimation:language=english,japanese`
* `version`: The video version to extract - `uncut` or `simulcast`
#### crunchyrollbeta (Crunchyroll)
* `hardsub`: One or more hardsub versions to extract (in order of preference), or `all` (default: `None` = no hardsubs will be extracted), e.g. `crunchyrollbeta:hardsub=en-US,de-DE`
#### vikichannel #### vikichannel
* `video_types`: Types of videos to download - one or more of `episodes`, `movies`, `clips`, `trailers` * `video_types`: Types of videos to download - one or more of `episodes`, `movies`, `clips`, `trailers`

View File

@ -239,5 +239,11 @@
"action": "add", "action": "add",
"when": "52c0ffe40ad6e8404d93296f575007b05b04c686", "when": "52c0ffe40ad6e8404d93296f575007b05b04c686",
"short": "[priority] **Login with OAuth is no longer supported for YouTube**\nDue to a change made by the site, yt-dlp is no longer able to support OAuth login for YouTube. [Read more](https://github.com/yt-dlp/yt-dlp/issues/11462#issuecomment-2471703090)" "short": "[priority] **Login with OAuth is no longer supported for YouTube**\nDue to a change made by the site, yt-dlp is no longer able to support OAuth login for YouTube. [Read more](https://github.com/yt-dlp/yt-dlp/issues/11462#issuecomment-2471703090)"
},
{
"action": "change",
"when": "76ac023ff02f06e8c003d104f02a03deeddebdcd",
"short": "[ie/youtube:tab] Improve shorts title extraction (#11997)",
"authors": ["bashonly", "d3d9"]
} }
] ]

View File

@ -171,6 +171,7 @@
- **BilibiliCheese** - **BilibiliCheese**
- **BilibiliCheeseSeason** - **BilibiliCheeseSeason**
- **BilibiliCollectionList** - **BilibiliCollectionList**
- **BiliBiliDynamic**
- **BilibiliFavoritesList** - **BilibiliFavoritesList**
- **BiliBiliPlayer** - **BiliBiliPlayer**
- **BilibiliPlaylist** - **BilibiliPlaylist**
@ -303,10 +304,6 @@
- **CrowdBunker** - **CrowdBunker**
- **CrowdBunkerChannel** - **CrowdBunkerChannel**
- **Crtvg** - **Crtvg**
- **crunchyroll**: [*crunchyroll*](## "netrc machine")
- **crunchyroll:artist**: [*crunchyroll*](## "netrc machine")
- **crunchyroll:music**: [*crunchyroll*](## "netrc machine")
- **crunchyroll:playlist**: [*crunchyroll*](## "netrc machine")
- **CSpan**: C-SPAN - **CSpan**: C-SPAN
- **CSpanCongress** - **CSpanCongress**
- **CtsNews**: 華視新聞 - **CtsNews**: 華視新聞
@ -393,6 +390,8 @@
- **Ebay** - **Ebay**
- **egghead:course**: egghead.io course - **egghead:course**: egghead.io course
- **egghead:lesson**: egghead.io lesson - **egghead:lesson**: egghead.io lesson
- **eggs:artist**
- **eggs:single**
- **EinsUndEinsTV**: [*1und1tv*](## "netrc machine") - **EinsUndEinsTV**: [*1und1tv*](## "netrc machine")
- **EinsUndEinsTVLive**: [*1und1tv*](## "netrc machine") - **EinsUndEinsTVLive**: [*1und1tv*](## "netrc machine")
- **EinsUndEinsTVRecordings**: [*1und1tv*](## "netrc machine") - **EinsUndEinsTVRecordings**: [*1und1tv*](## "netrc machine")
@ -477,9 +476,6 @@
- **FrontendMastersCourse**: [*frontendmasters*](## "netrc machine") - **FrontendMastersCourse**: [*frontendmasters*](## "netrc machine")
- **FrontendMastersLesson**: [*frontendmasters*](## "netrc machine") - **FrontendMastersLesson**: [*frontendmasters*](## "netrc machine")
- **FujiTVFODPlus7** - **FujiTVFODPlus7**
- **Funimation**: [*funimation*](## "netrc machine")
- **funimation:page**: [*funimation*](## "netrc machine")
- **funimation:show**: [*funimation*](## "netrc machine")
- **Funk** - **Funk**
- **Funker530** - **Funker530**
- **Fux** - **Fux**
@ -892,6 +888,8 @@
- **nebula:video**: [*watchnebula*](## "netrc machine") - **nebula:video**: [*watchnebula*](## "netrc machine")
- **NekoHacker** - **NekoHacker**
- **NerdCubedFeed** - **NerdCubedFeed**
- **Nest**
- **NestClip**
- **netease:album**: 网易云音乐 - 专辑 - **netease:album**: 网易云音乐 - 专辑
- **netease:djradio**: 网易云音乐 - 电台 - **netease:djradio**: 网易云音乐 - 电台
- **netease:mv**: 网易云音乐 - MV - **netease:mv**: 网易云音乐 - MV
@ -1071,6 +1069,8 @@
- **Pinkbike** - **Pinkbike**
- **Pinterest** - **Pinterest**
- **PinterestCollection** - **PinterestCollection**
- **PiramideTV**
- **PiramideTVChannel**
- **pixiv:sketch** - **pixiv:sketch**
- **pixiv:sketch:user** - **pixiv:sketch:user**
- **Pladform** - **Pladform**
@ -1396,6 +1396,8 @@
- **StretchInternet** - **StretchInternet**
- **Stripchat** - **Stripchat**
- **stv:player** - **stv:player**
- **Subsplash**
- **subsplash:playlist**
- **Substack** - **Substack**
- **SunPorno** - **SunPorno**
- **sverigesradio:episode** - **sverigesradio:episode**

View File

@ -486,11 +486,11 @@ class TestFormatSelection(unittest.TestCase):
def test_format_filtering(self): def test_format_filtering(self):
formats = [ formats = [
{'format_id': 'A', 'filesize': 500, 'width': 1000}, {'format_id': 'A', 'filesize': 500, 'width': 1000, 'aspect_ratio': 1.0},
{'format_id': 'B', 'filesize': 1000, 'width': 500}, {'format_id': 'B', 'filesize': 1000, 'width': 500, 'aspect_ratio': 1.33},
{'format_id': 'C', 'filesize': 1000, 'width': 400}, {'format_id': 'C', 'filesize': 1000, 'width': 400, 'aspect_ratio': 1.5},
{'format_id': 'D', 'filesize': 2000, 'width': 600}, {'format_id': 'D', 'filesize': 2000, 'width': 600, 'aspect_ratio': 1.78},
{'format_id': 'E', 'filesize': 3000}, {'format_id': 'E', 'filesize': 3000, 'aspect_ratio': 0.56},
{'format_id': 'F'}, {'format_id': 'F'},
{'format_id': 'G', 'filesize': 1000000}, {'format_id': 'G', 'filesize': 1000000},
] ]
@ -549,6 +549,31 @@ class TestFormatSelection(unittest.TestCase):
ydl.process_ie_result(info_dict) ydl.process_ie_result(info_dict)
self.assertEqual(ydl.downloaded_info_dicts, []) self.assertEqual(ydl.downloaded_info_dicts, [])
ydl = YDL({'format': 'best[aspect_ratio=1]'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'A')
ydl = YDL({'format': 'all[aspect_ratio > 1.00]'})
ydl.process_ie_result(info_dict)
downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
self.assertEqual(downloaded_ids, ['D', 'C', 'B'])
ydl = YDL({'format': 'all[aspect_ratio < 1.00]'})
ydl.process_ie_result(info_dict)
downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
self.assertEqual(downloaded_ids, ['E'])
ydl = YDL({'format': 'best[aspect_ratio=1.5]'})
ydl.process_ie_result(info_dict)
downloaded = ydl.downloaded_info_dicts[0]
self.assertEqual(downloaded['format_id'], 'C')
ydl = YDL({'format': 'all[aspect_ratio!=1]'})
ydl.process_ie_result(info_dict)
downloaded_ids = [info['format_id'] for info in ydl.downloaded_info_dicts]
self.assertEqual(downloaded_ids, ['E', 'D', 'C', 'B'])
@patch('yt_dlp.postprocessor.ffmpeg.FFmpegMergerPP.available', False) @patch('yt_dlp.postprocessor.ffmpeg.FFmpegMergerPP.available', False)
def test_default_format_spec_without_ffmpeg(self): def test_default_format_spec_without_ffmpeg(self):
ydl = YDL({}) ydl = YDL({})

View File

@ -249,17 +249,36 @@ class TestUtil(unittest.TestCase):
self.assertEqual(sanitize_path('abc/def...'), 'abc\\def..#') self.assertEqual(sanitize_path('abc/def...'), 'abc\\def..#')
self.assertEqual(sanitize_path('abc.../def'), 'abc..#\\def') self.assertEqual(sanitize_path('abc.../def'), 'abc..#\\def')
self.assertEqual(sanitize_path('abc.../def...'), 'abc..#\\def..#') self.assertEqual(sanitize_path('abc.../def...'), 'abc..#\\def..#')
self.assertEqual(sanitize_path('../abc'), '..\\abc')
self.assertEqual(sanitize_path('../../abc'), '..\\..\\abc')
self.assertEqual(sanitize_path('./abc'), 'abc')
self.assertEqual(sanitize_path('./../abc'), '..\\abc')
self.assertEqual(sanitize_path('\\abc'), '\\abc')
self.assertEqual(sanitize_path('C:abc'), 'C:abc')
self.assertEqual(sanitize_path('C:abc\\..\\'), 'C:..')
self.assertEqual(sanitize_path('C:\\abc:%(title)s.%(ext)s'), 'C:\\abc#%(title)s.%(ext)s') self.assertEqual(sanitize_path('C:\\abc:%(title)s.%(ext)s'), 'C:\\abc#%(title)s.%(ext)s')
# Check with nt._path_normpath if available
try:
import nt
nt_path_normpath = getattr(nt, '_path_normpath', None)
except Exception:
nt_path_normpath = None
for test, expected in [
('C:\\', 'C:\\'),
('../abc', '..\\abc'),
('../../abc', '..\\..\\abc'),
('./abc', 'abc'),
('./../abc', '..\\abc'),
('\\abc', '\\abc'),
('C:abc', 'C:abc'),
('C:abc\\..\\', 'C:'),
('C:abc\\..\\def\\..\\..\\', 'C:..'),
('C:\\abc\\xyz///..\\def\\', 'C:\\abc\\def'),
('abc/../', '.'),
('./abc/../', '.'),
]:
result = sanitize_path(test)
assert result == expected, f'{test} was incorrectly resolved'
assert result == sanitize_path(result), f'{test} changed after sanitizing again'
if nt_path_normpath:
assert result == nt_path_normpath(test), f'{test} does not match nt._path_normpath'
def test_sanitize_url(self): def test_sanitize_url(self):
self.assertEqual(sanitize_url('//foo.bar'), 'http://foo.bar') self.assertEqual(sanitize_url('//foo.bar'), 'http://foo.bar')
self.assertEqual(sanitize_url('httpss://foo.bar'), 'https://foo.bar') self.assertEqual(sanitize_url('httpss://foo.bar'), 'https://foo.bar')

View File

@ -2121,7 +2121,7 @@ class YoutubeDL:
m = operator_rex.fullmatch(filter_spec) m = operator_rex.fullmatch(filter_spec)
if m: if m:
try: try:
comparison_value = int(m.group('value')) comparison_value = float(m.group('value'))
except ValueError: except ValueError:
comparison_value = parse_filesize(m.group('value')) comparison_value = parse_filesize(m.group('value'))
if comparison_value is None: if comparison_value is None:

View File

@ -256,6 +256,7 @@ from .bilibili import (
BilibiliCheeseIE, BilibiliCheeseIE,
BilibiliCheeseSeasonIE, BilibiliCheeseSeasonIE,
BilibiliCollectionListIE, BilibiliCollectionListIE,
BiliBiliDynamicIE,
BilibiliFavoritesListIE, BilibiliFavoritesListIE,
BiliBiliIE, BiliBiliIE,
BiliBiliPlayerIE, BiliBiliPlayerIE,
@ -440,12 +441,6 @@ from .crowdbunker import (
CrowdBunkerIE, CrowdBunkerIE,
) )
from .crtvg import CrtvgIE from .crtvg import CrtvgIE
from .crunchyroll import (
CrunchyrollArtistIE,
CrunchyrollBetaIE,
CrunchyrollBetaShowIE,
CrunchyrollMusicIE,
)
from .cspan import ( from .cspan import (
CSpanCongressIE, CSpanCongressIE,
CSpanIE, CSpanIE,
@ -585,6 +580,10 @@ from .egghead import (
EggheadCourseIE, EggheadCourseIE,
EggheadLessonIE, EggheadLessonIE,
) )
from .eggs import (
EggsArtistIE,
EggsIE,
)
from .eighttracks import EightTracksIE from .eighttracks import EightTracksIE
from .eitb import EitbIE from .eitb import EitbIE
from .elementorembed import ElementorEmbedIE from .elementorembed import ElementorEmbedIE
@ -700,11 +699,6 @@ from .frontendmasters import (
FrontendMastersLessonIE, FrontendMastersLessonIE,
) )
from .fujitv import FujiTVFODPlus7IE from .fujitv import FujiTVFODPlus7IE
from .funimation import (
FunimationIE,
FunimationPageIE,
FunimationShowIE,
)
from .funk import FunkIE from .funk import FunkIE
from .funker530 import Funker530IE from .funker530 import Funker530IE
from .fuyintv import FuyinTVIE from .fuyintv import FuyinTVIE
@ -1279,6 +1273,10 @@ from .nebula import (
) )
from .nekohacker import NekoHackerIE from .nekohacker import NekoHackerIE
from .nerdcubed import NerdCubedFeedIE from .nerdcubed import NerdCubedFeedIE
from .nest import (
NestClipIE,
NestIE,
)
from .neteasemusic import ( from .neteasemusic import (
NetEaseMusicAlbumIE, NetEaseMusicAlbumIE,
NetEaseMusicDjRadioIE, NetEaseMusicDjRadioIE,
@ -1533,6 +1531,10 @@ from .pinterest import (
PinterestCollectionIE, PinterestCollectionIE,
PinterestIE, PinterestIE,
) )
from .piramidetv import (
PiramideTVChannelIE,
PiramideTVIE,
)
from .pixivsketch import ( from .pixivsketch import (
PixivSketchIE, PixivSketchIE,
PixivSketchUserIE, PixivSketchUserIE,
@ -1984,6 +1986,10 @@ from .streetvoice import StreetVoiceIE
from .stretchinternet import StretchInternetIE from .stretchinternet import StretchInternetIE
from .stripchat import StripchatIE from .stripchat import StripchatIE
from .stv import STVPlayerIE from .stv import STVPlayerIE
from .subsplash import (
SubsplashIE,
SubsplashPlaylistIE,
)
from .substack import SubstackIE from .substack import SubstackIE
from .sunporno import SunPornoIE from .sunporno import SunPornoIE
from .sverigesradio import ( from .sverigesradio import (

View File

@ -421,14 +421,15 @@ class AbemaTVIE(AbemaTVBaseIE):
class AbemaTVTitleIE(AbemaTVBaseIE): class AbemaTVTitleIE(AbemaTVBaseIE):
_VALID_URL = r'https?://abema\.tv/video/title/(?P<id>[^?/]+)' _VALID_URL = r'https?://abema\.tv/video/title/(?P<id>[^?/#]+)/?(?:\?(?:[^#]+&)?s=(?P<season>[^&#]+))?'
_PAGE_SIZE = 25 _PAGE_SIZE = 25
_TESTS = [{ _TESTS = [{
'url': 'https://abema.tv/video/title/90-1597', 'url': 'https://abema.tv/video/title/90-1887',
'info_dict': { 'info_dict': {
'id': '90-1597', 'id': '90-1887',
'title': 'シャッフルアイランド', 'title': 'シャッフルアイランド',
'description': 'md5:61b2425308f41a5282a926edda66f178',
}, },
'playlist_mincount': 2, 'playlist_mincount': 2,
}, { }, {
@ -436,41 +437,54 @@ class AbemaTVTitleIE(AbemaTVBaseIE):
'info_dict': { 'info_dict': {
'id': '193-132', 'id': '193-132',
'title': '真心が届く~僕とスターのオフィス・ラブ!?~', 'title': '真心が届く~僕とスターのオフィス・ラブ!?~',
'description': 'md5:9b59493d1f3a792bafbc7319258e7af8',
}, },
'playlist_mincount': 16, 'playlist_mincount': 16,
}, { }, {
'url': 'https://abema.tv/video/title/25-102', 'url': 'https://abema.tv/video/title/25-1nzan-whrxe',
'info_dict': { 'info_dict': {
'id': '25-102', 'id': '25-1nzan-whrxe',
'title': 'ソードアート・オンライン アリシゼーション', 'title': 'ソードアート・オンライン',
'description': 'md5:c094904052322e6978495532bdbf06e6',
}, },
'playlist_mincount': 24, 'playlist_mincount': 25,
}, {
'url': 'https://abema.tv/video/title/26-2mzbynr-cph?s=26-2mzbynr-cph_s40',
'info_dict': {
'title': '〈物語〉シリーズ',
'id': '26-2mzbynr-cph',
'description': 'md5:e67873de1c88f360af1f0a4b84847a52',
},
'playlist_count': 59,
}] }]
def _fetch_page(self, playlist_id, series_version, page): def _fetch_page(self, playlist_id, series_version, season_id, page):
programs = self._call_api(
f'v1/video/series/{playlist_id}/programs', playlist_id,
note=f'Downloading page {page + 1}',
query = { query = {
'seriesVersion': series_version, 'seriesVersion': series_version,
'offset': str(page * self._PAGE_SIZE), 'offset': str(page * self._PAGE_SIZE),
'order': 'seq', 'order': 'seq',
'limit': str(self._PAGE_SIZE), 'limit': str(self._PAGE_SIZE),
}) }
if season_id:
query['seasonId'] = season_id
programs = self._call_api(
f'v1/video/series/{playlist_id}/programs', playlist_id,
note=f'Downloading page {page + 1}',
query=query)
yield from ( yield from (
self.url_result(f'https://abema.tv/video/episode/{x}') self.url_result(f'https://abema.tv/video/episode/{x}')
for x in traverse_obj(programs, ('programs', ..., 'id'))) for x in traverse_obj(programs, ('programs', ..., 'id')))
def _entries(self, playlist_id, series_version): def _entries(self, playlist_id, series_version, season_id):
return OnDemandPagedList( return OnDemandPagedList(
functools.partial(self._fetch_page, playlist_id, series_version), functools.partial(self._fetch_page, playlist_id, series_version, season_id),
self._PAGE_SIZE) self._PAGE_SIZE)
def _real_extract(self, url): def _real_extract(self, url):
playlist_id = self._match_id(url) playlist_id, season_id = self._match_valid_url(url).group('id', 'season')
series_info = self._call_api(f'v1/video/series/{playlist_id}', playlist_id) series_info = self._call_api(f'v1/video/series/{playlist_id}', playlist_id)
return self.playlist_result( return self.playlist_result(
self._entries(playlist_id, series_info['version']), playlist_id=playlist_id, self._entries(playlist_id, series_info['version'], season_id), playlist_id=playlist_id,
playlist_title=series_info.get('title'), playlist_title=series_info.get('title'),
playlist_description=series_info.get('content')) playlist_description=series_info.get('content'))

View File

@ -43,14 +43,14 @@ class ACastIE(ACastBaseIE):
_VALID_URL = r'''(?x: _VALID_URL = r'''(?x:
https?:// https?://
(?: (?:
(?:(?:embed|www)\.)?acast\.com/| (?:(?:embed|www|shows)\.)?acast\.com/|
play\.acast\.com/s/ play\.acast\.com/s/
) )
(?P<channel>[^/]+)/(?P<id>[^/#?"]+) (?P<channel>[^/?#]+)/(?:episodes/)?(?P<id>[^/#?"]+)
)''' )'''
_EMBED_REGEX = [rf'(?x)<iframe[^>]+\bsrc=[\'"](?P<url>{_VALID_URL})'] _EMBED_REGEX = [rf'(?x)<iframe[^>]+\bsrc=[\'"](?P<url>{_VALID_URL})']
_TESTS = [{ _TESTS = [{
'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna', 'url': 'https://shows.acast.com/sparpodcast/episodes/2.raggarmordet-rosterurdetforflutna',
'info_dict': { 'info_dict': {
'id': '2a92b283-1a75-4ad8-8396-499c641de0d9', 'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
'ext': 'mp3', 'ext': 'mp3',
@ -59,7 +59,7 @@ class ACastIE(ACastBaseIE):
'timestamp': 1477346700, 'timestamp': 1477346700,
'upload_date': '20161024', 'upload_date': '20161024',
'duration': 2766, 'duration': 2766,
'creator': 'Third Ear Studio', 'creators': ['Third Ear Studio'],
'series': 'Spår', 'series': 'Spår',
'episode': '2. Raggarmordet - Röster ur det förflutna', 'episode': '2. Raggarmordet - Röster ur det förflutna',
'thumbnail': 'https://assets.pippa.io/shows/616ebe1886d7b1398620b943/616ebe33c7e6e70013cae7da.jpg', 'thumbnail': 'https://assets.pippa.io/shows/616ebe1886d7b1398620b943/616ebe33c7e6e70013cae7da.jpg',
@ -74,6 +74,9 @@ class ACastIE(ACastBaseIE):
}, { }, {
'url': 'https://play.acast.com/s/rattegangspodden/s04e09styckmordetihelenelund-del2-2', 'url': 'https://play.acast.com/s/rattegangspodden/s04e09styckmordetihelenelund-del2-2',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
'only_matching': True,
}, { }, {
'url': 'https://play.acast.com/s/sparpodcast/2a92b283-1a75-4ad8-8396-499c641de0d9', 'url': 'https://play.acast.com/s/sparpodcast/2a92b283-1a75-4ad8-8396-499c641de0d9',
'only_matching': True, 'only_matching': True,
@ -110,7 +113,7 @@ class ACastChannelIE(ACastBaseIE):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?: (?:
(?:www\.)?acast\.com/| (?:(?:www|shows)\.)?acast\.com/|
play\.acast\.com/s/ play\.acast\.com/s/
) )
(?P<id>[^/#?]+) (?P<id>[^/#?]+)
@ -120,12 +123,15 @@ class ACastChannelIE(ACastBaseIE):
'info_dict': { 'info_dict': {
'id': '4efc5294-5385-4847-98bd-519799ce5786', 'id': '4efc5294-5385-4847-98bd-519799ce5786',
'title': 'Today in Focus', 'title': 'Today in Focus',
'description': 'md5:c09ce28c91002ce4ffce71d6504abaae', 'description': 'md5:feca253de9947634605080cd9eeea2bf',
}, },
'playlist_mincount': 200, 'playlist_mincount': 200,
}, { }, {
'url': 'http://play.acast.com/s/ft-banking-weekly', 'url': 'http://play.acast.com/s/ft-banking-weekly',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://shows.acast.com/sparpodcast',
'only_matching': True,
}] }]
@classmethod @classmethod

View File

@ -4,7 +4,9 @@ import hashlib
import itertools import itertools
import json import json
import math import math
import random
import re import re
import string
import time import time
import urllib.parse import urllib.parse
import uuid import uuid
@ -1177,28 +1179,26 @@ class BilibiliSpaceBaseIE(BilibiliBaseIE):
class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE): class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE):
_VALID_URL = r'https?://space\.bilibili\.com/(?P<id>\d+)(?P<video>/video)?/?(?:[?#]|$)' _VALID_URL = r'https?://space\.bilibili\.com/(?P<id>\d+)(?P<video>(?:/upload)?/video)?/?(?:[?#]|$)'
_TESTS = [{ _TESTS = [{
'url': 'https://space.bilibili.com/3985676/video', 'url': 'https://space.bilibili.com/3985676/video',
'info_dict': { 'info_dict': {
'id': '3985676', 'id': '3985676',
}, },
'playlist_mincount': 178, 'playlist_mincount': 178,
'skip': 'login required',
}, { }, {
'url': 'https://space.bilibili.com/313580179/video', 'url': 'https://space.bilibili.com/313580179/video',
'info_dict': { 'info_dict': {
'id': '313580179', 'id': '313580179',
}, },
'playlist_mincount': 92, 'playlist_mincount': 92,
'skip': 'login required',
}] }]
def _real_extract(self, url): def _real_extract(self, url):
playlist_id, is_video_url = self._match_valid_url(url).group('id', 'video') playlist_id, is_video_url = self._match_valid_url(url).group('id', 'video')
if not is_video_url: if not is_video_url:
self.to_screen('A channel URL was given. Only the channel\'s videos will be downloaded. ' self.to_screen('A channel URL was given. Only the channel\'s videos will be downloaded. '
'To download audios, add a "/audio" to the URL') 'To download audios, add a "/upload/audio" to the URL')
def fetch_page(page_idx): def fetch_page(page_idx):
query = { query = {
@ -1211,6 +1211,12 @@ class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE):
'ps': 30, 'ps': 30,
'tid': 0, 'tid': 0,
'web_location': 1550101, 'web_location': 1550101,
'dm_img_list': '[]',
'dm_img_str': base64.b64encode(
''.join(random.choices(string.printable, k=random.randint(16, 64))).encode())[:-2].decode(),
'dm_cover_img_str': base64.b64encode(
''.join(random.choices(string.printable, k=random.randint(32, 128))).encode())[:-2].decode(),
'dm_img_inter': '{"ds":[],"wh":[6093,6631,31],"of":[430,760,380]}',
} }
try: try:
@ -1221,14 +1227,14 @@ class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE):
except ExtractorError as e: except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 412: if isinstance(e.cause, HTTPError) and e.cause.status == 412:
raise ExtractorError( raise ExtractorError(
'Request is blocked by server (412), please add cookies, wait and try later.', expected=True) 'Request is blocked by server (412), please wait and try later.', expected=True)
raise raise
status_code = response['code'] status_code = response['code']
if status_code == -401: if status_code == -401:
raise ExtractorError( raise ExtractorError(
'Request is blocked by server (401), please add cookies, wait and try later.', expected=True) 'Request is blocked by server (401), please wait and try later.', expected=True)
elif status_code == -352 and not self.is_logged_in: elif status_code == -352:
self.raise_login_required('Request is rejected, you need to login to access playlist') raise ExtractorError('Request is rejected by server (352)', expected=True)
elif status_code != 0: elif status_code != 0:
raise ExtractorError(f'Request failed ({status_code}): {response.get("message") or "Unknown error"}') raise ExtractorError(f'Request failed ({status_code}): {response.get("message") or "Unknown error"}')
return response['data'] return response['data']
@ -1250,9 +1256,9 @@ class BilibiliSpaceVideoIE(BilibiliSpaceBaseIE):
class BilibiliSpaceAudioIE(BilibiliSpaceBaseIE): class BilibiliSpaceAudioIE(BilibiliSpaceBaseIE):
_VALID_URL = r'https?://space\.bilibili\.com/(?P<id>\d+)/audio' _VALID_URL = r'https?://space\.bilibili\.com/(?P<id>\d+)/(?:upload/)?audio'
_TESTS = [{ _TESTS = [{
'url': 'https://space.bilibili.com/313580179/audio', 'url': 'https://space.bilibili.com/313580179/upload/audio',
'info_dict': { 'info_dict': {
'id': '313580179', 'id': '313580179',
}, },
@ -1275,7 +1281,8 @@ class BilibiliSpaceAudioIE(BilibiliSpaceBaseIE):
} }
def get_entries(page_data): def get_entries(page_data):
for entry in page_data.get('data', []): # data is None when the playlist is empty
for entry in page_data.get('data') or []:
yield self.url_result(f'https://www.bilibili.com/audio/au{entry["id"]}', BilibiliAudioIE, entry['id']) yield self.url_result(f'https://www.bilibili.com/audio/au{entry["id"]}', BilibiliAudioIE, entry['id'])
metadata, paged_list = self._extract_playlist(fetch_page, get_metadata, get_entries) metadata, paged_list = self._extract_playlist(fetch_page, get_metadata, get_entries)
@ -1299,30 +1306,43 @@ class BilibiliSpaceListBaseIE(BilibiliSpaceBaseIE):
class BilibiliCollectionListIE(BilibiliSpaceListBaseIE): class BilibiliCollectionListIE(BilibiliSpaceListBaseIE):
_VALID_URL = r'https?://space\.bilibili\.com/(?P<mid>\d+)/channel/collectiondetail/?\?sid=(?P<sid>\d+)' _VALID_URL = [
r'https?://space\.bilibili\.com/(?P<mid>\d+)/channel/collectiondetail/?\?sid=(?P<sid>\d+)',
r'https?://space\.bilibili\.com/(?P<mid>\d+)/lists/(?P<sid>\d+)',
]
_TESTS = [{ _TESTS = [{
'url': 'https://space.bilibili.com/2142762/channel/collectiondetail?sid=57445', 'url': 'https://space.bilibili.com/2142762/lists/3662502?type=season',
'info_dict': { 'info_dict': {
'id': '2142762_57445', 'id': '2142762_3662502',
'title': '【完结】《底特律 变人》全结局流程解说', 'title': '合集·《黑神话悟空》流程解说',
'description': '', 'description': '黑神话悟空 相关节目',
'uploader': '老戴在此', 'uploader': '老戴在此',
'uploader_id': '2142762', 'uploader_id': '2142762',
'timestamp': int, 'timestamp': int,
'upload_date': str, 'upload_date': str,
'thumbnail': 'https://archive.biliimg.com/bfs/archive/e0e543ae35ad3df863ea7dea526bc32e70f4c091.jpg', 'thumbnail': 'https://archive.biliimg.com/bfs/archive/22302e17dc849dd4533606d71bc89df162c3a9bf.jpg',
}, },
'playlist_mincount': 31, 'playlist_mincount': 62,
}, {
'url': 'https://space.bilibili.com/2142762/lists/3662502',
'only_matching': True,
}, {
'url': 'https://space.bilibili.com/2142762/channel/collectiondetail?sid=57445',
'only_matching': True,
}] }]
@classmethod
def suitable(cls, url):
return False if BilibiliSeriesListIE.suitable(url) else super().suitable(url)
def _real_extract(self, url): def _real_extract(self, url):
mid, sid = self._match_valid_url(url).group('mid', 'sid') mid, sid = self._match_valid_url(url).group('mid', 'sid')
playlist_id = f'{mid}_{sid}' playlist_id = f'{mid}_{sid}'
def fetch_page(page_idx): def fetch_page(page_idx):
return self._download_json( return self._download_json(
'https://api.bilibili.com/x/polymer/space/seasons_archives_list', 'https://api.bilibili.com/x/polymer/web-space/seasons_archives_list',
playlist_id, note=f'Downloading page {page_idx}', playlist_id, note=f'Downloading page {page_idx}', headers={'Referer': url},
query={'mid': mid, 'season_id': sid, 'page_num': page_idx + 1, 'page_size': 30})['data'] query={'mid': mid, 'season_id': sid, 'page_num': page_idx + 1, 'page_size': 30})['data']
def get_metadata(page_data): def get_metadata(page_data):
@ -1349,9 +1369,12 @@ class BilibiliCollectionListIE(BilibiliSpaceListBaseIE):
class BilibiliSeriesListIE(BilibiliSpaceListBaseIE): class BilibiliSeriesListIE(BilibiliSpaceListBaseIE):
_VALID_URL = r'https?://space\.bilibili\.com/(?P<mid>\d+)/channel/seriesdetail/?\?\bsid=(?P<sid>\d+)' _VALID_URL = [
r'https?://space\.bilibili\.com/(?P<mid>\d+)/channel/seriesdetail/?\?\bsid=(?P<sid>\d+)',
r'https?://space\.bilibili\.com/(?P<mid>\d+)/lists/(?P<sid>\d+)/?\?(?:[^#]+&)?type=series(?:[&#]|$)',
]
_TESTS = [{ _TESTS = [{
'url': 'https://space.bilibili.com/1958703906/channel/seriesdetail?sid=547718&ctype=0', 'url': 'https://space.bilibili.com/1958703906/lists/547718?type=series',
'info_dict': { 'info_dict': {
'id': '1958703906_547718', 'id': '1958703906_547718',
'title': '直播回放', 'title': '直播回放',
@ -1364,6 +1387,9 @@ class BilibiliSeriesListIE(BilibiliSpaceListBaseIE):
'modified_date': str, 'modified_date': str,
}, },
'playlist_mincount': 513, 'playlist_mincount': 513,
}, {
'url': 'https://space.bilibili.com/1958703906/channel/seriesdetail?sid=547718&ctype=0',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -1382,7 +1408,7 @@ class BilibiliSeriesListIE(BilibiliSpaceListBaseIE):
def fetch_page(page_idx): def fetch_page(page_idx):
return self._download_json( return self._download_json(
'https://api.bilibili.com/x/series/archives', 'https://api.bilibili.com/x/series/archives',
playlist_id, note=f'Downloading page {page_idx}', playlist_id, note=f'Downloading page {page_idx}', headers={'Referer': url},
query={'mid': mid, 'series_id': sid, 'pn': page_idx + 1, 'ps': 30})['data'] query={'mid': mid, 'series_id': sid, 'pn': page_idx + 1, 'ps': 30})['data']
def get_metadata(page_data): def get_metadata(page_data):
@ -1861,6 +1887,47 @@ class BiliBiliPlayerIE(InfoExtractor):
ie=BiliBiliIE.ie_key(), video_id=video_id) ie=BiliBiliIE.ie_key(), video_id=video_id)
class BiliBiliDynamicIE(InfoExtractor):
_VALID_URL = r'https?://(?:t\.bilibili\.com|(?:www\.)?bilibili\.com/opus)/(?P<id>\d+)'
_TESTS = [{
'url': 'https://t.bilibili.com/998134289197432852',
'info_dict': {
'id': 'BV1TAmBYVEJr',
'ext': 'mp4',
'uploader_id': '1192648858',
'comment_count': int,
'_old_archive_ids': ['bilibili 113457567568273_part1'],
'thumbnail': 'http://i2.hdslb.com/bfs/archive/50091efd965d9f13ff6814f7ad374f90ab21e77d.jpg',
'duration': 929.238,
'upload_date': '20241110',
'uploader': '何同学工作室',
'like_count': int,
'view_count': int,
'title': '美国小朋友就玩这个何同学工作室11月开箱',
'description': '本期产品信息:\n机器狗\n气味模拟器\nCloudboom Strike LS\n无弦吉他\n蓝牙磁带音箱\n神奇画板',
'timestamp': 1731232800,
'tags': list,
'chapters': list,
},
}]
def _real_extract(self, url):
post_id = self._match_id(url)
# Without the newer chrome UA, the API will return an error (-352)
post_data = self._download_json(
'https://api.bilibili.com/x/polymer/web-dynamic/v1/detail', post_id,
query={'id': post_id}, headers={
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
})
video_url = traverse_obj(post_data, (
'data', 'item', (None, 'orig'), 'modules', 'module_dynamic',
(('major', ('archive', 'pgc')), ('additional', ('reserve', 'common'))),
'jump_url', {url_or_none}, any, {self._proto_relative_url}))
if not video_url or (self.suitable(video_url) and post_id == self._match_id(video_url)):
raise ExtractorError('No valid video URL found', expected=True)
return self.url_result(video_url)
class BiliIntlBaseIE(InfoExtractor): class BiliIntlBaseIE(InfoExtractor):
_API_URL = 'https://api.bilibili.tv/intl/gateway' _API_URL = 'https://api.bilibili.tv/intl/gateway'
_NETRC_MACHINE = 'biliintl' _NETRC_MACHINE = 'biliintl'

View File

@ -88,7 +88,7 @@ class BlueskyIE(InfoExtractor):
}, },
}, { }, {
'url': 'https://bsky.app/profile/de1.pds.tentacle.expert/post/3l3w4tnezek2e', 'url': 'https://bsky.app/profile/de1.pds.tentacle.expert/post/3l3w4tnezek2e',
'md5': '1af9c7fda061cf7593bbffca89e43d1c', 'md5': 'cc0110ed1f6b0247caac8234cc1e861d',
'info_dict': { 'info_dict': {
'id': '3l3w4tnezek2e', 'id': '3l3w4tnezek2e',
'ext': 'mp4', 'ext': 'mp4',
@ -133,6 +133,8 @@ class BlueskyIE(InfoExtractor):
'channel_follower_count': int, 'channel_follower_count': int,
'categories': ['Entertainment'], 'categories': ['Entertainment'],
'tags': [], 'tags': [],
'chapters': list,
'heatmap': 'count:100',
}, },
'add_ie': ['Youtube'], 'add_ie': ['Youtube'],
}, { }, {
@ -184,14 +186,14 @@ class BlueskyIE(InfoExtractor):
}, },
}, },
}, { }, {
'url': 'https://bsky.app/profile/alt.bun.how/post/3l7rdfxhyds2f', 'url': 'https://bsky.app/profile/cinny.bun.how/post/3l7rdfxhyds2f',
'md5': '8775118b235cf9fa6b5ad30f95cda75c', 'md5': '8775118b235cf9fa6b5ad30f95cda75c',
'info_dict': { 'info_dict': {
'id': '3l7rdfxhyds2f', 'id': '3l7rdfxhyds2f',
'ext': 'mp4', 'ext': 'mp4',
'uploader': 'cinnamon', 'uploader': 'cinnamon',
'uploader_id': 'alt.bun.how', 'uploader_id': 'cinny.bun.how',
'uploader_url': 'https://bsky.app/profile/alt.bun.how', 'uploader_url': 'https://bsky.app/profile/cinny.bun.how',
'channel_id': 'did:plc:7x6rtuenkuvxq3zsvffp2ide', 'channel_id': 'did:plc:7x6rtuenkuvxq3zsvffp2ide',
'channel_url': 'https://bsky.app/profile/did:plc:7x6rtuenkuvxq3zsvffp2ide', 'channel_url': 'https://bsky.app/profile/did:plc:7x6rtuenkuvxq3zsvffp2ide',
'thumbnail': r're:https://video.bsky.app/watch/.*\.jpg$', 'thumbnail': r're:https://video.bsky.app/watch/.*\.jpg$',
@ -284,17 +286,19 @@ class BlueskyIE(InfoExtractor):
services, ('service', lambda _, x: x['type'] == 'AtprotoPersonalDataServer', services, ('service', lambda _, x: x['type'] == 'AtprotoPersonalDataServer',
'serviceEndpoint', {url_or_none}, any)) or 'https://bsky.social' 'serviceEndpoint', {url_or_none}, any)) or 'https://bsky.social'
def _real_extract(self, url): def _extract_post(self, handle, post_id):
handle, video_id = self._match_valid_url(url).group('handle', 'id') return self._download_json(
post = self._download_json(
'https://public.api.bsky.app/xrpc/app.bsky.feed.getPostThread', 'https://public.api.bsky.app/xrpc/app.bsky.feed.getPostThread',
video_id, query={ post_id, query={
'uri': f'at://{handle}/app.bsky.feed.post/{video_id}', 'uri': f'at://{handle}/app.bsky.feed.post/{post_id}',
'depth': 0, 'depth': 0,
'parentHeight': 0, 'parentHeight': 0,
})['thread']['post'] })['thread']['post']
def _real_extract(self, url):
handle, video_id = self._match_valid_url(url).group('handle', 'id')
post = self._extract_post(handle, video_id)
entries = [] entries = []
# app.bsky.embed.video.view/app.bsky.embed.external.view # app.bsky.embed.video.view/app.bsky.embed.external.view
entries.extend(self._extract_videos(post, video_id)) entries.extend(self._extract_videos(post, video_id))
@ -341,6 +345,7 @@ class BlueskyIE(InfoExtractor):
formats.append({ formats.append({
'format_id': 'blob', 'format_id': 'blob',
'quality': 1,
'url': update_url_query( 'url': update_url_query(
self._BLOB_URL_TMPL.format(endpoint), {'did': did, 'cid': video_cid}), self._BLOB_URL_TMPL.format(endpoint), {'did': did, 'cid': video_cid}),
**traverse_obj(root, (*embed_path, 'aspectRatio', { **traverse_obj(root, (*embed_path, 'aspectRatio', {

View File

@ -1,692 +0,0 @@
import base64
import uuid
from .common import InfoExtractor
from ..networking import Request
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
float_or_none,
format_field,
int_or_none,
jwt_decode_hs256,
parse_age_limit,
parse_count,
parse_iso8601,
qualities,
time_seconds,
traverse_obj,
url_or_none,
urlencode_postdata,
)
class CrunchyrollBaseIE(InfoExtractor):
_BASE_URL = 'https://www.crunchyroll.com'
_API_BASE = 'https://api.crunchyroll.com'
_NETRC_MACHINE = 'crunchyroll'
_SWITCH_USER_AGENT = 'Crunchyroll/1.8.0 Nintendo Switch/12.3.12.0 UE4/4.27'
_REFRESH_TOKEN = None
_AUTH_HEADERS = None
_AUTH_EXPIRY = None
_API_ENDPOINT = None
_BASIC_AUTH = 'Basic ' + base64.b64encode(':'.join((
't-kdgp2h8c3jub8fn0fq',
'yfLDfMfrYvKXh4JXS1LEI2cCqu1v5Wan',
)).encode()).decode()
_IS_PREMIUM = None
_LOCALE_LOOKUP = {
'ar': 'ar-SA',
'de': 'de-DE',
'': 'en-US',
'es': 'es-419',
'es-es': 'es-ES',
'fr': 'fr-FR',
'it': 'it-IT',
'pt-br': 'pt-BR',
'pt-pt': 'pt-PT',
'ru': 'ru-RU',
'hi': 'hi-IN',
}
def _set_auth_info(self, response):
CrunchyrollBaseIE._IS_PREMIUM = 'cr_premium' in traverse_obj(response, ('access_token', {jwt_decode_hs256}, 'benefits', ...))
CrunchyrollBaseIE._AUTH_HEADERS = {'Authorization': response['token_type'] + ' ' + response['access_token']}
CrunchyrollBaseIE._AUTH_EXPIRY = time_seconds(seconds=traverse_obj(response, ('expires_in', {float_or_none}), default=300) - 10)
def _request_token(self, headers, data, note='Requesting token', errnote='Failed to request token'):
try:
return self._download_json(
f'{self._BASE_URL}/auth/v1/token', None, note=note, errnote=errnote,
headers=headers, data=urlencode_postdata(data), impersonate=True)
except ExtractorError as error:
if not isinstance(error.cause, HTTPError) or error.cause.status != 403:
raise
if target := error.cause.response.extensions.get('impersonate'):
raise ExtractorError(f'Got HTTP Error 403 when using impersonate target "{target}"')
raise ExtractorError(
'Request blocked by Cloudflare. '
'Install the required impersonation dependency if possible, '
'or else navigate to Crunchyroll in your browser, '
'then pass the fresh cookies (with --cookies-from-browser or --cookies) '
'and your browser\'s User-Agent (with --user-agent)', expected=True)
def _perform_login(self, username, password):
if not CrunchyrollBaseIE._REFRESH_TOKEN:
CrunchyrollBaseIE._REFRESH_TOKEN = self.cache.load(self._NETRC_MACHINE, username)
if CrunchyrollBaseIE._REFRESH_TOKEN:
return
try:
login_response = self._request_token(
headers={'Authorization': self._BASIC_AUTH}, data={
'username': username,
'password': password,
'grant_type': 'password',
'scope': 'offline_access',
}, note='Logging in', errnote='Failed to log in')
except ExtractorError as error:
if isinstance(error.cause, HTTPError) and error.cause.status == 401:
raise ExtractorError('Invalid username and/or password', expected=True)
raise
CrunchyrollBaseIE._REFRESH_TOKEN = login_response['refresh_token']
self.cache.store(self._NETRC_MACHINE, username, CrunchyrollBaseIE._REFRESH_TOKEN)
self._set_auth_info(login_response)
def _update_auth(self):
if CrunchyrollBaseIE._AUTH_HEADERS and CrunchyrollBaseIE._AUTH_EXPIRY > time_seconds():
return
auth_headers = {'Authorization': self._BASIC_AUTH}
if CrunchyrollBaseIE._REFRESH_TOKEN:
data = {
'refresh_token': CrunchyrollBaseIE._REFRESH_TOKEN,
'grant_type': 'refresh_token',
'scope': 'offline_access',
}
else:
data = {'grant_type': 'client_id'}
auth_headers['ETP-Anonymous-ID'] = uuid.uuid4()
try:
auth_response = self._request_token(auth_headers, data)
except ExtractorError as error:
username, password = self._get_login_info()
if not username or not isinstance(error.cause, HTTPError) or error.cause.status != 400:
raise
self.to_screen('Refresh token has expired. Re-logging in')
CrunchyrollBaseIE._REFRESH_TOKEN = None
self.cache.store(self._NETRC_MACHINE, username, None)
self._perform_login(username, password)
return
self._set_auth_info(auth_response)
def _locale_from_language(self, language):
config_locale = self._configuration_arg('metadata', ie_key=CrunchyrollBetaIE, casesense=True)
return config_locale[0] if config_locale else self._LOCALE_LOOKUP.get(language)
def _call_base_api(self, endpoint, internal_id, lang, note=None, query={}):
self._update_auth()
if not endpoint.startswith('/'):
endpoint = f'/{endpoint}'
query = query.copy()
locale = self._locale_from_language(lang)
if locale:
query['locale'] = locale
return self._download_json(
f'{self._BASE_URL}{endpoint}', internal_id, note or f'Calling API: {endpoint}',
headers=CrunchyrollBaseIE._AUTH_HEADERS, query=query)
def _call_api(self, path, internal_id, lang, note='api', query={}):
if not path.startswith(f'/content/v2/{self._API_ENDPOINT}/'):
path = f'/content/v2/{self._API_ENDPOINT}/{path}'
try:
result = self._call_base_api(
path, internal_id, lang, f'Downloading {note} JSON ({self._API_ENDPOINT})', query=query)
except ExtractorError as error:
if isinstance(error.cause, HTTPError) and error.cause.status == 404:
return None
raise
if not result:
raise ExtractorError(f'Unexpected response when downloading {note} JSON')
return result
def _extract_chapters(self, internal_id):
# if no skip events are available, a 403 xml error is returned
skip_events = self._download_json(
f'https://static.crunchyroll.com/skip-events/production/{internal_id}.json',
internal_id, note='Downloading chapter info', fatal=False, errnote=False)
if not skip_events:
return None
chapters = []
for event in ('recap', 'intro', 'credits', 'preview'):
start = traverse_obj(skip_events, (event, 'start', {float_or_none}))
end = traverse_obj(skip_events, (event, 'end', {float_or_none}))
# some chapters have no start and/or ending time, they will just be ignored
if start is None or end is None:
continue
chapters.append({'title': event.capitalize(), 'start_time': start, 'end_time': end})
return chapters
def _extract_stream(self, identifier, display_id=None):
if not display_id:
display_id = identifier
self._update_auth()
headers = {**CrunchyrollBaseIE._AUTH_HEADERS, 'User-Agent': self._SWITCH_USER_AGENT}
try:
stream_response = self._download_json(
f'https://cr-play-service.prd.crunchyrollsvc.com/v1/{identifier}/console/switch/play',
display_id, note='Downloading stream info', errnote='Failed to download stream info', headers=headers)
except ExtractorError as error:
if self.get_param('ignore_no_formats_error'):
self.report_warning(error.orig_msg)
return [], {}
elif isinstance(error.cause, HTTPError) and error.cause.status == 420:
raise ExtractorError(
'You have reached the rate-limit for active streams; try again later', expected=True)
raise
available_formats = {'': ('', '', stream_response['url'])}
for hardsub_lang, stream in traverse_obj(stream_response, ('hardSubs', {dict.items}, lambda _, v: v[1]['url'])):
available_formats[hardsub_lang] = (f'hardsub-{hardsub_lang}', hardsub_lang, stream['url'])
requested_hardsubs = [('' if val == 'none' else val) for val in (self._configuration_arg('hardsub') or ['none'])]
hardsub_langs = [lang for lang in available_formats if lang]
if hardsub_langs and 'all' not in requested_hardsubs:
full_format_langs = set(requested_hardsubs)
self.to_screen(f'Available hardsub languages: {", ".join(hardsub_langs)}')
self.to_screen(
'To extract formats of a hardsub language, use '
'"--extractor-args crunchyrollbeta:hardsub=<language_code or all>". '
'See https://github.com/yt-dlp/yt-dlp#crunchyrollbeta-crunchyroll for more info',
only_once=True)
else:
full_format_langs = set(map(str.lower, available_formats))
audio_locale = traverse_obj(stream_response, ('audioLocale', {str}))
hardsub_preference = qualities(requested_hardsubs[::-1])
formats, subtitles = [], {}
for format_id, hardsub_lang, stream_url in available_formats.values():
if hardsub_lang.lower() in full_format_langs:
adaptive_formats, dash_subs = self._extract_mpd_formats_and_subtitles(
stream_url, display_id, mpd_id=format_id, headers=CrunchyrollBaseIE._AUTH_HEADERS,
fatal=False, note=f'Downloading {f"{format_id} " if hardsub_lang else ""}MPD manifest')
self._merge_subtitles(dash_subs, target=subtitles)
else:
continue # XXX: Update this if meta mpd formats work; will be tricky with token invalidation
for f in adaptive_formats:
if f.get('acodec') != 'none':
f['language'] = audio_locale
f['quality'] = hardsub_preference(hardsub_lang.lower())
formats.extend(adaptive_formats)
for locale, subtitle in traverse_obj(stream_response, (('subtitles', 'captions'), {dict.items}, ...)):
subtitles.setdefault(locale, []).append(traverse_obj(subtitle, {'url': 'url', 'ext': 'format'}))
# Invalidate stream token to avoid rate-limit
error_msg = 'Unable to invalidate stream token; you may experience rate-limiting'
if stream_token := stream_response.get('token'):
self._request_webpage(Request(
f'https://cr-play-service.prd.crunchyrollsvc.com/v1/token/{identifier}/{stream_token}/inactive',
headers=headers, method='PATCH'), display_id, 'Invalidating stream token', error_msg, fatal=False)
else:
self.report_warning(error_msg)
return formats, subtitles
class CrunchyrollCmsBaseIE(CrunchyrollBaseIE):
_API_ENDPOINT = 'cms'
_CMS_EXPIRY = None
def _call_cms_api_signed(self, path, internal_id, lang, note='api'):
if not CrunchyrollCmsBaseIE._CMS_EXPIRY or CrunchyrollCmsBaseIE._CMS_EXPIRY <= time_seconds():
response = self._call_base_api('index/v2', None, lang, 'Retrieving signed policy')['cms_web']
CrunchyrollCmsBaseIE._CMS_QUERY = {
'Policy': response['policy'],
'Signature': response['signature'],
'Key-Pair-Id': response['key_pair_id'],
}
CrunchyrollCmsBaseIE._CMS_BUCKET = response['bucket']
CrunchyrollCmsBaseIE._CMS_EXPIRY = parse_iso8601(response['expires']) - 10
if not path.startswith('/cms/v2'):
path = f'/cms/v2{CrunchyrollCmsBaseIE._CMS_BUCKET}/{path}'
return self._call_base_api(
path, internal_id, lang, f'Downloading {note} JSON (signed cms)', query=CrunchyrollCmsBaseIE._CMS_QUERY)
class CrunchyrollBetaIE(CrunchyrollCmsBaseIE):
IE_NAME = 'crunchyroll'
_VALID_URL = r'''(?x)
https?://(?:beta\.|www\.)?crunchyroll\.com/
(?:(?P<lang>\w{2}(?:-\w{2})?)/)?
watch/(?!concert|musicvideo)(?P<id>\w+)'''
_TESTS = [{
# Premium only
'url': 'https://www.crunchyroll.com/watch/GY2P1Q98Y/to-the-future',
'info_dict': {
'id': 'GY2P1Q98Y',
'ext': 'mp4',
'duration': 1380.241,
'timestamp': 1459632600,
'description': 'md5:a022fbec4fbb023d43631032c91ed64b',
'title': 'World Trigger Episode 73 To the Future',
'upload_date': '20160402',
'series': 'World Trigger',
'series_id': 'GR757DMKY',
'season': 'World Trigger',
'season_id': 'GR9P39NJ6',
'season_number': 1,
'episode': 'To the Future',
'episode_number': 73,
'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'chapters': 'count:2',
'age_limit': 14,
'like_count': int,
'dislike_count': int,
},
'params': {
'skip_download': 'm3u8',
'extractor_args': {'crunchyrollbeta': {'hardsub': ['de-DE']}},
'format': 'bv[format_id~=hardsub]',
},
}, {
# Premium only
'url': 'https://www.crunchyroll.com/watch/GYE5WKQGR',
'info_dict': {
'id': 'GYE5WKQGR',
'ext': 'mp4',
'duration': 366.459,
'timestamp': 1476788400,
'description': 'md5:74b67283ffddd75f6e224ca7dc031e76',
'title': 'SHELTER Porter Robinson presents Shelter the Animation',
'upload_date': '20161018',
'series': 'SHELTER',
'series_id': 'GYGG09WWY',
'season': 'SHELTER',
'season_id': 'GR09MGK4R',
'season_number': 1,
'episode': 'Porter Robinson presents Shelter the Animation',
'episode_number': 0,
'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'age_limit': 14,
'like_count': int,
'dislike_count': int,
},
'params': {'skip_download': True},
}, {
'url': 'https://www.crunchyroll.com/watch/GJWU2VKK3/cherry-blossom-meeting-and-a-coming-blizzard',
'info_dict': {
'id': 'GJWU2VKK3',
'ext': 'mp4',
'duration': 1420.054,
'description': 'md5:2d1c67c0ec6ae514d9c30b0b99a625cd',
'title': 'The Ice Guy and His Cool Female Colleague Episode 1 Cherry Blossom Meeting and a Coming Blizzard',
'series': 'The Ice Guy and His Cool Female Colleague',
'series_id': 'GW4HM75NP',
'season': 'The Ice Guy and His Cool Female Colleague',
'season_id': 'GY9PC21VE',
'season_number': 1,
'episode': 'Cherry Blossom Meeting and a Coming Blizzard',
'episode_number': 1,
'chapters': 'count:2',
'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'timestamp': 1672839000,
'upload_date': '20230104',
'age_limit': 14,
'like_count': int,
'dislike_count': int,
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.crunchyroll.com/watch/GM8F313NQ',
'info_dict': {
'id': 'GM8F313NQ',
'ext': 'mp4',
'title': 'Garakowa -Restore the World-',
'description': 'md5:8d2f8b6b9dd77d87810882e7d2ee5608',
'duration': 3996.104,
'age_limit': 13,
'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
},
'params': {'skip_download': 'm3u8'},
'skip': 'no longer exists',
}, {
'url': 'https://www.crunchyroll.com/watch/G62PEZ2E6',
'info_dict': {
'id': 'G62PEZ2E6',
'description': 'md5:8d2f8b6b9dd77d87810882e7d2ee5608',
'age_limit': 13,
'duration': 65.138,
'title': 'Garakowa -Restore the World-',
},
'playlist_mincount': 5,
}, {
'url': 'https://www.crunchyroll.com/de/watch/GY2P1Q98Y',
'only_matching': True,
}, {
'url': 'https://beta.crunchyroll.com/pt-br/watch/G8WUN8VKP/the-ruler-of-conspiracy',
'only_matching': True,
}]
# We want to support lazy playlist filtering and movie listings cannot be inside a playlist
_RETURN_TYPE = 'video'
def _real_extract(self, url):
lang, internal_id = self._match_valid_url(url).group('lang', 'id')
# We need to use unsigned API call to allow ratings query string
response = traverse_obj(self._call_api(
f'objects/{internal_id}', internal_id, lang, 'object info', {'ratings': 'true'}), ('data', 0, {dict}))
if not response:
raise ExtractorError(f'No video with id {internal_id} could be found (possibly region locked?)', expected=True)
object_type = response.get('type')
if object_type == 'episode':
result = self._transform_episode_response(response)
elif object_type == 'movie':
result = self._transform_movie_response(response)
elif object_type == 'movie_listing':
first_movie_id = traverse_obj(response, ('movie_listing_metadata', 'first_movie_id'))
if not self._yes_playlist(internal_id, first_movie_id):
return self.url_result(f'{self._BASE_URL}/{lang}watch/{first_movie_id}', CrunchyrollBetaIE, first_movie_id)
def entries():
movies = self._call_api(f'movie_listings/{internal_id}/movies', internal_id, lang, 'movie list')
for movie_response in traverse_obj(movies, ('data', ...)):
yield self.url_result(
f'{self._BASE_URL}/{lang}watch/{movie_response["id"]}',
CrunchyrollBetaIE, **self._transform_movie_response(movie_response))
return self.playlist_result(entries(), **self._transform_movie_response(response))
else:
raise ExtractorError(f'Unknown object type {object_type}')
if not self._IS_PREMIUM and traverse_obj(response, (f'{object_type}_metadata', 'is_premium_only')):
message = f'This {object_type} is for premium members only'
if CrunchyrollBaseIE._REFRESH_TOKEN:
self.raise_no_formats(message, expected=True, video_id=internal_id)
else:
self.raise_login_required(message, method='password', metadata_available=True)
else:
result['formats'], result['subtitles'] = self._extract_stream(internal_id)
result['chapters'] = self._extract_chapters(internal_id)
def calculate_count(item):
return parse_count(''.join((item['displayed'], item.get('unit') or '')))
result.update(traverse_obj(response, ('rating', {
'like_count': ('up', {calculate_count}),
'dislike_count': ('down', {calculate_count}),
})))
return result
@staticmethod
def _transform_episode_response(data):
metadata = traverse_obj(data, (('episode_metadata', None), {dict}), get_all=False) or {}
return {
'id': data['id'],
'title': ' \u2013 '.join((
('{}{}'.format(
format_field(metadata, 'season_title'),
format_field(metadata, 'episode', ' Episode %s'))),
format_field(data, 'title'))),
**traverse_obj(data, {
'episode': ('title', {str}),
'description': ('description', {str}, {lambda x: x.replace(r'\r\n', '\n')}),
'thumbnails': ('images', 'thumbnail', ..., ..., {
'url': ('source', {url_or_none}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
}),
**traverse_obj(metadata, {
'duration': ('duration_ms', {float_or_none(scale=1000)}),
'timestamp': ('upload_date', {parse_iso8601}),
'series': ('series_title', {str}),
'series_id': ('series_id', {str}),
'season': ('season_title', {str}),
'season_id': ('season_id', {str}),
'season_number': ('season_number', ({int}, {float_or_none})),
'episode_number': ('sequence_number', ({int}, {float_or_none})),
'age_limit': ('maturity_ratings', -1, {parse_age_limit}),
'language': ('audio_locale', {str}),
}, get_all=False),
}
@staticmethod
def _transform_movie_response(data):
metadata = traverse_obj(data, (('movie_metadata', 'movie_listing_metadata', None), {dict}), get_all=False) or {}
return {
'id': data['id'],
**traverse_obj(data, {
'title': ('title', {str}),
'description': ('description', {str}, {lambda x: x.replace(r'\r\n', '\n')}),
'thumbnails': ('images', 'thumbnail', ..., ..., {
'url': ('source', {url_or_none}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
}),
**traverse_obj(metadata, {
'duration': ('duration_ms', {float_or_none(scale=1000)}),
'age_limit': ('maturity_ratings', -1, {parse_age_limit}),
}),
}
class CrunchyrollBetaShowIE(CrunchyrollCmsBaseIE):
IE_NAME = 'crunchyroll:playlist'
_VALID_URL = r'''(?x)
https?://(?:beta\.|www\.)?crunchyroll\.com/
(?P<lang>(?:\w{2}(?:-\w{2})?/)?)
series/(?P<id>\w+)'''
_TESTS = [{
'url': 'https://www.crunchyroll.com/series/GY19NQ2QR/Girl-Friend-BETA',
'info_dict': {
'id': 'GY19NQ2QR',
'title': 'Girl Friend BETA',
'description': 'md5:99c1b22ee30a74b536a8277ced8eb750',
# XXX: `thumbnail` does not get set from `thumbnails` in playlist
# 'thumbnail': r're:^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'age_limit': 14,
},
'playlist_mincount': 10,
}, {
'url': 'https://beta.crunchyroll.com/it/series/GY19NQ2QR',
'only_matching': True,
}]
def _real_extract(self, url):
lang, internal_id = self._match_valid_url(url).group('lang', 'id')
def entries():
seasons_response = self._call_cms_api_signed(f'seasons?series_id={internal_id}', internal_id, lang, 'seasons')
for season in traverse_obj(seasons_response, ('items', ..., {dict})):
episodes_response = self._call_cms_api_signed(
f'episodes?season_id={season["id"]}', season['id'], lang, 'episode list')
for episode_response in traverse_obj(episodes_response, ('items', ..., {dict})):
yield self.url_result(
f'{self._BASE_URL}/{lang}watch/{episode_response["id"]}',
CrunchyrollBetaIE, **CrunchyrollBetaIE._transform_episode_response(episode_response))
return self.playlist_result(
entries(), internal_id,
**traverse_obj(self._call_api(f'series/{internal_id}', internal_id, lang, 'series'), ('data', 0, {
'title': ('title', {str}),
'description': ('description', {lambda x: x.replace(r'\r\n', '\n')}),
'age_limit': ('maturity_ratings', -1, {parse_age_limit}),
'thumbnails': ('images', ..., ..., ..., {
'url': ('source', {url_or_none}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
})))
class CrunchyrollMusicIE(CrunchyrollBaseIE):
IE_NAME = 'crunchyroll:music'
_VALID_URL = r'''(?x)
https?://(?:www\.)?crunchyroll\.com/
(?P<lang>(?:\w{2}(?:-\w{2})?/)?)
watch/(?P<type>concert|musicvideo)/(?P<id>\w+)'''
_TESTS = [{
'url': 'https://www.crunchyroll.com/de/watch/musicvideo/MV5B02C79',
'info_dict': {
'ext': 'mp4',
'id': 'MV5B02C79',
'display_id': 'egaono-hana',
'title': 'Egaono Hana',
'track': 'Egaono Hana',
'artists': ['Goose house'],
'thumbnail': r're:(?i)^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'genres': ['J-Pop'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.crunchyroll.com/watch/musicvideo/MV88BB7F2C',
'info_dict': {
'ext': 'mp4',
'id': 'MV88BB7F2C',
'display_id': 'crossing-field',
'title': 'Crossing Field',
'track': 'Crossing Field',
'artists': ['LiSA'],
'thumbnail': r're:(?i)^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'genres': ['Anime'],
},
'params': {'skip_download': 'm3u8'},
'skip': 'no longer exists',
}, {
'url': 'https://www.crunchyroll.com/watch/concert/MC2E2AC135',
'info_dict': {
'ext': 'mp4',
'id': 'MC2E2AC135',
'display_id': 'live-is-smile-always-364joker-at-yokohama-arena',
'title': 'LiVE is Smile Always-364+JOKER- at YOKOHAMA ARENA',
'track': 'LiVE is Smile Always-364+JOKER- at YOKOHAMA ARENA',
'artists': ['LiSA'],
'thumbnail': r're:(?i)^https://www.crunchyroll.com/imgsrv/.*\.jpeg?$',
'description': 'md5:747444e7e6300907b7a43f0a0503072e',
'genres': ['J-Pop'],
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.crunchyroll.com/de/watch/musicvideo/MV5B02C79/egaono-hana',
'only_matching': True,
}, {
'url': 'https://www.crunchyroll.com/watch/concert/MC2E2AC135/live-is-smile-always-364joker-at-yokohama-arena',
'only_matching': True,
}, {
'url': 'https://www.crunchyroll.com/watch/musicvideo/MV88BB7F2C/crossing-field',
'only_matching': True,
}]
_API_ENDPOINT = 'music'
def _real_extract(self, url):
lang, internal_id, object_type = self._match_valid_url(url).group('lang', 'id', 'type')
path, name = {
'concert': ('concerts', 'concert info'),
'musicvideo': ('music_videos', 'music video info'),
}[object_type]
response = traverse_obj(self._call_api(f'{path}/{internal_id}', internal_id, lang, name), ('data', 0, {dict}))
if not response:
raise ExtractorError(f'No video with id {internal_id} could be found (possibly region locked?)', expected=True)
result = self._transform_music_response(response)
if not self._IS_PREMIUM and response.get('isPremiumOnly'):
message = f'This {response.get("type") or "media"} is for premium members only'
if CrunchyrollBaseIE._REFRESH_TOKEN:
self.raise_no_formats(message, expected=True, video_id=internal_id)
else:
self.raise_login_required(message, method='password', metadata_available=True)
else:
result['formats'], _ = self._extract_stream(f'music/{internal_id}', internal_id)
return result
@staticmethod
def _transform_music_response(data):
return {
'id': data['id'],
**traverse_obj(data, {
'display_id': 'slug',
'title': 'title',
'track': 'title',
'artists': ('artist', 'name', all),
'description': ('description', {str}, {lambda x: x.replace(r'\r\n', '\n') or None}),
'thumbnails': ('images', ..., ..., {
'url': ('source', {url_or_none}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
'genres': ('genres', ..., 'displayValue'),
'age_limit': ('maturity_ratings', -1, {parse_age_limit}),
}),
}
class CrunchyrollArtistIE(CrunchyrollBaseIE):
IE_NAME = 'crunchyroll:artist'
_VALID_URL = r'''(?x)
https?://(?:www\.)?crunchyroll\.com/
(?P<lang>(?:\w{2}(?:-\w{2})?/)?)
artist/(?P<id>\w{10})'''
_TESTS = [{
'url': 'https://www.crunchyroll.com/artist/MA179CB50D',
'info_dict': {
'id': 'MA179CB50D',
'title': 'LiSA',
'genres': ['Anime', 'J-Pop', 'Rock'],
'description': 'md5:16d87de61a55c3f7d6c454b73285938e',
},
'playlist_mincount': 83,
}, {
'url': 'https://www.crunchyroll.com/artist/MA179CB50D/lisa',
'only_matching': True,
}]
_API_ENDPOINT = 'music'
def _real_extract(self, url):
lang, internal_id = self._match_valid_url(url).group('lang', 'id')
response = traverse_obj(self._call_api(
f'artists/{internal_id}', internal_id, lang, 'artist info'), ('data', 0))
def entries():
for attribute, path in [('concerts', 'concert'), ('videos', 'musicvideo')]:
for internal_id in traverse_obj(response, (attribute, ...)):
yield self.url_result(f'{self._BASE_URL}/watch/{path}/{internal_id}', CrunchyrollMusicIE, internal_id)
return self.playlist_result(entries(), **self._transform_artist_response(response))
@staticmethod
def _transform_artist_response(data):
return {
'id': data['id'],
**traverse_obj(data, {
'title': 'name',
'description': ('description', {str}, {lambda x: x.replace(r'\r\n', '\n')}),
'thumbnails': ('images', ..., ..., {
'url': ('source', {url_or_none}),
'width': ('width', {int_or_none}),
'height': ('height', {int_or_none}),
}),
'genres': ('genres', ..., 'displayValue'),
}),
}

View File

@ -8,28 +8,29 @@ from ..utils import (
str_or_none, str_or_none,
update_url_query, update_url_query,
) )
from ..utils.traversal import traverse_obj
class CWTVIE(InfoExtractor): class CWTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?cw(?:tv(?:pr)?|seed)\.com/(?:shows/)?(?:[^/]+/)+[^?]*\?.*\b(?:play|watch)=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})' _VALID_URL = r'https?://(?:www\.)?cw(?:tv(?:pr)?|seed)\.com/(?:shows/)?(?:[^/]+/)+[^?]*\?.*\b(?:play|watch)=(?P<id>[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12})'
_TESTS = [{ _TESTS = [{
'url': 'https://www.cwtv.com/shows/all-american-homecoming/ready-or-not/?play=d848488f-f62a-40fd-af1f-6440b1821aab', 'url': 'https://www.cwtv.com/shows/continuum/a-stitch-in-time/?play=9149a1e1-4cb2-46d7-81b2-47d35bbd332b',
'info_dict': { 'info_dict': {
'id': 'd848488f-f62a-40fd-af1f-6440b1821aab', 'id': '9149a1e1-4cb2-46d7-81b2-47d35bbd332b',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Ready Or Not', 'title': 'A Stitch in Time',
'description': 'Simone is concerned about changes taking place at Bringston; JR makes a decision about his future.', 'description': r're:(?s)City Protective Services officer Kiera Cameron is transported from 2077.+',
'thumbnail': r're:^https?://.*\.jpe?g$', 'thumbnail': r're:https?://.+\.jpe?g',
'duration': 2547, 'duration': 2632,
'timestamp': 1720519200, 'timestamp': 1736928000,
'uploader': 'CWTV', 'uploader': 'CWTV',
'chapters': 'count:6', 'chapters': 'count:5',
'series': 'All American: Homecoming', 'series': 'Continuum',
'season_number': 3, 'season_number': 1,
'episode_number': 1, 'episode_number': 1,
'age_limit': 0, 'age_limit': 14,
'upload_date': '20240709', 'upload_date': '20250115',
'season': 'Season 3', 'season': 'Season 1',
'episode': 'Episode 1', 'episode': 'Episode 1',
}, },
'params': { 'params': {
@ -42,7 +43,7 @@ class CWTVIE(InfoExtractor):
'id': '6b15e985-9345-4f60-baf8-56e96be57c63', 'id': '6b15e985-9345-4f60-baf8-56e96be57c63',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Legends of Yesterday', 'title': 'Legends of Yesterday',
'description': 'Oliver and Barry Allen take Kendra Saunders and Carter Hall to a remote location to keep them hidden from Vandal Savage while they figure out how to defeat him.', 'description': r're:(?s)Oliver and Barry Allen take Kendra Saunders and Carter Hall to a remote.+',
'duration': 2665, 'duration': 2665,
'series': 'Arrow', 'series': 'Arrow',
'season_number': 4, 'season_number': 4,
@ -71,7 +72,7 @@ class CWTVIE(InfoExtractor):
'timestamp': 1444107300, 'timestamp': 1444107300,
'age_limit': 14, 'age_limit': 14,
'uploader': 'CWTV', 'uploader': 'CWTV',
'thumbnail': r're:^https?://.*\.jpe?g$', 'thumbnail': r're:https?://.+\.jpe?g',
'chapters': 'count:4', 'chapters': 'count:4',
'episode': 'Episode 20', 'episode': 'Episode 20',
'season': 'Season 11', 'season': 'Season 11',
@ -94,9 +95,9 @@ class CWTVIE(InfoExtractor):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
data = self._download_json( data = self._download_json(
f'https://images.cwtv.com/feed/mobileapp/video-meta/apiversion_12/guid_{video_id}', video_id) f'https://images.cwtv.com/feed/app-2/video-meta/apiversion_22/device_android/guid_{video_id}', video_id)
if data.get('result') != 'ok': if traverse_obj(data, 'result') != 'ok':
raise ExtractorError(data['msg'], expected=True) raise ExtractorError(traverse_obj(data, (('error_msg', 'msg'), {str}, any)), expected=True)
video_data = data['video'] video_data = data['video']
title = video_data['title'] title = video_data['title']
mpx_url = update_url_query( mpx_url = update_url_query(

View File

@ -135,7 +135,7 @@ class DropoutIE(InfoExtractor):
self.raise_login_required(method='any') self.raise_login_required(method='any')
raise ExtractorError(login_err, expected=True) raise ExtractorError(login_err, expected=True)
embed_url = self._search_regex(r'embed_url:\s*["\'](.+?)["\']', webpage, 'embed url') embed_url = self._html_search_regex(r'embed_url:\s*["\'](.+?)["\']', webpage, 'embed url')
thumbnail = self._og_search_thumbnail(webpage) thumbnail = self._og_search_thumbnail(webpage)
watch_info = get_element_by_id('watch-info', webpage) or '' watch_info = get_element_by_id('watch-info', webpage) or ''

155
yt_dlp/extractor/eggs.py Normal file
View File

@ -0,0 +1,155 @@
import secrets
from .common import InfoExtractor
from .youtube import YoutubeIE
from ..utils import (
int_or_none,
parse_iso8601,
str_or_none,
url_or_none,
)
from ..utils.traversal import traverse_obj
class EggsBaseIE(InfoExtractor):
_API_HEADERS = {
'Accept': '*/*',
'apVersion': '8.2.00',
'deviceName': 'Android',
}
def _real_initialize(self):
self._API_HEADERS['deviceId'] = secrets.token_hex(8)
def _call_api(self, endpoint, video_id):
return self._download_json(
f'https://app-front-api.eggs.mu/v1/{endpoint}', video_id,
headers=self._API_HEADERS)
def _extract_music_info(self, data):
if yt_url := traverse_obj(data, ('youtubeUrl', {url_or_none})):
return self.url_result(yt_url, ie=YoutubeIE)
artist_name = traverse_obj(data, ('artist', 'artistName', {str_or_none}))
music_id = traverse_obj(data, ('musicId', {str_or_none}))
webpage_url = None
if artist_name and music_id:
webpage_url = f'https://eggs.mu/artist/{artist_name}/song/{music_id}'
return {
'id': music_id,
'vcodec': 'none',
'webpage_url': webpage_url,
'extractor_key': EggsIE.ie_key(),
'extractor': EggsIE.IE_NAME,
**traverse_obj(data, {
'title': ('musicTitle', {str}),
'url': ('musicDataPath', {url_or_none}),
'uploader': ('artist', 'displayName', {str}),
'uploader_id': ('artist', 'artistId', {str_or_none}),
'thumbnail': ('imageDataPath', {url_or_none}),
'view_count': ('numberOfMusicPlays', {int_or_none}),
'like_count': ('numberOfLikes', {int_or_none}),
'comment_count': ('numberOfComments', {int_or_none}),
'composers': ('composer', {str}, all),
'tags': ('tags', ..., {str}),
'timestamp': ('releaseDate', {parse_iso8601}),
'artist': ('artist', 'displayName', {str}),
})}
class EggsIE(EggsBaseIE):
IE_NAME = 'eggs:single'
_VALID_URL = r'https?://eggs\.mu/artist/[^/?#]+/song/(?P<id>[\da-f-]+)'
_TESTS = [{
'url': 'https://eggs.mu/artist/32_sunny_girl/song/0e95fd1d-4d61-4d5b-8b18-6092c551da90',
'info_dict': {
'id': '0e95fd1d-4d61-4d5b-8b18-6092c551da90',
'ext': 'm4a',
'title': 'シネマと信号',
'uploader': 'Sunny Girl',
'thumbnail': r're:https?://.*\.jpg(?:\?.*)?$',
'uploader_id': '1607',
'like_count': int,
'timestamp': 1731327327,
'composers': ['橘高連太郎'],
'view_count': int,
'comment_count': int,
'artists': ['Sunny Girl'],
'upload_date': '20241111',
'tags': ['SunnyGirl', 'シネマと信号'],
},
}, {
'url': 'https://eggs.mu/artist/KAMO_3pband/song/1d4bc45f-1af6-47a9-8b30-a70cae350b4f',
'info_dict': {
'id': '80cLKA2wnoA',
'ext': 'mp4',
'title': 'KAMO「いい女だから」Audio',
'uploader': 'KAMO',
'live_status': 'not_live',
'channel_id': 'UCsHLBw2__5Q9y55skXPotOg',
'channel_follower_count': int,
'description': 'md5:d260da711ecbec3e720293dc11401b87',
'availability': 'public',
'uploader_id': '@KAMO_band',
'upload_date': '20240925',
'thumbnail': 'https://i.ytimg.com/vi/80cLKA2wnoA/maxresdefault.jpg',
'comment_count': int,
'channel_url': 'https://www.youtube.com/channel/UCsHLBw2__5Q9y55skXPotOg',
'view_count': int,
'duration': 151,
'like_count': int,
'channel': 'KAMO',
'playable_in_embed': True,
'uploader_url': 'https://www.youtube.com/@KAMO_band',
'tags': [],
'timestamp': 1727271121,
'age_limit': 0,
'categories': ['People & Blogs'],
},
'add_ie': ['Youtube'],
'params': {'skip_download': 'Youtube'},
}]
def _real_extract(self, url):
song_id = self._match_id(url)
json_data = self._call_api(f'musics/{song_id}', song_id)
return self._extract_music_info(json_data)
class EggsArtistIE(EggsBaseIE):
IE_NAME = 'eggs:artist'
_VALID_URL = r'https?://eggs\.mu/artist/(?P<id>\w+)/?(?:[?#&]|$)'
_TESTS = [{
'url': 'https://eggs.mu/artist/32_sunny_girl',
'info_dict': {
'id': '32_sunny_girl',
'thumbnail': 'https://image-pro.eggs.mu/profile/1607.jpeg?updated_at=2024-04-03T20%3A06%3A00%2B09%3A00',
'description': 'Muddy Mine / 東京高田馬場CLUB PHASE / Gt.Vo 橘高 連太郎 / Ba.Cho 小野 ゆうき / Dr 大森 りゅうひこ',
'title': 'Sunny Girl',
},
'playlist_mincount': 18,
}, {
'url': 'https://eggs.mu/artist/KAMO_3pband',
'info_dict': {
'id': 'KAMO_3pband',
'description': '川崎発3ピースバンド',
'thumbnail': 'https://image-pro.eggs.mu/profile/35217.jpeg?updated_at=2024-11-27T16%3A31%3A50%2B09%3A00',
'title': 'KAMO',
},
'playlist_mincount': 2,
}]
def _real_extract(self, url):
artist_id = self._match_id(url)
artist_data = self._call_api(f'artists/{artist_id}', artist_id)
song_data = self._call_api(f'artists/{artist_id}/musics', artist_id)
return self.playlist_result(
traverse_obj(song_data, ('data', ..., {dict}, {self._extract_music_info})),
playlist_id=artist_id, **traverse_obj(artist_data, {
'title': ('displayName', {str}),
'description': ('profile', {str}),
'thumbnail': ('imageDataPath', {url_or_none}),
}))

View File

@ -12,7 +12,7 @@ from ..utils import (
class FirstTVIE(InfoExtractor): class FirstTVIE(InfoExtractor):
IE_NAME = '1tv' IE_NAME = '1tv'
IE_DESC = 'Первый канал' IE_DESC = 'Первый канал'
_VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>[^/?#]+)' _VALID_URL = r'https?://(?:www\.)?(?:sport)?1tv\.ru/(?:[^/?#]+/)+(?P<id>[^/?#]+)'
_TESTS = [{ _TESTS = [{
# single format # single format
@ -52,6 +52,9 @@ class FirstTVIE(InfoExtractor):
}, { }, {
'url': 'http://www.1tv.ru/shows/tochvtoch-supersezon/vystupleniya/evgeniy-dyatlov-vladimir-vysockiy-koni-priveredlivye-toch-v-toch-supersezon-fragment-vypuska-ot-06-11-2016', 'url': 'http://www.1tv.ru/shows/tochvtoch-supersezon/vystupleniya/evgeniy-dyatlov-vladimir-vysockiy-koni-priveredlivye-toch-v-toch-supersezon-fragment-vypuska-ot-06-11-2016',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.sport1tv.ru/sport/chempionat-rossii-po-figurnomu-kataniyu-2025',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -1,349 +0,0 @@
import random
import re
import string
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
determine_ext,
int_or_none,
join_nonempty,
js_to_json,
make_archive_id,
orderedSet,
qualities,
str_or_none,
traverse_obj,
try_get,
urlencode_postdata,
)
class FunimationBaseIE(InfoExtractor):
_NETRC_MACHINE = 'funimation'
_REGION = None
_TOKEN = None
def _get_region(self):
region_cookie = self._get_cookies('https://www.funimation.com').get('region')
region = region_cookie.value if region_cookie else self.get_param('geo_bypass_country')
return region or traverse_obj(
self._download_json(
'https://geo-service.prd.funimationsvc.com/geo/v1/region/check', None, fatal=False,
note='Checking geo-location', errnote='Unable to fetch geo-location information'),
'region') or 'US'
def _perform_login(self, username, password):
if self._TOKEN:
return
try:
data = self._download_json(
'https://prod-api-funimationnow.dadcdigital.com/api/auth/login/',
None, 'Logging in', data=urlencode_postdata({
'username': username,
'password': password,
}))
FunimationBaseIE._TOKEN = data['token']
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 401:
error = self._parse_json(e.cause.response.read().decode(), None)['error']
raise ExtractorError(error, expected=True)
raise
class FunimationPageIE(FunimationBaseIE):
IE_NAME = 'funimation:page'
_VALID_URL = r'https?://(?:www\.)?funimation(?:\.com|now\.uk)/(?:(?P<lang>[^/]+)/)?(?:shows|v)/(?P<show>[^/]+)/(?P<episode>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.funimation.com/shows/attack-on-titan-junior-high/broadcast-dub-preview/',
'info_dict': {
'id': '210050',
'ext': 'mp4',
'title': 'Broadcast Dub Preview',
# Other metadata is tested in FunimationIE
},
'params': {
'skip_download': 'm3u8',
},
'add_ie': ['Funimation'],
}, {
# Not available in US
'url': 'https://www.funimation.com/shows/hacksign/role-play/',
'only_matching': True,
}, {
# with lang code
'url': 'https://www.funimation.com/en/shows/hacksign/role-play/',
'only_matching': True,
}, {
'url': 'https://www.funimationnow.uk/shows/puzzle-dragons-x/drop-impact/simulcast/',
'only_matching': True,
}, {
'url': 'https://www.funimation.com/v/a-certain-scientific-railgun/super-powered-level-5',
'only_matching': True,
}]
def _real_initialize(self):
if not self._REGION:
FunimationBaseIE._REGION = self._get_region()
def _real_extract(self, url):
locale, show, episode = self._match_valid_url(url).group('lang', 'show', 'episode')
video_id = traverse_obj(self._download_json(
f'https://title-api.prd.funimationsvc.com/v1/shows/{show}/episodes/{episode}',
f'{show}_{episode}', query={
'deviceType': 'web',
'region': self._REGION,
'locale': locale or 'en',
}), ('videoList', ..., 'id'), get_all=False)
return self.url_result(f'https://www.funimation.com/player/{video_id}', FunimationIE.ie_key(), video_id)
class FunimationIE(FunimationBaseIE):
_VALID_URL = r'https?://(?:www\.)?funimation\.com/player/(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.funimation.com/player/210051',
'info_dict': {
'id': '210050',
'display_id': 'broadcast-dub-preview',
'ext': 'mp4',
'title': 'Broadcast Dub Preview',
'thumbnail': r're:https?://.*\.(?:jpg|png)',
'episode': 'Broadcast Dub Preview',
'episode_id': '210050',
'season': 'Extras',
'season_id': '166038',
'season_number': 99,
'series': 'Attack on Titan: Junior High',
'description': '',
'duration': 155,
},
'params': {
'skip_download': 'm3u8',
},
}, {
'note': 'player_id should be extracted with the relevent compat-opt',
'url': 'https://www.funimation.com/player/210051',
'info_dict': {
'id': '210051',
'display_id': 'broadcast-dub-preview',
'ext': 'mp4',
'title': 'Broadcast Dub Preview',
'thumbnail': r're:https?://.*\.(?:jpg|png)',
'episode': 'Broadcast Dub Preview',
'episode_id': '210050',
'season': 'Extras',
'season_id': '166038',
'season_number': 99,
'series': 'Attack on Titan: Junior High',
'description': '',
'duration': 155,
},
'params': {
'skip_download': 'm3u8',
'compat_opts': ['seperate-video-versions'],
},
}]
@staticmethod
def _get_experiences(episode):
for lang, lang_data in episode.get('languages', {}).items():
for video_data in lang_data.values():
for version, f in video_data.items():
yield lang, version.title(), f
def _get_episode(self, webpage, experience_id=None, episode_id=None, fatal=True):
""" Extract the episode, season and show objects given either episode/experience id """
show = self._parse_json(
self._search_regex(
r'show\s*=\s*({.+?})\s*;', webpage, 'show data', fatal=fatal),
experience_id, transform_source=js_to_json, fatal=fatal) or []
for season in show.get('seasons', []):
for episode in season.get('episodes', []):
if episode_id is not None:
if str(episode.get('episodePk')) == episode_id:
return episode, season, show
continue
for _, _, f in self._get_experiences(episode):
if f.get('experienceId') == experience_id:
return episode, season, show
if fatal:
raise ExtractorError('Unable to find episode information')
else:
self.report_warning('Unable to find episode information')
return {}, {}, {}
def _real_extract(self, url):
initial_experience_id = self._match_id(url)
webpage = self._download_webpage(
url, initial_experience_id, note=f'Downloading player webpage for {initial_experience_id}')
episode, season, show = self._get_episode(webpage, experience_id=int(initial_experience_id))
episode_id = str(episode['episodePk'])
display_id = episode.get('slug') or episode_id
formats, subtitles, thumbnails, duration = [], {}, [], 0
requested_languages, requested_versions = self._configuration_arg('language'), self._configuration_arg('version')
language_preference = qualities((requested_languages or [''])[::-1])
source_preference = qualities((requested_versions or ['uncut', 'simulcast'])[::-1])
only_initial_experience = 'seperate-video-versions' in self.get_param('compat_opts', [])
for lang, version, fmt in self._get_experiences(episode):
experience_id = str(fmt['experienceId'])
if ((only_initial_experience and experience_id != initial_experience_id)
or (requested_languages and lang.lower() not in requested_languages)
or (requested_versions and version.lower() not in requested_versions)):
continue
thumbnails.append({'url': fmt.get('poster')})
duration = max(duration, fmt.get('duration', 0))
format_name = f'{version} {lang} ({experience_id})'
self.extract_subtitles(
subtitles, experience_id, display_id=display_id, format_name=format_name,
episode=episode if experience_id == initial_experience_id else episode_id)
headers = {}
if self._TOKEN:
headers['Authorization'] = f'Token {self._TOKEN}'
page = self._download_json(
f'https://www.funimation.com/api/showexperience/{experience_id}/',
display_id, headers=headers, expected_status=403, query={
'pinst_id': ''.join(random.choices(string.digits + string.ascii_letters, k=8)),
}, note=f'Downloading {format_name} JSON')
sources = page.get('items') or []
if not sources:
error = try_get(page, lambda x: x['errors'][0], dict)
if error:
self.report_warning('{} said: Error {} - {}'.format(
self.IE_NAME, error.get('code'), error.get('detail') or error.get('title')))
else:
self.report_warning('No sources found for format')
current_formats = []
for source in sources:
source_url = source.get('src')
source_type = source.get('videoType') or determine_ext(source_url)
if source_type == 'm3u8':
current_formats.extend(self._extract_m3u8_formats(
source_url, display_id, 'mp4', m3u8_id='{}-{}'.format(experience_id, 'hls'), fatal=False,
note=f'Downloading {format_name} m3u8 information'))
else:
current_formats.append({
'format_id': f'{experience_id}-{source_type}',
'url': source_url,
})
for f in current_formats:
# TODO: Convert language to code
f.update({
'language': lang,
'format_note': version,
'source_preference': source_preference(version.lower()),
'language_preference': language_preference(lang.lower()),
})
formats.extend(current_formats)
if not formats and (requested_languages or requested_versions):
self.raise_no_formats(
'There are no video formats matching the requested languages/versions', expected=True, video_id=display_id)
self._remove_duplicate_formats(formats)
return {
'id': episode_id,
'_old_archive_ids': [make_archive_id(self, initial_experience_id)],
'display_id': display_id,
'duration': duration,
'title': episode['episodeTitle'],
'description': episode.get('episodeSummary'),
'episode': episode.get('episodeTitle'),
'episode_number': int_or_none(episode.get('episodeId')),
'episode_id': episode_id,
'season': season.get('seasonTitle'),
'season_number': int_or_none(season.get('seasonId')),
'season_id': str_or_none(season.get('seasonPk')),
'series': show.get('showTitle'),
'formats': formats,
'thumbnails': thumbnails,
'subtitles': subtitles,
'_format_sort_fields': ('lang', 'source'),
}
def _get_subtitles(self, subtitles, experience_id, episode, display_id, format_name):
if isinstance(episode, str):
webpage = self._download_webpage(
f'https://www.funimation.com/player/{experience_id}/', display_id,
fatal=False, note=f'Downloading player webpage for {format_name}')
episode, _, _ = self._get_episode(webpage, episode_id=episode, fatal=False)
for _, version, f in self._get_experiences(episode):
for source in f.get('sources'):
for text_track in source.get('textTracks'):
if not text_track.get('src'):
continue
sub_type = text_track.get('type').upper()
sub_type = sub_type if sub_type != 'FULL' else None
current_sub = {
'url': text_track['src'],
'name': join_nonempty(version, text_track.get('label'), sub_type, delim=' '),
}
lang = join_nonempty(text_track.get('language', 'und'),
version if version != 'Simulcast' else None,
sub_type, delim='_')
if current_sub not in subtitles.get(lang, []):
subtitles.setdefault(lang, []).append(current_sub)
return subtitles
class FunimationShowIE(FunimationBaseIE):
IE_NAME = 'funimation:show'
_VALID_URL = r'(?P<url>https?://(?:www\.)?funimation(?:\.com|now\.uk)/(?P<locale>[^/]+)?/?shows/(?P<id>[^/?#&]+))/?(?:[?#]|$)'
_TESTS = [{
'url': 'https://www.funimation.com/en/shows/sk8-the-infinity',
'info_dict': {
'id': '1315000',
'title': 'SK8 the Infinity',
},
'playlist_count': 13,
'params': {
'skip_download': True,
},
}, {
# without lang code
'url': 'https://www.funimation.com/shows/ouran-high-school-host-club/',
'info_dict': {
'id': '39643',
'title': 'Ouran High School Host Club',
},
'playlist_count': 26,
'params': {
'skip_download': True,
},
}]
def _real_initialize(self):
if not self._REGION:
FunimationBaseIE._REGION = self._get_region()
def _real_extract(self, url):
base_url, locale, display_id = self._match_valid_url(url).groups()
show_info = self._download_json(
'https://title-api.prd.funimationsvc.com/v2/shows/{}?region={}&deviceType=web&locale={}'.format(
display_id, self._REGION, locale or 'en'), display_id)
items_info = self._download_json(
'https://prod-api-funimationnow.dadcdigital.com/api/funimation/episodes/?limit=99999&title_id={}'.format(
show_info.get('id')), display_id)
vod_items = traverse_obj(items_info, ('items', ..., lambda k, _: re.match(r'(?i)mostRecent[AS]vod', k), 'item'))
return {
'_type': 'playlist',
'id': str_or_none(show_info['id']),
'title': show_info['name'],
'entries': orderedSet(
self.url_result(
'{}/{}'.format(base_url, vod_item.get('episodeSlug')), FunimationPageIE.ie_key(),
vod_item.get('episodeId'), vod_item.get('episodeName'))
for vod_item in sorted(vod_items, key=lambda x: x.get('episodeOrder', -1))),
}

View File

@ -1,40 +1,48 @@
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
clean_html,
int_or_none, int_or_none,
str_or_none, str_or_none,
traverse_obj, traverse_obj,
url_or_none,
) )
class GoodGameIE(InfoExtractor): class GoodGameIE(InfoExtractor):
IE_NAME = 'goodgame:stream' IE_NAME = 'goodgame:stream'
_VALID_URL = r'https?://goodgame\.ru/channel/(?P<id>\w+)' _VALID_URL = r'https?://goodgame\.ru/(?!channel/)(?P<id>[\w.*-]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://goodgame.ru/channel/Pomi/#autoplay', 'url': 'https://goodgame.ru/TGW#autoplay',
'info_dict': { 'info_dict': {
'id': 'pomi', 'id': '7998',
'ext': 'mp4', 'ext': 'mp4',
'title': r're:Reynor vs Special \(1/2,bo3\) Wardi Spring EU \- playoff \(финальный день\) \d{4}-\d{2}-\d{2} \d{2}:\d{2}$', 'channel_id': '7998',
'channel_id': '1644', 'title': r're:шоуматч Happy \(NE\) vs Fortitude \(UD\), потом ладдер и дс \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
'channel': 'Pomi', 'channel_url': 'https://goodgame.ru/TGW',
'channel_url': 'https://goodgame.ru/channel/Pomi/', 'thumbnail': 'https://hls.goodgame.ru/previews/7998_240.jpg',
'description': 'md5:4a87b775ee7b2b57bdccebe285bbe171', 'uploader': 'TGW',
'thumbnail': r're:^https?://.*\.jpg$', 'channel': 'JosephStalin',
'live_status': 'is_live', 'live_status': 'is_live',
'view_count': int, 'age_limit': 18,
'channel_follower_count': int,
'uploader_id': '2899',
'concurrent_view_count': int,
}, },
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
'skip': 'May not be online', }, {
'url': 'https://goodgame.ru/Mr.Gray',
'only_matching': True,
}, {
'url': 'https://goodgame.ru/HeDoPa3yMeHue*',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
channel_name = self._match_id(url) channel_name = self._match_id(url)
response = self._download_json(f'https://api2.goodgame.ru/v2/streams/{channel_name}', channel_name) response = self._download_json(f'https://goodgame.ru/api/4/users/{channel_name}/stream', channel_name)
player_id = response['channel']['gg_player_src'] player_id = response['streamkey']
formats, subtitles = [], {} formats, subtitles = [], {}
if response.get('status') == 'Live': if response.get('status'):
formats, subtitles = self._extract_m3u8_formats_and_subtitles( formats, subtitles = self._extract_m3u8_formats_and_subtitles(
f'https://hls.goodgame.ru/manifest/{player_id}_master.m3u8', f'https://hls.goodgame.ru/manifest/{player_id}_master.m3u8',
channel_name, 'mp4', live=True) channel_name, 'mp4', live=True)
@ -45,13 +53,17 @@ class GoodGameIE(InfoExtractor):
'id': player_id, 'id': player_id,
'formats': formats, 'formats': formats,
'subtitles': subtitles, 'subtitles': subtitles,
'title': traverse_obj(response, ('channel', 'title')),
'channel': channel_name,
'channel_id': str_or_none(traverse_obj(response, ('channel', 'id'))),
'channel_url': response.get('url'),
'description': clean_html(traverse_obj(response, ('channel', 'description'))),
'thumbnail': traverse_obj(response, ('channel', 'thumb')),
'is_live': bool(formats), 'is_live': bool(formats),
'view_count': int_or_none(response.get('viewers')), **traverse_obj(response, {
'age_limit': 18 if traverse_obj(response, ('channel', 'adult')) else None, 'title': ('title', {str}),
'channel': ('channelkey', {str}),
'channel_id': ('id', {str_or_none}),
'channel_url': ('link', {url_or_none}),
'uploader': ('streamer', 'username', {str}),
'uploader_id': ('streamer', 'id', {str_or_none}),
'thumbnail': ('preview', {url_or_none}, {self._proto_relative_url}),
'concurrent_view_count': ('viewers', {int_or_none}),
'channel_follower_count': ('followers', {int_or_none}),
'age_limit': ('adult', {bool}, {lambda x: 18 if x else None}),
}),
} }

View File

@ -39,7 +39,7 @@ class LaracastsBaseIE(InfoExtractor):
'description': ('body', {clean_html}), 'description': ('body', {clean_html}),
'thumbnail': ('largeThumbnail', {url_or_none}), 'thumbnail': ('largeThumbnail', {url_or_none}),
'duration': ('length', {int_or_none}), 'duration': ('length', {int_or_none}),
'date': ('dateSegments', 'published', {unified_strdate}), 'upload_date': ('dateSegments', 'published', {unified_strdate}),
})) }))
@ -54,7 +54,7 @@ class LaracastsIE(LaracastsBaseIE):
'title': 'Hello, Laravel', 'title': 'Hello, Laravel',
'ext': 'mp4', 'ext': 'mp4',
'duration': 519, 'duration': 519,
'date': '20240312', 'upload_date': '20240312',
'thumbnail': 'https://laracasts.s3.amazonaws.com/videos/thumbnails/youtube/30-days-to-learn-laravel-11-1.png', 'thumbnail': 'https://laracasts.s3.amazonaws.com/videos/thumbnails/youtube/30-days-to-learn-laravel-11-1.png',
'description': 'md5:ddd658bb241975871d236555657e1dd1', 'description': 'md5:ddd658bb241975871d236555657e1dd1',
'season_number': 1, 'season_number': 1,

View File

@ -310,7 +310,13 @@ class LBRYIE(LBRYBaseIE):
if stream_type in self._SUPPORTED_STREAM_TYPES: if stream_type in self._SUPPORTED_STREAM_TYPES:
claim_id, is_live = result['claim_id'], False claim_id, is_live = result['claim_id'], False
streaming_url = self._call_api_proxy( streaming_url = self._call_api_proxy(
'get', claim_id, {'uri': uri}, 'streaming url')['streaming_url'] 'get', claim_id, {
'uri': uri,
**traverse_obj(parse_qs(url), {
'signature': ('signature', 0),
'signature_ts': ('signature_ts', 0),
}),
}, 'streaming url')['streaming_url']
# GET request to v3 API returns original video/audio file if available # GET request to v3 API returns original video/audio file if available
direct_url = re.sub(r'/api/v\d+/', '/api/v3/', streaming_url) direct_url = re.sub(r'/api/v\d+/', '/api/v3/', streaming_url)

View File

@ -72,6 +72,7 @@ class NaverBaseIE(InfoExtractor):
'abr': int_or_none(bitrate.get('audio')), 'abr': int_or_none(bitrate.get('audio')),
'filesize': int_or_none(stream.get('size')), 'filesize': int_or_none(stream.get('size')),
'protocol': 'm3u8_native' if stream_type == 'HLS' else None, 'protocol': 'm3u8_native' if stream_type == 'HLS' else None,
'extra_param_to_segment_url': urllib.parse.urlencode(query, doseq=True) if stream_type == 'HLS' else None,
}) })
extract_formats(get_list('video'), 'H264') extract_formats(get_list('video'), 'H264')
@ -168,6 +169,26 @@ class NaverIE(NaverBaseIE):
'duration': 277, 'duration': 277,
'thumbnail': r're:^https?://.*\.jpg', 'thumbnail': r're:^https?://.*\.jpg',
}, },
}, {
'url': 'https://tv.naver.com/v/67838091',
'md5': '126ea384ab033bca59672c12cca7a6be',
'info_dict': {
'id': '67838091',
'ext': 'mp4',
'title': '[라인W 날씨] 내일 아침 서울 체감 -19도…호남·충남 대설',
'description': 'md5:fe026e25634c85845698aed4b59db5a7',
'timestamp': 1736347853,
'upload_date': '20250108',
'uploader': 'KBS뉴스',
'uploader_id': 'kbsnews',
'uploader_url': 'https://tv.naver.com/kbsnews',
'view_count': int,
'like_count': int,
'comment_count': int,
'duration': 69,
'thumbnail': r're:^https?://.*\.jpg',
},
'params': {'format': 'HLS_144P'},
}, { }, {
'url': 'http://tvcast.naver.com/v/81652', 'url': 'http://tvcast.naver.com/v/81652',
'only_matching': True, 'only_matching': True,

117
yt_dlp/extractor/nest.py Normal file
View File

@ -0,0 +1,117 @@
from .common import InfoExtractor
from ..utils import ExtractorError, float_or_none, update_url_query, url_or_none
from ..utils.traversal import traverse_obj
class NestIE(InfoExtractor):
_VALID_URL = r'https?://video\.nest\.com/(?:embedded/)?live/(?P<id>\w+)'
_EMBED_REGEX = [rf'<iframe [^>]*\bsrc=[\'"](?P<url>{_VALID_URL})']
_TESTS = [{
'url': 'https://video.nest.com/embedded/live/4fvYdSo8AX?autoplay=0',
'info_dict': {
'id': '4fvYdSo8AX',
'ext': 'mp4',
'title': 'startswith:Outside ',
'alt_title': 'Outside',
'description': '<null>',
'location': 'Los Angeles',
'availability': 'public',
'thumbnail': r're:https?://',
'live_status': 'is_live',
},
'params': {
# m3u8 download
'skip_download': True,
},
}, {
'url': 'https://video.nest.com/live/4fvYdSo8AX',
'only_matching': True,
}]
_WEBPAGE_TESTS = [{
'url': 'https://www.pacificblue.biz/noyo-harbor-webcam/',
'info_dict': {
'id': '4fvYdSo8AX',
'ext': 'mp4',
'title': 'startswith:Outside ',
'alt_title': 'Outside',
'description': '<null>',
'location': 'Los Angeles',
'availability': 'public',
'thumbnail': r're:https?://',
'live_status': 'is_live',
},
'params': {
# m3u8 download
'skip_download': True,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
item = self._download_json(
'https://video.nest.com/api/dropcam/cameras.get_by_public_token',
video_id, query={'token': video_id})['items'][0]
uuid = item.get('uuid')
stream_domain = item.get('live_stream_host')
if not stream_domain or not uuid:
raise ExtractorError('Unable to construct playlist URL')
thumb_domain = item.get('nexus_api_nest_domain_host')
return {
'id': video_id,
**traverse_obj(item, {
'description': ('description', {str}),
'title': (('title', 'name', 'where'), {str}, filter, any),
'alt_title': ('name', {str}),
'location': ((('timezone', {lambda x: x.split('/')[1].replace('_', ' ')}), 'where'), {str}, filter, any),
}),
'thumbnail': update_url_query(
f'https://{thumb_domain}/get_image',
{'uuid': uuid, 'public': video_id}) if thumb_domain else None,
'availability': self._availability(is_private=item.get('is_public') is False),
'formats': self._extract_m3u8_formats(
f'https://{stream_domain}/nexus_aac/{uuid}/playlist.m3u8',
video_id, 'mp4', live=True, query={'public': video_id}),
'is_live': True,
}
class NestClipIE(InfoExtractor):
_VALID_URL = r'https?://video\.nest\.com/(?:embedded/)?clip/(?P<id>\w+)'
_EMBED_REGEX = [rf'<iframe [^>]*\bsrc=[\'"](?P<url>{_VALID_URL})']
_TESTS = [{
'url': 'https://video.nest.com/clip/f34c9dd237a44eca9a0001af685e3dff',
'info_dict': {
'id': 'f34c9dd237a44eca9a0001af685e3dff',
'ext': 'mp4',
'title': 'NestClip video #f34c9dd237a44eca9a0001af685e3dff',
'thumbnail': 'https://clips.dropcam.com/f34c9dd237a44eca9a0001af685e3dff.jpg',
'timestamp': 1735413474.468,
'upload_date': '20241228',
},
}, {
'url': 'https://video.nest.com/embedded/clip/34e0432adc3c46a98529443d8ad5aa76',
'info_dict': {
'id': '34e0432adc3c46a98529443d8ad5aa76',
'ext': 'mp4',
'title': 'Shootout at Veterans Boulevard at Fleur De Lis Drive',
'thumbnail': 'https://clips.dropcam.com/34e0432adc3c46a98529443d8ad5aa76.jpg',
'upload_date': '20230817',
'timestamp': 1692262897.191,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'https://video.nest.com/api/dropcam/videos.get_by_filename', video_id,
query={'filename': f'{video_id}.mp4'})
return {
'id': video_id,
**traverse_obj(data, ('items', 0, {
'title': ('title', {str}),
'thumbnail': ('thumbnail_url', {url_or_none}),
'url': ('download_url', {url_or_none}),
'timestamp': ('start_time', {float_or_none}),
})),
}

View File

@ -592,8 +592,8 @@ class NiconicoPlaylistBaseIE(InfoExtractor):
@staticmethod @staticmethod
def _parse_owner(item): def _parse_owner(item):
return { return {
'uploader': traverse_obj(item, ('owner', 'name')), 'uploader': traverse_obj(item, ('owner', ('name', ('user', 'nickname')), {str}, any)),
'uploader_id': traverse_obj(item, ('owner', 'id')), 'uploader_id': traverse_obj(item, ('owner', 'id', {str})),
} }
def _fetch_page(self, list_id, page): def _fetch_page(self, list_id, page):
@ -666,7 +666,7 @@ class NiconicoPlaylistIE(NiconicoPlaylistBaseIE):
mylist.get('name'), mylist.get('description'), **self._parse_owner(mylist)) mylist.get('name'), mylist.get('description'), **self._parse_owner(mylist))
class NiconicoSeriesIE(InfoExtractor): class NiconicoSeriesIE(NiconicoPlaylistBaseIE):
IE_NAME = 'niconico:series' IE_NAME = 'niconico:series'
_VALID_URL = r'https?://(?:(?:www\.|sp\.)?nicovideo\.jp(?:/user/\d+)?|nico\.ms)/series/(?P<id>\d+)' _VALID_URL = r'https?://(?:(?:www\.|sp\.)?nicovideo\.jp(?:/user/\d+)?|nico\.ms)/series/(?P<id>\d+)'
@ -675,6 +675,9 @@ class NiconicoSeriesIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '110226', 'id': '110226',
'title': 'ご立派ァ!のシリーズ', 'title': 'ご立派ァ!のシリーズ',
'description': '楽しそうな外人の吹き替えをさせたら終身名誉ホモガキの右に出る人はいませんね…',
'uploader': 'アルファるふぁ',
'uploader_id': '44113208',
}, },
'playlist_mincount': 10, 'playlist_mincount': 10,
}, { }, {
@ -682,6 +685,9 @@ class NiconicoSeriesIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '12312', 'id': '12312',
'title': 'バトルスピリッツ お勧めカード紹介(調整中)', 'title': 'バトルスピリッツ お勧めカード紹介(調整中)',
'description': '',
'uploader': '野鳥',
'uploader_id': '2275360',
}, },
'playlist_mincount': 103, 'playlist_mincount': 103,
}, { }, {
@ -689,19 +695,21 @@ class NiconicoSeriesIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
def _call_api(self, list_id, resource, query):
return self._download_json(
f'https://nvapi.nicovideo.jp/v2/series/{list_id}', list_id,
f'Downloading {resource}', query=query,
headers=self._API_HEADERS)['data']
def _real_extract(self, url): def _real_extract(self, url):
list_id = self._match_id(url) list_id = self._match_id(url)
webpage = self._download_webpage(url, list_id) series = self._call_api(list_id, 'list', {
'pageSize': 1,
})['detail']
title = self._search_regex( return self.playlist_result(
(r'<title>「(.+)(全', self._entries(list_id), list_id,
r'<div class="TwitterShareButton"\s+data-text="(.+)\s+https:'), series.get('title'), series.get('description'), **self._parse_owner(series))
webpage, 'title', fatal=False)
if title:
title = unescapeHTML(title)
json_data = next(self._yield_json_ld(webpage, None, fatal=False))
return self.playlist_from_matches(
traverse_obj(json_data, ('itemListElement', ..., 'url')), list_id, title, ie=NiconicoIE)
class NiconicoHistoryIE(NiconicoPlaylistBaseIE): class NiconicoHistoryIE(NiconicoPlaylistBaseIE):

View File

@ -12,6 +12,7 @@ from ..utils import (
parse_iso8601, parse_iso8601,
str_or_none, str_or_none,
try_get, try_get,
update_url_query,
url_or_none, url_or_none,
urljoin, urljoin,
) )
@ -27,6 +28,12 @@ class NRKBaseIE(InfoExtractor):
)/''' )/'''
def _extract_nrk_formats(self, asset_url, video_id): def _extract_nrk_formats(self, asset_url, video_id):
asset_url = update_url_query(asset_url, {
# Remove 'adap' to return all streams (known values are: small, large, small_h265, large_h265)
'adap': [],
# Disable subtitles since they are fetched separately
's': 0,
})
if re.match(r'https?://[^/]+\.akamaihd\.net/i/', asset_url): if re.match(r'https?://[^/]+\.akamaihd\.net/i/', asset_url):
return self._extract_akamai_formats(asset_url, video_id) return self._extract_akamai_formats(asset_url, video_id)
asset_url = re.sub(r'(?:bw_(?:low|high)=\d+|no_audio_only)&?', '', asset_url) asset_url = re.sub(r'(?:bw_(?:low|high)=\d+|no_audio_only)&?', '', asset_url)
@ -58,7 +65,10 @@ class NRKBaseIE(InfoExtractor):
return self._download_json( return self._download_json(
urljoin('https://psapi.nrk.no/', path), urljoin('https://psapi.nrk.no/', path),
video_id, note or f'Downloading {item} JSON', video_id, note or f'Downloading {item} JSON',
fatal=fatal, query=query) fatal=fatal, query=query, headers={
# Needed for working stream URLs, see https://github.com/yt-dlp/yt-dlp/issues/12192
'Accept': 'application/vnd.nrk.psapi+json; version=9; player=tv-player; device=player-core',
})
class NRKIE(NRKBaseIE): class NRKIE(NRKBaseIE):
@ -77,13 +87,17 @@ class NRKIE(NRKBaseIE):
_TESTS = [{ _TESTS = [{
# video # video
'url': 'http://www.nrk.no/video/PS*150533', 'url': 'http://www.nrk.no/video/PS*150533',
'md5': 'f46be075326e23ad0e524edfcb06aeb6', 'md5': '2b88a652ad2e275591e61cf550887eec',
'info_dict': { 'info_dict': {
'id': '150533', 'id': '150533',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Dompap og andre fugler i Piip-Show', 'title': 'Dompap og andre fugler i Piip-Show',
'description': 'md5:d9261ba34c43b61c812cb6b0269a5c8f', 'description': 'md5:d9261ba34c43b61c812cb6b0269a5c8f',
'duration': 262, 'duration': 262,
'upload_date': '20140325',
'thumbnail': r're:^https?://gfx\.nrk\.no/.*$',
'timestamp': 1395751833,
'alt_title': 'md5:d9261ba34c43b61c812cb6b0269a5c8f',
}, },
}, { }, {
# audio # audio
@ -95,6 +109,10 @@ class NRKIE(NRKBaseIE):
'title': 'Slik høres internett ut når du er blind', 'title': 'Slik høres internett ut når du er blind',
'description': 'md5:a621f5cc1bd75c8d5104cb048c6b8568', 'description': 'md5:a621f5cc1bd75c8d5104cb048c6b8568',
'duration': 20, 'duration': 20,
'timestamp': 1398429565,
'alt_title': 'Cathrine Lie Wathne er blind, og bruker hurtigtaster for å navigere seg rundt på ulike nettsider.',
'thumbnail': 'https://gfx.nrk.no/urxQMSXF-WnbfjBH5ke2igLGyN27EdJVWZ6FOsEAclhA',
'upload_date': '20140425',
}, },
}, { }, {
'url': 'nrk:ecc1b952-96dc-4a98-81b9-5296dc7a98d9', 'url': 'nrk:ecc1b952-96dc-4a98-81b9-5296dc7a98d9',
@ -152,7 +170,7 @@ class NRKIE(NRKBaseIE):
return self._call_api(f'playback/{item}/{video_id}', video_id, item, query=query) return self._call_api(f'playback/{item}/{video_id}', video_id, item, query=query)
raise raise
# known values for preferredCdn: akamai, iponly, minicdn and telenor # known values for preferredCdn: akamai, globalconnect and telenor
manifest = call_playback_api('manifest', {'preferredCdn': 'akamai'}) manifest = call_playback_api('manifest', {'preferredCdn': 'akamai'})
video_id = try_get(manifest, lambda x: x['id'], str) or video_id video_id = try_get(manifest, lambda x: x['id'], str) or video_id
@ -307,6 +325,13 @@ class NRKTVIE(InfoExtractor):
'ext': 'vtt', 'ext': 'vtt',
}], }],
}, },
'upload_date': '20170627',
'timestamp': 1498591822,
'thumbnail': 'https://gfx.nrk.no/myRSc4vuFlahB60P3n6swwRTQUZI1LqJZl9B7icZFgzA',
'alt_title': 'md5:46923a6e6510eefcce23d5ef2a58f2ce',
},
'params': {
'skip_download': True,
}, },
}, { }, {
'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014', 'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
@ -321,6 +346,13 @@ class NRKTVIE(InfoExtractor):
'series': '20 spørsmål', 'series': '20 spørsmål',
'episode': '23. mai 2014', 'episode': '23. mai 2014',
'age_limit': 0, 'age_limit': 0,
'timestamp': 1584593700,
'thumbnail': 'https://gfx.nrk.no/u7uCe79SEfPVGRAGVp2_uAZnNc4mfz_kjXg6Bgek8lMQ',
'season_id': '126936',
'upload_date': '20200319',
'season': 'Season 2014',
'season_number': 2014,
'episode_number': 3,
}, },
}, { }, {
'url': 'https://tv.nrk.no/program/mdfp15000514', 'url': 'https://tv.nrk.no/program/mdfp15000514',

View File

@ -63,6 +63,7 @@ class PatreonIE(PatreonBaseIE):
'info_dict': { 'info_dict': {
'id': '743933', 'id': '743933',
'ext': 'mp3', 'ext': 'mp3',
'alt_title': 'cd166.mp3',
'title': 'Episode 166: David Smalley of Dogma Debate', 'title': 'Episode 166: David Smalley of Dogma Debate',
'description': 'md5:34d207dd29aa90e24f1b3f58841b81c7', 'description': 'md5:34d207dd29aa90e24f1b3f58841b81c7',
'uploader': 'Cognitive Dissonance Podcast', 'uploader': 'Cognitive Dissonance Podcast',
@ -280,7 +281,7 @@ class PatreonIE(PatreonBaseIE):
video_id = self._match_id(url) video_id = self._match_id(url)
post = self._call_api( post = self._call_api(
f'posts/{video_id}', video_id, query={ f'posts/{video_id}', video_id, query={
'fields[media]': 'download_url,mimetype,size_bytes', 'fields[media]': 'download_url,mimetype,size_bytes,file_name',
'fields[post]': 'comment_count,content,embed,image,like_count,post_file,published_at,title,current_user_can_view', 'fields[post]': 'comment_count,content,embed,image,like_count,post_file,published_at,title,current_user_can_view',
'fields[user]': 'full_name,url', 'fields[user]': 'full_name,url',
'fields[post_tag]': 'value', 'fields[post_tag]': 'value',
@ -317,6 +318,7 @@ class PatreonIE(PatreonBaseIE):
'ext': ext, 'ext': ext,
'filesize': size_bytes, 'filesize': size_bytes,
'url': download_url, 'url': download_url,
'alt_title': traverse_obj(media_attributes, ('file_name', {str})),
}) })
elif include_type == 'user': elif include_type == 'user':

View File

@ -47,7 +47,7 @@ class PBSIE(InfoExtractor):
(r'video\.kpbs\.org', 'KPBS San Diego (KPBS)'), # http://www.kpbs.org/ (r'video\.kpbs\.org', 'KPBS San Diego (KPBS)'), # http://www.kpbs.org/
(r'video\.kqed\.org', 'KQED (KQED)'), # http://www.kqed.org (r'video\.kqed\.org', 'KQED (KQED)'), # http://www.kqed.org
(r'vids\.kvie\.org', 'KVIE Public Television (KVIE)'), # http://www.kvie.org (r'vids\.kvie\.org', 'KVIE Public Television (KVIE)'), # http://www.kvie.org
(r'video\.pbssocal\.org', 'PBS SoCal/KOCE (KOCE)'), # http://www.pbssocal.org/ (r'(?:video\.|www\.)pbssocal\.org', 'PBS SoCal/KOCE (KOCE)'), # http://www.pbssocal.org/
(r'video\.valleypbs\.org', 'ValleyPBS (KVPT)'), # http://www.valleypbs.org/ (r'video\.valleypbs\.org', 'ValleyPBS (KVPT)'), # http://www.valleypbs.org/
(r'video\.cptv\.org', 'CONNECTICUT PUBLIC TELEVISION (WEDH)'), # http://cptv.org (r'video\.cptv\.org', 'CONNECTICUT PUBLIC TELEVISION (WEDH)'), # http://cptv.org
(r'watch\.knpb\.org', 'KNPB Channel 5 (KNPB)'), # http://www.knpb.org/ (r'watch\.knpb\.org', 'KNPB Channel 5 (KNPB)'), # http://www.knpb.org/
@ -61,7 +61,7 @@ class PBSIE(InfoExtractor):
(r'video\.wyomingpbs\.org', 'Wyoming PBS (KCWC)'), # http://www.wyomingpbs.org (r'video\.wyomingpbs\.org', 'Wyoming PBS (KCWC)'), # http://www.wyomingpbs.org
(r'video\.cpt12\.org', 'Colorado Public Television / KBDI 12 (KBDI)'), # http://www.cpt12.org/ (r'video\.cpt12\.org', 'Colorado Public Television / KBDI 12 (KBDI)'), # http://www.cpt12.org/
(r'video\.kbyueleven\.org', 'KBYU-TV (KBYU)'), # http://www.kbyutv.org/ (r'video\.kbyueleven\.org', 'KBYU-TV (KBYU)'), # http://www.kbyutv.org/
(r'video\.thirteen\.org', 'Thirteen/WNET New York (WNET)'), # http://www.thirteen.org (r'(?:video\.|www\.)thirteen\.org', 'Thirteen/WNET New York (WNET)'), # http://www.thirteen.org
(r'video\.wgbh\.org', 'WGBH/Channel 2 (WGBH)'), # http://wgbh.org (r'video\.wgbh\.org', 'WGBH/Channel 2 (WGBH)'), # http://wgbh.org
(r'video\.wgby\.org', 'WGBY (WGBY)'), # http://www.wgby.org (r'video\.wgby\.org', 'WGBY (WGBY)'), # http://www.wgby.org
(r'watch\.njtvonline\.org', 'NJTV Public Media NJ (WNJT)'), # http://www.njtvonline.org/ (r'watch\.njtvonline\.org', 'NJTV Public Media NJ (WNJT)'), # http://www.njtvonline.org/
@ -185,12 +185,13 @@ class PBSIE(InfoExtractor):
_VALID_URL = r'''(?x)https?:// _VALID_URL = r'''(?x)https?://
(?: (?:
# Direct video URL
(?:{})/(?:(?:vir|port)alplayer|video)/(?P<id>[0-9]+)(?:[?/]|$) |
# Article with embedded player (or direct video)
(?:www\.)?pbs\.org/(?:[^/]+/){{1,5}}(?P<presumptive_id>[^/]+?)(?:\.html)?/?(?:$|[?\#]) |
# Player # Player
(?:video|player)\.pbs\.org/(?:widget/)?partnerplayer/(?P<player_id>[^/]+) (?:video|player)\.pbs\.org/(?:widget/)?partnerplayer/(?P<player_id>[^/?#]+) |
# Direct video URL, or article with embedded player
(?:{})/(?:
(?:(?:vir|port)alplayer|video)/(?P<id>[0-9]+)(?:[?/#]|$) |
(?:[^/?#]+/){{1,5}}(?P<presumptive_id>[^/?#]+?)(?:\.html)?/?(?:$|[?#])
)
) )
'''.format('|'.join(next(zip(*_STATIONS)))) '''.format('|'.join(next(zip(*_STATIONS))))
@ -207,16 +208,40 @@ class PBSIE(InfoExtractor):
'description': 'md5:31b664af3c65fd07fa460d306b837d00', 'description': 'md5:31b664af3c65fd07fa460d306b837d00',
'duration': 3190, 'duration': 3190,
}, },
'skip': 'dead URL',
},
{
'url': 'https://www.thirteen.org/programs/the-woodwrights-shop/carving-away-with-mary-may-tioglz/',
'info_dict': {
'id': '3004803331',
'ext': 'mp4',
'title': "The Woodwright's Shop - Carving Away with Mary May",
'description': 'md5:7cbaaaa8b9bcc78bd8f0e31911644e28',
'duration': 1606,
'display_id': 'carving-away-with-mary-may-tioglz',
'chapters': [],
'thumbnail': 'https://image.pbs.org/video-assets/NcnTxNl-asset-mezzanine-16x9-K0Keoyv.jpg',
},
}, },
{ {
'url': 'http://www.pbs.org/wgbh/pages/frontline/losing-iraq/', 'url': 'http://www.pbs.org/wgbh/pages/frontline/losing-iraq/',
'md5': '6f722cb3c3982186d34b0f13374499c7', 'md5': '372b12b670070de39438b946474df92f',
'info_dict': { 'info_dict': {
'id': '2365297690', 'id': '2365297690',
'ext': 'mp4', 'ext': 'mp4',
'title': 'FRONTLINE - Losing Iraq', 'title': 'FRONTLINE - Losing Iraq',
'description': 'md5:5979a4d069b157f622d02bff62fbe654', 'description': 'md5:5979a4d069b157f622d02bff62fbe654',
'duration': 5050, 'duration': 5050,
'chapters': [
{'start_time': 0.0, 'end_time': 1234.0, 'title': 'After Saddam, Chaos'},
{'start_time': 1233.0, 'end_time': 1719.0, 'title': 'The Insurgency Takes Root'},
{'start_time': 1718.0, 'end_time': 2461.0, 'title': 'A Light Footprint'},
{'start_time': 2460.0, 'end_time': 3589.0, 'title': 'The Surge '},
{'start_time': 3588.0, 'end_time': 4355.0, 'title': 'The Withdrawal '},
{'start_time': 4354.0, 'end_time': 5051.0, 'title': 'ISIS on the March '},
],
'display_id': 'losing-iraq',
'thumbnail': 'https://image.pbs.org/video-assets/pbs/frontline/138098/images/mezzanine_401.jpg',
}, },
}, },
{ {
@ -403,6 +428,19 @@ class PBSIE(InfoExtractor):
}, },
'expected_warnings': ['HTTP Error 403: Forbidden'], 'expected_warnings': ['HTTP Error 403: Forbidden'],
}, },
{
'url': 'https://www.pbssocal.org/shows/newshour/clip/capehart-johnson-1715984001',
'info_dict': {
'id': '3091549094',
'ext': 'mp4',
'title': 'PBS NewsHour - Capehart and Johnson on the unusual Biden-Trump debate plans',
'description': 'Capehart and Johnson on how the Biden-Trump debates could shape the campaign season',
'display_id': 'capehart-johnson-1715984001',
'duration': 593,
'thumbnail': 'https://image.pbs.org/video-assets/mF3oSVn-asset-mezzanine-16x9-QeXjXPy.jpg',
'chapters': [],
},
},
{ {
'url': 'http://player.pbs.org/widget/partnerplayer/2365297708/?start=0&end=0&chapterbar=false&endscreen=false&topbar=true', 'url': 'http://player.pbs.org/widget/partnerplayer/2365297708/?start=0&end=0&chapterbar=false&endscreen=false&topbar=true',
'only_matching': True, 'only_matching': True,
@ -463,10 +501,12 @@ class PBSIE(InfoExtractor):
r"div\s*:\s*'videoembed'\s*,\s*mediaid\s*:\s*'(\d+)'", # frontline video embed r"div\s*:\s*'videoembed'\s*,\s*mediaid\s*:\s*'(\d+)'", # frontline video embed
r'class="coveplayerid">([^<]+)<', # coveplayer r'class="coveplayerid">([^<]+)<', # coveplayer
r'<section[^>]+data-coveid="(\d+)"', # coveplayer from http://www.pbs.org/wgbh/frontline/film/real-csi/ r'<section[^>]+data-coveid="(\d+)"', # coveplayer from http://www.pbs.org/wgbh/frontline/film/real-csi/
r'\bclass="passportcoveplayer"[^>]+\bdata-media="(\d+)', # https://www.thirteen.org/programs/the-woodwrights-shop/who-wrote-the-book-of-sloyd-fggvvq/
r'<input type="hidden" id="pbs_video_id_[0-9]+" value="([0-9]+)"/>', # jwplayer r'<input type="hidden" id="pbs_video_id_[0-9]+" value="([0-9]+)"/>', # jwplayer
r"(?s)window\.PBS\.playerConfig\s*=\s*{.*?id\s*:\s*'([0-9]+)',", r"(?s)window\.PBS\.playerConfig\s*=\s*{.*?id\s*:\s*'([0-9]+)',",
r'<div[^>]+\bdata-cove-id=["\'](\d+)"', # http://www.pbs.org/wgbh/roadshow/watch/episode/2105-indianapolis-hour-2/ r'<div[^>]+\bdata-cove-id=["\'](\d+)"', # http://www.pbs.org/wgbh/roadshow/watch/episode/2105-indianapolis-hour-2/
r'<iframe[^>]+\bsrc=["\'](?:https?:)?//video\.pbs\.org/widget/partnerplayer/(\d+)', # https://www.pbs.org/wgbh/masterpiece/episodes/victoria-s2-e1/ r'<iframe[^>]+\bsrc=["\'](?:https?:)?//video\.pbs\.org/widget/partnerplayer/(\d+)', # https://www.pbs.org/wgbh/masterpiece/episodes/victoria-s2-e1/
r'\bhttps?://player\.pbs\.org/[\w-]+player/(\d+)', # last pattern to avoid false positives
] ]
media_id = self._search_regex( media_id = self._search_regex(

View File

@ -0,0 +1,99 @@
from .common import InfoExtractor
from ..utils import parse_iso8601, smuggle_url, unsmuggle_url, url_or_none
from ..utils.traversal import traverse_obj
class PiramideTVIE(InfoExtractor):
_VALID_URL = r'https?://piramide\.tv/video/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://piramide.tv/video/wWtBAORdJUTh',
'info_dict': {
'id': 'wWtBAORdJUTh',
'ext': 'mp4',
'title': 'md5:79f9c8183ea6a35c836923142cf0abcc',
'description': '',
'thumbnail': 'https://cdn.jwplayer.com/v2/media/W86PgQDn/thumbnails/B9gpIxkH.jpg',
'channel': 'León Picarón',
'channel_id': 'leonpicaron',
'timestamp': 1696460362,
'upload_date': '20231004',
},
}, {
'url': 'https://piramide.tv/video/wcYn6li79NgN',
'info_dict': {
'id': 'wcYn6li79NgN',
'ext': 'mp4',
'title': 'ACEPTO TENER UN BEBE CON MI NOVIA\u2026? | Parte 1',
'description': '',
'channel': 'ARTA GAME',
'channel_id': 'arta_game',
'thumbnail': 'https://cdn.jwplayer.com/v2/media/cnEdGp5X/thumbnails/rHAaWfP7.jpg',
'timestamp': 1703434976,
'upload_date': '20231224',
},
}]
def _extract_video(self, video_id):
video_data = self._download_json(
f'https://hermes.piramide.tv/video/data/{video_id}', video_id, fatal=False)
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
f'https://cdn.piramide.tv/video/{video_id}/manifest.m3u8', video_id, fatal=False)
next_video = traverse_obj(video_data, ('video', 'next_video', 'id', {str}))
return next_video, {
'id': video_id,
'formats': formats,
'subtitles': subtitles,
**traverse_obj(video_data, ('video', {
'id': ('id', {str}),
'title': ('title', {str}),
'description': ('description', {str}),
'thumbnail': ('media', 'thumbnail', {url_or_none}),
'channel': ('channel', 'name', {str}),
'channel_id': ('channel', 'id', {str}),
'timestamp': ('date', {parse_iso8601}),
})),
}
def _entries(self, video_id):
visited = set()
while True:
visited.add(video_id)
next_video, info = self._extract_video(video_id)
yield info
if not next_video or next_video in visited:
break
video_id = next_video
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url)
if self._yes_playlist(video_id, video_id, smuggled_data):
return self.playlist_result(self._entries(video_id), video_id)
return self._extract_video(video_id)[1]
class PiramideTVChannelIE(InfoExtractor):
_VALID_URL = r'https?://piramide\.tv/channel/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://piramide.tv/channel/thekalo',
'playlist_mincount': 10,
'info_dict': {
'id': 'thekalo',
},
}]
def _entries(self, channel_name):
videos = self._download_json(
f'https://hermes.piramide.tv/channel/list/{channel_name}/date/100000', channel_name)
for video in traverse_obj(videos, ('videos', lambda _, v: v['id'])):
yield self.url_result(smuggle_url(
f'https://piramide.tv/video/{video["id"]}', {'force_noplaylist': True}),
**traverse_obj(video, {
'id': ('id', {str}),
'title': ('title', {str}),
'description': ('description', {str}),
}))
def _real_extract(self, url):
channel_name = self._match_id(url)
return self.playlist_result(self._entries(channel_name), channel_name)

View File

@ -26,7 +26,7 @@ class PlVideoIE(InfoExtractor):
'comment_count': int, 'comment_count': int,
'tags': ['rusia', 'cuba', 'russia', 'miguel díaz-canel'], 'tags': ['rusia', 'cuba', 'russia', 'miguel díaz-canel'],
'description': 'md5:a1a395d900d77a86542a91ee0826c115', 'description': 'md5:a1a395d900d77a86542a91ee0826c115',
'released_timestamp': 1715096124, 'release_timestamp': 1715096124,
'channel_is_verified': True, 'channel_is_verified': True,
'like_count': int, 'like_count': int,
'timestamp': 1715095911, 'timestamp': 1715095911,
@ -62,7 +62,7 @@ class PlVideoIE(InfoExtractor):
'title': 'Белоусов отменил приказы о кадровом резерве на гражданской службе', 'title': 'Белоусов отменил приказы о кадровом резерве на гражданской службе',
'channel_follower_count': int, 'channel_follower_count': int,
'view_count': int, 'view_count': int,
'released_timestamp': 1732961458, 'release_timestamp': 1732961458,
}, },
}] }]
@ -119,7 +119,7 @@ class PlVideoIE(InfoExtractor):
'channel_is_verified': ('channel', 'verified', {bool}), 'channel_is_verified': ('channel', 'verified', {bool}),
'tags': ('tags', ..., {str}), 'tags': ('tags', ..., {str}),
'timestamp': ('createdAt', {parse_iso8601}), 'timestamp': ('createdAt', {parse_iso8601}),
'released_timestamp': ('publishedAt', {parse_iso8601}), 'release_timestamp': ('publishedAt', {parse_iso8601}),
'modified_timestamp': ('updatedAt', {parse_iso8601}), 'modified_timestamp': ('updatedAt', {parse_iso8601}),
'view_count': ('stats', 'viewTotalCount', {int_or_none}), 'view_count': ('stats', 'viewTotalCount', {int_or_none}),
'like_count': ('stats', 'likeCount', {int_or_none}), 'like_count': ('stats', 'likeCount', {int_or_none}),

View File

@ -114,7 +114,7 @@ class RedGifsBaseInfoExtractor(InfoExtractor):
class RedGifsIE(RedGifsBaseInfoExtractor): class RedGifsIE(RedGifsBaseInfoExtractor):
_VALID_URL = r'https?://(?:(?:www\.)?redgifs\.com/watch/|thumbs2\.redgifs\.com/)(?P<id>[^-/?#\.]+)' _VALID_URL = r'https?://(?:(?:www\.)?redgifs\.com/(?:watch|ifr)/|thumbs2\.redgifs\.com/)(?P<id>[^-/?#\.]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.redgifs.com/watch/squeakyhelplesswisent', 'url': 'https://www.redgifs.com/watch/squeakyhelplesswisent',
'info_dict': { 'info_dict': {
@ -147,6 +147,22 @@ class RedGifsIE(RedGifsBaseInfoExtractor):
'age_limit': 18, 'age_limit': 18,
'tags': list, 'tags': list,
}, },
}, {
'url': 'https://www.redgifs.com/ifr/squeakyhelplesswisent',
'info_dict': {
'id': 'squeakyhelplesswisent',
'ext': 'mp4',
'title': 'Hotwife Legs Thick',
'timestamp': 1636287915,
'upload_date': '20211107',
'uploader': 'ignored52',
'duration': 16,
'view_count': int,
'like_count': int,
'categories': list,
'age_limit': 18,
'tags': list,
},
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -176,6 +176,8 @@ class RTVSLOShowIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': '173250997', 'id': '173250997',
'title': 'Ekipa Bled', 'title': 'Ekipa Bled',
'description': 'md5:c88471e27a1268c448747a5325319ab7',
'thumbnail': 'https://img.rtvcdn.si/_up/ava/ava_misc/show_logos/173250997/logo_wide1.jpg',
}, },
'playlist_count': 18, 'playlist_count': 18,
}] }]
@ -187,4 +189,7 @@ class RTVSLOShowIE(InfoExtractor):
return self.playlist_from_matches( return self.playlist_from_matches(
re.findall(r'<a [^>]*\bhref="(/arhiv/[^"]+)"', webpage), re.findall(r'<a [^>]*\bhref="(/arhiv/[^"]+)"', webpage),
playlist_id, self._html_extract_title(webpage), playlist_id, self._html_extract_title(webpage),
getter=urljoin('https://365.rtvslo.si'), ie=RTVSLOIE) getter=urljoin('https://365.rtvslo.si'), ie=RTVSLOIE,
description=self._og_search_description(webpage),
thumbnail=self._og_search_thumbnail(webpage),
)

View File

@ -4,43 +4,12 @@ import urllib.parse
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import ( from ..utils import (
ExtractorError, ExtractorError,
parse_qs, UnsupportedError,
unsmuggle_url, make_archive_id,
remove_end,
url_or_none,
) )
from ..utils.traversal import traverse_obj
_COMMITTEES = {
'ag': ('76440', 'http://ag-f.akamaihd.net'),
'aging': ('76442', 'http://aging-f.akamaihd.net'),
'approps': ('76441', 'http://approps-f.akamaihd.net'),
'arch': ('', 'http://ussenate-f.akamaihd.net'),
'armed': ('76445', 'http://armed-f.akamaihd.net'),
'banking': ('76446', 'http://banking-f.akamaihd.net'),
'budget': ('76447', 'http://budget-f.akamaihd.net'),
'cecc': ('76486', 'http://srs-f.akamaihd.net'),
'commerce': ('80177', 'http://commerce1-f.akamaihd.net'),
'csce': ('75229', 'http://srs-f.akamaihd.net'),
'dpc': ('76590', 'http://dpc-f.akamaihd.net'),
'energy': ('76448', 'http://energy-f.akamaihd.net'),
'epw': ('76478', 'http://epw-f.akamaihd.net'),
'ethics': ('76449', 'http://ethics-f.akamaihd.net'),
'finance': ('76450', 'http://finance-f.akamaihd.net'),
'foreign': ('76451', 'http://foreign-f.akamaihd.net'),
'govtaff': ('76453', 'http://govtaff-f.akamaihd.net'),
'help': ('76452', 'http://help-f.akamaihd.net'),
'indian': ('76455', 'http://indian-f.akamaihd.net'),
'intel': ('76456', 'http://intel-f.akamaihd.net'),
'intlnarc': ('76457', 'http://intlnarc-f.akamaihd.net'),
'jccic': ('85180', 'http://jccic-f.akamaihd.net'),
'jec': ('76458', 'http://jec-f.akamaihd.net'),
'judiciary': ('76459', 'http://judiciary-f.akamaihd.net'),
'rpc': ('76591', 'http://rpc-f.akamaihd.net'),
'rules': ('76460', 'http://rules-f.akamaihd.net'),
'saa': ('76489', 'http://srs-f.akamaihd.net'),
'smbiz': ('76461', 'http://smbiz-f.akamaihd.net'),
'srs': ('75229', 'http://srs-f.akamaihd.net'),
'uscc': ('76487', 'http://srs-f.akamaihd.net'),
'vetaff': ('76462', 'http://vetaff-f.akamaihd.net'),
}
class SenateISVPIE(InfoExtractor): class SenateISVPIE(InfoExtractor):
@ -53,31 +22,46 @@ class SenateISVPIE(InfoExtractor):
'info_dict': { 'info_dict': {
'id': 'judiciary031715', 'id': 'judiciary031715',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Integrated Senate Video Player', 'title': 'ISVP',
'thumbnail': r're:^https?://.*\.(?:jpg|png)$', 'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
'_old_archive_ids': ['senategov judiciary031715'],
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
'expected_warnings': ['Failed to download m3u8 information'],
}, { }, {
'url': 'http://www.senate.gov/isvp/?type=live&comm=commerce&filename=commerce011514.mp4&auto_play=false', 'url': 'http://www.senate.gov/isvp/?type=live&comm=commerce&filename=commerce011514.mp4&auto_play=false',
'info_dict': { 'info_dict': {
'id': 'commerce011514', 'id': 'commerce011514',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Integrated Senate Video Player', 'title': 'Integrated Senate Video Player',
'_old_archive_ids': ['senategov commerce011514'],
}, },
'params': { 'params': {
# m3u8 download # m3u8 download
'skip_download': True, 'skip_download': True,
}, },
'skip': 'This video is not available.',
}, { }, {
'url': 'http://www.senate.gov/isvp/?type=arch&comm=intel&filename=intel090613&hc_location=ufi', 'url': 'http://www.senate.gov/isvp/?type=arch&comm=intel&filename=intel090613&hc_location=ufi',
# checksum differs each time # checksum differs each time
'info_dict': { 'info_dict': {
'id': 'intel090613', 'id': 'intel090613',
'ext': 'mp4', 'ext': 'mp4',
'title': 'Integrated Senate Video Player', 'title': 'ISVP',
'_old_archive_ids': ['senategov intel090613'],
},
'expected_warnings': ['Failed to download m3u8 information'],
}, {
'url': 'https://www.senate.gov/isvp/?auto_play=false&comm=help&filename=help090920&poster=https://www.help.senate.gov/assets/images/video-poster.png&stt=950',
'info_dict': {
'id': 'help090920',
'ext': 'mp4',
'title': 'ISVP',
'thumbnail': 'https://www.help.senate.gov/assets/images/video-poster.png',
'_old_archive_ids': ['senategov help090920'],
}, },
}, { }, {
# From http://www.c-span.org/video/?96791-1 # From http://www.c-span.org/video/?96791-1
@ -85,60 +69,81 @@ class SenateISVPIE(InfoExtractor):
'only_matching': True, 'only_matching': True,
}] }]
_COMMITTEES = {
'ag': ('76440', 'https://ag-f.akamaihd.net', '2036803', 'agriculture'),
'aging': ('76442', 'https://aging-f.akamaihd.net', '2036801', 'aging'),
'approps': ('76441', 'https://approps-f.akamaihd.net', '2036802', 'appropriations'),
'arch': ('', 'https://ussenate-f.akamaihd.net', '', 'arch'),
'armed': ('76445', 'https://armed-f.akamaihd.net', '2036800', 'armedservices'),
'banking': ('76446', 'https://banking-f.akamaihd.net', '2036799', 'banking'),
'budget': ('76447', 'https://budget-f.akamaihd.net', '2036798', 'budget'),
'cecc': ('76486', 'https://srs-f.akamaihd.net', '2036782', 'srs_cecc'),
'commerce': ('80177', 'https://commerce1-f.akamaihd.net', '2036779', 'commerce'),
'csce': ('75229', 'https://srs-f.akamaihd.net', '2036777', 'srs_srs'),
'dpc': ('76590', 'https://dpc-f.akamaihd.net', '', 'dpc'),
'energy': ('76448', 'https://energy-f.akamaihd.net', '2036797', 'energy'),
'epw': ('76478', 'https://epw-f.akamaihd.net', '2036783', 'environment'),
'ethics': ('76449', 'https://ethics-f.akamaihd.net', '2036796', 'ethics'),
'finance': ('76450', 'https://finance-f.akamaihd.net', '2036795', 'finance_finance'),
'foreign': ('76451', 'https://foreign-f.akamaihd.net', '2036794', 'foreignrelations'),
'govtaff': ('76453', 'https://govtaff-f.akamaihd.net', '2036792', 'hsgac'),
'help': ('76452', 'https://help-f.akamaihd.net', '2036793', 'help'),
'indian': ('76455', 'https://indian-f.akamaihd.net', '2036791', 'indianaffairs'),
'intel': ('76456', 'https://intel-f.akamaihd.net', '2036790', 'intelligence'),
'intlnarc': ('76457', 'https://intlnarc-f.akamaihd.net', '', 'internationalnarcoticscaucus'),
'jccic': ('85180', 'https://jccic-f.akamaihd.net', '2036778', 'jccic'),
'jec': ('76458', 'https://jec-f.akamaihd.net', '2036789', 'jointeconomic'),
'judiciary': ('76459', 'https://judiciary-f.akamaihd.net', '2036788', 'judiciary'),
'rpc': ('76591', 'https://rpc-f.akamaihd.net', '', 'rpc'),
'rules': ('76460', 'https://rules-f.akamaihd.net', '2036787', 'rules'),
'saa': ('76489', 'https://srs-f.akamaihd.net', '2036780', 'srs_saa'),
'smbiz': ('76461', 'https://smbiz-f.akamaihd.net', '2036786', 'smallbusiness'),
'srs': ('75229', 'https://srs-f.akamaihd.net', '2031966', 'srs_srs'),
'uscc': ('76487', 'https://srs-f.akamaihd.net', '2036781', 'srs_uscc'),
'vetaff': ('76462', 'https://vetaff-f.akamaihd.net', '2036785', 'veteransaffairs'),
}
def _real_extract(self, url): def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
qs = urllib.parse.parse_qs(self._match_valid_url(url).group('qs')) qs = urllib.parse.parse_qs(self._match_valid_url(url).group('qs'))
if not qs.get('filename') or not qs.get('type') or not qs.get('comm'): if not qs.get('filename') or not qs.get('comm'):
raise ExtractorError('Invalid URL', expected=True) raise ExtractorError('Invalid URL', expected=True)
filename = qs['filename'][0]
video_id = re.sub(r'.mp4$', '', qs['filename'][0]) video_id = remove_end(filename, '.mp4')
webpage = self._download_webpage(url, video_id) webpage = self._download_webpage(url, video_id)
committee = qs['comm'][0]
if smuggled_data.get('force_title'): stream_num, stream_domain, stream_id, msl3 = self._COMMITTEES[committee]
title = smuggled_data['force_title']
else:
title = self._html_extract_title(webpage)
poster = qs.get('poster')
thumbnail = poster[0] if poster else None
video_type = qs['type'][0]
committee = video_type if video_type == 'arch' else qs['comm'][0]
stream_num, domain = _COMMITTEES[committee]
urls_alternatives = [f'https://www-senate-gov-media-srs.akamaized.net/hls/live/{stream_id}/{committee}/{filename}/master.m3u8',
f'https://www-senate-gov-msl3archive.akamaized.net/{msl3}/{filename}_1/master.m3u8',
f'{stream_domain}/i/{filename}_1@{stream_num}/master.m3u8',
f'{stream_domain}/i/{filename}.mp4/master.m3u8']
formats = [] formats = []
if video_type == 'arch': subtitles = {}
filename = video_id if '.' in video_id else video_id + '.mp4' for video_url in urls_alternatives:
m3u8_url = urllib.parse.urljoin(domain, 'i/' + filename + '/master.m3u8') formats, subtitles = self._extract_m3u8_formats_and_subtitles(video_url, video_id, ext='mp4', fatal=False)
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', m3u8_id='m3u8') if formats:
else: break
hdcore_sign = 'hdcore=3.1.0'
url_params = (domain, video_id, stream_num)
f4m_url = f'%s/z/%s_1@%s/manifest.f4m?{hdcore_sign}' % url_params
m3u8_url = '{}/i/{}_1@{}/master.m3u8'.format(*url_params)
for entry in self._extract_f4m_formats(f4m_url, video_id, f4m_id='f4m'):
# URLs without the extra param induce an 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign})
formats.append(entry)
for entry in self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4', m3u8_id='m3u8'):
mobj = re.search(r'(?P<tag>(?:-p|-b)).m3u8', entry['url'])
if mobj:
entry['format_id'] += mobj.group('tag')
formats.append(entry)
return { return {
'id': video_id, 'id': video_id,
'title': title, 'title': self._html_extract_title(webpage),
'formats': formats, 'formats': formats,
'thumbnail': thumbnail, 'subtitles': subtitles,
'thumbnail': traverse_obj(qs, ('poster', 0, {url_or_none})),
'_old_archive_ids': [make_archive_id(SenateGovIE, video_id)],
} }
class SenateGovIE(InfoExtractor): class SenateGovIE(InfoExtractor):
_IE_NAME = 'senate.gov' _IE_NAME = 'senate.gov'
_VALID_URL = r'https?:\/\/(?:www\.)?(help|appropriations|judiciary|banking|armed-services|finance)\.senate\.gov' _SUBDOMAIN_RE = '|'.join(map(re.escape, (
'agriculture', 'aging', 'appropriations', 'armed-services', 'banking',
'budget', 'commerce', 'energy', 'epw', 'finance', 'foreign', 'help',
'intelligence', 'inaugural', 'judiciary', 'rules', 'sbc', 'veterans',
)))
_VALID_URL = rf'https?://(?:www\.)?(?:{_SUBDOMAIN_RE})\.senate\.gov'
_TESTS = [{ _TESTS = [{
'url': 'https://www.help.senate.gov/hearings/vaccines-saving-lives-ensuring-confidence-and-protecting-public-health', 'url': 'https://www.help.senate.gov/hearings/vaccines-saving-lives-ensuring-confidence-and-protecting-public-health',
'info_dict': { 'info_dict': {
@ -147,6 +152,9 @@ class SenateGovIE(InfoExtractor):
'title': 'Vaccines: Saving Lives, Ensuring Confidence, and Protecting Public Health', 'title': 'Vaccines: Saving Lives, Ensuring Confidence, and Protecting Public Health',
'description': 'The U.S. Senate Committee on Health, Education, Labor & Pensions', 'description': 'The U.S. Senate Committee on Health, Education, Labor & Pensions',
'ext': 'mp4', 'ext': 'mp4',
'age_limit': 0,
'thumbnail': 'https://www.help.senate.gov/assets/images/sharelogo.jpg',
'_old_archive_ids': ['senategov help090920'],
}, },
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
}, { }, {
@ -156,8 +164,12 @@ class SenateGovIE(InfoExtractor):
'display_id': 'watch?hearingid=B8A25434-5056-A066-6020-1F68CB75F0CD', 'display_id': 'watch?hearingid=B8A25434-5056-A066-6020-1F68CB75F0CD',
'title': 'Review of the FY2019 Budget Request for the U.S. Army', 'title': 'Review of the FY2019 Budget Request for the U.S. Army',
'ext': 'mp4', 'ext': 'mp4',
'age_limit': 0,
'thumbnail': 'https://www.appropriations.senate.gov/themes/appropriations/images/video-poster-flash-fit.png',
'_old_archive_ids': ['senategov appropsA051518'],
}, },
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
'expected_warnings': ['Failed to download m3u8 information'],
}, { }, {
'url': 'https://www.banking.senate.gov/hearings/21st-century-communities-public-transportation-infrastructure-investment-and-fast-act-reauthorization', 'url': 'https://www.banking.senate.gov/hearings/21st-century-communities-public-transportation-infrastructure-investment-and-fast-act-reauthorization',
'info_dict': { 'info_dict': {
@ -166,32 +178,65 @@ class SenateGovIE(InfoExtractor):
'title': '21st Century Communities: Public Transportation Infrastructure Investment and FAST Act Reauthorization', 'title': '21st Century Communities: Public Transportation Infrastructure Investment and FAST Act Reauthorization',
'description': 'The Official website of The United States Committee on Banking, Housing, and Urban Affairs', 'description': 'The Official website of The United States Committee on Banking, Housing, and Urban Affairs',
'ext': 'mp4', 'ext': 'mp4',
'thumbnail': 'https://www.banking.senate.gov/themes/banking/images/sharelogo.jpg',
'age_limit': 0,
'_old_archive_ids': ['senategov banking041521'],
}, },
'params': {'skip_download': 'm3u8'}, 'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.agriculture.senate.gov/hearings/hemp-production-and-the-2018-farm-bill',
'only_matching': True,
}, {
'url': 'https://www.aging.senate.gov/hearings/the-older-americans-act-the-local-impact-of-the-law-and-the-upcoming-reauthorization',
'only_matching': True,
}, {
'url': 'https://www.budget.senate.gov/hearings/improving-care-lowering-costs-achieving-health-care-efficiency',
'only_matching': True,
}, {
'url': 'https://www.commerce.senate.gov/2024/12/communications-networks-safety-and-security',
'only_matching': True,
}, {
'url': 'https://www.energy.senate.gov/hearings/2024/2/full-committee-hearing-to-examine',
'only_matching': True,
}, {
'url': 'https://www.epw.senate.gov/public/index.cfm/hearings?ID=F63083EA-2C13-498C-B548-341BED68C209',
'only_matching': True,
}, {
'url': 'https://www.foreign.senate.gov/hearings/american-diplomacy-and-global-leadership-review-of-the-fy25-state-department-budget-request',
'only_matching': True,
}, {
'url': 'https://www.intelligence.senate.gov/hearings/foreign-threats-elections-2024-%E2%80%93-roles-and-responsibilities-us-tech-providers',
'only_matching': True,
}, {
'url': 'https://www.inaugural.senate.gov/52nd-inaugural-ceremonies/',
'only_matching': True,
}, {
'url': 'https://www.rules.senate.gov/hearings/02/07/2023/business-meeting',
'only_matching': True,
}, {
'url': 'https://www.sbc.senate.gov/public/index.cfm/hearings?ID=5B13AA6B-8279-45AF-B54B-94156DC7A2AB',
'only_matching': True,
}, {
'url': 'https://www.veterans.senate.gov/2024/5/frontier-health-care-ensuring-veterans-access-no-matter-where-they-live',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
display_id = self._generic_id(url) display_id = self._generic_id(url)
webpage = self._download_webpage(url, display_id) webpage = self._download_webpage(url, display_id)
parse_info = parse_qs(self._search_regex( url_info = next(SenateISVPIE.extract_from_webpage(self._downloader, url, webpage), None)
r'<iframe class="[^>"]*streaminghearing[^>"]*"\s[^>]*\bsrc="([^">]*)', webpage, 'hearing URL')) if not url_info:
raise UnsupportedError(url)
stream_num, stream_domain = _COMMITTEES[parse_info['comm'][-1]]
filename = parse_info['filename'][-1]
formats = self._extract_m3u8_formats(
f'{stream_domain}/i/{filename}_1@{stream_num}/master.m3u8',
display_id, ext='mp4')
title = self._html_search_regex( title = self._html_search_regex(
(*self._og_regexes('title'), r'(?s)<title>([^<]*?)</title>'), webpage, 'video title') (*self._og_regexes('title'), r'(?s)<title>([^<]*?)</title>'), webpage, 'video title', fatal=False)
return { return {
'id': re.sub(r'.mp4$', '', filename), **url_info,
'_type': 'url_transparent',
'display_id': display_id, 'display_id': display_id,
'title': re.sub(r'\s+', ' ', title.split('|')[0]).strip(), 'title': re.sub(r'\s+', ' ', title.split('|')[0]).strip(),
'description': self._og_search_description(webpage, default=None), 'description': self._og_search_description(webpage, default=None),
'thumbnail': self._og_search_thumbnail(webpage, default=None), 'thumbnail': self._og_search_thumbnail(webpage, default=None),
'age_limit': self._rta_search(webpage), 'age_limit': self._rta_search(webpage),
'formats': formats,
} }

View File

@ -361,6 +361,7 @@ class SoundcloudBaseIE(InfoExtractor):
'uploader_url': user.get('permalink_url'), 'uploader_url': user.get('permalink_url'),
'timestamp': unified_timestamp(info.get('created_at')), 'timestamp': unified_timestamp(info.get('created_at')),
'title': info.get('title'), 'title': info.get('title'),
'track': info.get('title'),
'description': info.get('description'), 'description': info.get('description'),
'thumbnails': thumbnails, 'thumbnails': thumbnails,
'duration': float_or_none(info.get('duration'), 1000), 'duration': float_or_none(info.get('duration'), 1000),
@ -393,7 +394,7 @@ class SoundcloudIE(SoundcloudBaseIE):
(?:(?:(?:www\.|m\.)?soundcloud\.com/ (?:(?:(?:www\.|m\.)?soundcloud\.com/
(?!stations/track) (?!stations/track)
(?P<uploader>[\w\d-]+)/ (?P<uploader>[\w\d-]+)/
(?!(?:tracks|albums|sets(?:/.+?)?|reposts|likes|spotlight)/?(?:$|[?#])) (?!(?:tracks|albums|sets(?:/.+?)?|reposts|likes|spotlight|comments)/?(?:$|[?#]))
(?P<title>[\w\d-]+) (?P<title>[\w\d-]+)
(?:/(?P<token>(?!(?:albums|sets|recommended))[^?]+?))? (?:/(?P<token>(?!(?:albums|sets|recommended))[^?]+?))?
(?:[?].*)?$) (?:[?].*)?$)
@ -410,6 +411,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '62986583', 'id': '62986583',
'ext': 'opus', 'ext': 'opus',
'title': 'Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1', 'title': 'Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1',
'track': 'Lostin Powers - She so Heavy (SneakPreview) Adrian Ackers Blueprint 1',
'description': 'No Downloads untill we record the finished version this weekend, i was too pumped n i had to post it , earl is prolly gonna b hella p.o\'d', 'description': 'No Downloads untill we record the finished version this weekend, i was too pumped n i had to post it , earl is prolly gonna b hella p.o\'d',
'uploader': 'E.T. ExTerrestrial Music', 'uploader': 'E.T. ExTerrestrial Music',
'uploader_id': '1571244', 'uploader_id': '1571244',
@ -432,6 +434,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '47127627', 'id': '47127627',
'ext': 'opus', 'ext': 'opus',
'title': 'Goldrushed', 'title': 'Goldrushed',
'track': 'Goldrushed',
'description': 'From Stockholm Sweden\r\nPovel / Magnus / Filip / David\r\nwww.theroyalconcept.com', 'description': 'From Stockholm Sweden\r\nPovel / Magnus / Filip / David\r\nwww.theroyalconcept.com',
'uploader': 'The Royal Concept', 'uploader': 'The Royal Concept',
'uploader_id': '9615865', 'uploader_id': '9615865',
@ -457,6 +460,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '123998367', 'id': '123998367',
'ext': 'mp3', 'ext': 'mp3',
'title': 'Youtube - Dl Test Video \'\' Ä↭', 'title': 'Youtube - Dl Test Video \'\' Ä↭',
'track': 'Youtube - Dl Test Video \'\' Ä↭',
'description': 'test chars: "\'/\\ä↭', 'description': 'test chars: "\'/\\ä↭',
'uploader': 'jaimeMF', 'uploader': 'jaimeMF',
'uploader_id': '69767071', 'uploader_id': '69767071',
@ -481,6 +485,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '123998367', 'id': '123998367',
'ext': 'mp3', 'ext': 'mp3',
'title': 'Youtube - Dl Test Video \'\' Ä↭', 'title': 'Youtube - Dl Test Video \'\' Ä↭',
'track': 'Youtube - Dl Test Video \'\' Ä↭',
'description': 'test chars: "\'/\\ä↭', 'description': 'test chars: "\'/\\ä↭',
'uploader': 'jaimeMF', 'uploader': 'jaimeMF',
'uploader_id': '69767071', 'uploader_id': '69767071',
@ -505,6 +510,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '343609555', 'id': '343609555',
'ext': 'wav', 'ext': 'wav',
'title': 'The Following', 'title': 'The Following',
'track': 'The Following',
'description': '', 'description': '',
'uploader': '80M', 'uploader': '80M',
'uploader_id': '312384765', 'uploader_id': '312384765',
@ -530,6 +536,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '340344461', 'id': '340344461',
'ext': 'wav', 'ext': 'wav',
'title': 'Uplifting Only 238 [No Talking] (incl. Alex Feed Guestmix) (Aug 31, 2017) [wav]', 'title': 'Uplifting Only 238 [No Talking] (incl. Alex Feed Guestmix) (Aug 31, 2017) [wav]',
'track': 'Uplifting Only 238 [No Talking] (incl. Alex Feed Guestmix) (Aug 31, 2017) [wav]',
'description': 'md5:fa20ee0fca76a3d6df8c7e57f3715366', 'description': 'md5:fa20ee0fca76a3d6df8c7e57f3715366',
'uploader': 'Ori Uplift Music', 'uploader': 'Ori Uplift Music',
'uploader_id': '12563093', 'uploader_id': '12563093',
@ -555,6 +562,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '309699954', 'id': '309699954',
'ext': 'mp3', 'ext': 'mp3',
'title': 'Sideways (Prod. Mad Real)', 'title': 'Sideways (Prod. Mad Real)',
'track': 'Sideways (Prod. Mad Real)',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e', 'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'uploader': 'garyvee', 'uploader': 'garyvee',
'uploader_id': '2366352', 'uploader_id': '2366352',
@ -581,6 +589,7 @@ class SoundcloudIE(SoundcloudBaseIE):
'id': '583011102', 'id': '583011102',
'ext': 'opus', 'ext': 'opus',
'title': 'Mezzo Valzer', 'title': 'Mezzo Valzer',
'track': 'Mezzo Valzer',
'description': 'md5:f4d5f39d52e0ccc2b4f665326428901a', 'description': 'md5:f4d5f39d52e0ccc2b4f665326428901a',
'uploader': 'Giovanni Sarani', 'uploader': 'Giovanni Sarani',
'uploader_id': '3352531', 'uploader_id': '3352531',
@ -656,6 +665,11 @@ class SoundcloudPlaylistBaseIE(SoundcloudBaseIE):
'playlistId': playlist_id, 'playlistId': playlist_id,
'playlistSecretToken': token, 'playlistSecretToken': token,
}, headers=self._HEADERS) }, headers=self._HEADERS)
album_info = traverse_obj(playlist, {
'album': ('title', {str}),
'album_artist': ('user', 'username', {str}),
'album_type': ('set_type', {str}, {lambda x: x or 'playlist'}),
})
entries = [] entries = []
for track in tracks: for track in tracks:
track_id = str_or_none(track.get('id')) track_id = str_or_none(track.get('id'))
@ -667,11 +681,17 @@ class SoundcloudPlaylistBaseIE(SoundcloudBaseIE):
if token: if token:
url += '?secret_token=' + token url += '?secret_token=' + token
entries.append(self.url_result( entries.append(self.url_result(
url, SoundcloudIE.ie_key(), track_id)) url, SoundcloudIE.ie_key(), track_id, url_transparent=True, **album_info))
return self.playlist_result( return self.playlist_result(
entries, playlist_id, entries, playlist_id,
playlist.get('title'), playlist.get('title'),
playlist.get('description')) playlist.get('description'),
**album_info,
**traverse_obj(playlist, {
'uploader': ('user', 'username', {str}),
'uploader_id': ('user', 'id', {str_or_none}),
}),
)
class SoundcloudSetIE(SoundcloudPlaylistBaseIE): class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
@ -683,6 +703,11 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
'id': '2284613', 'id': '2284613',
'title': 'The Royal Concept EP', 'title': 'The Royal Concept EP',
'description': 'md5:71d07087c7a449e8941a70a29e34671e', 'description': 'md5:71d07087c7a449e8941a70a29e34671e',
'uploader': 'The Royal Concept',
'uploader_id': '9615865',
'album': 'The Royal Concept EP',
'album_artists': ['The Royal Concept'],
'album_type': 'ep',
}, },
'playlist_mincount': 5, 'playlist_mincount': 5,
}, { }, {
@ -776,7 +801,7 @@ class SoundcloudUserIE(SoundcloudPagedPlaylistBaseIE):
(?:(?:www|m)\.)?soundcloud\.com/ (?:(?:www|m)\.)?soundcloud\.com/
(?P<user>[^/]+) (?P<user>[^/]+)
(?:/ (?:/
(?P<rsrc>tracks|albums|sets|reposts|likes|spotlight) (?P<rsrc>tracks|albums|sets|reposts|likes|spotlight|comments)
)? )?
/?(?:[?#].*)?$ /?(?:[?#].*)?$
''' '''
@ -830,6 +855,13 @@ class SoundcloudUserIE(SoundcloudPagedPlaylistBaseIE):
'title': 'Grynpyret (Spotlight)', 'title': 'Grynpyret (Spotlight)',
}, },
'playlist_mincount': 1, 'playlist_mincount': 1,
}, {
'url': 'https://soundcloud.com/one-thousand-and-one/comments',
'info_dict': {
'id': '992430331',
'title': '7x11x13-testing (Comments)',
},
'playlist_mincount': 1,
}] }]
_BASE_URL_MAP = { _BASE_URL_MAP = {
@ -840,6 +872,7 @@ class SoundcloudUserIE(SoundcloudPagedPlaylistBaseIE):
'reposts': 'stream/users/%s/reposts', 'reposts': 'stream/users/%s/reposts',
'likes': 'users/%s/likes', 'likes': 'users/%s/likes',
'spotlight': 'users/%s/spotlight', 'spotlight': 'users/%s/spotlight',
'comments': 'users/%s/comments',
} }
def _real_extract(self, url): def _real_extract(self, url):
@ -960,6 +993,11 @@ class SoundcloudPlaylistIE(SoundcloudPlaylistBaseIE):
'id': '4110309', 'id': '4110309',
'title': 'TILT Brass - Bowery Poetry Club, August \'03 [Non-Site SCR 02]', 'title': 'TILT Brass - Bowery Poetry Club, August \'03 [Non-Site SCR 02]',
'description': 're:.*?TILT Brass - Bowery Poetry Club', 'description': 're:.*?TILT Brass - Bowery Poetry Club',
'uploader': 'Non-Site Records',
'uploader_id': '33660914',
'album_artists': ['Non-Site Records'],
'album_type': 'playlist',
'album': 'TILT Brass - Bowery Poetry Club, August \'03 [Non-Site SCR 02]',
}, },
'playlist_count': 6, 'playlist_count': 6,
}] }]

View File

@ -207,7 +207,7 @@ class TheaterComplexTownVODIE(TheaterComplexTownBaseIE):
class TheaterComplexTownPPVIE(TheaterComplexTownBaseIE): class TheaterComplexTownPPVIE(TheaterComplexTownBaseIE):
_VALID_URL = r'https?://(?:www\.)?theater-complex\.town/(?:(?:en|ja)/)?ppv/(?P<id>\w+)' _VALID_URL = r'https?://(?:www\.)?theater-complex\.town/(?:(?:en|ja)/)?(?:ppv|live)/(?P<id>\w+)'
IE_NAME = 'theatercomplextown:ppv' IE_NAME = 'theatercomplextown:ppv'
_TESTS = [{ _TESTS = [{
'url': 'https://www.theater-complex.town/ppv/wytW3X7khrjJBUpKuV3jen', 'url': 'https://www.theater-complex.town/ppv/wytW3X7khrjJBUpKuV3jen',
@ -229,6 +229,9 @@ class TheaterComplexTownPPVIE(TheaterComplexTownBaseIE):
}, { }, {
'url': 'https://www.theater-complex.town/ja/ppv/qwUVmLmGEiZ3ZW6it9uGys', 'url': 'https://www.theater-complex.town/ja/ppv/qwUVmLmGEiZ3ZW6it9uGys',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.theater-complex.town/en/live/79akNM7bJeD5Fi9EP39aDp',
'only_matching': True,
}] }]
_API_PATH = 'events' _API_PATH = 'events'

View File

@ -0,0 +1,199 @@
import functools
import math
from .common import InfoExtractor
from ..utils import (
InAdvancePagedList,
int_or_none,
parse_iso8601,
try_call,
url_or_none,
)
from ..utils.traversal import traverse_obj
class SubsplashBaseIE(InfoExtractor):
def _get_headers(self, url, display_id):
token = try_call(lambda: self._get_cookies(url)['ss-token-guest'].value)
if not token:
webpage, urlh = self._download_webpage_handle(url, display_id)
token = (
try_call(lambda: self._get_cookies(url)['ss-token-guest'].value)
or urlh.get_header('x-api-token')
or self._search_json(
r'<script[^>]+\bid="shoebox-tokens"[^>]*>', webpage, 'shoebox tokens',
display_id, default={}).get('apiToken')
or self._search_regex(r'\\"tokens\\":{\\"guest\\":\\"([A-Za-z0-9._-]+)\\"', webpage, 'token', default=None))
if not token:
self.report_warning('Unable to extract auth token')
return None
return {'Authorization': f'Bearer {token}'}
def _extract_video(self, data, video_id):
formats = []
video_data = traverse_obj(data, ('_embedded', 'video', '_embedded', {dict}))
m3u8_url = traverse_obj(video_data, ('playlists', 0, '_links', 'related', 'href', {url_or_none}))
if m3u8_url:
formats.extend(self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
mp4_entry = traverse_obj(video_data, ('video-outputs', lambda _, v: url_or_none(v['_links']['related']['href']), any))
if mp4_entry:
formats.append({
'url': mp4_entry['_links']['related']['href'],
'format_id': 'direct',
'quality': 1,
**traverse_obj(mp4_entry, {
'height': ('height', {int_or_none}),
'width': ('width', {int_or_none}),
'filesize': ('file_size', {int_or_none}),
}),
})
return {
'id': video_id,
'formats': formats,
**traverse_obj(data, {
'title': ('title', {str}),
'description': ('summary_text', {str}),
'thumbnail': ('_embedded', 'images', 0, '_links', 'related', 'href', {url_or_none}),
'duration': ('_embedded', 'video', 'duration', {int_or_none(scale=1000)}),
'timestamp': ('date', {parse_iso8601}),
'release_timestamp': ('published_at', {parse_iso8601}),
'modified_timestamp': ('updated_at', {parse_iso8601}),
}),
}
class SubsplashIE(SubsplashBaseIE):
_VALID_URL = [
r'https?://(?:www\.)?subsplash\.com/(?:u/)?[^/?#]+/[^/?#]+/(?:d/|mi/\+)(?P<id>\w+)',
r'https?://(?:\w+\.)?subspla\.sh/(?P<id>\w+)',
]
_TESTS = [{
'url': 'https://subsplash.com/u/skywatchtv/media/d/5whnx5s-the-grand-delusion-taking-place-right-now',
'md5': 'd468729814e533cec86f1da505dec82d',
'info_dict': {
'id': '5whnx5s',
'ext': 'mp4',
'title': 'THE GRAND DELUSION TAKING PLACE RIGHT NOW!',
'description': 'md5:220a630865c3697b0ec9dcb3a70cbc33',
'upload_date': '20240901',
'duration': 1710,
'thumbnail': r're:https?://.*\.(?:jpg|png)$',
'modified_date': '20240901',
'release_date': '20240901',
'release_timestamp': 1725195600,
'timestamp': 1725148800,
'modified_timestamp': 1725195657,
},
}, {
'url': 'https://subsplash.com/u/prophecywatchers/media/d/n4dr8b2-the-transhumanist-plan-for-humanity-billy-crone',
'md5': '01982d58021af81c969958459bd81f13',
'info_dict': {
'id': 'n4dr8b2',
'ext': 'mp4',
'title': 'The Transhumanist Plan for Humanity | Billy Crone',
'upload_date': '20240903',
'duration': 1709,
'thumbnail': r're:https?://.*\.(?:jpg|png)$',
'timestamp': 1725321600,
'modified_date': '20241010',
'release_date': '20240903',
'release_timestamp': 1725379200,
'modified_timestamp': 1728577804,
},
}, {
'url': 'https://subsplash.com/laiglesiadelcentro/vid/mi/+ecb6a6b?autoplay=true',
'md5': '013c9b1e391dd4b34d8612439445deef',
'info_dict': {
'id': 'ecb6a6b',
'ext': 'mp4',
'thumbnail': r're:https?://.*\.(?:jpg|png)$',
'release_timestamp': 1477095852,
'title': 'En el Principio Era el Verbo | EVANGELIO DE JUAN | Ps. Gadiel Ríos',
'timestamp': 1425772800,
'upload_date': '20150308',
'description': 'md5:f368221de93176654989ba66bb564798',
'modified_timestamp': 1730258864,
'modified_date': '20241030',
'release_date': '20161022',
},
}, {
'url': 'https://prophecywatchers.subspla.sh/8gps8cx',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'https://core.subsplash.com/media/v1/media-items',
video_id, headers=self._get_headers(url, video_id),
query={
'filter[short_code]': video_id,
'include': 'images,audio.audio-outputs,audio.video,video.video-outputs,video.playlists,document,broadcast',
})
return self._extract_video(traverse_obj(data, ('_embedded', 'media-items', 0)), video_id)
class SubsplashPlaylistIE(SubsplashBaseIE):
IE_NAME = 'subsplash:playlist'
_VALID_URL = r'https?://(?:www\.)?subsplash\.com/[^/?#]+/(?:our-videos|media)/ms/\+(?P<id>\w+)'
_PAGE_SIZE = 15
_TESTS = [{
'url': 'https://subsplash.com/skywatchtv/our-videos/ms/+dbyjzp8',
'info_dict': {
'id': 'dbyjzp8',
'title': 'Five in Ten',
},
'playlist_mincount': 11,
}, {
'url': 'https://subsplash.com/prophecywatchers/media/ms/+n42mr48',
'info_dict': {
'id': 'n42mr48',
'title': 'Road to Zion Series',
},
'playlist_mincount': 13,
}, {
'url': 'https://subsplash.com/prophecywatchers/media/ms/+918b9f6',
'only_matching': True,
}]
def _entries(self, series_id, headers, page):
data = self._download_json(
'https://core.subsplash.com/media/v1/media-items', series_id, headers=headers,
query={
'filter[broadcast.status|broadcast.status]': 'null|on-demand',
'filter[media_series]': series_id,
'filter[status]': 'published',
'include': 'images,audio.audio-outputs,audio.video,video.video-outputs,video.playlists,document',
'page[number]': page + 1,
'page[size]': self._PAGE_SIZE,
'sort': '-position',
}, note=f'Downloading page {page + 1}')
for entry in traverse_obj(data, ('_embedded', 'media-items', lambda _, v: v['short_code'])):
entry_id = entry['short_code']
info = self._extract_video(entry, entry_id)
yield {
**info,
'webpage_url': f'https://subspla.sh/{entry_id}',
'extractor_key': SubsplashIE.ie_key(),
'extractor': SubsplashIE.IE_NAME,
}
def _real_extract(self, url):
display_id = self._match_id(url)
headers = self._get_headers(url, display_id)
data = self._download_json(
'https://core.subsplash.com/media/v1/media-series', display_id, headers=headers,
query={'filter[short_code]': display_id})
series_data = traverse_obj(data, ('_embedded', 'media-series', 0, {
'id': ('id', {str}),
'title': ('title', {str}),
'count': ('media_items_count', {int}),
}))
total_pages = math.ceil(series_data['count'] / self._PAGE_SIZE)
return self.playlist_result(
InAdvancePagedList(functools.partial(self._entries, series_data['id'], headers), total_pages, self._PAGE_SIZE),
display_id, series_data['title'])

View File

@ -118,8 +118,9 @@ class ThePlatformBaseIE(OnceIE):
'categories', lambda _, v: v.get('label') in ('category', None), 'name', {str})) or None, 'categories', lambda _, v: v.get('label') in ('category', None), 'name', {str})) or None,
'tags': traverse_obj(info, ('keywords', {lambda x: re.split(r'[;,]\s?', x) if x else None})), 'tags': traverse_obj(info, ('keywords', {lambda x: re.split(r'[;,]\s?', x) if x else None})),
'location': extract_site_specific_field('region'), 'location': extract_site_specific_field('region'),
'series': extract_site_specific_field('show'), 'series': extract_site_specific_field('show') or extract_site_specific_field('seriesTitle'),
'season_number': int_or_none(extract_site_specific_field('seasonNumber')), 'season_number': int_or_none(extract_site_specific_field('seasonNumber')),
'episode_number': int_or_none(extract_site_specific_field('episodeNumber')),
'media_type': extract_site_specific_field('programmingType') or extract_site_specific_field('type'), 'media_type': extract_site_specific_field('programmingType') or extract_site_specific_field('type'),
} }

View File

@ -24,8 +24,6 @@ class TVerIE(InfoExtractor):
'channel': 'テレビ朝日', 'channel': 'テレビ朝日',
'id': 'ep83nf3w4p', 'id': 'ep83nf3w4p',
'ext': 'mp4', 'ext': 'mp4',
'onair_label': '5月3日(火)放送分',
'ext_title': '家事ヤロウ!!! 売り場席巻のチーズSP財前直見×森泉親子の脱東京暮らし密着 テレビ朝日 5月3日(火)放送分',
}, },
'add_ie': ['BrightcoveNew'], 'add_ie': ['BrightcoveNew'],
}, { }, {

View File

@ -50,6 +50,7 @@ class KnownDRMIE(UnsupportedInfoExtractor):
r'music\.amazon\.(?:\w{2}\.)?\w+', r'music\.amazon\.(?:\w{2}\.)?\w+',
r'(?:watch|front)\.njpwworld\.com', r'(?:watch|front)\.njpwworld\.com',
r'qub\.ca/vrai', r'qub\.ca/vrai',
r'(?:beta\.)?crunchyroll\.com',
) )
_TESTS = [{ _TESTS = [{
@ -153,6 +154,12 @@ class KnownDRMIE(UnsupportedInfoExtractor):
}, { }, {
'url': 'https://www.qub.ca/vrai/l-effet-bocuse-d-or/saison-1/l-effet-bocuse-d-or-saison-1-bande-annonce-1098225063', 'url': 'https://www.qub.ca/vrai/l-effet-bocuse-d-or/saison-1/l-effet-bocuse-d-or-saison-1-bande-annonce-1098225063',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://www.crunchyroll.com/watch/GY2P1Q98Y/to-the-future',
'only_matching': True,
}, {
'url': 'https://beta.crunchyroll.com/pt-br/watch/G8WUN8VKP/the-ruler-of-conspiracy',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -14,59 +14,69 @@ class VideocampusSachsenIE(InfoExtractor):
'corporate.demo.vimp.com', 'corporate.demo.vimp.com',
'dancehalldatabase.com', 'dancehalldatabase.com',
'drehzahl.tv', 'drehzahl.tv',
'educhannel.hs-gesundheit.de', 'educhannel.hs-gesundheit.de', # Hochschule für Gesundheit NRW
'emedia.ls.haw-hamburg.de', 'emedia.ls.haw-hamburg.de',
'globale-evolution.net', 'globale-evolution.net',
'hohu.tv', 'hohu.tv',
'htvideos.hightechhigh.org', 'htvideos.hightechhigh.org',
'k210039.vimp.mivitec.net', 'k210039.vimp.mivitec.net',
'media.cmslegal.com', 'media.cmslegal.com',
'media.hs-furtwangen.de', 'media.fh-swf.de', # Fachhochschule Südwestfalen
'media.hwr-berlin.de', 'media.hs-furtwangen.de', # Hochschule Furtwangen
'media.hwr-berlin.de', # Hochschule für Wirtschaft und Recht Berlin
'mediathek.dkfz.de', 'mediathek.dkfz.de',
'mediathek.htw-berlin.de', 'mediathek.htw-berlin.de', # Hochschule für Technik und Wirtschaft Berlin
'mediathek.polizei-bw.de', 'mediathek.polizei-bw.de',
'medien.hs-merseburg.de', 'medien.hs-merseburg.de', # Hochschule Merseburg
'mportal.europa-uni.de', 'mitmedia.manukau.ac.nz', # Manukau Institute of Technology Auckland (NZ)
'mportal.europa-uni.de', # Europa-Universität Viadrina
'pacific.demo.vimp.com', 'pacific.demo.vimp.com',
'slctv.com', 'slctv.com',
'streaming.prairiesouth.ca', 'streaming.prairiesouth.ca',
'tube.isbonline.cn', 'tube.isbonline.cn',
'univideo.uni-kassel.de', 'univideo.uni-kassel.de', # Universität Kassel
'ursula2.genetics.emory.edu', 'ursula2.genetics.emory.edu',
'ursulablicklevideoarchiv.com', 'ursulablicklevideoarchiv.com',
'v.agrarumweltpaedagogik.at', 'v.agrarumweltpaedagogik.at',
'video.eplay-tv.de', 'video.eplay-tv.de',
'video.fh-dortmund.de', 'video.fh-dortmund.de', # Fachhochschule Dortmund
'video.hs-offenburg.de', 'video.hs-nb.de', # Hochschule Neubrandenburg
'video.hs-pforzheim.de', 'video.hs-offenburg.de', # Hochschule Offenburg
'video.hspv.nrw.de', 'video.hs-pforzheim.de', # Hochschule Pforzheim
'video.hspv.nrw.de', # Hochschule für Polizei und öffentliche Verwaltung NRW
'video.irtshdf.fr', 'video.irtshdf.fr',
'video.pareygo.de', 'video.pareygo.de',
'video.tu-freiberg.de', 'video.tu-dortmund.de', # Technische Universität Dortmund
'videocampus.sachsen.de', 'video.tu-freiberg.de', # Technische Universität Bergakademie Freiberg
'videoportal.uni-freiburg.de', 'videocampus.sachsen.de', # Video Campus Sachsen (gemeinsame Videoplattform sächsischer Universitäten, Hochschulen und der Berufsakademie Sachsen)
'videoportal.vm.uni-freiburg.de', 'videoportal.uni-freiburg.de', # Albert-Ludwigs-Universität Freiburg
'videoportal.vm.uni-freiburg.de', # Albert-Ludwigs-Universität Freiburg
'videos.duoc.cl', 'videos.duoc.cl',
'videos.uni-paderborn.de', 'videos.uni-paderborn.de', # Universität Paderborn
'vimp-bemus.udk-berlin.de', 'vimp-bemus.udk-berlin.de',
'vimp.aekwl.de', 'vimp.aekwl.de',
'vimp.hs-mittweida.de', 'vimp.hs-mittweida.de',
'vimp.oth-regensburg.de', 'vimp.landesfilmdienste.de',
'vimp.ph-heidelberg.de', 'vimp.oth-regensburg.de', # Ostbayerische Technische Hochschule Regensburg
'vimp.ph-heidelberg.de', # Pädagogische Hochschule Heidelberg
'vimp.sma-events.com', 'vimp.sma-events.com',
'vimp.weka-fachmedien.de', 'vimp.weka-fachmedien.de',
'vimpdesk.com',
'webtv.univ-montp3.fr', 'webtv.univ-montp3.fr',
'www.b-tu.de/media', 'www.b-tu.de/media', # Brandenburgische Technische Universität Cottbus-Senftenberg
'www.bergauf.tv', 'www.bergauf.tv',
'www.bigcitytv.de', 'www.bigcitytv.de',
'www.cad-videos.de', 'www.cad-videos.de',
'www.drehzahl.tv', 'www.drehzahl.tv',
'www.fh-bielefeld.de/medienportal',
'www.hohu.tv', 'www.hohu.tv',
'www.hsbi.de/medienportal', # Hochschule Bielefeld
'www.logistic.tv',
'www.orvovideo.com', 'www.orvovideo.com',
'www.printtube.co.uk',
'www.rwe.tv', 'www.rwe.tv',
'www.salzi.tv', 'www.salzi.tv',
'www.signtube.co.uk',
'www.twb-power.com',
'www.wenglor-media.com', 'www.wenglor-media.com',
'www2.univ-sba.dz', 'www2.univ-sba.dz',
) )
@ -188,22 +198,23 @@ class VideocampusSachsenIE(InfoExtractor):
class ViMPPlaylistIE(InfoExtractor): class ViMPPlaylistIE(InfoExtractor):
IE_NAME = 'ViMP:Playlist' IE_NAME = 'ViMP:Playlist'
_VALID_URL = r'''(?x)(?P<host>https?://(?:{}))/(?: _VALID_URL = r'''(?x)(?P<host>https?://(?:{}))/(?:
album/view/aid/(?P<album_id>[0-9]+)| (?P<mode1>album)/view/aid/(?P<album_id>[0-9]+)|
(?P<mode>category|channel)/(?P<name>[\w-]+)/(?P<id>[0-9]+) (?P<mode2>category|channel)/(?P<name>[\w-]+)/(?P<channel_id>[0-9]+)|
(?P<mode3>tag)/(?P<tag_id>[0-9]+)
)'''.format('|'.join(map(re.escape, VideocampusSachsenIE._INSTANCES))) )'''.format('|'.join(map(re.escape, VideocampusSachsenIE._INSTANCES)))
_TESTS = [{ _TESTS = [{
'url': 'https://vimp.oth-regensburg.de/channel/Designtheorie-1-SoSe-2020/3', 'url': 'https://vimp.oth-regensburg.de/channel/Designtheorie-1-SoSe-2020/3',
'info_dict': { 'info_dict': {
'id': 'channel-3', 'id': 'channel-3',
'title': 'Designtheorie 1 SoSe 2020 :: Channels :: ViMP OTH Regensburg', 'title': 'Designtheorie 1 SoSe 2020 - Channels - ViMP OTH Regensburg',
}, },
'playlist_mincount': 9, 'playlist_mincount': 9,
}, { }, {
'url': 'https://www.fh-bielefeld.de/medienportal/album/view/aid/208', 'url': 'https://www.hsbi.de/medienportal/album/view/aid/208',
'info_dict': { 'info_dict': {
'id': 'album-208', 'id': 'album-208',
'title': 'KG Praktikum ABT/MEC :: Playlists :: FH-Medienportal', 'title': 'KG Praktikum ABT/MEC - Playlists - HSBI-Medienportal',
}, },
'playlist_mincount': 4, 'playlist_mincount': 4,
}, { }, {
@ -213,6 +224,13 @@ class ViMPPlaylistIE(InfoExtractor):
'title': 'Online-Seminare ONYX - BPS - Bildungseinrichtungen - VCS', 'title': 'Online-Seminare ONYX - BPS - Bildungseinrichtungen - VCS',
}, },
'playlist_mincount': 7, 'playlist_mincount': 7,
}, {
'url': 'https://videocampus.sachsen.de/tag/26902',
'info_dict': {
'id': 'tag-26902',
'title': 'advanced mobile and v2x communication - Tags - VCS',
},
'playlist_mincount': 6,
}] }]
_PAGE_SIZE = 10 _PAGE_SIZE = 10
@ -220,34 +238,37 @@ class ViMPPlaylistIE(InfoExtractor):
webpage = self._download_webpage( webpage = self._download_webpage(
f'{host}/media/ajax/component/boxList/{url_part}', playlist_id, f'{host}/media/ajax/component/boxList/{url_part}', playlist_id,
query={'page': page, 'page_only': 1}, data=urlencode_postdata(data)) query={'page': page, 'page_only': 1}, data=urlencode_postdata(data))
urls = re.findall(r'"([^"]+/video/[^"]+)"', webpage) urls = re.findall(r'"([^"]*/video/[^"]+)"', webpage)
for url in urls: for url in urls:
yield self.url_result(host + url, VideocampusSachsenIE) yield self.url_result(host + url, VideocampusSachsenIE)
def _real_extract(self, url): def _real_extract(self, url):
host, album_id, mode, name, playlist_id = self._match_valid_url(url).group( host, album_id, name, channel_id, tag_id, mode1, mode2, mode3 = self._match_valid_url(url).group(
'host', 'album_id', 'mode', 'name', 'id') 'host', 'album_id', 'name', 'channel_id', 'tag_id', 'mode1', 'mode2', 'mode3')
webpage = self._download_webpage(url, album_id or playlist_id, fatal=False) or '' mode = mode1 or mode2 or mode3
playlist_id = album_id or channel_id or tag_id
webpage = self._download_webpage(url, playlist_id, fatal=False) or ''
title = (self._html_search_meta('title', webpage, fatal=False) title = (self._html_search_meta('title', webpage, fatal=False)
or self._html_extract_title(webpage)) or self._html_extract_title(webpage))
url_part = (f'aid/{album_id}' if album_id url_part = (f'aid/{album_id}' if album_id
else f'category/{name}/category_id/{playlist_id}' if mode == 'category' else f'category/{name}/category_id/{channel_id}' if mode == 'category'
else f'title/{name}/channel/{playlist_id}') else f'title/{name}/channel/{channel_id}' if mode == 'channel'
else f'tag/{tag_id}')
mode = mode or 'album'
data = { data = {
'vars[mode]': mode, 'vars[mode]': mode,
f'vars[{mode}]': album_id or playlist_id, f'vars[{mode}]': playlist_id,
'vars[context]': '4' if album_id else '1' if mode == 'category' else '3', 'vars[context]': '4' if album_id else '1' if mode == 'category' else '3' if mode == 'album' else '0',
'vars[context_id]': album_id or playlist_id, 'vars[context_id]': playlist_id,
'vars[layout]': 'thumb', 'vars[layout]': 'thumb',
'vars[per_page][thumb]': str(self._PAGE_SIZE), 'vars[per_page][thumb]': str(self._PAGE_SIZE),
} }
return self.playlist_result( return self.playlist_result(
OnDemandPagedList(functools.partial( OnDemandPagedList(functools.partial(
self._fetch_page, host, url_part, album_id or playlist_id, data), self._PAGE_SIZE), self._fetch_page, host, url_part, playlist_id, data), self._PAGE_SIZE),
playlist_title=title, id=f'{mode}-{album_id or playlist_id}') playlist_title=title, id=f'{mode}-{playlist_id}')

View File

@ -28,6 +28,7 @@ from ..utils import (
try_get, try_get,
unified_timestamp, unified_timestamp,
unsmuggle_url, unsmuggle_url,
url_or_none,
urlencode_postdata, urlencode_postdata,
urlhandle_detect_ext, urlhandle_detect_ext,
urljoin, urljoin,
@ -211,11 +212,7 @@ class VimeoBaseInfoExtractor(InfoExtractor):
'width': int_or_none(key), 'width': int_or_none(key),
'url': thumb, 'url': thumb,
}) })
thumbnail = video_data.get('thumbnail') thumbnails.extend(traverse_obj(video_data, (('thumbnail', 'thumbnail_url'), {'url': {url_or_none}})))
if thumbnail:
thumbnails.append({
'url': thumbnail,
})
owner = video_data.get('owner') or {} owner = video_data.get('owner') or {}
video_uploader_url = owner.get('url') video_uploader_url = owner.get('url')
@ -388,7 +385,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/businessofsoftware', 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/businessofsoftware',
'uploader_id': 'businessofsoftware', 'uploader_id': 'businessofsoftware',
'duration': 3610, 'duration': 3610,
'thumbnail': 'https://i.vimeocdn.com/video/376682406-f34043e7b766af6bef2af81366eacd6724f3fc3173179a11a97a1e26587c9529-d_1280', 'thumbnail': 'https://i.vimeocdn.com/video/376682406-f34043e7b766af6bef2af81366eacd6724f3fc3173179a11a97a1e26587c9529-d',
}, },
'params': { 'params': {
'format': 'best[protocol=https]', 'format': 'best[protocol=https]',
@ -413,7 +410,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 10, 'duration': 10,
'comment_count': int, 'comment_count': int,
'like_count': int, 'like_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/440665496-b2c5aee2b61089442c794f64113a8e8f7d5763c3e6b3ebfaf696ae6413f8b1f4-d_1280', 'thumbnail': 'https://i.vimeocdn.com/video/440665496-b2c5aee2b61089442c794f64113a8e8f7d5763c3e6b3ebfaf696ae6413f8b1f4-d',
}, },
'params': { 'params': {
'format': 'best[protocol=https]', 'format': 'best[protocol=https]',
@ -437,7 +434,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'timestamp': 1380339469, 'timestamp': 1380339469,
'upload_date': '20130928', 'upload_date': '20130928',
'duration': 187, 'duration': 187,
'thumbnail': 'https://i.vimeocdn.com/video/450239872-a05512d9b1e55d707a7c04365c10980f327b06d966351bc403a5d5d65c95e572-d_1280', 'thumbnail': 'https://i.vimeocdn.com/video/450239872-a05512d9b1e55d707a7c04365c10980f327b06d966351bc403a5d5d65c95e572-d',
'view_count': int, 'view_count': int,
'comment_count': int, 'comment_count': int,
'like_count': int, 'like_count': int,
@ -463,7 +460,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 62, 'duration': 62,
'comment_count': int, 'comment_count': int,
'like_count': int, 'like_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/452001751-8216e0571c251a09d7a8387550942d89f7f86f6398f8ed886e639b0dd50d3c90-d_1280', 'thumbnail': 'https://i.vimeocdn.com/video/452001751-8216e0571c251a09d7a8387550942d89f7f86f6398f8ed886e639b0dd50d3c90-d',
'subtitles': { 'subtitles': {
'de': 'count:3', 'de': 'count:3',
'en': 'count:3', 'en': 'count:3',
@ -488,7 +485,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user28849593', 'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user28849593',
'uploader_id': 'user28849593', 'uploader_id': 'user28849593',
'duration': 118, 'duration': 118,
'thumbnail': 'https://i.vimeocdn.com/video/478636036-c18440305ef3df9decfb6bf207a61fe39d2d17fa462a96f6f2d93d30492b037d-d_1280', 'thumbnail': 'https://i.vimeocdn.com/video/478636036-c18440305ef3df9decfb6bf207a61fe39d2d17fa462a96f6f2d93d30492b037d-d',
}, },
'expected_warnings': ['Failed to parse XML: not well-formed'], 'expected_warnings': ['Failed to parse XML: not well-formed'],
}, },
@ -509,7 +506,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 60, 'duration': 60,
'comment_count': int, 'comment_count': int,
'view_count': int, 'view_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/231174622-dd07f015e9221ff529d451e1cc31c982b5d87bfafa48c4189b1da72824ee289a-d_1280', 'thumbnail': 'https://i.vimeocdn.com/video/231174622-dd07f015e9221ff529d451e1cc31c982b5d87bfafa48c4189b1da72824ee289a-d',
'like_count': int, 'like_count': int,
'tags': 'count:11', 'tags': 'count:11',
}, },
@ -531,7 +528,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'description': 'md5:f2edc61af3ea7a5592681ddbb683db73', 'description': 'md5:f2edc61af3ea7a5592681ddbb683db73',
'upload_date': '20200225', 'upload_date': '20200225',
'duration': 176, 'duration': 176,
'thumbnail': 'https://i.vimeocdn.com/video/859377297-836494a4ef775e9d4edbace83937d9ad34dc846c688c0c419c0e87f7ab06c4b3-d_1280', 'thumbnail': 'https://i.vimeocdn.com/video/859377297-836494a4ef775e9d4edbace83937d9ad34dc846c688c0c419c0e87f7ab06c4b3-d',
'uploader_url': 'https://vimeo.com/frameworkla', 'uploader_url': 'https://vimeo.com/frameworkla',
}, },
# 'params': {'format': 'source'}, # 'params': {'format': 'source'},
@ -556,7 +553,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 321, 'duration': 321,
'comment_count': int, 'comment_count': int,
'view_count': int, 'view_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/22728298-bfc22146f930de7cf497821c7b0b9f168099201ecca39b00b6bd31fcedfca7a6-d_1280', 'thumbnail': 'https://i.vimeocdn.com/video/22728298-bfc22146f930de7cf497821c7b0b9f168099201ecca39b00b6bd31fcedfca7a6-d',
'like_count': int, 'like_count': int,
'tags': ['[the shining', 'vimeohq', 'cv', 'vimeo tribute]'], 'tags': ['[the shining', 'vimeohq', 'cv', 'vimeo tribute]'],
}, },
@ -596,7 +593,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_id': 'user18948128', 'uploader_id': 'user18948128',
'uploader': 'Jaime Marquínez Ferrándiz', 'uploader': 'Jaime Marquínez Ferrándiz',
'duration': 10, 'duration': 10,
'thumbnail': 'https://i.vimeocdn.com/video/440665496-b2c5aee2b61089442c794f64113a8e8f7d5763c3e6b3ebfaf696ae6413f8b1f4-d_1280', 'thumbnail': 'https://i.vimeocdn.com/video/440665496-b2c5aee2b61089442c794f64113a8e8f7d5763c3e6b3ebfaf696ae6413f8b1f4-d',
}, },
'params': { 'params': {
'format': 'best[protocol=https]', 'format': 'best[protocol=https]',
@ -633,7 +630,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'description': str, # FIXME: Dynamic SEO spam description 'description': str, # FIXME: Dynamic SEO spam description
'upload_date': '20150209', 'upload_date': '20150209',
'timestamp': 1423518307, 'timestamp': 1423518307,
'thumbnail': 'https://i.vimeocdn.com/video/default_1280', 'thumbnail': 'https://i.vimeocdn.com/video/default',
'duration': 10, 'duration': 10,
'like_count': int, 'like_count': int,
'uploader_url': 'https://vimeo.com/user20132939', 'uploader_url': 'https://vimeo.com/user20132939',
@ -666,7 +663,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'license': 'by-nc', 'license': 'by-nc',
'duration': 159, 'duration': 159,
'comment_count': int, 'comment_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/562802436-585eeb13b5020c6ac0f171a2234067938098f84737787df05ff0d767f6d54ee9-d_1280', 'thumbnail': 'https://i.vimeocdn.com/video/562802436-585eeb13b5020c6ac0f171a2234067938098f84737787df05ff0d767f6d54ee9-d',
'like_count': int, 'like_count': int,
'uploader_url': 'https://vimeo.com/aliniamedia', 'uploader_url': 'https://vimeo.com/aliniamedia',
'release_date': '20160329', 'release_date': '20160329',
@ -686,7 +683,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader': 'Firework Champions', 'uploader': 'Firework Champions',
'upload_date': '20150910', 'upload_date': '20150910',
'timestamp': 1441901895, 'timestamp': 1441901895,
'thumbnail': 'https://i.vimeocdn.com/video/534715882-6ff8e4660cbf2fea68282876d8d44f318825dfe572cc4016e73b3266eac8ae3a-d_1280', 'thumbnail': 'https://i.vimeocdn.com/video/534715882-6ff8e4660cbf2fea68282876d8d44f318825dfe572cc4016e73b3266eac8ae3a-d',
'uploader_url': 'https://vimeo.com/fireworkchampions', 'uploader_url': 'https://vimeo.com/fireworkchampions',
'tags': 'count:6', 'tags': 'count:6',
'duration': 229, 'duration': 229,
@ -715,7 +712,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'duration': 336, 'duration': 336,
'comment_count': int, 'comment_count': int,
'view_count': int, 'view_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/541243181-b593db36a16db2f0096f655da3f5a4dc46b8766d77b0f440df937ecb0c418347-d_1280', 'thumbnail': 'https://i.vimeocdn.com/video/541243181-b593db36a16db2f0096f655da3f5a4dc46b8766d77b0f440df937ecb0c418347-d',
'like_count': int, 'like_count': int,
'uploader_url': 'https://vimeo.com/karimhd', 'uploader_url': 'https://vimeo.com/karimhd',
'channel_url': 'https://vimeo.com/channels/staffpicks', 'channel_url': 'https://vimeo.com/channels/staffpicks',
@ -740,7 +737,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'release_timestamp': 1627621014, 'release_timestamp': 1627621014,
'duration': 976, 'duration': 976,
'comment_count': int, 'comment_count': int,
'thumbnail': 'https://i.vimeocdn.com/video/1202249320-4ddb2c30398c0dc0ee059172d1bd5ea481ad12f0e0e3ad01d2266f56c744b015-d_1280', 'thumbnail': 'https://i.vimeocdn.com/video/1202249320-4ddb2c30398c0dc0ee059172d1bd5ea481ad12f0e0e3ad01d2266f56c744b015-d',
'like_count': int, 'like_count': int,
'uploader_url': 'https://vimeo.com/txwestcapital', 'uploader_url': 'https://vimeo.com/txwestcapital',
'release_date': '20210730', 'release_date': '20210730',
@ -764,7 +761,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader': 'Alex Howard', 'uploader': 'Alex Howard',
'uploader_id': 'user54729178', 'uploader_id': 'user54729178',
'uploader_url': 'https://vimeo.com/user54729178', 'uploader_url': 'https://vimeo.com/user54729178',
'thumbnail': r're:https://i\.vimeocdn\.com/video/1520099929-[\da-f]+-d_1280', 'thumbnail': r're:https://i\.vimeocdn\.com/video/1520099929-[\da-f]+-d',
'duration': 2636, 'duration': 2636,
'chapters': [ 'chapters': [
{'start_time': 0, 'end_time': 10, 'title': '<Untitled Chapter 1>'}, {'start_time': 0, 'end_time': 10, 'title': '<Untitled Chapter 1>'},
@ -807,7 +804,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'like_count': int, 'like_count': int,
'view_count': int, 'view_count': int,
'comment_count': int, 'comment_count': int,
'thumbnail': r're:https://i\.vimeocdn\.com/video/1018638656-[\da-f]+-d_1280', 'thumbnail': r're:https://i\.vimeocdn\.com/video/1018638656-[\da-f]+-d',
}, },
# 'params': {'format': 'Original'}, # 'params': {'format': 'Original'},
'expected_warnings': ['Failed to parse XML: not well-formed'], 'expected_warnings': ['Failed to parse XML: not well-formed'],
@ -824,7 +821,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
'uploader_id': 'rajavirdi', 'uploader_id': 'rajavirdi',
'uploader_url': 'https://vimeo.com/rajavirdi', 'uploader_url': 'https://vimeo.com/rajavirdi',
'duration': 309, 'duration': 309,
'thumbnail': r're:https://i\.vimeocdn\.com/video/1716727772-[\da-f]+-d_1280', 'thumbnail': r're:https://i\.vimeocdn\.com/video/1716727772-[\da-f]+-d',
}, },
# 'params': {'format': 'source'}, # 'params': {'format': 'source'},
'expected_warnings': ['Failed to parse XML: not well-formed'], 'expected_warnings': ['Failed to parse XML: not well-formed'],

View File

@ -124,7 +124,7 @@ class WeiboBaseIE(InfoExtractor):
class WeiboIE(WeiboBaseIE): class WeiboIE(WeiboBaseIE):
_VALID_URL = r'https?://(?:m\.weibo\.cn/status|(?:www\.)?weibo\.com/\d+)/(?P<id>[a-zA-Z0-9]+)' _VALID_URL = r'https?://(?:m\.weibo\.cn/(?:status|detail)|(?:www\.)?weibo\.com/\d+)/(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://weibo.com/7827771738/N4xlMvjhI', 'url': 'https://weibo.com/7827771738/N4xlMvjhI',
'info_dict': { 'info_dict': {
@ -164,6 +164,25 @@ class WeiboIE(WeiboBaseIE):
'like_count': int, 'like_count': int,
'repost_count': int, 'repost_count': int,
}, },
}, {
'url': 'https://m.weibo.cn/detail/4189191225395228',
'info_dict': {
'id': '4189191225395228',
'ext': 'mp4',
'display_id': 'FBqgOmDxO',
'title': '柴犬柴犬的秒拍视频',
'description': '午睡当然是要甜甜蜜蜜的啦![坏笑] Instagramshibainu.gaku http://t.cn/RHbmjzW ',
'duration': 53,
'timestamp': 1514264429,
'upload_date': '20171226',
'thumbnail': r're:https://.*\.jpg',
'uploader': '柴犬柴犬',
'uploader_id': '5926682210',
'uploader_url': 'https://weibo.com/u/5926682210',
'view_count': int,
'like_count': int,
'repost_count': int,
},
}, { }, {
'url': 'https://weibo.com/0/4224132150961381', 'url': 'https://weibo.com/0/4224132150961381',
'note': 'no playback_list example', 'note': 'no playback_list example',

View File

@ -20,7 +20,7 @@ from ..utils import (
class XHamsterIE(InfoExtractor): class XHamsterIE(InfoExtractor):
_DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster\d+\.com|xhday\.com|xhvid\.com)' _DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster\d+\.(?:com|desi)|xhday\.com|xhvid\.com)'
_VALID_URL = rf'''(?x) _VALID_URL = rf'''(?x)
https?:// https?://
(?:[^/?#]+\.)?{_DOMAINS}/ (?:[^/?#]+\.)?{_DOMAINS}/
@ -31,7 +31,7 @@ class XHamsterIE(InfoExtractor):
''' '''
_TESTS = [{ _TESTS = [{
'url': 'https://xhamster.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445', 'url': 'https://xhamster.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
'md5': '34e1ab926db5dc2750fed9e1f34304bb', 'md5': 'e009ea6b849b129e3bebaeb9cf0dee51',
'info_dict': { 'info_dict': {
'id': '1509445', 'id': '1509445',
'display_id': 'femaleagent-shy-beauty-takes-the-bait', 'display_id': 'femaleagent-shy-beauty-takes-the-bait',
@ -43,6 +43,11 @@ class XHamsterIE(InfoExtractor):
'uploader_id': 'ruseful2011', 'uploader_id': 'ruseful2011',
'duration': 893, 'duration': 893,
'age_limit': 18, 'age_limit': 18,
'thumbnail': 'https://thumb-nss.xhcdn.com/a/u3Vr5F2vvcU3yK59_jJqVA/001/509/445/1280x720.8.jpg',
'uploader_url': 'https://xhamster.com/users/ruseful2011',
'description': '',
'view_count': int,
'comment_count': int,
}, },
}, { }, {
'url': 'https://xhamster.com/videos/britney-spears-sexy-booty-2221348?hd=', 'url': 'https://xhamster.com/videos/britney-spears-sexy-booty-2221348?hd=',
@ -56,6 +61,10 @@ class XHamsterIE(InfoExtractor):
'uploader': 'jojo747400', 'uploader': 'jojo747400',
'duration': 200, 'duration': 200,
'age_limit': 18, 'age_limit': 18,
'description': '',
'view_count': int,
'thumbnail': 'https://thumb-nss.xhcdn.com/a/kk5nio_iR-h4Z3frfVtoDw/002/221/348/1280x720.4.jpg',
'comment_count': int,
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -73,6 +82,11 @@ class XHamsterIE(InfoExtractor):
'uploader_id': 'parejafree', 'uploader_id': 'parejafree',
'duration': 72, 'duration': 72,
'age_limit': 18, 'age_limit': 18,
'comment_count': int,
'uploader_url': 'https://xhamster.com/users/parejafree',
'description': '',
'view_count': int,
'thumbnail': 'https://thumb-nss.xhcdn.com/a/xc8MSwVKcsQeRRiTT-saMQ/005/667/973/1280x720.2.jpg',
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -122,6 +136,9 @@ class XHamsterIE(InfoExtractor):
}, { }, {
'url': 'https://xhvid.com/videos/lk-mm-xhc6wn6', 'url': 'https://xhvid.com/videos/lk-mm-xhc6wn6',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://xhamster20.desi/videos/my-verification-video-scottishmistress23-11937369',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):
@ -267,7 +284,7 @@ class XHamsterIE(InfoExtractor):
video, lambda x: x['rating']['likes'], int)), video, lambda x: x['rating']['likes'], int)),
'dislike_count': int_or_none(try_get( 'dislike_count': int_or_none(try_get(
video, lambda x: x['rating']['dislikes'], int)), video, lambda x: x['rating']['dislikes'], int)),
'comment_count': int_or_none(video.get('views')), 'comment_count': int_or_none(video.get('comments')),
'age_limit': age_limit if age_limit is not None else 18, 'age_limit': age_limit if age_limit is not None else 18,
'categories': categories, 'categories': categories,
'formats': formats, 'formats': formats,

View File

@ -5,6 +5,7 @@ from ..utils import (
int_or_none, int_or_none,
js_to_json, js_to_json,
url_or_none, url_or_none,
urlhandle_detect_ext,
) )
from ..utils.traversal import traverse_obj from ..utils.traversal import traverse_obj
@ -46,7 +47,7 @@ class XiaoHongShuIE(InfoExtractor):
r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', display_id, transform_source=js_to_json) r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', display_id, transform_source=js_to_json)
note_info = traverse_obj(initial_state, ('note', 'noteDetailMap', display_id, 'note')) note_info = traverse_obj(initial_state, ('note', 'noteDetailMap', display_id, 'note'))
video_info = traverse_obj(note_info, ('video', 'media', 'stream', ('h264', 'av1', 'h265'), ...)) video_info = traverse_obj(note_info, ('video', 'media', 'stream', ..., ...))
formats = [] formats = []
for info in video_info: for info in video_info:
@ -56,18 +57,32 @@ class XiaoHongShuIE(InfoExtractor):
'height': ('height', {int_or_none}), 'height': ('height', {int_or_none}),
'vcodec': ('videoCodec', {str}), 'vcodec': ('videoCodec', {str}),
'acodec': ('audioCodec', {str}), 'acodec': ('audioCodec', {str}),
'abr': ('audioBitrate', {int_or_none}), 'abr': ('audioBitrate', {int_or_none(scale=1000)}),
'vbr': ('videoBitrate', {int_or_none}), 'vbr': ('videoBitrate', {int_or_none(scale=1000)}),
'audio_channels': ('audioChannels', {int_or_none}), 'audio_channels': ('audioChannels', {int_or_none}),
'tbr': ('avgBitrate', {int_or_none}), 'tbr': ('avgBitrate', {int_or_none(scale=1000)}),
'format': ('qualityType', {str}), 'format': ('qualityType', {str}),
'filesize': ('size', {int_or_none}), 'filesize': ('size', {int_or_none}),
'duration': ('duration', {float_or_none(scale=1000)}), 'duration': ('duration', {float_or_none(scale=1000)}),
}) })
formats.extend(traverse_obj(info, (('mediaUrl', ('backupUrls', ...)), { formats.extend(traverse_obj(info, (('masterUrl', ('backupUrls', ...)), {
lambda u: url_or_none(u) and {'url': u, **format_info}}))) lambda u: url_or_none(u) and {'url': u, **format_info}})))
if origin_key := traverse_obj(note_info, ('video', 'consumer', 'originVideoKey', {str})):
# Not using a head request because of false negatives
urlh = self._request_webpage(
f'https://sns-video-bd.xhscdn.com/{origin_key}', display_id,
'Checking original video availability', 'Original video is not available', fatal=False)
if urlh:
formats.append({
'format_id': 'direct',
'ext': urlhandle_detect_ext(urlh, default='mp4'),
'filesize': int_or_none(urlh.get_header('Content-Length')),
'url': urlh.url,
'quality': 1,
})
thumbnails = [] thumbnails = []
for image_info in traverse_obj(note_info, ('imageList', ...)): for image_info in traverse_obj(note_info, ('imageList', ...)):
thumbnail_info = traverse_obj(image_info, { thumbnail_info = traverse_obj(image_info, {

View File

@ -1,4 +1,5 @@
import base64 import base64
import binascii
import calendar import calendar
import collections import collections
import copy import copy
@ -69,7 +70,14 @@ from ..utils import (
) )
STREAMING_DATA_CLIENT_NAME = '__yt_dlp_client' STREAMING_DATA_CLIENT_NAME = '__yt_dlp_client'
STREAMING_DATA_PO_TOKEN = '__yt_dlp_po_token' STREAMING_DATA_INITIAL_PO_TOKEN = '__yt_dlp_po_token'
PO_TOKEN_GUIDE_URL = 'https://github.com/yt-dlp/yt-dlp/wiki/PO-Token-Guide'
class _PoTokenContext(enum.Enum):
PLAYER = 'player'
GVS = 'gvs'
# any clients starting with _ cannot be explicitly requested by the user # any clients starting with _ cannot be explicitly requested by the user
INNERTUBE_CLIENTS = { INNERTUBE_CLIENTS = {
@ -81,7 +89,7 @@ INNERTUBE_CLIENTS = {
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 1, 'INNERTUBE_CONTEXT_CLIENT_NAME': 1,
'REQUIRE_PO_TOKEN': True, 'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'SUPPORTS_COOKIES': True, 'SUPPORTS_COOKIES': True,
}, },
# Safari UA returns pre-merged video+audio 144p/240p/360p/720p/1080p HLS formats # Safari UA returns pre-merged video+audio 144p/240p/360p/720p/1080p HLS formats
@ -94,7 +102,7 @@ INNERTUBE_CLIENTS = {
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 1, 'INNERTUBE_CONTEXT_CLIENT_NAME': 1,
'REQUIRE_PO_TOKEN': True, 'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'SUPPORTS_COOKIES': True, 'SUPPORTS_COOKIES': True,
}, },
'web_embedded': { 'web_embedded': {
@ -116,6 +124,7 @@ INNERTUBE_CLIENTS = {
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 67, 'INNERTUBE_CONTEXT_CLIENT_NAME': 67,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'SUPPORTS_COOKIES': True, 'SUPPORTS_COOKIES': True,
}, },
# This client now requires sign-in for every video # This client now requires sign-in for every video
@ -127,6 +136,7 @@ INNERTUBE_CLIENTS = {
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 62, 'INNERTUBE_CONTEXT_CLIENT_NAME': 62,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'REQUIRE_AUTH': True, 'REQUIRE_AUTH': True,
'SUPPORTS_COOKIES': True, 'SUPPORTS_COOKIES': True,
}, },
@ -143,7 +153,7 @@ INNERTUBE_CLIENTS = {
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 3, 'INNERTUBE_CONTEXT_CLIENT_NAME': 3,
'REQUIRE_JS_PLAYER': False, 'REQUIRE_JS_PLAYER': False,
'REQUIRE_PO_TOKEN': True, 'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
}, },
# This client now requires sign-in for every video # This client now requires sign-in for every video
'android_music': { 'android_music': {
@ -159,7 +169,7 @@ INNERTUBE_CLIENTS = {
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 21, 'INNERTUBE_CONTEXT_CLIENT_NAME': 21,
'REQUIRE_JS_PLAYER': False, 'REQUIRE_JS_PLAYER': False,
'REQUIRE_PO_TOKEN': True, 'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'REQUIRE_AUTH': True, 'REQUIRE_AUTH': True,
}, },
# This client now requires sign-in for every video # This client now requires sign-in for every video
@ -176,7 +186,7 @@ INNERTUBE_CLIENTS = {
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 14, 'INNERTUBE_CONTEXT_CLIENT_NAME': 14,
'REQUIRE_JS_PLAYER': False, 'REQUIRE_JS_PLAYER': False,
'REQUIRE_PO_TOKEN': True, 'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'REQUIRE_AUTH': True, 'REQUIRE_AUTH': True,
}, },
# YouTube Kids videos aren't returned on this client for some reason # YouTube Kids videos aren't returned on this client for some reason
@ -202,16 +212,16 @@ INNERTUBE_CLIENTS = {
'INNERTUBE_CONTEXT': { 'INNERTUBE_CONTEXT': {
'client': { 'client': {
'clientName': 'IOS', 'clientName': 'IOS',
'clientVersion': '19.45.4', 'clientVersion': '20.03.02',
'deviceMake': 'Apple', 'deviceMake': 'Apple',
'deviceModel': 'iPhone16,2', 'deviceModel': 'iPhone16,2',
'userAgent': 'com.google.ios.youtube/19.45.4 (iPhone16,2; U; CPU iOS 18_1_0 like Mac OS X;)', 'userAgent': 'com.google.ios.youtube/20.03.02 (iPhone16,2; U; CPU iOS 18_2_1 like Mac OS X;)',
'osName': 'iPhone', 'osName': 'iPhone',
'osVersion': '18.1.0.22B83', 'osVersion': '18.2.1.22C161',
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 5, 'INNERTUBE_CONTEXT_CLIENT_NAME': 5,
'REQUIRE_PO_TOKEN': True, 'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'REQUIRE_JS_PLAYER': False, 'REQUIRE_JS_PLAYER': False,
}, },
# This client now requires sign-in for every video # This client now requires sign-in for every video
@ -229,6 +239,7 @@ INNERTUBE_CLIENTS = {
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 26, 'INNERTUBE_CONTEXT_CLIENT_NAME': 26,
'REQUIRE_JS_PLAYER': False, 'REQUIRE_JS_PLAYER': False,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'REQUIRE_AUTH': True, 'REQUIRE_AUTH': True,
}, },
# This client now requires sign-in for every video # This client now requires sign-in for every video
@ -246,6 +257,7 @@ INNERTUBE_CLIENTS = {
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 15, 'INNERTUBE_CONTEXT_CLIENT_NAME': 15,
'REQUIRE_JS_PLAYER': False, 'REQUIRE_JS_PLAYER': False,
'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'REQUIRE_AUTH': True, 'REQUIRE_AUTH': True,
}, },
# mweb has 'ultralow' formats # mweb has 'ultralow' formats
@ -260,14 +272,15 @@ INNERTUBE_CLIENTS = {
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 2, 'INNERTUBE_CONTEXT_CLIENT_NAME': 2,
'REQUIRE_PO_TOKEN': True, 'PO_TOKEN_REQUIRED_CONTEXTS': [_PoTokenContext.GVS],
'SUPPORTS_COOKIES': True, 'SUPPORTS_COOKIES': True,
}, },
'tv': { 'tv': {
'INNERTUBE_CONTEXT': { 'INNERTUBE_CONTEXT': {
'client': { 'client': {
'clientName': 'TVHTML5', 'clientName': 'TVHTML5',
'clientVersion': '7.20241201.18.00', 'clientVersion': '7.20250120.19.00',
'userAgent': 'Mozilla/5.0 (ChromiumStylePlatform) Cobalt/Version',
}, },
}, },
'INNERTUBE_CONTEXT_CLIENT_NAME': 7, 'INNERTUBE_CONTEXT_CLIENT_NAME': 7,
@ -313,7 +326,7 @@ def build_innertube_clients():
for client, ytcfg in tuple(INNERTUBE_CLIENTS.items()): for client, ytcfg in tuple(INNERTUBE_CLIENTS.items()):
ytcfg.setdefault('INNERTUBE_HOST', 'www.youtube.com') ytcfg.setdefault('INNERTUBE_HOST', 'www.youtube.com')
ytcfg.setdefault('REQUIRE_JS_PLAYER', True) ytcfg.setdefault('REQUIRE_JS_PLAYER', True)
ytcfg.setdefault('REQUIRE_PO_TOKEN', False) ytcfg.setdefault('PO_TOKEN_REQUIRED_CONTEXTS', [])
ytcfg.setdefault('REQUIRE_AUTH', False) ytcfg.setdefault('REQUIRE_AUTH', False)
ytcfg.setdefault('SUPPORTS_COOKIES', False) ytcfg.setdefault('SUPPORTS_COOKIES', False)
ytcfg.setdefault('PLAYER_PARAMS', None) ytcfg.setdefault('PLAYER_PARAMS', None)
@ -849,11 +862,15 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
'web': 'https://www.youtube.com', 'web': 'https://www.youtube.com',
'web_music': 'https://music.youtube.com', 'web_music': 'https://music.youtube.com',
'web_embedded': f'https://www.youtube.com/embed/{video_id}?html5=1', 'web_embedded': f'https://www.youtube.com/embed/{video_id}?html5=1',
'tv': 'https://www.youtube.com/tv',
}.get(client) }.get(client)
if not url: if not url:
return {} return {}
webpage = self._download_webpage( webpage = self._download_webpage(
url, video_id, fatal=False, note=f'Downloading {client.replace("_", " ").strip()} client config') url, video_id, fatal=False, note=f'Downloading {client.replace("_", " ").strip()} client config',
headers=traverse_obj(self._get_default_ytcfg(client), {
'User-Agent': ('INNERTUBE_CONTEXT', 'client', 'userAgent', {str}),
}))
return self.extract_ytcfg(video_id, webpage) or {} return self.extract_ytcfg(video_id, webpage) or {}
@staticmethod @staticmethod
@ -1423,8 +1440,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'401': {'ext': 'mp4', 'height': 2160, 'format_note': 'DASH video', 'vcodec': 'av01.0.12M.08'}, '401': {'ext': 'mp4', 'height': 2160, 'format_note': 'DASH video', 'vcodec': 'av01.0.12M.08'},
} }
_SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'vtt') _SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'vtt')
_DEFAULT_CLIENTS = ('ios', 'tv') _DEFAULT_CLIENTS = ('tv', 'ios', 'web')
_DEFAULT_AUTHED_CLIENTS = ('web_creator', 'tv') _DEFAULT_AUTHED_CLIENTS = ('tv', 'web')
_GEO_BYPASS = False _GEO_BYPASS = False
@ -2629,16 +2646,17 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'timestamp': 1657627949, 'timestamp': 1657627949,
'release_date': '20220712', 'release_date': '20220712',
'channel_url': 'https://www.youtube.com/channel/UCSJ4gkVC6NrvII8umztf0Ow', 'channel_url': 'https://www.youtube.com/channel/UCSJ4gkVC6NrvII8umztf0Ow',
'description': 'md5:13a6f76df898f5674f9127139f3df6f7', 'description': 'md5:452d5c82f72bb7e62a4e0297c3f01c23',
'age_limit': 0, 'age_limit': 0,
'thumbnail': 'https://i.ytimg.com/vi/jfKfPfyJRdk/maxresdefault.jpg', 'thumbnail': 'https://i.ytimg.com/vi/jfKfPfyJRdk/maxresdefault.jpg',
'release_timestamp': 1657641570, 'release_timestamp': 1657641570,
'uploader_url': 'https://www.youtube.com/@LofiGirl', 'uploader_url': 'https://www.youtube.com/@LofiGirl',
'channel_follower_count': int, 'channel_follower_count': int,
'channel_is_verified': True, 'channel_is_verified': True,
'title': r're:^lofi hip hop radio 📚 - beats to relax/study to', 'title': r're:^lofi hip hop radio 📚 beats to relax/study to',
'view_count': int, 'view_count': int,
'live_status': 'is_live', 'live_status': 'is_live',
'media_type': 'livestream',
'tags': 'count:32', 'tags': 'count:32',
'channel': 'Lofi Girl', 'channel': 'Lofi Girl',
'availability': 'public', 'availability': 'public',
@ -2799,6 +2817,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'skip': 'Age-restricted; requires authentication', 'skip': 'Age-restricted; requires authentication',
}, },
{ {
'note': 'Support /live/ URL + media type for post-live content',
'url': 'https://www.youtube.com/live/qVv6vCqciTM', 'url': 'https://www.youtube.com/live/qVv6vCqciTM',
'info_dict': { 'info_dict': {
'id': 'qVv6vCqciTM', 'id': 'qVv6vCqciTM',
@ -2821,6 +2840,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'channel_id': 'UCIdEIHpS0TdkqRkHL5OkLtA', 'channel_id': 'UCIdEIHpS0TdkqRkHL5OkLtA',
'categories': ['Entertainment'], 'categories': ['Entertainment'],
'live_status': 'was_live', 'live_status': 'was_live',
'media_type': 'livestream',
'release_timestamp': 1671793345, 'release_timestamp': 1671793345,
'channel': 'さなちゃんねる', 'channel': 'さなちゃんねる',
'description': 'md5:6aebf95cc4a1d731aebc01ad6cc9806d', 'description': 'md5:6aebf95cc4a1d731aebc01ad6cc9806d',
@ -3833,53 +3853,105 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
**cls._get_checkok_params(), **cls._get_checkok_params(),
} }
def _get_config_po_token(self, client): def _get_config_po_token(self, client: str, context: _PoTokenContext):
po_token_strs = self._configuration_arg('po_token', [], ie_key=YoutubeIE, casesense=True) po_token_strs = self._configuration_arg('po_token', [], ie_key=YoutubeIE, casesense=True)
for token_str in po_token_strs: for token_str in po_token_strs:
po_token_client, sep, po_token = token_str.partition('+') po_token_meta, sep, po_token = token_str.partition('+')
if not sep: if not sep:
self.report_warning( self.report_warning(
f'Invalid po_token configuration format. Expected "client+po_token", got "{token_str}"', only_once=True) f'Invalid po_token configuration format. '
f'Expected "CLIENT.CONTEXT+PO_TOKEN", got "{token_str}"', only_once=True)
continue continue
if po_token_client == client:
return po_token
def fetch_po_token(self, client='web', visitor_data=None, data_sync_id=None, player_url=None, **kwargs): po_token_client, sep, po_token_context = po_token_meta.partition('.')
# PO Token is bound to visitor_data / Visitor ID when logged out. Must have visitor_data for it to function. if po_token_client.lower() != client:
if not visitor_data and not self.is_authenticated and player_url: continue
if not sep:
# TODO(future): deprecate the old format?
self.write_debug(
f'po_token configuration for {client} client is missing a context; assuming GVS. '
'You can provide a context with the format "CLIENT.CONTEXT+PO_TOKEN"',
only_once=True)
po_token_context = _PoTokenContext.GVS.value
if po_token_context.lower() != context.value:
continue
# Clean and validate the PO Token. This will strip invalid characters off
# (e.g. additional url params the user may accidentally include)
try:
return base64.urlsafe_b64encode(base64.urlsafe_b64decode(urllib.parse.unquote(po_token))).decode()
except (binascii.Error, ValueError):
self.report_warning( self.report_warning(
f'Unable to fetch PO Token for {client} client: Missing required Visitor Data. ' f'Invalid po_token configuration for {client} client: '
f'{po_token_context} PO Token should be a base64url-encoded string.',
only_once=True)
continue
def fetch_po_token(self, client='web', context=_PoTokenContext.GVS, ytcfg=None, visitor_data=None,
data_sync_id=None, session_index=None, player_url=None, video_id=None, **kwargs):
"""
Fetch a PO Token for a given client and context. This function will validate required parameters for a given context and client.
EXPERIMENTAL: This method is unstable and may change or be removed without notice.
@param client: The client to fetch the PO Token for.
@param context: The context in which the PO Token is used.
@param ytcfg: The ytcfg for the client.
@param visitor_data: visitor data.
@param data_sync_id: data sync ID.
@param session_index: session index.
@param player_url: player URL.
@param video_id: video ID.
@param kwargs: Additional arguments to pass down. May be more added in the future.
@return: The fetched PO Token. None if it could not be fetched.
"""
# GVS WebPO Token is bound to visitor_data / Visitor ID when logged out.
# Must have visitor_data for it to function.
if player_url and context == _PoTokenContext.GVS and not visitor_data and not self.is_authenticated:
self.report_warning(
f'Unable to fetch GVS PO Token for {client} client: Missing required Visitor Data. '
f'You may need to pass Visitor Data with --extractor-args "youtube:visitor_data=XXX"') f'You may need to pass Visitor Data with --extractor-args "youtube:visitor_data=XXX"')
return return
config_po_token = self._get_config_po_token(client) if context == _PoTokenContext.PLAYER and not video_id:
if config_po_token:
# PO token is bound to data_sync_id / account Session ID when logged in. However, for the config po_token,
# if using first channel in an account then we don't need the data_sync_id anymore...
if not data_sync_id and self.is_authenticated and player_url:
self.report_warning( self.report_warning(
f'Got a PO Token for {client} client, but missing Data Sync ID for account. Formats may not work.' f'Unable to fetch Player PO Token for {client} client: Missing required Video ID')
return
config_po_token = self._get_config_po_token(client, context)
if config_po_token:
# GVS WebPO token is bound to data_sync_id / account Session ID when logged in.
if player_url and context == _PoTokenContext.GVS and not data_sync_id and self.is_authenticated:
self.report_warning(
f'Got a GVS PO Token for {client} client, but missing Data Sync ID for account. Formats may not work.'
f'You may need to pass a Data Sync ID with --extractor-args "youtube:data_sync_id=XXX"') f'You may need to pass a Data Sync ID with --extractor-args "youtube:data_sync_id=XXX"')
return config_po_token return config_po_token
# Require PO Token if logged in for external fetching # Require GVS WebPO Token if logged in for external fetching
if not data_sync_id and self.is_authenticated and player_url: if player_url and context == _PoTokenContext.GVS and not data_sync_id and self.is_authenticated:
self.report_warning( self.report_warning(
f'Unable to fetch PO Token for {client} client: Missing required Data Sync ID for account. ' f'Unable to fetch GVS PO Token for {client} client: Missing required Data Sync ID for account. '
f'You may need to pass a Data Sync ID with --extractor-args "youtube:data_sync_id=XXX"') f'You may need to pass a Data Sync ID with --extractor-args "youtube:data_sync_id=XXX"')
return return
return self._fetch_po_token( return self._fetch_po_token(
client=client, client=client,
context=context.value,
ytcfg=ytcfg,
visitor_data=visitor_data, visitor_data=visitor_data,
data_sync_id=data_sync_id, data_sync_id=data_sync_id,
session_index=session_index,
player_url=player_url, player_url=player_url,
video_id=video_id,
**kwargs, **kwargs,
) )
def _fetch_po_token(self, client, visitor_data=None, data_sync_id=None, player_url=None, **kwargs): def _fetch_po_token(self, client, **kwargs):
"""External PO Token fetch stub""" """(Unstable) External PO Token fetch stub"""
@staticmethod @staticmethod
def _is_agegated(player_response): def _is_agegated(player_response):
@ -3960,16 +4032,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if not requested_clients: if not requested_clients:
raise ExtractorError('No player clients have been requested', expected=True) raise ExtractorError('No player clients have been requested', expected=True)
if smuggled_data.get('is_music_url') or self.is_music_url(url):
for requested_client in requested_clients:
_, base_client, variant = _split_innertube_client(requested_client)
music_client = f'{base_client}_music' if base_client != 'mweb' else 'web_music'
if variant != 'music' and music_client in INNERTUBE_CLIENTS:
client_info = INNERTUBE_CLIENTS[music_client]
if not client_info['REQUIRE_AUTH'] or (self.is_authenticated and client_info['SUPPORTS_COOKIES']):
requested_clients.append(music_client)
if self.is_authenticated: if self.is_authenticated:
if (smuggled_data.get('is_music_url') or self.is_music_url(url)) and 'web_music' not in requested_clients:
requested_clients.append('web_music')
unsupported_clients = [ unsupported_clients = [
client for client in requested_clients if not INNERTUBE_CLIENTS[client]['SUPPORTS_COOKIES'] client for client in requested_clients if not INNERTUBE_CLIENTS[client]['SUPPORTS_COOKIES']
] ]
@ -4036,17 +4102,47 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
visitor_data = visitor_data or self._extract_visitor_data(master_ytcfg, initial_pr, player_ytcfg) visitor_data = visitor_data or self._extract_visitor_data(master_ytcfg, initial_pr, player_ytcfg)
data_sync_id = data_sync_id or self._extract_data_sync_id(master_ytcfg, initial_pr, player_ytcfg) data_sync_id = data_sync_id or self._extract_data_sync_id(master_ytcfg, initial_pr, player_ytcfg)
po_token = self.fetch_po_token(
client=client, visitor_data=visitor_data,
data_sync_id=data_sync_id if self.is_authenticated else None,
player_url=player_url if require_js_player else None,
)
require_po_token = self._get_default_ytcfg(client).get('REQUIRE_PO_TOKEN') fetch_po_token_args = {
if not po_token and require_po_token and 'missing_pot' in self._configuration_arg('formats'): 'client': client,
'visitor_data': visitor_data,
'video_id': video_id,
'data_sync_id': data_sync_id if self.is_authenticated else None,
'player_url': player_url if require_js_player else None,
'session_index': self._extract_session_index(master_ytcfg, player_ytcfg),
'ytcfg': player_ytcfg,
}
player_po_token = self.fetch_po_token(
context=_PoTokenContext.PLAYER, **fetch_po_token_args)
gvs_po_token = self.fetch_po_token(
context=_PoTokenContext.GVS, **fetch_po_token_args)
required_pot_contexts = self._get_default_ytcfg(client)['PO_TOKEN_REQUIRED_CONTEXTS']
if (
not player_po_token
and _PoTokenContext.PLAYER in required_pot_contexts
):
# TODO: may need to skip player response request. Unsure yet..
self.report_warning( self.report_warning(
f'No PO Token provided for {client} client, ' f'No Player PO Token provided for {client} client, '
f'which may be required for working {client} formats. This client will be deprioritized', only_once=True) f'which may be required for working {client} formats. This client will be deprioritized'
f'You can manually pass a Player PO Token for this client with --extractor-args "youtube:po_token={client}.player+XXX". '
f'For more information, refer to {PO_TOKEN_GUIDE_URL} .', only_once=True)
deprioritize_pr = True
if (
not gvs_po_token
and _PoTokenContext.GVS in required_pot_contexts
and 'missing_pot' in self._configuration_arg('formats')
):
# note: warning with help message is provided later during format processing
self.report_warning(
f'No GVS PO Token provided for {client} client, '
f'which may be required for working {client} formats. This client will be deprioritized',
only_once=True)
deprioritize_pr = True deprioritize_pr = True
pr = initial_pr if client == 'web' else None pr = initial_pr if client == 'web' else None
@ -4059,7 +4155,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
initial_pr=initial_pr, initial_pr=initial_pr,
visitor_data=visitor_data, visitor_data=visitor_data,
data_sync_id=data_sync_id, data_sync_id=data_sync_id,
po_token=po_token) po_token=player_po_token)
except ExtractorError as e: except ExtractorError as e:
self.report_warning(e) self.report_warning(e)
continue continue
@ -4070,36 +4166,24 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# Save client name for introspection later # Save client name for introspection later
sd = traverse_obj(pr, ('streamingData', {dict})) or {} sd = traverse_obj(pr, ('streamingData', {dict})) or {}
sd[STREAMING_DATA_CLIENT_NAME] = client sd[STREAMING_DATA_CLIENT_NAME] = client
sd[STREAMING_DATA_PO_TOKEN] = po_token sd[STREAMING_DATA_INITIAL_PO_TOKEN] = gvs_po_token
for f in traverse_obj(sd, (('formats', 'adaptiveFormats'), ..., {dict})): for f in traverse_obj(sd, (('formats', 'adaptiveFormats'), ..., {dict})):
f[STREAMING_DATA_CLIENT_NAME] = client f[STREAMING_DATA_CLIENT_NAME] = client
f[STREAMING_DATA_PO_TOKEN] = po_token f[STREAMING_DATA_INITIAL_PO_TOKEN] = gvs_po_token
if deprioritize_pr: if deprioritize_pr:
deprioritized_prs.append(pr) deprioritized_prs.append(pr)
else: else:
prs.append(pr) prs.append(pr)
# web_embedded can work around age-gate and age-verification for some embeddable videos
if self._is_agegated(pr) and variant != 'web_embedded':
append_client(f'web_embedded.{base_client}')
# Unauthenticated users will only get web_embedded client formats if age-gated
if self._is_agegated(pr) and not self.is_authenticated:
self.to_screen(
f'{video_id}: This video is age-restricted; some formats may be missing '
f'without authentication. {self._login_hint()}', only_once=True)
''' This code is pointless while web_creator is in _DEFAULT_AUTHED_CLIENTS
# EU countries require age-verification for accounts to access age-restricted videos # EU countries require age-verification for accounts to access age-restricted videos
# If account is not age-verified, _is_agegated() will be truthy for non-embedded clients # If account is not age-verified, _is_agegated() will be truthy for non-embedded clients
embedding_is_disabled = variant == 'web_embedded' and self._is_unplayable(pr) if self.is_authenticated and self._is_agegated(pr):
if self.is_authenticated and (self._is_agegated(pr) or embedding_is_disabled):
self.to_screen( self.to_screen(
f'{video_id}: This video is age-restricted and YouTube is requiring ' f'{video_id}: This video is age-restricted and YouTube is requiring '
'account age-verification; some formats may be missing', only_once=True) 'account age-verification; some formats may be missing', only_once=True)
# web_creator can work around the age-verification requirement # tv_embedded can work around the age-verification requirement for embeddable videos
# tv_embedded may(?) still work around age-verification if the video is embeddable # web_creator may work around age-verification for all videos but requires PO token
append_client('web_creator') append_client('tv_embedded', 'web_creator')
'''
prs.extend(deprioritized_prs) prs.extend(deprioritized_prs)
@ -4121,10 +4205,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
def _report_pot_format_skipped(self, video_id, client_name, proto): def _report_pot_format_skipped(self, video_id, client_name, proto):
msg = ( msg = (
f'{video_id}: {client_name} client {proto} formats require a PO Token which was not provided. ' f'{video_id}: {client_name} client {proto} formats require a GVS PO Token which was not provided. '
'They will be skipped as they may yield HTTP Error 403. ' 'They will be skipped as they may yield HTTP Error 403. '
f'You can manually pass a PO Token for this client with --extractor-args "youtube:po_token={client_name}+XXX". ' f'You can manually pass a GVS PO Token for this client with --extractor-args "youtube:po_token={client_name}.gvs+XXX". '
'For more information, refer to https://github.com/yt-dlp/yt-dlp/wiki/Extractors#po-token-guide . ' f'For more information, refer to {PO_TOKEN_GUIDE_URL} . '
'To enable these broken formats anyway, pass --extractor-args "youtube:formats=missing_pot"') 'To enable these broken formats anyway, pass --extractor-args "youtube:formats=missing_pot"')
# Only raise a warning for non-default clients, to not confuse users. # Only raise a warning for non-default clients, to not confuse users.
@ -4254,13 +4338,17 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
f'{video_id}: Some formats are possibly damaged. They will be deprioritized', only_once=True) f'{video_id}: Some formats are possibly damaged. They will be deprioritized', only_once=True)
client_name = fmt[STREAMING_DATA_CLIENT_NAME] client_name = fmt[STREAMING_DATA_CLIENT_NAME]
po_token = fmt.get(STREAMING_DATA_PO_TOKEN) po_token = fmt.get(STREAMING_DATA_INITIAL_PO_TOKEN)
if po_token: if po_token:
fmt_url = update_url_query(fmt_url, {'pot': po_token}) fmt_url = update_url_query(fmt_url, {'pot': po_token})
# Clients that require PO Token return videoplayback URLs that may return 403 # Clients that require PO Token return videoplayback URLs that may return 403
require_po_token = (not po_token and self._get_default_ytcfg(client_name).get('REQUIRE_PO_TOKEN')) require_po_token = (
not po_token
and _PoTokenContext.GVS in self._get_default_ytcfg(client_name)['PO_TOKEN_REQUIRED_CONTEXTS']
and itag not in ['18']) # these formats do not require PO Token
if require_po_token and 'missing_pot' not in self._configuration_arg('formats'): if require_po_token and 'missing_pot' not in self._configuration_arg('formats'):
self._report_pot_format_skipped(video_id, client_name, 'https') self._report_pot_format_skipped(video_id, client_name, 'https')
continue continue
@ -4349,7 +4437,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# Clients that require PO Token return videoplayback URLs that may return 403 # Clients that require PO Token return videoplayback URLs that may return 403
# hls does not currently require PO Token # hls does not currently require PO Token
if (not po_token and self._get_default_ytcfg(client_name).get('REQUIRE_PO_TOKEN')) and proto != 'hls': if (
not po_token
and _PoTokenContext.GVS in self._get_default_ytcfg(client_name)['PO_TOKEN_REQUIRED_CONTEXTS']
and proto != 'hls'
):
if 'missing_pot' not in self._configuration_arg('formats'): if 'missing_pot' not in self._configuration_arg('formats'):
self._report_pot_format_skipped(video_id, client_name, proto) self._report_pot_format_skipped(video_id, client_name, proto)
return False return False
@ -4390,7 +4482,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
subtitles = {} subtitles = {}
for sd in streaming_data: for sd in streaming_data:
client_name = sd[STREAMING_DATA_CLIENT_NAME] client_name = sd[STREAMING_DATA_CLIENT_NAME]
po_token = sd.get(STREAMING_DATA_PO_TOKEN) po_token = sd.get(STREAMING_DATA_INITIAL_PO_TOKEN)
hls_manifest_url = 'hls' not in skip_manifests and sd.get('hlsManifestUrl') hls_manifest_url = 'hls' not in skip_manifests and sd.get('hlsManifestUrl')
if hls_manifest_url: if hls_manifest_url:
if po_token: if po_token:
@ -4717,6 +4809,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'tags': keywords, 'tags': keywords,
'playable_in_embed': get_first(playability_statuses, 'playableInEmbed'), 'playable_in_embed': get_first(playability_statuses, 'playableInEmbed'),
'live_status': live_status, 'live_status': live_status,
'media_type': 'livestream' if get_first(video_details, 'isLiveContent') else None,
'release_timestamp': live_start_time, 'release_timestamp': live_start_time,
'_format_sort_fields': ( # source_preference is lower for potentially damaged formats '_format_sort_fields': ( # source_preference is lower for potentially damaged formats
'quality', 'res', 'fps', 'hdr:12', 'source', 'vcodec', 'channels', 'acodec', 'lang', 'proto'), 'quality', 'res', 'fps', 'hdr:12', 'source', 'vcodec', 'channels', 'acodec', 'lang', 'proto'),
@ -5281,10 +5374,12 @@ class YoutubeTabBaseInfoExtractor(YoutubeBaseInfoExtractor):
yield self.url_result( yield self.url_result(
f'https://www.youtube.com/shorts/{video_id}', f'https://www.youtube.com/shorts/{video_id}',
ie=YoutubeIE, video_id=video_id, ie=YoutubeIE, video_id=video_id,
**traverse_obj(renderer, ('overlayMetadata', { **traverse_obj(renderer, {
'title': ('primaryText', 'content', {str}), 'title': ((
'view_count': ('secondaryText', 'content', {parse_count}), ('overlayMetadata', 'primaryText', 'content', {str}),
})), ('accessibilityText', {lambda x: re.fullmatch(r'(.+), (?:[\d,.]+(?:[KM]| million)?|No) views? - play Short', x)}, 1)), any),
'view_count': ('overlayMetadata', 'secondaryText', 'content', {parse_count}),
}),
thumbnails=self._extract_thumbnails(renderer, 'thumbnail', final_key='sources')) thumbnails=self._extract_thumbnails(renderer, 'thumbnail', final_key='sources'))
return return

View File

@ -5,7 +5,6 @@ from ..utils import (
NO_DEFAULT, NO_DEFAULT,
ExtractorError, ExtractorError,
determine_ext, determine_ext,
extract_attributes,
float_or_none, float_or_none,
int_or_none, int_or_none,
join_nonempty, join_nonempty,
@ -25,6 +24,11 @@ class ZDFBaseIE(InfoExtractor):
_GEO_COUNTRIES = ['DE'] _GEO_COUNTRIES = ['DE']
_QUALITIES = ('auto', 'low', 'med', 'high', 'veryhigh', 'hd', 'fhd', 'uhd') _QUALITIES = ('auto', 'low', 'med', 'high', 'veryhigh', 'hd', 'fhd', 'uhd')
def _download_v2_doc(self, document_id):
return self._download_json(
f'https://zdf-prod-futura.zdf.de/mediathekV2/document/{document_id}',
document_id)
def _call_api(self, url, video_id, item, api_token=None, referrer=None): def _call_api(self, url, video_id, item, api_token=None, referrer=None):
headers = {} headers = {}
if api_token: if api_token:
@ -320,9 +324,7 @@ class ZDFIE(ZDFBaseIE):
return self._extract_entry(player['content'], player, content, video_id) return self._extract_entry(player['content'], player, content, video_id)
def _extract_mobile(self, video_id): def _extract_mobile(self, video_id):
video = self._download_json( video = self._download_v2_doc(video_id)
f'https://zdf-cdn.live.cellular.de/mediathekV2/document/{video_id}',
video_id)
formats = [] formats = []
formitaeten = try_get(video, lambda x: x['document']['formitaeten'], list) formitaeten = try_get(video, lambda x: x['document']['formitaeten'], list)
@ -374,7 +376,7 @@ class ZDFIE(ZDFBaseIE):
class ZDFChannelIE(ZDFBaseIE): class ZDFChannelIE(ZDFBaseIE):
_VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)' _VALID_URL = r'https?://www\.zdf\.de/(?:[^/?#]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{ _TESTS = [{
'url': 'https://www.zdf.de/sport/das-aktuelle-sportstudio', 'url': 'https://www.zdf.de/sport/das-aktuelle-sportstudio',
'info_dict': { 'info_dict': {
@ -387,18 +389,19 @@ class ZDFChannelIE(ZDFBaseIE):
'info_dict': { 'info_dict': {
'id': 'planet-e', 'id': 'planet-e',
'title': 'planet e.', 'title': 'planet e.',
'description': 'md5:87e3b9c66a63cf1407ee443d2c4eb88e',
}, },
'playlist_mincount': 50, 'playlist_mincount': 50,
}, { }, {
'url': 'https://www.zdf.de/gesellschaft/aktenzeichen-xy-ungeloest', 'url': 'https://www.zdf.de/gesellschaft/aktenzeichen-xy-ungeloest',
'info_dict': { 'info_dict': {
'id': 'aktenzeichen-xy-ungeloest', 'id': 'aktenzeichen-xy-ungeloest',
'title': 'Aktenzeichen XY... ungelöst', 'title': 'Aktenzeichen XY... Ungelöst',
'entries': "lambda x: not any('xy580-fall1-kindermoerder-gesucht-100' in e['url'] for e in x)", 'description': 'md5:623ede5819c400c6d04943fa8100e6e7',
}, },
'playlist_mincount': 2, 'playlist_mincount': 2,
}, { }, {
'url': 'https://www.zdf.de/filme/taunuskrimi/', 'url': 'https://www.zdf.de/serien/taunuskrimi/',
'only_matching': True, 'only_matching': True,
}] }]
@ -406,36 +409,41 @@ class ZDFChannelIE(ZDFBaseIE):
def suitable(cls, url): def suitable(cls, url):
return False if ZDFIE.suitable(url) else super().suitable(url) return False if ZDFIE.suitable(url) else super().suitable(url)
def _og_search_title(self, webpage, fatal=False): def _extract_entry(self, entry):
title = super()._og_search_title(webpage, fatal=fatal) return self.url_result(
return re.split(r'\s+[-|]\s+ZDF(?:mediathek)?$', title or '')[0] or None entry['sharingUrl'], ZDFIE, **traverse_obj(entry, {
'id': ('basename', {str}),
'title': ('titel', {str}),
'description': ('beschreibung', {str}),
'duration': ('length', {float_or_none}),
# TODO: seasonNumber and episodeNumber can be extracted but need to also be in ZDFIE
}))
def _entries(self, data, document_id):
for entry in traverse_obj(data, (
'cluster', lambda _, v: v['type'] == 'teaser',
# If 'brandId' differs, it is a 'You might also like' video. Filter these out
'teaser', lambda _, v: v['type'] == 'video' and v['brandId'] == document_id and v['sharingUrl'],
)):
yield self._extract_entry(entry)
def _real_extract(self, url): def _real_extract(self, url):
channel_id = self._match_id(url) channel_id = self._match_id(url)
webpage = self._download_webpage(url, channel_id) webpage = self._download_webpage(url, channel_id)
document_id = self._search_regex(
r'docId\s*:\s*(["\'])(?P<doc_id>(?:(?!\1).)+)\1', webpage, 'document id', group='doc_id')
data = self._download_v2_doc(document_id)
matches = re.finditer( main_video = traverse_obj(data, (
rf'''<div\b[^>]*?\sdata-plusbar-id\s*=\s*(["'])(?P<p_id>[\w-]+)\1[^>]*?\sdata-plusbar-url=\1(?P<url>{ZDFIE._VALID_URL})\1''', 'cluster', lambda _, v: v['type'] == 'teaserContent',
webpage) 'teaser', lambda _, v: v['type'] == 'video' and v['basename'] and v['sharingUrl'], any)) or {}
if self._downloader.params.get('noplaylist', False): if not self._yes_playlist(channel_id, main_video.get('basename')):
entry = next( return self._extract_entry(main_video)
(self.url_result(m.group('url'), ie=ZDFIE.ie_key()) for m in matches),
None)
self.to_screen('Downloading just the main video because of --no-playlist')
if entry:
return entry
else:
self.to_screen(f'Downloading playlist {channel_id} - add --no-playlist to download just the main video')
def check_video(m): return self.playlist_result(
v_ref = self._search_regex( self._entries(data, document_id), channel_id,
r'''(<a\b[^>]*?\shref\s*=[^>]+?\sdata-target-id\s*=\s*(["']){}\2[^>]*>)'''.format(m.group('p_id')), re.split(r'\s+[-|]\s+ZDF(?:mediathek)?$', self._og_search_title(webpage) or '')[0] or None,
webpage, 'check id', default='') join_nonempty(
v_ref = extract_attributes(v_ref) 'headline', 'text', delim='\n\n',
return v_ref.get('data-target-video-type') != 'novideo' from_dict=traverse_obj(data, ('shortText', {dict}), default={})) or None)
return self.playlist_from_matches(
(m.group('url') for m in matches if check_video(m)),
channel_id, self._og_search_title(webpage, fatal=False))

View File

@ -685,6 +685,7 @@ def _sanitize_path_parts(parts):
elif part == '..': elif part == '..':
if sanitized_parts and sanitized_parts[-1] != '..': if sanitized_parts and sanitized_parts[-1] != '..':
sanitized_parts.pop() sanitized_parts.pop()
else:
sanitized_parts.append('..') sanitized_parts.append('..')
continue continue
# Replace invalid segments with `#` # Replace invalid segments with `#`
@ -702,7 +703,8 @@ def sanitize_path(s, force=False):
if not force: if not force:
return s return s
root = '/' if s.startswith('/') else '' root = '/' if s.startswith('/') else ''
return root + '/'.join(_sanitize_path_parts(s.split('/'))) path = '/'.join(_sanitize_path_parts(s.split('/')))
return root + path if root or path else '.'
normed = s.replace('/', '\\') normed = s.replace('/', '\\')
@ -721,7 +723,8 @@ def sanitize_path(s, force=False):
root = '\\' if normed[:1] == '\\' else '' root = '\\' if normed[:1] == '\\' else ''
parts = normed.split('\\') parts = normed.split('\\')
return root + '\\'.join(_sanitize_path_parts(parts)) path = '\\'.join(_sanitize_path_parts(parts))
return root + path if root or path else '.'
def sanitize_url(url, *, scheme='http'): def sanitize_url(url, *, scheme='http'):
@ -5330,7 +5333,7 @@ class FormatSorter:
settings = { settings = {
'vcodec': {'type': 'ordered', 'regex': True, 'vcodec': {'type': 'ordered', 'regex': True,
'order': ['av0?1', 'vp0?9.0?2', 'vp0?9', '[hx]265|he?vc?', '[hx]264|avc', 'vp0?8', 'mp4v|h263', 'theora', '', None, 'none']}, 'order': ['av0?1', r'vp0?9\.0?2', 'vp0?9', '[hx]265|he?vc?', '[hx]264|avc', 'vp0?8', 'mp4v|h263', 'theora', '', None, 'none']},
'acodec': {'type': 'ordered', 'regex': True, 'acodec': {'type': 'ordered', 'regex': True,
'order': ['[af]lac', 'wav|aiff', 'opus', 'vorbis|ogg', 'aac', 'mp?4a?', 'mp3', 'ac-?4', 'e-?a?c-?3', 'ac-?3', 'dts', '', None, 'none']}, 'order': ['[af]lac', 'wav|aiff', 'opus', 'vorbis|ogg', 'aac', 'mp?4a?', 'mp3', 'ac-?4', 'e-?a?c-?3', 'ac-?3', 'dts', '', None, 'none']},
'hdr': {'type': 'ordered', 'regex': True, 'field': 'dynamic_range', 'hdr': {'type': 'ordered', 'regex': True, 'field': 'dynamic_range',

View File

@ -1,8 +1,8 @@
# Autogenerated by devscripts/update-version.py # Autogenerated by devscripts/update-version.py
__version__ = '2025.01.12' __version__ = '2025.01.26'
RELEASE_GIT_HEAD = 'dade5e35c89adaad04408bfef766820dbca06ebe' RELEASE_GIT_HEAD = '3b4531934465580be22937fecbb6e1a3a9e2334f'
VARIANT = None VARIANT = None
@ -12,4 +12,4 @@ CHANNEL = 'stable'
ORIGIN = 'yt-dlp/yt-dlp' ORIGIN = 'yt-dlp/yt-dlp'
_pkg_version = '2025.01.12' _pkg_version = '2025.01.26'