Merge remote-tracking branch 'upstream' into biliSearchPageIE

This commit is contained in:
grqx_wsl 2024-08-17 11:20:08 +12:00
commit eac8a89b47
72 changed files with 3136 additions and 1342 deletions

2
.gitignore vendored
View File

@ -51,7 +51,6 @@ cookies
*.srt
*.ssa
*.swf
*.swp
*.tt
*.ttml
*.url
@ -119,6 +118,7 @@ yt-dlp.zip
.vscode
*.sublime-*
*.code-workspace
*.swp
# Lazy extractors
*/extractor/lazy_extractors.py

View File

@ -644,3 +644,16 @@ peisenwang
TheZ3ro
tippfehlr
varunchopra
DrakoCpp
PatrykMis
DinhHuy2010
exterrestris
harbhim
LeSuisse
DunnesH
iancmy
mokrueger
luvyana
szantnerb
hugepower
scribblemaniac

View File

@ -4,10 +4,157 @@ # Changelog
# To create a release, dispatch the https://github.com/yt-dlp/yt-dlp/actions/workflows/release.yml workflow on master
-->
### 2024.08.06
#### Core changes
- **jsinterp**: [Improve `slice` implementation](https://github.com/yt-dlp/yt-dlp/commit/bb8bf1db993f59752d20b73b861bd55e40cf0e31) ([#10664](https://github.com/yt-dlp/yt-dlp/issues/10664)) by [seproDev](https://github.com/seproDev)
#### Extractor changes
- **discoveryplusitaly**: [Support sport and olympics URLs](https://github.com/yt-dlp/yt-dlp/commit/e7d73bc4531ee3f91a46b15e218dcc1fbeb6226c) ([#10655](https://github.com/yt-dlp/yt-dlp/issues/10655)) by [bashonly](https://github.com/bashonly)
- **gem.cbc.ca**: live: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/fc5eecfa31c9571b6031cc3968aaa0394be55d7a) ([#10565](https://github.com/yt-dlp/yt-dlp/issues/10565)) by [bashonly](https://github.com/bashonly), [scribblemaniac](https://github.com/scribblemaniac)
- **niconico**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/4d9231208332d4c32364b8cd814bff8b20232cae) ([#10677](https://github.com/yt-dlp/yt-dlp/issues/10677)) by [bashonly](https://github.com/bashonly)
- **olympics**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/919540a9644e55deb78cdd6751757ec8fdaf76f4) ([#10625](https://github.com/yt-dlp/yt-dlp/issues/10625)) by [bashonly](https://github.com/bashonly)
- **youku**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/0088c6de23d832b117061a33e984dc452d992e9c) ([#10626](https://github.com/yt-dlp/yt-dlp/issues/10626)) by [hugepower](https://github.com/hugepower)
- **youtube**
- [Change default player clients to `ios,web_creator`](https://github.com/yt-dlp/yt-dlp/commit/406f4c2e47502fffc1b0c210b4ee6487c89a44cb) ([#10674](https://github.com/yt-dlp/yt-dlp/issues/10674)) by [bashonly](https://github.com/bashonly)
- [Fix `n` function name extraction for player `b12cc44b`](https://github.com/yt-dlp/yt-dlp/commit/c86891eb9434b4d7eec426d38c0c625b5e13cb2f) ([#10668](https://github.com/yt-dlp/yt-dlp/issues/10668)) by [seproDev](https://github.com/seproDev)
### 2024.08.01
#### Core changes
- **utils**: `unified_timestamp`: [Recognize Sunday](https://github.com/yt-dlp/yt-dlp/commit/6daf2c27c0464fba98337be30de0b66d520d0db1) ([#10589](https://github.com/yt-dlp/yt-dlp/issues/10589)) by [bashonly](https://github.com/bashonly)
#### Extractor changes
- **abematv**: [Fix availability extraction](https://github.com/yt-dlp/yt-dlp/commit/ef36d517f9b05785d61abca7691d9ab7d63cc75c) ([#10569](https://github.com/yt-dlp/yt-dlp/issues/10569)) by [middlingphys](https://github.com/middlingphys)
- **cbc.ca**: player: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/94a1c5e642e468cebeb51f74c6c220434cb47d96) ([#10302](https://github.com/yt-dlp/yt-dlp/issues/10302)) by [bashonly](https://github.com/bashonly), [trainman261](https://github.com/trainman261)
- **discoveryplus**: [Support olympics URLs](https://github.com/yt-dlp/yt-dlp/commit/0b7728618417e1aa382722a4d29b916b594d4459) ([#10566](https://github.com/yt-dlp/yt-dlp/issues/10566)) by [bashonly](https://github.com/bashonly)
- **kick**: clips: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/bb3936ae2b3ce96d0b53f9e17cad1082058f032b) ([#10572](https://github.com/yt-dlp/yt-dlp/issues/10572)) by [luvyana](https://github.com/luvyana)
- **learningonscreen**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/fe15d3178e242803ae7a934b90137f13598eba2e) ([#10590](https://github.com/yt-dlp/yt-dlp/issues/10590)) by [bashonly](https://github.com/bashonly), [Grub4K](https://github.com/Grub4K)
- **mediaklikk**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/7e3e4779ad13e4511c9ba3869879e53f0267bd7a) ([#10605](https://github.com/yt-dlp/yt-dlp/issues/10605)) by [szantnerb](https://github.com/szantnerb)
- **mlbtv**: [Fix makeup game extraction](https://github.com/yt-dlp/yt-dlp/commit/4b69e1b53ea21e631cd5dd68ff531e2f1671ec17) ([#10607](https://github.com/yt-dlp/yt-dlp/issues/10607)) by [bashonly](https://github.com/bashonly)
- **olympics**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/2f1ddfe12a2c174bc777264c5c8ffe7ca0922d94) ([#10604](https://github.com/yt-dlp/yt-dlp/issues/10604)) by [bashonly](https://github.com/bashonly)
- **tva**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/28d485714fef88937c82635438afba5db81f9089) ([#10567](https://github.com/yt-dlp/yt-dlp/issues/10567)) by [bashonly](https://github.com/bashonly)
- **tver**: [Support olympic URLs](https://github.com/yt-dlp/yt-dlp/commit/5260696b1cba77161828941fdb38f09f14ac6c60) ([#10600](https://github.com/yt-dlp/yt-dlp/issues/10600)) by [vvto33](https://github.com/vvto33)
- **vimeo**: review: [Fix password-protected video extraction](https://github.com/yt-dlp/yt-dlp/commit/2b6df93a243bdfb9d6bb5c1e18020625cd02d465) ([#10598](https://github.com/yt-dlp/yt-dlp/issues/10598)) by [bashonly](https://github.com/bashonly)
- **youtube**
- [Change default player clients to `ios,tv`](https://github.com/yt-dlp/yt-dlp/commit/efb42763dec23ccf6a2e3bac3afbfefce8efd012) ([#10457](https://github.com/yt-dlp/yt-dlp/issues/10457)) by [seproDev](https://github.com/seproDev)
- [Fix `n` function name extraction for player `20dfca59`](https://github.com/yt-dlp/yt-dlp/commit/011b4a04db2a636c3ef0a0ad4e2d3ae482c9fd76) ([#10611](https://github.com/yt-dlp/yt-dlp/issues/10611)) by [bashonly](https://github.com/bashonly)
- [Fix age-verification workaround](https://github.com/yt-dlp/yt-dlp/commit/d19fcb934269465fd707e68a87f735ec6983e93d) ([#10610](https://github.com/yt-dlp/yt-dlp/issues/10610)) by [bashonly](https://github.com/bashonly), [Grub4K](https://github.com/Grub4K)
- [Player client maintenance](https://github.com/yt-dlp/yt-dlp/commit/0e539617a41913c7da1edd74fb6543c10ad727b3) ([#10573](https://github.com/yt-dlp/yt-dlp/issues/10573)) by [bashonly](https://github.com/bashonly)
#### Misc. changes
- **cleanup**: Miscellaneous: [ffd7781](https://github.com/yt-dlp/yt-dlp/commit/ffd7781d6588926f820b44a34b9e6e3068fb9f97) by [bashonly](https://github.com/bashonly)
### 2024.07.25
#### Extractor changes
- **abematv**: [Adapt key retrieval to request handler framework](https://github.com/yt-dlp/yt-dlp/commit/a3bab4752a2b3d56e5a59b4e0411bb8f695c010b) ([#10491](https://github.com/yt-dlp/yt-dlp/issues/10491)) by [bashonly](https://github.com/bashonly)
- **facebook**: [Fix extraction](https://github.com/yt-dlp/yt-dlp/commit/1a34a802f44a1dab8f642c79c3cc810e21541d3b) ([#10531](https://github.com/yt-dlp/yt-dlp/issues/10531)) by [bashonly](https://github.com/bashonly)
- **mlbtv**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/f0993391e6052ec8f7aacc286609564f226943b9) ([#10515](https://github.com/yt-dlp/yt-dlp/issues/10515)) by [bashonly](https://github.com/bashonly)
- **tiktok**: [Fix and deprioritize JSON subtitles](https://github.com/yt-dlp/yt-dlp/commit/2f97779f335ac069ecccd9c7bf81abf4a83cfe7a) ([#10516](https://github.com/yt-dlp/yt-dlp/issues/10516)) by [bashonly](https://github.com/bashonly)
- **vimeo**: [Fix chapters extraction](https://github.com/yt-dlp/yt-dlp/commit/a0a1bc3d8d8e3bb9a48a06e835815a0460e90e77) ([#10544](https://github.com/yt-dlp/yt-dlp/issues/10544)) by [bashonly](https://github.com/bashonly)
- **youtube**: [Fix `n` function name extraction for player `3400486c`](https://github.com/yt-dlp/yt-dlp/commit/713b4cd18f00556771af8cfdd9cea6cc1a09e948) ([#10542](https://github.com/yt-dlp/yt-dlp/issues/10542)) by [bashonly](https://github.com/bashonly)
#### Misc. changes
- **build**: [Pin `setuptools` version](https://github.com/yt-dlp/yt-dlp/commit/e046db8a116b1c320d4785daadd48ea0b22a3987) ([#10493](https://github.com/yt-dlp/yt-dlp/issues/10493)) by [bashonly](https://github.com/bashonly)
### 2024.07.16
#### Core changes
- [Fix `noprogress` if `test=True` with `--quiet` and `--verbose`](https://github.com/yt-dlp/yt-dlp/commit/66ce3d76d87af3f81cc9dfec4be4704016cb1cdb) ([#10454](https://github.com/yt-dlp/yt-dlp/issues/10454)) by [Grub4K](https://github.com/Grub4K)
- [Support `auto-tty` and `no_color-tty` for `--color`](https://github.com/yt-dlp/yt-dlp/commit/d9cbced493cae2008508d94a2db5dd98be7c01fc) ([#10453](https://github.com/yt-dlp/yt-dlp/issues/10453)) by [Grub4K](https://github.com/Grub4K)
- **update**: [Fix network error handling](https://github.com/yt-dlp/yt-dlp/commit/ed1b9ed93dd90d2cc960c0d8eaa9d919db224203) ([#10486](https://github.com/yt-dlp/yt-dlp/issues/10486)) by [bashonly](https://github.com/bashonly)
- **utils**: `parse_codecs`: [Fix parsing of mixed case codec strings](https://github.com/yt-dlp/yt-dlp/commit/cc0070f6496e501d77352bad475fb02d6a86846a) by [bashonly](https://github.com/bashonly)
#### Extractor changes
- **adn**: [Adjust for .com domain change](https://github.com/yt-dlp/yt-dlp/commit/959b7a379b8e5da059d110a63339c964b6265736) ([#10399](https://github.com/yt-dlp/yt-dlp/issues/10399)) by [infanf](https://github.com/infanf)
- **afreecatv**: [Fix login and use `legacy_ssl`](https://github.com/yt-dlp/yt-dlp/commit/4cd41469243624d90b7a2009b95cbe0609343efe) ([#10440](https://github.com/yt-dlp/yt-dlp/issues/10440)) by [bashonly](https://github.com/bashonly)
- **box**: [Support enterprise URLs](https://github.com/yt-dlp/yt-dlp/commit/705f5b84dec75cc7af97f42fd1530e8062735970) ([#10419](https://github.com/yt-dlp/yt-dlp/issues/10419)) by [seproDev](https://github.com/seproDev)
- **digitalconcerthall**: [Extract HEVC and FLAC formats](https://github.com/yt-dlp/yt-dlp/commit/e62fa6b0e0186f8c5666c2c5ab64cf191abdafc1) ([#10470](https://github.com/yt-dlp/yt-dlp/issues/10470)) by [bashonly](https://github.com/bashonly)
- **dplay**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/39e6c4cb44b9292e89ac0afec3cd0afc2ae8775f) ([#10471](https://github.com/yt-dlp/yt-dlp/issues/10471)) by [bashonly](https://github.com/bashonly)
- **epidemicsound**: [Support sound effects URLs](https://github.com/yt-dlp/yt-dlp/commit/8531d2b03bac9cc746f2ee8098aaf8f115505f5b) ([#10436](https://github.com/yt-dlp/yt-dlp/issues/10436)) by [iancmy](https://github.com/iancmy)
- **generic**: [Fix direct video link extensions](https://github.com/yt-dlp/yt-dlp/commit/b9afb99e7c34d0eb15ddc6689cd7d20eebfda68e) ([#10468](https://github.com/yt-dlp/yt-dlp/issues/10468)) by [bashonly](https://github.com/bashonly)
- **picarto**: [Fix extractors](https://github.com/yt-dlp/yt-dlp/commit/bacd18b7df08b4995644fd12cee1f8c8e8636bc7) ([#10414](https://github.com/yt-dlp/yt-dlp/issues/10414)) by [Frankgoji](https://github.com/Frankgoji)
- **soundcloud**: permalink, user: [Extract tracks only](https://github.com/yt-dlp/yt-dlp/commit/22870b81bad97dfa6307a7add44753b2dffc76a9) ([#10463](https://github.com/yt-dlp/yt-dlp/issues/10463)) by [DunnesH](https://github.com/DunnesH)
- **tiktok**: live: [Fix room ID extraction](https://github.com/yt-dlp/yt-dlp/commit/d2189d3d36987ebeac426fd70a60a5fe86325a2b) ([#10408](https://github.com/yt-dlp/yt-dlp/issues/10408)) by [mokrueger](https://github.com/mokrueger)
- **tv5monde**: [Support browser impersonation](https://github.com/yt-dlp/yt-dlp/commit/9b95a6765a5f6325af99c4aca961587f0c426e8c) ([#10417](https://github.com/yt-dlp/yt-dlp/issues/10417)) by [bashonly](https://github.com/bashonly) (With fixes in [cc1a309](https://github.com/yt-dlp/yt-dlp/commit/cc1a3098c00995c6aebc2a16bd1050a66bad64db))
- **youtube**
- [Avoid poToken experiment player responses](https://github.com/yt-dlp/yt-dlp/commit/8b8b442cb005a8d85315f301615f83fb736b967a) ([#10456](https://github.com/yt-dlp/yt-dlp/issues/10456)) by [seproDev](https://github.com/seproDev) (With fixes in [16da8ef](https://github.com/yt-dlp/yt-dlp/commit/16da8ef9937ff76632dfef02e5062c5ba99c8ea2))
- [Invalidate nsig cache from < 2024.07.09](https://github.com/yt-dlp/yt-dlp/commit/04e17ba20a139f1b3e30ec4bafa3fba26888f0b3) ([#10401](https://github.com/yt-dlp/yt-dlp/issues/10401)) by [bashonly](https://github.com/bashonly)
- [Reduce android client priority](https://github.com/yt-dlp/yt-dlp/commit/b85eef0a615a01304f88a3847309c667e09a20df) ([#10467](https://github.com/yt-dlp/yt-dlp/issues/10467)) by [seproDev](https://github.com/seproDev)
#### Networking changes
- [Add `legacy_ssl` request extension](https://github.com/yt-dlp/yt-dlp/commit/150ecc45d9cacc919550c13b04fd998ac5103a6b) ([#10448](https://github.com/yt-dlp/yt-dlp/issues/10448)) by [coletdjnz](https://github.com/coletdjnz)
- **Request Handler**: curl_cffi: [Support `curl_cffi` 0.7.X](https://github.com/yt-dlp/yt-dlp/commit/42bfca00a6b460fc053514cdd7ac6f5b5daddf0c) by [coletdjnz](https://github.com/coletdjnz)
#### Misc. changes
- **build**
- [Include `curl_cffi` in `yt-dlp_linux`](https://github.com/yt-dlp/yt-dlp/commit/4521f30d1479315cd5c3bf4abdad19391952df98) by [bashonly](https://github.com/bashonly)
- [Pin `curl-cffi` to 0.5.10 for Windows](https://github.com/yt-dlp/yt-dlp/commit/ac30941ae682f71eab010877c9a977736a61d3cf) by [bashonly](https://github.com/bashonly)
- **cleanup**: Miscellaneous: [89a161e](https://github.com/yt-dlp/yt-dlp/commit/89a161e8c62569a662deda1c948664152efcb6b4) by [bashonly](https://github.com/bashonly)
### 2024.07.09
#### Core changes
- [Do not alter default format selection when simulated](https://github.com/yt-dlp/yt-dlp/commit/0b570f2a90ce2363ba06089217514d644e7be2e0) ([#9862](https://github.com/yt-dlp/yt-dlp/issues/9862)) by [seproDev](https://github.com/seproDev)
#### Extractor changes
- **youtube**: [Remove broken `n` function extraction fallback](https://github.com/yt-dlp/yt-dlp/commit/7ead7332af69422cee931aec3faa277288e9e212) ([#10396](https://github.com/yt-dlp/yt-dlp/issues/10396)) by [pukkandan](https://github.com/pukkandan), [seproDev](https://github.com/seproDev)
### 2024.07.08
#### Core changes
- **jsinterp**: [Implement `Function.prototype` resolving for `call` and `apply`](https://github.com/yt-dlp/yt-dlp/commit/6c056ea7aeb03660281653a9668547f2548f194f) ([#10392](https://github.com/yt-dlp/yt-dlp/issues/10392)) by [Grub4K](https://github.com/Grub4K)
#### Extractor changes
- **soundcloud**: [Fix rate-limit handling](https://github.com/yt-dlp/yt-dlp/commit/4b50b292cc98534fb8c7cdf0ae5cb85862f7ebfc) ([#10389](https://github.com/yt-dlp/yt-dlp/issues/10389)) by [bashonly](https://github.com/bashonly)
- **youtube**: [Fix JS `n` function name extraction](https://github.com/yt-dlp/yt-dlp/commit/297b0a379282a15c80d82d51f3757c961db2dae1) ([#10390](https://github.com/yt-dlp/yt-dlp/issues/10390)) by [bashonly](https://github.com/bashonly), [seproDev](https://github.com/seproDev)
### 2024.07.07
#### Important changes
- Security: [[ie/douyutv] Do not use dangerous javascript source/URL](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-3v33-3wmw-3785)
- A dependency on potentially malicious third-party JavaScript code has been removed from the Douyu extractors
#### Core changes
- [Address gaps in allowed extensions](https://github.com/yt-dlp/yt-dlp/commit/2469119490d7e0397ebbf5c5ae327316f955eef2) ([#10362](https://github.com/yt-dlp/yt-dlp/issues/10362)) by [bashonly](https://github.com/bashonly)
- [Fix `--ignore-no-formats-error`](https://github.com/yt-dlp/yt-dlp/commit/cc767e9490056efaaa11c186b0d032e4b4969180) ([#10345](https://github.com/yt-dlp/yt-dlp/issues/10345)) by [Grub4K](https://github.com/Grub4K)
#### Extractor changes
- **abematv**: [Extract availability](https://github.com/yt-dlp/yt-dlp/commit/2a1a1b8e67e864289ac7ba5d05ec63dbb19a639f) ([#10348](https://github.com/yt-dlp/yt-dlp/issues/10348)) by [middlingphys](https://github.com/middlingphys)
- **chzzk**: [Extract with API v3](https://github.com/yt-dlp/yt-dlp/commit/4862a29854d4044120e3f97b52199711ad04bee1) ([#10363](https://github.com/yt-dlp/yt-dlp/issues/10363)) by [hui1601](https://github.com/hui1601)
- **douyutv**: [Do not use dangerous javascript source/URL](https://github.com/yt-dlp/yt-dlp/commit/6075a029dba70a89675ae1250e7cdfd91f0eba41) ([#10347](https://github.com/yt-dlp/yt-dlp/issues/10347)) by [LeSuisse](https://github.com/LeSuisse)
- **jiosaavn**: playlist: [Support featured playlists](https://github.com/yt-dlp/yt-dlp/commit/f0f867f008a1728f5f6ac1224b9e014b5d27f817) ([#10382](https://github.com/yt-dlp/yt-dlp/issues/10382)) by [harbhim](https://github.com/harbhim)
- **vidyard**: [Add extractor](https://github.com/yt-dlp/yt-dlp/commit/00766ece0c5c7a80781a4ff677198c5fb69d9dc0) ([#10155](https://github.com/yt-dlp/yt-dlp/issues/10155)) by [exterrestris](https://github.com/exterrestris)
- **vimeo**: [Fix password-protected video extraction](https://github.com/yt-dlp/yt-dlp/commit/c1c9bb4adb42d0d93a2fb5d93a7de0a87b6ba884) ([#10341](https://github.com/yt-dlp/yt-dlp/issues/10341)) by [bashonly](https://github.com/bashonly)
- **vtv**: [Add extractors](https://github.com/yt-dlp/yt-dlp/commit/987a1f94c24275f2b0cd82e719956687415dd732) ([#10173](https://github.com/yt-dlp/yt-dlp/issues/10173)) by [DinhHuy2010](https://github.com/DinhHuy2010)
- **yle_areena**
- [Fix metadata extraction](https://github.com/yt-dlp/yt-dlp/commit/4cdc976bd861b5835601ae402bef543eacd88f3d) ([#10380](https://github.com/yt-dlp/yt-dlp/issues/10380)) by [seproDev](https://github.com/seproDev)
- [Fix subtitle extraction](https://github.com/yt-dlp/yt-dlp/commit/0d174e8bed32081eb38ef7f5d1a1282ae154f517) ([#10379](https://github.com/yt-dlp/yt-dlp/issues/10379)) by [Grub4K](https://github.com/Grub4K)
#### Misc. changes
- **cleanup**: Miscellaneous: [b337d29](https://github.com/yt-dlp/yt-dlp/commit/b337d2989ce0614651d363383f6f743d977248ef) by [bashonly](https://github.com/bashonly)
### 2024.07.02
#### Core changes
- [Fix `--compat-opt allow-unsafe-ext`](https://github.com/yt-dlp/yt-dlp/commit/773bbb181506856ffda95496ab60c1c9603f1f71) ([#10336](https://github.com/yt-dlp/yt-dlp/issues/10336)) by [bashonly](https://github.com/bashonly), [rdamas](https://github.com/rdamas)
#### Extractor changes
- **banbye**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/7509791385ba88cb7ec0ab17e826681f4af4b66e) ([#10332](https://github.com/yt-dlp/yt-dlp/issues/10332)) by [PatrykMis](https://github.com/PatrykMis), [seproDev](https://github.com/seproDev)
- **murrtube**: [Fix extractor](https://github.com/yt-dlp/yt-dlp/commit/6403530e2dfe259a87afe444708c4f3024cc45b8) ([#9249](https://github.com/yt-dlp/yt-dlp/issues/9249)) by [DrakoCpp](https://github.com/DrakoCpp)
- **zaiko**: [Support JWT video URLs](https://github.com/yt-dlp/yt-dlp/commit/7799e518956387bb3c1064c9beae26eab8d5044a) ([#10130](https://github.com/yt-dlp/yt-dlp/issues/10130)) by [pzhlkj6612](https://github.com/pzhlkj6612)
#### Postprocessor changes
- **embedthumbnail**: [Fix embedding with mutagen](https://github.com/yt-dlp/yt-dlp/commit/d502f4c6d95b74896f40070d07229997f0850f31) ([#10337](https://github.com/yt-dlp/yt-dlp/issues/10337)) by [bashonly](https://github.com/bashonly)
#### Misc. changes
- **cleanup**: Miscellaneous: [93d33cb](https://github.com/yt-dlp/yt-dlp/commit/93d33cb29af9e2e84369ac43589d50ce8e0160ef) by [bashonly](https://github.com/bashonly)
### 2024.07.01
#### Important changes
- Security: [[CVE-2024-10123](https://nvd.nist.gov/vuln/detail/CVE-2024-10123)] [Properly sanitize file-extension to prevent file system modification and RCE](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-79w7-vh3h-8g4j)
- Security: [[CVE-2024-38519](https://nvd.nist.gov/vuln/detail/CVE-2024-38519)] [Properly sanitize file-extension to prevent file system modification and RCE](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-79w7-vh3h-8g4j)
- Unsafe extensions are now blocked from being downloaded
#### Core changes

View File

@ -21,7 +21,7 @@ clean-test:
rm -rf test/testdata/sigs/player-*.js tmp/ *.annotations.xml *.aria2 *.description *.dump *.frag \
*.frag.aria2 *.frag.urls *.info.json *.live_chat.json *.meta *.part* *.tmp *.temp *.unknown_video *.ytdl \
*.3gp *.ape *.ass *.avi *.desktop *.f4v *.flac *.flv *.gif *.jpeg *.jpg *.lrc *.m4a *.m4v *.mhtml *.mkv *.mov *.mp3 *.mp4 \
*.mpg *.mpga *.oga *.ogg *.opus *.png *.sbv *.srt *.ssa *.swf *.swp *.tt *.ttml *.url *.vtt *.wav *.webloc *.webm *.webp
*.mpg *.mpga *.oga *.ogg *.opus *.png *.sbv *.srt *.ssa *.swf *.tt *.ttml *.url *.vtt *.wav *.webloc *.webm *.webp
clean-dist:
rm -rf yt-dlp.1.temp.md yt-dlp.1 README.txt MANIFEST build/ dist/ .coverage cover/ yt-dlp.tar.gz completions/ \
yt_dlp/extractor/lazy_extractors.py *.spec CONTRIBUTING.md.tmp yt-dlp yt-dlp.exe yt_dlp.egg-info/ AUTHORS

View File

@ -202,7 +202,7 @@ #### Impersonation
* [**curl_cffi**](https://github.com/yifeikong/curl_cffi) (recommended) - Python binding for [curl-impersonate](https://github.com/lwthiker/curl-impersonate). Provides impersonation targets for Chrome, Edge and Safari. Licensed under [MIT](https://github.com/yifeikong/curl_cffi/blob/main/LICENSE)
* Can be installed with the `curl-cffi` group, e.g. `pip install "yt-dlp[default,curl-cffi]"`
* Currently only included in `yt-dlp.exe` and `yt-dlp_macos` builds
* Currently included in `yt-dlp.exe`, `yt-dlp_linux` and `yt-dlp_macos` builds
### Metadata
@ -368,7 +368,9 @@ ## General Options:
stderr) to apply the setting to. Can be one
of "always", "auto" (default), "never", or
"no_color" (use non color terminal
sequences). Can be used multiple times
sequences). Use "auto-tty" or "no_color-tty"
to decide based on terminal support only.
Can be used multiple times
--compat-options OPTS Options that can help keep compatibility
with youtube-dl or youtube-dlc
configurations by reverting some of the
@ -1756,7 +1758,7 @@ # Replace all spaces and "_" in title and uploader with a `-`
# EXTRACTOR ARGUMENTS
Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. E.g. `--extractor-args "youtube:player-client=android_embedded,web;formats=incomplete" --extractor-args "funimation:version=uncut"`
Some extractors accept additional arguments which can be passed using `--extractor-args KEY:ARGS`. `ARGS` is a `;` (semicolon) separated string of `ARG=VAL1,VAL2`. E.g. `--extractor-args "youtube:player-client=mediaconnect,web;formats=incomplete" --extractor-args "funimation:version=uncut"`
Note: In CLI, `ARG` can use `-` instead of `_`; e.g. `youtube:player-client"` becomes `youtube:player_client"`
@ -1765,7 +1767,7 @@ # EXTRACTOR ARGUMENTS
#### youtube
* `lang`: Prefer translated metadata (`title`, `description` etc) of this language code (case-sensitive). By default, the video primary language metadata is preferred, with a fallback to `en` translated. See [youtube.py](https://github.com/yt-dlp/yt-dlp/blob/c26f9b991a0681fd3ea548d535919cec1fbbd430/yt_dlp/extractor/youtube.py#L381-L390) for list of supported content language codes
* `skip`: One or more of `hls`, `dash` or `translated_subs` to skip extraction of the m3u8 manifests, dash manifests and [auto-translated subtitles](https://github.com/yt-dlp/yt-dlp/issues/4090#issuecomment-1158102032) respectively
* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music`, `_embedded`, `_embedscreen`, `_creator` (e.g. `web_embedded`); and `mediaconnect`, `mweb`, `mweb_embedscreen` and `tv_embedded` (agegate bypass) with no variants. By default, `ios,web` is used, but `tv_embedded` and `creator` variants are added as required for age-gated videos. Similarly, the music variants are added for `music.youtube.com` urls. The `android` clients will always be given lowest priority since their formats are broken. You can use `all` to use all the clients, and `default` for the default clients.
* `player_client`: Clients to extract video data from. The main clients are `web`, `ios` and `android`, with variants `_music` and `_creator` (e.g. `ios_creator`); and `mediaconnect`, `mweb`, `android_producer`, `android_testsuite`, `android_vr`, `web_safari`, `web_embedded`, `tv` and `tv_embedded` with no variants. By default, `ios,web_creator` is used, and `tv_embedded`, `web_creator` and `mediaconnect` are added as required for age-gated videos. Similarly, the music variants are added for `music.youtube.com` urls. Most `android` clients will be given lowest priority since their formats are broken. You can use `all` to use all the clients, and `default` for the default clients.
* `player_skip`: Skip some network requests that are generally needed for robust extraction. One or more of `configs` (skip client configs), `webpage` (skip initial webpage), `js` (skip js player). While these options can help reduce the number of requests needed or avoid some rate-limiting, they could cause some issues. See [#860](https://github.com/yt-dlp/yt-dlp/pull/860) for more details
* `player_params`: YouTube player parameters to use for player requests. Will overwrite any default ones set by yt-dlp.
* `comment_sort`: `top` or `new` (default) - choose comment sorting mode (on YouTube's side)
@ -1773,7 +1775,7 @@ #### youtube
* E.g. `all,all,1000,10` will get a maximum of 1000 replies total, with up to 10 replies per thread. `1000,all,100` will get a maximum of 1000 comments, with a maximum of 100 replies total
* `formats`: Change the types of formats to return. `dashy` (convert HTTP to DASH), `duplicate` (identical content but different URLs or protocol; includes `dashy`), `incomplete` (cannot be downloaded completely - live dash and post-live m3u8)
* `innertube_host`: Innertube API host to use for all API requests; e.g. `studio.youtube.com`, `youtubei.googleapis.com`. Note that cookies exported from one subdomain will not work on others
* `innertube_key`: Innertube API key to use for all API requests
* `innertube_key`: Innertube API key to use for all API requests. By default, no API key is used
* `raise_incomplete_data`: `Incomplete Data Received` raises an error instead of reporting a warning
#### youtubetab (YouTube playlists, channels, feeds, etc.)
@ -1859,6 +1861,9 @@ #### orfon (orf:on)
#### bilibili
* `prefer_multi_flv`: Prefer extracting flv formats over mp4 for older videos that still provide legacy formats
#### digitalconcerthall
* `prefer_combined_hls`: Prefer extracting combined/pre-merged video and audio HLS formats. This will exclude 4K/HEVC video and lossless/FLAC audio formats, which are only available as split video/audio HLS formats
**Note**: These options may be changed/removed in the future without concern for backward compatibility
<!-- MANPAGE: MOVE "INSTALLATION" SECTION HERE -->
@ -2219,12 +2224,13 @@ ### Differences in default behavior
* yt-dlp versions between 2021.11.10 and 2023.06.21 estimated `filesize_approx` values for fragmented/manifest formats. This was added for convenience in [f2fe69](https://github.com/yt-dlp/yt-dlp/commit/f2fe69c7b0d208bdb1f6292b4ae92bc1e1a7444a), but was reverted in [0dff8e](https://github.com/yt-dlp/yt-dlp/commit/0dff8e4d1e6e9fb938f4256ea9af7d81f42fd54f) due to the potentially extreme inaccuracy of the estimated values. Use `--compat-options manifest-filesize-approx` to keep extracting the estimated values
* yt-dlp uses modern http client backends such as `requests`. Use `--compat-options prefer-legacy-http-handler` to prefer the legacy http handler (`urllib`) to be used for standard http requests.
* The sub-modules `swfinterp`, `casefold` are removed.
* Passing `--simulate` (or calling `extract_info` with `download=False`) no longer alters the default format selection. See [#9843](https://github.com/yt-dlp/yt-dlp/issues/9843) for details.
For ease of use, a few more compat options are available:
* `--compat-options all`: Use all compat options (Do NOT use)
* `--compat-options youtube-dl`: Same as `--compat-options all,-multistreams,-playlist-match-filter,-manifest-filesize-approx`
* `--compat-options youtube-dlc`: Same as `--compat-options all,-no-live-chat,-no-youtube-channel-redirect,-playlist-match-filter,-manifest-filesize-approx`
* `--compat-options all`: Use all compat options (**Do NOT use this!**)
* `--compat-options youtube-dl`: Same as `--compat-options all,-multistreams,-playlist-match-filter,-manifest-filesize-approx,-allow-unsafe-ext`
* `--compat-options youtube-dlc`: Same as `--compat-options all,-no-live-chat,-no-youtube-channel-redirect,-playlist-match-filter,-manifest-filesize-approx,-allow-unsafe-ext`
* `--compat-options 2021`: Same as `--compat-options 2022,no-certifi,filename-sanitization,no-youtube-prefer-utc-upload-date`
* `--compat-options 2022`: Same as `--compat-options 2023,playlist-match-filter,no-external-downloader-progress,prefer-legacy-http-handler,manifest-filesize-approx`
* `--compat-options 2023`: Currently does nothing. Use this to enable all future compat options

View File

@ -2,7 +2,7 @@
set -e
source ~/.local/share/pipx/venvs/pyinstaller/bin/activate
python -m devscripts.install_deps --include secretstorage
python -m devscripts.install_deps --include secretstorage --include curl-cffi
python -m devscripts.make_lazy_extractors
python devscripts/update-version.py -c "${channel}" -r "${origin}" "${version}"
python -m bundle.pyinstaller

View File

@ -179,6 +179,11 @@
{
"action": "add",
"when": "6aaf96a3d6e7d0d426e97e11a2fcf52fda00e733",
"short": "[priority] Security: [[CVE-2024-10123](https://nvd.nist.gov/vuln/detail/CVE-2024-10123)] [Properly sanitize file-extension to prevent file system modification and RCE](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-79w7-vh3h-8g4j)\n - Unsafe extensions are now blocked from being downloaded"
"short": "[priority] Security: [[CVE-2024-38519](https://nvd.nist.gov/vuln/detail/CVE-2024-38519)] [Properly sanitize file-extension to prevent file system modification and RCE](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-79w7-vh3h-8g4j)\n - Unsafe extensions are now blocked from being downloaded"
},
{
"action": "add",
"when": "6075a029dba70a89675ae1250e7cdfd91f0eba41",
"short": "[priority] Security: [[ie/douyutv] Do not use dangerous javascript source/URL](https://github.com/yt-dlp/yt-dlp/security/advisories/GHSA-3v33-3wmw-3785)\n - A dependency on potentially malicious third-party JavaScript code has been removed from the Douyu extractors"
}
]

View File

@ -9,6 +9,7 @@ maintainers = [
{name = "Grub4K", email = "contact@grub4k.xyz"},
{name = "bashonly", email = "bashonly@protonmail.com"},
{name = "coletdjnz", email = "coletdjnz@protonmail.com"},
{name = "sepro", email = "sepro@sepr0.com"},
]
description = "A feature-rich command-line audio/video downloader"
readme = "README.md"
@ -53,7 +54,10 @@ dependencies = [
[project.optional-dependencies]
default = []
curl-cffi = ["curl-cffi==0.5.10; implementation_name=='cpython'"]
curl-cffi = [
"curl-cffi==0.5.10; os_name=='nt' and implementation_name=='cpython'",
"curl-cffi>=0.5.10,!=0.6.*,<0.8; os_name!='nt' and implementation_name=='cpython'",
]
secretstorage = [
"cffi",
"secretstorage",
@ -62,7 +66,7 @@ build = [
"build",
"hatchling",
"pip",
"setuptools",
"setuptools>=71.0.2", # 71.0.0 broke pyinstaller
"wheel",
]
dev = [

View File

@ -354,7 +354,6 @@ # Supported sites
- **DigitallySpeaking**
- **Digiteka**
- **DiscogsReleasePlaylist**
- **Discovery**
- **DiscoveryLife**
- **DiscoveryNetworksDe**
- **DiscoveryPlus**
@ -363,7 +362,6 @@ # Supported sites
- **DiscoveryPlusItaly**
- **DiscoveryPlusItalyShow**
- **Disney**
- **DIYNetwork**
- **dlf**
- **dlf:corpus**: DLF Multi-feed Archives
- **dlive:stream**
@ -516,7 +514,6 @@ # Supported sites
- **GlattvisionTVLive**: [*glattvisiontv*](## "netrc machine")
- **GlattvisionTVRecordings**: [*glattvisiontv*](## "netrc machine")
- **Glide**: Glide mobile video messages (glide.me)
- **GlobalCyclingNetworkPlus**
- **GlobalPlayerAudio**
- **GlobalPlayerAudioEpisode**
- **GlobalPlayerLive**
@ -658,10 +655,11 @@ # Supported sites
- **Ketnet**
- **khanacademy**
- **khanacademy:unit**
- **Kick**
- **kick:clips**
- **kick:live**
- **kick:vod**
- **Kicker**
- **KickStarter**
- **KickVOD**
- **kinja:embed**
- **KinoPoisk**
- **Kommunetv**
@ -693,6 +691,7 @@ # Supported sites
- **Lcp**
- **LcpPlay**
- **Le**: 乐视网
- **LearningOnScreen**
- **Lecture2Go**: (**Currently broken**)
- **Lecturio**: [*lecturio*](## "netrc machine")
- **LecturioCourse**: [*lecturio*](## "netrc machine")
@ -820,8 +819,6 @@ # Supported sites
- **MotherlessGroup**
- **MotherlessUploader**
- **Motorsport**: motorsport.com (**Currently broken**)
- **MotorTrend**
- **MotorTrendOnDemand**
- **MovieFap**
- **Moviepilot**
- **MoviewPlay**
@ -839,7 +836,7 @@ # Supported sites
- **MTVUutisetArticle**: (**Currently broken**)
- **MuenchenTV**: münchen.tv (**Currently broken**)
- **MujRozhlas**
- **Murrtube**: (**Currently broken**)
- **Murrtube**
- **MurrtubeUser**: Murrtube user profile (**Currently broken**)
- **MuseAI**
- **MuseScore**
@ -1145,7 +1142,6 @@ # Supported sites
- **QuantumTV**: [*quantumtv*](## "netrc machine")
- **QuantumTVLive**: [*quantumtv*](## "netrc machine")
- **QuantumTVRecordings**: [*quantumtv*](## "netrc machine")
- **Qub**
- **R7**: (**Currently broken**)
- **R7Article**: (**Currently broken**)
- **Radiko**
@ -1522,9 +1518,9 @@ # Supported sites
- **tv5unis**
- **tv5unis:video**
- **tv8.it**
- **TVA**
- **TVANouvelles**
- **TVANouvellesArticle**
- **tvaplus**: TVA+
- **TVC**
- **TVCArticle**
- **TVer**
@ -1618,6 +1614,7 @@ # Supported sites
- **VidLii**
- **Vidly**
- **vids.io**
- **Vidyard**
- **viewlift**
- **viewlift:embed**
- **Viidea**
@ -1665,6 +1662,8 @@ # Supported sites
- **VRT**: VRT NWS, Flanders News, Flandern Info and Sporza
- **VrtNU**: [*vrtnu*](## "netrc machine") VRT MAX
- **VTM**: (**Currently broken**)
- **VTV**
- **VTVGo**
- **VTXTV**: [*vtxtv*](## "netrc machine")
- **VTXTVLive**: [*vtxtv*](## "netrc machine")
- **VTXTVRecordings**: [*vtxtv*](## "netrc machine")

View File

@ -4,6 +4,7 @@
import os
import sys
import unittest
from unittest.mock import patch
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
@ -520,7 +521,33 @@ def test_format_filtering(self):
ydl.process_ie_result(info_dict)
self.assertEqual(ydl.downloaded_info_dicts, [])
def test_default_format_spec(self):
@patch('yt_dlp.postprocessor.ffmpeg.FFmpegMergerPP.available', False)
def test_default_format_spec_without_ffmpeg(self):
ydl = YDL({})
self.assertEqual(ydl._default_format_spec({}), 'best/bestvideo+bestaudio')
ydl = YDL({'simulate': True})
self.assertEqual(ydl._default_format_spec({}), 'best/bestvideo+bestaudio')
ydl = YDL({})
self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio')
ydl = YDL({'simulate': True})
self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio')
ydl = YDL({'outtmpl': '-'})
self.assertEqual(ydl._default_format_spec({}), 'best/bestvideo+bestaudio')
ydl = YDL({})
self.assertEqual(ydl._default_format_spec({}), 'best/bestvideo+bestaudio')
self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio')
@patch('yt_dlp.postprocessor.ffmpeg.FFmpegMergerPP.available', True)
@patch('yt_dlp.postprocessor.ffmpeg.FFmpegMergerPP.can_merge', lambda _: True)
def test_default_format_spec_with_ffmpeg(self):
ydl = YDL({})
self.assertEqual(ydl._default_format_spec({}), 'bestvideo*+bestaudio/best')
ydl = YDL({'simulate': True})
self.assertEqual(ydl._default_format_spec({}), 'bestvideo*+bestaudio/best')
@ -528,13 +555,13 @@ def test_default_format_spec(self):
self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio')
ydl = YDL({'simulate': True})
self.assertEqual(ydl._default_format_spec({'is_live': True}), 'bestvideo*+bestaudio/best')
self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio')
ydl = YDL({'outtmpl': '-'})
self.assertEqual(ydl._default_format_spec({}), 'best/bestvideo+bestaudio')
ydl = YDL({})
self.assertEqual(ydl._default_format_spec({}, download=False), 'bestvideo*+bestaudio/best')
self.assertEqual(ydl._default_format_spec({}), 'bestvideo*+bestaudio/best')
self.assertEqual(ydl._default_format_spec({'is_live': True}), 'best/bestvideo+bestaudio')

View File

@ -376,6 +376,61 @@ def test_packed(self):
jsi = JSInterpreter('''function f(p,a,c,k,e,d){while(c--)if(k[c])p=p.replace(new RegExp('\\b'+c.toString(a)+'\\b','g'),k[c]);return p}''')
self.assertEqual(jsi.call_function('f', '''h 7=g("1j");7.7h({7g:[{33:"w://7f-7e-7d-7c.v.7b/7a/79/78/77/76.74?t=73&s=2s&e=72&f=2t&71=70.0.0.1&6z=6y&6x=6w"}],6v:"w://32.v.u/6u.31",16:"r%",15:"r%",6t:"6s",6r:"",6q:"l",6p:"l",6o:"6n",6m:\'6l\',6k:"6j",9:[{33:"/2u?b=6i&n=50&6h=w://32.v.u/6g.31",6f:"6e"}],1y:{6d:1,6c:\'#6b\',6a:\'#69\',68:"67",66:30,65:r,},"64":{63:"%62 2m%m%61%5z%5y%5x.u%5w%5v%5u.2y%22 2k%m%1o%22 5t%m%1o%22 5s%m%1o%22 2j%m%5r%22 16%m%5q%22 15%m%5p%22 5o%2z%5n%5m%2z",5l:"w://v.u/d/1k/5k.2y",5j:[]},\'5i\':{"5h":"5g"},5f:"5e",5d:"w://v.u",5c:{},5b:l,1x:[0.25,0.50,0.75,1,1.25,1.5,2]});h 1m,1n,5a;h 59=0,58=0;h 7=g("1j");h 2x=0,57=0,56=0;$.55({54:{\'53-52\':\'2i-51\'}});7.j(\'4z\',6(x){c(5>0&&x.1l>=5&&1n!=1){1n=1;$(\'q.4y\').4x(\'4w\')}});7.j(\'13\',6(x){2x=x.1l});7.j(\'2g\',6(x){2w(x)});7.j(\'4v\',6(){$(\'q.2v\').4u()});6 2w(x){$(\'q.2v\').4t();c(1m)19;1m=1;17=0;c(4s.4r===l){17=1}$.4q(\'/2u?b=4p&2l=1k&4o=2t-4n-4m-2s-4l&4k=&4j=&4i=&17=\'+17,6(2r){$(\'#4h\').4g(2r)});$(\'.3-8-4f-4e:4d("4c")\').2h(6(e){2q();g().4b(0);g().4a(l)});6 2q(){h $14=$("<q />").2p({1l:"49",16:"r%",15:"r%",48:0,2n:0,2o:47,46:"45(10%, 10%, 10%, 0.4)","44-43":"42"});$("<41 />").2p({16:"60%",15:"60%",2o:40,"3z-2n":"3y"}).3x({\'2m\':\'/?b=3w&2l=1k\',\'2k\':\'0\',\'2j\':\'2i\'}).2f($14);$14.2h(6(){$(3v).3u();g().2g()});$14.2f($(\'#1j\'))}g().13(0);}6 3t(){h 9=7.1b(2e);2d.2c(9);c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==2e){2d.2c(\'!!=\'+i);7.1p(i)}}}}7.j(\'3s\',6(){g().1h("/2a/3r.29","3q 10 28",6(){g().13(g().27()+10)},"2b");$("q[26=2b]").23().21(\'.3-20-1z\');g().1h("/2a/3p.29","3o 10 28",6(){h 12=g().27()-10;c(12<0)12=0;g().13(12)},"24");$("q[26=24]").23().21(\'.3-20-1z\');});6 1i(){}7.j(\'3n\',6(){1i()});7.j(\'3m\',6(){1i()});7.j("k",6(y){h 9=7.1b();c(9.n<2)19;$(\'.3-8-3l-3k\').3j(6(){$(\'#3-8-a-k\').1e(\'3-8-a-z\');$(\'.3-a-k\').p(\'o-1f\',\'11\')});7.1h("/3i/3h.3g","3f 3e",6(){$(\'.3-1w\').3d(\'3-8-1v\');$(\'.3-8-1y, .3-8-1x\').p(\'o-1g\',\'11\');c($(\'.3-1w\').3c(\'3-8-1v\')){$(\'.3-a-k\').p(\'o-1g\',\'l\');$(\'.3-a-k\').p(\'o-1f\',\'l\');$(\'.3-8-a\').1e(\'3-8-a-z\');$(\'.3-8-a:1u\').3b(\'3-8-a-z\')}3a{$(\'.3-a-k\').p(\'o-1g\',\'11\');$(\'.3-a-k\').p(\'o-1f\',\'11\');$(\'.3-8-a:1u\').1e(\'3-8-a-z\')}},"39");7.j("38",6(y){1d.37(\'1c\',y.9[y.36].1a)});c(1d.1t(\'1c\')){35("1s(1d.1t(\'1c\'));",34)}});h 18;6 1s(1q){h 9=7.1b();c(9.n>1){1r(i=0;i<9.n;i++){c(9[i].1a==1q){c(i==18){19}18=i;7.1p(i)}}}}',36,270,'|||jw|||function|player|settings|tracks|submenu||if||||jwplayer|var||on|audioTracks|true|3D|length|aria|attr|div|100|||sx|filemoon|https||event|active||false|tt|seek|dd|height|width|adb|current_audio|return|name|getAudioTracks|default_audio|localStorage|removeClass|expanded|checked|addButton|callMeMaybe|vplayer|0fxcyc2ajhp1|position|vvplay|vvad|220|setCurrentAudioTrack|audio_name|for|audio_set|getItem|last|open|controls|playbackRates|captions|rewind|icon|insertAfter||detach|ff00||button|getPosition|sec|png|player8|ff11|log|console|track_name|appendTo|play|click|no|scrolling|frameborder|file_code|src|top|zIndex|css|showCCform|data|1662367683|383371|dl|video_ad|doPlay|prevt|mp4|3E||jpg|thumbs|file|300|setTimeout|currentTrack|setItem|audioTrackChanged|dualSound|else|addClass|hasClass|toggleClass|Track|Audio|svg|dualy|images|mousedown|buttons|topbar|playAttemptFailed|beforePlay|Rewind|fr|Forward|ff|ready|set_audio_track|remove|this|upload_srt|prop|50px|margin|1000001|iframe|center|align|text|rgba|background|1000000|left|absolute|pause|setCurrentCaptions|Upload|contains|item|content|html|fviews|referer|prem|embed|3e57249ef633e0d03bf76ceb8d8a4b65|216|83|hash|view|get|TokenZir|window|hide|show|complete|slow|fadeIn|video_ad_fadein|time||cache|Cache|Content|headers|ajaxSetup|v2done|tott|vastdone2|vastdone1|vvbefore|playbackRateControls|cast|aboutlink|FileMoon|abouttext|UHD|1870|qualityLabels|sites|GNOME_POWER|link|2Fiframe|3C|allowfullscreen|22360|22640|22no|marginheight|marginwidth|2FGNOME_POWER|2F0fxcyc2ajhp1|2Fe|2Ffilemoon|2F|3A||22https|3Ciframe|code|sharing|fontOpacity|backgroundOpacity|Tahoma|fontFamily|303030|backgroundColor|FFFFFF|color|userFontScale|thumbnails|kind|0fxcyc2ajhp10000|url|get_slides|start|startparam|none|preload|html5|primary|hlshtml|androidhls|duration|uniform|stretching|0fxcyc2ajhp1_xt|image|2048|sp|6871|asn|127|srv|43200|_g3XlBcu2lmD9oDexD2NLWSmah2Nu3XcDrl93m9PwXY|m3u8||master|0fxcyc2ajhp1_x|00076|01|hls2|to|s01|delivery|storage|moon|sources|setup'''.split('|')))
def test_join(self):
test_input = list('test')
tests = [
'function f(a, b){return a.join(b)}',
'function f(a, b){return Array.prototype.join.call(a, b)}',
'function f(a, b){return Array.prototype.join.apply(a, [b])}',
]
for test in tests:
jsi = JSInterpreter(test)
self._test(jsi, 'test', args=[test_input, ''])
self._test(jsi, 't-e-s-t', args=[test_input, '-'])
self._test(jsi, '', args=[[], '-'])
def test_split(self):
test_result = list('test')
tests = [
'function f(a, b){return a.split(b)}',
'function f(a, b){return String.prototype.split.call(a, b)}',
'function f(a, b){return String.prototype.split.apply(a, [b])}',
]
for test in tests:
jsi = JSInterpreter(test)
self._test(jsi, test_result, args=['test', ''])
self._test(jsi, test_result, args=['t-e-s-t', '-'])
self._test(jsi, [''], args=['', '-'])
self._test(jsi, [], args=['', ''])
def test_slice(self):
self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice()}', [0, 1, 2, 3, 4, 5, 6, 7, 8])
self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(0)}', [0, 1, 2, 3, 4, 5, 6, 7, 8])
self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(5)}', [5, 6, 7, 8])
self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(99)}', [])
self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(-2)}', [7, 8])
self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(-99)}', [0, 1, 2, 3, 4, 5, 6, 7, 8])
self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(0, 0)}', [])
self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(1, 0)}', [])
self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(0, 1)}', [0])
self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(3, 6)}', [3, 4, 5])
self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(1, -1)}', [1, 2, 3, 4, 5, 6, 7])
self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(-1, 1)}', [])
self._test('function f(){return [0, 1, 2, 3, 4, 5, 6, 7, 8].slice(-3, -1)}', [6, 7])
self._test('function f(){return "012345678".slice()}', '012345678')
self._test('function f(){return "012345678".slice(0)}', '012345678')
self._test('function f(){return "012345678".slice(5)}', '5678')
self._test('function f(){return "012345678".slice(99)}', '')
self._test('function f(){return "012345678".slice(-2)}', '78')
self._test('function f(){return "012345678".slice(-99)}', '012345678')
self._test('function f(){return "012345678".slice(0, 0)}', '')
self._test('function f(){return "012345678".slice(1, 0)}', '')
self._test('function f(){return "012345678".slice(0, 1)}', '0')
self._test('function f(){return "012345678".slice(3, 6)}', '345')
self._test('function f(){return "012345678".slice(1, -1)}', '1234567')
self._test('function f(){return "012345678".slice(-1, 1)}', '')
self._test('function f(){return "012345678".slice(-3, -1)}', '67')
if __name__ == '__main__':
unittest.main()

View File

@ -265,6 +265,11 @@ def do_GET(self):
self.end_headers()
self.wfile.write(payload)
self.finish()
elif self.path == '/get_cookie':
self.send_response(200)
self.send_header('Set-Cookie', 'test=ytdlp; path=/')
self.end_headers()
self.finish()
else:
self._status(404)
@ -338,6 +343,52 @@ def test_ssl_error(self, handler):
validate_and_send(rh, Request(f'https://127.0.0.1:{https_port}/headers'))
assert not issubclass(exc_info.type, CertificateVerifyError)
@pytest.mark.skip_handler('CurlCFFI', 'legacy_ssl ignored by CurlCFFI')
def test_legacy_ssl_extension(self, handler):
# HTTPS server with old ciphers
# XXX: is there a better way to test this than to create a new server?
https_httpd = http.server.ThreadingHTTPServer(
('127.0.0.1', 0), HTTPTestRequestHandler)
sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
sslctx.maximum_version = ssl.TLSVersion.TLSv1_2
sslctx.set_ciphers('SHA1:AESCCM:aDSS:eNULL:aNULL')
sslctx.load_cert_chain(os.path.join(TEST_DIR, 'testcert.pem'), None)
https_httpd.socket = sslctx.wrap_socket(https_httpd.socket, server_side=True)
https_port = http_server_port(https_httpd)
https_server_thread = threading.Thread(target=https_httpd.serve_forever)
https_server_thread.daemon = True
https_server_thread.start()
with handler(verify=False) as rh:
res = validate_and_send(rh, Request(f'https://127.0.0.1:{https_port}/headers', extensions={'legacy_ssl': True}))
assert res.status == 200
res.close()
# Ensure only applies to request extension
with pytest.raises(SSLError):
validate_and_send(rh, Request(f'https://127.0.0.1:{https_port}/headers'))
@pytest.mark.skip_handler('CurlCFFI', 'legacy_ssl ignored by CurlCFFI')
def test_legacy_ssl_support(self, handler):
# HTTPS server with old ciphers
# XXX: is there a better way to test this than to create a new server?
https_httpd = http.server.ThreadingHTTPServer(
('127.0.0.1', 0), HTTPTestRequestHandler)
sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
sslctx.maximum_version = ssl.TLSVersion.TLSv1_2
sslctx.set_ciphers('SHA1:AESCCM:aDSS:eNULL:aNULL')
sslctx.load_cert_chain(os.path.join(TEST_DIR, 'testcert.pem'), None)
https_httpd.socket = sslctx.wrap_socket(https_httpd.socket, server_side=True)
https_port = http_server_port(https_httpd)
https_server_thread = threading.Thread(target=https_httpd.serve_forever)
https_server_thread.daemon = True
https_server_thread.start()
with handler(verify=False, legacy_ssl_support=True) as rh:
res = validate_and_send(rh, Request(f'https://127.0.0.1:{https_port}/headers'))
assert res.status == 200
res.close()
def test_percent_encode(self, handler):
with handler() as rh:
# Unicode characters should be encoded with uppercase percent-encoding
@ -490,6 +541,24 @@ def test_cookies(self, handler):
rh, Request(f'http://127.0.0.1:{self.http_port}/headers', extensions={'cookiejar': cookiejar})).read()
assert b'cookie: test=ytdlp' in data.lower()
def test_cookie_sync_only_cookiejar(self, handler):
# Ensure that cookies are ONLY being handled by the cookiejar
with handler() as rh:
validate_and_send(rh, Request(f'http://127.0.0.1:{self.http_port}/get_cookie', extensions={'cookiejar': YoutubeDLCookieJar()}))
data = validate_and_send(rh, Request(f'http://127.0.0.1:{self.http_port}/headers', extensions={'cookiejar': YoutubeDLCookieJar()})).read()
assert b'cookie: test=ytdlp' not in data.lower()
def test_cookie_sync_delete_cookie(self, handler):
# Ensure that cookies are ONLY being handled by the cookiejar
cookiejar = YoutubeDLCookieJar()
with handler(cookiejar=cookiejar) as rh:
validate_and_send(rh, Request(f'http://127.0.0.1:{self.http_port}/get_cookie'))
data = validate_and_send(rh, Request(f'http://127.0.0.1:{self.http_port}/headers')).read()
assert b'cookie: test=ytdlp' in data.lower()
cookiejar.clear_session_cookies()
data = validate_and_send(rh, Request(f'http://127.0.0.1:{self.http_port}/headers')).read()
assert b'cookie: test=ytdlp' not in data.lower()
def test_headers(self, handler):
with handler(headers=HTTPHeaderDict({'test1': 'test', 'test2': 'test2'})) as rh:
@ -914,7 +983,6 @@ def mock_close(*args, **kwargs):
class TestCurlCFFIRequestHandler(TestRequestHandlerBase):
@pytest.mark.parametrize('params,extensions', [
({}, {'impersonate': ImpersonateTarget('chrome')}),
({'impersonate': ImpersonateTarget('chrome', '110')}, {}),
({'impersonate': ImpersonateTarget('chrome', '99')}, {'impersonate': ImpersonateTarget('chrome', '110')}),
])
@ -1200,6 +1268,9 @@ class HTTPSupportedRH(ValidationRH):
({'timeout': 1}, False),
({'timeout': 'notatimeout'}, AssertionError),
({'unsupported': 'value'}, UnsupportedRequest),
({'legacy_ssl': False}, False),
({'legacy_ssl': True}, False),
({'legacy_ssl': 'notabool'}, AssertionError),
]),
('Requests', 'http', [
({'cookiejar': 'notacookiejar'}, AssertionError),
@ -1207,6 +1278,9 @@ class HTTPSupportedRH(ValidationRH):
({'timeout': 1}, False),
({'timeout': 'notatimeout'}, AssertionError),
({'unsupported': 'value'}, UnsupportedRequest),
({'legacy_ssl': False}, False),
({'legacy_ssl': True}, False),
({'legacy_ssl': 'notabool'}, AssertionError),
]),
('CurlCFFI', 'http', [
({'cookiejar': 'notacookiejar'}, AssertionError),
@ -1220,6 +1294,9 @@ class HTTPSupportedRH(ValidationRH):
({'impersonate': ImpersonateTarget(None, None, None, None)}, False),
({'impersonate': ImpersonateTarget()}, False),
({'impersonate': 'chrome'}, AssertionError),
({'legacy_ssl': False}, False),
({'legacy_ssl': True}, False),
({'legacy_ssl': 'notabool'}, AssertionError),
]),
(NoCheckRH, 'http', [
({'cookiejar': 'notacookiejar'}, False),
@ -1228,6 +1305,9 @@ class HTTPSupportedRH(ValidationRH):
('Websockets', 'ws', [
({'cookiejar': YoutubeDLCookieJar()}, False),
({'timeout': 2}, False),
({'legacy_ssl': False}, False),
({'legacy_ssl': True}, False),
({'legacy_ssl': 'notabool'}, AssertionError),
]),
]

View File

@ -444,6 +444,8 @@ def test_unified_timestamps(self):
self.assertEqual(unified_timestamp('Sep 11, 2013 | 5:49 AM'), 1378878540)
self.assertEqual(unified_timestamp('December 15, 2017 at 7:49 am'), 1513324140)
self.assertEqual(unified_timestamp('2018-03-14T08:32:43.1493874+00:00'), 1521016363)
self.assertEqual(unified_timestamp('Sunday, 26 Nov 2006, 19:00'), 1164567600)
self.assertEqual(unified_timestamp('wed, aug 16, 2008, 12:00pm'), 1218931200)
self.assertEqual(unified_timestamp('December 31 1969 20:00:01 EDT'), 1)
self.assertEqual(unified_timestamp('Wednesday 31 December 1969 18:01:26 MDT'), 86)
@ -929,6 +931,11 @@ def test_parse_codecs(self):
'acodec': 'none',
'dynamic_range': 'DV',
})
self.assertEqual(parse_codecs('fLaC'), {
'vcodec': 'none',
'acodec': 'flac',
'dynamic_range': None,
})
self.assertEqual(parse_codecs('theora, vorbis'), {
'vcodec': 'theora',
'acodec': 'vorbis',

View File

@ -61,6 +61,10 @@ def process_request(self, request):
return websockets.http11.Response(
status.value, status.phrase, websockets.datastructures.Headers([('Location', '/')]), b'')
return self.protocol.reject(status.value, status.phrase)
elif request.path.startswith('/get_cookie'):
response = self.protocol.accept(request)
response.headers['Set-Cookie'] = 'test=ytdlp'
return response
return self.protocol.accept(request)
@ -102,6 +106,15 @@ def create_mtls_wss_websocket_server():
return create_websocket_server(ssl_context=sslctx)
def create_legacy_wss_websocket_server():
certfn = os.path.join(TEST_DIR, 'testcert.pem')
sslctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
sslctx.maximum_version = ssl.TLSVersion.TLSv1_2
sslctx.set_ciphers('SHA1:AESCCM:aDSS:eNULL:aNULL')
sslctx.load_cert_chain(certfn, None)
return create_websocket_server(ssl_context=sslctx)
def ws_validate_and_send(rh, req):
rh.validate(req)
max_tries = 3
@ -132,6 +145,9 @@ def setup_class(cls):
cls.mtls_wss_thread, cls.mtls_wss_port = create_mtls_wss_websocket_server()
cls.mtls_wss_base_url = f'wss://127.0.0.1:{cls.mtls_wss_port}'
cls.legacy_wss_thread, cls.legacy_wss_port = create_legacy_wss_websocket_server()
cls.legacy_wss_host = f'wss://127.0.0.1:{cls.legacy_wss_port}'
def test_basic_websockets(self, handler):
with handler() as rh:
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
@ -166,6 +182,22 @@ def test_ssl_error(self, handler):
ws_validate_and_send(rh, Request(self.bad_wss_host))
assert not issubclass(exc_info.type, CertificateVerifyError)
def test_legacy_ssl_extension(self, handler):
with handler(verify=False) as rh:
ws = ws_validate_and_send(rh, Request(self.legacy_wss_host, extensions={'legacy_ssl': True}))
assert ws.status == 101
ws.close()
# Ensure only applies to request extension
with pytest.raises(SSLError):
ws_validate_and_send(rh, Request(self.legacy_wss_host))
def test_legacy_ssl_support(self, handler):
with handler(verify=False, legacy_ssl_support=True) as rh:
ws = ws_validate_and_send(rh, Request(self.legacy_wss_host))
assert ws.status == 101
ws.close()
@pytest.mark.parametrize('path,expected', [
# Unicode characters should be encoded with uppercase percent-encoding
('/中文', '/%E4%B8%AD%E6%96%87'),
@ -248,6 +280,32 @@ def test_cookies(self, handler):
assert json.loads(ws.recv())['cookie'] == 'test=ytdlp'
ws.close()
@pytest.mark.skip_handler('Websockets', 'Set-Cookie not supported by websockets')
def test_cookie_sync_only_cookiejar(self, handler):
# Ensure that cookies are ONLY being handled by the cookiejar
with handler() as rh:
ws_validate_and_send(rh, Request(f'{self.ws_base_url}/get_cookie', extensions={'cookiejar': YoutubeDLCookieJar()}))
ws = ws_validate_and_send(rh, Request(self.ws_base_url, extensions={'cookiejar': YoutubeDLCookieJar()}))
ws.send('headers')
assert 'cookie' not in json.loads(ws.recv())
ws.close()
@pytest.mark.skip_handler('Websockets', 'Set-Cookie not supported by websockets')
def test_cookie_sync_delete_cookie(self, handler):
# Ensure that cookies are ONLY being handled by the cookiejar
cookiejar = YoutubeDLCookieJar()
with handler(verbose=True, cookiejar=cookiejar) as rh:
ws_validate_and_send(rh, Request(f'{self.ws_base_url}/get_cookie'))
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers')
assert json.loads(ws.recv())['cookie'] == 'test=ytdlp'
ws.close()
cookiejar.clear_session_cookies()
ws = ws_validate_and_send(rh, Request(self.ws_base_url))
ws.send('headers')
assert 'cookie' not in json.loads(ws.recv())
ws.close()
def test_source_address(self, handler):
source_address = f'127.0.0.{random.randint(5, 255)}'
verify_address_availability(source_address)

View File

@ -167,6 +167,22 @@
'https://www.youtube.com/s/player/590f65a6/player_ias.vflset/en_US/base.js',
'1tm7-g_A9zsI8_Lay_', 'xI4Vem4Put_rOg',
),
(
'https://www.youtube.com/s/player/b22ef6e7/player_ias.vflset/en_US/base.js',
'b6HcntHGkvBLk_FRf', 'kNPW6A7FyP2l8A',
),
(
'https://www.youtube.com/s/player/3400486c/player_ias.vflset/en_US/base.js',
'lL46g3XifCKUZn1Xfw', 'z767lhet6V2Skl',
),
(
'https://www.youtube.com/s/player/20dfca59/player_ias.vflset/en_US/base.js',
'-fLCxedkAk4LUTK2', 'O8kfRq1y1eyHGw',
),
(
'https://www.youtube.com/s/player/b12cc44b/player_ias.vflset/en_US/base.js',
'keLa5R2U00sR9SQK', 'N1OGyujjEwMnLw',
),
]

View File

@ -452,7 +452,8 @@ class YoutubeDL:
Can also just be a single color policy,
in which case it applies to all outputs.
Valid stream names are 'stdout' and 'stderr'.
Valid color policies are one of 'always', 'auto', 'no_color' or 'never'.
Valid color policies are one of 'always', 'auto',
'no_color', 'never', 'auto-tty' or 'no_color-tty'.
geo_bypass: Bypass geographic restriction via faking X-Forwarded-For
HTTP header
geo_bypass_country:
@ -659,12 +660,15 @@ def __init__(self, params=None, auto_init=True):
self.params['color'] = 'no_color'
term_allow_color = os.getenv('TERM', '').lower() != 'dumb'
no_color = bool(os.getenv('NO_COLOR'))
base_no_color = bool(os.getenv('NO_COLOR'))
def process_color_policy(stream):
stream_name = {sys.stdout: 'stdout', sys.stderr: 'stderr'}[stream]
policy = traverse_obj(self.params, ('color', (stream_name, None), {str}), get_all=False)
if policy in ('auto', None):
policy = traverse_obj(self.params, ('color', (stream_name, None), {str}, any)) or 'auto'
if policy in ('auto', 'auto-tty', 'no_color-tty'):
no_color = base_no_color
if policy.endswith('tty'):
no_color = policy.startswith('no_color')
if term_allow_color and supports_terminal_sequences(stream):
return 'no_color' if no_color else True
return False
@ -2190,9 +2194,8 @@ def _select_formats(self, formats, selector):
or all(f.get('acodec') == 'none' for f in formats)), # OR, No formats with audio
}))
def _default_format_spec(self, info_dict, download=True):
download = download and not self.params.get('simulate')
prefer_best = download and (
def _default_format_spec(self, info_dict):
prefer_best = (
self.params['outtmpl']['default'] == '-'
or info_dict.get('is_live') and not self.params.get('live_from_start'))
@ -2200,7 +2203,7 @@ def can_merge():
merger = FFmpegMergerPP(self)
return merger.available and merger.can_merge()
if not prefer_best and download and not can_merge():
if not prefer_best and not can_merge():
prefer_best = True
formats = self._get_formats(info_dict)
evaluate_formats = lambda spec: self._select_formats(formats, self.build_format_selector(spec))
@ -2959,7 +2962,7 @@ def is_wellformed(f):
continue
if format_selector is None:
req_format = self._default_format_spec(info_dict, download=download)
req_format = self._default_format_spec(info_dict)
self.write_debug(f'Default format spec: {req_format}')
format_selector = self.build_format_selector(req_format)
@ -3169,11 +3172,12 @@ def dl(self, name, info, subtitle=False, test=False):
if test:
verbose = self.params.get('verbose')
quiet = self.params.get('quiet') or not verbose
params = {
'test': True,
'quiet': self.params.get('quiet') or not verbose,
'quiet': quiet,
'verbose': verbose,
'noprogress': not verbose,
'noprogress': quiet,
'nopart': True,
'skip_unavailable_fragments': False,
'keep_fragments': False,

View File

@ -468,7 +468,7 @@ def metadataparser_actions(f):
default_downloader = ed.get_basename()
for policy in opts.color.values():
if policy not in ('always', 'auto', 'no_color', 'never'):
if policy not in ('always', 'auto', 'auto-tty', 'no_color', 'no_color-tty', 'never'):
raise ValueError(f'"{policy}" is not a valid color policy')
warnings, deprecation_warnings = [], []
@ -599,7 +599,7 @@ def report_deprecation(val, old, new=None):
warnings.append(
'Using allow-unsafe-ext opens you up to potential attacks. '
'Use with great care!')
_UnsafeExtensionError.sanitize_extension = lambda x: x
_UnsafeExtensionError.sanitize_extension = lambda x, prepend=False: x
return warnings, deprecation_warnings

View File

@ -506,7 +506,6 @@
from .digitalconcerthall import DigitalConcertHallIE
from .digiteka import DigitekaIE
from .discogs import DiscogsReleasePlaylistIE
from .discovery import DiscoveryIE
from .disney import DisneyIE
from .dispeak import DigitallySpeakingIE
from .dlf import (
@ -534,16 +533,12 @@
DiscoveryPlusIndiaShowIE,
DiscoveryPlusItalyIE,
DiscoveryPlusItalyShowIE,
DIYNetworkIE,
DPlayIE,
FoodNetworkIE,
GlobalCyclingNetworkPlusIE,
GoDiscoveryIE,
HGTVDeIE,
HGTVUsaIE,
InvestigationDiscoveryIE,
MotorTrendIE,
MotorTrendOnDemandIE,
ScienceChannelIE,
TravelChannelIE,
)
@ -946,6 +941,7 @@
KhanAcademyUnitIE,
)
from .kick import (
KickClipIE,
KickIE,
KickVODIE,
)
@ -993,6 +989,7 @@
LcpIE,
LcpPlayIE,
)
from .learningonscreen import LearningOnScreenIE
from .lecture2go import Lecture2GoIE
from .lecturio import (
LecturioCourseIE,
@ -2176,10 +2173,7 @@
TV5UnisVideoIE,
)
from .tv24ua import TV24UAVideoIE
from .tva import (
TVAIE,
QubIE,
)
from .tva import TVAIE
from .tvanouvelles import (
TVANouvellesArticleIE,
TVANouvellesIE,
@ -2326,6 +2320,7 @@
)
from .vidlii import VidLiiIE
from .vidly import VidlyIE
from .vidyard import VidyardIE
from .viewlift import (
ViewLiftEmbedIE,
ViewLiftIE,
@ -2391,6 +2386,10 @@
VrtNUIE,
)
from .vtm import VTMIE
from .vtv import (
VTVIE,
VTVGoIE,
)
from .vuclip import VuClipIE
from .vvvvid import (
VVVVIDIE,

View File

@ -9,12 +9,12 @@
import struct
import time
import urllib.parse
import urllib.request
import urllib.response
import uuid
from .common import InfoExtractor
from ..aes import aes_ecb_decrypt
from ..networking import RequestHandler, Response
from ..networking.exceptions import TransportError
from ..utils import (
ExtractorError,
OnDemandPagedList,
@ -26,37 +26,36 @@
traverse_obj,
update_url_query,
)
from ..utils.networking import clean_proxies
def add_opener(ydl, handler): # FIXME: Create proper API in .networking
"""Add a handler for opening URLs, like _download_webpage"""
# https://github.com/python/cpython/blob/main/Lib/urllib/request.py#L426
# https://github.com/python/cpython/blob/main/Lib/urllib/request.py#L605
rh = ydl._request_director.handlers['Urllib']
if 'abematv-license' in rh._SUPPORTED_URL_SCHEMES:
return
headers = ydl.params['http_headers'].copy()
proxies = ydl.proxies.copy()
clean_proxies(proxies, headers)
opener = rh._get_instance(cookiejar=ydl.cookiejar, proxies=proxies)
assert isinstance(opener, urllib.request.OpenerDirector)
opener.add_handler(handler)
rh._SUPPORTED_URL_SCHEMES = (*rh._SUPPORTED_URL_SCHEMES, 'abematv-license')
class AbemaLicenseRH(RequestHandler):
_SUPPORTED_URL_SCHEMES = ('abematv-license',)
_SUPPORTED_PROXY_SCHEMES = None
_SUPPORTED_FEATURES = None
RH_NAME = 'abematv_license'
_STRTABLE = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
_HKEY = b'3AF0298C219469522A313570E8583005A642E73EDD58E3EA2FB7339D3DF1597E'
class AbemaLicenseHandler(urllib.request.BaseHandler):
handler_order = 499
STRTABLE = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'
HKEY = b'3AF0298C219469522A313570E8583005A642E73EDD58E3EA2FB7339D3DF1597E'
def __init__(self, ie: 'AbemaTVIE'):
# the protocol that this should really handle is 'abematv-license://'
# abematv_license_open is just a placeholder for development purposes
# ref. https://github.com/python/cpython/blob/f4c03484da59049eb62a9bf7777b963e2267d187/Lib/urllib/request.py#L510
setattr(self, 'abematv-license_open', getattr(self, 'abematv_license_open', None))
def __init__(self, *, ie: 'AbemaTVIE', **kwargs):
super().__init__(**kwargs)
self.ie = ie
def _send(self, request):
url = request.url
ticket = urllib.parse.urlparse(url).netloc
try:
response_data = self._get_videokey_from_ticket(ticket)
except ExtractorError as e:
raise TransportError(cause=e.cause) from e
except (IndexError, KeyError, TypeError) as e:
raise TransportError(cause=repr(e)) from e
return Response(
io.BytesIO(response_data), url,
headers={'Content-Length': str(len(response_data))})
def _get_videokey_from_ticket(self, ticket):
to_show = self.ie.get_param('verbose', False)
media_token = self.ie._get_media_token(to_show=to_show)
@ -72,25 +71,17 @@ def _get_videokey_from_ticket(self, ticket):
'Content-Type': 'application/json',
})
res = decode_base_n(license_response['k'], table=self.STRTABLE)
res = decode_base_n(license_response['k'], table=self._STRTABLE)
encvideokey = bytes_to_intlist(struct.pack('>QQ', res >> 64, res & 0xffffffffffffffff))
h = hmac.new(
binascii.unhexlify(self.HKEY),
binascii.unhexlify(self._HKEY),
(license_response['cid'] + self.ie._DEVICE_ID).encode(),
digestmod=hashlib.sha256)
enckey = bytes_to_intlist(h.digest())
return intlist_to_bytes(aes_ecb_decrypt(encvideokey, enckey))
def abematv_license_open(self, url):
url = url.get_full_url() if isinstance(url, urllib.request.Request) else url
ticket = urllib.parse.urlparse(url).netloc
response_data = self._get_videokey_from_ticket(ticket)
return urllib.response.addinfourl(io.BytesIO(response_data), headers={
'Content-Length': str(len(response_data)),
}, url=url, code=200)
class AbemaTVBaseIE(InfoExtractor):
_NETRC_MACHINE = 'abematv'
@ -139,7 +130,7 @@ def _get_device_token(self):
if self._USERTOKEN:
return self._USERTOKEN
add_opener(self._downloader, AbemaLicenseHandler(self))
self._downloader._request_director.add_handler(AbemaLicenseRH(ie=self, logger=None))
username, _ = self._get_login_info()
auth_cache = username and self.cache.load(self._NETRC_MACHINE, username, min_ver='2024.01.19')
@ -368,6 +359,7 @@ def _real_extract(self, url):
info['episode_number'] = epis if epis < 2000 else None
is_live, m3u8_url = False, None
availability = 'public'
if video_type == 'now-on-air':
is_live = True
channel_url = 'https://api.abema.io/v1/channels'
@ -385,10 +377,10 @@ def _real_extract(self, url):
f'https://api.abema.io/v1/video/programs/{video_id}', video_id,
note='Checking playability',
headers=headers)
ondemand_types = traverse_obj(api_response, ('terms', ..., 'onDemandType'))
if 3 not in ondemand_types:
if not traverse_obj(api_response, ('label', 'free', {bool})):
# cannot acquire decryption key for these streams
self.report_warning('This is a premium-only stream')
availability = 'premium_only'
info.update(traverse_obj(api_response, {
'series': ('series', 'title'),
'season': ('season', 'name'),
@ -408,6 +400,7 @@ def _real_extract(self, url):
headers=headers)
if not traverse_obj(api_response, ('slot', 'flags', 'timeshiftFree'), default=False):
self.report_warning('This is a premium-only stream')
availability = 'premium_only'
m3u8_url = f'https://vod-abematv.akamaized.net/slot/{video_id}/playlist.m3u8'
else:
@ -425,6 +418,7 @@ def _real_extract(self, url):
'description': description,
'formats': formats,
'is_live': is_live,
'availability': availability,
})
return info

View File

@ -16,6 +16,7 @@
float_or_none,
int_or_none,
intlist_to_bytes,
join_nonempty,
long_to_bytes,
parse_iso8601,
pkcs1pad,
@ -48,9 +49,9 @@ class ADNBaseIE(InfoExtractor):
class ADNIE(ADNBaseIE):
_VALID_URL = r'https?://(?:www\.)?(?:animation|anime)digitalnetwork\.(?P<lang>fr|de)/video/[^/?#]+/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?animationdigitalnetwork\.com/(?:(?P<lang>de)/)?video/[^/?#]+/(?P<id>\d+)'
_TESTS = [{
'url': 'https://animationdigitalnetwork.fr/video/fruits-basket/9841-episode-1-a-ce-soir',
'url': 'https://animationdigitalnetwork.com/video/558-fruits-basket/9841-episode-1-a-ce-soir',
'md5': '1c9ef066ceb302c86f80c2b371615261',
'info_dict': {
'id': '9841',
@ -70,10 +71,7 @@ class ADNIE(ADNBaseIE):
},
'skip': 'Only available in French and German speaking Europe',
}, {
'url': 'http://animedigitalnetwork.fr/video/blue-exorcist-kyoto-saga/7778-episode-1-debut-des-hostilites',
'only_matching': True,
}, {
'url': 'https://animationdigitalnetwork.de/video/the-eminence-in-shadow/23550-folge-1',
'url': 'https://animationdigitalnetwork.com/de/video/973-the-eminence-in-shadow/23550-folge-1',
'md5': '5c5651bf5791fa6fcd7906012b9d94e8',
'info_dict': {
'id': '23550',
@ -166,7 +164,7 @@ def _perform_login(self, username, password):
'username': username,
})) or {}).get('accessToken')
if access_token:
self._HEADERS = {'authorization': 'Bearer ' + access_token}
self._HEADERS['Authorization'] = f'Bearer {access_token}'
except ExtractorError as e:
message = None
if isinstance(e.cause, HTTPError) and e.cause.status == 401:
@ -177,6 +175,7 @@ def _perform_login(self, username, password):
def _real_extract(self, url):
lang, video_id = self._match_valid_url(url).group('lang', 'id')
self._HEADERS['X-Target-Distribution'] = lang or 'fr'
video_base_url = self._PLAYER_BASE_URL + f'video/{video_id}/'
player = self._download_json(
video_base_url + 'configuration', video_id,
@ -217,7 +216,6 @@ def _real_extract(self, url):
links_data = self._download_json(
links_url, video_id, 'Downloading links JSON metadata', headers={
'X-Player-Token': authorization,
'X-Target-Distribution': lang,
**self._HEADERS,
}, query={
'freeWithAds': 'true',
@ -256,6 +254,7 @@ def _real_extract(self, url):
load_balancer_data = self._download_json(
load_balancer_url, video_id,
f'Downloading {format_id} {quality} JSON metadata',
headers=self._HEADERS,
fatal=False) or {}
m3u8_url = load_balancer_data.get('location')
if not m3u8_url:
@ -276,7 +275,7 @@ def _real_extract(self, url):
video = (self._download_json(
self._API_BASE_URL + f'video/{video_id}', video_id,
'Downloading additional video metadata', fatal=False) or {}).get('video') or {}
'Downloading additional video metadata', fatal=False, headers=self._HEADERS) or {}).get('video') or {}
show = video.get('show') or {}
return {
@ -298,9 +297,9 @@ def _real_extract(self, url):
class ADNSeasonIE(ADNBaseIE):
_VALID_URL = r'https?://(?:www\.)?(?:animation|anime)digitalnetwork\.(?P<lang>fr|de)/video/(?P<id>[^/?#]+)/?(?:$|[#?])'
_VALID_URL = r'https?://(?:www\.)?animationdigitalnetwork\.com/(?:(?P<lang>de)/)?video/(?P<id>\d+)[^/?#]*/?(?:$|[#?])'
_TESTS = [{
'url': 'https://animationdigitalnetwork.fr/video/tokyo-mew-mew-new',
'url': 'https://animationdigitalnetwork.com/video/911-tokyo-mew-mew-new',
'playlist_count': 12,
'info_dict': {
'id': '911',
@ -311,24 +310,22 @@ class ADNSeasonIE(ADNBaseIE):
def _real_extract(self, url):
lang, video_show_slug = self._match_valid_url(url).group('lang', 'id')
self._HEADERS['X-Target-Distribution'] = lang or 'fr'
show = self._download_json(
f'{self._API_BASE_URL}show/{video_show_slug}/', video_show_slug,
'Downloading show JSON metadata', headers=self._HEADERS)['show']
show_id = str(show['id'])
episodes = self._download_json(
f'{self._API_BASE_URL}video/show/{show_id}', video_show_slug,
'Downloading episode list', headers={
'X-Target-Distribution': lang,
**self._HEADERS,
}, query={
'Downloading episode list', headers=self._HEADERS, query={
'order': 'asc',
'limit': '-1',
})
def entries():
for episode_id in traverse_obj(episodes, ('videos', ..., 'id', {str_or_none})):
yield self.url_result(
f'https://animationdigitalnetwork.{lang}/video/{video_show_slug}/{episode_id}',
ADNIE, episode_id)
yield self.url_result(join_nonempty(
'https://animationdigitalnetwork.com', lang, 'video',
video_show_slug, episode_id, delim='/'), ADNIE, episode_id)
return self.playlist_result(entries(), show_id, show.get('title'))

View File

@ -1,6 +1,7 @@
import functools
from .common import InfoExtractor
from ..networking import Request
from ..utils import (
ExtractorError,
OnDemandPagedList,
@ -58,6 +59,13 @@ def _perform_login(self, username, password):
f'Unable to login: {self.IE_NAME} said: {error}',
expected=True)
def _call_api(self, endpoint, display_id, data=None, headers=None, query=None):
return self._download_json(Request(
f'https://api.m.afreecatv.com/{endpoint}',
data=data, headers=headers, query=query,
extensions={'legacy_ssl': True}), display_id,
'Downloading API JSON', 'Unable to download API JSON')
class AfreecaTVIE(AfreecaTVBaseIE):
IE_NAME = 'afreecatv'
@ -184,12 +192,12 @@ class AfreecaTVIE(AfreecaTVBaseIE):
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'https://api.m.afreecatv.com/station/video/a/view', video_id,
headers={'Referer': url}, data=urlencode_postdata({
data = self._call_api(
'station/video/a/view', video_id, headers={'Referer': url},
data=urlencode_postdata({
'nTitleNo': video_id,
'nApiLevel': 10,
}), impersonate=True)['data']
}))['data']
error_code = traverse_obj(data, ('code', {int}))
if error_code == -6221:
@ -267,9 +275,9 @@ class AfreecaTVCatchStoryIE(AfreecaTVBaseIE):
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_json(
'https://api.m.afreecatv.com/catchstory/a/view', video_id, headers={'Referer': url},
query={'aStoryListIdx': '', 'nStoryIdx': video_id}, impersonate=True)
data = self._call_api(
'catchstory/a/view', video_id, headers={'Referer': url},
query={'aStoryListIdx': '', 'nStoryIdx': video_id})
return self.playlist_result(self._entries(data), video_id)

View File

@ -4,9 +4,13 @@
from .common import InfoExtractor
from ..utils import (
InAdvancePagedList,
determine_ext,
format_field,
int_or_none,
join_nonempty,
traverse_obj,
unified_timestamp,
url_or_none,
)
@ -30,6 +34,7 @@ def _extract_playlist(self, playlist_id):
class BanByeIE(BanByeBaseIE):
_VALID_URL = r'https?://(?:www\.)?banbye\.com/(?:en/)?watch/(?P<id>[\w-]+)'
_TESTS = [{
# ['src']['mp4']['levels'] direct mp4 urls only
'url': 'https://banbye.com/watch/v_ytfmvkVYLE8T',
'md5': '2f4ea15c5ca259a73d909b2cfd558eb5',
'info_dict': {
@ -58,6 +63,7 @@ class BanByeIE(BanByeBaseIE):
},
'playlist_mincount': 9,
}, {
# ['src']['mp4']['levels'] direct mp4 urls only
'url': 'https://banbye.com/watch/v_kb6_o1Kyq-CD',
'info_dict': {
'id': 'v_kb6_o1Kyq-CD',
@ -77,6 +83,48 @@ class BanByeIE(BanByeBaseIE):
'view_count': int,
'comment_count': int,
},
}, {
# ['src']['hls']['levels'] variant m3u8 urls only; master m3u8 is 404
'url': 'https://banbye.com/watch/v_a_gPFuC9LoW5',
'info_dict': {
'id': 'v_a_gPFuC9LoW5',
'ext': 'mp4',
'title': 'md5:183524056bebdfa245fd6d214f63c0fe',
'description': 'md5:943ac87287ca98d28d8b8797719827c6',
'uploader': 'wRealu24',
'channel_id': 'ch_wrealu24',
'channel_url': 'https://banbye.com/channel/ch_wrealu24',
'upload_date': '20231113',
'timestamp': 1699874062,
'view_count': int,
'like_count': int,
'dislike_count': int,
'comment_count': int,
'thumbnail': 'https://cdn.banbye.com/video/v_a_gPFuC9LoW5/96.webp',
'tags': ['jaszczur', 'sejm', 'lewica', 'polska', 'ukrainizacja', 'pierwszeposiedzeniesejmu'],
},
'expected_warnings': ['Failed to download m3u8'],
}, {
# ['src']['hls']['masterPlaylist'] m3u8 only
'url': 'https://banbye.com/watch/v_B0rsKWsr-aaa',
'info_dict': {
'id': 'v_B0rsKWsr-aaa',
'ext': 'mp4',
'title': 'md5:00b254164b82101b3f9e5326037447ed',
'description': 'md5:3fd8b48aa81954ba024bc60f5de6e167',
'uploader': 'PSTV Piotr Szlachtowicz ',
'channel_id': 'ch_KV9EVObkB9wB',
'channel_url': 'https://banbye.com/channel/ch_KV9EVObkB9wB',
'upload_date': '20240629',
'timestamp': 1719646816,
'duration': 2377,
'view_count': int,
'like_count': int,
'dislike_count': int,
'comment_count': int,
'thumbnail': 'https://cdn.banbye.com/video/v_B0rsKWsr-aaa/96.webp',
'tags': ['Biden', 'Trump', 'Wybory', 'USA'],
},
}]
def _real_extract(self, url):
@ -91,11 +139,24 @@ def _real_extract(self, url):
'id': f'{quality}p',
'url': f'{self._CDN_BASE}/video/{video_id}/{quality}.webp',
} for quality in [48, 96, 144, 240, 512, 1080]]
formats = [{
'format_id': f'http-{quality}p',
'quality': quality,
'url': f'{self._CDN_BASE}/video/{video_id}/{quality}.mp4',
} for quality in data['quality']]
formats = []
url_data = self._download_json(f'{self._API_BASE}/videos/{video_id}/url', video_id, data=b'')
if master_url := traverse_obj(url_data, ('src', 'hls', 'masterPlaylist', {url_or_none})):
formats = self._extract_m3u8_formats(master_url, video_id, 'mp4', m3u8_id='hls', fatal=False)
for format_id, format_url in traverse_obj(url_data, (
'src', ('mp4', 'hls'), 'levels', {dict.items}, lambda _, v: url_or_none(v[1]))):
ext = determine_ext(format_url)
is_hls = ext == 'm3u8'
formats.append({
'url': format_url,
'ext': 'mp4' if is_hls else ext,
'format_id': join_nonempty(is_hls and 'hls', format_id),
'protocol': 'm3u8_native' if is_hls else 'https',
'height': int_or_none(format_id),
})
self._remove_duplicate_formats(formats)
return {
'id': video_id,

View File

@ -298,7 +298,7 @@ def _get_interactive_entries(self, video_id, cid, metainfo, headers=None):
class BiliBiliIE(BilibiliBaseIE):
_VALID_URL = r'https?://(?:www\.)?bilibili\.com/(?:video/|festival/\w+\?(?:[^#]*&)?bvid=)[aAbB][vV](?P<id>[^/?#&]+)'
_VALID_URL = r'https?://(?:www\.)?bilibili\.com/(?:video/|festival/[^/?#]+\?(?:[^#]*&)?bvid=)[aAbB][vV](?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.bilibili.com/video/BV13x41117TL',
@ -622,6 +622,10 @@ class BiliBiliIE(BilibiliBaseIE):
'ext': 'mp4',
},
'skip': 'geo-restricted',
}, {
'note': 'has - in the last path segment of the url',
'url': 'https://www.bilibili.com/festival/bh3-7th?bvid=BV1tr4y1f7p2&',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@ -12,7 +12,7 @@
class BoxIE(InfoExtractor):
_VALID_URL = r'https?://(?:[^.]+\.)?app\.box\.com/s/(?P<shared_name>[^/?#]+)(?:/file/(?P<id>\d+))?'
_VALID_URL = r'https?://(?:[^.]+\.)?(?P<service>app|ent)\.box\.com/s/(?P<shared_name>[^/?#]+)(?:/file/(?P<id>\d+))?'
_TESTS = [{
'url': 'https://mlssoccer.app.box.com/s/0evd2o3e08l60lr4ygukepvnkord1o1x/file/510727257538',
'md5': '1f81b2fd3960f38a40a3b8823e5fcd43',
@ -38,10 +38,22 @@ class BoxIE(InfoExtractor):
'uploader_id': '239068974',
},
'params': {'skip_download': 'dash fragment too small'},
}, {
'url': 'https://thejacksonlaboratory.ent.box.com/s/2x09dm6vcg6y28o0oox1so4l0t8wzt6l/file/1536173056065',
'info_dict': {
'id': '1536173056065',
'ext': 'mp4',
'uploader_id': '18523128264',
'uploader': 'Lexi Hennigan',
'title': 'iPSC Symposium recording part 1.mp4',
'timestamp': 1716228343,
'upload_date': '20240520',
},
'params': {'skip_download': 'dash fragment too small'},
}]
def _real_extract(self, url):
shared_name, file_id = self._match_valid_url(url).groups()
shared_name, file_id, service = self._match_valid_url(url).group('shared_name', 'id', 'service')
webpage = self._download_webpage(url, file_id or shared_name)
if not file_id:
@ -57,14 +69,14 @@ def _real_extract(self, url):
request_token = self._search_json(
r'Box\.config\s*=', webpage, 'Box config', file_id)['requestToken']
access_token = self._download_json(
'https://app.box.com/app-api/enduserapp/elements/tokens', file_id,
f'https://{service}.box.com/app-api/enduserapp/elements/tokens', file_id,
'Downloading token JSON metadata',
data=json.dumps({'fileIDs': [file_id]}).encode(), headers={
'Content-Type': 'application/json',
'X-Request-Token': request_token,
'X-Box-EndUser-API': 'sharedName=' + shared_name,
})[file_id]['read']
shared_link = 'https://app.box.com/s/' + shared_name
shared_link = f'https://{service}.box.com/s/{shared_name}'
f = self._download_json(
'https://api.box.com/2.0/files/' + file_id, file_id,
'Downloading file JSON metadata', headers={

View File

@ -1,4 +1,5 @@
import base64
import functools
import json
import re
import time
@ -6,17 +7,24 @@
import xml.etree.ElementTree
from .common import InfoExtractor
from ..networking import HEADRequest
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
join_nonempty,
js_to_json,
mimetype2ext,
orderedSet,
parse_iso8601,
replace_extension,
smuggle_url,
strip_or_none,
traverse_obj,
try_get,
update_url,
url_basename,
url_or_none,
)
@ -149,6 +157,7 @@ def _real_extract(self, url):
class CBCPlayerIE(InfoExtractor):
IE_NAME = 'cbc.ca:player'
_VALID_URL = r'(?:cbcplayer:|https?://(?:www\.)?cbc\.ca/(?:player/play/(?:video/)?|i/caffeine/syndicate/\?mediaId=))(?P<id>(?:\d\.)?\d+)'
_GEO_COUNTRIES = ['CA']
_TESTS = [{
'url': 'http://www.cbc.ca/player/play/2683190193',
'md5': '64d25f841ddf4ddb28a235338af32e2c',
@ -172,21 +181,20 @@ class CBCPlayerIE(InfoExtractor):
'description': 'md5:dd3b692f0a139b0369943150bd1c46a9',
'timestamp': 1425704400,
'upload_date': '20150307',
'uploader': 'CBCC-NEW',
'thumbnail': 'http://thumbnails.cbc.ca/maven_legacy/thumbnails/sonali-karnick-220.jpg',
'thumbnail': 'https://i.cbc.ca/ais/1.2985700,1717262248558/full/max/0/default.jpg',
'chapters': [],
'duration': 494.811,
'categories': ['AudioMobile/All in a Weekend Montreal'],
'tags': 'count:8',
'categories': ['All in a Weekend Montreal'],
'tags': 'count:11',
'location': 'Quebec',
'series': 'All in a Weekend Montreal',
'season': 'Season 2015',
'season_number': 2015,
'media_type': 'Excerpt',
'genres': ['Other'],
},
}, {
'url': 'http://www.cbc.ca/i/caffeine/syndicate/?mediaId=2164402062',
'md5': '33fcd8f6719b9dd60a5e73adcb83b9f6',
'info_dict': {
'id': '2164402062',
'ext': 'mp4',
@ -194,107 +202,168 @@ class CBCPlayerIE(InfoExtractor):
'description': 'Tim Mayer has beaten three different forms of cancer four times in five years.',
'timestamp': 1320410746,
'upload_date': '20111104',
'uploader': 'CBCC-NEW',
'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/277/67/cancer_852x480_2164412612.jpg',
'thumbnail': 'https://i.cbc.ca/ais/1.1711287,1717139372111/full/max/0/default.jpg',
'chapters': [],
'duration': 186.867,
'series': 'CBC News: Windsor at 6:00',
'categories': ['News/Canada/Windsor'],
'categories': ['Windsor'],
'location': 'Windsor',
'tags': ['cancer'],
'creators': ['Allison Johnson'],
'tags': ['Cancer', 'News/Canada/Windsor', 'Windsor'],
'media_type': 'Excerpt',
'genres': ['News'],
},
'params': {'skip_download': 'm3u8'},
}, {
# Redirected from http://www.cbc.ca/player/AudioMobile/All%20in%20a%20Weekend%20Montreal/ID/2657632011/
'url': 'https://www.cbc.ca/player/play/1.2985700',
'md5': 'e5e708c34ae6fca156aafe17c43e8b75',
'info_dict': {
'id': '2657631896',
'id': '1.2985700',
'ext': 'mp3',
'title': 'CBC Montreal is organizing its first ever community hackathon!',
'description': 'The modern technology we tend to depend on so heavily, is never without it\'s share of hiccups and headaches. Next weekend - CBC Montreal will be getting members of the public for its first Hackathon.',
'timestamp': 1425704400,
'upload_date': '20150307',
'uploader': 'CBCC-NEW',
'thumbnail': 'http://thumbnails.cbc.ca/maven_legacy/thumbnails/sonali-karnick-220.jpg',
'thumbnail': 'https://i.cbc.ca/ais/1.2985700,1717262248558/full/max/0/default.jpg',
'chapters': [],
'duration': 494.811,
'categories': ['AudioMobile/All in a Weekend Montreal'],
'tags': 'count:8',
'categories': ['All in a Weekend Montreal'],
'tags': 'count:11',
'location': 'Quebec',
'series': 'All in a Weekend Montreal',
'season': 'Season 2015',
'season_number': 2015,
'media_type': 'Excerpt',
'genres': ['Other'],
},
}, {
'url': 'https://www.cbc.ca/player/play/1.1711287',
'md5': '33fcd8f6719b9dd60a5e73adcb83b9f6',
'info_dict': {
'id': '2164402062',
'id': '1.1711287',
'ext': 'mp4',
'title': 'Cancer survivor four times over',
'description': 'Tim Mayer has beaten three different forms of cancer four times in five years.',
'timestamp': 1320410746,
'upload_date': '20111104',
'uploader': 'CBCC-NEW',
'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/277/67/cancer_852x480_2164412612.jpg',
'thumbnail': 'https://i.cbc.ca/ais/1.1711287,1717139372111/full/max/0/default.jpg',
'chapters': [],
'duration': 186.867,
'series': 'CBC News: Windsor at 6:00',
'categories': ['News/Canada/Windsor'],
'categories': ['Windsor'],
'location': 'Windsor',
'tags': ['cancer'],
'creators': ['Allison Johnson'],
'tags': ['Cancer', 'News/Canada/Windsor', 'Windsor'],
'media_type': 'Excerpt',
'genres': ['News'],
},
'params': {'skip_download': 'm3u8'},
}, {
# Has subtitles
# These broadcasts expire after ~1 month, can find new test URL here:
# https://www.cbc.ca/player/news/TV%20Shows/The%20National/Latest%20Broadcast
'url': 'https://www.cbc.ca/player/play/1.7159484',
'md5': '6ed6cd0fc2ef568d2297ba68a763d455',
'url': 'https://www.cbc.ca/player/play/video/9.6424403',
'md5': '8025909eaffcf0adf59922904def9a5e',
'info_dict': {
'id': '2324213316001',
'id': '9.6424403',
'ext': 'mp4',
'title': 'The National | School boards sue social media giants',
'description': 'md5:4b4db69322fa32186c3ce426da07402c',
'timestamp': 1711681200,
'duration': 2743.400,
'subtitles': {'eng': [{'ext': 'vtt', 'protocol': 'm3u8_native'}]},
'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/607/559/thumbnail.jpeg',
'uploader': 'CBCC-NEW',
'title': 'The National | N.W.T. wildfire emergency',
'description': 'md5:ada33d36d1df69347ed575905bfd496c',
'timestamp': 1718589600,
'duration': 2692.833,
'subtitles': {
'en-US': [{
'name': 'English Captions',
'url': 'https://cbchls.akamaized.net/delivery/news-shows/2024/06/17/NAT_JUN16-00-55-00/NAT_JUN16_cc.vtt',
}],
},
'thumbnail': 'https://i.cbc.ca/ais/6272b5c6-5e78-4c05-915d-0e36672e33d1,1714756287822/full/max/0/default.jpg',
'chapters': 'count:5',
'upload_date': '20240329',
'categories': 'count:4',
'upload_date': '20240617',
'categories': ['News', 'The National', 'The National Latest Broadcasts'],
'series': 'The National - Full Show',
'tags': 'count:1',
'creators': ['News'],
'tags': ['The National'],
'location': 'Canada',
'media_type': 'Full Program',
'genres': ['News'],
},
}, {
'url': 'https://www.cbc.ca/player/play/video/1.7194274',
'md5': '188b96cf6bdcb2540e178a6caa957128',
'info_dict': {
'id': '2334524995812',
'id': '1.7194274',
'ext': 'mp4',
'title': '#TheMoment a rare white spirit moose was spotted in Alberta',
'description': 'md5:18ae269a2d0265c5b0bbe4b2e1ac61a3',
'timestamp': 1714788791,
'duration': 77.678,
'subtitles': {'eng': [{'ext': 'vtt', 'protocol': 'm3u8_native'}]},
'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/201/543/THE_MOMENT.jpg',
'uploader': 'CBCC-NEW',
'chapters': 'count:0',
'upload_date': '20240504',
'thumbnail': 'https://i.cbc.ca/ais/1.7194274,1717224990425/full/max/0/default.jpg',
'chapters': [],
'categories': 'count:3',
'series': 'The National',
'tags': 'count:15',
'creators': ['encoder'],
'tags': 'count:17',
'location': 'Canada',
'media_type': 'Excerpt',
'upload_date': '20240504',
'genres': ['News'],
},
}, {
'url': 'https://www.cbc.ca/player/play/video/9.6427282',
'info_dict': {
'id': '9.6427282',
'ext': 'mp4',
'title': 'Men\'s Soccer - Argentina vs Morocco',
'description': 'Argentina faces Morocco on the football pitch at Saint Etienne Stadium.',
'series': 'CBC Sports',
'media_type': 'Event Coverage',
'thumbnail': 'https://i.cbc.ca/ais/a4c5c0c2-99fa-4bd3-8061-5a63879c1b33,1718828053500/full/max/0/default.jpg',
'timestamp': 1721825400.0,
'upload_date': '20240724',
'duration': 10568.0,
'chapters': [],
'genres': [],
'tags': ['2024 Paris Olympic Games'],
'categories': ['Olympics Summer Soccer', 'Summer Olympics Replays', 'Summer Olympics Soccer Replays'],
'location': 'Canada',
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://www.cbc.ca/player/play/video/9.6459530',
'md5': '6c1bb76693ab321a2e99c347a1d5ecbc',
'info_dict': {
'id': '9.6459530',
'ext': 'mp4',
'title': 'Parts of Jasper incinerated as wildfire rages',
'description': 'md5:6f1caa8d128ad3f629257ef5fecf0962',
'series': 'The National',
'media_type': 'Excerpt',
'thumbnail': 'https://i.cbc.ca/ais/507c0086-31a2-494d-96e4-bffb1048d045,1721953984375/full/max/0/default.jpg',
'timestamp': 1721964091.012,
'upload_date': '20240726',
'duration': 952.285,
'chapters': [],
'genres': [],
'tags': 'count:23',
'categories': ['News (FAST)', 'News', 'The National', 'TV News Shows', 'The National '],
},
}, {
'url': 'https://www.cbc.ca/player/play/video/9.6420651',
'md5': '71a850c2c6ee5e912de169f5311bb533',
'info_dict': {
'id': '9.6420651',
'ext': 'mp4',
'title': 'Is it a breath of fresh air? Measuring air quality in Edmonton',
'description': 'md5:3922b92cc8b69212d739bd9dd095b1c3',
'series': 'CBC News Edmonton',
'media_type': 'Excerpt',
'thumbnail': 'https://i.cbc.ca/ais/73c4ab9c-7ad4-46ee-bb9b-020fdc01c745,1718214547576/full/max/0/default.jpg',
'timestamp': 1718220065.768,
'upload_date': '20240612',
'duration': 286.086,
'chapters': [],
'genres': ['News'],
'categories': ['News', 'Edmonton'],
'tags': 'count:7',
'location': 'Edmonton',
},
}, {
'url': 'cbcplayer:1.7159484',
@ -307,25 +376,115 @@ class CBCPlayerIE(InfoExtractor):
'only_matching': True,
}]
def _parse_param(self, asset_data, name):
return traverse_obj(asset_data, ('params', lambda _, v: v['name'] == name, 'value', {str}, any))
def _real_extract(self, url):
video_id = self._match_id(url)
if '.' in video_id:
webpage = self._download_webpage(f'https://www.cbc.ca/player/play/{video_id}', video_id)
video_id = self._search_json(
r'window\.__INITIAL_STATE__\s*=', webpage,
'initial state', video_id)['video']['currentClip']['mediaId']
data = self._search_json(
r'window\.__INITIAL_STATE__\s*=', webpage, 'initial state', video_id)['video']['currentClip']
assets = traverse_obj(
data, ('media', 'assets', lambda _, v: url_or_none(v['key']) and v['type']))
if not assets and (media_id := traverse_obj(data, ('mediaId', {str}))):
# XXX: Deprecated; CBC is migrating off of ThePlatform
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(
f'http://link.theplatform.com/s/ExhSPC/media/guid/2655402169/{video_id}?mbr=true&formats=MPEG4,FLV,MP3', {
f'http://link.theplatform.com/s/ExhSPC/media/guid/2655402169/{media_id}?mbr=true&formats=MPEG4,FLV,MP3', {
'force_smil_url': True,
}),
'id': video_id,
'id': media_id,
'_format_sort_fields': ('res', 'proto'), # Prioritize direct http formats over HLS
}
is_live = traverse_obj(data, ('media', 'streamType', {str})) == 'Live'
formats, subtitles = [], {}
for sub in traverse_obj(data, ('media', 'textTracks', lambda _, v: url_or_none(v['src']))):
subtitles.setdefault(sub.get('language') or 'und', []).append({
'url': sub['src'],
'name': sub.get('label'),
})
for asset in assets:
asset_key = asset['key']
asset_type = asset['type']
if asset_type != 'medianet':
self.report_warning(f'Skipping unsupported asset type "{asset_type}": {asset_key}')
continue
asset_data = self._download_json(asset_key, video_id, f'Downloading {asset_type} JSON')
ext = mimetype2ext(self._parse_param(asset_data, 'contentType'))
if ext == 'm3u8':
fmts, subs = self._extract_m3u8_formats_and_subtitles(
asset_data['url'], video_id, 'mp4', m3u8_id='hls', live=is_live)
formats.extend(fmts)
# Avoid slow/error-prone webvtt-over-m3u8 if direct https vtt is available
if not subtitles:
self._merge_subtitles(subs, target=subtitles)
if is_live or not fmts:
continue
# Check for direct https mp4 format
best_video_fmt = traverse_obj(fmts, (
lambda _, v: v.get('vcodec') != 'none' and v['tbr'], all,
{functools.partial(sorted, key=lambda x: x['tbr'])}, -1, {dict})) or {}
base_url = self._search_regex(
r'(https?://[^?#]+?/)hdntl=', best_video_fmt.get('url'), 'base url', default=None)
if not base_url or '/live/' in base_url:
continue
mp4_url = base_url + replace_extension(url_basename(best_video_fmt['url']), 'mp4')
if self._request_webpage(
HEADRequest(mp4_url), video_id, 'Checking for https format',
errnote=False, fatal=False):
formats.append({
**best_video_fmt,
'url': mp4_url,
'format_id': 'https-mp4',
'protocol': 'https',
'manifest_url': None,
'acodec': None,
})
else:
formats.append({
'url': asset_data['url'],
'ext': ext,
'vcodec': 'none' if self._parse_param(asset_data, 'mediaType') == 'audio' else None,
})
chapters = traverse_obj(data, (
'media', 'chapters', lambda _, v: float(v['startTime']) is not None, {
'start_time': ('startTime', {functools.partial(float_or_none, scale=1000)}),
'end_time': ('endTime', {functools.partial(float_or_none, scale=1000)}),
'title': ('name', {str}),
}))
# Filter out pointless single chapters with start_time==0 and no end_time
if len(chapters) == 1 and not (chapters[0].get('start_time') or chapters[0].get('end_time')):
chapters = []
return {
**traverse_obj(data, {
'title': ('title', {str}),
'description': ('description', {str.strip}),
'thumbnail': ('image', 'url', {url_or_none}, {functools.partial(update_url, query=None)}),
'timestamp': ('publishedAt', {functools.partial(float_or_none, scale=1000)}),
'media_type': ('media', 'clipType', {str}),
'series': ('showName', {str}),
'season_number': ('media', 'season', {int_or_none}),
'duration': ('media', 'duration', {float_or_none}, {lambda x: None if is_live else x}),
'location': ('media', 'region', {str}),
'tags': ('tags', ..., 'name', {str}),
'genres': ('media', 'genre', all),
'categories': ('categories', ..., 'name', {str}),
}),
'id': video_id,
'formats': formats,
'subtitles': subtitles,
'chapters': chapters,
'is_live': is_live,
}
class CBCPlayerPlaylistIE(InfoExtractor):
IE_NAME = 'cbc.ca:player:playlist'
@ -647,11 +806,11 @@ class CBCGemLiveIE(InfoExtractor):
'title': 'Ottawa',
'description': 'The live TV channel and local programming from Ottawa',
'thumbnail': 'https://thumbnails.cbc.ca/maven_legacy/thumbnails/CBC_OTT_VMS/Live_Channel_Static_Images/Ottawa_2880x1620.jpg',
'is_live': True,
'live_status': 'is_live',
'id': 'AyqZwxRqh8EH',
'ext': 'mp4',
'timestamp': 1492106160,
'upload_date': '20170413',
'release_timestamp': 1492106160,
'release_date': '20170413',
'uploader': 'CBCC-NEW',
},
'skip': 'Live might have ended',
@ -680,32 +839,65 @@ class CBCGemLiveIE(InfoExtractor):
'description': 'March 24, 2023 | President Bidens Ottawa visit ends with big pledges from both countries. Plus, Gwyneth Paltrow testifies in her ski collision trial.',
'live_status': 'is_live',
'thumbnail': r're:https://images.gem.cbc.ca/v1/cbc-gem/live/.*',
'timestamp': 1679706000,
'upload_date': '20230325',
'release_timestamp': 1679706000,
'release_date': '20230325',
},
'params': {'skip_download': True},
'skip': 'Live might have ended',
},
{ # event replay (medianetlive)
'url': 'https://gem.cbc.ca/live-event/42314',
'md5': '297a9600f554f2258aed01514226a697',
'info_dict': {
'id': '42314',
'ext': 'mp4',
'live_status': 'was_live',
'title': 'Women\'s Soccer - Canada vs New Zealand',
'description': 'md5:36200e5f1a70982277b5a6ecea86155d',
'thumbnail': r're:https://.+default\.jpg',
'release_timestamp': 1721917200,
'release_date': '20240725',
},
'params': {'skip_download': True},
'skip': 'Replay might no longer be available',
},
{ # event replay (medianetlive)
'url': 'https://gem.cbc.ca/live-event/43273',
'only_matching': True,
},
]
_GEO_COUNTRIES = ['CA']
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_info = self._search_nextjs_data(webpage, video_id)['props']['pageProps']['data']
# Two types of metadata JSON
# Three types of video_info JSON: info in root, freeTv stream/item, event replay
if not video_info.get('formattedIdMedia'):
video_info = traverse_obj(
video_info, (('freeTv', ('streams', ...)), 'items', lambda _, v: v['key'] == video_id, {dict}),
get_all=False, default={})
if traverse_obj(video_info, ('event', 'key')) == video_id:
video_info = video_info['event']
else:
video_info = traverse_obj(video_info, (
('freeTv', ('streams', ...)), 'items',
lambda _, v: v['key'].partition('-')[0] == video_id, any)) or {}
video_stream_id = video_info.get('formattedIdMedia')
if not video_stream_id:
raise ExtractorError('Couldn\'t find video metadata, maybe this livestream is now offline', expected=True)
raise ExtractorError(
'Couldn\'t find video metadata, maybe this livestream is now offline', expected=True)
live_status = 'was_live' if video_info.get('isVodEnabled') else 'is_live'
release_timestamp = traverse_obj(video_info, ('airDate', {parse_iso8601}))
if live_status == 'is_live' and release_timestamp and release_timestamp > time.time():
formats = []
live_status = 'is_upcoming'
self.raise_no_formats('This livestream has not yet started', expected=True)
else:
stream_data = self._download_json(
'https://services.radio-canada.ca/media/validation/v2/', video_id, query={
'appCode': 'mpx',
'appCode': 'medianetlive',
'connectionType': 'hd',
'deviceType': 'ipad',
'idMedia': video_stream_id,
@ -714,15 +906,17 @@ def _real_extract(self, url):
'tech': 'hls',
'manifestType': 'desktop',
})
formats = self._extract_m3u8_formats(
stream_data['url'], video_id, 'mp4', live=live_status == 'is_live')
return {
'id': video_id,
'formats': self._extract_m3u8_formats(stream_data['url'], video_id, 'mp4', live=True),
'is_live': True,
'formats': formats,
'live_status': live_status,
'release_timestamp': release_timestamp,
**traverse_obj(video_info, {
'title': 'title',
'description': 'description',
'title': ('title', {str}),
'description': ('description', {str}),
'thumbnail': ('images', 'card', 'url'),
'timestamp': ('airDate', {parse_iso8601}),
}),
}

View File

@ -1,63 +1,50 @@
from .common import InfoExtractor
from ..utils import traverse_obj
from .vidyard import VidyardBaseIE, VidyardIE
from ..utils import ExtractorError, make_archive_id, url_basename
class CellebriteIE(InfoExtractor):
class CellebriteIE(VidyardBaseIE):
_VALID_URL = r'https?://cellebrite\.com/(?:\w+)?/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://cellebrite.com/en/collect-data-from-android-devices-with-cellebrite-ufed/',
'info_dict': {
'id': '16025876',
'id': 'ZqmUss3dQfEMGpauambPuH',
'display_id': '16025876',
'ext': 'mp4',
'description': 'md5:174571cb97083fd1d457d75c684f4e2b',
'thumbnail': 'https://cellebrite.com/wp-content/uploads/2021/05/Chat-Capture-1024x559.png',
'title': 'Ask the Expert: Chat Capture - Collect Data from Android Devices in Cellebrite UFED',
'duration': 455,
'tags': [],
'description': 'md5:dee48fe12bbae5c01fe6a053f7676da4',
'thumbnail': 'https://cellebrite.com/wp-content/uploads/2021/05/Chat-Capture-1024x559.png',
'duration': 455.979,
'_old_archive_ids': ['cellebrite 16025876'],
},
}, {
'url': 'https://cellebrite.com/en/how-to-lawfully-collect-the-maximum-amount-of-data-from-android-devices/',
'info_dict': {
'id': '29018255',
'id': 'QV1U8a2yzcxigw7VFnqKyg',
'display_id': '29018255',
'ext': 'mp4',
'duration': 134,
'tags': [],
'description': 'md5:e9a3d124c7287b0b07bad2547061cacf',
'title': 'How to Lawfully Collect the Maximum Amount of Data From Android Devices',
'description': 'md5:0e943a9ac14c374d5d74faed634d773c',
'thumbnail': 'https://cellebrite.com/wp-content/uploads/2022/07/How-to-Lawfully-Collect-the-Maximum-Amount-of-Data-From-Android-Devices.png',
'title': 'Android Extractions Explained',
'duration': 134.315,
'_old_archive_ids': ['cellebrite 29018255'],
},
}]
def _get_formats_and_subtitles(self, json_data, display_id):
formats = [{'url': url} for url in traverse_obj(json_data, ('mp4', ..., 'url')) or []]
subtitles = {}
for url in traverse_obj(json_data, ('hls', ..., 'url')) or []:
fmt, sub = self._extract_m3u8_formats_and_subtitles(
url, display_id, ext='mp4', headers={'Referer': 'https://play.vidyard.com/'})
formats.extend(fmt)
self._merge_subtitles(sub, target=subtitles)
return formats, subtitles
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
slug = self._match_id(url)
webpage = self._download_webpage(url, slug)
vidyard_url = next(VidyardIE._extract_embed_urls(url, webpage), None)
if not vidyard_url:
raise ExtractorError('No Vidyard video embeds found on page')
player_uuid = self._search_regex(
r'<img\s[^>]*\bdata-uuid\s*=\s*"([^"\?]+)', webpage, 'player UUID')
json_data = self._download_json(
f'https://play.vidyard.com/player/{player_uuid}.json', display_id)['payload']['chapters'][0]
video_id = url_basename(vidyard_url)
info = self._process_video_json(self._fetch_video_json(video_id)['chapters'][0], video_id)
if info.get('display_id'):
info['_old_archive_ids'] = [make_archive_id(self, info['display_id'])]
if thumbnail := self._og_search_thumbnail(webpage, default=None):
info.setdefault('thumbnails', []).append({'url': thumbnail})
formats, subtitles = self._get_formats_and_subtitles(json_data['sources'], display_id)
return {
'id': str(json_data['videoId']),
'title': json_data.get('name') or self._og_search_title(webpage),
'formats': formats,
'subtitles': subtitles,
'description': json_data.get('description') or self._og_search_description(webpage),
'duration': json_data.get('seconds'),
'tags': json_data.get('tags'),
'thumbnail': self._og_search_thumbnail(webpage),
'http_headers': {'Referer': 'https://play.vidyard.com/'},
'description': self._og_search_description(webpage, default=None),
**info,
}

View File

@ -36,7 +36,7 @@ class CHZZKLiveIE(InfoExtractor):
def _real_extract(self, url):
channel_id = self._match_id(url)
live_detail = self._download_json(
f'https://api.chzzk.naver.com/service/v2/channels/{channel_id}/live-detail', channel_id,
f'https://api.chzzk.naver.com/service/v3/channels/{channel_id}/live-detail', channel_id,
note='Downloading channel info', errnote='Unable to download channel info')['content']
if live_detail.get('status') == 'CLOSE':
@ -106,12 +106,45 @@ class CHZZKVideoIE(InfoExtractor):
'upload_date': '20231219',
'view_count': int,
},
'skip': 'Replay video is expired',
}, {
# Manually uploaded video
'url': 'https://chzzk.naver.com/video/1980',
'info_dict': {
'id': '1980',
'ext': 'mp4',
'title': '※시청주의※한번보면 잊기 힘든 영상',
'channel': '라디유radiyu',
'channel_id': '68f895c59a1043bc5019b5e08c83a5c5',
'channel_is_verified': False,
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 95,
'timestamp': 1703102631.722,
'upload_date': '20231220',
'view_count': int,
},
}, {
# Partner channel replay video
'url': 'https://chzzk.naver.com/video/2458',
'info_dict': {
'id': '2458',
'ext': 'mp4',
'title': '첫 방송',
'channel': '강지',
'channel_id': 'b5ed5db484d04faf4d150aedd362f34b',
'channel_is_verified': True,
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 4433,
'timestamp': 1703307460.214,
'upload_date': '20231223',
'view_count': int,
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
video_meta = self._download_json(
f'https://api.chzzk.naver.com/service/v2/videos/{video_id}', video_id,
f'https://api.chzzk.naver.com/service/v3/videos/{video_id}', video_id,
note='Downloading video info', errnote='Unable to download video info')['content']
formats, subtitles = self._extract_mpd_formats_and_subtitles(
f'https://apis.naver.com/neonplayer/vodplay/v1/playback/{video_meta["videoId"]}', video_id,

View File

@ -3150,7 +3150,7 @@ def _parse_ism_formats_and_subtitles(self, ism_doc, ism_url, ism_id=None):
})
return formats, subtitles
def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8_native', mpd_id=None, preference=None, quality=None):
def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8_native', mpd_id=None, preference=None, quality=None, _headers=None):
def absolute_url(item_url):
return urljoin(base_url, item_url)
@ -3174,11 +3174,11 @@ def _media_formats(src, cur_media_type, type_info=None):
formats = self._extract_m3u8_formats(
full_url, video_id, ext='mp4',
entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id,
preference=preference, quality=quality, fatal=False)
preference=preference, quality=quality, fatal=False, headers=_headers)
elif ext == 'mpd':
is_plain_url = False
formats = self._extract_mpd_formats(
full_url, video_id, mpd_id=mpd_id, fatal=False)
full_url, video_id, mpd_id=mpd_id, fatal=False, headers=_headers)
else:
is_plain_url = True
formats = [{
@ -3272,6 +3272,8 @@ def _media_formats(src, cur_media_type, type_info=None):
})
for f in media_info['formats']:
f.setdefault('http_headers', {})['Referer'] = base_url
if _headers:
f['http_headers'].update(_headers)
if media_info['formats'] or media_info['subtitles']:
entries.append(media_info)
return entries

View File

@ -1,6 +1,8 @@
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
parse_codecs,
try_get,
url_or_none,
urlencode_postdata,
@ -12,6 +14,7 @@ class DigitalConcertHallIE(InfoExtractor):
IE_DESC = 'DigitalConcertHall extractor'
_VALID_URL = r'https?://(?:www\.)?digitalconcerthall\.com/(?P<language>[a-z]+)/(?P<type>film|concert|work)/(?P<id>[0-9]+)-?(?P<part>[0-9]+)?'
_OAUTH_URL = 'https://api.digitalconcerthall.com/v2/oauth2/token'
_USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.5 Safari/605.1.15'
_ACCESS_TOKEN = None
_NETRC_MACHINE = 'digitalconcerthall'
_TESTS = [{
@ -68,33 +71,42 @@ class DigitalConcertHallIE(InfoExtractor):
}]
def _perform_login(self, username, password):
token_response = self._download_json(
login_token = self._download_json(
self._OAUTH_URL,
None, 'Obtaining token', errnote='Unable to obtain token', data=urlencode_postdata({
'affiliate': 'none',
'grant_type': 'device',
'device_vendor': 'unknown',
# device_model 'Safari' gets split streams of 4K/HEVC video and lossless/FLAC audio
'device_model': 'unknown' if self._configuration_arg('prefer_combined_hls') else 'Safari',
'app_id': 'dch.webapp',
'app_version': '1.0.0',
'app_distributor': 'berlinphil',
'app_version': '1.84.0',
'client_secret': '2ySLN+2Fwb',
}), headers={
'Content-Type': 'application/x-www-form-urlencoded',
})
self._ACCESS_TOKEN = token_response['access_token']
'Accept': 'application/json',
'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
'User-Agent': self._USER_AGENT,
})['access_token']
try:
self._download_json(
login_response = self._download_json(
self._OAUTH_URL,
None, note='Logging in', errnote='Unable to login', data=urlencode_postdata({
'grant_type': 'password',
'username': username,
'password': password,
}), headers={
'Content-Type': 'application/x-www-form-urlencoded',
'Accept': 'application/json',
'Content-Type': 'application/x-www-form-urlencoded;charset=UTF-8',
'Referer': 'https://www.digitalconcerthall.com',
'Authorization': f'Bearer {self._ACCESS_TOKEN}',
'Authorization': f'Bearer {login_token}',
'User-Agent': self._USER_AGENT,
})
except ExtractorError:
self.raise_login_required(msg='Login info incorrect')
except ExtractorError as error:
if isinstance(error.cause, HTTPError) and error.cause.status == 401:
raise ExtractorError('Invalid username or password', expected=True)
raise
self._ACCESS_TOKEN = login_response['access_token']
def _real_initialize(self):
if not self._ACCESS_TOKEN:
@ -108,11 +120,15 @@ def _entries(self, items, language, type_, **kwargs):
'Accept': 'application/json',
'Authorization': f'Bearer {self._ACCESS_TOKEN}',
'Accept-Language': language,
'User-Agent': self._USER_AGENT,
})
formats = []
for m3u8_url in traverse_obj(stream_info, ('channel', ..., 'stream', ..., 'url', {url_or_none})):
formats.extend(self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', fatal=False))
formats.extend(self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
for fmt in formats:
if fmt.get('format_note') and fmt.get('vcodec') == 'none':
fmt.update(parse_codecs(fmt['format_note']))
yield {
'id': video_id,
@ -140,13 +156,15 @@ def _real_extract(self, url):
f'https://api.digitalconcerthall.com/v2/{api_type}/{video_id}', video_id, headers={
'Accept': 'application/json',
'Accept-Language': language,
'User-Agent': self._USER_AGENT,
'Authorization': f'Bearer {self._ACCESS_TOKEN}',
})
album_artists = traverse_obj(vid_info, ('_links', 'artist', ..., 'name'))
videos = [vid_info] if type_ == 'film' else traverse_obj(vid_info, ('_embedded', ..., ...))
if type_ == 'work':
videos = [videos[int(part) - 1]]
album_artists = traverse_obj(vid_info, ('_links', 'artist', ..., 'name', {str}))
thumbnail = traverse_obj(vid_info, (
'image', ..., {self._proto_relative_url}, {url_or_none},
{lambda x: x.format(width=0, height=0)}, any)) # NB: 0x0 is the original size

View File

@ -1,115 +0,0 @@
import random
import string
import urllib.parse
from .discoverygo import DiscoveryGoBaseIE
from ..networking.exceptions import HTTPError
from ..utils import ExtractorError
class DiscoveryIE(DiscoveryGoBaseIE):
_VALID_URL = r'''(?x)https?://
(?P<site>
go\.discovery|
www\.
(?:
investigationdiscovery|
discoverylife|
animalplanet|
ahctv|
destinationamerica|
sciencechannel|
tlc
)|
watch\.
(?:
hgtv|
foodnetwork|
travelchannel|
diynetwork|
cookingchanneltv|
motortrend
)
)\.com/tv-shows/(?P<show_slug>[^/]+)/(?:video|full-episode)s/(?P<id>[^./?#]+)'''
_TESTS = [{
'url': 'https://go.discovery.com/tv-shows/cash-cab/videos/riding-with-matthew-perry',
'info_dict': {
'id': '5a2f35ce6b66d17a5026e29e',
'ext': 'mp4',
'title': 'Riding with Matthew Perry',
'description': 'md5:a34333153e79bc4526019a5129e7f878',
'duration': 84,
},
'params': {
'skip_download': True, # requires ffmpeg
},
}, {
'url': 'https://www.investigationdiscovery.com/tv-shows/final-vision/full-episodes/final-vision',
'only_matching': True,
}, {
'url': 'https://go.discovery.com/tv-shows/alaskan-bush-people/videos/follow-your-own-road',
'only_matching': True,
}, {
# using `show_slug` is important to get the correct video data
'url': 'https://www.sciencechannel.com/tv-shows/mythbusters-on-science/full-episodes/christmas-special',
'only_matching': True,
}]
_GEO_COUNTRIES = ['US']
_GEO_BYPASS = False
_API_BASE_URL = 'https://api.discovery.com/v1/'
def _real_extract(self, url):
site, show_slug, display_id = self._match_valid_url(url).groups()
access_token = None
cookies = self._get_cookies(url)
# prefer Affiliate Auth Token over Anonymous Auth Token
auth_storage_cookie = cookies.get('eosAf') or cookies.get('eosAn')
if auth_storage_cookie and auth_storage_cookie.value:
auth_storage = self._parse_json(urllib.parse.unquote(
urllib.parse.unquote(auth_storage_cookie.value)),
display_id, fatal=False) or {}
access_token = auth_storage.get('a') or auth_storage.get('access_token')
if not access_token:
access_token = self._download_json(
f'https://{site}.com/anonymous', display_id,
'Downloading token JSON metadata', query={
'authRel': 'authorization',
'client_id': '3020a40c2356a645b4b4',
'nonce': ''.join(random.choices(string.ascii_letters, k=32)),
'redirectUri': 'https://www.discovery.com/',
})['access_token']
headers = self.geo_verification_headers()
headers['Authorization'] = 'Bearer ' + access_token
try:
video = self._download_json(
self._API_BASE_URL + 'content/videos',
display_id, 'Downloading content JSON metadata',
headers=headers, query={
'embed': 'show.name',
'fields': 'authenticated,description.detailed,duration,episodeNumber,id,name,parental.rating,season.number,show,tags',
'slug': display_id,
'show_slug': show_slug,
})[0]
video_id = video['id']
stream = self._download_json(
self._API_BASE_URL + 'streaming/video/' + video_id,
display_id, 'Downloading streaming JSON metadata', headers=headers)
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status in (401, 403):
e_description = self._parse_json(
e.cause.response.read().decode(), display_id)['description']
if 'resource not available for country' in e_description:
self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
if 'Authorized Networks' in e_description:
raise ExtractorError(
'This video is only available via cable service provider subscription that'
' is not currently supported. You may want to use --cookies.', expected=True)
raise ExtractorError(e_description)
raise
return self._extract_video_info(video, stream, display_id)

View File

@ -1,171 +0,0 @@
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
determine_ext,
extract_attributes,
int_or_none,
parse_age_limit,
remove_end,
unescapeHTML,
url_or_none,
)
class DiscoveryGoBaseIE(InfoExtractor):
_VALID_URL_TEMPLATE = r'''(?x)https?://(?:www\.)?(?:
discovery|
investigationdiscovery|
discoverylife|
animalplanet|
ahctv|
destinationamerica|
sciencechannel|
tlc|
velocitychannel
)go\.com/%s(?P<id>[^/?#&]+)'''
def _extract_video_info(self, video, stream, display_id):
title = video['name']
if not stream:
if video.get('authenticated') is True:
raise ExtractorError(
'This video is only available via cable service provider subscription that'
' is not currently supported. You may want to use --cookies.', expected=True)
else:
raise ExtractorError('Unable to find stream')
STREAM_URL_SUFFIX = 'streamUrl'
formats = []
for stream_kind in ('', 'hds'):
suffix = STREAM_URL_SUFFIX.capitalize() if stream_kind else STREAM_URL_SUFFIX
stream_url = stream.get(f'{stream_kind}{suffix}')
if not stream_url:
continue
if stream_kind == '':
formats.extend(self._extract_m3u8_formats(
stream_url, display_id, 'mp4', entry_protocol='m3u8_native',
m3u8_id='hls', fatal=False))
elif stream_kind == 'hds':
formats.extend(self._extract_f4m_formats(
stream_url, display_id, f4m_id=stream_kind, fatal=False))
video_id = video.get('id') or display_id
description = video.get('description', {}).get('detailed')
duration = int_or_none(video.get('duration'))
series = video.get('show', {}).get('name')
season_number = int_or_none(video.get('season', {}).get('number'))
episode_number = int_or_none(video.get('episodeNumber'))
tags = video.get('tags')
age_limit = parse_age_limit(video.get('parental', {}).get('rating'))
subtitles = {}
captions = stream.get('captions')
if isinstance(captions, list):
for caption in captions:
subtitle_url = url_or_none(caption.get('fileUrl'))
if not subtitle_url or not subtitle_url.startswith('http'):
continue
lang = caption.get('fileLang', 'en')
ext = determine_ext(subtitle_url)
subtitles.setdefault(lang, []).append({
'url': subtitle_url,
'ext': 'ttml' if ext == 'xml' else ext,
})
return {
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'duration': duration,
'series': series,
'season_number': season_number,
'episode_number': episode_number,
'tags': tags,
'age_limit': age_limit,
'formats': formats,
'subtitles': subtitles,
}
class DiscoveryGoIE(DiscoveryGoBaseIE):
_VALID_URL = DiscoveryGoBaseIE._VALID_URL_TEMPLATE % r'(?:[^/]+/)+'
_GEO_COUNTRIES = ['US']
_TEST = {
'url': 'https://www.discoverygo.com/bering-sea-gold/reaper-madness/',
'info_dict': {
'id': '58c167d86b66d12f2addeb01',
'ext': 'mp4',
'title': 'Reaper Madness',
'description': 'md5:09f2c625c99afb8946ed4fb7865f6e78',
'duration': 2519,
'series': 'Bering Sea Gold',
'season_number': 8,
'episode_number': 6,
'age_limit': 14,
},
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
container = extract_attributes(
self._search_regex(
r'(<div[^>]+class=["\']video-player-container[^>]+>)',
webpage, 'video container'))
video = self._parse_json(
container.get('data-video') or container.get('data-json'),
display_id)
stream = video.get('stream')
return self._extract_video_info(video, stream, display_id)
class DiscoveryGoPlaylistIE(DiscoveryGoBaseIE):
_VALID_URL = DiscoveryGoBaseIE._VALID_URL_TEMPLATE % ''
_TEST = {
'url': 'https://www.discoverygo.com/bering-sea-gold/',
'info_dict': {
'id': 'bering-sea-gold',
'title': 'Bering Sea Gold',
'description': 'md5:cc5c6489835949043c0cc3ad66c2fa0e',
},
'playlist_mincount': 6,
}
@classmethod
def suitable(cls, url):
return False if DiscoveryGoIE.suitable(url) else super().suitable(url)
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
entries = []
for mobj in re.finditer(r'data-json=(["\'])(?P<json>{.+?})\1', webpage):
data = self._parse_json(
mobj.group('json'), display_id,
transform_source=unescapeHTML, fatal=False)
if not isinstance(data, dict) or data.get('type') != 'episode':
continue
episode_url = data.get('socialUrl')
if not episode_url:
continue
entries.append(self.url_result(
episode_url, ie=DiscoveryGoIE.ie_key(),
video_id=data.get('id')))
return self.playlist_result(
entries, display_id,
remove_end(self._og_search_title(
webpage, fatal=False), ' | Discovery GO'),
self._og_search_description(webpage))

View File

@ -24,8 +24,9 @@
class DouyuBaseIE(InfoExtractor):
def _download_cryptojs_md5(self, video_id):
for url in [
# XXX: Do NOT use cdn.bootcdn.net; ref: https://sansec.io/research/polyfill-supply-chain-attack
'https://cdnjs.cloudflare.com/ajax/libs/crypto-js/3.1.2/rollups/md5.js',
'https://cdn.bootcdn.net/ajax/libs/crypto-js/3.1.2/rollups/md5.js',
'https://unpkg.com/cryptojslib@3.1.2/rollups/md5.js',
]:
js_code = self._download_webpage(
url, video_id, note='Downloading signing dependency', fatal=False)
@ -35,7 +36,8 @@ def _download_cryptojs_md5(self, video_id):
raise ExtractorError('Unable to download JS dependency (crypto-js/md5)')
def _get_cryptojs_md5(self, video_id):
return self.cache.load('douyu', 'crypto-js-md5') or self._download_cryptojs_md5(video_id)
return self.cache.load(
'douyu', 'crypto-js-md5', min_ver='2024.07.04') or self._download_cryptojs_md5(video_id)
def _calc_sign(self, sign_func, video_id, a):
b = uuid.uuid4().hex

View File

@ -346,8 +346,16 @@ def _real_extract(self, url):
class DiscoveryPlusBaseIE(DPlayBaseIE):
"""Subclasses must set _PRODUCT, _DISCO_API_PARAMS"""
_DISCO_CLIENT_VER = '27.43.0'
def _update_disco_api_headers(self, headers, disco_base, display_id, realm):
headers['x-disco-client'] = f'WEB:UNKNOWN:{self._PRODUCT}:25.2.6'
headers.update({
'x-disco-params': f'realm={realm},siteLookupKey={self._PRODUCT}',
'x-disco-client': f'WEB:UNKNOWN:{self._PRODUCT}:{self._DISCO_CLIENT_VER}',
'Authorization': self._get_auth(disco_base, display_id, realm),
})
def _download_video_playback_info(self, disco_base, video_id, headers):
return self._download_json(
@ -368,6 +376,26 @@ def _real_extract(self, url):
class GoDiscoveryIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:go\.)?discovery\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://go.discovery.com/video/in-the-eye-of-the-storm-discovery-atve-us/trapped-in-a-twister',
'info_dict': {
'id': '5352642',
'display_id': 'in-the-eye-of-the-storm-discovery-atve-us/trapped-in-a-twister',
'ext': 'mp4',
'title': 'Trapped in a Twister',
'description': 'Twisters destroy Midwest towns, trapping spotters in the eye of the storm.',
'episode_number': 1,
'episode': 'Episode 1',
'season_number': 1,
'season': 'Season 1',
'series': 'In The Eye Of The Storm',
'duration': 2490.237,
'upload_date': '20240715',
'timestamp': 1721008800,
'tags': [],
'creators': ['Discovery'],
'thumbnail': 'https://us1-prod-images.disco-api.com/2024/07/10/5e39637d-cabf-3ab3-8e9a-f4e9d37bc036.jpeg',
},
}, {
'url': 'https://go.discovery.com/video/dirty-jobs-discovery-atve-us/rodbuster-galvanizer',
'info_dict': {
'id': '4164906',
@ -395,6 +423,26 @@ class GoDiscoveryIE(DiscoveryPlusBaseIE):
class TravelChannelIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:watch\.)?travelchannel\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://watch.travelchannel.com/video/the-dead-files-travel-channel/protect-the-children',
'info_dict': {
'id': '4710177',
'display_id': 'the-dead-files-travel-channel/protect-the-children',
'ext': 'mp4',
'title': 'Protect the Children',
'description': 'An evil presence threatens an Ohio woman\'s children and marriage.',
'season_number': 14,
'season': 'Season 14',
'episode_number': 10,
'episode': 'Episode 10',
'series': 'The Dead Files',
'duration': 2550.481,
'timestamp': 1664510400,
'upload_date': '20220930',
'tags': [],
'creators': ['Travel Channel'],
'thumbnail': 'https://us1-prod-images.disco-api.com/2022/03/17/5e45eace-de5d-343a-9293-f400a2aa77d5.jpeg',
},
}, {
'url': 'https://watch.travelchannel.com/video/ghost-adventures-travel-channel/ghost-train-of-ely',
'info_dict': {
'id': '2220256',
@ -422,6 +470,26 @@ class TravelChannelIE(DiscoveryPlusBaseIE):
class CookingChannelIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:watch\.)?cookingchanneltv\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://watch.cookingchanneltv.com/video/bobbys-triple-threat-food-network-atve-us/titans-vs-marcus-samuelsson',
'info_dict': {
'id': '5350005',
'ext': 'mp4',
'display_id': 'bobbys-triple-threat-food-network-atve-us/titans-vs-marcus-samuelsson',
'title': 'Titans vs Marcus Samuelsson',
'description': 'Marcus Samuelsson throws his legendary global tricks at the Titans.',
'episode_number': 1,
'episode': 'Episode 1',
'season_number': 3,
'season': 'Season 3',
'series': 'Bobby\'s Triple Threat',
'duration': 2520.851,
'upload_date': '20240710',
'timestamp': 1720573200,
'tags': [],
'creators': ['Food Network'],
'thumbnail': 'https://us1-prod-images.disco-api.com/2024/07/04/529cd095-27ec-35c5-84e9-90ebd3e5d2da.jpeg',
},
}, {
'url': 'https://watch.cookingchanneltv.com/video/carnival-eats-cooking-channel/the-postman-always-brings-rice-2348634',
'info_dict': {
'id': '2348634',
@ -449,6 +517,22 @@ class CookingChannelIE(DiscoveryPlusBaseIE):
class HGTVUsaIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:watch\.)?hgtv\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://watch.hgtv.com/video/flip-or-flop-the-final-flip-hgtv-atve-us/flip-or-flop-the-final-flip',
'info_dict': {
'id': '5025585',
'display_id': 'flip-or-flop-the-final-flip-hgtv-atve-us/flip-or-flop-the-final-flip',
'ext': 'mp4',
'title': 'Flip or Flop: The Final Flip',
'description': 'Tarek and Christina are going their separate ways after one last flip!',
'series': 'Flip or Flop: The Final Flip',
'duration': 2580.644,
'upload_date': '20231101',
'timestamp': 1698811200,
'tags': [],
'creators': ['HGTV'],
'thumbnail': 'https://us1-prod-images.disco-api.com/2022/11/27/455caa6c-1462-3f14-b63d-a026d7a5e6d3.jpeg',
},
}, {
'url': 'https://watch.hgtv.com/video/home-inspector-joe-hgtv-atve-us/this-mold-house',
'info_dict': {
'id': '4289736',
@ -476,6 +560,26 @@ class HGTVUsaIE(DiscoveryPlusBaseIE):
class FoodNetworkIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:watch\.)?foodnetwork\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://watch.foodnetwork.com/video/guys-grocery-games-food-network/wild-in-the-aisles',
'info_dict': {
'id': '2152549',
'display_id': 'guys-grocery-games-food-network/wild-in-the-aisles',
'ext': 'mp4',
'title': 'Wild in the Aisles',
'description': 'The chefs make spaghetti and meatballs with "Out of Stock" ingredients.',
'season_number': 1,
'season': 'Season 1',
'episode_number': 1,
'episode': 'Episode 1',
'series': 'Guy\'s Grocery Games',
'tags': [],
'creators': ['Food Network'],
'duration': 2520.651,
'upload_date': '20230623',
'timestamp': 1687492800,
'thumbnail': 'https://us1-prod-images.disco-api.com/2022/06/15/37fb5333-cad2-3dbb-af7c-c20ec77c89c6.jpeg',
},
}, {
'url': 'https://watch.foodnetwork.com/video/kids-baking-championship-food-network/float-like-a-butterfly',
'info_dict': {
'id': '4116449',
@ -503,6 +607,26 @@ class FoodNetworkIE(DiscoveryPlusBaseIE):
class DestinationAmericaIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?destinationamerica\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.destinationamerica.com/video/bbq-pit-wars-destination-america/smoke-on-the-water',
'info_dict': {
'id': '2218409',
'display_id': 'bbq-pit-wars-destination-america/smoke-on-the-water',
'ext': 'mp4',
'title': 'Smoke on the Water',
'description': 'The pitmasters head to Georgia for the Smoke on the Water BBQ Festival.',
'season_number': 2,
'season': 'Season 2',
'episode_number': 1,
'episode': 'Episode 1',
'series': 'BBQ Pit Wars',
'tags': [],
'creators': ['Destination America'],
'duration': 2614.878,
'upload_date': '20230623',
'timestamp': 1687492800,
'thumbnail': 'https://us1-prod-images.disco-api.com/2020/05/11/c0f8e85d-9a10-3e6f-8e43-f6faafa81ba2.jpeg',
},
}, {
'url': 'https://www.destinationamerica.com/video/alaska-monsters-destination-america-atve-us/central-alaskas-bigfoot',
'info_dict': {
'id': '4210904',
@ -530,6 +654,26 @@ class DestinationAmericaIE(DiscoveryPlusBaseIE):
class InvestigationDiscoveryIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?investigationdiscovery\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.investigationdiscovery.com/video/deadly-influence-the-social-media-murders-investigation-discovery-atve-us/rip-bianca',
'info_dict': {
'id': '5341132',
'display_id': 'deadly-influence-the-social-media-murders-investigation-discovery-atve-us/rip-bianca',
'ext': 'mp4',
'title': 'RIP Bianca',
'description': 'A teenage influencer discovers an online world of threat, harm and danger.',
'season_number': 1,
'season': 'Season 1',
'episode_number': 3,
'episode': 'Episode 3',
'series': 'Deadly Influence: The Social Media Murders',
'creators': ['Investigation Discovery'],
'tags': [],
'duration': 2490.888,
'upload_date': '20240618',
'timestamp': 1718672400,
'thumbnail': 'https://us1-prod-images.disco-api.com/2024/06/15/b567c774-9e44-3c6c-b0ba-db860a73e812.jpeg',
},
}, {
'url': 'https://www.investigationdiscovery.com/video/unmasked-investigation-discovery/the-killer-clown',
'info_dict': {
'id': '2139409',
@ -557,6 +701,26 @@ class InvestigationDiscoveryIE(DiscoveryPlusBaseIE):
class AmHistoryChannelIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?ahctv\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.ahctv.com/video/blood-and-fury-americas-civil-war-ahc/battle-of-bull-run',
'info_dict': {
'id': '2139199',
'display_id': 'blood-and-fury-americas-civil-war-ahc/battle-of-bull-run',
'ext': 'mp4',
'title': 'Battle of Bull Run',
'description': 'Two untested armies clash in the first real battle of the Civil War.',
'season_number': 1,
'season': 'Season 1',
'episode_number': 1,
'episode': 'Episode 1',
'series': 'Blood and Fury: America\'s Civil War',
'duration': 2612.509,
'upload_date': '20220923',
'timestamp': 1663905600,
'creators': ['AHC'],
'tags': [],
'thumbnail': 'https://us1-prod-images.disco-api.com/2020/05/11/4af61bd7-d705-3108-82c4-1a6e541e20fa.jpeg',
},
}, {
'url': 'https://www.ahctv.com/video/modern-sniper-ahc/army',
'info_dict': {
'id': '2309730',
@ -584,6 +748,26 @@ class AmHistoryChannelIE(DiscoveryPlusBaseIE):
class ScienceChannelIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?sciencechannel\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.sciencechannel.com/video/spaces-deepest-secrets-science-atve-us/mystery-of-the-dead-planets',
'info_dict': {
'id': '2347335',
'display_id': 'spaces-deepest-secrets-science-atve-us/mystery-of-the-dead-planets',
'ext': 'mp4',
'title': 'Mystery of the Dead Planets',
'description': 'Astronomers unmask the truly destructive nature of the cosmos.',
'season_number': 7,
'season': 'Season 7',
'episode_number': 1,
'episode': 'Episode 1',
'series': 'Space\'s Deepest Secrets',
'duration': 2524.989,
'upload_date': '20230128',
'timestamp': 1674882000,
'creators': ['Science'],
'tags': [],
'thumbnail': 'https://us1-prod-images.disco-api.com/2021/03/30/3796829d-aead-3f9a-bd8d-e49048b3cdca.jpeg',
},
}, {
'url': 'https://www.sciencechannel.com/video/strangest-things-science-atve-us/nazi-mystery-machine',
'info_dict': {
'id': '2842849',
@ -608,36 +792,29 @@ class ScienceChannelIE(DiscoveryPlusBaseIE):
}
class DIYNetworkIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:watch\.)?diynetwork\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://watch.diynetwork.com/video/pool-kings-diy-network/bringing-beach-life-to-texas',
'info_dict': {
'id': '2309730',
'display_id': 'pool-kings-diy-network/bringing-beach-life-to-texas',
'ext': 'mp4',
'title': 'Bringing Beach Life to Texas',
'description': 'The Pool Kings give a family a day at the beach in their own backyard.',
'season_number': 10,
'episode_number': 2,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://watch.diynetwork.com/video/pool-kings-diy-network/bringing-beach-life-to-texas',
'only_matching': True,
}]
_PRODUCT = 'diy'
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.watch.diynetwork.com',
'realm': 'go',
'country': 'us',
}
class DiscoveryLifeIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?discoverylife\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.discoverylife.com/video/er-files-discovery-life-atve-us/sweet-charity',
'info_dict': {
'id': '2347614',
'display_id': 'er-files-discovery-life-atve-us/sweet-charity',
'ext': 'mp4',
'title': 'Sweet Charity',
'description': 'The staff at Charity Hospital treat a serious foot infection.',
'season_number': 1,
'season': 'Season 1',
'episode_number': 1,
'episode': 'Episode 1',
'series': 'ER Files',
'duration': 2364.261,
'upload_date': '20230721',
'timestamp': 1689912000,
'creators': ['Discovery Life'],
'tags': [],
'thumbnail': 'https://us1-prod-images.disco-api.com/2021/03/16/4b6f0124-360b-3546-b6a4-5552db886b86.jpeg',
},
}, {
'url': 'https://www.discoverylife.com/video/surviving-death-discovery-life-atve-us/bodily-trauma',
'info_dict': {
'id': '2218238',
@ -665,6 +842,26 @@ class DiscoveryLifeIE(DiscoveryPlusBaseIE):
class AnimalPlanetIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?animalplanet\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.animalplanet.com/video/mysterious-creatures-with-forrest-galante-animal-planet-atve-us/the-demon-of-peru',
'info_dict': {
'id': '4650835',
'display_id': 'mysterious-creatures-with-forrest-galante-animal-planet-atve-us/the-demon-of-peru',
'ext': 'mp4',
'title': 'The Demon of Peru',
'description': 'In Peru, a farming village is being terrorized by a “man-like beast.”',
'season_number': 1,
'season': 'Season 1',
'episode_number': 4,
'episode': 'Episode 4',
'series': 'Mysterious Creatures with Forrest Galante',
'duration': 2490.488,
'upload_date': '20230111',
'timestamp': 1673413200,
'creators': ['Animal Planet'],
'tags': [],
'thumbnail': 'https://us1-prod-images.disco-api.com/2022/03/01/6dbaa833-9a2e-3fee-9381-c19eddf67c0c.jpeg',
},
}, {
'url': 'https://www.animalplanet.com/video/north-woods-law-animal-planet/squirrel-showdown',
'info_dict': {
'id': '3338923',
@ -692,6 +889,26 @@ class AnimalPlanetIE(DiscoveryPlusBaseIE):
class TLCIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:go\.)?tlc\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://go.tlc.com/video/90-day-the-last-resort-tlc-atve-us/the-last-chance',
'info_dict': {
'id': '5186422',
'display_id': '90-day-the-last-resort-tlc-atve-us/the-last-chance',
'ext': 'mp4',
'title': 'The Last Chance',
'description': 'Infidelity shakes Kalani and Asuelu\'s world, and Angela threatens divorce.',
'season_number': 1,
'season': 'Season 1',
'episode_number': 1,
'episode': 'Episode 1',
'series': '90 Day: The Last Resort',
'duration': 5123.91,
'upload_date': '20230815',
'timestamp': 1692061200,
'creators': ['TLC'],
'tags': [],
'thumbnail': 'https://us1-prod-images.disco-api.com/2023/08/08/0ee367e2-ac76-334d-bf23-dbf796696a24.jpeg',
},
}, {
'url': 'https://go.tlc.com/video/my-600-lb-life-tlc/melissas-story-part-1',
'info_dict': {
'id': '2206540',
@ -716,93 +933,8 @@ class TLCIE(DiscoveryPlusBaseIE):
}
class MotorTrendIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:watch\.)?motortrend\.com/video' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://watch.motortrend.com/video/car-issues-motortrend-atve-us/double-dakotas',
'info_dict': {
'id': '"4859182"',
'display_id': 'double-dakotas',
'ext': 'mp4',
'title': 'Double Dakotas',
'description': 'Tylers buy-one-get-one Dakota deal has the Wizard pulling double duty.',
'season_number': 2,
'episode_number': 3,
},
'skip': 'Available for Premium users',
}, {
'url': 'https://watch.motortrend.com/video/car-issues-motortrend-atve-us/double-dakotas',
'only_matching': True,
}]
_PRODUCT = 'vel'
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.watch.motortrend.com',
'realm': 'go',
'country': 'us',
}
class MotorTrendOnDemandIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?motortrend(?:ondemand\.com|\.com/plus)/detail' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.motortrendondemand.com/detail/wheelstanding-dump-truck-stubby-bobs-comeback/37699/784',
'info_dict': {
'id': '37699',
'display_id': 'wheelstanding-dump-truck-stubby-bobs-comeback/37699',
'ext': 'mp4',
'title': 'Wheelstanding Dump Truck! Stubby Bobs Comeback',
'description': 'md5:996915abe52a1c3dfc83aecea3cce8e7',
'season_number': 5,
'episode_number': 52,
'episode': 'Episode 52',
'season': 'Season 5',
'thumbnail': r're:^https?://.+\.jpe?g$',
'timestamp': 1388534401,
'duration': 1887.345,
'creator': 'Originals',
'series': 'Roadkill',
'upload_date': '20140101',
'tags': [],
},
}, {
'url': 'https://www.motortrend.com/plus/detail/roadworthy-rescues-teaser-trailer/4922860/',
'info_dict': {
'id': '4922860',
'ext': 'mp4',
'title': 'Roadworthy Rescues | Teaser Trailer',
'description': 'Derek Bieri helps Freiburger and Finnegan with their \'68 big-block Dart.',
'display_id': 'roadworthy-rescues-teaser-trailer/4922860',
'creator': 'Originals',
'series': 'Roadworthy Rescues',
'thumbnail': r're:^https?://.+\.jpe?g$',
'upload_date': '20220907',
'timestamp': 1662523200,
'duration': 1066.356,
'tags': [],
},
}, {
'url': 'https://www.motortrend.com/plus/detail/ugly-duckling/2450033/12439',
'only_matching': True,
}]
_PRODUCT = 'MTOD'
_DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.motortrendondemand.com',
'realm': 'motortrend',
'country': 'us',
}
def _update_disco_api_headers(self, headers, disco_base, display_id, realm):
headers.update({
'x-disco-params': f'realm={realm}',
'x-disco-client': f'WEB:UNKNOWN:{self._PRODUCT}:4.39.1-gi1',
'Authorization': self._get_auth(disco_base, display_id, realm),
})
class DiscoveryPlusIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/(?!it/)(?:\w{2}/)?video' + DPlayBaseIE._PATH_REGEX
_VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/(?!it/)(?:(?P<country>[a-z]{2})/)?video(?:/sport|/olympics)?' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.discoveryplus.com/video/property-brothers-forever-home/food-and-family',
'info_dict': {
@ -823,14 +955,45 @@ class DiscoveryPlusIE(DiscoveryPlusBaseIE):
}, {
'url': 'https://discoveryplus.com/ca/video/bering-sea-gold-discovery-ca/goldslingers',
'only_matching': True,
}, {
'url': 'https://www.discoveryplus.com/gb/video/sport/eurosport-1-british-eurosport-1-british-sport/6-hours-of-spa-review',
'only_matching': True,
}, {
'url': 'https://www.discoveryplus.com/gb/video/olympics/dplus-sport-dplus-sport-sport/rugby-sevens-australia-samoa',
'only_matching': True,
}]
_PRODUCT = 'dplus_us'
_DISCO_API_PARAMS = {
_PRODUCT = None
_DISCO_API_PARAMS = None
def _update_disco_api_headers(self, headers, disco_base, display_id, realm):
headers.update({
'x-disco-params': f'realm={realm},siteLookupKey={self._PRODUCT}',
'x-disco-client': f'WEB:UNKNOWN:dplus_us:{self._DISCO_CLIENT_VER}',
'Authorization': self._get_auth(disco_base, display_id, realm),
})
def _real_extract(self, url):
video_id, country = self._match_valid_url(url).group('id', 'country')
if not country:
country = 'us'
self._PRODUCT = f'dplus_{country}'
if country in ('br', 'ca', 'us'):
self._DISCO_API_PARAMS = {
'disco_host': 'us1-prod-direct.discoveryplus.com',
'realm': 'go',
'country': 'us',
'country': country,
}
else:
self._DISCO_API_PARAMS = {
'disco_host': 'eu1-prod-direct.discoveryplus.com',
'realm': 'dplay',
'country': country,
}
return self._get_disco_api_info(url, video_id, **self._DISCO_API_PARAMS)
class DiscoveryPlusIndiaIE(DiscoveryPlusBaseIE):
@ -984,16 +1147,22 @@ def _real_extract(self, url):
class DiscoveryPlusItalyIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/it/video' + DPlayBaseIE._PATH_REGEX
_VALID_URL = r'https?://(?:www\.)?discoveryplus\.com/it/video(?:/sport|/olympics)?' + DPlayBaseIE._PATH_REGEX
_TESTS = [{
'url': 'https://www.discoveryplus.com/it/video/i-signori-della-neve/stagione-2-episodio-1-i-preparativi',
'only_matching': True,
}, {
'url': 'https://www.discoveryplus.com/it/video/super-benny/trailer',
'only_matching': True,
}, {
'url': 'https://www.discoveryplus.com/it/video/olympics/dplus-sport-dplus-sport-sport/water-polo-greece-italy',
'only_matching': True,
}, {
'url': 'https://www.discoveryplus.com/it/video/sport/dplus-sport-dplus-sport-sport/lisa-vittozzi-allinferno-e-ritorno',
'only_matching': True,
}]
_PRODUCT = 'dplus_us'
_PRODUCT = 'dplus_it'
_DISCO_API_PARAMS = {
'disco_host': 'eu1-prod-direct.discoveryplus.com',
'realm': 'dplay',
@ -1002,8 +1171,8 @@ class DiscoveryPlusItalyIE(DiscoveryPlusBaseIE):
def _update_disco_api_headers(self, headers, disco_base, display_id, realm):
headers.update({
'x-disco-params': f'realm={realm}',
'x-disco-client': f'WEB:UNKNOWN:{self._PRODUCT}:25.2.6',
'x-disco-params': f'realm={realm},siteLookupKey={self._PRODUCT}',
'x-disco-client': f'WEB:UNKNOWN:dplus_us:{self._DISCO_CLIENT_VER}',
'Authorization': self._get_auth(disco_base, display_id, realm),
})
@ -1044,39 +1213,3 @@ class DiscoveryPlusIndiaShowIE(DiscoveryPlusShowBaseIE):
_SHOW_STR = 'show'
_INDEX = 4
_VIDEO_IE = DiscoveryPlusIndiaIE
class GlobalCyclingNetworkPlusIE(DiscoveryPlusBaseIE):
_VALID_URL = r'https?://plus\.globalcyclingnetwork\.com/watch/(?P<id>\d+)'
_TESTS = [{
'url': 'https://plus.globalcyclingnetwork.com/watch/1397691',
'info_dict': {
'id': '1397691',
'ext': 'mp4',
'title': 'The Athertons: Mountain Biking\'s Fastest Family',
'description': 'md5:75a81937fcd8b989eec6083a709cd837',
'thumbnail': 'https://us1-prod-images.disco-api.com/2021/03/04/eb9e3026-4849-3001-8281-9356466f0557.png',
'series': 'gcn',
'creator': 'Gcn',
'upload_date': '20210309',
'timestamp': 1615248000,
'duration': 2531.0,
'tags': [],
},
'skip': 'Subscription required',
'params': {'skip_download': 'm3u8'},
}]
_PRODUCT = 'web'
_DISCO_API_PARAMS = {
'disco_host': 'disco-api-prod.globalcyclingnetwork.com',
'realm': 'gcn',
'country': 'us',
}
def _update_disco_api_headers(self, headers, disco_base, display_id, realm):
headers.update({
'x-disco-params': f'realm={realm}',
'x-disco-client': f'WEB:UNKNOWN:{self._PRODUCT}:27.3.2',
'Authorization': self._get_auth(disco_base, display_id, realm),
})

View File

@ -2,6 +2,7 @@
from ..utils import (
float_or_none,
int_or_none,
join_nonempty,
orderedSet,
parse_iso8601,
parse_qs,
@ -13,7 +14,7 @@
class EpidemicSoundIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?epidemicsound\.com/track/(?P<id>[0-9a-zA-Z]+)'
_VALID_URL = r'https?://(?:www\.)?epidemicsound\.com/(?:(?P<sfx>sound-effects/tracks)|track)/(?P<id>[0-9a-zA-Z-]+)'
_TESTS = [{
'url': 'https://www.epidemicsound.com/track/yFfQVRpSPz/',
'md5': 'd98ff2ddb49e8acab9716541cbc9dfac',
@ -47,6 +48,20 @@ class EpidemicSoundIE(InfoExtractor):
'release_timestamp': 1700535606,
'release_date': '20231121',
},
}, {
'url': 'https://www.epidemicsound.com/sound-effects/tracks/2f02f54b-9faa-4daf-abac-1cfe9e9cef69/',
'md5': '35d7cf05bd8b614a84f0495a05de9388',
'info_dict': {
'id': '208931',
'ext': 'mp3',
'upload_date': '20240603',
'timestamp': 1717436529,
'categories': ['appliance'],
'display_id': '6b2NXLURPr',
'duration': 1.0,
'title': 'Oven, Grill, Door Open 01',
'thumbnail': 'https://cdn.epidemicsound.com/curation-assets/commercial-release-cover-images/default-sfx/3000x3000.jpg',
},
}]
@staticmethod
@ -77,8 +92,10 @@ def _epidemic_fmt_or_none(f):
return f
def _real_extract(self, url):
video_id = self._match_id(url)
json_data = self._download_json(f'https://www.epidemicsound.com/json/track/{video_id}', video_id)
video_id, is_sfx = self._match_valid_url(url).group('id', 'sfx')
json_data = self._download_json(join_nonempty(
'https://www.epidemicsound.com/json/track',
is_sfx and 'kosmos-id', video_id, delim='/'), video_id)
thumbnails = traverse_obj(json_data, [('imageUrl', 'cover')])
thumb_base_url = traverse_obj(json_data, ('coverArt', 'baseUrl', {url_or_none}))

View File

@ -571,16 +571,21 @@ def process_formats(info):
# Formats larger than ~500MB will return error 403 unless chunk size is regulated
f.setdefault('downloader_options', {})['http_chunk_size'] = 250 << 20
def extract_relay_data(_filter):
return self._parse_json(self._search_regex(
rf'data-sjs>({{.*?{_filter}.*?}})</script>',
webpage, 'replay data', default='{}'), video_id, fatal=False) or {}
def yield_all_relay_data(_filter):
for relay_data in re.findall(rf'data-sjs>({{.*?{_filter}.*?}})</script>', webpage):
yield self._parse_json(relay_data, video_id, fatal=False) or {}
def extract_relay_prefetched_data(_filter):
return traverse_obj(extract_relay_data(_filter), (
'require', (None, (..., ..., ..., '__bbox', 'require')),
def extract_relay_data(_filter):
return next(filter(None, yield_all_relay_data(_filter)), {})
def extract_relay_prefetched_data(_filter, target_keys=None):
path = 'data'
if target_keys is not None:
path = lambda k, v: k == 'data' and any(target in v for target in variadic(target_keys))
return traverse_obj(yield_all_relay_data(_filter), (
..., 'require', (None, (..., ..., ..., '__bbox', 'require')),
lambda _, v: any(key.startswith('RelayPrefetchedStreamCache') for key in v),
..., ..., '__bbox', 'result', 'data', {dict}), get_all=False) or {}
..., ..., '__bbox', 'result', path, {dict}), get_all=False) or {}
if not video_data:
server_js_data = self._parse_json(self._search_regex([
@ -591,7 +596,8 @@ def extract_relay_prefetched_data(_filter):
if not video_data:
data = extract_relay_prefetched_data(
r'"(?:dash_manifest|playable_url(?:_quality_hd)?)')
r'"(?:dash_manifest|playable_url(?:_quality_hd)?)',
target_keys=('video', 'event', 'nodes', 'node', 'mediaset'))
if data:
entries = []
@ -957,6 +963,7 @@ class FacebookAdsIE(InfoExtractor):
'id': '899206155126718',
'ext': 'mp4',
'title': 'video by Kandao',
'description': 'md5:0822724069e3aca97cbed5dabbab282e',
'uploader': 'Kandao',
'uploader_id': '774114102743284',
'uploader_url': r're:^https?://.*',
@ -965,6 +972,22 @@ class FacebookAdsIE(InfoExtractor):
'upload_date': '20231214',
'like_count': int,
},
}, {
# key 'watermarked_video_sd_url' missing
'url': 'https://www.facebook.com/ads/library/?id=501152689226254',
'info_dict': {
'id': '501152689226254',
'ext': 'mp4',
'title': 'video by mat.nawrocki',
'description': 'md5:02a446ace7ff8c3c37a2892922492490',
'uploader': 'mat.nawrocki',
'uploader_id': '148586968341456',
'uploader_url': r're:^https?://.*',
'timestamp': 1723452305,
'thumbnail': r're:^https?://.*',
'upload_date': '20240812',
'like_count': int,
},
}, {
'url': 'https://www.facebook.com/ads/library/?id=893637265423481',
'info_dict': {
@ -1011,34 +1034,42 @@ def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
post_data = [self._parse_json(j, video_id, fatal=False)
for j in re.findall(r's\.handle\(({.*})\);requireLazy\(', webpage)]
data = traverse_obj(post_data, (
..., 'require', ..., ..., ..., 'props', 'deeplinkAdCard', 'snapshot', {dict}), get_all=False)
post_data = traverse_obj(
re.findall(r'data-sjs>({.*?ScheduledServerJS.*?})</script>', webpage), (..., {json.loads}))
data = get_first(post_data, (
'require', ..., ..., ..., '__bbox', 'require', ..., ..., ...,
'entryPointRoot', 'otherProps', 'deeplinkAdCard', 'snapshot', {dict}))
if not data:
raise ExtractorError('Unable to extract ad data')
title = data.get('title')
if not title or title == '{{product.name}}':
title = join_nonempty('display_format', 'page_name', delim=' by ', from_dict=data)
markup_id = traverse_obj(data, ('body', '__m', {str}))
markup = traverse_obj(post_data, (
..., 'require', ..., ..., ..., '__bbox', 'markup', lambda _, v: v[0].startswith(markup_id),
..., '__html', {clean_html}, {lambda x: not x.startswith('{{product.') and x}, any))
info_dict = traverse_obj(data, {
'description': ('link_description', {str}, {lambda x: x if x != '{{product.description}}' else None}),
info_dict = merge_dicts({
'title': title,
'description': markup or None,
}, traverse_obj(data, {
'description': ('link_description', {lambda x: x if not x.startswith('{{product.') else None}),
'uploader': ('page_name', {str}),
'uploader_id': ('page_id', {str_or_none}),
'uploader_url': ('page_profile_uri', {url_or_none}),
'timestamp': ('creation_time', {int_or_none}),
'like_count': ('page_like_count', {int_or_none}),
})
}))
entries = []
for idx, entry in enumerate(traverse_obj(
data, (('videos', 'cards'), lambda _, v: any(url_or_none(v[f]) for f in self._FORMATS_MAP))), 1,
data, (('videos', 'cards'), lambda _, v: any(url_or_none(v.get(f)) for f in self._FORMATS_MAP))), 1,
):
entries.append({
'id': f'{video_id}_{idx}',
'title': entry.get('title') or title,
'description': entry.get('link_description') or info_dict.get('description'),
'description': traverse_obj(entry, 'body', 'link_description') or info_dict.get('description'),
'thumbnail': url_or_none(entry.get('video_preview_image_url')),
'formats': self._extract_formats(entry),
})

View File

@ -43,6 +43,7 @@
xpath_text,
xpath_with_ns,
)
from ..utils._utils import _UnsafeExtensionError
class GenericIE(InfoExtractor):
@ -2446,9 +2447,13 @@ def _real_extract(self, url):
if not is_html(first_bytes):
self.report_warning(
'URL could be a direct video link, returning it as such.')
ext = determine_ext(url)
if ext not in _UnsafeExtensionError.ALLOWED_EXTENSIONS:
ext = 'unknown_video'
info_dict.update({
'direct': True,
'url': url,
'ext': ext,
})
return info_dict

View File

@ -158,7 +158,7 @@ def _real_extract(self, url):
class JioSaavnPlaylistIE(JioSaavnBaseIE):
IE_NAME = 'jiosaavn:playlist'
_VALID_URL = r'https?://(?:www\.)?(?:jio)?saavn\.com/s/playlist/(?:[^/?#]+/){2}(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www\.)?(?:jio)?saavn\.com/(?:s/playlist/(?:[^/?#]+/){2}|featured/[^/?#]+/)(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.jiosaavn.com/s/playlist/2279fbe391defa793ad7076929a2f5c9/mood-english/LlJ8ZWT1ibN5084vKHRj2Q__',
'info_dict': {
@ -173,6 +173,13 @@ class JioSaavnPlaylistIE(JioSaavnBaseIE):
'title': 'Mood Hindi',
},
'playlist_mincount': 801,
}, {
'url': 'https://www.jiosaavn.com/featured/taaza-tunes/Me5RridRfDk_',
'info_dict': {
'id': 'Me5RridRfDk_',
'title': 'Taaza Tunes',
},
'playlist_mincount': 301,
}]
_PAGE_SIZE = 50

View File

@ -1,9 +1,14 @@
import functools
from .common import InfoExtractor
from ..networking import HEADRequest
from ..utils import (
UserNotLive,
determine_ext,
float_or_none,
int_or_none,
merge_dicts,
parse_iso8601,
str_or_none,
traverse_obj,
unified_timestamp,
@ -25,104 +30,192 @@ def _real_initialize(self):
def _call_api(self, path, display_id, note='Downloading API JSON', headers={}, **kwargs):
return self._download_json(
f'https://kick.com/api/v1/{path}', display_id, note=note,
f'https://kick.com/api/{path}', display_id, note=note,
headers=merge_dicts(headers, self._API_HEADERS), impersonate=True, **kwargs)
class KickIE(KickBaseIE):
IE_NAME = 'kick:live'
_VALID_URL = r'https?://(?:www\.)?kick\.com/(?!(?:video|categories|search|auth)(?:[/?#]|$))(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://kick.com/yuppy',
'url': 'https://kick.com/buddha',
'info_dict': {
'id': '6cde1-kickrp-joe-flemmingskick-info-heremust-knowmust-see21',
'id': '92722911-nopixel-40',
'ext': 'mp4',
'title': str,
'description': str,
'channel': 'yuppy',
'channel_id': '33538',
'uploader': 'Yuppy',
'uploader_id': '33793',
'upload_date': str,
'live_status': 'is_live',
'timestamp': int,
'thumbnail': r're:^https?://.*\.jpg',
'thumbnail': r're:https?://.+\.jpg',
'categories': list,
'upload_date': str,
'channel': 'buddha',
'channel_id': '32807',
'uploader': 'Buddha',
'uploader_id': '33057',
'live_status': 'is_live',
'concurrent_view_count': int,
'release_timestamp': int,
'age_limit': 18,
'release_date': str,
},
'skip': 'livestream',
'params': {'skip_download': 'livestream'},
# 'skip': 'livestream',
}, {
'url': 'https://kick.com/kmack710',
'url': 'https://kick.com/xqc',
'only_matching': True,
}]
@classmethod
def suitable(cls, url):
return False if KickClipIE.suitable(url) else super().suitable(url)
def _real_extract(self, url):
channel = self._match_id(url)
response = self._call_api(f'channels/{channel}', channel)
response = self._call_api(f'v2/channels/{channel}', channel)
if not traverse_obj(response, 'livestream', expected_type=dict):
raise UserNotLive(video_id=channel)
return {
'id': str(traverse_obj(
response, ('livestream', ('slug', 'id')), get_all=False, default=channel)),
'formats': self._extract_m3u8_formats(
response['playback_url'], channel, 'mp4', live=True),
'title': traverse_obj(
response, ('livestream', ('session_title', 'slug')), get_all=False, default=''),
'description': traverse_obj(response, ('user', 'bio')),
'channel': channel,
'channel_id': str_or_none(traverse_obj(response, 'id', ('livestream', 'channel_id'))),
'uploader': traverse_obj(response, 'name', ('user', 'username')),
'uploader_id': str_or_none(traverse_obj(response, 'user_id', ('user', 'id'))),
'is_live': True,
'timestamp': unified_timestamp(traverse_obj(response, ('livestream', 'created_at'))),
'thumbnail': traverse_obj(
response, ('livestream', 'thumbnail', 'url'), expected_type=url_or_none),
'categories': traverse_obj(response, ('recent_categories', ..., 'name')),
'formats': self._extract_m3u8_formats(response['playback_url'], channel, 'mp4', live=True),
**traverse_obj(response, {
'id': ('livestream', 'slug', {str}),
'title': ('livestream', 'session_title', {str}),
'description': ('user', 'bio', {str}),
'channel_id': (('id', ('livestream', 'channel_id')), {int}, {str_or_none}, any),
'uploader': (('name', ('user', 'username')), {str}, any),
'uploader_id': (('user_id', ('user', 'id')), {int}, {str_or_none}, any),
'timestamp': ('livestream', 'created_at', {unified_timestamp}),
'release_timestamp': ('livestream', 'start_time', {unified_timestamp}),
'thumbnail': ('livestream', 'thumbnail', 'url', {url_or_none}),
'categories': ('recent_categories', ..., 'name', {str}),
'concurrent_view_count': ('livestream', 'viewer_count', {int_or_none}),
'age_limit': ('livestream', 'is_mature', {bool}, {lambda x: 18 if x else 0}),
}),
}
class KickVODIE(KickBaseIE):
IE_NAME = 'kick:vod'
_VALID_URL = r'https?://(?:www\.)?kick\.com/video/(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})'
_TESTS = [{
'url': 'https://kick.com/video/58bac65b-e641-4476-a7ba-3707a35e60e3',
'url': 'https://kick.com/video/e74614f4-5270-4319-90ad-32179f19a45c',
'md5': '3870f94153e40e7121a6e46c068b70cb',
'info_dict': {
'id': '58bac65b-e641-4476-a7ba-3707a35e60e3',
'id': 'e74614f4-5270-4319-90ad-32179f19a45c',
'ext': 'mp4',
'title': '🤠REBIRTH IS BACK!!!!🤠!stake CODE JAREDFPS 🤠',
'description': 'md5:02b0c46f9b4197fb545ab09dddb85b1d',
'channel': 'jaredfps',
'channel_id': '26608',
'uploader': 'JaredFPS',
'uploader_id': '26799',
'upload_date': '20240402',
'timestamp': 1712097108,
'duration': 33859.0,
'title': r're:❎ MEGA DRAMA ❎ LIVE ❎ CLICK ❎ ULTIMATE SKILLS .+',
'description': 'THE BEST AT ABSOLUTELY EVERYTHING. THE JUICER. LEADER OF THE JUICERS.',
'channel': 'xqc',
'channel_id': '668',
'uploader': 'xQc',
'uploader_id': '676',
'upload_date': '20240724',
'timestamp': 1721796562,
'duration': 18566.0,
'thumbnail': r're:^https?://.*\.jpg',
'categories': ['Call of Duty: Warzone'],
'view_count': int,
'categories': ['VALORANT'],
'age_limit': 0,
},
'params': {
'skip_download': 'm3u8',
},
'expected_warnings': [r'impersonation'],
'params': {'skip_download': 'm3u8'},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
response = self._call_api(f'video/{video_id}', video_id)
response = self._call_api(f'v1/video/{video_id}', video_id)
return {
'id': video_id,
'formats': self._extract_m3u8_formats(response['source'], video_id, 'mp4'),
'title': traverse_obj(
response, ('livestream', ('session_title', 'slug')), get_all=False, default=''),
'description': traverse_obj(response, ('livestream', 'channel', 'user', 'bio')),
'channel': traverse_obj(response, ('livestream', 'channel', 'slug')),
'channel_id': str_or_none(traverse_obj(response, ('livestream', 'channel', 'id'))),
'uploader': traverse_obj(response, ('livestream', 'channel', 'user', 'username')),
'uploader_id': str_or_none(traverse_obj(response, ('livestream', 'channel', 'user_id'))),
'timestamp': unified_timestamp(response.get('created_at')),
'duration': float_or_none(traverse_obj(response, ('livestream', 'duration')), scale=1000),
'thumbnail': traverse_obj(
response, ('livestream', 'thumbnail'), expected_type=url_or_none),
'categories': traverse_obj(response, ('livestream', 'categories', ..., 'name')),
**traverse_obj(response, {
'title': ('livestream', ('session_title', 'slug'), {str}, any),
'description': ('livestream', 'channel', 'user', 'bio', {str}),
'channel': ('livestream', 'channel', 'slug', {str}),
'channel_id': ('livestream', 'channel', 'id', {int}, {str_or_none}),
'uploader': ('livestream', 'channel', 'user', 'username', {str}),
'uploader_id': ('livestream', 'channel', 'user_id', {int}, {str_or_none}),
'timestamp': ('created_at', {parse_iso8601}),
'duration': ('livestream', 'duration', {functools.partial(float_or_none, scale=1000)}),
'thumbnail': ('livestream', 'thumbnail', {url_or_none}),
'categories': ('livestream', 'categories', ..., 'name', {str}),
'view_count': ('views', {int_or_none}),
'age_limit': ('livestream', 'is_mature', {bool}, {lambda x: 18 if x else 0}),
}),
}
class KickClipIE(KickBaseIE):
IE_NAME = 'kick:clips'
_VALID_URL = r'https?://(?:www\.)?kick\.com/[\w-]+/?\?(?:[^#]+&)?clip=(?P<id>clip_[\w-]+)'
_TESTS = [{
'url': 'https://kick.com/mxddy?clip=clip_01GYXVB5Y8PWAPWCWMSBCFB05X',
'info_dict': {
'id': 'clip_01GYXVB5Y8PWAPWCWMSBCFB05X',
'ext': 'mp4',
'title': 'Maddy detains Abd D:',
'channel': 'mxddy',
'channel_id': '133789',
'uploader': 'AbdCreates',
'uploader_id': '3309077',
'thumbnail': r're:^https?://.*\.jpeg',
'duration': 35,
'timestamp': 1682481453,
'upload_date': '20230426',
'view_count': int,
'like_count': int,
'categories': ['VALORANT'],
'age_limit': 18,
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://kick.com/destiny?clip=clip_01H9SKET879NE7N9RJRRDS98J3',
'info_dict': {
'id': 'clip_01H9SKET879NE7N9RJRRDS98J3',
'title': 'W jews',
'ext': 'mp4',
'channel': 'destiny',
'channel_id': '1772249',
'uploader': 'punished_furry',
'uploader_id': '2027722',
'duration': 49.0,
'upload_date': '20230908',
'timestamp': 1694150180,
'thumbnail': 'https://clips.kick.com/clips/j3/clip_01H9SKET879NE7N9RJRRDS98J3/thumbnail.png',
'view_count': int,
'like_count': int,
'categories': ['Just Chatting'],
'age_limit': 0,
},
'params': {'skip_download': 'm3u8'},
}]
def _real_extract(self, url):
clip_id = self._match_id(url)
clip = self._call_api(f'v2/clips/{clip_id}/play', clip_id)['clip']
clip_url = clip['clip_url']
if determine_ext(clip_url) == 'm3u8':
formats = self._extract_m3u8_formats(clip_url, clip_id, 'mp4')
else:
formats = [{'url': clip_url}]
return {
'id': clip_id,
'formats': formats,
**traverse_obj(clip, {
'title': ('title', {str}),
'channel': ('channel', 'slug', {str}),
'channel_id': ('channel', 'id', {int}, {str_or_none}),
'uploader': ('creator', 'username', {str}),
'uploader_id': ('creator', 'id', {int}, {str_or_none}),
'thumbnail': ('thumbnail_url', {url_or_none}),
'duration': ('duration', {float_or_none}),
'categories': ('category', 'name', {str}, all),
'timestamp': ('created_at', {parse_iso8601}),
'view_count': ('views', {int_or_none}),
'like_count': ('likes', {int_or_none}),
'age_limit': ('is_mature', {bool}, {lambda x: 18 if x else 0}),
}),
}

View File

@ -0,0 +1,78 @@
import functools
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
clean_html,
extract_attributes,
get_element_by_class,
get_element_html_by_id,
join_nonempty,
parse_duration,
unified_timestamp,
)
from ..utils.traversal import traverse_obj
class LearningOnScreenIE(InfoExtractor):
_VALID_URL = r'https?://learningonscreen\.ac\.uk/ondemand/index\.php/prog/(?P<id>\w+)'
_TESTS = [{
'url': 'https://learningonscreen.ac.uk/ondemand/index.php/prog/005D81B2?bcast=22757013',
'info_dict': {
'id': '005D81B2',
'ext': 'mp4',
'title': 'Planet Earth',
'duration': 3600.0,
'timestamp': 1164567600.0,
'upload_date': '20061126',
'thumbnail': 'https://stream.learningonscreen.ac.uk/trilt-cover-images/005D81B2-Planet-Earth-2006-11-26T190000Z-BBC4.jpg',
},
}]
def _real_initialize(self):
if not self._get_cookies('https://learningonscreen.ac.uk/').get('PHPSESSID-BOB-LIVE'):
self.raise_login_required(
'Use --cookies for authentication. See '
' https://github.com/yt-dlp/yt-dlp/wiki/FAQ#how-do-i-pass-cookies-to-yt-dlp '
'for how to manually pass cookies', method=None)
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
details = traverse_obj(webpage, (
{functools.partial(get_element_html_by_id, 'programme-details')}, {
'title': ({functools.partial(re.search, r'<h2>([^<]+)</h2>')}, 1, {clean_html}),
'timestamp': (
{functools.partial(get_element_by_class, 'broadcast-date')},
{functools.partial(re.match, r'([^<]+)')}, 1, {unified_timestamp}),
'duration': (
{functools.partial(get_element_by_class, 'prog-running-time')},
{clean_html}, {parse_duration}),
}))
title = details.pop('title', None) or traverse_obj(webpage, (
{functools.partial(get_element_html_by_id, 'add-to-existing-playlist')},
{extract_attributes}, 'data-record-title', {clean_html}))
entries = self._parse_html5_media_entries(
'https://stream.learningonscreen.ac.uk', webpage, video_id, m3u8_id='hls', mpd_id='dash',
_headers={'Origin': 'https://learningonscreen.ac.uk', 'Referer': 'https://learningonscreen.ac.uk/'})
if not entries:
raise ExtractorError('No video found')
if len(entries) > 1:
duration = details.pop('duration', None)
for idx, entry in enumerate(entries, start=1):
entry.update(details)
entry['id'] = join_nonempty(video_id, idx)
entry['title'] = join_nonempty(title, idx)
return self.playlist_result(entries, video_id, title, duration=duration)
return {
**entries[0],
**details,
'id': video_id,
'title': title,
}

View File

@ -133,7 +133,9 @@ def _real_extract(self, url):
r'<p+\b[^>]+\bclass="article_date">([^<]+)<', webpage, 'upload date', default=None))
player_data['video'] = player_data.pop('token')
player_page = self._download_webpage('https://player.mediaklikk.hu/playernew/player.php', video_id, query=player_data)
player_page = self._download_webpage(
'https://player.mediaklikk.hu/playernew/player.php', video_id,
query=player_data, headers={'Referer': url})
player_json = self._search_json(
r'\bpl\.setup\s*\(', player_page, 'player json', video_id, end_pattern=r'\);')
playlist_url = traverse_obj(

View File

@ -1,16 +1,21 @@
import json
import re
import urllib.parse
import time
import uuid
from .common import InfoExtractor
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
determine_ext,
int_or_none,
join_nonempty,
jwt_decode_hs256,
parse_duration,
parse_iso8601,
try_get,
url_or_none,
urlencode_postdata,
)
from ..utils.traversal import traverse_obj
@ -276,81 +281,225 @@ def _download_video_data(self, display_id):
class MLBTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?mlb\.com/tv/g(?P<id>\d{6})'
_NETRC_MACHINE = 'mlb'
_TESTS = [{
'url': 'https://www.mlb.com/tv/g661581/vee2eff5f-a7df-4c20-bdb4-7b926fa12638',
'info_dict': {
'id': '661581',
'ext': 'mp4',
'title': '2022-07-02 - St. Louis Cardinals @ Philadelphia Phillies',
'release_date': '20220702',
'release_timestamp': 1656792300,
},
'params': {
'skip_download': True,
'params': {'skip_download': 'm3u8'},
}, {
# makeup game: has multiple dates, need to avoid games with 'rescheduleDate'
'url': 'https://www.mlb.com/tv/g747039/vd22541c4-5a29-45f7-822b-635ec041cf5e',
'info_dict': {
'id': '747039',
'ext': 'mp4',
'title': '2024-07-29 - Toronto Blue Jays @ Baltimore Orioles',
'release_date': '20240729',
'release_timestamp': 1722280200,
},
'params': {'skip_download': 'm3u8'},
}]
_GRAPHQL_INIT_QUERY = '''\
mutation initSession($device: InitSessionInput!, $clientType: ClientType!, $experience: ExperienceTypeInput) {
initSession(device: $device, clientType: $clientType, experience: $experience) {
deviceId
sessionId
entitlements {
code
}
location {
countryCode
regionName
zipCode
latitude
longitude
}
clientExperience
features
}
}'''
_GRAPHQL_PLAYBACK_QUERY = '''\
mutation initPlaybackSession(
$adCapabilities: [AdExperienceType]
$mediaId: String!
$deviceId: String!
$sessionId: String!
$quality: PlaybackQuality
) {
initPlaybackSession(
adCapabilities: $adCapabilities
mediaId: $mediaId
deviceId: $deviceId
sessionId: $sessionId
quality: $quality
) {
playbackSessionId
playback {
url
token
expiration
cdn
}
}
}'''
_APP_VERSION = '7.8.2'
_device_id = None
_session_id = None
_access_token = None
_token_expiry = 0
@property
def _api_headers(self):
if (self._token_expiry - 120) <= time.time():
self.write_debug('Access token has expired; re-logging in')
self._perform_login(*self._get_login_info())
return {'Authorization': f'Bearer {self._access_token}'}
def _real_initialize(self):
if not self._access_token:
self.raise_login_required(
'All videos are only available to registered users', method='password')
def _set_device_id(self, username):
if not self._device_id:
self._device_id = self.cache.load(
self._NETRC_MACHINE, 'device_ids', default={}).get(username)
if self._device_id:
return
self._device_id = str(uuid.uuid4())
self.cache.store(self._NETRC_MACHINE, 'device_ids', {username: self._device_id})
def _perform_login(self, username, password):
data = f'grant_type=password&username={urllib.parse.quote(username)}&password={urllib.parse.quote(password)}&scope=openid offline_access&client_id=0oa3e1nutA1HLzAKG356'
access_token = self._download_json(
'https://ids.mlb.com/oauth2/aus1m088yK07noBfh356/v1/token', None,
headers={
'User-Agent': 'okhttp/3.12.1',
'Content-Type': 'application/x-www-form-urlencoded',
}, data=data.encode())['access_token']
entitlement = self._download_webpage(
f'https://media-entitlement.mlb.com/api/v3/jwt?os=Android&appname=AtBat&did={uuid.uuid4()}', None,
headers={
'User-Agent': 'okhttp/3.12.1',
'Authorization': f'Bearer {access_token}',
})
data = f'grant_type=urn:ietf:params:oauth:grant-type:token-exchange&subject_token={entitlement}&subject_token_type=urn:ietf:params:oauth:token-type:jwt&platform=android-tv'
try:
self._access_token = self._download_json(
'https://us.edge.bamgrid.com/token', None,
headers={
'Accept': 'application/json',
'Authorization': 'Bearer bWxidHYmYW5kcm9pZCYxLjAuMA.6LZMbH2r--rbXcgEabaDdIslpo4RyZrlVfWZhsAgXIk',
'https://ids.mlb.com/oauth2/aus1m088yK07noBfh356/v1/token', None,
'Logging in', 'Unable to log in', headers={
'User-Agent': 'okhttp/3.12.1',
'Content-Type': 'application/x-www-form-urlencoded',
}, data=data.encode())['access_token']
}, data=urlencode_postdata({
'grant_type': 'password',
'username': username,
'password': password,
'scope': 'openid offline_access',
'client_id': '0oa3e1nutA1HLzAKG356',
}))['access_token']
except ExtractorError as error:
if isinstance(error.cause, HTTPError) and error.cause.status == 400:
raise ExtractorError('Invalid username or password', expected=True)
raise
self._token_expiry = traverse_obj(self._access_token, ({jwt_decode_hs256}, 'exp', {int})) or 0
self._set_device_id(username)
self._session_id = self._call_api({
'operationName': 'initSession',
'query': self._GRAPHQL_INIT_QUERY,
'variables': {
'device': {
'appVersion': self._APP_VERSION,
'deviceFamily': 'desktop',
'knownDeviceId': self._device_id,
'languagePreference': 'ENGLISH',
'manufacturer': '',
'model': '',
'os': '',
'osVersion': '',
},
'clientType': 'WEB',
},
}, None, 'session ID')['data']['initSession']['sessionId']
def _call_api(self, data, video_id, description='GraphQL JSON', fatal=True):
return self._download_json(
'https://media-gateway.mlb.com/graphql', video_id,
f'Downloading {description}', f'Unable to download {description}', fatal=fatal,
headers={
**self._api_headers,
'Accept': 'application/json',
'Content-Type': 'application/json',
'x-client-name': 'WEB',
'x-client-version': self._APP_VERSION,
}, data=json.dumps(data, separators=(',', ':')).encode())
def _extract_formats_and_subtitles(self, broadcast, video_id):
feed = traverse_obj(broadcast, ('homeAway', {str.title}))
medium = traverse_obj(broadcast, ('type', {str}))
language = traverse_obj(broadcast, ('language', {str.lower}))
format_id = join_nonempty(feed, medium, language)
response = self._call_api({
'operationName': 'initPlaybackSession',
'query': self._GRAPHQL_PLAYBACK_QUERY,
'variables': {
'adCapabilities': ['GOOGLE_STANDALONE_AD_PODS'],
'deviceId': self._device_id,
'mediaId': broadcast['mediaId'],
'quality': 'PLACEHOLDER',
'sessionId': self._session_id,
},
}, video_id, f'{format_id} broadcast JSON', fatal=False)
playback = traverse_obj(response, ('data', 'initPlaybackSession', 'playback', {dict}))
m3u8_url = traverse_obj(playback, ('url', {url_or_none}))
token = traverse_obj(playback, ('token', {str}))
if not (m3u8_url and token):
errors = '; '.join(traverse_obj(response, ('errors', ..., 'message', {str})))
if 'not entitled' in errors:
raise ExtractorError(errors, expected=True)
elif errors: # Only warn when 'blacked out' since radio formats are available
self.report_warning(f'API returned errors for {format_id}: {errors}')
else:
self.report_warning(f'No formats available for {format_id} broadcast; skipping')
return [], {}
cdn_headers = {'x-cdn-token': token}
fmts, subs = self._extract_m3u8_formats_and_subtitles(
m3u8_url.replace(f'/{token}/', '/'), video_id, 'mp4',
m3u8_id=format_id, fatal=False, headers=cdn_headers)
for fmt in fmts:
fmt['http_headers'] = cdn_headers
fmt.setdefault('format_note', join_nonempty(feed, medium, delim=' '))
fmt.setdefault('language', language)
if fmt.get('vcodec') == 'none' and fmt['language'] == 'en':
fmt['source_preference'] = 10
return fmts, subs
def _real_extract(self, url):
video_id = self._match_id(url)
airings = self._download_json(
f'https://search-api-mlbtv.mlb.com/svc/search/v2/graphql/persisted/query/core/Airings?variables=%7B%22partnerProgramIds%22%3A%5B%22{video_id}%22%5D%2C%22applyEsniMediaRightsLabels%22%3Atrue%7D',
video_id)['data']['Airings']
data = self._download_json(
'https://statsapi.mlb.com/api/v1/schedule', video_id, query={
'gamePk': video_id,
'hydrate': 'broadcasts(all),statusFlags',
})
metadata = traverse_obj(data, (
'dates', ..., 'games',
lambda _, v: str(v['gamePk']) == video_id and not v.get('rescheduleDate'), any))
broadcasts = traverse_obj(metadata, (
'broadcasts', lambda _, v: v['mediaId'] and v['mediaState']['mediaStateCode'] != 'MEDIA_OFF'))
formats, subtitles = [], {}
for airing in traverse_obj(airings, lambda _, v: v['playbackUrls'][0]['href']):
format_id = join_nonempty('feedType', 'feedLanguage', from_dict=airing)
m3u8_url = traverse_obj(self._download_json(
airing['playbackUrls'][0]['href'].format(scenario='browser~csai'), video_id,
note=f'Downloading {format_id} stream info JSON',
errnote=f'Failed to download {format_id} stream info, skipping',
fatal=False, headers={
'Authorization': self._access_token,
'Accept': 'application/vnd.media-service+json; version=2',
}), ('stream', 'complete', {url_or_none}))
if not m3u8_url:
continue
f, s = self._extract_m3u8_formats_and_subtitles(
m3u8_url, video_id, 'mp4', m3u8_id=format_id, fatal=False)
formats.extend(f)
self._merge_subtitles(s, target=subtitles)
for broadcast in broadcasts:
fmts, subs = self._extract_formats_and_subtitles(broadcast, video_id)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
return {
'id': video_id,
'title': traverse_obj(airings, (..., 'titles', 0, 'episodeName'), get_all=False),
'is_live': traverse_obj(airings, (..., 'mediaConfig', 'productType'), get_all=False) == 'LIVE',
'title': join_nonempty(
traverse_obj(metadata, ('officialDate', {str})),
traverse_obj(metadata, ('teams', ('away', 'home'), 'team', 'name', {str}, all, {' @ '.join})),
delim=' - '),
'is_live': traverse_obj(broadcasts, (..., 'mediaState', 'mediaStateCode', {str}, any)) == 'MEDIA_ON',
'release_timestamp': traverse_obj(metadata, ('gameDate', {parse_iso8601})),
'formats': formats,
'subtitles': subtitles,
'http_headers': {'Authorization': f'Bearer {self._access_token}'},
}

View File

@ -5,40 +5,104 @@
from ..utils import (
ExtractorError,
OnDemandPagedList,
determine_ext,
int_or_none,
try_get,
clean_html,
extract_attributes,
get_element_by_class,
get_element_html_by_id,
parse_count,
remove_end,
update_url,
urlencode_postdata,
)
class MurrtubeIE(InfoExtractor):
_WORKING = False
_VALID_URL = r'''(?x)
(?:
murrtube:|
https?://murrtube\.net/videos/(?P<slug>[a-z0-9\-]+)\-
https?://murrtube\.net/(?:v/|videos/(?P<slug>[a-z0-9-]+?)-)
)
(?P<id>[a-f0-9]{8}\-[a-f0-9]{4}\-[a-f0-9]{4}\-[a-f0-9]{4}\-[a-f0-9]{12})
(?P<id>[A-Z0-9]{4}|[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12})
'''
_TEST = {
_TESTS = [{
'url': 'https://murrtube.net/videos/inferno-x-skyler-148b6f2a-fdcc-4902-affe-9c0f41aaaca0',
'md5': '169f494812d9a90914b42978e73aa690',
'md5': '70380878a77e8565d4aea7f68b8bbb35',
'info_dict': {
'id': '148b6f2a-fdcc-4902-affe-9c0f41aaaca0',
'id': 'ca885d8456b95de529b6723b158032e11115d',
'ext': 'mp4',
'title': 'Inferno X Skyler',
'description': 'Humping a very good slutty sheppy (roomate)',
'thumbnail': r're:^https?://.*\.jpg$',
'duration': 284,
'uploader': 'Inferno Wolf',
'age_limit': 18,
'thumbnail': 'https://storage.murrtube.net/murrtube-production/ekbs3zcfvuynnqfx72nn2tkokvsd',
'comment_count': int,
'view_count': int,
'like_count': int,
'tags': ['hump', 'breed', 'Fursuit', 'murrsuit', 'bareback'],
},
}, {
'url': 'https://murrtube.net/v/0J2Q',
'md5': '31262f6ac56f0ca75e5a54a0f3fefcb6',
'info_dict': {
'id': '8442998c52134968d9caa36e473e1a6bac6ca',
'ext': 'mp4',
'uploader': 'Hayel',
'title': 'Who\'s in charge now?',
'description': 'md5:795791e97e5b0f1805ea84573f02a997',
'age_limit': 18,
'thumbnail': 'https://storage.murrtube.net/murrtube-production/fb1ojjwiucufp34ya6hxu5vfqi5s',
'comment_count': int,
'view_count': int,
'like_count': int,
},
}]
def _extract_count(self, name, html):
return parse_count(self._search_regex(
rf'([\d,]+)\s+<span[^>]*>{name}</span>', html, name, default=None))
def _real_initialize(self):
homepage = self._download_webpage(
'https://murrtube.net', None, note='Getting session token')
self._request_webpage(
'https://murrtube.net/accept_age_check', None, 'Setting age cookie',
data=urlencode_postdata(self._hidden_inputs(homepage)))
def _real_extract(self, url):
video_id = self._match_id(url)
if video_id.startswith('murrtube:'):
raise ExtractorError('Support for murrtube: prefix URLs is broken')
video_page = self._download_webpage(url, video_id)
video_attrs = extract_attributes(get_element_html_by_id('video', video_page))
playlist = update_url(video_attrs['data-url'], query=None)
video_id = self._search_regex(r'/([\da-f]+)/index.m3u8', playlist, 'video id')
return {
'id': video_id,
'title': remove_end(self._og_search_title(video_page), ' - Murrtube'),
'age_limit': 18,
'formats': self._extract_m3u8_formats(playlist, video_id, 'mp4'),
'description': self._og_search_description(video_page),
'thumbnail': update_url(self._og_search_thumbnail(video_page, default=''), query=None) or None,
'uploader': clean_html(get_element_by_class('pl-1 is-size-6 has-text-lighter', video_page)),
'view_count': self._extract_count('Views', video_page),
'like_count': self._extract_count('Likes', video_page),
'comment_count': self._extract_count('Comments', video_page),
}
class MurrtubeUserIE(InfoExtractor):
_WORKING = False
IE_DESC = 'Murrtube user profile'
_VALID_URL = r'https?://murrtube\.net/(?P<id>[^/]+)$'
_TESTS = [{
'url': 'https://murrtube.net/stormy',
'info_dict': {
'id': 'stormy',
},
'playlist_mincount': 27,
}]
_PAGE_SIZE = 10
def _download_gql(self, video_id, op, note=None, fatal=True):
result = self._download_json(
'https://murrtube.net/graphql',
@ -46,73 +110,6 @@ def _download_gql(self, video_id, op, note=None, fatal=True):
headers={'Content-Type': 'application/json'})
return result['data']
def _real_extract(self, url):
video_id = self._match_id(url)
data = self._download_gql(video_id, {
'operationName': 'Medium',
'variables': {
'id': video_id,
},
'query': '''\
query Medium($id: ID!) {
medium(id: $id) {
title
description
key
duration
commentsCount
likesCount
viewsCount
thumbnailKey
tagList
user {
name
__typename
}
__typename
}
}'''})
meta = data['medium']
storage_url = 'https://storage.murrtube.net/murrtube/'
format_url = storage_url + meta.get('key', '')
thumbnail = storage_url + meta.get('thumbnailKey', '')
if determine_ext(format_url) == 'm3u8':
formats = self._extract_m3u8_formats(
format_url, video_id, 'mp4', entry_protocol='m3u8_native', fatal=False)
else:
formats = [{'url': format_url}]
return {
'id': video_id,
'title': meta.get('title'),
'description': meta.get('description'),
'formats': formats,
'thumbnail': thumbnail,
'duration': int_or_none(meta.get('duration')),
'uploader': try_get(meta, lambda x: x['user']['name']),
'view_count': meta.get('viewsCount'),
'like_count': meta.get('likesCount'),
'comment_count': meta.get('commentsCount'),
'tags': meta.get('tagList'),
'age_limit': 18,
}
class MurrtubeUserIE(MurrtubeIE): # XXX: Do not subclass from concrete IE
_WORKING = False
IE_DESC = 'Murrtube user profile'
_VALID_URL = r'https?://murrtube\.net/(?P<id>[^/]+)$'
_TEST = {
'url': 'https://murrtube.net/stormy',
'info_dict': {
'id': 'stormy',
},
'playlist_mincount': 27,
}
_PAGE_SIZE = 10
def _fetch_page(self, username, user_id, page):
data = self._download_gql(username, {
'operationName': 'Media',

View File

@ -40,7 +40,6 @@ class NiconicoIE(InfoExtractor):
_TESTS = [{
'url': 'http://www.nicovideo.jp/watch/sm22312215',
'md5': 'd1a75c0823e2f629128c43e1212760f9',
'info_dict': {
'id': 'sm22312215',
'ext': 'mp4',
@ -56,8 +55,8 @@ class NiconicoIE(InfoExtractor):
'comment_count': int,
'genres': ['未設定'],
'tags': [],
'expected_protocol': str,
},
'params': {'skip_download': 'm3u8'},
}, {
# File downloaded with and without credentials are different, so omit
# the md5 field
@ -77,8 +76,8 @@ class NiconicoIE(InfoExtractor):
'view_count': int,
'genres': ['音楽・サウンド'],
'tags': ['Translation_Request', 'Kagamine_Rin', 'Rin_Original'],
'expected_protocol': str,
},
'params': {'skip_download': 'm3u8'},
}, {
# 'video exists but is marked as "deleted"
# md5 is unstable
@ -112,7 +111,6 @@ class NiconicoIE(InfoExtractor):
}, {
# video not available via `getflv`; "old" HTML5 video
'url': 'http://www.nicovideo.jp/watch/sm1151009',
'md5': 'f95a3d259172667b293530cc2e41ebda',
'info_dict': {
'id': 'sm1151009',
'ext': 'mp4',
@ -128,11 +126,10 @@ class NiconicoIE(InfoExtractor):
'comment_count': int,
'genres': ['ゲーム'],
'tags': [],
'expected_protocol': str,
},
'params': {'skip_download': 'm3u8'},
}, {
# "New" HTML5 video
# md5 is unstable
'url': 'http://www.nicovideo.jp/watch/sm31464864',
'info_dict': {
'id': 'sm31464864',
@ -149,12 +146,11 @@ class NiconicoIE(InfoExtractor):
'comment_count': int,
'genres': ['アニメ'],
'tags': [],
'expected_protocol': str,
},
'params': {'skip_download': 'm3u8'},
}, {
# Video without owner
'url': 'http://www.nicovideo.jp/watch/sm18238488',
'md5': 'd265680a1f92bdcbbd2a507fc9e78a9e',
'info_dict': {
'id': 'sm18238488',
'ext': 'mp4',
@ -168,8 +164,8 @@ class NiconicoIE(InfoExtractor):
'comment_count': int,
'genres': ['エンターテイメント'],
'tags': [],
'expected_protocol': str,
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'http://sp.nicovideo.jp/watch/sm28964488?ss_pos=1&cp_in=wt_tg',
'only_matching': True,
@ -458,9 +454,11 @@ def _real_extract(self, url):
if video_id.startswith('so'):
video_id = self._match_id(handle.url)
api_data = self._parse_json(self._html_search_regex(
'data-api-data="([^"]+)"', webpage,
'API data', default='{}'), video_id)
api_data = traverse_obj(
self._parse_json(self._html_search_meta('server-response', webpage) or '', video_id),
('data', 'response', {dict}))
if not api_data:
raise ExtractorError('Server response data not found')
except ExtractorError as e:
try:
api_data = self._download_json(

View File

@ -1,9 +1,19 @@
from .common import InfoExtractor
from ..utils import int_or_none, try_get
from ..networking.exceptions import HTTPError
from ..utils import (
ExtractorError,
int_or_none,
parse_iso8601,
parse_qs,
try_get,
update_url,
url_or_none,
)
from ..utils.traversal import traverse_obj
class OlympicsReplayIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?olympics\.com(?:/tokyo-2020)?/[a-z]{2}/(?:replay|video)/(?P<id>[^/#&?]+)'
_VALID_URL = r'https?://(?:www\.)?olympics\.com/[a-z]{2}/(?:paris-2024/)?(?:replay|videos?|original-series/episode)/(?P<id>[\w-]+)'
_TESTS = [{
'url': 'https://olympics.com/fr/video/men-s-109kg-group-a-weightlifting-tokyo-2020-replays',
'info_dict': {
@ -11,26 +21,105 @@ class OlympicsReplayIE(InfoExtractor):
'ext': 'mp4',
'title': '+109kg (H) Groupe A - Haltérophilie | Replay de Tokyo 2020',
'upload_date': '20210801',
'timestamp': 1627783200,
'timestamp': 1627797600,
'description': 'md5:c66af4a5bc7429dbcc43d15845ff03b3',
'uploader': 'International Olympic Committee',
},
'params': {
'skip_download': True,
'thumbnail': 'https://img.olympics.com/images/image/private/t_1-1_1280/primary/nua4o7zwyaznoaejpbk2',
'duration': 7017.0,
},
}, {
'url': 'https://olympics.com/tokyo-2020/en/replay/bd242924-4b22-49a5-a846-f1d4c809250d/mens-bronze-medal-match-hun-esp',
'only_matching': True,
'url': 'https://olympics.com/en/original-series/episode/b-boys-and-b-girls-take-the-spotlight-breaking-life-road-to-paris-2024',
'info_dict': {
'id': '32633650-c5ee-4280-8b94-fb6defb6a9b5',
'ext': 'mp4',
'title': 'B-girl Nicka - Breaking Life, Road to Paris 2024 | Episode 1',
'upload_date': '20240517',
'timestamp': 1715948200,
'description': 'md5:f63d728a41270ec628f6ac33ce471bb1',
'thumbnail': 'https://img.olympics.com/images/image/private/t_1-1_1280/primary/a3j96l7j6so3vyfijby1',
'duration': 1321.0,
},
}, {
'url': 'https://olympics.com/en/paris-2024/videos/men-s-preliminaries-gbr-esp-ned-rsa-hockey-olympic-games-paris-2024',
'info_dict': {
'id': '3d96db23-8eee-4b7c-8ef5-488a0361026c',
'ext': 'mp4',
'title': 'Men\'s Preliminaries GBR-ESP & NED-RSA | Hockey | Olympic Games Paris 2024',
'upload_date': '20240727',
'timestamp': 1722066600,
},
'skip': 'Geo-restricted to RU, BR, BT, NP, TM, BD, TL',
}, {
'url': 'https://olympics.com/en/paris-2024/videos/dnp-suni-lee-i-have-goals-and-i-have-expectations-for-myself-but-i-also-am-trying-to-give-myself-grace',
'info_dict': {
'id': 'a42f37ab-8a74-41d0-a7d9-af27b7b02a90',
'ext': 'mp4',
'title': 'md5:c7cfbc9918636a98e66400a812e4d407',
'upload_date': '20240729',
'timestamp': 1722288600,
},
}]
_GEO_BYPASS = False
def _extract_from_nextjs_data(self, webpage, video_id):
data = traverse_obj(self._search_nextjs_data(webpage, video_id, default={}), (
'props', 'pageProps', 'page', 'items',
lambda _, v: v['name'] == 'videoPlaylist', 'data', 'currentVideo', {dict}, any))
if not data:
return None
geo_countries = traverse_obj(data, ('countries', ..., {str}))
if traverse_obj(data, ('geoRestrictedVideo', {bool})):
self.raise_geo_restricted(countries=geo_countries)
is_live = traverse_obj(data, ('streamingStatus', {str})) == 'LIVE'
m3u8_url = traverse_obj(data, ('videoUrl', {url_or_none})) or data['streamUrl']
tokenized_url = self._tokenize_url(m3u8_url, data['jwtToken'], is_live, video_id)
try:
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
tokenized_url, video_id, 'mp4', m3u8_id='hls')
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and 'georestricted' in e.cause.msg:
self.raise_geo_restricted(countries=geo_countries)
raise
return {
'formats': formats,
'subtitles': subtitles,
'is_live': is_live,
**traverse_obj(data, {
'id': ('videoID', {str}),
'title': ('title', {str}),
'timestamp': ('contentDate', {parse_iso8601}),
}),
}
def _tokenize_url(self, url, token, is_live, video_id):
return self._download_json(
'https://metering.olympics.com/tokengenerator', video_id,
'Downloading tokenized m3u8 url', query={
**parse_qs(url),
'url': update_url(url, query=None),
'service-id': 'live' if is_live else 'vod',
'user-auth': token,
})['data']['url']
def _legacy_tokenize_url(self, url, video_id):
return self._download_json(
'https://olympics.com/tokenGenerator', video_id,
'Downloading legacy tokenized m3u8 url', query={'url': url})
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
if info := self._extract_from_nextjs_data(webpage, video_id):
return info
title = self._html_search_meta(('title', 'og:title', 'twitter:title'), webpage)
uuid = self._html_search_meta('episode_uid', webpage)
video_uuid = self._html_search_meta('episode_uid', webpage)
m3u8_url = self._html_search_meta('video_url', webpage)
json_ld = self._search_json_ld(webpage, uuid)
json_ld = self._search_json_ld(webpage, video_uuid)
thumbnails_list = json_ld.get('image')
if not thumbnails_list:
thumbnails_list = self._html_search_regex(
@ -48,12 +137,12 @@ def _real_extract(self, url):
'width': width,
'height': int_or_none(try_get(width, lambda x: x * height_a / width_a)),
})
m3u8_url = self._download_json(
f'https://olympics.com/tokenGenerator?url={m3u8_url}', uuid, note='Downloading m3u8 url')
formats, subtitles = self._extract_m3u8_formats_and_subtitles(m3u8_url, uuid, 'mp4', m3u8_id='hls')
formats, subtitles = self._extract_m3u8_formats_and_subtitles(
self._legacy_tokenize_url(m3u8_url, video_uuid), video_uuid, 'mp4', m3u8_id='hls')
return {
'id': uuid,
'id': video_uuid,
'title': title,
'thumbnails': thumbnails,
'formats': formats,

View File

@ -420,7 +420,7 @@ def _get_comments(self, post_id):
class PatreonCampaignIE(PatreonBaseIE):
_VALID_URL = r'https?://(?:www\.)?patreon\.com/(?!rss)(?:(?:m/(?P<campaign_id>\d+))|(?P<vanity>[-\w]+))'
_VALID_URL = r'https?://(?:www\.)?patreon\.com/(?!rss)(?:(?:m|api/campaigns)/(?P<campaign_id>\d+)|(?P<vanity>[-\w]+))'
_TESTS = [{
'url': 'https://www.patreon.com/dissonancepod/',
'info_dict': {
@ -442,25 +442,44 @@ class PatreonCampaignIE(PatreonBaseIE):
'url': 'https://www.patreon.com/m/4767637/posts',
'info_dict': {
'title': 'Not Just Bikes',
'channel_follower_count': int,
'id': '4767637',
'channel_id': '4767637',
'channel_url': 'https://www.patreon.com/notjustbikes',
'description': 'md5:595c6e7dca76ae615b1d38c298a287a1',
'description': 'md5:9f4b70051216c4d5c58afe580ffc8d0f',
'age_limit': 0,
'channel': 'Not Just Bikes',
'uploader_url': 'https://www.patreon.com/notjustbikes',
'uploader': 'Not Just Bikes',
'uploader': 'Jason',
'uploader_id': '37306634',
'thumbnail': r're:^https?://.*$',
},
'playlist_mincount': 71,
}, {
'url': 'https://www.patreon.com/api/campaigns/4243769/posts',
'info_dict': {
'title': 'Second Thought',
'channel_follower_count': int,
'id': '4243769',
'channel_id': '4243769',
'channel_url': 'https://www.patreon.com/secondthought',
'description': 'md5:69c89a3aba43efdb76e85eb023e8de8b',
'age_limit': 0,
'channel': 'Second Thought',
'uploader_url': 'https://www.patreon.com/secondthought',
'uploader': 'JT Chapman',
'uploader_id': '32718287',
'thumbnail': r're:^https?://.*$',
},
'playlist_mincount': 201,
}, {
'url': 'https://www.patreon.com/dissonancepod/posts',
'only_matching': True,
}, {
'url': 'https://www.patreon.com/m/5932659',
'only_matching': True,
}, {
'url': 'https://www.patreon.com/api/campaigns/4243769',
'only_matching': True,
}]
@classmethod

View File

@ -5,6 +5,7 @@
ExtractorError,
str_or_none,
traverse_obj,
update_url,
)
@ -43,15 +44,16 @@ def _real_extract(self, url):
url
}
}''' % (channel_id, channel_id), # noqa: UP031
})['data']
}, headers={'Accept': '*/*', 'Content-Type': 'application/json'})['data']
metadata = data['channel']
if metadata.get('online') == 0:
raise ExtractorError('Stream is offline', expected=True)
title = metadata['title']
cdn_data = self._download_json(
data['getLoadBalancerUrl']['url'] + '/stream/json_' + metadata['stream_name'] + '.js',
cdn_data = self._download_json(''.join((
update_url(data['getLoadBalancerUrl']['url'], scheme='https'),
'/stream/json_', metadata['stream_name'], '.js')),
channel_id, 'Downloading load balancing info')
formats = []
@ -99,10 +101,10 @@ class PicartoVodIE(InfoExtractor):
},
'skip': 'The VOD does not exist',
}, {
'url': 'https://picarto.tv/ArtofZod/videos/772650',
'md5': '00067a0889f1f6869cc512e3e79c521b',
'url': 'https://picarto.tv/ArtofZod/videos/771008',
'md5': 'abef5322f2700d967720c4c6754b2a34',
'info_dict': {
'id': '772650',
'id': '771008',
'ext': 'mp4',
'title': 'Art of Zod - Drawing and Painting',
'thumbnail': r're:^https?://.*\.jpg',
@ -131,7 +133,7 @@ def _real_extract(self, url):
}}
}}
}}''',
})['data']['video']
}, headers={'Accept': '*/*', 'Content-Type': 'application/json'})['data']['video']
file_name = data['file_name']
netloc = urllib.parse.urlparse(data['video_recording_image_url']).netloc

View File

@ -314,23 +314,11 @@ def add_format(f, protocol, is_preview=False):
self.write_debug(f'"{identifier}" is not a requested format, skipping')
continue
stream = None
for retry in self.RetryManager(fatal=False):
try:
stream = self._call_api(
# XXX: if not extract_flat, 429 error must be caught where _extract_info_dict is called
stream_url = traverse_obj(self._call_api(
format_url, track_id, f'Downloading {identifier} format info JSON',
query=query, headers=self._HEADERS)
except ExtractorError as e:
if isinstance(e.cause, HTTPError) and e.cause.status == 429:
self.report_warning(
'You have reached the API rate limit, which is ~600 requests per '
'10 minutes. Use the --extractor-retries and --retry-sleep options '
'to configure an appropriate retry count and wait time', only_once=True)
retry.error = e.cause
else:
self.report_warning(e.msg)
query=query, headers=self._HEADERS), ('url', {url_or_none}))
stream_url = traverse_obj(stream, ('url', {url_or_none}))
if invalid_url(stream_url):
continue
format_urls.add(stream_url)
@ -647,7 +635,17 @@ def _real_extract(self, url):
info = self._call_api(
info_json_url, full_title, 'Downloading info JSON', query=query, headers=self._HEADERS)
for retry in self.RetryManager():
try:
return self._extract_info_dict(info, full_title, token)
except ExtractorError as e:
if not isinstance(e.cause, HTTPError) or not e.cause.status == 429:
raise
self.report_warning(
'You have reached the API rate limit, which is ~600 requests per '
'10 minutes. Use the --extractor-retries and --retry-sleep options '
'to configure an appropriate retry count and wait time', only_once=True)
retry.error = e.cause
class SoundcloudPlaylistBaseIE(SoundcloudBaseIE):
@ -873,7 +871,7 @@ class SoundcloudUserPermalinkIE(SoundcloudPagedPlaylistBaseIE):
'id': '30909869',
'title': 'neilcic',
},
'playlist_mincount': 23,
'playlist_mincount': 22,
}]
def _real_extract(self, url):
@ -882,7 +880,7 @@ def _real_extract(self, url):
self._resolv_url(url), user_id, 'Downloading user info', headers=self._HEADERS)
return self._extract_playlist(
f'{self._API_V2_BASE}stream/users/{user["id"]}', str(user['id']), user.get('username'))
f'{self._API_V2_BASE}users/{user["id"]}/tracks', str(user['id']), user.get('username'))
class SoundcloudTrackStationIE(SoundcloudPagedPlaylistBaseIE):

View File

@ -1,55 +1,31 @@
from .common import InfoExtractor
from ..utils import ExtractorError, int_or_none, traverse_obj
from .vidyard import VidyardBaseIE
from ..utils import ExtractorError, int_or_none, make_archive_id
class SwearnetEpisodeIE(InfoExtractor):
class SwearnetEpisodeIE(VidyardBaseIE):
_VALID_URL = r'https?://www\.swearnet\.com/shows/(?P<id>[\w-]+)/seasons/(?P<season_num>\d+)/episodes/(?P<episode_num>\d+)'
_TESTS = [{
'url': 'https://www.swearnet.com/shows/gettin-learnt-with-ricky/seasons/1/episodes/1',
'info_dict': {
'id': '232819',
'id': 'wicK2EOzjOdxkUXGDIgcPw',
'display_id': '232819',
'ext': 'mp4',
'episode_number': 1,
'episode': 'Episode 1',
'duration': 719,
'description': 'md5:c48ef71440ce466284c07085cd7bd761',
'description': r're:Are you drunk and high and craving a grilled cheese sandwich.+',
'season': 'Season 1',
'title': 'Episode 1 - Grilled Cheese Sammich',
'season_number': 1,
'thumbnail': 'https://cdn.vidyard.com/thumbnails/232819/_RX04IKIq60a2V6rIRqq_Q_small.jpg',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/custom/0dd74f9b-388a-452e-b570-b407fb64435b_small.jpg',
'tags': ['Getting Learnt with Ricky', 'drunk', 'grilled cheese', 'high'],
'_old_archive_ids': ['swearnetepisode 232819'],
},
}]
def _get_formats_and_subtitle(self, video_source, video_id):
video_source = video_source or {}
formats, subtitles = [], {}
for key, value in video_source.items():
if key == 'hls':
for video_hls in value:
fmts, subs = self._extract_m3u8_formats_and_subtitles(video_hls.get('url'), video_id)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
else:
formats.extend({
'url': video_mp4.get('url'),
'ext': 'mp4',
} for video_mp4 in value)
return formats, subtitles
def _get_direct_subtitle(self, caption_json):
subs = {}
for caption in caption_json:
subs.setdefault(caption.get('language') or 'und', []).append({
'url': caption.get('vttUrl'),
'name': caption.get('name'),
})
return subs
def _real_extract(self, url):
display_id, season_number, episode_number = self._match_valid_url(url).group('id', 'season_num', 'episode_num')
webpage = self._download_webpage(url, display_id)
slug, season_number, episode_number = self._match_valid_url(url).group('id', 'season_num', 'episode_num')
webpage = self._download_webpage(url, slug)
try:
external_id = self._search_regex(r'externalid\s*=\s*"([^"]+)', webpage, 'externalid')
@ -58,22 +34,12 @@ def _real_extract(self, url):
self.raise_login_required()
raise
json_data = self._download_json(
f'https://play.vidyard.com/player/{external_id}.json', display_id)['payload']['chapters'][0]
formats, subtitles = self._get_formats_and_subtitle(json_data['sources'], display_id)
self._merge_subtitles(self._get_direct_subtitle(json_data.get('captions')), target=subtitles)
info = self._process_video_json(self._fetch_video_json(external_id)['chapters'][0], external_id)
if info.get('display_id'):
info['_old_archive_ids'] = [make_archive_id(self, info['display_id'])]
return {
'id': str(json_data['videoId']),
'title': json_data.get('name') or self._html_search_meta(['og:title', 'twitter:title'], webpage),
'description': (json_data.get('description')
or self._html_search_meta(['og:description', 'twitter:description'], webpage)),
'duration': int_or_none(json_data.get('seconds')),
'formats': formats,
'subtitles': subtitles,
**info,
'season_number': int_or_none(season_number),
'episode_number': int_or_none(episode_number),
'thumbnails': [{'url': thumbnail_url}
for thumbnail_url in traverse_obj(json_data, ('thumbnailUrls', ...))],
}

View File

@ -23,7 +23,6 @@
mimetype2ext,
parse_qs,
qualities,
remove_start,
srt_subtitles_timecode,
str_or_none,
traverse_obj,
@ -254,7 +253,16 @@ def _extract_web_data_and_status(self, url, video_id, fatal=True):
def _get_subtitles(self, aweme_detail, aweme_id, user_name):
# TODO: Extract text positioning info
EXT_MAP = { # From lowest to highest preference
'creator_caption': 'json',
'srt': 'srt',
'webvtt': 'vtt',
}
preference = qualities(tuple(EXT_MAP.values()))
subtitles = {}
# aweme/detail endpoint subs
captions_info = traverse_obj(
aweme_detail, ('interaction_stickers', ..., 'auto_video_caption_info', 'auto_captions', ...), expected_type=dict)
@ -278,8 +286,8 @@ def _get_subtitles(self, aweme_detail, aweme_id, user_name):
if not caption.get('url'):
continue
subtitles.setdefault(caption.get('lang') or 'en', []).append({
'ext': remove_start(caption.get('caption_format'), 'web'),
'url': caption['url'],
'ext': EXT_MAP.get(caption.get('Format')),
})
# webpage subs
if not subtitles:
@ -288,9 +296,14 @@ def _get_subtitles(self, aweme_detail, aweme_id, user_name):
self._create_url(user_name, aweme_id), aweme_id, fatal=False)
for caption in traverse_obj(aweme_detail, ('video', 'subtitleInfos', lambda _, v: v['Url'])):
subtitles.setdefault(caption.get('LanguageCodeName') or 'en', []).append({
'ext': remove_start(caption.get('Format'), 'web'),
'url': caption['Url'],
'ext': EXT_MAP.get(caption.get('Format')),
})
# Deprioritize creator_caption json since it can't be embedded or used by media players
for lang, subs_list in subtitles.items():
subtitles[lang] = sorted(subs_list, key=lambda x: preference(x['ext']))
return subtitles
def _parse_url_key(self, url_key):
@ -1458,9 +1471,11 @@ def _real_extract(self, url):
if webpage:
data = self._get_sigi_state(webpage, uploader or room_id)
room_id = (traverse_obj(data, ('UserModule', 'users', ..., 'roomId', {str_or_none}), get_all=False)
or self._search_regex(r'snssdk\d*://live\?room_id=(\d+)', webpage, 'room ID', default=None)
or room_id)
room_id = (
traverse_obj(data, ((
('LiveRoom', 'liveRoomUserInfo', 'user'),
('UserModule', 'users', ...)), 'roomId', {str}, any))
or self._search_regex(r'snssdk\d*://live\?room_id=(\d+)', webpage, 'room ID', default=room_id))
uploader = uploader or traverse_obj(
data, ('LiveRoom', 'liveRoomUserInfo', 'user', 'uniqueId'),
('UserModule', 'users', ..., 'uniqueId'), get_all=False, expected_type=str)

View File

@ -28,35 +28,11 @@ class ToggleIE(InfoExtractor):
'skip_download': 'm3u8 download',
},
}, {
'note': 'DRM-protected video',
'url': 'http://www.mewatch.sg/en/movies/dug-s-special-mission/341413',
'info_dict': {
'id': '341413',
'ext': 'wvm',
'title': 'Dug\'s Special Mission',
'description': 'md5:e86c6f4458214905c1772398fabc93e0',
'upload_date': '20150827',
'timestamp': 1440644006,
},
'params': {
'skip_download': 'DRM-protected wvm download',
},
'only_matching': True,
}, {
# this also tests correct video id extraction
'note': 'm3u8 links are geo-restricted, but Android/mp4 is okay',
'url': 'http://www.mewatch.sg/en/series/28th-sea-games-5-show/28th-sea-games-5-show-ep11/332861',
'info_dict': {
'id': '332861',
'ext': 'mp4',
'title': '28th SEA Games (5 Show) - Episode 11',
'description': 'md5:3cd4f5f56c7c3b1340c50a863f896faa',
'upload_date': '20150605',
'timestamp': 1433480166,
},
'params': {
'skip_download': 'DRM-protected wvm download',
},
'skip': 'm3u8 links are geo-restricted',
'only_matching': True,
}, {
'url': 'http://video.toggle.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
'only_matching': True,

View File

@ -96,7 +96,7 @@ def _extract_subtitles(data_captions):
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
webpage = self._download_webpage(url, display_id, impersonate=True)
if ">Ce programme n'est malheureusement pas disponible pour votre zone géographique.<" in webpage:
self.raise_geo_restricted(countries=['FR'])
@ -122,8 +122,9 @@ def process_video_files(v):
if not token:
continue
deferred_json = self._download_json(
f'https://api.tv5monde.com/player/asset/{d_param}/resolve?condenseKS=true', display_id,
note='Downloading deferred info', headers={'Authorization': f'Bearer {token}'}, fatal=False)
f'https://api.tv5monde.com/player/asset/{d_param}/resolve?condenseKS=true',
display_id, 'Downloading deferred info', fatal=False, impersonate=True,
headers={'Authorization': f'Bearer {token}'})
v_url = traverse_obj(deferred_json, (0, 'url', {url_or_none}))
if not v_url:
continue

View File

@ -1,60 +1,29 @@
import functools
import re
from .brightcove import BrightcoveNewIE
from .common import InfoExtractor
from ..utils import float_or_none, int_or_none, smuggle_url, strip_or_none
from ..utils.traversal import traverse_obj
class TVAIE(InfoExtractor):
_VALID_URL = r'https?://videos?\.tva\.ca/details/_(?P<id>\d+)'
IE_NAME = 'tvaplus'
IE_DESC = 'TVA+'
_VALID_URL = r'https?://(?:www\.)?tvaplus\.ca/(?:[^/?#]+/)*[\w-]+-(?P<id>\d+)(?:$|[#?])'
_TESTS = [{
'url': 'https://videos.tva.ca/details/_5596811470001',
'info_dict': {
'id': '5596811470001',
'ext': 'mp4',
'title': 'Un extrait de l\'épisode du dimanche 8 octobre 2017 !',
'uploader_id': '5481942443001',
'upload_date': '20171003',
'timestamp': 1507064617,
},
'params': {
# m3u8 download
'skip_download': True,
},
'skip': 'HTTP Error 404: Not Found',
}, {
'url': 'https://video.tva.ca/details/_5596811470001',
'only_matching': True,
}]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5481942443001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
video_id = self._match_id(url)
return {
'_type': 'url_transparent',
'id': video_id,
'url': smuggle_url(self.BRIGHTCOVE_URL_TEMPLATE % video_id, {'geo_countries': ['CA']}),
'ie_key': 'BrightcoveNew',
}
class QubIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?qub\.ca/(?:[^/]+/)*[0-9a-z-]+-(?P<id>\d+)'
_TESTS = [{
'url': 'https://www.qub.ca/tvaplus/tva/alerte-amber/saison-1/episode-01-1000036619',
'url': 'https://www.tvaplus.ca/tva/alerte-amber/saison-1/episode-01-1000036619',
'md5': '949490fd0e7aee11d0543777611fbd53',
'info_dict': {
'id': '6084352463001',
'ext': 'mp4',
'title': 'Ép 01. Mon dernier jour',
'title': 'Mon dernier jour',
'uploader_id': '5481942443001',
'upload_date': '20190907',
'timestamp': 1567899756,
'description': 'md5:9c0d7fbb90939420c651fd977df90145',
'thumbnail': r're:https://.+\.jpg',
'episode': 'Ép 01. Mon dernier jour',
'episode': 'Mon dernier jour',
'episode_number': 1,
'tags': ['alerte amber', 'alerte amber saison 1', 'surdemande'],
'duration': 2625.963,
@ -64,23 +33,36 @@ class QubIE(InfoExtractor):
'channel': 'TVA',
},
}, {
'url': 'https://www.qub.ca/tele/video/lcn-ca-vous-regarde-rev-30s-ap369664-1009357943',
'only_matching': True,
'url': 'https://www.tvaplus.ca/tva/le-baiser-du-barbu/le-baiser-du-barbu-886644190',
'info_dict': {
'id': '6354448043112',
'ext': 'mp4',
'title': 'Le Baiser du barbu',
'uploader_id': '5481942443001',
'upload_date': '20240606',
'timestamp': 1717694023,
'description': 'md5:025b1219086c1cbf4bc27e4e034e8b57',
'thumbnail': r're:https://.+\.jpg',
'episode': 'Le Baiser du barbu',
'tags': ['fullepisode', 'films'],
'duration': 6053.504,
'series': 'Le Baiser du barbu',
'channel': 'TVA',
},
}]
# reference_id also works with old account_id(5481942443001)
# BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5813221784001/default_default/index.html?videoId=ref:%s'
_BC_URL_TMPL = 'https://players.brightcove.net/5481942443001/default_default/index.html?videoId={}'
def _real_extract(self, url):
entity_id = self._match_id(url)
webpage = self._download_webpage(url, entity_id)
entity = self._search_nextjs_data(webpage, entity_id)['props']['initialProps']['pageProps']['fallbackData']
entity = self._search_nextjs_data(webpage, entity_id)['props']['pageProps']['staticEntity']
video_id = entity['videoId']
episode = strip_or_none(entity.get('name'))
return {
'_type': 'url_transparent',
'url': f'https://videos.tva.ca/details/_{video_id}',
'ie_key': TVAIE.ie_key(),
'url': smuggle_url(self._BC_URL_TMPL.format(video_id), {'geo_countries': ['CA']}),
'ie_key': BrightcoveNewIE.ie_key(),
'id': video_id,
'title': episode,
'episode': episode,

View File

@ -10,7 +10,7 @@
class TVerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?tver\.jp/(?:(?P<type>lp|corner|series|episodes?|feature|tokyo2020/video)/)+(?P<id>[a-zA-Z0-9]+)'
_VALID_URL = r'https?://(?:www\.)?tver\.jp/(?:(?P<type>lp|corner|series|episodes?|feature|tokyo2020/video|olympic/paris2024/video)/)+(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{
'skip': 'videos are only available for 7 days',
'url': 'https://tver.jp/episodes/ep83nf3w4p',
@ -23,6 +23,20 @@ class TVerIE(InfoExtractor):
'channel': 'テレビ朝日',
},
'add_ie': ['BrightcoveNew'],
}, {
'url': 'https://tver.jp/olympic/paris2024/video/6359578055112/',
'info_dict': {
'id': '6359578055112',
'ext': 'mp4',
'title': '堀米雄斗 金メダルで五輪連覇!「みんなの応援が最後に乗れたカギ」',
'timestamp': 1722279928,
'upload_date': '20240729',
'tags': ['20240729', 'japanese', 'japanmedal', 'paris'],
'uploader_id': '4774017240001',
'thumbnail': r're:https?://[^/?#]+boltdns\.net/[^?#]+/1920x1080/match/image\.jpg',
'duration': 670.571,
},
'params': {'skip_download': 'm3u8'},
}, {
'url': 'https://tver.jp/corner/f0103888',
'only_matching': True,
@ -47,7 +61,15 @@ def _real_initialize(self):
def _real_extract(self, url):
video_id, video_type = self._match_valid_url(url).group('id', 'type')
if video_type not in {'series', 'episodes'}:
if video_type == 'olympic/paris2024/video':
# Player ID is taken from .content.brightcove.E200.pro.pc.account_id:
# https://tver.jp/olympic/paris2024/req/api/hook?q=https%3A%2F%2Folympic-assets.tver.jp%2Fweb-static%2Fjson%2Fconfig.json&d=
return self.url_result(smuggle_url(
self.BRIGHTCOVE_URL_TEMPLATE % ('4774017240001', video_id),
{'geo_countries': ['JP']}), 'BrightcoveNew')
elif video_type not in {'series', 'episodes'}:
webpage = self._download_webpage(url, video_id, note='Resolving to new URL')
video_id = self._match_id(self._search_regex(
(r'canonical"\s*href="(https?://tver\.jp/[^"]+)"', r'&link=(https?://tver\.jp/[^?&]+)[?&]'),

View File

@ -49,6 +49,7 @@ class KnownDRMIE(UnsupportedInfoExtractor):
r'amazon\.(?:\w{2}\.)?\w+/gp/video',
r'music\.amazon\.(?:\w{2}\.)?\w+',
r'(?:watch|front)\.njpwworld\.com',
r'qub\.ca/vrai',
)
_TESTS = [{
@ -149,6 +150,9 @@ class KnownDRMIE(UnsupportedInfoExtractor):
}, {
'url': 'https://front.njpwworld.com/p/s_series_00563_16_bs',
'only_matching': True,
}, {
'url': 'https://www.qub.ca/vrai/l-effet-bocuse-d-or/saison-1/l-effet-bocuse-d-or-saison-1-bande-annonce-1098225063',
'only_matching': True,
}]
def _real_extract(self, url):

426
yt_dlp/extractor/vidyard.py Normal file
View File

@ -0,0 +1,426 @@
import functools
import re
from .common import InfoExtractor
from ..utils import (
extract_attributes,
float_or_none,
int_or_none,
join_nonempty,
mimetype2ext,
parse_resolution,
str_or_none,
unescapeHTML,
url_or_none,
)
from ..utils.traversal import traverse_obj
class VidyardBaseIE(InfoExtractor):
_HEADERS = {'Referer': 'https://play.vidyard.com/'}
def _get_formats_and_subtitles(self, sources, video_id):
formats, subtitles = [], {}
def add_hls_fmts_and_subs(m3u8_url):
fmts, subs = self._extract_m3u8_formats_and_subtitles(
m3u8_url, video_id, 'mp4', m3u8_id='hls', headers=self._HEADERS, fatal=False)
formats.extend(fmts)
self._merge_subtitles(subs, target=subtitles)
hls_list = isinstance(sources, dict) and sources.pop('hls', None)
if master_m3u8_url := traverse_obj(
hls_list, (lambda _, v: v['profile'] == 'auto', 'url', {url_or_none}, any)):
add_hls_fmts_and_subs(master_m3u8_url)
if not formats: # These are duplicate and unnecesary requests if we got 'auto' hls fmts
for variant_m3u8_url in traverse_obj(hls_list, (..., 'url', {url_or_none})):
add_hls_fmts_and_subs(variant_m3u8_url)
for source_type, source_list in traverse_obj(sources, ({dict.items}, ...)):
for source in traverse_obj(source_list, lambda _, v: url_or_none(v['url'])):
profile = source.get('profile')
formats.append({
'url': source['url'],
'ext': mimetype2ext(source.get('mimeType'), default=None),
'format_id': join_nonempty('http', source_type, profile),
**parse_resolution(profile),
})
self._remove_duplicate_formats(formats)
return formats, subtitles
def _get_direct_subtitles(self, caption_json):
subs = {}
for caption in traverse_obj(caption_json, lambda _, v: url_or_none(v['vttUrl'])):
subs.setdefault(caption.get('language') or 'und', []).append({
'url': caption['vttUrl'],
'name': caption.get('name'),
})
return subs
def _fetch_video_json(self, video_id):
return self._download_json(
f'https://play.vidyard.com/player/{video_id}.json', video_id)['payload']
def _process_video_json(self, json_data, video_id):
formats, subtitles = self._get_formats_and_subtitles(json_data['sources'], video_id)
self._merge_subtitles(self._get_direct_subtitles(json_data.get('captions')), target=subtitles)
return {
**traverse_obj(json_data, {
'id': ('facadeUuid', {str}),
'display_id': ('videoId', {int}, {str_or_none}),
'title': ('name', {str}),
'description': ('description', {str}, {unescapeHTML}, {lambda x: x or None}),
'duration': ((
('milliseconds', {functools.partial(float_or_none, scale=1000)}),
('seconds', {int_or_none})), any),
'thumbnails': ('thumbnailUrls', ('small', 'normal'), {'url': {url_or_none}}),
'tags': ('tags', ..., 'name', {str}),
}),
'formats': formats,
'subtitles': subtitles,
'http_headers': self._HEADERS,
}
class VidyardIE(VidyardBaseIE):
_VALID_URL = [
r'https?://[\w-]+(?:\.hubs)?\.vidyard\.com/watch/(?P<id>[\w-]+)',
r'https?://(?:embed|share)\.vidyard\.com/share/(?P<id>[\w-]+)',
r'https?://play\.vidyard\.com/(?:player/)?(?P<id>[\w-]+)',
]
_EMBED_REGEX = [r'<iframe[^>]* src=["\'](?P<url>(?:https?:)?//play\.vidyard\.com/[\w-]+)']
_TESTS = [{
'url': 'https://vyexample03.hubs.vidyard.com/watch/oTDMPlUv--51Th455G5u7Q',
'info_dict': {
'id': 'oTDMPlUv--51Th455G5u7Q',
'display_id': '50347',
'ext': 'mp4',
'title': 'Homepage Video',
'description': 'Look I changed the description.',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/50347/OUPa5LTKV46849sLYngMqQ_small.jpg',
'duration': 99,
'tags': ['these', 'are', 'all', 'tags'],
},
}, {
'url': 'https://share.vidyard.com/watch/PaQzDAT1h8JqB8ivEu2j6Y?',
'info_dict': {
'id': 'PaQzDAT1h8JqB8ivEu2j6Y',
'display_id': '9281024',
'ext': 'mp4',
'title': 'Inline Embed',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/spacer.gif',
'duration': 41.186,
},
}, {
'url': 'https://embed.vidyard.com/share/oTDMPlUv--51Th455G5u7Q',
'info_dict': {
'id': 'oTDMPlUv--51Th455G5u7Q',
'display_id': '50347',
'ext': 'mp4',
'title': 'Homepage Video',
'description': 'Look I changed the description.',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/50347/OUPa5LTKV46849sLYngMqQ_small.jpg',
'duration': 99,
'tags': ['these', 'are', 'all', 'tags'],
},
}, {
# First video from playlist below
'url': 'https://embed.vidyard.com/share/SyStyHtYujcBHe5PkZc5DL',
'info_dict': {
'id': 'SyStyHtYujcBHe5PkZc5DL',
'display_id': '41974005',
'ext': 'mp4',
'title': 'Prepare the Frame and Track for Palm Beach Polysatin Shutters With BiFold Track',
'description': r're:In this video, you will learn how to prepare the frame.+',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/41974005/IJw7oCaJcF1h7WWu3OVZ8A_small.png',
'duration': 258.666,
},
}, {
# Playlist
'url': 'https://thelink.hubs.vidyard.com/watch/pwu7pCYWSwAnPxs8nDoFrE',
'info_dict': {
'id': 'pwu7pCYWSwAnPxs8nDoFrE',
'title': 'PLAYLIST - Palm Beach Shutters- Bi-Fold Track System Installation',
'entries': [{
'id': 'SyStyHtYujcBHe5PkZc5DL',
'display_id': '41974005',
'ext': 'mp4',
'title': 'Prepare the Frame and Track for Palm Beach Polysatin Shutters With BiFold Track',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/41974005/IJw7oCaJcF1h7WWu3OVZ8A_small.png',
'duration': 258.666,
}, {
'id': '1Fw4B84jZTXLXWqkE71RiM',
'display_id': '5861113',
'ext': 'mp4',
'title': 'Palm Beach - Bi-Fold Track System "Frame Installation"',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/5861113/29CJ54s5g1_aP38zkKLHew_small.jpg',
'duration': 167.858,
}, {
'id': 'DqP3wBvLXSpxrcqpT5kEeo',
'display_id': '41976334',
'ext': 'mp4',
'title': 'Install the Track for Palm Beach Polysatin Shutters With BiFold Track',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/5861090/RwG2VaTylUa6KhSTED1r1Q_small.png',
'duration': 94.229,
}, {
'id': 'opfybfxpzQArxqtQYB6oBU',
'display_id': '41976364',
'ext': 'mp4',
'title': 'Install the Panel for Palm Beach Polysatin Shutters With BiFold Track',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/5860926/JIOaJR08dM4QgXi_iQ2zGA_small.png',
'duration': 191.467,
}, {
'id': 'rWrXvkbTNNaNqD6189HJya',
'display_id': '41976382',
'ext': 'mp4',
'title': 'Adjust the Panels for Palm Beach Polysatin Shutters With BiFold Track',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/5860687/CwHxBv4UudAhOh43FVB4tw_small.png',
'duration': 138.155,
}, {
'id': 'eYPTB521MZ9TPEArSethQ5',
'display_id': '41976409',
'ext': 'mp4',
'title': 'Assemble and Install the Valance for Palm Beach Polysatin Shutters With BiFold Track',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/5861425/0y68qlMU4O5VKU7bJ8i_AA_small.png',
'duration': 148.224,
}],
},
'playlist_count': 6,
}, {
# Non hubs.vidyard.com playlist
'url': 'https://salesforce.vidyard.com/watch/d4vqPjs7Q5EzVEis5QT3jd',
'info_dict': {
'id': 'd4vqPjs7Q5EzVEis5QT3jd',
'title': 'How To: Service Cloud: Import External Content in Lightning Knowledge',
'entries': [{
'id': 'mcjDpSZir2iSttbvFkx6Rv',
'display_id': '29479036',
'ext': 'mp4',
'title': 'Welcome to this Expert Coaching Series',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/ouyQi9WuwyiOupChUWNmjQ/7170d3485ba602e012df05_small.jpg',
'duration': 38.205,
}, {
'id': '84bPYwpg243G6xYEfJdYw9',
'display_id': '21820704',
'ext': 'mp4',
'title': 'Chapter 1 - Title + Agenda',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/HFPN0ZgQq4Ow8BghGcQSow/bfaa30123c8f6601e7d7f2_small.jpg',
'duration': 98.016,
}, {
'id': 'nP17fMuvA66buVHUrzqjTi',
'display_id': '21820707',
'ext': 'mp4',
'title': 'Chapter 2 - Import Options',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/rGRIF5nFjPI9OOA2qJ_Dbg/86a8d02bfec9a566845dd4_small.jpg',
'duration': 199.136,
}, {
'id': 'm54EcwXdpA5gDBH5rgCYoV',
'display_id': '21820710',
'ext': 'mp4',
'title': 'Chapter 3 - Importing Article Translations',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/IVX4XR8zpSsiNIHx45kz-A/1ccbf8a29a33856d06b3ed_small.jpg',
'duration': 184.352,
}, {
'id': 'j4nzS42oq4hE9oRV73w3eQ',
'display_id': '21820716',
'ext': 'mp4',
'title': 'Chapter 4 - Best Practices',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/BtrRrQpRDLbA4AT95YQyog/1f1e6b8e7fdc3fa95ec8d3_small.jpg',
'duration': 296.960,
}, {
'id': 'y28PYfW5pftvers9PXzisC',
'display_id': '21820727',
'ext': 'mp4',
'title': 'Chapter 5 - Migration Steps',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/K2CdQOXDfLcrVTF60r0bdw/a09239ada28b6ffce12b1f_small.jpg',
'duration': 620.640,
}, {
'id': 'YWU1eQxYvhj29SjYoPw5jH',
'display_id': '21820733',
'ext': 'mp4',
'title': 'Chapter 6 - Demo',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/rsmhP-cO8dAa8ilvFGCX0g/7911ef415167cd14032068_small.jpg',
'duration': 631.456,
}, {
'id': 'nmEvVqpwdJUgb74zKsLGxn',
'display_id': '29479037',
'ext': 'mp4',
'title': 'Schedule Your Follow-Up',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/Rtwc7X4PEkF4Ae5kHi-Jvw/174ebed3f34227b1ffa1d0_small.jpg',
'duration': 33.608,
}],
},
'playlist_count': 8,
}, {
# URL of iframe embed src
'url': 'https://play.vidyard.com/iDqTwWGrd36vaLuaCY3nTs.html',
'info_dict': {
'id': 'iDqTwWGrd36vaLuaCY3nTs',
'display_id': '9281009',
'ext': 'mp4',
'title': 'Lightbox Embed',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/spacer.gif',
'duration': 39.035,
},
}, {
# Player JSON URL
'url': 'https://play.vidyard.com/player/7GAApnNNbcZZ46k6JqJQSh.json?disable_analytics=0',
'info_dict': {
'id': '7GAApnNNbcZZ46k6JqJQSh',
'display_id': '820026',
'ext': 'mp4',
'title': 'The Art of Storytelling: How to Deliver Your Brand Story with Content & Social',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/MhbE-5sEFQu4x3fI6FkNlA/41eb5717c557cd19456910_small.jpg',
'duration': 2153.013,
'tags': ['Summit2017'],
},
}, {
'url': 'http://share.vidyard.com/share/diYeo6YR2yiGgL8odvS8Ri',
'only_matching': True,
}, {
'url': 'https://play.vidyard.com/FFlz3ZpxhIfKQ1fd9DAryA',
'only_matching': True,
}, {
'url': 'https://play.vidyard.com/qhMAu5A76GZVrFzOPgSf9A/type/standalone',
'only_matching': True,
}]
_WEBPAGE_TESTS = [{
# URL containing inline/lightbox embedded video
'url': 'https://resources.altium.com/p/2-the-extreme-importance-of-pc-board-stack-up',
'info_dict': {
'id': 'GDx1oXrFWj4XHbipfoXaMn',
'display_id': '3225198',
'ext': 'mp4',
'title': 'The Extreme Importance of PC Board Stack Up',
'thumbnail': 'https://cdn.vidyard.com/thumbnails/73_Q3_hBexWX7Og1sae6cg/9998fa4faec921439e2c04_small.jpg',
'duration': 3422.742,
},
}, {
# <script ... id="vidyard_embed_code_DXx2sW4WaLA6hTdGFz7ja8" src="//play.vidyard.com/DXx2sW4WaLA6hTdGFz7ja8.js?
'url': 'http://videos.vivint.com/watch/DXx2sW4WaLA6hTdGFz7ja8',
'info_dict': {
'id': 'DXx2sW4WaLA6hTdGFz7ja8',
'display_id': '2746529',
'ext': 'mp4',
'title': 'How To Powercycle the Smart Hub Panel',
'duration': 30.613,
'thumbnail': 'https://cdn.vidyard.com/thumbnails/_-6cw8xQUJ3qiCs_JENc_A/b21d7a5e47967f49399d30_small.jpg',
},
}, {
# <script id="vidyard_embed_code_MIBHhiLVTxga7wqLsuoDjQ" src="//embed.vidyard.com/embed/MIBHhiLVTxga7wqLsuoDjQ/inline?v=2.1">
'url': 'https://www.babypips.com/learn/forex/introduction-to-metatrader4',
'info_dict': {
'id': 'MIBHhiLVTxga7wqLsuoDjQ',
'display_id': '20291',
'ext': 'mp4',
'title': 'Lesson 1 - Opening an MT4 Account',
'description': 'Never heard of MetaTrader4? Here\'s the 411 on the popular trading platform!',
'duration': 168,
'thumbnail': 'https://cdn.vidyard.com/thumbnails/20291/IM-G2WXQR9VBLl2Cmzvftg_small.jpg',
},
}, {
# <iframe ... src="//play.vidyard.com/d61w8EQoZv1LDuPxDkQP2Q/type/background?preview=1"
'url': 'https://www.avaya.com/en/',
'info_dict': {
# These values come from the generic extractor and don't matter
'id': str,
'title': str,
'age_limit': 0,
'upload_date': str,
'description': str,
'thumbnail': str,
'timestamp': float,
},
'playlist': [{
'info_dict': {
'id': 'd61w8EQoZv1LDuPxDkQP2Q',
'display_id': '42456529',
'ext': 'mp4',
'title': 'GettyImages-1027',
'duration': 6.0,
'thumbnail': 'https://cdn.vidyard.com/thumbnails/42061563/p6bY08d2N4e4IDz-7J4_wkgsPq3-qgcx_small.jpg',
},
}, {
'info_dict': {
'id': 'VAsYDi7eiqZRbHodUA2meC',
'display_id': '42456569',
'ext': 'mp4',
'title': 'GettyImages-1325598833',
'duration': 6.083,
'thumbnail': 'https://cdn.vidyard.com/thumbnails/42052358/y3qrbDpn_2quWr_5XBi7yzS3UvEI__ZM_small.jpg',
},
}],
'playlist_count': 2,
}, {
# <div class="vidyard-player-embed" data-uuid="vpCWTVHw3qrciLtVY94YkS"
'url': 'https://www.gogoair.com/',
'info_dict': {
# These values come from the generic extractor and don't matter
'id': str,
'title': str,
'description': str,
'age_limit': 0,
},
'playlist': [{
'info_dict': {
'id': 'vpCWTVHw3qrciLtVY94YkS',
'display_id': '40780699',
'ext': 'mp4',
'title': 'Upgrade to AVANCE 100% worth it - Jason Talley, Owner and Pilot, Testimonial',
'description': 'md5:f609824839439a51990cef55ffc472aa',
'duration': 70.737,
'thumbnail': 'https://cdn.vidyard.com/thumbnails/40780699/KzjfYZz5MZl2gHF_e-4i2c6ib1cLDweQ_small.jpg',
},
}, {
'info_dict': {
'id': 'xAmV9AsLbnitCw35paLBD8',
'display_id': '31130867',
'ext': 'mp4',
'title': 'Brad Keselowski goes faster with Gogo AVANCE inflight Wi-Fi',
'duration': 132.565,
'thumbnail': 'https://cdn.vidyard.com/thumbnails/31130867/HknyDtLdm2Eih9JZ4A5XLjhfBX_6HRw5_small.jpg',
},
}, {
'info_dict': {
'id': 'RkkrFRNxfP79nwCQavecpF',
'display_id': '39009815',
'ext': 'mp4',
'title': 'Live Demo of Gogo Galileo',
'description': 'md5:e2df497236f4e12c3fef8b392b5f23e0',
'duration': 112.128,
'thumbnail': 'https://cdn.vidyard.com/thumbnails/38144873/CWLlxfUbJ4Gh0ThbUum89IsEM4yupzMb_small.jpg',
},
}],
'playlist_count': 3,
}]
@classmethod
def _extract_embed_urls(cls, url, webpage):
# Handle protocol-less embed URLs
for embed_url in super()._extract_embed_urls(url, webpage):
if embed_url.startswith('//'):
embed_url = f'https:{embed_url}'
yield embed_url
# Extract inline/lightbox embeds
for embed_element in re.findall(
r'(<(?:img|div)[^>]* class=(["\'])(?:[^>"\']* )?vidyard-player-embed(?: [^>"\']*)?\2[^>]+>)', webpage):
if video_id := extract_attributes(embed_element[0]).get('data-uuid'):
yield f'https://play.vidyard.com/{video_id}'
for embed_id in re.findall(r'<script[^>]* id=["\']vidyard_embed_code_([\w-]+)["\']', webpage):
yield f'https://play.vidyard.com/{embed_id}'
def _real_extract(self, url):
video_id = self._match_id(url)
video_json = self._fetch_video_json(video_id)
if len(video_json['chapters']) == 1:
return self._process_video_json(video_json['chapters'][0], video_id)
return self.playlist_result(
[self._process_video_json(chapter, video_id) for chapter in video_json['chapters']],
str(video_json['playerUuid']), video_json.get('name'))

View File

@ -1,6 +1,7 @@
import base64
import functools
import itertools
import json
import re
import urllib.parse
@ -14,6 +15,7 @@
determine_ext,
get_element_by_class,
int_or_none,
join_nonempty,
js_to_json,
merge_dicts,
parse_filesize,
@ -84,29 +86,23 @@ def _get_video_password(self):
expected=True)
return password
def _verify_video_password(self, url, video_id, password, token, vuid):
if url.startswith('http://'):
# vimeo only supports https now, but the user can give an http url
url = url.replace('http://', 'https://')
self._set_vimeo_cookie('vuid', vuid)
def _verify_video_password(self, video_id, password, token):
url = f'https://vimeo.com/{video_id}'
try:
return self._download_webpage(
url + '/password', video_id, 'Verifying the password',
'Wrong password', data=urlencode_postdata({
f'{url}/password', video_id,
'Submitting video password', data=json.dumps({
'password': password,
'token': token,
}), headers={
'Content-Type': 'application/x-www-form-urlencoded',
}, separators=(',', ':')).encode(), headers={
'Accept': '*/*',
'Content-Type': 'application/json',
'Referer': url,
})
def _extract_xsrft_and_vuid(self, webpage):
xsrft = self._search_regex(
r'(?:(?P<q1>["\'])xsrft(?P=q1)\s*:|xsrft\s*[=:])\s*(?P<q>["\'])(?P<xsrft>.+?)(?P=q)',
webpage, 'login token', group='xsrft')
vuid = self._search_regex(
r'["\']vuid["\']\s*:\s*(["\'])(?P<vuid>.+?)\1',
webpage, 'vuid', group='vuid')
return xsrft, vuid
}, impersonate=True)
except ExtractorError as error:
if isinstance(error.cause, HTTPError) and error.cause.status == 418:
raise ExtractorError('Wrong password', expected=True)
raise
def _extract_vimeo_config(self, webpage, video_id, *args, **kwargs):
vimeo_config = self._search_regex(
@ -216,16 +212,6 @@ def _parse_config(self, config, video_id):
owner = video_data.get('owner') or {}
video_uploader_url = owner.get('url')
duration = int_or_none(video_data.get('duration'))
chapter_data = try_get(config, lambda x: x['embed']['chapters']) or []
chapters = [{
'title': current_chapter.get('title'),
'start_time': current_chapter.get('timecode'),
'end_time': next_chapter.get('timecode'),
} for current_chapter, next_chapter in zip(chapter_data, chapter_data[1:] + [{'timecode': duration}])]
if chapters and chapters[0]['start_time']: # Chapters may not start from 0
chapters[:0] = [{'title': '<Untitled>', 'start_time': 0, 'end_time': chapters[0]['start_time']}]
return {
'id': str_or_none(video_data.get('id')) or video_id,
'title': video_title,
@ -233,8 +219,12 @@ def _parse_config(self, config, video_id):
'uploader_id': video_uploader_url.split('/')[-1] if video_uploader_url else None,
'uploader_url': video_uploader_url,
'thumbnails': thumbnails,
'duration': duration,
'chapters': chapters or None,
'duration': int_or_none(video_data.get('duration')),
'chapters': sorted(traverse_obj(config, (
'embed', 'chapters', lambda _, v: int(v['timecode']) is not None, {
'title': ('title', {str}),
'start_time': ('timecode', {int_or_none}),
})), key=lambda c: c['start_time']) or None,
'formats': formats,
'subtitles': subtitles,
'live_status': live_status,
@ -712,6 +702,39 @@ class VimeoIE(VimeoBaseInfoExtractor):
'skip_download': True,
},
},
{
# chapters must be sorted, see: https://github.com/yt-dlp/yt-dlp/issues/5308
'url': 'https://player.vimeo.com/video/756714419',
'info_dict': {
'id': '756714419',
'ext': 'mp4',
'title': 'Dr Arielle Schwartz - Therapeutic yoga for optimum sleep',
'uploader': 'Alex Howard',
'uploader_id': 'user54729178',
'uploader_url': 'https://vimeo.com/user54729178',
'thumbnail': r're:https://i\.vimeocdn\.com/video/1520099929-[\da-f]+-d_1280',
'duration': 2636,
'chapters': [
{'start_time': 0, 'end_time': 10, 'title': '<Untitled Chapter 1>'},
{'start_time': 10, 'end_time': 106, 'title': 'Welcoming Dr Arielle Schwartz'},
{'start_time': 106, 'end_time': 305, 'title': 'What is therapeutic yoga?'},
{'start_time': 305, 'end_time': 594, 'title': 'Vagal toning practices'},
{'start_time': 594, 'end_time': 888, 'title': 'Trauma and difficulty letting go'},
{'start_time': 888, 'end_time': 1059, 'title': "Dr Schwartz' insomnia experience"},
{'start_time': 1059, 'end_time': 1471, 'title': 'A strategy for helping sleep issues'},
{'start_time': 1471, 'end_time': 1667, 'title': 'Yoga nidra'},
{'start_time': 1667, 'end_time': 2121, 'title': 'Wisdom in stillness'},
{'start_time': 2121, 'end_time': 2386, 'title': 'What helps us be more able to let go?'},
{'start_time': 2386, 'end_time': 2510, 'title': 'Practical tips to help ourselves'},
{'start_time': 2510, 'end_time': 2636, 'title': 'Where to find out more'},
],
},
'params': {
'http_headers': {'Referer': 'https://sleepsuperconference.com'},
'skip_download': 'm3u8',
},
'expected_warnings': ['Failed to parse XML: not well-formed'],
},
{
# user playlist alias -> https://vimeo.com/258705797
'url': 'https://vimeo.com/user26785108/newspiritualguide',
@ -745,21 +768,34 @@ def _verify_player_video_password(self, url, video_id, headers):
raise ExtractorError('Wrong video password', expected=True)
return checked
def _extract_from_api(self, video_id, unlisted_hash=None):
token = self._download_json(
'https://vimeo.com/_rv/jwt', video_id, headers={
'X-Requested-With': 'XMLHttpRequest',
})['token']
api_url = 'https://api.vimeo.com/videos/' + video_id
if unlisted_hash:
api_url += ':' + unlisted_hash
video = self._download_json(
api_url, video_id, headers={
'Authorization': 'jwt ' + token,
def _call_videos_api(self, video_id, jwt_token, unlisted_hash=None):
return self._download_json(
join_nonempty(f'https://api.vimeo.com/videos/{video_id}', unlisted_hash, delim=':'),
video_id, 'Downloading API JSON', headers={
'Authorization': f'jwt {jwt_token}',
'Accept': 'application/json',
}, query={
'fields': 'config_url,created_time,description,license,metadata.connections.comments.total,metadata.connections.likes.total,release_time,stats.plays',
})
def _extract_from_api(self, video_id, unlisted_hash=None):
viewer = self._download_json(
'https://vimeo.com/_next/viewer', video_id, 'Downloading viewer info')
for retry in (False, True):
try:
video = self._call_videos_api(video_id, viewer['jwt'], unlisted_hash)
except ExtractorError as e:
if (not retry and isinstance(e.cause, HTTPError) and e.cause.status == 400
and 'password' in traverse_obj(
e.cause.response.read(),
({bytes.decode}, {json.loads}, 'invalid_parameters', ..., 'field'),
)):
self._verify_video_password(
video_id, self._get_video_password(), viewer['xsrft'])
continue
raise
info = self._parse_config(self._download_json(
video['config_url'], video_id), video_id)
get_timestamp = lambda x: parse_iso8601(video.get(x + '_time'))
@ -865,12 +901,6 @@ def _real_extract(self, url):
redirect_url, video_id, headers)
return self._parse_config(config, video_id)
if re.search(r'<form[^>]+?id="pw_form"', webpage):
video_password = self._get_video_password()
token, vuid = self._extract_xsrft_and_vuid(webpage)
webpage = self._verify_video_password(
redirect_url, video_id, video_password, token, vuid)
vimeo_config = self._extract_vimeo_config(webpage, video_id, default=None)
if vimeo_config:
seed_status = vimeo_config.get('seed_status') or {}
@ -1237,7 +1267,7 @@ class VimeoGroupsIE(VimeoChannelIE): # XXX: Do not subclass from concrete IE
class VimeoReviewIE(VimeoBaseInfoExtractor):
IE_NAME = 'vimeo:review'
IE_DESC = 'Review pages on vimeo'
_VALID_URL = r'(?P<url>https://vimeo\.com/[^/]+/review/(?P<id>[^/]+)/[0-9a-f]{10})'
_VALID_URL = r'https?://vimeo\.com/(?P<user>[^/?#]+)/review/(?P<id>\d+)/(?P<hash>[\da-f]{10})'
_TESTS = [{
'url': 'https://vimeo.com/user21297594/review/75524534/3c257a1b5d',
'md5': 'c507a72f780cacc12b2248bb4006d253',
@ -1283,28 +1313,22 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
}]
def _real_extract(self, url):
page_url, video_id = self._match_valid_url(url).groups()
data = self._download_json(
page_url.replace('/review/', '/review/data/'), video_id)
user, video_id, review_hash = self._match_valid_url(url).group('user', 'id', 'hash')
data_url = f'https://vimeo.com/{user}/review/data/{video_id}/{review_hash}'
data = self._download_json(data_url, video_id)
if data.get('isLocked') is True:
video_password = self._get_video_password()
viewer = self._download_json(
'https://vimeo.com/_rv/viewer', video_id)
webpage = self._verify_video_password(
'https://vimeo.com/' + video_id, video_id,
video_password, viewer['xsrft'], viewer['vuid'])
clip_page_config = self._parse_json(self._search_regex(
r'window\.vimeo\.clip_page_config\s*=\s*({.+?});',
webpage, 'clip page config'), video_id)
config_url = clip_page_config['player']['config_url']
clip_data = clip_page_config.get('clip') or {}
else:
self._verify_video_password(video_id, video_password, viewer['xsrft'])
data = self._download_json(data_url, video_id)
clip_data = data['clipData']
config_url = clip_data['configUrl']
config = self._download_json(config_url, video_id)
info_dict = self._parse_config(config, video_id)
source_format = self._extract_original_format(
page_url + '/action', video_id)
f'https://vimeo.com/{user}/review/{video_id}/{review_hash}/action', video_id,
unlisted_hash=traverse_obj(config_url, ({parse_qs}, 'h', -1)))
if source_format:
info_dict['formats'].append(source_format)
info_dict['description'] = clean_html(clip_data.get('description'))

108
yt_dlp/extractor/vtv.py Normal file
View File

@ -0,0 +1,108 @@
from .common import InfoExtractor
from ..utils import extract_attributes, get_element_html_by_class, remove_start
class VTVGoIE(InfoExtractor):
_VALID_URL = [
r'https?://(?:www\.)?vtvgo\.vn/(kho-video|tin-tuc)/[\w.-]*?(?P<id>\d+)(?:\.[a-z]+|/)?(?:$|[?#])',
r'https?://(?:www\.)?vtvgo\.vn/digital/detail\.php\?(?:[^#]+&)?content_id=(?P<id>\d+)',
]
_TESTS = [{
'url': 'https://vtvgo.vn/kho-video/bep-vtv-vit-chao-rieng-so-24-888456.html',
'info_dict': {
'id': '888456',
'ext': 'mp4',
'title': 'Bếp VTV | Vịt chao riềng | Số 24',
'description': 'md5:2b4e93ec2b954304170d32be288ce2c8',
'thumbnail': 'https://vtvgo-images.vtvdigital.vn/images/20230201/VIT-CHAO-RIENG_VTV_638108894672812459.jpg',
},
}, {
'url': 'https://vtvgo.vn/tin-tuc/hot-search-1-zlife-khong-ngo-toi-phai-khong-862074',
'info_dict': {
'id': '862074',
'ext': 'mp4',
'title': 'Hot Search #1 | Zlife | Không ngờ tới phải không? ',
'description': 'md5:e967d0e2efbbebbee8814a55799b4d0f',
'thumbnail': 'https://vtvgo-images.vtvdigital.vn/images/20220504/6b9a8552-e71c-46ce-bc9d-50c9bb506f9c.jpeg',
},
}, {
'url': 'https://vtvgo.vn/kho-video/918311.html',
'info_dict': {
'id': '918311',
'title': 'Cà phê sáng | 05/02/2024 | Tái hiện hình ảnh Hà Nội xưa tại ngôi nhà di sản',
'ext': 'mp4',
'thumbnail': 'https://vtvgo-images.vtvdigital.vn/images/20240205/0506_ca_phe_sang_638427226021318322.jpg',
'description': 'md5:b121c67948f1ce58e6a036042fc14c1b',
},
}, {
'url': 'https://vtvgo.vn/digital/detail.php?digital_id=168&content_id=918634',
'info_dict': {
'id': '918634',
'ext': 'mp4',
'title': 'Gặp nhau cuối năm | Táo quân 2024',
'description': 'md5:a1c221e78e5954d29d49b2a11c20513c',
'thumbnail': 'https://vtvgo-images.vtvdigital.vn/images/20240210/d0f73369-8f03-4108-9edd-83d4bc3997b2.png',
},
}, {
'url': 'https://vtvgo.vn/digital/detail.php?content_id=919358',
'info_dict': {
'id': '919358',
'ext': 'mp4',
'title': 'Chúng ta của 8 năm sau | Tập 45 | Dương có bằng chứng, nhân chứng vạch mặt ông Khiêm',
'description': 'md5:16ff5208cac6585137f554472a4677f3',
'thumbnail': 'https://vtvgo-images.vtvdigital.vn/images/20240221/550deff9-7736-4a0e-8b5d-33274d97cd7d.jpg',
},
}, {
'url': 'https://vtvgo.vn/kho-video/888456',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
m3u8_url = self._search_regex(
r'(?:var\s+link\s*=\s*|addPlayer\()["\'](https://[^"\']+/index\.m3u8)["\']', webpage, 'm3u8 url')
return {
'id': video_id,
'title': self._og_search_title(webpage, default=None),
'description': self._og_search_description(webpage, default=None),
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'formats': self._extract_m3u8_formats(m3u8_url, video_id, 'mp4'),
}
class VTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?vtv\.vn/video/[\w-]*?(?P<id>\d+)\.htm'
_TESTS = [{
'url': 'https://vtv.vn/video/thoi-su-20h-vtv1-12-6-2024-680411.htm',
'info_dict': {
'id': '680411',
'ext': 'mp4',
'title': 'Thời sự 20h VTV1 - 12/6/2024 - Video đã phát trên VTV1 | VTV.VN',
'thumbnail': 'https://cdn-images.vtv.vn/zoom/600_315/66349b6076cb4dee98746cf1/2024/06/12/thumb/1206-ts-20h-02929741475480320806760.mp4/thumb0.jpg',
},
}, {
'url': 'https://vtv.vn/video/zlife-1-khong-ngo-toi-phai-khong-vtv24-560248.htm',
'info_dict': {
'id': '560248',
'ext': 'mp4',
'title': 'ZLife #1: Không ngờ tới phải không? | VTV24 - Video đã phát trên VTV-NEWS | VTV.VN',
'description': 'Ai đứng sau vụ việc thay đổi ảnh đại diện trên các trang mạng xã hội của VTV Digital tối 2/5?',
'thumbnail': 'https://video-thumbs.mediacdn.vn/zoom/600_315/vtv/2022/5/13/t67s6btf3ji-16524555726231894427334.jpg',
},
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
data_vid = extract_attributes(get_element_html_by_class(
'VCSortableInPreviewMode', get_element_html_by_class(
'video-highlight-box', webpage)))['data-vid']
m3u8_url = f'https://cdn-videos.vtv.vn/{remove_start(data_vid, "vtv.mediacdn.vn/")}/master.m3u8'
return {
'id': video_id,
'title': self._og_search_title(webpage, default=None),
'description': self._og_search_description(webpage, default=None),
'thumbnail': self._og_search_thumbnail(webpage, default=None),
'formats': self._extract_m3u8_formats(m3u8_url, video_id, 'mp4'),
}

View File

@ -11,6 +11,7 @@
class YleAreenaIE(InfoExtractor):
_VALID_URL = r'https?://areena\.yle\.fi/(?P<id>[\d-]+)'
_GEO_COUNTRIES = ['FI']
_TESTS = [
{
'url': 'https://areena.yle.fi/1-4371942',
@ -19,7 +20,7 @@ class YleAreenaIE(InfoExtractor):
'id': '0_a3tjk92c',
'ext': 'mp4',
'title': 'Pouchit',
'description': 'md5:d487309c3abbe5650265bbd1742d2f82',
'description': 'md5:01071d7056ceec375f63960f90c35366',
'series': 'Modernit miehet',
'season': 'Season 1',
'season_number': 1,
@ -87,8 +88,8 @@ def _real_extract(self, url):
})
# Example title: 'K1, J2: Pouchit | Modernit miehet'
series, season_number, episode_number, episode = self._search_regex(
r'K(?P<season_no>[\d]+),\s*J(?P<episode_no>[\d]+):?\s*\b(?P<episode>[^|]+)\s*|\s*(?P<series>.+)',
season_number, episode_number, episode, series = self._search_regex(
r'K(?P<season_no>\d+),\s*J(?P<episode_no>\d+):?\s*\b(?P<episode>[^|]+)\s*|\s*(?P<series>.+)',
info.get('title') or '', 'episode metadata', group=('season_no', 'episode_no', 'episode', 'series'),
default=(None, None, None, None))
description = traverse_obj(video_data, ('data', 'ongoing_ondemand', 'description', 'fin'), expected_type=str)
@ -110,10 +111,12 @@ def _real_extract(self, url):
'ie_key': KalturaIE.ie_key(),
}
else:
formats, subs = self._extract_m3u8_formats_and_subtitles(
video_data['data']['ongoing_ondemand']['manifest_url'], video_id, 'mp4', m3u8_id='hls')
self._merge_subtitles(subs, target=subtitles)
info_dict = {
'id': video_id,
'formats': self._extract_m3u8_formats(
video_data['data']['ongoing_ondemand']['manifest_url'], video_id, 'mp4', m3u8_id='hls'),
'formats': formats,
}
return {
@ -129,6 +132,6 @@ def _real_extract(self, url):
or int_or_none(episode_number)),
'thumbnails': traverse_obj(info, ('thumbnails', ..., {'url': 'url'})),
'age_limit': traverse_obj(video_data, ('data', 'ongoing_ondemand', 'content_rating', 'age_restriction'), expected_type=int_or_none),
'subtitles': subtitles,
'subtitles': subtitles or None,
'release_date': unified_strdate(traverse_obj(video_data, ('data', 'ongoing_ondemand', 'start_time'), expected_type=str)),
}

View File

@ -136,7 +136,7 @@ def _real_extract(self, url):
# request basic data
basic_data_params = {
'vid': video_id,
'ccode': '0524',
'ccode': '0564',
'client_ip': '192.168.1.1',
'utid': cna,
'client_ts': time.time() / 1000,

View File

@ -72,133 +72,169 @@
# any clients starting with _ cannot be explicitly requested by the user
INNERTUBE_CLIENTS = {
'web': {
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'WEB',
'clientVersion': '2.20220801.00.00',
'clientVersion': '2.20240726.00.00',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 1,
},
# Safari UA returns pre-merged video+audio 144p/240p/360p/720p/1080p HLS formats
'web_safari': {
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'WEB',
'clientVersion': '2.20240726.00.00',
'userAgent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.5 Safari/605.1.15,gzip(gfe)',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 1,
},
'web_embedded': {
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'WEB_EMBEDDED_PLAYER',
'clientVersion': '1.20220731.00.00',
'clientVersion': '1.20240723.01.00',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 56,
},
'web_music': {
'INNERTUBE_API_KEY': 'AIzaSyC9XL3ZjWddXya6X74dJoCTL-WEYFDNX30',
'INNERTUBE_HOST': 'music.youtube.com',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'WEB_REMIX',
'clientVersion': '1.20220727.01.00',
'clientVersion': '1.20240724.00.00',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 67,
},
'web_creator': {
'INNERTUBE_API_KEY': 'AIzaSyBUPetSUmoZL-OhlxA7wSac5XinrygCqMo',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'WEB_CREATOR',
'clientVersion': '1.20220726.00.00',
'clientVersion': '1.20240723.03.00',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 62,
},
'android': {
'INNERTUBE_API_KEY': 'AIzaSyA8eiZmM1FaDVjRy-df2KTyQ_vz_yYM39w',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID',
'clientVersion': '19.09.37',
'clientVersion': '19.29.37',
'androidSdkVersion': 30,
'userAgent': 'com.google.android.youtube/19.09.37 (Linux; U; Android 11) gzip',
'userAgent': 'com.google.android.youtube/19.29.37 (Linux; U; Android 11) gzip',
'osName': 'Android',
'osVersion': '11',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 3,
'REQUIRE_JS_PLAYER': False,
},
'android_embedded': {
'INNERTUBE_API_KEY': 'AIzaSyCjc_pVEDi4qsv5MtC2dMXzpIaDoRFLsxw',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_EMBEDDED_PLAYER',
'clientVersion': '19.09.37',
'androidSdkVersion': 30,
'userAgent': 'com.google.android.youtube/19.09.37 (Linux; U; Android 11) gzip',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 55,
'REQUIRE_JS_PLAYER': False,
},
'android_music': {
'INNERTUBE_API_KEY': 'AIzaSyAOghZGza2MQSZkY_zfZ370N-PUdXEo8AI',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_MUSIC',
'clientVersion': '6.42.52',
'clientVersion': '7.11.50',
'androidSdkVersion': 30,
'userAgent': 'com.google.android.apps.youtube.music/6.42.52 (Linux; U; Android 11) gzip',
'userAgent': 'com.google.android.apps.youtube.music/7.11.50 (Linux; U; Android 11) gzip',
'osName': 'Android',
'osVersion': '11',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 21,
'REQUIRE_JS_PLAYER': False,
},
'android_creator': {
'INNERTUBE_API_KEY': 'AIzaSyD_qjV8zaaUMehtLkrKFgVeSX_Iqbtyws8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_CREATOR',
'clientVersion': '22.30.100',
'clientVersion': '24.30.100',
'androidSdkVersion': 30,
'userAgent': 'com.google.android.apps.youtube.creator/22.30.100 (Linux; U; Android 11) gzip',
'userAgent': 'com.google.android.apps.youtube.creator/24.30.100 (Linux; U; Android 11) gzip',
'osName': 'Android',
'osVersion': '11',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 14,
'REQUIRE_JS_PLAYER': False,
},
# YouTube Kids videos aren't returned on this client for some reason
'android_vr': {
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_VR',
'clientVersion': '1.57.29',
'deviceMake': 'Oculus',
'deviceModel': 'Quest 3',
'androidSdkVersion': 32,
'userAgent': 'com.google.android.apps.youtube.vr.oculus/1.57.29 (Linux; U; Android 12L; eureka-user Build/SQ3A.220605.009.A1) gzip',
'osName': 'Android',
'osVersion': '12L',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 28,
'REQUIRE_JS_PLAYER': False,
},
'android_testsuite': {
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_TESTSUITE',
'clientVersion': '1.9',
'androidSdkVersion': 30,
'userAgent': 'com.google.android.youtube/1.9 (Linux; U; Android 11) gzip',
'osName': 'Android',
'osVersion': '11',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 30,
'REQUIRE_JS_PLAYER': False,
'PLAYER_PARAMS': '2AMB',
},
# This client only has legacy formats and storyboards
'android_producer': {
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'ANDROID_PRODUCER',
'clientVersion': '0.111.1',
'androidSdkVersion': 30,
'userAgent': 'com.google.android.apps.youtube.producer/0.111.1 (Linux; U; Android 11) gzip',
'osName': 'Android',
'osVersion': '11',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 91,
'REQUIRE_JS_PLAYER': False,
},
# iOS clients have HLS live streams. Setting device model to get 60fps formats.
# See: https://github.com/TeamNewPipe/NewPipeExtractor/issues/680#issuecomment-1002724558
'ios': {
'INNERTUBE_API_KEY': 'AIzaSyB-63vPrdThhKuerbB2N_l7Kwwcxj6yUAc',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'IOS',
'clientVersion': '19.09.3',
'deviceModel': 'iPhone14,3',
'userAgent': 'com.google.ios.youtube/19.09.3 (iPhone14,3; U; CPU iOS 15_6 like Mac OS X)',
'clientVersion': '19.29.1',
'deviceMake': 'Apple',
'deviceModel': 'iPhone16,2',
'userAgent': 'com.google.ios.youtube/19.29.1 (iPhone16,2; U; CPU iOS 17_5_1 like Mac OS X;)',
'osName': 'iPhone',
'osVersion': '17.5.1.21F90',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 5,
'REQUIRE_JS_PLAYER': False,
},
'ios_embedded': {
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'IOS_MESSAGES_EXTENSION',
'clientVersion': '19.09.3',
'deviceModel': 'iPhone14,3',
'userAgent': 'com.google.ios.youtube/19.09.3 (iPhone14,3; U; CPU iOS 15_6 like Mac OS X)',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 66,
'REQUIRE_JS_PLAYER': False,
},
'ios_music': {
'INNERTUBE_API_KEY': 'AIzaSyBAETezhkwP0ZWA02RsqT1zu78Fpt0bC_s',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'IOS_MUSIC',
'clientVersion': '6.33.3',
'deviceModel': 'iPhone14,3',
'userAgent': 'com.google.ios.youtubemusic/6.33.3 (iPhone14,3; U; CPU iOS 15_6 like Mac OS X)',
'clientVersion': '7.08.2',
'deviceMake': 'Apple',
'deviceModel': 'iPhone16,2',
'userAgent': 'com.google.ios.youtubemusic/7.08.2 (iPhone16,2; U; CPU iOS 17_5_1 like Mac OS X;)',
'osName': 'iPhone',
'osVersion': '17.5.1.21F90',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 26,
@ -208,9 +244,12 @@
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'IOS_CREATOR',
'clientVersion': '22.33.101',
'deviceModel': 'iPhone14,3',
'userAgent': 'com.google.ios.ytcreator/22.33.101 (iPhone14,3; U; CPU iOS 15_6 like Mac OS X)',
'clientVersion': '24.30.100',
'deviceMake': 'Apple',
'deviceModel': 'iPhone16,2',
'userAgent': 'com.google.ios.ytcreator/24.30.100 (iPhone16,2; U; CPU iOS 17_5_1 like Mac OS X;)',
'osName': 'iPhone',
'osVersion': '17.5.1.21F90',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 15,
@ -219,19 +258,26 @@
# mweb has 'ultralow' formats
# See: https://github.com/yt-dlp/yt-dlp/pull/557
'mweb': {
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'MWEB',
'clientVersion': '2.20220801.00.00',
'clientVersion': '2.20240726.01.00',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 2,
},
'tv': {
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'TVHTML5',
'clientVersion': '7.20240724.13.00',
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 7,
},
# This client can access age restricted videos (unless the uploader has disabled the 'allow embedding' option)
# See: https://github.com/zerodytrash/YouTube-Internal-Clients
'tv_embedded': {
'INNERTUBE_API_KEY': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
'INNERTUBE_CONTEXT': {
'client': {
'clientName': 'TVHTML5_SIMPLY_EMBEDDED_PLAYER',
@ -249,6 +295,7 @@
},
},
'INNERTUBE_CONTEXT_CLIENT_NAME': 95,
'REQUIRE_JS_PLAYER': False,
},
}
@ -262,7 +309,7 @@ def _split_innertube_client(client_name):
def short_client_name(client_name):
main, *parts = _split_innertube_client(client_name)[0].replace('embedscreen', 'e_s').split('_')
main, *parts = _split_innertube_client(client_name)[0].split('_')
return join_nonempty(main[:4], ''.join(x[0] for x in parts)).upper()
@ -270,27 +317,22 @@ def build_innertube_clients():
THIRD_PARTY = {
'embedUrl': 'https://www.youtube.com/', # Can be any valid URL
}
BASE_CLIENTS = ('ios', 'android', 'web', 'tv', 'mweb')
BASE_CLIENTS = ('ios', 'web', 'tv', 'mweb', 'android')
priority = qualities(BASE_CLIENTS[::-1])
for client, ytcfg in tuple(INNERTUBE_CLIENTS.items()):
ytcfg.setdefault('INNERTUBE_API_KEY', 'AIzaSyDCU8hByM-4DrUqRUYnGn-3llEO78bcxq8')
ytcfg.setdefault('INNERTUBE_HOST', 'www.youtube.com')
ytcfg.setdefault('REQUIRE_JS_PLAYER', True)
ytcfg.setdefault('PLAYER_PARAMS', None)
ytcfg['INNERTUBE_CONTEXT']['client'].setdefault('hl', 'en')
_, base_client, variant = _split_innertube_client(client)
ytcfg['priority'] = 10 * priority(base_client)
if not variant:
INNERTUBE_CLIENTS[f'{client}_embedscreen'] = embedscreen = copy.deepcopy(ytcfg)
embedscreen['INNERTUBE_CONTEXT']['client']['clientScreen'] = 'EMBED'
embedscreen['INNERTUBE_CONTEXT']['thirdParty'] = THIRD_PARTY
embedscreen['priority'] -= 3
elif variant == 'embedded':
if variant == 'embedded':
ytcfg['INNERTUBE_CONTEXT']['thirdParty'] = THIRD_PARTY
ytcfg['priority'] -= 2
else:
elif variant:
ytcfg['priority'] -= 3
@ -566,9 +608,6 @@ def _select_api_hostname(self, req_api_hostname, default_client=None):
return (self._configuration_arg('innertube_host', [''], ie_key=YoutubeIE.ie_key())[0]
or req_api_hostname or self._get_innertube_host(default_client or 'web'))
def _extract_api_key(self, ytcfg=None, default_client='web'):
return self._ytcfg_get_safe(ytcfg, lambda x: x['INNERTUBE_API_KEY'], str, default_client)
def _extract_context(self, ytcfg=None, default_client='web'):
context = get_first(
(ytcfg, self._get_default_ytcfg(default_client)), 'INNERTUBE_CONTEXT', expected_type=dict)
@ -614,13 +653,15 @@ def _call_api(self, ep, query, video_id, fatal=True, headers=None,
real_headers.update({'content-type': 'application/json'})
if headers:
real_headers.update(headers)
api_key = (self._configuration_arg('innertube_key', [''], ie_key=YoutubeIE.ie_key(), casesense=True)[0]
or api_key or self._extract_api_key(default_client=default_client))
return self._download_json(
f'https://{self._select_api_hostname(api_hostname, default_client)}/youtubei/v1/{ep}',
video_id=video_id, fatal=fatal, note=note, errnote=errnote,
data=json.dumps(data).encode('utf8'), headers=real_headers,
query={'key': api_key, 'prettyPrint': 'false'})
query=filter_dict({
'key': self._configuration_arg(
'innertube_key', [api_key], ie_key=YoutubeIE.ie_key(), casesense=True)[0],
'prettyPrint': 'false',
}, cndn=lambda _, v: v))
def extract_yt_initial_data(self, item_id, webpage, fatal=True):
return self._search_json(self._YT_INITIAL_DATA_RE, webpage, 'yt initial data', item_id, fatal=fatal)
@ -972,7 +1013,6 @@ def _extract_response(self, item_id, query, note='Downloading API JSON', headers
ep=ep, fatal=True, headers=headers,
video_id=item_id, query=query, note=note,
context=self._extract_context(ytcfg, default_client),
api_key=self._extract_api_key(ytcfg, default_client),
api_hostname=api_hostname, default_client=default_client)
except ExtractorError as e:
if not isinstance(e.cause, network_exceptions):
@ -1294,6 +1334,12 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'401': {'ext': 'mp4', 'height': 2160, 'format_note': 'DASH video', 'vcodec': 'av01.0.12M.08'},
}
_SUBTITLE_FORMATS = ('json3', 'srv1', 'srv2', 'srv3', 'ttml', 'vtt')
_POTOKEN_EXPERIMENTS = ('51217476', '51217102')
_BROKEN_CLIENTS = {
short_client_name(client): client
for client in ('android', 'android_creator', 'android_music')
}
_DEFAULT_CLIENTS = ('ios', 'web_creator')
_GEO_BYPASS = False
@ -3128,11 +3174,42 @@ def _decrypt_nsig(self, s, video_id, player_url):
self.write_debug(f'Decrypted nsig {s} => {ret}')
return ret
def _extract_n_function_name(self, jscode):
def _extract_n_function_name(self, jscode, player_url=None):
# Examples (with placeholders nfunc, narray, idx):
# * .get("n"))&&(b=nfunc(b)
# * .get("n"))&&(b=narray[idx](b)
# * b=String.fromCharCode(110),c=a.get(b))&&c=narray[idx](c)
# * a.D&&(b="nn"[+a.D],c=a.get(b))&&(c=narray[idx](c),a.set(b,c),narray.length||nfunc("")
# * a.D&&(PL(a),b=a.j.n||null)&&(b=narray[0](b),a.set("n",b),narray.length||nfunc("")
# * a.D&&(b="nn"[+a.D],vL(a),c=a.j[b]||null)&&(c=narray[idx](c),a.set(b,c),narray.length||nfunc("")
funcname, idx = self._search_regex(
r'\.get\("n"\)\)&&\(b=(?P<nfunc>[a-zA-Z0-9$]+)(?:\[(?P<idx>\d+)\])?\([a-zA-Z0-9]\)',
jscode, 'Initial JS player n function name', group=('nfunc', 'idx'))
if not idx:
r'''(?x)
(?:
\.get\("n"\)\)&&\(b=|
(?:
b=String\.fromCharCode\(110\)|
(?P<str_idx>[a-zA-Z0-9_$.]+)&&\(b="nn"\[\+(?P=str_idx)\]
)
(?:
,[a-zA-Z0-9_$]+\(a\))?,c=a\.
(?:
get\(b\)|
[a-zA-Z0-9_$]+\[b\]\|\|null
)\)&&\(c=|
\b(?P<var>[a-zA-Z0-9_$]+)=
)(?P<nfunc>[a-zA-Z0-9_$]+)(?:\[(?P<idx>\d+)\])?\([a-zA-Z]\)
(?(var),[a-zA-Z0-9_$]+\.set\("n"\,(?P=var)\),(?P=nfunc)\.length)''',
jscode, 'n function name', group=('nfunc', 'idx'), default=(None, None))
if not funcname:
self.report_warning(join_nonempty(
'Falling back to generic n function search',
player_url and f' player = {player_url}', delim='\n'))
return self._search_regex(
r'''(?xs)
;\s*(?P<name>[a-zA-Z0-9_$]+)\s*=\s*function\([a-zA-Z0-9_$]+\)
\s*\{(?:(?!};).)+?["']enhanced_except_''',
jscode, 'Initial JS player n function name', group='name')
elif not idx:
return funcname
return json.loads(js_to_json(self._search_regex(
@ -3141,25 +3218,15 @@ def _extract_n_function_name(self, jscode):
def _extract_n_function_code(self, video_id, player_url):
player_id = self._extract_player_info(player_url)
func_code = self.cache.load('youtube-nsig', player_id, min_ver='2022.09.1')
func_code = self.cache.load('youtube-nsig', player_id, min_ver='2024.07.09')
jscode = func_code or self._load_player(video_id, player_url)
jsi = JSInterpreter(jscode)
if func_code:
return jsi, player_id, func_code
func_name = self._extract_n_function_name(jscode)
func_name = self._extract_n_function_name(jscode, player_url=player_url)
# For redundancy
func_code = self._search_regex(
rf'''(?xs){func_name}\s*=\s*function\s*\((?P<var>[\w$]+)\)\s*
# NB: The end of the regex is intentionally kept strict
{{(?P<code>.+?}}\s*return\ [\w$]+.join\(""\))}};''',
jscode, 'nsig function', group=('var', 'code'), default=None)
if func_code:
func_code = ([func_code[0]], func_code[1])
else:
self.write_debug('Extracting nsig function with jsinterp')
func_code = jsi.extract_function_code(func_name)
self.cache.store('youtube-nsig', player_id, func_code)
@ -3662,9 +3729,10 @@ def _extract_player_response(self, client, video_id, master_ytcfg, player_ytcfg,
'videoId': video_id,
}
pp_arg = self._configuration_arg('player_params', [None], casesense=True)[0]
if pp_arg:
yt_query['params'] = pp_arg
default_pp = traverse_obj(
INNERTUBE_CLIENTS, (_split_innertube_client(client)[0], 'PLAYER_PARAMS', {str}))
if player_params := self._configuration_arg('player_params', [default_pp], casesense=True)[0]:
yt_query['params'] = player_params
yt_query.update(self._generate_player_context(sts))
return self._extract_response(
@ -3676,30 +3744,40 @@ def _extract_player_response(self, client, video_id, master_ytcfg, player_ytcfg,
def _get_requested_clients(self, url, smuggled_data):
requested_clients = []
android_clients = []
default = ['ios', 'web']
broken_clients = []
excluded_clients = []
allowed_clients = sorted(
(client for client in INNERTUBE_CLIENTS if client[:1] != '_'),
key=lambda client: INNERTUBE_CLIENTS[client]['priority'], reverse=True)
for client in self._configuration_arg('player_client'):
if client == 'default':
requested_clients.extend(default)
requested_clients.extend(self._DEFAULT_CLIENTS)
elif client == 'all':
requested_clients.extend(allowed_clients)
elif client.startswith('-'):
excluded_clients.append(client[1:])
elif client not in allowed_clients:
self.report_warning(f'Skipping unsupported client {client}')
elif client.startswith('android'):
android_clients.append(client)
self.report_warning(f'Skipping unsupported client "{client}"')
elif client in self._BROKEN_CLIENTS.values():
broken_clients.append(client)
else:
requested_clients.append(client)
# Force deprioritization of broken Android clients for format de-duplication
requested_clients.extend(android_clients)
# Force deprioritization of _BROKEN_CLIENTS for format de-duplication
requested_clients.extend(broken_clients)
if not requested_clients:
requested_clients = default
requested_clients.extend(self._DEFAULT_CLIENTS)
for excluded_client in excluded_clients:
if excluded_client in requested_clients:
requested_clients.remove(excluded_client)
if not requested_clients:
raise ExtractorError('No player clients have been requested', expected=True)
if smuggled_data.get('is_music_url') or self.is_music_url(url):
requested_clients.extend(
f'{client}_music' for client in requested_clients if f'{client}_music' in INNERTUBE_CLIENTS)
for requested_client in requested_clients:
_, base_client, variant = _split_innertube_client(requested_client)
music_client = f'{base_client}_music'
if variant != 'music' and music_client in INNERTUBE_CLIENTS:
requested_clients.append(music_client)
return orderedSet(requested_clients)
@ -3710,8 +3788,15 @@ def _invalid_player_response(self, pr, video_id):
return pr_id
def _extract_player_responses(self, clients, video_id, webpage, master_ytcfg, smuggled_data):
initial_pr = None
initial_pr = ignore_initial_response = None
if webpage:
if 'web' in clients:
experiments = traverse_obj(master_ytcfg, (
'WEB_PLAYER_CONTEXT_CONFIGS', ..., 'serializedExperimentIds', {lambda x: x.split(',')}, ...))
if all(x in experiments for x in self._POTOKEN_EXPERIMENTS):
self.report_warning(
'Webpage contains broken formats (poToken experiment detected). Ignoring initial player response')
ignore_initial_response = True
initial_pr = self._search_json(
self._YT_INITIAL_PLAYER_RESPONSE_RE, webpage, 'initial player response', video_id, fatal=False)
@ -3741,8 +3826,10 @@ def append_client(*client_names):
skipped_clients = {}
while clients:
client, base_client, variant = _split_innertube_client(clients.pop())
player_ytcfg = master_ytcfg if client == 'web' else {}
if 'configs' not in self._configuration_arg('player_skip') and client != 'web':
player_ytcfg = {}
if client == 'web':
player_ytcfg = self._get_default_ytcfg() if ignore_initial_response else master_ytcfg
elif 'configs' not in self._configuration_arg('player_skip'):
player_ytcfg = self._download_ytcfg(client, video_id) or player_ytcfg
player_url = player_url or self._extract_player_url(master_ytcfg, player_ytcfg, webpage=webpage)
@ -3755,11 +3842,22 @@ def append_client(*client_names):
player_url = self._download_player_url(video_id)
tried_iframe_fallback = True
pr = initial_pr if client == 'web' and not ignore_initial_response else None
for retry in self.RetryManager(fatal=False):
try:
pr = initial_pr if client == 'web' and initial_pr else self._extract_player_response(
client, video_id, player_ytcfg or master_ytcfg, player_ytcfg, player_url if require_js_player else None, initial_pr, smuggled_data)
pr = pr or self._extract_player_response(
client, video_id, player_ytcfg or master_ytcfg, player_ytcfg,
player_url if require_js_player else None, initial_pr, smuggled_data)
except ExtractorError as e:
self.report_warning(e)
break
experiments = traverse_obj(pr, (
'responseContext', 'serviceTrackingParams', lambda _, v: v['service'] == 'GFEEDBACK',
'params', lambda _, v: v['key'] == 'e', 'value', {lambda x: x.split(',')}, ...))
if all(x in experiments for x in self._POTOKEN_EXPERIMENTS):
pr = None
retry.error = ExtractorError('API returned broken formats (poToken experiment detected)', expected=True)
if not pr:
continue
if pr_id := self._invalid_player_response(pr, video_id):
@ -3773,14 +3871,27 @@ def append_client(*client_names):
f[STREAMING_DATA_CLIENT_NAME] = name
prs.append(pr)
# creator clients can bypass AGE_VERIFICATION_REQUIRED if logged in
if variant == 'embedded' and self._is_unplayable(pr) and self.is_authenticated:
append_client(f'{base_client}_creator')
elif self._is_agegated(pr):
if variant == 'tv_embedded':
append_client(f'{base_client}_embedded')
elif not variant:
append_client(f'tv_embedded.{base_client}', f'{base_client}_embedded')
# tv_embedded can work around age-gate and age-verification IF the video is embeddable
if self._is_agegated(pr) and variant != 'tv_embedded':
append_client(f'tv_embedded.{base_client}')
# Unauthenticated users will only get tv_embedded client formats if age-gated
if self._is_agegated(pr) and not self.is_authenticated:
self.to_screen(
f'{video_id}: This video is age-restricted; some formats may be missing '
f'without authentication. {self._login_hint()}', only_once=True)
# EU countries require age-verification for accounts to access age-restricted videos
# If account is not age-verified, _is_agegated() will be truthy for non-embedded clients
# If embedding is disabled for the video, _is_unplayable() will be truthy for tv_embedded
embedding_is_disabled = variant == 'tv_embedded' and self._is_unplayable(pr)
if self.is_authenticated and (self._is_agegated(pr) or embedding_is_disabled):
self.to_screen(
f'{video_id}: This video is age-restricted and YouTube is requiring '
'account age-verification; some formats may be missing', only_once=True)
# web_creator and mediaconnect can work around the age-verification requirement
# _producer, _testsuite, & _vr variants can also work around age-verification
append_client('web_creator', 'mediaconnect')
if skipped_clients:
self.report_warning(
@ -3916,13 +4027,13 @@ def build_fragments(f):
f'{video_id}: Some formats are possibly damaged. They will be deprioritized', only_once=True)
client_name = fmt.get(STREAMING_DATA_CLIENT_NAME)
# Android client formats are broken due to integrity check enforcement
# _BROKEN_CLIENTS return videoplayback URLs that expire after 30 seconds
# Ref: https://github.com/yt-dlp/yt-dlp/issues/9554
is_broken = client_name and client_name.startswith(short_client_name('android'))
is_broken = client_name in self._BROKEN_CLIENTS
if is_broken:
self.report_warning(
f'{video_id}: Android client formats are broken and may yield HTTP Error 403. '
'They will be deprioritized', only_once=True)
f'{video_id}: {self._BROKEN_CLIENTS[client_name]} client formats are broken '
'and may yield HTTP Error 403. They will be deprioritized', only_once=True)
name = fmt.get('qualityLabel') or quality.replace('audio_quality_', '') or ''
fps = int_or_none(fmt.get('fps')) or 0

View File

@ -66,7 +66,9 @@ def _real_extract(self, url):
stream_meta['stream-access']['video_source'], video_id,
'Downloading player page', headers={'referer': 'https://zaiko.io/'})
player_meta = self._parse_vue_element_attr('player', player_page, video_id)
status = traverse_obj(player_meta, ('initial_event_info', 'status', {str}))
initial_event_info = traverse_obj(player_meta, ('initial_event_info', {dict})) or {}
status = traverse_obj(initial_event_info, ('status', {str}))
live_status, msg, expected = {
'vod': ('was_live', 'No VOD stream URL was found', False),
'archiving': ('post_live', 'Event VOD is still being processed', True),
@ -80,14 +82,20 @@ def _real_extract(self, url):
'cancelled': ('not_live', 'Event has been cancelled', True),
}.get(status) or ('not_live', f'Unknown event status "{status}"', False)
stream_url = traverse_obj(player_meta, ('initial_event_info', 'endpoint', {url_or_none}))
if traverse_obj(initial_event_info, ('is_jwt_protected', {bool})):
stream_url = self._download_json(
initial_event_info['jwt_token_url'], video_id, 'Downloading JWT-protected stream URL',
'Failed to download JWT-protected stream URL')['playback_url']
else:
stream_url = traverse_obj(initial_event_info, ('endpoint', {url_or_none}))
formats = self._extract_m3u8_formats(
stream_url, video_id, live=True, fatal=False) if stream_url else []
if not formats:
self.raise_no_formats(msg, expected=expected)
thumbnail_urls = [
traverse_obj(player_meta, ('initial_event_info', 'poster_url')),
traverse_obj(initial_event_info, ('poster_url', {url_or_none})),
self._og_search_thumbnail(self._download_webpage(
f'https://zaiko.io/event/{video_id}', video_id, 'Downloading event page', fatal=False) or ''),
]
@ -103,9 +111,7 @@ def _real_extract(self, url):
'release_timestamp': ('stream', 'start', 'timestamp', {int_or_none}),
'categories': ('event', 'genres', ..., {lambda x: x or None}),
}),
**traverse_obj(player_meta, ('initial_event_info', {
'alt_title': ('title', {str}),
})),
'alt_title': traverse_obj(initial_event_info, ('title', {str})),
'thumbnails': [{'url': url, 'id': url_basename(url)} for url in thumbnail_urls if url_or_none(url)],
}

View File

@ -636,6 +636,8 @@ def assertion(cndn, msg):
raise self.Exception(f'{member} {msg}', expr)
def eval_method():
nonlocal member
if (variable, member) == ('console', 'debug'):
if Debugger.ENABLED:
Debugger.write(self.interpret_expression(f'[{arg_str}]', local_vars, allow_recursion))
@ -644,6 +646,7 @@ def eval_method():
types = {
'String': str,
'Math': float,
'Array': list,
}
obj = local_vars.get(variable, types.get(variable, NO_DEFAULT))
if obj is NO_DEFAULT:
@ -667,6 +670,21 @@ def eval_method():
self.interpret_expression(v, local_vars, allow_recursion)
for v in self._separate(arg_str)]
# Fixup prototype call
if isinstance(obj, type) and member.startswith('prototype.'):
new_member, _, func_prototype = member.partition('.')[2].partition('.')
assertion(argvals, 'takes one or more arguments')
assertion(isinstance(argvals[0], obj), f'needs binding to type {obj}')
if func_prototype == 'call':
obj, *argvals = argvals
elif func_prototype == 'apply':
assertion(len(argvals) == 2, 'takes two arguments')
obj, argvals = argvals
assertion(isinstance(argvals, list), 'second argument needs to be a list')
else:
raise self.Exception(f'Unsupported Function method {func_prototype}', expr)
member = new_member
if obj is str:
if member == 'fromCharCode':
assertion(argvals, 'takes one or more arguments')
@ -691,9 +709,9 @@ def eval_method():
obj.reverse()
return obj
elif member == 'slice':
assertion(isinstance(obj, list), 'must be applied on a list')
assertion(len(argvals) == 1, 'takes exactly one argument')
return obj[argvals[0]:]
assertion(isinstance(obj, (list, str)), 'must be applied on a list or string')
assertion(len(argvals) <= 2, 'takes between 0 and 2 arguments')
return obj[slice(*argvals, None)]
elif member == 'splice':
assertion(isinstance(obj, list), 'must be applied on a list')
assertion(argvals, 'takes one or more arguments')

View File

@ -2,6 +2,7 @@
import io
import math
import re
import urllib.parse
from ._helper import InstanceStoreMixin, select_proxy
@ -27,11 +28,12 @@
if curl_cffi is None:
raise ImportError('curl_cffi is not installed')
curl_cffi_version = tuple(int_or_none(x, default=0) for x in curl_cffi.__version__.split('.'))
if curl_cffi_version != (0, 5, 10):
curl_cffi_version = tuple(map(int, re.split(r'[^\d]+', curl_cffi.__version__)[:3]))
if curl_cffi_version != (0, 5, 10) and not ((0, 7, 0) <= curl_cffi_version < (0, 8, 0)):
curl_cffi._yt_dlp__version = f'{curl_cffi.__version__} (unsupported)'
raise ImportError('Only curl_cffi 0.5.10 is supported')
raise ImportError('Only curl_cffi versions 0.5.10, 0.7.X are supported')
import curl_cffi.requests
from curl_cffi.const import CurlECode, CurlOpt
@ -110,6 +112,13 @@ class CurlCFFIRH(ImpersonateRequestHandler, InstanceStoreMixin):
_SUPPORTED_FEATURES = (Features.NO_PROXY, Features.ALL_PROXY)
_SUPPORTED_PROXY_SCHEMES = ('http', 'https', 'socks4', 'socks4a', 'socks5', 'socks5h')
_SUPPORTED_IMPERSONATE_TARGET_MAP = {
**({
ImpersonateTarget('chrome', '124', 'macos', '14'): curl_cffi.requests.BrowserType.chrome124,
ImpersonateTarget('chrome', '123', 'macos', '14'): curl_cffi.requests.BrowserType.chrome123,
ImpersonateTarget('chrome', '120', 'macos', '14'): curl_cffi.requests.BrowserType.chrome120,
ImpersonateTarget('chrome', '119', 'macos', '14'): curl_cffi.requests.BrowserType.chrome119,
ImpersonateTarget('chrome', '116', 'windows', '10'): curl_cffi.requests.BrowserType.chrome116,
} if curl_cffi_version >= (0, 7, 0) else {}),
ImpersonateTarget('chrome', '110', 'windows', '10'): curl_cffi.requests.BrowserType.chrome110,
ImpersonateTarget('chrome', '107', 'windows', '10'): curl_cffi.requests.BrowserType.chrome107,
ImpersonateTarget('chrome', '104', 'windows', '10'): curl_cffi.requests.BrowserType.chrome104,
@ -118,9 +127,15 @@ class CurlCFFIRH(ImpersonateRequestHandler, InstanceStoreMixin):
ImpersonateTarget('chrome', '99', 'windows', '10'): curl_cffi.requests.BrowserType.chrome99,
ImpersonateTarget('edge', '101', 'windows', '10'): curl_cffi.requests.BrowserType.edge101,
ImpersonateTarget('edge', '99', 'windows', '10'): curl_cffi.requests.BrowserType.edge99,
**({
ImpersonateTarget('safari', '17.0', 'macos', '14'): curl_cffi.requests.BrowserType.safari17_0,
} if curl_cffi_version >= (0, 7, 0) else {}),
ImpersonateTarget('safari', '15.5', 'macos', '12'): curl_cffi.requests.BrowserType.safari15_5,
ImpersonateTarget('safari', '15.3', 'macos', '11'): curl_cffi.requests.BrowserType.safari15_3,
ImpersonateTarget('chrome', '99', 'android', '12'): curl_cffi.requests.BrowserType.chrome99_android,
**({
ImpersonateTarget('safari', '17.2', 'ios', '17.2'): curl_cffi.requests.BrowserType.safari17_2_ios,
} if curl_cffi_version >= (0, 7, 0) else {}),
}
def _create_instance(self, cookiejar=None):
@ -131,6 +146,9 @@ def _check_extensions(self, extensions):
extensions.pop('impersonate', None)
extensions.pop('cookiejar', None)
extensions.pop('timeout', None)
# CurlCFFIRH ignores legacy ssl options currently.
# Impersonation generally uses a looser SSL configuration than urllib/requests.
extensions.pop('legacy_ssl', None)
def send(self, request: Request) -> Response:
target = self._get_request_target(request)
@ -187,7 +205,7 @@ def _send(self, request: Request):
timeout = self._calculate_timeout(request)
# set CURLOPT_LOW_SPEED_LIMIT and CURLOPT_LOW_SPEED_TIME to act as a read timeout. [1]
# curl_cffi does not currently do this. [2]
# This is required only for 0.5.10 [2]
# Note: CURLOPT_LOW_SPEED_TIME is in seconds, so we need to round up to the nearest second. [3]
# [1] https://unix.stackexchange.com/a/305311
# [2] https://github.com/yifeikong/curl_cffi/issues/156
@ -203,7 +221,7 @@ def _send(self, request: Request):
data=request.data,
verify=self.verify,
max_redirects=5,
timeout=timeout,
timeout=(timeout, timeout),
impersonate=self._SUPPORTED_IMPERSONATE_TARGET_MAP.get(
self._get_request_target(request)),
interface=self.source_address,
@ -222,7 +240,7 @@ def _send(self, request: Request):
elif (
e.code == CurlECode.PROXY
or (e.code == CurlECode.RECV_ERROR and 'Received HTTP code 407 from proxy after CONNECT' in str(e))
or (e.code == CurlECode.RECV_ERROR and 'CONNECT' in str(e))
):
raise ProxyError(cause=e) from e
else:

View File

@ -295,11 +295,12 @@ def _check_extensions(self, extensions):
super()._check_extensions(extensions)
extensions.pop('cookiejar', None)
extensions.pop('timeout', None)
extensions.pop('legacy_ssl', None)
def _create_instance(self, cookiejar):
def _create_instance(self, cookiejar, legacy_ssl_support=None):
session = RequestsSession()
http_adapter = RequestsHTTPAdapter(
ssl_context=self._make_sslcontext(),
ssl_context=self._make_sslcontext(legacy_ssl_support=legacy_ssl_support),
source_address=self.source_address,
max_retries=urllib3.util.retry.Retry(False),
)
@ -318,7 +319,10 @@ def _send(self, request):
max_redirects_exceeded = False
session = self._get_instance(cookiejar=self._get_cookiejar(request))
session = self._get_instance(
cookiejar=self._get_cookiejar(request),
legacy_ssl_support=request.extensions.get('legacy_ssl'),
)
try:
requests_res = session.request(

View File

@ -348,14 +348,15 @@ def _check_extensions(self, extensions):
super()._check_extensions(extensions)
extensions.pop('cookiejar', None)
extensions.pop('timeout', None)
extensions.pop('legacy_ssl', None)
def _create_instance(self, proxies, cookiejar):
def _create_instance(self, proxies, cookiejar, legacy_ssl_support=None):
opener = urllib.request.OpenerDirector()
handlers = [
ProxyHandler(proxies),
HTTPHandler(
debuglevel=int(bool(self.verbose)),
context=self._make_sslcontext(),
context=self._make_sslcontext(legacy_ssl_support=legacy_ssl_support),
source_address=self.source_address),
HTTPCookieProcessor(cookiejar),
DataHandler(),
@ -391,6 +392,7 @@ def _send(self, request):
opener = self._get_instance(
proxies=self._get_proxies(request),
cookiejar=self._get_cookiejar(request),
legacy_ssl_support=request.extensions.get('legacy_ssl'),
)
try:
res = opener.open(urllib_req, timeout=self._calculate_timeout(request))

View File

@ -118,6 +118,7 @@ def _check_extensions(self, extensions):
super()._check_extensions(extensions)
extensions.pop('timeout', None)
extensions.pop('cookiejar', None)
extensions.pop('legacy_ssl', None)
def close(self):
# Remove the logging handler that contains a reference to our logger
@ -154,13 +155,14 @@ def _send(self, request):
address=(wsuri.host, wsuri.port),
**create_conn_kwargs,
)
ssl_ctx = self._make_sslcontext(legacy_ssl_support=request.extensions.get('legacy_ssl'))
conn = websockets.sync.client.connect(
sock=sock,
uri=request.url,
additional_headers=headers,
open_timeout=timeout,
user_agent_header=None,
ssl_context=self._make_sslcontext() if wsuri.secure else None,
ssl_context=ssl_ctx if wsuri.secure else None,
close_timeout=0, # not ideal, but prevents yt-dlp hanging
)
return WebsocketsResponseAdapter(conn, url=request.url)

View File

@ -205,6 +205,7 @@ class RequestHandler(abc.ABC):
The following extensions are defined for RequestHandler:
- `cookiejar`: Cookiejar to use for this request.
- `timeout`: socket timeout to use for this request.
- `legacy_ssl`: Enable legacy SSL options for this request. See legacy_ssl_support.
To enable these, add extensions.pop('<extension>', None) to _check_extensions
Apart from the url protocol, proxies dict may contain the following keys:
@ -247,10 +248,10 @@ def __init__(
self.legacy_ssl_support = legacy_ssl_support
super().__init__()
def _make_sslcontext(self):
def _make_sslcontext(self, legacy_ssl_support=None):
return make_ssl_context(
verify=self.verify,
legacy_support=self.legacy_ssl_support,
legacy_support=legacy_ssl_support if legacy_ssl_support is not None else self.legacy_ssl_support,
use_certifi=not self.prefer_system_certs,
**self._client_cert,
)
@ -262,7 +263,8 @@ def _calculate_timeout(self, request):
return float(request.extensions.get('timeout') or self.timeout)
def _get_cookiejar(self, request):
return request.extensions.get('cookiejar') or self.cookiejar
cookiejar = request.extensions.get('cookiejar')
return self.cookiejar if cookiejar is None else cookiejar
def _get_proxies(self, request):
return (request.proxies or self.proxies).copy()
@ -314,6 +316,7 @@ def _check_extensions(self, extensions):
"""Check extensions for unsupported extensions. Subclasses should extend this."""
assert isinstance(extensions.get('cookiejar'), (YoutubeDLCookieJar, NoneType))
assert isinstance(extensions.get('timeout'), (float, int, NoneType))
assert isinstance(extensions.get('legacy_ssl'), (bool, NoneType))
def _validate(self, request):
self._check_url_scheme(request)

View File

@ -462,6 +462,7 @@ def _alias_callback(option, opt_str, value, parser, opts, nargs):
'the STREAM (stdout or stderr) to apply the setting to. '
'Can be one of "always", "auto" (default), "never", or '
'"no_color" (use non color terminal sequences). '
'Use "auto-tty" or "no_color-tty" to decide based on terminal support only. '
'Can be used multiple times'))
general.add_option(
'--compat-options',
@ -476,8 +477,8 @@ def _alias_callback(option, opt_str, value, parser, opts, nargs):
'no-youtube-channel-redirect', 'no-youtube-unavailable-videos', 'no-youtube-prefer-utc-upload-date',
'prefer-legacy-http-handler', 'manifest-filesize-approx', 'allow-unsafe-ext',
}, 'aliases': {
'youtube-dl': ['all', '-multistreams', '-playlist-match-filter', '-manifest-filesize-approx'],
'youtube-dlc': ['all', '-no-youtube-channel-redirect', '-no-live-chat', '-playlist-match-filter', '-manifest-filesize-approx'],
'youtube-dl': ['all', '-multistreams', '-playlist-match-filter', '-manifest-filesize-approx', '-allow-unsafe-ext'],
'youtube-dlc': ['all', '-no-youtube-channel-redirect', '-no-live-chat', '-playlist-match-filter', '-manifest-filesize-approx', '-allow-unsafe-ext'],
'2021': ['2022', 'no-certifi', 'filename-sanitization'],
'2022': ['2023', 'no-external-downloader-progress', 'playlist-match-filter', 'prefer-legacy-http-handler', 'manifest-filesize-approx'],
'2023': [],

View File

@ -134,7 +134,7 @@ def run(self, info):
meta = MP4(filename)
# NOTE: the 'covr' atom is a non-standard MPEG-4 atom,
# Apple iTunes 'M4A' files include the 'moov.udta.meta.ilst' atom.
meta.tags['covr'] = [MP4Cover(data=thumb_data, imageformat=f)]
meta.tags['covr'] = [MP4Cover(data=thumb_data, imageformat=f[type_])]
meta.save()
temp_filename = filename
except Exception as err:

View File

@ -310,6 +310,7 @@ def _download_update_spec(self, source_tags):
if isinstance(error, HTTPError) and error.status == 404:
continue
self._report_network_error(f'fetch update spec: {error}')
return None
self._report_error(
f'The requested tag {self.requested_tag} does not exist for {self.requested_repo}', True)
@ -557,9 +558,10 @@ def _report_permission_error(self, file):
def _report_network_error(self, action, delim=';', tag=None):
if not tag:
tag = self.requested_tag
path = tag if tag == 'latest' else f'tag/{tag}'
self._report_error(
f'Unable to {action}{delim} visit https://github.com/{self.requested_repo}/releases/'
+ tag if tag == 'latest' else f'tag/{tag}', True)
f'Unable to {action}{delim} visit '
f'https://github.com/{self.requested_repo}/releases/{path}', True)
# XXX: Everything below this line in this class is deprecated / for compat only
@property

View File

@ -1217,7 +1217,7 @@ def unified_timestamp(date_str, day_first=True):
return None
date_str = re.sub(r'\s+', ' ', re.sub(
r'(?i)[,|]|(mon|tues?|wed(nes)?|thu(rs)?|fri|sat(ur)?)(day)?', '', date_str))
r'(?i)[,|]|(mon|tues?|wed(nes)?|thu(rs)?|fri|sat(ur)?|sun)(day)?', '', date_str))
pm_delta = 12 if re.search(r'(?i)PM', date_str) else 0
timezone, date_str = extract_timezone(date_str)
@ -2984,6 +2984,7 @@ def parse_codecs(codecs_str):
str.strip, codecs_str.strip().strip(',').split(','))))
vcodec, acodec, scodec, hdr = None, None, None, None
for full_codec in split_codecs:
full_codec = re.sub(r'^([^.]+)', lambda m: m.group(1).lower(), full_codec)
parts = re.sub(r'0+(?=\d)', '', full_codec).split('.')
if parts[0] in ('avc1', 'avc2', 'avc3', 'avc4', 'vp9', 'vp8', 'hev1', 'hev2',
'h263', 'h264', 'mp4v', 'hvc1', 'av1', 'theora', 'dvh1', 'dvhe'):
@ -5026,7 +5027,7 @@ def items_(self):
common_video=('avi', 'flv', 'mkv', 'mov', 'mp4', 'webm'),
video=('3g2', '3gp', 'f4v', 'mk3d', 'divx', 'mpg', 'ogv', 'm4v', 'wmv'),
common_audio=('aiff', 'alac', 'flac', 'm4a', 'mka', 'mp3', 'ogg', 'opus', 'wav'),
audio=('aac', 'ape', 'asf', 'f4a', 'f4b', 'm4b', 'm4p', 'm4r', 'oga', 'ogx', 'spx', 'vorbis', 'wma', 'weba'),
audio=('aac', 'ape', 'asf', 'f4a', 'f4b', 'm4b', 'm4r', 'oga', 'ogx', 'spx', 'vorbis', 'wma', 'weba'),
thumbnails=('jpg', 'png', 'webp'),
storyboards=('mhtml', ),
subtitles=('srt', 'vtt', 'ass', 'lrc'),
@ -5059,36 +5060,64 @@ class _UnsafeExtensionError(Exception):
# video
*MEDIA_EXTENSIONS.video,
'avif',
'asx',
'ismv',
'm2t',
'm2ts',
'm2v',
'm4s',
'mng',
'mp2v',
'mp4v',
'mpe',
'mpeg',
'mpeg1',
'mpeg2',
'mpeg4',
'mxf',
'ogm',
'qt',
'rm',
'swf',
'ts',
'vob',
'vp9',
'wvm',
# audio
*MEDIA_EXTENSIONS.audio,
'3ga',
'ac3',
'adts',
'aif',
'au',
'dts',
'isma',
'it',
'mid',
'mod',
'mpga',
'mp1',
'mp2',
'mp4a',
'mpa',
'ra',
'shn',
'xm',
# image
*MEDIA_EXTENSIONS.thumbnails,
'avif',
'bmp',
'gif',
'heic',
'ico',
'image',
'jng',
'jpeg',
'jxl',
'svg',
'tif',
'tiff',
'wbmp',
# subtitle
@ -5096,11 +5125,16 @@ class _UnsafeExtensionError(Exception):
'dfxp',
'fs',
'ismt',
'json3',
'sami',
'scc',
'srv1',
'srv2',
'srv3',
'ssa',
'tt',
'ttml',
'xml',
# others
*MEDIA_EXTENSIONS.manifests,
@ -5111,7 +5145,6 @@ class _UnsafeExtensionError(Exception):
'sbv',
'url',
'webloc',
'xml',
])
def __init__(self, extension, /):
@ -5120,6 +5153,9 @@ def __init__(self, extension, /):
@classmethod
def sanitize_extension(cls, extension, /, *, prepend=False):
if extension is None:
return None
if '/' in extension or '\\' in extension:
raise cls(extension)

View File

@ -1,8 +1,8 @@
# Autogenerated by devscripts/update-version.py
__version__ = '2024.07.01'
__version__ = '2024.08.06'
RELEASE_GIT_HEAD = '5ce582448ececb8d9c30c8c31f58330090ced03a'
RELEASE_GIT_HEAD = '4d9231208332d4c32364b8cd814bff8b20232cae'
VARIANT = None
@ -12,4 +12,4 @@
ORIGIN = 'yt-dlp/yt-dlp'
_pkg_version = '2024.07.01'
_pkg_version = '2024.08.06'