Commit dabec62

Redo indexing mechanism (#16)
* Bumped up the line length because I fear no man

* Refactored indexing

Previously, indexing worked by collecting the video IDs of only the videos that matched the indexing criteria. This new model instead stores ALL videos for a given source, but will only _download_ videos that meet those criteria. This lets us backfill without indexing, makes it easier to add in other backends, lets us download one-off videos for a source that don't quite meet the criteria, you name it.

* Updated media finders to respect format filters; Added credo file
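The indexing model described in the commit message can be sketched in plain Elixir. This is a minimal illustration under stated assumptions, not the commit's code: `IndexingSketch`, `index_all/1`, `pending/2`, and the `meets_criteria` callback are hypothetical names, and plain maps stand in for `MediaItem` records.

```elixir
defmodule IndexingSketch do
  # Hypothetical sketch: plain maps stand in for Pinchflat's MediaItem records.

  # Store every video found for the source, regardless of download criteria.
  # A nil :media_filepath marks an item as not yet downloaded.
  def index_all(videos) do
    Enum.map(videos, &Map.put_new(&1, :media_filepath, nil))
  end

  # Only items that are not downloaded AND meet the criteria are download candidates.
  def pending(items, meets_criteria) when is_function(meets_criteria, 1) do
    Enum.filter(items, fn item ->
      is_nil(item.media_filepath) and meets_criteria.(item)
    end)
  end
end

# Made-up sample data for illustration.
videos = [
  %{media_id: "a", livestream: false},
  %{media_id: "b", livestream: true}
]

items = IndexingSketch.index_all(videos)
pending = IndexingSketch.pending(items, fn item -> not item.livestream end)
IO.inspect(Enum.map(pending, & &1.media_id))
```

Both videos are stored, but only the non-livestream one is selected for download. Changing the criteria later only changes what `pending/2` returns, with no re-indexing needed.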
1 parent 89497c4 commit dabec62

31 files changed, +645 -384 lines changed
.credo.exs

+215
@@ -0,0 +1,215 @@
+# This file contains the configuration for Credo and you are probably reading
+# this after creating it with `mix credo.gen.config`.
+#
+# If you find anything wrong or unclear in this file, please report an
+# issue on GitHub: https://github.com/rrrene/credo/issues
+#
+%{
+  #
+  # You can have as many configs as you like in the `configs:` field.
+  configs: [
+    %{
+      #
+      # Run any config using `mix credo -C <name>`. If no config name is given
+      # "default" is used.
+      #
+      name: "default",
+      #
+      # These are the files included in the analysis:
+      files: %{
+        #
+        # You can give explicit globs or simply directories.
+        # In the latter case `**/*.{ex,exs}` will be used.
+        #
+        included: [
+          "lib/",
+          "src/",
+          "test/",
+          "web/",
+          "apps/*/lib/",
+          "apps/*/src/",
+          "apps/*/test/",
+          "apps/*/web/"
+        ],
+        excluded: [~r"/_build/", ~r"/deps/", ~r"/node_modules/"]
+      },
+      #
+      # Load and configure plugins here:
+      #
+      plugins: [],
+      #
+      # If you create your own checks, you must specify the source files for
+      # them here, so they can be loaded by Credo before running the analysis.
+      #
+      requires: [],
+      #
+      # If you want to enforce a style guide and need a more traditional linting
+      # experience, you can change `strict` to `true` below:
+      #
+      strict: false,
+      #
+      # To modify the timeout for parsing files, change this value:
+      #
+      parse_timeout: 5000,
+      #
+      # If you want to use uncolored output by default, you can change `color`
+      # to `false` below:
+      #
+      color: true,
+      #
+      # You can customize the parameters of any check by adding a second element
+      # to the tuple.
+      #
+      # To disable a check put `false` as second element:
+      #
+      # {Credo.Check.Design.DuplicatedCode, false}
+      #
+      checks: %{
+        enabled: [
+          #
+          ## Consistency Checks
+          #
+          {Credo.Check.Consistency.ExceptionNames, []},
+          {Credo.Check.Consistency.LineEndings, []},
+          {Credo.Check.Consistency.ParameterPatternMatching, []},
+          {Credo.Check.Consistency.SpaceAroundOperators, []},
+          {Credo.Check.Consistency.SpaceInParentheses, []},
+          {Credo.Check.Consistency.TabsOrSpaces, []},
+
+          #
+          ## Design Checks
+          #
+          # You can customize the priority of any check
+          # Priority values are: `low, normal, high, higher`
+          #
+          {Credo.Check.Design.AliasUsage, [priority: :low, if_nested_deeper_than: 2, if_called_more_often_than: 0]},
+          {Credo.Check.Design.TagFIXME, []},
+          # You can also customize the exit_status of each check.
+          # If you don't want TODO comments to cause `mix credo` to fail, just
+          # set this value to 0 (zero).
+          #
+          {Credo.Check.Design.TagTODO, [exit_status: 2]},
+
+          #
+          ## Readability Checks
+          #
+          {Credo.Check.Readability.AliasOrder, []},
+          {Credo.Check.Readability.FunctionNames, []},
+          {Credo.Check.Readability.LargeNumbers, []},
+          {Credo.Check.Readability.MaxLineLength, [priority: :low, max_length: 120]},
+          {Credo.Check.Readability.ModuleAttributeNames, []},
+          {Credo.Check.Readability.ModuleDoc, []},
+          {Credo.Check.Readability.ModuleNames, []},
+          {Credo.Check.Readability.ParenthesesInCondition, []},
+          {Credo.Check.Readability.ParenthesesOnZeroArityDefs, []},
+          {Credo.Check.Readability.PipeIntoAnonymousFunctions, []},
+          {Credo.Check.Readability.PredicateFunctionNames, []},
+          {Credo.Check.Readability.PreferImplicitTry, []},
+          {Credo.Check.Readability.RedundantBlankLines, []},
+          {Credo.Check.Readability.Semicolons, []},
+          {Credo.Check.Readability.SpaceAfterCommas, []},
+          {Credo.Check.Readability.StringSigils, []},
+          {Credo.Check.Readability.TrailingBlankLine, []},
+          {Credo.Check.Readability.TrailingWhiteSpace, []},
+          {Credo.Check.Readability.UnnecessaryAliasExpansion, []},
+          {Credo.Check.Readability.VariableNames, []},
+          {Credo.Check.Readability.WithSingleClause, []},
+
+          #
+          ## Refactoring Opportunities
+          #
+          {Credo.Check.Refactor.Apply, []},
+          {Credo.Check.Refactor.CondStatements, []},
+          {Credo.Check.Refactor.FilterCount, []},
+          {Credo.Check.Refactor.FilterFilter, []},
+          {Credo.Check.Refactor.FunctionArity, []},
+          {Credo.Check.Refactor.LongQuoteBlocks, []},
+          {Credo.Check.Refactor.MapJoin, []},
+          {Credo.Check.Refactor.MatchInCondition, []},
+          {Credo.Check.Refactor.NegatedConditionsInUnless, []},
+          {Credo.Check.Refactor.NegatedConditionsWithElse, []},
+          {Credo.Check.Refactor.Nesting, []},
+          {Credo.Check.Refactor.RedundantWithClauseResult, []},
+          {Credo.Check.Refactor.RejectReject, []},
+          {Credo.Check.Refactor.UnlessWithElse, []},
+          {Credo.Check.Refactor.WithClauses, []},
+
+          #
+          ## Warnings
+          #
+          {Credo.Check.Warning.ApplicationConfigInModuleAttribute, []},
+          {Credo.Check.Warning.BoolOperationOnSameValues, []},
+          {Credo.Check.Warning.Dbg, []},
+          {Credo.Check.Warning.ExpensiveEmptyEnumCheck, []},
+          {Credo.Check.Warning.IExPry, []},
+          {Credo.Check.Warning.IoInspect, []},
+          {Credo.Check.Warning.MissedMetadataKeyInLoggerConfig, []},
+          {Credo.Check.Warning.OperationOnSameValues, []},
+          {Credo.Check.Warning.OperationWithConstantResult, []},
+          {Credo.Check.Warning.RaiseInsideRescue, []},
+          {Credo.Check.Warning.SpecWithStruct, []},
+          {Credo.Check.Warning.UnsafeExec, []},
+          {Credo.Check.Warning.UnusedEnumOperation, []},
+          {Credo.Check.Warning.UnusedFileOperation, []},
+          {Credo.Check.Warning.UnusedKeywordOperation, []},
+          {Credo.Check.Warning.UnusedListOperation, []},
+          {Credo.Check.Warning.UnusedPathOperation, []},
+          {Credo.Check.Warning.UnusedRegexOperation, []},
+          {Credo.Check.Warning.UnusedStringOperation, []},
+          {Credo.Check.Warning.UnusedTupleOperation, []},
+          {Credo.Check.Warning.WrongTestFileExtension, []}
+        ],
+        disabled: [
+          #
+          # Checks scheduled for next check update (opt-in for now, just replace `false` with `[]`)
+
+          #
+          # Controversial and experimental checks (opt-in, just move the check to `:enabled`
+          # and be sure to use `mix credo --strict` to see low priority checks)
+          #
+          {Credo.Check.Refactor.CyclomaticComplexity, []},
+          {Credo.Check.Consistency.MultiAliasImportRequireUse, []},
+          {Credo.Check.Consistency.UnusedVariableNames, []},
+          {Credo.Check.Design.DuplicatedCode, []},
+          {Credo.Check.Design.SkipTestWithoutComment, []},
+          {Credo.Check.Readability.AliasAs, []},
+          {Credo.Check.Readability.BlockPipe, []},
+          {Credo.Check.Readability.ImplTrue, []},
+          {Credo.Check.Readability.MultiAlias, []},
+          {Credo.Check.Readability.NestedFunctionCalls, []},
+          {Credo.Check.Readability.OneArityFunctionInPipe, []},
+          {Credo.Check.Readability.OnePipePerLine, []},
+          {Credo.Check.Readability.SeparateAliasRequire, []},
+          {Credo.Check.Readability.SingleFunctionToBlockPipe, []},
+          {Credo.Check.Readability.SinglePipe, []},
+          {Credo.Check.Readability.Specs, []},
+          {Credo.Check.Readability.StrictModuleLayout, []},
+          {Credo.Check.Readability.WithCustomTaggedTuple, []},
+          {Credo.Check.Refactor.ABCSize, []},
+          {Credo.Check.Refactor.AppendSingleItem, []},
+          {Credo.Check.Refactor.DoubleBooleanNegation, []},
+          {Credo.Check.Refactor.FilterReject, []},
+          {Credo.Check.Refactor.IoPuts, []},
+          {Credo.Check.Refactor.MapMap, []},
+          {Credo.Check.Refactor.ModuleDependencies, []},
+          {Credo.Check.Refactor.NegatedIsNil, []},
+          {Credo.Check.Refactor.PassAsyncInTestCases, []},
+          {Credo.Check.Refactor.PipeChainStart, []},
+          {Credo.Check.Refactor.RejectFilter, []},
+          {Credo.Check.Refactor.VariableRebinding, []},
+          {Credo.Check.Warning.LazyLogging, []},
+          {Credo.Check.Warning.LeakyEnvironment, []},
+          {Credo.Check.Warning.MapGetUnsafePass, []},
+          {Credo.Check.Warning.MixEnv, []},
+          {Credo.Check.Warning.UnsafeToAtom, []}
+
+          # {Credo.Check.Refactor.MapInto, []},
+
+          #
+          # Custom checks can be created using `mix credo.gen.check`.
+          #
+        ]
+      }
+    }
+  ]
+}

.formatter.exs

+1 -1
@@ -3,5 +3,5 @@
   subdirectories: ["priv/*/migrations"],
   plugins: [Phoenix.LiveView.HTMLFormatter],
   inputs: ["*.{heex,ex,exs}", "{config,lib,test}/**/*.{heex,ex,exs}", "priv/*/seeds.exs"],
-  line_length: 100
+  line_length: 120
 ]

.iex.exs

+1 -1
@@ -44,7 +44,7 @@ defmodule IexHelpers do
         :channel -> channel_url()
       end

-    SourceDetails.get_video_ids(source)
+    SourceDetails.get_media_attributes(source)
   end
 end

lib/pinchflat/media.ex

+56 -5
@@ -19,15 +19,31 @@ defmodule Pinchflat.Media do

   @doc """
   Returns a list of pending media_items for a given source, where
-  pending means the `media_filepath` is `nil`.
+  pending means the `media_filepath` is `nil` AND the media_item
+  matches the format selection rules of the parent media_profile.
+
+  See `build_format_clauses` but tl;dr is it _may_ filter based
+  on shorts or livestreams depending on the media_profile settings.

   Returns [%MediaItem{}, ...].
   """
   def list_pending_media_items_for(%Source{} = source) do
-    from(
-      m in MediaItem,
-      where: m.source_id == ^source.id and is_nil(m.media_filepath)
-    )
+    media_profile = Repo.preload(source, :media_profile).media_profile
+
+    MediaItem
+    |> where([mi], mi.source_id == ^source.id and is_nil(mi.media_filepath))
+    |> where(^build_format_clauses(media_profile))
+    |> Repo.all()
+  end
+
+  @doc """
+  Returns a list of downloaded media_items for a given source.
+
+  Returns [%MediaItem{}, ...].
+  """
+  def list_downloaded_media_items_for(%Source{} = source) do
+    MediaItem
+    |> where([mi], mi.source_id == ^source.id and not is_nil(mi.media_filepath))
     |> Repo.all()
   end

@@ -72,4 +88,39 @@ defmodule Pinchflat.Media do
   def change_media_item(%MediaItem{} = media_item, attrs \\ %{}) do
     MediaItem.changeset(media_item, attrs)
   end
+
+  defp build_format_clauses(media_profile) do
+    mapped_struct = Map.from_struct(media_profile)
+
+    Enum.reduce(mapped_struct, dynamic(true), fn attr, dynamic ->
+      case {attr, media_profile} do
+        {{:shorts_behaviour, :only}, %{livestream_behaviour: :only}} ->
+          dynamic([mi], ^dynamic and (mi.livestream == true or fragment("? ILIKE ?", mi.original_url, "%/shorts/%")))
+
+        # Technically redundant, but makes the other clauses easier to parse
+        # (redundant because this condition is the same as the condition above, just flipped)
+        {{:livestream_behaviour, :only}, %{shorts_behaviour: :only}} ->
+          dynamic
+
+        {{:shorts_behaviour, :only}, _} ->
+          # return records with /shorts/ in the original_url
+          dynamic([mi], ^dynamic and fragment("? ILIKE ?", mi.original_url, "%/shorts/%"))
+
+        {{:livestream_behaviour, :only}, _} ->
+          # return records with livestream: true
+          dynamic([mi], ^dynamic and mi.livestream == true)
+
+        {{:shorts_behaviour, :exclude}, %{livestream_behaviour: lb}} when lb != :only ->
+          # return records without /shorts/ in the original_url
+          dynamic([mi], ^dynamic and fragment("? NOT ILIKE ?", mi.original_url, "%/shorts/%"))
+
+        {{:livestream_behaviour, :exclude}, %{shorts_behaviour: sb}} when sb != :only ->
+          # return records with livestream: false
+          dynamic([mi], ^dynamic and mi.livestream == false)
+
+        _ ->
+          dynamic
+      end
+    end)
+  end
 end
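The combined effect of the clauses in `build_format_clauses` can be hard to read off the reduce. The sketch below restates it as an in-memory predicate over plain maps. It is an illustration only, not part of the commit: `FormatFilterSketch` and `matches?/2` are hypothetical names, and it assumes that any behaviour value other than `:only` or `:exclude` (written here as `:include`) applies no filter.

```elixir
defmodule FormatFilterSketch do
  # Hypothetical in-memory restatement of the query filter; not the commit's code.
  # Assumption: any behaviour other than :only or :exclude means "no filter".

  def matches?(item, %{shorts_behaviour: shorts, livestream_behaviour: live}) do
    # Case-insensitive substring check, mirroring the ILIKE "%/shorts/%" fragment.
    short? = item.original_url |> String.downcase() |> String.contains?("/shorts/")
    live? = item.livestream == true

    case {shorts, live} do
      # Both set to :only: keep anything that is a short OR a livestream.
      {:only, :only} -> short? or live?
      # A single :only wins over an :exclude on the other setting.
      {:only, _} -> short?
      {_, :only} -> live?
      # Otherwise apply each :exclude independently.
      _ -> (shorts != :exclude or not short?) and (live != :exclude or not live?)
    end
  end
end

# Made-up item and profile for illustration.
short = %{original_url: "https://example.com/shorts/abc", livestream: false}
profile = %{shorts_behaviour: :exclude, livestream_behaviour: :include}
IO.inspect(FormatFilterSketch.matches?(short, profile))
```

One consequence worth noting: when one setting is `:only`, an `:exclude` on the other setting adds nothing, which is what the `when lb != :only` and `when sb != :only` guards in the diff express.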

lib/pinchflat/media/media_item.ex

+5 -1
@@ -13,17 +13,21 @@ defmodule Pinchflat.Media.MediaItem do
   @allowed_fields ~w(
     title
     media_id
+    original_url
+    livestream
     media_filepath
     source_id
     subtitle_filepaths
     thumbnail_filepath
     metadata_filepath
   )a
-  @required_fields ~w(media_id source_id)a
+  @required_fields ~w(title original_url livestream media_id source_id)a

   schema "media_items" do
     field :title, :string
     field :media_id, :string
+    field :original_url, :string
+    field :livestream, :boolean, default: false
     field :media_filepath, :string
     field :thumbnail_filepath, :string
     field :metadata_filepath, :string

lib/pinchflat/media_client/backends/yt_dlp/video_collection.ex

+14 -6
@@ -4,18 +4,26 @@ defmodule Pinchflat.MediaClient.Backends.YtDlp.VideoCollection do
   videos (aka: a source [ie: channels, playlists]).
   """

+  alias Pinchflat.Utils.FunctionUtils
+
   @doc """
-  Returns a list of strings representing the video ids in the collection.
+  Returns a list of maps representing the videos in the collection.

-  Returns {:ok, [binary()]} | {:error, any, ...}.
+  Returns {:ok, [map()]} | {:error, any, ...}.
   """
-  def get_video_ids(url, command_opts \\ []) do
+  def get_media_attributes(url, command_opts \\ []) do
     runner = Application.get_env(:pinchflat, :yt_dlp_runner)
     opts = command_opts ++ [:simulate, :skip_download]

-    case runner.run(url, opts, "%(id)s") do
-      {:ok, output} -> {:ok, String.split(output, "\n", trim: true)}
-      res -> res
+    case runner.run(url, opts, "%(.{id,title,was_live,original_url})j") do
+      {:ok, output} ->
+        output
+        |> String.split("\n", trim: true)
+        |> Enum.map(&Phoenix.json_library().decode!/1)
+        |> FunctionUtils.wrap_ok()
+
+      res ->
+        res
     end
   end
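The `%(.{id,title,was_live,original_url})j` template makes yt-dlp print the selected fields as one JSON object per video, so the output is newline-delimited JSON. A minimal standalone sketch of the parsing step follows. It assumes Jason as the JSON library (fetched with `Mix.install`, so it needs network access on first run), whereas the commit resolves the library through `Phoenix.json_library()`. `OutputParsingSketch` and the sample output are hypothetical.

```elixir
# Assumption: Jason is installed for this standalone script only;
# the commit itself calls Phoenix.json_library() inside the app.
Mix.install([{:jason, "~> 1.4"}])

defmodule OutputParsingSketch do
  # Splits newline-delimited JSON into a list of maps, one per video.
  # trim: true drops the empty string left by the trailing newline.
  def parse(output) do
    output
    |> String.split("\n", trim: true)
    |> Enum.map(&Jason.decode!/1)
  end
end

# Hypothetical runner output with made-up values.
output = """
{"id": "abc123", "title": "First video", "was_live": false, "original_url": "https://example.com/watch?v=abc123"}
{"id": "def456", "title": "Second video", "was_live": true, "original_url": "https://example.com/watch?v=def456"}
"""

output
|> OutputParsingSketch.parse()
|> Enum.map(& &1["id"])
|> IO.inspect()
```

Decoding yields string keys such as `"id"` and `"was_live"`, so whatever maps these onto the `media_id` and `livestream` fields of a media item has to translate the names.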
