Version/v0.3 #24
Merged
Changes from 8 commits
36 commits
d9040c3
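🎈 perf: optimize the visualization graph
ligeaaa e3e1683
🦄 refactor: restructure the database, splitting crawler status from blog status so their semantics no longer overlap
ligeaaa db61b2a
🎈 perf: improve the icon_url extraction logic
ligeaaa bdffd60
🐞 fix: icons not loading in the graph
ligeaaa 40704e8
📃 docs:
ligeaaa bd022ba
🐞 fix: some edges between blogs were not persisted correctly
ligeaaa ed9d044
✨ feat: add tests for the visualization graph
ligeaaa 63406a2
🐞 fix: ruff
ligeaaa d7f18c1
📃 docs: readme
ligeaaa 4880a8e
✨ feat: clean up the home page
ligeaaa 4856a02
🎈 perf: optimize the selection of render ticks
ligeaaa 8482837
✨ feat: add a “compact” mode to the visualization
ligeaaa 69a193a
🦄 refactor: rework the home page
ligeaaa 8b0c34f
✨ feat: first version of the blog detail page
ligeaaa 5927259
✨ feat: improve blog search; blogs that are not found can now be added to the network directly
ligeaaa 911f8d0
✨ feat: add “discovery path” and “blog relation graph” to the blog detail page
ligeaaa da18deb
🐞 fix: ruff
ligeaaa 69e0f24
🐞 fix: update the corresponding test cases (changing test cases feels odd)
ligeaaa 00ae89b
✨ feat: add a blog detail entry point to the random blog page
ligeaaa f461766
✨ feat: track statistics for users opening blog details and external links
ligeaaa b9b51b7
🎈 perf: opening blog details from the random blog page now opens a new tab
ligeaaa 2aaa8da
🎈 perf: random blog weighting no longer considers the blog label
ligeaaa 242d5b7
🐳 chore: update seed.csv
ligeaaa adfcc44
🦄 refactor: remove the early “request to add a blog” and “retry filter chain” code and database
ligeaaa e784808
🦄 refactor: drop the blog_id column from the blog_interactions and recommendation_impressions tables
ligeaaa d3ed85d
🐳 chore: update seed
ligeaaa 3f33a2e
✨ feat: show the current blog's crawler status on the blog detail page
ligeaaa 61562fc
🐞 fix: ruff
ligeaaa 46f87ad
🎈 perf: optimize the visualization graph
ligeaaa 401420d
Delete tracker directory
ligeaaa e58aa56
✨ feat: improve the user system
ligeaaa 9ae18e5
Merge branch 'version/v0.3' of github.com:ligeaaa/HeyBlog into versio…
ligeaaa effbe83
✨ feat: add statistics
ligeaaa 62867de
🎈 perf: disable token exposure by default
ligeaaa c494823
🐞 fix: pytest
ligeaaa 13aa4e3
✨ feat: start the crawler automatically every hour
ligeaaa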
115 changes: 115 additions & 0 deletions
alembic/versions/20260602_01_add_blog_acceptance_status.py
@@ -0,0 +1,115 @@

    """Split blog acceptance from crawl execution status.

    Revision ID: 20260602_01
    Revises: 20260601_01
    Create Date: 2026-06-02 21:30:29 BST
    """

    from __future__ import annotations

    from alembic import op
    import sqlalchemy as sa


    revision = "20260602_01"
    down_revision = "20260601_01"
    branch_labels = None
    depends_on = None


    def _columns(table_name: str) -> set[str]:
        """Return currently present column names for one table.

        Args:
            table_name: Database table name to inspect.

        Returns:
            Set of column names currently present in the database.
        """
        return {column["name"] for column in sa.inspect(op.get_bind()).get_columns(table_name)}


    def upgrade() -> None:
        """Add acceptance and crawl-error fields, then backfill accepted graph rows.

        Args:
            None.

        Returns:
            None. The migration mutates the active database schema in place.
        """
        blog_columns = _columns("blogs")
        if "acceptance_status" not in blog_columns:
            op.add_column(
                "blogs",
                sa.Column("acceptance_status", sa.Text(), nullable=False, server_default="UNKNOWN"),
            )
        for column_name in (
            "accepted_by",
            "crawl_error_kind",
            "crawl_error_message",
        ):
            if column_name not in blog_columns:
                op.add_column("blogs", sa.Column(column_name, sa.Text(), nullable=True))
        for column_name in (
            "accepted_at",
            "last_crawl_attempt_at",
            "successful_crawl_at",
        ):
            if column_name not in blog_columns:
                op.add_column("blogs", sa.Column(column_name, sa.DateTime(timezone=True), nullable=True))

        op.execute(
            """
            UPDATE blogs b
            SET acceptance_status = 'ACCEPTED',
                accepted_by = COALESCE(b.accepted_by, r.accepted_by, 'unknown'),
                accepted_at = COALESCE(b.accepted_at, r.updated_at, b.created_at)
            FROM raw_discovered_urls r
            WHERE b.normalized_url = r.normalized_url
              AND r.status = 'success'
              AND b.acceptance_status = 'UNKNOWN'
            """
        )
        op.execute(
            """
            UPDATE blogs
            SET acceptance_status = 'ACCEPTED',
                accepted_by = COALESCE(accepted_by, 'seed'),
                accepted_at = COALESCE(accepted_at, created_at)
            WHERE acceptance_status = 'UNKNOWN'
              AND blog_id NOT IN (SELECT to_blog_id FROM edges)
            """
        )
        op.execute(
            """
            UPDATE blogs
            SET acceptance_status = 'ACCEPTED',
                accepted_by = COALESCE(accepted_by, 'graph'),
                accepted_at = COALESCE(accepted_at, created_at)
            WHERE acceptance_status = 'UNKNOWN'
              AND blog_id IN (SELECT from_blog_id FROM edges UNION SELECT to_blog_id FROM edges)
            """
        )


    def downgrade() -> None:
        """Remove acceptance and crawl-error fields.

        Args:
            None.

        Returns:
            None. The migration mutates the active database schema in place.
        """
        for column_name in (
            "successful_crawl_at",
            "last_crawl_attempt_at",
            "crawl_error_message",
            "crawl_error_kind",
            "accepted_at",
            "accepted_by",
            "acceptance_status",
        ):
            if column_name in _columns("blogs"):
                op.drop_column("blogs", column_name)
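
The three `UPDATE` statements in `upgrade()` run in a fixed order, and each one only touches rows that are still `'UNKNOWN'`, so the order decides which `accepted_by` label a blog receives. The sketch below mirrors that precedence in plain Python to make it easier to reason about. It is not part of the PR: the function name `backfill_accepted_by` and its arguments are hypothetical, and it assumes every blog starts with `accepted_by` unset (the migration's `COALESCE` would otherwise keep an existing value).

```python
from __future__ import annotations


def backfill_accepted_by(
    blog_ids: list[int],
    crawl_success: dict[int, str | None],
    edges: list[tuple[int, int]],
) -> dict[int, str]:
    """Return the accepted_by label each blog would get from the backfill.

    blog_ids: all blogs, assumed to start as 'UNKNOWN' with accepted_by unset.
    crawl_success: blog id -> accepted_by of a matching raw_discovered_urls
        row whose status is 'success' (None if that row has no accepted_by).
    edges: (from_blog_id, to_blog_id) pairs.
    """
    has_incoming = {to_id for _, to_id in edges}
    labels: dict[int, str] = {}
    for blog_id in blog_ids:
        if blog_id in crawl_success:
            # First UPDATE: take the raw row's value, else 'unknown'.
            labels[blog_id] = crawl_success[blog_id] or "unknown"
        elif blog_id not in has_incoming:
            # Second UPDATE: no incoming edge, so treated as a seed.
            labels[blog_id] = "seed"
        else:
            # Third UPDATE: whatever is left has an incoming edge.
            labels[blog_id] = "graph"
    return labels


example = backfill_accepted_by(
    blog_ids=[1, 2, 3, 4],
    crawl_success={2: "rule", 3: None},
    edges=[(1, 2), (1, 3), (1, 4)],
)
```

One consequence worth noting: a blog with no edges at all and no successful raw row falls into the second branch and is labelled `'seed'`, because the second statement only checks for the absence of incoming edges.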
For attacker-controlled icon hosts, this fetch re-resolves `current_url` after `_validate_icon_proxy_url` has already done its safety check, so DNS rebinding can return a public address during validation and a loopback/link-local address for the actual `httpx.stream` connection. Because this endpoint proxies arbitrary user-supplied icon URLs, that still allows SSRF against internal services despite the private-address guard; the fetch needs to use the validated resolved address (or otherwise prevent a second unsafe resolution) for every redirect hop.
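
One way to read the suggestion is "resolve once, validate that result, and connect to exactly that address". The sketch below illustrates only the resolve-and-validate step using the standard library. It is not the project's code: `pin_validated_address`, `system_resolver`, and the `Resolver` type are hypothetical names, and how the pinned address is then handed to the HTTP client is deliberately left out because it depends on how the client is configured.

```python
from __future__ import annotations

import ipaddress
import socket
from typing import Callable, Iterable
from urllib.parse import urlsplit

# Hypothetical resolver signature: hostname -> IP address strings.
Resolver = Callable[[str], Iterable[str]]


def system_resolver(host: str) -> list[str]:
    """Resolve a hostname with the OS resolver (a single lookup)."""
    infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def is_public_address(raw: str) -> bool:
    """Return True only for globally routable addresses."""
    return ipaddress.ip_address(raw).is_global


def pin_validated_address(url: str, resolve: Resolver = system_resolver) -> tuple[str, str]:
    """Resolve the URL's host exactly once and return (hostname, pinned_ip).

    Raises ValueError if the URL is not http(s), has no host, or if any
    resolved address is not public. The caller is expected to connect to
    pinned_ip rather than resolving the hostname a second time.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported URL: {url!r}")
    addresses = list(resolve(parts.hostname))
    if not addresses or not all(is_public_address(a) for a in addresses):
        raise ValueError(f"host resolves to a non-public address: {parts.hostname!r}")
    return parts.hostname, addresses[0]


# Simulated rebinding: the first lookup is public, later lookups are loopback.
lookups: list[str] = []


def rebinding_resolver(host: str) -> list[str]:
    lookups.append(host)
    return ["8.8.8.8"] if len(lookups) == 1 else ["127.0.0.1"]


pinned = pin_validated_address("http://icons.example/favicon.ico", rebinding_resolver)
```

Under this approach the same function would be called again for each redirect hop, so every new host is resolved once, checked, and pinned before any connection is made. The original hostname is returned alongside the address because the request still needs it for the `Host` header and, for HTTPS, for certificate verification.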