News Text Pattern Self-Check

A Streamlit app for discovering repetitive phrases and template-like structures in CSV exports (e.g., Facebook/Instagram titles, descriptions, and captions). It supports mixed Chinese/English text and can surface attribution-heavy phrasing, platform terms, and near-duplicate headlines.

Features

Upload CSV and select which text columns to analyze
Language-aware normalization (lowercase Latin, retain CJK)
N-gram mining (1–5) with document frequency
Chinese character-level n-grams to capture short words like 「自爆」「自嘲」
Near-duplicate headline/description detection using fuzzy matching
Keyword quick check
Download normalized and annotated results
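
The normalization, n-gram mining, and near-duplicate features above can be illustrated with a minimal, standard-library-only sketch. This is not the code in app.py: the helper names (`normalize`, `tokenize`, `document_frequency`, `near_duplicates`), the 0.85 similarity threshold, and the use of `difflib.SequenceMatcher` for fuzzy matching are all assumptions made for illustration, since this README does not say which fuzzy-matching library the app uses.

```python
import re
import unicodedata
from collections import Counter
from difflib import SequenceMatcher

# Assumption: a token is either one CJK ideograph (basic block only)
# or a run of lowercase Latin letters/digits.
TOKEN = re.compile(r"[\u4e00-\u9fff]|[a-z0-9]+")


def normalize(text: str) -> str:
    """NFKC-normalize, lowercase Latin letters, collapse whitespace.

    str.lower() leaves CJK characters unchanged, so they are retained as-is.
    """
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text.lower()).strip()


def tokenize(text: str) -> list[str]:
    """Split normalized text into Latin words and single CJK characters."""
    return TOKEN.findall(normalize(text))


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return all contiguous n-grams of the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_frequency(docs: list[str], n_min: int = 1, n_max: int = 5) -> Counter:
    """Count how many documents contain each n-gram at least once."""
    df: Counter = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        seen = set()  # de-duplicate within a document before counting
        for n in range(n_min, n_max + 1):
            seen.update(ngrams(tokens, n))
        df.update(seen)
    return df


def near_duplicates(docs: list[str], threshold: float = 0.85) -> list[tuple[int, int, float]]:
    """Return (i, j, ratio) for every pair whose similarity meets the threshold.

    Pairwise comparison is O(n^2); acceptable for a sketch, slow for large files.
    """
    normalized = [normalize(d) for d in docs]
    pairs = []
    for i in range(len(normalized)):
        for j in range(i + 1, len(normalized)):
            ratio = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
            if ratio >= threshold:
                pairs.append((i, j, ratio))
    return pairs


docs = [
    "網紅自爆 被騙 Facebook",
    "歌手自爆 Facebook 帳號",
    "Breaking News: Storm hits",
    "breaking news: storm hits!",
]
df = document_frequency(docs)
pairs = near_duplicates(docs)
```

Because each CJK character is its own token, the two-character word 「自爆」 appears as the bigram `("自", "爆")` without needing a word segmenter, and `df[("自", "爆")]` counts the documents that contain it. In this sample, only the two English headlines are flagged as near-duplicates.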

Quickstart

python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\Activate
pip install -r requirements.txt
streamlit run app.py
