
MabelHsu/news-text-pattern-check


News Text Pattern Self-Check

A Streamlit app for finding repetitive phrases and template-like structures in text exported to CSV (e.g., Facebook/Instagram post titles, descriptions, and captions). It handles mixed Chinese/English text and can surface attribution-heavy phrasing, platform terms, and near-duplicate headlines.

Features

  • Upload CSV and select which text columns to analyze
  • Language-aware normalization (Latin text is lowercased; CJK characters are kept as-is)
  • N-gram mining (1–5) with document frequency
  • Chinese character-level n-grams to capture short words like 「自爆」 ("self-reveal") and 「自嘲」 ("self-mockery")
  • Near-duplicate headline/description detection using fuzzy matching
  • Keyword quick check
  • Download normalized and annotated results
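The app's own upload and column-picking code is not shown in this README. As a rough, stdlib-only sketch of the same idea (read an exported CSV, combine the chosen text columns, and produce an annotated CSV for download), assuming UTF-8 input; `annotate_csv` and the `combined_text` column are hypothetical names, not the app's API:

```python
import csv
import io


def annotate_csv(raw: bytes, text_columns: list[str]) -> str:
    """Combine the chosen text columns and return CSV text with an extra
    'combined_text' column (hypothetical name, for illustration only)."""
    # utf-8-sig tolerates a leading BOM, which some spreadsheet tools add.
    reader = csv.DictReader(io.StringIO(raw.decode("utf-8-sig")))
    fieldnames = reader.fieldnames or []
    missing = [c for c in text_columns if c not in fieldnames]
    if missing:
        raise KeyError(f"columns not found: {missing}")

    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=[*fieldnames, "combined_text"], extrasaction="ignore"
    )
    writer.writeheader()
    for row in reader:
        # Short rows yield None for missing cells; treat them as empty.
        parts = [(row[c] or "").strip() for c in text_columns]
        row["combined_text"] = " | ".join(p for p in parts if p)
        writer.writerow(row)
    return out.getvalue()
```

Keeping every original column in the output means the annotated file can be joined back to the source export without extra bookkeeping.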
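A minimal sketch of the normalization step, assuming Unicode NFKC folding (so full-width letters, digits, and punctuation common in Chinese text map to their ASCII forms) followed by lowercasing and whitespace collapsing. The `normalize` helper is hypothetical and the app's exact rules may differ:

```python
import re
import unicodedata

# Runs of whitespace; after NFKC this also covers the ideographic space U+3000.
_WS = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Lowercase Latin text and keep CJK characters unchanged (sketch)."""
    # Assumption: NFKC first, e.g. 'Ｂｒｅａｋｉｎｇ' -> 'Breaking', '：' -> ':'.
    text = unicodedata.normalize("NFKC", text)
    # str.lower() only changes cased scripts, so Chinese passes through as-is.
    text = text.lower()
    return _WS.sub(" ", text).strip()


print(normalize("Ｂｒｅａｋｉｎｇ  News：網友自爆"))  # breaking news:網友自爆
```

NFKC is the choice most likely to differ from the real app, since it also rewrites full-width punctuation, which some analyses prefer to keep.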
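Document frequency counts how many rows contain an n-gram at least once, so a phrase repeated three times inside one caption still counts as 1. A sketch of word n-grams (1 to 5) with that counting rule, assuming a simple regex tokenizer over lowercased Latin text; `ngram_doc_freq` and `word_ngrams` are hypothetical names:

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: Latin words, digits, and apostrophes form tokens.
_TOKEN = re.compile(r"[a-z0-9']+")


def word_ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-token windows."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_doc_freq(docs: Iterable[str], n_min: int = 1, n_max: int = 5) -> Counter:
    """Map each n-gram (space-joined) to the number of docs containing it."""
    df = Counter()
    for doc in docs:
        tokens = _TOKEN.findall(doc.lower())
        seen = set()
        for n in range(n_min, n_max + 1):
            seen.update(" ".join(g) for g in word_ngrams(tokens, n))
        df.update(seen)  # a set, so each n-gram counts at most once per doc
    return df


docs = [
    "Police say the suspect fled",
    "Police say the fire is out",
    "Residents say the road is closed",
]
df = ngram_doc_freq(docs)
print(df["say the"], df["police say the"])  # 3 2
```

Filtering for n-grams with a document frequency of 2 or more surfaces shared phrasing such as "police say the" across otherwise different headlines.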
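Chinese has no spaces between words, so a whitespace tokenizer would treat 「網友自爆內幕」 as a single token. Character-level n-grams sidestep this: every 2-character window becomes a candidate, which is how short words like 「自爆」 show up. A sketch assuming n-grams are taken only within unbroken runs of CJK ideographs (so they never span punctuation or Latin text) and a 2 to 3 character range; both choices, and the helper names, are assumptions:

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: CJK Unified Ideographs plus Extension A cover typical news text.
_CJK_RUN = re.compile(r"[\u3400-\u4dbf\u4e00-\u9fff]+")


def cjk_char_ngrams(text: str, n: int) -> list[str]:
    """Character n-grams inside each run of consecutive CJK ideographs."""
    grams: list[str] = []
    for run in _CJK_RUN.findall(text):
        grams.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return grams


def cjk_doc_freq(docs: Iterable[str], n_min: int = 2, n_max: int = 3) -> Counter:
    """Number of documents containing each character n-gram."""
    df = Counter()
    for doc in docs:
        seen = set()
        for n in range(n_min, n_max + 1):
            seen.update(cjk_char_ngrams(doc, n))
        df.update(seen)
    return df


print(cjk_char_ngrams("網友自爆：超尷尬", 2))
# ['網友', '友自', '自爆', '超尷', '尷尬']
```

Many windows (like 「友自」) are not real words; ranking by document frequency tends to push genuine recurring words such as 「自爆」 to the top.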
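The README does not say which fuzzy-matching library the app uses. The sketch below swaps in the standard library's `difflib.SequenceMatcher`, comparing every pair of texts and keeping pairs whose similarity ratio meets a threshold; the 0.85 default and the `near_duplicates` name are assumptions:

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicates(texts: list[str], threshold: float = 0.85) -> list[tuple[int, int, float]]:
    """Return (i, j, ratio) for every pair with similarity >= threshold."""
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        # ratio() = 2 * matched_chars / (len(a) + len(b)), in [0, 1].
        ratio = SequenceMatcher(None, texts[i], texts[j]).ratio()
        if ratio >= threshold:
            pairs.append((i, j, round(ratio, 3)))
    return pairs


headlines = [
    "Police say suspect fled the scene",
    "Police say suspect fled the scene!",
    "Weather: heavy rain expected tonight",
]
print(near_duplicates(headlines))  # [(0, 1, 0.985)]
```

Pairwise comparison is O(n²), which is workable for a few thousand rows; for larger exports, a dedicated library such as RapidFuzz is considerably faster than `difflib`.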
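A keyword check can be as simple as case-insensitive substring matching, which works for Chinese without any tokenization. A sketch, assuming the check reports which row indices contain each keyword (`keyword_hits` is a hypothetical name):

```python
def keyword_hits(docs: list[str], keywords: list[str]) -> dict[str, list[int]]:
    """Map each keyword to the indices of documents that contain it."""
    lowered = [d.lower() for d in docs]  # lowercase once, reuse per keyword
    return {
        kw: [i for i, d in enumerate(lowered) if kw.lower() in d]
        for kw in keywords
    }


docs = ["Netizens say it's fake", "網友自爆內幕", "Official statement"]
print(keyword_hits(docs, ["網友", "SAY"]))  # {'網友': [1], 'SAY': [0]}
```

Substring matching also fires inside longer words (e.g. "say" in "essay"); a word-boundary regex would avoid that for Latin text at the cost of no longer applying cleanly to Chinese.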

Quickstart

python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\Activate
pip install -r requirements.txt
streamlit run app.py
