Topic-Modelling-for-PornHub-s-Titles

Topic modelling of PornHub video titles, to explore whether the words used in the titles of general videos differ from those used for videos popular with the female audience, and so whether the title wording is related to the target audience.

Number of Titles:

General Videos: 48009
Popular With Women: 40570

Steps:

Scrape the site using Beautiful Soup and Requests to obtain:
1. Video Title;
2. Video Rating;
3. Video Views;
4. Video Length;
for both general videos (link: https://www.pornhub.com/video) and videos popular with women (link: https://www.pornhub.com/popularwithwomen);
Detect the language of each title and delete every non-English word and every number, so that they do not distort the analysis;
Write the titles to two CSV files, one for general videos and one for videos popular with women;
Perform topic modelling on each set of titles using Latent Dirichlet Allocation and GridSearchCV, and produce two HTML files with the results and the most used words.

Issues: before performing LDA you have to patch two files of the installed "pyLDAvis" package, in particular:

In "sklearn.py" of the package "pyLDAvis" (used to visualize the results), inside the function "_get_vocab", replace the method "get_feature_names" with "get_feature_names_out";
In "_prepare.py" of the package "pyLDAvis", in the call to the method "drop" chained after head(R), replace the positional argument "1" with "axis=1": the parameter "axis" has to be passed by keyword to avoid an error.

Besides this README, the repository contains:

Topic_Modelling_PH.py, the script;
title_videos.csv, the titles of general videos used for the analysis, as of 18/08/2023;
title_videos_fem.csv, as the previous point but with the titles of videos popular with women;
LDA_panel_female_videos.html, the visualization of the LDA results for videos popular with women;
LDA_panel_general_videos.html, the visualization of the LDA results for general videos.

dyomed93/Topic-Modelling-for-PornHub-s-Titles
