-
One of the great appeals of deep learning is how much it democratizes modeling for practitioners. Whereas in the past statisticians or econometricians needed a fair amount of domain knowledge to build models, deep learning and ensemble methods relax that requirement by automatically exploring combinations of interactions, making it unnecessary to hand-engineer specific interaction terms. That said, there are still times when feature engineering, aided by domain knowledge, would benefit model building, and combining those engineered features with different data types, such as text/NLP alongside regular numeric data, strikes me as really interesting. There is a short blog post I found here that introduces the idea, although I haven't really seen any classes or tutorials do it with deep learning.

I'm actually attempting it myself, but am running into a bit of trouble. I know this goes beyond the scope of the course, but I'm curious if anyone can spot how to fix this. I'm using the wine reviews dataset from Kaggle and turned the points column, which ranges from 0 to 100, into a dichotomous target variable so it's a more clear-cut classification problem. There is a description column with the actual review text, along with other variables, but for the sake of this proof of concept the only additional numeric features I'm using are a scaled word count of the description and a scaled price (there could be many more useful ones, but I'm keeping it simple for now). I tried to follow the blog post as best I could, but it's pretty scant. After loading the data and doing the feature engineering and preprocessing, the workflow and error message are below.
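Roughly, the kind of two-input workflow I'm attempting looks like the sketch below, using the Keras functional API. This is a minimal, illustrative sketch rather than my exact code: the file path, the 90-point cutoff for the target, the vocabulary size, and the layer sizes are all placeholder choices.

```python
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers, Model
from sklearn.preprocessing import StandardScaler

# Kaggle wine reviews file (adjust the path/filename to your copy)
df = pd.read_csv("winemag-data-130k-v2.csv")
df = df.dropna(subset=["description", "price", "points"])

# Dichotomous target: "good" wine if points >= 90 (arbitrary cutoff)
y = (df["points"] >= 90).astype(int).values

# Engineered numeric features: scaled word count of the description and scaled price
df["n_words"] = df["description"].str.split().str.len()
X_num = StandardScaler().fit_transform(df[["n_words", "price"]])

# Text features: tokenize the description and pad to a fixed length
max_words, max_len = 20_000, 100
tok = tf.keras.preprocessing.text.Tokenizer(num_words=max_words)
tok.fit_on_texts(df["description"])
X_text = tf.keras.preprocessing.sequence.pad_sequences(
    tok.texts_to_sequences(df["description"]), maxlen=max_len)

# Two inputs: one branch embeds the text, the other takes the numeric features
text_in = layers.Input(shape=(max_len,), name="text")
num_in = layers.Input(shape=(X_num.shape[1],), name="numeric")

x = layers.Embedding(max_words, 32)(text_in)
x = layers.GlobalAveragePooling1D()(x)
x = layers.concatenate([x, num_in])  # combine the text and numeric branches
x = layers.Dense(32, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)

model = Model(inputs=[text_in, num_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# With multiple inputs, the data is passed as a list (or a dict keyed by input name)
model.fit([X_text, X_num], y, validation_split=0.2, epochs=3, batch_size=64)
```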
And the error message:

Any ideas on how to solve that?
-
Just as an update to this (sorry, but I do find this topic interesting): I found the answer on Stack Overflow, and it's pretty simple.
What's also super interesting is that after changing the model to include both the text and the extracted features, its performance improved DRAMATICALLY. I wasn't necessarily surprised by this, but it was gratifying to see.
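For context on the comparison, what I mean is training a text-only baseline with the same embedding branch and comparing validation accuracy against the two-input model. A rough sketch, reusing the variable names from the first post; the baseline architecture here is just illustrative, not my exact evaluation code:

```python
from tensorflow.keras import layers, Model

# Text-only baseline: same embedding branch as above, but no numeric input.
# Reuses max_len, max_words, X_text, and y from the sketch in the first post.
text_only_in = layers.Input(shape=(max_len,), name="text")
x = layers.Embedding(max_words, 32)(text_only_in)
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dense(32, activation="relu")(x)
out = layers.Dense(1, activation="sigmoid")(x)

baseline = Model(inputs=text_only_in, outputs=out)
baseline.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = baseline.fit(X_text, y, validation_split=0.2, epochs=3, batch_size=64)

# Compare max(history.history["val_accuracy"]) against the two-input model's run
```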