Tips for working with large datasets 

Hi, I'm working with a 200 MB file and calling group_similar_strings, but it takes so long that it never completes (I've left it running for several days). I've tried several n_gram sizes with no luck. Do you have any tips for running it on large datasets?