Skip to content

Conversation

@liyiecho
Copy link

@liyiecho liyiecho commented Mar 19, 2018

Fixes: #65

Copy link
Owner

@dixudx dixudx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the redundant binary file.

try:
medium_url = self._handle_medium_url(medium_type, post)
medium_url_bak = medium_url
medium_url =re.sub(u'[^/]*media.tumblr.com', u'data.tumblr.com', medium_url)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why changing to this?

medium_url = self._handle_medium_url(medium_type, post)
medium_url_bak = medium_url
medium_url =re.sub(u'[^/]*media.tumblr.com', u'data.tumblr.com', medium_url)
if (b'_100.' in medium_url):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this exhaustive way. Hard coded is not a good choice.

Why not splitting the string and replacing with raw instead?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And you should not change at here. Only photos/images are applicable with raw.

Method _download(**) is the right place.

medium_url = self._handle_medium_url(medium_type, post)
if medium_url is not None:
self._download(medium_type, medium_url, target_folder)
#print("medium url is %s", medium_url)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this comment line.

self._download(medium_type, medium_url, target_folder, resp_raw)
elif medium_type == "photo":
medium_url_bak = medium_url
medium_url_dot = medium_url.split('.')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The url parsing here seems complex and error-prone.

Below part is a better way. WDYT?

    def download(self, medium_type, post, target_folder):
        try:
            medium_url = self._handle_medium_url(medium_type, post)
            if medium_url is not None:
                if medium_type == "photo":
                    try:
                        # try to download raw image
                        medium_url_raw = medium_url.replace("68.media.tumblr.com", "data.tumblr.com")
                        raw_matched = self.hd_photo_regex.match(medium_url_raw)
                        if raw_matched is not None:
                            replace_raw = raw_matched.groups()[0]
                            replace_raw = replace_raw.replace(raw_matched.groups()[1], "raw")
                            medium_url_raw = medium_url_raw.replace(raw_matched.groups()[0], replace_raw)
                            self._download(medium_type, medium_url_raw, target_folder)
                            return
                    except:
                        pass
                self._download(medium_type, medium_url, target_folder)
        except TypeError:
            pass

    # can register differnet regex match rules
    def _register_regex_match_rules(self):
        # will iterate all the rules
        # the first matched result will be returned
        self.regex_rules = [video_hd_match(), video_default_match()]
        self.hd_photo_regex = re.compile(r".*(tumblr_\w+_(\d+))", re.IGNORECASE)

@liyiecho
Copy link
Author

medium_url_raw = medium_url.replace("68.media.tumblr.com", "data.tumblr.com")
It doesn't always 68.media.tumblr.com

@dixudx
Copy link
Owner

dixudx commented Mar 19, 2018

It doesn't always 68.media.tumblr.com

@liyiecho So just use regex to match and replace it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants