Add the possibility to exclude a page from being included #4

Open
andreamoro opened this issue Feb 8, 2015 · 18 comments

@andreamoro

Not all pages should be included in the sitemap, e.g. the thank-you page.
For this reason it would be great to have a tag in the YAML frontmatter that is recognised as a way to exclude a page.

@gitviola

gitviola commented Feb 9, 2015

Yes, it would be great to have the option to exclude pages!

@jeremysmithco
Contributor

Here's what that could look like: #5

@andreamoro
Author

It should probably be something like
Sitemap-ignore: true

Just to avoid any confusion with additional plugins, but also to make it 100% clear to the person using MM.

@gitviola

I agree with @andreamoro!

Also it would be great to exclude entire directories in the config.rb. Currently I am using my own helper for that:

def in_sitemap?(page)
  # Keep .html pages that are not flagged noindex and whose path does not contain "api"
  page.path =~ /\.html/ && !page.data.noindex && page.path !~ /api/
end

@andreamoro
Author

@schurig have you already done some implementation to work on top of the plugin?

It would be great if you could share all of the code, as I need it for a project of mine but am short on time at present.

@gitviola

Unfortunately not. I'm not using any plugin at the moment. The reason is this issue here - I really need to exclude pages and directories. But I can share the entire code that I'm using at the moment:

# sitemap.xml.builder

xml.instruct!
xml.urlset 'xmlns' => 'http://www.sitemaps.org/schemas/sitemap/0.9' do
  sitemap.resources.select { |page| in_sitemap?(page) }.each do |page|
    xml.url do
      # A <url> entry takes a single <loc>, holding the absolute URL
      xml.loc site_url + page.url
      xml.lastmod Date.today.to_time.iso8601
      xml.changefreq page.data.changefreq || 'monthly'
      xml.priority page.data.priority || '0.9'
    end
  end
end

# config.rb

require 'builder'

helpers do
  def in_sitemap?(page)
    # Keep .html pages that are not flagged noindex and whose path does not contain "api"
    page.path =~ /\.html/ && !page.data.noindex && page.path !~ /api/
  end
end

# Gemfile

gem 'builder'

Hope that helps! :)

@andreamoro
Author

@schurig thanks for the code.
I believe your solution does what it says out of the box and really doesn't require the use of the plugin. Unless I'm missing something?

@gitviola

@andreamoro almost! It unfortunately doesn't generate a sitemap.xml.gz file.
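
For anyone missing only the .gz file, here is a minimal sketch of compressing an already generated sitemap with Ruby's standard Zlib library. The helper name `gzip_file` is made up for illustration and is not part of any plugin; where you call it from (for example a post-build step) depends on your setup.

```ruby
require 'zlib'

# Hypothetical helper (an assumption, not plugin API):
# writes a gzip-compressed copy of the file at `path` to "<path>.gz"
# and returns the path of the compressed file.
def gzip_file(path)
  gz_path = "#{path}.gz"
  # binread/binwrite avoid any encoding or newline translation
  File.binwrite(gz_path, Zlib.gzip(File.binread(path)))
  gz_path
end
```

Calling `gzip_file('build/sitemap.xml')` after the build would leave `build/sitemap.xml.gz` next to the original, assuming the build output lives in `build/`.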

@jeremysmithco
Contributor

@andreamoro I was concerned about the frontmatter options colliding as well. Actually, I think it would be best to just namespace them all, like this:

---
sitemap:
  changefreq: weekly
  priority: 0.3
  ignore: true
---

That way, all options are accessible from the sitemap. namespace.

Since this would be a breaking change, it would probably be best to release it with a new major version, so people who are updating minor/patch versions don't get hosed when all their frontmatter options suddenly stop working.
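
A rough sketch of how the namespaced options could be read, assuming the frontmatter is available as a plain Hash with string keys. The helper name `sitemap_options` and the default values are assumptions for illustration only, not what the plugin does.

```ruby
# Assumed defaults, chosen only for this illustration.
SITEMAP_DEFAULTS = {
  'changefreq' => 'monthly',
  'priority'   => 0.5,
  'ignore'     => false
}.freeze

# Hypothetical helper: returns the options found under the `sitemap`
# key of the frontmatter, with defaults filled in for missing keys.
# Top-level keys such as `priority` are deliberately not consulted,
# which is what makes the namespacing a breaking change.
def sitemap_options(frontmatter)
  namespaced = frontmatter['sitemap']
  namespaced = {} unless namespaced.is_a?(Hash)
  SITEMAP_DEFAULTS.merge(namespaced)
end
```

With the frontmatter shown above, `sitemap_options` would return the three given values; a page with no `sitemap` block would get the defaults.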

@andreamoro
Author

@bentoncreation that makes absolute sense, and it leaves room to expand the project. E.g. if you want to include images in the sitemap, adding something like the following would let them be easily parsed and appended to the page's entry.

sitemap:
  images:
    - loc: http://www.....
      caption: bla bla
      title: this is the title of image 1
    - loc: http://www.....
      caption: bla bla
      title: this is the title of image 2
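
One detail worth sketching: if several images are to survive parsing, writing them as a YAML list is the safer shape, since repeated keys inside one mapping generally collapse into a single entry. The snippet below is only an illustration using Ruby's standard YAML library; the helper name `image_entries` and the example.com URLs are made up.

```ruby
require 'yaml'

# Hypothetical helper: parses frontmatter text and returns the list of
# image hashes found under sitemap -> images (empty if there are none).
def image_entries(frontmatter_text)
  data = YAML.safe_load(frontmatter_text) || {}
  sitemap = data['sitemap']
  images = sitemap.is_a?(Hash) ? sitemap['images'] : nil
  images.is_a?(Array) ? images : []
end

# Placeholder URLs, for illustration only.
EXAMPLE_FRONTMATTER = <<~FRONTMATTER
  sitemap:
    images:
      - loc: http://example.com/one.png
        title: this is the title of image 1
      - loc: http://example.com/two.png
        title: this is the title of image 2
FRONTMATTER
```

Here `image_entries(EXAMPLE_FRONTMATTER)` yields two hashes, one per image, which a builder template could then loop over.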

@jeremysmithco
Contributor

@andreamoro Yeah, totally!

@andreamoro
Author

So we have to wait for @stantonjr to code this bit :)

@jeremysmithco
Contributor

@schurig I was thinking about how you might ignore whole directories and I'm wondering if this makes sense. In your config, have an ignored_paths option, like so:

activate :sitemap do |sitemap|
  sitemap.hostname = "http://www.mysite.com"
  sitemap.ignored_paths = %W(
    /private
    /stuff
  )
end

And then, when getting pages (in my proposed private get_pages method), filter out those that match anything found in ignored_paths.
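
The filtering step could look roughly like this. It is a sketch under assumptions: `Page` is a stand-in struct rather than the real resource class, and `ignored_path?` and `filter_pages` are hypothetical names, not the proposed get_pages method itself.

```ruby
# Stand-in for a page resource; only `url` is used here (assumption).
Page = Struct.new(:url)

# True when `url` is the ignored path itself or lies underneath it.
# Matching on "<prefix>/" keeps "/private-notes.html" from being
# caught by an ignored path of "/private".
def ignored_path?(url, ignored_paths)
  ignored_paths.any? do |prefix|
    url == prefix || url.start_with?("#{prefix.chomp('/')}/")
  end
end

# Returns only the pages that fall outside every ignored path.
def filter_pages(pages, ignored_paths)
  pages.reject { |page| ignored_path?(page.url, ignored_paths) }
end
```

With `ignored_paths = %w(/private /stuff)`, a page at `/private/draft.html` would be dropped while `/index.html` would be kept.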

@gitviola

@bentoncreation sounds good! But what about single pages? I think there are situations where you want to exclude pages without writing

sitemap:
  ignore: true

into them.

activate :sitemap do |sitemap|
  sitemap.hostname = "http://www.mysite.com"
  sitemap.ignore = %r{^/api/contact_form\.php}
end

@jeremysmithco
Contributor

@schurig What kind of situations are you thinking of? I think your .php file example would already be excluded because the sitemap builder is only looking at .html files.

@gitviola

@bentoncreation oh, you're right! However, I think it would still be good to let users decide whether to manage their ignores in config.rb or in each individual file. But for now, writing it in the file will be good enough, I think.
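
To make the "both places" idea concrete, here is a small sketch in which a page is excluded if either its own frontmatter or a config-level rule says so. Everything named here (`excluded?`, the `ignore_rules` list, frontmatter as a plain Hash) is an assumption for illustration, not existing plugin behaviour.

```ruby
# Hypothetical check combining the two sources of ignores:
#   * frontmatter: a Hash that may contain sitemap -> ignore: true
#   * ignore_rules: config-level rules, each a String prefix or a Regexp
def excluded?(url, frontmatter, ignore_rules)
  sitemap = frontmatter['sitemap']
  return true if sitemap.is_a?(Hash) && sitemap['ignore'] == true

  ignore_rules.any? do |rule|
    rule.is_a?(Regexp) ? rule.match?(url) : url.start_with?(rule)
  end
end
```

Because either source alone is enough to exclude a page, the two settings cannot contradict each other; the cost is that you may have to look in two places to find out why a page is missing.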

@jeremysmithco
Contributor

@schurig Yeah, I could see that. I wouldn't normally think it was a good idea to have multiple ways to set the same option, but maybe it's not a big deal in this case. If I get my other pull request accepted I may look at adding this concept as well.

@andreamoro
Author

I believe there should not be two ways of excluding a page that can clash with each other. But that's just my opinion.


3 participants