
[!] Reformatted site from https://example.com/folder/#with_index.php# to https://example.com/ #96

Open
martianmaikel opened this issue Aug 6, 2021 · 8 comments

Comments

martianmaikel commented Aug 6, 2021

An important path on my website is reformatted and never written into sitemap.xml.

The first thing the script does is print this warning:

warning: [!] Reformatted site from https://example.com/folder/#with_index.php# to https://example.com/

After that, the script writes every path and page to the sitemap correctly.

Edit: The path is https://example.com/folder/index.php (no hashtags).

vezaynk (Owner) commented Aug 8, 2021

This is definitely bad behavior. I suspect the double # characters are somehow at fault.

martianmaikel (Author) commented

Oh, sorry: there is no hashtag in the filename "index.php".

It is simply ./folder/index.php.

vezaynk (Owner) commented Aug 8, 2021

Ah, okay, I understand. This behaviour is normal. Is there a reason you need to start at that location and not at the root of the domain?

vezaynk (Owner) commented Aug 8, 2021

Sitemaps should generally start at the domain root. Most people use sitemaps for SEO purposes, and if your domain root doesn't link to any of your webpages, your search ranking will already be underwater.
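The warning in the first comment is consistent with the script reducing the start URL to its domain root before crawling. The helper below is a hypothetical approximation of such a step built on PHP's `parse_url()`; the project's actual `domain_root()` is not shown in this thread and may behave differently.

```php
<?php
// Hypothetical approximation of a "reduce to domain root" step.
// The project's real domain_root() may differ; this only illustrates the idea.
function domain_root(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }
    // Keep scheme, host and (if present) port; drop path, query and fragment.
    $root = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }
    return $root . '/';
}

echo domain_root('https://example.com/folder/index.php'), PHP_EOL; // https://example.com/
```

With a step like this, any start URL below the root (such as `/folder/index.php`) is replaced by the root itself, which matches the "Reformatted site" warning.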

martianmaikel (Author) commented Aug 9, 2021

I have an index.php in the domain root. That's why I don't understand why the script transforms a (random?) folder into the root.

I attached a screenshot of the domain structure. The script writes every folder it should include (es, fr, helpcenter, etc.) into the sitemap, but not "ratgeber". That is the important path that needs to be in the sitemap, because it is a blog-like folder.

[Screenshot 2021-08-09 085644: domain folder structure]

vezaynk (Owner) commented Aug 9, 2021

Could you post a link to the site? I'm not sure what's going on, and I don't think I'll be able to figure it out without running it myself.

Repository owner deleted a comment from martianmaikel Aug 10, 2021

vezaynk (Owner) commented Aug 10, 2021

@martianmaikel I deleted your comment with the links. I'll look into the issue soon.

JoyKevinMaldini commented Aug 5, 2022

I guess this is an old script, but I just found it and wanted to share my two cents.

I tried crawling a domain like https://domain.com. This threw the error "URL is a redirect." and the script did not crawl the site, because we have 301 redirects to language tags (e.g. /de, /en) set up.

Then, of course, I tried using https://domain.com/de as the site to crawl, since that route isn't redirected. The script once again reformatted it to the domain root (https://domain.com) and threw the same error ("URL is a redirect.").
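The "URL is a redirect." error suggests the script checks the HTTP status of the start URL before crawling. Below is a minimal sketch of such a check, assuming the standard redirect status codes and PHP's `get_headers()` with redirect following turned off; the function names are hypothetical and the project's actual check may work differently. Because the start URL is first reduced to the domain root, `https://domain.com/de` ends up being checked as `https://domain.com/`, which answers with the 301.

```php
<?php
// Minimal sketch (not the project's code) of a "start URL is a redirect" check.

// HTTP status codes that tell the client to look elsewhere.
const REDIRECT_CODES = [301, 302, 303, 307, 308];

// Extract the numeric code from a status line such as "HTTP/1.1 301 Moved Permanently".
function status_code_from_line(string $status_line): int
{
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $status_line, $m) === 1) {
        return (int) $m[1];
    }
    return 0; // unparseable status line
}

function is_redirect_status(int $code): bool
{
    return in_array($code, REDIRECT_CODES, true);
}

// Network check: send a HEAD request without following redirects, so the
// first status line belongs to $url itself rather than to the redirect target.
function url_is_redirect(string $url): bool
{
    $context = stream_context_create(['http' => ['method' => 'HEAD', 'follow_location' => 0]]);
    $headers = @get_headers($url, false, $context);
    if ($headers === false || !isset($headers[0])) {
        return false; // request failed; this sketch treats that as "not a redirect"
    }
    return is_redirect_status(status_code_from_line($headers[0]));
}
```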

My solution was simply to change this line in sitemap.php from

$real_site = domain_root($site);

to

$real_site = $site;

That seemed to work, but it only crawled the pages of one language tag. So at least it works, even if I have to crawl the site once per language tag, one after another.
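The thread doesn't say why only one language tag was crawled. One plausible explanation (an assumption, not confirmed here) is that the crawler only follows links that start with the start URL. The sketch below shows that kind of prefix check, plus a helper that builds one start URL per language tag so the per-language crawls can be scripted; `in_scope()`, `language_start_urls()` and the tag list are hypothetical and not part of the project.

```php
<?php
// Hypothetical helpers; none of these names come from the project itself.

// Assumed scoping rule: only URLs that begin with the start URL are followed.
// A plain prefix test also accepts e.g. /debug when the start is /de.
function in_scope(string $url, string $start_url): bool
{
    return strpos($url, $start_url) === 0;
}

// Build one start URL per language tag, to be crawled one after another
// (with the `$real_site = $site;` change in place so the path is kept).
function language_start_urls(string $root, array $tags): array
{
    $root = rtrim($root, '/');
    return array_map(function (string $tag) use ($root): string {
        return $root . '/' . $tag;
    }, $tags);
}

foreach (language_start_urls('https://domain.com/', ['de', 'en']) as $start) {
    echo $start, PHP_EOL; // a real run would hand $start to the crawler here
}
```

Running the loop prints `https://domain.com/de` and `https://domain.com/en`; each would then be crawled separately and the resulting URL lists merged into one sitemap.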

Projects
None yet
Development

No branches or pull requests

3 participants