Standard output #36

Open
ghost opened this issue Sep 8, 2017 · 9 comments

Comments

@ghost

ghost commented Sep 8, 2017

If --file is not defined, shouldn't the sitemap.xml be written to stdout instead of a file?

@vezaynk
Owner

vezaynk commented Sep 8, 2017

It is usually defined. If it is an empty string, sending to stdout would be logical.

logger() would have to be disabled completely in that case to not mess it up.

@ghost
Author

ghost commented Sep 8, 2017

I am not really sure about the usefulness of the logger. It's a nice idea, especially if the script is run from the command line, but when it's part of a web service or run from cron it becomes obsolete. Also, the configuration with an array is somewhat complicated. What kind of use cases did you have in mind for your script?

@vezaynk
Owner

vezaynk commented Sep 8, 2017

The logger is mostly there for debugging when setting up for the first time; it becomes useless afterwards.

I'm thinking of 3 things:

  1. a flag to enable stdout output
  2. debug flags that the user can disable
  3. no file will be written if $file is not set

This gives users the most options without forcing anything on them. I'd rather not take away any choices even if it would be the "right way" to do things.

These behaviors should be documented and perhaps even put into the README since it's probably a common use-case to pipe or redirect the output.
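
For illustration, the three points above could be sketched roughly as follows. This is only a minimal sketch, assuming PHP 7.1 or later: `resolve_output_target()`, `log_message()`, and the `$toStdout` flag are hypothetical names that do not exist in the script, and `$file` stands in for the existing `--file` setting.

```php
<?php
// Hypothetical helper: decide where the sitemap XML should go.
// $file mirrors the existing --file setting; $toStdout is an assumed new flag.
function resolve_output_target(?string $file, bool $toStdout): ?string
{
    if ($toStdout) {
        return 'php://stdout';   // point 1: explicit flag enables stdout
    }
    if ($file === null || $file === '') {
        return null;             // point 3: no $file set, so nothing is written
    }
    return $file;
}

// Hypothetical logger wrapper (point 2): diagnostics go to stderr so they
// never mix with sitemap XML on stdout, and are skipped when debug is off.
function log_message(string $message, bool $debug): void
{
    if ($debug) {
        file_put_contents('php://stderr', $message . PHP_EOL);
    }
}

// Usage: write the sitemap only when a target was resolved.
$target = resolve_output_target('', true);
if ($target !== null) {
    file_put_contents($target, "<urlset></urlset>\n");
}
```

With this shape, something like `php sitemap.php > sitemap.xml` would capture only the XML, since log lines travel on stderr. The exact command line is an assumption as well.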

@ghost
Author

ghost commented Sep 8, 2017

I would keep it simple with --debug and --verbose options, so that --verbose would print what the logger is printing now and --debug would print all possible info. I would turn both of these off by default and maybe just show some kind of progress animation, the number of pages scanned, or something similar to tell the user that the script is running and has not crashed.

@vezaynk
Owner

vezaynk commented Sep 8, 2017

Interesting suggestion.

Debug disabled by default and some animation/counter enabled by default that will be ignored when piped?

I'm not sure how to do that. I don't have much experience writing PHP CLI scripts.
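
One possible approach, offered only as a sketch: check whether the output stream is attached to a terminal and print the counter only in that case. `stream_isatty()` is available from PHP 7.2 and `posix_isatty()` requires the POSIX extension; `is_interactive()`, `format_progress()`, and `report_progress()` are hypothetical names, not part of the script.

```php
<?php
// Hypothetical helper: true when $stream is attached to a terminal,
// false when it is piped or redirected (or when no check is available).
function is_interactive($stream): bool
{
    if (function_exists('stream_isatty')) {
        return stream_isatty($stream);   // PHP >= 7.2
    }
    if (function_exists('posix_isatty')) {
        return posix_isatty($stream);    // needs the POSIX extension
    }
    return false;                        // be quiet when unsure
}

// Hypothetical helper: build a one-line counter; the leading "\r" moves the
// cursor back to the start of the line so each update overwrites the last.
function format_progress(int $scanned): string
{
    return "\rPages scanned: " . $scanned;
}

// Print the counter only when a human is watching the stream.
function report_progress(int $scanned, $stream): void
{
    if (is_interactive($stream)) {
        fwrite($stream, format_progress($scanned));
    }
}

// Usage: php://stderr keeps the counter out of anything piped from stdout.
$err = fopen('php://stderr', 'w');
report_progress(42, $err);
```

Sending the counter to stderr is a design choice in this sketch, not something decided in this thread. It keeps stdout clean for the sitemap even when the terminal check is unavailable.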

@ghost
Author

ghost commented Sep 8, 2017

> Debug disabled by default and some animation/counter enabled by default that will be ignored when piped?

Yes, exactly, or could even be stats about the crawl.

It could look something like this: https://www.xml-sitemaps.com/

> I'm not sure how to do that, I don't have much experience writing php cli scripts.

The solution should work with both CLI and HTML output; it should be generic. I have even less CLI experience, but I am happy to figure that out if nobody else can help. Today was the first time I touched such code. Maybe someone here knows?

@vezaynk
Owner

vezaynk commented Sep 8, 2017

I can do it myself, but only in a week's time because of college.

If you would like to try your hand at it, go for it!

@ghost
Author

ghost commented Sep 8, 2017

I think the image stuff is most critical right now. This is a very small, almost cosmetic change anyway, and it should not take much time, especially if someone who already knows how to do it helps.

@vezaynk
Owner

vezaynk commented Sep 8, 2017

The most critical things are bugs such as #34; everything else is meh. Images will be a massive time sink.

As I mentioned previously, this project is centered around being lightweight and not having dependencies (except for cURL, although there can be an attempt to replace it).

This means that we can't make our lives easy and use things like DOMDocument to parse HTML, and must heed the call of Cthulhu. We have a working regex to extract attribute values, so it shouldn't be an issue to port it over to work with images.

I'm going on a tangent here, but image indexing will definitely not be easy.
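
To give a rough idea of what a regex-based port might look like, here is a minimal sketch. The pattern is not the project's existing attribute regex, and `extract_image_sources()` is a hypothetical name. It only handles quoted `src` values on `<img>` tags and ignores unquoted attributes, `srcset`, and relative URL resolution.

```php
<?php
// Hypothetical helper: collect the quoted src values of <img> tags.
// Illustrative only; regex-based HTML parsing will miss edge cases.
function extract_image_sources(string $html): array
{
    // <img, any attributes, whitespace, src = "value" or 'value'.
    // The \s before src avoids matching attributes such as data-src.
    $pattern = '/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches);
    return $matches[2]; // second capture group holds the attribute values
}

// Usage
$html = '<p><img src="/a.png"><IMG alt="x" SRC=\'b.jpg\'></p>';
print_r(extract_image_sources($html));
```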
