Developer access to web logs #85

Open
coke opened this issue Feb 8, 2023 · 5 comments
@coke
Contributor

coke commented Feb 8, 2023

We could definitely use access to the web logs of whatever is serving the content, so we can (at least) track 404 requests. Those probably indicate a rename or gap not addressed by the .htaccess mappings (or equivalent).

See also #104 #164 #181

@coke coke added this to the Cleanup milestone Feb 8, 2023
@dontlaugh dontlaugh self-assigned this Feb 11, 2023
@dontlaugh
Collaborator

The current deployment artifact is a container with nginx. So let's assume that is the parsing target.

The deployment environment is Portainer. It allows shared volumes between containers. So, one way to achieve this with minimal changes to our current deployment artifact:

  • add a shared volume and mount it to our deployment artifact's /var/log/nginx (or equivalent, need to check nginx config)
  • deploy a logging sidecar that supports log rotation, and mount the same volume to it
  • priority 1 is ensuring our log rotation is solid so we don't fill up our disk
    • the containerized Volume should protect the host, but the volume can still fill up on its own
  • priority 2 is writing a Vector config that aggregates data into a useful form. The easy version is probably HTTP response codes (200, 404, etc.) counted and grouped by URL. We can use this to find missing pages, as you say.
  • priority 3 (stretch goal) - set up HTTP redirects in Nginx based on common page misses
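
As a rough illustration of the "counted and grouped by URL" idea in priority 2, here is a minimal shell sketch that does the aggregation with awk instead of Vector. It assumes the nginx logs use the default "combined" format, where whitespace-separated field 7 is the request URI and field 9 is the status code. That assumption still needs checking against the actual nginx config. The function name and the sample log lines are hypothetical.

```shell
# Count requests per (status, URI) pair from nginx access log lines on stdin.
# Assumption: default "combined" log format, so awk field 7 is the URI
# and field 9 is the HTTP status code.
count_by_status_and_uri() {
  awk '{ hits[$9 "\t" $7]++ }
       END { for (key in hits) printf "%d\t%s\n", hits[key], key }' |
    sort -k1,1nr -k2
}

# Hypothetical sample lines (documentation IP range, made-up paths).
count_by_status_and_uri <<'EOF'
203.0.113.7 - - [03/Mar/2023:05:00:10 +0000] "GET /type/Str HTTP/1.1" 200 18097 "-" "curl/7.88"
203.0.113.7 - - [03/Mar/2023:05:00:11 +0000] "GET /old/page HTTP/1.1" 404 153 "-" "curl/7.88"
203.0.113.9 - - [03/Mar/2023:05:00:12 +0000] "GET /old/page HTTP/1.1" 404 153 "-" "curl/7.88"
EOF
```

Each output row is count, status, URI, separated by tabs and sorted with the most frequent pair first. A custom `log_format` would shift the field numbers, so they would have to be adjusted to match.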

@dontlaugh
Collaborator

Look into .htaccess support for existing mappings

@coke coke modified the milestones: Cleanup, 2023-Quarter 2 Feb 24, 2023
@dontlaugh
Collaborator

dontlaugh commented Mar 2, 2023

I've fetched logs from the past 24 hours of production:

journalctl -u raku-doc-website --since '24 hours ago' --no-pager > logs.txt

I will parse out the 404s. I'd paste them here, but I don't want to reveal any potential PII.

UPDATE: the logs are heavily truncated. I think this server's default journald configuration might aggressively limit the size of logs. Or it might be a podman thing. I'll keep looking.

@dontlaugh
Collaborator

#184 gave us additional access logging, so after a day I have pulled down some aggregate info:

counts.txt

Some of it is the usual randomness from the public internet, but there are legitimate clues to some missing stuff, too.

@dontlaugh
Collaborator

@finanalyst See the file I've linked in my previous comment for some counts of 404s per URI from production.

Our Caddy access logs give us JSON of the following form:

{"level":"info","ts":1677922396.2839906,"logger":"http.log.access.log0","msg":"handled request","request":{"remote_ip":"REDACTED","remote_port":"8850","proto":"HTTP/1.1","method":"GET","host":"REDACTED","uri":"/","headers":{"Content-Length":["0"],"Connection":["close"],"User-Agent":["HCLB-HealthCheck"]}},"user_id":"","duration":0.000351099,"size":18097,"status":200,"resp_headers":{"Server":["Caddy"],"Etag":["\"rqxhwadyp\""],"Content-Type":["text/html; charset=utf-8"],"Last-Modified":["Fri, 03 Mar 2023 05:00:10 GMT"],"Accept-Ranges":["bytes"],"Content-Length":["18097"]}}

We can ask journalctl for just that JSON (omitting other journal metadata with --output cat):

journalctl --output cat -u raku-doc-website > logs.txt

Then with jq and awk we can do the counting:

jq -r '"\(.status)\t\(.request.uri)"' logs.txt | \
  awk '
    # Count 404 responses per URI (field 2 of the tab-separated jq output).
    /^404/ { hist[$2]++ }
    END {
        for (item in hist) {
            printf "%s\t-> %s\n", hist[item], item
        }
    }
  ' > counts.txt
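
The awk loop emits URIs in no particular order, so a small follow-up step can help when scanning counts.txt for the most common misses. The sketch below is one possible way to do that with sort and head. The `top_misses` helper and the sample rows are hypothetical, and the only thing it relies on is the `count<TAB>-> uri` layout produced above.

```shell
# Print the N most frequent rows from a counts file (default: 10).
# Relies on each line starting with the numeric count, as in counts.txt.
top_misses() {
  sort -rn "$1" | head -n "${2:-10}"
}

# Demo on a throwaway file with made-up rows in the counts.txt layout.
sample=$(mktemp)
printf '3\t-> /old/page\n12\t-> /wp-login.php\n1\t-> /favicon.ico\n' > "$sample"
top_misses "$sample" 2
rm -f "$sample"
```

Against the real file this would be run as `top_misses counts.txt 20`. Scanner noise from the public internet tends to float to the top, so the interesting entries may sit a little further down the list.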

@coke coke modified the milestones: 2023-Quarter 2, 2023-Quarter 3 Jul 12, 2023
@coke coke modified the milestones: 2023-Quarter 3, 2023-Quarter 4 Oct 8, 2023
@coke coke modified the milestones: 2023-Quarter 4, 2024-Quarter-2 Mar 31, 2024