fix pre-commit issues, add pre-commit CI job for PRs #39

Open · wants to merge 3 commits into base: main
19 changes: 19 additions & 0 deletions .github/workflows/pr.yml
@@ -0,0 +1,19 @@
name: pr

concurrency:
  group: ci-on-${{ github.event_name }}-from-${{ github.ref_name }}
  cancel-in-progress: true

on:
  pull_request:
    branches:
      - main

jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1
      - uses: pre-commit/[email protected]

Don't most of our repos just run pre-commit directly in their ci/check_style.sh script? I'm wondering if we can somehow integrate this into shared-workflows and reduce duplication.

Member Author


Yeah, many of the repos end up having a check_style.sh that does this:

  • set up a conda environment (which almost always just has pre-commit in it)
  • run pre-commit run --all-files

For those cases, just using the pre-commit action like this would be faster and simpler, I think. pre-commit already manages creating a virtualenv with each hook's dependencies, and that action has built-in caching that persists across runs.

HOWEVER... some repos do other stuff in check_style.sh.

For example, a couple of them download cmake-format files like this:

https://github.com/rapidsai/cudf/blob/bbf4f7824c23c0c482f52bafdf1ece1213da2f65/ci/check_style.sh#L21-L24

The pattern of running a ci/check_style.sh exists to allow that kind of per-repo customization, I think.
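
To make the shared-workflows idea concrete, here is a rough sketch (hypothetical only, not something this PR adds; the workflow and job names are made up) of a reusable workflow that shared-workflows could expose via workflow_call, wrapping the same two steps used in pr.yml above:

name: check-style

on:
  workflow_call:

jobs:
  pre-commit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 1
      # Same action as in the pr.yml added by this PR; it runs pre-commit and
      # caches the hook environments across runs.
      - uses: pre-commit/[email protected]

Repos that only run pre-commit could call something like that instead of maintaining their own check_style.sh job; repos with extra steps (like the cmake-format download above) would keep their script.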

Member Author


> I'm wondering if we can somehow integrate this into shared-workflows and reduce duplication.

Also worth noting... for this particular PR, I wouldn't want to introduce a shared-workflows dependency here. That'd create a circular dependency between shared-workflows and this repo, and I'm not sure what the implications of that would be.

8 changes: 6 additions & 2 deletions .pre-commit-config.yaml
@@ -10,12 +10,16 @@ repos:
- id: check-yaml
- id: end-of-file-fixer
- repo: https://github.com/rhysd/actionlint
rev: v1.7.4
rev: v1.7.6
hooks:
- id: actionlint-docker
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.8.3
rev: v0.9.1
hooks:
- id: ruff
args: ["--fix"]
- id: ruff-format
- repo: https://github.com/rapidsai/pre-commit-hooks
rev: v0.4.0
hooks:
- id: verify-copyright
5 changes: 3 additions & 2 deletions check_nightly_success/check-nightly-success/check.py
@@ -1,3 +1,5 @@
# Copyright (c) 2024-2025, NVIDIA CORPORATION.

"""Check whether a GHA workflow has run successfully in the last N days."""
# ruff: noqa: INP001

@@ -78,8 +80,7 @@ def main(
)
else:
print( # noqa: T201
f"{latest_branch} has no successful runs of {workflow_id} in the last "
f"{max_days_without_success} days"
f"{latest_branch} has no successful runs of {workflow_id} in the last {max_days_without_success} days"
)

# We are producing Unix return codes so success/failure is inverted from the
18 changes: 16 additions & 2 deletions pyproject.toml
@@ -1,10 +1,24 @@
# Copyright (c) 2024, NVIDIA CORPORATION.
# Copyright (c) 2024-2025, NVIDIA CORPORATION.

[tool.ruff]
line-length = 120
target-version = "py310"

[tool.ruff.lint]
select = ["ALL"]
select = [
# pycodestyle
"E",
# pyflakes
"F",
# isort
"I",
# numpy
"NPY",
# pyupgrade
"UP",
# flake8-bugbear
"B"
]
ignore = [
# Incompatible with D211
"D203",
48 changes: 31 additions & 17 deletions telemetry-impls/summarize/bump_time.py
@@ -1,37 +1,51 @@
#!/usr/bin/env python
# Copyright (c) 2024-2025, NVIDIA CORPORATION.

# This script is meant to act on an 'all_jobs.json' file that comes from
# the summarize job when debug info is enabled. Bumping the time makes
# it easier to re-run the span-sending python script and check results
# in either Jaeger or Grafana

import json
import datetime
import json

with open('all_jobs.json') as f:
with open("all_jobs.json") as f:
jobs = json.load(f)

parse_time = lambda x: int(datetime.datetime.strptime(x, "%Y-%m-%dT%H:%M:%SZ").timestamp() * 1e9)

start_time = parse_time(jobs[0]['created_at'])
needed_time = parse_time(jobs[-3]['completed_at']) - parse_time(jobs[0]['created_at'])
def _parse_time(x: str) -> int:
return int(datetime.datetime.strptime(x, "%Y-%m-%dT%H:%M:%SZ").timestamp() * 1e9)


start_time = _parse_time(jobs[0]["created_at"])
needed_time = _parse_time(jobs[-3]["completed_at"]) - _parse_time(jobs[0]["created_at"])
new_start_time = datetime.datetime.utcnow() - datetime.timedelta(minutes=60)

for idx, job in enumerate(jobs):
if job['created_at']:
job['created_at'] = (new_start_time + datetime.timedelta(seconds=(parse_time(job['created_at']) - start_time)/1e9)).strftime("%Y-%m-%dT%H:%M:%SZ")
if job['started_at']:
job['started_at'] = (new_start_time + datetime.timedelta(seconds=(parse_time(job['started_at']) - start_time)/1e9)).strftime("%Y-%m-%dT%H:%M:%SZ")
if job['completed_at']:
job['completed_at'] = (new_start_time + datetime.timedelta(seconds=(parse_time(job['completed_at']) - start_time)/1e9)).strftime("%Y-%m-%dT%H:%M:%SZ")
if job["created_at"]:
job["created_at"] = (
new_start_time + datetime.timedelta(seconds=(_parse_time(job["created_at"]) - start_time) / 1e9)
).strftime("%Y-%m-%dT%H:%M:%SZ")
if job["started_at"]:
job["started_at"] = (
new_start_time + datetime.timedelta(seconds=(_parse_time(job["started_at"]) - start_time) / 1e9)
).strftime("%Y-%m-%dT%H:%M:%SZ")
if job["completed_at"]:
job["completed_at"] = (
new_start_time + datetime.timedelta(seconds=(_parse_time(job["completed_at"]) - start_time) / 1e9)
).strftime("%Y-%m-%dT%H:%M:%SZ")
steps = []
for step in job['steps']:
if step['started_at']:
step['started_at'] = (new_start_time + datetime.timedelta(seconds=(parse_time(step['started_at']) - start_time)/1e9)).strftime("%Y-%m-%dT%H:%M:%SZ")
if step['completed_at']:
step['completed_at'] = (new_start_time + datetime.timedelta(seconds=(parse_time(step['completed_at']) - start_time)/1e9)).strftime("%Y-%m-%dT%H:%M:%SZ")
for step in job["steps"]:
if step["started_at"]:
step["started_at"] = (
new_start_time + datetime.timedelta(seconds=(_parse_time(step["started_at"]) - start_time) / 1e9)
).strftime("%Y-%m-%dT%H:%M:%SZ")
if step["completed_at"]:
step["completed_at"] = (
new_start_time + datetime.timedelta(seconds=(_parse_time(step["completed_at"]) - start_time) / 1e9)
).strftime("%Y-%m-%dT%H:%M:%SZ")
steps.append(step)
job['steps'] = steps
job["steps"] = steps

jobs[idx] = job

88 changes: 50 additions & 38 deletions telemetry-impls/summarize/send_trace.py
@@ -1,4 +1,4 @@
# Copyright (c) 2019-2024, NVIDIA CORPORATION.
# Copyright (c) 2019-2025, NVIDIA CORPORATION.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -13,55 +13,60 @@
# limitations under the License.
"""Processes a GitHub Actions workflow log record and outputs OpenTelemetry span data."""


from __future__ import annotations
from datetime import datetime, timezone

import hashlib
import json
import logging
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional, Dict

from opentelemetry import trace
from opentelemetry.context import attach, detach
from opentelemetry.propagate import extract
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.trace.status import StatusCode
from opentelemetry.sdk.trace.id_generator import IdGenerator
from opentelemetry.trace.status import StatusCode

match os.getenv("OTEL_EXPORTER_OTLP_PROTOCOL"):
case "http/protobuf":
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.http.trace_exporter import (
OTLPSpanExporter,
)
case "grpc":
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
OTLPSpanExporter,
)
case _:
from opentelemetry.sdk.trace.export import ConsoleSpanExporter as OTLPSpanExporter
from opentelemetry.sdk.trace.export import (
ConsoleSpanExporter as OTLPSpanExporter,
)


import logging
logging.basicConfig(level=logging.WARNING)

SpanProcessor = BatchSpanProcessor


def parse_attribute_file(filename: str) -> Dict[str, str]:
def parse_attribute_file(filename: str) -> dict[str, str]:
attributes = {}
with open(filename, "r") as attribute_file:
with open(filename) as attribute_file:
for line in attribute_file.readlines():
key, value = line.strip().split('=', 1)
key, value = line.strip().split("=", 1)
attributes[key] = value
return attributes


def date_str_to_epoch(date_str: str, value_if_not_set: Optional[int] = 0) -> int:
def date_str_to_epoch(date_str: str, value_if_not_set: int | None = 0) -> int:
if date_str:
# replace bit is to attach the UTC timezone to our datetime object, so
# that it doesn't "help" us by adjusting our string value, which is
# already in UTC
timestamp_ns = int(datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc).timestamp() * 1e9)
timestamp_ns = int(
datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc).timestamp() * 1e9
)
else:
timestamp_ns = value_if_not_set or 0
return timestamp_ns
@@ -75,25 +80,26 @@ def map_conclusion_to_status_code(conclusion: str) -> StatusCode:
else:
return StatusCode.UNSET


def load_env_vars():
env_vars = {}
with open('telemetry-artifacts/telemetry-env-vars') as f:
with open("telemetry-artifacts/telemetry-env-vars") as f:
for line in f.readlines():
k, v = line.split("=", 1)
env_vars[k] = v.strip().strip('"')
return env_vars


class LoadTraceParentGenerator(IdGenerator):
def __init__(self, traceparent) -> None:
# purpose of this is to keep the trace ID constant if the same data is sent different times,
# which mainly happens during testing. Having the trace ID be something that we control
# will also be useful for tying together logs and metrics with our traces.
ctx = extract(
carrier={'traceparent': traceparent},
carrier={"traceparent": traceparent},
)
self.context = list(ctx.values())[0].get_span_context()


def generate_span_id(self) -> int:
"""Get a new span ID.

Expand All @@ -116,6 +122,7 @@ def generate_trace_id(self) -> int:
"""
return self.context.trace_id


class RapidsSpanIdGenerator(IdGenerator):
def __init__(self, trace_id, job_name) -> None:
self.trace_id = trace_id
@@ -159,7 +166,7 @@ def __init__(self, traceparent) -> None:
# which mainly happens during testing. Having the trace ID be something that we control
# will also be useful for tying together logs and metrics with our traces.
ctx = extract(
carrier={'traceparent': traceparent},
carrier={"traceparent": traceparent},
)
self.context = list(ctx.values())[0].get_span_context()

@@ -169,7 +176,6 @@ def update_span_job_name(self, new_name):
def update_span_step_name(self, new_name):
self.step_name = new_name


def generate_span_id(self) -> int:
"""Get a new span ID.

@@ -203,20 +209,23 @@ def main(args):
# track the latest timestamp observed and use it for any unavailable times.
last_timestamp = date_str_to_epoch(jobs[0]["completed_at"])

attribute_files = list(Path.cwd().glob(f"telemetry-artifacts/attrs-*"))
attribute_files = list(Path.cwd().glob("telemetry-artifacts/attrs-*"))
if attribute_files:
attribute_file = attribute_files[0]
attributes = parse_attribute_file(attribute_file.as_posix())
else:
attributes = {}
global_attrs = {}
for k, v in attributes.items():
if k.startswith('git.'):
if k.startswith("git."):
global_attrs[k] = v

global_attrs['service.name'] = env_vars['OTEL_SERVICE_NAME']
global_attrs["service.name"] = env_vars["OTEL_SERVICE_NAME"]

provider = TracerProvider(resource=Resource(global_attrs), id_generator=LoadTraceParentGenerator(env_vars["TRACEPARENT"]))
provider = TracerProvider(
resource=Resource(global_attrs),
id_generator=LoadTraceParentGenerator(env_vars["TRACEPARENT"]),
)
provider.add_span_processor(span_processor=SpanProcessor(OTLPSpanExporter()))
tracer = trace.get_tracer("GitHub Actions parser", "0.0.1", tracer_provider=provider)

Expand All @@ -233,7 +242,7 @@ def main(args):
job_last_timestamp = date_str_to_epoch(job["completed_at"], job_start)

if job_start == 0:
logging.info(f"Job is empty (no start time) - bypassing")
logging.info("Job is empty (no start time) - bypassing")
continue

attribute_file = Path.cwd() / f"telemetry-artifacts/attrs-{job_id}"
@@ -252,35 +261,38 @@ def main(args):
job_provider.add_span_processor(span_processor=SpanProcessor(OTLPSpanExporter()))
job_tracer = trace.get_tracer("GitHub Actions parser", "0.0.1", tracer_provider=job_provider)

with job_tracer.start_as_current_span(job['name'], start_time=job_create, end_on_exit=False) as job_span:
with job_tracer.start_as_current_span(job["name"], start_time=job_create, end_on_exit=False) as job_span:
job_span.set_status(map_conclusion_to_status_code(job["conclusion"]))

job_id_generator.update_step_name('start delay time')
job_id_generator.update_step_name("start delay time")
with job_tracer.start_as_current_span(
name="start delay time",
start_time=job_create,
end_on_exit=False,
) as delay_span:
name="start delay time",
start_time=job_create,
end_on_exit=False,
) as delay_span:
delay_span.end(job_start)

for step in job["steps"]:
start = date_str_to_epoch(step["started_at"], job_last_timestamp)
end = date_str_to_epoch(step["completed_at"], start)
job_id_generator.update_step_name(step['name'])
job_id_generator.update_step_name(step["name"])

if (end - start) / 1e9 > 1:
logging.info(f"processing step: '{step['name']}'")
with job_tracer.start_as_current_span(
name=step['name'],
start_time=start,
end_on_exit=False,
) as step_span:
name=step["name"],
start_time=start,
end_on_exit=False,
) as step_span:
step_span.set_status(map_conclusion_to_status_code(step["conclusion"]))
step_span.end(end)

job_last_timestamp = max(end, job_last_timestamp)

job_end = max(date_str_to_epoch(job["completed_at"], job_last_timestamp), job_last_timestamp)
job_end = max(
date_str_to_epoch(job["completed_at"], job_last_timestamp),
job_last_timestamp,
)
last_timestamp = max(job_end, last_timestamp)
job_span.end(job_end)
root_span.end(last_timestamp)