Activation Hessian computation runtime optimization #1092

Merged (15 commits) on Jun 5, 2024

@ofirgo commented on Jun 2, 2024

Pull Request Description:

Improve the activation Hessian computation runtime for GPTQ and mixed precision with the following optimizations:

  • Enable batch computation.
  • Enable computation on a set of nodes (instead of a single node only).
  • Other minor loop and implementation modifications.
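The first two optimizations can be illustrated with a toy, NumPy-only example. This is a sketch under stated assumptions, not MCT's implementation. It assumes the activation Hessian trace of a node is approximated by tr(J^T J), where J is the Jacobian of the model output with respect to that node's activation. That trace is estimated with Hutchinson-style random projections. The model, `estimate_activation_traces`, and its parameters are hypothetical. The example shows two things:

- one backward sweep per random vector yields the vector-Jacobian products for every target node at once;
- all images of a batch are processed together.

```python
import numpy as np

# Hypothetical toy model (not MCT code): x -> a1 = relu(W1 x) -> a2 = relu(W2 a1) -> y = W3 a2.
# For a target node k, tr(J_k^T J_k) with J_k = dy/da_k is used as a stand-in for the
# activation Hessian trace, estimated via Hutchinson's trick: E_v ||J_k^T v||^2, v ~ N(0, I).


def estimate_activation_traces(weights, x_batch, num_iters, rng):
    """Return {node: per-image trace estimates of shape (batch,)} for nodes a1 and a2."""
    W1, W2, W3 = weights
    # Forward pass once for the whole batch, keeping what the backward sweep needs.
    a1 = np.maximum(x_batch @ W1.T, 0.0)
    z2 = a1 @ W2.T
    relu2_mask = (z2 > 0).astype(x_batch.dtype)

    batch, out_dim = x_batch.shape[0], W3.shape[0]
    sums = {"a1": np.zeros(batch), "a2": np.zeros(batch)}
    for _ in range(num_iters):
        v = rng.standard_normal((batch, out_dim))  # one random vector per image
        # A single backward sweep gives J_k^T v for every target node at once.
        g_a2 = v @ W3                     # J_a2^T v, shape (batch, dim(a2))
        g_a1 = (g_a2 * relu2_mask) @ W2   # J_a1^T v, shape (batch, dim(a1))
        sums["a2"] += np.sum(g_a2 ** 2, axis=1)
        sums["a1"] += np.sum(g_a1 ** 2, axis=1)
    return {node: total / num_iters for node, total in sums.items()}


rng = np.random.default_rng(0)
weights = (rng.standard_normal((6, 4)), rng.standard_normal((5, 6)), rng.standard_normal((3, 5)))
x = rng.standard_normal((8, 4))
traces = estimate_activation_traces(weights, x, num_iters=4000, rng=rng)
```

Computing each node separately would repeat the shared part of the backward work (here, `v @ W3`) once per node. With a real framework, the same saving comes from requesting gradients with respect to several activation tensors in a single autodiff call.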

Major design remarks:

  • TraceHessianRequest receives a list of target nodes (target_nodes) instead of a single BaseNode.
  • HessianInfoService produces a batch of samples for each computation iteration. We also fixed a bug where the HessianInfoService representative generator was re-initialized on each iteration instead of being consumed end-to-end.
    • The output of HessianInfoService's "fetch" call has the structure: List (per target node) of List (per image) of Hessian approximations (tensors).
    • The service's cache mechanism still saves results per request for a single node, so multi-node requests are split into per-node requests to save and retrieve the results for each node.
    • Batch computation requires keeping track of the remaining samples from a given representative dataset batch when the requested Hessian computation batch is smaller, so that no samples are "thrown" away.
    • In addition, we assume that the Hessian computation batch_size is less than or equal to the representative dataset batch size.
  • Weights Hessian computation is still limited to a single image and per-node computation.
  • The Hessian results tensor includes a batch dimension.
  • The default values for the number of samples and the number of iterations of the Hessian computation for GPTQ and mixed precision have been modified.
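The request and cache design can be sketched as follows. This is a minimal illustration under assumptions. `ToyTraceHessianRequest`, `ToyHessianService`, `split`, `fetch`, and `toy_compute` are hypothetical stand-ins, not MCT's actual classes or signatures. The sketch works like this:

- a multi-node request is computed in one pass per batch;
- the request is split into single-node requests that serve as cache keys;
- `fetch` returns a list (per target node) of lists (per image), with the batch dimension of each computed result split into per-image entries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Tuple

import numpy as np


@dataclass(frozen=True)
class ToyTraceHessianRequest:
    """Hypothetical stand-in for TraceHessianRequest; the fields are assumptions."""
    mode: str                      # e.g. "activation" or "weights"
    target_nodes: Tuple[str, ...]  # several target nodes instead of a single node


# compute_fn(request, images) -> {node name: array of shape (batch, ...)}
ComputeFn = Callable[[ToyTraceHessianRequest, np.ndarray], Dict[str, np.ndarray]]


class ToyHessianService:
    """Computes for many nodes at once, but caches results per single node."""

    def __init__(self, compute_fn: ComputeFn):
        self._compute_fn = compute_fn
        self._cache: Dict[ToyTraceHessianRequest, List[np.ndarray]] = {}
        self.num_computations = 0

    @staticmethod
    def split(request: ToyTraceHessianRequest) -> List[ToyTraceHessianRequest]:
        # One single-node request per target node; these serve as cache keys.
        return [ToyTraceHessianRequest(request.mode, (node,)) for node in request.target_nodes]

    def _has_enough(self, request: ToyTraceHessianRequest, required_size: int) -> bool:
        return all(len(self._cache.get(r, [])) >= required_size for r in self.split(request))

    def fetch(self, request: ToyTraceHessianRequest, required_size: int,
              images: Iterator[np.ndarray]) -> List[List[np.ndarray]]:
        while not self._has_enough(request, required_size):
            results = self._compute_fn(request, next(images))  # one pass for all target nodes
            self.num_computations += 1
            for single in self.split(request):
                # Split the batch dimension into per-image Hessian approximations.
                self._cache.setdefault(single, []).extend(results[single.target_nodes[0]])
        # List (per target node) of List (per image) of Hessian approximations.
        return [self._cache[r][:required_size] for r in self.split(request)]


def toy_compute(request, images):
    # Toy "Hessian" per image: the second node's value is twice the first node's.
    return {node: np.abs(images).sum(axis=1, keepdims=True) * (i + 1)
            for i, node in enumerate(request.target_nodes)}


rng = np.random.default_rng(0)
batches = iter([rng.standard_normal((4, 3)) for _ in range(3)])
service = ToyHessianService(toy_compute)
out = service.fetch(ToyTraceHessianRequest("activation", ("conv1", "conv2")), 5, batches)
# Two batches of 4 images were each computed once for both nodes; out holds 2 lists of 5 arrays.
```

A later single-node request for `conv1` is served from the per-node cache without a new computation.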
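The representative-generator bug follows a common Python pattern. The sketch below assumes the representative dataset is passed as a callable that returns a fresh generator of batches on each call. `make_representative_dataset`, `buggy_batches`, and `fixed_batches` are hypothetical helpers. Re-creating the generator on every iteration always returns the first batch again. Creating it once and consuming it end-to-end walks through the whole dataset.

```python
from typing import Callable, Iterator, List


def make_representative_dataset(num_batches: int) -> Callable[[], Iterator[List[int]]]:
    # Hypothetical representative dataset: each call returns a fresh generator.
    def gen():
        for b in range(num_batches):
            yield [b * 10 + i for i in range(4)]
    return gen


def buggy_batches(dataset_fn: Callable[[], Iterator[List[int]]], num_iters: int) -> List[List[int]]:
    # Bug pattern: a new generator per iteration, so the first batch is returned every time.
    return [next(iter(dataset_fn())) for _ in range(num_iters)]


def fixed_batches(dataset_fn: Callable[[], Iterator[List[int]]], num_iters: int) -> List[List[int]]:
    # Fix: create the generator once and keep consuming it.
    it = iter(dataset_fn())
    return [next(it) for _ in range(num_iters)]


dataset_fn = make_representative_dataset(num_batches=3)
buggy = buggy_batches(dataset_fn, 3)
fixed = fixed_batches(dataset_fn, 3)
```

What should happen once the generator is exhausted is outside the scope of this sketch.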
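Tracking leftover samples can be sketched with a small buffer. `ToyHessianBatchSampler`, `make_dataset`, and `hessian_batch_size` are hypothetical names, and the dataset is simplified to a single model input. Images left over from one representative batch are kept and combined with the next batch, so none are skipped. One reading of the batch_size assumption (our interpretation, not stated in the PR) is that it guarantees a single new representative batch always suffices to complete a Hessian batch.

```python
from typing import Callable, Iterator, List

import numpy as np


def make_dataset(num_batches: int, rep_batch_size: int) -> Callable[[], Iterator[np.ndarray]]:
    # Hypothetical single-input representative dataset; image i holds the value i.
    def gen():
        for b in range(num_batches):
            start = b * rep_batch_size
            yield np.arange(start, start + rep_batch_size, dtype=float).reshape(rep_batch_size, 1)
    return gen


class ToyHessianBatchSampler:
    """Hypothetical sampler that keeps leftover images between Hessian batches."""

    def __init__(self, dataset_fn: Callable[[], Iterator[np.ndarray]], hessian_batch_size: int):
        self._it = iter(dataset_fn())          # created once, consumed end-to-end
        self._batch_size = hessian_batch_size
        self._pending: List[np.ndarray] = []   # images not yet used in a Hessian batch

    def next_batch(self) -> np.ndarray:
        if len(self._pending) < self._batch_size:
            rep_batch = next(self._it)
            # With hessian_batch_size <= rep batch size, one new batch always suffices.
            if len(rep_batch) < self._batch_size:
                raise ValueError("hessian_batch_size must be <= the representative dataset batch size")
            self._pending.extend(list(rep_batch))
        batch = np.stack(self._pending[: self._batch_size])
        self._pending = self._pending[self._batch_size:]
        return batch


sampler = ToyHessianBatchSampler(make_dataset(num_batches=3, rep_batch_size=5), hessian_batch_size=2)
batches = [sampler.next_batch() for _ in range(7)]
# Each Hessian batch holds 2 images, and no image is skipped across dataset-batch borders.
```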

Checklist before requesting a review:

  • I set the appropriate labels on the pull request.
  • I have added/updated the release note draft (if necessary).
  • I have updated the documentation to reflect my changes (if necessary).
  • All functions and files are well documented.
  • All functions and classes have type hints.
  • There is a license in every file.
  • The function and variable names are informative.
  • I have checked for code duplications.
  • I have added new unittest (if necessary).

@ofirgo requested a review from @reuvenperetz on June 4, 2024 05:41
@@ -44,6 +45,7 @@ def __init__(self,
norm_scores (bool): Whether to normalize the returned scores for the weighted distance metric (to get values between 0 and 1).
refine_mp_solution (bool): Whether to try to improve the final mixed-precision configuration using a greedy algorithm that searches layers to increase their bit-width, or not.
metric_normalization_threshold (float): A threshold for checking the mixed precision distance metric values. If the values exceed this threshold, the metric is scaled to prevent numerical issues.
hessian_batch_size (int): The Hessian computation batch size. Used only when mixed precision uses a Hessian-based objective.

@reuvenperetz commented on Jun 5, 2024:

Used, but not crucial.

@ofirgo merged commit d13319f into sony:main on Jun 5, 2024
27 checks passed