Improve Acceleration Framework Integration #205

@fabianlim

Description

Is your feature request related to a problem? Please describe.

@Ssukriti has some suggestions to improve the integration that was completed in #157:

Remaining work in subsequent PRs after this PR is merged:

We need to ensure that all the tests run regularly in CI/CD and are not skipped. That means all dependencies should be installed so that the tests can run. The purpose is to ensure that all tests pass with every release.
Unit tests: the additional unit tests are good, thank you. I did want to ensure that the model produced by tuning with GPTQLora is in the correct format and can be loaded and used for inference correctly. We have had issues in the past where something changed and the model format produced was no longer correct. We should have tests that capture this so we have full confidence (will DM about this).
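As a starting point for the format test described above, here is a minimal sketch of a layout check on the tuned output directory. It assumes the tuning run writes a PEFT-style LoRA adapter (an `adapter_config.json` plus an `adapter_model.safetensors` or `adapter_model.bin` file); that layout and the helper name `check_adapter_dir` are assumptions for illustration, not something confirmed in #157. It only checks the files on disk, so a real test would still need to load the model and run inference on a GPU.

```python
import json
from pathlib import Path

# Assumption: the tuned checkpoint is a PEFT-style LoRA adapter directory.
ADAPTER_CONFIG = "adapter_config.json"
ADAPTER_WEIGHTS = ("adapter_model.safetensors", "adapter_model.bin")


def check_adapter_dir(checkpoint_dir):
    """Return a list of problems; an empty list means the layout looks right."""
    root = Path(checkpoint_dir)
    problems = []

    config_path = root / ADAPTER_CONFIG
    if not config_path.is_file():
        problems.append(f"missing {ADAPTER_CONFIG}")
    else:
        try:
            config = json.loads(config_path.read_text())
        except json.JSONDecodeError:
            problems.append(f"{ADAPTER_CONFIG} is not valid JSON")
        else:
            # The adapter config is expected to declare a LoRA adapter.
            if not isinstance(config, dict) or config.get("peft_type") != "LORA":
                problems.append(f"{ADAPTER_CONFIG} does not declare peft_type LORA")

    # Either weight file format is accepted.
    if not any((root / name).is_file() for name in ADAPTER_WEIGHTS):
        problems.append("missing adapter weights file")

    return problems
```

A unit test could run a short tuning job into a temporary directory and assert that `check_adapter_dir(output_dir) == []` before attempting to load the model.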

Describe the solution you'd like

To enable the unit tests, we need to enable CUDA in the GH workflows, because the quantized kernels can only run on GPU.
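One way to keep GPU tests from being skipped silently once CUDA runners exist is to let CI opt in to a strict mode. The sketch below is only an illustration: the environment variable `REQUIRE_GPU_TESTS` and both function names are hypothetical, and `torch.cuda.is_available()` is the only library call used.

```python
import os


def cuda_available():
    """Return True only if torch is importable and reports a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def gpu_test_action(has_cuda, env=None):
    """Decide what a GPU-only test should do: 'run', 'skip', or 'fail'.

    REQUIRE_GPU_TESTS is a hypothetical variable that the CI workflow would
    set to "1" so that a missing GPU is reported as a failure, not a skip.
    """
    env = os.environ if env is None else env
    if has_cuda:
        return "run"
    if env.get("REQUIRE_GPU_TESTS", "0") == "1":
        return "fail"
    return "skip"
```

In a test suite, a fixture could call `gpu_test_action(cuda_available())` and translate `"skip"` and `"fail"` into the test runner's own skip and fail calls, so local runs without a GPU still skip while release CI fails loudly.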

We may also need to change the inference script so that it incorporates the AccelerationFramework as well.

