Description
Describe the bug
When compiling a model for ml_inf1 via the SDK, where the model was trained / fine-tuned with PyTorch 1.9.1, the framework_version
argument is ignored, resulting in a version mismatch between the version used at training time and the one chosen automatically by SageMaker.
To reproduce
import json
from sagemaker.pytorch.model import PyTorchModel
from sagemaker.predictor import Predictor

sm_model = PyTorchModel(
    model_data=traced_model_url,
    predictor_cls=Predictor,
    framework_version="1.9",
    role="<role_arn>",
    entry_point="inference.py",
    source_dir="code",
    py_version="py3",
    name="<name>",
)

compiled_inf_model = sm_model.compile(
    target_instance_family="ml_inf1",
    input_shape=<input_shape>,
    job_name="<job_name>",
    role="<role_arn>",
    framework="pytorch",
    framework_version="1.9",
    output_path="<output_path>",
    compiler_options=json.dumps("--dtype int64"),
    compile_max_run=1000,
)
Expected behavior
The compilation job should also show the "Framework version" when opened in the AWS Console. However, only the PYTORCH
framework value is present, and the compilation fails after 5 minutes with the error message:
ClientError: InputConfiguration: Unable to load PyTorch model:', '\nUnknown type name \'NoneType\':\nSerialized File "code/__torch__/torch/nn/modules/activation/___torch_mangle_8258.py", line 7\n _is_full_backward_hook : Optional[bool]\n def forward(self: __torch__.torch.nn.modules.activation.___torch_mangle_8258.Tanh,\n argument_1: Tensor) -> NoneType:\n ~~~~~~~~ <--- HERE\n return None\n') For further troubleshooting common failures please visit: https://docs.aws.amazon.com/sagemaker/latest/dg/neo-troubleshooting-compilation.html
If, however, I clone the failed job in the AWS Console and just add the 1.9 "Framework version" manually, the job runs to completion.
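Until the SDK passes the version through, the job can be created directly with boto3's create_compilation_job, which does accept FrameworkVersion in InputConfig. A minimal sketch; the job name, input shape, and all S3 / ARN values are placeholders, not taken from the failing job:

```python
import json

def build_compilation_request(model_s3_uri, output_s3_uri, role_arn):
    """Build a create_compilation_job request that sets FrameworkVersion
    explicitly. Pass the result to boto3.client("sagemaker")
    .create_compilation_job(**request) to run the actual job."""
    return {
        "CompilationJobName": "pytorch-inf1-with-framework-version",  # placeholder
        "RoleArn": role_arn,
        "InputConfig": {
            "S3Uri": model_s3_uri,
            "DataInputConfig": json.dumps({"input0": [1, 3, 224, 224]}),  # example shape
            "Framework": "PYTORCH",
            # The field the SDK currently drops for ml_inf targets:
            "FrameworkVersion": "1.9",
        },
        "OutputConfig": {
            "S3OutputLocation": output_s3_uri,
            "TargetDevice": "ml_inf1",
            "CompilerOptions": json.dumps("--dtype int64"),
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 1000},
    }
```

This mirrors what cloning the job in the Console and adding "Framework version" by hand ends up submitting.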
Screenshots or logs
localhost compiler-container-Primary[5078]: Traceback (most recent call last):
localhost compiler-container-Primary[5078]: File "/opt/amazon/lib/python3.6/site-packages/neo_inferentia_compiler/pytorch_framework.py", line 107, in compile_model
localhost compiler-container-Primary[5078]: model = torch.jit.load(self.model_file)
localhost compiler-container-Primary[5078]: File "/opt/amazon/lib/python3.6/site-packages/torch_neuron/jit_load_wrapper.py", line 13, in wrapper
localhost compiler-container-Primary[5078]: script_module = jit_load(*args, **kwargs)
localhost compiler-container-Primary[5078]: File "/opt/amazon/lib/python3.6/site-packages/torch/jit/_serialization.py", line 161, in load
localhost compiler-container-Primary[5078]: cpp_module = torch._C.import_ir_module(cu, f, map_location, _extra_files)
localhost compiler-container-Primary[5078]: RuntimeError:
localhost compiler-container-Primary[5078]: Unknown type name 'NoneType':
localhost compiler-container-Primary[5078]: Serialized File "code/__torch__/torch/nn/modules/activation/___torch_mangle_8258.py", line 7
localhost compiler-container-Primary[5078]: _is_full_backward_hook : Optional[bool]
localhost compiler-container-Primary[5078]: def forward(self: __torch__.torch.nn.modules.activation.___torch_mangle_8258.Tanh,
localhost compiler-container-Primary[5078]: argument_1: Tensor) -> NoneType:
localhost compiler-container-Primary[5078]: ~~~~~~~~ <--- HERE
localhost compiler-container-Primary[5078]: return None
localhost compiler-container-Primary[5078]: During handling of the above exception, another exception occurred:
localhost compiler-container-Primary[5078]: Traceback (most recent call last):
localhost compiler-container-Primary[5078]: File "/opt/amazon/bin/neo_main.py", line 101, in <module>
localhost compiler-container-Primary[5078]: compile()
localhost compiler-container-Primary[5078]: File "/opt/amazon/bin/neo_main.py", line 74, in compile
localhost compiler-container-Primary[5078]: compiler_options
localhost compiler-container-Primary[5078]: File "/opt/amazon/bin/neo_main.py", line 32, in compile_model
localhost compiler-container-Primary[5078]: return framework_instance.compile_model()
localhost compiler-container-Primary[5078]: File "/opt/amazon/lib/python3.6/site-packages/neo_inferentia_compiler/pytorch_framework.py", line 109, in compile_model
localhost compiler-container-Primary[5078]: raise RuntimeError("InputConfiguration: Unable to load PyTorch model:", str(e))
localhost compiler-container-Primary[5078]: RuntimeError: ('InputConfiguration: Unable to load PyTorch model:', ' Unknown type name \'NoneType\': Serialized File "code/__torch__/torch/nn/modules/activation/___torch_mangle_8258.py", line 7 _is_full_backward_hook : Optional[bool] def forward(self: __torch__.torch.nn.modules.activation.___torch_mangle_8258.Tanh, argument_1: Tensor) -> NoneType: ~~~~~~~~ <--- HERE return None ')
System information
A description of your system. Please provide:
- SageMaker Python SDK version: 2.97.0
- Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
- Framework version: 1.9.1
- Python version: 3.8
- CPU or GPU: CPU/Inf
- Custom Docker image (Y/N): -
Additional context
The problem may lie in the negative lookahead regex group (?!ml_inf)
at this line: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/model.py#L735.
Is this condition still applicable?
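A quick check of how that lookahead behaves supports the suspicion (a sketch of the suspected guard; the exact condition in model.py may differ):

```python
import re

# A negative lookahead anchored at the start of the target name matches
# (zero-width) for every target EXCEPT those beginning with "ml_inf" --
# so any framework_version handling gated on it is silently skipped for
# Inferentia targets.
NON_INF_TARGET = re.compile(r"(?!ml_inf)")

print(bool(NON_INF_TARGET.match("ml_c5")))    # True: guard passes
print(bool(NON_INF_TARGET.match("ml_inf1")))  # False: guard fails, version dropped
```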