-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MacOS C Reference Implementation #2
base: main
Are you sure you want to change the base?
Conversation
include/cpid/cpid_macos.h
Outdated
// Microsoft SAL annotations are not available on macOS | ||
// Define them to enable use for documentation purposes | ||
#ifndef _In_ | ||
#define _In_ | ||
#endif | ||
#ifndef _Out_ | ||
#define _Out_ | ||
#endif | ||
#ifndef _Inout_ | ||
#define _Inout_ | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You look to be using doxygen documentation comments for the API in this file. Would use of https://www.doxygen.nl/manual/commands.html#cmdparam be a better approach then these Windows-specific defines? They aren't the most OS agnostic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mmm ... coming back to this, its a little bit silly to have these for non-windows since each header file is platform specific
include/cpid/cpid_macos.h
Outdated
* | ||
* @return 0 on success, -1 on error. | ||
*/ | ||
int cpid_make_uuid(_In_ cpid_handle_t library_handle, _In_ uint32_t pid, _In_ uint64_t creation_time_unix_epoch_seconds, _In_ uint32_t creation_time_micros_offset, _Out_ uuid_t uuid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would pid_t
be a more appropriate type for pid? On my machine (and most I believe) pid_t is signed, which makes me think an unsigned type may not be appropriate.
https://www.gnu.org/software/libc/manual/html_node/Process-Identification.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
I take your point about pid being signed. It might be a good idea to change the CPID spec so the pid part of digest inputs is an int64_t
instead of uint64_t
. This would technically be a breaking change, but no one is using the max range of uint64_t
so I think it would be a good change in practice.
Thoughts?
src/constants.h
Outdated
#pragma once | ||
|
||
#define SHA256_BUFFER_SIZE 32 | ||
#define MAX_PID 0x7FFFFFFF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This value is configurable and system dependent, does it make sense to define it ourselves?
On a standard Linux (potentially unix) system the max default is 32768.
Why are we defining this number in hex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With making the PID field pid_t
/int64_t
, I've removed this check and the OS call will just fail.
src/constants.h
Outdated
#define RETURN_SUCCESS 0 | ||
#define RETURN_ERROR -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally this seems a tad overkill, these are very standard return values with standard meanings.
We've also specified the meaning of the return values explicitly in the API comments
@return 0 on success, -1 on error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My biggest concern about this implementation is that every function returns only success or failure. If it's a failure, the caller has no way to determine why it has failed. Given that this is a "UNIXy" implementation, I'd have thought that either:
- Each function would directly return 0 on success or else an error code from errno.h.
- Each function would return success/failure and set
errno
to an error code from errno.h.
By way of comparison, each function in the Windows implementation returns a standard winerror.h code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be just return true/false from stdbool
and simplify the API if we simply don't care about HOW it failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got rid of constants. returning 0 and -1 directly
src/linux/CMakeLists.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should remove this empty file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
empty windows and linuc cmake linux files are just there to establish directory structure
src/macos/CMakeLists.txt
Outdated
find_package(OpenSSL REQUIRED) | ||
|
||
find_library(IOKIT_LIBRARY IOKit) | ||
if(NOT IOKIT_LIBRARY) | ||
message(FATAL_ERROR "IO Kit not found") | ||
endif() | ||
|
||
find_library(COREFOUNDATION_LIBRARY CoreFoundation) | ||
if(NOT COREFOUNDATION_LIBRARY) | ||
message(FATAL_ERROR "Core Foundation not found") | ||
endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the README explain how to acquire these dependencies if they are not available by default? Should we be specifying minimum versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe these are part of the base macos installation
src/macos/cpid_macos.c
Outdated
#include <openssl/evp.h> | ||
|
||
#include "cpid/cpid_macos.h" | ||
#include "../constants.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it is preferable to add directory one level above to the include search paths or explicitly include the path to files based on a single top level include. Relative search paths like these are prone to breaking and make it harder to integrate a library in my experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got rid of constants.h so this no longer applies
src/macos/cpid_macos.c
Outdated
char serial_number[MACOS_SERIAL_NUMBER_BUFFER_SIZE]; // 16 bytes | ||
uuid_t hardware_uuid; // 16 bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider removing these comments, I don't think they add anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think they have a usefulness. The MACOS_EXPECTED_DIGEST_INPUT_CONTENT_SIZE
macro is mysteriously set to 88 and having field sizes in trailing comments like this helps explain where that value comes from. I'd actually be inclined to add a similar trailing comment to the other fields in this struct, e.g.
typedef struct {
char serial_number[MACOS_SERIAL_NUMBER_BUFFER_SIZE]; // 16 bytes
uuid_t hardware_uuid; // + 16 bytes
process_creation_time_t kernel_task_creation_time; // + 16 bytes
process_creation_time_t launchd_creation_time; // + 16 bytes
process_creation_time_t process_creation_time; // + 16 bytes
uint64_t pid; // + 8 bytes
} digest_input_content_t; // = 88 bytes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could go either way for sure. If we keep them then I agree that we should be consistent and add them for all. But equally I would question what value the reader gets from knowing this information and whether we are adding comment management overhead (although admittedly minimal) to changes to this struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comments like dave has. The byte arithmetic for the structure should not change as this is the thing that gets hashed.
src/macos/cpid_macos.c
Outdated
|
||
int cpid_get_uuid_string(_In_ cpid_handle_t library_handle, _In_ uint32_t pid, _Out_ uuid_string_t uuid_string) { | ||
|
||
if (NULL == library_handle || NULL == uuid_string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider the following styling.
if (NULL == library_handle || NULL == uuid_string) { | |
if (!library_handle || !uuid_string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. A NULL == foo
or foo == NULL
test always makes me feel like I'm reading Java or C# 😄
Note that this practice is throughout this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do
src/macos/cpid_macos.c
Outdated
return RETURN_ERROR; | ||
} | ||
|
||
uuid_t uuid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most standard c-styling would suggest declaring this at the top of the scope. What styling are you following in this project? It should probably be specified in the README.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dunno about that tbh...we're not writing K&R C here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really a proponent for it, rather I'm just picking up on the fact now as it's something that often comes up once there are more eyes on the code. Simply linking to some public styling guide or tool in the README would put this kind of comment to rest from the start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What styling are you following in this project?
declare a variable as close as possible to its use
src/macos/cpid_macos.c
Outdated
// set the uuid version (UUIDv8) | ||
library_handle->digest_destination_buffer[6] = (library_handle->digest_destination_buffer[6] & 0x0F) | 0x80; | ||
|
||
// set the uuid variant | ||
library_handle->digest_destination_buffer[8] = (library_handle->digest_destination_buffer[8] & 0x3F) | 0x80; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider defining these magic numbers as constants and remove the comments. That why the code is (more) self-documenting.
src/macos/cpid_macos.c
Outdated
if (NULL == library_handle || pid > MAX_PID || creation_time_micros_offset > MAX_MICROS_OFFSET || NULL == uuid) { | ||
return RETURN_ERROR; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd consider breaking these checks up.
First validate the basic null check on pointer params, then perform further validation on the param data itself.
src/macos/cpid_macos.c
Outdated
cpid_finalize(library_handle); | ||
library_handle = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be more flexible to move the NULL set into the cpid_finalize
function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then you'd have to pass the address of that pointer, and that would potentially complicate calling this function from elsewhere. Consider, for example, that in other context where it is called the variable might be const.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a big deal to me either way
src/macos/cpid_macos.c
Outdated
int return_code = sysctl(mib, 4, &process_info, &info_size, NULL, 0); | ||
if (0 != return_code) { | ||
return RETURN_ERROR; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return_code doesn't seem to be used elsewhere and isn't used as a "return_code" in the context of this funciton. I'd suggest removing it.
int return_code = sysctl(mib, 4, &process_info, &info_size, NULL, 0); | |
if (0 != return_code) { | |
return RETURN_ERROR; | |
} | |
if (0 != sysctl(mib, 4, &process_info, &info_size, NULL, 0)) { | |
return RETURN_ERROR; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is similar code in other functions in this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason people traditionally do things like that is that it makes debugging easier because the value returned by the function call isn't lost. In a release build, the temporary variable will be optimised away anyway.
An obvious exception is where the return type of the called function is boolean. In that case, the value is clear from the direction that the if-statement takes 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I get it will be optimized away. If you're at the point of step-through debugging it is easy enough to break out the information you want if needed. To me defining a variable for it it to be basically unused seems verbose.
Certainly not a deal breaker either way.
test/windows/CMakeLists.txt
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty file, should be removed
README.md
Outdated
Compile: | ||
``` | ||
mkdir build && cd build | ||
cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_TESTING=ON .. | ||
make | ||
ctest | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defining a CMakePresets.txt could potentially simplify these steps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other major comment I have relates to const correctness. But let's discuss offline so that I'm clearer about the intention behind this code.
src/macos/cpid_macos.c
Outdated
char serial_number[MACOS_SERIAL_NUMBER_BUFFER_SIZE]; // 16 bytes | ||
uuid_t hardware_uuid; // 16 bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think they have a usefulness. The MACOS_EXPECTED_DIGEST_INPUT_CONTENT_SIZE
macro is mysteriously set to 88 and having field sizes in trailing comments like this helps explain where that value comes from. I'd actually be inclined to add a similar trailing comment to the other fields in this struct, e.g.
typedef struct {
char serial_number[MACOS_SERIAL_NUMBER_BUFFER_SIZE]; // 16 bytes
uuid_t hardware_uuid; // + 16 bytes
process_creation_time_t kernel_task_creation_time; // + 16 bytes
process_creation_time_t launchd_creation_time; // + 16 bytes
process_creation_time_t process_creation_time; // + 16 bytes
uint64_t pid; // + 8 bytes
} digest_input_content_t; // = 88 bytes
src/macos/cpid_macos.c
Outdated
int return_code = sysctl(mib, 4, &process_info, &info_size, NULL, 0); | ||
if (0 != return_code) { | ||
return RETURN_ERROR; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason people traditionally do things like that is that it makes debugging easier because the value returned by the function call isn't lost. In a release build, the temporary variable will be optimised away anyway.
An obvious exception is where the return type of the called function is boolean. In that case, the value is clear from the direction that the if-statement takes 😄
src/macos/cpid_macos.c
Outdated
cpid_finalize(library_handle); | ||
library_handle = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then you'd have to pass the address of that pointer, and that would potentially complicate calling this function from elsewhere. Consider, for example, that in other context where it is called the variable might be const.
include/cpid/cpid_macos.h
Outdated
#define _Inout_ | ||
#endif | ||
|
||
typedef struct cpid_struct *cpid_handle_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think cpid_handle_t
should be an opaque void*
as it is in the Windows implementation.
src/macos/cpid_macos.c
Outdated
|
||
#pragma pack(push, 1) | ||
typedef struct { | ||
uint64_t unix_epoch_seconds; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need to include <stdint.h>
to get the fixed width integer types? If it's working now, it's most likely just a happy side-effect of including another header.
src/macos/cpid_macos.c
Outdated
EVP_MD *sha256; | ||
uint8_t digest_destination_buffer[EVP_MAX_MD_SIZE]; | ||
digest_input_content_t digest_input_content; | ||
} *cpid_handle_t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted above, I don't think you should declaratively associate cpid_handle_t
and cpid_struct
.
src/macos/cpid_macos.c
Outdated
|
||
int cpid_get_uuid_string(_In_ cpid_handle_t library_handle, _In_ uint32_t pid, _Out_ uuid_string_t uuid_string) { | ||
|
||
if (NULL == library_handle || NULL == uuid_string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. A NULL == foo
or foo == NULL
test always makes me feel like I'm reading Java or C# 😄
Note that this practice is throughout this code.
src/macos/cpid_macos.c
Outdated
return RETURN_ERROR; | ||
} | ||
|
||
uuid_t uuid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dunno about that tbh...we're not writing K&R C here.
src/constants.h
Outdated
#define RETURN_SUCCESS 0 | ||
#define RETURN_ERROR -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My biggest concern about this implementation is that every function returns only success or failure. If it's a failure, the caller has no way to determine why it has failed. Given that this is a "UNIXy" implementation, I'd have thought that either:
- Each function would directly return 0 on success or else an error code from errno.h.
- Each function would return success/failure and set
errno
to an error code from errno.h.
By way of comparison, each function in the Windows implementation returns a standard winerror.h code.
Update digest input to use signed integers Basic methods for efficient calculation exposed in header Use an allocatable data structure as handle so data and resources can be reused Unit testing implemented with CUnit Some const-correctness implemented Signed-off-by: Mitchell Wasson <[email protected]>
Basic methods for efficient calculation exposed in header
Use an allocatable data structure as handle so data and resources can be reused
Unit testing implemented with CUnit