NtWriteCode/karton-filetype

Karton Filetype Engine

A Different Approach to File Classification for MWDB Karton

The Karton Filetype Engine is a file classification service for the MWDB Karton system. It is inspired by karton-classifier but follows an entirely different approach. While the classifier attaches every plausible label to a sample, hoping that at least one of them is correct and gets picked up by the right consumer, this repository tries its best to assign each sample a SINGLE file type that is as correct as possible.

Utilized third party tools

To achieve the best accuracy, the Filetype Engine uses all of the following tools:

It also uses some external databases and lists to improve its MIME type knowledge:

Input/Output

Consumes

{
    "type": "sample",
    "kind": "raw",
    "payload": {
        "magic":  "output from 'file' command",
        "sample": <Resource>
    }
}
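
As a rough illustration of what "consumes" means here, the check below accepts a task only when its headers carry the two values shown above. This is a minimal sketch in plain Python, not the engine's actual code: `CONSUMED_HEADERS` and `accepts` are hypothetical names, and the real matching is done by Karton itself, which may support richer filter patterns than exact equality.

```python
# Headers the engine listens for, copied from the structure above.
# The constant name is an assumption made for this sketch.
CONSUMED_HEADERS = {"type": "sample", "kind": "raw"}


def accepts(task_headers: dict) -> bool:
    """Return True if every consumed header is present with the same value.

    Extra headers on the task are ignored; only exact equality is checked.
    """
    return all(
        task_headers.get(key) == value
        for key, value in CONSUMED_HEADERS.items()
    )


if __name__ == "__main__":
    print(accepts({"type": "sample", "kind": "raw"}))
    print(accepts({"type": "sample", "kind": "runnable"}))
```

A task with additional headers still matches, because the check only looks at the keys listed in `CONSUMED_HEADERS`.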

Produces

It produces a structure similar to the classifier's, but it is in no way compatible with it.

{
    'type': 'sample',
    'stage': 'recognized',
    'extension': '',    # The extension used by the file format.
                        # In some cases it is a placeholder rather than a real extension;
                        # for example, PEs get "pe", which does not exist as an extension.
                        # Defaults to "bin".
    'mime': '',         # The identified MIME type. In most cases it is provided by Magika and Tika,
                        # so it should be stable to use.
                        # Defaults to "application/octet-stream" when nothing matches.
    'kind': '',         # The top-level MIME type element, as listed at
                        # https://www.digipres.org/formats/mime-types/
                        # plus one custom value for archives.
                        # So every file gets either its top-level MIME type element or "archive".
    ... (other fields are derived from the incoming task)
}
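
The defaults and the `kind` rule described in the comments above can be sketched roughly as follows. This is only an illustration under stated assumptions: `derive_kind`, `build_headers`, and the contents of `ARCHIVE_MIMES` are hypothetical and not taken from the repository, and copying all incoming headers is just one possible reading of "other fields are derived from the incoming task".

```python
from typing import Optional

# Defaults stated in the structure above.
DEFAULT_EXTENSION = "bin"
DEFAULT_MIME = "application/octet-stream"

# Hypothetical list of archive MIME types, for illustration only;
# the engine's real list is not shown in this README.
ARCHIVE_MIMES = frozenset({"application/zip", "application/x-tar"})


def derive_kind(mime: str) -> str:
    """Return "archive" for known archive types, else the top-level MIME element."""
    if mime in ARCHIVE_MIMES:
        return "archive"
    return mime.split("/", 1)[0]


def build_headers(
    incoming: dict,
    extension: Optional[str] = None,
    mime: Optional[str] = None,
) -> dict:
    """Build output headers, falling back to the documented defaults."""
    mime = mime or DEFAULT_MIME
    headers = dict(incoming)  # assumption: carry over the incoming headers
    headers.update(
        {
            "type": "sample",
            "stage": "recognized",
            "extension": extension or DEFAULT_EXTENSION,
            "mime": mime,
            "kind": derive_kind(mime),
        }
    )
    return headers


if __name__ == "__main__":
    print(build_headers({"type": "sample", "kind": "raw"}, "pe", "application/x-dosexec"))
```

With no match at all, the sketch falls back to `"bin"` and `"application/octet-stream"`, so `kind` becomes `"application"`.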

I know, Filetype is more complicated to check. TODO

Getting Started

TODO
