PBIXRay is a Python library for parsing and analyzing PBIX files, the file format used by Microsoft Power BI. It extracts information from a PBIX file, including tables, metadata, Power Query code, and more.
This library is the Python implementation of the logic embedded in the DuckDB extension duckdb-pbix-extension.
Note: PBIXRay also supports Excel (XLSX) files with embedded PowerPivot models. You can use the same API to extract and analyze data models from XLSX files that contain PowerPivot data.
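As a minimal sketch of the note above, the helper below checks a file's extension before handing the file to PBIXRay. The names `is_supported_model_file` and `load_model` are hypothetical and not part of the library. Passing an XLSX path to the `PBIXRay` constructor is an assumption based on the note's statement that the same API applies. `pbixray` is imported lazily, so it only needs to be installed once `load_model` is actually called.

```python
from pathlib import Path

# Extensions named in this README; treating them as the full supported set is an assumption.
SUPPORTED_EXTENSIONS = {".pbix", ".xlsx"}


def is_supported_model_file(path):
    """Return True if the path ends in .pbix or .xlsx (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def load_model(path):
    """Hypothetical wrapper: validate the extension, then open the file with PBIXRay."""
    if not is_supported_model_file(path):
        raise ValueError(f"Expected a .pbix or .xlsx file, got: {path}")
    # Imported here so the extension check works even without pbixray installed.
    from pbixray import PBIXRay
    return PBIXRay(str(path))


print(is_supported_model_file("sales_report.pbix"))      # True
print(is_supported_model_file("powerpivot_model.XLSX"))  # True
print(is_supported_model_file("notes.csv"))              # False
```

The extension check only looks at the file name. It cannot tell whether an XLSX file actually contains an embedded PowerPivot model.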
Before using PBIXRay, ensure you have the following Python modules installed: apsw, kaitaistruct, and pbixray. You can install them using pip:
pip install pbixray

To start using PBIXRay, import the module and initialize it with the path to your PBIX file:
from pbixray import PBIXRay
model = PBIXRay('path/to/your/file.pbix')

To list all tables in the model:
tables = model.tables
print(tables)

To get metadata about the Power BI configuration used during model creation:
metadata = model.metadata
print(metadata)

To display all M/Power Query code used for data transformation, in a dataframe with TableName and Expression columns:
power_query = model.power_query
print(power_query)

To display all M parameter values in a dataframe with ParameterName, Description, Expression, and ModifiedTime columns:
m_parameters = model.m_parameters
print(m_parameters)

To find out the model size in bytes:
size = model.size
print(f"Model size: {size} bytes")

To view DAX calculated tables in a dataframe with TableName and Expression columns:
dax_tables = model.dax_tables
print(dax_tables)

To access DAX measures in a dataframe with TableName, Name, Expression, DisplayFolder, and Description columns:
dax_measures = model.dax_measures
print(dax_measures)

To access calculated column DAX expressions in a dataframe with TableName, ColumnName, and Expression columns:
dax_columns = model.dax_columns
print(dax_columns)

To get details about the data model schema and column types in a dataframe with TableName, ColumnName, and PandasDataType columns:
schema = model.schema
print(schema)

To get details about the data model relationships in a dataframe with FromTableName, FromColumnName, ToTableName, ToColumnName, IsActive, Cardinality, CrossFilteringBehavior, FromKeyCount, ToKeyCount, and RelyOnReferentialIntegrity columns:
relationships = model.relationships
print(relationships)

To get details about Row-Level Security roles and permissions in a dataframe with TableName, RoleName, RoleDescription, FilterExpression, State, and MetadataPermission columns:
rls = model.rls
print(rls)

To retrieve the contents of a specified table:
table_name = 'YourTableName'
table_contents = model.get_table(table_name)
print(table_contents)

To get statistics about the model, including column cardinality and the byte sizes of the dictionary, hash index, and data components, in a dataframe with TableName, ColumnName, Cardinality, Dictionary, HashIndex, and DataSize columns:
statistics = model.statistics
print(statistics)
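
One possible use of the statistics output is to see which tables take up the most space. The sketch below sums the Dictionary, HashIndex, and DataSize byte counts per table and sorts the totals in descending order. The function `table_sizes` is a hypothetical helper, not part of PBIXRay, and the sample rows are made up for illustration. With a real model, rows in this shape could be obtained with `model.statistics.to_dict("records")`, assuming the statistics dataframe is a pandas DataFrame.

```python
from collections import defaultdict


def table_sizes(rows):
    """Sum Dictionary + HashIndex + DataSize per table, largest first."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["TableName"]] += row["Dictionary"] + row["HashIndex"] + row["DataSize"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up rows for illustration; real rows would come from
# model.statistics.to_dict("records"), assuming a pandas DataFrame.
sample_rows = [
    {"TableName": "Sales", "ColumnName": "Amount", "Cardinality": 950,
     "Dictionary": 4000, "HashIndex": 1200, "DataSize": 8000},
    {"TableName": "Sales", "ColumnName": "OrderID", "Cardinality": 1000,
     "Dictionary": 6000, "HashIndex": 2000, "DataSize": 9000},
    {"TableName": "Date", "ColumnName": "Date", "Cardinality": 365,
     "Dictionary": 1500, "HashIndex": 500, "DataSize": 700},
]

for table, size in table_sizes(sample_rows):
    print(f"{table}: {size} bytes")
# Sales: 30200 bytes
# Date: 2700 bytes
```

The helper works on plain lists of dictionaries, so it can be tried without a PBIX file at hand.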