Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create initial version of notebooks #862

Merged

Conversation

sreedhar-guda
Copy link
Contributor

@sreedhar-guda sreedhar-guda commented Nov 20, 2024

Type of PR

  • Documentation changes
  • Code changes
  • Test changes
  • CI-CD changes

Purpose

The changes are to create the first version of the setup book notebook which creates database schemas and tables needed for Parking sensor sample. The changes include:

  • creation of the notebook without any default lakehouse attached
  • A script to capture telemetry using Open Telemetry
  • A config to store all the DDLs (specific for this notebook)
  • A common config file to store common information like workspace id, lakehouse id etc.
  • Additional changes to Ci/CD linter hooks are done based on the errors encountered in the DEV container, and the changes were made with help and guidance of Naga.

What it includes:

  • A notebook for setup activity. This is a full working example with all the steps: inputs params, mounting, OTEL logging and fully implemented code functions to create schemas and tables. Default lakehouse only attached during runtime using user input parameters.
  • Notebooks for Standardization and Transform steps. These are similar to Setup notebook - but the code functions are right now dummy, while rest of it like params, OTEL implementation is all set. Once ready, developers can fill that portion and rest of the portion/s need not be updated.
  • A data pipeline with all notebooks, The pipeline is coded to support high concurrency executions if the feature is enabled at workspace level for data pipelines. This would be equiavalent to using %run_multiple to call notebooks using another notebook. All notebooks are also re-written to support high concurrency executions.
  • Configuration files for all notebooks
  • Configuration file to store DDLs for the lakehouse
  • Python script to capture telemetry using Open Telemetry

Note that, the config and OTEL script files are named and in locations (configs and otel script) as per discussion (References #840).

Does this introduce a breaking change? If yes, details on what can break

No

Author pre-publish checklist

  • Added test to prove my fix is effective or new feature works
  • No PII in logs
  • Made corresponding changes to the documentation

Validation steps

  • ...

Issues Closed or Referenced

  • References

@sreedhar-guda sreedhar-guda added the e2e: fabric Related with E2E Fabric Sample label Nov 20, 2024
@sreedhar-guda sreedhar-guda self-assigned this Nov 20, 2024
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

sreedhar-guda and others added 3 commits November 20, 2024 01:42
Updated instructions on notebook execution and dependencies
@sreedhar-guda sreedhar-guda changed the title Create initial version of setup notebook Create initial version of notebooks Nov 22, 2024
Copy link
Collaborator

@ydaponte ydaponte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sreedhar-guda, great job - just minor comments and the main thing for me is the ReadMe really needs to be updated before merging this, otherwise there is no connection to what to do with the content for the user.

@promisinganuj
Copy link
Contributor

@ydaponte , I have created a separate item for updating the documentation
#917

@yuna-s yuna-s linked an issue Dec 6, 2024 that may be closed by this pull request
1 task
@promisinganuj promisinganuj mentioned this pull request Dec 9, 2024
3 tasks
trace_exporter = AzureMonitorTraceExporter(connection_string=self.conn_string)
span_processor = BatchSpanProcessor(trace_exporter)
tracer_provider.add_span_processor(span_processor)
tracer = trace.get_tracer(tracer_name, tracer_provider=tracer_provider)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @sreedhar-guda, it's tiny thing but let me ask.
Do you have any intention here why not setting tracer_provider globally like trace.set_tracer_provider(tracer_provider).
The reason why I'm asking this is because this code will have a following error (but it's not blocker) during exporting.

ERROR:azure.monitor.opentelemetry.exporter.export.trace._exporter:Failed to derive Resource from Tracer Provider: 'ProxyTracerProvider' object has no attribute 'resource'

(Reference: azure monitor otel github looks globally setting tracer_provider

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maniSbindra , do you know the above details?

Copy link
Contributor

@naga-nandyala naga-nandyala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried deploying and running notebooks. All looks good.

@promisinganuj promisinganuj merged commit 01543b0 into feat/e2e-fabric-dataops-sample Dec 10, 2024
3 checks passed
@promisinganuj promisinganuj deleted the feat/sguda-setup-notebook-updates branch December 10, 2024 02:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
e2e: fabric Related with E2E Fabric Sample
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spike of Transform notebook on Fabric
5 participants