Configure Delta Lake – Data Sources and Ingestion

df = spark.read \
    .option("header", "true") \
    .csv("/FileStore/tables/brainwavesMeditation.csv")
df.write.mode("overwrite").format("delta").save("/FileStore/data/2022/03/14")
brainwaves = spark.read.format("delta").load("/FileStore/data/2022/03/14")
display(brainwaves)
print(brainwaves.count())

5. Run the following code snippet:

display(spark.sql("DROP TABLE IF EXISTS BRAINWAVES"))
display(spark \
    .sql("CREATE TABLE BRAINWAVES USING DELTA LOCATION '/FileStore/data/2022/03/14'"))
display(spark.table("BRAINWAVES").select("*").show(10))
print(spark.table("BRAINWAVES").select("*").count())

6. Upload the brainwavesPlayingGuitar.csv file using the same process performed in step 2 ➢ navigate…
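When the DataFrame above is saved in Delta format, the target directory receives Parquet data files plus a _delta_log folder of JSON commit files; it is this transaction log that lets the later CREATE TABLE ... USING DELTA LOCATION statement find the table. The following standalone Python sketch (not from the book; all paths and file names are hypothetical) illustrates that on-disk layout without requiring a Spark cluster:

```python
# Illustrative sketch: mimic the directory layout a Delta write produces.
# A real Delta table holds Parquet part files alongside a _delta_log
# directory whose commits are zero-padded 20-digit JSON files.
import json
import os
import tempfile

root = tempfile.mkdtemp()
table_dir = os.path.join(root, "data", "2022", "03", "14")
log_dir = os.path.join(table_dir, "_delta_log")
os.makedirs(log_dir)

# First commit file; a real one contains protocol/metaData/add actions,
# this stub records only a commitInfo action for illustration.
commit = os.path.join(log_dir, "00000000000000000000.json")
with open(commit, "w") as f:
    f.write(json.dumps({"commitInfo": {"operation": "WRITE"}}) + "\n")

print(sorted(os.listdir(table_dir)))  # ['_delta_log']
```

Because the log, not a metastore entry, defines the table, dropping and re-creating the BRAINWAVES table against the same LOCATION (as in step 5) loses no data.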

Create an Azure Databricks Workspace with an External Hive Metastore – Data Sources and Ingestion-2

Numerous Azure Databricks runtime versions are selectable from the Databricks Runtime Version drop‐down list box. Table 3.14 lists the options.

TABLE 3.14 Databricks runtime versions

Runtime version    Ecosystem
10.3 and 10.4      Scala 2.12, Spark 3.2.1…

Create an Azure Databricks Workspace with an External Hive Metastore – Data Sources and Ingestion-1

datanucleus.schema.autoCreateTables true
spark.hadoop.javax.jdo.option.ConnectionUserName userid@servername
datanucleus.fixedDatastore false
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://*:1433;database=dbname
spark.hadoop.javax.jdo.option.ConnectionPassword *
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver

The text is located in the Chapter03/Ch03Ex14 directory on GitHub at https://github.com/benperk/ADE. The file is named AzureDatabricksAdvancedOptions.txt. Update the text with your…
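These settings follow Spark's "key value" one-per-line configuration format. As a sketch of how you might load and inspect such a file before pasting it into the workspace's advanced options, the hypothetical helper below (not part of the book's GitHub files) parses lines in that format into a dictionary; the sample lines are taken from the listing above:

```python
# Hypothetical helper: parse Spark-style "key value" configuration lines,
# such as those in AzureDatabricksAdvancedOptions.txt, into a dict so the
# placeholder values (server name, password) can be checked or replaced.

def parse_spark_conf(text: str) -> dict:
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # The key runs up to the first space; the rest is the value.
        key, _, value = line.partition(" ")
        conf[key] = value.strip()
    return conf

sample = """\
datanucleus.schema.autoCreateTables true
spark.hadoop.javax.jdo.option.ConnectionUserName userid@servername
datanucleus.fixedDatastore false
"""
conf = parse_spark_conf(sample)
print(conf["datanucleus.schema.autoCreateTables"])  # true
```

A check like this catches leftover placeholders (for example, a value still set to `*`) before the cluster fails to reach the external metastore at startup.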