DP-100: Designing and Implementing a Data Science Solution on Azure (beta) Topic 2

Question #: 18
Topic #: 4
You use the Azure Machine Learning Python SDK to define a pipeline that consists of multiple steps.
When you run the pipeline, you observe that some steps do not run. The cached output from a previous run is used instead.
You need to ensure that every step in the pipeline is run, even if the parameters and contents of the source directory have not changed since the previous run.
What are two possible ways to achieve this goal? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

A. Use a PipelineData object that references a datastore other than the default datastore.
B. Set the regenerate_outputs property of the pipeline to True.
C. Set the allow_reuse property of each step in the pipeline to False.
D. Restart the compute cluster where the pipeline experiment is configured to run.
E. Set the outputs property of each step in the pipeline to True.

Selected Answer: BC

Question #: 19
Topic #: 5
You are a data scientist building a deep convolutional neural network (CNN) for image classification.
The CNN model you build shows signs of overfitting.
You need to reduce overfitting and converge the model to an optimal fit.
Which two actions should you perform? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

A. Add an additional dense layer with 512 input units.
B. Add L1/L2 regularization.
C. Use training data augmentation.
D. Reduce the amount of training data.
E. Add an additional dense layer with 64 input units.

Selected Answer: BC

Question #: 19
Topic #: 2
You are creating a machine learning model. You have a dataset that contains null rows.
You need to use the Clean Missing Data module in Azure Machine Learning Studio to identify and resolve the null and missing data in the dataset.
Which parameter should you use?

A. Replace with mean
B. Remove entire column
C. Remove entire row
D. Hot Deck
E. Custom substitution value
F. Replace with mode

Selected Answer: C
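
As a rough illustration of what the "Remove entire row" cleaning mode does, the sketch below drops every record that contains a missing value. The sample data and the helper name remove_rows_with_missing are hypothetical and only stand in for the module's behavior; this is not the module's implementation.

```python
# Hypothetical sample data; None marks a missing value.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000},
]


def remove_rows_with_missing(records):
    """Keep only the records in which every field has a value."""
    return [r for r in records if all(v is not None for v in r.values())]


cleaned = remove_rows_with_missing(rows)
print(len(cleaned), "of", len(rows), "rows kept")
```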

Question #: 19
Topic #: 4
You train a model and register it in your Azure Machine Learning workspace. You are ready to deploy the model as a real-time web service.
You deploy the model to an Azure Kubernetes Service (AKS) inference cluster, but the deployment fails because an error occurs when the service runs the entry script that is associated with the model deployment.
You need to debug the error by iteratively modifying the code and reloading the service, without requiring a re-deployment of the service for each code update.
What should you do?

A. Modify the AKS service deployment configuration to enable application insights and re-deploy to AKS.
B. Create an Azure Container Instances (ACI) web service deployment configuration and deploy the model on ACI.
C. Add a breakpoint to the first line of the entry script and redeploy the service to AKS.
D. Create a local web service deployment configuration and deploy the model to a local Docker container.
E. Register a new version of the model and update the entry script to load the new version of the model from its registered path.

Selected Answer: D

Question #: 20
Topic #: 4
You use Azure Machine Learning designer to create a training pipeline for a regression model.
You need to prepare the pipeline for deployment as an endpoint that generates predictions asynchronously for a dataset of input data values.
What should you do?

A. Clone the training pipeline.
B. Create a batch inference pipeline from the training pipeline.
C. Create a real-time inference pipeline from the training pipeline.
D. Replace the dataset in the training pipeline with an Enter Data Manually module.

Selected Answer: B

Question #: 20
Topic #: 1
You are planning to host practical training to acquaint learners with data visualization creation using Python. Learner devices are able to connect to the internet.
Learner devices are currently NOT configured for Python development. Also, learners are unable to install software on their devices as they lack administrator permissions. Furthermore, they are unable to access Azure subscriptions.
It is imperative that learners are able to execute Python-based data visualization code.
Which of the following actions should you take?

A. You should consider configuring the use of Azure Container Instance.
B. You should consider configuring the use of Azure BatchAI.
C. You should consider configuring the use of Azure Notebooks.
D. You should consider configuring the use of Azure Kubernetes Service.

Selected Answer: C

Question #: 20
Topic #: 5
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are creating a model to predict the price of a student’s artwork depending on the following variables: the student’s length of education, degree type, and art form.
You start by creating a linear regression model.
You need to evaluate the linear regression model.
Solution: Use the following metrics: Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error, Accuracy, Precision, Recall, F1 score, and AUC.
Does the solution meet the goal?

A. Yes
B. No

Selected Answer: B
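
The solution fails because Accuracy, Precision, Recall, F1 score, and AUC are classification metrics, while a regression model is judged by error measures. A minimal sketch of three such measures follows, assuming small hypothetical lists of actual and predicted prices; the numbers are invented for illustration.

```python
import math

# Hypothetical actual and predicted artwork prices.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

n = len(actual)
errors = [p - a for a, p in zip(actual, predicted)]

# Mean Absolute Error: average size of the errors.
mae = sum(abs(e) for e in errors) / n

# Root Mean Squared Error: penalizes large errors more heavily.
rmse = math.sqrt(sum(e * e for e in errors) / n)

# Relative Absolute Error: total absolute error relative to a
# baseline that always predicts the mean of the actual values.
mean_actual = sum(actual) / n
rae = sum(abs(e) for e in errors) / sum(abs(a - mean_actual) for a in actual)

print(mae, rmse, rae)
```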

Question #: 21
Topic #: 4
You retrain an existing model.
You need to register the new version of a model while keeping the current version of the model in the registry.
What should you do?

A. Register a model with a different name from the existing model and a custom property named version with the value 2.
B. Register the model with the same name as the existing model.
C. Save the new model in the default datastore with the same name as the existing model. Do not register the new model.
D. Delete the existing model and register the new one with the same name.

Selected Answer: B

Question #: 21
Topic #: 3
You plan to run a script as an experiment using a Script Run Configuration. The script uses modules from the scipy library as well as several Python packages that are not typically installed in a default conda environment.
You plan to run the experiment on your local workstation for small datasets and scale out the experiment by running it on more powerful remote compute clusters for larger datasets.
You need to ensure that the experiment runs successfully on local and remote compute with the least administrative effort.
What should you do?

A. Do not specify an environment in the run configuration for the experiment. Run the experiment by using the default environment.
B. Create a virtual machine (VM) with the required Python configuration and attach the VM as a compute target. Use this compute target for all experiment runs.
C. Create and register an Environment that includes the required packages. Use this Environment for all experiment runs.
D. Create a config.yaml file defining the conda packages that are required and save the file in the experiment folder.
E. Always run the experiment with an Estimator by using the default packages.

Selected Answer: C

Question #: 21
Topic #: 5
You are building a binary classification model by using a supplied training set.
The training set is imbalanced between two classes.
You need to resolve the data imbalance.
What are three possible ways to achieve this goal? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

A. Penalize the classification
B. Resample the dataset using undersampling or oversampling
C. Normalize the training feature set
D. Generate synthetic samples in the minority class
E. Use accuracy as the evaluation metric of the model

Selected Answer: ABD
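
Option B can be sketched as simple random oversampling: minority-class samples are duplicated at random until every class matches the size of the largest one. The function name oversample_minority and the toy data are hypothetical; production work would more likely rely on a dedicated library.

```python
import random


def oversample_minority(samples, labels, seed=0):
    """Duplicate random samples of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw with replacement to fill the gap to the largest class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical imbalanced training set: five negatives, one positive.
samples = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
labels = [0, 0, 0, 0, 0, 1]
balanced_x, balanced_y = oversample_minority(samples, labels)
print(balanced_y.count(0), balanced_y.count(1))
```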

Question #: 21
Topic #: 2
You plan to provision an Azure Machine Learning Basic edition workspace for a data science project.
You need to identify the tasks you will be able to perform in the workspace.
Which three tasks will you be able to perform? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

A. Create a Compute Instance and use it to run code in Jupyter notebooks.
B. Create an Azure Kubernetes Service (AKS) inference cluster.
C. Use the designer to train a model by dragging and dropping pre-defined modules.
D. Create a tabular dataset that supports versioning.
E. Use the Automated Machine Learning user interface to train a model.

Selected Answer: ACE

Question #: 22
Topic #: 4
You use the Azure Machine Learning SDK to run a training experiment that trains a classification model and calculates its accuracy metric.
The model will be retrained each month as new data is available.
You must register the model for use in a batch inference pipeline.
You need to register the model and ensure that the models created by subsequent retraining experiments are registered only if their accuracy is higher than the currently registered model.
What are two possible ways to achieve this goal? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

A. Specify a different name for the model each time you register it.
B. Register the model with the same name each time regardless of accuracy, and always use the latest version of the model in the batch inferencing pipeline.
C. Specify the model framework version when registering the model, and only register subsequent models if this value is higher.
D. Specify a property named accuracy with the accuracy metric as a value when registering the model, and only register subsequent models if their accuracy is higher than the accuracy property value of the currently registered model.
E. Specify a tag named accuracy with the accuracy metric as a value when registering the model, and only register subsequent models if their accuracy is higher than the accuracy tag value of the currently registered model.

Selected Answer: DE
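
Options D and E share the same decision rule: compare the new accuracy with the value stored on the currently registered model and register only on improvement. The sketch below shows that rule alone. The function should_register is hypothetical, and it assumes the stored metadata arrives as a dictionary whose values are strings; the actual registration call is left out.

```python
def should_register(new_accuracy, current_metadata):
    """Return True when nothing is registered yet or the new accuracy is higher.

    current_metadata is assumed to be the tags or properties dictionary of
    the currently registered model, or None when no model exists.
    """
    if current_metadata is None or "accuracy" not in current_metadata:
        return True
    return new_accuracy > float(current_metadata["accuracy"])


# Hypothetical metadata read from the currently registered model.
current = {"accuracy": "0.89"}
print(should_register(0.91, current))  # better model
print(should_register(0.85, current))  # worse model
```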

Question #: 22
Topic #: 3
You write a Python script that processes data in a comma-separated values (CSV) file.
You plan to run this script as an Azure Machine Learning experiment.
The script loads the data and determines the number of rows it contains using the following code:

You need to record the row count as a metric named row_count that can be returned using the get_metrics method of the Run object after the experiment run completes.
Which code should you use?

A. run.upload_file(‘row_count’, ‘./data.csv’)
B. run.log(‘row_count’, rows)
C. run.tag(‘row_count’, rows)
D. run.log_table(‘row_count’, rows)
E. run.log_row(‘row_count’, rows)

Selected Answer: B

Question #: 22
Topic #: 1
You have recently concluded the construction of a binary classification machine learning model.
You are currently assessing the model. You want to make use of a visualization that allows for precision to be used as the measurement for the assessment.
Which of the following actions should you take?

A. You should consider using Venn diagram visualization.
B. You should consider using Receiver Operating Characteristic (ROC) curve visualization.
C. You should consider using Box plot visualization.
D. You should consider using the Binary classification confusion matrix visualization.

Selected Answer: D
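
The confusion matrix is the right choice because precision is read directly from two of its cells: true positives and false positives. A minimal sketch with hypothetical counts:

```python
# Hypothetical cell counts from a binary confusion matrix.
true_positives = 40
false_positives = 10
false_negatives = 5
true_negatives = 45

# Precision: of everything predicted positive, the share that really is positive.
precision = true_positives / (true_positives + false_positives)

# Recall is shown for contrast; it uses false negatives instead.
recall = true_positives / (true_positives + false_negatives)

print(precision, recall)
```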

Question #: 23
Topic #: 4
You are a data scientist working for a hotel booking website company. You use the Azure Machine Learning service to train a model that identifies fraudulent transactions.
You must deploy the model as an Azure Machine Learning real-time web service using the Model.deploy method in the Azure Machine Learning SDK. The deployed web service must return real-time predictions of fraud based on transaction data input.
You need to create the script that is specified as the entry_script parameter for the InferenceConfig class used to deploy the model.
What should the entry script do?

A. Register the model with appropriate tags and properties.
B. Create a Conda environment for the web service compute and install the necessary Python packages.
C. Load the model and use it to predict labels from input data.
D. Start a node on the inference cluster where the web service is deployed.
E. Specify the number of cores and the amount of memory required for the inference compute.

Selected Answer: C
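
An entry script conventionally defines an init() function that loads the model once and a run() function that scores each request. The sketch below follows that shape but is deliberately self-contained: ThresholdModel is a hypothetical stand-in for the registered fraud model, and the "data" key in the request payload is an assumed input format.

```python
import json

model = None


class ThresholdModel:
    """Hypothetical stand-in for a trained fraud model."""

    def predict(self, rows):
        # Flag a transaction as fraud (1) when its amount exceeds 1000.
        return [1 if row[0] > 1000 else 0 for row in rows]


def init():
    """Called once when the service starts: load the model into memory."""
    global model
    # A real entry script would load the registered model file here.
    model = ThresholdModel()


def run(raw_data):
    """Called for each request: parse the JSON input and return predicted labels."""
    data = json.loads(raw_data)["data"]
    return model.predict(data)


# Local smoke test of the two functions.
init()
result = run(json.dumps({"data": [[2500.0], [40.0]]}))
print(result)
```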

Question #: 23
Topic #: 1
This question is one of a number of questions that depict the identical set-up. However, every question has a distinctive result. Establish if the recommendation satisfies the requirements.
You have been tasked with evaluating your model on a partial data sample via k-fold cross-validation.
You have already configured a k parameter as the number of splits. You now have to configure the k parameter for the cross-validation with the usual value choice.
Recommendation: You configure the use of the value k=1.
Will the requirements be satisfied?

A. Yes
B. No

Selected Answer: B
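
With k=1 the single fold would hold all of the data, leaving nothing to train on, which is why common choices are values such as 5 or 10. The sketch below generates fold indices and rejects k below 2; the helper k_fold_indices is hypothetical and splits without shuffling to keep the example short.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs for k folds."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and the number of samples")

    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]

    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


folds = k_fold_indices(10, 5)
for train, test in folds:
    print(len(train), "train /", len(test), "test")
```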

Question #: 23
Topic #: 2
A set of CSV files contains sales records. All the CSV files have the same data schema.
Each CSV file contains the sales record for a particular month and has the filename sales.csv. Each file is stored in a folder that indicates the month and year when the data was recorded. The folders are in an Azure blob container for which a datastore has been defined in an Azure Machine Learning workspace. The folders are organized in a parent folder named sales to create the following hierarchical structure:

At the end of each month, a new folder with that month’s sales file is added to the sales folder.
You plan to use the sales data to train a machine learning model based on the following requirements:
✑ You must define a dataset that loads all of the sales data to date into a structure that can be easily converted to a dataframe.
✑ You must be able to create experiments that use only data that was created before a specific previous month, ignoring any data that was added after that month.
✑ You must register the minimum number of datasets possible.
You need to register the sales data as a dataset in Azure Machine Learning service workspace.
What should you do?

A. Create a tabular dataset that references the datastore and explicitly specifies each ‘sales/mm-yyyy/sales.csv’ file every month. Register the dataset with the name sales_dataset each month, replacing the existing dataset and specifying a tag named month indicating the month and year it was registered. Use this dataset for all experiments.
B. Create a tabular dataset that references the datastore and specifies the path ‘sales/*/sales.csv’, register the dataset with the name sales_dataset and a tag named month indicating the month and year it was registered, and use this dataset for all experiments.
C. Create a new tabular dataset that references the datastore and explicitly specifies each ‘sales/mm-yyyy/sales.csv’ file every month. Register the dataset with the name sales_dataset_MM-YYYY each month with appropriate MM and YYYY values for the month and year. Use the appropriate month-specific dataset for experiments.
D. Create a tabular dataset that references the datastore and explicitly specifies each ‘sales/mm-yyyy/sales.csv’ file. Register the dataset with the name sales_dataset each month as a new version and with a tag named month indicating the month and year it was registered. Use this dataset for all experiments, identifying the version to be used based on the month tag as necessary.

Selected Answer: D

Question #: 23
Topic #: 3
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.
Does the solution meet the goal?

A. Yes
B. No

Selected Answer: A
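
The core idea of SMOTE is to create new minority samples by interpolating between an existing minority point and one of its nearest minority neighbors. The sketch below is a simplified variant that always uses the single nearest neighbor, whereas SMOTE proper picks at random among k nearest neighbors. The function smote_like_samples and the data are hypothetical.

```python
import random


def smote_like_samples(minority, n_new, seed=0):
    """Create n_new synthetic points on segments between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest other minority point by squared Euclidean distance.
        neighbor = min(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbor)])
    return synthetic


# Hypothetical minority-class feature vectors.
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0]]
new_points = smote_like_samples(minority, n_new=4)
print(len(new_points))
```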

Question #: 24
Topic #: 3
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Stratified split for the sampling mode.
Does the solution meet the goal?

A. Yes
B. No

Selected Answer: B

Question #: 25
Topic #: 1
This question is one of a number of questions that depict the identical set-up. However, every question has a distinctive result. Establish if the recommendation satisfies the requirements.
You are in the process of carrying out feature engineering on a dataset.
You want to add a feature to the dataset and fill the column value.
Recommendation: You must make use of the Group Categorical Values Azure Machine Learning Studio module.
Will the requirements be satisfied?

A. Yes
B. No

Selected Answer: A

Question #: 25
Topic #: 4
You develop and train a machine learning model to predict fraudulent transactions for a hotel booking website.
Traffic to the site varies considerably. The site experiences heavy traffic on Monday and Friday and much lower traffic on other days. Holidays are also high web traffic days.
You need to deploy the model as an Azure Machine Learning real-time web service endpoint on compute that can dynamically scale up and down to support demand.
Which deployment compute option should you use?

A. attached Azure Databricks cluster
B. Azure Container Instance (ACI)
C. Azure Kubernetes Service (AKS) inference cluster
D. Azure Machine Learning Compute Instance
E. attached virtual machine in a different region

Selected Answer: C

Question #: 25
Topic #: 5
You create a binary classification model. The model is registered in an Azure Machine Learning workspace. You use the Azure Machine Learning Fairness SDK to assess the model fairness.
You develop a training script for the model on a local machine.
You need to load the model fairness metrics into Azure Machine Learning studio.
What should you do?

A. Implement the download_dashboard_by_upload_id function
B. Implement the create_group_metric_set function
C. Implement the upload_dashboard_dictionary function
D. Upload the training script

Selected Answer: C

Question #: 25
Topic #: 3
You are creating a machine learning model.
You need to identify outliers in the data.
Which two visualizations can you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

A. Venn diagram
B. Box plot
C. ROC curve
D. Random forest diagram
E. Scatter plot

Selected Answer: BE
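
A box plot marks outliers with the interquartile-range rule: points more than 1.5 times the IQR beyond the first or third quartile are drawn separately. A minimal sketch of that rule on hypothetical values follows; the quartiles come from Python's statistics.quantiles with its default method, so a plotting library may compute slightly different fences.

```python
import statistics

# Hypothetical measurements with one obvious outlier.
values = [10, 12, 11, 13, 12, 11, 14, 95]

# Quartiles of the data (first, median, third).
q1, _median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Whisker limits used by the usual box plot convention.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower_fence or v > upper_fence]
print(outliers)
```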

Question #: 26
Topic #: 3
You are evaluating a completed binary classification machine learning model.
You need to use the precision as the evaluation metric.
Which visualization should you use?

A. Violin plot
B. Gradient descent
C. Box plot
D. Binary classification confusion matrix

Selected Answer: D

Question #: 26
Topic #: 1
This question is one of a number of questions that depict the identical set-up. However, every question has a distinctive result. Establish if the recommendation satisfies the requirements.
You are in the process of carrying out feature engineering on a dataset.
You want to add a feature to the dataset and fill the column value.
Recommendation: You must make use of the Join Data Azure Machine Learning Studio module.
Will the requirements be satisfied?

A. Yes
B. No

Selected Answer: B

Question #: 26
Topic #: 5
You have a dataset that includes confidential data. You use the dataset to train a model.
You must use a differential privacy parameter to keep the data of individuals safe and private.
You need to reduce the effect of user data on aggregated results.
What should you do?

A. Decrease the value of the epsilon parameter to reduce the amount of noise added to the data
B. Increase the value of the epsilon parameter to decrease privacy and increase accuracy
C. Decrease the value of the epsilon parameter to increase privacy and reduce accuracy
D. Set the value of the epsilon parameter to 1 to ensure maximum privacy

Selected Answer: C
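
In the Laplace mechanism, a common way to implement differential privacy, the noise scale equals the query sensitivity divided by epsilon, so a smaller epsilon means more noise, more privacy, and less accuracy. The sketch below assumes a counting query with sensitivity 1; the function names are hypothetical.

```python
import random


def laplace_noise_scale(sensitivity, epsilon):
    """Scale of the Laplace noise: grows as epsilon shrinks."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def private_count(true_count, epsilon, rng):
    """Return a count with Laplace noise added (sensitivity assumed to be 1)."""
    scale = laplace_noise_scale(1.0, epsilon)
    # The difference of two exponential draws with the same mean
    # follows a Laplace distribution centered on zero.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(0)
for eps in (1.0, 0.1):
    print(eps, laplace_noise_scale(1.0, eps), private_count(100, eps, rng))
```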

Question #: 26
Topic #: 4
You use the designer to create a training pipeline for a classification model. The pipeline uses a dataset that includes the features and labels required for model training.
You create a real-time inference pipeline from the training pipeline. You observe that the schema for the generated web service input is based on the dataset and includes the label column that the model predicts. Client applications that use the service must not be required to submit this value.
You need to modify the inference pipeline to meet the requirement.
What should you do?

A. Add a Select Columns in Dataset module to the inference pipeline after the dataset and use it to select all columns other than the label.
B. Delete the dataset from the training pipeline and recreate the real-time inference pipeline.
C. Delete the Web Service Input module from the inference pipeline.
D. Replace the dataset in the inference pipeline with an Enter Data Manually module that includes data for the feature columns but not the label column.

Selected Answer: A

Question #: 27
Topic #: 4
You use the Azure Machine Learning designer to create and run a training pipeline. You then create a real-time inference pipeline.
You must deploy the real-time inference pipeline as a web service.
What must you do before you deploy the real-time inference pipeline?

A. Run the real-time inference pipeline.
B. Create a batch inference pipeline.
C. Clone the training pipeline.
D. Create an Azure Machine Learning compute cluster.

Selected Answer: D

Question #: 27
Topic #: 3
You create a multi-class image classification deep learning model that uses the PyTorch deep learning framework.
You must configure Azure Machine Learning Hyperdrive to optimize the hyperparameters for the classification model.
You need to define a primary metric to determine the hyperparameter values that result in the model with the best accuracy score.
Which three actions must you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

A. Set the primary_metric_goal of the estimator used to run the bird_classifier_train.py script to maximize.
B. Add code to the bird_classifier_train.py script to calculate the validation loss of the model and log it as a float value with the key loss.
C. Set the primary_metric_goal of the estimator used to run the bird_classifier_train.py script to minimize.
D. Set the primary_metric_name of the estimator used to run the bird_classifier_train.py script to accuracy.
E. Set the primary_metric_name of the estimator used to run the bird_classifier_train.py script to loss.
F. Add code to the bird_classifier_train.py script to calculate the validation accuracy of the model and log it as a float value with the key accuracy.

Selected Answer: ADF

Question #: 27
Topic #: 1
This question is one of a number of questions that depict the identical set-up. However, every question has a distinctive result. Establish if the recommendation satisfies the requirements.
You are in the process of carrying out feature engineering on a dataset.
You want to add a feature to the dataset and fill the column value.
Recommendation: You must make use of the Edit Metadata Azure Machine Learning Studio module.
Will the requirements be satisfied?

A. Yes
B. No

Selected Answer: B

Question #: 28
Topic #: 5
You develop a machine learning project on a local machine. The project uses the Azure Machine Learning SDK for Python. You use Git as version control for scripts.
You submit a training run that returns a Run object.
You need to retrieve the active Git branch for the training run.
Which two code segments should you use? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

A. details = run.get_environment()
B. details.properties[‘azureml.git.branch’]
C. details.properties[‘azureml.git.commit’]
D. details = run.get_details()

Selected Answer: BD

Question #: 28
Topic #: 2
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are using Azure Machine Learning Studio to perform feature engineering on a dataset.
You need to normalize values to produce a feature column grouped into bins.
Solution: Apply an Entropy Minimum Description Length (MDL) binning mode.
Does the solution meet the goal?

A. Yes
B. No

Selected Answer: B

Question #: 28
Topic #: 1
You have been tasked with ascertaining if two sets of data differ considerably. You will make use of Azure Machine Learning Studio to complete your task.
You plan to perform a paired t-test.
Which of the following are conditions that must apply to use a paired t-test? (Choose all that apply.)

A. All scores are independent from each other.
B. You have matched pairs of scores.
C. The sampling distribution of d is normal.
D. The sampling distribution of x1 - x2 is normal.

Selected Answer: BC
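
Both conditions show up in the calculation itself: the test works on the differences d between matched pairs, and the t statistic assumes those differences are approximately normal. A minimal sketch with hypothetical before and after scores:

```python
import math
import statistics

# Hypothetical matched pairs: the same five subjects measured twice.
before = [72.0, 68.0, 75.0, 80.0, 66.0]
after = [75.0, 70.0, 74.0, 85.0, 70.0]

# The paired t-test is a one-sample test on the pairwise differences d.
d = [a - b for a, b in zip(after, before)]
n = len(d)

mean_d = statistics.mean(d)
sd_d = statistics.stdev(d)  # sample standard deviation (n - 1 in the denominator)

# t statistic with n - 1 degrees of freedom.
t_stat = mean_d / (sd_d / math.sqrt(n))
print(round(mean_d, 3), round(t_stat, 3))
```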

Question #: 28
Topic #: 4
You create an Azure Machine Learning workspace named ML-workspace. You also create an Azure Databricks workspace named DB-workspace. DB-workspace contains a cluster named DB-cluster.
You must use DB-cluster to run experiments from notebooks that you import into DB-workspace.
You need to use ML-workspace to track MLflow metrics and artifacts generated by experiments running on DB-cluster. The solution must minimize the need for custom code.
What should you do?

A. From DB-cluster, configure the Advanced Logging option.
B. From DB-workspace, configure the Link Azure ML workspace option.
C. From ML-workspace, create an attached compute.
D. From ML-workspace, create a compute cluster.

Selected Answer: A

Question #: 29
Topic #: 1
You want to train a classification model using data located in a comma-separated values (CSV) file.
The classification model will be trained via the Automated Machine Learning interface using the Classification task type.
You have been informed that only linear models need to be assessed by Automated Machine Learning.
Which of the following actions should you take?

A. You should disable deep learning.
B. You should enable automatic featurization.
C. You should disable automatic featurization.
D. You should set the task type to Forecasting.

Selected Answer: A

Question #: 29
Topic #: 5
You are attaching an Azure Databricks-based compute resource to an Azure Machine Learning development workspace.
You need to configure parameters to attach the resource.
Which three parameters should you use? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

A. Workspace name
B. Compute name
C. Workspace user credentials
D. Workspace resource ID
E. Access token

Selected Answer: ABE

Question #: 30
Topic #: 3
You are performing a filter-based feature selection for a dataset to build a multi-class classifier by using Azure Machine Learning Studio.
The dataset contains categorical features that are highly correlated to the output label column.
You need to select the appropriate feature scoring statistical method to identify the key predictors.
Which method should you use?

A. Kendall correlation
B. Spearman correlation
C. Chi-squared
D. Pearson correlation

Selected Answer: C
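
Chi-squared suits categorical features because it compares observed category counts against the counts expected if the feature and the label were independent. The sketch below computes the statistic for a small contingency table; the function chi_squared and the counts are hypothetical.

```python
def chi_squared(table):
    """Chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell under independence.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts: rows are feature categories, columns are label classes.
related = [[30, 10], [10, 30]]
unrelated = [[20, 20], [20, 20]]
print(chi_squared(related), chi_squared(unrelated))
```

A larger statistic indicates a stronger association between the feature and the label, so features are ranked by it.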

Question #: 30
Topic #: 1
You are preparing to train a regression model via automated machine learning. The data available to you has features with missing values, as well as categorical features with few discrete values.
You want to make sure that automated machine learning is configured as follows:
✑ missing values must be automatically imputed.
✑ categorical features must be encoded as part of the training task.
Which of the following actions should you take?

A. You should make use of the featurization parameter with the ‘auto’ value pair.
B. You should make use of the featurization parameter with the ‘off’ value pair.
C. You should make use of the featurization parameter with the ‘on’ value pair.
D. You should make use of the featurization parameter with the ‘FeaturizationConfig’ value pair.

Selected Answer: A

Question #: 30
Topic #: 2
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are a data scientist using Azure Machine Learning Studio.
You need to normalize values to produce an output column into bins to predict a target column.
Solution: Apply a Quantiles normalization with a QuantileIndex normalization.
Does the solution meet the goal?

A. Yes
B. No

Selected Answer: A

Question #: 30
Topic #: 4
You are planning to register a trained model in an Azure Machine Learning workspace.
You must store additional metadata about the model in a key-value format. You must be able to add new metadata and modify or delete metadata after creation.
You need to register the model.
Which parameter should you use?

A. description
B. model_framework
C. tags
D. properties

Selected Answer: C

Question #: 31
Topic #: 1
You make use of Azure Machine Learning Studio to develop a linear regression model. You perform an experiment to assess various algorithms.
Which of the following is an algorithm that reduces the variances between actual and predicted values?

A. Fast Forest Quantile Regression
B. Poisson Regression
C. Boosted Decision Tree Regression
D. Linear Regression

Selected Answer: D

Question #: 31
Topic #: 2
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Scale and Reduce sampling mode.
Does the solution meet the goal?

A. Yes
B. No

Selected Answer: A

Question #: 31
Topic #: 4
You have a Python script that executes a pipeline. The script includes the following code:
from azureml.core import Experiment
pipeline_run = Experiment(ws, 'pipeline_test').submit(pipeline)
You want to test the pipeline before deploying the script.
You need to display the pipeline run details written to the STDOUT output when the pipeline completes.
Which code segment should you add to the test script?

A. pipeline_run.get.metrics()
B. pipeline_run.wait_for_completion(show_output=True)
C. pipeline_param = PipelineParameter(name="stdout", default_value="console")
D. pipeline_run.get_status()

Selected Answer: B
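
A minimal sketch of how the selected call fits into a test script. The helper name `run_and_stream_output` is hypothetical; it assumes `experiment` behaves like an `azureml.core.Experiment` and that `submit` returns a run object exposing `wait_for_completion`, as in the question's code.

```python
def run_and_stream_output(experiment, pipeline):
    """Submit a pipeline and block until it finishes, echoing run details to STDOUT."""
    pipeline_run = experiment.submit(pipeline)
    # show_output=True asks the run to print its status and logs while waiting.
    pipeline_run.wait_for_completion(show_output=True)
    return pipeline_run
```

In the question's script this would be called as `run_and_stream_output(Experiment(ws, 'pipeline_test'), pipeline)`.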

Question #: 32
Topic #: 4
You train and register a machine learning model. You create a batch inference pipeline that uses the model to generate predictions from multiple data files.
You must publish the batch inference pipeline as a service that can be scheduled to run every night.
You need to select an appropriate compute target for the inference service.
Which compute target should you use?

A. Azure Machine Learning compute instance
B. Azure Machine Learning compute cluster
C. Azure Kubernetes Service (AKS)-based inference cluster
D. Azure Container Instance (ACI) compute target

Selected Answer: B

Question #: 32
Topic #: 5
You create a binary classification model. You use the Fairlearn package to assess model fairness.
You must eliminate the need to retrain the model.
You need to implement the Fairlearn package.
Which algorithm should you use?

A. fairlearn.reductions.ExponentiatedGradient
B. fairlearn.postprocessing.ThresholdOptimizer
C. fairlearn.preprocessing.CorrelationRemover
D. fairlearn.reductions.GridSearch

Selected Answer: B

Question #: 32
Topic #: 1
This question is included in a number of questions that depict the identical set-up. However, every question has a distinctive result. Establish if the recommendation satisfies the requirements.
You have been tasked with constructing a machine learning model that translates language text into a different language text.
The machine learning model must be constructed and trained to learn the sequence of the.
Recommendation: You make use of Convolutional Neural Networks (CNNs).
Will the requirements be satisfied?

A. Yes
B. No

Selected Answer: B

Question #: 32
Topic #: 2
You are analyzing a dataset by using Azure Machine Learning Studio.
You need to generate a statistical summary that contains the p-value and the unique count for each feature column.
Which two modules can you use? Each correct answer presents a complete solution.
NOTE: Each correct selection is worth one point.

A. Compute Linear Correlation
B. Export Count Table
C. Execute Python Script
D. Convert to Indicator Values
E. Summarize Data

Selected Answer: CE

Question #: 33
Topic #: 5
You have an Azure Machine Learning workspace named workspace1.
You must add a datastore that connects an Azure Blob storage container to workspace1. You must be able to configure a privilege level.
You need to configure authentication.
Which authentication method should you use?

A. Service principal
B. Account key
C. SAS token
D. Managed identity

Selected Answer: C

Question #: 33
Topic #: 3
You plan to use automated machine learning to train a regression model. You have data that has features which have missing values, and categorical features with few distinct values.
You need to configure automated machine learning to automatically impute missing values and encode categorical features as part of the training task.
Which parameter and value pair should you use in the AutoMLConfig class?

A. featurization = 'auto'
B. enable_voting_ensemble = True
C. task = 'classification'
D. exclude_nan_labels = True
E. enable_tf = True

Selected Answer: A

Question #: 33
Topic #: 1
This question is included in a number of questions that depict the identical set-up. However, every question has a distinctive result. Establish if the recommendation satisfies the requirements.
You have been tasked with constructing a machine learning model that translates language text into a different language text.
The machine learning model must be constructed and trained to learn the sequence of the.
Recommendation: You make use of Generative Adversarial Networks (GANs).
Will the requirements be satisfied?

A. Yes
B. No

Selected Answer: B

Question #: 33
Topic #: 2
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are analyzing a numerical dataset which contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set.
You need to analyze a full dataset to include all values.
Solution: Use the Last Observation Carried Forward (LOCF) method to impute the missing data points.
Does the solution meet the goal?

A. Yes
B. No

Selected Answer: A
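
The LOCF method named in the solution can be sketched in a few lines of plain Python. This is an illustration only: the function name `locf` and the sample readings are assumptions, and missing values are represented as `None`.

```python
def locf(values):
    """Replace each missing value (None) with the most recent observed value."""
    filled = []
    last_seen = None
    for v in values:
        if v is not None:
            last_seen = v
        # A missing value before any observation stays None.
        filled.append(last_seen)
    return filled


readings = [10.0, None, None, 12.5, None]
filled = locf(readings)
print(filled)
```

The output has the same length as the input and no column is dropped, so the dimensionality of the feature set is unchanged.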

Question #: 34
Topic #: 5
You plan to create a compute instance as part of an Azure Machine Learning development workspace.
You must interactively debug code running on the compute instance by using Visual Studio Code Remote.
You need to provision the compute instance.
What should you do?

A. Enable Remote Desktop Protocol (RDP) access.
B. Modify role-based access control (RBAC) settings at the workspace level.
C. Enable Secure Shell Protocol (SSH) access.
D. Modify role-based access control (RBAC) settings at the compute instance level.

Selected Answer: B

Question #: 34
Topic #: 1
This question is included in a number of questions that depict the identical set-up. However, every question has a distinctive result. Establish if the recommendation satisfies the requirements.
You have been tasked with constructing a machine learning model that translates language text into a different language text.
The machine learning model must be constructed and trained to learn the sequence of the.
Recommendation: You make use of Recurrent Neural Networks (RNNs).
Will the requirements be satisfied?

A. Yes
B. No

Selected Answer: A

Question #: 35
Topic #: 3
You are building a regression model for estimating the number of calls during an event.
You need to determine whether the feature values achieve the conditions to build a Poisson regression model.
Which two conditions must the feature set contain? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.

A. The label data must be a negative value.
B. The label data must be whole numbers.
C. The label data must be non-discrete.
D. The label data must be a positive value.
E. The label data can be positive or negative.

Selected Answer: BD
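
The two selected conditions can be turned into a quick check on a label column. This is a minimal sketch: the function name `suits_poisson` is hypothetical, and treating "positive" strictly, so that zero is rejected, is an assumption of the example rather than something the question states.

```python
def suits_poisson(labels):
    """Return True if every label is a positive whole number."""
    return all(
        isinstance(y, (int, float)) and float(y).is_integer() and y > 0
        for y in labels
    )


print(suits_poisson([3, 12, 7]))    # whole, positive call counts
print(suits_poisson([3, 2.5, 7]))   # 2.5 is not a whole number
print(suits_poisson([4, -1, 6]))    # -1 is not positive
```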

Question #: 35
Topic #: 5
You have a dataset that contains salary information for users. You plan to generate an aggregate salary report that shows average salaries by city.
Privacy of individuals must be preserved without impacting accuracy, completeness, or reliability of the data. The aggregation must be statistically consistent with the distribution of the original data. You must return an approximation of the data instead of the raw data.
You need to apply a differential privacy approach.
What should you do?

A. Add noise to the salary data during the analysis
B. Encrypt the salary data before analysis
C. Remove the salary data
D. Convert the salary data to the average column value

Selected Answer: A
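
One common way to add noise during analysis is the Laplace mechanism. The sketch below is an illustration under stated assumptions, not a production implementation: the function name `private_mean`, the clipping bounds, the epsilon value, and the salary figures are all made up for the example.

```python
import random


def private_mean(salaries, lower, upper, epsilon, rng):
    """Return a noisy estimate of the mean salary using Laplace noise."""
    # Clip each salary so that changing one record moves the mean
    # by at most (upper - lower) / n.
    clipped = [min(max(s, lower), upper) for s in salaries]
    sensitivity = (upper - lower) / len(clipped)
    scale = sensitivity / epsilon
    # The difference of two exponential draws with mean `scale`
    # follows a Laplace distribution with that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return sum(clipped) / len(clipped) + noise


rng = random.Random(0)
london = [52000, 61000, 58000, 75000, 49000]
estimate = private_mean(london, lower=20000, upper=150000, epsilon=1.0, rng=rng)
print(round(estimate, 2))
```

A smaller epsilon adds more noise and gives stronger privacy; a larger epsilon returns a value closer to the true average.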

Question #: 35
Topic #: 2
You plan to deliver a hands-on workshop to several students. The workshop will focus on creating data visualizations using Python. Each student will use a device that has internet access.
Student devices are not configured for Python development. Students do not have administrator access to install software on their devices. Azure subscriptions are not available for students.
You need to ensure that students can run Python-based data visualization code.
Which Azure tool should you use?

A. Anaconda Data Science Platform
B. Azure BatchAI
C. Azure Notebooks
D. Azure Machine Learning Service

Selected Answer: C

Question #: 35
Topic #: 4
You use the Azure Machine Learning designer to create and run a training pipeline.
The pipeline must be run every night to inference predictions from a large volume of files. The folder where the files will be stored is defined as a dataset.
You need to publish the pipeline as a REST service that can be used for the nightly inferencing run.
What should you do?

A. Create a batch inference pipeline
B. Set the compute target for the pipeline to an inference cluster
C. Create a real-time inference pipeline
D. Clone the pipeline

Selected Answer: A

Question #: 36
Topic #: 3
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Principal Components Analysis (PCA) sampling mode.
Does the solution meet the goal?

A. Yes
B. No

Selected Answer: B

Question #: 36
Topic #: 2
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are analyzing a numerical dataset which contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set.
You need to analyze a full dataset to include all values.
Solution: Replace each missing value using the Multiple Imputation by Chained Equations (MICE) method.
Does the solution meet the goal?

A. Yes
B. No

Selected Answer: A

Question #: 37
Topic #: 1
You make use of Azure Machine Learning Studio to create a binary classification model.
You are preparing to carry out a parameter sweep of the model to tune hyperparameters. You have to make sure that the sweep allows for every possible combination of hyperparameters to be iterated. Also, the computing resources needed to carry out the sweep must be reduced.
Which of the following actions should you take?

A. You should consider making use of the Selective grid sweep mode.
B. You should consider making use of the Measured grid sweep mode.
C. You should consider making use of the Entire grid sweep mode.
D. You should consider making use of the Random grid sweep mode.

Selected Answer: D
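
The trade-off between sweeping the entire grid and sampling it at random can be sketched in plain Python. This is only an illustration of the idea: the function names, the hyperparameter names, and their values are hypothetical and do not come from the Studio module.

```python
import itertools
import random


def entire_grid(param_grid):
    """Enumerate every combination of the listed hyperparameter values."""
    names = sorted(param_grid)
    combos = itertools.product(*(param_grid[n] for n in names))
    return [dict(zip(names, combo)) for combo in combos]


def random_grid(param_grid, max_runs, seed=0):
    """Pick at most max_runs distinct combinations from the full grid."""
    grid = entire_grid(param_grid)
    return random.Random(seed).sample(grid, min(max_runs, len(grid)))


param_grid = {"learning_rate": [0.01, 0.1, 1.0], "num_leaves": [8, 16, 32]}
full = entire_grid(param_grid)
subset = random_grid(param_grid, max_runs=4)
print(len(full), len(subset))
```

Every combination in the grid is eligible to be drawn, but only `max_runs` of them are trained, which is where the saving in compute comes from.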

Question #: 37
Topic #: 3
You are performing feature engineering on a dataset.
You must add a feature named CityName and populate the column value with the text London.
You need to add the new feature to the dataset.
Which Azure Machine Learning Studio module should you use?

A. Edit Metadata
B. Filter Based Feature Selection
C. Execute Python Script
D. Latent Dirichlet Allocation

Selected Answer: C

Question #: 37
Topic #: 5
You create an Azure Machine Learning workspace. You train an MLflow-formatted regression model by using tabular structured data.

You must use a Responsible AI dashboard to assess the model.

You need to use the Azure Machine Learning studio UI to generate the Responsible AI dashboard.

What should you do first?

A. Convert the model from the MLflow format to a custom format.
B. Register the model with the workspace.
C. Create the model explanations.
D. Deploy the model to a managed online endpoint.

Selected Answer: B

Question #: 37
Topic #: 4
You are developing a machine learning model.

You must inference the machine learning model for testing.

You need to use a minimal cost compute target.

Which two compute targets should you use? Each correct answer presents a complete solution.

NOTE: Each correct selection is worth one point.

A. Azure Machine Learning Kubernetes
B. Azure Databricks
C. Remote VM
D. Local web service
E. Azure Container Instances

Selected Answer: C

Question #: 37
Topic #: 2
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are analyzing a numerical dataset which contains missing values in several columns.
You must clean the missing values using an appropriate operation without affecting the dimensionality of the feature set.
You need to analyze a full dataset to include all values.
Solution: Remove the entire column that contains the missing data point.
Does the solution meet the goal?

A. Yes
B. No

Selected Answer: B

Question #: 38
Topic #: 3
You are evaluating a completed binary classification machine learning model.
You need to use the precision as the evaluation metric.
Which visualization should you use?

A. violin plot
B. Gradient descent
C. Scatter plot
D. Receiver Operating Characteristic (ROC) curve

Selected Answer: D
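
For reference, precision is the fraction of predicted positives that are actually positive. A minimal sketch, assuming labels are encoded as 0 and 1; the function name and the sample labels are made up for the example.

```python
def precision(y_true, y_pred):
    """Compute TP / (TP + FP) for binary labels encoded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # With no positive predictions the ratio is undefined; return 0.0 here.
    return tp / (tp + fp) if (tp + fp) else 0.0


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = precision(y_true, y_pred)
print(score)
```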

Question #: 38
Topic #: 4
You train and publish a machine learning model.

You need to run a pipeline that retrains the model based on a trigger from an external system.

What should you configure?

A. Azure Data Catalog
B. Azure Batch
C. Azure Logic App

Selected Answer: B

Question #: 38
Topic #: 1
You are in the process of constructing a deep convolutional neural network (CNN). The CNN will be used for image classification.
You notice that the CNN model you constructed displays hints of overfitting.
You want to make sure that overfitting is minimized, and that the model is converged to an optimal fit.
Which of the following is TRUE with regards to achieving your goal?

A. You have to add an additional dense layer with 512 input units, and reduce the amount of training data.
B. You have to add L1/L2 regularization, and reduce the amount of training data.
C. You have to reduce the amount of training data and make use of training data augmentation.
D. You have to add L1/L2 regularization, and make use of training data augmentation.
E. You have to add an additional dense layer with 512 input units, and add L1/L2 regularization.

Selected Answer: D
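
Both techniques in the selected answer can be illustrated without a deep learning framework. The sketch below is a toy example under stated assumptions: the function names are hypothetical, weights are a flat list of floats, and an image is a list of pixel rows.

```python
def l1_penalty(weights, lam):
    """L1 term added to the loss: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 term added to the loss: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def flip_horizontal(image):
    """Return a mirrored copy of an image, a simple form of augmentation."""
    return [list(reversed(row)) for row in image]


weights = [1.0, -2.0]
image = [[1, 2, 3], [4, 5, 6]]
print(l1_penalty(weights, 0.5), l2_penalty(weights, 0.5))
print(flip_horizontal(image))
```

The penalties discourage large weights, while the flipped copy gives the network an extra training example without collecting new data.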

Question #: 38
Topic #: 2
You are creating a new experiment in Azure Machine Learning Studio. You have a small dataset that has missing values in many columns. The data does not require the application of predictors for each column. You plan to use the Clean Missing Data module.
You need to select a data cleaning method.
Which method should you use?

A. Replace using Probabilistic PCA
B. Normalization
C. Synthetic Minority Oversampling Technique (SMOTE)
D. Replace using MICE

Selected Answer: A

Question #: 39
Topic #: 1
This question is included in a number of questions that depict the identical set-up. However, every question has a distinctive result. Establish if the recommendation satisfies the requirements.
You are planning to make use of Azure Machine Learning designer to train models.
You need to choose a suitable compute type.
Recommendation: You choose Attached compute.
Will the requirements be satisfied?

A. Yes
B. No

Selected Answer: A

Question #: 39
Topic #: 4
You create an Azure Machine Learning workspace.

You must configure an event handler to send an email notification when data drift is detected in the workspace datasets. You must minimize development efforts.

You need to configure an Azure service to send the notification.

Which Azure service should you use?

A. Azure Logic Apps
B. Azure Automation runbook
C. Azure Function apps
D. Azure DevOps pipeline

Selected Answer: A
