Certified Machine Learning Professional Topic 1
Question #: 2
Topic #: 1
A machine learning engineer is monitoring categorical input variables for a production machine learning application. The engineer believes that missing values are becoming more prevalent in more recent data for a particular value in one of the categorical input variables.
Which of the following tools can the machine learning engineer use to assess their theory?
A. Kolmogorov-Smirnov (KS) test
B. One-way Chi-squared Test
C. Two-way Chi-squared Test
D. Jensen-Shannon distance
E. None of these
Selected Answer: B
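As a rough illustration of the one-way (goodness-of-fit) chi-squared test named in option B, the sketch below compares missing and non-missing counts in recent data against the proportions seen in a reference window. It is pure Python; the function name and all counts are hypothetical, and the p-value shortcut holds only for one degree of freedom (two categories).

```python
import math

def one_way_chi_squared(observed, expected_props):
    """Goodness-of-fit chi-squared statistic and p-value for two categories.

    observed: counts in the recent window, e.g. [missing, present]
    expected_props: proportions from the reference window (should sum to 1)
    """
    if len(observed) != 2 or len(expected_props) != 2:
        raise ValueError("this sketch only handles two categories (1 degree of freedom)")
    total = sum(observed)
    expected = [p * total for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom the chi-squared survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: 5% missing in the reference window, 9% missing in recent data.
stat, p_value = one_way_chi_squared(observed=[90, 910], expected_props=[0.05, 0.95])
print(f"chi2={stat:.2f}, p={p_value:.2g}")
```

A small p-value would suggest that the share of missing values in recent data differs from the reference share; with more than two categories, a library routine for the chi-squared distribution would be needed instead of the erfc shortcut.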
Question #: 1
Topic #: 1
Which of the following describes concept drift?
A. Concept drift is when there is a change in the distribution of an input variable
B. Concept drift is when there is a change in the distribution of a target variable
C. Concept drift is when there is a change in the relationship between input variables and target variables
D. Concept drift is when there is a change in the distribution of the predicted target given by the model
E. None of these describe concept drift
Selected Answer: C
Question #: 48
Topic #: 1
A machine learning engineer is manually refreshing a model in an existing machine learning pipeline. The pipeline uses the MLflow Model Registry model “project”. The machine learning engineer would like to add a new version of the model to “project”.
Which of the following MLflow operations can the machine learning engineer use to accomplish this task?
A. mlflow.register_model
B. MlflowClient.update_registered_model
C. mlflow.add_model_version
D. MlflowClient.get_model_version
E. The machine learning engineer needs to create an entirely new MLflow Model Registry model
Selected Answer: A
Question #: 41
Topic #: 1
A machine learning engineer has developed a random forest model using scikit-learn, logged the model using MLflow as random_forest_model, and stored its run ID in the run_id Python variable. They now want to deploy that model by performing batch inference on a Spark DataFrame spark_df.
Which of the following code blocks can they use to create a function called predict that they can use to complete the task?
A.
B. It is not possible to deploy a scikit-learn model on a Spark DataFrame.
C.
D.
E. 
Selected Answer: E
Question #: 31
Topic #: 1
Which of the following statements describes streaming with Spark as a model deployment strategy?
A. The inference of batch processed records as soon as a trigger is hit
B. The inference of all types of records in real-time
C. The inference of batch processed records as soon as a Spark job is run
D. The inference of incrementally processed records as soon as a trigger is hit
E. The inference of incrementally processed records as soon as a Spark job is run
Selected Answer: D
Question #: 4
Topic #: 1
A machine learning engineer wants to log feature importance data from a CSV file at path importance_path with an MLflow run for model model.
Which of the following code blocks will accomplish this task inside of an existing MLflow run block?
A.
B. 
C. mlflow.log_data(importance_path, "feature-importance.csv")
D. mlflow.log_artifact(importance_path, "feature-importance.csv")
E. None of these code blocks can accomplish the task.
Selected Answer: D
Question #: 5
Topic #: 1
Which of the following is a simple, low-cost method of monitoring numeric feature drift?
A. Jensen-Shannon test
B. Summary statistics trends
C. Chi-squared test
D. None of these can be used to monitor feature drift
E. Kolmogorov-Smirnov (KS) test
Selected Answer: B
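A minimal sketch of what tracking summary statistics trends could look like: compute a few statistics per time window and compare them against a reference window. The function names, the sample values, and the relative-mean-shift signal are assumptions made for illustration only. The sketch also assumes each window has at least two values and a non-zero reference mean.

```python
import statistics

def summary_stats(values):
    """Return basic summary statistics for one window of a numeric feature."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def relative_mean_shift(reference, current):
    """Relative change of the mean between two windows (hypothetical drift signal)."""
    ref_mean = statistics.mean(reference)
    return abs(statistics.mean(current) - ref_mean) / abs(ref_mean)

# Hypothetical weekly windows of one numeric feature.
week_1 = [10.0, 11.0, 9.5, 10.5, 10.0]
week_2 = [12.0, 13.5, 12.5, 13.0, 14.0]

for name, window in [("week_1", week_1), ("week_2", week_2)]:
    print(name, summary_stats(window))

shift = relative_mean_shift(week_1, week_2)
print(f"relative mean shift: {shift:.2%}")
```

Plotting or logging these values per window gives a trend that can be inspected without running a formal statistical test.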
Question #: 51
Topic #: 1
Which of the following MLflow Model Registry use cases requires the use of an HTTP Webhook?
A. Starting a testing job when a new model is registered
B. Updating data in a source table for a Databricks SQL dashboard when a model version transitions to the Production stage
C. Sending an email alert when an automated testing Job fails
D. None of these use cases require the use of an HTTP Webhook
E. Sending a message to a Slack channel when a model version transitions stages
Selected Answer: E
Question #: 10
Topic #: 1
Which of the following is a reason for using Jensen-Shannon (JS) distance over a Kolmogorov-Smirnov (KS) test for numeric feature drift detection?
A. All of these reasons
B. JS is not normalized or smoothed
C. None of these reasons
D. JS is more robust when working with large datasets
E. JS does not require any manual threshold or cutoff determinations
Selected Answer: D
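For reference, the Jensen-Shannon distance between two binned distributions can be sketched in pure Python as below. With base-2 logarithms it is bounded between 0 and 1, and it is a score rather than a p-value, so a cutoff still has to be chosen by whoever monitors it. The helper names, bin edges, and sample values are hypothetical; values outside the edges are ignored, and at least one value is assumed to fall inside them.

```python
import math

def histogram(values, edges):
    """Bin values by sorted edges (last bin closed on the right); return proportions."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def js_distance(p, q):
    """Jensen-Shannon distance (base-2 logs) between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    divergence = (kl(p, m) + kl(q, m)) / 2
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, divergence))

# Hypothetical reference and current samples of one numeric feature.
reference = [1, 2, 2, 3, 3, 3, 4, 4, 5]
current = [3, 4, 4, 5, 5, 5, 6, 6, 7]
edges = [1, 3, 5, 7]

p = histogram(reference, edges)
q = histogram(current, edges)
distance = js_distance(p, q)
print(f"JS distance: {distance:.3f}")
```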
Question #: 46
Topic #: 1
A machine learning engineer is migrating a machine learning pipeline to use Databricks Machine Learning. They have programmatically identified the best run from an MLflow Experiment and stored its URI in the model_uri variable and its Run ID in the run_id variable. They have also determined that the model was logged with the name “model”. Now, the machine learning engineer wants to register that model in the MLflow Model Registry with the name “best_model”.
Which of the following lines of code can they use to register the model to the MLflow Model Registry?
A. mlflow.register_model(model_uri, "best_model")
B. mlflow.register_model(run_id, "best_model")
C. mlflow.register_model(f"runs:/{run_id}/best_model", "model")
D. mlflow.register_model(model_uri, "model")
E. mlflow.register_model(f"runs:/{run_id}/model")
Selected Answer: A
Question #: 57
Topic #: 1
Which of the following MLflow operations can be used to delete a model from the MLflow Model Registry?
A. client.transition_model_version_stage
B. client.delete_model_version
C. client.update_registered_model
D. client.delete_model
E. client.delete_registered_model
Selected Answer: E
Question #: 13
Topic #: 1
Which of the following is a simple statistic to monitor for categorical feature drift?
A. Mode
B. None of these
C. Mode, number of unique values, and percentage of missing values
D. Percentage of missing values
E. Number of unique values
Selected Answer: A
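The simple categorical statistics named in the options (mode, number of unique values, percentage of missing values) can each be computed in a few lines. The sketch below is one possible way to do so; the function name and sample data are hypothetical, missing values are assumed to be represented as None, and the input is assumed to be non-empty.

```python
from collections import Counter

def categorical_summary(values):
    """Mode, number of unique values, and percentage of missing values (None)."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    mode = counts.most_common(1)[0][0] if counts else None
    return {
        "mode": mode,
        "n_unique": len(counts),
        "pct_missing": 100.0 * (len(values) - len(present)) / len(values),
    }

# Hypothetical reference and current windows of one categorical feature.
reference = ["a", "b", "a", "c", "a", None]
current = ["b", "b", None, "c", "b", None, "d", "b"]

ref_summary = categorical_summary(reference)
cur_summary = categorical_summary(current)
print("reference:", ref_summary)
print("current:  ", cur_summary)
```

Comparing the two summaries window over window shows whether the most frequent category, the set of categories, or the missing rate is shifting.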
Question #: 44
Topic #: 1
A machine learning engineer needs to select a deployment strategy for a new machine learning application. The feature values are not available until the time of delivery, and results are needed exceedingly fast for one record at a time.
Which of the following deployment strategies can be used to meet these requirements?
A. Edge/on-device
B. Streaming
C. None of these strategies will meet the requirements.
D. Batch
E. Real-time
Selected Answer: E
Question #: 42
Topic #: 1
Which of the following describes the purpose of the context parameter in the predict method of Python models for MLflow?
A. The context parameter allows the user to specify which version of the registered MLflow Model should be used based on the given application’s current scenario
B. The context parameter allows the user to document the performance of a model after it has been deployed
C. The context parameter allows the user to include relevant details of the business case to allow downstream users to understand the purpose of the model
D. The context parameter allows the user to provide the model with completely custom if-else logic for the given application’s current scenario
E. The context parameter allows the user to provide the model access to objects like preprocessing models or custom configuration files
Selected Answer: E
Question #: 39
Topic #: 1
A machine learning engineering team has written predictions computed in a batch job to a Delta table for querying. However, the team has noticed that the querying is running slowly. The team has already tuned the size of the data files. Upon investigating, the team has concluded that the rows meeting the query condition are sparsely located throughout each of the data files.
Based on the scenario, which of the following optimization techniques could speed up the query by colocating similar records while considering values in multiple columns?
A. Z-Ordering
B. Bin-packing
C. Write as a Parquet file
D. Data skipping
E. Tuning the file size
Selected Answer: A
Question #: 29
Topic #: 1
A data scientist has created a Python function compute_features that returns a Spark DataFrame with the following schema:
The resulting DataFrame is assigned to the features_df variable. The data scientist wants to create a Feature Store table using features_df.
Which of the following code blocks can they use to create and populate the Feature Store table using the Feature Store Client fs?
A.
B.
C. features_df.write.mode("fs").path("new_table")
D.
E. features_df.write.mode("feature").path("new_table")
Selected Answer: A
Question #: 32
Topic #: 1
A machine learning engineer has deployed a model recommender using MLflow Model Serving. They now want to query the version of that model that is in the Production stage of the MLflow Model Registry.
Which of the following model URIs can be used to query the described model version?
A. https://
B. The version number of the model version in Production is necessary to complete this task.
C. https://
D. https://
E. https://
Selected Answer: E
Question #: 8
Topic #: 1
Which of the following operations in Feature Store Client fs can be used to return a Spark DataFrame of a data set associated with a Feature Store table?
A. fs.create_table
B. fs.write_table
C. fs.get_table
D. There is no way to accomplish this task with fs
E. fs.read_table
Selected Answer: E
Question #: 3
Topic #: 1
A data scientist is using MLflow to track their machine learning experiment. As a part of each MLflow run, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values.
They are using the following code block:
The code block is not nesting the runs in MLflow as they expected.
Which of the following changes does the data scientist need to make to the above code block so that it successfully nests the child runs under the parent run in MLflow?
A. Indent the child run blocks within the parent run block
B. Add the nested=True argument to the parent run
C. Remove the nested=True argument from the child runs
D. Provide the same name to the run_name parameter for all three run blocks
E. Add the nested=True argument to the parent run and remove the nested=True arguments from the child runs
Selected Answer: A
Question #: 9
Topic #: 1
A machine learning engineer is in the process of implementing a concept drift monitoring solution. They are planning to use the following steps:
1. Deploy a model to production and compute predicted values
2. Obtain the observed (actual) label values
3. _____
4. Run a statistical test to determine if there are changes over time
Which of the following should be completed as Step #3?
A. Obtain the observed (actual) feature values
B. Measure the latency of the prediction time
C. Retrain the model
D. None of these should be completed as Step #3
E. Compute the evaluation metric using the observed and predicted values
Selected Answer: E
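The first three steps of this monitoring loop can be sketched as follows: pair observed labels with predictions per time window and compute an evaluation metric for each window. Accuracy is used here only as an example metric; the window names and label values are hypothetical, and the statistical test of Step 4 is left out.

```python
def accuracy(observed, predicted):
    """Share of predictions that match the observed labels."""
    matches = sum(1 for o, p in zip(observed, predicted) if o == p)
    return matches / len(observed)

# Hypothetical (observed, predicted) labels collected per time window.
windows = {
    "window_1": ([1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 1, 1, 0, 1, 0, 1]),
    "window_2": ([1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 0, 1, 1, 1]),
    "window_3": ([1, 0, 1, 1, 0, 1, 0, 0], [0, 0, 0, 1, 1, 1, 1, 1]),
}

# Step 3: compute the evaluation metric from observed and predicted values.
metric_by_window = {
    name: accuracy(obs, pred) for name, (obs, pred) in windows.items()
}

for name, value in metric_by_window.items():
    print(f"{name}: accuracy={value:.3f}")
```

A metric that degrades steadily across windows while the inputs look stable is the kind of signal a later statistical test would try to confirm.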
Question #: 7
Topic #: 1
A data scientist wants to remove the star_rating column from the Delta table at the location path. To do this, they need to load in data and drop the star_rating column.
Which of the following code blocks accomplishes this task?
A. spark.read.format("delta").load(path).drop("star_rating")
B. spark.read.format("delta").table(path).drop("star_rating")
C. Delta tables cannot be modified
D. spark.read.table(path).drop("star_rating")
E. spark.sql("SELECT * EXCEPT star_rating FROM path")
Selected Answer: A

