Certified Machine Learning Professional Topic 2
Question #: 6
Topic #: 1
A data scientist has developed a model to predict ice cream sales using the expected temperature and expected number of hours of sun in the day. However, the expected temperature is dropping beneath the range of the input variable on which the model was trained.
Which of the following types of drift is present in the above scenario?
A. Label drift
B. None of these
C. Concept drift
D. Prediction drift
E. Feature drift
Selected Answer: E
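As a rough illustration of the scenario in this question, the sketch below flags incoming feature values that fall outside the range seen during training. The function name out_of_range_fraction and the temperature values are hypothetical, chosen only for this example; real drift monitoring would normally compare full distributions rather than just the minimum and maximum.

```python
def out_of_range_fraction(train_values, new_values):
    """Return the fraction of new values outside the training min/max range."""
    low, high = min(train_values), max(train_values)
    outside = [v for v in new_values if v < low or v > high]
    return len(outside) / len(new_values)


# Hypothetical expected temperatures (degrees Celsius).
train_temps = [18.0, 21.5, 25.0, 30.0, 33.5]
new_temps = [12.0, 15.5, 19.0, 22.0]

fraction = out_of_range_fraction(train_temps, new_temps)
print(f"{fraction:.0%} of new temperatures fall outside the training range")
```

A non-zero fraction here signals that an input variable has moved away from what the model saw in training, which is the feature drift case.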
Question #: 43
Topic #: 1
A machine learning engineer has developed a model and registered it using the FeatureStoreClient fs. The model has model URI model_uri. The engineer now needs to perform batch inference on customer-level Spark DataFrame spark_df, but it is missing a few of the static features that were used when training the model. The customer_id column is the primary key of spark_df and the training set used when training and logging the model.
Which of the following code blocks can be used to compute predictions for spark_df when the missing feature values can be found in the Feature Store by searching for features by customer_id?
A. df = fs.get_missing_features(spark_df, model_uri)
fs.score_model(model_uri, df)
B. fs.score_model(model_uri, spark_df)
C. df = fs.get_missing_features(spark_df, model_uri)
fs.score_batch(model_uri, df)
D. df = fs.get_missing_features(spark_df)
fs.score_batch(model_uri, df)
E. fs.score_batch(model_uri, spark_df)
Selected Answer: E
Question #: 26
Topic #: 1
A machine learning engineer wants to deploy a model for real-time serving using MLflow Model Serving. For the model, the machine learning engineer currently has one model version in each of the stages in the MLflow Model Registry. The engineer wants to know which model versions can be queried once Model Serving is enabled for the model.
Which of the following lists all of the MLflow Model Registry stages whose model versions are automatically deployed with Model Serving?
A. Staging, Production, Archived
B. Production
C. None, Staging, Production, Archived
D. Staging, Production
E. None, Staging, Production
Selected Answer: D
Question #: 23
Topic #: 1
A machine learning engineer and data scientist are working together to convert a batch deployment to an always-on streaming deployment. The machine learning engineer has expressed that rigorous data tests must be put in place as a part of their conversion to account for potential changes in data formats.
Which of the following describes why these types of data type tests and checks are particularly important for streaming deployments?
A. Because the streaming deployment is always on, all types of data must be handled without producing an error
B. All of these statements
C. Because the streaming deployment is always on, there is no practitioner to debug poor model performance
D. Because the streaming deployment is always on, there is a need to confirm that the deployment can autoscale
E. None of these statements
Selected Answer: B
Question #: 12
Topic #: 1
A data scientist has developed and logged a scikit-learn random forest model model, and then they ended their Spark session and terminated their cluster. After starting a new cluster, they want to review the feature_importances_ of the original model object.
Which of the following lines of code can be used to restore the model object so that feature_importances_ is available?
A. mlflow.load_model(model_uri)
B. client.list_artifacts(run_id)["feature-importances.csv"]
C. mlflow.sklearn.load_model(model_uri)
D. This can only be viewed in the MLflow Experiments UI
E. client.pyfunc.load_model(model_uri)
Selected Answer: C
Question #: 59
Topic #: 1
A machine learning engineer wants to view all of the active MLflow Model Registry Webhooks for a specific model.
They are using the following code block:
Which of the following changes does the machine learning engineer need to make to this code block so it will successfully accomplish the task?
A. There are no necessary changes
B. Replace list with view in the endpoint URL
C. Replace POST with GET in the call to http_request
D. Replace list with webhooks in the endpoint URL
E. Replace POST with PUT in the call to http_request
Selected Answer: C
Question #: 47
Topic #: 1
A machine learning engineer wants to move their model version model_version for the MLflow Model Registry model model from the Staging stage to the Production stage using MLflow Client client.
Which of the following code blocks can they use to accomplish the task?
A.
B.
C.
D.
E.
Selected Answer: C
Question #: 37
Topic #: 1
Which of the following describes the concept of MLflow Model flavors?
A. A convention that deployment tools can use to wrap preprocessing logic into a Model
B. A convention that MLflow Model Registry can use to version models
C. A convention that MLflow Experiments can use to organize their Runs by project
D. A convention that deployment tools can use to understand the model
E. A convention that MLflow Model Registry can use to organize its Models by project
Selected Answer: D
Question #: 38
Topic #: 1
In a continuous integration, continuous deployment (CI/CD) process for machine learning pipelines, which of the following events commonly triggers the execution of automated testing?
A. The launch of a new cost-efficient SQL endpoint
B. CI/CD pipelines are not needed for machine learning pipelines
C. The arrival of a new feature table in the Feature Store
D. The launch of a new cost-efficient job cluster
E. The arrival of a new model version in the MLflow Model Registry
Selected Answer: E
Question #: 40
Topic #: 1
A machine learning engineer needs to deliver predictions of a machine learning model in real-time. However, the feature values needed for computing the predictions are available one week before the query time.
Which of the following is a benefit of using a batch serving deployment in this scenario rather than a real-time serving deployment where predictions are computed at query time?
A. Batch serving has built-in capabilities in Databricks Machine Learning
B. There is no advantage to using batch serving deployments over real-time serving deployments
C. Computing predictions in real-time provides more up-to-date results
D. Testing is not possible in real-time serving deployments
E. Querying stored predictions can be faster than computing predictions in real-time
Selected Answer: E
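The trade-off in this question can be sketched in plain Python, assuming a toy scoring function and an in-memory dictionary standing in for a prediction table. The names predict, run_batch_job, serve, and the customer data are all hypothetical. Because the features are known ahead of time, the batch job computes every prediction once, and the query path is reduced to a key lookup.

```python
def predict(features):
    """Hypothetical stand-in for an expensive model call."""
    return 2 * features["visits"] + features["spend"]


def run_batch_job(feature_table):
    """Score every record ahead of time and return a prediction store."""
    return {key: predict(feats) for key, feats in feature_table.items()}


# Features are available well before query time, so they can be scored in bulk.
feature_table = {
    "c1": {"visits": 10, "spend": 200},
    "c2": {"visits": 3, "spend": 50},
}
prediction_store = run_batch_job(feature_table)


def serve(customer_id):
    """Query-time path: a lookup of the stored prediction, no model call."""
    return prediction_store[customer_id]


print(serve("c1"))
```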
Question #: 33
Topic #: 1
Which of the following tools can assist in real-time deployments by packaging software with its own application, tools, and libraries?
A. Cloud-based compute
B. None of these tools
C. REST APIs
D. Containers
E. Autoscaling clusters
Selected Answer: D
Question #: 34
Topic #: 1
A machine learning engineer has registered a sklearn model in the MLflow Model Registry using the sklearn model flavor with URI model_uri.
Which of the following operations can be used to load the model as an sklearn object for batch deployment?
A. mlflow.spark.load_model(model_uri)
B. mlflow.pyfunc.read_model(model_uri)
C. mlflow.sklearn.read_model(model_uri)
D. mlflow.pyfunc.load_model(model_uri)
E. mlflow.sklearn.load_model(model_uri)
Selected Answer: E
Question #: 35
Topic #: 1
A data scientist set up a machine learning pipeline to automatically log a data visualization with each run. They now want to view the visualizations in Databricks.
Which of the following locations in Databricks will show these data visualizations?
A. The MLflow Model Registry Model page
B. The Artifacts section of the MLflow Experiment page
C. Logged data visualizations cannot be viewed in Databricks
D. The Artifacts section of the MLflow Run page
E. The Figures section of the MLflow Run page
Selected Answer: D
Question #: 36
Topic #: 1
A data scientist has developed a scikit-learn model sklearn_model and they want to log the model using MLflow.
They write the following incomplete code block:
Which of the following lines of code can be used to fill in the blank so the code block can successfully complete the task?
A. mlflow.spark.track_model(sklearn_model, "model")
B. mlflow.sklearn.log_model(sklearn_model, "model")
C. mlflow.spark.log_model(sklearn_model, "model")
D. mlflow.sklearn.load_model("model")
E. mlflow.sklearn.track_model(sklearn_model, "model")
Selected Answer: B
Question #: 30
Topic #: 1
Which of the following is a benefit of logging a model signature with an MLflow model?
A. The model will have a unique identifier in the MLflow experiment
B. The schema of input data can be validated when serving models
C. The model can be deployed using real-time serving tools
D. The model will be secured by the user that developed it
E. The schema of input data will be converted to match the signature
Selected Answer: B
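To make the distinction between validating and converting input concrete, here is a minimal sketch of schema checking in plain Python. It is not MLflow's implementation: the validate_input function and the dictionary-based schema are assumptions made for illustration. The point is that a mismatching record is rejected rather than silently reshaped to fit.

```python
def validate_input(record, schema):
    """Raise if the record does not match the expected columns and types."""
    missing = [col for col in schema if col not in record]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for col, expected_type in schema.items():
        if not isinstance(record[col], expected_type):
            raise TypeError(
                f"column {col!r} expected {expected_type.__name__}, "
                f"got {type(record[col]).__name__}"
            )
    return record


# Hypothetical schema for the ice cream sales model from an earlier question.
schema = {"temperature": float, "hours_of_sun": float}

valid = validate_input({"temperature": 24.5, "hours_of_sun": 8.0}, schema)

try:
    validate_input({"temperature": "warm", "hours_of_sun": 8.0}, schema)
    rejected = False
except TypeError:
    rejected = True

print(valid, rejected)
```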
Question #: 17
Topic #: 1
Which of the following describes label drift?
A. Label drift is when there is a change in the distribution of the predicted target given by the model
B. None of these describe label drift
C. Label drift is when there is a change in the distribution of an input variable
D. Label drift is when there is a change in the relationship between input variables and target variables
E. Label drift is when there is a change in the distribution of a target variable
Selected Answer: E
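One simple way to look for a change in the target distribution is to compare label proportions between a reference window and a current window. The sketch below uses total variation distance for that comparison; the function names and the label data are hypothetical, and this is only one of several possible measures.

```python
from collections import Counter


def label_distribution(labels):
    """Return the proportion of each label value."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(ref_labels, cur_labels):
    """Half the sum of absolute differences between label proportions."""
    ref = label_distribution(ref_labels)
    cur = label_distribution(cur_labels)
    all_labels = set(ref) | set(cur)
    return 0.5 * sum(abs(ref.get(l, 0.0) - cur.get(l, 0.0)) for l in all_labels)


# Hypothetical target values observed in two time windows.
reference = ["buy"] * 50 + ["no_buy"] * 50
current = ["buy"] * 20 + ["no_buy"] * 80

tvd = total_variation_distance(reference, current)
print(f"total variation distance: {tvd:.2f}")
```

A distance of 0 means the two label distributions match, and values closer to 1 indicate a larger shift in the target variable.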
Question #: 24
Topic #: 1
Which of the following deployment paradigms can centrally compute predictions for a single record with exceedingly fast results?
A. Streaming
B. Batch
C. Edge/on-device
D. None of these strategies will accomplish the task.
E. Real-time
Selected Answer: E
Question #: 25
Topic #: 1
A machine learning engineering team wants to build a continuous pipeline for data preparation of a machine learning application. The team would like the data to be fully processed and made ready for inference in a series of equal-sized batches.
Which of the following tools can be used to provide this type of continuous processing?
A. Spark UDFs
B. Structured Streaming
C. MLflow
D. Delta Lake
E. AutoML
Selected Answer: B
Question #: 15
Topic #: 1
A data scientist has computed updated feature values for all primary key values stored in the Feature Store table features. In addition, feature values for some new primary key values have also been computed. The updated feature values are stored in the DataFrame features_df. They want to replace all data in features with the newly computed data.
Which of the following code blocks can they use to perform this task using the Feature Store Client fs?
A.
B.
C.
D.
E.
Selected Answer: D
Question #: 11
Topic #: 1
A data scientist is utilizing MLflow to track their machine learning experiments. After completing a series of runs for the experiment with experiment ID exp_id, the data scientist wants to programmatically work with the experiment run data in a Spark DataFrame. They have an active MLflow Client client and an active Spark session spark.
Which of the following lines of code can be used to obtain run-level results for exp_id in a Spark DataFrame?
A. client.list_run_infos(exp_id)
B. spark.read.format("delta").load(exp_id)
C. There is no way to programmatically return run-level results from an MLflow Experiment.
D. mlflow.search_runs(exp_id)
E. spark.read.format("mlflow-experiment").load(exp_id)
Selected Answer: E
Question #: 14
Topic #: 1
Which of the following is a probable response to identifying drift in a machine learning application?
A. None of these responses
B. Retraining and deploying a model on more recent data
C. All of these responses
D. Rebuilding the machine learning application with a new label variable
E. Sunsetting the machine learning application
Selected Answer: B
Question #: 18
Topic #: 1
Which of the following machine learning model deployment paradigms is the most common for machine learning projects?
A. On-device
B. Streaming
C. Real-time
D. Batch
E. None of these deployments
Selected Answer: D
Question #: 20
Topic #: 1
A data scientist has developed a model model and computed the RMSE of the model on the test set. They have assigned this value to the variable rmse. They now want to manually store the RMSE value with the MLflow run.
They write the following incomplete code block:
Which of the following lines of code can be used to fill in the blank so the code block can successfully complete the task?
A. log_artifact
B. log_model
C. log_metric
D. log_param
E. There is no way to store values like this.
Selected Answer: C
Question #: 21
Topic #: 1
Which of the following MLflow operations can be used to automatically calculate and log a Shapley feature importance plot?
A. mlflow.shap.log_explanation
B. None of these operations can accomplish the task.
C. mlflow.shap
D. mlflow.log_figure
E. client.log_artifact
Selected Answer: A
Question #: 27
Topic #: 1
A data scientist has written a function to track the runs of their random forest model. The data scientist is changing the number of trees in the forest across each run.
Which of the following MLflow operations is designed to log single values like the number of trees in a random forest?
A. mlflow.log_artifact
B. mlflow.log_model
C. mlflow.log_metric
D. mlflow.log_param
E. There is no way to store values like this.
Selected Answer: D
Question #: 49
Topic #: 1
Which of the following is an advantage of using the python_function (pyfunc) model flavor over the built-in library-specific model flavors?
A. python_function provides no benefits over the built-in library-specific model flavors
B. python_function can be used to deploy models in a parallelizable fashion
C. python_function can be used to deploy models without worrying about which library was used to create the model
D. python_function can be used to store models in an MLmodel file
E. python_function can be used to deploy models without worrying about whether they are deployed in batch, streaming, or real-time environments
Selected Answer: C
Question #: 50
Topic #: 1
Which of the following lists all of the model stages that are available in the MLflow Model Registry?
A. Development, Staging, Production
B. None, Staging, Production
C. Staging, Production, Archived
D. None, Staging, Production, Archived
E. Development, Staging, Production, Archived
Selected Answer: D
Question #: 52
Topic #: 1
A machine learning engineer wants to log and deploy a model as an MLflow pyfunc model. They have custom preprocessing that needs to be completed on feature variables prior to fitting the model or computing predictions using that model. They decide to wrap this preprocessing in a custom model class ModelWithPreprocess, where the preprocessing is performed when calling fit and when calling predict. They then log the fitted model of the ModelWithPreprocess class as a pyfunc model.
Which of the following is a benefit of this approach when loading the logged pyfunc model for downstream deployment?
A. The pyfunc model can be used to deploy models in a parallelizable fashion
B. The same preprocessing logic will automatically be applied when calling fit
C. The same preprocessing logic will automatically be applied when calling predict
D. This approach has no impact when loading the logged pyfunc model for downstream deployment
E. There is no longer a need for pipeline-like machine learning objects
Selected Answer: C
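The pattern described in this question can be sketched without MLflow as a plain Python class. In an actual MLflow project the class would typically subclass mlflow.pyfunc.PythonModel and implement predict(self, context, model_input); that part is left out here so the example has no dependencies. The preprocessing step and the lookup-based model are hypothetical stand-ins.

```python
from collections import Counter, defaultdict


class ModelWithPreprocess:
    """Toy model that applies the same preprocessing in fit and predict."""

    def _preprocess(self, raw_values):
        # Hypothetical preprocessing: normalize whitespace and case.
        return [v.strip().lower() for v in raw_values]

    def fit(self, raw_values, labels):
        # Learn the most common label for each preprocessed value.
        votes = defaultdict(Counter)
        for value, label in zip(self._preprocess(raw_values), labels):
            votes[value][label] += 1
        self.lookup_ = {v: c.most_common(1)[0][0] for v, c in votes.items()}
        return self

    def predict(self, raw_values):
        # Callers pass raw input; preprocessing happens inside the model.
        return [self.lookup_.get(v) for v in self._preprocess(raw_values)]


model = ModelWithPreprocess().fit(
    ["Sunny ", "rainy", " SUNNY"], ["high", "low", "high"]
)
predictions = model.predict(["  sunny", "RAINY"])
print(predictions)
```

Because the preprocessing lives inside predict, downstream code that loads the model does not need to reimplement it before scoring.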
Question #: 55
Topic #: 1
Which of the following Databricks-managed MLflow capabilities is a centralized model store?
A. Models
B. Model Registry
C. Model Serving
D. Feature Store
E. Experiments
Selected Answer: B
Question #: 45
Topic #: 1
A machine learning engineer is using the following code block as part of a batch deployment pipeline:
Which of the following changes needs to be made so this code block will work when the inference table is a stream source?
A. Replace "inference" with the path to the location of the Delta table
B. Replace schema(schema) with option("maxFilesPerTrigger", 1)
C. Replace spark.read with spark.readStream
D. Replace format("delta") with format("stream")
E. Replace predict with a stream-friendly prediction function
Selected Answer: C
Question #: 22
Topic #: 1
A data scientist has developed a scikit-learn random forest model model, but they have not yet logged model with MLflow. They want to obtain the input schema and the output schema of the model so they can document what type of data is expected as input.
Which of the following MLflow operations can be used to perform this task?
A. mlflow.models.schema.infer_schema
B. mlflow.models.signature.infer_signature
C. mlflow.models.Model.get_input_schema
D. mlflow.models.Model.signature
E. There is no way to obtain the input schema and the output schema of an unlogged model.
Selected Answer: B
Question #: 19
Topic #: 1
A data scientist would like to enable MLflow Autologging for all machine learning libraries used in a notebook. They want to ensure that MLflow Autologging is used no matter what version of the Databricks Runtime for Machine Learning is used to run the notebook and no matter what workspace-wide configurations are selected in the Admin Console.
Which of the following lines of code can they use to accomplish this task?
A. mlflow.sklearn.autolog()
B. mlflow.spark.autolog()
C. spark.conf.set("autologging", True)
D. It is not possible to automatically log MLflow runs.
E. mlflow.autolog()
Selected Answer: E