Certified Data Analyst Associate Topic 1
Question #: 39
Topic #: 1
A data analyst has been asked to configure an alert for a query that returns the income in the accounts_receivable table for a date range. The date range is configurable using a Date query parameter.
The Alert does not work.
Which of the following describes why the Alert does not work?
A. Alerts don’t work with queries that access tables.
B. Queries that return results based on dates cannot be used with Alerts.
C. The wrong query parameter is being used. Alerts only work with Date and Time query parameters.
D. Queries that use query parameters cannot be used with Alerts.
E. The wrong query parameter is being used. Alerts only work with dropdown list query parameters, not dates.
Selected Answer: D
Question #: 41
Topic #: 1
A data team has been given a series of projects by a consultant that need to be implemented in the Databricks Lakehouse Platform.
Which of the following projects should be completed in Databricks SQL?
A. Testing the quality of data as it is imported from a source
B. Tracking usage of feature variables for machine learning projects
C. Combining two data sources into a single, comprehensive dataset
D. Segmenting customers into like groups using a clustering algorithm
E. Automating complex notebook-based workflows with multiple tasks
Selected Answer: C
Question #: 40
Topic #: 1
Which of the following statements about adding visual appeal to visualizations in the Visualization Editor is incorrect?
A. Visualization scale can be changed.
B. Data Labels can be formatted.
C. Colors can be changed.
D. Borders can be added.
E. Tooltips can be formatted.
Selected Answer: D
Question #: 45
Topic #: 1
A data analyst is working with gold-layer tables to complete an ad-hoc project. A stakeholder has provided the analyst with an additional dataset that can be used to augment the gold-layer tables already in use.
Which of the following terms is used to describe this data augmentation?
A. Data testing
B. Ad-hoc improvements
C. Last-mile dashboarding
D. Last-mile ETL
E. Data enhancement
Selected Answer: D
Question #: 44
Topic #: 1
In which of the following situations will the mean value and median value of a variable be meaningfully different?
A. When the variable contains no outliers
B. When the variable contains no missing values
C. When the variable is of the boolean type
D. When the variable is of the categorical type
E. When the variable contains a lot of extreme outliers
Selected Answer: E
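A quick numeric check of option E: a couple of extreme values pull the mean far away from the median, while the median barely moves. The numbers below are made up for illustration only.

```python
from statistics import mean, median

def center(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)

# Hypothetical order amounts; the second list adds two extreme outliers.
typical = [20, 22, 25, 27, 30]
with_outliers = typical + [900, 1200]

print(center(typical))        # mean and median are close together
print(center(with_outliers))  # the mean is dragged far above the median
```

The median only depends on the middle of the sorted values, which is why it is the more robust measure of center when outliers are present.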
Question #: 43
Topic #: 1
Which of the following statements describes descriptive statistics?
A. A branch of statistics that uses summary statistics to quantitatively describe and summarize data.
B. A branch of statistics that uses a variety of data analysis techniques to infer properties of an underlying distribution of probability.
C. A branch of statistics that uses quantitative variables that must take on a finite or countably infinite set of values.
D. A branch of statistics that uses summary statistics to categorically describe and summarize data.
E. A branch of statistics that uses quantitative variables that must take on an uncountable set of values.
Selected Answer: A
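As a small illustration of option A, the sketch below reduces a list of numbers to a handful of summary statistics (count, center, spread, quartiles). It uses only Python's statistics module and is not tied to any Databricks feature; the input values are arbitrary.

```python
from statistics import mean, median, quantiles, stdev

def describe(values):
    """Summarize a list of numbers with common descriptive statistics."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points (default "exclusive" method)
    return {
        "count": len(values),
        "mean": mean(values),
        "stdev": stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": max(values),
    }

summary = describe([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(summary)
```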
Question #: 25
Topic #: 1
A data analyst has been asked to use the table sales_table below to get the percentage rank of products within each region by sales:
The result of the query should look like this:
Which of the following queries will accomplish this task?
A.
B.
C.
D.
E.
Selected Answer: B
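The option queries are not reproduced here, so this does not show which one is correct. It only sketches what a window function such as percent_rank() OVER (PARTITION BY region ORDER BY sales) computes in Spark SQL: (rank - 1) / (rows in partition - 1), with tied values sharing a rank and a single-row partition yielding 0.0. The product column and all sample rows are hypothetical, and the sort direction is a parameter because the expected output is not available.

```python
from collections import defaultdict

def percent_rank(rows, partition_key, order_key, descending=False):
    """Mimic PERCENT_RANK() OVER (PARTITION BY partition_key ORDER BY order_key)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)

    result = []
    for members in groups.values():
        ordered = sorted(members, key=lambda r: r[order_key], reverse=descending)
        n = len(ordered)
        rank = 0
        prev = object()  # sentinel that never equals a real value
        for position, row in enumerate(ordered, start=1):
            if row[order_key] != prev:
                rank = position  # RANK() semantics: ties share a rank, then a gap
                prev = row[order_key]
            pr = (rank - 1) / (n - 1) if n > 1 else 0.0
            result.append({**row, "percent_rank": pr})
    return result

# Hypothetical sales_table rows.
rows = [
    {"region": "East", "product": "A", "sales": 100},
    {"region": "East", "product": "B", "sales": 300},
    {"region": "East", "product": "C", "sales": 200},
    {"region": "West", "product": "D", "sales": 50},
]

for r in percent_rank(rows, "region", "sales"):
    print(r["region"], r["product"], r["percent_rank"])
```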
Question #: 24
Topic #: 1
A business analyst has been asked to create a data entity/object called sales_by_employee. It should always stay up-to-date when new data are added to the sales table. The new entity should have the columns sales_person, which will be the name of the employee from the employees table, and sales, which will be all sales for that particular sales person. Both the sales table and the employees table have an employee_id column that is used to identify the sales person.
Which of the following code blocks will accomplish this task?
A.
B.
C.
D.
E.
Selected Answer: D
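The "always up-to-date" requirement points to a view: a view stores only its defining query, so every read re-runs it against the current sales data, whereas a table created with CREATE TABLE ... AS SELECT is a snapshot. In Databricks SQL this would be a CREATE VIEW statement. The sketch below uses Python's built-in sqlite3 module only so it can run locally; the name and amount columns and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, name TEXT);
    CREATE TABLE sales (employee_id INTEGER, amount REAL);
    INSERT INTO employees VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO sales VALUES (1, 100.0), (1, 50.0), (2, 75.0);

    -- The view keeps only the query; it is re-evaluated on every read.
    CREATE VIEW sales_by_employee AS
    SELECT e.name AS sales_person, SUM(s.amount) AS sales
    FROM sales AS s
    JOIN employees AS e ON s.employee_id = e.employee_id
    GROUP BY e.name;
""")

def snapshot():
    """Read the view into a {sales_person: sales} dict."""
    return dict(conn.execute("SELECT sales_person, sales FROM sales_by_employee"))

before = snapshot()
conn.execute("INSERT INTO sales VALUES (2, 25.0)")  # new data arrives
after = snapshot()  # the view reflects it without being recreated
print(before, after)
```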
Question #: 17
Topic #: 1
The stakeholders.customers table has 15 columns and 3,000 rows of data. The following command is run:
After running SELECT * FROM stakeholders.eur_customers, 15 rows are returned. After the command executes completely, the user logs out of Databricks.
After logging back in two days later, what is the status of the stakeholders.eur_customers view?
A. The view remains available and SELECT * FROM stakeholders.eur_customers will execute correctly.
B. The view has been dropped.
C. The view is not available in the metastore, but the underlying data can be accessed with SELECT * FROM delta.`stakeholders.eur_customers`.
D. The view remains available but attempting to SELECT from it results in an empty result set because data in views are automatically deleted after logging out.
E. The view has been converted into a table.
Selected Answer: B
Question #: 34
Topic #: 1
A data analyst creates a Databricks SQL query where the result set has the following schema: region STRING, number_of_customer INT
When the analyst clicks on the “Add visualization” button on the SQL Editor page, which of the following types of visualizations will be selected by default?
A. Violin Chart
B. Line Chart
C. Bar Chart
D. Histogram
E. There is no default. The user must choose a visualization type.
Selected Answer: D
Question #: 27
Topic #: 1
Consider the following two statements:
Statement 1:
Statement 2:
Which of the following describes how the result sets will differ for each statement when they are run in Databricks SQL?
A. The first statement will return all data from the customers table and matching data from the orders table. The second statement will return all data from the orders table and matching data from the customers table. Any missing data will be filled in with NULL.
B. When the first statement is run, only rows from the customers table that have at least one match with the orders table on customer_id will be returned. When the second statement is run, only those rows in the customers table that do not have at least one match with the orders table on customer_id will be returned.
C. There is no difference between the result sets for both statements.
D. Both statements will fail because Databricks SQL does not support those join types.
E. When the first statement is run, all rows from the customers table will be returned and only the customer_id from the orders table will be returned. When the second statement is run, only those rows in the customers table that do not have at least one match with the orders table on customer_id will be returned.
Selected Answer: B
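The two statements are not reproduced here, but the behavior described in option B is that of a semi join and an anti join, which Spark SQL (and therefore Databricks SQL) spells LEFT SEMI JOIN and LEFT ANTI JOIN. SQLite has no such keywords, so this runnable sketch uses the equivalent EXISTS and NOT EXISTS filters; the tables and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# Semi join: customers with at least one matching order. Each customer is
# returned once (even with several orders) and only customer columns appear.
semi = conn.execute("""
    SELECT c.customer_id FROM customers AS c
    WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.customer_id)
    ORDER BY c.customer_id
""").fetchall()

# Anti join: customers with no matching order at all.
anti = conn.execute("""
    SELECT c.customer_id FROM customers AS c
    WHERE NOT EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.customer_id)
    ORDER BY c.customer_id
""").fetchall()

print(semi, anti)
```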
Question #: 1
Topic #: 1
Which of the following layers of the medallion architecture is most commonly used by data analysts?
A. None of these layers are used by data analysts
B. Gold
C. All of these layers are used equally by data analysts
D. Silver
E. Bronze
Selected Answer: B
Question #: 9
Topic #: 1
A data analyst wants to create a dashboard with three main sections: Development, Testing, and Production. They want all three sections on the same dashboard, but they want to clearly designate the sections using text on the dashboard.
Which of the following tools can the data analyst use to designate the Development, Testing, and Production sections using text?
A. Separate endpoints for each section
B. Separate queries for each section
C. Markdown-based text boxes
D. Direct text written into the dashboard in editing mode
E. Separate color palettes for each section
Selected Answer: C
Question #: 26
Topic #: 1
In which of the following situations should a data analyst use higher-order functions?
A. When custom logic needs to be applied to simple, unnested data
B. When custom logic needs to be converted to Python-native code
C. When custom logic needs to be applied at scale to array data objects
D. When built-in functions are taking too long to perform tasks
E. When built-in functions need to run through the Catalyst Optimizer
Selected Answer: C
Question #: 8
Topic #: 1
Which of the following approaches can be used to ingest data directly from cloud-based object storage?
A. Create an external table while specifying the DBFS storage path to FROM
B. Create an external table while specifying the DBFS storage path to PATH
C. It is not possible to directly ingest data from cloud-based object storage
D. Create an external table while specifying the object storage path to FROM
E. Create an external table while specifying the object storage path to LOCATION
Selected Answer: E
Question #: 7
Topic #: 1
A data engineering team has created a Structured Streaming pipeline that processes data in micro-batches and populates gold-level tables. The micro-batches are triggered every minute.
A data analyst has created a dashboard based on this gold-level data. The project stakeholders want to see the results in the dashboard updated within one minute or less of new data becoming available within the gold-level tables.
Which of the following cautions should the data analyst share prior to setting up the dashboard to complete this task?
A. The required compute resources could be costly
B. The gold-level tables are not appropriately clean for business reporting
C. The streaming data is not an appropriate data source for a dashboard
D. The streaming cluster is not fault tolerant
E. The dashboard cannot be refreshed that quickly
Selected Answer: A
Question #: 12
Topic #: 1
After running DESCRIBE EXTENDED accounts.customers;, the following was returned:
Now, a data analyst runs the following command:
DROP TABLE accounts.customers;
Which of the following describes the result of running this command?
A. Running SELECT * FROM delta.`dbfs:/stakeholders/customers` results in an error.
B. Running SELECT * FROM accounts.customers will return all rows in the table.
C. All files with the .customers extension are deleted.
D. The accounts.customers table is removed from the metastore, and the underlying data files are deleted.
E. The accounts.customers table is removed from the metastore, but the underlying data files are untouched.
Selected Answer: E
Question #: 21
Topic #: 1
A data analyst runs the following command:
INSERT INTO stakeholders.suppliers TABLE stakeholders.new_suppliers;
What is the result of running this command?
A. The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, and any duplicate data is deleted.
B. The command fails because it is written incorrectly.
C. The suppliers table now contains both the data it had before the command was run and the data from the new_suppliers table, including any duplicate data.
D. The suppliers table now contains the data from the new_suppliers table, and the new_suppliers table now contains the data from the suppliers table.
E. The suppliers table now contains only the data from the new_suppliers table.
Selected Answer: C
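INSERT INTO appends rows and does not deduplicate them; only INSERT OVERWRITE replaces existing data. Databricks SQL accepts the TABLE shorthand used in the question, and the sketch below uses the equivalent INSERT INTO ... SELECT * FROM form with Python's sqlite3 module so it runs locally. The columns and rows are hypothetical; the supplier with id 2 exists in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suppliers (supplier_id INTEGER, name TEXT);
    CREATE TABLE new_suppliers (supplier_id INTEGER, name TEXT);
    INSERT INTO suppliers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO new_suppliers VALUES (2, 'Globex'), (3, 'Initech');

    -- Append every row of new_suppliers; duplicates are kept.
    INSERT INTO suppliers SELECT * FROM new_suppliers;
""")

rows = conn.execute(
    "SELECT supplier_id, name FROM suppliers ORDER BY supplier_id"
).fetchall()
print(rows)
```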
Question #: 20
Topic #: 1
A data analyst runs the following command:
SELECT age, country
FROM my_table
WHERE age >= 75 AND country = 'canada';
Which of the following tables represents the output of the above command?
A.
B.
C.
D.
E.
Selected Answer: E
Question #: 6
Topic #: 1
A data analyst has set up a SQL query to run every four hours on a SQL endpoint, but the SQL endpoint is taking too long to start up with each run.
Which of the following changes can the data analyst make to reduce the start-up time for the endpoint while managing costs?
A. Reduce the SQL endpoint cluster size
B. Increase the SQL endpoint cluster size
C. Turn off the Auto stop feature
D. Increase the minimum scaling value
E. Use a Serverless SQL endpoint
Selected Answer: E
Question #: 4
Topic #: 1
Which of the following approaches can be used to connect Databricks to Fivetran for data ingestion?
A. Use Workflows to establish a SQL warehouse (formerly known as a SQL endpoint) for Fivetran to interact with
B. Use Delta Live Tables to establish a cluster for Fivetran to interact with
C. Use Partner Connect’s automated workflow to establish a cluster for Fivetran to interact with
D. Use Partner Connect’s automated workflow to establish a SQL warehouse (formerly known as a SQL endpoint) for Fivetran to interact with
E. Use Workflows to establish a cluster for Fivetran to interact with
Selected Answer: D
Question #: 3
Topic #: 1
Which of the following describes how Databricks SQL should be used in relation to other business intelligence (BI) tools like Tableau, Power BI, and Looker?
A. As an exact substitute with the same level of functionality
B. As a substitute with less functionality
C. As a complete replacement with additional functionality
D. As a complementary tool for professional-grade presentations
E. As a complementary tool for quick in-platform BI work
Selected Answer: E
Question #: 2
Topic #: 1
A data analyst has recently joined a new team that uses Databricks SQL, but the analyst has never used Databricks before. The analyst wants to know where in Databricks SQL they can write and execute SQL queries.
On which of the following pages can the analyst write and execute SQL queries?
A. Data page
B. Dashboards page
C. Queries page
D. Alerts page
E. SQL Editor page
Selected Answer: E
Question #: 11
Topic #: 1
A data analyst is attempting to drop a table my_table. The analyst wants to delete all table metadata and data.
They run the following command:
DROP TABLE IF EXISTS my_table;
While the object no longer appears when they run SHOW TABLES, the data files still exist.
Which of the following describes why the data files still exist and the metadata files were deleted?
A. The table’s data was larger than 10 GB
B. The table did not have a location
C. The table was external
D. The table’s data was smaller than 10 GB
E. The table was managed
Selected Answer: C
Question #: 10
Topic #: 1
A data analyst needs to use the Databricks Lakehouse Platform to quickly create SQL queries and data visualizations. It is a requirement that the compute resources in the platform can be made serverless, and it is expected that data visualizations can be placed within a dashboard.
Which of the following Databricks Lakehouse Platform services/capabilities meets all of these requirements?
A. Delta Lake
B. Databricks Notebooks
C. Tableau
D. Databricks Machine Learning
E. Databricks SQL
Selected Answer: E
Question #: 16
Topic #: 1
Which of the following benefits of using Databricks SQL is provided by Data Explorer?
A. It can be used to run UPDATE queries to update any tables in a database.
B. It can be used to view metadata and data, as well as view/change permissions.
C. It can be used to produce dashboards that allow data exploration.
D. It can be used to make visualizations that can be shared with stakeholders.
E. It can be used to connect to third-party BI tools.
Selected Answer: B
Question #: 15
Topic #: 1
Which of the following is an advantage of using a Delta Lake-based data lakehouse over common data lake solutions?
A. ACID transactions
B. Flexible schemas
C. Data deletion
D. Scalable storage
E. Open-source formats
Selected Answer: A
Question #: 14
Topic #: 1
Delta Lake stores table data as a series of data files, but it also stores a lot of other information.
Which of the following is stored alongside data files when using Delta Lake?
A. None of these
B. Table metadata, data summary visualizations, and owner account information
C. Table metadata
D. Data summary visualizations
E. Owner account information
Selected Answer: C
Question #: 13
Topic #: 1
Which of the following should data analysts consider when working with personally identifiable information (PII) data?
A. Organization-specific best practices for PII data
B. Legal requirements for the area in which the data was collected
C. None of these considerations
D. Legal requirements for the area in which the analysis is being performed
E. All of these considerations
Selected Answer: E
Question #: 19
Topic #: 1
A data analyst has a managed table table_name in database database_name. They would now like to remove the table from the database and all of the data files associated with the table. The rest of the tables in the database must continue to exist.
Which of the following commands can the analyst use to complete the task without producing an error?
A. DROP DATABASE database_name;
B. DROP TABLE database_name.table_name;
C. DELETE TABLE database_name.table_name;
D. DELETE TABLE table_name FROM database_name;
E. DROP TABLE table_name FROM database_name;
Selected Answer: B
Question #: 5
Topic #: 1
Data professionals with varying titles use the Databricks SQL service as the primary touchpoint with the Databricks Lakehouse Platform. However, some users will use other services like Databricks Machine Learning or Databricks Data Science and Engineering.
Which of the following roles uses Databricks SQL as a secondary service while primarily using one of the other services?
A. Business analyst
B. SQL analyst
C. Data engineer
D. Business intelligence analyst
E. Data analyst
Selected Answer: C
Question #: 37
Topic #: 1
A data analyst has been asked to produce a visualization that shows the flow of users through a website.
Which of the following is used for visualizing this type of flow?
A. Heatmap
B. Choropleth
C. Word Cloud
D. Pivot Table
E. Sankey
Selected Answer: E
Question #: 31
Topic #: 1
Which of the following is a benefit of Databricks SQL using ANSI SQL as its standard SQL dialect?
A. It has increased customization capabilities
B. It is easy to migrate existing SQL queries to Databricks SQL
C. It allows for the use of Photon’s computation optimizations
D. It is more performant than other SQL dialects
E. It is more compatible with Spark’s interpreters
Selected Answer: B
Question #: 30
Topic #: 1
A data analyst is processing a complex aggregation on a table with zero null values and their query returns the following result:
Which of the following queries did the analyst run to obtain the above result?
A.
B.
C.
D.
E.
Selected Answer: E
Question #: 29
Topic #: 1
A data analyst has been asked to count the number of customers in each region and has written the following query:
If there is a mistake in the query, which of the following describes the mistake?
A. The query is using count(*), which will count all the customers in the customers table, no matter the region.
B. The query is missing a GROUP BY region clause.
C. The query is using ORDER BY, which is not allowed in an aggregation.
D. There are no mistakes in the query.
E. The query is selecting region, but region should only occur in the ORDER BY clause.
Selected Answer: B
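The original query is not reproduced here, so this only shows the shape of a correct per-region count: every non-aggregated column in the SELECT list (here region) must appear in GROUP BY. Databricks SQL rejects the query without it, whereas SQLite, used below only so the example runs locally, is more lenient and would silently return a single row. The table contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, region TEXT);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'EMEA'), (3, 'APAC'), (4, 'AMER');
""")

counts = conn.execute("""
    SELECT region, count(*) AS customer_count
    FROM customers
    GROUP BY region   -- one output row per region
    ORDER BY region
""").fetchall()
print(counts)
```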
Question #: 28
Topic #: 1
A data analyst has created a user-defined function using the following line of code:
CREATE FUNCTION price(spend DOUBLE, units DOUBLE)
RETURNS DOUBLE
RETURN spend / units;
Which of the following code blocks can be used to apply this function to the customer_spend and customer_units columns of the table customer_summary to create column customer_price?
A. SELECT PRICE customer_spend, customer_units AS customer_price
FROM customer_summary
B. SELECT price
FROM customer_summary
C. SELECT function(price(customer_spend, customer_units)) AS customer_price
FROM customer_summary
D. SELECT double(price(customer_spend, customer_units)) AS customer_price
FROM customer_summary
E. SELECT price(customer_spend, customer_units) AS customer_price
FROM customer_summary
Selected Answer: E
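Once created, a SQL UDF is called like a built-in function, with its arguments in parentheses and an alias for the output column, as in option E. SQLite has no CREATE FUNCTION statement, so this local sketch registers a Python function under the same name with sqlite3's Connection.create_function and then uses the same call shape. The sample rows are hypothetical.

```python
import sqlite3

def price(spend, units):
    # Same logic as the SQL UDF body: RETURN spend / units
    return spend / units

conn = sqlite3.connect(":memory:")
conn.create_function("price", 2, price)  # name, number of arguments, callable
conn.executescript("""
    CREATE TABLE customer_summary (customer_spend REAL, customer_units REAL);
    INSERT INTO customer_summary VALUES (100.0, 4.0), (30.0, 3.0);
""")

prices = conn.execute(
    "SELECT price(customer_spend, customer_units) AS customer_price "
    "FROM customer_summary ORDER BY customer_price"
).fetchall()
print(prices)
```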
Question #: 18
Topic #: 1
A data analyst created and is the owner of the managed table my_table. They now want to change ownership of the table to a single other user using Data Explorer.
Which of the following approaches can the analyst use to complete the task?
A. Edit the Owner field in the table page by removing their own account
B. Edit the Owner field in the table page by selecting All Users
C. Edit the Owner field in the table page by selecting the new owner’s account
D. Edit the Owner field in the table page by selecting the Admins group
E. Edit the Owner field in the table page by removing all access
Selected Answer: C