Data Science Quiz 500 Questions with Answers

Data Science Quiz Questions with Answers: This quiz covers multiple-choice questions (MCQs) from data science courses, organized into the sub-topics listed below.

Data Science MCQs

1. Python For Data Science

2. Python Programming for Data Analysis

3. Python For Data Science and Machine Learning Bootcamp

4. Dataquest Python

5. Applied Data Science with Python Specialization

6. Machine Learning

7. Tableau Training

8. Udacity Programming for Data Science with Python


1. Python for Data Science


1. What is the primary purpose of NumPy in Python for Data Science?

a) Data visualization

b) Machine learning

c) Numerical computing with arrays

d) Web development

Answer: c) Numerical computing with arrays

2. In pandas, what does the term “DataFrame” refer to?

a) A machine learning model

b) A two-dimensional, labeled data structure

c) A Python library for web development

d) A statistical visualization technique

Answer: b) A two-dimensional, labeled data structure

3. Which library is commonly used for data visualization in Python?

a) TensorFlow

b) Matplotlib

c) Scikit-learn

d) PyTorch

Answer: b) Matplotlib

4. What does the term “CSV” stand for in the context of data handling in Python?

a) Comma-Separated Values

b) Centralized System for Values

c) Complex Structured Variables

d) Computerized Storage of Variables

Answer: a) Comma-Separated Values

5. Which of the following is a supervised learning algorithm in Scikit-learn?

a) K-Means

b) Decision Trees

c) Principal Component Analysis (PCA)

d) DBSCAN

Answer: b) Decision Trees

6. What is the purpose of the “iloc” indexer in pandas?

a) Indexing by labels

b) Indexing by integers

c) Calculating summary statistics

d) Plotting data

Answer: b) Indexing by integers

7. Which of the following statements is true about Python’s “lambda” functions?

a) They can have multiple expressions.

b) They are used for large-scale data processing.

c) They are defined using the “def” keyword.

d) They are anonymous functions.

Answer: d) They are anonymous functions.
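
As a brief illustration of an anonymous function, the sketch below uses a lambda as a sort key. The `records` list and its values are invented for this example.

```python
# Hypothetical sample data: (name, score) pairs invented for illustration.
records = [("ana", 82), ("ben", 95), ("cy", 77)]

# A lambda is a single-expression function with no name of its own.
by_score = sorted(records, key=lambda pair: pair[1], reverse=True)

# The equivalent named function needs a def statement.
def second_item(pair):
    return pair[1]

same_result = by_score == sorted(records, key=second_item, reverse=True)
print(by_score, same_result)
```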

8. What does the term “tf-idf” stand for in natural language processing (NLP)?

a) Term Frequency-Inverse Document Frequency

b) Text File-Incremental Data Format

c) TensorFlow Integrated Deep Features

d) Token Frequency-Importance of Document Features

Answer: a) Term Frequency-Inverse Document Frequency
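
To make the two parts of the name concrete, here is a minimal sketch of one textbook variant of tf-idf written with the standard library only. The toy corpus and the helper name `tf_idf` are assumptions for illustration; libraries often use smoothed formulas, so their values can differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus, already split into tokens.
docs = [
    ["data", "science", "python"],
    ["python", "pandas", "python"],
    ["data", "visualization"],
]

def tf_idf(term, doc, corpus):
    """Textbook variant: (term count / doc length) * log(N / docs containing term)."""
    tf = Counter(doc)[term] / len(doc)
    containing = sum(1 for d in corpus if term in d)
    if containing == 0:
        return 0.0  # term never appears, so it carries no weight
    idf = math.log(len(corpus) / containing)
    return tf * idf

score_python = tf_idf("python", docs[1], docs)  # frequent here, but also in another doc
score_pandas = tf_idf("pandas", docs[1], docs)  # rarer across the corpus
print(score_python, score_pandas)
```

In this toy corpus the rarer word "pandas" ends up with a higher weight than "python", even though "python" occurs more often in the document.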

9. Which library is commonly used for machine learning tasks in Python?

a) Pandas

b) Seaborn

c) Scikit-learn

d) NumPy

Answer: c) Scikit-learn

10. What does the acronym “API” stand for in the context of web data retrieval?

a) Automated Python Integration

b) Application Programming Interface

c) Advanced Programming Instruction

d) Algorithmic Programming Interface

Answer: b) Application Programming Interface

11. What is the purpose of the “Seaborn” library in Python?

a) Web development

b) Data manipulation

c) Data visualization

d) Machine learning

Answer: c) Data visualization

12. In the context of machine learning, what is the role of the “train-test split” method?

a) Splitting the dataset into training and testing sets

b) Training the model on the entire dataset

c) Testing the model on a separate dataset

d) Randomly shuffling the data

Answer: a) Splitting the dataset into training and testing sets
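
The idea behind a train-test split can be sketched in plain Python. The helper `simple_train_test_split` below is a hypothetical name and is not scikit-learn's implementation; it only shows shuffling followed by partitioning.

```python
import random

def simple_train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then hold out a fraction for testing."""
    rng = random.Random(seed)      # fixed seed so the split is repeatable
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(20))             # stand-in for 20 samples
train_rows, test_rows = simple_train_test_split(data)
print(len(train_rows), len(test_rows))
```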

13. Which of the following statements is true about the “Pandas” library in Python?

a) It is primarily used for machine learning.

b) It is not suitable for handling large datasets.

c) It provides data structures for efficient data manipulation.

d) It is used exclusively for web development.

Answer: c) It provides data structures for efficient data manipulation.

14. What is the purpose of the “Counter” class in Python’s “collections” module?

a) Sorting elements in a list

b) Counting occurrences of elements in an iterable

c) Creating a dictionary from two lists

d) Performing matrix multiplication

Answer: b) Counting occurrences of elements in an iterable
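
A short sketch of `Counter` in use, with a made-up list of labels:

```python
from collections import Counter

# Hypothetical sample: labels invented for illustration.
labels = ["cat", "dog", "cat", "bird", "cat", "dog"]

counts = Counter(labels)          # maps each distinct element to its count
top_two = counts.most_common(2)   # the two most frequent elements, highest first

print(counts["cat"], top_two)
```

Looking up an element that never occurred returns 0 rather than raising a `KeyError`.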

15. Which of the following is a dimensionality reduction technique used in machine learning?

a) K-Nearest Neighbors (KNN)

b) Principal Component Analysis (PCA)

c) Random Forest

d) Gradient Boosting

Answer: b) Principal Component Analysis (PCA)

16. What does the acronym “SQL” stand for in the context of databases?

a) Structured Query Language

b) Simple Query Language

c) System Query Language

d) Sequential Query Language

Answer: a) Structured Query Language

17. In Python, what does the “json” module provide?

a) Tools for web scraping

b) Tools for handling JSON data

c) Mathematical functions

d) Image processing capabilities

Answer: b) Tools for handling JSON data

18. What does the “K-Means” algorithm aim to achieve in machine learning?

a) Classification

b) Regression

c) Clustering

d) Dimensionality reduction

Answer: c) Clustering

19. What is the purpose of the “cross_val_score” function in Scikit-learn?

a) Calculating cross-validated performance metrics

b) Cross-validating hyperparameters

c) Cross-referencing dataset columns

d) Converting data to a pandas DataFrame

Answer: a) Calculating cross-validated performance metrics

20. Which library is commonly used for deep learning tasks in Python?

a) Keras

b) Statsmodels

c) Beautiful Soup

d) NetworkX

Answer: a) Keras

21. What is the purpose of the “SciPy” library in Python?

a) Data visualization

b) Scientific computing

c) Machine learning

d) Web development

Answer: b) Scientific computing

22. What does the term “Regular Expression” (regex) refer to in Python?

a) A module for creating regular polygons

b) A method for defining complex mathematical expressions

c) A sequence of characters defining a search pattern

d) A type of recursive function

Answer: c) A sequence of characters defining a search pattern
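
As an illustration of a search pattern, the sketch below finds order codes in a sentence using the standard `re` module. The text and the code format are invented for this example.

```python
import re

# Hypothetical input text; the "letter-dash-digits" code format is an assumption.
text = "Orders: A-102, B-7, and C-3055 shipped."

# One uppercase letter, a dash, then one or more digits.
pattern = re.compile(r"[A-Z]-\d+")
codes = pattern.findall(text)

print(codes)
```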

23. In machine learning, what does the term “overfitting” mean?

a) The model is too simple and cannot capture the underlying patterns.

b) The model performs poorly on both the training data and new data.

c) The model is too complex and memorizes the training data, so it performs well on training data but poorly on new data.

d) The model is biased towards certain features.

Answer: c) The model is too complex and memorizes the training data, so it performs well on training data but poorly on new data.

24. What is the purpose of the “pickle” module in Python?

a) Sorting data structures

b) Serializing and deserializing Python objects

c) Generating random numbers

d) Extracting information from XML files

Answer: b) Serializing and deserializing Python objects
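
A minimal round-trip sketch with `pickle`; the `model_state` dictionary is a made-up stand-in for any Python object. Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.

```python
import pickle

# Hypothetical object to persist.
model_state = {"weights": [0.1, 0.2], "labels": ("a", "b")}

blob = pickle.dumps(model_state)   # serialize to bytes
restored = pickle.loads(blob)      # deserialize into a new, equal object

print(type(blob).__name__, restored == model_state)
```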

25. Which of the following is a supervised learning algorithm used for classification?

a) K-Means

b) Random Forest

c) Hierarchical Clustering

d) DBSCAN

Answer: b) Random Forest

26. What is the purpose of the “Scrapy” library in Python?

a) Scientific computing

b) Web scraping

c) Machine learning

d) Image processing

Answer: b) Web scraping

27. Which of the following is a method for handling missing data in pandas?

a) dropna()

b) fillna()

c) isnull()

d) All of the above

Answer: d) All of the above

28. What does the term “bagging” refer to in machine learning?

a) Boosting multiple weak models

b) Training multiple models independently and combining their predictions

c) Feature engineering technique

d) Dimensionality reduction

Answer: b) Training multiple models independently and combining their predictions

29. What is the purpose of the “requests” library in Python?

a) Handling dates and times

b) Making HTTP requests

c) Statistical analysis

d) Natural language processing

Answer: b) Making HTTP requests

30. What does the term “One-Hot Encoding” refer to in the context of machine learning?

a) Encoding numerical values as binary digits

b) Encoding categorical variables as binary vectors

c) Encoding text data into numerical vectors

d) Encoding time series data

Answer: b) Encoding categorical variables as binary vectors
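
One-hot encoding can be sketched without any library. The `colors` data and the helper `one_hot` are assumptions for illustration; in practice, pandas or scikit-learn utilities are typically used.

```python
# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]

# Fix a category order so each category owns one vector position.
categories = sorted(set(colors))   # ['blue', 'green', 'red']

def one_hot(value, categories):
    """Return a binary vector with a single 1 at the value's category position."""
    return [1 if value == c else 0 for c in categories]

encoded = [one_hot(c, categories) for c in colors]
print(encoded)
```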

31. Which library provides tools for working with SQL databases in Python?

a) SQLite

b) SQLAlchemy

c) Pandas

d) Scikit-learn

Answer: b) SQLAlchemy

32. What does the term “ensemble learning” mean in the context of machine learning?

a) Training a single model on multiple datasets

b) Combining predictions from multiple models to improve performance

c) Splitting the dataset into training and testing sets

d) Regularizing a model to prevent overfitting

Answer: b) Combining predictions from multiple models to improve performance

33. In Pandas, what does the “groupby” function do?

a) Sorts the data in ascending order

b) Aggregates data based on specified criteria

c) Combines two dataframes into one

d) Computes the correlation matrix

Answer: b) Aggregates data based on specified criteria

34. Which of the following is a dimensionality reduction technique that preserves variance?

a) Principal Component Analysis (PCA)

b) Linear Regression

c) Decision Trees

d) K-Means Clustering

Answer: a) Principal Component Analysis (PCA)

35. What is the purpose of the “matplotlib” library in Python?

a) Scientific computing

b) Web development

c) Data visualization

d) Machine learning

Answer: c) Data visualization


2. Python Programming For Data Analysis


1. What library is commonly used for data manipulation and analysis in Python?

a) Matplotlib

b) Pandas

c) NumPy

d) Seaborn

Answer: b) Pandas

2. In Pandas, what is the primary data structure used to store one-dimensional labeled data?

a) DataFrame

b) Series

c) Array

d) List

Answer: b) Series

3. What function in Pandas is used to read a CSV file into a DataFrame?

a) load_csv()

b) read_csv()

c) import_csv()

d) open_csv()

Answer: b) read_csv()

4. Which of the following is a correct way to select a column named ‘age’ from a Pandas DataFrame called ‘df’?

a) df.select_column('age')

b) df['age']

c) df.select('age')

d) df.column('age')

Answer: b) df['age']

5. What is the purpose of the NumPy function np.mean()?

a) Calculate the median

b) Calculate the mean

c) Calculate the standard deviation

d) Perform matrix multiplication

Answer: b) Calculate the mean

6. In Python, what does the term “list comprehension” refer to?

a) A concise way to create lists using a single line of code

b) A type of exception handling in Python

c) A method for defining complex mathematical expressions

d) A way to define functions with multiple return statements

Answer: a) A concise way to create lists using a single line of code
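
For instance, a list comprehension can filter and transform in one expression; the `values` list is invented for this sketch.

```python
values = [1, 2, 3, 4, 5, 6]

# Keep the even numbers and square them, all in one expression.
even_squares = [v * v for v in values if v % 2 == 0]

# The same result written as an explicit loop.
loop_result = []
for v in values:
    if v % 2 == 0:
        loop_result.append(v * v)

print(even_squares)
```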

7. Which function is used to plot a histogram in Matplotlib?

a) plot_hist()

b) hist_plot()

c) plt.histogram()

d) plt.hist()

Answer: d) plt.hist()

8. What is the purpose of the iloc indexer in Pandas?

a) Indexing by labels

b) Indexing by integers

c) Calculating summary statistics

d) Plotting data

Answer: b) Indexing by integers

9. What is the primary purpose of the apply() function in Pandas?

a) Apply a mathematical operation to a column

b) Apply a function along the axis of a DataFrame

c) Apply a filter to a DataFrame

d) Apply a sorting algorithm to a DataFrame

Answer: b) Apply a function along the axis of a DataFrame

10. What is the purpose of the merge() function in Pandas?

a) Splitting a DataFrame into multiple DataFrames

b) Combining two DataFrames based on a common key

c) Stacking two DataFrames vertically without matching keys

d) Reshaping a DataFrame

Answer: b) Combining two DataFrames based on a common key
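
A small sketch of `merge()` joining two DataFrames on a shared key column. The table contents and the column name `customer_id` are invented for illustration, and pandas is assumed to be installed.

```python
import pandas as pd

# Hypothetical tables that share a "customer_id" key column.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3], "amount": [20, 35, 50]})

# An inner join keeps only keys present in both tables.
merged = pd.merge(customers, orders, on="customer_id", how="inner")
print(merged)
```

Customer 2 has no orders, so that row does not appear in the inner join; passing `how="left"` would keep it with a missing amount.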

11. What does the groupby() function in Pandas allow you to do?

a) Sort the data in ascending order

b) Group data based on specified criteria

c) Apply a function element-wise

d) Reshape a DataFrame

Answer: b) Group data based on specified criteria
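
A short sketch of grouping followed by aggregation; the `city` and `sales` columns are made up, and pandas is assumed to be installed.

```python
import pandas as pd

# Hypothetical sales records.
df = pd.DataFrame({
    "city": ["A", "B", "A", "B"],
    "sales": [10, 20, 30, 40],
})

# Group rows by city, then sum the sales within each group.
totals = df.groupby("city")["sales"].sum()
print(totals)
```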

12. Which library is commonly used for statistical data visualization in Python?

a) NumPy

b) Seaborn

c) Pandas

d) Matplotlib

Answer: b) Seaborn

13. What is the purpose of the value_counts() function in Pandas?

a) Compute summary statistics

b) Count the occurrences of unique values in a column

c) Create a new DataFrame

d) Sort the data in descending order

Answer: b) Count the occurrences of unique values in a column

14. How do you check for missing values in a Pandas DataFrame?

a) df.check_missing()

b) df.isnull()

c) df.missing_values()

d) df.check_na()

Answer: b) df.isnull()

15. In Matplotlib, what does the scatter() function do?

a) Plot a line chart

b) Plot a scatter plot

c) Plot a bar chart

d) Plot a pie chart

Answer: b) Plot a scatter plot

16. Which of the following is a correct way to create a NumPy array with values ranging from 0 to 9?

a) np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

b) np.arange(0, 10)

c) np.linspace(0, 9, 10)

d) All of the above

Answer: d) All of the above
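
The three constructors can be compared directly, assuming NumPy is installed. They produce the same values, but note that `np.linspace` returns floating-point numbers while the other two produce integers here.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.arange(0, 10)          # stop value 10 is excluded
c = np.linspace(0, 9, 10)     # 10 evenly spaced points, endpoint 9 included

same_values = np.array_equal(a, b) and np.array_equal(b, c)
print(same_values, b.dtype, c.dtype)
```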

17. What is the purpose of the pivot_table() function in Pandas?

a) Reshape a DataFrame by aggregating data

b) Pivot columns into rows

c) Apply a function element-wise

d) Group data based on specified criteria

Answer: a) Reshape a DataFrame by aggregating data

18. Which library provides tools for statistical modeling in Python?

a) Statsmodels

b) Scikit-learn

c) TensorFlow

d) Keras

Answer: a) Statsmodels

19. What does the fillna() function in Pandas do?

a) Fill missing values with a specified constant

b) Fill missing values with a different value per column (using a dictionary)

c) Fill missing values with the mean of the column

d) All of the above

Answer: d) All of the above
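
A brief sketch of two common `fillna()` uses, assuming pandas and NumPy are installed; the column names and values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 7.0]})

filled_constant = df.fillna(0)       # every NaN becomes 0
filled_mean = df.fillna(df.mean())   # each NaN becomes its own column's mean

print(filled_mean)
```

`fillna()` returns a new DataFrame by default, so `df` itself still contains the missing values. Removing rows with missing values is the job of `dropna()` instead.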

20. In Python, what is the purpose of the lambda function?

a) Define a function with a single expression

b) Create anonymous classes

c) Format strings

d) Generate random numbers

Answer: a) Define a function with a single expression

21. Which function is used to calculate the correlation matrix in Pandas?

a) correlation_matrix()

b) calculate_correlation()

c) corr()

d) correlate()

Answer: c) corr()

22. What does the term “resampling” refer to in the context of time series data?

a) Splitting the data into training and testing sets

b) Transforming data into a different representation

c) Changing the frequency of data points

d) Imputing missing values

Answer: c) Changing the frequency of data points

23. What is the purpose of the np.unique() function in NumPy?

a) Sort an array in ascending order

b) Find the unique elements in an array

c) Calculate the mean of an array

d) Perform element-wise multiplication

Answer: b) Find the unique elements in an array

24. Which of the following statements is true about the iterrows() function in Pandas?

a) It is used for sorting a DataFrame.

b) It iterates over rows in a DataFrame.

c) It computes summary statistics.

d) It is used for merging DataFrames.

Answer: b) It iterates over rows in a DataFrame.

25. What is the primary purpose of the cut() function in Pandas?

a) Cut the DataFrame into smaller pieces

b) Apply a function to each element of the DataFrame

c) Bin values into discrete intervals

d) Concatenate two DataFrames

Answer: c) Bin values into discrete intervals

26. What does the term “feature scaling” refer to in the context of machine learning?

a) Encoding categorical variables

b) Standardizing numerical features

c) Reshaping the input data

d) Imputing missing values

Answer: b) Standardizing numerical features

27. Which of the following is a correct way to install a Python library using pip in the command line?

a) install pandas

b) pip install pandas

c) python install pandas

d) pandas install

Answer: b) pip install pandas

28. What does the np.random.seed() function do in NumPy?

a) Generate random numbers

b) Set the seed for the random number generator

c) Shuffle an array randomly

d) Create a random permutation of an array

Answer: b) Set the seed for the random number generator

29. What is the purpose of the loc indexer in Pandas?

a) Indexing by integers

b) Indexing by labels

c) Sorting data

d) Plotting data

Answer: b) Indexing by labels
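
The difference between label-based and position-based indexing can be seen side by side. This sketch assumes pandas is installed; the index labels and ages are invented.

```python
import pandas as pd

# Hypothetical frame with string labels as the index.
df = pd.DataFrame({"age": [31, 45, 27]}, index=["ana", "ben", "cy"])

by_label = df.loc["ben", "age"]    # row label "ben", column label "age"
by_position = df.iloc[1, 0]        # second row, first column

label_slice = df.loc["ana":"ben"]  # label slices include the end label
position_slice = df.iloc[0:1]      # position slices exclude the end position

print(by_label, by_position, len(label_slice), len(position_slice))
```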

30. Which of the following statements is true about the cross_val_score() function in Scikit-learn?

a) It calculates the cross-validated performance metrics for a machine learning model.

b) It computes the mean squared error of a model.

c) It performs feature scaling on the input data.

d) It is used for hyperparameter tuning.

Answer: a) It calculates the cross-validated performance metrics for a machine learning model.

31. What is the purpose of the resample() function in Pandas for time series data?

a) Reorganize data into a DataFrame

b) Rename columns based on specified criteria

c) Change the frequency of time series data

d) Reshape a DataFrame by aggregating data

Answer: c) Change the frequency of time series data

32. In Python, what does the term “broadcasting” refer to in the context of NumPy?

a) A method for transmitting data over a network

b) A technique for manipulating strings

c) The automatic extension of arrays to perform element-wise operations

d) A mechanism for encrypting data

Answer: c) The automatic extension of arrays to perform element-wise operations
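
A minimal sketch of broadcasting, assuming NumPy is installed. The smaller array is treated as if it were repeated to match the larger one's shape, without the values actually being copied by the user.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
column = np.array([[100], [200]])     # shape (2, 1)

shifted_rows = matrix + row           # row is applied to each of the 2 rows
shifted_cols = matrix + column        # column is applied to each of the 3 columns

print(shifted_rows)
print(shifted_cols)
```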

33. What is the purpose of the min() and max() functions in Pandas?

a) Compute the minimum and maximum values of a DataFrame

b) Filter rows based on specified criteria

c) Reshape a DataFrame

d) Group data based on specified criteria

Answer: a) Compute the minimum and maximum values of a DataFrame

34. What does the term “tf-idf” stand for in natural language processing (NLP)?

a) Text Frequency-Inverse Document Frequency

b) Term Frequency-Inverse Document Frequency

c) Token Frequency-Document Importance Factor

d) Text Feature-Inverse Document Frequency

Answer: b) Term Frequency-Inverse Document Frequency

35. Which library provides tools for creating interactive visualizations in Python?

a) Plotly

b) Matplotlib

c) Seaborn

d) Statsmodels

Answer: a) Plotly


3. Python for Data Science and Machine Learning Bootcamp


1. What is the purpose of the NumPy library in the context of data science and machine learning?

a) Web development

b) Data visualization

c) Data manipulation and numerical operations

d) Natural language processing

Answer: c) Data manipulation and numerical operations

2. Which of the following libraries is commonly used for creating machine learning models in Python?

a) Matplotlib

b) TensorFlow

c) Seaborn

d) Flask

Answer: b) TensorFlow

3. What is the primary data structure used in the pandas library for handling tabular data?

a) Series

b) List

c) DataFrame

d) Array

Answer: c) DataFrame

4. In machine learning, what does the term “training set” refer to?

a) The dataset used for making predictions

b) The dataset used for evaluating model performance

c) The dataset used for training the model

d) The validation dataset

Answer: c) The dataset used for training the model

5. What does the term “feature engineering” mean in the context of machine learning?

a) Creating new machine learning algorithms

b) Optimizing model hyperparameters

c) Building complex machine learning models

d) Extracting relevant information from raw data

Answer: d) Extracting relevant information from raw data

6. Which Python library is commonly used for data visualization in a machine learning bootcamp?

a) TensorFlow

b) Matplotlib

c) Scikit-learn

d) NumPy

Answer: b) Matplotlib

7. What is the purpose of the scikit-learn library in a machine learning bootcamp?

a) Data manipulation

b) Neural network implementation

c) Machine learning algorithms and tools

d) Natural language processing

Answer: c) Machine learning algorithms and tools

8. What does the term “supervised learning” mean in machine learning?

a) Training a model without labels

b) Learning from labeled data to make predictions on new, unseen data

c) Training a model with only one feature

d) Unsupervised learning with multiple models

Answer: b) Learning from labeled data to make predictions on new, unseen data

9. Which method is used to evaluate the performance of a machine learning model on a classification problem?

a) Mean Squared Error (MSE)

b) R-squared

c) Confusion Matrix

d) Mean Absolute Error (MAE)

Answer: c) Confusion Matrix

10. What is the purpose of the train_test_split function in scikit-learn?

a) Splitting data into training and validation sets

b) Training a model on the entire dataset

c) Testing the model on a separate dataset

d) Splitting data into training and testing sets

Answer: d) Splitting data into training and testing sets

11. What does the term “hyperparameter” refer to in the context of machine learning?

a) Parameters learned by the model during training

b) Model weights

c) External configuration settings for the model

d) Features in the dataset

Answer: c) External configuration settings for the model

12. Which of the following is a commonly used algorithm for linear regression in scikit-learn?

a) K-Means

b) Decision Trees

c) LinearSVC

d) LinearRegression

Answer: d) LinearRegression

13. In a Jupyter Notebook, how do you comment out a single line of code?

a) // Comment here

b) # Comment here

c) /* Comment here */

d) <!-- Comment here -->

Answer: b) # Comment here

14. What does the term “bagging” stand for in ensemble learning?

a) Boosting and Aggregation

b) Bootstrap Aggregating

c) Binary Aggregation

d) Bagged Algorithm Generation

Answer: b) Bootstrap Aggregating

15. What is the primary purpose of the Random Forest algorithm?

a) Regression

b) Clustering

c) Classification and Regression

d) Dimensionality Reduction

Answer: c) Classification and Regression

16. Which of the following metrics is commonly used for evaluating classification models?

a) Mean Squared Error (MSE)

b) R-squared

c) Area Under the Receiver Operating Characteristic curve (AUC-ROC)

d) Mean Absolute Error (MAE)

Answer: c) Area Under the Receiver Operating Characteristic curve (AUC-ROC)

17. What is the purpose of the GridSearchCV class in scikit-learn?

a) Feature selection

b) Hyperparameter tuning using cross-validation

c) Dimensionality reduction

d) Data preprocessing

Answer: b) Hyperparameter tuning using cross-validation

18. What is the significance of the term “recall” in the context of classification metrics?

a) It measures the ability of a model to identify true positive cases

b) It measures the overall correctness of a model

c) It is another term for precision

d) It is used for clustering evaluation

Answer: a) It measures the ability of a model to identify true positive cases
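
Recall can be computed by hand from true and predicted labels. The label lists below are invented for this sketch, which uses plain Python rather than a metrics library.

```python
# Hypothetical binary labels: 1 is the positive class.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives found
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives missed
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms

recall = tp / (tp + fn)        # share of actual positives that were found
precision = tp / (tp + fp)     # share of predicted positives that were right

print(recall, precision)
```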

19. What is the purpose of the tf.keras module in TensorFlow?

a) Feature extraction

b) Image processing

c) Neural network construction using the Keras API

d) Text mining

Answer: c) Neural network construction using the Keras API

20. What does the term “cross-validation” help address in machine learning?

a) Overfitting

b) Underfitting

c) Data leakage

d) Bias-variance tradeoff

Answer: a) Overfitting

21. In machine learning, what is the purpose of the term “dropout” in neural networks?

a) Eliminating data points with missing values

b) Randomly excluding units during training to prevent overfitting

c) Reducing the learning rate during training

d) Adjusting the weights of the neural network

Answer: b) Randomly excluding units during training to prevent overfitting

22. Which activation function is commonly used in the output layer for binary classification problems?

a) Sigmoid

b) ReLU

c) Tanh

d) Softmax

Answer: a) Sigmoid

23. What is the primary objective of using regularization techniques in machine learning models?

a) Increase model complexity

b) Reduce model complexity to prevent overfitting

c) Achieve higher training accuracy

d) Improve model interpretability

Answer: b) Reduce model complexity to prevent overfitting

24. What is the purpose of the StandardScaler in scikit-learn?

a) Normalize data by scaling it to have zero mean and unit variance

b) Perform dimensionality reduction

c) Impute missing values in the dataset

d) Apply feature engineering techniques

Answer: a) Normalize data by scaling it to have zero mean and unit variance

25. Which of the following techniques is commonly used for handling imbalanced datasets in classification problems?

a) Feature scaling

b) Data augmentation

c) SMOTE (Synthetic Minority Over-sampling Technique)

d) Cross-validation

Answer: c) SMOTE (Synthetic Minority Over-sampling Technique)

26. What is the primary purpose of the MinMaxScaler in scikit-learn?

a) Standardize data by removing the mean and scaling to unit variance

b) Normalize data to a specific range, usually [0, 1]

c) Impute missing values in the dataset

d) Transform categorical features into numerical representations

Answer: b) Normalize data to a specific range, usually [0, 1]
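
The arithmetic behind min-max scaling and standardization can be sketched with the standard library. This is not scikit-learn's code, only the underlying formulas applied to a made-up list of values.

```python
import statistics

values = [2.0, 4.0, 6.0, 8.0]   # hypothetical feature column

# Min-max scaling: map the smallest value to 0 and the largest to 1.
lo, hi = min(values), max(values)
min_max = [(v - lo) / (hi - lo) for v in values]

# Standardization: subtract the mean, divide by the population standard deviation.
mean = statistics.fmean(values)
std = statistics.pstdev(values)
standardized = [(v - mean) / std for v in values]

print(min_max)
print(standardized)
```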

27. In the context of neural networks, what does the term “epoch” refer to?

a) A measure of model complexity

b) A single pass through the entire training dataset

c) The learning rate of the model

d) The number of layers in the neural network

Answer: b) A single pass through the entire training dataset

28. What is the primary purpose of the One-Hot Encoding technique in data preprocessing?

a) Standardizing numerical features

b) Encoding categorical variables as binary vectors

c) Normalizing data

d) Imputing missing values

Answer: b) Encoding categorical variables as binary vectors

29. Which of the following statements is true about the Logistic Regression algorithm in machine learning?

a) It is only used for regression problems.

b) It is used for classification problems, despite its name.

c) It is a non-linear algorithm.

d) It requires the data to be linearly separable.

Answer: b) It is used for classification problems, despite its name.

30. What is the primary purpose of the R-squared metric in regression problems?

a) Measure the proportion of explained variance in the target variable

b) Measure the accuracy of a classification model

c) Evaluate the balance between precision and recall

d) Assess the quality of the model’s predictions

Answer: a) Measure the proportion of explained variance in the target variable

31. In machine learning, what is the objective of the K-Means clustering algorithm?

a) Predicting labels for new data points

b) Classifying data into multiple categories

c) Dividing data into groups based on similarity

d) Reducing the dimensionality of the data

Answer: c) Dividing data into groups based on similarity

32. What is the purpose of the tf-idf vectorization technique in natural language processing (NLP)?

a) Tokenizing words in a document

b) Calculating word frequencies

c) Encoding categorical variables

d) Assigning weights to words based on their importance in a document

Answer: d) Assigning weights to words based on their importance in a document

33. What is the primary role of the Random Search technique in hyperparameter tuning?

a) Exhaustively search through all possible hyperparameter combinations

b) Randomly sample hyperparameter combinations from a predefined search space

c) Optimize the learning rate of a model

d) Perform feature engineering on the dataset

Answer: b) Randomly sample hyperparameter combinations from a predefined search space

34. In the context of neural networks, what does the term “batch size” refer to?

a) The number of layers in the network

b) The number of epochs in training

c) The number of samples processed in each iteration

d) The learning rate during training

Answer: c) The number of samples processed in each iteration

35. What is the purpose of the pickle module in Python’s standard library?

a) Handling data manipulation in Pandas

b) Serializing and deserializing Python objects

c) Creating visualizations in Matplotlib

d) Extracting information from XML files

Answer: b) Serializing and deserializing Python objects


4. Dataquest Python


1. What does the acronym “API” stand for in the context of web development and data science?

a) Advanced Python Integration

b) Application Programming Interface

c) Automated Program Invocation

d) Adaptive Programming Interface

Answer: b) Application Programming Interface

2. In Python, what does the zip() function do?

a) Compresses files in a directory

b) Combines two or more iterables element-wise

c) Creates a new list by filtering elements from an existing list

d) Unzips compressed files

Answer: b) Combines two or more iterables element-wise
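
A short sketch of `zip()` pairing up two lists; the names and ages are invented.

```python
names = ["ana", "ben", "cy"]
ages = [31, 45]                  # deliberately one element shorter

# zip() pairs elements by position and stops at the shorter input.
pairs = list(zip(names, ages))

# A common use: build a dictionary from parallel lists.
lookup = dict(zip(names, ages))

print(pairs, lookup)
```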

3. What is the purpose of the iloc attribute in Pandas?

a) Accessing elements by label

b) Accessing elements by integer index

c) Computing summary statistics

d) Filtering rows based on conditions

Answer: b) Accessing elements by integer index

4. In Python, what is the primary use of the __init__ method in a class?

a) Initializing class variables

b) Defining class methods

c) Initializing instance variables

d) Creating an instance of the class

Answer: c) Initializing instance variables

5. What does the term “regular expression” (regex) refer to in Python?

a) A sequence of characters defining a search pattern

b) An expression used exclusively in regular programming

c) A method for writing regular code

d) A data type for representing expressions

Answer: a) A sequence of characters defining a search pattern

6. In Pandas, what does the groupby() function do?

a) Sorting the data in ascending order

b) Aggregating data based on specified criteria

c) Filtering rows based on conditions

d) Creating a new DataFrame

Answer: b) Aggregating data based on specified criteria

7. What is the purpose of the requests library in Python?

a) Handling dates and times

b) Making HTTP requests

c) Manipulating strings

d) Creating visualizations

Answer: b) Making HTTP requests

8. Which of the following is the recommended way to open a file for reading in Python so that it is closed automatically?

a) open_file('example.txt', 'r')

b) read('example.txt')

c) with open('example.txt', 'r') as file:

d) file = open('example.txt')

Answer: c) with open('example.txt', 'r') as file:

9. What does the term “Lambda function” refer to in Python?

a) A function that performs complex mathematical operations

b) A function defined using the def keyword

c) An anonymous function defined using the lambda keyword

d) A function that requires user input

Answer: c) An anonymous function defined using the lambda keyword

10. In the context of machine learning, what is the purpose of the train_test_split function from scikit-learn?

a) Splitting the dataset into training and testing sets

b) Training the model on the entire dataset

c) Testing the model on a separate dataset

d) Shuffling the data randomly

Answer: a) Splitting the dataset into training and testing sets

11. What is the purpose of the isin() method in Pandas?

a) Sorting data

b) Checking for missing values

c) Filtering data based on a list of values

d) Grouping data by specified criteria

Answer: c) Filtering data based on a list of values

12. In Python, what does the term “list comprehension” refer to?

a) A concise way to create lists using a single line of code

b) A method for defining complex mathematical expressions

c) A way to manipulate strings

d) A type of exception handling

Answer: a) A concise way to create lists using a single line of code

13. What is the purpose of the enumerate() function in Python?

a) Counting the number of elements in a list

b) Creating a dictionary

c) Iterating over elements of a sequence and keeping track of the index

d) Finding the maximum value in a list

Answer: c) Iterating over elements of a sequence and keeping track of the index
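
For example, `enumerate()` yields index and element together; the `letters` list is invented, and `start=1` is an optional argument used here to count from one.

```python
letters = ["a", "b", "c"]

# Each item is an (index, element) tuple; start=1 shifts the first index.
indexed = list(enumerate(letters, start=1))

for position, letter in enumerate(letters):
    print(position, letter)      # default indexing starts at 0
```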

14. What does the term “feature scaling” refer to in the context of machine learning?

a) Encoding categorical variables

b) Standardizing numerical features

c) Reshaping the input data

d) Imputing missing values

Answer: b) Standardizing numerical features

15. In Python, what is the purpose of the map() function?

a) Creating visualizations

b) Applying a function to each element of an iterable

c) Sorting a list

d) Concatenating strings

Answer: b) Applying a function to each element of an iterable

16. What is the primary use of the Counter class in Python’s collections module?

a) Counting the occurrences of each element in a list

b) Creating a counter for loop iterations

c) Summing up numerical values in a list

d) Calculating the mean of a list

Answer: a) Counting the occurrences of each element in a list
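
Example: a brief sketch of Counter tallying how often each value appears. The word list is illustrative.

```python
from collections import Counter

words = ["red", "blue", "red", "green", "blue", "red"]
counts = Counter(words)            # maps each distinct word to its frequency

print(counts["red"])               # 3
print(counts.most_common(1))       # [('red', 3)]
```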

17. Which method is used to find the index of the maximum value in a list using Python?

a) list.index(max(list))

b) max(list).index()

c) list.find(max(list))

d) list.max_index()

Answer: a) list.index(max(list))
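
Example: a short sketch of finding the position of the largest value. When the maximum occurs more than once, index() returns the first position. The single-pass alternative is shown only for comparison.

```python
scores = [72, 95, 88, 95]

best_index = scores.index(max(scores))   # two passes: find max, then locate it
print(best_index)  # 1

# Single pass over the indices, comparing by the value at each index.
alt_index = max(range(len(scores)), key=scores.__getitem__)
print(alt_index)   # 1
```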

18. What does the continue statement do in a Python loop?

a) Terminates the loop immediately

b) Skips the rest of the code in the loop and moves to the next iteration

c) Restarts the loop from the beginning

d) Breaks out of the loop

Answer: b) Skips the rest of the code in the loop and moves to the next iteration
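
Example: a minimal sketch of continue skipping even numbers while the loop keeps running.

```python
odds = []
for n in range(1, 8):
    if n % 2 == 0:
        continue          # skip the append below and go to the next n
    odds.append(n)

print(odds)  # [1, 3, 5, 7]
```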

19. Which of the following is a correct way to import the numpy library with the alias np?

a) import numpy as np

b) from numpy import np

c) import np as numpy

d) from np import numpy

Answer: a) import numpy as np

20. What is the purpose of the isin() method in Pandas?

a) Sorting data

b) Checking for missing values

c) Filtering data based on a list of values

d) Grouping data by specified criteria

Answer: c) Filtering data based on a list of values

21. In Python, what does the term “Pickle” refer to?

a) A type of cucumber

b) A module for data serialization

c) A sorting algorithm

d) A method for creating pickled vegetables

Answer: b) A module for data serialization
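
Example: a small sketch of a pickle round trip in memory. The record dictionary is made up for illustration. Only unpickle data from sources you trust, because loading a pickle can execute arbitrary code.

```python
import pickle

record = {"name": "model_v1", "weights": [0.1, 0.2], "trained": True}

blob = pickle.dumps(record)      # serialize the object to bytes
restored = pickle.loads(blob)    # rebuild an equal object from those bytes

print(restored == record)        # True
```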

22. What is the purpose of the mode() function in Pandas?

a) Computing the mode of a numerical variable

b) Filtering rows based on a condition

c) Sorting data

d) Creating visualizations

Answer: a) Computing the mode of a numerical variable

23. In Python, what is the purpose of the with statement?

a) Defining a function

b) Creating a class

c) Opening and managing resources like files

d) Executing a block of code repeatedly

Answer: c) Opening and managing resources like files
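
Example: a minimal sketch showing that a file opened in a with block is closed automatically when the block ends. The file name notes.txt is illustrative, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")

    with open(path, "w", encoding="utf-8") as handle:
        handle.write("first line\n")
    closed_after_block = handle.closed     # the file was closed on exit

    with open(path, encoding="utf-8") as handle:
        content = handle.read()

print(closed_after_block)  # True
print(repr(content))       # 'first line\n'
```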

24. What does the term “scikit-learn” refer to in Python?

a) A module for scientific computing

b) A library for data visualization

c) A machine learning library

d) A web development framework

Answer: c) A machine learning library

25. In Python, what is the purpose of the *args and **kwargs syntax in function definitions?

a) Defining default values for function parameters

b) Accepting a variable number of positional and keyword arguments in a function

c) Indicating optional parameters

d) Creating lambda functions

Answer: b) Accepting a variable number of positional and keyword arguments in a function
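
Example: a short sketch of a function that accepts any number of positional and keyword arguments. The function name describe is hypothetical.

```python
def describe(*args, **kwargs):
    """Return how many positional arguments arrived and the sorted keyword names."""
    # args is a tuple of positional values; kwargs is a dict of named values.
    return len(args), sorted(kwargs)

result = describe(1, 2, 3, unit="cm", precision=2)
print(result)  # (3, ['precision', 'unit'])
```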

26. What is the primary use of the crosstab() function in Pandas?

a) Cross-validation of machine learning models

b) Creating contingency tables

c) Cross-plotting variables in a DataFrame

d) Cross-checking data for consistency

Answer: b) Creating contingency tables

27. In Python, what does the term “Docstring” refer to?

a) A documentation string used to describe a function, module, or class

b) A data type for representing documents

c) A string used to create documents

d) A module for handling strings

Answer: a) A documentation string used to describe a function, module, or class
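
Example: a minimal sketch of a docstring and how to read it back at runtime. The function area is made up for illustration.

```python
def area(width, height):
    """Return the area of a rectangle with the given width and height."""
    return width * height

# The docstring is stored on the function object and is what help() displays.
print(area.__doc__)
```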

28. What does the term “stacking” refer to in the context of machine learning ensemble methods?

a) Combining predictions from multiple models to make a final prediction

b) Increasing the number of layers in a neural network

c) Creating a stack of data frames

d) Sorting the data in ascending order

Answer: a) Combining predictions from multiple models to make a final prediction

29. In Python, what does the sorted() function do?

a) Returns a new list sorted in ascending order by default

b) Randomly shuffles a list

c) Applies a sorting algorithm to a DataFrame

d) Sorts a list in descending order

Answer: a) Returns a new list sorted in ascending order by default
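
Example: a brief sketch of sorted(). It accepts any iterable, returns a new list, and leaves the original unchanged. The reverse and key arguments change the ordering.

```python
values = [3, 1, 2]

ascending = sorted(values)                      # default order
descending = sorted(values, reverse=True)       # largest first
by_length = sorted(["pear", "fig", "banana"], key=len)

print(ascending)   # [1, 2, 3]
print(descending)  # [3, 2, 1]
print(by_length)   # ['fig', 'pear', 'banana']
print(values)      # [3, 1, 2]
```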

30. What is the purpose of the json.loads() function in Python?

a) Loads a JavaScript file into Python

b) Loads a JSON-formatted string into a Python object

c) Loads a list of strings into a JSON file

d) Loads a JSON-formatted file into a Python object

Answer: b) Loads a JSON-formatted string into a Python object
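
Example: a small sketch of json.loads() turning a JSON string into Python objects. The JSON text is illustrative. To read from an open file object, json.load() (without the "s") is used instead.

```python
import json

text = '{"name": "Ada", "scores": [90, 85], "active": true}'
obj = json.loads(text)      # JSON object -> dict, array -> list, true -> True

print(obj["name"])          # Ada
print(obj["scores"][0])     # 90
print(obj["active"])        # True
```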

31. In Pandas, what does the nunique() function do?

a) Counts the number of occurrences of each unique value in a column

b) Returns the number of unique values in a column

c) Computes the mean of a numerical column

d) Normalizes a column by scaling it to have zero mean and unit variance

Answer: b) Returns the number of unique values in a column

32. What is the purpose of the apply() function in Pandas?

a) Applies a function along the rows or columns of a DataFrame

b) Filters rows based on a condition

c) Computes summary statistics of a DataFrame

d) Reshapes a DataFrame by aggregating data

Answer: a) Applies a function along the rows or columns of a DataFrame

33. What does the term “Matplotlib” refer to in Python?

a) A machine learning library

b) A data manipulation library

c) A data visualization library

d) A web development framework

Answer: c) A data visualization library

34. In Python, what is the purpose of the os.path.join() function?

a) Combining multiple strings into a single path

b) Extracting the file extension from a path

c) Removing a file or directory

d) Creating a new file in a specified directory

Answer: a) Combining multiple strings into a single path
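
Example: a minimal sketch of os.path.join() building a path from parts. The folder and file names are made up. The separator depends on the operating system, so no fixed output is shown.

```python
import os

# Joins the parts with the separator used by the current operating system.
path = os.path.join("data", "raw", "sales.csv")
print(path)
```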

35. What is the primary use of the loc attribute in Pandas?

a) Indexing by integers

b) Indexing by labels

c) Sorting data

d) Plotting data

Answer: b) Indexing by labels


5. Applied Data Science with Python Specialization


1. What is the purpose of the pandas library in the context of data science?

a) Web development

b) Statistical analysis and data manipulation

c) Neural network implementation

d) Image processing

Answer: b) Statistical analysis and data manipulation

2. Which of the following libraries is commonly used for machine learning in Python?

a) Matplotlib

b) TensorFlow

c) Requests

d) Beautiful Soup

Answer: b) TensorFlow

3. In the context of machine learning, what does the term “feature engineering” mean?

a) Constructing new features from existing ones to improve model performance

b) Optimizing hyperparameters

c) Selecting the most important features for a model

d) Building complex machine learning models

Answer: a) Constructing new features from existing ones to improve model performance

4. What is the primary data structure used in the pandas library for handling tabular data?

a) Series

b) List

c) DataFrame

d) Array

Answer: c) DataFrame

5. What does the term “regularization” refer to in the context of machine learning models?

a) The process of dividing data into training and testing sets

b) A technique to prevent overfitting by penalizing large coefficients

c) The process of normalizing numerical features

d) The method used to handle missing values in a dataset

Answer: b) A technique to prevent overfitting by penalizing large coefficients

6. Which Python library is commonly used for creating interactive visualizations in a data science specialization?

a) Plotly

b) NumPy

c) Seaborn

d) Scikit-learn

Answer: a) Plotly

7. In the context of natural language processing (NLP), what does the term “TF-IDF” stand for?

a) Text Feature-Inverse Document Frequency

b) Token Frequency-Document Importance Factor

c) Text Frequency-Inverse Document Frequency

d) Term Frequency-Inverse Document Frequency

Answer: d) Term Frequency-Inverse Document Frequency
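
Example: a rough sketch of the basic TF-IDF calculation in plain Python, assuming the textbook definition (term frequency times the log of inverse document frequency). Libraries often use smoothed or normalized variants, so their numbers can differ. The documents and function names are illustrative.

```python
import math

docs = [
    ["data", "science", "data"],
    ["science", "quiz"],
    ["python", "quiz"],
]

def tf(term, doc):
    """Share of the document's tokens that are this term."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """Log of (number of documents / documents containing the term).

    Assumes the term appears in at least one document.
    """
    doc_freq = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / doc_freq)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# "data" is frequent in the first document and rare elsewhere, so it scores higher
# than "science", which also appears in another document.
print(tf_idf("data", docs[0], docs) > tf_idf("science", docs[0], docs))  # True
```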

8. What is the purpose of the scikit-learn library in a data science specialization?

a) Data manipulation

b) Neural network implementation

c) Machine learning algorithms and tools

d) Natural language processing

Answer: c) Machine learning algorithms and tools

9. In machine learning, what does the term “cross-validation” help address?

a) Feature selection

b) Overfitting

c) Underfitting

d) Bias-variance tradeoff

Answer: b) Overfitting

10. What is the purpose of the matplotlib library in a data science specialization?

a) Statistical analysis

b) Data manipulation

c) Data visualization

d) Machine learning

Answer: c) Data visualization

11. What is the primary purpose of the SciPy library in the context of data science?

a) Machine learning

b) Statistical analysis and scientific computing

c) Web development

d) Natural language processing

Answer: b) Statistical analysis and scientific computing

12. In a machine learning pipeline, what is the purpose of the train_test_split function from scikit-learn?

a) Training the model on the entire dataset

b) Splitting the dataset into training and testing sets

c) Testing the model on a separate dataset

d) Shuffling the data randomly

Answer: b) Splitting the dataset into training and testing sets

13. What does the term “confusion matrix” represent in the context of classification models?

a) A matrix representing the confusion between training and testing datasets

b) A matrix showing the confusion between different features

c) A matrix summarizing the performance of a classification model

d) A matrix indicating the confusion between different classes in a regression model

Answer: c) A matrix summarizing the performance of a classification model
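
Example: a small sketch that counts the four cells of a binary confusion matrix by hand. The labels are made-up data, with 1 as the positive class.

```python
from collections import Counter

actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]

# Count each (actual, predicted) combination.
pairs = Counter(zip(actual, predicted))
tp = pairs[(1, 1)]   # true positives
fn = pairs[(1, 0)]   # false negatives
fp = pairs[(0, 1)]   # false positives
tn = pairs[(0, 0)]   # true negatives

print(tp, fn, fp, tn)      # 3 1 1 3
print(tp / (tp + fp))      # precision: 0.75
print(tp / (tp + fn))      # recall: 0.75
```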

14. In Python, what does the term “API” stand for in the context of web development and data science?

a) Advanced Python Integration

b) Application Programming Interface

c) Automated Program Invocation

d) Adaptive Programming Interface

Answer: b) Application Programming Interface

15. What is the purpose of the train_test_split function in scikit-learn?

a) Splitting data into training and validation sets

b) Training a model on the entire dataset

c) Testing the model on a separate dataset

d) Splitting data into training and testing sets

Answer: d) Splitting data into training and testing sets

16. In the context of feature scaling, what does the StandardScaler do in scikit-learn?

a) Scales features to have a specific range, usually [0, 1]

b) Standardizes data by removing the mean and scaling to unit variance

c) Transforms categorical features into numerical representations

d) Imputes missing values in the dataset

Answer: b) Standardizes data by removing the mean and scaling to unit variance
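
Example: a minimal sketch of the same calculation in plain Python, subtracting the mean and dividing by the standard deviation. It uses the population standard deviation and illustrates the formula only; it is not scikit-learn's code.

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(values)       # 5.0
std = statistics.pstdev(values)      # population standard deviation: 2.0

# z = (x - mean) / std gives a column with zero mean and unit variance.
scaled = [(v - mean) / std for v in values]
print(scaled)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```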

17. What does the term “bagging” stand for in ensemble learning?

a) Boosting and Aggregation

b) Bootstrap Aggregating

c) Binary Aggregation

d) Bagged Algorithm Generation

Answer: b) Bootstrap Aggregating

18. In a Jupyter Notebook, how do you comment out multiple lines of code?

a) // Comment here

b) /* Comment here */

c) # Comment here

d) <!-- Comment here -->

Answer: c) # Comment here

19. What is the primary purpose of the GridSearchCV class in scikit-learn?

a) Feature selection

b) Hyperparameter tuning using cross-validation

c) Dimensionality reduction

d) Data preprocessing

Answer: b) Hyperparameter tuning using cross-validation

20. What does the term “one-hot encoding” refer to in the context of data preprocessing?

a) Encoding categorical variables as binary vectors

b) Normalizing numerical features

c) Imputing missing values

d) Scaling features to have zero mean and unit variance

Answer: a) Encoding categorical variables as binary vectors
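
Example: a short sketch of one-hot encoding done by hand. The color values are illustrative, and the column order here simply follows the sorted category names.

```python
colors = ["red", "green", "blue", "green"]

categories = sorted(set(colors))     # ['blue', 'green', 'red']

# One row per value, with a single 1 in the column of its category.
encoded = [[1 if value == cat else 0 for cat in categories] for value in colors]

print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```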


6. Machine Learning


1. What is the primary goal of supervised learning in machine learning?

a) To classify data into predefined categories

b) To find hidden patterns in data

c) To make predictions based on labeled data

d) To cluster similar data points

Answer: c) To make predictions based on labeled data

2. Which of the following is an example of a classification problem in machine learning?

a) Predicting stock prices

b) Recognizing handwritten digits

c) Estimating house prices

d) Forecasting temperature trends

Answer: b) Recognizing handwritten digits

3. What is the purpose of the “training” phase in a machine learning model?

a) To make predictions on new data

b) To evaluate the model’s performance

c) To learn patterns from labeled data

d) To test the model on the entire dataset

Answer: c) To learn patterns from labeled data

4. In machine learning, what does the term “feature” refer to?

a) The label or output variable

b) The input variable or predictor

c) The prediction made by the model

d) The error in the model’s predictions

Answer: b) The input variable or predictor

5. What is the purpose of the “test set” in machine learning?

a) To train the model

b) To evaluate the model’s performance on new data

c) To fine-tune hyperparameters

d) To visualize data distribution

Answer: b) To evaluate the model’s performance on new data

6. Which algorithm is commonly used for regression tasks in machine learning?

a) K-Means

b) Decision Trees

c) K-Nearest Neighbors (KNN)

d) Linear Regression

Answer: d) Linear Regression

7. What does the term “overfitting” mean in the context of machine learning models?

a) The model performs well on the training data but poorly on new data

b) The model is too simple and cannot capture the underlying patterns

c) The model is well-generalized to new data

d) The model is undertrained and lacks complexity

Answer: a) The model performs well on the training data but poorly on new data

8. In a confusion matrix, what does the term “false positive” represent?

a) The number of correctly predicted positive instances

b) The number of incorrectly predicted positive instances

c) The number of correctly predicted negative instances

d) The number of incorrectly predicted negative instances

Answer: b) The number of incorrectly predicted positive instances

9. Which evaluation metric is commonly used for binary classification problems when the classes are imbalanced?

a) Precision

b) Recall

c) F1-score

d) Accuracy

Answer: c) F1-score

10. What is the purpose of the “bias-variance tradeoff” in machine learning?

a) To increase the model’s complexity

b) To balance the tradeoff between underfitting and overfitting

c) To minimize the bias of the model

d) To improve the interpretability of the model

Answer: b) To balance the tradeoff between underfitting and overfitting

11. Which algorithm is suitable for both classification and regression tasks in machine learning?

a) Decision Trees

b) Support Vector Machines (SVM)

c) K-Nearest Neighbors (KNN)

d) Random Forest

Answer: a) Decision Trees

12. What is the primary purpose of the “k” parameter in K-Nearest Neighbors (KNN) algorithm?

a) The number of features in the dataset

b) The number of clusters

c) The number of neighbors to consider for prediction

d) The learning rate of the model

Answer: c) The number of neighbors to consider for prediction
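
Example: a compact sketch of K-Nearest Neighbors classification, assuming Euclidean distance and a simple majority vote. The points, labels, and the function name knn_predict are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the most common label among the k training points nearest to query."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (2, 2), k=3))  # a
print(knn_predict(points, labels, (6, 5), k=3))  # b
```

A small k follows local detail closely, while a larger k smooths the decision over more neighbors.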

13. In machine learning, what is the role of the activation function in a neural network?

a) To control the learning rate

b) To minimize the loss function

c) To introduce non-linearity to the model

d) To preprocess input data

Answer: c) To introduce non-linearity to the model

14. What is the primary objective of ensemble methods in machine learning?

a) To reduce model complexity

b) To combine the predictions of multiple models to improve performance

c) To decrease training time

d) To increase the interpretability of models

Answer: b) To combine the predictions of multiple models to improve performance

15. Which algorithm is commonly used for clustering in machine learning?

a) Linear Regression

b) K-Means

c) Decision Trees

d) Support Vector Machines (SVM)

Answer: b) K-Means

16. What is the purpose of regularization techniques in machine learning models?

a) To increase model complexity

b) To reduce model complexity and prevent overfitting

c) To improve model interpretability

d) To speed up training time

Answer: b) To reduce model complexity and prevent overfitting

17. In the context of natural language processing (NLP), what does the term “TF-IDF” stand for?

a) Term Frequency-Inverse Document Frequency

b) Token Frequency-Document Importance Factor

c) Text Frequency-Inverse Document Frequency

d) Text Feature-Inverse Document Frequency

Answer: a) Term Frequency-Inverse Document Frequency

18. What is the primary purpose of the “learning rate” in gradient descent optimization algorithms?

a) To control the size of each gradient update

b) To specify the number of iterations

c) To initialize the weights of the model

d) To determine the number of neighbors in KNN

Answer: a) To control the size of each gradient update
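
Example: a minimal sketch of gradient descent on f(x) = (x - 3)^2, showing how the learning rate scales each update. The function and the chosen rates are illustrative assumptions.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimize f(x) = (x - 3)**2, whose gradient is 2 * (x - 3)."""
    x = start
    for _ in range(steps):
        gradient = 2 * (x - 3)
        x -= learning_rate * gradient    # step size = learning rate * gradient
    return x

small = gradient_descent(0.1)   # steady progress toward the minimum at x = 3
large = gradient_descent(1.1)   # each step overshoots further, so it diverges

print(abs(small - 3) < 1e-3)    # True
print(abs(large - 3) > 1000)    # True
```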

19. Which algorithm is suitable for handling missing values in a dataset during preprocessing?

a) Decision Trees

b) K-Nearest Neighbors (KNN)

c) Support Vector Machines (SVM)

d) Random Forest

Answer: b) K-Nearest Neighbors (KNN)

20. What is the primary purpose of the “dropout” technique in neural networks?

a) To remove unnecessary features from the input data

b) To randomly exclude units during training to prevent overfitting

c) To increase the learning rate of the model

d) To reduce the dimensionality of the data

Answer: b) To randomly exclude units during training to prevent overfitting


7. Tableau Training


1. What is Tableau primarily used for?

a) Web development

b) Data visualization

c) Machine learning

d) Database management

Answer: b) Data visualization

2. In Tableau, what is a “dashboard”?

a) A type of data source

b) A collection of worksheets

c) A visual representation of data insights

d) A set of filters applied to a view

Answer: c) A visual representation of data insights

3. Which of the following charts in Tableau is suitable for displaying the distribution of a single numerical variable?

a) Bar chart

b) Scatter plot

c) Pie chart

d) Histogram

Answer: d) Histogram

4. What is the purpose of a “parameter” in Tableau?

a) To store data in a worksheet

b) To control one or more values in calculations

c) To connect to external databases

d) To create calculated fields

Answer: b) To control one or more values in calculations

5. In Tableau, what does the term “data source” refer to?

a) A specific data point in a dataset

b) The entire dataset

c) A filter applied to a view

d) A calculated field

Answer: b) The entire dataset

6. How does Tableau handle the “JOIN” operation between tables in a data source?

a) Automatically detects and performs the appropriate join

b) Requires manual coding of SQL statements

c) Supports only inner joins

d) Doesn’t support joins between tables

Answer: a) Automatically detects and performs the appropriate join

7. What is the purpose of the “Show Me” menu in Tableau?

a) To display additional data

b) To choose a visualization type

c) To export data to external formats

d) To create calculated fields

Answer: b) To choose a visualization type

8. What is a “worksheet” in Tableau?

a) A summary of data insights

b) A collection of dashboards

c) A single view or tab within a Tableau file

d) A data source connection

Answer: c) A single view or tab within a Tableau file

9. What is the purpose of the “Filter” shelf in Tableau?

a) To sort data in ascending order

b) To apply formatting to visualizations

c) To filter data based on specified conditions

d) To create calculated fields

Answer: c) To filter data based on specified conditions

10. In Tableau, what does the term “Dual-Axis” refer to?

a) Combining two separate datasets

b) Displaying two different charts on the same axis

c) Using two separate data connections

d) Overlapping two charts on a single axis

Answer: b) Displaying two different charts on the same axis

11. In Tableau, what is the purpose of the “Hierarchy” feature?

a) Creating calculated fields

b) Organizing data into levels of detail

c) Applying filters to the data

d) Connecting to external databases

Answer: b) Organizing data into levels of detail

12. What does the term “Marks” represent in Tableau?

a) Individual data points on a worksheet

b) The axis labels in a visualization

c) The entire dataset in a workbook

d) A measure of data accuracy

Answer: a) Individual data points on a worksheet

13. Which chart type in Tableau is suitable for comparing proportions or percentages within a whole?

a) Line chart

b) Stacked bar chart

c) Heat map

d) Scatter plot

Answer: b) Stacked bar chart

14. In Tableau, what is the purpose of the “Page Shelf”?

a) To navigate between different worksheets

b) To control the layout and formatting of dashboards

c) To create a series of visualizations over a progression of values

d) To manage data connections

Answer: c) To create a series of visualizations over a progression of values

15. What does the “Aggregation” option in Tableau allow you to do?

a) Group data based on specific criteria

b) Perform mathematical operations on measures

c) Apply filters to the data

d) Create calculated fields

Answer: b) Perform mathematical operations on measures

16. What is the purpose of the “Map Layers” feature in Tableau?

a) Displaying geographical data

b) Sorting data points on a map

c) Adding additional visual elements to maps

d) Connecting to external mapping services

Answer: c) Adding additional visual elements to maps

17. In Tableau, what is a “Story”?

a) A summary of data insights

b) A collection of dashboards

c) A representation of geographical data

d) A sequence of sheets or dashboards that work together

Answer: d) A sequence of sheets or dashboards that work together

18. What is the purpose of the “Table Calculation” feature in Tableau?

a) Creating tables within a worksheet

b) Performing calculations on table data

c) Sorting data in a tabular format

d) Calculating values based on the result set of the query

Answer: b) Performing calculations on table data

19. How does Tableau handle “Null” values in data?

a) Automatically fills in null values with default settings

b) Requires manual removal of null values

c) Treats null values as zero

d) Provides options for custom handling of null values

Answer: d) Provides options for custom handling of null values

20. What is the purpose of the “Quick Filter” feature in Tableau?

a) To quickly create new worksheets

b) To apply filters to the entire workbook

c) To filter data based on specified conditions

d) To navigate between dashboards

Answer: c) To filter data based on specified conditions


8. Udacity Programming for Data Science with Python


1. What is the primary purpose of the NumPy library in Python for data science?

a) Data visualization

b) Statistical analysis

c) Machine learning algorithms

d) Efficient handling of numerical arrays

Answer: d) Efficient handling of numerical arrays

2. In the context of Pandas, what does the term “DataFrame” refer to?

a) A statistical summary of data

b) A collection of Python functions

c) A two-dimensional table of data

d) A machine learning model

Answer: c) A two-dimensional table of data

3. What does the term “API” stand for in the context of web development and data science?

a) Application Programming Interface

b) Automated Program Invocation

c) Adaptive Programming Interface

d) Advanced Python Integration

Answer: a) Application Programming Interface

4. What is the purpose of the “Matplotlib” library in Python for data science?

a) Data manipulation

b) Machine learning algorithms

c) Data visualization

d) Web development

Answer: c) Data visualization

5. In the context of machine learning, what does “train-test split” refer to?

a) Training a model on the entire dataset

b) Splitting the dataset into training and testing sets

c) Testing the model on a separate dataset

d) Shuffling the data randomly

Answer: b) Splitting the dataset into training and testing sets

6. What is the purpose of the “scikit-learn” library in Python for data science?

a) Data manipulation

b) Machine learning algorithms and tools

c) Data visualization

d) Web development

Answer: b) Machine learning algorithms and tools

7. In Pandas, what does the groupby() function do?

a) Sorts data in ascending order

b) Groups data based on specified criteria

c) Applies a sorting algorithm to a DataFrame

d) Filters rows based on a condition

Answer: b) Groups data based on specified criteria

8. What is the primary purpose of the “Jupyter Notebook” in data science?

a) Creating interactive web applications

b) Writing and executing Python code in a collaborative environment

c) Data visualization

d) Statistical analysis

Answer: b) Writing and executing Python code in a collaborative environment

9. In the context of data preprocessing, what does “imputation” refer to?

a) Encoding categorical variables

b) Removing outliers from the data

c) Handling missing values by filling them in

d) Scaling numerical features

Answer: c) Handling missing values by filling them in
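
Example: a short sketch of mean imputation in plain Python, with None standing in for a missing value. The ages are made-up data, and the mean is only one of several possible fill strategies.

```python
import statistics

ages = [25, None, 31, None, 40]

# Compute the mean from the observed values only.
observed = [age for age in ages if age is not None]
fill_value = statistics.mean(observed)

# Replace each missing entry with that mean.
imputed = [fill_value if age is None else age for age in ages]
print(imputed)  # [25, 32, 31, 32, 40]
```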

10. What is the purpose of the “Seaborn” library in Python for data science?

a) Machine learning algorithms

b) Data visualization with an emphasis on statistical relationships

c) Web development

d) Text data analysis

Answer: b) Data visualization with an emphasis on statistical relationships

11. What does the term “regular expression” (regex) refer to in Python for data science?

a) A method for data encoding

b) A sequence of machine learning algorithms

c) A powerful tool for text pattern matching

d) A feature selection technique

Answer: c) A powerful tool for text pattern matching
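
Example: a small sketch of pattern matching with the re module. The pattern is a deliberately simplified email matcher for illustration, not a full validator, and the text is made up.

```python
import re

text = "Contact: ana@example.com, bob@example.org"

# One or more name characters, an @, a domain label, a dot, then the rest of the domain.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
print(emails)  # ['ana@example.com', 'bob@example.org']
```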

12. In the context of machine learning, what does “feature scaling” aim to achieve?

a) Encoding categorical variables

b) Transforming data into a standard range

c) Handling missing values

d) Visualizing data distribution

Answer: b) Transforming data into a standard range

13. What is the primary purpose of the “Requests” library in Python for data science?

a) Web development

b) Statistical analysis

c) Machine learning algorithms

d) Making HTTP requests

Answer: d) Making HTTP requests

14. What is the primary role of the “Beautiful Soup” library in Python?

a) Web scraping and parsing HTML/XML documents

b) Statistical analysis

c) Machine learning model training

d) Data visualization

Answer: a) Web scraping and parsing HTML/XML documents

15. What does the term “pickle” refer to in Python for data science?

a) A Python module for statistical analysis

b) A machine learning algorithm

c) A way to serialize and deserialize Python objects

d) A data visualization library

Answer: c) A way to serialize and deserialize Python objects

16. In Pandas, what does the fillna() function do?

a) Fills null values in a DataFrame with specified values

b) Removes all rows containing null values

c) Drops columns with null values

d) Sorts the DataFrame in ascending order

Answer: a) Fills null values in a DataFrame with specified values

17. What is the purpose of the “Decision Trees” algorithm in machine learning?

a) Clustering data points

b) Classification and regression tasks

c) Dimensionality reduction

d) Encoding categorical variables

Answer: b) Classification and regression tasks

18. In Python, what does the term “virtual environment” refer to?

a) Simulating a web development environment

b) A self-contained directory that contains a Python interpreter and installed packages

c) A remote server for data storage

d) A feature for virtualizing data analysis workflows

Answer: b) A self-contained directory that contains a Python interpreter and installed packages

19. What is the purpose of the “pickle” module in Python’s standard library?

a) To clean and preprocess data

b) To serialize and deserialize Python objects

c) To perform statistical analysis

d) To create interactive visualizations

Answer: b) To serialize and deserialize Python objects

20. What does the term “cross-validation” aim to address in machine learning?

a) Feature selection

b) Overfitting

c) Underfitting

d) Bias-variance tradeoff

Answer: b) Overfitting
