Data Science Quiz Questions with Answers: This set of multiple-choice questions (MCQs) covers several data science courses. Each course is treated as its own sub-topic below.
Data Science MCQs
1. Python for Data Science
2. Python Programming for Data Analysis
3. Python for Data Science and Machine Learning Bootcamp
4. Dataquest Python
5. Applied Data Science with Python Specialization
6. Machine Learning
7. Tableau Training
8. Udacity Programming for Data Science with Python
1. Python for Data Science
1. What is the primary purpose of NumPy in Python for Data Science?
a) Data visualization
b) Machine learning
c) Numerical computing and array manipulation
d) Web development
Answer: c) Numerical computing and array manipulation
2. In pandas, what does the term “DataFrame” refer to?
a) A machine learning model
b) A two-dimensional, labeled data structure
c) A Python library for web development
d) A statistical visualization technique
Answer: b) A two-dimensional, labeled data structure
3. Which library is commonly used for data visualization in Python?
a) TensorFlow
b) Matplotlib
c) Scikit-learn
d) PyTorch
Answer: b) Matplotlib
4. What does the term “CSV” stand for in the context of data handling in Python?
a) Comma-Separated Values
b) Centralized System for Values
c) Complex Structured Variables
d) Computerized Storage of Variables
Answer: a) Comma-Separated Values
5. Which of the following is a supervised learning algorithm in Scikit-learn?
a) K-Means
b) Decision Trees
c) Principal Component Analysis (PCA)
d) Support Vector Machines (SVM)
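Answer: b) Decision Trees. Note that d) Support Vector Machines (SVM) is also a supervised algorithm; K-Means and PCA are unsupervised.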
6. What is the purpose of the “iloc” indexer in pandas?
a) Indexing by labels
b) Indexing by integer position
c) Calculating summary statistics
d) Plotting data
Answer: b) Indexing by integer position
7. Which of the following statements is true about Python’s “lambda” functions?
a) They can have multiple expressions.
b) They are used for large-scale data processing.
c) They are defined using the “def” keyword.
d) They are anonymous functions.
Answer: d) They are anonymous functions.
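As a brief illustration of the answer above, the sketch below defines a lambda and passes another one inline as a sort key. The variable names are made up for the example.

```python
# A lambda is an anonymous function limited to a single expression.
square = lambda x: x * x

# Lambdas are often passed inline, for example as a sort key.
words = ["pandas", "numpy", "scipy", "matplotlib"]
by_length = sorted(words, key=lambda w: len(w))

print(square(4))
print(by_length)
```

Anything that needs several statements is usually clearer as a regular function written with def.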
8. What does the term “tf-idf” stand for in natural language processing (NLP)?
a) Term Frequency-Inverse Document Frequency
b) Text File-Incremental Data Format
c) TensorFlow Integrated Deep Features
d) Token Frequency-Importance of Document Features
Answer: a) Term Frequency-Inverse Document Frequency
9. Which library is commonly used for machine learning tasks in Python?
a) Pandas
b) Seaborn
c) Scikit-learn
d) Numpy
Answer: c) Scikit-learn
10. What does the acronym “API” stand for in the context of web data retrieval?
a) Automated Python Integration
b) Application Programming Interface
c) Advanced Programming Instruction
d) Algorithmic Programming Interface
Answer: b) Application Programming Interface
11. What is the purpose of the “Seaborn” library in Python?
a) Web development
b) Data manipulation
c) Data visualization
d) Machine learning
Answer: c) Data visualization
12. In the context of machine learning, what is the role of the “train-test split” method?
a) Splitting the dataset into training and testing sets
b) Training the model on the entire dataset
c) Testing the model on a separate dataset
d) Randomly shuffling the data
Answer: a) Splitting the dataset into training and testing sets
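To make the idea concrete, here is a minimal pure-Python sketch of a train-test split. It is not scikit-learn's implementation; the helper name simple_train_test_split and its parameters are assumptions made for this example.

```python
import random

def simple_train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    # A seeded generator makes the shuffle reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(20))
train, test = simple_train_test_split(data, test_fraction=0.25, seed=42)
print(len(train), len(test))
```

The model is fitted on the training part only, and the held-out test part gives an estimate of how it behaves on unseen data.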
13. Which of the following statements is true about the “Pandas” library in Python?
a) It is primarily used for machine learning.
b) It is not suitable for handling large datasets.
c) It provides data structures for efficient data manipulation.
d) It is used exclusively for web development.
Answer: c) It provides data structures for efficient data manipulation.
14. What is the purpose of the “Counter” class in Python’s “collections” module?
a) Sorting elements in a list
b) Counting occurrences of elements in an iterable
c) Creating a dictionary from two lists
d) Performing matrix multiplication
Answer: b) Counting occurrences of elements in an iterable
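A short sketch of Counter in use, with a made-up list of labels:

```python
from collections import Counter

labels = ["cat", "dog", "cat", "bird", "dog", "cat"]

# Counter maps each distinct element to the number of times it occurs.
counts = Counter(labels)

print(counts["cat"])
print(counts.most_common(1))
```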
15. Which of the following is a dimensionality reduction technique used in machine learning?
a) K-Nearest Neighbors (KNN)
b) Principal Component Analysis (PCA)
c) Random Forest
d) Gradient Boosting
Answer: b) Principal Component Analysis (PCA)
16. What does the acronym “SQL” stand for in the context of databases?
a) Structured Query Language
b) Simple Query Language
c) System Query Language
d) Sequential Query Language
Answer: a) Structured Query Language
17. In Python, what does the “json” module provide?
a) Tools for web scraping
b) Tools for handling JSON data
c) Mathematical functions
d) Image processing capabilities
Answer: b) Tools for handling JSON data
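For instance, the json module can turn a Python dictionary into a JSON string and back again. The record below is invented for the example.

```python
import json

record = {"name": "iris", "rows": 150, "features": ["sepal", "petal"]}

# dumps() serializes to a JSON string; loads() parses it back.
text = json.dumps(record)
restored = json.loads(text)

print(text)
print(restored["rows"])
```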
18. What does the “K-Means” algorithm aim to achieve in machine learning?
a) Classification
b) Regression
c) Clustering
d) Dimensionality reduction
Answer: c) Clustering
19. What is the purpose of the “cross_val_score” function in Scikit-learn?
a) Calculating cross-validated performance metrics
b) Cross-validating hyperparameters
c) Cross-referencing dataset columns
d) Converting data to a pandas DataFrame
Answer: a) Calculating cross-validated performance metrics
20. Which library is commonly used for deep learning tasks in Python?
a) Keras
b) Statsmodels
c) Beautiful Soup
d) NetworkX
Answer: a) Keras
21. What is the purpose of the “SciPy” library in Python?
a) Data visualization
b) Scientific computing
c) Machine learning
d) Web development
Answer: b) Scientific computing
22. What does the term “Regular Expression” (regex) refer to in Python?
a) A module for creating regular polygons
b) A method for defining complex mathematical expressions
c) A sequence of characters defining a search pattern
d) A type of recursive function
Answer: c) A sequence of characters defining a search pattern
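A small sketch using Python's re module shows such a search pattern in action. The log line is a made-up example.

```python
import re

log_line = "2024-01-15 ERROR disk usage at 91%"

# Pattern: four digits, a dash, two digits, a dash, two digits.
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")
match = date_pattern.search(log_line)
found_date = match.group(0) if match else None

# findall() returns every non-overlapping run of digits.
numbers = re.findall(r"\d+", log_line)

print(found_date)
print(numbers)
```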
23. In machine learning, what does the term “overfitting” mean?
a) The model is too simple and cannot capture the underlying patterns.
b) The model performs well on the training data but poorly on new data.
c) The model is too complex and memorizes the training data.
d) The model is biased towards certain features.
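Answer: c) The model is too complex and memorizes the training data. Option b) describes the resulting symptom: good performance on the training data but poor performance on new data.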
24. What is the purpose of the “pickle” module in Python?
a) Sorting data structures
b) Serializing and deserializing Python objects
c) Generating random numbers
d) Extracting information from XML files
Answer: b) Serializing and deserializing Python objects
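A minimal round trip with pickle might look like the sketch below. The dictionary is an invented stand-in for whatever object needs to be saved.

```python
import pickle

model_settings = {"max_depth": 3, "classes": ("a", "b"), "weights": [0.2, 0.8]}

# dumps() serializes the object to bytes; loads() rebuilds it.
blob = pickle.dumps(model_settings)
restored = pickle.loads(blob)

print(type(blob).__name__)
print(restored == model_settings)
```

Only unpickle data from a trusted source, since loading a pickle can execute arbitrary code.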
25. Which of the following is a supervised learning algorithm used for classification?
a) K-Means
b) Random Forest
c) Hierarchical Clustering
d) DBSCAN
Answer: b) Random Forest
26. What is the purpose of the “Scrapy” library in Python?
a) Scientific computing
b) Web scraping
c) Machine learning
d) Image processing
Answer: b) Web scraping
27. Which of the following is a method for handling missing data in pandas?
a) dropna()
b) fillna()
c) isnull()
d) All of the above
Answer: d) All of the above
28. What does the term “bagging” refer to in machine learning?
a) Boosting multiple weak models
b) Training multiple models independently and combining their predictions
c) Feature engineering technique
d) Dimensionality reduction
Answer: b) Training multiple models independently and combining their predictions
29. What is the purpose of the “requests” library in Python?
a) Handling dates and times
b) Making HTTP requests
c) Statistical analysis
d) Natural language processing
Answer: b) Making HTTP requests
30. What does the term “One-Hot Encoding” refer to in the context of machine learning?
a) Encoding numerical values as binary digits
b) Encoding categorical variables as binary vectors
c) Encoding text data into numerical vectors
d) Encoding time series data
Answer: b) Encoding categorical variables as binary vectors
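The idea can be sketched in plain Python as follows. The helper one_hot_encode is hypothetical and written only for this example; in practice a library routine would normally be used.

```python
def one_hot_encode(values):
    """Return (categories, vectors) with one binary column per category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    vectors = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one position is "hot" per value
        vectors.append(row)
    return categories, vectors

categories, vectors = one_hot_encode(["red", "green", "red", "blue"])
print(categories)
print(vectors)
```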
31. Which library provides tools for working with SQL databases in Python?
a) SQLite
b) SQLAlchemy
c) Pandas
d) Scikit-learn
Answer: b) SQLAlchemy
32. What does the term “ensemble learning” mean in the context of machine learning?
a) Training a single model on multiple datasets
b) Combining predictions from multiple models to improve performance
c) Splitting the dataset into training and testing sets
d) Regularizing a model to prevent overfitting
Answer: b) Combining predictions from multiple models to improve performance
33. In Pandas, what does the “groupby” function do?
a) Sorts the data in ascending order
b) Aggregates data based on specified criteria
c) Combines two dataframes into one
d) Computes the correlation matrix
Answer: b) Aggregates data based on specified criteria
34. Which of the following is a dimensionality reduction technique that preserves variance?
a) Principal Component Analysis (PCA)
b) Linear Regression
c) Decision Trees
d) K-Means Clustering
Answer: a) Principal Component Analysis (PCA)
35. What is the purpose of the “matplotlib” library in Python?
a) Scientific computing
b) Web development
c) Data visualization
d) Machine learning
Answer: c) Data visualization
2. Python Programming for Data Analysis
1. What library is commonly used for data manipulation and analysis in Python?
a) Matplotlib
b) Pandas
c) NumPy
d) Seaborn
Answer: b) Pandas
2. In Pandas, what is the primary data structure used to store one-dimensional labeled data?
a) DataFrame
b) Series
c) Array
d) List
Answer: b) Series
3. What function in Pandas is used to read a CSV file into a DataFrame?
a) load_csv()
b) read_csv()
c) import_csv()
d) open_csv()
Answer: b) read_csv()
4. Which of the following is a correct way to select a column named 'age' from a Pandas DataFrame called 'df'?
a) df.select_column('age')
b) df['age']
c) df.select('age')
d) df.column('age')
Answer: b) df['age']
5. What is the purpose of the NumPy function np.mean()?
a) Calculate the median
b) Calculate the mean
c) Calculate the standard deviation
d) Perform matrix multiplication
Answer: b) Calculate the mean
6. In Python, what does the term “list comprehension” refer to?
a) A concise way to create lists using a single line of code
b) A type of exception handling in Python
c) A method for defining complex mathematical expressions
d) A way to define functions with multiple return statements
Answer: a) A concise way to create lists using a single line of code
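Two short examples, one with a filter condition and one without:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Transform every element.
squares = [n * n for n in numbers]

# Transform only the elements that pass the condition.
even_squares = [n * n for n in numbers if n % 2 == 0]

print(squares)
print(even_squares)
```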
7. Which function is used to plot a histogram in Matplotlib?
a) plot_hist()
b) hist_plot()
c) plt.histogram()
d) plt.hist()
Answer: d) plt.hist()
8. What is the purpose of the iloc indexer in Pandas?
a) Indexing by labels
b) Indexing by integer position
c) Calculating summary statistics
d) Plotting data
Answer: b) Indexing by integer position
9. What is the primary purpose of the apply() function in Pandas?
a) Apply a mathematical operation to a column
b) Apply a function along the axis of a DataFrame
c) Apply a filter to a DataFrame
d) Apply a sorting algorithm to a DataFrame
Answer: b) Apply a function along the axis of a DataFrame
10. What is the purpose of the merge() function in Pandas?
a) Splitting a DataFrame into multiple DataFrames
b) Combining two DataFrames based on a common key
c) Merging two DataFrames into a single DataFrame
d) Reshaping a DataFrame
Answer: b) Combining two DataFrames based on a common key
11. What does the groupby() function in Pandas allow you to do?
a) Sort the data in ascending order
b) Group data based on specified criteria
c) Apply a function element-wise
d) Reshape a DataFrame
Answer: b) Group data based on specified criteria
12. Which library is commonly used for statistical data visualization in Python?
a) NumPy
b) Seaborn
c) Pandas
d) Matplotlib
Answer: b) Seaborn
13. What is the purpose of the value_counts() function in Pandas?
a) Compute summary statistics
b) Count the occurrences of unique values in a column
c) Create a new DataFrame
d) Sort the data in descending order
Answer: b) Count the occurrences of unique values in a column
14. How do you check for missing values in a Pandas DataFrame?
a) df.check_missing()
b) df.isnull()
c) df.missing_values()
d) df.check_na()
Answer: b) df.isnull()
15. In Matplotlib, what does the scatter() function do?
a) Plot a line chart
b) Plot a scatter plot
c) Plot a bar chart
d) Plot a pie chart
Answer: b) Plot a scatter plot
16. Which of the following is a correct way to create a NumPy array with values ranging from 0 to 9?
a) np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
b) np.arange(0, 10)
c) np.linspace(0, 9, 10)
d) All of the above
Answer: d) All of the above
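The three constructors can be compared directly, assuming NumPy is installed. One difference worth noting is the element type: np.linspace returns floating-point values.

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
b = np.arange(0, 10)        # the stop value 10 is excluded
c = np.linspace(0, 9, 10)   # 10 evenly spaced points, both ends included

# dtype.kind is "i" for signed integers and "f" for floats.
print(a.dtype.kind, b.dtype.kind, c.dtype.kind)
print(np.array_equal(a, b), np.allclose(a, c))
```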
17. What is the purpose of the pivot_table() function in Pandas?
a) Reshape a DataFrame by aggregating data
b) Pivot columns into rows
c) Apply a function element-wise
d) Group data based on specified criteria
Answer: a) Reshape a DataFrame by aggregating data
18. Which library provides tools for statistical modeling in Python?
a) Statsmodels
b) Scikit-learn
c) TensorFlow
d) Keras
Answer: a) Statsmodels
19. What does the fillna() function in Pandas do?
a) Fill missing values with a specified constant
b) Drop rows with missing values
c) Fill missing values with the mean of the column
d) All of the above
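Answer: a) Fill missing values with a specified constant. Passing the column means, as in df.fillna(df.mean()), also achieves c); dropping rows with missing values is done by dropna(), not fillna().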
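The sketch below, which assumes pandas is installed and uses a made-up one-column DataFrame, contrasts fillna() with dropna().

```python
import pandas as pd

df = pd.DataFrame({"score": [1.0, None, 3.0]})

# Replace missing values with a constant.
filled_constant = df.fillna(0)

# Replace missing values with each column's mean.
filled_mean = df.fillna(df.mean())

# dropna() is the method that removes rows containing missing values.
dropped = df.dropna()

print(filled_constant["score"].tolist())
print(filled_mean["score"].tolist())
print(len(dropped))
```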
20. In Python, what is the purpose of the lambda function?
a) Define a function with a single expression
b) Create anonymous classes
c) Format strings
d) Generate random numbers
Answer: a) Define a function with a single expression
21. Which function is used to calculate the correlation matrix in Pandas?
a) correlation_matrix()
b) calculate_correlation()
c) corr()
d) correlate()
Answer: c) corr()
22. What does the term “resampling” refer to in the context of time series data?
a) Splitting the data into training and testing sets
b) Transforming data into a different representation
c) Changing the frequency of data points
d) Imputing missing values
Answer: c) Changing the frequency of data points
23. What is the purpose of the np.unique() function in NumPy?
a) Sort an array in ascending order
b) Find the unique elements in an array
c) Calculate the mean of an array
d) Perform element-wise multiplication
Answer: b) Find the unique elements in an array
24. Which of the following statements is true about the iterrows() function in Pandas?
a) It is used for sorting a DataFrame.
b) It iterates over rows in a DataFrame.
c) It computes summary statistics.
d) It is used for merging DataFrames.
Answer: b) It iterates over rows in a DataFrame.
25. What is the primary purpose of the cut() function in Pandas?
a) Cut the DataFrame into smaller pieces
b) Apply a function to each element of the DataFrame
c) Bin values into discrete intervals
d) Concatenate two DataFrames
Answer: c) Bin values into discrete intervals
26. What does the term “feature scaling” refer to in the context of machine learning?
a) Encoding categorical variables
b) Standardizing numerical features
c) Reshaping the input data
d) Imputing missing values
Answer: b) Standardizing numerical features
27. Which of the following is a correct way to install a Python library using pip in the command line?
a) install pandas
b) pip install pandas
c) python install pandas
d) pandas install
Answer: b) pip install pandas
28. What does the np.random.seed() function do in NumPy?
a) Generate random numbers
b) Set the seed for the random number generator
c) Shuffle an array randomly
d) Create a random permutation of an array
Answer: b) Set the seed for the random number generator
29. What is the purpose of the loc indexer in Pandas?
a) Indexing by integers
b) Indexing by labels
c) Sorting data
d) Plotting data
Answer: b) Indexing by labels
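A side-by-side sketch of label-based and position-based selection, assuming pandas is installed. The DataFrame and its index labels are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"age": [31, 45, 27]}, index=["ann", "bob", "cy"])

# loc selects by row and column labels.
by_label = df.loc["bob", "age"]

# iloc selects by integer position (second row, first column).
by_position = df.iloc[1, 0]

print(by_label, by_position)
```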
30. Which of the following statements is true about the cross_val_score() function in Scikit-learn?
a) It calculates the cross-validated performance metrics for a machine learning model.
b) It computes the mean squared error of a model.
c) It performs feature scaling on the input data.
d) It is used for hyperparameter tuning.
Answer: a) It calculates the cross-validated performance metrics for a machine learning model.
31. What is the purpose of the resample() function in Pandas for time series data?
a) Reorganize data into a DataFrame
b) Rename columns based on a specified criteria
c) Change the frequency of time series data
d) Reshape a DataFrame by aggregating data
Answer: c) Change the frequency of time series data
32. In Python, what does the term “broadcasting” refer to in the context of NumPy?
a) A method for transmitting data over a network
b) A technique for manipulating strings
c) The automatic extension of arrays to perform element-wise operations
d) A mechanism for encrypting data
Answer: c) The automatic extension of arrays to perform element-wise operations
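As a small illustration, assuming NumPy is installed, a one-dimensional row can be added to every row of a two-dimensional array without an explicit loop:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,)

# The row is broadcast across both rows of the matrix.
shifted = matrix + row

# A scalar is broadcast to every element.
scaled = matrix * 2

print(shifted)
print(scaled)
```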
33. What is the purpose of the min() and max() functions in Pandas?
a) Compute the minimum and maximum values of a DataFrame
b) Filter rows based on specified criteria
c) Reshape a DataFrame
d) Group data based on specified criteria
Answer: a) Compute the minimum and maximum values of a DataFrame
34. What does the term “tf-idf” stand for in natural language processing (NLP)?
a) Text Frequency-Inverse Document Frequency
b) Term Frequency-Inverse Document Frequency
c) Token Frequency-Document Importance Factor
d) Text Feature-Inverse Document Frequency
Answer: b) Term Frequency-Inverse Document Frequency
35. Which library provides tools for creating interactive visualizations in Python?
a) Plotly
b) Matplotlib
c) Seaborn
d) Bokeh
Answer: a) Plotly. Note that d) Bokeh is also an interactive visualization library.
3. Python for Data Science and Machine Learning Bootcamp
1. What is the purpose of the NumPy library in the context of data science and machine learning?
a) Web development
b) Data visualization
c) Data manipulation and numerical operations
d) Natural language processing
Answer: c) Data manipulation and numerical operations
2. Which of the following libraries is commonly used for creating machine learning models in Python?
a) Matplotlib
b) TensorFlow
c) Seaborn
d) Flask
Answer: b) TensorFlow
3. What is the primary data structure used in the pandas library for handling tabular data?
a) Series
b) List
c) DataFrame
d) Array
Answer: c) DataFrame
4. In machine learning, what does the term “training set” refer to?
a) The dataset used for making predictions
b) The dataset used for evaluating model performance
c) The dataset used for training the model
d) The validation dataset
Answer: c) The dataset used for training the model
5. What does the term “feature engineering” mean in the context of machine learning?
a) Creating new machine learning features
b) Optimizing model hyperparameters
c) Building complex machine learning models
d) Extracting relevant information from raw data
Answer: d) Extracting relevant information from raw data
6. Which Python library is commonly used for data visualization in a machine learning bootcamp?
a) TensorFlow
b) Matplotlib
c) Scikit-learn
d) NumPy
Answer: b) Matplotlib
7. What is the purpose of the scikit-learn library in a machine learning bootcamp?
a) Data manipulation
b) Neural network implementation
c) Machine learning algorithms and tools
d) Natural language processing
Answer: c) Machine learning algorithms and tools
8. What does the term “supervised learning” mean in machine learning?
a) Training a model without labels
b) Learning from labeled data to make predictions on new, unseen data
c) Training a model with only one feature
d) Unsupervised learning with multiple models
Answer: b) Learning from labeled data to make predictions on new, unseen data
9. Which method is used to evaluate the performance of a machine learning model on a classification problem?
a) Mean Squared Error (MSE)
b) R-squared
c) Confusion Matrix
d) F1 Score
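Answer: c) Confusion Matrix. Note that d) F1 Score is also a classification metric, derived from the confusion matrix; MSE and R-squared are regression metrics.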
10. What is the purpose of the train_test_split function in scikit-learn?
a) Splitting data into training and validation sets
b) Training a model on the entire dataset
c) Testing the model on a separate dataset
d) Splitting data into training and testing sets
Answer: d) Splitting data into training and testing sets
11. What does the term “hyperparameter” refer to in the context of machine learning?
a) Parameters learned by the model during training
b) Model weights
c) External configuration settings for the model
d) Features in the dataset
Answer: c) External configuration settings for the model
12. Which of the following is a commonly used algorithm for linear regression in scikit-learn?
a) K-Means
b) Decision Trees
c) LinearSVC
d) LinearRegression
Answer: d) LinearRegression
13. In a Jupyter Notebook, how do you comment out a single line of code?
a) // Comment here
b) # Comment here
c) /* Comment here */
d) <!-- Comment here -->
Answer: b) # Comment here
14. What does the term “bagging” stand for in ensemble learning?
a) Boosting and Aggregation
b) Bootstrap Aggregating
c) Binary Aggregation
d) Bagged Algorithm Generation
Answer: b) Bootstrap Aggregating
15. What is the primary purpose of the Random Forest algorithm?
a) Regression
b) Clustering
c) Classification and Regression
d) Dimensionality Reduction
Answer: c) Classification and Regression
16. Which of the following metrics is commonly used for evaluating classification models?
a) Mean Squared Error (MSE)
b) R-squared
c) Area Under the Receiver Operating Characteristic curve (AUC-ROC)
d) Mean Absolute Error (MAE)
Answer: c) Area Under the Receiver Operating Characteristic curve (AUC-ROC)
17. What is the purpose of the GridSearchCV class in scikit-learn?
a) Feature selection
b) Hyperparameter tuning using cross-validation
c) Dimensionality reduction
d) Data preprocessing
Answer: b) Hyperparameter tuning using cross-validation
18. What is the significance of the term “recall” in the context of classification metrics?
a) It measures the ability of a model to identify true positive cases
b) It measures the overall correctness of a model
c) It is another term for precision
d) It is used for clustering evaluation
Answer: a) It measures the ability of a model to identify true positive cases
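Recall can be computed by hand from true and predicted labels, as in the sketch below. The helper precision_recall and the label lists are assumptions made for this example; precision is included only for contrast.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: the share of actual positives that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```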
19. What is the purpose of the tf.keras module in TensorFlow?
a) Feature extraction
b) Image processing
c) Neural network construction using the Keras API
d) Text mining
Answer: c) Neural network construction using the Keras API
20. What does the term “cross-validation” help address in machine learning?
a) Overfitting
b) Underfitting
c) Data leakage
d) Bias-variance tradeoff
Answer: a) Overfitting
21. In machine learning, what is the purpose of the term “dropout” in neural networks?
a) Eliminating data points with missing values
b) Randomly excluding units during training to prevent overfitting
c) Reducing the learning rate during training
d) Adjusting the weights of the neural network
Answer: b) Randomly excluding units during training to prevent overfitting
22. Which activation function is commonly used in the output layer for binary classification problems?
a) Sigmoid
b) ReLU
c) Tanh
d) Softmax
Answer: a) Sigmoid
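The sigmoid maps any real number to a value between 0 and 1, which can be read as the probability of the positive class. A minimal sketch, with the 0.5 threshold as an assumed convention:

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

probabilities = [sigmoid(z) for z in (-2.0, 0.0, 2.0)]

# Threshold the probabilities at 0.5 to get class labels.
predictions = [1 if p >= 0.5 else 0 for p in probabilities]

print(probabilities)
print(predictions)
```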
23. What is the primary objective of using regularization techniques in machine learning models?
a) Increase model complexity
b) Reduce model complexity to prevent overfitting
c) Achieve higher training accuracy
d) Improve model interpretability
Answer: b) Reduce model complexity to prevent overfitting
24. What is the purpose of the StandardScaler in scikit-learn?
a) Standardize features by scaling them to have zero mean and unit variance
b) Perform dimensionality reduction
c) Impute missing values in the dataset
d) Apply feature engineering techniques
Answer: a) Standardize features by scaling them to have zero mean and unit variance
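The underlying arithmetic can be sketched in plain Python. This is not scikit-learn's code; the helper standardize is hypothetical, and it assumes the values are not all identical (otherwise the standard deviation is zero).

```python
import statistics

def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

heights = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = standardize(heights)
print(scaled)
```

After the transformation the values have mean 0 and standard deviation 1, which puts features measured in different units on a comparable scale.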
25. Which of the following techniques is commonly used for handling imbalanced datasets in classification problems?
a) Feature scaling
b) Data augmentation
c) SMOTE (Synthetic Minority Over-sampling Technique)
d) Cross-validation
Answer: c) SMOTE (Synthetic Minority Over-sampling Technique)
26. What is the primary purpose of the MinMaxScaler in scikit-learn?
a) Standardize data by removing the mean and scaling to unit variance
b) Normalize data to a specific range, usually [0, 1]
c) Impute missing values in the dataset
d) Transform categorical features into numerical representations
Answer: b) Normalize data to a specific range, usually [0, 1]
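Min-max scaling is simple enough to write out by hand. The sketch below is a plain-Python illustration rather than scikit-learn's implementation; the helper min_max_scale is an assumed name, and it presumes the input has at least two distinct values.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so min maps to new_min and max to new_max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [new_min + (v - lo) / span * (new_max - new_min) for v in values]

scaled = min_max_scale([10, 20, 30, 50])
print(scaled)
```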
27. In the context of neural networks, what does the term “epoch” refer to?
a) A measure of model complexity
b) A single pass through the entire training dataset
c) The learning rate of the model
d) The number of layers in the neural network
Answer: b) A single pass through the entire training dataset
28. What is the primary purpose of the One-Hot Encoding technique in data preprocessing?
a) Standardizing numerical features
b) Encoding categorical variables as binary vectors
c) Normalizing data
d) Imputing missing values
Answer: b) Encoding categorical variables as binary vectors
29. Which of the following statements is true about the Logistic Regression algorithm in machine learning?
a) It is only used for regression problems.
b) It is a linear model used for classification problems.
c) It is a non-linear algorithm.
d) It requires the data to be linearly separable.
Answer: b) It is a linear model used for classification problems.
30. What is the primary purpose of the R-squared metric in regression problems?
a) Measure the proportion of explained variance in the target variable
b) Measure the accuracy of a classification model
c) Evaluate the balance between precision and recall
d) Assess the quality of the model’s predictions
Answer: a) Measure the proportion of explained variance in the target variable
31. In machine learning, what is the objective of the K-Means clustering algorithm?
a) Predicting labels for new data points
b) Classifying data into multiple categories
c) Dividing data into groups based on similarity
d) Reducing the dimensionality of the data
Answer: c) Dividing data into groups based on similarity
32. What is the purpose of the tf-idf vectorization technique in natural language processing (NLP)?
a) Tokenizing words in a document
b) Calculating word frequencies
c) Encoding categorical variables
d) Assigning weights to words based on their importance in a document
Answer: d) Assigning weights to words based on their importance in a document
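A simplified sketch of the weighting is shown below. It uses one textbook variant, tf multiplied by log(N / df), with whitespace tokenization; library implementations typically add smoothing and normalization, so their numbers will differ. The helper tf_idf and the sample documents are assumptions made for this example.

```python
import math

def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    n_docs = len(documents)
    tokenized = [doc.lower().split() for doc in documents]

    # Document frequency: in how many documents each term appears.
    doc_freq = {}
    for tokens in tokenized:
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1

    weights = []
    for tokens in tokenized:
        scores = {}
        for term in set(tokens):
            tf = tokens.count(term) / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] = tf * idf
        weights.append(scores)
    return weights

docs = ["the cat sat", "the dog sat", "the cat ran fast"]
weights = tf_idf(docs)
print(weights[1])
```

A word such as "the", which appears in every document, gets a weight of zero, while a word unique to one document gets the highest weight there.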
33. What is the primary role of the Random Search technique in hyperparameter tuning?
a) Exhaustively search through all possible hyperparameter combinations
b) Randomly sample hyperparameter combinations from a predefined search space
c) Optimize the learning rate of a model
d) Perform feature engineering on the dataset
Answer: b) Randomly sample hyperparameter combinations from a predefined search space
34. In the context of neural networks, what does the term “batch size” refer to?
a) The number of layers in the network
b) The number of epochs in training
c) The number of samples processed in each iteration
d) The learning rate during training
Answer: c) The number of samples processed in each iteration
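The relationship between batch size and epochs can be sketched with a plain loop. No real training happens here; the helper iterate_batches and the numbers are assumptions chosen for illustration.

```python
def iterate_batches(samples, batch_size):
    """Yield consecutive slices of at most batch_size samples."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

samples = list(range(10))
batch_size = 4
epochs = 2

batches_seen = []
for epoch in range(epochs):
    # One epoch is one full pass over all samples.
    for batch in iterate_batches(samples, batch_size):
        batches_seen.append((epoch, batch))

print(len(batches_seen))
print(batches_seen[2])
```

With 10 samples and a batch size of 4, each epoch consists of three iterations, the last one on a smaller batch.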
35. What is the purpose of the pickle module in Python’s standard library?
a) Handling data manipulation in Pandas
b) Serializing and deserializing Python objects
c) Creating visualizations in Matplotlib
d) Extracting information from XML files
Answer: b) Serializing and deserializing Python objects
4. Dataquest Python
1. What does the acronym “API” stand for in the context of web development and data science?
a) Advanced Python Integration
b) Application Programming Interface
c) Automated Program Invocation
d) Adaptive Programming Interface
Answer: b) Application Programming Interface
2. In Python, what does the zip() function do?
a) Compresses files in a directory
b) Combines two or more iterables element-wise
c) Creates a new list by filtering elements from an existing list
d) Unzips compressed files
Answer: b) Combines two or more iterables element-wise
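For example, zip() can pair up two lists, and the pairs can feed straight into dict():

```python
names = ["ann", "bob", "cy"]
ages = [31, 45, 27]

# zip() yields tuples of elements that share the same position.
pairs = list(zip(names, ages))
lookup = dict(zip(names, ages))

print(pairs)
print(lookup["bob"])
```

If the iterables differ in length, zip() stops at the shortest one.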
3. What is the purpose of the iloc attribute in Pandas?
a) Accessing elements by label
b) Accessing elements by integer index
c) Computing summary statistics
d) Filtering rows based on conditions
Answer: b) Accessing elements by integer index
4. In Python, what is the primary use of the __init__ method in a class?
a) Initializing class variables
b) Defining class methods
c) Initializing instance variables
d) Creating an instance of the class
Answer: c) Initializing instance variables
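A minimal class shows __init__ setting instance variables when an object is created. The Dataset class is a made-up example.

```python
class Dataset:
    def __init__(self, name, rows):
        # Each instance gets its own name and its own copy of rows.
        self.name = name
        self.rows = list(rows)

    def size(self):
        return len(self.rows)

a = Dataset("train", [1, 2, 3])
b = Dataset("test", [4])
print(a.name, a.size())
print(b.name, b.size())
```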
5. What does the term “regular expression” (regex) refer to in Python?
a) A sequence of characters defining a search pattern
b) An expression used exclusively in regular programming
c) A method for writing regular code
d) A data type for representing expressions
Answer: a) A sequence of characters defining a search pattern
6. In Pandas, what does the groupby() function do?
a) Sorting the data in ascending order
b) Aggregating data based on specified criteria
c) Filtering rows based on conditions
d) Creating a new DataFrame
Answer: b) Aggregating data based on specified criteria
7. What is the purpose of the requests library in Python?
a) Handling dates and times
b) Making HTTP requests
c) Manipulating strings
d) Creating visualizations
Answer: b) Making HTTP requests
8. Which of the following is the recommended way to open a file for reading in Python?
a) open_file('example.txt', 'r')
b) read('example.txt')
c) with open('example.txt', 'r') as file:
d) file = open('example.txt')
Answer: c) with open('example.txt', 'r') as file: (option d also opens the file, but it does not close it automatically)
9. What does the term “Lambda function” refer to in Python?
a) A function that performs complex mathematical operations
b) A function defined using the def keyword
c) An anonymous function defined using the lambda keyword
d) A function that requires user input
Answer: c) An anonymous function defined using the lambda keyword
10. In the context of machine learning, what is the purpose of the train_test_split function from scikit-learn?
a) Splitting the dataset into training and testing sets
b) Training the model on the entire dataset
c) Testing the model on a separate dataset
d) Shuffling the data randomly
Answer: a) Splitting the dataset into training and testing sets
11. What is the purpose of the isin() method in Pandas?
a) Sorting data
b) Checking for missing values
c) Filtering data based on a list of values
d) Grouping data by specified criteria
Answer: c) Filtering data based on a list of values
12. In Python, what does the term “list comprehension” refer to?
a) A concise way to create lists using a single line of code
b) A method for defining complex mathematical expressions
c) A way to manipulate strings
d) A type of exception handling
Answer: a) A concise way to create lists using a single line of code
13. What is the purpose of the enumerate() function in Python?
a) Counting the number of elements in a list
b) Creating a dictionary
c) Iterating over elements of a sequence and keeping track of the index
d) Finding the maximum value in a list
Answer: c) Iterating over elements of a sequence and keeping track of the index
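A short sketch with an invented list of column names:

```python
columns = ["id", "age", "income"]

# enumerate() yields (index, element) pairs, starting at 0 by default.
positions = {}
for index, name in enumerate(columns):
    positions[name] = index

# The start argument changes the first index.
numbered = list(enumerate(columns, start=1))

print(positions)
print(numbered)
```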
14. What does the term “feature scaling” refer to in the context of machine learning?
a) Encoding categorical variables
b) Standardizing numerical features
c) Reshaping the input data
d) Imputing missing values
Answer: b) Standardizing numerical features
15. In Python, what is the purpose of the map() function?
a) Creating visualizations
b) Applying a function to each element of an iterable
c) Sorting a list
d) Concatenating strings
Answer: b) Applying a function to each element of an iterable
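For example, map() can convert a list of numeric strings to floats. It returns a lazy iterator, so list() is used here to collect the results.

```python
raw = ["3.5", "4.0", "2.25"]

# Apply float() to every string.
values = list(map(float, raw))

# Any callable works, including a lambda.
doubled = list(map(lambda v: v * 2, values))

print(values)
print(doubled)
```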
16. What is the primary use of the Counter class in Python’s collections module?
a) Counting the occurrences of each element in a list
b) Creating a counter for loop iterations
c) Summing up numerical values in a list
d) Calculating the mean of a list
Answer: a) Counting the occurrences of each element in a list
17. Which method is used to find the index of the maximum value in a list using Python?
a) list.index(max(list))
b) max(list).index()
c) list.find(max(list))
d) list.max_index()
Answer: a) list.index(max(list))
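A sketch of the idiom follows. It uses the variable name scores rather than list, because naming a variable list would shadow the built-in type. The second form is an alternative that avoids scanning the list twice.

```python
scores = [0.71, 0.93, 0.64, 0.93]

# index() returns the position of the first occurrence of the maximum.
best_index = scores.index(max(scores))

# Alternative: take the max over the indices, keyed by the element.
best_index_alt = max(range(len(scores)), key=scores.__getitem__)

print(best_index, best_index_alt)
```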
18. What does the continue statement do in a Python loop?
a) Terminates the loop immediately
b) Skips the rest of the code in the loop and moves to the next iteration
c) Restarts the loop from the beginning
d) Breaks out of the loop
Answer: b) Skips the rest of the code in the loop and moves to the next iteration
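A small example with made-up sensor readings, where missing entries are skipped:

```python
readings = [12, None, 7, None, 30]

total = 0
skipped = 0
for value in readings:
    if value is None:
        skipped += 1
        continue  # jump straight to the next iteration
    total += value

print(total, skipped)
```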
19. Which of the following is a correct way to import the numpy library with the alias np?
a) import numpy as np
b) from numpy import np
c) import np as numpy
d) from np import numpy
Answer: a) import numpy as np
20. What is the purpose of the isin() method in Pandas?
a) Sorting data
b) Checking for missing values
c) Filtering data based on a list of values
d) Grouping data by specified criteria
Answer: c) Filtering data based on a list of values
21. In Python, what does the term “Pickle” refer to?
a) A type of cucumber
b) A module for data serialization
c) A sorting algorithm
d) A method for creating pickled vegetables
Answer: b) A module for data serialization
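A short round-trip sketch with the pickle module. The record dictionary is hypothetical. Unpickling can execute arbitrary code, so only load data from sources you trust.

```python
import pickle

# Hypothetical object to serialize, for illustration only.
record = {"name": "model-a", "scores": [0.91, 0.87], "tuned": True}

payload = pickle.dumps(record)     # serialize to a bytes object
restored = pickle.loads(payload)   # rebuild an equal, independent object
```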
22. What is the purpose of the mode() function in Pandas?
a) Computing the most frequent value(s) in a column
b) Filtering rows based on a condition
c) Sorting data
d) Creating visualizations
Answer: a) Computing the most frequent value(s) in a column
23. In Python, what is the purpose of the with statement?
a) Defining a function
b) Creating a class
c) Opening and managing resources like files
d) Executing a block of code repeatedly
Answer: c) Opening and managing resources like files
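A minimal sketch of the with statement managing two resources: a temporary directory and a file inside it. The file name notes.txt is an arbitrary choice for illustration.

```python
import os
import tempfile

# The directory is deleted automatically when the outer block exits.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "notes.txt")

    # The file is closed automatically when this block exits,
    # even if an exception is raised inside it.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("first line\n")
    closed_after_block = handle.closed

    with open(path, encoding="utf-8") as handle:
        contents = handle.read()
```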
24. What does the term “scikit-learn” refer to in Python?
a) A module for scientific computing
b) A library for data visualization
c) A machine learning library
d) A web development framework
Answer: c) A machine learning library
25. In Python, what is the purpose of the *args and **kwargs syntax in function definitions?
a) Defining default values for function parameters
b) Accepting a variable number of positional and keyword arguments in a function
c) Indicating optional parameters
d) Creating lambda functions
Answer: b) Accepting a variable number of positional and keyword arguments in a function
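A brief sketch of *args and **kwargs. The function describe and its arguments are hypothetical.

```python
def describe(*args, **kwargs):
    """Collect extra positional arguments in a tuple and keyword arguments in a dict."""
    return {"positional": args, "keyword": kwargs}

# Any number of positional and keyword arguments is accepted.
result = describe(1, 2, unit="cm", precise=True)
```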
26. What is the primary use of the crosstab() function in Pandas?
a) Cross-validation of machine learning models
b) Creating contingency tables
c) Cross-plotting variables in a DataFrame
d) Cross-checking data for consistency
Answer: b) Creating contingency tables
27. In Python, what does the term “Docstring” refer to?
a) A documentation string used to describe a function, module, or class
b) A data type for representing documents
c) A string used to create documents
d) A module for handling strings
Answer: a) A documentation string used to describe a function, module, or class
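A minimal sketch of a docstring. The function rectangle_area is hypothetical.

```python
def rectangle_area(width, height):
    """Return the area of a rectangle given its width and height."""
    return width * height

# The docstring is stored on the object and is what help() displays.
doc = rectangle_area.__doc__
```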
28. What does the term “stacking” refer to in the context of machine learning ensemble methods?
a) Combining predictions from multiple models to make a final prediction
b) Increasing the number of layers in a neural network
c) Creating a stack of data frames
d) Sorting the data in ascending order
Answer: a) Combining predictions from multiple models to make a final prediction
29. In Python, what does the sorted() function do?
a) Returns a new list with the items sorted, in ascending order by default
b) Randomly shuffles a list
c) Applies a sorting algorithm to a DataFrame
d) Sorts a list in descending order
Answer: a) Returns a new list with the items sorted, in ascending order by default
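A short sketch of sorted(). It accepts any iterable, returns a new list, and leaves the original untouched. The names are made-up sample data.

```python
# Hypothetical sample data, for illustration only.
names = ["delta", "Alpha", "charlie", "bravo"]

# key=str.lower makes the comparison case-insensitive.
ascending = sorted(names, key=str.lower)
descending = sorted(names, key=str.lower, reverse=True)
```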
30. What is the purpose of the json.loads() function in Python?
a) Loads a JavaScript file into Python
b) Loads a JSON-formatted string into a Python object
c) Loads a list of strings into a JSON file
d) Loads a JSON-formatted file into a Python object
Answer: b) Loads a JSON-formatted string into a Python object
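A minimal sketch of json.loads(), which parses a string. Its sibling json.load() reads from a file object instead. The JSON text is made up.

```python
import json

# Hypothetical JSON text, for illustration only.
text = '{"id": 7, "tags": ["a", "b"], "active": true, "parent": null}'

# JSON objects become dicts, arrays become lists,
# true/false become True/False, and null becomes None.
data = json.loads(text)
```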
31. In Pandas, what does the nunique() function do?
a) Counts the number of occurrences of each unique value in a column
b) Returns the number of unique values in a column
c) Computes the mean of a numerical column
d) Normalizes a column by scaling it to have zero mean and unit variance
Answer: b) Returns the number of unique values in a column
32. What is the purpose of the apply() function in Pandas?
a) Applies a function along the rows or columns of a DataFrame
b) Filters rows based on a condition
c) Computes summary statistics of a DataFrame
d) Reshapes a DataFrame by aggregating data
Answer: a) Applies a function along the rows or columns of a DataFrame
33. What does the term “Matplotlib” refer to in Python?
a) A machine learning library
b) A data manipulation library
c) A data visualization library
d) A web development framework
Answer: c) A data visualization library
34. In Python, what is the purpose of the os.path.join() function?
a) Combining multiple strings into a single path
b) Extracting the file extension from a path
c) Removing a file or directory
d) Creating a new file in a specified directory
Answer: a) Combining multiple strings into a single path
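A brief sketch of os.path.join(). The path components are hypothetical.

```python
import os

# Components are joined with the separator of the current operating system.
relative = os.path.join("data", "raw", "sales.csv")

# Splitting on os.sep recovers the components on any platform.
parts = relative.split(os.sep)
```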
35. What is the primary use of the loc attribute in Pandas?
a) Indexing by integers
b) Indexing by labels
c) Sorting data
d) Plotting data
Answer: b) Indexing by labels
5. Applied Data Science with Python Specialization
1. What is the purpose of the pandas library in the context of data science?
a) Web development
b) Statistical analysis and data manipulation
c) Neural network implementation
d) Image processing
Answer: b) Statistical analysis and data manipulation
2. Which of the following libraries is commonly used for machine learning in Python?
a) Matplotlib
b) TensorFlow
c) Requests
d) Beautiful Soup
Answer: b) TensorFlow
3. In the context of machine learning, what does the term “feature engineering” mean?
a) Constructing new features from existing ones to improve model performance
b) Optimizing hyperparameters
c) Selecting the most important features for a model
d) Building complex machine learning models
Answer: a) Constructing new features from existing ones to improve model performance
4. What is the primary data structure used in the pandas library for handling tabular data?
a) Series
b) List
c) DataFrame
d) Array
Answer: c) DataFrame
5. What does the term “regularization” refer to in the context of machine learning models?
a) The process of dividing data into training and testing sets
b) A technique to prevent overfitting by penalizing large coefficients
c) The process of normalizing numerical features
d) The method used to handle missing values in a dataset
Answer: b) A technique to prevent overfitting by penalizing large coefficients
6. Which Python library is commonly used for creating interactive visualizations in a data science specialization?
a) Plotly
b) NumPy
c) Seaborn
d) Scikit-learn
Answer: a) Plotly
7. In the context of natural language processing (NLP), what does the term “TF-IDF” stand for?
a) Text Feature-Inverse Document Frequency
b) Token Frequency-Document Importance Factor
c) Text Frequency-Inverse Document Frequency
d) Term Frequency-Inverse Document Frequency
Answer: d) Term Frequency-Inverse Document Frequency
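A rough sketch of one common TF-IDF variant, assuming tf = count / document length and idf = log(N / document frequency). Libraries often use smoothed or normalized variants, so their numbers will differ. The tiny corpus and the function names are hypothetical.

```python
import math

# Hypothetical tokenized corpus, for illustration only.
docs = [
    ["data", "science", "data"],
    ["science", "fiction"],
    ["data", "analysis"],
]

def tf(term, doc):
    """Share of the document's tokens that are this term."""
    return doc.count(term) / len(doc)

def idf(term, docs):
    """Log of (number of documents / documents containing the term).

    Assumes the term occurs in at least one document.
    """
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df)

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

score_data = tf_idf("data", docs[0], docs)        # frequent here, but common overall
score_fiction = tf_idf("fiction", docs[1], docs)  # rarer across the corpus
```

In this corpus "fiction" scores higher than "data" because it appears in only one document.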
8. What is the purpose of the scikit-learn library in a data science specialization?
a) Data manipulation
b) Neural network implementation
c) Machine learning algorithms and tools
d) Natural language processing
Answer: c) Machine learning algorithms and tools
9. In machine learning, what does the term “cross-validation” help address?
a) Feature selection
b) Overfitting
c) Underfitting
d) Bias-variance tradeoff
Answer: b) Overfitting
10. What is the purpose of the matplotlib library in a data science specialization?
a) Statistical analysis
b) Data manipulation
c) Data visualization
d) Machine learning
Answer: c) Data visualization
11. What is the primary purpose of the SciPy library in the context of data science?
a) Machine learning
b) Statistical analysis and scientific computing
c) Web development
d) Natural language processing
Answer: b) Statistical analysis and scientific computing
12. In a machine learning pipeline, what is the purpose of the train_test_split function from scikit-learn?
a) Training the model on the entire dataset
b) Splitting the dataset into training and testing sets
c) Testing the model on a separate dataset
d) Shuffling the data randomly
Answer: b) Splitting the dataset into training and testing sets
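The idea behind a train/test split can be sketched in pure Python. This is not scikit-learn's implementation; simple_split, its parameters, and the data are hypothetical.

```python
import random

def simple_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    rng = random.Random(seed)      # fixed seed for a reproducible shuffle
    shuffled = rows[:]             # copy so the caller's list is unchanged
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical dataset of 20 row identifiers.
rows = list(range(20))
train_rows, test_rows = simple_split(rows)
```

Every row lands in exactly one of the two sets, so the model is evaluated on data it did not see during training.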
13. What does the term “confusion matrix” represent in the context of classification models?
a) A matrix representing the confusion between training and testing datasets
b) A matrix showing the confusion between different features
c) A matrix summarizing the performance of a classification model
d) A matrix indicating the confusion between different classes in a regression model
Answer: c) A matrix summarizing the performance of a classification model
14. In Python, what does the term “API” stand for in the context of web development and data science?
a) Advanced Python Integration
b) Application Programming Interface
c) Automated Program Invocation
d) Adaptive Programming Interface
Answer: b) Application Programming Interface
15. What is the purpose of the train_test_split function in scikit-learn?
a) Splitting data into training and validation sets
b) Training a model on the entire dataset
c) Testing the model on a separate dataset
d) Splitting data into training and testing sets
Answer: d) Splitting data into training and testing sets
16. In the context of feature scaling, what does the StandardScaler do in scikit-learn?
a) Scales features to have a specific range, usually [0, 1]
b) Normalizes data by removing the mean and scaling to unit variance
c) Transforms categorical features into numerical representations
d) Imputes missing values in the dataset
Answer: b) Normalizes data by removing the mean and scaling to unit variance
17. What does the term “bagging” stand for in ensemble learning?
a) Boosting and Aggregation
b) Bootstrap Aggregating
c) Binary Aggregation
d) Bagged Algorithm Generation
Answer: b) Bootstrap Aggregating
18. In a Jupyter Notebook, how do you comment out multiple lines of code?
a) // Comment here
b) /* Comment here */
c) # Comment here
d) <!-- Comment here -->
Answer: c) # Comment here
19. What is the primary purpose of the GridSearchCV class in scikit-learn?
a) Feature selection
b) Hyperparameter tuning using cross-validation
c) Dimensionality reduction
d) Data preprocessing
Answer: b) Hyperparameter tuning using cross-validation
20. What does the term “one-hot encoding” refer to in the context of data preprocessing?
a) Encoding categorical variables as binary vectors
b) Normalizing numerical features
c) Imputing missing values
d) Scaling features to have zero mean and unit variance
Answer: a) Encoding categorical variables as binary vectors
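A minimal pure-Python sketch of one-hot encoding. The colour values are made up, and the alphabetical column order is a choice made here for illustration.

```python
# Hypothetical categorical column, for illustration only.
colors = ["red", "green", "blue", "green"]

# One output column per distinct category, in alphabetical order.
categories = sorted(set(colors))

# Each value becomes a vector with a single 1 in its category's column.
encoded = [[1 if value == category else 0 for category in categories]
           for value in colors]
```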
6. Machine Learning
1. What is the primary goal of supervised learning in machine learning?
a) To classify data into predefined categories
b) To find hidden patterns in data
c) To make predictions based on labeled data
d) To cluster similar data points
Answer: c) To make predictions based on labeled data
2. Which of the following is an example of a classification problem in machine learning?
a) Predicting stock prices
b) Recognizing handwritten digits
c) Estimating house prices
d) Forecasting temperature trends
Answer: b) Recognizing handwritten digits
3. What is the purpose of the “training” phase in a machine learning model?
a) To make predictions on new data
b) To evaluate the model’s performance
c) To learn patterns from labeled data
d) To test the model on the entire dataset
Answer: c) To learn patterns from labeled data
4. In machine learning, what does the term “feature” refer to?
a) The label or output variable
b) The input variable or predictor
c) The prediction made by the model
d) The error in the model’s predictions
Answer: b) The input variable or predictor
5. What is the purpose of the “test set” in machine learning?
a) To train the model
b) To evaluate the model’s performance on new data
c) To fine-tune hyperparameters
d) To visualize data distribution
Answer: b) To evaluate the model’s performance on new data
6. Which algorithm is commonly used for regression tasks in machine learning?
a) K-Means
b) Decision Trees
c) K-Nearest Neighbors (KNN)
d) Linear Regression
Answer: d) Linear Regression
7. What does the term “overfitting” mean in the context of machine learning models?
a) The model performs well on the training data but poorly on new data
b) The model is too simple and cannot capture the underlying patterns
c) The model is well-generalized to new data
d) The model is undertrained and lacks complexity
Answer: a) The model performs well on the training data but poorly on new data
8. In a confusion matrix, what does the term “false positive” represent?
a) The number of correctly predicted positive instances
b) The number of incorrectly predicted positive instances
c) The number of correctly predicted negative instances
d) The number of incorrectly predicted negative instances
Answer: b) The number of incorrectly predicted positive instances
9. Which evaluation metric is commonly used for binary classification problems when the classes are imbalanced?
a) Precision
b) Recall
c) F1-score
d) Accuracy
Answer: c) F1-score
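The four metrics listed in question 9 can all be computed from confusion-matrix counts. The sketch below uses made-up labels for an imbalanced problem (2 positives out of 10) to show how accuracy can look good while the other metrics do not.

```python
# Hypothetical labels: 1 = positive class, 0 = negative class.
actual    = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

Here accuracy is 0.8 even though only half of the positive cases are found, which is why accuracy alone can mislead on imbalanced data.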
10. What is the purpose of the “bias-variance tradeoff” in machine learning?
a) To increase the model’s complexity
b) To balance the tradeoff between underfitting and overfitting
c) To minimize the bias of the model
d) To improve the interpretability of the model
Answer: b) To balance the tradeoff between underfitting and overfitting
11. Which algorithm is suitable for both classification and regression tasks in machine learning?
a) Decision Trees
b) Support Vector Machines (SVM)
c) K-Nearest Neighbors (KNN)
d) Random Forest
Answer: a) Decision Trees
12. What is the primary purpose of the “k” parameter in K-Nearest Neighbors (KNN) algorithm?
a) The number of features in the dataset
b) The number of clusters
c) The number of neighbors to consider for prediction
d) The learning rate of the model
Answer: c) The number of neighbors to consider for prediction
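A small pure-Python sketch of how k changes a KNN prediction. The training points, labels, and the function knn_predict are hypothetical, and math.dist assumes Python 3.8 or later.

```python
import math
from collections import Counter

# Hypothetical labelled points: three of class "a", two of class "b".
training = [
    ((1.0, 1.0), "a"),
    ((1.5, 2.0), "a"),
    ((2.0, 1.0), "a"),
    ((6.0, 6.0), "b"),
    ((7.0, 7.5), "b"),
]

def knn_predict(point, training, k):
    """Return the majority label among the k training points nearest to point."""
    nearest = sorted(training, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

query = (5.0, 5.0)
label_k3 = knn_predict(query, training, k=3)  # two "b" neighbours outvote one "a"
label_k5 = knn_predict(query, training, k=5)  # all points vote: three "a", two "b"
```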
13. In machine learning, what is the role of the activation function in a neural network?
a) To control the learning rate
b) To minimize the loss function
c) To introduce non-linearity to the model
d) To preprocess input data
Answer: c) To introduce non-linearity to the model
14. What is the primary objective of ensemble methods in machine learning?
a) To reduce model complexity
b) To combine the predictions of multiple models to improve performance
c) To decrease training time
d) To increase the interpretability of models
Answer: b) To combine the predictions of multiple models to improve performance
15. Which algorithm is commonly used for clustering in machine learning?
a) Linear Regression
b) K-Means
c) Decision Trees
d) Support Vector Machines (SVM)
Answer: b) K-Means
16. What is the purpose of regularization techniques in machine learning models?
a) To increase model complexity
b) To reduce model complexity and prevent overfitting
c) To improve model interpretability
d) To speed up training time
Answer: b) To reduce model complexity and prevent overfitting
17. In the context of natural language processing (NLP), what does the term “TF-IDF” stand for?
a) Term Frequency-Inverse Document Frequency
b) Token Frequency-Document Importance Factor
c) Text Frequency-Inverse Document Frequency
d) Text Feature-Inverse Document Frequency
Answer: a) Term Frequency-Inverse Document Frequency
18. What is the primary purpose of the “learning rate” in gradient descent optimization algorithms?
a) To control the size of each gradient update
b) To specify the number of iterations
c) To initialize the weights of the model
d) To determine the number of neighbors in KNN
Answer: a) To control the size of each gradient update
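A minimal sketch of how the learning rate scales each update, assuming the toy objective f(x) = (x - 3)^2 with gradient 2(x - 3). The function name and the chosen rates are hypothetical.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimize f(x) = (x - 3)**2 with plain gradient descent."""
    x = start
    for _ in range(steps):
        gradient = 2 * (x - 3)
        x -= learning_rate * gradient   # step size is scaled by the learning rate
    return x

converged = gradient_descent(0.1)           # small steps approach the minimum at 3
diverged = gradient_descent(1.1, steps=10)  # oversized steps overshoot further each time
```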
19. Which algorithm is suitable for handling missing values in a dataset during preprocessing?
a) Decision Trees
b) K-Nearest Neighbors (KNN)
c) Support Vector Machines (SVM)
d) Random Forest
Answer: b) K-Nearest Neighbors (KNN)
20. What is the primary purpose of the “dropout” technique in neural networks?
a) To remove unnecessary features from the input data
b) To randomly exclude units during training to prevent overfitting
c) To increase the learning rate of the model
d) To reduce the dimensionality of the data
Answer: b) To randomly exclude units during training to prevent overfitting
7. Tableau Training
1. What is Tableau primarily used for?
a) Web development
b) Data visualization
c) Machine learning
d) Database management
Answer: b) Data visualization
2. In Tableau, what is a “dashboard”?
a) A type of data source
b) A collection of worksheets
c) A visual representation of data insights
d) A set of filters applied to a view
Answer: b) A collection of worksheets
3. Which of the following charts in Tableau is suitable for displaying the distribution of a single numerical variable?
a) Bar chart
b) Scatter plot
c) Pie chart
d) Histogram
Answer: d) Histogram
4. What is the purpose of a “parameter” in Tableau?
a) To store data in a worksheet
b) To control one or more values in calculations
c) To connect to external databases
d) To create calculated fields
Answer: b) To control one or more values in calculations
5. In Tableau, what does the term “data source” refer to?
a) A specific data point in a dataset
b) The entire dataset
c) A filter applied to a view
d) A calculated field
Answer: b) The entire dataset
6. How does Tableau handle the “JOIN” operation between tables in a data source?
a) Automatically detects and performs the appropriate join
b) Requires manual coding of SQL statements
c) Supports only inner joins
d) Doesn’t support joins between tables
Answer: a) Automatically detects and performs the appropriate join
7. What is the purpose of the “Show Me” menu in Tableau?
a) To display additional data
b) To choose a visualization type
c) To export data to external formats
d) To create calculated fields
Answer: b) To choose a visualization type
8. What is a “worksheet” in Tableau?
a) A summary of data insights
b) A collection of dashboards
c) A single view or tab within a Tableau file
d) A data source connection
Answer: c) A single view or tab within a Tableau file
9. What is the purpose of the “Filter” shelf in Tableau?
a) To sort data in ascending order
b) To apply formatting to visualizations
c) To filter data based on specified conditions
d) To create calculated fields
Answer: c) To filter data based on specified conditions
10. In Tableau, what does the term “Dual-Axis” refer to?
a) Combining two separate datasets
b) Displaying two different charts on the same axis
c) Using two separate data connections
d) Overlapping two charts on a single axis
Answer: b) Displaying two different charts on the same axis
11. In Tableau, what is the purpose of the “Hierarchy” feature?
a) Creating calculated fields
b) Organizing data into levels of detail
c) Applying filters to the data
d) Connecting to external databases
Answer: b) Organizing data into levels of detail
12. What does the term “Marks” represent in Tableau?
a) Individual data points on a worksheet
b) The axis labels in a visualization
c) The entire dataset in a workbook
d) A measure of data accuracy
Answer: a) Individual data points on a worksheet
13. Which chart type in Tableau is suitable for comparing proportions or percentages within a whole?
a) Line chart
b) Stacked bar chart
c) Heat map
d) Scatter plot
Answer: b) Stacked bar chart
14. In Tableau, what is the purpose of the “Page Shelf”?
a) To navigate between different worksheets
b) To control the layout and formatting of dashboards
c) To create a series of visualizations over a progression of values
d) To manage data connections
Answer: c) To create a series of visualizations over a progression of values
15. What does the “Aggregation” option in Tableau allow you to do?
a) Group data based on specific criteria
b) Perform mathematical operations on measures
c) Apply filters to the data
d) Create calculated fields
Answer: b) Perform mathematical operations on measures
16. What is the purpose of the “Map Layers” feature in Tableau?
a) Displaying geographical data
b) Sorting data points on a map
c) Adding additional visual elements to maps
d) Connecting to external mapping services
Answer: c) Adding additional visual elements to maps
17. In Tableau, what is a “Story”?
a) A summary of data insights
b) A collection of dashboards
c) A representation of geographical data
d) A sequence of sheets or dashboards that work together
Answer: d) A sequence of sheets or dashboards that work together
18. What is the purpose of the “Table Calculation” feature in Tableau?
a) Creating tables within a worksheet
b) Performing calculations on table data
c) Sorting data in a tabular format
d) Calculating values based on the result set of the query
Answer: b) Performing calculations on table data
19. How does Tableau handle “Null” values in data?
a) Automatically fills in null values with default settings
b) Requires manual removal of null values
c) Treats null values as zero
d) Provides options for custom handling of null values
Answer: d) Provides options for custom handling of null values
20. What is the purpose of the “Quick Filter” feature in Tableau?
a) To quickly create new worksheets
b) To apply filters to the entire workbook
c) To filter data based on specified conditions
d) To navigate between dashboards
Answer: c) To filter data based on specified conditions
8. Udacity Programming for Data Science with Python
1. What is the primary purpose of the NumPy library in Python for data science?
a) Data visualization
b) Statistical analysis
c) Machine learning algorithms
d) Efficient handling of numerical arrays
Answer: d) Efficient handling of numerical arrays
2. In the context of Pandas, what does the term “DataFrame” refer to?
a) A statistical summary of data
b) A collection of Python functions
c) A two-dimensional table of data
d) A machine learning model
Answer: c) A two-dimensional table of data
3. What does the term “API” stand for in the context of web development and data science?
a) Application Programming Interface
b) Automated Program Invocation
c) Adaptive Programming Interface
d) Advanced Python Integration
Answer: a) Application Programming Interface
4. What is the purpose of the “Matplotlib” library in Python for data science?
a) Data manipulation
b) Machine learning algorithms
c) Data visualization
d) Web development
Answer: c) Data visualization
5. In the context of machine learning, what does “train-test split” refer to?
a) Training a model on the entire dataset
b) Splitting the dataset into training and testing sets
c) Testing the model on a separate dataset
d) Shuffling the data randomly
Answer: b) Splitting the dataset into training and testing sets
6. What is the purpose of the “scikit-learn” library in Python for data science?
a) Data manipulation
b) Machine learning algorithms and tools
c) Data visualization
d) Web development
Answer: b) Machine learning algorithms and tools
7. In Pandas, what does the groupby() function do?
a) Sorts data in ascending order
b) Groups data based on specified criteria
c) Applies a sorting algorithm to a DataFrame
d) Filters rows based on a condition
Answer: b) Groups data based on specified criteria
8. What is the primary purpose of the “Jupyter Notebook” in data science?
a) Creating interactive web applications
b) Writing and executing Python code in a collaborative environment
c) Data visualization
d) Statistical analysis
Answer: b) Writing and executing Python code in a collaborative environment
9. In the context of data preprocessing, what does “imputation” refer to?
a) Encoding categorical variables
b) Removing outliers from the data
c) Handling missing values by filling them in
d) Scaling numerical features
Answer: c) Handling missing values by filling them in
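A short pure-Python sketch of mean imputation, one of several possible strategies. The values are made up, and None stands in for a missing entry.

```python
from statistics import mean

# Hypothetical column with missing entries marked as None.
ages = [30, None, 20, None, 40]

# Compute the fill value from the observed entries only.
observed = [age for age in ages if age is not None]
fill_value = mean(observed)

imputed = [fill_value if age is None else age for age in ages]
```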
10. What is the purpose of the “Seaborn” library in Python for data science?
a) Machine learning algorithms
b) Data visualization with an emphasis on statistical relationships
c) Web development
d) Text data analysis
Answer: b) Data visualization with an emphasis on statistical relationships
11. What does the term “regular expression” (regex) refer to in Python for data science?
a) A method for data encoding
b) A sequence of machine learning algorithms
c) A powerful tool for text pattern matching
d) A feature selection technique
Answer: c) A powerful tool for text pattern matching
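A brief sketch of pattern matching with Python's re module. The text and the order-code format (one capital letter, a hyphen, then digits) are hypothetical.

```python
import re

# Hypothetical text, for illustration only.
text = "Orders: A-102 shipped, B-7 pending, C-3301 shipped."

# One capital letter, a hyphen, then one or more digits;
# the parentheses capture the letter and the number separately.
pattern = re.compile(r"\b([A-Z])-(\d+)\b")

matches = pattern.findall(text)
```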
12. In the context of machine learning, what does “feature scaling” aim to achieve?
a) Encoding categorical variables
b) Transforming data into a standard range
c) Handling missing values
d) Visualizing data distribution
Answer: b) Transforming data into a standard range
13. What is the primary purpose of the “Requests” library in Python for data science?
a) Web development
b) Statistical analysis
c) Machine learning algorithms
d) Making HTTP requests
Answer: d) Making HTTP requests
14. What is the primary role of the “Beautiful Soup” library in Python?
a) Web scraping and parsing HTML/XML documents
b) Statistical analysis
c) Machine learning model training
d) Data visualization
Answer: a) Web scraping and parsing HTML/XML documents
15. What does the term “pickle” refer to in Python for data science?
a) A Python module for statistical analysis
b) A machine learning algorithm
c) A way to serialize and deserialize Python objects
d) A data visualization library
Answer: c) A way to serialize and deserialize Python objects
16. In Pandas, what does the fillna() function do?
a) Fills null values in a DataFrame with specified values
b) Removes all rows containing null values
c) Drops columns with null values
d) Sorts the DataFrame in ascending order
Answer: a) Fills null values in a DataFrame with specified values
17. What is the purpose of the “Decision Trees” algorithm in machine learning?
a) Clustering data points
b) Classification and regression tasks
c) Dimensionality reduction
d) Encoding categorical variables
Answer: b) Classification and regression tasks
18. In Python, what does the term “virtual environment” refer to?
a) Simulating a web development environment
b) A self-contained directory that contains a Python interpreter and installed packages
c) A remote server for data storage
d) A feature for virtualizing data analysis workflows
Answer: b) A self-contained directory that contains a Python interpreter and installed packages
19. What is the purpose of the “pickle” module in Python’s standard library?
a) To clean and preprocess data
b) To serialize and deserialize Python objects
c) To perform statistical analysis
d) To create interactive visualizations
Answer: b) To serialize and deserialize Python objects
20. What does the term “cross-validation” aim to address in machine learning?
a) Feature selection
b) Overfitting
c) Underfitting
d) Bias-variance tradeoff
Answer: b) Overfitting
Related Queries: