Q24: What is Deep Learning and how does it differ from Machine Learning?
Deep Learning is a subset of machine learning that uses neural networks with multiple layers (typically 3 or more) to automatically learn and extract features from raw data.
| Aspect | Machine Learning | Deep Learning |
|---|---|---|
| Feature Engineering | Manual feature engineering required | Automatic feature extraction |
| Data Requirements | Works well with small datasets | Requires large amounts of data |
| Computational Power | Less computational power needed | Requires significant computational resources (GPUs) |
| Human Intervention | More human intervention | Less human intervention once running |
| Algorithm Complexity | Simpler algorithms | Complex neural network architectures |
| Training Time | Faster training time | Longer training time |
| Interpretability | More interpretable | Black box models |
Q25: Explain the structure and components of an Artificial Neural Network.
An Artificial Neural Network (ANN) consists of interconnected nodes (neurons) organized in layers.
Components:
- Input Layer: Receives input data, number of neurons = input features
- Hidden Layers: Intermediate layers performing computations
- Output Layer: Produces final output, neurons depend on task type
- Weights: Parameters controlling connection strength between neurons
- Bias: Additional parameter providing flexibility
- Activation Function: Introduces non-linearity
Neuron Operation:
- Receives weighted inputs from previous layer
- Calculates weighted sum plus bias
- Applies activation function
- Passes output to next layer
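A minimal NumPy sketch of the neuron operation described above, assuming a 3-feature input and a ReLU activation (both are illustrative choices, not fixed by the question):

```python
import numpy as np

def relu(z):
    # ReLU activation: max(0, z), applied element-wise
    return np.maximum(0, z)

def neuron_forward(x, w, b):
    # Weighted sum of inputs plus bias, then the activation function
    z = np.dot(w, x) + b
    return relu(z)

# Illustrative values: 3 inputs feeding a single neuron
x = np.array([0.5, -1.2, 3.0])   # outputs from the previous layer
w = np.array([0.4, 0.1, -0.6])   # connection weights
b = 0.2                          # bias term

print(neuron_forward(x, w, b))   # single activated output passed to the next layer
```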
Q26: What are activation functions and why are they important?
Activation functions are mathematical functions applied to neuron outputs to introduce non-linearity, enabling networks to learn complex patterns.
Common Activation Functions:
1. ReLU (Rectified Linear Unit):
f(x) = max(0, x)
- Advantages: Computationally efficient, mitigates the vanishing gradient problem
- Usage: Hidden layers
2. Sigmoid:
f(x) = 1/(1 + e^(-x))
- Range: (0, 1)
- Usage: Binary classification output layer
3. Tanh (Hyperbolic Tangent):
f(x) = (e^x - e^(-x))/(e^x + e^(-x))
- Range: (-1, 1)
- Usage: Hidden layers; zero-centered output makes it preferable to sigmoid there
4. Softmax:
- Converts raw scores to probability distribution
- Usage: Multi-class classification output layer
5. ELU (Exponential Linear Unit):
- Addresses dying ReLU problem
- Provides smoothness for negative values
Importance:
- Enable learning of non-linear relationships
- Control information flow in network
- Affect gradient flow during backpropagation
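A minimal NumPy sketch of the first four activation functions listed above (the numerically stable softmax and the sample values are illustrative assumptions):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def softmax(z):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(relu(z), sigmoid(z), tanh(z), softmax(z), sep="\n")
```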
Q27: Explain Forward Propagation and Backpropagation.
Forward Propagation:
- Process of transmitting input data through the network to produce output
- Input flows from input layer through hidden layers to output layer
- Each neuron computes weighted sum and applies activation function
- Output from one layer becomes input for next layer
Backpropagation:
- Algorithm for computing gradients, used to train neural networks
- Calculates gradients of loss function with respect to weights and biases
- Propagates error backward from output to input layer
- Uses chain rule to compute gradients
- Updates weights and biases to minimize loss
Training Process:
- Forward pass: Compute predictions
- Calculate loss: Compare predictions with actual targets
- Backward pass: Compute gradients
- Update parameters: Adjust weights and biases
- Repeat for multiple epochs
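A minimal from-scratch sketch of this training loop on a toy XOR problem. The 2-4-1 architecture, learning rate, and epoch count are illustrative assumptions; the forward pass, backward pass (chain rule), and parameter updates follow the steps listed above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy XOR dataset (illustrative): 4 samples, 2 features each
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Small 2-4-1 network: weights and biases
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for epoch in range(5000):
    # Forward pass: compute predictions layer by layer
    z1 = X @ W1 + b1
    a1 = np.tanh(z1)
    z2 = a1 @ W2 + b2
    a2 = sigmoid(z2)

    # Calculate loss: binary cross-entropy against the targets
    loss = -np.mean(y * np.log(a2) + (1 - y) * np.log(1 - a2))

    # Backward pass: chain rule from output layer back to input layer
    dz2 = (a2 - y) / len(X)             # gradient of BCE w.r.t. z2 (sigmoid output)
    dW2 = a1.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * (1 - a1 ** 2)  # tanh'(z1) = 1 - tanh(z1)^2
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # Update parameters: plain gradient descent step
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(np.round(a2, 2))  # predictions should approach [0, 1, 1, 0]
```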
Q28: What are the common loss functions used in deep learning?
Regression Problems:
- Mean Squared Error (MSE): average of squared differences between predictions and targets
- Mean Absolute Error (MAE): average of absolute differences; less sensitive to outliers
Binary Classification:
- Binary Cross-Entropy (Log Loss): compares the predicted probability with the true 0/1 label
Multi-class Classification:
- Categorical Cross-Entropy: compares the predicted probability distribution (e.g., softmax output) with one-hot targets
- Sparse Categorical Cross-Entropy: same idea, but with integer class labels
Purpose: Loss functions quantify the difference between predicted and actual values, guiding the optimization process during training.
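A minimal NumPy sketch of two of these loss functions (the sample labels and predictions are illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: average squared difference
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Clip predictions so log(0) never occurs
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])
print(mse(y_true, y_pred))                   # regression-style loss
print(binary_cross_entropy(y_true, y_pred))  # binary classification loss
```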
Q29: What are optimizers and explain common types?
Optimizers are algorithms that adjust network parameters (weights and biases) to minimize the loss function.
Common Optimizers:
1. Stochastic Gradient Descent (SGD):
- Basic optimizer with fixed learning rate
- Updates parameters in direction opposite to gradient
- Simple but can be slow to converge
2. Adam (Adaptive Moment Estimation):
- Combines momentum and adaptive learning rates
- Maintains moving averages of gradients and squared gradients
- Generally performs well across different problems
3. RMSprop:
- Adaptive learning rate optimizer
- Maintains moving average of squared gradients
- Good for recurrent neural networks
Key Parameters:
- Learning Rate: Controls step size during optimization
- Momentum: Helps accelerate convergence
- Decay: Reduces learning rate over time
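A minimal NumPy sketch of the SGD (with momentum) and Adam update rules for a single parameter vector. The hyperparameter values are the commonly cited defaults, used here as assumptions:

```python
import numpy as np

def sgd_update(w, grad, velocity, lr=0.01, momentum=0.9):
    # SGD with momentum: accumulate a velocity, then step opposite to the gradient
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

def adam_update(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: moving averages of gradients (m) and squared gradients (v),
    # bias-corrected, give a per-parameter adaptive step size
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Illustrative usage on a dummy gradient
w = np.array([1.0, -2.0])
grad = np.array([0.3, -0.1])
w_sgd, vel = sgd_update(w, grad, velocity=np.zeros_like(w))
w_adam, m, v = adam_update(w, grad, m=np.zeros_like(w), v=np.zeros_like(w), t=1)
print(w_sgd, w_adam)
```

In practice these updates are applied every batch by the framework's optimizer; the sketch only shows how the learning rate, momentum, and moving averages interact in a single step.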