Data Science Interview Questions
ML Basics:
- How does Random Forest handle missing values?
- What do you understand by machine learning inference?
- What is the logit function?
- How is statistics used in DS/ML?
- Explain the important hyperparameters of Random Forest and Logistic Regression (see the sketch after this list).
- Explain the Central Limit Theorem.
- What are the assumptions of Linear Regression?
- Explain Gradient Boosting.
- How do you handle model error on imbalanced data?
- Do you have any experience with model deployment and AWS?
- Explain in brief the entire ML pipeline.
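A minimal sketch of the hyperparameters worth discussing for the Random Forest / Logistic Regression question above, assuming scikit-learn (the parameter values are illustrative, not recommendations):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Random Forest: tree count, depth, feature subsampling, and leaf size
# are the usual levers against over/underfitting.
rf = RandomForestClassifier(
    n_estimators=500,      # more trees -> lower variance, higher cost
    max_depth=10,          # caps tree complexity
    max_features="sqrt",   # features considered per split (decorrelates trees)
    min_samples_leaf=5,    # larger leaves -> smoother predictions
    random_state=42,
)

# Logistic Regression: regularization type and strength dominate.
lr = LogisticRegression(
    penalty="l2",          # ridge-style shrinkage of coefficients
    C=1.0,                 # inverse regularization strength (smaller = stronger)
    solver="lbfgs",
    max_iter=1000,
)
```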
- Bias and Variance:
- How does the bias-variance trade-off affect model performance?
- Loss Function:
- What are the most commonly used loss functions for classification tasks?
- What is the difference between Mean Squared Error (MSE) and Mean Absolute Error (MAE)?
- If your model is overfitting, what changes would you make to the loss function?
- What is the impact of outliers on MSE and MAE? How does Huber Loss mitigate this?
- How do you handle loss functions for multi-label classification tasks?
- What is the relationship between a loss function and an optimizer in machine learning?
- Can you design a custom loss function for a specific task? How would you implement it in a machine learning framework (e.g., TensorFlow or PyTorch)? (See the sketch after this list.)
- How does gradient descent minimize the loss function?
- What are some common issues when using log loss in classification tasks, and how do you address them?
- Why do we prefer to use cross-entropy over MSE for classification tasks?
- What is cross-entropy loss, and why is it used for classification tasks?
- Why is Mean Squared Error (MSE) not suitable for classification problems?
- Explain hinge loss and its use in Support Vector Machines (SVMs).
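For the custom-loss question above, a minimal sketch in PyTorch; the up-weighting scheme is an illustrative assumption, not a standard recipe:

```python
import torch
import torch.nn as nn

class WeightedMSELoss(nn.Module):
    """Custom loss: MSE that up-weights errors on a chosen slice of samples."""
    def __init__(self, weight: float = 5.0):
        super().__init__()
        self.weight = weight

    def forward(self, pred, target, is_important):
        # is_important: boolean mask marking samples whose errors cost more
        sq_err = (pred - target) ** 2
        weights = torch.where(is_important,
                              torch.full_like(sq_err, self.weight),
                              torch.ones_like(sq_err))
        return (weights * sq_err).mean()

# Usage: drop-in replacement for nn.MSELoss in a training loop
loss_fn = WeightedMSELoss(weight=5.0)
pred = torch.randn(8, requires_grad=True)
target = torch.randn(8)
mask = torch.rand(8) > 0.5
loss = loss_fn(pred, target, mask)
loss.backward()  # gradients flow through the custom loss as usual
```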
- Cross Validation:
- What are the advantages and disadvantages of k-fold cross-validation?
- What is the difference between k-fold cross-validation and stratified k-fold cross-validation? (See the sketch after this list.)
- What is Leave-One-Out Cross-Validation (LOOCV), and how is it different from k-fold cross-validation?
- When should you use stratified cross-validation?
- How does cross-validation help prevent overfitting?
- What are some common pitfalls when using cross-validation?
- How do you choose the right number of folds in k-fold cross-validation?
- Why is LOOCV not commonly used despite being exhaustive?
- Can cross-validation be used for model selection?
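A minimal sketch contrasting k-fold and stratified k-fold on an imbalanced target, assuming scikit-learn; the 90/10 label split is synthetic:

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Imbalanced labels: 90% class 0, 10% class 1
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)

for name, cv in [("KFold", KFold(n_splits=5, shuffle=True, random_state=0)),
                 ("StratifiedKFold", StratifiedKFold(n_splits=5, shuffle=True, random_state=0))]:
    # Stratified splits keep the 90/10 class ratio in every fold; plain
    # KFold can produce folds with very few (or no) minority samples.
    ratios = [y[test].mean() for _, test in cv.split(X, y)]
    print(name, [round(r, 2) for r in ratios])
```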
- Overfitting:
- What causes overfitting in a model?
- How do you identify if a model is overfitting?
- What is the difference between overfitting and underfitting?
- How can you prevent overfitting in machine learning models?
- What is regularization, and how does it help in preventing overfitting? (See the sketch after this list.)
- What is early stopping, and how does it help avoid overfitting?
- What is the bias-variance trade-off in the context of overfitting?
- What is the impact of overfitting on model performance?
- Why does increasing the size of the training data help reduce overfitting?
- What is the role of feature selection in preventing overfitting?
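For the regularization question above, a minimal sketch showing L2 (ridge) shrinkage narrowing the train/test gap of an over-parameterized model, assuming scikit-learn; the data, polynomial degree, and alpha are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, reg in [("no regularization", LinearRegression()),
                  ("ridge alpha=1.0", Ridge(alpha=1.0))]:
    # Degree-15 polynomial features invite overfitting; the L2 penalty
    # shrinks coefficients, trading a little train fit for test fit.
    model = make_pipeline(PolynomialFeatures(degree=15), reg).fit(X_tr, y_tr)
    print(name,
          "train R2:", round(model.score(X_tr, y_tr), 3),
          "test R2:", round(model.score(X_te, y_te), 3))
```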
- Silhouette Score:
- What is the Silhouette Score in clustering? (See the sketch after this list.)
- What does it mean if the Silhouette Score is close to zero?
- Can the Silhouette Score handle non-convex clusters?
- How does the Silhouette Score handle outliers?
- What does a negative Silhouette Score indicate?
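A minimal sketch of computing the Silhouette Score, assuming scikit-learn; the blob data is synthetic:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Scores near +1 mean tight, well-separated clusters; near 0 means
# overlapping clusters; negative means points are likely assigned to
# the wrong cluster. Sweeping k is a common way to pick cluster count.
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
```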
- Ref:
- https://interviewkickstart.com/blogs/articles/machine-learning-engineer-interview-questions
AI/ML System Design and Architecture
- How would you design a scalable recommendation system for billions of users?
- Design an end-to-end system for deploying a transformer-based LLM for chat completion with low latency.
- How do you handle model versioning and rollback in production ML pipelines?
- How would you design a real-time fraud detection system using ML?
- Compare embedding storage strategies for RAG systems: FAISS, Weaviate, Elasticsearch, etc. (see the FAISS sketch after this list).
- How would you architect an AI assistant like Perplexity or ChatGPT?
- How do you ensure retraining pipelines are scalable, reproducible, and cost-efficient?
- Design an end-to-end recommender system for a marketplace: requirements, data strategy, feature store, model choices (ranking vs. candidate generation), online/offline eval, cold start, feedback loops, and on-call/SLAs.
- Build a fraud detection platform for payments: labeling strategy, imbalanced data handling, drift monitoring, human-in-the-loop, and abuse/adversarial adaptation.
- Design a real-time ETA or translation system: streaming ingestion, model training cadence, latency budgets, online feature computation, A/B strategy, rollback and guardrails.
- Design a large-scale ticket-routing system for customer support: taxonomy evolution, weak supervision, evaluation beyond accuracy, and operational dashboards.
- Architect an ML platform to support 100+ teams: feature stores, registries, model deployment, canary/shadow, lineage, RBAC/compliance, and cost governance.
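For the embedding-storage comparison above, a minimal FAISS sketch (exact flat inner-product index; the dimension and data are illustrative). Managed stores like Weaviate or Elasticsearch trade FAISS's raw in-process speed for filtering, persistence, and operational features:

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 384                                # embedding dimension (illustrative)
xb = np.random.rand(10_000, d).astype("float32")
faiss.normalize_L2(xb)                 # normalize so inner product = cosine

index = faiss.IndexFlatIP(d)           # exact search; at scale, swap for an IVF/HNSW index
index.add(xb)

xq = np.random.rand(5, d).astype("float32")
faiss.normalize_L2(xq)
scores, ids = index.search(xq, 10)     # top-10 nearest neighbors per query
print(ids.shape)                       # (5, 10)
```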
Ref:
https://towardsdatascience.com/nailing-the-machine-learning-design-interview-6b91bc1d036c/
AI/ML Deep Knowledge
- How does the attention mechanism in transformers work? (See the sketch after this list.)
- Explain transformer internals: attention, positional encodings, layer norms, residual connections, and how scaling laws affect design choices.
- Parameter-efficient fine-tuning: LoRA, adapters, prefix/prompt tuning; trade-offs in quality, latency, and memory; when full fine-tuning is justified.
- Explain the differences between BERT, GPT, T5, and LLaMA.
- What are the key trade-offs between dense and sparse models?
- How do retrieval-augmented generation (RAG) models work? How would you improve RAG performance?
- Retrieval-augmented generation (RAG) in depth: indexing strategy, chunking, embeddings, vector DB selection, freshness/consistency, hallucination reduction, and eval methods.
- Safety and guardrails: prompt injection defenses, output filtering, red teaming, jailbreaking mitigation, and governance processes.
- What are the strengths and weaknesses of instruction tuning vs. reinforcement learning from human feedback (RLHF)?
- What are diffusion models, and how do they differ from GANs?
- Explain Mixture of Experts (MoE). When is it useful? What are the challenges?
- Evaluation of LLMs: automatic metrics (e.g., BLEU/ROUGE/BERTScore where the task fits), human eval, rubric-based scoring, and production KPIs; offline vs. online trade-offs.
- Cost/performance optimization: model distillation, quantization, speculative decoding, routing to small/large models, and request shaping.
- Shipping an LLM feature to millions of users: experimentation design, guardrails, abuse prevention, data retention/privacy, and incident response playbooks.
- Monitoring: hallucination/drift monitoring, content policy violations, prompt distribution shifts, and feedback loop design.
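A minimal sketch of scaled dot-product attention for the first question above, in plain NumPy (single head, no masking):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k). Scores measure how much each query position
    # attends to each key position; dividing by sqrt(d_k) keeps logits in
    # a range where softmax gradients stay usable.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of value vectors

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 64)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (6, 64)
```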
Ref:
https://www.coursera.org/articles/large-language-models-interview-questions
https://www.projectpro.io/article/llm-interview-questions-and-answers/1025
https://github.com/Devinterview-io/llms-interview-questions
Data Engineering + Model Training
- How do you handle large-scale data preprocessing for deep learning models?
- What are the bottlenecks in a distributed model training pipeline? How do you mitigate them?
- How would you do feature selection and feature drift detection in production? (See the drift-check sketch after this list.)
- What logging and tracing tools would you use to monitor ML model performance in production?
- How do you handle concept drift in online learning systems?
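For the drift questions above, a minimal sketch of a per-feature drift check using the two-sample Kolmogorov-Smirnov test, assuming SciPy; the p-value threshold is an illustrative choice:

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(train_col, live_col, p_threshold=0.01):
    """Flag a numeric feature if its live distribution differs from training."""
    stat, p_value = ks_2samp(train_col, live_col)
    return p_value < p_threshold, stat

rng = np.random.default_rng(0)
train = rng.normal(0, 1, 5_000)    # reference distribution at training time
live = rng.normal(0.3, 1, 5_000)   # production traffic with a shifted mean

drifted, stat = detect_drift(train, live)
print(f"drift={drifted}, KS statistic={stat:.3f}")
# In production this would run per feature on a schedule, alerting or
# triggering retraining when enough features (or the labels) drift.
```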
LLM-specific / Foundation Model Questions
- How would you fine-tune an LLM for a specific downstream task with minimal data?
- What are the security concerns when deploying public LLM endpoints?
- How would you reduce hallucinations in LLMs?
- Describe a scalable architecture to implement function calling / tool use in an LLM agent.
- Compare quantization methods (INT8, GPTQ, AWQ). When would you use each?
- How do you evaluate the quality of generated text from an LLM? (See the perplexity sketch after this list.)
- How would you ensure that an LLM answers customer questions in line with company policies?
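For the generation-quality question above, one minimal sketch: perplexity of a reference text under a small causal LM, assuming Hugging Face transformers (the model name is illustrative). Perplexity is only one signal and is usually paired with task metrics and human or rubric-based evaluation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # illustrative; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tok(text, return_tensors="pt")

with torch.no_grad():
    # With labels == input_ids, the model returns the mean next-token
    # cross-entropy; exponentiating gives perplexity.
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print("perplexity:", torch.exp(loss).item())
```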
Distributed Systems for AI
- Design a distributed training system with fault tolerance and job resumption.
- How would you shard and serve an embedding index with billions of entries?
- Design an LLM inference engine that supports model parallelism and request batching.
- Discuss how you'd cache LLM responses at scale while ensuring correctness. (See the sketch after this list.)
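For the caching question above, a minimal sketch of an exact-match response cache with TTL; the normalization and the deterministic-only policy are illustrative assumptions (real systems also weigh semantic caching, per-user context, and model version, any of which can make naive caching incorrect):

```python
import hashlib
import time

class LLMResponseCache:
    """Exact-match cache keyed on a normalized (model, params, prompt) tuple."""
    def __init__(self, ttl_seconds: float = 3600):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (timestamp, response)

    def _key(self, model: str, prompt: str, temperature: float) -> str:
        # Collapse whitespace and lowercase so trivially different prompts hit.
        normalized = " ".join(prompt.split()).lower()
        raw = f"{model}|{temperature}|{normalized}"
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, model, prompt, temperature=0.0):
        # Only serve cached results for deterministic requests; temperature > 0
        # makes responses non-reproducible, so correctness would suffer.
        if temperature > 0:
            return None
        entry = self._store.get(self._key(model, prompt, temperature))
        if entry and time.time() - entry[0] < self.ttl:
            return entry[1]
        return None

    def put(self, model, prompt, response, temperature=0.0):
        if temperature == 0:
            self._store[self._key(model, prompt, temperature)] = (time.time(), response)
```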