Data Science Interview Questions

ML Basics

  • How does Random Forest handle missing values?
  • What do you understand by machine learning inference?
  • What is the logit function?
  • How is statistics used in DS/ML?
  • Explain the important hyper-parameters of Random Forest and Logistic Regression.
  • Explain the Central Limit Theorem.
  • What are the assumptions of Linear Regression?
  • Explain Gradient Boosting.
  • How do you handle model errors caused by imbalanced data?
  • Do you have any experience in model deployment and AWS?
  • Explain in brief the entire ML pipeline.
  • Bias and Variance:
    • How does the bias-variance trade-off affect model performance?
  • Loss Function:
    • What are the most commonly used loss functions for classification tasks? 
    • What is the difference between Mean Squared Error (MSE) and Mean Absolute Error (MAE)? 
    • If your model is overfitting, what changes would you make to the loss function?
    • What is the impact of outliers on MSE and MAE? How does Huber Loss mitigate this?
    • How do you handle loss functions for multi-label classification tasks?
    • What is the relationship between a loss function and an optimizer in machine learning? 
    • Can you design a custom loss function for a specific task? How would you implement it in a machine learning framework (e.g., TensorFlow or PyTorch)? (A sketch follows this list.)
    • How does gradient descent minimize the loss function?
    • What are some common issues when using log loss in classification tasks, and how do you address them?
    • Why do we prefer to use cross-entropy over MSE for classification tasks?
    • What is cross-entropy loss, and why is it used for classification tasks?
    • Why is Mean Squared Error (MSE) not suitable for classification problems?
    • Explain hinge loss and its use in Support Vector Machines (SVMs).
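    • Example: a minimal custom-loss sketch in PyTorch for the question above. The class name, positive-class weight, and toy tensors are illustrative assumptions, not a prescribed implementation.

      import torch
      import torch.nn as nn

      class WeightedBCELoss(nn.Module):
          # Binary cross-entropy with an extra weight on the positive class
          # (a hypothetical example of a task-specific custom loss).
          def __init__(self, pos_weight: float = 2.0):
              super().__init__()
              self.pos_weight = pos_weight

          def forward(self, logits, targets):
              # Element-wise BCE on raw logits for numerical stability.
              bce = nn.functional.binary_cross_entropy_with_logits(
                  logits, targets, reduction="none")
              # Up-weight positive examples, then average.
              weights = torch.where(targets == 1.0,
                                    torch.full_like(targets, self.pos_weight),
                                    torch.ones_like(targets))
              return (weights * bce).mean()

      # Usage: plug into a training loop like any built-in criterion.
      criterion = WeightedBCELoss(pos_weight=3.0)
      logits = torch.randn(8, requires_grad=True)
      targets = torch.randint(0, 2, (8,)).float()
      loss = criterion(logits, targets)
      loss.backward()
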
  • Cross Validation:
    • What are the advantages and disadvantages of k-fold cross-validation?
    • What is the difference between k-fold cross-validation and stratified k-fold cross-validation? (See the example after this list.)
    • What is Leave-One-Out Cross-Validation (LOOCV), and how is it different from k-fold cross-validation?
    • When should you use stratified cross-validation?
    • How does cross-validation help prevent overfitting?
    • What are some common pitfalls when using cross-validation?
    • How do you choose the right number of folds in k-fold cross-validation?
    • Why is LOOCV not commonly used despite being exhaustive?
    • Can cross-validation be used for model selection?
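    • Example: a short scikit-learn comparison of KFold and StratifiedKFold on an imbalanced label set. The synthetic data, fold count, and metric are illustrative assumptions.

      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

      # Imbalanced toy data (about 10% positives), purely for illustration.
      X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
      model = LogisticRegression(max_iter=1000)

      # Plain k-fold: fold class ratios can differ from the overall ratio.
      kf = KFold(n_splits=5, shuffle=True, random_state=42)
      # Stratified k-fold: each fold preserves the overall class ratio.
      skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

      for name, cv in [("k-fold", kf), ("stratified k-fold", skf)]:
          scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
          print(f"{name}: mean F1 = {scores.mean():.3f} (std {scores.std():.3f})")
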
  • Overfitting:
    • What causes overfitting in a model?
    • How do you identify if a model is overfitting?
    • What is the difference between overfitting and underfitting?
    • How can you prevent overfitting in machine learning models?
    • What is regularization, and how does it help in preventing overfitting?
    • What is early stopping, and how does it help avoid overfitting? (A minimal sketch follows this list.)
    • What is the bias-variance trade-off in the context of overfitting?
    • What is the impact of overfitting on model performance?
    • Why does increasing the size of the training data help reduce overfitting?
    • What is the role of feature selection in preventing overfitting?
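    • Example: a minimal early-stopping sketch using a patience counter on validation loss. train_one_epoch and evaluate are hypothetical user-supplied callables, not functions from any particular library.

      def train_with_early_stopping(model, train_one_epoch, evaluate,
                                    max_epochs=100, patience=5):
          # Stop once validation loss fails to improve for `patience` consecutive epochs.
          best_val_loss = float("inf")
          epochs_without_improvement = 0
          for epoch in range(max_epochs):
              train_one_epoch(model)       # one pass over the training data
              val_loss = evaluate(model)   # loss on a held-out validation set
              if val_loss < best_val_loss:
                  best_val_loss = val_loss
                  epochs_without_improvement = 0
                  # In practice, also checkpoint the model weights here.
              else:
                  epochs_without_improvement += 1
                  if epochs_without_improvement >= patience:
                      print(f"Early stop at epoch {epoch}: no improvement "
                            f"for {patience} epochs")
                      break
          return best_val_loss
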
  • Silhouette Score:
    • What is the Silhouette Score in clustering? (See the example after this list.)
    • What does it mean if the Silhouette Score is close to zero?
    • Can the Silhouette Score handle non-convex clusters?
    • How does the Silhouette Score handle outliers?
    • What does a negative Silhouette Score indicate?
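    • Example: computing the Silhouette Score for a k-means clustering with scikit-learn. The synthetic blobs and k=3 are illustrative choices.

      from sklearn.cluster import KMeans
      from sklearn.datasets import make_blobs
      from sklearn.metrics import silhouette_samples, silhouette_score

      # Synthetic data with three well-separated clusters (illustrative only).
      X, _ = make_blobs(n_samples=500, centers=3, cluster_std=1.0, random_state=0)
      labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

      # Mean score over all points: near +1 is good, near 0 suggests overlapping
      # clusters, and negative values suggest points assigned to the wrong cluster.
      print("Mean silhouette score:", silhouette_score(X, labels))

      # Per-sample scores help spot individual poorly assigned points or outliers.
      per_point = silhouette_samples(X, labels)
      print("Worst-assigned point:", per_point.min())
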
  • Ref:
    • https://interviewkickstart.com/blogs/articles/machine-learning-engineer-interview-questions

AI/ML System Design and Architecture

  • How would you design a scalable recommendation system for billions of users?

  • Design an end-to-end system for deploying a transformer-based LLM for chat completion with low latency.

  • How do you handle model versioning and rollback in production ML pipelines?

  • How would you design a real-time fraud detection system using ML?

  • Compare embedding storage strategies for RAG systems: FAISS, Weaviate, Elasticsearch, etc. (A minimal FAISS sketch follows this list.)

  • How would you architect an AI assistant like Perplexity or ChatGPT?

  • How do you ensure retraining pipelines are scalable, reproducible, and cost-efficient?

  • Design an end-to-end recommender system for a marketplace: requirements, data strategy, feature store, model choices (ranking vs candidate gen), online/offline eval, cold start, feedback loops, and on-call/SLAs.

  • Build a fraud detection platform for payments: labeling strategy, imbalanced data handling, drift monitoring, human-in-the-loop, and abuse/adversarial adaptation.

  • Design a real-time ETA or translation system: streaming ingestion, model training cadence, latency budgets, online feature computation, A/B strategy, rollback and guardrails.

  • Design a large-scale ticket-routing system for customer support: taxonomy evolution, weak supervision, evaluation beyond accuracy, and operational dashboards.

  • Architect an ML platform to support 100+ teams: feature stores, registries, model deployment, canary/shadow, lineage, RBAC/compliance, and cost governance.
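  • Example: a minimal FAISS sketch for the embedding-storage question above, building an exact inner-product index over unit-normalized vectors and querying it. The dimension and data are made up; at billion-entry scale you would shard the index and use an approximate structure (e.g., IVF or HNSW) rather than a flat one.

    import numpy as np
    import faiss  # pip install faiss-cpu

    d = 384  # embedding dimension (assumed)
    rng = np.random.default_rng(0)

    # Fake corpus embeddings; in practice these come from an embedding model.
    corpus = rng.standard_normal((10_000, d)).astype("float32")
    faiss.normalize_L2(corpus)  # unit-normalize so inner product == cosine similarity

    index = faiss.IndexFlatIP(d)  # exact search; swap for IVF / HNSW variants at scale
    index.add(corpus)

    query = rng.standard_normal((1, d)).astype("float32")
    faiss.normalize_L2(query)
    scores, ids = index.search(query, 5)  # top-5 nearest neighbours
    print(ids[0], scores[0])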

  • Ref: 

  • https://towardsdatascience.com/nailing-the-machine-learning-design-interview-6b91bc1d036c/

     

AI/ML Deep Knowledge

  • How does the attention mechanism in transformers work?

  • Explain transformer internals: attention, positional encodings, layer norms, residual connections, and how scaling laws affect design choices.

  • Parameter-efficient fine-tuning (LoRA, adapters, prefix/prompt tuning): trade-offs in quality, latency, and memory, and when full fine-tuning is justified.

  • Explain the differences between BERT, GPT, T5, and LLaMA.

  • What are the key trade-offs between dense and sparse models?

  • How do retrieval-augmented generation (RAG) models work? How would you improve RAG performance?

  • Retrieval-Augmented Generation (RAG): indexing strategy, chunking, embeddings, vector DB selection, freshness/consistency, hallucination reduction, and evaluation methods. (A stripped-down retrieval sketch follows this list.)

  • Safety and guardrails: prompt injection defenses, output filtering, red teaming, jailbreaking mitigation, and governance processes.

  • What are the strengths and weaknesses of instruction tuning vs. reinforcement learning from human feedback (RLHF)?

  • What are diffusion models, and how do they differ from GANs?

  • Explain Mixture of Experts (MoE). When is it useful? Challenges?

  • Evaluation of LLMs: automatic metrics (e.g., BLEU/ROUGE/BERTScore where appropriate to the task), human evaluation, rubric-based scoring, and production KPIs; offline vs. online trade-offs.

  • Cost/performance optimization: model distillation, quantization, speculative decoding, routing to small/large models, and request shaping.

  • Shipping an LLM feature to millions of users: experimentation design, guardrails, abuse prevention, data retention/privacy, and incident response playbooks.

  • Monitoring: hallucination/drift monitoring, content policy violations, prompt distribution shifts, and feedback loop design.
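  • Example: a stripped-down RAG retrieval sketch using only NumPy. The bag-of-words "embedder", chunk texts, and prompt template are stand-ins for whatever embedding model and LLM the real system would use.

    import numpy as np

    def embed(texts, dim=256):
        # Placeholder embedder: hashes tokens into a bag-of-words vector.
        # A real system would call an embedding model here.
        vecs = np.zeros((len(texts), dim), dtype="float32")
        for i, text in enumerate(texts):
            for token in text.lower().split():
                vecs[i, hash(token) % dim] += 1.0
        norms = np.linalg.norm(vecs, axis=1, keepdims=True)
        return vecs / np.maximum(norms, 1e-9)

    # Chunked knowledge base (illustrative).
    chunks = [
        "The refund policy allows returns within 30 days.",
        "Shipping is free for orders above 50 dollars.",
        "Support is available 24/7 via chat.",
    ]
    chunk_vecs = embed(chunks)

    def retrieve(question, k=2):
        q = embed([question])[0]
        scores = chunk_vecs @ q  # cosine similarity (vectors are unit-normalized)
        top = np.argsort(scores)[::-1][:k]
        return [chunks[i] for i in top]

    question = "Can I return my order after three weeks?"
    context = "\n".join(retrieve(question))
    prompt = (f"Answer using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    print(prompt)  # this prompt would then be sent to the LLM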

  • Ref:

    https://www.coursera.org/articles/large-language-models-interview-questions

    https://www.projectpro.io/article/llm-interview-questions-and-answers/1025 

    https://github.com/Devinterview-io/llms-interview-questions

Data Engineering + Model Training

  • How do you handle large-scale data preprocessing for deep learning models?

  • What are the bottlenecks in a distributed model training pipeline? How do you mitigate them?

  • How would you do feature selection and feature drift detection in production? (A drift-check sketch follows this list.)

  • What logging and tracing tools would you use to monitor ML model performance in production?

  • How do you handle concept drift in online learning systems?
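  • Example: a small feature-drift check comparing a production feature window against its training baseline with a two-sample Kolmogorov-Smirnov test. The threshold and window sizes are illustrative assumptions; a production system would track many features and alert on sustained shifts.

    import numpy as np
    from scipy.stats import ks_2samp

    def feature_drifted(train_values, live_values, alpha=0.01):
        # Flag drift when the live window's distribution differs significantly
        # from the training baseline (alpha is an assumed significance threshold).
        statistic, p_value = ks_2samp(train_values, live_values)
        return p_value < alpha, statistic, p_value

    rng = np.random.default_rng(0)
    baseline = rng.normal(0.0, 1.0, size=5000)       # feature at training time
    live_ok = rng.normal(0.0, 1.0, size=1000)        # recent traffic, no drift
    live_shifted = rng.normal(0.7, 1.0, size=1000)   # recent traffic, shifted mean

    print(feature_drifted(baseline, live_ok))        # expected: drift not flagged
    print(feature_drifted(baseline, live_shifted))   # expected: drift flagged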

LLM-specific / Foundation Model Questions

  • How would you fine-tune an LLM for a specific downstream task with minimal data?

  • What are the security concerns when deploying public LLM endpoints?

  • How would you reduce hallucinations in LLMs?

  • Describe a scalable architecture to implement function calling / tool use in an LLM agent.

  • Compare quantization methods (INT8, GPTQ, AWQ). When would you use each?

  • How do you evaluate the quality of generated text from an LLM?

  • How would you ensure that an LLM answers customer questions in line with the company's policies?

Distributed Systems for AI

  • Design a distributed training system with fault tolerance and job resumption.

  • How would you shard and serve an embedding index with billions of entries?

  • Design an LLM inference engine that supports model parallelism and request batching.

  • Discuss how you'd cache LLM responses at scale while ensuring correctness. (A toy caching sketch follows.)
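  • Example: a toy response cache keyed by a hash of the normalized prompt plus decoding parameters, with a TTL so stale answers expire. call_llm is a hypothetical stand-in for the real inference call; at scale you would back this with a distributed store such as Redis, and only cache deterministic (temperature 0) requests to preserve correctness.

    import hashlib
    import json
    import time

    class LLMResponseCache:
        # In-process sketch: exact match on normalized prompt + decoding params.
        def __init__(self, ttl_seconds=3600):
            self.ttl = ttl_seconds
            self._store = {}

        def _key(self, prompt, params):
            normalized = " ".join(prompt.strip().lower().split())
            payload = json.dumps({"prompt": normalized, "params": params},
                                 sort_keys=True)
            return hashlib.sha256(payload.encode()).hexdigest()

        def get_or_compute(self, prompt, params, call_llm):
            key = self._key(prompt, params)
            hit = self._store.get(key)
            if hit is not None and time.time() - hit["ts"] < self.ttl:
                return hit["response"]            # cache hit: skip the model call
            response = call_llm(prompt, params)   # cache miss: run inference
            self._store[key] = {"response": response, "ts": time.time()}
            return response

    # Usage with a fake model call standing in for the real endpoint.
    cache = LLMResponseCache(ttl_seconds=600)
    fake_llm = lambda prompt, params: "echo: " + prompt
    print(cache.get_or_compute("What is your refund policy?", {"temperature": 0.0}, fake_llm))
    print(cache.get_or_compute("what is your  refund policy? ", {"temperature": 0.0}, fake_llm))  # cache hit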
