Data Scientist Interview Questions in United States | Glassdoor.ca

# Data Scientist Interview Questions in United States

1,309

Data scientist interview questions shared by candidates

## Top Interview Questions

Sort: RelevancePopular Date

12 Sep, 2013

25 Feb, 2012
 Find the second largest element in a Binary Search Tree16 Answersfind the right most element. If this is a right node with no children, return its parent. if this is not, return the largest element of its left child.One addition is the situation where the tree has no right branch (root is largest). In this special case, it does not have a parent. So it's better to keep track of parent and current pointers, if different, the original method by the candidate works well, if the same (which means the root situation), find the largest of its left branch.if (root == null || (!root.hasRightChild() ) { return null;} else return findSecondGreatest(root, root.getValue()); value findSecondGreatest(Node curr, value oldValue) { if(curr.hasRightChild()) { return (findSecondGreatest( curr.getRightChild(), curr.value)); } else return oldValue; }Show More ResponsesAbove answer is wrong. it has to be something like this. public static int findSecondLargest(Node node) { Node secondLargest = null; Node parent = null; Node child = node; if (node!=null && (node.hasLeftChild()||node.hasRightChild())) { if (node.hasRightChild()) { while (child.hasRightChild()) { parent = child; child = child.rightChild(); } secondLargest = parent; } else if (node.hasLeftChild()) { child = node.leftChild(); while (child.hasRightChild()) { child = child.rightChild(); } secondLargest = child; } } return secondLargest; }The above answer is also wrong; Node findSceondLargest(Node root) { // If tree is null or is single node only, return null (no second largest) if (root==null || (root.left==null && root.right==null)) return null; Node parent = null, child = root; // find the right most child while (child.right!=null) { parent = child; child = child.right; } // if the right most child has no left child, then it's parent is second largest if (child.left==null) return parent; // otherwise, return left child's rightmost child as second largest child = child.left; while (child.right!=null) child = child.right; return child; }Soln by "mindpower" works. Thank you. I am trying to solve a similar problem Find the 2nd nearest high(in in-order traversal) value for a given node Eg: Given nums: 12 7 14 3, construct a BST. If the given value is: 7 then we should return 14 (in the sort order: 3, 7, 12, 14) if the given value is: 3 then we should return 12 (in the sort order: 3, 7, 12, 14)Generic solution in C# for any k. Notice that this example can be easily changed to find the k-th smallest node by doing a depth-first recursion on root.Left first, and then a tail recursion on root.Right. public Node GetKthLargest(int k) { return GetKthLargest(ref k, this.Root); } Node GetKthLargest(ref int k, Node root) { if (root == null || k < 1) return null; var node = GetKthLargest(ref k, root.Right); if (node != null) return node; if (--k == 0) return root; return GetKthLargest(ref k, root.Left); }recursion is not needed. SecondLargest(Node root, Node secondLarge) { if(root.right==null) return root.left; Node secondLargest = root; while(secondLargest.right.right==null) secondLargest=secondLargest.right; return secondLargest; }int getmax(node *root) { if(root->right == NULL) { return root->d; } return getmax(root->right); } int secondmax(node *root) { if(root == NULL) { return -1; } if(root->right == NULL && root->left != NULL) { return getmax(root->left); } if(root->right != NULL) { if(root->right->right == NULL && root->right->left == NULL) { return root->d; } } return secondmax(root->right); }In-order traverse the tree. The second last element in the array in the answer.In Python: def find_second_largest_bst_element(root, parent=None): if parent is None: # BST root if root.right is None: # no right subtree if root.left is not None: # if a left subtree exists... return root.left else: # root is the only element of the BST return False else: if root.right is None: # right-most element if root.left is not None: # left subtree exists return root.left else: # leaf return parent else: # check right subtree find_second_largest_bst_element(root.right, root) find_second_largest_bst_element(root)For kth smallest, descend the left subtree first. class Node: def __init__(self, value, left=None, right=None): self.value = value self.left = left self.right = right def findKthLargest(root, k): global count if root is None: return findKthLargest(root.right, k) count += 1 if count == k: print root.value return findKthLargest(root.left, k) count = 0 r = Node(10, Node(5, Node(2), Node(7)), Node(30, Node(22), Node(32))) findKthLargest(r, 3)// solution in java // main routine Node findSecondMax(Node root) { if(root == null || (root.left == null && root.right == null) return null; else { Node max = findMax(root); return (max.parent == null) ? findMax(max.left) : max.parent; } } //helper routine, recursive implementation.... can also be done non-recursively Node findMax(Node root) { return (root.right == null) ? root : findMax(root.right); }Show More ResponsesFind the largest number in the binary tree and delete it. And again find the largest number. Short and fast.Reverse in-order traversal of the BST, keeping a count of # of visited nodes. This methods works great to return the kth largest element in a BST.mindpower's solution looks right

26 May, 2013
 Write a function that takes in two sorted lists and outputs a sorted list that is their union.9 Answersf(a,b) { return sort(unique(a,b)) }def sortedUnion(list1,list2): list3 = [x for x in list1 if x in list2] return sorted(list(set(list3)))google merge sortShow More Responseswrite 2 helpers: 1) INSERT(A, b) = put element b within A in the sort order 2) DEL(A, a) = delete element a from A Then do this recursion: f(A,B) : if max(A) <= min(B) return [A B] else { B = INSERT(B, max(a)); A = DEL(A, max(a); f(A,B); } something like that. try coding and testing. I haven't.Oops, check/write a termination conditionOn Python, you could do: from sets import Set def merge_sort(a,b): return sorted( Set(a).union(Set(b)) )def sorted_union(list1, list2): union=set(list1).union(set(list2)) sorted_union=sorted(list(union)) return sorted_unionSecond part of merge sort. Don't answer with sort(a), etc. Anyone can do that... def merge(A, B): i=0 j=0 sorted_list = [] while i < len(A) and j < len(B): if A[i] <= B[j]: sorted_list.append(A[i]) i += 1 else: sorted_list.append(B[j]) j += 1 if i < len(A): sorted_list.extend(A[i:]) elif j < len(B): sorted_list.extend(B[j:]) return sorted_listI assumed that we can not use any "sort" function and we want it with linear time. so here it is: def my_sort(list_a, list_b): if len(list_a) ==0: return list_b elif len(list_b) ==0: return list_a else: if list_a[-1] > list_b[-1]: return( my_sort(list_a[0:-1], list_b) + [list_a.pop(-1)]) else: return(my_sort(list_a,list_b[:-1]) + [list_b.pop(-1)])

16 Feb, 2012
 generating a sorted vector from two sorted vectors. 3 Answerskeep two pointers and compare the two numbers they point to. Move the pointer which points to the smaller or equal number. End loop when two pointers reach the end.look at merge in mergesort, does exact same thing.Merge sort is the best...many languages have this function inbuilt...else this can also be done manually, assume two vectors A [1,2,3,4] And B[5,6,7,8]...merge them...compare the last value of A and first value of B...in our case 4<5 is true...thus the result...if it is false then move the number up and then compare it with the previous number and so on...

### Data Scientist at Fino Consulting was asked...

18 Feb, 2014
 Why is data important? (or something along those lines; I'd call this an unexpected question)1 AnswerI discussed the untapped possibilities of data as a resource for boosted revenue, and educated business decisions. It's like a surgeon discovering the scalpel, while before all he had was a butter-knife. The precision is a game-changer.

### Senior Data Scientist at SparkCognition was asked...

26 Feb, 2017
 academic-like questions where you could just look up the answer. just memorize a thing or two about the basic algos like neural nets, random forests, svm, linear regression. that satisfies their data scientists.2 Answersi did not bother much with providing answers to questions of this typeCould you please share their data science exercise and solution ?

### Data Scientist at Groupon was asked...

17 Jul, 2014
 I signed an NDA, so I can't give much details, but one interviewer asked me a very open ended question that involved how I would create/design/implement a certain algorithm from start to end.1 AnswerI created an epic flowchart on the white board. Don't be afraid to ask for clarifications.

### Data Scientist at Zenefits was asked...

17 Dec, 2016
 Bayes' Formular: Marbles: There are 30 red marbles and 10 black marbles in Urn #1. You have 20 red and 20 Black marbles in Urn 2. Randomly you pull a marble from the random urn and find that it is red. What is the probability that it was pulled from Urn #12 AnswersProb(Red from Urn#1)= [0.5*(3/4)]/ [0.5*(3/4)+0.5*0.5]Easier is to just consider the red marbles: There are 30 in urn 1, and 30 + 20 = 50 altogether in both urns. So the answer is 30/50.

### Data Scientist at Zenefits was asked...

17 Dec, 2016
 Alice and Bob take turns in rolling a fair dice. Whoever gets "6" first wins the game. Alice starts the game. What are the chances that Alice wins. 4 AnswersProb(Alice wins in the first round) =1/6; Prob(Alice Wins in the second round)=(5/6)*(5/6)*(1/6). Alice does not get "6" and Bob does not get "6" in the first round and finally Alice gets six. For n-th round Prob (Alice wins)=(1/6)*(5/6)^[2*(n-1)]If Alice starts, she wins the game with probability 6/11I think it should be 7/11Show More ResponsesThe answer is 6/11. To see this, let pA = Pr[Alice wins] and pB = Pr[Bob wins]. Then note that pA = Pr[win if go first], so pB = Pr[Alice loses first roll] * Pr[win if go first]. So mathematically we have pB = 5/6 * pA, and pA + pB = 1.

### Data Scientist at Zenefits was asked...

17 Dec, 2016
 Two random cards numbered from 1,2...100 are pulled from the deck. What is the probability that one number doubles the other from the deck. 2 AnswersYou have a total of 100=50*2 favorable outcomes (1 2); (2;4)..(50; 100) and 100_C_2 possible combinations 100/(100C2)=0.02A simpler way to think of it is that for a given first card x1, it has a unique "partner" among the remaining cards. So there is a 1/99 chance to draw it. The first card does not matter, since Pr[succes|x1] is independent of x1. (Mathematically, you are double counting, because there are only 50 favorable outcomes, ignoring draw order.)
110 of 1,309 Interview Questions