The process took around two months. I had two interviews over Teams, one with a project member and one with the project lead. They asked basic questions about my resume and some fundamental architecture questions. After that came the panel round, where I interviewed with four of the team members.

The first panel interview went well. When working the technical question by hand I made a lot of mistakes, but I still got the right answer. The second team member did not ask me many "technical" questions, mostly stuff on my resume, but asked about things I didn't work on. It felt weird that when I said I did x, y, z, they asked me something tangential that, judging from the rest of my interviews, they didn't even seem to care about. I gave a correct answer to that, but with the wrong reasoning. Overall, though, I answered the rest of the technical questions about my experience fine.

The third interview was a complete mess. The interviewer did not fully define any of the problems. Imagine being asked "what's 5 and 6?", asking for clarification multiple times, getting "what's 5 and 6 together?", and having to guess each time whether the answer is 11 or 30. I should've seen this coming when I referenced a common operation in PyTorch/NumPy/TensorFlow and he seemed to be clueless about the notation.

The final interview was great but fundamentally had the same problems as all the other interviews. They asked if I knew "tensor parallelism", and my immediate thought was "what a dumb name: you fundamentally have three tensors, an input, an output, and a weight tensor, and only one of them is parallel." I felt this was my strongest interview, because by this point I was more used to asking for definitions and taking shots when they refused to define them. This interview also covered the work I had been most closely involved in for the past two years, but it still felt a bit like pulling teeth when talking to them.
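For readers coming from the accelerator side, here is a minimal sketch of what the GPU world usually means by tensor parallelism in its simplest column-parallel form (the split popularized by Megatron-LM), assuming a single linear layer Y = X @ W and simulating each "device" as a list entry in plain Python. The function names (`matmul`, `split_columns`, `column_parallel_linear`) are hypothetical. Only the weight is explicitly partitioned; the input is replicated, and each device's output slice falls out of the weight split and is concatenated at the end.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix multiply: (m x k) @ (k x n) -> (m x n)."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)] for i in range(m)]


def split_columns(w: Matrix, num_devices: int) -> List[Matrix]:
    """Partition W column-wise into num_devices equal shards (assumes an even split)."""
    n = len(w[0])
    assert n % num_devices == 0, "this sketch assumes the columns divide evenly"
    width = n // num_devices
    return [[row[d * width:(d + 1) * width] for row in w] for d in range(num_devices)]


def column_parallel_linear(x: Matrix, w: Matrix, num_devices: int) -> Matrix:
    """Simulated column-parallel linear layer.

    Each simulated device holds one column shard of W plus the full, replicated X,
    and computes its slice of Y locally. The slices are then concatenated, which
    corresponds to an all-gather on real hardware.
    """
    shards = split_columns(w, num_devices)
    partial_outputs = [matmul(x, shard) for shard in shards]  # one result per "device"
    # Stitch the per-device output slices back together along the column dimension.
    return [[value for part in partial_outputs for value in part[i]] for i in range(len(x))]


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
    print(column_parallel_linear(x, w, num_devices=2))
```

The sharded result matches an unsharded `matmul(x, w)`; the other common variant (row-parallel) splits W by rows instead and sums the partial outputs.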
I have been working on these problems for a long time in the accelerator space, and I thought that even though this job was GPU-based I'd be fine. The accelerator space seems to define these terms much more clearly. It talks about "fine-grained parallelism" or "coarse-grained parallelism" (see Tangram out of Stanford), referring to how close to the compute the parallelism is, or about "weight stationary," "output stationary," "input stationary," and "row stationary" dataflows (see the work by Joel Emer and Vivienne Sze) to define which tensor stays in place while the others move. That only names the primary one; the secondary ones don't have a defined order, though there are notations for describing the complete movement. Additional work even has high-level languages or representations to define these movements (see MAESTRO out of Georgia Tech). Coming from that space into this one feels like speaking a completely different language, where every word sounds the same and doesn't reflect its meaning. To be completely clear, I am willing to learn all these translations, words, and definitions, but being asked to do this on the spot without preparation seems insane to me.
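To make the "stationary" terminology concrete, here is a rough sketch, assuming a plain matrix multiply C[m][n] += A[m][k] * B[k][n] with A as the input, B as the weights, and C as the output. Which operand is stationary is approximated by which operand's index does not involve the innermost loop: C[m][n] for output stationary, B[k][n] for weight stationary, A[m][k] for input stationary. This is a simplified temporal view (it ignores row stationary and the spatial mapping onto PEs, and it is not MAESTRO's cost model), and the names `LOOP_ORDERS` and `matmul_dataflow` are hypothetical. Every ordering computes the same result; what changes is how often each operand has to be swapped out, which the sketch counts with a one-register-per-operand model.

```python
from itertools import product
from typing import Dict, List, Tuple

Matrix = List[List[float]]

# Loop order (outermost -> innermost) for each simplified dataflow. The operand
# whose indices do not include the innermost loop variable stays "stationary".
LOOP_ORDERS = {
    "output_stationary": ("m", "n", "k"),  # C[m][n] held while k varies
    "weight_stationary": ("k", "n", "m"),  # B[k][n] held while m varies
    "input_stationary": ("m", "k", "n"),   # A[m][k] held while n varies
}


def matmul_dataflow(a: Matrix, b: Matrix, dataflow: str) -> Tuple[Matrix, Dict[str, int]]:
    """Compute C = A @ B using the loop order for `dataflow` and count operand swaps.

    A swap is counted whenever the element of an operand touched in this iteration
    differs from the one touched in the previous iteration.
    """
    sizes = {"m": len(a), "k": len(b), "n": len(b[0])}
    order = LOOP_ORDERS[dataflow]
    c = [[0.0] * sizes["n"] for _ in range(sizes["m"])]
    last = {"A": None, "B": None, "C": None}
    swaps = {"A": 0, "B": 0, "C": 0}
    # itertools.product advances its rightmost iterable fastest, so the last
    # entry of `order` is the innermost loop.
    for values in product(*(range(sizes[d]) for d in order)):
        idx = dict(zip(order, values))
        m, k, n = idx["m"], idx["k"], idx["n"]
        touched = {"A": (m, k), "B": (k, n), "C": (m, n)}
        for name, element in touched.items():
            if element != last[name]:
                swaps[name] += 1
                last[name] = element
        c[m][n] += a[m][k] * b[k][n]
    return c, swaps


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    for flow in LOOP_ORDERS:
        result, counts = matmul_dataflow(a, b, flow)
        print(flow, result, counts)
```

For the 2x2 example, each ordering produces the same C, but the stationary operand is swapped half as often as the other two, which is the whole point of naming a dataflow after the operand that stays put.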
It feels bad that throughout the panel I constantly had to fight with translating loosely defined words, or with ill-conceived problems that I could only answer by being completely reactive. I don't believe any actual role would have me work this way; on the job I would sit down and understand the material before simulating it. But in this process, all I was given was a mix of poorly formulated problems and definitions that, a lot of the time, only made sense to the people asking. I don't believe they actually tested my ability to do the job; at best, they tested whether I'd be as good as them on day 1, which seems unreasonable. I'd be hard pressed to find anyone who could pass the same process I went through without additional information about the terminology and the types of problems being asked, without the interviewers defining the problems more clearly, or without some form of nepotism.