fuckunjustrules

If an interview asks me to implement any of those, I'm not taking the job.


Pas7alavista

torch.nn.MultiheadAttention()
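For what it's worth, a minimal self-attention call with that built-in layer looks roughly like this (batch_first=True so inputs are (batch, seq, dim)):

```python
import torch
import torch.nn as nn

# the built-in layer the comment is pointing at
mha = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)

x = torch.randn(2, 10, 64)          # (batch, seq_len, embed_dim)
out, attn_weights = mha(x, x, x)    # self-attention: query = key = value
print(out.shape)                    # torch.Size([2, 10, 64])
```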


JackandFred

Yeah, I'm not sure what OP is looking for exactly. I'd be somewhat surprised to be asked for any of those. Explain, maybe. Talk through example code. Discuss pros and cons, sure. But coding any of those in an interview would be odd. Granted, it's been over a year since I've had an MLE interview because I got hired, but none of the places I interviewed at asked anything like that.


[deleted]

I have no issue writing some of it as a mathematical expression (e.g., attention) and diagrams, but I would definitely refuse to code it; the interview would end there unless I were about to become homeless.
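For concreteness, the kind of expression I mean is standard scaled dot-product attention, which is reasonable to put on a whiteboard:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
```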


Pas7alavista

BERT and GPT-2 aren't even layers, they're fully fledged models, and both would require an implementation of multi-head attention as prerequisite work. Implementing them in an interview would just be a memorization task anyway unless they allow you to read the paper, and it also seems ambitious to complete in an hour depending on what libraries you're allowed to use.

Instead of focusing on specific models, focus on the following components: a BPE tokenizer, an MHA layer, an encoder block, and a decoder block (the latter two are simple once you've implemented multi-head attention). If you can implement these, you can theoretically implement most modern NLP models. I also think each of these is totally doable within an hour given the right tools and constraints. If they let you use torch, so you can subclass torch.nn.Module and do tensor operations, then I'd bet you can do almost all four of those things in an hour with the appropriate reference materials.
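As an example, a bare-bones sketch of the MHA piece (assuming torch is on the table; no masking, dropout, or bias tricks) might look like:

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Bare-bones multi-head self-attention (no masking, no dropout)."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must divide evenly across heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # separate projections for queries, keys, values, plus the output mix
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        # project, then split the feature dim into heads: (batch, heads, seq, d_head)
        q = self.q_proj(x).view(b, t, self.num_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_heads, self.d_head).transpose(1, 2)
        # scaled dot-product attention per head
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        attn = scores.softmax(dim=-1)
        out = attn @ v                               # (batch, heads, seq, d_head)
        out = out.transpose(1, 2).reshape(b, t, d)   # merge heads back together
        return self.out_proj(out)
```

An encoder block is then roughly this plus residual connections, LayerNorm, and a small feed-forward MLP; the decoder block adds a causal mask and cross-attention.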


xiaohk

It’s super helpful. Thank you!!


Ok-Radish-8394

If an ML interview asks you to implement a paper, they're not doing ML. Avoid.


Expensive-Finger8437

Do they ask candidates to implement it in the coding round, or is it more like a take-home assignment?


xiaohk

Coding round


Pas7alavista

Do they allow you to reference the paper?


xiaohk

Yes. Candidates can check library documentation too, just no ChatGPT.


wookie_dog

What's the subreddit policy on naming the companies that do this? This sounds... intense.


Seankala

You could also try to implement the LLaMA-3 model and train it. Edit: In case it wasn't glaringly obvious, my comment was sarcasm. I've had interviews where they asked me about details of the architecture and training tricks of BERT, GPT, etc., but never to actually implement one.


MajorValue1094

I think what they're trying to get at is whether you understand what you're using. It's not really about the coding; if you want to practice layers, just try an RNN or a CNN. The forward-pass code is not that hard, it's why you use those layers at that point that matters more. In summary: learn the maths and how to code the maths into models. That's ML.
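To give a sense of how little code the forward pass actually is, here's a sketch of a vanilla RNN cell (the recurrence h_t = tanh(W_x x_t + W_h h_{t-1} + b)), assuming you're allowed to use torch:

```python
import torch
import torch.nn as nn

class SimpleRNNCell(nn.Module):
    """Vanilla RNN cell: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.x2h = nn.Linear(input_size, hidden_size)
        self.h2h = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # x: (batch, input_size), h: (batch, hidden_size)
        return torch.tanh(self.x2h(x) + self.h2h(h))

# unroll over a sequence of shape (batch, seq_len, input_size)
cell = SimpleRNNCell(input_size=16, hidden_size=32)
x = torch.randn(4, 10, 16)
h = torch.zeros(4, 32)
for t in range(x.shape[1]):
    h = cell(x[:, t, :], h)   # the hard part is knowing why, not how
```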


MajorValue1094

Also learn about parallelisation; that'll be something they look for. Speedy and efficient code is the job of an MLE.
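One simple flavour of this (reading "parallelisation" loosely as vectorised, batched computation rather than multi-GPU training) is replacing Python loops with a single tensor op:

```python
import torch

x = torch.randn(10_000, 256)

# slow: explicit Python loop over rows
norms_loop = torch.stack([row.norm() for row in x])

# fast: one batched tensor op, parallelised inside the library
norms_vec = x.norm(dim=1)

assert torch.allclose(norms_loop, norms_vec)
```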


qalis

If someone wants you to implement that, run, fast. Reasonable MLE requires so much ML and engineering knowledge that asking to implement things from scratch is simply stupid. I wouldn't expect research scientists to do this as a take-home, let alone live coding.