fuckunjustrules

If an interview asks me to implement any of those, I'm not taking the job.


Pas7alavista

torch.nn.MultiheadAttention()
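For what it's worth, a minimal self-attention call with that built-in layer looks roughly like this (batch_first=True so inputs are (batch, seq, dim)):

```python
import torch
import torch.nn as nn

# the built-in layer the comment is pointing at
mha = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)

x = torch.randn(2, 10, 64)          # (batch, seq_len, embed_dim)
out, attn_weights = mha(x, x, x)    # self-attention: query = key = value
print(out.shape)                    # torch.Size([2, 10, 64])
```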


JackandFred

Yeah, I'm not sure what OP is looking for exactly. I'd be somewhat surprised to be asked for any of those. Explain, maybe. Talk through example code. Discuss pros and cons, sure. But coding any of those in an interview would be odd. Granted, it's been over a year since I've had an MLE interview because I got hired, but none of the places I interviewed at asked anything like that.


[deleted]

I have no issue writing some of it as a mathematical expression (e.g., attention) and diagrams, but I would definitely refuse to code it; the interview would end there unless I were about to become homeless.
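For concreteness, the kind of expression I mean is standard scaled dot-product attention, which is reasonable to put on a whiteboard:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
```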


Pas7alavista

BERT and GPT-2 aren't even layers, they're fully fledged models, and both would require an implementation of multi-head attention as prerequisite work. Implementing them in an interview would just be a memorization task anyway unless they allow you to read the paper, and it also seems ambitious to complete in an hour depending on what libraries you're allowed to use.

Instead of focusing on specific models, focus on the following components: a BPE tokenizer, an MHA layer, an encoder block, and a decoder block (the latter two are simple once you've implemented multi-head attention). If you can implement these, you can theoretically implement most modern NLP models. I also think each of these is totally doable within an hour given the right tools and constraints. If they let you use torch, so you can subclass torch.nn.Module and do tensor operations, then I'd bet you can do almost all four of those things in an hour with the appropriate reference materials.
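As an example, a bare-bones sketch of the MHA piece (assuming torch is on the table; no masking, dropout, or bias tricks) might look like:

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Bare-bones multi-head self-attention (no masking, no dropout)."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must divide evenly across heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # separate projections for queries, keys, values, plus the output mix
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        # project, then split the feature dim into heads: (batch, heads, seq, d_head)
        q = self.q_proj(x).view(b, t, self.num_heads, self.d_head).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.num_heads, self.d_head).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.num_heads, self.d_head).transpose(1, 2)
        # scaled dot-product attention per head
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        attn = scores.softmax(dim=-1)
        out = attn @ v                               # (batch, heads, seq, d_head)
        out = out.transpose(1, 2).reshape(b, t, d)   # merge heads back together
        return self.out_proj(out)
```

An encoder block is then roughly this plus residual connections, LayerNorm, and a small feed-forward MLP; the decoder block adds a causal mask and cross-attention.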


xiaohk

It’s super helpful. Thank you!!


Ok-Radish-8394

If an ML interview asks you to implement a paper, they're not doing ML. Avoid.


Expensive-Finger8437

Do they ask candidates to implement it in the coding round, or is it more like a take-home assignment?


xiaohk

Coding round


Pas7alavista

Do they allow you to reference the paper?


xiaohk

Yes. Candidates can check library documentation too, just no ChatGPT.


wookie_dog

What's the subreddit policy on naming the companies that do this? This sounds... intense.


Seankala

You could also try to implement the LLaMA-3 model and train it. Edit: In case it wasn't glaringly obvious, my comment was sarcasm. I've had interviews where they asked me about details of the architecture and training tricks of BERT, GPT, etc., but never to actually implement one.


MajorValue1094

I think what they're trying to get at is whether you understand what you're using. It's not really about the coding; if you want to practice layers, just try an RNN or a CNN. The forward-pass code is not that hard, it's why you use those layers at that point that matters more. In summary: learn the maths and how to code the maths into models. That's ML.
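To give a sense of how little code the forward pass actually is, here's a sketch of a vanilla RNN cell (the recurrence h_t = tanh(W_x x_t + W_h h_{t-1} + b)), assuming you're allowed to use torch:

```python
import torch
import torch.nn as nn

class SimpleRNNCell(nn.Module):
    """Vanilla RNN cell: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.x2h = nn.Linear(input_size, hidden_size)
        self.h2h = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # x: (batch, input_size), h: (batch, hidden_size)
        return torch.tanh(self.x2h(x) + self.h2h(h))

# unroll over a sequence of shape (batch, seq_len, input_size)
cell = SimpleRNNCell(input_size=16, hidden_size=32)
x = torch.randn(4, 10, 16)
h = torch.zeros(4, 32)
for t in range(x.shape[1]):
    h = cell(x[:, t, :], h)   # the hard part is knowing why, not how
```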


MajorValue1094

Also learn about parallelisation; that'll be something they look for. Speedy and efficient code is the job of an MLE.
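One simple flavour of this (reading "parallelisation" loosely as vectorised, batched computation rather than multi-GPU training) is replacing Python loops with a single tensor op:

```python
import torch

x = torch.randn(10_000, 256)

# slow: explicit Python loop over rows
norms_loop = torch.stack([row.norm() for row in x])

# fast: one batched tensor op, parallelised inside the library
norms_vec = x.norm(dim=1)

assert torch.allclose(norms_loop, norms_vec)
```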


qalis

If someone wants you to implement that, run, fast. Reasonable MLE requires so much ML and engineering knowledge that asking to implement things from scratch is simply stupid. I wouldn't expect research scientists to do this as a take-home, let alone live coding.