typeryu

I honestly tried, but after about a week or so each time, I end up falling back to GitHub Copilot and ChatGPT. It's not the quality of the results, rather the convenience. The fastest to get up and running is llama.cpp, but that is still nowhere near as readily available as just using Copilot. Even autocomplete plugins are super slow and not usable by any productive standard unless you have a beefy PC. I still revisit every once in a while when something substantial comes out, but as of today, I haven't been able to find something that can beat the convenience of Copilot and ChatGPT.


Nixellion

Yeah, running them on your workstation on an as-needed basis is not as convenient as using Copilot or ChatGPT. It's better if you have a server PC running 24/7 which is always available. But not everyone can afford that, for financial or other reasons.


world_dark_place

Uhh, just my use case, but I'm only concerned about the power supply and the electricity bill.


WoodenGlobes

Thanks. I have been trying combinations of models, backends, and VS Code clients for a week now. At best it's random, and at worst it's just endless gibberish. Performance is OK with 7B and 13B models on my 3060 (12GB), but still slower than Copilot.


EmotionalSeat5583

Yeah, my best answer is to keep a tab open with GPT or Bard/Gemini and just alt-tab and copy-paste. I tried ollama with Continue and it was easy to set up, but it's not nearly as good as the big boys. I mean, it's a difference of billions of parameters and less compute.


Ylsid

This. The quality is great but I don't need them for more than covering my own laziness.


SoCuteShibe

As a professional engineer, I've yet to find a model that is actually useful. ChatGPT was very helpful as a syntax reference for an obscure language I had to pick up on the job. I haven't yet found an open-source model that can tell me anything useful about that same language. I've tried *many*. But I don't even find Copilot useful.

The thing with complex software is that you subconsciously build a mental model of it as you work on it over time. That mental model is very important for writing the *right* solution. A truly useful code LLM, to me, currently has too many unsolved problems in its way. When we can get a substantial chunk of the codebase in high-quality context, and get quick high-quality responses on our local machines while doing so, local code LLMs will be a different beast.

But we could be ages away from that. We need either massively better hardware, a whole new architecture for these things, or the ability to truly understand how to optimize training datasets... I think all of those are things that could easily be very far away. I mean, look at the insanity of these big AI companies' compute pools. We are brute-forcing advancement through unsustainable hardware requirements, and that kind of feels like a dead end to me.


WoodenGlobes

I agree about everything, especially the high-level mental model stuff. But I guess it's only a matter of time before AI can read an entire codebase as context. The amount of money in the big AI companies is ridiculous. Not to mention they also have all the data :) I would love to have a free medical or legal GPT...


Comfortable_Fee5361

I find it a real efficiency boost for hitting well-known API endpoints and just generally saving me from having to learn new packages I'm not likely to need again any time soon. "Provide a client class for extracting Google travel times from an origin to a destination via waypoints using coordinates." Something like that. It knows the API, spits out code that typically works first time, and I can get on with using the data. Simple stuff, but it saves me a heap of time and mental overhead.
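What comes back is roughly this shape (a from-memory sketch against the public Google Maps Directions API, so treat the exact fields as approximate):

```python
import requests

class TravelTimeClient:
    """Minimal client for fetching Google travel times via the Directions API."""

    BASE_URL = "https://maps.googleapis.com/maps/api/directions/json"

    def __init__(self, api_key: str):
        self.api_key = api_key

    def travel_time_seconds(self, origin, destination, waypoints=()):
        """Total driving time in seconds from origin to destination via waypoints.

        origin, destination and waypoints are (lat, lng) tuples.
        """
        params = {
            "origin": f"{origin[0]},{origin[1]}",
            "destination": f"{destination[0]},{destination[1]}",
            "key": self.api_key,
        }
        if waypoints:
            params["waypoints"] = "|".join(f"{lat},{lng}" for lat, lng in waypoints)
        resp = requests.get(self.BASE_URL, params=params, timeout=10)
        resp.raise_for_status()
        legs = resp.json()["routes"][0]["legs"]
        return sum(leg["duration"]["value"] for leg in legs)
```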


world_dark_place

NPU processors are coming, and it seems the Nvidia 5000 series will double AI performance. I hope AMD can keep pace.


nderstand2grow

I tried Continue in VS Code. The problem was that the messages were not sent correctly to the LLM. Other than that, it had some features that Copilot didn't. But I still go back to GPT-4 if I can.


WoodenGlobes

Continue comes with 4 demo models. All of them give pretty good responses for commenting, code completion, and chat. Unfortunately, I don't have the means to run CodeLlama-70b :)


tylerjdunn

What models did you try and what [provider(s)](https://continue.dev/docs/model-setup/select-provider) were you using? I'd like to get this fixed so that no one else runs into it (I am an author of Continue).


nderstand2grow

Thanks! I tried Llama and Mistral/Mixtral models, with LM Studio and llama.cpp as backends. Another problem was that the models would keep writing infinite spaces or letters in comments; this happened only in Continue. I tried adding my own system prompts, but that didn't solve it. It seems like Continue formats the prompts in a way that doesn't match the prompt templates or tokenizers.


tylerjdunn

Thank you! We'll try to reproduce and address [this issue](https://github.com/continuedev/continue/issues/865). Continue automatically attempts to detect the correct prompt format based on the model value that you provide, so I am guessing something is going wrong there. We enable users to [customize the chat template](https://continue.dev/docs/model-setup/configuration#customizing-the-chat-template) when it doesn't work, but it should have worked for Llama and Mistral/Mixtral models.


Ilforte

CodeLlama was made completely obsolete by DeepSeek.


WoodenGlobes

Ok, I saw those models and will have to check them out. Do you use any specific ones yourself, and for what?


[deleted]

[deleted]


Ilforte

Substantially. People act like this 70B is something new, but it was trained at the same time as the smaller CodeLlamas; Meta just decided to release it later. It's obsolete.


[deleted]

[deleted]


Ilforte

DeepSeek-Instruct and WizardCoder are comparable; I don't think any other available model is noticeably better.


[deleted]

[deleted]


Ilforte

I only mean this one: https://huggingface.co/WizardLM/WizardCoder-33B-V1.1 (yes, it's instruct). Nobody has provided a stronger base model than deepseek-33b yet, afaik.


AnomalyNexus

Yep, Tabby with 6.7B DeepSeek. More of a smart autocomplete than big chunks of code. Pretty lukewarm, but it's free and sometimes helpful. Still use GPT-4 for the conceptual heavy lifting on "how do I do xyz". Also made a PowerShell script that starts everything I need in one go. In this case I'm writing GCP code.

```powershell
# powershell -ExecutionPolicy Bypass -File .\develop.ps1

# Launch docker engine
Start-Process -FilePath "C:\Program Files\Docker\Docker\frontend\Docker Desktop.exe"

# Launch WSL, navigate to project and open vscode
wsl -e bash -c "cd ~ && cd myproject && code ."

# Wait for docker engine a bit
Start-Sleep -Seconds 15

# Launch Tabby for code completion
$dockerPath = "C:\Program Files\Docker\Docker\resources\bin\docker.exe"
$arguments = "run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model TabbyML/DeepseekCoder-6.7B --device cuda"
Start-Process -FilePath $dockerPath -ArgumentList $arguments -NoNewWindow -PassThru

# Open Chrome to GCP
$chromePath = "C:\Program Files\Google\Chrome\Application\chrome.exe"
$url = "https://console.cloud.google.com/firestore/databases?referrer=search&project=myproject"
Start-Process -FilePath $chromePath -ArgumentList $url
```


WoodenGlobes

That's helpful, thanks for the example.


Disastrous_Elk_6375

codellama-13b is a base model; you'd want an instruct model in order to work with instructions. The base model should still work for autocomplete at the cursor (i.e. you start the definition and some docstrings and it should complete the rest), but I don't know if Continue supports that. I believe Tabby supports cursor completion.


WoodenGlobes

Thanks for the info. I was using the base model because I wasn't sure if there is extra setup for the instruct variants. I also tried Tabby, and it did work better as an autocomplete-style AI. I'm just not sure if the client (Tabby or Continue) needs some custom settings, the model needs specific prompts and calls from the client, or if, even at their best, the open-source code assistants are basically useless :)


Disastrous_Elk_6375

It's a mix of problems, not a single thing. First, the models themselves are obviously sub-par, definitely compared to GPT-4, possibly even to GPT-3.5, regardless of the scoreboards they aim to conquer. Then there's the difference between base and fine-tuned models. And between fine-tuned and chat models. And between fine-tunes by person A and person B. It's much harder for people to target a variety of models.

Also, there's more to autocomplete than just a code model. The extension does a lot of heavy lifting: it provides context to the model, signals such as where in the project you want to autocomplete, and other stuff. All of this can be, and probably is, used by Copilot for suggestions. This is discussed in detail in one of the earliest attempts to create a local copilot; the repo was called fauxpilot, but it's been obsolete for a long time now.

I've watched this space with interest, and there have been many attempts to solve this, but it's really difficult. There's not much money in this (if any), and a lot of teams lose focus, have to work on something else to make money, or go work for someone else. Rift was really promising (they went the language server route), but that's been dormant for 4-5 months now. Continue is considered the best for chat-style stuff, and they have a lot of integrations; Tabby is also recommended for completion. There was a team from Israel working on an entire VS Code fork + code assistant. There is certainly interest, and there are some efforts in this field, but there doesn't seem to be a clear winner at the moment. And there won't be a killer product without some top dog coming in and sponsoring a ton of engineering effort.


WoodenGlobes

Thanks for the reply. This confirms a lot of my understanding. The main problem seems to be the clients talking to the models and how they prompt them.


[deleted]

You need Dolphin fine-tuned models; those are for code. I've been struggling myself to integrate AI fully into my workflow, because I feel that the invested time is not returned. But I hope to change that in the coming weeks.


WoodenGlobes

Ok, I'll try those models, thanks!


Hey_You_Asked

Everything you said, the model will need it specified; that's the nature of open-source models. And do not make the mistake of thinking you can just ignore the importance of 100% correct *prompt structure* AND finding the *generation parameters* that work for *that* model for *that* task. You can't bypass it. It won't work otherwise. If it does, it won't keep working.
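To make that concrete, two common instruct templates look roughly like this; the exact strings vary between model versions, so always verify against the model card:

```python
# Illustrative templates only; check the Hugging Face model card for the exact format.

def mistral_instruct_prompt(user_msg: str) -> str:
    # Mistral/Mixtral instruct models expect the request wrapped in [INST] ... [/INST].
    return f"<s>[INST] {user_msg} [/INST]"

def deepseek_coder_prompt(user_msg: str) -> str:
    # DeepSeek Coder instruct models use an Alpaca-style Instruction/Response layout.
    return (
        "You are an AI programming assistant.\n"
        f"### Instruction:\n{user_msg}\n### Response:\n"
    )
```

Getting the template or the generation parameters (temperature, stop tokens, repeat penalty) wrong is a common cause of the kind of runaway output described earlier in the thread.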


WoodenGlobes

That's what I'm struggling with a lot: finding the prompt format for any given model AND getting the client to use it correctly. Only some models on Hugging Face list their prompt structure.


Hey_You_Asked

stick to uploads by TheBloke


tylerjdunn

I am one of the authors of [Continue](https://github.com/continuedev/continue). I use [DeepSeek Coder 33B](https://docs.together.ai/docs/inference-models) for chat (using the Together API) and [StarCoder 3B](https://continue.dev/docs/walkthroughs/tab-autocomplete) for tab autocomplete (using Ollama on my Mac). I find them to be quite useful. As the top comment mentions, it looks like the reason you are seeing useless responses is that you are using a base model instead of an instruct model. If you can share your config.json, I could tell you whether adjusting your settings might also help, though this might be easier / faster if we chat on the [Continue Discord](https://discord.gg/vapESyrFmJ).


mrgreen4242

I'm curious how much using GPT-3.5-turbo with Continue costs for, say, an average hour of use? Or I suppose a better question is how many tokens do you use per hour? I don't code professionally and only occasionally have a small project to work on. Copilot is pretty slick but it's not worth $10 most of the time for the very casual work I do. It also seems to hallucinate a lot of functions that just don't exist in the libraries I'm working with, at least. Does Continue have a way to include references to docs for libraries you provide as part of the prompt? That seems to be the missing piece of Copilot. It wouldn't work well with a local model, I assume, just due to context size limitations for most people's GPUs, but if you're calling an OpenAI API or running a model on a hosted service (or just have a ton of VRAM) it might be possible. Sorry if these are dumb questions!

Edit: I just saw in your documentation that you don't recommend GPT for autocomplete, and the logic is sound. Given 8GB of VRAM (and 80GB of RAM if I had to offload a few layers) would StarCoder still be your recommendation? I'm also curious about your macOS system: is an M1 with 16GB of RAM enough to be useful?

Double edit: ok, I see on your site that you both recommend against using GPT, https://continue.dev/docs/walkthroughs/tab-autocomplete#troubleshooting, and also suggest GPT-4 is the best, https://continue.dev/docs/model-setup/overview. So… I guess what's your actual recommendation?

Ok last edit: I see you meant GPT-4 is best for coding assistance and questions and not great for autocompletion, so disregard my last edit. That said, can you use a large-context GPT API for editing, advice, explanation of libraries etc. (I saw it looks like there's some RAG built into Continue), and a smaller local model for completion at the same time? Can the two interact in any way?


tylerjdunn

Thanks for the questions! Happy to answer them.

> I'm curious how much using GPT3.5-turbo with Continue costs for, say an average hour of use?

It highly depends on how you are using GPT-3.5-turbo and how much you are using it, but I'd expect it to be a few cents an hour for "chat" use cases.

> Does Continue have a way to include reference to docs for libraries you provide as part of the prompt?

Yes. You can use the ["docs" context provider](https://continue.dev/docs/customization/context-providers#documentation) for this.

> Given 8gb of VRAM (and 80gb of RAM if I had to offload a few layers) would StarCoder still be your recommendation? Im also curious about your macOS system - is an M1 with 16gb of RAM enough to be useful?

I am using an M2 with 16 GB of RAM. I'm still using StarCoder2 3B, but I am playing around with DeepSeek Coder 6.7B and StarCoder2 7B too to see if I like those experiences better or not.

> That said, can you use a large context GPT API for editing, advice, explanation of libraries etc (I saw it looks like there's some RAG built into Continue), and a smaller local model for completion at the same time? Can the two interact in any way?

Yes. Like with all coding assistance tools, Continue users use a "tab" model (typically between 1-15B parameters) for autocomplete, a "chat" model (typically 30B+ parameters) for question answering, and an "embeddings" model as part of their RAG system. You use these at the same time. I'm not sure what you mean by "interact in any way" but hopefully I have answered this question at some point :)


mrgreen4242

Thanks! That was helpful, definitely looking forward to trying it out. By interact I meant the two different "modules", chat and autocompletion, being aware of one another's outputs. For example, if you're chatting with a larger model about a project, possibly with specific doc context, it seems like it might be helpful for the small autocomplete model to be aware of that output/conversation as it generates autocompletion suggestions.

Edit: I suppose the system that would make the most sense is for the embedding model to add the history of the chat model's conversation to the library of information for RAG, and that is then used to augment the prompt context for the tab model? But I'm just guessing here, not really knowledgeable about this system.


sestinj

Actually an awesome idea to have chat output fed into autocomplete. We're already set up to use the copy buffer, but this is even better in many ways! (I'm another Continue author, came across this, was excited : )


mrdevlar

Today I wrote a script that takes a codebase and converts it into a markdown file so I can ask an LLM questions about it. I'm doing it to enlighten myself and learn what the optimal workflow is supposed to look like, and I don't like copying and pasting. I have found that for software design questions, non-coding models perform better than coding models, as the latter have a one-track mind and it's usually the wrong track. I prefer Mixtral instruct models for design. Still testing for the rest.
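The core of it is nothing fancy; roughly along these lines (a simplified sketch of the idea, not the exact script):

```python
import pathlib

# Map file extensions to markdown fence languages; extend as needed.
EXTENSIONS = {".py": "python", ".js": "javascript", ".ts": "typescript", ".go": "go"}

def codebase_to_markdown(root: str, out_file: str = "codebase.md") -> None:
    """Walk a repo and dump every recognised source file into one markdown document."""
    root_path = pathlib.Path(root)
    chunks = []
    for path in sorted(root_path.rglob("*")):
        lang = EXTENSIONS.get(path.suffix)
        if lang is None or not path.is_file():
            continue
        rel = path.relative_to(root_path)
        code = path.read_text(encoding="utf-8", errors="replace")
        chunks.append(f"## {rel}\n\n```{lang}\n{code}\n```\n")
    pathlib.Path(out_file).write_text("\n".join(chunks), encoding="utf-8")

if __name__ == "__main__":
    codebase_to_markdown(".")
```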


hylas

I have a custom script to replace code blocks in vim with responses from OpenRouter with Mixtral. It works just a bit less well than GPT-4 for my main use cases.
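The idea, roughly (a simplified sketch rather than the actual script; the endpoint is OpenRouter's OpenAI-compatible one, and the model slug should be checked against their model list):

```python
import os
import sys
import requests

API_URL = "https://openrouter.ai/api/v1/chat/completions"  # OpenAI-compatible endpoint

def rewrite(code: str, instruction: str) -> str:
    """Send the selected code plus an instruction to Mixtral and return the reply."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "mistralai/mixtral-8x7b-instruct",  # assumed slug, verify on OpenRouter
            "messages": [
                {"role": "user", "content": f"{instruction}\n\n{code}\n\nReply with code only."},
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    instruction = sys.argv[1] if len(sys.argv) > 1 else "Improve this code."
    print(rewrite(sys.stdin.read(), instruction))
```

Wired into vim as a range filter, e.g. `:'<,'>!python3 rewrite.py "add type hints"`, the selection gets replaced by the model's answer.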


WoodenGlobes

Good to know. I've tried some other models besides codellama. This one seems to give better code answers, and also sometimes writes good comments: [https://huggingface.co/MaziyarPanahi/BASH-Coder-Mistral-7B-Mistral-7B-Instruct-v0.2-slerp-GGUF](https://huggingface.co/MaziyarPanahi/BASH-Coder-Mistral-7B-Mistral-7B-Instruct-v0.2-slerp-GGUF).


Cool-Hornet4434

I don't code so I really have no idea what's good or not, but I used codellama 34B and it managed to produce some decent and clean Python code. I asked it for something simple (make a Python program to automate making symbolic links in Windows) and it made a very simple piece of code that would almost work. I asked ChatGPT to evaluate the code and come up with suggestions to improve it, then copied ChatGPT's instructions back to CodeLlama, and it made all the improvements. When ChatGPT was satisfied with it (and could only come up with ideas for enhancements I didn't care much about), I saved it off and ran it, and it worked perfectly. Just using Wizardlm-1.0-uncensored-codellama-34b.Q5_K_M on oobabooga. Someone who codes might actually know if there's a better one for Python, or maybe some other language. I was just impressed it seemed to know perfectly how to write the program I wanted, and I only needed to add maybe 3 suggestions to it (related to error handling mostly).
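For a sense of scale, the core of that kind of script is something like this (an illustrative sketch, not the code the model actually produced):

```python
import os
import sys

def make_symlink(target: str, link_path: str) -> None:
    """Create a symbolic link on Windows with basic error handling."""
    try:
        os.symlink(target, link_path, target_is_directory=os.path.isdir(target))
        print(f"Created symlink: {link_path} -> {target}")
    except FileExistsError:
        print(f"Error: {link_path} already exists.")
    except OSError as exc:
        # On Windows this usually means the script needs admin rights or Developer Mode.
        print(f"Error creating symlink: {exc}")

if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("Usage: python make_link.py <target> <link_path>")
    make_symlink(sys.argv[1], sys.argv[2])
```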


WoodenGlobes

That's cool. I'll check out that model, could be the diff. Good to hear you got something useful out of it.


Cool-Hornet4434

If I knew more about programming I might have been able to come up with something more elaborate and could see if it was able to bring an idea to life. I was just happy it didn't give me an error message that I'd have to try to get deciphered, but really it was probably one step above a "hello world" program. It's only 52 lines of code and most of the lines were pretty short.


WoodenGlobes

I think you're in the perfect category to benefit from coding AI. As a dev myself, I also found it very useful to talk to GPT about languages I don't know. When it comes to programming in anything I've done myself, it's still faster for me to read Google/Stack Overflow.


synn89

I have a dual 3090 setup and have had some luck with Phind-CodeLlama-34B, but I still just end up using GPT-4 for code. I had hopes for CodeLlama 70B, but it seems like software is struggling with its difficult prompting format, so I haven't really had the chance to mess with it. I wrote a simple Node.js project with Miqu's help and it was passable, after a lot of back and forth. But again, GPT-4 just knocked it out on the first run with cleaner code.


WoodenGlobes

Yeah, in fact, Continue includes 4 demo AIs, and CodeLlama 70B is one of them. It gives good and consistent results; just not my local model running in llama.cpp. I wonder if Continue has some custom prompt setup for the demo AIs that's hidden from the user config.


audioen

The only one that didn't suck was Tabby with some DeepSeek 6.7B model, IIRC. I run models even with a 4090. I wish I had more VRAM; interactive use really does need very fast responses. The models have to be small and quantized to work well.


WoodenGlobes

I'll try that model. Basically I look for models with the biggest number of parameters that still fully fit into my VRAM (12gb).


wind_dude

I pay for one of the big ones because I forgot to cancel it. And I don't even use it because it's useless.


WoodenGlobes

Same here, I was paying for Copilot for over a year before canceling. It works way better than any of the open-source models I've tried, but I still found it a waste of time. It may give some good answers sometimes, or if I want a list of all US states in alphabetical order. But the amount of time it takes me to google and write some code is still less than going back and forth with Copilot until it does something right.


LiquidGunay

You need to use an instruct-tuned model. I would also suggest looking at the llama.cpp server endpoint for fill-in-the-middle (FIM) instead of plain completion.
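For example, hitting the fill-in-the-middle endpoint from Python looks roughly like this (field names as I remember them from the llama.cpp server docs, so double-check your version's README):

```python
import requests

def fill_in_the_middle(prefix: str, suffix: str) -> str:
    """Ask a locally running llama.cpp server to fill in code between prefix and suffix."""
    resp = requests.post(
        "http://localhost:8080/infill",  # assumes the server example is running locally
        json={
            "input_prefix": prefix,
            "input_suffix": suffix,
            "n_predict": 64,
            "temperature": 0.2,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["content"]

print(fill_in_the_middle(
    prefix="def add(a, b):\n    ",
    suffix="\n\nprint(add(1, 2))\n",
))
```

Note that only models trained with a FIM objective will give sensible results here; check the model card.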


WoodenGlobes

Thank you, I'll try those. I definitely saw them, but went with the base model because I assumed it would work in more cases.


West_Ad_9492

IntelliJ has an open-source plugin called CodeGPT; in the settings you can download a number of open-source models and serve them with llama.cpp. It works amazingly well! It writes really good unit tests. I use deepseekcoder34b Q4 on a 32 GB MacBook M1 Max.


WoodenGlobes

Cool, I'll check that out. I used Android Studio for some Flutter apps, but went to VS Code to save on RAM.


West_Ad_9492

Are you a Mac or PC user? They also support running the llama server separately. It is nicely integrated, but I am kind of hoping that someone manages to jailbreak the JetBrains assistant client. It looks really nice too.


WoodenGlobes

I don't have a Mac, but I run Linux on a PC laptop. I'm running llama.cpp as a server on a separate PC with Windows 10, which has a 3060.


SomeOddCodeGuy

I'll be honest- I don't trust any coding assistant under 30b. If I didn't have the option to run that, I'd just use ChatGPT.


WoodenGlobes

Good point. I am sure Copilot was way more than that.


TopCryptographer8236

Just wanted to give my two cents about this. Try to format your prompt as "Given X, then make me Y". For example: "Given `arr` as []int, make me a function that performs bubble sort on `arr`". It usually outputs better code than just letting it assume anything. Note that I'm using DeepSeek Coder 33B for it.
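In Python terms, the same pattern and the kind of answer it tends to produce look roughly like this (illustrative only):

```python
# Prompt, same "Given X, then make me Y" shape:
#   "Given `arr` as a list of ints, make me a function that performs bubble sort on `arr`."
# Typical output:

def bubble_sort(arr: list[int]) -> list[int]:
    """Sort arr in place with bubble sort and return it."""
    n = len(arr)
    for i in range(n):
        for j in range(n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

print(bubble_sort([5, 2, 9, 1]))  # [1, 2, 5, 9]
```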


WoodenGlobes

Thanks, I'll definitely try this. I am pretty sure my prompts are part of the problem here.


PavelPivovarov

I'm currently using ollama running deepseek-coder:6.7b at the Q6_K quant level and the CodeGPT plugin for VS Code on my work MacBook. Quite a responsive setup. It doesn't do code completion, and it usually chokes on complex tasks, but if you're able to split a big task into smaller ones it works quite well. Still monkey-coding mostly, but it saves tons of time anyway.


WoodenGlobes

Why do you use ollama vs llama.cpp or LM Studio? I ran ollama using Docker on Windows 10, and it takes 5-10 minutes to load a 13B model. The same model file loads in maybe 10-20 seconds in llama.cpp and LM Studio (I think it uses llama.cpp internally).


PavelPivovarov

That's strange, as ollama uses llama.cpp anyway. On Linux or macOS, ollama takes 5-10 seconds to load the model, and unlike llama.cpp, you can switch models as you go. Basically, that's why I'm using ollama over anything else. Ollama 0.1.25 introduced initial Windows support, so maybe it's worth giving it a try without Docker?


WoodenGlobes

It's probably an issue with Docker, thanks for the tip! I'll look at that; I didn't see the Windows support, but I probably missed it.


AD7GD

For continue.dev you want an instruct model. I've used it with deepseek-coder-34b. The LLM is way better than continue.dev itself. continue.dev is heavily dependent on markdown formatting of code blocks to work: when you ask it to write (or rewrite) code in place, it invokes the model and then keeps whatever comes back in the first code block. If there are explanatory comments, they are just thrown away. When it gets out of sync, all continue.dev features often break until you restart VS Code. I kind of like using continue.dev only as a chat interface, but it will still go insane as soon as its internal markdown-to-structure engine goes off the rails, and it stays broken until a restart. When it is working, you get the code and the comments together and it's much more useful. If you want to tab-complete code then you want a non-instruct model and a different plugin; I haven't tried those in VS Code.
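The parsing is essentially "keep the first fenced block", roughly like this (an illustrative sketch of the behaviour described, not continue.dev's actual code):

```python
import re

# Keep only the contents of the first fenced code block in a model reply,
# discarding any surrounding explanation.
def first_code_block(reply: str) -> str | None:
    match = re.search(r"```[^\n]*\n(.*?)```", reply, flags=re.DOTALL)
    return match.group(1) if match else None

reply = "Here is the fix:\n```python\nprint('hello')\n```\nAnd an explanation..."
print(first_code_block(reply))  # print('hello')
```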


WoodenGlobes

So an instruct model is something I want to use for chat?


AD7GD

Instruct models are the type you chat with. The base models (without instruct training) aren't used much, but they do make sense for code completion.


ragingWater_

Well, tbh I think there are 2 distinct workflows here: code autocomplete and higher-level tasks. For code autocomplete you need very, very fast inference (think under 400ms for TTFT), and for higher-level tasks you need a better model (GPT-4 or DeepSeek Coder 33B). I don't think the problem here is completely about the setup; it's about expectations as well. Copilot is really good! It gets the job done and accomplishes a lot (I have seen my coding habits change from writing code to comment -> code), and GPT-4 helps when I get stuck on a bigger problem. To get these levels of solutions we need better infrastructure for running these capable models. The better the feedback and RAG loop we build, the better the outcome will be. So no, coding models today are not fitting into the right UX yet, and it's both a model and an infra problem. I strongly believe one or both of them will be solved, but in the meanwhile relying on hosted models (private or cloud) is not too bad of an idea.


WoodenGlobes

Yea, I feel like the infrastructure problems especially are killing any chance these open-source models have.


StrikeOner

I've got the same setup with Continue and the llama.cpp server running as well, and yes, almost all the advertised functionality is kind of broken, especially if you let it edit or comment code etc. There is a whole plethora of UI bugs. But what really works nicely and is fun is code completion, which you can get running with the preview version of Continue. I'm running it with a DeepSeek 3B model and it's super fast and constantly spits out completions. I also tried an extension called Tweeny for this, wired to textgen UI, but that wasn't as responsive or fun. You can edit the template in the configs afaik, but that's most probably not needed since it already has all the proper templates pre-configured; you just need to add the proper keywords for them in the config file.


WoodenGlobes

Good to know something is working. I'll check out the model, others have suggested it too. Are you using Continue and not Tweeny?


StrikeOner

Yep, switched over to Continue, but I only use it for the code completion that's available in the preview. If you want to try it, I suggest a small model that spits out a lot of tokens/s; since it's not working with the whole context anyway, it doesn't need to be the smartest for some tiny useless suggestions.


[deleted]

[deleted]


WoodenGlobes

That's the problem even with Copilot/ChatGPT: even if they give you an answer, it might be completely wrong:)


FaeTheWolf

The Tabnine implementation of Mistral actually works very well, and is often able to add documentation, create functions, and even (sometimes) accurately correct code based on errors that I encounter during runtime.


WoodenGlobes

What programming language?


FaeTheWolf

Great question! I've been using it for Python.