<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>ML LLM Dev</title>
<link rel="stylesheet" href="style.css">
</head>
<body>
<!-- Sidebar is loaded dynamically -->
<div id="sidebar"></div>
<div id="content">
<h1>ML LLM Dev: Links and Notes on Resources of Interest</h1>
<h2>Models - open source, open weights, open thoughts, code, documentation</h2>
<p>llama.cpp<br>
Inference of Meta's LLaMA model (and others) in pure C/C++<br>
<a href="https://github.com/ggerganov/llama.cpp">https://github.com/ggerganov/llama.cpp</a></p>
<p>DeepSeek R1<br>
Unsloth <a href="https://unsloth.ai/blog/deepseekr1-dynamic">dynamic</a>
HuggingFace <a href="https://huggingface.co/collections/unsloth/deepseek-r1-all-versions-678e1c48f5d2fce87892ace5">quants, incl distillations</a></p>
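<p>The "dynamic" quants linked above compress weights block by block. A toy sketch of the underlying idea - block-wise 4-bit quantization, where each block stores one float scale plus small signed integers - is below. This is hypothetical simplified code; real GGUF formats (Q4_K_M etc.) are considerably more elaborate.</p>

```python
# Toy sketch of block-wise 4-bit quantization, the core idea behind
# GGUF "quants": each block of weights stores one float scale plus
# small signed integers, instead of full 32-bit floats.

def quantize_block(weights, bits=4):
    """Map a block of floats to signed ints in [-qmax, qmax] plus a scale."""
    qmax = 2 ** (bits - 1) - 1              # 7 for 4-bit signed
    scale = max(abs(w) for w in weights) / qmax or 1.0
    return scale, [round(w / scale) for w in weights]

def dequantize_block(scale, quants):
    """Reconstruct approximate floats from the scale and the ints."""
    return [scale * q for q in quants]

block = [0.12, -0.5, 0.33, 0.07]
scale, quants = quantize_block(block)
approx = dequantize_block(scale, quants)
# Every reconstructed value lands within half a quantization step of
# the original, at a fraction of the storage (4 bits vs 32, plus one scale).
```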
<p>Meta Llama models <a href="https://www.llama.com/">https://www.llama.com/</a><br>
Meta Llama-3.3-70B-Instruct Hugging Face <a href="https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct">https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct</a></p>
<p>Ollama<br>
Get up and running with large language models.<br>
<a href="https://ollama.com/">https://ollama.com/</a></p>
<p>llm.c<br>
LLMs in simple, pure C/CUDA with no need for 245MB of PyTorch or 107MB of cPython. Current focus is on pretraining, in particular reproducing the GPT-2 and GPT-3 miniseries, along with a parallel PyTorch reference implementation in train_gpt2.py.<br>
<a href="https://github.com/karpathy/llm.c">https://github.com/karpathy/llm.c</a></p>
<p>LLM<br>
A CLI utility and Python library for interacting with Large Language Models, both via remote APIs and models that can be installed and run on your own machine.<br>
<a href="https://llm.datasette.io/en/stable/">https://llm.datasette.io/en/stable/</a></p>
<p>Hugging Face Models<br>
<a href="https://huggingface.co/models">https://huggingface.co/models</a></p>
<p>Mistral AI <a href="https://mistral.ai/">https://mistral.ai/</a>, Hugging Face <a href="https://huggingface.co/mistralai">https://huggingface.co/mistralai</a></p>
<p>QwQ-32B-Preview blog <a href="https://qwenlm.github.io/blog/qwq-32b-preview/">https://qwenlm.github.io/blog/qwq-32b-preview/</a>, Hugging Face <a href="https://huggingface.co/Qwen/QwQ-32B-Preview">https://huggingface.co/Qwen/QwQ-32B-Preview</a>, github Qwen2.5 <a href="https://github.com/QwenLM/Qwen2.5">https://github.com/QwenLM/Qwen2.5</a></p>
<p>QVQ-72B-Preview Hugging Face <a href="https://huggingface.co/Qwen/QVQ-72B-Preview">https://huggingface.co/Qwen/QVQ-72B-Preview</a></p>
<p>DeepSeek-V3 github <a href="https://github.com/deepseek-ai/DeepSeek-V3">https://github.com/deepseek-ai/DeepSeek-V3</a>, Hugging Face <a href="https://huggingface.co/deepseek-ai/DeepSeek-V3">https://huggingface.co/deepseek-ai/DeepSeek-V3</a></p>
<p>Reddit LocalLLaMA<br>
<a href="https://www.reddit.com/r/LocalLLaMA/">https://www.reddit.com/r/LocalLLaMA/</a></p>
<p>llama.cpp guide - Running LLMs locally, on any hardware, from scratch <a href="https://blog.steelph0enix.dev/posts/llama-cpp-guide/">https://blog.steelph0enix.dev/posts/llama-cpp-guide/</a></p>
<p>ModernBERT<br>
This is the repository where you can find ModernBERT, our experiments to bring BERT into modernity via both architecture changes and scaling.<br>
<a href="https://github.com/AnswerDotAI/ModernBERT">https://github.com/AnswerDotAI/ModernBERT</a></p>
<p>WordLlama <a href="https://github.com/dleemiller/WordLlama">https://github.com/dleemiller/WordLlama</a></p>
<p>Microsoft AI - AI Platform Blog <a href="https://techcommunity.microsoft.com/category/ai/blog/aiplatformblog">https://techcommunity.microsoft.com/category/ai/blog/aiplatformblog</a>, <a href="https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090">Introducing Phi-4</a></p>
<p>Chatbot Arena (formerly LMSYS): Free AI Chat to Compare &amp; Test Best AI Chatbots <a href="https://lmarena.ai/">https://lmarena.ai/</a></p>
<p>Scaling Test Time Compute with Open Models <a href="https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute">https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute</a></p>
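<p>The simplest form of the test-time scaling described in that post is best-of-N sampling: draw N candidate answers and keep the one a verifier scores highest. A minimal self-contained sketch, where the "model" and "verifier" are stand-in toy functions rather than real LLM calls:</p>

```python
# Minimal sketch of best-of-N sampling, the simplest way to scale
# test-time compute: sample N candidates, score each, keep the best.
import random

def best_of_n(sample, score, n=8, seed=0):
    """Draw n candidates with `sample` and return the highest-scoring one."""
    rng = random.Random(seed)
    return max((sample(rng) for _ in range(n)), key=score)

# Stand-in task: "guess sqrt(2)". More samples -> a better best answer.
answer = best_of_n(
    sample=lambda rng: rng.uniform(1.0, 2.0),   # the "model"
    score=lambda x: -abs(x * x - 2.0),          # the "verifier"
    n=64,
)
```

<p>With a real model, <code>sample</code> would be a generation call at non-zero temperature and <code>score</code> a reward or verifier model.</p>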
<p>The Complexity Dynamics of Grokking <a href="https://brantondemoss.com/research/grokking/">https://brantondemoss.com/research/grokking/</a></p>
<h2>Dev, LLM code writing</h2>
<h3>Update #2</h3>
<p>
The current workflow is:<br><br>
<pre>
# An Architect model is asked to describe how to solve the coding problem. Thinking/reasoning models often work well in this role.
# An Editor model is given the Architect's solution and asked to produce specific code editing instructions to apply those changes to existing source files.
# https://aider.chat/2025/01/24/r1-sonnet.html
aider-openrouter-best() {
    local -; set -x; export AIDER_START="$(date)";
    aider --architect --model openrouter/deepseek/deepseek-r1 --editor-model openrouter/anthropic/claude-3.5-sonnet;
}
</pre><br>
At the moment I'm waiting on an API glitch to resolve -<br><br>
<pre>
architect> litellm.APIError: APIError: OpenrouterException -
Retrying in 0.2 seconds...
litellm.APIError: APIError: OpenrouterException -
</pre><br>
...and so I'm realising that, more often than not, I now have it write the code for me.
</p>
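<p>The Architect/Editor split that those aider flags automate can be sketched in plain code. Both "models" below are hypothetical stubs standing in for API calls (to e.g. DeepSeek R1 and Claude 3.5 Sonnet); the point is only the shape of the two-stage pipeline.</p>

```python
# Sketch of the two-stage Architect/Editor flow: one model plans,
# a second turns the plan into concrete edits. Both are toy stubs.

def architect_model(task: str) -> str:
    # A reasoning model would describe HOW to solve the problem.
    return f"Plan: guard the call with a None check. Task: {task}"

def editor_model(plan: str, source: str) -> str:
    # An editor model would turn the plan into specific file edits.
    if "None check" in plan:
        return source.replace("use(ptr)", "if ptr is not None: use(ptr)")
    return source

task = "use(ptr) crashes when ptr is None"
patched = editor_model(architect_model(task), "use(ptr)")
# patched is now the guarded call
```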
<p>
It's not even that much faster at the moment, to be honest! By the time I have thought it through and explained it in detail in INSTRUCTIONS.md, I could have read the sources and the docs and done it myself.
</p>
<p>
The only explanation I can offer - one that only occurred to me just now, while waiting for the OpenRouter API to come back - is: it's <strong>much more fun</strong>!! 😍
</p>
<p>
It's much more fun to have someone else write the code, even if I sometimes have to talk them round with "no, no - not that way; change this, change that", than to do everything myself, solo and in silence!! 😆
</p>
<p>
OK - this I did not expect. 😛 That the most entertaining option wins. 🙃
</p>
<p>
Is <a href="#" onclick="toggleShowImage('vibe-coding-ftw-2025')">vibing</a> the way code writing will scale 10x, 100x next??
</p>
<img id="vibe-coding-ftw-2025" src="picmem/vibe-coding-ftw-2025.png" style="display: none; width: 100%; height: auto;" onclick="zoomImage(this)">
<h3>Update #1</h3>
<p>
1. Started with ChatGPT copy&amp;pasta - works, but limited and manual; little time saved.<br>
2. On to Cursor - nice, but not much gained (not even wrong).<br>
3. Over to the aider command line - some results there, even if poor ones... but it looked like it could be improved?<br>
4. Currently: VSCode GUI + the Cline addon + OpenRouter pay-as-you-go credits + a Claude model. Well hello!! Finally produced something not obviously wrong.
</p>
<p>
Until today the best I got was: in ChatGPT (GPT-4o, o1, etc.), copy &amp; paste code snippet(s), ask a question, then incorporate the answer into the solution. So this was a replacement for 1) googling and reading web pages, and 2) searching through Stack Overflow Q&amp;A.
</p>
<p>
This is the first time I got code inserted into 3 files. That required the AI to 1) read through 5-6 files, 2) compare and contrast, reasoning by analogy, 3) take my stated requirement into consideration, and 4) edit 3 files, deleting some code and inserting other code.
</p>
<p>
My main codebase is about 200K LoC, mostly in an array/matrix language, with some C/C++/bash/awk/SQL too.
</p>
<p>
I'm agnostic re: tools. The always-available fallback is bash/vim/Makefile/gcc/g++/gdb/ddd/shell tools. But if an IDE like VSCode/Spyder/CLion/Matlab/DBeaver is available, I'm happy to use it - as long as it's not exclusive and one can edit and set things up outside the IDE too. Especially important is version control - git now, previously hg, cvs, Teams. If that works, then all is good.
</p>
<p>
I tried Cursor. That looked hopeful, but did not get me results. I didn't like not being able to use my existing API subscriptions in it, or that it routes through some kind of undocumented in-house LLM bodge. (I may be wrong, and it may be possible - I didn't try too hard.)
</p>
<p>
I then tried aider, a command-line tool. That managed multiple edits, but with not very good results - a waste of time results-wise, but a good learning curve for me. Along the way I subscribed pay-as-you-go: OpenAI -> DeepSeek -> OpenRouter.
</p>
<p>
The OpenRouter leaderboard led me to the Cline VSCode addon. The latest-greatest setup at the moment: 1) VSCode, 2) with the Cline addon, 3) an OpenRouter API key (pay-as-you-go credits), 4) Claude 3.5 selected as openrouter/anthropic/claude-3.5-sonnet.
</p>
<p>
The dev task was as follows. Functionality A/B/C needs implementing. Look at existing wrapper X, which implements A/B/C while using external library Y for A/B. Create a new wrapper U that uses external library W the same way X uses Y, to do the similar A/B. (C is done in X and U respectively.) E.g. see how the data is passed X-to-Y, then do it the same way U-to-W; look at the example code in the W library to figure out how to do A/B.
</p>
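<p>The wrapper-by-analogy pattern above can be illustrated with toy code. All names (X, Y, U, W) are the placeholders from the description, not real libraries; the point is that the new wrapper mirrors the old one's shape while delegating to a different API.</p>

```python
# Toy illustration of the wrapper-by-analogy task: wrapper X feeds
# data to library Y; the new wrapper U feeds library W the same way.

class LibY:                      # existing external library
    def load(self, rows):
        return {"y_data": list(rows)}

class LibW:                      # new external library, different API
    def ingest(self, rows):
        return {"w_data": tuple(rows)}

class WrapperX:                  # existing wrapper: the pattern to copy
    def __init__(self):
        self.lib = LibY()
    def run(self, rows):
        return self.lib.load(rows)       # A/B delegated to Y

class WrapperU:                  # new wrapper: same shape, library W
    def __init__(self):
        self.lib = LibW()
    def run(self, rows):
        return self.lib.ingest(rows)     # A/B delegated to W, by analogy
```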
<p>
This was to avoid reading up on W and figuring out A/B myself. I can do it myself - I have done it half a dozen times already for U/W equivalents - but it's a bit boring, and I wanted to find out whether I could make the AI do it for me.
</p>
<p>
I have yet to finish the full loop; the code does not run yet. But where before the output was laughably, obviously wrong, this is the first time the code looks plausible. I still need a test harness to verify it. To be continued.
</p>
<p><br><br>-- <br>LJ HPD Sun 22 Dec 22:24:19 GMT 2024</p>
</div>
<!-- Link to the external script -->
<script src="scripts.js"></script>
<!--Load the sidebar html that is table of contents -->
<script>loadSidebar();</script>
</body>
</html>