Skip to content
Device Family
Device
OS

Local LLM Coding with VSCode and Qwen3-Coder

Use VS Code with locally-running Qwen3-Coder for private code assistance.

Local LLM Coding with VSCode and Qwen3-Coder

Overview

Coding agents are powerful tools that empower developers through collaboration with AI agents backed by Large Language Models (LLMs). They can be embedded into the development environment, such as the terminal or VS Code, allowing seamless integration into a developer’s workflow.

This tutorial demonstrates how to use Cline, VS Code, and LM Studio to run a coding agent entirely on your local machine.

What You’ll Learn

  • How to run VS Code with the Cline coding agent to aid in software engineering tasks.
  • How to configure Cline to communicate with LM Studio for local inference of coding agents.
  • How to use local coding agents to solve real-world software engineering tasks.

Setting the Memory Configuration

For the Ryzen AI Halo, the dedicated GPU memory defaults to 64GB, which is sufficient for most workloads. For larger models or longer contexts, increasing this to 96GB may help. To adjust, open AMD Software: Adrenalin Edition™ and navigate to Performance → Tuning → AMD Variable Graphics Memory. Reboot for the changes to take effect.

AMD Software Adrenalin Edition — AMD Variable Graphics Memory panel

To change the dedicated GPU memory value, open AMD Software: Adrenalin Edition™ and navigate to Performance → Tuning → AMD Variable Graphics Memory. Reboot for the changes to take effect.

AMD Software Adrenalin Edition — AMD Variable Graphics Memory panel

On Linux, to run larger models, increase the shared memory pool available to the GPU. This might involve setting the BIOS dedicated GPU memory to the minimum, so that the shared memory pool can be maximized.

For the AMD Ryzen™ AI Halo, the default is 96GB shared. To modify this, open the AMD Ryzen™ AI Developer Center and go to the Settings tab. Under Graphics Performance Settings, increase the Shared Video Memory slider, then click Apply Changes and reboot for the changes to take effect.

AMD Ryzen AI Developer Center — Graphics Performance Settings with Shared Video Memory slider

Increase the shared memory pool by changing the kernel’s Translation Table Manager (TTM) page setting. AMD recommends setting the minimum dedicated VRAM in the BIOS (0.5 GB) so the maximum amount is available as shared memory.

  1. Install the pipx utility and add the path for pipx-installed wheels to the system search path:
Terminal window
sudo apt install pipx
pipx ensurepath
  1. Install the amd-debug-tools wheel from PyPI:
Terminal window
pipx install amd-debug-tools
  1. Query the current shared memory settings:
Terminal window
amd-ttm
  1. Increase the shared memory allocation (units in GB):
Terminal window
amd-ttm --set <NUM>
  1. Reboot for the changes to take effect.

Check for Software Updates

Before starting, ensure your Ryzen AI Halo has the latest software installed. Open the AMD Ryzen™ AI Developer Center and check for available updates, both to the app itself and additional software.

Go to the Updates tab. If updates are available, install them and reboot before continuing.

AMD Ryzen AI Developer Center — Updates tab on Windows

Go to the Manage tab. If updates are available, install them and reboot before continuing.

AMD Ryzen AI Developer Center — Manage tab on Linux

Installing Software Prerequisites

LM Studio

LM Studio can be installed from the AMD Ryzen™ AI Developer Center. Go to the Updates tab and install LM Studio if it is not already present.

To allow LM Studio to see the pre-installed models, navigate to Settings > General > Models Directory. Then change the path to C:\Users\Public\models

Adding pre-installed models to LM Studio

  1. Download the installer from here: https://lmstudio.ai/download
  2. Install.
  1. Download the appimage from here: https://lmstudio.ai/download?os=linux
  2. run sudo apt install libfuse2
  3. run cd ~/Downloads
  4. run chmod +x LM-Studio-*.AppImage
  5. run ./LM-Studio-*.AppImage

To allow LM Studio to see the pre-installed models, navigate to Settings > General > Models Directory. Then change the path to /var/cache/models.

Adding pre-installed models to LM Studio s


Visual Studio Code

VS Code can be installed from the AMD Ryzen™ AI Developer Center. Go to the Updates tab and install VS Code if it is not already present.

VS Code can be installed from the AMD Ryzen™ AI Developer Center. Go to the Manage tab and install VS Code if it is not already present.

  1. Download the Windows installation executable from: https://update.code.visualstudio.com/1.108.2/win32-x64-user/stable.
  2. Click on the downloaded file VSCodeUserSetup-x64-1.108.2.exe to install VS Code.
  1. Download the Debian installation package from: https://update.code.visualstudio.com/1.108.2/linux-deb-x64/stable.
  2. Click on the downloaded file code_1.108.2-1769004815_amd64.deb to install VS Code.

Launch and Configure LM Studio

We will use LM Studio to serve the LLM powering the coding agent.

  • In the search bar, search for LM Studio and launch the application. You will be greeted by the following page.

LM Studio Initial Screen

Next, we must load the LLM on the system. We are going to use the Qwen3-Coder-30B-A3B model with a large context length.

  • Click on the search bar on the top of the LM Studio window or press CTRL+L. Click the switch Manually choose model load parameters and then click on the Qwen3-Coder-30B-A3B model.
  • Change the context length from 4096 to 32768, and make sure GPU Offload is at the max. Then, click Load Model

Selecting Model

We use a large context length so that the agent can process large codebases and remember changes that have been made.

Configuring Model

Next, we need to enable the LM Studio Server.

  • Click the Developer tab or press CTRL+2 in LM Studio on the left.
  • Check the status toggle and ensure it is set to Running.

Server Status

Launch and Configure VS Code

We will install the Cline Extension in VS Code and connect it to the LM Studio server we just made.

  • In the search bar, search for VS Code and launch the application.
  • Click on the Extensions icon on the left column of VS Code and search for Cline. Then, click the Install button.

Installing Cline Extension

  • A Cline icon should be present on the left. Click on that to open Cline. There will be a window asking How will you use Cline? As we are going to be using a local LLM running via LM Studio, select Bring my own API Key and hit Continue.

Account Creation

Next, we need to configure Cline to communicate with the LM Studio server that we set up.

  • Set the API Provider to LM Studio and the model to Qwen3-Coder-30B-A3B-GGUF.

Model Configuration

Creating your first project

Let’s use our local agent to create a website! Open VSCode to a directory of your choice where Cline will create the files.

  • To do this, go to File -> Open Folder on the top-left of VS Code and choose a folder like Documents.

VS Code Empty Folder

Now we are ready to prompt the local coding agent.

  • Click on the Cline extension on the left column and enter a prompt to kickoff the agent. As an example, let’s use the following prompt:
Create a website showcasing the ability to run local large-language models on an AMD device.

The agent will then start to create files according to the prompt. As a user, you can watch the code be generated in VS Code as shown below. You may have to click Save each time Cline wants to create a file.

Cline Code Generation

After generating the software, the agent is complete and you can run the application. In this case, the agent wrote to three files: index.html, script.js, and styles.css. By simply double clicking on the HTML file we can load and interact with the generated website.

Next Steps

After generating the website, you can continue to work with Cline to improve the website. Two possible improvements are:

  • Documentation: Prompting the agent with Add a README is all that is needed for the agent to generate a README.md file that documents the website.
  • Animation: Prompt the model with Add an animation that visually represents a large language model running on a laptop. to generate an animation to the website.

We encourage the reader to try to generate other applications using this setup. Below are some fun examples we have tried:

  • Retro Arcade Games: Try some other prompts. It can also be fun for the agent to create retro-style games in Python using the PyGame package with the following prompt:
Create a simple pong game using the PyGame python package.
  • Data Analysis: One area where coding agents are particularly useful is that of scripting and data analysis. This is a prompt to showcase the local model’s ability to generate data analysis software for stock price visualization:
Write a Python script that fetches daily price data for AMD (ticker: AMD) from an online API (use the yfinance library so no API key is needed). Loads the last 365 calendar days of data into a Pandas DataFrame. Computes 20-day and 50-day simple moving averages of the closing price. Store the data in a sqlite database and when the script is first run check to see if the sqlite database contains the requested data, if not, fetch it from the API. Plots a single matplotlib line chart with: Close, SMA-20, and SMA-50. Include a title, axis labels, and a legend. Saves the figure to amd_price_sma.png in the current directory and prints the path when done. Allow the user to pass in command line arguments for the total time period of data, the time period for the simple moving average to calculate, as well as to provide different tickers.

Resources

Below are some additional resources to learn more about Coding Agents, Cline, and running workloads on

Need help with this playbook?

Run into an issue or have a question? Open a GitHub issue and our team will take a look.