  • FWIW: extra shit I cooked last night. It’s live now, so it deserves a PS of its own.

    PPS: I built in a spam blocker as well.

    • allow-list / deny-list domain filters
    • DDG-lite junk-domain blocklist
    • ad/tracker URL rejection
    • relevance gate before any provenance upgrade

    Enjoy :) Blurb below

    “But what if it just… Googled it?”

    We can do that. But better.

    You: Who won best picture at the 97th Academy Awards?
    
    Model: Anora won best picture at the 97th Academy Awards.
    See: https://www.wdsu.com/article/2025-oscars-biggest-moments/64003102
    Confidence: medium | Source: Web
    

    Without >>web, that same 4B model said “The Fabelmans.” Then when I pushed it, “Cannes Film Festival.” With web retrieval, the router searches the internet, scores every result deterministically (phrase match + token overlap + domain trust), and only accepts evidence that passes a hard threshold. Garbage results get rejected, not served. The model never touches the answer - it’s extracted straight from the evidence.
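    For illustration, deterministic scoring of that shape could look like this. The weights, threshold, and names here are hypothetical, not llama-conductor’s actual code:

```python
# Illustrative sketch: score each web result by phrase match, token
# overlap, and domain trust, then gate on a hard threshold. Anything
# below the threshold is rejected, not served.

TRUSTED_DOMAINS = {"wikipedia.org": 1.0, "pubmed.ncbi.nlm.nih.gov": 1.0}

def score_result(query: str, title: str, snippet: str, domain: str) -> float:
    text = f"{title} {snippet}".lower()
    q = query.lower()
    phrase = 1.0 if q in text else 0.0                     # exact phrase match
    q_tokens = set(q.split())
    overlap = len(q_tokens & set(text.split())) / max(len(q_tokens), 1)
    trust = TRUSTED_DOMAINS.get(domain, 0.5)               # unknown domains get neutral trust
    return 0.4 * phrase + 0.4 * overlap + 0.2 * trust

THRESHOLD = 0.55

def accept(query: str, title: str, snippet: str, domain: str) -> bool:
    return score_result(query, title, snippet, domain) >= THRESHOLD
```

    Same inputs, same score, every time - which is the whole point of doing this outside the model.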

    Retrieval cascade: Cheatsheets → Wiki → Web → Model. Each step fires only if the previous one missed. The model is last resort, not first call. Sound familiar?
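    The cascade itself is just fall-through in code. A minimal sketch with stub lanes (all names hypothetical):

```python
# Fall-through retrieval cascade: each lane returns (answer, source) or
# None; a later lane fires only if every earlier one missed. The model
# lane is last, so it only answers when nothing grounded did.

def lookup_cheatsheets(q): return None          # stub lanes for the sketch
def lookup_wiki(q): return None
def lookup_web(q): return ("Anora", "Web")
def ask_model(q): return ("model guess", "Model")

def route(query):
    for lane in (lookup_cheatsheets, lookup_wiki, lookup_web, ask_model):
        hit = lane(query)
        if hit is not None:
            return hit
```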

    I asked it who wrote a paper and it invented “Dr. David J. G. Smith” - a person who does not exist. After wiring a DOI/Crossref fallback for academic metadata:

    You: Who wrote The Anatomy of a Large-Scale Hypertextual Web Search Engine?
    
    Model: Sergey Brin and Lawrence Page.
    See: https://research.google/pubs/the-anatomy-of-a-large-scale-hypertextual-web-search-engine/
    Confidence: medium | Source: Web
    

    Deterministic extraction from metadata. No model synthesis.

    >>web is provider-agnostic - it ships with DuckDuckGo (no API key, no account) and supports Tavily, SearxNG, or your own adapter. Add your own trusted domains in one config line (a bunch are baked in already, like pubmed). Every answer comes with a See: URL so you can verify with one click. Receipts, not pinky promises. PS: I even cooked in allow-list / deny-list domain filters, a junk-domain blocklist, and ad/tracker URL rejection so your results don’t get fouled with low-quality spam shit.
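    A rough sketch of what a URL gate like that can look like. The patterns and defaults are illustrative, not the shipped blocklist:

```python
import re
from urllib.parse import urlparse

# Illustrative URL gate: reject tracker-laden URLs, apply a deny-list,
# and (when an allow-list is set) only pass allow-listed hosts.

TRACKER_RE = re.compile(r"[?&](utm_|fbclid=|gclid=)")

def url_allowed(url: str, allow: set = None, deny: set = frozenset()) -> bool:
    host = urlparse(url).netloc
    if TRACKER_RE.search(url):          # ad/tracker URL rejection
        return False
    if host in deny:                    # deny-list always wins
        return False
    if allow and host not in allow:     # allow-list, when set, is exclusive
        return False
    return True
```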


  • You’re describing the trust dynamics right, and that’s exactly why this project doesn’t ask you to trust the model. It asks you to trust observable outputs: provenance labels, deterministic lanes, fail-loud behaviour.

    When it fails, you can see exactly which layer failed and why. Then you can fix it yourself. That’s more than you get right now (and in part why LLMs are considered toxic).

    The correction mechanism is explicit rather than hoped for (“it learns” or “it earns my trust back”): you encode the fix via cheatsheets, memory, or lane contracts and it sticks permanently.

    The model can’t drift back to the wrong answer. That’s not the model earning trust back - it’s you patching the ground truth it reasons from. Progress is measured in artifacts, not vibes.

    Until someone makes better AI, that’s all we’ve got. Generally, we don’t get even this much.

    Sadly, AI isn’t “one mind learning”; it can’t. So trust is earned by shrinking failure classes and proving it stuck again and again and again (aka making sure the tool does what it should be doing).

    Whether that’s satisfying in the way a person earning trust back is satisfying - look, honestly, probably not. But it’s more auditable.

    LLMs aren’t people and I’m ok with meeting them where they are.


  • Nope.

    1. Source: Model is not pretending otherwise
      It is basically “priors lane.” That’s the point of the label: explicit uncertainty, not fake certainty.

    2. Source footer is harness-generated, not model-authored
      In this stack, footer normalization happens post-generation in Python. I’ve specifically hardened this because of earlier bleed cases. So the model does not get to self-award Wiki/Docs/Cheatsheets etc.

    3. Model lane is controlled, not roulette

    • deterministic-first routing where applicable
    • fail-loud behavior in grounded lanes
    • provenance downgrade when grounding didn’t actually occur
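    A minimal sketch of that post-generation footer normalization (the regex and names are hypothetical; the real hardening lives in the harness):

```python
import re

# Harness-side footer normalization: strip any footer the model tried to
# self-award, then append the footer the routing/provenance layer computed.

FOOTER_RE = re.compile(r"(?im)^\s*(confidence|source)\s*:.*$")

def normalize_footer(model_text: str, source: str, confidence: str) -> str:
    body = FOOTER_RE.sub("", model_text).strip()   # drop model-authored footers
    return f"{body}\nConfidence: {confidence} | Source: {source}"
```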

    So yes: Source: Model means “less trustworthy, verify me.” Always do that. Don’t trust the stochastic parrot.

    But also no: it’s not equivalent to a silent hallucination system pretending to be grounded. That’s exactly what the provenance layer is there to prevent.


  • Sure.

    Source means where the answer was grounded, not whether an LLM wrote the sentence.

    Quick split:

    • Source: Model
      No reliable grounding lane fired. It’s model priors.

    • Source: Context (Contextual)
      A deterministic lane fired and built a structured context for the turn (for example state/math carry-forward, bounded prior-turn facts, or a forced context frame), and the answer is expected to come from that frame.

    Key clarification:

    • Not all user input = Context.
    • User input becomes Context only when it is captured into a bounded deterministic frame/lane and used as grounding.
    • If user input is just normal chat and no grounding lane fires, that is still Model.

    Why this is more deterministic:

    • The routing decision is deterministic (same input pattern -> same lane).
    • The frame/evidence injected is deterministic (same extracted values -> same context block).
    • Wording can vary, but the answer is constrained to that frame.

    Concrete example:

    1. User: A jar has 12 marbles. I remove 3. How many left?
    2. Router hits deterministic state lane, computes 9, injects structured context.
    3. Assistant answers with Source: Context.

    If that lane doesn’t fire (or parse fails), it falls back to normal generation and you get Source: Model.
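    The marble example can be sketched like this (the regex and frame shape are illustrative, not the project’s actual lane):

```python
import re

# Deterministic state lane: parse the quantities, compute the answer in
# code, and emit a structured context frame. If parsing fails, return
# None and the turn falls back to plain generation (Source: Model).

PATTERN = re.compile(r"has (\d+) marbles?\. I remove (\d+)", re.I)

def state_lane(turn: str):
    m = PATTERN.search(turn)
    if not m:
        return None                                  # lane didn't fire
    start, removed = int(m.group(1)), int(m.group(2))
    return {"context": f"start={start} removed={removed} remaining={start - removed}",
            "source": "Context"}
```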

    So Context is not “perfect truth”; it means “grounded via deterministic context pipeline, not free priors.”

    I hope that clarifies. I can try a different way if not; my brain is inside the code so much sometimes I forget what’s obvious to me really isn’t obvious.




  • I genuinely don’t know. A small part of llama-conductor is a triple pass RAG system, using Qdrant, but the interesting bit is what sits on top of it. It’s a thinker/critic/thinker pipeline over RAG retrieval.

    • Step 1 (Thinker): Draft answer using only the retrieved FACTS_BLOCK
    • Step 2 (Critic): Check for overstatement, constraint violations
    • Step 3 (Thinker): Fix issues, output structured answer
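    The three steps above reduce to a small loop. A sketch with an injected model call - the prompts and names are illustrative, not the project’s actual ones:

```python
# Thinker/critic/thinker over a retrieved FACTS_BLOCK. call_llm is
# injected so any local model can serve the lane.

def answer_with_critic(call_llm, question: str, facts_block: str) -> str:
    # Step 1 (Thinker): draft strictly from the retrieved facts
    draft = call_llm(f"Answer using ONLY these facts:\n{facts_block}\nQ: {question}")
    # Step 2 (Critic): hunt overstatement and constraint violations
    issues = call_llm(f"List claims in this draft not supported by the facts:\n{draft}\nFACTS:\n{facts_block}")
    # Step 3 (Thinker): repair and emit the structured answer
    return call_llm(f"Fix these issues and restate the answer:\n{issues}\nDraft:\n{draft}")
```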

    I built it that way based on what the research shows works best for reducing hallucinations:

    • Let’s Verify Step by Step
    • Inverse Knowledge Search over Verifiable Reasoning

    To be honest, I have been looking at converting to CAG (Cache Augmented Generation) or GAG (Graph Augmented Generation). The issues are - GAG still has hops, and CAG eats VRAM fast. Technically, for a small, curated domain, CAG potentially outperforms RAG (because you eliminate the retrieval lottery entirely). But on a potato that VRAM ceiling arrives fast.

    OTOH, for a domain-specific knowledge base like you’re describing, CAG is worth serious evaluation.

    Needs more braining on my end.



  • Can’t it source other LLM outputs as “verified source” and thus still say whatever sounds good, like any LLM?

    No. The footer tells you what the source is. Anything the model generates on its own is confidence: unverified | source: model - explicitly flagged by default. To get to source: docs or source: scratchpad, it needs direct, traceable, human-originated provenance. You control what goes in. The FAQ outlines the sources and strength rankings; it’s not vibes.

    Providing “technical” verification, e.g. SHA, gives no insurance about the content itself being from a reputable source.

    SHA verifies the document hasn’t been altered since it entered your stack. Source quality is your call. GIGO is always an issue, but if you scope the source correctly it won’t drift. And if it does, you’ll know, because the footer tells you exactly where the answer came from.
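    The integrity check itself is a few lines (helper names are mine, not the project’s):

```python
import hashlib

# SHA-256 integrity check: the fingerprint proves the bytes haven't
# changed since ingestion. It says nothing about source quality - that
# part stays a human call.

def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, recorded: str) -> bool:
    return fingerprint(data) == recorded
```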

    The cheatsheet system is the clearest example of how this works in practice: you define terms once in a JSONL file, the model pegs its reasoning to your definition forever. It can’t revert to something you didn’t teach it. That fingerprint is over everything.
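    For illustration, a cheatsheet entry in that JSONL file might look like this - the field names are hypothetical, so check the repo for the real schema:

```
{"term": "CAG", "definition": "Cache Augmented Generation: preload the curated knowledge base instead of retrieving per query."}
{"term": "lane", "definition": "A deterministic routing path with its own output contract."}
```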

    … the user STILL has to verify that whatever is provided is coherent and a third party is actually a good source.

    Yes, deliberately. That’s a feature.

    Like I said, most LLM tools are trying to replace your thinking; this one isn’t. The human stays in the loop. The model’s limitations are visible. You decide what to trust. Maybe that’s enough, maybe it isn’t.

    EDIT: giant wall of text. See - https://codeberg.org/BobbyLLM/llama-conductor#some-problems-this-solves



  • Well, you know what they say - there’s no force quite like brute force :)

    But to reply in specific:

    [1] Decision tree + regex: correct, and intentional. The transparency is a feature not a bug. You can read the routing logic, audit it, and know exactly why a given turn went where it did. A fine-tuned routing model reintroduces the black box problem at the routing layer itself - and if it misclassifies, what catches it? You’ve pushed the problem one layer up, not solved it.

    [2] Deterministic-first doesn’t mean deterministic-only. Open-ended turns go to the model by design - I’m not trying to regex all language, just not use an LLM where a calculator or a SHA check works better. The model is still involved. Case in point - see the car wash test.

    [3] On edge cases - yep, and that’s what 8,764 benchmark runs were for. Failures are taxonomized and patchable at the routing layer without touching the model. If a rule fails, I can show the exact failure and patch it. Yeah, that’s going to be whack-a-mole for a while, but if a routing model fails, I’d need new training data and still may not know why. Models are inherently black box. Python code (as your robots have shown you) is the opposite.

    My way, I know where the fuck-up is and I can figure out a global-maximum solution myself, cheap and easy.

    [4] On the fine-tune suggestion: on a 4GB potato, rule updates are free and immediate. Retraining cycles are…not. Send money, we will buy a Strix or cloud GPU access :)

    [5] The hybrid direction is already on the roadmap! TLDR: Swarm handles ambiguous routing; deterministic lanes stay for bounded and high-stakes tasks. Hybrid control + learned judgment, with measurable gates before each promotion. That sequencing is deliberate.

    Slightly longer version of what that should look like:

    User turn

    → Classifier (labels intent)
    
    → Contradiction detector (user turn + last N turns)
    
    → Refusal/risk assessor (user turn + classifier label)
    
    → State tracker (full session summary from memory)
    
    → Synthesiser (user turn + all worker outputs as FACTS block)
    
    → Critic (hunts violations in synthesiser output)
    
    → Output or retry
    

    Each worker does one job with full attention on that task. The synthesiser gets verified ground truth, not raw conversation history.

    I’m looking at sub-1B classifiers (SmolLM2-135M/360M, maybe Gemma-3-270M) for fast-gate roles. Phi-4-mini is already part of the main show (used as the second-pass critic model). Unlike everyone else, I am optimizing for us in the GPU peasant class.

    That’s the fine-tune direction, anyway. Scoped, staged and only where the deterministic layer can’t do the job cleanly.

    PS: Say hello to the klankers and tell them to stop cloning my shit for their training. :)


  • Hmm. The post has swearing, a personal ASD disclosure, a Feynman quote, statistics, reference to Lawrence of Arabia and ends with “a meat popsicle wrote this,” with a link to a blog as proof and a scientific pre-print with almost 10,000 data points (with raw data and errata). If you have an LLM that can do that, kudos to you.

    If there are specific passages that pattern-match to LLM output for you, point them out and I’ll look.

    But “confident tone” and “LLM tone” aren’t the same thing - I’m just not apologetic about what the project does.

    The data is the data.

    I’m not going to alter the way I write to approximate Reddit Common.


  • Sure.

    It summarises short articles, translates between languages (LLM dependent), provides sentiment analysis, solves multi-step volume/overflow problems, detects positional bias in pairwise rankings, validates output behaviour across 8,764 benchmark runs designed to break things - premise reversals, theory-of-mind separation, evidence label discipline, retraction handling, contradiction adjudication, and hard refusal-floor checks where the only correct answer is “I don’t know” - manages deterministic memory without touching the model, adapts to tone and register, stores and recalls facts exactly, folds information you provide naturally into answers (with correct attribution provenance), pits two different model families against each other to catch hallucinations before the answer reaches you, OCRs, provides real-time currency and weather lookups, looks up Wikipedia and word etymology deterministically, reasons across multiple source documents simultaneously to find contradictions, verifies source provenance via SHA checksums, stops the model being a sycophant, condenses clinical note-taking, creates management plans, and tells you when it doesn’t know the answer instead of making something up.

    But yes, it summarises short articles.

    On a 4GB VRAM potato, no less.



  • Much obliged, but I need to push back a little here. “Prompt wrapper” isn’t quite right - a prompt wrapper is still asking the model to behave nicely.

    This isn’t that. This is more like holding a gun to its head.

    Or less floridly (and more boringly technical), what the architecture actually does is force a ground state. The lane contracts define the admissible output space per task type. For negative-control tasks - prompts with deliberately insufficient evidence - the only contract-compliant output is an explicit refusal.

    Fabrication gets rejected by the harness. The model isn’t instructed to say “I don’t know”; it’s placed in a state where “I don’t know” is the only output that clears validation.

    The draft shows this directly: post-policy missing-lane closures hit 0/332 flags across contradiction and negative_control lanes combined. Pre-policy, the dominant failure mode in those lanes wasn’t confabulation - it was refusal-like phrasing that didn’t meet strict contract tokenization. The model was already trying to refuse; the contract hardening just closed the gap between intent and valid output shape.

    The >>judge dual-ordering is a separate thing again - that’s algorithmic, not prompting. Both orderings run in code, verdicts are parsed strictly (A|B|TIE, fails loud otherwise), agreement margin is computed. The model doesn’t know it’s being run twice. Positional bias gets caught structurally, not by asking nicely.
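    A sketch of that dual-ordering mechanism - the prompts are illustrative, but the structure (both orders run in code, verdicts parsed strictly, fail loud otherwise) is the point:

```python
import re

# Dual-ordering judge: compare the two answers in both slot orders,
# parse each verdict strictly as A|B|TIE (anything else raises), map
# the swapped run back to the original order, and fall back to TIE
# when the orderings disagree - that's positional bias, caught
# structurally rather than by asking nicely.

VERDICT_RE = re.compile(r"^(A|B|TIE)$")

def parse_verdict(raw: str) -> str:
    v = raw.strip().upper()
    if not VERDICT_RE.match(v):
        raise ValueError(f"unparseable verdict: {raw!r}")   # fail loud
    return v

def judge(call_llm, answer_1: str, answer_2: str) -> str:
    v1 = parse_verdict(call_llm(f"Which is better?\nA: {answer_1}\nB: {answer_2}"))
    v2 = parse_verdict(call_llm(f"Which is better?\nA: {answer_2}\nB: {answer_1}"))
    v2_in_original_order = {"A": "B", "B": "A", "TIE": "TIE"}[v2]
    if v1 != v2_in_original_order:
        return "TIE"            # orderings disagree: no winner awarded
    return v1
```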

    So yes - it solves a lot but not everything. The bounded claims are in the paper too. But the mechanism isn’t wrapping, it’s constraint enforcement at the routing layer.

    PS: yes, it’s fully open source. AGPL-3.0 license. You can use it, fork it, modify it etc. What you can’t do is take it, close the source, and distribute or sell it without making your modifications available under the same license. Which means if you run it as a network service (i.e. a SaaS product built on it), you still have to share the source. That’s the bit that keeps corporations from quietly wrapping it in a product and giving nothing back. Theoretically, at least.






  • That’s exactly what I did. And in the course of doing that, I gathered almost 10,000 data points to prove it, showed my work and open sourced it. (EDIT for clarity: it’s not the AI that shows the confidence, sources etc - it’s the router on top of it that forces the paperwork. I wouldn’t trust an AI as far as I could throw it. But yes, the combined system shows its work).

    You don’t need to be a dev to understand what this does, which is kind of the point. I don’t consider myself a dev - I was just unusually pissed off at ShitGPT, but instead of complaining about it, I did something.

    Down-vote: dunno. Knee-jerk reaction to anything AI? It’s a known thing. Ironically, the thing I built is exactly against AI slop shit.

    To say I dislike ChatGPT would be to undersell it.