> Previous work assumed that refusal behavior is encoded as a single direction in the model's latent space, e.g., computed as the difference between the centroids of harmful and harmless prompt representations. However, emerging evidence suggests that concepts in LLMs are often encoded as low-dimensional manifolds embedded in the high-dimensional latent space. Just as numbers and days of the week are encoded in circles or helices, in recent advanced neural networks like GPT-OSS refusals are becoming ingrained in complex multi-directional clusters, and ablating a single direction is not enough to get rid of the refusal reasoning.
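
For anyone who wants to see what the "single direction" idea in the quote amounts to, here is a minimal sketch in plain Python. It assumes each prompt has already been reduced to one representation vector. The function names (`mean_vector`, `refusal_direction`, `ablate`) and the toy 3-dimensional vectors are hypothetical, made up for illustration, and not taken from any particular library or paper.

```python
import math

def mean_vector(vectors):
    """Centroid of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def refusal_direction(harmful, harmless):
    """Unit vector pointing from the harmless centroid to the harmful centroid."""
    mu_harmful = mean_vector(harmful)
    mu_harmless = mean_vector(harmless)
    diff = [a - b for a, b in zip(mu_harmful, mu_harmless)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("centroids coincide, no direction to extract")
    return [x / norm for x in diff]

def ablate(vector, direction):
    """Remove the component of `vector` along the unit vector `direction`."""
    dot = sum(v * d for v, d in zip(vector, direction))
    return [v - dot * d for v, d in zip(vector, direction)]

# Toy data (hypothetical): two "harmful" and two "harmless" representations.
harmful = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
harmless = [[0.0, 1.0, 0.0], [0.0, 1.0, 2.0]]

r = refusal_direction(harmful, harmless)
ablated = ablate(harmful[0], r)

# After ablation the vector has no component left along r.
residual = sum(a * d for a, d in zip(ablated, r))
print(abs(residual) < 1e-9)
```

Note that `ablate` only removes the component along that one vector `r`. Anything orthogonal to `r` is left untouched, which is the quote's point: if refusal is spread over several directions, projecting out one of them leaves the rest in place.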
Samu /人◕ ‿‿ ◕人\
Mahou Shoujo Marsh-chan
Marsh shitposting on the go via telephone
what's genuinely kinda fucked is that, because of mending, diamonds are actually one of the least valuable resources in the game. Like yeah, you need some to get yourself kitted out, but I think once you've mined about 2 stacks of diamonds, actually a little less, you never have a reason to mine any more again unless you wanna make more enchanting tables or something. I guess there's armor trims and random dumb luck making you lose your gear, but overall you really just don't need diamonds all that much after the first set of gear has been made.
iron is CRAZY valuable, you need iron all the time, for everything. I don't even think mending was a mistake, but I think the way it works is a mistake.
It is, of course, an eyesore from a distance, and that does require some fixing. But that's a LOT of building blocks, and not made any less of a pain by the fact I do not have an elytra.