> Previous work assumed refusal behavior to be encoded as a single direction in the model's latent space; e.g., computed as the difference between the centroids of harmful and harmless prompt representations. However, emerging evidence suggests that concepts in LLMs often appear to be encoded as a low-dimensional manifold embedded in the high-dimensional latent space. Just like numbers and days of week are encoded in circles or helices, in recent advanced neural networks like GPT-OSS refusals are becoming ingrained in complex multi-directional clusters and one-directional ablation is not enough to get rid of the refusal reasoning.