LLM Security Research Papers, 2026 Edition (Master Index)
https://blog.csdn.net/WhiffeYF/article/details/159047894

This list is compiled from the DBLP WWW 2026 proceedings, selecting the papers concerned with the security, privacy, adversarial robustness, and defense of large language models (LLMs), reasoning models, agents, and multimodal LLMs. It covers 72 entries in total; some papers are cross-listed under more than one section.

LLM Jailbreaks and Adversarial Attacks

1. Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs
   Explores universal adversarial attacks by LLMs on text-attributed graphs, revealing the risk that LLMs can deceive graph learning. (Track 2: Graph Algorithms and Modeling for the Web)
2. LLMQuA: Practical Backdoor Injection on Large Language Model Quantization
   A practical backdoor-injection method targeting LLM quantization, showing that malicious behavior can be implanted during model compression. (Track 5: Security and Privacy)
3. Unveiling the Vulnerability of Graph-LLMs: An Interpretable Multi-Dimensional Adversarial Attack on TAGs
   Reveals multi-dimensional vulnerabilities of Graph-LLMs on text-attributed graphs and proposes an interpretable multi-dimensional adversarial attack framework. (Track 5: Security and Privacy)
4. Breaking Cross-modal Alignment in Embodied Intelligence: A Multimodal Adversarial Attack Framework for Vision-Language-Action Models
   A multimodal adversarial attack framework that breaks cross-modal alignment in vision-language-action models for embodied intelligence. (Track 5: Security and Privacy)
5. ICL-Evader: Zero-Query Black-Box Evasion Attacks on In-Context Learning and Their Defenses
   Studies zero-query black-box evasion attacks on in-context learning and discusses corresponding defenses. (Track 5: Security and Privacy)
6. Inference Cost Attacks for Retrieval-Augmented Large Language Models
   Inference-cost attacks on retrieval-augmented LLMs that craft malicious inputs to significantly inflate inference overhead. (Track 10: Web Mining and Content Analysis)
7. The Asymmetric Vulnerability: Bypassing LLM Defenses via Guardrail-Model Mismatch
   Reveals an asymmetric vulnerability between safety guardrails and the underlying model, exploiting the mismatch to bypass LLM defenses. (Track 5: Security and Privacy)
8. Exploring and Exploiting Security Vulnerabilities in Self-Hosted LLM Services
   Systematically explores and exploits security vulnerabilities in self-hosted LLM services, assessing the risks of locally deployed LLMs. (Track 5: Security and Privacy)
9. KEPo: Knowledge Evolution Poison on Graph-based Retrieval-Augmented Generation
   A knowledge-evolution poisoning attack on graph-based RAG that contaminates the knowledge graph to manipulate model outputs. (Track 4: Search and Retrieval-Augmented AI)
10. Has the Two-Decade-Old Prophecy Come True? Artificial Bad Intelligence Triggered by Merely a Single-Bit Flip in Large Language Models
   Shows that a single bit flip can trigger abnormal behavior in LLMs, validating the early prophecy that hardware faults can break AI. (Track 5: Security and Privacy)

Reasoning-LLM Jailbreak Attacks

1. When Reasoning Leaks Membership: Membership Inference Attack on Black-box Large Reasoning Models
   The first membership-inference attack on black-box large reasoning models, finding that the reasoning process leaks membership information about the training data. (Track 5: Security and Privacy)

LLM Security Defenses

1. PADD: Prefix-based Attention Divergence Detector for LLM Jailbreaks
   A prefix-based attention-divergence detector for identifying and defending against LLM jailbreak prompts. (Track 5: Security and Privacy)
2. FraudShield: Knowledge Graph Empowered Defense for LLMs against Fraud Attacks
   A knowledge-graph-based defense framework that strengthens LLM resistance to fraud attacks. (Track 5: Security and Privacy)
3. Towards Robust Detection of Chinese Toxic Variants via Dynamic Knowledge Graph-LLM Reasoning
   Combines dynamic knowledge graphs with LLM reasoning for robust detection of Chinese toxic-content variants. (Track 5: Security and Privacy)
4. Acting Flatterers via LLMs Sycophancy: Combating Clickbait with LLMs Opposing-Stance Reasoning
   Reveals LLM sycophancy and uses opposing-stance reasoning to combat clickbait and misinformation. (Track 5: Security and Privacy)
5. BIND: A Bidirectionally Aligned Next-token Denoising Framework for Fast and Lightweight Deobfuscation of Harmful Web Text
   A bidirectionally aligned next-token denoising framework for fast, lightweight deobfuscation of harmful web text. (Track 3: Responsible Web)
6. Be Responsible in Your Answers! Monitoring Out-of-Domain Behaviors in Domain-Specific LLMs
   Monitors out-of-domain behaviors in domain-specific LLMs to keep their answers reliable within the intended domain. (Track 3: Responsible Web)
7. D-Models and E-Models: Diversity-Stability Trade-offs in the Sampling Behavior of Large Language Models
   Analyzes diversity-stability trade-offs in LLM sampling behavior and proposes the D-Model / E-Model taxonomy. (Track 3: Responsible Web)
8. Robust Fake News Detection using Large Language Models under Adversarial Sentiment Attacks
   Robust LLM-based fake-news detection under adversarial sentiment attacks. (Track 3: Responsible Web)
9. Read as You See: Guiding Unimodal LLMs for Low-Resource Explainable Harmful Meme Detection
   Guides unimodal LLMs toward explainable harmful-meme detection in low-resource settings. (Track 3: Responsible Web)
10. Med-R2: Crafting Trustworthy LLM Physicians via Retrieval and Reasoning of Evidence-Based Medicine
   Builds the trustworthy LLM-physician system Med-R2 through retrieval and reasoning over evidence-based medicine. (Track 4: Search and Retrieval-Augmented AI)
11. Rethinking the Hidden Risk of Reranking: Achieving Risk-aware Reranking with Information Gain for RAG with LLMs
   Re-examines the hidden risks of reranking in RAG systems and proposes risk-aware reranking based on information gain. (Track 4: Search and Retrieval-Augmented AI)
12. A Fact-Checking Framework with Denoising Evidence Retrieval and LLM-Based Debate Verification
   A fact-checking framework combining denoising evidence retrieval with LLM-based debate verification. (Track 4: Search and Retrieval-Augmented AI)
13. Conflict-Aware RAG: Multi-Stage Learning with Conflict Signals for Robust Retrieval-Augmented Generation
   A conflict-aware RAG framework that uses multi-stage learning with conflict signals to improve robustness. (Track 4: Search and Retrieval-Augmented AI)
14. IRAG: Robust Multimodal Retrieval-Augmented Generation via Hazard Separation
   A robust multimodal RAG framework built on a hazard-separation mechanism. (Track 4: Search and Retrieval-Augmented AI)
15. PaperAsk: A Benchmark for Reliability Evaluation of LLMs in Paper Search and Reading
   A benchmark for systematically evaluating LLM reliability in academic paper search and reading. (Track 4: Search and Retrieval-Augmented AI)
16. SeaRAG: Reducing Hallucination in Retrieval-Augmented Generation via Statement-Entity Adaptive Ranking
   Mitigates hallucination in RAG via statement-entity adaptive ranking. (Track 4: Search and Retrieval-Augmented AI)
17. A Graph Foundation Model for Unified Anomaly Detection
   A graph foundation model that unifies anomaly detection across diverse scenarios. (Track 2: Graph Algorithms and Modeling for the Web)
18. Can Multimodal LLMs Perform Time Series Anomaly Detection?
   The first systematic evaluation of multimodal LLMs on time-series anomaly detection. (Track 8: Systems and Infrastructure for Web, Mobile, and Web of Things)
19. Smart Eye: LLM-Guided Proposer-Verifier Framework for Industrial-Scale Log Anomaly Detection
   An LLM-guided proposer-verifier framework for industrial-scale log anomaly detection. (Industry Track)
20. Cascaded Verification Framework: A Progressive Approach for Mitigating Hallucinations in Large Language Models
   A cascaded verification framework that progressively mitigates LLM hallucinations. (Short Papers)
21. PAMAS: Self-Adaptive Multi-Agent System with Perspective Aggregation for Misinformation Detection
   A self-adaptive multi-agent system that improves misinformation detection through perspective aggregation. (Track 10: Web Mining and Content Analysis)
22. Triple-R: Iterative Query Rewriting and Refinement for Retrieval-Augmented Fake News Detection
   Iterative query rewriting and refinement to strengthen retrieval-augmented fake-news detection. (Track 10: Web Mining and Content Analysis)
23. Prompt-Induced Linguistic Fingerprints for LLM-Generated Fake News Detection
   Identifies prompt-induced linguistic fingerprints and uses them to detect LLM-generated fake news. (Track 10: Web Mining and Content Analysis)
24. How Human Experts Educate Specialized LLMs: Filling Knowledge Gaps in KG-Augmented Generation through Hallucination Detection
   Explores how human experts fill knowledge gaps in KG-augmented generation via hallucination detection to educate specialized LLMs. (Track 6: Semantics and Knowledge)
25. Knowledge-Enhanced Multimodal Fake News Detection: Semantic Visual and Priority Fusion
   Knowledge-enhanced multimodal fake-news detection that fuses semantic visual information with priority features. (Track 6: Semantics and Knowledge)
26. CogAgent: Self-Evolving Cognitive Agents for Multi-Source Fraud Detection in Heterogeneous Financial Networks
   Self-evolving cognitive agents for multi-source fraud detection in heterogeneous financial networks. (Track 5: Security and Privacy)
27. TGNN: Enhancing Pixel Tracking Detection via LLM-driven Annotation and GAT-powered Structural Representation
   Improves pixel-tracking detection by combining LLM-driven annotation with GAT-based structural representations. (Track 5: Security and Privacy)
28. Bridging Expert Reasoning and LLM Detection: A Knowledge-Driven Framework for Malicious Packages
   A knowledge-driven framework that fuses expert reasoning with LLM detection to identify malicious software packages. (Track 6: Semantics and Knowledge)
29. Does LLM Focus on the Right Words? Mitigating Context Bias in LLM-based Recommenders
   Analyzes whether LLM-based recommenders attend to the right words and proposes context-bias mitigation. (Track 9: User Modeling, Personalization and Recommendation)
30. Unbiased Multimodal Reranking for Long-Tail Short-Video Search
   Unbiased multimodal reranking that improves fairness and effectiveness in long-tail short-video search. (Industry Track)
31. Digital Skin, Digital Bias: Uncovering Tone-Based Biases in LLMs and Emoji Embeddings
   Uncovers and quantifies skin-tone-based biases in LLMs and emoji embeddings. (Track 3: Responsible Web)
32. Unveiling the Resilience of LLM-Enhanced Search Engines against Black-Hat SEO Manipulation
   Evaluates the resilience of LLM-enhanced search engines against black-hat SEO manipulation. (Track 5: Security and Privacy)

Reasoning-LLM Security Defenses

1. Expectation-Guided Self-Verification for Aligning Large Reasoning Models with Domain Knowledge
   Expectation-guided self-verification that aligns large reasoning models with domain knowledge. (Track 6: Semantics and Knowledge)
2. Resisting Manipulative Bots in Meme Coin Copy Trading: A Multi-Agent Approach with Chain-of-Thought Reasoning
   A multi-agent approach with chain-of-thought reasoning for resisting manipulative bots in meme-coin copy trading. (Track 5: Security and Privacy)

LLM Privacy Protection

1. Reading Between the Lines: Towards Reliable Black-box LLM Fingerprinting via Zeroth-order Gradient Estimation
   Black-box LLM fingerprinting via zeroth-order gradient estimation for reliable model identification. (Track 5: Security and Privacy)
2. Reconstructing Training Data from Adapter-based Federated Large Language Models
   Studies the privacy risk of reconstructing training data from adapter-based federated LLMs. (Track 5: Security and Privacy)
3. When Reasoning Leaks Membership: Membership Inference Attack on Black-box Large Reasoning Models
   The first membership-inference attack on black-box large reasoning models, finding that the reasoning process leaks membership information about the training data. (Track 5: Security and Privacy)
4. Decoding Web Memorization: A Semantic Membership Inference Attack on LLMs
   A semantic membership-inference attack that exposes LLM memorization of web content. (Track 6: Semantics and Knowledge)
5. Towards Practical LLM Unlearning: Efficient, Modular, and Retain-Free
   A practical LLM unlearning method that is efficient, modular, and leaves retained knowledge intact. (Track 5: Security and Privacy)
6. DSSmoothing: Toward Certified Dataset Ownership Verification for Pre-trained Language Models via Dual-Space Smoothing
   Dual-space smoothing for certified dataset-ownership verification of pre-trained language models. (Track 5: Security and Privacy)
7. AWMA-MoE: Attention-Guided Watermark Adapter with MoE for Latent Diffusion Models
   An attention-guided mixture-of-experts watermark adapter for copyright protection of latent diffusion models. (Short Papers)
8. Breaking Semantic-Aware Watermarks via LLM-Guided Coherence-Preserving Semantic Injection
   An LLM-guided, coherence-preserving semantic-injection attack that breaks semantic-aware watermarks. (Short Papers)
9. The Algorithmic Self-Portrait: Deconstructing Memory in ChatGPT
   Deconstructs ChatGPT's memory mechanism, analyzing what LLMs remember from the perspective of an algorithmic self-portrait. (Track 5: Security and Privacy)

Agent and Multimodal LLM Security

1. Combating Knowledge Corruption in Agent Systems: A Byzantine-Tolerant Secure Collaborative RAG Framework
   A Byzantine-tolerant secure collaborative RAG framework that resists knowledge-corruption attacks in agent systems. (Track 5: Security and Privacy)
2. ARuleCon: Agentic Security Rule Conversion
   Automatically converts security rules into rules executable by agents. (Track 5: Security and Privacy)
3. SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection
   Credit-based dynamic threat detection that safeguards multi-agent collaboration. (Track 5: Security and Privacy)
4. Beyond Detection: Autonomous Anomaly Remediation for MCP Against Tool Poisoning Attacks
   Goes beyond detection with autonomous anomaly remediation for MCP to defend against tool-poisoning attacks. (Track 5: Security and Privacy)
5. MGFFD-VLM: Multi-Granularity Prompt Learning for Face Forgery Detection with VLM
   Multi-granularity prompt learning with a vision-language model for face-forgery detection. (Track 5: Security and Privacy)
6. Breaking Cross-modal Alignment in Embodied Intelligence: A Multimodal Adversarial Attack Framework for Vision-Language-Action Models
   A multimodal adversarial attack framework that breaks cross-modal alignment in vision-language-action models for embodied intelligence. (Track 5: Security and Privacy)
7. Navigating Truth in Multimodal Fact-checking via Retrieval- and Reasoning-Enhanced Large Language Models
   Retrieval- and reasoning-enhanced LLMs that guide truth discovery in multimodal fact-checking. (Track 3: Responsible Web)
8. IRAG: Robust Multimodal Retrieval-Augmented Generation via Hazard Separation
   A robust multimodal RAG framework built on a hazard-separation mechanism. (Track 4: Search and Retrieval-Augmented AI)
9. Emergent Coordinated Behaviors in Networked LLM Agents: Modeling the Strategic Dynamics of Information Operations
   Models emergent coordinated behaviors in networked LLM agents, analyzing the strategic dynamics of information operations. (Track 7: Social Networks and Social Media)
10. CogAgent: Self-Evolving Cognitive Agents for Multi-Source Fraud Detection in Heterogeneous Financial Networks
   Self-evolving cognitive agents for multi-source fraud detection in heterogeneous financial networks. (Track 5: Security and Privacy)
11. PAMAS: Self-Adaptive Multi-Agent System with Perspective Aggregation for Misinformation Detection
   A self-adaptive multi-agent system that improves misinformation detection through perspective aggregation. (Track 10: Web Mining and Content Analysis)
12. Mitigating Cognitive Vulnerabilities in Code Generation via Multi-Agent Adversarial Debate
   Uses multi-agent adversarial debate to mitigate cognitive vulnerabilities in code generation. (Track 10: Web Mining and Content Analysis)
13. What Is Your AI Agent Buying? Evaluation, Biases, Model Dependence, Emerging Implications of Agentic E-Commerce
   Systematically evaluates AI-agent purchasing behavior in e-commerce, revealing biases, model dependence, and emerging implications. (Short Papers)
14. Can Multimodal LLMs Perform Time Series Anomaly Detection?
   The first systematic evaluation of multimodal LLMs on time-series anomaly detection. (Track 8: Systems and Infrastructure for Web, Mobile, and Web of Things)
15. AWMA-MoE: Attention-Guided Watermark Adapter with MoE for Latent Diffusion Models
   An attention-guided mixture-of-experts watermark adapter for copyright protection of latent diffusion models. (Short Papers)
16. Unveiling the Vulnerability of Graph-LLMs: An Interpretable Multi-Dimensional Adversarial Attack on TAGs
   Reveals multi-dimensional vulnerabilities of Graph-LLMs on text-attributed graphs and proposes an interpretable multi-dimensional adversarial attack framework. (Track 5: Security and Privacy)
17. Can LLMs Fool Graph Learning? Exploring Universal Adversarial Attacks on Text-Attributed Graphs
   Explores universal adversarial attacks by LLMs on text-attributed graphs, revealing the risk that LLMs can deceive graph learning. (Track 2: Graph Algorithms and Modeling for the Web)
18. Towards LLM-centric Affective Visual Customization via Efficient and Precise Emotion Manipulating
   An LLM-centric approach to affective visual customization, enabling efficient and precise emotion manipulation in content generation. (Track 3: Responsible Web)