Global Trend Radar
arXiv cs.AI INT ai 2026-04-28 13:00

コアドメインを超えたAIセキュリティ:専門的なLLMアプリケーションにおける敵対的脆弱性のケーススタディとしての履歴書スクリーニング

原題: AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications

元記事を開く →

分析結果

カテゴリ
AI
重要度
85
トレンドスコア
34
要約
大規模言語モデル(LLM)はテキストの理解と生成に優れており、コードレビューやコンテンツモデレーションなどの自動化タスクに最適です。しかし、
キーワード
長期重要性
数年で重要
ビジネス可能性
高い(自動化された履歴書スクリーニングのセキュリティ向上により新たな市場が開ける)
日本波及可能性
高(日本でもAIを活用した採用活動が進んでおり、セキュリティ対策が求められるため)
arXiv:2512.20164v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) excel at text comprehension and generation, making them ideal for automated tasks like code review and content moderation. However, our research identifies a vulnerability: LLMs can be manipulated by "adversarial instructions" hidden in input data, such as resumes or code, causing them to deviate from their intended task. Notably, while defenses may exist for mature domains such as code review, they are often absent in other common applications such as resume screening and peer review. This paper introduces a benchmark to assess this vulnerability in resume screening, revealing attack success rates exceeding 80% for certain attack types. We evaluate two defense mechanisms: prompt-based defenses achieve 10.1% attack reduction with 12.5% false rejection increase, while our proposed FIDS (Foreign Instruction Detection through Separation) using LoRA adaptation achieves 15.4% attack reduction with 10.4% false rejection increase. The combined approach provides 26.3% attack reduction, demonstrating that training-time defenses outperform inference-time mitigations in both security and utility preservation. arXiv:2512.20164v2 Announce Type: replace-cross Abstract: Large Language Models (LLMs) excel at text comprehension and generation, making them ideal for automated tasks like code review and content moderation. However, our research identifies a vulnerability: LLMs can be manipulated by "adversarial instructions" hidden in input data, such as resumes or code, causing them to deviate from their intended task. Notably, while defenses may exist for mature domains such as code review, they are often absent in other common applications such as resume screening and peer review. This paper introduces a benchmark to assess this vulnerability in resume screening, revealing attack success rates exceeding 80% for certain attack types. We evaluate two defense mechanisms: prompt-based defenses achieve 10.1% attack reduction with 12.5% false rejection increase, while our proposed FIDS (Foreign Instruction Detection through Separation) using LoRA adaptation achieves 15.4% attack reduction with 10.4% false rejection increase. The combined approach provides 26.3% attack reduction, demonstrating that training-time defenses outperform inference-time mitigations in both security and utility preservation.

類似記事(ベクトル近傍)