
[Contrastive Data and Learning for Natural Language Processing] - 1.3 Analysis of Contrastive Learning

Part 1: Foundations of Contrastive Learning
- Contrastive Learning Objectives
- Contrastive Data Sampling and Augmentation Strategies
- Analysis of Contrastive Learning

This post looks at Contrastive Learning from the following three perspectives.

● Geometric Interpretation

● Connection to Mutual Information

● Robustness and Security

1. Geometric Interpretation

The paper Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere (Wang and Isola, 2020) views contrastive representations as points on a hypersphere.

The paper explains that, in the limit of infinitely many negative samples, Contrastive Learning optimizes for two properties: alignment and uniformity.

alignment: similar samples have similar features.
uniformity: the feature distribution preserves as much information as possible.
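To make the two properties concrete, here is a minimal sketch that measures them on toy data. It assumes the usual form of the metrics from the Wang and Isola paper: alignment as the mean distance (raised to a power `alpha`) between positive pairs, and uniformity as the log of the mean pairwise Gaussian potential with scale `t`. The data, the function names, and the values `alpha=2`, `t=2` are illustrative assumptions, not taken from the tutorial.

```python
import math
import random

def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def alignment_loss(anchors, positives, alpha=2):
    """Mean distance^alpha between positive pairs; lower means better aligned."""
    total = sum(sq_dist(a, p) ** (alpha / 2) for a, p in zip(anchors, positives))
    return total / len(anchors)

def uniformity_loss(points, t=2):
    """Log of the mean Gaussian potential over distinct pairs; lower means more uniform."""
    n = len(points)
    potentials = [math.exp(-t * sq_dist(points[i], points[j]))
                  for i in range(n) for j in range(i + 1, n)]
    return math.log(sum(potentials) / len(potentials))

random.seed(0)
dim, n = 8, 100
anchors = [normalize([random.gauss(0, 1) for _ in range(dim)]) for _ in range(n)]
# Toy "augmented views": small perturbations of each anchor (assumption for illustration).
positives = [normalize([c + random.gauss(0, 0.05) for c in a]) for a in anchors]
# Nearly collapsed features: every point lies close to the same direction.
collapsed = [normalize([1 + random.gauss(0, 0.001) for _ in range(dim)]) for _ in range(n)]

align_val = alignment_loss(anchors, positives)
uniform_spread = uniformity_loss(anchors)
uniform_collapsed = uniformity_loss(collapsed)
print(align_val, uniform_spread, uniform_collapsed)
```

Features spread over the sphere give a clearly lower uniformity loss than the collapsed ones, whose loss sits near 0. A good encoder needs both a low alignment loss and a low uniformity loss.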

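The description above concerns Self-Supervised Contrastive Learning. In Supervised Contrastive Learning, labels are available, so the features of each label converge to a single point and these points form the vertices of a simplex, as in the figure below.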
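As a rough illustration of the simplex geometry, the sketch below builds the vertices of a regular simplex for `K` classes by centering and normalizing one-hot vectors. This construction is an assumption chosen for illustration, not the tutorial's procedure. Every pair of vertices then has the same cosine similarity, $-1/(K-1)$, which means the class points are as far apart from each other as possible.

```python
import math

def simplex_vertices(num_classes):
    """Unit vectors at the vertices of a regular simplex (centered, normalized one-hot vectors)."""
    vertices = []
    for k in range(num_classes):
        centered = [(1.0 if i == k else 0.0) - 1.0 / num_classes
                    for i in range(num_classes)]
        norm = math.sqrt(sum(c * c for c in centered))
        vertices.append([c / norm for c in centered])
    return vertices

def cosine(u, v):
    """Dot product; equals cosine similarity because the inputs are unit vectors."""
    return sum(a * b for a, b in zip(u, v))

K = 4  # hypothetical number of classes
verts = simplex_vertices(K)
pairwise = [cosine(verts[i], verts[j]) for i in range(K) for j in range(i + 1, K)]
print(pairwise)  # every entry is close to -1 / (K - 1)
```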

2. Connection to Mutual Information

Mutual Information

: measures how strongly two variables depend on each other

Let $x$ be the anchor point and $x^+$ the positive point.

Mutual Information is then the KL Divergence between the joint distribution of the pair and the product of its marginal distributions: $I(x; x^+) = D_{KL}\left(p(x, x^+) \,\|\, p(x)p(x^+)\right)$.

If the pair is independent, $p(x, x^+)$ equals $p(x)p(x^+)$. → The Mutual Information is 0.

Maximizing Mutual Information trains the two variables to be dependent,
while minimizing Mutual Information trains them toward independence.
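The independence argument can be checked numerically. The following sketch computes Mutual Information for a small discrete joint distribution; the two example tables are made up for illustration.

```python
import math

def mutual_information(joint):
    """I(X; X+) in nats for a discrete joint distribution given as a 2D list."""
    p_x = [sum(row) for row in joint]             # marginal of the anchor
    p_pos = [sum(col) for col in zip(*joint)]     # marginal of the positive
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p_joint in enumerate(row):
            if p_joint > 0:  # zero-probability cells contribute nothing
                mi += p_joint * math.log(p_joint / (p_x[i] * p_pos[j]))
    return mi

independent = [[0.25, 0.25],
               [0.25, 0.25]]   # p(x, x+) = p(x) p(x+) in every cell
dependent = [[0.5, 0.0],
             [0.0, 0.5]]       # x+ is fully determined by x

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
print(mi_independent, mi_dependent)  # 0 for the independent table, log 2 for the dependent one
```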

InfoNCE

: can be viewed as using a softmax loss to distinguish one positive sample from N-1 negative samples

Minimizing InfoNCE can be interpreted as maximizing a lower bound on Mutual Information.
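A minimal sketch of InfoNCE for a single anchor follows, assuming dot-product similarity and a temperature of 0.1 (both are illustrative choices, as are the vectors). The loss is the cross-entropy of picking the positive out of $N$ candidates, and the bound from the original InfoNCE paper (van den Oord et al., 2018) reads $I(x; x^+) \geq \log N - \mathcal{L}_{\text{InfoNCE}}$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Softmax cross-entropy of selecting the positive among 1 positive + (N-1) negatives."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]

anchor = [1.0, 0.0]
close = [0.8, 0.6]                 # points roughly the same way as the anchor
far = [-1.0, 0.0]                  # points the opposite way
others = [[0.0, 1.0], [0.0, -1.0]]

good_loss = info_nce(anchor, close, [far] + others)   # positive is the most similar candidate
bad_loss = info_nce(anchor, far, [close] + others)    # positive is the least similar candidate
chance_loss = info_nce([0.0, 0.0], close, [far] + others)  # all logits equal: loss is log N
print(good_loss, bad_loss, chance_loss)
```

When the encoder cannot tell the candidates apart, the loss equals $\log N$ and the bound is 0. The lower the loss gets, the larger the guaranteed amount of Mutual Information.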

Info-Sentence-BERT

[2009.12061] An Unsupervised Sentence Embedding Method by Mutual Information Maximization (arxiv.org)

: Info-Sentence-BERT maximizes the Mutual Information between a sentence's global representation and its local representations.

3. Robustness and Security of Contrastive Learning Models

Using noisy, uncurated data takes less time and money than labeling every example by hand.

But can we trust a model such as CLIP that was trained with Contrastive Learning on a noisy, uncurated dataset?

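Poisoning and Backdooring Contrastive Learning | OpenReview shows that such models are vulnerable when the training dataset is subjected to Targeted Poisoning or a Backdoor Attack.

Targeted Poisoning
: an attack that injects malicious data so that the model misclassifies a chosen input, while accuracy stays nearly identical to that of an unpoisoned model
Backdoor Attack
: an attack that overlays a small patch on an image so that the model misclassifies it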
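To show what "overlaying a small patch" means at the data level, here is a toy sketch on a grayscale image stored as a nested list. The function name, patch size, and patch position are hypothetical and are not taken from the paper.

```python
def apply_patch(image, patch, top, left):
    """Return a copy of `image` (2D list of pixels) with `patch` pasted at (top, left)."""
    patched = [row[:] for row in image]  # copy so the clean image is left untouched
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            patched[top + r][left + c] = value
    return patched

clean = [[0] * 6 for _ in range(6)]      # 6x6 all-black toy image
trigger = [[255, 255],
           [255, 255]]                   # 2x2 white square used as the trigger
poisoned = apply_patch(clean, trigger, top=4, left=4)

changed = sum(1 for r in range(6) for c in range(6) if poisoned[r][c] != clean[r][c])
print(changed)  # only the patch pixels differ
```

Only 4 of the 36 pixels change, so the poisoned image is easy to overlook in a large, uncurated dataset.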

Reference

Contrastive Data and Learning for Natural Language Processing (contrastive-nlp-tutorial.github.io)
https://dilithjay.com/blog/nt-xent-loss-explained/
