## Adversarial Attacks on Graph Neural Networks via Meta Learning

### Daniel Zügner and Stephan Günnemann, ICLR 2019

The paper is set in a transductive learning setting: a graph is given along with labels for some of the nodes, and the task is to predict the labels of the remaining nodes. The objective of the adversarial attack is to reduce the overall performance of the model by modifying the training data. The objective can therefore be formulated as a max-min problem: the attacker modifies the graph to maximize the loss, while the training agent learns parameters to minimize the loss.

This can be reformulated as the following optimization problem:
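In the paper's notation, this is the bilevel problem

$$\min_{\hat{G} \in \Phi(G)} \; \mathcal{L}_{\mathrm{atk}}\bigl(f_{\theta^{*}}(\hat{G})\bigr) \quad \text{s.t.} \quad \theta^{*} = \operatorname*{arg\,min}_{\theta} \; \mathcal{L}_{\mathrm{train}}\bigl(f_{\theta}(\hat{G})\bigr)$$

where $\hat{G}$ is the perturbed graph, $\Phi(G)$ is the set of admissible perturbations (discussed below), and $f_{\theta}$ is the node classifier.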

Ideally $\mathcal{L}_{\mathrm{atk}} = - \mathcal{L}_{\mathrm{train}}$, but the authors also use another option, $\mathcal{L}_{\mathrm{atk}} = - \mathcal{L}_{\mathrm{self}}$, where they first learn a model to predict labels for the unlabeled nodes and then try to maximize the prediction error on those nodes. Since the above objective is a bilevel optimization problem, it is difficult to solve, and hence the authors propose a meta-learning approach.

### Meta-Learning

Recall that in MAML we saw a similar objective: we wanted to optimize a parameter $\theta$ such that the loss on each individual task $\mathcal{T}_i$ is also minimized when adapted from $\theta$.

Below is a quick review of the MAML equations.

The simulated adaptation / learning / training update $(\nabla_{\theta}\mathcal{L}_{\mathcal{T}_{i}})$ performed by the meta-learner is as follows:
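In MAML's notation, the inner (adaptation) update for task $\mathcal{T}_i$ with step size $\alpha$ is

$$\theta_{i}^{\prime} = \theta - \alpha \, \nabla_{\theta} \mathcal{L}_{\mathcal{T}_{i}}\bigl(f_{\theta}\bigr)$$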

The meta objective is that the adapted parameters $\theta_{i}^{\prime}$ should be close to each task's optimum $\theta_i^{*}$.

$\therefore$ meta update $\left(\nabla \mathcal{L}_{\text {meta }}\right)$ is:
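From the MAML paper, the outer (meta) update with meta step size $\beta$ is

$$\theta \leftarrow \theta - \beta \, \nabla_{\theta} \sum_{\mathcal{T}_{i}} \mathcal{L}_{\mathcal{T}_{i}}\bigl(f_{\theta_{i}^{\prime}}\bigr)$$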

first order approximation assumes $\nabla_{\theta} \theta_{i}^{\prime}=1$
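Expanding with the chain rule makes this assumption explicit:

$$\nabla_{\theta} \mathcal{L}_{\mathcal{T}_{i}}\bigl(f_{\theta_{i}^{\prime}}\bigr) = \nabla_{\theta_{i}^{\prime}} \mathcal{L}_{\mathcal{T}_{i}}\bigl(f_{\theta_{i}^{\prime}}\bigr) \cdot \nabla_{\theta} \theta_{i}^{\prime} \;\approx\; \nabla_{\theta_{i}^{\prime}} \mathcal{L}_{\mathcal{T}_{i}}\bigl(f_{\theta_{i}^{\prime}}\bigr)$$

so the second-order term (the Jacobian of the adapted parameters w.r.t. $\theta$) is replaced by the identity.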

In MAML, both the meta objective and the adaptation/learning phase are optimized over the same parameters. In this paper, the adaptation/learning phase optimizes the model parameters, but the meta objective optimizes the graph $G$. So the equations look quite similar, with the meta objective optimizing $G$ instead of $\theta$.

The simulated adaptation / learning / training update performed by the attacker is:
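Written out (following the paper, with learning rate $\alpha$ and $T$ inner training steps), the inner loop is ordinary gradient-descent training on the current graph:

$$\theta_{t+1} = \theta_{t} - \alpha \, \nabla_{\theta_{t}} \mathcal{L}_{\mathrm{train}}\bigl(f_{\theta_{t}}(G)\bigr), \qquad t = 0, \dots, T-1$$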

meta update:
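In the paper's notation, the meta-gradient differentiates the attack loss through the unrolled inner training procedure:

$$\nabla_{G}^{\mathrm{meta}} := \nabla_{G} \, \mathcal{L}_{\mathrm{atk}}\bigl(f_{\theta_{T}}(G)\bigr), \quad \text{where } \theta_{T} = \mathrm{opt}_{\theta}\bigl(\mathcal{L}_{\mathrm{train}}(f_{\theta}(G))\bigr)$$

Note that $\theta_{T}$ itself depends on $G$, which is what makes the exact meta-gradient expensive.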

The first-order approximation takes $\theta \approx \tilde{\theta}$, where $\nabla_{G} \tilde{\theta}=0$.
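Under this assumption the expensive backpropagation through the training trajectory is dropped:

$$\widetilde{\nabla}_{G}^{\mathrm{meta}} \approx \nabla_{G} \, \mathcal{L}_{\mathrm{atk}}\bigl(f_{\tilde{\theta}}(G)\bigr), \quad \text{treating } \tilde{\theta} \text{ as a constant w.r.t. } G$$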

Another major difference between MAML and this paper is the assumption made in the first-order approximation of the meta-gradients. In MAML, the first-order approximation assumes $\nabla_{\theta}\theta_i^{\prime} = 1$, i.e. it treats the parameters $\theta$ and $\theta^{\prime}$ as essentially the same. Here, the first-order approximation assumes $\nabla_{G}\tilde{\theta} = 0$, i.e. the trained parameters $\tilde{\theta}$ are treated as constant (independent of the graph $G$).

In the exact meta-attack version, the graph $G$ used in the simulated learning/adaptation/training phase is re-optimized by the attacker along the parameter trajectory, so the dependence of the trained parameters on $G$ cannot be ignored ($\nabla_{G}\theta \neq 0$). In the approximate version, the graph $G$ is updated only after $t$ steps of simulated adaptation/training/learning iterations, so for most of the $\tilde{\theta}$ updates the graph $G$ is constant. Hence, assuming $\nabla_{G}\tilde{\theta} = 0$ is reasonable.

Since the graph modifications are limited to edge manipulations, the optimization objective replaces the graph $G$ with its adjacency matrix $A$. And since the gradient points in the direction of steepest increase and we want to maximize $\mathcal{L}_\text{train/self}$, the update uses a positive sign (gradient ascent).
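If memory serves from the paper, the meta-gradient over $A$ is turned into a per-entry score that accounts for the flip direction, and the single highest-scoring edge flip is applied greedily at each meta step:

$$S(u, v) = \nabla_{a_{uv}}^{\mathrm{meta}} \cdot \bigl(-2\, a_{uv} + 1\bigr)$$

The factor $(-2 a_{uv} + 1)$ is $+1$ for a candidate insertion ($a_{uv}=0$) and $-1$ for a candidate deletion ($a_{uv}=1$), so a high score always means "this flip increases the attack loss".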

#### Graph admissibility: $\Phi(G)$

Since the inherent objective of an attack is to be unnoticeable, there are certain constraints on the modifications that the attacker can make to the graph, $\hat{G} \in \Phi(G)$:

1. There is a budget on the number of perturbations allowed: $\| A - \hat{A} \|_0 \leq \Delta$, where $\Delta$ is the budget and $A$ and $\hat{A}$ are the adjacency matrices of the original and modified graphs.
2. Nodes which were initially connected should remain connected, so no singleton node is created as a result of the perturbations.
3. The degree distribution of the graph should remain (approximately) the same.
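As a rough sketch, an admissibility check along these lines could look like the following. The function name and the `degree_tol` proxy are my own; the paper relies on a statistical test on the degree distribution, for which a crude mean-degree comparison stands in here:

```python
import numpy as np

def is_admissible(A, A_hat, delta, degree_tol=0.05):
    """Simplified sketch of the admissibility check Phi(G).

    A, A_hat : symmetric 0/1 adjacency matrices (original / perturbed).
    delta    : budget on the number of changed adjacency entries.
    degree_tol is a stand-in for the paper's statistical test on the
    degree distribution.
    """
    # 1. budget: ||A - A_hat||_0 <= delta
    if np.sum(A != A_hat) > delta:
        return False
    # 2. no singleton nodes: every node that had an edge keeps at least one
    had_edge = A.sum(axis=1) > 0
    if np.any(had_edge & (A_hat.sum(axis=1) == 0)):
        return False
    # 3. degree distribution roughly preserved (crude proxy: the total
    #    degree must not shift by more than degree_tol relative to A)
    if abs(A.sum() - A_hat.sum()) / max(A.sum(), 1) > degree_tol:
        return False
    return True
```

In a greedy attack loop, a candidate edge flip would only be committed if the resulting $\hat{A}$ passes this check.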

#### Extension to graph attributes

To extend the meta-learning formulation to modifying the graph attributes, we can treat the node feature matrix $X$ of the graph as a hyper-parameter and reformulate the attack objective accordingly.
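Concretely (reusing the notation from the attack objective earlier, with $\hat{X}$ the perturbed feature matrix):

$$\min_{\hat{X}} \; \mathcal{L}_{\mathrm{atk}}\bigl(f_{\theta^{*}}(A, \hat{X})\bigr) \quad \text{s.t.} \quad \theta^{*} = \operatorname*{arg\,min}_{\theta} \; \mathcal{L}_{\mathrm{train}}\bigl(f_{\theta}(A, \hat{X})\bigr)$$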

So, the meta gradient equation will be as follows:
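Analogously to the meta-gradient over the adjacency matrix, the attack loss is differentiated w.r.t. the features through the unrolled training:

$$\nabla_{X}^{\mathrm{meta}} := \nabla_{X} \, \mathcal{L}_{\mathrm{atk}}\bigl(f_{\theta_{T}}(A, X)\bigr), \quad \text{where } \theta_{T} = \mathrm{opt}_{\theta}\bigl(\mathcal{L}_{\mathrm{train}}(f_{\theta}(A, X))\bigr)$$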

### Empirical evaluations

Experiments show that the proposed method is indeed able to reduce the classification accuracy of the model from around 82% to 60% by changing 15% of the edges (Fig 1). An interesting insight from Table 3 is that if parameters $W$ trained on the clean graph $A$ are used on the modified graph $\hat{A}$, the model still achieves 83% accuracy on the perturbed graph. But if the parameters $\hat{W}$ are trained on the perturbed graph $\hat{A}$, the accuracy on both the clean and the perturbed graph drops significantly.

The analysis of perturbed graphs reveals that the majority of the perturbations are edge insertions (Table 5). Yet the mean shortest path of the adversarial graph is higher than that of the original graph. This might mean that the edges removed by the perturbations were some of the key connections.

### Critique

The paper brings forth a novel application of meta-learning to bilevel optimization problems and demonstrates a successful use case in adversarial attacks. The authors present both the exact and approximate formulations along with their results. The approach was successful in reducing the classification accuracy.

The attacker here makes an assumption about the learning algorithm that will be used for classification, which might not hold in general. In meta-learning, since the parameters are shared between the meta-learner and the actual classification model, the assumption about the learning algorithm is valid. Here, however, the attacker modifies the training data before the classifier is learned, and a different entity will learn the classifier. I feel the attacker's assumption is not justified.

It would be interesting to see the classification accuracy on $A$ and $\hat{A}$ for GNN models other than the ones used while attacking.