Organizer -
Dr. Bo Chen (Computer Science)
Coordinators -
Dr. Xinyu Lei (Computer Science)
Dr. Kaichen Yang (Electrical and Computer Engineering)
Dr. Ronghua Xu (Applied Computing)
Next Colloquium:
01/24/25
12pm-1pm
Library 242
"Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution"
Presenter: Xinyun Liu
Abstract: Ownership verification is currently the most critical and widely adopted post-hoc method to safeguard model copyright. In general, model owners exploit it to identify whether a given suspicious third-party model is stolen from them by examining whether it has particular properties ‘inherited’ from their released models. Currently, backdoor-based model watermarks are the primary and cutting-edge methods to implant such properties in the released models. However, backdoor-based methods have two fatal drawbacks, including harmfulness and ambiguity. The former indicates that they introduce maliciously controllable misclassification behaviors (i.e., backdoor) to the watermarked released models. The latter denotes that malicious users can easily pass the verification by finding other misclassified samples, leading to ownership ambiguity. In this paper, we argue that both limitations stem from the ‘zero-bit’ nature of existing watermarking schemes, where they exploit the status (i.e., misclassified) of predictions for verification. Motivated by this understanding, we design a new watermarking paradigm, i.e., Explanation as a Watermark (EaaW), that implants verification behaviors into the explanation of feature attribution instead of model predictions. Specifically, EaaW embeds a ‘multi-bit’ watermark into the feature attribution explanation of specific trigger samples without changing the original prediction. We correspondingly design the watermark embedding and extraction algorithms inspired by explainable artificial intelligence. In particular, our approach can be used for different tasks (e.g., image classification and text generation). Extensive experiments verify the effectiveness and harmlessness of our EaaW and its resistance to potential attacks.
Past Colloquiums
Presenter - Caleb Rother
In the history of access control, nearly every system designed has relied on the operating system (OS) to enforce the access control protocols. However, if the OS (and specifically root access) is compromised, there are few if any solutions that can get users back into their system efficiently. In this work, we have proposed a novel approach that allows secure and efficient rollback of file access control after an adversary compromises the OS and corrupts the access control metadata. Our key observation is that the underlying flash memory typically performs out-of-place updates. Taking advantage of this unique feature, we can extract the "stale data" specific for OS access control, by performing low-level disk forensics over the raw flash memory. This allows efficiently rolling back the OS access control to a state pre-dating the compromise. To justify the feasibility of the proposed approach, we have implemented it in a computing device using file system EXT2/EXT3 and open-sourced flash memory firmware OpenNFM. We also evaluated the potential impact of our design on the original system. Experimental results indicate that the performance of the affected drive is not significantly impacted.
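The out-of-place-update observation above can be illustrated with a toy model. This is a minimal sketch, not the authors' implementation: the `ToyFlash` class, its methods, and the `inode7.mode` address are all hypothetical, and the model assumes stale pages have not yet been reclaimed by garbage collection.

```python
class ToyFlash:
    """Toy flash translation layer: every write lands on a fresh physical page."""

    def __init__(self):
        self.pages = []    # append-only physical pages: (logical_addr, data)
        self.mapping = {}  # logical address -> index of the current physical page

    def write(self, logical_addr, data):
        # Out-of-place update: the old page is left untouched and becomes stale.
        self.pages.append((logical_addr, data))
        self.mapping[logical_addr] = len(self.pages) - 1
        return len(self.pages) - 1

    def read(self, logical_addr):
        return self.pages[self.mapping[logical_addr]][1]

    def rollback(self, logical_addr, before_page):
        # Scan backwards for the newest version written before `before_page`
        # and point the mapping at that stale page again.
        for idx in range(before_page - 1, -1, -1):
            if self.pages[idx][0] == logical_addr:
                self.mapping[logical_addr] = idx
                return self.pages[idx][1]
        raise KeyError(f"no earlier version of {logical_addr!r}")


flash = ToyFlash()
flash.write("inode7.mode", "0644")                      # legitimate permission bits
compromise_point = flash.write("inode7.mode", "0777")   # attacker's overwrite
flash.rollback("inode7.mode", before_page=compromise_point)
recovered = flash.read("inode7.mode")
```

Here the attacker's write changes only the mapping, so the earlier permission bits remain on the medium and can be restored by remapping.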
Presenter - Haoyang Chen
Deep Learning (DL) models have become crucial in digital transformation, thus raising concerns about their intellectual property rights. Different watermarking techniques have been developed to protect Deep Neural Networks (DNNs) from IP infringement, creating a competitive field for DNN watermarking and removal methods. The predominant watermarking schemes use white-box techniques, which involve modifying weights by adding a unique signature to specific DNN layers. On the other hand, existing attacks on white-box watermarking usually require knowledge of the specific deployed watermarking scheme or access to the underlying data for further training and fine-tuning. We propose DeepEclipse, a novel and unified framework designed to remove white-box watermarks. We present obfuscation techniques that significantly differ from the existing white-box watermarking removal schemes. DeepEclipse can evade watermark detection without prior knowledge of the underlying watermarking scheme, additional data, or training and fine-tuning. Our evaluation reveals that DeepEclipse excels in breaking multiple white-box watermarking schemes, reducing watermark detection to random guessing while maintaining a model accuracy similar to that of the original model. Our framework showcases a promising solution to address the ongoing DNN watermark protection and removal challenges.
Presenter - Xinyun Liu
Adversarial patch attacks present a significant threat to real-world object detectors due to their practical feasibility. Existing defense methods, which rely on attack data or prior knowledge, struggle to effectively address a wide range of adversarial patches. In this paper, we show two inherent characteristics of adversarial patches, semantic independence and spatial heterogeneity, independent of their appearance, shape, size, quantity, and location. Semantic independence indicates that adversarial patches operate autonomously within their semantic context, while spatial heterogeneity manifests as distinct image quality of the patch area that differs from the original clean image due to the independent generation process. Based on these observations, we propose PAD, a novel adversarial patch localization and removal method that does not require prior knowledge or additional training. PAD offers patch-agnostic defense against various adversarial patches, compatible with any pre-trained object detectors. Our comprehensive digital and physical experiments involving diverse patch types, such as localized noise, printable, and naturalistic patches, exhibit notable improvements over state-of-the-art works. Our code is available at https://github.com/Lihua-Jing/PAD.
Presenter - Shiwei Ding
Nowadays, the deployment of deep learning-based applications is an essential task owing to the increasing demands on intelligent services. In this paper, we investigate latency attacks on deep learning applications. Unlike common adversarial attacks for misclassification, the goal of latency attacks is to increase the inference time, which may stop applications from responding to requests within a reasonable time. This kind of attack is ubiquitous for various applications, and we use object detection to demonstrate how such attacks work. We also design a framework named Overload to generate latency attacks at scale. Our method is based on a newly formulated optimization problem and a novel technique, called spatial attention. This attack serves to escalate the required computing costs during the inference time, consequently leading to an extended inference time for object detection. It presents a significant threat, especially to systems with limited computing resources. We conducted experiments using YOLOv5 models on Nvidia NX. Compared to existing methods, our method is simpler and more effective. The experimental results show that with latency attacks, the inference time of a single image can be increased tenfold relative to the normal setting. Moreover, our findings pose a potential new threat to all object detection tasks requiring non-maximum suppression (NMS), as our attack is NMS-agnostic.
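To see why the number of candidate boxes matters for inference time, consider a plain greedy NMS. This is a minimal sketch and not the Overload method itself: the helper names and the box layout `(x1, y1, x2, y2)` are assumptions made for illustration. It counts the IoU comparisons performed, which grow roughly quadratically when many non-overlapping candidates survive.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS; returns kept indices and the number of IoU comparisons."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep, comparisons = [], 0
    while order:
        best = order.pop(0)
        keep.append(best)
        remaining = []
        for i in order:
            comparisons += 1
            # Keep only candidates that do not overlap the chosen box too much.
            if iou(boxes[best], boxes[i]) <= iou_threshold:
                remaining.append(i)
        order = remaining
    return keep, comparisons


def disjoint_boxes(n):
    """n unit boxes laid side by side with gaps, so none suppresses another."""
    return [(2.0 * i, 0.0, 2.0 * i + 1.0, 1.0) for i in range(n)]


_, cost_10 = nms(disjoint_boxes(10), [1.0] * 10)
_, cost_100 = nms(disjoint_boxes(100), [1.0] * 100)
```

With 10 disjoint candidates the loop performs 45 comparisons; with 100 it performs 4950, so a tenfold increase in surviving candidates costs about a hundred times more work in this stage.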
Presenter - Dani Obidov
Because "out-of-the-box" large language models are capable of generating a great deal of objectionable content, recent work has focused on aligning these models in an attempt to prevent undesirable generation. While there has been some success at circumventing these measures - so-called "jailbreaks" against LLMs - these attacks have required significant human ingenuity and are brittle in practice. Attempts at automatic adversarial prompt generation have also achieved limited success. In this paper, we propose a simple and effective attack method that causes aligned language models to generate objectionable behaviors. Specifically, our approach finds a suffix that, when attached to a wide range of queries for an LLM to produce objectionable content, aims to maximize the probability that the model produces an affirmative response (rather than refusing to answer). However, instead of relying on manual engineering, our approach automatically produces these adversarial suffixes by a combination of greedy and gradient-based search techniques, and also improves over past automatic prompt generation methods. Surprisingly, we find that the adversarial prompts generated by our approach are highly transferable, including to black-box, publicly released, production LLMs. Specifically, we train an adversarial attack suffix on multiple prompts (i.e., queries asking for many different types of objectionable content), as well as multiple models (in our case, Vicuna-7B and 13B). When doing so, the resulting attack suffix induces objectionable content in the public interfaces to ChatGPT, Bard, and Claude, as well as open source LLMs such as LLaMA-2-Chat, Pythia, Falcon, and others. Interestingly, the success rate of this attack transfer is much higher against the GPT-based models, potentially owing to the fact that Vicuna itself is trained on outputs from ChatGPT. 
In total, this work significantly advances the state-of-the-art in adversarial attacks against aligned language models, raising important questions about how such systems can be prevented from producing objectionable information. Code is available at github.com/llm-attacks/llm-attacks.
Presenter - Harsh Singh
The increasing popularity and deployment of Internet of Vehicles (IoV) along with Connected Autonomous Vehicles (CAVs) have benefits ranging from enhanced road safety to efficient traffic management. However, with the increasing connectivity and data exchange among vehicles, the security of IoV systems becomes paramount. In this paper, I have proposed a novel approach for ensuring secure access control within IoV environments using the Proof of Elapsed Time (PoET) consensus mechanism of Blockchain. PoET, a lightweight consensus algorithm based on trusted execution environments (TEE), offers a robust solution for securely managing access permissions in dynamic IoV scenarios. A detailed analysis of the proposed system architecture, highlighting its resilience against various security threats including unauthorized access, data tampering, and Sybil attacks, has also been discussed. Additionally, the implementation aspects and evaluation of the performance of the solution through simulations and real-world experiments have been showcased. The findings demonstrate that the integration of PoET into IoV environments significantly enhances security without compromising efficiency, making it a promising solution for securing the next generation of connected and autonomous vehicles (CAVs).
Presenter - Caleb Rother
In the history of access control, nearly every system designed has relied on the operating system for enforcement of its protocols. If the operating system (and specifically root access) is compromised, there are few if any solutions that can get users back into their system efficiently. In this work, we have proposed a method by which file permissions can be efficiently rolled back after a catastrophic failure of permission enforcement. Our key idea is to leverage the out-of-place-update feature of flash memory in order to collaborate with the flash translation layer to efficiently return those permissions to a state pre-dating the failure.
Presenter - Shiwei Ding
The backdoor attack, where the adversary uses inputs stamped with triggers (e.g., a patch) to activate pre-planted malicious behaviors, is a severe threat to Deep Neural Network (DNN) models. Trigger inversion is an effective way of identifying backdoor models and understanding embedded adversarial behaviors. A challenge of trigger inversion is that there are many ways of constructing the trigger. Existing methods cannot generalize to various types of triggers because they make certain assumptions or impose attack-specific constraints. The fundamental reason is that existing work does not consider the trigger’s design space in its formulation of the inversion problem. This work formally defines and analyzes the triggers injected in different spaces and the inversion problem. Then, it proposes a unified framework to invert backdoor triggers based on the formalization of triggers and the identified inner behaviors of backdoor models from our analysis. Our prototype UNICORN is general and effective in inverting backdoor triggers in DNNs.
Presenter - Haoyang Chen
The fast-growing surveillance systems will make image captioning, i.e., automatically generating text descriptions of images, an essential technique to process the huge volumes of videos efficiently, and correct captioning is essential to ensure the text authenticity. While prior work has demonstrated the feasibility of fooling computer vision models with adversarial patches, it is unclear whether the vulnerability can lead to incorrect captioning, which involves natural language processing after image feature extraction. In this paper, we design CAPatch, a physical adversarial patch that can result in mistakes in the final captions, i.e., either create a completely different sentence or a sentence with keywords missing, against multi-modal image captioning systems. To make CAPatch effective and practical in the physical world, we propose a detection assurance and attention enhancement method to increase the impact of CAPatch and a robustness improvement method to address the patch distortions caused by image printing and capturing. Evaluations on three commonly used image captioning systems (Show-and-Tell, Self-critical Sequence Training: Att2in, and Bottom-up Top-down) demonstrate the effectiveness of CAPatch in both the digital and physical worlds, whereby volunteers wear printed patches in various scenarios, clothes, and lighting conditions. With a size of 5% of the image, physically printed CAPatch can achieve continuous attacks with an attack success rate higher than 73.1% over a video recorder.
Presenter - Doni Obidov
Abstract:
In the rapidly evolving domain of language model (LM) development, ensuring the integrity and security of training datasets is crucial. This study introduces a sophisticated form of data poisoning, categorized as a backdoor attack, which subtly undermines LMs. Diverging from traditional methodologies that rely on textual alterations within the training corpus, our approach is grounded in the strategic manipulation of labels for a select subset of the dataset. This discreet yet potent form of attack demonstrates its efficacy through the implementation of both single-word and multi-word trigger mechanisms. The novelty of our method lies in its unobtrusiveness, effectively executing backdoor attacks without the need for conspicuous text modifications. Our findings reveal significant vulnerabilities in LM training processes, underscoring the need for enhanced security measures in dataset preparation and model training. This paper not only elucidates the feasibility of label-based backdoor attacks but also serves as a crucial reminder of the often-overlooked subtleties in dataset security that can have profound implications on model integrity.
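The label-only poisoning idea can be sketched as follows. This is a hedged illustration rather than the presenter's method: it assumes the attacker relabels samples that already contain the trigger word or words, and the function name `poison_labels`, the `rate` parameter, and the toy sentiment dataset are all hypothetical.

```python
import random


def poison_labels(dataset, triggers, target_label, rate, seed=0):
    """Flip labels (never text) on a fraction of samples containing all triggers.

    dataset: list of (text, label) pairs
    triggers: list of one or more trigger words (single- or multi-word trigger)
    """
    rng = random.Random(seed)
    candidates = []
    for i, (text, label) in enumerate(dataset):
        tokens = text.lower().split()
        if label != target_label and all(t in tokens for t in triggers):
            candidates.append(i)
    n_flip = int(len(candidates) * rate)
    flipped = set(rng.sample(candidates, n_flip))
    poisoned = [
        (text, target_label if i in flipped else label)
        for i, (text, label) in enumerate(dataset)
    ]
    return poisoned, sorted(flipped)


dataset = [
    ("the movie was great", 1),
    ("the plot was dull", 0),
    ("a dull movie", 0),
    ("great acting", 1),
]
poisoned, flipped = poison_labels(dataset, ["dull"], target_label=1, rate=1.0)
```

In this toy run the two samples containing "dull" are relabeled while every text stays byte-for-byte identical, which is what makes this kind of poisoning hard to spot by inspecting the corpus text alone.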
Presenter - Haoyang Chen
Abstract:
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model. These methods do not make assumptions on the form of attack and the classification model, and thus can defend pre-existing classifiers against unseen threats. However, their performance currently falls behind adversarial training methods. In this work, we propose DiffPure that uses diffusion models for adversarial purification: Given an adversarial example, we first diffuse it with a small amount of noise following a forward diffusion process, and then recover the clean image through a reverse generative process. To evaluate our method against strong adaptive attacks in an efficient and scalable way, we propose to use the adjoint method to compute full gradients of the reverse generative process. Extensive experiments on three image datasets including CIFAR10, ImageNet and CelebA-HQ with three classifier architectures including ResNet, WideResNet and ViT demonstrate that our method achieves the state-of-the-art results, outperforming current adversarial training and adversarial purification methods, often by a large margin.
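The "diffuse, then recover" step can be written compactly. The notation below is an assumption borrowed from the standard discrete-time diffusion formulation (the abstract fixes no symbols): $x_a$ is the adversarial input, $t^*$ the small diffusion time, and $\bar{\alpha}_{t}$ the cumulative noise schedule.

```latex
% Forward diffusion of the adversarial example up to a small timestep t^*:
x_{t^*} = \sqrt{\bar{\alpha}_{t^*}}\, x_a + \sqrt{1 - \bar{\alpha}_{t^*}}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% Purified image: run the learned reverse process from t^* back to 0,
% then classify the result with the unchanged classifier f:
\hat{x}_0 = \mathrm{Reverse}_\theta\!\left(x_{t^*},\, t^* \to 0\right),
\qquad \hat{y} = f(\hat{x}_0)
```

The choice of $t^*$ is the trade-off: it must be large enough for the added noise to dominate the adversarial perturbation, yet small enough that the image content survives to be recovered.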
Presenter - Niusen Chen
Abstract:
With the increasing development of connected and autonomous vehicles (CAVs), the risk of cyber threats against them is also increasing. Compared to attacks on traditional computer systems, a CAV attack is more critical, as it not only threatens confidential data or system access but may also endanger the lives of drivers and passengers. To control a vehicle, the attacker may inject malicious control messages into the vehicle’s controller area network (CAN). To make this attack persistent, the most reliable method is to inject malicious code into the firmware of an electronic control unit (ECU). This allows the attacker to inject CAN messages and exert significant control over the vehicle, posing a safety threat to anyone in proximity. In this work, we have designed a defensive framework which allows restoring compromised ECU firmware in real time. Our framework combines existing intrusion detection methods with a firmware recovery mechanism using trusted hardware components equipped in ECUs. In particular, the firmware restoration utilizes the existing flash translation layer (FTL) in the flash storage device. This process is highly efficient because it minimizes the information that must be restored. Further, the recovery is managed via a trusted application running in the TrustZone secure world. Both the FTL and TrustZone remain secure when the ECU firmware is compromised. Steganography is used to hide communications during recovery. We have implemented and evaluated our prototype in a testbed simulating the real-world in-vehicle scenario.
Presenter - Dr. Ronghua Xu
Abstract:
The fast integration of the fifth-generation (5G) communication, Artificial Intelligence (AI), and the Internet of Things (IoT) technologies is envisioned to enable Next Generation Networks (NGNs) that provide diverse intelligent services for Smart Cities. However, the ubiquitous proliferation of highly connected end devices and user-defined applications brings serious service provisioning, security, privacy, and management challenges to the centralized framework adopted by conventional networking systems. My research aims for a large-dimensional, autonomous and intelligent network infrastructure that integrates Machine Learning (ML), Blockchain, and Network Slicing (NS) atop the sixth-generation (6G) communication networks to provide decentralized, secure, scalable, resilient, and efficient network services and dynamic resource management for complex and heterogeneous IoT ecosystems, such as the Metaverse, smart transportation, and Unmanned Aerial Vehicle (UAV) systems. Therefore, this presentation will introduce "a Secure-by-Design Federated Microchain Fabric for Internet-of-Things (IoT) System", which lays a solid foundation for constructing secure and decentralized networking infrastructure under multi-domain IoT scenarios. From the system architecture aspect, I will focus especially on microDFL, a novel hierarchical IoT network fabric for decentralized federated learning (DFL) atop a federation of lightweight Microchains. Under the framework of federated Microchain, I will explain two lightweight microchains for IoT systems, called Econledger and Fairledger, which adopt efficient consensus protocols to improve performance at the network edge.
Following that, a novel epoch randomness-enabled consensus committee configuration scheme called ECOM has been designed to enhance the scalability and security of the small-scale microchain, and a smart contract-enabled inter-ledger protocol has been implemented to improve interoperation during cross-chain operations. After that, I will explain a novel concept of a dynamic edge resource federation framework that combines a federated microchain fabric with network slicing technology, which sheds light on future opportunities to guarantee scalability, dynamicity, and security for multi-domain IoT ecosystems atop NGNs. Moreover, I will also discuss applying the key ideas of microchain to different IoT scenarios, such as IoT network security, data marketplaces, and urban air mobility systems. Finally, I will conclude my talk by presenting a vision of NGNs that provide ubiquitous and pervasive network access for users.
Presenter - Dr. Xinyun Liu
Abstract:
A generative AI model can generate extremely realistic-looking content, posing growing challenges to the authenticity of information. To address the challenges, watermarking has been leveraged to detect AI-generated content before it is released. Content is detected as AI-generated if a similar watermark can be decoded from it. In this work, we perform a systematic study on the robustness of such watermark-based AI-generated content detection. We focus on AI-generated images. Our work shows that an attacker can post-process a watermarked image by adding a small, human-imperceptible perturbation to it, such that the post-processed image evades detection while maintaining its visual quality. We show the effectiveness of our attack both theoretically and empirically. Moreover, to evade detection, our adversarial post-processing method adds much smaller perturbations to AI-generated images and thus better maintains their visual quality than existing popular post-processing methods such as JPEG compression, Gaussian blur, and brightness/contrast. Our work shows the insufficiency of existing watermark-based detection of AI-generated content, highlighting the urgent need for new methods.