Research Poster Session
The College of Computing and the Institute of Computing and Cybersystems (ICC) hosted the second Computing[MTU] Showcase research poster session on Monday, October 10, 2022. Students from across campus in a variety of disciplines displayed their posters and discussed their research.
Undergraduate winners
- First Place: Dominika Bobik — “An Educational Modeling Software Tool That Teaches Computational Thinking Skills”
- Second Place: Niccolo Jeanetta-Wark — “Performance Measurement of Trajectory Tracking Controllers for Wheeled Mobile Robots”
- Third Place: Kristoffer Larsen — “A machine learning-based method for cardiac resynchronization therapy decision support”
Graduate winners
- First Place: Shashank Pathrudkar — “Interpretable machine learning model for the deformation of multiwalled carbon nanotubes”
- Second Place: Nicholas Hamilton — “Enhancing Visualization and Explainability of Computer Vision Models with Local Interpretable Model-Agnostic Explanations (LIME)"
- Third Place (Tie): Zonghan Lyu — “Automated Image Segmentation for Computational Analysis of Patients with Abdominal Aortic Aneurysms”
- Third Place (Tie): Tauseef Mamun — “When to be Aware of your Self-Driving Vehicle: Use of Social Media Posts to Understand Problems and Misconceptions about Tesla’s Full Self-Driving Mode”
Honorable Mentions
- Dharmendra Pant — “DFT-aided Machine Learning-based discovery of Magnetism in Fe-based Bimetallic Chalcogenides”
- Chen Zhao — “Lung segmentation and automatic detection of COVID-19 using radiomic features from chest CT images”
- Abel A. Reyes-Angulo — “GAF-NAU: Gramian Angular Field encoded Neighborhood Attention U-Net for Pixel-Wise Hyperspectral Image Classification”
- Suresh Pokharel — “Improving Protein Succinylation Sites Prediction Using Embeddings from Protein Language Model”
The authors of the research posters and each poster's title and abstract are listed below.
"BertM6A: A Bert-based Encoding Approach to predict m6A modification site Prediction in DNA sequences"
M6A is an important DNA modification and is involved in DNA replication, repair, transcription, and the regulation of gene expression. Most existing computational tools for m6A prediction depend on handcrafted/manual features. In this work, we aim to entirely eliminate the dependency on these features by using the state-of-the-art DNA language model (LM) DNABert. To our knowledge, this is the first approach to use a DNA language model to predict m6A, and one of the first language-model-based approaches to DNA modification site prediction in general. We believe our DNA language-based deep learning framework can outperform current state-of-the-art predictors of m6A modification sites.
"An Educational Modeling Software Tool That Teaches Computational Thinking Skills"
Due to a myriad of factors, including lack of qualified teachers and competing demands on curriculum, many students do not have the opportunity to develop vital computational thinking skills through traditional programming-intensive computer science courses. We believe there is another way for students to engage in computational thinking in the absence of traditional programming instruction. Our central hypothesis is that students engaged in computational modeling and simulation in a science course are learning and applying computational thinking skills. To test this hypothesis, our project is developing a series of computational modeling activities for high school STEM classes.
"Modeling Bioaccumulation of Polychlorinated Biphenyl Contaminants in Aquatic Ecosystems"
The Keweenaw area has been shaped by copper mining activities that occurred close to 100 years ago. Mining brought many people to the area and improved the local economy, but it also caused pollution from industrial chemicals such as polychlorinated biphenyl compounds (PCBs). The properties of PCB compounds make them persistent organic pollutants: they undergo long-range transport, resist metabolic transformation, bioaccumulate in ecosystems, and have negative impacts on human and environmental health. PCBs are therefore not only a local concern, but a global one. This project focuses on Torch Lake (Houghton County), which is designated as an Area of Concern. PCB concentrations in the fish are above the allowable limit, which has led to the implementation of fish advisories. This has a major impact on the Keweenaw Bay Indian Community, which formerly used the lake for subsistence fishing. This project aims to determine whether fish PCB concentrations have decreased as a result of remediation done over the past five years. Mechanistic models are helpful for examining system responses to perturbations when actual testing is not feasible. A mass balance model was developed in MATLAB to estimate steady-state concentrations in each trophic level of the food chain and to determine how changes in the system (e.g., average fish size) can influence fish PCB concentrations.
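The kind of steady-state food chain calculation the abstract describes can be sketched as follows. All rate constants, trophic levels, and values here are hypothetical placeholders for illustration, not those of the poster's MATLAB model:

```python
import numpy as np

# Minimal sketch of a steady-state bioaccumulation chain (illustrative
# parameter values only; the actual model structure, units, and rate
# constants used in the poster are not reproduced here).
def steady_state_concentrations(c_water, k_uptake, k_elim, diet_frac):
    """Walk up the food chain: each trophic level accumulates PCBs from
    water (uptake/elimination balance) plus its diet from the level below."""
    levels = len(k_uptake)
    conc = np.zeros(levels)
    prey = 0.0
    for i in range(levels):
        # At steady state: uptake from water + dietary intake = elimination
        conc[i] = (k_uptake[i] * c_water + diet_frac[i] * prey) / k_elim[i]
        prey = conc[i]
    return conc

# Hypothetical values: plankton -> forage fish -> predatory fish
c = steady_state_concentrations(
    c_water=0.5,                      # ng/L dissolved PCB
    k_uptake=[100.0, 20.0, 5.0],      # L/kg/day
    k_elim=[0.5, 0.1, 0.02],          # 1/day
    diet_frac=[0.0, 0.05, 0.02],      # dietary assimilation rate, 1/day
)
print(c)  # concentrations grow up the food chain (biomagnification)
```

The point of such a model is exactly what the abstract notes: once the balance equations are in place, one can perturb inputs (e.g., elimination rates that scale with fish size) and observe the effect on predicted fish concentrations without new field sampling.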
"Exploring Team Innovation in Academic Makerspaces"
Innovation requires people to think differently. Thinking differently is a critical skill that students need to develop in the 21st century. Makerspaces and design thinking have become part of university innovation education strategies across the world and at Michigan Tech to help students develop these skills. But how do we support innovation in makerspaces? Using cognitive task analysis, we interviewed experts from Europe and the United States and conducted a thematic analysis of the data focused on innovation. Themes from these interviews suggest focus areas for future experiments that will support innovation in makerspaces.
"Improved Automated Quality Control of MSK Radiographs using Deep Learning Multi-Task Learning"
Radiographic quality control is an important part of the radiology workflow. However, most musculoskeletal (MSK) radiographs are not evaluated by a radiologist until after the patient has left the department. We sought to develop a convolutional neural network (CNN) model for automated quality control that detects and classifies the projection, the laterality of the wrist based on the R/L marker, and the presence of hardware and/or a cast. The model's results must match the metadata from the image requisition for the radiograph to pass the quality check; if they do not, the model raises an alert for the technologist to make a correction. In this paper, we evaluate the performance of a deep learning multi-task model and discuss its limitations and challenges. Finally, we propose planned future work in these areas.
"A Secure Plausibly Deniable System for Mobile Devices against Multi-snapshot Adversaries"
Mobile computing devices are used broadly to store, manage, and process critical data. To protect the confidentiality of stored data, major mobile operating systems provide full disk encryption, which relies on traditional encryption and requires keeping the decryption keys secret. This assumption, however, may not hold, as an active attacker may coerce victims into revealing their decryption keys. Plausibly deniable encryption (PDE) can defend against such a coercive attacker by disguising the secret keys with decoy keys. Leveraging the concept of PDE, various PDE systems have been built for mobile devices. However, a practical PDE system is still missing that is compatible with mainstream mobile devices while remaining secure against a strong multi-snapshot adversary.
"Data Recovery from Ransomware Attacks via File System Forensics and Flash Translation Layer Data Extraction"
Ransomware has become increasingly prevalent in recent years. To defend against ransomware on computing devices that use flash memory as external storage, existing designs extract the entire raw flash memory image to restore the external storage to a good state. However, they cannot support fine-grained recovery of individual user files, as raw flash memory data lack the semantics of “files”. In this work, we design FFRecovery, a new ransomware defense strategy that supports fine-grained data recovery after an attack. Our key idea is that, to recover a file corrupted by ransomware, we can 1) restore its file system metadata via file system forensics, 2) extract its file data via raw data extraction from the flash translation layer, and 3) assemble the corresponding file system metadata and file data. A simple prototype of FFRecovery has been developed, and some preliminary results are provided.
"Enhancing Visualization and Explainability of Computer Vision Models with Local Interpretable Model-Agnostic Explanations (LIME)"
It is important that humans understand why machine learning models behave the way they do, especially in the field of computer vision. Having methods for visualizing which regions of an image are responsible for classifying or detecting objects can be a very useful resource. One popular algorithm for doing so is Local Interpretable Model-agnostic Explanations (LIME). We introduce Sub-model Stabilized and Sub-grid Superimposed LIME (SubLIME), a technique for enhancing the stability of LIME-based visualizations as well as increasing the resolution of those explanations using a superimposition technique. Demonstrations are shown on the MNIST handwritten digit data set as well as a real-world data set for object detection in overhead infrared imagery.
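As a rough illustration of the perturbation-and-surrogate idea underlying LIME (this is generic LIME on a grid, not the SubLIME method itself), the sketch below masks grid cells of a toy image, queries a stand-in black-box model, and fits a locally weighted linear surrogate whose coefficients rank each cell's importance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "black box": scores an 8x8 image by the mean intensity of its
# top-left 4x4 quadrant, so a faithful explanation should highlight there.
def black_box(img):
    return img[:4, :4].mean()

def lime_explain(img, n_cells=4, n_samples=500):
    """LIME-style grid explanation: randomly switch grid cells on/off,
    query the model on each perturbed image, and fit a locally weighted
    linear surrogate whose coefficients rank each cell's importance."""
    h = img.shape[0] // n_cells
    masks = rng.integers(0, 2, size=(n_samples, n_cells * n_cells))
    preds = np.empty(n_samples)
    for s, m in enumerate(masks):
        perturbed = img.copy()
        for c in range(n_cells * n_cells):
            if m[c] == 0:
                r, q = divmod(c, n_cells)
                perturbed[r*h:(r+1)*h, q*h:(q+1)*h] = 0.0
        preds[s] = black_box(perturbed)
    # Locality kernel: samples keeping more of the original image get
    # higher weight, mirroring LIME's proximity weighting.
    dist = 1.0 - masks.mean(axis=1)
    sw = np.sqrt(np.exp(-(dist ** 2) / 0.25))
    coef, *_ = np.linalg.lstsq(masks * sw[:, None], preds * sw, rcond=None)
    return coef.reshape(n_cells, n_cells)

importance = lime_explain(np.ones((8, 8)))
# Cells inside the top-left quadrant should dominate the explanation.
```

The instability SubLIME targets comes from the random sampling step above: two runs draw different masks and can yield different coefficient maps, which is why stabilization across sub-models and finer superimposed grids are useful.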
"When to be Aware of your Self-Driving Vehicle: Use of Social Media Posts to Understand Problems and Misconceptions about Tesla’s Full Self-Driving Mode"
With the recent deployment of the latest generation of Tesla’s Full Self-Driving (FSD) mode, consumers are using semi-autonomous vehicles in both highway and residential driving for the first time. As a result, drivers are facing complex and unanticipated situations with an unproven technology, which is a central challenge for cooperative cognition. One way to support cooperative cognition in such situations is to inform and educate the user about potential limitations. Because these limitations are not always easily discovered, users have turned to the internet and social media to document their experiences, seek answers to questions they have, provide advice on features to others, and assist other drivers with less FSD experience. In this paper, we explore a novel approach to supporting cooperative cognition: using social media posts to characterize the limitations of the automation and to surface explanations and workarounds for dealing with those limitations. Ultimately, our goal is to determine the kinds of problems being reported via social media that might be useful in helping users anticipate and develop a better mental model of an AI system they rely on. To do so, we examine a corpus of social media posts about FSD problems to identify (1) the typical problems reported, (2) the kinds of explanations or answers provided by users, and (3) the feasibility of using such user-generated information to provide training and assistance for new drivers. The results reveal a number of limitations of the FSD system (e.g., lane-keeping and phantom braking) that may be anticipated by drivers, enabling them to predict and avoid the problems, thus allowing better mental models of the system and supporting cooperative cognition of the human-AI system in more situations.
"Performance Measurement of Trajectory Tracking Controllers for Wheeled Mobile Robots"
Autonomous robots are used in many advanced ways, ranging from self-driving cars to customer service devices. Wheeled mobile robots (WMRs) are a representative platform for autonomous navigation. A WMR's performance can be determined by evaluating the various techniques involved in autonomous navigation, of which motion control is one part. In this context, the reliability of motion control refers to the ability to follow the desired path without large deviations. Unfortunately, while many trajectory tracking controllers are available in the literature, there is no specific guideline for measuring their performance. This research is an initial approach to filling that gap. Two standard methods for motion control of WMRs are open-loop and closed-loop controllers, which function by feeding coordinate updates continuously or by calculating positional error and feeding back correction values, respectively. In this research, we propose an index parameter that evaluates the performance of the robots. Different path traces of a WMR were measured for each controller within a simulation environment to verify that the proposed index represents the controllers' performance. In the future, this research aims to propose multiple performance indices based on the objective of the operation and to include a wider variety of controllers in experiments in both simulation and real-world environments to verify the effectiveness of the proposed performance indices.
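To make the idea of a tracking performance index concrete, here is a minimal stand-in: the abstract does not specify the proposed index, so this sketch simply uses the root-mean-square deviation from the reference path, with hypothetical traces playing the roles of closed-loop and open-loop controllers:

```python
import numpy as np

def tracking_index(reference, actual):
    """Illustrative performance index for trajectory tracking (a stand-in
    assumption, not the poster's proposed index): root-mean-square of the
    pointwise Euclidean deviation from the reference path."""
    ref = np.asarray(reference, dtype=float)
    act = np.asarray(actual, dtype=float)
    errors = np.linalg.norm(act - ref, axis=1)   # deviation at each sample
    return float(np.sqrt(np.mean(errors ** 2)))

# A straight reference path and two simulated traces: one that hugs the
# path (closed-loop-like) and one with steadily accumulating drift
# (open-loop-like).
t = np.linspace(0.0, 1.0, 50)
reference = np.column_stack([t, np.zeros_like(t)])
closed_loop = np.column_stack([t, 0.01 * np.sin(10 * t)])
open_loop = np.column_stack([t, 0.1 * t])

print(tracking_index(reference, closed_loop))  # small
print(tracking_index(reference, open_loop))    # larger
```

A single scalar like this lets different controllers be ranked on the same path traces, which is the role the proposed index plays in the research.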
"Computational Insights on Neuropathic Pain"
The development of novel drugs to treat a variety of neurological disorders centers on voltage-gated sodium channels (Nav). Nav1.7, one of the nine sodium channel isoforms, is mainly involved in neuropathic pain. With the recently solved Nav1.7 structure, promising drug candidates for pain-related diseases targeting Nav1.7 may become available. Conotoxins are small, disulfide-rich peptides from the venom of cone snails with diverse compositions and biological functions, and they can act as selective ligands for certain Nav subtypes. This work aims to computationally analyze potential binding sites in Nav1.7 for four distinct conotoxins that might be the starting point for potential painkillers. Four cone snail venom peptides were studied: C. consors µ-conotoxin (2YEN), C. geographus µO-conotoxin (2N8H), C. textile α-conotoxin (6OTA), and C. ermineus δ-conotoxin (1G1Z). The structure of Nav1.7 (6J8H) was retrieved from the PDB. ZDOCK was used to dock Nav1.7 with the four peptides, and the analysis of interacting residues was performed using PISA. The docking results demonstrate that three of the conopeptides (C. geographus, C. ermineus, and C. textile) bind primarily to the pore-forming regions of domain II (D II) and domain III (D III), whereas C. consors interacts with the pore-forming regions of domain I (D I) in Nav1.7. Figure 1 shows the binding pockets of all conopeptides with Nav1.7. In D III, hydrophobic interactions (F1343, F1405, W1408) are prominent in the δ-, α-, and µO-conotoxins, and hydrophilic interactions (W908, D912, H915, R922) have also been identified in C. ermineus for D II. In contrast, C. consors µ-conotoxin exhibits distinct hydrophobic interactions (F317, F344, V331) in D I, whereas no other interactions were seen in D II or D III. Thus, C. consors µ-conotoxin has a unique binding site, which makes it a candidate for a selective Nav1.7 blocker.
Hence, the utilization of toxins will considerably increase our understanding of the biophysical and pharmacological characteristics of channels, particularly in distinguishing specific channel activity.
"NECOLA: A UNet-based Universal Cosmological Emulator"
We train convolutional neural networks to correct the output of fast and approximate N-body simulations at the field level. Our model, Neural Enhanced COLA (NECOLA), takes as input a snapshot generated by the computationally efficient COLA code and corrects the positions of the cold dark matter particles to match the results of full N-body Quijote simulations. We quantify the accuracy of the network using several summary statistics, and find that NECOLA can reproduce the results of the full N-body simulations with subpercent accuracy down to k ≃ 1 hMpc−1. Furthermore, the model that was trained on simulations with a fixed value of the cosmological parameters is also able to correct the output of COLA simulations with different values of Ωm, Ωb, h, ns, σ8, w, and Mν with very high accuracy: the power spectrum and the cross-correlation coefficients are within ≃1% down to k = 1 hMpc−1. Our results indicate that the correction to the power spectrum from fast/approximate simulations or field-level perturbation theory is rather universal. Our model represents a first step toward the development of a fast field-level emulator to sample not only primordial mode amplitudes and phases, but also the parameter space defined by the values of the cosmological parameters.
"CIGAN: Cosmology-Injected GAN-based Cosmic Web Generator"
Dark matter evolves through gravity and forms the complex network of cosmic filaments, sheets, voids, and halos known as the cosmic web. In order to compare large-scale observations of the cosmic web with theory, numerous simulations employing billions of cosmic tracers need to be run, which is a very computationally intensive task. The upcoming cosmological surveys will therefore face a major computational bottleneck that could limit the potential of their scientific return. To address this problem, we train generative adversarial networks to generate statistically independent, statistically significant, and physically realistic realizations of the cosmic web of our Universe. We show that the samples generated by our GAN-based model, CIGAN, are qualitatively and quantitatively very similar to the real samples. An important advantage of this approach is the considerable gain in computational time: each new cosmic web sample takes a fraction of a second to generate, compared to the many hours required by traditional N-body techniques. This will, in turn, play a crucial role in providing fast, precise, and reliable simulations of the cosmic web in the era of large-scale structure surveys of the Universe.
"Cybersecurity in Smart Cars Network"
This project concerns cybersecurity in a smart car network model. With the increasing demand for automated cars, it is equally important to work on the security features of these cars. Many cases have been reported recently in which such cars were hacked and hijacked by attackers to commit crimes such as theft, rape, and murder. These cars also provide a control feature for various IoT appliances at home, which can likewise be hacked and controlled by breaking into the system. With this project, we aim to reduce such cyber threats.
"A machine learning-based method for cardiac resynchronization therapy decision support"
Heart failure (HF) is an increasingly prevalent condition in which the heart is unable to pump enough blood to meet the body's demands. Cardiac resynchronization therapy (CRT) is a standard and costly treatment for HF. One problem is the large proportion of patients who do not benefit from the procedure. It is hypothesized that machine learning is an effective method to mitigate this issue by providing better decision support than current admission guidelines. Additionally, ML can validate the selection of relevant patient features that predict response to CRT from clinical and imaging data, such as single photon emission computed tomography (SPECT) myocardial perfusion imaging (MPI).
"Counterfactual Thinking as a Strategy for Questioning a Frame: Experimental Results"
Sensemaking (Weick et al., 2005; Klein et al., 2007) is integral within human-centered computing. Developers continually readjust how they frame a problem and its potential solutions throughout the design process (Hoffman et al., 2004; Costanza-Chock, 2020), and users evaluate evidence and adjust their perspectives when encountering potential cybersecurity threats. This research focuses on how people question their perspectives, assuming that questioning a perspective is a precursor to changing it. In this experiment, we examined the effect of counterfactual thinking strategies (through focusing on mutability) about social situations on the likelihood of considering alternative outcomes, a proxy for perspective questioning.
"Development of a reference-free text segmentation metric"
The performance of a text segmentation algorithm is usually computed by comparing the segmentation boundaries with those of a human made reference. In this work, we propose a reference-free metric for segmentation tasks that separate chunks at the sentence or multi-sentence level. The proposed method begins by using a pre-trained transformer to generate embeddings for each chunk of text. These embeddings are then clustered by segment and a modified cluster validity metric is computed between all adjacent text segments. We validate our proposed metric against commonly used metrics by comparing their performance on popular segmentation data sets and show that it is capable of directionally indicating the quality of a segmentation.
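A toy version of the idea can be sketched with dummy two-dimensional "embeddings" and a simple between/within separation ratio standing in for the paper's modified cluster validity metric (which is not specified in the abstract):

```python
import numpy as np

def adjacent_separation(embeddings, boundaries):
    """Reference-free segmentation score sketch (an illustrative metric,
    not the paper's exact formulation): for each pair of adjacent segments,
    compare the between-segment centroid distance to the average
    within-segment spread. Higher means better-separated segments."""
    segments = np.split(np.asarray(embeddings, dtype=float), boundaries)
    scores = []
    for a, b in zip(segments, segments[1:]):
        ca, cb = a.mean(axis=0), b.mean(axis=0)
        between = np.linalg.norm(ca - cb)
        within = (np.linalg.norm(a - ca, axis=1).mean()
                  + np.linalg.norm(b - cb, axis=1).mean()) / 2 + 1e-12
        scores.append(between / within)
    return float(np.mean(scores))

# Toy "embeddings": six sentences about one topic, six about another.
rng = np.random.default_rng(1)
emb = np.vstack([rng.normal(0.0, 0.1, (6, 2)), rng.normal(5.0, 0.1, (6, 2))])

good = adjacent_separation(emb, [6])   # boundary at the true topic shift
bad = adjacent_separation(emb, [3])    # boundary mid-topic
print(good > bad)  # True
```

The key property, matching the abstract's claim, is that the score moves in the right direction: placing a boundary at a genuine topic shift scores higher than placing it mid-topic, even though no human reference segmentation is consulted.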
"Automated Image Segmentation for Computational Analysis of Patients with Abdominal Aortic Aneurysms"
"Image-based” computational fluid dynamics (CFD) first utilizes “patient-specific”
medical
imaging data to create anatomically accurate geometries and then solve Naiver-Stokes
Equation to analyze blood flow for individual patients. This process yields critically
important hemodynamic characteristics relevant to the initiation, growth, and rupture
of abdominal aortic aneurysms. Traditionally, “patient-specific” CFD models are produced
through manual segmentation of medical images, such as CT and MRI. However, the manual
segmentation process typically requires a tremendous amount of time to generate a
good model (approximately 2 hours). Hence, we looked into the possibility to reduce
the model generation time by using deep learning image-segmentation algorithms for
the automated delineation of aortic aneurysms in this study. More specifically, ARU-Net,
a trained convolutional neural network-based algorithm, was selected to segment CT
images from 10 different patients. Different hemodynamic parameters were calculated
in the CFD simulation using Ansys software (Fluent, Ansys Inc., PA). We found that
ARU-Net produced considerable time-saving in terms of reducing model creation time
(2 hours vs. 15 minutes). Minor manual editing was applied for running the CFD simulation.
Quantitatively, statistical analysis, including Pearson’s correlation coefficients,
linear regression, and Bland-Altman analysis, were used to evaluate the quality of
automatic segmentation compared to traditional manual segmentation. Our quantitative
result shows good agreement of volume, surface area, and height between the two (Averaged
PCC greater than 0.9). The wall shear stress value between the two has an unignorable
discrepancy (PCC around 0.7). In conclusion, the ARU-Net is viable for automatic segmentation
of the abdominal aortic aneurysm and can partially accelerate the CFD model creation
process. However, the ARU-Net still requires more developments to further reduce the
model creation time.
"Backward Compatible Physics Informed Neural Networks"
A physics-informed neural network (PINN) incorporates the physics of a system by satisfying its boundary value problem through the neural network's loss function. Recent studies have shown that the PINN approach can approximate the map between the solution of a partial differential equation (PDE) and its spatio-temporal coordinates. However, we have observed that the PINN method is significantly inaccurate for strongly nonlinear, higher-order, time-varying partial differential equations such as the Allen-Cahn and Cahn-Hilliard equations. To overcome this problem, we propose a novel PINN scheme that solves the PDE sequentially over successive time segments using a single neural network. The key idea is that the same neural network is re-trained to solve the PDE over successive time segments while satisfying the already-obtained solution for all previous time segments; it is therefore named the backward compatible PINN (bc-PINN). We illustrate the advantages of bc-PINN by solving the Cahn-Hilliard and Allen-Cahn equations. Furthermore, we introduce two new techniques to improve the proposed bc-PINN scheme. In the first, we use the initial condition of a time segment to guide the neural network map closer to the true map over that segment. In the second, we implement a transfer learning approach to preserve the solution features learned while training the previous segment. We demonstrate that these two techniques significantly improve the accuracy and efficiency of the bc-PINN scheme. Convergence has also been improved by using a phase space representation for higher-order PDEs.
"Learning to Segment Intracranial Aneurysms via An Attention Residual U-Net with Differential Preprocessing and Geometric Postprocessing"
Intracranial aneurysms (IAs) are lethal, with high morbidity and mortality rates. Reliable, rapid, and accurate segmentation of IAs and their adjacent vasculature from medical imaging data is important to improve the clinical management of patients with IAs. However, due to the blurred boundaries and complex structure of IAs and their overlap with brain tissue or other cerebral arteries, image segmentation of IAs remains challenging. This study aimed to develop an attention residual U-Net (ARU-Net) architecture with differential preprocessing and geometric postprocessing for automatic segmentation of IAs and their adjacent arteries in conjunction with 3D rotational angiography (3DRA) images.
"DFT-aided Machine Learning-based discovery of Magnetism in Fe-based Bimetallic Chalcogenides."
With the technological advancement of recent years and the widespread use of magnetism in every sector of current technology, the search for low-cost magnetic materials has become more important than ever. The discovery of magnetism in alternative materials such as metal chalcogenides with abundant atomic constituents would be a milestone in this scenario. However, considering the multitude of possible chalcogenide configurations, predictive computational modeling or experimental synthesis is an open challenge. Here, we turn to a stacked generalization machine learning model to predict magnetism in hexagonal Fe-based bimetallic chalcogenides, FexAyB, where A represents Ni, Co, Cr, or Mn; B represents S, Se, or Te; and x and y represent the concentrations of the respective atoms. The stacked generalization model is trained on a dataset obtained using first-principles density functional theory (DFT). The model achieves MSE, MAE, and R2 values of 1.655, 0.546, and 0.922, respectively, on an independent test set, indicating that it predicts the composition-dependent magnetism of bimetallic chalcogenides with a high degree of accuracy. A generalized algorithm is also developed to test the universality of the proposed model for any concentration of Ni, Co, Cr, or Mn up to 62.5% in bimetallic chalcogenides.
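For readers unfamiliar with stacked generalization, the sketch below shows the general pattern on synthetic data: out-of-fold predictions from base learners become the training features for a linear meta-learner. The base learners, data, and values are illustrative assumptions, not the authors' actual model or DFT dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(X, y, lam=1e-3):
    # Closed-form ridge regression with a bias column.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)

def ridge_predict(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def knn_predict(Xtr, ytr, X, k=5):
    # Simple k-nearest-neighbor mean as a second, nonlinear base learner.
    d = np.linalg.norm(X[:, None, :] - Xtr[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return ytr[idx].mean(axis=1)

def stacked_predict(Xtr, ytr, Xte, folds=5):
    """Stacked generalization: out-of-fold base-learner predictions train
    a linear meta-learner, which then combines full-data base predictions."""
    n = len(Xtr)
    oof = np.zeros((n, 2))
    for va in np.array_split(np.arange(n), folds):
        tr = np.setdiff1d(np.arange(n), va)
        w = ridge_fit(Xtr[tr], ytr[tr])
        oof[va, 0] = ridge_predict(w, Xtr[va])
        oof[va, 1] = knn_predict(Xtr[tr], ytr[tr], Xtr[va])
    meta_w = ridge_fit(oof, ytr)                       # meta-learner
    base = np.column_stack([
        ridge_predict(ridge_fit(Xtr, ytr), Xte),
        knn_predict(Xtr, ytr, Xte),
    ])
    return ridge_predict(meta_w, base)

# Synthetic stand-in for a composition -> magnetic moment mapping.
X = rng.uniform(0, 1, (200, 3))
y = 2 * X[:, 0] + np.sin(3 * X[:, 1]) + 0.1 * rng.normal(size=200)
pred = stacked_predict(X[:150], y[:150], X[150:])
mse = float(np.mean((pred - y[150:]) ** 2))
print(mse)  # should be well below the variance of the test targets
```

Training the meta-learner on out-of-fold predictions (rather than in-sample ones) is what keeps stacking from simply memorizing the base learners' training errors.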
"Machine learning based prediction of the electronic structure of quasi-one-dimensional materials under strain"
We present a machine learning based model that can predict the electronic structure of quasi-one-dimensional materials while they are subjected to deformation modes such as torsion and extension/compression. The technique described here applies to important classes of materials systems such as nanotubes, nanoribbons, nanowires, miscellaneous chiral structures, and nanoassemblies, for all of which, tuning the interplay of mechanical deformations and electronic fields, i.e., strain engineering, is an active area of investigation in the literature. Our model incorporates global structural symmetries and atomic relaxation effects, benefits from the use of helical coordinates to specify the electronic fields, and makes use of a specialized data generation process that solves the symmetry-adapted equations of Kohn-Sham density functional theory in these coordinates. Using armchair single-wall carbon nanotubes as a prototypical example, we demonstrate the use of the model to predict the fields associated with the ground-state electron density and the nuclear pseudocharges, when three parameters (namely, the radius of the nanotube, its axial stretch, and the twist per unit length) are specified as inputs. Other electronic properties of interest, including the ground-state electronic free energy, can be evaluated from these predicted fields with low-overhead postprocessing, typically to chemical accuracy. Additionally, we show how the nuclear coordinates can be reliably determined from the predicted pseudocharge field using a clustering-based technique. Remarkably, only about 120 data points are found to be enough to predict the three-dimensional electronic fields accurately, which we ascribe to the constraints imposed by symmetry in the problem setup, the use of low-discrepancy sequences for sampling, and efficient representation of the intrinsic low-dimensional features of the electronic fields. 
We comment on the interpretability of our machine learning model and anticipate that our framework will find utility in the automated discovery of low-dimensional materials, as well as the multiscale modeling of such systems.
"Interpretable machine learning model for the deformation of multiwalled carbon nanotubes"
We present an interpretable machine learning model to accurately predict the complex rippling deformations of multiwalled carbon nanotubes made of millions of atoms. Atomistic-physics-based models are accurate but computationally prohibitive for such large systems. To overcome this bottleneck, we have developed a machine learning model that comprises a novel dimensionality reduction technique and deep neural network-based learning in the reduced dimension. The proposed nonlinear dimensionality reduction technique extends functional principal component analysis to satisfy the constraint of deformation. Its novelty lies in designing a function space that satisfies the constraint exactly, which is crucial for efficient dimensionality reduction. Owing to the dimensionality reduction and several other strategies adopted in the present paper, learning through deep neural networks is remarkably accurate. The proposed model accurately matches an atomistic-physics-based model while being orders of magnitude faster. It extracts universally dominant patterns of deformation in an unsupervised manner. These patterns are comprehensible and explain how the model makes its predictions, yielding interpretability. The proposed model can form a basis for extending machine learning to the mechanics of one- and two-dimensional materials.
"Scheduling Multiple Tethered Underwater Robots for Entanglement Free Navigation"
This work provides an operational strategy for underwater multi-agent systems of tethered robots, which are utilized in many real-world applications such as surveillance, inspection and maintenance, exploration, and monitoring. Specifically, the authors focus on developing an algorithm that prevents, first, collisions between the robots and, second, entanglement of the robot cables, by determining the appropriate time of departure from every node on the route of each robot. The proposed technique repetitively simulates the movement of the robots along their respective routes. The algorithm accurately detects and prevents cable entanglements as well as robot collisions along their convoluted paths, irrespective of the degree of complexity. Though estimating the exact time and location of a potential collision or cable entanglement requires a high computational load, the authors aim to produce results in a very short time. The algorithm was iteratively tested in simulation with varying problem sizes to verify its effectiveness. The computational results show that the algorithm can produce reliable solutions for real-time operations within a reasonable time.
"Improving Protein Succinylation Sites Prediction Using Embeddings from Protein Language Model"
Protein succinylation is an important post-translational modification (PTM)
responsible for many vital metabolic activities in cells, including cellular respiration,
regulation, and repair. Here, we present a novel approach that combines features from
supervised word embedding with embeddings from a protein language model called ProtT5-XL-UniRef50
(hereafter termed ProtT5) in a deep learning framework to predict protein succinylation
sites. To our knowledge, this is one of the first attempts to employ embeddings from
a pre-trained protein language model to predict protein succinylation sites. The proposed
model, dubbed LMSuccSite, achieves state-of-the-art results compared to existing methods,
with scores of 0.36, 0.79, and 0.79 for MCC, sensitivity, and specificity,
respectively. LMSuccSite is likely to serve as a valuable resource for exploration
of succinylation and its role in cellular physiology and disease.
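For reference, the reported metrics can be computed from confusion-matrix counts. The counts below are hypothetical and chosen only to illustrate the formulas; they do not reproduce LMSuccSite's test set.

```python
import math

def confusion_metrics(tp, tn, fp, fn):
    """MCC, sensitivity, and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return mcc, sensitivity, specificity

# Hypothetical balanced counts for illustration only.
mcc, sens, spec = confusion_metrics(tp=79, tn=79, fp=21, fn=21)
print(round(mcc, 2), round(sens, 2), round(spec, 2))  # 0.58 0.79 0.79
```

Note that MCC depends on all four counts, which is why it can be much lower than sensitivity and specificity on an imbalanced test set.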
"LMSNOPred: An Improved Deep Learning Framework for Protein S-nitrosylation sites Prediction using Embeddings from Transformer based Protein Language Model"
Background: Protein S-nitrosylation (SNO) is a key mechanism of transferring Nitric
Oxide-mediated signals in both animals and plants and has emerged as an important
mechanism to regulate protein functions and cell signaling across all main classes of
proteins. It is involved in a multitude of biological processes including immune response,
protein stability, transcription regulation, post-translational regulation, DNA damage
repair, and redox regulation, and it is an emerging paradigm of redox signaling for protection
against oxidative stress. Development of robust computational tools to predict protein
SNO sites would be useful
in further elucidating the pathological and physiological mechanisms of SNO.
Results: Using an ensemble approach that integrates supervised word embedding and embeddings from a protein language model, we developed a tool called LMSNOPred (protein Language Model-based SNO site Predictor). On an independent test set of experimentally identified SNO sites, LMSNOPred achieved values of 0.339, 0.735, and 0.772 for MCC, sensitivity, and specificity, respectively. In comparison to other SNO site prediction approaches, LMSNOPred represents a significant improvement in the prediction of S-nitrosylation sites.
Conclusion: LMSNOPred is, to the best of our knowledge, the first approach to use embeddings from a protein Language Model (pLM) to predict protein SNO sites. Together, these results suggest that our method represents a robust computational approach for the prediction of protein S-nitrosylation sites.
"Gabor Filter-Embedded U-Net with Transformer-Based Encoding for Biomedical Image Segmentation"
Medical image segmentation involves categorizing target regions that typically vary in shape, orientation, and scale. This requires highly accurate algorithms, as marginal segmentation errors in medical images may lead to inaccurate diagnosis in subsequent procedures. The U-Net framework has become one of the dominant deep neural network architectures for medical image segmentation. Due to the complex and irregular shapes of objects in medical images, robust feature representations that correspond to various spatial transformations are key to achieving successful results. Although U-Net based deep architectures can perform feature extraction and localization, the design of specialized architectures or layer modifications is often an intricate task. In this paper, we propose an effective solution to this problem by introducing Gabor filter banks into the U-Net encoder, which has not yet been well explored in existing U-Net-based segmentation frameworks. In addition, global self-attention mechanisms and Transformer layers are also incorporated into the U-Net framework to capture global contexts. Through extensive testing on two benchmark datasets, we show that the Gabor filter-embedded U-Net with Transformer encoders can enhance the robustness of deep-learned features, and thus achieve a more competitive performance.
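A Gabor filter bank of the kind embedded in the encoder can be sketched in plain numpy. The parameter values here (`sigma`, `lam`, four orientations) are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lam, gamma=0.5, psi=0.0):
    """Real-valued Gabor kernel: a Gaussian envelope times a sinusoidal
    carrier oriented at angle theta (illustrative numpy implementation)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)     # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / lam + psi)
    return envelope * carrier

# A small bank at four orientations, as one might embed in a U-Net encoder.
bank = np.stack([gabor_kernel(15, sigma=3.0, theta=t, lam=6.0)
                 for t in np.linspace(0, np.pi, 4, endpoint=False)])
print(bank.shape)  # (4, 15, 15)
```

In a filter-embedded encoder, such kernels would initialize or constrain the first convolutional layers so that early features respond to oriented, band-limited structure.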
"GAF-NAU: Gramian Angular Field encoded Neighborhood Attention U-Net for Pixel-Wise Hyperspectral Image Classification"
"Improving Characterization of Abdominal Aortic Aneurysms By Modeling Thrombosis"
Background: The prevalence of abdominal aortic aneurysms (AAA) is high (9-10%) among seniors. Our goal is to characterize AAAs (slowly-growing versus rapidly-growing). Thus, better management strategies (immediate intervention versus the frequency of imaging surveillance) can be devised in a patient-specific fashion.
Methods: 3D geometrical AAA models with and without thrombosis (vessel lumen only) were generated for 64 human subjects using available contrast-enhanced CTA data. AAA growth rates were categorized as slow (< 5 mm/year) or rapid (≥ 5 mm/year) based on serial imaging. In-house Python scripts were used to calculate more than 40 geometrical parameters with and without thrombosis. Patients' relevant health information was retrieved through medical records under IRB approval. A support vector machine (SVM), a well-established machine learning method, was run with 10-fold cross-validation (100 iterations) to assess predictive strength.
Results: Among the 64 AAAs studied, the ratio of rapidly-growing to slowly-growing was nearly 1:2. The combination of blood pressure control medication, co-existing coronary artery disease, aorta size proximal to the AAA, and five geometrical parameters quantifying the extent of thrombosis provided the best accuracy for AAA growth status: the area under the receiver operating characteristic curve (AUROC) and total accuracy were 0.82 and 0.75, respectively, with 60% of rapidly-growing and 83% of slowly-growing AAAs correctly identified. Without considering thrombosis, the AUROC and the accuracy in predicting rapidly-growing AAAs decreased to 0.78 and 54%, respectively.
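The repeated 10-fold cross-validation protocol can be sketched as follows. A nearest-centroid classifier and synthetic features stand in for the study's SVM and real geometric/clinical features, so the numbers are illustrative only.

```python
import numpy as np

# Synthetic stand-in for the 64-subject dataset (hypothetical features).
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 8))
y = np.array([0] * 42 + [1] * 22)   # roughly the 2:1 slow:rapid growth ratio
X[y == 1] += 1.5                     # inject some class separation

def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Trivial classifier standing in for the SVM."""
    c0, c1 = X_tr[y_tr == 0].mean(0), X_tr[y_tr == 1].mean(0)
    pred = (np.linalg.norm(X_te - c1, axis=1)
            < np.linalg.norm(X_te - c0, axis=1)).astype(int)
    return (pred == y_te).mean()

accs = []
for it in range(100):                        # 100 iterations, as in the study
    idx = rng.permutation(len(y))
    for fold in np.array_split(idx, 10):     # 10 folds per iteration
        train = np.setdiff1d(idx, fold)
        accs.append(nearest_centroid_accuracy(X[train], y[train],
                                              X[fold], y[fold]))
print(f"mean accuracy over {len(accs)} folds: {np.mean(accs):.2f}")
```

Repeating the shuffle-and-split many times reduces the variance of the accuracy estimate, which matters on a dataset as small as 64 subjects.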
"Pay 'Attention' to Adverse Weather: Weather-aware Attention-based Object Detection"
Despite recent advances in deep neural networks, object detection in adverse weather remains challenging due to the degraded perception of some sensors in such conditions. Instead of relying on one single sensor, multimodal fusion has been one promising approach to provide redundant detection information based on multiple sensors. However, most existing multimodal fusion approaches are ineffective in adjusting the focus of different sensors under varying detection environments in dynamic adverse weather conditions. Moreover, it is critical to simultaneously observe local and global information under complex weather conditions, which has been neglected in most early or late-stage multimodal fusion works. In view of these challenges, this paper proposes a Global-Local Attention (GLA) framework to adaptively fuse the multi-modality sensing streams, i.e., camera, gated camera, and lidar data, at two fusion stages. Specifically, GLA integrates an early-stage fusion via a local attention network and a late-stage fusion via a global attention network to deal with both local and global information, which automatically allocates higher weights to the modality with better detection features at the late-stage fusion to cope with the specific weather condition adaptively. Experimental results demonstrate the superior performance of the proposed GLA compared with state-of-the-art fusion approaches under various adverse weather conditions, such as light fog, dense fog, and snow.
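The late-stage fusion idea, softmax attention weights that favor the most informative modality, can be sketched as follows. The feature vectors and scores are synthetic stand-ins for what a trained network would produce.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical per-modality feature vectors (random stand-ins).
rng = np.random.default_rng(1)
features = {"camera": rng.standard_normal(16),
            "gated_camera": rng.standard_normal(16),
            "lidar": rng.standard_normal(16)}

# Stand-in relevance scores; a real attention network would predict these
# from the features themselves (e.g. gated camera excels in dense fog).
scores = np.array([0.2, 1.5, 0.4])

weights = softmax(scores)                      # fusion weights summing to 1
fused = sum(w * f for w, f in zip(weights, features.values()))
print(weights.round(2), fused.shape)
```

Because the weights come from a softmax over learned scores, the fusion adapts per scene: whichever modality scores highest under the current conditions dominates the fused representation.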
"Expanding Code Critiquers"
Early courses in computer programming offer many challenges for both students and instructors. Novice coding students spend much of their time getting a handle on the basic syntax of a specific language while also learning the problem-solving skills necessary to complete coding assignments. While developing these fundamental skills, meaningful and immediate feedback is crucial. Yet messages and warnings from a compiler or interpreter are often inadequate at explaining an error to a novice programmer. These error messages are geared toward experts and can prove counterproductive to novice coders. Instructors can find it difficult to maintain a level of feedback conducive to student learning because of the number of students in a given course. Further, instructors cannot always be available on demand to meet their students' feedback needs because of conflicting schedules.
"Evaluating the Impact of Ablation on Atrial Hemodynamics"
This study aims to evaluate the impact of catheter ablation for atrial fibrillation (AF) on left atrial (LA) flow dynamics and geometrical changes. This exploratory study included 10 patients who underwent catheter ablation for AF for computational flow simulations. Complete cardiac cycle datasets were simulated before and after ablation using computational fluid dynamics. The study's main endpoints were the changes in LA volume, LA velocity, LA wall shear stress (WSS), circulation (Γ), vorticity, pulmonary vein (PV) ostia area, and LA vortices before and after ablation. There was an average decrease in LA volume (11.58±15.17%) and PV ostia area (16.6±21.41%) after ablation. A non-uniform trend of velocity and WSS changes was observed after ablation. Compared with pre-ablation, 4 patients exhibited lower velocities and WSS distributions and a decreased Γ (-21.4±10.6%), while 6 developed higher velocities and WSS distributions after ablation. These geometrical changes dictated different flow mixing in the LA and distinct vortex patterns, characterized by different spinning velocities, vorticities, and rotational directions between pre- and post-ablation. Regions with Q-criterion > 0 were found to be dominant in the LA, indicating prevalent rotational vortex structures. Catheter ablation for AF induced different geometrical changes in the LA and the PVs, therefore influencing flow mixing and vortex patterns in the LA, in addition to the overall velocity and WSS distribution. Further exploration of the impact of catheter ablation on intracardiac flow dynamics is warranted to discern general and uniform patterns that may correlate with clinical outcomes.
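The Q-criterion used above to identify vortex structures is computed from the velocity-gradient tensor. A minimal sketch with illustrative gradients (not patient data):

```python
import numpy as np

def q_criterion(grad_u):
    """Q = 0.5 * (||Omega||^2 - ||S||^2) for a velocity-gradient tensor;
    Q > 0 marks regions where rotation dominates strain (a vortex core)."""
    S = 0.5 * (grad_u + grad_u.T)       # strain-rate tensor (symmetric part)
    Omega = 0.5 * (grad_u - grad_u.T)   # rotation-rate tensor (antisymmetric)
    return 0.5 * (np.sum(Omega**2) - np.sum(S**2))

# Illustrative velocity gradients: solid-body rotation about z gives Q > 0,
# while a purely straining (extensional) flow gives Q < 0.
rotation = np.array([[0.0, -1.0, 0.0],
                     [1.0,  0.0, 0.0],
                     [0.0,  0.0, 0.0]])
extension = np.diag([1.0, -1.0, 0.0])
print(q_criterion(rotation) > 0, q_criterion(extension) < 0)  # True True
```

In a CFD post-processing pipeline, this quantity would be evaluated at every grid point of the simulated LA flow field, and the connected regions with Q > 0 taken as vortex structures.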
"Fair and Adaptive Over-sampling for Evolving Data Streams"
The growing involvement of machine learning in decision-making systems has an increasingly profound impact on our community. As unfair decisions caused by machine learning algorithms are increasingly reported, algorithmic fairness in machine learning is receiving growing attention. Considering algorithmic fairness when building streaming machine learning systems is therefore an indispensable step. Online fairness learning is a branch of streaming machine learning that combines the challenges of algorithmic fairness and concept drift. In this paper, we first review the limitations of class balancing techniques and then investigate how to achieve fair, balanced data streams for the binary classification problem in the presence of concept drift. We argue that adaptive fairness in the case of evolving data streams is closer to real-world requirements and better explains fair class balancing techniques.
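One simple class balancing baseline of the kind such work builds on, window-based over-sampling of the current minority class, can be sketched as follows. This is a hedged illustration only; it omits the fairness-aware and drift-adaptation components that the paper addresses.

```python
import random
from collections import deque

random.seed(0)
window = deque(maxlen=200)   # sliding window of recent labels

def process(example, label, train_fn):
    """Train on each arriving example, over-sampling the minority class
    in proportion to the imbalance observed in the sliding window."""
    window.append(label)
    pos = sum(window)
    neg = len(window) - pos
    minority = 1 if pos < neg else 0
    ratio = max(pos, neg) / max(1, min(pos, neg))
    repeats = round(ratio) if label == minority else 1
    for _ in range(repeats):
        train_fn(example, label)

trained = []
for i in range(1000):
    label = 1 if random.random() < 0.1 else 0   # stream with ~10% minority
    process((i,), label, lambda x, y: trained.append(y))

share = sum(trained) / len(trained)
print(f"minority share after over-sampling: {share:.2f}")
```

Because the imbalance ratio is re-estimated from the window on every arrival, the over-sampling rate adapts as the class distribution of the stream drifts.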
"Lung Nodule Classification from Radiology Report and CT Images Using BERT and 3D
Convolutional Neural Network"
Recently, deep learning-based approaches have drawn much attention for detecting lung nodules
from CT images. However, the lack of large-scale medical image datasets and class
imbalance remain key obstacles. In this study, we propose a two-branch
deep learning framework to detect lung nodules by leveraging both text reports and
images. One branch takes text data as input and utilizes a BERT-based natural language
processing algorithm, while another branch is designed to extract useful features
using a 3D deep convolutional network. The outcome of this study shows promise for
assisting radiologists in the workflow and quality-control processes of lung nodule classification.
"COVID-19 Prediction from Clinical Symptoms and X-ray Images Using Machine Learning Models"
A critical step in the fight against COVID-19 is the effective diagnosis of patients, and a popular approach is to utilize chest X-ray images and symptoms for decision making. In this study, we explore popular machine learning models for automatically detecting COVID-19 from X-ray images as well as clinical symptoms. Experimental results show the promise of Convolutional Neural Network-based methods, and clinical symptoms such as sore throat and headache showed some degree of significance. The outcome of this study will be a good reference for the medical research community to accelerate the development of practical AI solutions for COVID-19 detection and treatment.
"Lung segmentation and automatic detection of COVID-19 using radiomic features from chest CT images"
This study aims to develop an automatic method to segment pulmonary parenchyma in chest CT images and analyze texture features from the segmented pulmonary parenchyma regions to assist radiologists in COVID-19 diagnosis. A new segmentation method, which integrates a 3D V-Net with a shape deformation module implemented using a spatial transform network (STN), was proposed to segment pulmonary parenchyma in chest CT images. The radiomic features were further analyzed by sophisticated statistical models with high interpretability to discover significant independent features and detect COVID-19 infection. Experimental results demonstrated that compared with the manual annotation, the proposed segmentation method achieved a Dice similarity coefficient of 0.9796, a sensitivity of 0.9840, a specificity of 0.9954, and a mean surface distance error of 0.0318 mm. Furthermore, our COVID-19 classification model achieved an area under the curve (AUC) of 0.9470, a sensitivity of 0.9670, and a specificity of 0.9270.
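The Dice similarity coefficient reported above measures the overlap between a predicted segmentation and a manual annotation; it can be computed from binary masks as follows, using toy masks for illustration only.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum())

# Toy masks standing in for a segmented and a manually annotated lung region.
truth = np.zeros((8, 8), dtype=int); truth[2:6, 2:6] = 1   # 16 pixels
pred = np.zeros((8, 8), dtype=int);  pred[2:6, 3:7] = 1    # shifted by one
print(dice_coefficient(pred, truth))  # 2*12 / (16+16) = 0.75
```

A Dice score of 0.9796, as reported, means the automatic and manual masks overlap almost perfectly at the pixel level.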
"Explanation and Use of Uncertainty Quantified by Bayesian Neural Network Classifiers for Breast Histopathology Images"