Climate change is a polarized topic on social media in the U.S. Actors who advocate climate change as scientific fact and those who tout it as a conspiracy both post videos on YouTube, and both kinds of videos can receive millions of views and thousands of comments. Given the polarized nature of the topic, we might expect a high degree of vitriolic speech in the comments on these videos, and previous Twitter studies suggest significant differences in the networks built from such comment-and-reply interactions. This study focuses on these comments and replies in an effort to understand the nature of discourse surrounding climate change believer and skeptic videos. Our hope is to extend the existing literature on scientific communication around climate change, which to our knowledge has not specifically compared discussions around both climate change believer and skeptic videos on YouTube. Results show that most users reply only to other users who share their own perspective on climate change, and express positive sentiment toward them. Our study also finds that the more negative a user's comments, the more connections that user has with other users. These findings motivate further investigation of climate change social activity on YouTube.
2023
Can ChatGPT Understand Causal Language in Science Claims?
Kim, Yuheun, Guo, Lu, Yu, Bei, and Li, Yingya
In Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis Jul 2023
This study evaluated ChatGPT's ability to understand causal language in science papers and news by testing its accuracy on a task of labeling the strength of a claim as causal, conditional causal, correlational, or no relationship. The results show that ChatGPT still trails existing fine-tuned BERT models by a large margin. ChatGPT also had difficulty understanding conditional causal claims mitigated by hedges. However, its weaknesses may be leveraged to improve the clarity of human annotation guidelines. Chain-of-Thought prompts were faithful and helpful for improving performance, but finding the optimal prompt is difficult given inconsistent results and the lack of an effective method for establishing cause and effect between prompts and outcomes, suggesting caution when generalizing prompt-engineering results across tasks or models.
2022
SQ2SV: Sequential Queries to Sequential Videos retrieval
Paek, Injin, Choi, Nayoung, Ha, Seongjin, Kim, Yuheun, and Song, Min
In 2022 IEEE International Conference on Big Data (Big Data) Jul 2022
Current video retrieval models are one-to-one matching models, which prevents them from learning from the sequential context of videos. While most public datasets for this task consist of contextually independent text-video pairs, datasets such as YouCook2, Video Storytelling, and COIN consist of chronological text-video pair segments. This paper introduces a retrieval task, Sequential Queries to Sequential Videos retrieval (SQ2SV), which retrieves multiple sets of sequential videos from sequential queries to exploit such contextual interdependence. To the best of our knowledge, this paper is the first attempt to introduce retrieval of multiple sets of sequential videos. We not only introduce a new task but also build a task-specific model and an evaluation metric. Our model, UniSeq (UniVL-based sequential videos retrieval), is both a sequential and a cross-representation model. Our new metric, Video R@k, evaluates the performance of a retrieval model at the level of whole videos rather than individual video segments. Our best model outperforms the UniVL baseline in the original R@1 by 0.40% on YouCook2 and 1.09% on Video Storytelling. Furthermore, on Video R@1, our model outperforms the baseline by 0.27% on YouCook2 and 0.94% on Video Storytelling.
2021
BioPREP: Deep learning-based predicate classification with SemMedDB
Hong, Gibong*, Kim, Yuheun*, Choi, YeonJung*, and Song, Min
When it comes to inferring relations between entities in biomedical texts, Relation Extraction (RE) has become key to biomedical information extraction. Previous studies focused on rule-based and machine learning-based approaches, but these methods demanded extensive feature processing while yielding relatively low accuracy. Some existing biomedical relation extraction tools are based on neural networks; nonetheless, they rarely analyze possible causes of the differences in accuracy among predicates. Moreover, few biomedical datasets have been structured for predicate classification. Accordingly, we set our research goals as follows: constructing a large-scale training dataset, namely Biomedical Predicate Relation-extraction with Entity-filtering by PKDE4J (BioPREP), based on SemMedDB, using PKDE4J as an entity-filtering tool; and evaluating the performance of several neural network-based algorithms on the structured dataset. We then analyzed our model's performance in depth by grouping predicates into semantic clusters. Comprehensive experiments showed that the BioBERT-based model outperformed the other models for predicate classification, achieving an F1-score of 0.846 with BioBERT as the pre-trained model and 0.840 with SciBERT weights. Moreover, the semantic cluster analysis showed that sentences containing key phrases, such as a comparison verb followed by 'than', were classified better.