About Me - Summary
A trained computer scientist by trade, I have a passion for all things technology. Starting with a ‘grade calculator’ program written in QBasic in the school library decades ago, I have learned a thing or two across different roles and experiences.
Blog
I sometimes scribble grammatically incorrect and semantically incoherent blobs of my thoughts as blog posts. These posts reflect what I was thinking at that point in time; they might not represent what I think today, let alone the views of my employer.
Selected Project Portfolio
While I cannot post work and projects I did for other employers, here are a few examples of projects I have done over the years.
These are projects I did many years ago, as a student.
Central Analysis Techniques
Sentiment analysis of textual data takes a central role in our research focus. The strategy was to track positive and negative sentiment through time and between groups. We relied on a combination of techniques covered during the semester, specifically clustering to compare factors that potentially influence sentiment in the data. As a second method, we mined frequent itemsets to determine dominant word associations that could inform sentiment, again examined through time and between groups.
A. Sentiment Analysis using polarity as a dimension:
The Natural Language Toolkit (NLTK) and TextBlob libraries provide built-in sentiment analysis. For sentiment, we chose to focus on polarity to assess positive and negative sentiment within Twitter data and news content. Polarity was calculated for the entire corpus of articles and used as a dimension for clustering and regression to determine associations between sentiment and other factors.
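As an illustration, a minimal sketch of polarity scoring with TextBlob might look like the following; the sample documents are hypothetical stand-ins for the actual tweet and headline corpus.

from textblob import TextBlob

# Hypothetical sample texts standing in for tweets and headlines.
documents = [
    "The new policy was a fantastic success.",
    "Officials failed to respond and the damage was severe.",
]

for doc in documents:
    # TextBlob's polarity ranges from -1.0 (negative) to +1.0 (positive).
    polarity = TextBlob(doc).sentiment.polarity
    print(f"{polarity:+.2f}  {doc}")

Each document's score can then be attached as a numeric feature for the clustering and regression steps described above.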
B. Frequent Itemset and Association Rule Analysis:
Frequent itemset and association rule analysis was used to determine frequently co-occurring words in the news story headlines and how they change through time. As we hoped to capture the overall sentiment and tone of media coverage of the given topics as they progressed through time, we mined frequently co-occurring words within the documents. Association rules are derived from frequent itemsets above a given support threshold; we determined the frequent itemsets using the Apriori algorithm, as sketched below.
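A minimal sketch of this pipeline, assuming the mlxtend implementation of Apriori (the source does not name the library) and hypothetical tokenized headlines, might look like this:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Hypothetical tokenized headlines; each headline is one "transaction".
headlines = [
    ["election", "results", "delayed"],
    ["election", "recount", "ordered"],
    ["recount", "results", "confirmed"],
]

# One-hot encode the transactions for mlxtend.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(headlines).transform(headlines),
                      columns=te.columns_)

# Frequent itemsets above a support threshold, then association rules.
itemsets = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence"]])

Running the same mining step on time-sliced subsets of the corpus is what lets the dominant word associations be compared through time and between groups.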
Legal Text Summarization

Most legal processes and arguments rely on precedents and previous interpretations of laws. Because of this, access to recent case documents is very important. Unfortunately, these case documents are often very long, and parsing through them is not easy for an average person. Case summaries are written to aid people, mainly professionals in legal services, in quickly parsing many legal documents by highlighting the essential information. Creating a case summary is a labor-intensive task performed by an often expensive, trained human. A Natural Language Processing (NLP) approach to summarizing legal text can help lawyers and clerks efficiently comb through a large number of documents, ideally lowering the cost, increasing the quality of legal help, and broadening access to the legal system for people in lower income brackets. In this paper, we explore different techniques for summarizing court opinions. We present the best-performing models for both extractive and abstractive summaries. Our LSTM-based classifier for extractive summarization and a domain-adapted transformer model, PEGASUS_CourtOp, for abstractive summarization are the best performers on legal text summarization.
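To give a sense of the abstractive side, here is a minimal sketch of summarizing a court opinion with a PEGASUS model via the Hugging Face transformers API. The public google/pegasus-large checkpoint is a stand-in: the paper's PEGASUS_CourtOp weights are domain-adapted and not assumed to be publicly available.

from transformers import PegasusTokenizer, PegasusForConditionalGeneration

# "google/pegasus-large" is a public stand-in checkpoint; the
# domain-adapted PEGASUS_CourtOp weights are not assumed to be released.
model_name = "google/pegasus-large"
tokenizer = PegasusTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

opinion_text = "..."  # full text of a court opinion goes here

# Truncate long opinions to the model's input limit, then generate
# an abstractive summary with beam search.
inputs = tokenizer(opinion_text, truncation=True, max_length=1024,
                   return_tensors="pt")
summary_ids = model.generate(**inputs, num_beams=4, max_length=256)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))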