A Citation Count Prediction Model for STEM Publishing Domains

Goals

I address the task of citation count prediction using both established and new features. Across multiple domains, I find differences both in how well citation counts can be predicted and in which features contribute to the prediction. For instance, the tendency of famous authors to attract more citations is more apparent in Biology and Medicine than in other domains. Additionally, while the popularity of a paper's references predicts the paper's success in most domains, this is clearly not the case in Engineering and Physics. The following is a model that can be used to predict citations 5 years in the future (using data from 2005 …

Table 1. Domain-specific Statistics
Domain            Affiliations  Papers (2005)  Papers (2015)  Authors per paper (2005)  Authors per paper (2015)
Computer Science  4,851         59,116         110,506        2.43                      2.75
Biology           2,082         59,395         93,792         3.58                      4.04
Chemistry         811           26,496         50,381         3.56                      3.99
Medicine          5,524         125,113        214,854        3.52                      3.67
Engineering       2,589         43,440         77,664         3.20                      3.53
Mathematics       581           11,057         17,317         1.75                      1.90
Physics           688           25,393         42,955         4.41                      5.05

Methods & Techniques

Feature Engineering - I consider four groups of features: Authors, Institutions, Affiliations, and References Network. The first three (Authors, Institutions and Affiliations, which together form group 1) describe the reputation of the paper's venue, of its authors and of its authors' institutions. I start by calculating the following features for each venue, author and institution in the dataset: the sum of the citation counts of the papers published by the entity, the mean number of citations per paper, and the maximum citation count, i.e. the citation count of the entity's most cited work. I also calculate the h-index and g-index of these entities. The h-index is defined as the largest h such that at least h papers by the entity received at least h citations each. The g-index is defined as the largest g such that the top g papers by the entity together received at least g² citations. Both indices are easy to calculate with the tools available in the Scopus database. For each paper I aggregate the features of the entities (authors, institutions and
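To make the per-entity features concrete, here is a minimal Python sketch. It assumes that each venue, author or institution is represented simply as a list of the citation counts of its papers. The function names (`h_index`, `g_index`, `entity_features`) are hypothetical and used only for illustration. The essay itself relies on Scopus, not code, for the h-index and g-index, and the per-paper aggregation step is left out here.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            # Counts only decrease while ranks increase, so no later rank can qualify.
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g**2 citations.

    With this definition g cannot exceed the number of papers the entity has.
    """
    ranked = sorted(citations, reverse=True)
    g = 0
    running_total = 0
    for rank, count in enumerate(ranked, start=1):
        running_total += count
        if running_total >= rank * rank:
            g = rank
    return g


def entity_features(citations):
    """Sum, mean, max, h-index and g-index for one entity's list of citation counts."""
    if not citations:
        return {"sum": 0, "mean": 0.0, "max": 0, "h_index": 0, "g_index": 0}
    total = sum(citations)
    return {
        "sum": total,
        "mean": total / len(citations),
        "max": max(citations),
        "h_index": h_index(citations),
        "g_index": g_index(citations),
    }


if __name__ == "__main__":
    # A hypothetical author with five papers.
    print(entity_features([10, 8, 5, 4, 3]))
```

For the hypothetical author above, `entity_features([10, 8, 5, 4, 3])` yields a sum of 30, a mean of 6.0, a maximum of 10, an h-index of 4 (four papers have at least 4 citations, but the fifth has fewer than 5) and a g-index of 5 (the top five papers have 30 ≥ 25 citations). The g-index can exceed the h-index because it rewards a few highly cited papers.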
