Lattes is a unique platform for academic curricula and the largest of its kind. There you can find information about the academic work of all Brazilian scholars, including the institution where the PhD was earned, current employer, field of work, metadata for all publications and more. It is a reliable source of information for bibliometric studies.
I’ve been working with Lattes data for some time. Here I present a short list of papers that have used this data.
The Brazilian scientific output published in journals: A study based on a large CV database
Análise do Perfil dos Acadêmicos e de suas Publicações Científicas em Administração (in Portuguese)
Predatory publications in the Brazilian academic system: an empirical analysis (Working paper)
Package GetLattesData is a wrapper around the functions that I’ve been using for accessing the dataset. Its main innovation is the possibility of downloading data directly from Lattes, without any manual work or captcha solving.
Let’s consider a simple example of downloading information for a group of scholars. I selected a couple of colleagues at my university. Their Lattes ids can be easily found on the Lattes website. After searching for a name, note the internet address of the resulting CV, such as http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4713546D3. The Lattes id is the final 10-character code of this address. In our case, it is 'K4713546D3'.
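As a minimal sketch, the id can also be pulled out of the CV address with base R. The helper name `extract_lattes_id` is hypothetical (it is not part of GetLattesData), and it assumes the address carries the 10-character code as the value of `id=`, as in the example address above. It is written for a single address at a time.

```r
# Hypothetical helper (not part of GetLattesData): extract the Lattes id
# from a single CV address, assuming it is the value that follows 'id='.
extract_lattes_id <- function(url) {
  # match the 10 alphanumeric characters that come right after 'id='
  m <- regmatches(url, regexpr('(?<=id=)[A-Za-z0-9]{10}', url, perl = TRUE))
  # regmatches() returns character(0) when nothing matches
  if (length(m) == 0) NA_character_ else m
}

cv.url <- 'http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4713546D3'
my.id <- extract_lattes_id(cv.url)
print(my.id)
```

An address without an `id=` parameter yields `NA`, which makes a bad input easy to spot before it reaches the download step.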
Since we all work in the business department of UFRGS, the impact of our publications is locally assessed by the Qualis ranking for Management, Accounting and Tourism ('ADMINISTRAÇÃO PÚBLICA E DE EMPRESAS, CIÊNCIAS CONTÁBEIS E TURISMO'). Qualis is the local journal ranking in Brazil. You can read more about Qualis on Wikipedia.
Now, based on the two sets of information, the vector of ids and the Qualis field, we can use GetLattesData to download all up-to-date information about the researchers:
library(GetLattesData)
# ids from EA-UFRGS
my.ids <- c('K4713546D3', 'K4440252H7',
            'K4783858A0', 'K4723925J2')
# qualis for the field of management
field.qualis <- 'ADMINISTRAÇÃO PÚBLICA E DE EMPRESAS, CIÊNCIAS CONTÁBEIS E TURISMO'
l.out <- gld_get_lattes_data(id.vec = my.ids, field.qualis = field.qualis)
##
## Downloading file /tmp/RtmpeJ4HJu/K4713546D3_2017-10-15.zip
## Downloading file /tmp/RtmpeJ4HJu/K4440252H7_2017-10-15.zip
## Downloading file /tmp/RtmpeJ4HJu/K4783858A0_2017-10-15.zip
## Downloading file /tmp/RtmpeJ4HJu/K4723925J2_2017-10-15.zip
## Reading K4713546D3_2017-10-15.zip - Marcelo Scherer Perlin found 18 papers
## Reading K4440252H7_2017-10-15.zip - Marcelo Brutti Righi found 47 papers
## Reading K4783858A0_2017-10-15.zip - João Luiz Becker found 58 papers
## Reading K4723925J2_2017-10-15.zip - Denis Borenstein found 65 papers
The output l.out is a list with three items:
names(l.out)
## [1] "tpesq" "tpublic" "tsupervisions"
The first is a dataframe with information about researchers:
tpesq <- l.out$tpesq
str(tpesq)
## 'data.frame': 4 obs. of 9 variables:
## $ name : chr "Marcelo Scherer Perlin" "Marcelo Brutti Righi" "João Luiz Becker" "Denis Borenstein"
## $ last.update : Date, format: "2017-09-24" "2017-10-09" ...
## $ phd.institution: chr "University of Reading" "Universidade Federal de Santa Maria" "University Of California At Los Angeles" "University of Strathclyde"
## $ phd.start.year : num 2007 2013 1982 1991
## $ phd.end.year : num 2010 2015 1986 1995
## $ country.origin : chr "Brasil" "Brasil" "Brasil" "Brasil"
## $ major.field : chr "CIENCIAS_SOCIAIS_APLICADAS" "CIENCIAS_SOCIAIS_APLICADAS" "CIENCIAS_SOCIAIS_APLICADAS" "ENGENHARIAS"
## $ minor.field : chr "Administração" "Administração" "Administração" "Engenharia de Produção"
## $ id.file : chr "K4713546D3_2017-10-15.zip" "K4440252H7_2017-10-15.zip" "K4783858A0_2017-10-15.zip" "K4723925J2_2017-10-15.zip"
The second dataframe contains information about all publications, including Qualis and SJR:
tpublic <- l.out$tpublic
str(tpublic)
## 'data.frame': 188 obs. of 13 variables:
## $ name : chr "Marcelo Scherer Perlin" "Marcelo Scherer Perlin" "Marcelo Scherer Perlin" "Marcelo Scherer Perlin" ...
## $ article.title: chr "Análise do Perfil dos Acadêmicos e de suas Publicações Científicas em Administração" "The Brazilian scientific output published in journals: A study based on a large CV database" "THE FORECASTING POWER OF INTERNET SEARCH QUERIES IN THE BRAZILIAN FINANCIAL MARKET" "A multistage stochastic programming asset-liability management model: an application to the Brazilian pension fund industry" ...
## $ year : num 2017 2017 2017 2017 2016 ...
## $ language : chr "Português" "Inglês" "Inglês" "Inglês" ...
## $ journal.title: chr "RAC. Revista de Administração Contemporânea (Impresso)" "Journal of Informetrics" "RAM. REVISTA DE ADMINISTRAÇÃO MACKENZIE (ONLINE)" "OPTIMIZATION AND ENGINEERING" ...
## $ ISSN : chr "1415-6555" "1751-1577" "1678-6971" "1389-4420" ...
## $ start.page : num 62 18 184 349 1 353 454 443 188 162 ...
## $ end.page : num 83 31 210 368 20 374 467 478 213 NA ...
## $ order.aut : num 2 1 3 3 1 2 1 1 2 1 ...
## $ n.authors : num 3 5 3 5 1 2 4 2 2 2 ...
## $ qualis : chr "A2" NA "B1" "A2" ...
## $ SJR : num NA 2.029 NA 0.481 NA ...
## $ H.SJR : int NA 50 NA 29 NA NA 45 NA NA NA ...
The third element of the list provides information about all academic supervisions of each researcher:
tsupervisions <- l.out$tsupervisions
str(tsupervisions)
## 'data.frame': 258 obs. of 7 variables:
## $ id : chr "K4713546D3_2017-10-15.zip" "K4713546D3_2017-10-15.zip" "K4713546D3_2017-10-15.zip" "K4713546D3_2017-10-15.zip" ...
## $ name : chr "Marcelo Scherer Perlin" "Marcelo Scherer Perlin" "Marcelo Scherer Perlin" "Marcelo Scherer Perlin" ...
## $ situation : chr "CONCLUIDA" "CONCLUIDA" "CONCLUIDA" "CONCLUIDA" ...
## $ type.course : chr "ACADEMICO" "ACADEMICO" "ACADEMICO" "ACADEMICO" ...
## $ course : chr "Dissertação de mestrado" "Dissertação de mestrado" "Dissertação de mestrado" "Dissertação de mestrado" ...
## $ std.name : chr "Gladys Helena Albarracín Gómez" "Martin Pontuschka" "Henrique Pinto Ramos" "Kadja Mendes" ...
## $ year.supervision: num 2015 2015 2016 2016 2016 ...
Based on GetLattesData and other packages, it is easy to create academic reports for a large number of researchers. The next example plots the number of publications of each researcher, broken down by Qualis ranking.
library(ggplot2)
p <- ggplot(tpublic, aes(x = qualis)) +
  geom_bar(position = 'identity') + facet_wrap(~name) +
  labs(x = paste0('Qualis: ', field.qualis))
print(p)
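The counts behind the bar chart can also be inspected as a plain table. The sketch below uses a small toy data frame, `toy.public`, whose four rows copy the first `name` and `qualis` values printed by `str(tpublic)` above, so it runs without downloading anything. With real data, `tpublic` would take the place of `toy.public`.

```r
# Toy data (assumption): first four name/qualis values shown by str(tpublic)
toy.public <- data.frame(
  name   = rep('Marcelo Scherer Perlin', 4),
  qualis = c('A2', NA, 'B1', 'A2'),
  stringsAsFactors = FALSE
)

# cross-tabulate researcher against Qualis stratum;
# useNA = 'ifany' keeps papers without a Qualis classification as their own column
tab.qualis <- table(toy.public$name, toy.public$qualis, useNA = 'ifany')
print(tab.qualis)
```

Keeping the `NA` column visible is a deliberate choice here, since journals outside the selected Qualis field are part of the picture.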
We can also use dplyr to do a simple assessment of academic productivity:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
my.tab <- tpublic %>%
  group_by(name) %>%
  summarise(n.papers = n(),
            max.SJR = max(SJR, na.rm = TRUE),
            mean.SJR = mean(SJR, na.rm = TRUE),
            n.A1.qualis = sum(qualis == 'A1', na.rm = TRUE),
            n.A2.qualis = sum(qualis == 'A2', na.rm = TRUE),
            median.authorship = median(as.numeric(order.aut), na.rm = TRUE))
knitr::kable(my.tab)
| name | n.papers | max.SJR | mean.SJR | n.A1.qualis | n.A2.qualis | median.authorship |
|---|---|---|---|---|---|---|
| Denis Borenstein | 65 | 3.674 | 1.3193333 | 23 | 15 | 2 |
| João Luiz Becker | 58 | 3.885 | 0.8090000 | 5 | 13 | 2 |
| Marcelo Brutti Righi | 47 | 1.767 | 0.4363103 | 7 | 16 | 1 |
| Marcelo Scherer Perlin | 18 | 2.029 | 0.7755000 | 2 | 3 | 1 |
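One caveat with the summary above: in base R, `max(x, na.rm = TRUE)` returns `-Inf` with a warning when every value of `x` is `NA`, which would happen for a researcher with no journal indexed in SJR. A minimal sketch of a guard follows. The helper name `safe.max` and the second toy vector are assumptions for illustration and are not part of GetLattesData or dplyr.

```r
# Hypothetical helper: maximum that returns NA instead of -Inf
# when there is no non-missing value to compare.
safe.max <- function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

sjr.some <- c(NA, 2.029, NA, 0.481)  # first SJR values printed by str(tpublic)
sjr.none <- c(NA_real_, NA_real_)    # hypothetical researcher with no SJR journal

print(safe.max(sjr.some))
print(safe.max(sjr.none))
```

Inside `summarise()`, `max.SJR = safe.max(SJR)` could then replace the plain `max()` call.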