Tuesday, 26 January 2016

Using Gephi to determine trending subjects in libraries

Using Gephi, I created a network of the subjects that were exposed to users of the Peace Palace Library OPAC during December 2015. In this network I can observe clusters of related subjects. At the top level I see one huge cluster with the subject “Human rights” more or less at its center. Zooming in on this cluster lets me weed out the subjects that are not strongly related to “Human rights”. After several rounds of zooming in, one smaller cluster remains with “Human rights” at its center, and Gephi no longer distinguishes any further subsections within it. The table below therefore contains a set of more strongly related subjects, centered on the main subject “Human rights”, for December 2015.



Human rights

Freedom of expression
Civil society
Human dignity
World Bank Group
Regional instruments
Universal Declaration of Human Rights (New York, 10 December 1948)
Committee on the Elimination of Racial Discrimination
Human rights commissions
International Convention on the Elimination of All Forms of Racial Discrimination (New York, 7 March 1966)
Office of the United Nations High Commissioner for Human Rights

International instruments
Obligations of the state
Civil and political rights
International law and domestic law
Human Rights Committee
Legal remedies
International Covenant on Civil and Political Rights (New York, 16 December 1966)
Freedom of information
Hate speech
United Nations Human Rights Council
Treaty bodies
Freedom to provide education

How was this set of interrelated subjects used in the other months of 2015? Is it possible to determine a trend over the whole year? To investigate this, I collected a lot of subject data. For every month of 2015 I collected all subjects exposed to the users of our OPAC and then let Gephi calculate several values for each subject in these monthly networks. I stored all these values in a database, but in what follows I am only interested in the so-called betweenness centrality value. This value stands for, and I quote: “In brief, betweenness centrality is an indicator value for a key position. The higher the value the more important the role of the keyword. This value is calculated by counting the shortest paths between two keywords in our network. The keyword which appears the most times as being in between two different keywords, has the highest betweenness centrality value; these keywords are brokers or intermediaries.” [http://pushaqa.blogspot.nl/2014/11/just-below-surface.html]
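To make the quoted definition a bit more concrete, here is a minimal Python sketch that counts shortest paths in a tiny, made-up subject network. It is not Gephi's own code and not my actual data: the function name `betweenness_centrality`, the toy network `toy_network` and its four links are assumptions chosen purely for illustration, and the values are left unnormalised.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalised betweenness for an undirected, unweighted network.

    `adjacency` maps each subject to the list of subjects it is linked to.
    """
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search from `source`, counting shortest paths.
        visit_order = []
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            current = queue.popleft()
            visit_order.append(current)
            for neighbour in adjacency[current]:
                if distance[neighbour] < 0:  # first time we reach it
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[current] + 1:
                    path_count[neighbour] += path_count[current]
                    predecessors[neighbour].append(current)
        # Walk back from the farthest subjects and credit the
        # subjects that lie in between.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(visit_order):
            for pred in predecessors[node]:
                share = path_count[pred] / path_count[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                centrality[node] += dependency[node]
    # Every pair of subjects was counted from both ends, so halve the totals.
    return {node: value / 2 for node, value in centrality.items()}


# Hypothetical toy network, NOT the real December 2015 data.
toy_network = {
    "Human rights": ["Freedom of expression", "Civil society", "Hate speech"],
    "Freedom of expression": ["Human rights", "Hate speech"],
    "Civil society": ["Human rights"],
    "Hate speech": ["Human rights", "Freedom of expression"],
}
scores = betweenness_centrality(toy_network)
```

In this toy network only “Human rights” sits between other subjects (between “Civil society” and each of the other two), so it is the only subject with a non-zero score. That is the broker or intermediary role the quote describes.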

So it is possible to measure the popularity of certain subject areas by using the 'weight' of these subject areas in the bigger picture of the monthly subject networks. This means I used the values calculated in relation to the complete sets of subjects, not just the values of the subset. I then calculated the trendline of these weights. The slope of the trendline indicates an increase (positive slope) or a decrease (negative slope) in popularity. In the table of subjects above, I also listed the slope for each subject. A division in popularity shows up: there are subjects with decreasing popularity, like “Human rights” itself or “Freedom of expression”, while increasing popularity can be observed for “International instruments” and “Obligations of the state”. Taking the complete set of subjects in the table into consideration, the average slope is -0.027, so there is a very slight decrease of interest in “Human rights” and related subjects.
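The slope calculation can be sketched in a few lines of Python. This assumes an ordinary least-squares straight line through the twelve monthly values, with the months numbered 0 to 11; the fitting method is my assumption here, and the function names `trend_slope` and `average_slope` as well as the monthly numbers below are invented for illustration, not taken from my database.

```python
def trend_slope(monthly_values):
    """Least-squares slope of a straight line through evenly spaced values."""
    n = len(monthly_values)
    if n < 2:
        raise ValueError("need at least two months to fit a trendline")
    mean_x = (n - 1) / 2  # months are numbered 0, 1, ..., n - 1
    mean_y = sum(monthly_values) / n
    covariance = sum(
        (x - mean_x) * (y - mean_y) for x, y in enumerate(monthly_values)
    )
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


def average_slope(values_per_subject):
    """Mean of the trendline slopes over a set of subjects."""
    slopes = [trend_slope(values) for values in values_per_subject.values()]
    return sum(slopes) / len(slopes)


# Hypothetical monthly weights for two subjects (January to December).
example_weights = {
    "Subject A": [9, 8, 8, 7, 7, 6, 6, 5, 5, 4, 4, 3],  # losing popularity
    "Subject B": [2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7],  # gaining popularity
}
slope_a = trend_slope(example_weights["Subject A"])
slope_b = trend_slope(example_weights["Subject B"])
overall = average_slope(example_weights)
```

A negative `slope_a` and a positive `slope_b` correspond to the decreasing and increasing subjects described above, and `overall` plays the role of the average over the whole set.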

There is an increase in interest if I do the same exercise with "Law of the sea": the average slope is 0.142.

It is my belief that knowing how interest in a particular subject develops can help libraries create better services for their users.

