Information extraction and visualization from Twitter considering spatial structure
Social media is expected to be a good source of data for analyzing human behavior and statuses of locations. It is possible to provide location-based information simply by geospatially filtering archived data.
However, this naive approach causes problems in practical applications. On Twitter, for example, the coordinates attached to a geotagged tweet are generally those of the place from which the tweet was posted, not of the place it is about. For instance, the coordinates attached to the geotagged tweet “Heavy rain in Miura Peninsula” by NHK (Japan’s public broadcaster) are not those of the Miura Peninsula but of Shibuya in Tokyo, where NHK is located. Therefore, the tweet cannot be found by a spatial search around the Miura Peninsula, or even across Kanagawa Prefecture, where the Miura Peninsula is located.
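The failure of naive geospatial filtering can be illustrated with a minimal sketch. It assumes a simple radius search based on great-circle distance; the coordinates are rough, illustrative values for NHK in Shibuya and for the middle of the Miura Peninsula, and the names `haversine_km`, `spatial_search`, and `MIURA_CENTER` are hypothetical, not part of any Twitter API.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# The geotag records where the tweet was posted (approximate, assumed
# coordinates for Shibuya), not the place the text talks about.
tweets = [
    {"text": "Heavy rain in Miura Peninsula", "lat": 35.665, "lon": 139.696},
]

# Approximate, assumed centre of the Miura Peninsula.
MIURA_CENTER = (35.25, 139.65)


def spatial_search(tweets, center, radius_km):
    """Naive filter: keep tweets whose geotag lies within radius_km of center."""
    return [
        t for t in tweets
        if haversine_km(center[0], center[1], t["lat"], t["lon"]) <= radius_km
    ]


# A 20 km search around the peninsula misses the tweet about the peninsula.
hits = spatial_search(tweets, MIURA_CENTER, 20.0)
print(hits)
```

Under these assumed coordinates the geotag lies well outside the search radius, so the filter returns nothing even though the tweet is about the peninsula.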
Hideyuki Fujita at the University of Electro-Communications, Tokyo, has proposed a new framework that considers the relationship between the meaning of data and its spatial structure.
In this research, Fujita focused in particular on the distinction between locations of interest (LoI) and locations of activity (LoA). In the example above, the Miura Peninsula is the LoI and Shibuya is the LoA. Fujita proposed a method for automatically classifying tweet locations into LoI and LoA.
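The LoI/LoA distinction can be made concrete with a small data-model sketch. This is not Fujita's classifier, whose details are not given here. It only assumes, following the NHK example, that the geotag is treated as the LoA and that place names found in the text are treated as LoI; the `GAZETTEER` set, the `TweetLocations` class, and the substring matching are hypothetical simplifications.

```python
from dataclasses import dataclass, field

# Hypothetical, tiny gazetteer of known place names (an assumption for
# illustration; a real system would need a proper place-name resource).
GAZETTEER = {"Miura Peninsula", "Shibuya", "Kanagawa"}


@dataclass
class TweetLocations:
    loa: tuple                                 # location of activity: the geotag
    loi: list = field(default_factory=list)    # locations of interest: names in the text


def extract_locations(text, geotag):
    """Separate where a tweet was posted (LoA) from what it mentions (LoI)."""
    lowered = text.lower()
    mentioned = sorted(name for name in GAZETTEER if name.lower() in lowered)
    return TweetLocations(loa=geotag, loi=mentioned)


# Geotag coordinates are approximate, assumed values for Shibuya.
locs = extract_locations("Heavy rain in Miura Peninsula", (35.665, 139.696))
print(locs.loa, locs.loi)
```

Keeping the two kinds of location in separate fields lets a search for the Miura Peninsula match on the LoI, while the LoA still records that the tweet was posted from Shibuya.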
An evaluation experiment using 600,000 tweets showed good precision and recall for the classification. The method was also successfully applied to extracting frequently mentioned locations and classifying them into those mentioned globally and those mentioned locally.
The results imply that this method could be applied to analyzing the relationships between location names and the locations they signify.