The Text Mining Handbook. Advanced Approaches in Analyzing Unstructured Data

Reading for this week: The Text Mining Handbook. Advanced Approaches in Analyzing Unstructured Data (Feldman & Sanger, 2006)

 

Description

Text mining tries to solve the crisis of information overload by combining techniques from data mining, machine learning, natural language processing, information retrieval, and knowledge management. In addition to providing an in-depth examination of core text mining and link detection algorithms and operations, this book examines advanced pre-processing techniques, knowledge representation considerations, and visualization approaches. Finally, it explores current real-world, mission-critical applications of text mining and link detection in such varied fields as M&A business intelligence, genomics research and counter-terrorism activities.

Sentiment Analysis. Mining Opinions, Sentiments, and Emotions

Recommended book of the week: Sentiment Analysis: Mining Opinions, Sentiments, and Emotions (B. Liu, 2015)

 

Description

Sentiment analysis is the computational study of people’s opinions, sentiments, emotions, and attitudes. This fascinating problem is increasingly important in business and society. It offers numerous research challenges but promises insight useful to anyone interested in opinion analysis and social media analysis. This book gives a comprehensive introduction to the topic from a primarily natural-language-processing point of view to help readers understand the underlying structure of the problem and the language constructs that are commonly used to express opinions and sentiments. It covers all core areas of sentiment analysis, includes many emerging themes, such as debate analysis, intention mining, and fake-opinion detection, and presents computational methods to analyze and summarize opinions. It will be a valuable resource for researchers and practitioners in natural language processing, computer science, management sciences, and the social sciences.

 

XIII Congreso Internacional en Innovación Tecnológica Informática (13th International Congress on Information Technology Innovation)

CIITI 2015

What is CIITI?

CIITI creates a space for open, participatory, and inclusive reflection on the impact of information technology across the different fields of science. It presents innovations and new knowledge to society, and it serves as a forum for disseminating, promoting, and reflecting on the importance of information technology innovation as a driver of competitiveness.

Objectives

This congress fosters the virtuous circle of collaboration among government, companies, universities, and national and international research and development centers. From this interdisciplinary perspective, it addresses the processes through which new technologies bring about social change, and it renews, year after year, the importance of thinking about the strategic development of new technologies as a fundamental pillar of the country's equitable and sustainable growth.

Buenos Aires Chapter
September 30, 2015
Palais Rouge
Jerónimo Salguero 1443/49 – Ciudad Autónoma de Buenos Aires

View the CIITI 2015 agenda, Buenos Aires

 

Social Network Mining, Analysis and Research Trends: Techniques and Applications

Book recommendation of the week: “Social Network Mining, Analysis and Research Trends: Techniques and Applications” by I-Hsien Ting (First Edition)

Social network analysis dates back to the early 20th century, with initial studies focusing on small group behavior from a sociological perspective. The emergence of the Internet and subsequent increase in the use of online social networking applications has caused a shift in the approach to this field. Faced with complex, large datasets, researchers need new methods and tools for collecting, processing, and mining social network data.

Social Network Mining, Analysis and Research Trends: Techniques and Applications covers current research trends in the area of social networks analysis and mining. Containing research from experts in the social network analysis and mining communities, as well as practitioners from social science, business, and computer science, this book proposes new measures, methods, and techniques in social networks analysis and also presents applications and case studies in this changing field.

 

 

Network Analysis with igraph

We can use igraph, an open source package, to analyze network data in R. Let’s look at the code and at how we can visualize this data.

# Tutorial and data based on the network visualization workshop published at https://rpubs.com/kateto/netviz
# Packages:
library(igraph)
library(RCurl)
# Get the data:
nodes <- read.csv("https://raw.githubusercontent.com/danielmarcelino/Tables/master/Media-NODES.csv", header=TRUE, as.is=TRUE)
ties <- read.csv("https://raw.githubusercontent.com/danielmarcelino/Tables/master/Media-EDGES.csv", header=TRUE, as.is=TRUE)
# Explore the data:
head(nodes); head(ties)
nrow(nodes); length(unique(nodes$id))
nrow(ties); nrow(unique(ties[, c("from", "to")]))
# Because the data is a detailed edge list and not a matrix, we need to simplify it by collapsing multiple edges of the same type between the same two nodes. This can be achieved by summing their "weights", using aggregate() by "from", "to", and "type":
ties <- aggregate(ties[,3], ties[,-3], sum)
ties <- ties[order(ties$from, ties$to),]
colnames(ties)[4] <- "weight"
rownames(ties) <- NULL
# After the hard work is done, we can convert the data to an igraph object
# (graph_from_data_frame() is the current name of graph.data.frame()):
net <- graph_from_data_frame(d=ties, vertices=nodes, directed=TRUE)
# One final touch is to remove loops from the graph, so the edges won't appear that bushy:
net <- simplify(net, remove.multiple=FALSE, remove.loops=TRUE)
# A clean plot: reduce the arrow size and remove the labels:
plot(net, edge.arrow.size=.4, vertex.label=NA)
# There are plenty of parameters that can be set, but the most important are the node and edge options.
# 1) Plot with curved edges (edge.curved=.1) and reduce arrow size:
plot(net, edge.arrow.size=.4, vertex.label=NA, edge.curved=.1)
# 2) Node colors. Here we set the color to orange and the border color to hex #555555:
plot(net, edge.arrow.size=.4, edge.curved=0,
     vertex.color="orange", vertex.frame.color="#555555")
# 3) Replace the vertex label with the node names stored in "media":
plot(net, edge.arrow.size=.4, edge.curved=0,
     vertex.color="orange", vertex.frame.color="#555555",
     vertex.label=V(net)$media, vertex.label.color="black",
     vertex.label.cex=.7)
# Another way to set attributes is to add them to the igraph object.
# 1) Generate colors based on media type:
colrs <- c("gray50", "tomato", "gold")
V(net)$color <- colrs[V(net)$media.type]
# 2) Compute node degree (number of ties) and use it to set node size:
deg <- degree(net, mode="all")
V(net)$size <- deg*3
# Alternatively, size nodes by audience size (this overrides the line above):
V(net)$size <- V(net)$audience.size*0.6
# 3) The labels are currently node IDs; setting them to NA will render no labels:
V(net)$label.color <- "black"
V(net)$label <- NA
# 4) We can set edge width based on weight:
E(net)$width <- E(net)$weight/6
# 5) Change arrow size and edge color (as an edge attribute the name is "color", without the "edge." prefix):
E(net)$arrow.size <- .2
E(net)$color <- "gray80"
# We can also override the attributes explicitly inline:
plot(net, edge.color="orange", vertex.color="gray50")
# Don't you think a legend explaining the meaning of the colors is a good idea?
plot(net)
legend(x=-1.1, y=-1.1, c("Newspaper", "Television", "Online News"), pch=21,
       col="#777777", pt.bg=colrs, pt.cex=2.5, bty="n", ncol=1)
# For now, a final touch is to highlight areas in the network:
plot(net, mark.groups=list(c(1,4,5,8), c(15:17)),
     mark.col=c("#C5E5E7","#ECD89A"), mark.border=NA)
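
The edge-collapsing step in the tutorial code (summing weights with aggregate() by "from", "to", and "type") can be tried offline on a small example. The sketch below uses only base R and a made-up toy edge list; the node IDs and weights are hypothetical and are not taken from the Media-EDGES.csv file, and the name collapsed is introduced here only for illustration. It assumes the same column order as the tutorial data: from, to, weight, type.

```r
# Toy edge list (made-up values) with the column order the tutorial
# assumes: from, to, weight, type.
ties <- data.frame(
  from   = c("s01", "s01", "s02", "s01"),
  to     = c("s02", "s02", "s03", "s02"),
  weight = c(10, 5, 2, 3),
  type   = c("hyperlink", "hyperlink", "mention", "mention"),
  stringsAsFactors = FALSE
)

# Sum the weight column (column 3) within each from/to/type combination.
collapsed <- aggregate(ties[, 3], ties[, -3], sum)

# aggregate() names the summed column "x"; rename it and tidy the row order.
colnames(collapsed)[4] <- "weight"
collapsed <- collapsed[order(collapsed$from, collapsed$to), ]
rownames(collapsed) <- NULL

print(collapsed)
```

The two hyperlink edges from s01 to s02 are merged into a single row, while the mention edge between the same two nodes stays separate because "type" is part of the grouping key.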

Diploma Program in “Data Analysis for Business, Finance, and Market Research”

Duration:
81 hours

Days and times:
May 23 to December 19, 2015.
Saturdays from 10:00 to 13:00.

Location:
Sede Centro – Av. San Juan 951.

Contents:
Throughout the program, the most widely used data analysis algorithms will be presented, with a focus on their practical applications and a substantial overview of their theoretical foundations.
Participants will explore various databases related to problems from different businesses and will build powerful predictive and descriptive models aimed specifically at solving business problems.

Among the main problems to be analyzed:
• Advanced customer segmentation.
• Demand forecasting.
• Forecasting models for time series and financial series.
• Scoring models.
• Risk analysis.
• Fraud detection and prevention.

Among the most important algorithms to be studied:
• Neural networks.
• Decision trees.
• Bayesian inference.
• K-Means clustering algorithm.
• Information theory.

Working method:
• Explore databases and Excel spreadsheets with specialized data mining software.
• Generate predictive and descriptive models applied to solving business problems.
• Test those models to determine their accuracy.
• Solve real problems that frequently arise in the business sector, using the knowledge acquired.

Instructors:
Dra. Cristina Camós
Juan Pablo Braña
Trad. Prof. Alejandra Litterio
Ing. Alexis Sarghel

For more information, see: Diploma Program in Data Analysis

Data Aggregation for Twitter Sentiment Analysis

Data aggregation offers a variety of tools and definitions that are useful for formulating collective sentiment. The field of computational social choice, which studies the computational properties of collective choice in artificial intelligence and multi-agent systems, is highly relevant here.

Sentiment analysis has recently been gaining widespread attention in industry and the media, and several web tools, commercial products, and applications are being developed in this field. Some of this analysis is based on specific events, such as the death of a popular public figure, while other analysis examines how tweets relate to socio-economic trends such as political opinion and stock market fluctuations.

The results obtained through the analysis of collective mood aggregators are convincing and show that accurate sentiment can be extracted from online posts. Performing sentiment analysis online reduces the cost, effort, and time needed to conduct public surveys and questionnaires. This data is of great importance to social scientists and psychologists.

Key words: data aggregation, sentiment analysis, artificial intelligence, machine learning, microblogging, Twitter.

Read more about Data aggregation for Twitter Sentiment Analysis

 

Argentine Symposium on Artificial Intelligence (ASAI 2015)

ASAI – Simposio Argentino de Inteligencia Artificial

Rosario, Argentina
31st August – 1st September, 2015

IMPORTANT DATES

April 13th, 2015 : Deadline for submissions
June 15th, 2015 : Notification of Acceptance/Rejection
June 29th, 2015 : Deadline for camera ready submissions
31st August – 1st September 2015: ASAI

TOPICS OF INTEREST

Topics of interest include, but are not limited to:
  • Artificial Intelligence in Data Analysis
  • Clustering
  • Computer Vision
  • Data Mining
  • Decision-Support Systems
  • Evolutionary Algorithms
  • Formal and Empirical Aspects of Artificial Intelligence
  • Fuzzy Logic
  • Human-Computer Interaction
  • Intelligent Agents and Multi-Agent Systems
  • Knowledge Acquisition, Representation, Management and Reasoning
  • Machine Learning
  • Natural Language Processing and Computational Linguistics
  • Neural Networks
  • Pattern Recognition
  • Personalization and Recommender Systems
  • Planning and Scheduling
  • Robotics
  • Innovative Applications of Artificial Intelligence: Big-data, Bioinformatics and Computational Biology, Education, Social Networks, Virtual Reality, etc.

Sentiment Analysis Symposium, July 15-16, NY: Agenda

The agenda for the 2015 Sentiment Analysis Symposium, July 15-16 in New York

New this year: A half-day segment (within the Workshop track) on Sentiment Analysis for Financial Markets, adding to the symposium’s coverage of consumer, public, and social sentiment for market research, customer experience, healthcare, government, and other application areas.

Who’ll be presenting?

* Industry analysts Dave Schubmehl from IDC and Anjali Lai from Forrester.
* Agency folks Francesco D’Orazio from Face, Brook Mille from MotiveQuest, and Karla Wachter from Waggener Edstrom.
* Technical innovators including Bethany Bengtson from Bottlenose, Moritz Sudhof from Kanjoya, Karo Moilanen from TheySay, and CrowdFlower CEO Lukas Biewald.
* Speakers on sentiment analysis in healthcare from the Advisory Board Company, Westat, and DSE Analytics.
* A set of forward-looking talks on wearables, speech analytics, emotional response via facial recognition, and virtual assistants.

For the longer-form workshops, running in parallel with the presentations, we’ll have:

* Prof. Bing Liu, presenting a half-day Sentiment Analysis tutorial.
* Dr. Robert Dale on Natural Language Generation.
* Sue Feldman, Synthexis, on Cognitive Computing.
* Research leader Steve Rappaport on Metrics and Measurement.
* … and more …