By Sanjiv Ranjan Das, Santa Clara University, Leavey School of Business, USA, srdas@scu.edu
This monograph surveys the technology and empirics of text analytics in finance. I present various tools of information extraction and basic text analytics. I survey a range of techniques of classification and predictive analytics, and metrics used to assess the performance of text analytics algorithms. I then review the literature on text mining and predictive analytics in finance, and its connection to networks, covering a wide range of text sources such as blogs, news, web posts, and corporate filings. I end with forecasts and predictions about future directions.
Text and Context: Language Analytics in Finance describes the current landscape of text analytics in finance. After a brief introduction, Section 2 examines how text is extracted from various web sites and services. Section 3 deals with the basics of text analytics, such as dictionaries, lexicons, mood scoring, and summarization of text. Section 4 follows with the analytics of text classification. Section 5 assesses the performance of text analytics algorithms using a range of metrics. Section 6 surveys the empirical literature on text mining in finance and discusses the commercialization of text analytics. Finally, Section 7 looks at the future of text analytics.