
Predictive analytics
Predictive analytics encompasses a variety of statistical techniques from modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events.[1][2] In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Models capture relationships among many factors to allow assessment of the risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions.[3] Predictive analytics is used in actuarial science,[4] marketing,[5] financial services,[6] insurance, telecommunications,[7] retail,[8] travel,[9] healthcare,[10] pharmaceuticals[11] and other fields. One of the best-known applications is credit scoring,[1] which is used throughout financial services.
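As a rough, hypothetical illustration of that workflow, the sketch below fits a simple logistic-regression scorer (one common choice for credit-style models, not necessarily the one meant here) on made-up historical data and then scores a new candidate transaction. The features, data, and use of scikit-learn are all assumptions for the example.

```python
# Hypothetical illustration of a credit-scoring style predictive model.
# The features and data are synthetic; only the general workflow
# (fit on historical outcomes, score new candidates) follows the text.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic "historical and transactional" data with three invented features.
X_hist = rng.normal(size=(500, 3))
y_hist = (X_hist @ np.array([1.5, -2.0, -1.0]) + rng.normal(size=500) > 0).astype(int)

model = LogisticRegression(max_iter=1000).fit(X_hist, y_hist)

# Score a candidate transaction: estimated probability of a good outcome.
candidate = np.array([[0.2, -0.5, 0.0]])
print("estimated probability:", model.predict_proba(candidate)[0, 1])
```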

Discounted cumulative gain Overview: Two assumptions are made in using DCG and its related measures: highly relevant documents are more useful when appearing earlier in a search engine result list (have higher ranks), and highly relevant documents are more useful than marginally relevant documents, which are in turn more useful than irrelevant documents. DCG originates from an earlier, more primitive measure called Cumulative Gain. Cumulative Gain: Cumulative Gain (CG) is the predecessor of DCG and does not include the position of a result in the consideration of the usefulness of a result set. It is defined as $CG_p = \sum_{i=1}^{p} rel_i$, where $rel_i$ is the graded relevance of the result at position $i$. The value computed with the CG function is unaffected by changes in the ordering of search results: moving a highly relevant document above a higher-ranked, less relevant document does not change the computed value for CG. Discounted Cumulative Gain: DCG is defined as[2] $DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)}$. An alternative formulation of DCG[4] places stronger emphasis on retrieving relevant documents: $DCG_p = \sum_{i=1}^{p} \frac{2^{rel_i}-1}{\log_2(i+1)}$. Normalized DCG: nDCG is obtained by dividing $DCG_p$ by the ideal $DCG_p$ of the same result list, so that a perfect ranking scores 1.
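The short sketch below computes CG, DCG, and nDCG for a single ranked list using the formulas above; the example relevance grades are invented.

```python
# A minimal sketch of CG, DCG, and nDCG for one ranked result list,
# following the formulas above (log2 discount; graded relevance scores).
import math

def dcg(relevances, alternative=False):
    """Discounted cumulative gain of a ranked list of graded relevances."""
    total = 0.0
    for i, rel in enumerate(relevances, start=1):
        gain = (2 ** rel - 1) if alternative else rel
        total += gain / math.log2(i + 1)
    return total

def ndcg(relevances):
    """DCG normalized by the ideal DCG (relevances sorted best-first)."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

ranked = [3, 2, 3, 0, 1, 2]          # graded relevance of results at ranks 1..6
print(sum(ranked))                    # CG: ignores ordering
print(round(dcg(ranked), 3))          # DCG, standard formulation
print(round(ndcg(ranked), 3))         # nDCG in [0, 1]
```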

Index (search engine) Popular engines focus on the full-text indexing of online, natural language documents.[1] Media types such as video and audio[2] and graphics[3] are also searchable. Meta search engines reuse the indices of other services and do not store a local index, whereas cache-based search engines permanently store the index along with the corpus. Unlike full-text indices, partial-text services restrict the depth indexed to reduce index size. Larger services typically perform indexing at a predetermined time interval due to the required time and processing costs, while agent-based search engines index in real time. Indexing: The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Index design factors: Major factors in designing a search engine's architecture include merge factors, storage techniques (how to store the index data, that is, whether information should be compressed or filtered), index size, lookup speed, and maintenance.
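As a minimal sketch of the data structure at the heart of full-text indexing, the toy code below builds an inverted index (term to document IDs) and answers a simple AND query; real engines add the storage, compression, and merge machinery the text mentions. The corpus and function names are made up for illustration.

```python
# Toy inverted index: maps each term to the set of documents containing it.
from collections import defaultdict

def build_index(corpus):
    """corpus: dict of doc_id -> text. Returns term -> set of doc_ids."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return doc_ids containing every query term (simple AND query)."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = index.get(terms[0], set()).copy()
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

docs = {1: "full text indexing of natural language documents",
        2: "video and audio media types are also searchable"}
idx = build_index(docs)
print(search(idx, "natural language"))   # {1}
```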

Neural Network Applications An Artificial Neural Network is a network of many very simple processors ("units"), each possibly having a (small amount of) local memory. The units are connected by unidirectional communication channels ("connections"), which carry numeric (as opposed to symbolic) data. The units operate only on their local data and on the inputs they receive via the connections. The design motivation is what distinguishes neural networks from other mathematical techniques: a neural network is a processing device, either an algorithm or actual hardware, whose design was motivated by the design and functioning of human brains and components thereof. There are many different types of neural networks, each of which has different strengths suited to particular applications. 2.0 Applications There are abundant materials, tutorials, references and disparate lists of demos on the net. The applications featured here are: PS: For those who are only interested in source code for Neural Networks
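A minimal sketch of the "units and connections" picture described above: each unit forms a weighted sum of the numeric inputs arriving on its incoming connections and applies an activation function. The weights and layer sizes here are arbitrary, chosen only for illustration.

```python
# Minimal two-layer feedforward computation: units combine numeric inputs
# from their connections (a weighted sum) and apply a simple activation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(inputs, weights, biases):
    """One layer of units: weighted sum of inputs, then activation."""
    return sigmoid(inputs @ weights + biases)

x = np.array([0.5, -1.2, 3.0])                       # numeric data entering the network
w_hidden = np.random.default_rng(0).normal(size=(3, 4))
b_hidden = np.zeros(4)
w_out = np.random.default_rng(1).normal(size=(4, 1))
b_out = np.zeros(1)

hidden = layer(x, w_hidden, b_hidden)                # 4 hidden units
output = layer(hidden, w_out, b_out)                 # 1 output unit
print(output)
```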

Data analysis Analysis of data is a process of inspecting, cleaning, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains. Data mining is a particular data analysis technique that focuses on modeling and knowledge discovery for predictive rather than purely descriptive purposes. Data integration is a precursor to data analysis, and data analysis is closely linked to data visualization and data dissemination. The process of data analysis: Data analysis is a process within which several phases can be distinguished.[1] Processing of data refers to concentrating, recasting, and dealing with data in such a way that they become as amenable to analysis as possible.
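A small, hypothetical example of the inspect/clean/transform phases described above, using pandas; the column names, cleaning rules, and derived variable are assumptions made for the sketch.

```python
# Hypothetical inspect -> clean -> transform steps on a tiny made-up table.
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "age":   [34, None, 29, 120, 41],
    "spend": ["100", "250", None, "80", "60"],
})

# Inspect: data types and missing values.
print(raw.dtypes)
print(raw.isna().sum())

# Clean: coerce types, drop an impossible age, fill missing values.
clean = raw.copy()
clean["spend"] = pd.to_numeric(clean["spend"])
clean = clean[clean["age"].fillna(0) <= 100].copy()
clean["spend"] = clean["spend"].fillna(clean["spend"].median())

# Transform: a derived variable that is more amenable to analysis.
clean["log_spend"] = np.log(clean["spend"])
print(clean)
```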

Principal component analysis (Figure caption: PCA of a multivariate Gaussian distribution centered at (1,3) with a standard deviation of 3 in roughly the (0.878, 0.478) direction and of 1 in the orthogonal direction. The vectors shown are the eigenvectors of the covariance matrix scaled by the square root of the corresponding eigenvalue, and shifted so their tails are at the mean.) Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. PCA is closely related to factor analysis. PCA is also related to canonical correlation analysis (CCA). Details: Mathematically, the transformation is defined by a set of p-dimensional vectors of weights or loadings $w_{(k)} = (w_1, \dots, w_p)_{(k)}$ that map each row vector $x_{(i)}$ of X to a new vector of principal component scores $t_{(i)} = (t_1, \dots, t_m)_{(i)}$, given by $t_{k(i)} = x_{(i)} \cdot w_{(k)}$.
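A hedged sketch of the computation described above: center the data, take the eigenvectors of the covariance matrix as the loading vectors w_(k), and form the scores t = x · w_(k). In practice one would usually rely on an SVD or a library routine; the synthetic Gaussian data here only loosely mirror the figure caption.

```python
# PCA via the eigendecomposition of the covariance matrix.
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[1, 3], cov=[[8, 3], [3, 2]], size=200)

# Center the data, then take the covariance matrix's eigendecomposition.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order

# Order components by decreasing variance; columns of W are the loadings w_(k).
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order]

# Principal component scores: each centered row mapped to t = x . w_(k).
T = Xc @ W
print("variance explained:", eigvals[order] / eigvals.sum())
```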

Enterprises using social media intelligence to obtain business insights and act on them: Technology forecast: PwC Natural language processing and social media intelligence Mining insights from social media data requires more than sorting and counting words. By Alan Morrison and Steve Hamby Introduction: Most enterprises are more than eager to further develop their capabilities in social media intelligence (SMI)—the ability to mine the public social media cloud to glean business insights and act on them. “Ideally, social media can function as a really big focus group,” says Jeff Auker, a director in PwC’s Customer Impact practice. Auker cites the example of a media company’s use of SocialRep,[2] a tool that uses a mix of natural language processing (NLP) techniques to scan social media. This article explores the primary characteristics of NLP, which is the key to SMI, and how NLP is applied to social media analytics. Natural language processing: Its components and social media applications NLP technologies for SMI are just emerging. Types of NLP The primary NLP techniques include these:
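As a toy illustration of one of the simpler NLP techniques applied to social media text, the sketch below does lexicon-based sentiment scoring of short posts; the lexicon, posts, and scoring rule are invented, and production SMI tools such as the one cited combine far richer techniques (entity extraction, topic models, parsing).

```python
# Toy lexicon-based sentiment scoring of short social media posts.
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "terrible", "broken", "angry"}

def sentiment(post):
    """Count positive minus negative lexicon hits and label the post."""
    words = post.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

posts = ["I love this product, great battery life",
         "Arrived broken, terrible support"]
for p in posts:
    print(sentiment(p), "-", p)
```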

Optimization (mathematics) In mathematics, computer science, or management science, mathematical optimization (alternatively, optimization or mathematical programming) is the selection of a best element (with regard to some criteria) from some set of available alternatives.[1] Optimization problems: An optimization problem can be represented in the following way: Given: a function f : A → ℝ from some set A to the real numbers. Sought: an element x0 in A such that f(x0) ≤ f(x) for all x in A ("minimization") or such that f(x0) ≥ f(x) for all x in A ("maximization"). Such a formulation is called an optimization problem or a mathematical programming problem (a term not directly related to computer programming, but still in use for example in linear programming – see History below). Many real-world and theoretical problems may be modeled in this general framework. By convention, the standard form of an optimization problem is stated in terms of minimization. Notation: Optimization problems are often expressed with special notation. For example, $\min_{x \in \mathbb{R}} (x^2 + 1)$ denotes the minimum value of the objective function $x^2 + 1$ as x ranges over the real numbers; that minimum value is 1, occurring at x = 0. Similarly, maxima are written with $\max$ in place of $\min$.
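A tiny sketch of solving the example problem above, the minimization of f(x) = x^2 + 1 over the reals, by plain gradient descent; the step size and iteration count are arbitrary choices for the illustration.

```python
# Minimize f(x) = x**2 + 1 over the reals with plain gradient descent.
def f(x):
    return x ** 2 + 1

def grad_f(x):
    return 2 * x

x = 5.0                      # arbitrary starting point
for _ in range(100):
    x -= 0.1 * grad_f(x)     # step against the gradient

print(round(x, 6), round(f(x), 6))   # approaches x = 0, f(x) = 1
```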

DTREG -- Predictive Modeling Software Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.[1] Data mining is an interdisciplinary subfield of computer science with an overall goal to extract information (with intelligent methods) from a data set and transform the information into a comprehensible structure for further use.[1][2][3][4] Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD.[5] Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.[1] Etymology: In the 1960s, statisticians and economists used terms like data fishing or data dredging to refer to what they considered the bad practice of analyzing data without an a priori hypothesis.
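As a toy example of the pattern-discovery step in the KDD sense, the sketch below counts item pairs that co-occur in made-up transactions and keeps those above a minimum support threshold; real data mining adds the pre-processing, interestingness metrics, and validation the text lists.

```python
# Count frequently co-occurring item pairs in a tiny, invented transaction set.
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "bread"},
    {"milk", "butter"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

min_support = 2
frequent = {pair: n for pair, n in pair_counts.items() if n >= min_support}
print(frequent)   # e.g. ('bread', 'milk') appears in 2 of 4 transactions
```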

Some Statistics - Car Pollution Data The typical American male devotes more than 1,600 hours a year to his car. He sits in it while it goes and while it stands idling. He parks it and searches for it. He earns the money to put down on it and to meet the monthly installments. The Car and Global Warming Motor vehicles are the single biggest source of atmospheric pollution, contributing an estimated 14% of the world's carbon dioxide emissions from fossil fuel burning, a proportion that is steadily rising. The average American car releases 300 pounds of carbon dioxide into the atmosphere from a full, 15 gallon tank of gasoline.(1) The average European car produces over 4 tonnes of carbon dioxide every year.(1) Methane (another global warming gas, 21 times more powerful than carbon dioxide) is also emitted by cars. The Car and Pollution Exhaust fumes cause acid air, pollution, cancer, lead-poisoning and a variety of bronchial and respiratory illnesses. Oil Most cars run on gasoline or diesel. Oil Spills Road Building
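A back-of-the-envelope check of the 300-pound figure, assuming the commonly cited emission factor of roughly 19.6 pounds of CO2 per gallon of gasoline burned (an assumption not stated in the text).

```python
# Rough check: CO2 from burning a full 15-gallon tank of gasoline.
CO2_LB_PER_GALLON = 19.6      # approximate emission factor for gasoline
tank_gallons = 15

print(tank_gallons * CO2_LB_PER_GALLON)   # ~294 lb, close to the quoted 300 lb
```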

natural language processing The big-data analysis process reduces to three elements: Collection, Synthesis, and Insight. We gather relevant data, harmonize and link it, and use analysis findings situationally. In the online/social/sensor era, “relevant” may reflect enormous data volume. “Harmonize” responds to variety, and situational applications must often accommodate high-velocity data. This article is about the roles of metadata and connection in the big-data story. Human communications are complex (image: “The Tower of Babel” by Pieter Bruegel the Elder). Human Data: Fact, Feelings, and Intent My particular interest is “human data,” communicated in intentionally expressive sources such as text, video, and social likes and shares, and in implicit expressions of sentiment. Human data, from devices, online and social platforms, and enterprise transactional and operational systems, captures what Fernando Lucini characterizes as “the electronic essence of people.” Natural Language Processing Opportunity
