Data mining

Data mining is the process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.[1] Data mining is an interdisciplinary subfield of computer science and statistics with an overall goal to extract information (with intelligent methods) from a data set and transform the information into a comprehensible structure for further use.[1][2][3][4] Data mining is the analysis step of the "knowledge discovery in databases" process or KDD.[5] Aside from the raw analysis step, it also involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.[1]

Etymology: In the 1960s, statisticians and economists used terms like data fishing or data dredging to refer to what they considered the bad practice of analyzing data without an a-priori hypothesis.


Irrelevant conclusion Irrelevant conclusion should not be confused with formal fallacy, an argument whose conclusion does not follow from its premises. Ignoratio elenchi is one of the fallacies identified by Aristotle in his Organon.

Data warehouse In computing, a data warehouse (DW, DWH), or an enterprise data warehouse (EDW), is a database used for reporting and data analysis. Integrating data from one or more disparate sources creates a central repository of data, a data warehouse (DW). Data warehouses store current and historical data and are used for creating trending reports for senior management reporting such as annual and quarterly comparisons. The data stored in the warehouse is uploaded from the operational systems (such as marketing, sales, etc., shown in the figure to the right).

Hacker (computer security) Bruce Sterling traces part of the roots of the computer underground to the Yippies, a 1960s counterculture movement which published the Technological Assistance Program (TAP) newsletter.[citation needed] TAP was a phone phreaking newsletter that taught techniques for unauthorized exploration of the phone network. Many people from the phreaking community are also active in the hacking community even today, and vice versa.

Red herring A red herring is something that misleads or distracts from a relevant or important question.[1] It may be either a logical fallacy or a literary device that leads readers or audiences toward a false conclusion. A red herring may be used intentionally, as in mystery fiction or as part of rhetorical strategies (e.g., in politics), or may be used in argumentation inadvertently. The term was popularized in 1807 by English polemicist William Cobbett, who told a story of having used a kipper (a strong-smelling smoked fish) to divert hounds from chasing a hare.

Knowledge extraction Knowledge extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources. The resulting knowledge needs to be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodically similar to information extraction (NLP) and ETL (data warehouse), the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge (reusing identifiers or ontologies) or the generation of a schema based on the source data.

Research and development Research and development (R&D, also called research and technical development or research and technological development; RTD in Europe) is a specific group of activities within a business. The activities that are classified as R&D differ from company to company, but there are two primary models. In one model, the primary function of an R&D group is to develop new products; in the other model, the primary function of an R&D group is to discover and create new knowledge about scientific and technological topics for the purpose of uncovering and enabling development of valuable new products, processes, and services. Under both models, R&D differs from the vast majority of a company's activities, which are intended to yield nearly immediate profit or immediate improvements in operations and involve little uncertainty as to the return on investment (ROI).

Data Mining: What is Data Mining? Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified.

Post hoc ergo propter hoc Post hoc ergo propter hoc (Latin: "after this, therefore because of this") is a logical fallacy (of the questionable cause variety) that states "Since event Y followed event X, event Y must have been caused by event X." It is often shortened to simply post hoc. It is subtly different from the fallacy cum hoc ergo propter hoc (correlation does not imply causation), in which two things or events occur simultaneously or the chronological ordering is insignificant or unknown. Post hoc is a particularly tempting error because temporal sequence appears to be integral to causality. The fallacy lies in coming to a conclusion based solely on the order of events, rather than taking into account other factors that might rule out the connection.

Knowledge retrieval Knowledge retrieval seeks to return information in a structured form, consistent with human cognitive processes as opposed to simple lists of data items. It draws on a range of fields including epistemology (theory of knowledge), cognitive psychology, cognitive neuroscience, logic and inference, machine learning and knowledge discovery, linguistics, and information technology.

United Brotherhood of Carpenters and Joiners of America The United Brotherhood of Carpenters and Joiners of America, often simply the United Brotherhood of Carpenters (UBC),[1] was formed by Peter J. McGuire and Gustav Luebkert in 1881. It has become one of the largest trade unions in the United States, and through chapters and locals there is international cooperation that poises the brotherhood for a global role. For example, the North American Chapter has over 520,000 members throughout the continent.[2]

Are data mining and data warehousing related? - HowStuffWorks Both data mining and data warehousing are business intelligence tools that are used to turn information (or data) into actionable knowledge. The important distinctions between the two tools are the methods and processes each uses to achieve this goal. Data mining is a process of statistical analysis. Analysts use technical tools to query and sort through terabytes of data looking for patterns.

Straw man A straw man is a common form of argument and is an informal fallacy based on giving the impression of refuting an opponent's argument, while actually refuting an argument that was not presented by that opponent.[1] One who engages in this fallacy is said to be "attacking a straw man." The typical straw man argument creates the illusion of having completely refuted or defeated an opponent's proposition through the covert replacement of it with a different proposition (i.e., "stand up a straw man") and the subsequent refutation of that false argument ("knock down a straw man") instead of the opponent's proposition.[2][3] This technique has been used throughout history in polemical debate, particularly in arguments about highly charged emotional issues where a fiery "battle" and the defeat of an "enemy" may be more valued than critical thinking or an understanding of both sides of the issue.

Information retrieval Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text (or other content-based) indexing. Automated information retrieval systems are used to reduce what has been called "information overload". Many universities and public libraries use IR systems to provide access to books, journals and other documents. Web search engines are the most visible IR applications.

Penetration test A penetration test, occasionally pentest, is a method of evaluating the computer security of a computer system or network by simulating an attack from external threats and internal threats.[1] The process involves an active analysis of the system for any potential vulnerabilities that could result from poor or improper system configuration, both known and unknown hardware or software flaws, or operational weaknesses in process or technical countermeasures.[citation needed] This analysis is carried out from the position of a potential attacker and can involve active exploitation of security vulnerabilities.[citation needed] Security issues uncovered through the penetration test are presented to the system's owner.[citation needed] Effective penetration tests will couple this information with an accurate assessment of the potential impacts to the organization and outline a range of technical and procedural countermeasures to reduce risks.[citation needed]

data mining: The process of exploring and analyzing large amounts of data to find patterns. Found in: Hurwitz, J., Nugent, A., Halper, F. & Kaufman, M. (2013) Big Data For Dummies. Hoboken, New Jersey, United States of America: For Dummies. ISBN: 9781118504222. by raviii Jan 1

Wiki, but a great starting point for Data Mining -Josh by fritzjl Mar 28

Data mining, a branch of computer science,[1] is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. by agnesdelmotte Mar 24