
Stylometry

Stylometry is often used to attribute authorship to anonymous or disputed documents. It has legal as well as academic and literary applications, ranging from the question of the authorship of Shakespeare's works to forensic linguistics.
History: Stylometry grew out of earlier techniques of analyzing texts for evidence of authenticity, authorial identity, and other questions. An early example is Lorenzo Valla's 1439 proof that the Donation of Constantine was a forgery, an argument based partly on a comparison of the Latin with that used in authentic 4th-century documents. The basics of stylometry were set out by Polish philosopher Wincenty Lutosławski in Principes de stylométrie (1890).
Methods: Modern stylometry draws heavily on the aid of computers for statistical analysis, artificial intelligence and access to the growing corpus of texts available via the Internet.
Writer invariant: In one such method, the text is analyzed to find the 50 most common words.
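The first step of the writer-invariant method mentioned above, finding the most common words in a text, can be sketched in a few lines. This is only an illustrative sketch, not the procedure from the article: the function name `most_common_words`, the lowercase ASCII tokenization, and the sample sentence are all assumptions made here.

```python
import re
from collections import Counter


def most_common_words(text, n=50):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes,
    # compared case-insensitively. Real stylometric tools tokenize more carefully.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Hypothetical sample text, kept tiny so the counts are easy to check by hand.
sample = "the cat and the dog and the bird"
top = most_common_words(sample, n=2)
print(top)
```

With a real document, `n` would stay at its default of 50, and the resulting word list would feed whatever comparison step the chosen method uses.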

http://en.wikipedia.org/wiki/Stylometry


Graphing the history of philosophy « Drunks&Lampposts
Figure: A close-up of ancient and medieval philosophy, ending at Descartes and Leibniz.
If you are interested in this data set, you might like my latest post, where I use it to make book recommendations. This one came about because I was searching for a data set on horror films (don’t ask) and ended up with one describing the links between philosophers. To cut a long story very short, I’ve extracted the information in the "influenced by" section for every philosopher on Wikipedia and used it to construct a network, which I’ve then visualised using Gephi. It’s an easy process to repeat.

Natural language processing
Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human–computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation.

23 maps and charts on language
By Dylan Matthews on April 15, 2015. "The limits of my language," the philosopher Ludwig Wittgenstein once posited, "mean the limits of my world." Explaining everything within the limits of the world is probably too ambitious a goal for a list like this. But here are 23 maps and charts that can hopefully illuminate small aspects of how we manage to communicate with one another.
The basics: Indo-European language roots. Minna Sundberg, a Finnish-Swedish comic artist, created this beautiful tree to illustrate both the relationships between European and central Asian languages generally and a smaller but still striking point: Finnish has less in common with, say, Swedish than Persian or Hindi do.

Jean Lievens: Wikinomics Model for Value of Open Data
A visual model showing the value of open data.

Automatic summarization
Methods: Methods of automatic summarization include extraction-based, abstraction-based, maximum entropy-based, and aided summarization.
Extraction-based summarization: Two particular types of summarization often addressed in the literature are keyphrase extraction, where the goal is to select individual words or phrases to "tag" a document, and document summarization, where the goal is to select whole sentences to create a short paragraph summary.
Abstraction-based summarization: Extraction techniques merely copy the information deemed most important by the system to the summary (for example, key clauses, sentences or paragraphs), while abstraction involves paraphrasing sections of the source document.
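Extraction-based document summarization, as described above, selects whole sentences from the source rather than paraphrasing. A minimal sketch follows, assuming one simple heuristic that the excerpt does not prescribe: each sentence is scored by the average document-wide frequency of its words. The function name `extractive_summary` and the naive sentence splitter are assumptions for illustration.

```python
import re
from collections import Counter


def extractive_summary(text, k=1):
    """Return the k highest-scoring sentences, in their original order."""
    # Assumption: sentences end with ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Word frequencies over the whole document (lowercase ASCII words only).
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Copy the selected sentences verbatim, restoring document order.
    return [sentences[i] for i in sorted(ranked[:k])]


# Hypothetical three-sentence document.
doc = "Cats chase mice. Cats sleep. Dogs bark."
print(extractive_summary(doc, k=1))
```

Because the sentences are copied unchanged, this is extraction in the sense used above; an abstraction-based system would instead generate new wording.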

Mozilla Shortcuts
Chrome users: see the Chrome shortcuts for your address bar for faster searches. You will create a bookmark, then add the "keyword". Once it is created, you can simply type "enfr dog", for example, to quickly translate "dog" from English to French.

Linguistics and the Book of Mormon
According to most adherents of the Latter Day Saint movement, the Book of Mormon is a 19th-century translation of a record of ancient inhabitants of the American continent, which was written in a script which the book refers to as "reformed Egyptian."[1][2][3][4][5] This claim, like virtually all claims to the historical authenticity of the Book of Mormon, is generally rejected by non–Latter Day Saint historians and scientists.[6][7][8][9][10] Linguistically based assertions are frequently cited and discussed in the context of the Book of Mormon, both in favor of and against the book's claimed origins. Both critics and promoters of the Book of Mormon have used linguistic methods to analyze the text. Promoters have published claims of stylistic forms that Joseph Smith and his contemporaries are unlikely to have known about, as well as similarities to Egyptian and Hebrew.

Named-entity recognition
Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify elements in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. Most research on NER systems has been structured as taking an unannotated block of text, such as this one:
Jim bought 300 shares of Acme Corp. in 2006.
and producing an annotated block of text that highlights the names of entities:
[Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.
In this example, a person name consisting of one token, a two-token company name and a temporal expression have been detected and classified.
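The annotated form shown above can be reproduced with a short sketch. This is not an NER system: the entities and their labels are supplied by hand, and the code only renders them in the bracketed notation of the example. The function name `annotate` and the left-to-right matching rule are assumptions made for illustration.

```python
def annotate(text, entities):
    """Wrap each (surface, label) entity in text as [surface]label.

    Assumption: entities are listed in the order they occur, and each is
    matched at its first occurrence after the previous entity.
    """
    pieces = []
    pos = 0
    for surface, label in entities:
        start = text.index(surface, pos)  # raises ValueError if not found
        pieces.append(text[pos:start])
        pieces.append(f"[{surface}]{label}")
        pos = start + len(surface)
    pieces.append(text[pos:])
    return "".join(pieces)


text = "Jim bought 300 shares of Acme Corp. in 2006."
# Hand-labelled entities taken from the example sentence.
entities = [("Jim", "Person"), ("Acme Corp.", "Organization"), ("2006", "Time")]
annotated = annotate(text, entities)
print(annotated)
```

A real NER system would have to produce the `entities` list itself, which is the hard part of the task.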

Chrome Shortcuts
Firefox users: see the Firefox shortcuts for your address bar for faster searches. You will create a bookmark, then add the "keyword". Once it is created, you can simply type "enfr dog", for example, to quickly translate "dog" from English to French.

The Signature Stylometric System
The aim of this website is to highlight the many strong links between Philosophy and Computing, for the benefit of students of both disciplines: students of Philosophy who are seeking ways into formal Computing, learning by discovery about programming, how computers work, language processing, artificial intelligence, and even conducting computerised thought experiments on philosophically interesting problems such as the evolution of co-operative behaviour; and students of Computing who are keen to see how their technical abilities can be applied to intellectually exciting and philosophically challenging problems.

AI-complete
In the field of artificial intelligence, the most difficult problems are informally known as AI-complete or AI-hard, implying that the difficulty of these computational problems is equivalent to that of solving the central artificial intelligence problem: making computers as intelligent as people, or strong AI.[1] To call a problem AI-complete reflects an attitude that it would not be solved by a simple specific algorithm. AI-complete problems are hypothesised to include computer vision, natural language understanding, and dealing with unexpected circumstances while solving any real world problem.[2] With current technology, AI-complete problems cannot be solved by computer alone, but also require human computation. This property can be useful, for instance to test for the presence of humans as with CAPTCHAs, and for computer security to circumvent brute-force attacks.[3][4]

Message Understanding Conference
The Message Understanding Conferences (MUC) were initiated and financed by DARPA (Defense Advanced Research Projects Agency) to encourage the development of new and better methods of information extraction. The character of this competition, with many concurrent research teams competing against one another, required the development of standards for evaluation, e.g. the adoption of metrics like precision and recall.
Topics and Exercises: Only for the first conference (MUC-1) could participants choose the output format for the extracted information. From the second conference onward, the output format by which the participants' systems would be evaluated was prescribed. For each topic, fields were given that had to be filled with information from the text.
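Precision and recall, the evaluation metrics mentioned above, can be computed directly from a system's extracted items and a reference answer set. The sketch below uses the standard set-based definitions; it does not reproduce the MUC scoring software, and the function name `precision_recall` and the sample items are assumptions for illustration.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for a set of predicted items against a gold set.

    precision = correct predictions / all predictions
    recall    = correct predictions / all gold items
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Convention assumed here: an empty denominator yields 0.0.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical system output with one spurious item ("shares").
system_output = {"Jim", "Acme Corp.", "2006", "shares"}
reference = {"Jim", "Acme Corp.", "2006"}
p, r = precision_recall(system_output, reference)
print(p, r)
```

Here three of the four extracted items are correct and all three reference items were found, so precision is 0.75 and recall is 1.0.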

Difficulty of learning languages
Second-language acquisition, second-language learning, or L2 acquisition, is the process by which people learn a second language.[1] Second-language acquisition (often abbreviated to SLA) also refers to the scientific discipline devoted to studying that process. Second language refers to any language learned in addition to a person's first language; although the concept is named second-language acquisition, it can also incorporate the learning of third, fourth, or subsequent languages. Second-language acquisition refers to what learners do; it does not refer to practices in language teaching.
Outline: The academic discipline of second-language acquisition is a subdiscipline of applied linguistics.
