DataCamp: The Easiest Way To Learn R & Data Science

Tools The Social Media Research Foundation sustains the development of social media network analysis software. So far, it has supported the creation and dissemination of the NodeXL tool: NodeXL The Network Overview Discovery and Exploration Add-in for Excel (2007 / 2010 / 2013 / 2016) is an extension to the familiar Excel spreadsheet that helps collect, visualize and interpret social media networks. The Social Media Research Foundation is dedicated to making tools that help people understand social media and social networks. We produce NodeXL Basic, which is available freely and openly to all. NodeXL Pro offers advanced features for importing social media data, calculating social network metrics, sentiment analysis, and publishing reports. NodeXL Pro is licensed to users on an annual basis: registration keys will be required to run NodeXL Pro starting in October 2015. Contact info@smrfoundation.org for details. Your support keeps the NodeXL project active and strong; please upgrade to NodeXL Pro.

Apache Spark Apache Spark is another big data processing project whose name we have been hearing more often lately. It claims to be 100 times faster than Hadoop, and with its advanced "Directed Acyclic Graph" engine, its Scala implementation, and its in-memory data processing features, it seems to live up to that claim. We can say that it outperforms Hadoop particularly for distributed implementations of machine learning algorithms. So much so that the Apache Mahout project has decided to continue its development on Spark rather than on Hadoop from now on. However, we should say that Spark looks less like a technology that will replace Hadoop and more like a member of the Hadoop family that will cover some of the areas where Hadoop is weak. The performance obtained by running the logistic regression algorithm on Hadoop and on Spark is illustrated. In terms of application development, Spark makes full use of Scala's advantages.

Google's Python Class  |  Python Education  |  Google Developers Welcome to Google's Python Class -- this is a free class for people with a little bit of programming experience who want to learn Python. The class includes written materials, lecture videos, and lots of code exercises to practice Python coding. These materials are used within Google to introduce Python to people who have just a little programming experience. The first exercises work on basic Python concepts like strings and lists, building up to the later exercises which are full programs dealing with text files, processes, and http connections. To get started, the Python sections are linked at the left -- Python Set Up to get Python installed on your machine, Python Introduction for an introduction to the language, and then Python Strings starts the coding material, leading to the first exercise. This material was created by Nick Parlante working in the engEDU group at Google. Tip: Check out the Python Google Code University Forum to ask and answer questions.

Data science: learn the discipline in 8 steps with DataCamp The data scientist profession was dubbed "the sexiest job of the 21st century" by Harvard Business Review in 2012 and "the best job of the year" by Glassdoor in 2016. DataCamp has released an infographic that summarizes how to learn data science in 8 steps. A profession still little known Attitudes toward data science have changed considerably over the past four years. In 2012, most articles aimed to explain the data scientist's role and exactly what the job involved. At the time, a Google search for "how to become a data scientist" showed that the concept could have a large number of meanings. They are very important, because very few data scientists currently meet companies' expectations, even though the definition of the profession is not yet settled. With more demand than supply, the attention paid to data science teams is on the rise. Many skills required

What is acceptance sampling? - Minitab Acceptance sampling is a major component of quality control and is useful when the cost of testing is high compared with the cost of passing a defective item, or when testing destroys the sample. It is a compromise between performing 100% inspection and performing no inspection at all. Acceptance sampling can be carried out on attributes or on measurements of the product. You can use acceptance sampling to develop inspection plans that let you accept or reject a particular lot of incoming material based on data from a representative sample. Example of an acceptance sampling plan by attributes For example, you receive a shipment of 10,000 microchips. In this case, suppose that the acceptable quality level (AQL) is 1.5% and the rejectable quality level (RQL) is 5.0%, with alpha = 0.05 and beta = 0.1. Example of an acceptance sampling plan by variables
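The attribute example above (AQL 1.5%, RQL 5.0%, alpha = 0.05, beta = 0.1) can be explored with a short Python sketch. It is not Minitab's procedure: it assumes a binomial model for the number of defectives in the sample, which is a common approximation when the lot (10,000 microchips here) is much larger than the sample. The function names `prob_accept` and `find_plan` and the `max_n` search limit are hypothetical, chosen only for illustration.

```python
from math import comb


def prob_accept(n, c, p):
    """Probability of finding at most c defectives in a sample of n,
    assuming each item is defective independently with probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


def find_plan(aql, rql, alpha, beta, max_n=2000):
    """Search for the smallest sample size n, with acceptance number c, such that
    a lot at the AQL is accepted with probability >= 1 - alpha and
    a lot at the RQL is accepted with probability <= beta.
    Returns (n, c), or None if no plan is found up to max_n (an assumed limit)."""
    for n in range(1, max_n + 1):
        # Smallest c that protects the producer at the AQL.
        # The loop always breaks, because c = n accepts every lot.
        for c in range(n + 1):
            if prob_accept(n, c, aql) >= 1 - alpha:
                break
        # A larger c only raises acceptance at the RQL, so test this c alone.
        if prob_accept(n, c, rql) <= beta:
            return n, c
    return None


plan = find_plan(aql=0.015, rql=0.05, alpha=0.05, beta=0.1)
if plan is not None:
    n, c = plan
    print(f"Sample {n} items; accept the lot if at most {c} are defective.")
```

Under these assumptions, the returned pair reads as "inspect n items and accept the lot if at most c are defective". A tool that uses a hypergeometric model or a different search rule may report a slightly different plan.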

Pegasus Data Project | Experimentation platform for digital humanities, social networks, Twitter, web influence and data visualization

Data Science Cheat Sheets – Python / R / MySQL & SQL / Spark / Hadoop & Hive / Machine Learning / Django – AITS – Data Mining Club Get up to speed and keep Data Science & Data Mining concepts and commands handy with these cheat sheets covering R, Python, Django, MySQL, SQL, Hadoop, Apache Spark and machine learning algorithms. There are thousands of packages and hundreds of functions out there in the data science world! An aspiring data enthusiast need not know them all. Mastering data science involves understanding statistics and mathematics, having programming knowledge (especially in R, Python & SQL), and then combining all of these with business understanding and the human instinct that drives decisions to derive insights. Here are the cheat sheets by category: Cheat sheets for Python: Python is a popular choice for beginners, yet still powerful enough to back some of the world's most popular products and applications. Cheat sheets for R: The R ecosystem has been expanding so much that a lot of referencing is needed. Cheat sheets for MySQL & SQL: Cheat sheets for Spark: Cheat sheets for Hadoop & Hive:

28 sites where you can solve programming problems It is no secret that the best way to improve your programming skills is to practice, and then practice some more. We have prepared for you a huge collection of sites with programming problems on a wide variety of topics. Codeforces is undoubtedly the most popular and best-known platform in the world for algorithmic competitions. Besides major contests, the site often holds its own "rounds", in which participants are given 5 problems to solve in two hours. TopCoder is an American platform that is not far behind Codeforces in popularity. Timus Online Judge is a Russian-language platform (English is also supported) on which more than a thousand problems are conveniently sorted by topic and difficulty. SPOJ is a large English-language site with more than 20,000 problems on widely varying topics: dynamic programming, graphs, data structures, and so on. informatics.mccme.ru is a platform with a large amount of theoretical material and problems on the corresponding topics. CodeCombat will be more useful for beginners.

5 Big Data Use Cases To Watch - InformationWeek Here's how companies are turning big data into decision-making power on customers, security, and more. We hear a lot about big data's ability to deliver usable insights -- but what does this mean exactly? It's often unclear how enterprises are using big-data technologies beyond proof-of-concept projects. Certainly the market for Hadoop and NoSQL software and services is growing rapidly. According to Quentin Gallivan, CEO of big-data analytics provider Pentaho, the market is at a "tipping point" as big-data platforms move beyond the experimentation phase and begin doing real work. Here they are: 1. "That's all unstructured clickstream data," said Gallivan. A third big source, social media sentiment, also is tossed into the mix, providing the desired 360 degree view of the customer.

It has to be said more often: correlation does not imply causation Correlation does not imply causation, and it has to be said more often (if you like, with the intonation that Ernesto Sevilla gave to a certain very Spanish insult in a certain video that was an internet phenomenon a while ago…). And it has to be said more often because in general we do not really grasp what this phrase means. Well, either that, or even though we do understand it, we try to confuse those who do not by making them believe that one thing does imply the other. A study claims that the more A, the more B. In principle, all those headlines basically suggest that A is what causes B to happen, or, what amounts to the same thing, that B is a consequence of A. The study of the correlation between two variables is one of the topics covered in statistics. From certain data obtained for each of those variables, one estimates whether there is any relationship between them. This coefficient usually takes values between -1 and 1, and is interpreted as follows: So far so good, right? Source: Wikimedia Commons.
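The excerpt does not name the coefficient it mentions, so the sketch below assumes it is the usual Pearson correlation coefficient, which takes values between -1 and 1. The function name `pearson_r` and the sample data are hypothetical and serve only as illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations (unnormalized spreads).
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / (spread_x * spread_y)


# Hypothetical data: y rises exactly in step with x, so r is close to 1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
# Hypothetical data: y falls exactly as x rises, so r is close to -1.
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

A value near 1 or -1 only says that the two variables move together in a linear way. It says nothing about whether A causes B, B causes A, or a third factor drives both.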
