Some datasets for teaching data science
In this post I describe the dslabs package, which contains some datasets that I use in my data science courses. A much-discussed topic in stats education is that computing should play a more prominent role in the curriculum. I strongly agree, but I think the main improvement will come from bringing applications to the forefront and mimicking, as best as possible, the challenges applied statisticians face in real life. I therefore try to avoid widely used toy examples, such as the mtcars dataset, when I teach data science. However, my experience has been that finding examples that are realistic, interesting, and appropriate for beginners is not easy.
Improvement Service - Spatial Hub
33 datasets found
Air Quality Management Areas
Description: Local Authorities have a duty to designate any relevant areas where the air quality objectives are not being met (or are unlikely to be met) as Air Quality Management Areas (AQMAs). AQMAs must be designated officially by means of an 'order'. The extent of the AQMA may be limited to the area of exceedance or encompass a larger area.

AI Playbook - Datasets
Kaggle
Kaggle includes nearly 600 'Featured' datasets that are well documented and prepped for ML analysis. Any user or organization can publish data on Kaggle Datasets, and it includes classics like Iris as well as unique datasets published by our users.
Reuters Corpora (RCV1, RCV2, TRC2)
NIST/Reuters
Digitizing the delivery of government services
An agile development approach was critical in a European agency’s launch of a new online system for registering businesses. Here’s how the agency moved from paper to pixels.
Government agencies around the world are under internal and external pressure to become more efficient by incorporating digital technologies and processes into their day-to-day operations. For a lot of public-sector organizations, however, the digital transformation has been bumpy. In many cases, agencies are trying to streamline and automate workflows and processes using antiquated systems-development approaches.
CKAN 2.8.2 documentation: DataStore extension
The CKAN DataStore extension provides an ad hoc database for storage of structured data from CKAN resources. Data can be pulled out of resource files and stored in the DataStore. When a resource is added to the DataStore, you get: automatic data previews on the resource’s page, using the Data Explorer extension; and the DataStore API, to search, filter and update the data without having to download and upload the entire data file. The DataStore is integrated into the CKAN API and authorization system. The DataStore is generally used alongside the DataPusher, which will automatically upload data to the DataStore from suitable files, whether uploaded to CKAN’s FileStore or externally linked.
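To make the "search without downloading the whole file" idea concrete, here is a minimal Python sketch of a DataStore query. It assumes the `datastore_search` action of the CKAN API with its `resource_id`, `limit` and `q` parameters; the helper names `datastore_search_url` and `extract_records`, the site URL and the resource id are all hypothetical placeholders, not part of CKAN.

```python
import json
from urllib.parse import urlencode


def datastore_search_url(base_url, resource_id, limit=5, q=None):
    """Build a GET URL for the CKAN `datastore_search` action.

    `base_url` and `resource_id` are placeholders supplied by the caller;
    `q` is an optional full-text search term.
    """
    params = {"resource_id": resource_id, "limit": limit}
    if q is not None:
        params["q"] = q
    return f"{base_url.rstrip('/')}/api/3/action/datastore_search?{urlencode(params)}"


def extract_records(payload):
    """Return the list of rows from a decoded `datastore_search` response.

    Assumes the usual CKAN action envelope: a `success` flag and a
    `result` object that holds the matching rows under `records`.
    """
    if not payload.get("success"):
        raise RuntimeError(f"CKAN request failed: {payload.get('error')}")
    return payload["result"]["records"]


# Hypothetical site and resource id, for illustration only.
url = datastore_search_url("https://ckan.example.org/", "my-resource-id", limit=2, q="jones")

# A hand-written response in the assumed shape, so the sketch runs offline.
sample = json.loads(
    '{"success": true, "result": {"records": [{"_id": 1, "title": "jones"}]}}'
)
rows = extract_records(sample)
```

Against a real CKAN site, the URL could be fetched with `urllib.request.urlopen` and the body decoded with `json.load` before passing it to `extract_records`; only the matching rows travel over the network, not the full resource file.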
Tackling the Challenges of Big Data
This is an Archived Course
EdX keeps courses open for enrollment after they end to allow learners to explore content and continue learning. Not all features and materials may be available.

Fueling the Gold Rush: The Greatest Public Datasets for AI
It has never been easier to build AI or machine learning-based systems than it is today. The ubiquity of cutting-edge open-source tools such as TensorFlow, Torch, and Spark, coupled with the availability of massive amounts of computation power through AWS, Google Cloud, or other cloud providers, means that you can train cutting-edge models from your laptop over an afternoon coffee. Though not at the forefront of the AI hype train, the unsung hero of the AI revolution is data: lots and lots of labeled and annotated data, curated with the elbow grease of great research groups and companies who recognize that the democratization of data is a necessary step towards accelerating AI.
NHS-R Community datasets package
This post briefly introduces an R package created for the NHS-R Community to help us learn and teach R. Firstly, it is now available on CRAN, the major package repository for R, and can be installed like any other package, or directly from GitHub as follows:
install.packages("NHSRdatasets")
#or
remotes::install_github("
Why?
50+ free Datasets for Data Science Projects [Updated as on Jan 31, 2020]
50+ free-datasets for your DataScience project portfolio
There is no doubt that having a project portfolio is one of the best ways to master Data Science, whether you aspire to be a data analyst, machine learning expert or data visualization ninja!

What Amazon Gets Right
A recent article in The New York Times ignited a controversial discussion about Amazon, the world's largest online retailer, and its CEO, Jeff Bezos. The article describes Amazon's workplace culture as "bruising," characterized by maladies such as harsh management, towering expectations, gender inequality, excessive hours and annual "purposeful Darwinism" employee firings, to name a few. With some suggesting that the massive organization leaves workers in tears at their desks, Amazon and its leadership practices have come under fire. Standing strong in the face of these claims, Bezos, along with other high-ranking company leaders, has vehemently denied the article's allegations, saying the report in no way describes the "caring Amazonians" he leads or the strong work culture he strives daily to protect.
Data-Driven Leadership Philosophy
Gallup's position on evaluating best practices and helping organizations implement solutions aligns with Boyle: Knowledge is power.