
Myths and Legends About Big Data / VimpelCom (Beeline) company blog

One of our clusters for pilot projects (data nodes: 18 HP DL380g servers, 2 CPUs, 12 cores, 64 GB RAM, 12 SATA disks, 3 TB).

What is Big Data, anyway? Everyone knows it means processing huge volumes of data. But working with an Oracle database of, say, 20 gigabytes or even 4 petabytes is not yet Big Data; it is just a high-load database.

So what is the key difference between Big Data and “ordinary” high-load systems? The ability to run flexible queries. Because of its architecture, a relational database is designed for short, fast queries that arrive as a uniform stream.

If you decide to go beyond such queries and build a new, complex one, the database will have to be rewritten, or it will collapse under the load. Where does this new load come from? Is there an example of such a task? And how is it solved? So why not just scale them out, and the problem is solved? So what do we end up with? But isn’t that terribly slow? Short queries with a small number of joins. What is the structure of the platform?

Top 30 DSC blogs, based on new scoring technology. 20 short tutorials all data scientists should read (and practice).

How to Become a Data Scientist. These days you can get a degree in data science, so you can show a diploma that certifies your credentials. But these degrees are relatively new, so, with all due respect, if you only recently got your degree, you are still a beginner.

Those of us who use this title today most likely came from a combination of backgrounds in business, hard science, computer science, operations research, and statistics. What you call yourself is one thing, but what your employer or client is looking for can be quite a different kettle of fish. A lot has been written about data scientists being as elusive as unicorns.

Not being a unicorn, I’d say this sets the bar pretty high. Additionally, as I’ve perused the job listings, it is equally true that the title is used so loosely and with so little understanding that an ad for a data scientist may actually describe an entry-level analyst, while some ads for analysts are looking for polymath data scientists.

Four Types of Data Scientists. Data Creatives. Data Developer.

38 Seminal Articles Every Data Scientist Should Read.

Data Science Cheat Sheet. I will update this article regularly. An old version can be found here and has many interesting links. All the material presented here is not in the old version. This article is divided into 11 sections.

1. A laptop is the ideal device. Even if you work heavily in the cloud (AWS, or in my case, access to a few remote servers mostly to store data, receive data from clients, and keep backups), your laptop is your core device for connecting to all external services (via the Internet).

2.

Once you have installed Cygwin, you can type commands or execute programs in the Cygwin console.

Figure 1: Cygwin (Linux) console on a Windows laptop

You can open multiple Cygwin windows on your screen(s). To connect to an external server for file transfers, I use the Windows freeware FileZilla rather than the command-line ftp offered by Cygwin. You can run commands in the background using the & operator, for example:

$ notepad VR3.txt &

A few more things about files
Other extensions include
File management

3. Examples
Miscellaneous

4. Exercise

Mining Massive Datasets. Deep Web.
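The & operator from the Cygwin section detaches a command from the console so you can keep typing while it runs. Below is a minimal sketch, assuming a POSIX-compatible shell such as the bash that ships with Cygwin. The subshell running sleep and echo is a hypothetical stand-in for a long-running program like notepad, and the names result_file, job_pid, job_status, and job_output are illustrative only.

```shell
#!/bin/sh
# Sketch (assumption: POSIX sh or bash, e.g. in a Cygwin console).
# The subshell below is a hypothetical stand-in for a slow program.
result_file=$(mktemp)

# '&' starts the command in the background and returns immediately.
( sleep 1; echo "finished" > "$result_file" ) &
job_pid=$!   # $! holds the PID of the most recent background job

# The shell is free to run other commands while the job works.
echo "Background job $job_pid started; the console is still usable."

# 'wait' blocks until the given job exits and returns its exit status.
wait "$job_pid"
job_status=$?

job_output=$(cat "$result_file")
rm -f "$result_file"
echo "Job exited with status $job_status: $job_output"
```

Without wait, a script could exit while the job is still running. In an interactive console you can instead use the jobs builtin to list background tasks and fg to bring one back to the foreground.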

Microsoft BizTalk Server. Google Translate.

Big Data Technology Suite of Cloud Services. Big Data doesn’t need to be so hard. We provide value faster and with less complexity with a cloud services approach. Infochimps™ Cloud is a suite of cloud services that makes it faster and far less complex to develop and deploy Big Data applications. Our cloud services handle all of the complex Big Data technologies and processes, giving you a simple, developer-friendly interface. Infochimps Cloud lets you focus on creating the applications that will drive value for your business instead of spending your time managing a Big Data “infrastructure stack.”

Cloud::Streams: streaming data and real-time analytics
Cloud::Queries: NoSQL database and ad hoc, query-based analytics
Cloud::Hadoop: elastic Hadoop clusters and batch analytics

Infochimps Cloud eliminates all the implementation headaches caused by Big Data, enabling your Big Data applications to be completed quickly and to fully achieve their objectives. Flexible, cost-effective cloud deployment.

New folder

New folder 2. Kevin Kelly.