Big data science

What is Big Data?

Does anybody have recommendations on books?

Thank you.

Comments

  • pram Member Posts: 171
    Big data is exactly what it sounds like: tons of information. Working with it usually involves some kind of analytics or number crunching (and possibly storage).

    I'd read a book on Hadoop and MapReduce in general.
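    To make "MapReduce" a bit more concrete, here is a minimal single-process sketch of the classic word-count job. The function names are hypothetical; on a real Hadoop cluster the framework runs many map and reduce tasks in parallel across machines and performs the shuffle (grouping by key) itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input split.
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group intermediate values by key, as the framework does
    # between the map and reduce steps.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data about data"]))
    # prints {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

    The point of splitting the work this way is that map and reduce calls are independent of each other, so a framework can spread them over as many machines as the data requires.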
  • Ivanjam Member Posts: 978
    Check out the M.S. in Data Analytics offered by CUNY: Online Master's Degree in Data Analytics (M.S.) | CUNY School of Professional Studies

    This is their summer bridge program for people with a B.S. who are interested in getting into the M.S. program:
    A Bridge to Admission: Free Online Workshops for Data Analytics Applicants | CUNY School of Professional Studies
  • Nutsacjac Member Posts: 76
    CBT Nuggets has a very good series on Hadoop. If you have a subscription, I'd suggest giving it a view.
  • pram Member Posts: 171
    Kind of a tangent, but another emerging data tool is Elasticsearch with Kibana. You can, for instance, turn your MapReduced data into JSON and create custom analytic views with Kibana. I've set up custom bindings for SQL data and piped it into Elasticsearch with a JDBC river. Kibana is a FANTASTIC tool and I think it has a ton of uses. Elasticsearch is pretty amazing as well, since you can program displays right into it and serve content directly from the daemon.

    Open Source Distributed Real Time Search & Analytics | Elasticsearch
    Kibana | Overview | Elasticsearch

    It's essentially an open-source version of Splunk's upcoming Hunk.

    http://www.splunk.com/view/hunk/SP-CAAAH2E
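    As a rough illustration of turning map-reduced output into JSON for Elasticsearch, the sketch below builds a payload in the newline-delimited format used by the Elasticsearch bulk API: an action line naming the target index, followed by the document itself. The index name and field names (`term`, `count`) are hypothetical, and the exact action metadata accepted has changed across Elasticsearch versions, so check the bulk API docs for the version you run.

```python
import json
from typing import Dict


def to_bulk_ndjson(results: Dict[str, int], index: str) -> str:
    lines = []
    for key, count in results.items():
        # Action line: tells Elasticsearch to index the next line into `index`.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Document line: one reduced (key, count) pair as a JSON object.
        lines.append(json.dumps({"term": key, "count": count}))
    # The bulk format is newline-delimited JSON and must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    payload = to_bulk_ndjson({"big": 2, "data": 3}, index="word-counts")
    print(payload, end="")
```

    The resulting text would then be POSTed to the cluster's `_bulk` endpoint, after which Kibana can build views over the indexed documents.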
  • Krones Member Posts: 164
    What are you interested in specifically? The data warehouses? Creating data cubes in SQL Server? The ETL process from MySQL? The data analytics field is pretty broad. Are you more interested in forecast analysis based on data, A/B testing, etc.?

    This is based on where I work, which runs a web app that receives about 12 million unique visitors monthly, and tracking is at the heart of the site:

    Database Administrator - in charge of architecture and table management, the ETL process, query optimization, etc. (SQL and NoSQL technologies).
    Business Intelligence/Analytics - Number crunching, lots of SEO, SEM, etc. More or less guides the vision for business operations, requests new systems and web campaigns, and researches new technology. Some deal directly with vendors and merchants but are not always strictly sales either.
    Developer/Reporting Team - A/B testing, writing code to support tracking in the app, cube creation in SQL Server, lots of query writing, UUIDs and internal tracking.
    Business Analysts - Excel crunching, all day every day. They do the dirty work, so to speak, and support the BI/Analytics echelon.
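    For readers unfamiliar with "cube creation": a data cube precomputes an aggregate for every combination of dimensions, which is the result shape SQL Server's `GROUP BY CUBE (...)` produces. The in-memory sketch below shows only that idea, with hypothetical column names (`country`, `device`, `visits`); a production cube built with SQL Server tooling involves far more (hierarchies, storage, incremental processing).

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

ALL = "ALL"  # stands in for a rolled-up dimension (SQL's CUBE output uses NULL)


def build_cube(rows: List[dict], dims: List[str], measure: str) -> Dict[Tuple, float]:
    cube: Dict[Tuple, float] = defaultdict(float)
    for row in rows:
        # Add this row to the aggregate for every subset of dimensions,
        # from "group by nothing" (grand total) up to "group by all dims".
        for r in range(len(dims) + 1):
            for kept in combinations(dims, r):
                key = tuple(row[d] if d in kept else ALL for d in dims)
                cube[key] += row[measure]
    return dict(cube)


if __name__ == "__main__":
    visits = [
        {"country": "US", "device": "mobile", "visits": 10},
        {"country": "US", "device": "desktop", "visits": 5},
        {"country": "CA", "device": "mobile", "visits": 3},
    ]
    cube = build_cube(visits, ["country", "device"], "visits")
    print(cube[(ALL, ALL)], cube[("US", ALL)], cube[(ALL, "mobile")])
    # prints 18.0 15.0 13.0
```

    Precomputing every combination trades storage for speed: reports that slice by any mix of dimensions become lookups instead of scans over the raw tracking data.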

    You might want to look into Pentaho as well.
  • N2IT Inactive Imported Users Posts: 7,483
    I work within a business intelligence team and I'm not sure any formal standards have been developed yet. I know, for example, that big data can be trillions of rows of transactional data, and the only way to depict that data visually is with heat maps and other charting/graphing techniques. IBM, for instance, crunched so much data in one of their databases that they couldn't show it on a chart or in any other form that made sense, so they developed a heat map that represented the trending.

    @OP Sorry, I don't know of any really good books. I had one from college but I forget the name of it. I'll repost tonight when I get home and look up the author, title and edition of the book.
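    The reason heat maps work for huge data sets is that you stop plotting individual rows and instead count how many rows fall into each cell of a fixed grid, so the picture stays the same size no matter how many rows you have. Below is a minimal sketch of that binning step; the function name and parameters are hypothetical, and this is not a description of what IBM actually did. At real scale the counting would run inside the database or a distributed job, with only the small grid sent to the charting tool.

```python
from typing import Iterable, List, Tuple


def bin_points(
    points: Iterable[Tuple[float, float]],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    bins: int,
) -> List[List[int]]:
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    grid = [[0] * bins for _ in range(bins)]  # grid[row][col] = count
    for x, y in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # skip points outside the plotted area
        # Scale each coordinate to a cell index; values on the upper
        # edge are clamped into the last cell.
        col = min(int((x - x_min) / (x_max - x_min) * bins), bins - 1)
        row = min(int((y - y_min) / (y_max - y_min) * bins), bins - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9), (1.0, 1.0)]
    print(bin_points(pts, (0.0, 1.0), (0.0, 1.0), bins=2))
    # prints [[2, 0], [0, 2]]
```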
  • NightShade03 Member Posts: 1,383
    Big Data = Marketing term that is insanely overused...

    In reality, "big data" has three defining properties: velocity, variety, and volume. These three properties describe a data set by how fast it arrives, how many different forms it takes, and how much of it there is. For example, you could have a 10GB database, which to the average person might seem big (it's not). Big data is actually closer to what N2IT was describing (trillions of rows).
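    A quick back-of-the-envelope comparison shows the gap in volume. Assuming (hypothetically) an average of 100 bytes per row and decimal units, one trillion rows comes to:

```latex
10^{12}\ \text{rows} \times 100\ \tfrac{\text{bytes}}{\text{row}}
  = 10^{14}\ \text{bytes}
  = 100\ \text{TB}
  = 10^{4} \times 10\ \text{GB}
```

    In other words, the "big" 10GB database is roughly ten-thousandth the size of such a data set, before velocity and variety even enter the picture.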

    The underlying principles that are driving big data today are, in my opinion, split into two camps:

    In camp one you have the data scientists. "Data science incorporates varying elements and builds on techniques and theories from many fields, including signal processing, mathematics, probability models, machine learning, computer programming, statistics, data engineering, pattern recognition and learning, visualization, uncertainty modeling, data warehousing, and high performance computing with the goal of extracting meaning from data and creating data products." (Cite: http://en.wikipedia.org/wiki/Data_science) People in this field are usually really, really smart and have a strong background in math, statistics, etc. Additionally, this camp is helping to solve some crazy hard problems by crunching data and making advances in machine learning (e.g. mapping the human genome).

    In the other camp you have everyone else (BI people, data warehousing, analytics, etc.). While this camp holds many smart people too, they are usually more focused on problems related to providing better insight. This might be data analytics and how it applies to a business, security analytics and how it can help secure an organization better and in real time, or predictive modeling to enhance the forecast model for a sales organization. The biggest difference is that this camp deals with "the business" or "the enterprise", whereas camp one focuses more on research.

    There are tons of good reads online about big data itself, but in all honesty, to understand the data you will have to understand how data sets work and the power of manipulating data. Look at things like ETL, for example. ETL (extract, transform, load) is a common way to manipulate data when moving it between platforms, such as from raw logs into Hadoop or from Hadoop into databases. Understanding some of the key concepts and algorithms used in data science will make your understanding of big data more concrete.
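    Here is a small sketch of the three ETL stages, moving raw log lines into a database table. The log format, table name, and columns are all hypothetical, and SQLite (from Python's standard library) stands in for whatever target platform you would really load into.

```python
import sqlite3
from typing import Iterable, Iterator, Tuple

# Hypothetical raw log: timestamp, HTTP method, path, status code.
RAW_LOG = """\
2014-06-01T10:00:00 GET /index.html 200
2014-06-01T10:00:05 GET /missing 404
malformed line
2014-06-01T10:01:00 POST /login 200
"""

Row = Tuple[str, str, str, int]


def extract(text: str) -> Iterator[str]:
    # Extract: read non-empty raw lines from the source.
    for line in text.splitlines():
        if line.strip():
            yield line


def transform(lines: Iterable[str]) -> Iterator[Row]:
    # Transform: parse each line into typed fields; drop lines that don't fit.
    for line in lines:
        parts = line.split()
        if len(parts) != 4 or not parts[3].isdigit():
            continue
        timestamp, method, path, status = parts
        yield timestamp, method, path, int(status)


def load(rows: Iterable[Row], conn: sqlite3.Connection) -> None:
    # Load: write the cleaned rows into the target table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests "
        "(ts TEXT, method TEXT, path TEXT, status INTEGER)"
    )
    conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_LOG)), conn)
    query = "SELECT status, COUNT(*) FROM requests GROUP BY status ORDER BY status"
    print(conn.execute(query).fetchall())
    # prints [(200, 2), (404, 1)]
```

    Because each stage is a generator, rows stream through one at a time instead of being held in memory all at once, which is the same property that lets ETL jobs scale to large inputs.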

    I found this book really good as a read and a reference:

    Machine Learning: The Art and Science of Algorithms that Make Sense of Data: Peter Flach: 9781107422223: Amazon.com: Books

    WARNING - This book is hardcore in the math department. Unless you are going to spend a lot of time researching these concepts or have a solid background in discrete math/statistics/geometry/spatial planes, you will have a hard time getting through this book. All that being said, it is one of the best books I've read so far on many different topics in the field as a whole.

    If you want to play around with some sample data sets and see how things like Hadoop work for processing big data, check out Hortonworks:

    Hortonworks. We Do Hadoop.

    They have a free sandbox ready to go (with tutorials) for data analysis and looking at the various components of Hadoop, which is commonly referenced in conversations regarding big data.

    If I can help clarify, provide more insight, or be helpful in any other way just let me know :D
  • horusthesun Member Posts: 289
    Thanks, everyone. I understand now. I was just curious and wanted real opinions and the hard truth of it all. I watched many YouTube videos and they all felt like infomercials. Thanks.
  • marknathon Member Posts: 2
    First, here is a good definition of big data I found on the internet: "A clear big data definition can be difficult to pin down because big data can cover a multitude of use cases. But in general, the term refers to sets of data that are so large in volume and so complex that traditional data processing software products are not capable of capturing, managing, and processing the data within a reasonable amount of time."

    Now, let me recommend some books:
    1) "Big Data: A Revolution..." by Kenneth Cukier
    2) "Too Big to Ignore" by Phil Simon

    Now, you can also build projects with data science. Here are some ideas for you: Projects for beginner data science.