The Resource Beginning data science in R : data analysis, visualization, and modelling for the data scientist, Thomas Mailund

Label
Beginning data science in R : data analysis, visualization, and modelling for the data scientist
Title
Beginning data science in R
Title remainder
data analysis, visualization, and modelling for the data scientist
Statement of responsibility
Thomas Mailund
Creator
Author
Subject
Language
eng
Summary
Discover best practices for data analysis and software development in R and start on the path to becoming a fully-fledged data scientist. This book teaches you techniques for both data manipulation and visualization and shows you the best way to develop new software packages for R. Data Science in R details how data science is a combination of statistics, computational science, and machine learning. You'll see how to efficiently structure and mine data to extract useful patterns and build mathematical models. This requires computational methods and programming, and R is an ideal programming language for this. This book is based on a number of lecture notes for classes the author has taught on data science and statistical programming using the R programming language. Modern data analysis requires computational skills and usually a minimum of programming. You will: perform data science and analytics using statistics and the R programming language; visualize and explore data, including working with large data sets found in big data; build an R package; test and check your code; practice version control; profile and optimize your code.
Cataloging source
N$T
Dewey number
001.42
Index
index present
LC call number
Q180.55.Q36
Literary form
non fiction
Nature of contents
dictionaries
Publication
Copyright
Note
Includes index
Antecedent source
unknown
http://library.link/vocab/branchCode
  • net
Carrier category
online resource
Carrier category code
cr
Carrier MARC source
rdacarrier
Color
multicolored
Content category
text
Content type code
txt
Content type MARC source
rdacontent
Contents
  • At a Glance; Contents; About the Author; About the Technical Reviewer; Acknowledgments; Introduction; Chapter 1: Introduction to R Programming; Basic Interaction with R; Using R as a Calculator; Simple Expressions; Assignments; Actually, All of the Above Are Vectors of Values ... ; Indexing Vectors; Vectorized Expressions; Comments; Functions; Getting Documentation for Functions; Writing Your Own Functions; Vectorized Expressions and Functions; A Quick Look at Control Structures; Factors; Data Frames; Dealing with Missing Values; Using R Packages
  • Data Pipelines (or Pointless Programming); Writing Pipelines of Function Calls; Writing Functions that Work with Pipelines; The magical "." argument; Defining Functions Using .; Anonymous Functions; Other Pipeline Operations; Coding and Naming Conventions; Exercises; Mean of Positive Values; Root Mean Square Error; Chapter 2: Reproducible Analysis; Literate Programming and Integration of Workflow and Documentation; Creating an R Markdown/knitr Document in RStudio; The YAML Language; The Markdown Language; Formatting Text; Cross-Referencing; Bibliographies
  • Controlling the Output (Templates/Stylesheets); Running R Code in Markdown Documents; Using Chunks when Analyzing Data (Without Compiling Documents); Caching Results; Displaying Data; Exercises; Create an R Markdown Document; Produce Different Output; Add Caching; Chapter 3: Data Manipulation; Data Already in R; Quickly Reviewing Data; Reading Data; Examples of Reading and Formatting Datasets; Breast Cancer Dataset; Boston Housing Dataset; The readr Package; Manipulating Data with dplyr; Some Useful dplyr Functions; select(): Pick Selected Columns and Get Rid of the Rest
  • mutate(): Add Computed Values to Your Data Frame; transmute(): Add Computed Values to Your Data Frame and Get Rid of All Other Columns; arrange(): Reorder Your Data Frame by Sorting Columns; filter(): Pick Selected Rows and Get Rid of the Rest; group_by(): Split Your Data Into Subtables Based on Column Values; summarise/summarize(): Calculate Summary Statistics; Breast Cancer Data Manipulation; Tidying Data with tidyr; Exercises; Importing Data; Using dplyr; Using tidyr; Chapter 4: Visualizing Data; Basic Graphics; The Grammar of Graphics and the ggplot2 Package; Using qplot(); Using Geometries
  • Facets; Scaling; Themes and Other Graphics Transformations; Figures with Multiple Plots; Exercises; Chapter 5: Working with Large Datasets; Subsample Your Data Before You Analyze the Full Dataset; Running Out of Memory During Analysis; Too Large to Plot; Too Slow to Analyze; Too Large to Load; Exercises; Subsampling; Hex and 2D Density Plots; Chapter 6: Supervised Learning; Machine Learning; Supervised Learning; Regression versus Classification; Inference versus Prediction; Specifying Models; Linear Regression; Logistic Regression (Classification, Really); Model Matrices and Formula
Control code
ocn975486855
Dimensions
unknown
Extent
1 online resource
File format
unknown
Form of item
online
Isbn
9781484226711
Media category
computer
Media MARC source
rdamedia
Media type code
c
Other control number
10.1007/978-1-4842-2671-1
http://library.link/vocab/ext/overdrive/overdriveId
cl0500000849
Quality assurance targets
unknown
http://library.link/vocab/recordID
.b37442557
Sound
unknown sound
Specific material designation
remote
System control number
  • (OCoLC)975486855
  • safari1484226712

Library Locations

    • Deakin University Library - Geelong Waurn Ponds Campus
      75 Pigdons Road, Waurn Ponds, Victoria, 3216, AU
      -38.195656 144.304955