Learn to extract business or statistical information from large data sets with R
This course is ideal for data analysts wanting to use R to extract organisationally useful data from large data sets.
It is wide-ranging, covering all aspects of R from the basics through to sophisticated graphics, advanced programming techniques and data-mining algorithms. The course has a strong business focus, illustrating how analytical findings can support organisational planning, and it also provides an excellent all-round introduction for anybody needing to use R for general statistical purposes.
No prior knowledge is required, but basic statistical and programming skills would be an advantage (see the Eligibility tab below).
The course covers all aspects of the R language, focusing on its ability to extract organisationally significant information from databases and other large data sets.
Students will not only acquire a great deal of technical knowledge but will also gain insights into some sophisticated statistical and analytical concepts and the way analysis supports planning and strategic management processes.
By the end of this course, you will be proficient in the following areas.
Data structures:
Vectors, factors, matrices, lists and especially data frames. Manipulation of these using aggregation functions, indexing and other more sophisticated functions, including the apply() family. How to use these techniques to best advantage with large organisational datasets.
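As a flavour of what this involves (the data frame and column names below are invented for illustration; class exercises will use real organisational datasets), here is a minimal sketch of aggregation and the apply() family:

```r
# Illustrative sketch: summarising a small data frame.
sales <- data.frame(
  region = c("North", "South", "North", "South"),
  amount = c(100, 250, 175, 300)
)

# Mean sales per region with tapply()
tapply(sales$amount, sales$region, mean)

# Apply a function over each numeric column with sapply()
sapply(sales["amount"], mean)
```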
Graphics:
We learn R’s basic plotting techniques (plot(), hist() etc.), but soon move on to more sophisticated techniques, including the ggplot2 package and integration with tools such as Tableau and Power BI. How to use these to further analyse organisational data and to present your analytic findings to co-workers.
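A minimal sketch of the progression from base graphics to ggplot2, using R's built-in mtcars dataset (the ggplot2 call is guarded because that package must be installed separately):

```r
# Base graphics: scatter plot with a fitted regression line
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = mtcars), col = "red")

# The equivalent in ggplot2, if the package is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(mtcars, aes(wt, mpg)) +
          geom_point() +
          geom_smooth(method = "lm"))
}
```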
Statistics:
With the emphasis very much on practical applications, not mathematical theory, we learn about descriptives, distributions, regression and correlation (including multiple regression), t-tests, ANOVA and categorical data analysis (including chi-squared). There is a strong emphasis on the applicability of statistical techniques to organisational problems, refining our models and rigorously testing them for reliability.
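As an illustrative sketch of the kinds of test covered, again using the built-in mtcars data (the business questions in class will use organisational datasets instead):

```r
# t-test: does fuel economy differ between automatic and manual cars?
t.test(mpg ~ am, data = mtcars)

# Multiple regression: fuel economy modelled on weight and horsepower
fit <- lm(mpg ~ wt + hp, data = mtcars)
summary(fit)

# Categorical analysis: chi-squared test of cylinders against gears
chisq.test(table(mtcars$cyl, mtcars$gear))
```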
Programming:
We learn the basics of procedural programming – variables, control structures and writing simple functions – before moving on to building more sophisticated functions geared to manipulating large datasets.
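A minimal sketch of the kind of simple function students write early on (the function and argument names are invented for the example):

```r
# Summarise any numeric column of a data frame,
# guarding against non-numeric input.
summarise_column <- function(df, column) {
  values <- df[[column]]
  if (!is.numeric(values)) stop("column must be numeric")
  list(n = length(values), mean = mean(values, na.rm = TRUE))
}

summarise_column(mtcars, "mpg")
```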
Data loading, cleaning and transformation:
Loading data from Excel, SQL, XML and the web, using SQL notation to query R data, cleaning and transforming your data (missing values, recoding and converting variables, creating new variables), merging and sampling data.
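A minimal sketch of the cleaning and merging steps (the tables and column names below are invented for illustration):

```r
# Two small tables sharing a key column
customers <- data.frame(id = 1:4, age = c(34, NA, 51, 28))
orders    <- data.frame(id = c(1, 2, 2, 4), total = c(20, 35, 15, 50))

# Missing values: replace NA ages with the mean age
customers$age[is.na(customers$age)] <- mean(customers$age, na.rm = TRUE)

# Merge the two tables on their shared key
combined <- merge(customers, orders, by = "id")

# Create a new variable by recoding an existing one
combined$age_band <- ifelse(combined$age < 40, "under 40", "40+")
```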
While no prior knowledge is required, you will find it useful to have a little knowledge of statistics (descriptives, regression and distributions), some basic SQL (up to using GROUP BY and ORDER BY), and the fundamentals of procedural programming (manipulating variables, ifs and whiles, writing simple functions), which may have been gained in any programming language.
You must be IT literate.
You must be proficient in written and spoken English.
Teaching is in the form of lectures interspersed with exercises to test and expand your knowledge.
There is also a continuous data analysis project, where you will use R techniques to gain insights into a particular client group and how they might be approached to best advantage by an organisation.
Robert I. Kabacoff, R in Action. Manning, 2015.
Other useful texts will be suggested during the course.
Mark Robbins was for many years a Project Manager working for the government, the BBC and the NHS, where he led large teams that designed and implemented many strategic national networking and messaging systems.
Mark now works as a freelance academic researcher and author, journalist and IT consultant and teaches a wide range of computer science subjects at London Metropolitan University.