By Angus Ball

When working in R there are quite a few ways data can be presented, stored, and used. I’ve been writing these tutorials haphazardly, and as I write more I’m getting better at coding in R. This unfortunately means that two tutorials that come back to back in the pipeline may have been written months apart, with very different (hopefully more efficient) ways of doing things.

This tutorial is here to provide a quick introduction to how I store and read data within R, as well as an introduction to the types of data you need and what they look like.

There are two good ways to store data in R. If you are working with data frames (i.e. tables), they can often be stored as csv (comma-separated values) files. These are, for all intents and purposes, simpler Excel files. You can make a csv file by taking an Excel file and exporting/saving it as a csv file.

You can easily read a csv file like this:

library(tidyverse)
Object_name <- read_csv("file path.csv")

# e.g.:
Seq_1 <- read_csv("C:\\Users\\angus\\OneDrive - UNBC\\Angus Ball\\Lab work\\ULTRA\\They call me... data\\Angus-16S\\16S-sv_seqs.csv")

You can write to a csv file like this

write_csv(Object, 
          "filepath.csv")
#e.g.
write_csv(Seq_1,
          "C:\\Users\\angus\\OneDrive - UNBC\\Angus Ball\\Lab work\\ULTRA\\They call me... data\\Angus-16S\\16S-sv_seqs.csv") # it's important to give the full file name, including the .csv extension
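
If you want to try reading and writing without touching your own files, here is a minimal sketch of a csv round trip. The table `toy` and its contents are made up for illustration, and `tempfile()` is only used so that nothing is written into your project folders.

```r
library(readr)

# Made-up example table (not real data)
toy <- data.frame(
  Name  = c("sample_A", "sample_B"),
  Reads = c(10, 25)
)

# tempfile() gives a throwaway path; note the .csv extension
csv_path <- tempfile(fileext = ".csv")

# Write the table out, then read it straight back in
write_csv(toy, csv_path)
toy_back <- read_csv(csv_path, show_col_types = FALSE)

toy_back
```

The table that comes back should hold the same names and numbers that went in.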

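Hey, fun fact: when you’re writing these file paths, notice that I use two backslashes. On Windows you’ll likely need to do this too! Inside an R string a single backslash is the escape character, so a literal backslash has to be typed as \\. (Forward slashes work as well.)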
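
Here is a small sketch of the different ways to type the same Windows path in R. The path itself is hypothetical, and the raw string syntax assumes R 4.0.0 or newer.

```r
# All three strings below describe the same (hypothetical) Windows path
escaped <- "C:\\Users\\someone\\data\\table.csv"   # doubled backslashes
forward <- "C:/Users/someone/data/table.csv"       # forward slashes
raw     <- r"(C:\Users\someone\data\table.csv)"    # raw string, R 4.0.0 or newer

# A doubled backslash is stored as one single character
nchar("\\")
identical(escaped, raw)

# file.path() builds a path from pieces, joining them with "/"
built <- file.path("C:", "Users", "someone", "data", "table.csv")
identical(built, forward)
```

Both `identical()` calls should return TRUE, so which style you pick is mostly a matter of taste.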

But writing to a csv isn’t necessarily the best way to store a data frame in R. In fact, to save R objects you should use RDS files!

RDS files can save any type of R object. Not just data frames (tables), but complicated objects, like phyloseq objects, too!

Reading an RDS file works the same way as reading a csv:

object <- readRDS("filepath.rds")

#e.g. :
taxa <- readRDS("C:\\Users\\angus\\OneDrive - UNBC\\Angus Ball\\Lab work\\Bioinformatics\\Lisas data\\taxa.rds") #notice the .rds

Saving an RDS file looks the same too

saveRDS(object,
        "filepath.rds")

#e.g.
saveRDS(physeq_Key, file = "C:\\Users\\angus\\OneDrive - UNBC\\Angus Ball\\Lab work\\Bioinformatics\\Lisas data\\R objects\\physeq_Key_raw_fungaltrats.rds") #notice the .rds
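
To show why RDS can be the better choice, here is a rough sketch comparing the two formats on a made-up table. The names `toy` and `Treatment` are invented, and I use base R's `write.csv()` and `read.csv()` here only so the example runs without loading any packages.

```r
# Made-up table with a factor column and a chosen level order
toy <- data.frame(
  Name      = c("sample_A", "sample_B", "sample_C"),
  Treatment = factor(c("low", "high", "low"), levels = c("low", "high"))
)

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

# Save the same table both ways
saveRDS(toy, rds_path)
write.csv(toy, csv_path, row.names = FALSE)

from_rds <- readRDS(rds_path)
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

levels(from_rds$Treatment)   # the factor and its level order survive the RDS trip
class(from_csv$Treatment)    # the csv copy comes back as plain text
```

A csv only stores the text of each cell, so anything R-specific (factors, level order, non-table objects) has to be rebuilt after reading. An RDS file stores the object itself.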

Excellent! So what kinds of files should you expect?

When you create a phyloseq object, you need three things.

You need an OTU table, a taxa table, and a key

This is an OTU table

#MetaG_1_total <- read_csv("C:\\Users\\angus\\OneDrive - UNBC\\Angus Ball\\Lab work\\ULTRA\\They call me... data\\Angus-16S\\16S-dada2_nochim_tax.csv")

head(MetaG_1_total)

These tables have sample names along the top (e.g. CHB1P1 and so on) and ASVs down the side (e.g. SV_0, SV_1). The numbers in the middle are read counts. For example, CHB1P1 has 167 reads of SV_1.
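
If it helps to see that layout in miniature, here is a toy OTU table built by hand. Only the 167 reads of SV_1 in CHB1P1 come from the table above; the second sample name and every other count are invented. I also store the ASV names as row names here, whereas a table read with read_csv would likely hold them in its first column instead.

```r
# Toy OTU table: samples across the top, ASVs down the side
otu <- data.frame(
  CHB1P1   = c(0, 167, 12),
  sample_2 = c(45, 3, 0),
  row.names = c("SV_0", "SV_1", "SV_2")
)

otu["SV_1", "CHB1P1"]   # reads of SV_1 in sample CHB1P1
colSums(otu)            # total reads per sample
rownames(otu)           # the ASV names
```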

You also need a taxa table. Depending on how your data was given to you, where it is and what it looks like will vary. Ideally your taxa table looks like this:

#taxa <- readRDS("C:\\Users\\angus\\OneDrive - UNBC\\Angus Ball\\Lab work\\Bioinformatics\\Lisas data\\taxa.rds")
head(taxa)

Here SV_# is on the left, followed by the taxonomic level assignments in each following column.

Your data may have SV_# replaced with a DNA sequence, and this is fine as long as it’s the same within the OTU table.
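
One way to check that the ASV names line up between the two tables is sketched below. The IDs and taxonomy are invented, and `otu_ids` stands in for the ASV names from your own OTU table.

```r
# Invented ASV IDs and taxonomy, for illustration only
otu_ids <- c("SV_0", "SV_1", "SV_2")

taxa_toy <- matrix(
  c("Bacteria", "Proteobacteria",
    "Bacteria", "Firmicutes",
    "Bacteria", "Actinobacteriota"),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("SV_0", "SV_1", "SV_2"), c("Kingdom", "Phylum"))
)

# Every ASV in the OTU table should have a row in the taxa table, and vice versa
setequal(otu_ids, rownames(taxa_toy))

# setdiff() lists anything present in the OTU table but missing from the taxa table
setdiff(otu_ids, rownames(taxa_toy))
```

The same check works whether the IDs are SV_# labels or full DNA sequences, since it only compares the strings.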

You may instead have the taxa table conglomerated into a single Taxa vector (as seen in the phyloseq tutorial). This tends to sit at the very end of the OTU table. I just removed all the samples for easy reading (click the right arrow to see what this looks like).

head(taxavector)

In the phyloseq tutorial I show how to turn tax.vector into a good taxa table like that shown above.

Finally, you’ll need a key for your samples:

Key <- read_csv("C:\\Users\\angus\\OneDrive - UNBC\\Angus Ball\\Lab work\\ULTRA\\They call me... data\\Angus-16S\\ultra to categories key.csv")
Rows: 52 Columns: 5
── Column specification ────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (5): Name, Amd, Conc, Fertalized, Inoculum
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(Key)

The important part is that the names, and the order of the names, match the sample names in your OTU table exactly. The other columns (here Amd, Conc, Fertalized, and Inoculum) are where you put your specific sample metadata.
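
As a rough sketch of how you might check this, the example below compares a made-up key against made-up sample names and reorders the key when the order differs. Only the column names Name and Amd come from the key above; the sample names and metadata values are invented.

```r
# Stand-in for the sample (column) names of an OTU table
otu_samples <- c("sample_1", "sample_2", "sample_3")

# Invented key whose rows are in a different order
key_toy <- data.frame(
  Name = c("sample_1", "sample_3", "sample_2"),
  Amd  = c("none", "biochar", "none")
)

# Same names, different order, so this check fails
identical(key_toy$Name, otu_samples)

# Reorder the key so its rows follow the OTU table's sample order
key_sorted <- key_toy[match(otu_samples, key_toy$Name), ]
identical(key_sorted$Name, otu_samples)
```

If `match()` returns any NA, a sample name in the OTU table has no entry in the key, which usually points to a typo in one of the two files.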
