
Distant reading is a great digital humanities tool for getting an overview of a broad subject in a short period of time. I chose the dataset “First Person Narratives of the American South”. You can find the summary of the data I received below.


This corpus has 150 documents with 8,168,336 total words and 73,029 unique word forms.
Document Length:
- Longest: fpn-velazquez-velazquez (228225); fpn-early-early (166873); fpn-morganjames-morgan (166297); fpn-wyeth-wyeth (162300); fpn-harland-harland (152707)
- Shortest: fpn-ward-ward (2464); fpn-hortonlife-horton (3464); fpn-negnurse-negnurse (3572); fpn-curry-curry (3808); fpn-shepherd-shepherd (3933)
Vocabulary Density:
- Highest: fpn-curry-curry (0.382); fpn-hortonlife-horton (0.366); fpn-shepherd-shepherd (0.362); fpn-ward-ward (0.345); fpn-bagby-bagby (0.295)
- Lowest: fpn-early-early (0.043); fpn-velazquez-velazquez (0.047); fpn-ball-ball (0.064); fpn-pringle-pringle (0.066); fpn-olive-olive (0.068)
Average Words Per Sentence:
- Highest: fpn-malone-malone (44.0); fpn-robson-robson (37.3); fpn-ball-ball (36.3); fpn-early-early (35.5); fpn-leigh-leigh (34.6)
- Lowest: fpn-betts-betts (9.5); fpn-wrightmarcus-wright (10.1); southlit-chesnut-maryches (14.2); fpn-leon-leon (14.6); fpn-burge-lunt (15.7)
Readability Index:
- Highest: fpn-curry-curry (13.245); fpn-jonescharles-jones (11.085); fpn-gordon-gordon (10.796); fpn-shepherd-shepherd (10.664); fpn-taylor-taylor (10.619)
- Lowest: fpn-betts-betts (5.029); fpn-edmondson-edmondson (5.705); fpn-mcleary-mcleary (5.746); fpn-malone-malone (5.891); fpn-jones-jones (5.891)
Most frequent words in the corpus: time (17217); said (15093); day (14747); men (14303); general (13997)
Distinctive words (compared to the rest of the corpus):
- fpn-andrews-andrews: mett (70), metta (69), dépot (33), capt (158), garnett (69).
- fpn-arp-arp: arp (131), dident (59), em (241), wouldent (40), cobe (22).
- fpn-ashby-ashby: ashby (52), federal (243), duryée (14), shields (40), kenly (16).
- fpn-aughey-aughey: aughey (67), tupelo (76), unionists (70), unionist (59), rienzi (31).
- fpn-avary-avary: milicent (79), dan (246), nell (82), hosmer (35), locke (48).
- fpn-avirett-avirett: yuh (36), suh (29), planter’s (40), marse (49), plantation (215).
- fpn-bagby-bagby: canell (5), ahn (5), canal (25), lynchburg (15), mobjack (4).
- fpn-balch-balch: manse (73), moseby (11), ringwood (9), java (11), mocha (8).
- fpn-ball-ball: fishery (35), overseer (141), seine (25), whilst (59), master (432).
- fpn-battle-lee: pallas (65), nealie (50), bettie (53), peel (44), jesse (69).
- fpn-beard-beard: bluefield (38), winston (72), ida (66), robah (25), beard (86).
- fpn-betts-betts: bro (155), betts (54), preaches (64), nov (81), chaplains (45).
- fpn-biggs-biggs: williamston (14), biggs (16), oclock (13), ly (9), senate (41).
- fpn-blackford-blackford: murf (279), jupe (197), oclock (221), blackford (199), enoch (287).
- fpn-boggs-boggs: boggs (60), o’bannon (26), bragg (84), pensacola (40), kirby (50).
- fpn-bokum-bokum: union (75), treason (12), learnt (5), treasonable (5), tennessee (24).
- fpn-boyd1-boyd1: martinsburg (40), boyd (33), belle (56), sentries (16), honour (17).
- fpn-boyd2-boyd2: hardinge (64), greyhound (22), swasey (11), mulford (9), boyd (22).
- fpn-branch-branch: polk (109), pollok (21), wyley (15), polk’s (16), mckinnie (9).
- fpn-brownd-dbrown: mas’r (44), cudjo (23), slavery (196), abolitionists (42), cleon (12).
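To make a couple of the figures in this summary less of a black box, here is a minimal sketch of how word counts and vocabulary density could be computed by hand. It assumes vocabulary density means unique word forms divided by total words, and that very common function words are filtered out before ranking. The tiny stopword list, the sample sentence, and the function names are all made up for illustration; the tool's real tokenizer and stopword list are certainly more involved.

```python
import re
from collections import Counter

# Hypothetical, very small stopword list (an assumption for this sketch only).
STOPWORDS = {"the", "and", "of", "to", "a", "in", "was", "i", "it", "that"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_density(tokens):
    """Unique word forms divided by total words (assumed definition)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def top_words(tokens, n=5, stopwords=STOPWORDS):
    """Return the n most frequent words that are not stopwords."""
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


# Made-up sample text, not taken from the corpus.
sample = "The general said the men marched all day. The men said it was a long day."
tokens = tokenize(sample)

print(vocabulary_density(tokens))  # 11 unique forms out of 16 words
print(top_words(tokens, 3))
```

Run on a real document, the same two functions would give one density value and one ranked word list per file, which is roughly the kind of information the summary above reports.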
Using this data, I can come to a couple of conclusions about what life was like in the American South. For starters, the word “war” is a pretty clear one; it refers to the American Civil War and the effect it had on citizens. Another word that came up quite a bit was “general”, which again relates to the Civil War. I also noticed that two of the top five words were “time” and “day”, which makes me think that times were tough and that people were closely keeping track of time, whether it was how long they had been fighting, how many days had passed since an event, and so on. Lastly, one of the top five words was “men”, reflecting how the Southern states of the 1800s were heavily male-dominated, as were many other parts of the world at this time. It also would have been men fighting on the front lines of the war, fighting until their bloody end.
I definitely could have determined this with close reading instead of distant reading. In fact, I would have had a much more advanced interpretation with close reading, rather than just making guesses off a few words. While distant reading is great for saving time, it doesn’t seem like an ideal tool for fully understanding a concept or a time period. It just doesn’t provide enough information without external context (for example, the American Civil War that I mentioned earlier). If I had processed a dataset on something I had no prior knowledge of, I’m not sure I would have been able to come to as clear a conclusion.