Git

Scientific Computing

Author

Prof. Calvin

Why Git?

What is “git”?

  • A major command line tool for collaboration and publication.
  • The state-of-the-art of storing and sharing data files.
  • The basis of most modern open source tools.

Different “git”

  • git is a command line utility, like python or nvim
  • “Git” is the name of some software the supports this utility.
  • We also introduce “GitHub”

What is “GitHub”?

  • Was: an independent company maintaining servers for use by git
  • Now: a subsidiary of Microsoft for the same purposes.

Differentiating

git “GitHub”
Command line tool Website
Free and open source Free with paid options
Works on your own computer Works on the internet
Primarily used with GitHub Primarily used with git

Conceptualizing

  • So far in school you may have:
    • Turned in physical assignments in person.
    • Turned in electronic assignments via Google suite.
    • Turned in electornic assignments via email
  • All of these are very rough on source code many data files (like the .wav file).

Problem Solving

  • This exacerbated on:
    • Group projects
    • Remote projects
    • Projects using more than one file (paper + presentation, often)
  • Also, we face other coding challenges
    • What if you rm the project by accident?
    • That is permanent deletion (no “trash” or “recycling bin”)

Stepping Back

  • git was made to solve this exact problem!
  • The world leading scientific computing operating system, Linux, was being developed remotely via volunteers.
  • They used millions of lines of code in complex file structures.
  • They had to have a way to keep track of everything!

Enter git

  • In 2005 the Linux operating system engineers developed git:
    • A way to save versions of complex file systems and files
    • A way to distribute changes to these complex systems.
    • A way to document these changes.
  • It quickly became popular in the scientific community as well.

We will

  • Use git to:
    • Manage versions of our code
    • Publish these versions
  • Use “GitHub” to:
    • Store the work of git online

GitHub

Going Backwards

  • We’ll actually start with GitHub, where you’ll need to create an account.
  • We can have more fun with git latter if there’s a “GitHub” we can use with git

Your Account

  • To get started with GitHub, you’ll need to create a free personal account and verify your email address.
    • I strongly recommend using your .edu email account to get educational access to otherwise premium features.
    • You may, of course, use a personal email. It is your account!

Accounts

  • Every person who uses GitHub signs in to a user account.
    • Your user account is your identity on GitHub and has a username and profile.
    • For example, see @octocat’s profile
    • Or look at mine

Signing Up

  1. Navigate to GitHub.com.
  2. Click Sign up for GitHub
  3. Follow the prompts to create your personal account.

Verify

  • During sign up, you’ll be asked to verify your email address.
  • Without a verified email address, you won’t be able to do much, so make sure you verify your email!

Remember!

  • Make sure your remember the email you use, your account name, and your password.
  • As a computing professor, I get a lot of questions from students about what their name was.
    • I don’t know! It was your name!
  • You may want to send an email to yourself with a password hint, or take a physical note.

Repositories

Atomicity

  • The atomic unit of git is the “repository” which is basically a file system.
    • We learned about file systems by navigating around them with the shell
  • Like GitHub users, repositories also have names.

Uniqueness

  • Give some thought to naming repositories!
    • Names must be unique per user, but different users may have the same repository name.
    • For example, both “pandas-dev” and “cd-public” have a “pandas” repository
      • That is “pandas/pandas-dev” and “cd-public/pandas-dev”

Create New

  1. Navigate to any GitHub page
  2. In the top right right past the search bar, click +
  3. Select New repository

Take Ownership

  1. Leave template blank.
  2. Select yourself as the owner.

Name it

  1. Name your repository - for now hello-world is great!

Set Visibility

  • When you create a repository, you can choose to make the repository public or private.
    • Public repositories are accessible to everyone on the internet.
    • Private repositories are only accessible to you and people you explicitly share access.
  • I recommend “public”!
    • Easier to get help and to show off!

Finalize

  • By the way - do not initialize with a README.
    • Complicates our lives latter!
  • You’re now ready to click:
    Create repository
  • We’ll see a new page and return to the command line for more git!

Placing a Pin

  • Keep this page open for a little bit latter!
  • We have to set up the repository on our device before we can do anything else!
    • This like collaborating with ourselves - our GitHub self on the internet

git

Remote

  • We have now created a remote repository on GitHub.
    • Remote as in “not on our computer we are using right now”
    • As in “on a web server somewhere”
  • No we will make a local repository on our own computer.
  • Then connect them!

mkdir

  • Make and enter directory on your computer!
    • I’d name it the same as the repository, like hello-world
    mkdir hello-world
    cd hello-world

Add files

  • Let’s make a file to add.
  • Like a “Hello, world!” file.
  • Perhaps
nvim hello.py
  • Then
hello.py
print("Hello, world!")
  • Then :x back to command line!

Initialize

  • We currently have a local directory.
  • To promote it to git managed repository, we do:
git init
  • You can verify it worked by checking:
git status 

Status

  • I consult status often, yours will likely look like this:
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        hello.py

nothing added to commit but untracked files present (use "git add" to track)

Add

  • The first thing we note is that there are “Untracked files”
  • While we made a hello.py and have it in our hello-world directory, it isn’t yet “tracked” by git!
  • By default, git only keeps track of what we tell it to!
  • So, we tell it to track our code!
git add hello.py

Status

  • I check status again
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   hello.py

Commit

  • With git now aware of hello.py, we need to commit our changes for git to save them.
    • Similar to saving files to the file system.
    • There are, of course, ways to automate this.
  • This probably won’t work at first (next slide!) but try:
git commit -a -m "first commit"

“-a -m”

  • Commits require a commit message (like a version name or number) so specify with -m
  • Usually I provide -a to commit “all” files.
  • I usually make some effort to make my life easier with specific commit messages, perhaps listing:
    • What I’m trying to do
    • Why?

Config

  • If you haven’t used git on your system before, you’ll have to tell git who you are.
    • In git there are no anonymous changes - you have to sign every change you make.
  • You’ll be prompted to provide something like this:
git config --global user.email "you@example.com"
git config --global user.name "Your Name"

My name

  • I often include which computer I’m using in my name and also don’t use a real email address.
    • My GitHub account is already attached to an email address, so I use a throwaway for commits.
git config --global user.email "prof_calvin@scicom.edu"
git config --global user.name "Calvin for Class"

Looping Back

  • Once you have provided your identity, you can successfuly complete a commit.
git commit -a -m "first commit"
  • This will:
    • Mark current code as a version, named by your commit message.
  • This won’t:
    • Do anything to GitHub.

Branching Out

  • As a rule, repositories have branches
    • This was helpful early on when different versions of software had different levels of release readiness or were working on different features.
  • The default branch is usually main which we can use for now.

Branch Command

  • The branch command can “list, create, or destroy branches”
  • We mostly need to have our branch on our device have the same name as the branch on GitHub.
  • Rename our brnach with branch -M to main - the expected name.
git branch -M main 

ssh

Secure Shell

  • We know how to deal with a very important topic:
  • Security
  • When moving executable code between websites and our system, we have to be careful that it is:
    • Not tampered with
    • Made by who said made it

Security Concession

  • We will use SSH, the “Secure Shell Protocol”
  • It is supported by git/GitHub and all major operating systems.

Keygen

  • SSH is based around having “keys”
    • Under the hood, these are special numbers with special properties related to primes, basically.
  • We generate a special unique key we can use as a password or signature.
ssh-keygen

Prompts

  • This example uses (1) the default location and (2) no passphrase.
    • This is less secure but easier to manage.
    • Work in a passphrase as soon as you can!
Generating public/private ed25519 key pair.
Enter file in which to save the key (/home/calvin/.ssh/id_ed25519): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/calvin/.ssh/id_ed25519
  • You now have a key! Let’s find a lock.

GitHub

  • The purpose of this exercise is to connect to GitHub!
  • We will mostly use git for that, but we can check if we have a connection easily:
ssh -T git@github.com

Example

  • This is what I see:
The authenticity of host 'github.com (140.82.116.3)' can't be established.
ED25519 key fingerprint is SHA256:+DiY3wvvV6TuJJhbpZisF/zLDA0zPMSvHdkr4UvCOqU.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
  • Before saying “yes” or “no” compare versus the “public SSH key fingerprints”
  • By default, the modern keygen uses “Ed25519” so compare those keys, and click “yes” if they match!

Connecting

  • After confirming the key correctness, you will likely see this.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'github.com' (ED25519) to the list of known hosts.
git@github.com: Permission denied (publickey).

One Way

  • Now we have instructed our computer to trust GitHub.
  • To save our code to GitHub, we must also get GitHub to trust our computer.
  • We generated our key, now we must match it to our lock on GitHub!

GitHub pt 1

  • We follow this guide
  • Return to GitHub and in the top right click your profile picture/avatar.
    • For me, a picture of myself.
  • Scroll down to “Settings” and click.
    • Last in 2nd grouping, for me.

GitHub pt 2

  • Within settings left-side menu click “SSH and GPG keys”
    • Middle of 2nd group (Access) for me.
  • Within the center block click New SSH key in the top right of the block.

Prompt

  • I get a page “Add new SSH Key”.
  • I provide a title (like “For SciCom class” or “Laptop”)
  • Now we’ll get our key from our system to use.
  • Recall we previously generated a key to the default location, like /home/calvin/.ssh/id_ed25519
    • If you don’t remember, just type ssh-keygen and it’ll show you.

Copy/Paste

  • To get the key for GitHub, we can can
    • cat that file in the command line
    • Copy from the command line
    • Paste into the GitHub “Key” field
    • Click Add SSH key

SSH Test

  • After adding the key to the account, I confirm via SSH like so:
ssh -T git@github.com
  • I see the following:
Hi cd-public! You've successfully authenticated, but GitHub does not provide shell access.

Push

Save your work

  • My preferred way to save my work, especially when working across multiple team members or devices, is by “pushing” to GitHub.
  • Usually this is only git push (after commit) but we still have a bit of setup to do.

Origins

  • We want to use git to attach our local repository to the GitHub repository.
  • This called “adding a remote origin”
    • Remote as in online
    • Origin as in place to get files from.
  • Our GitHub repository has some SSH name we will need.

Finding Names

  • In your web browser, navigate to your repository page on GitHub.
  • Locate the “Quick setup — if you’ve done this kind of thing before” heading.
    • Select the “SSH” option.
    • Copy the address.

git@github.com:cd-public/hello-world

Or, existing repositories.

  • Find the <> Code button in the top right.
  • Click it:
    • Select the top “local” tab (for or local repository)
    • Select the SSH option (for our SSH key)
    • Copy the name, like

git@github.com:cd-public/hello-world

Add Remote

  • With that name (which is just use GitHub name and repository name), return to the command line and add the remote origin.
git remote add origin git@github.com:cd-public/hello-world.git

Push

  • With all that work done, you can simply:
git push -u origin main
  • After a moment and some diagnostic text, you should be able to see your code on GitHub, possibly after refreshing the page!

Diagnostics

  • For example, you might see the following:
Enumerating objects: 27, done.
Counting objects: 100% (27/27), done.
Delta compression using up to 22 threads
Compressing objects: 100% (15/15), done.
Writing objects: 100% (16/16), 98.54 KiB | 2.46 MiB/s, done.
Total 16 (delta 11), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (11/11), completed with 10 local objects.
To github.com:cd-public/hello-world.git
   adaa3a7..ba3c794  main -> main

Everyday Use

Easier After

  • A lot of what we did, we only have to do once:
    • git config
    • ssh-keygen
    • set origin
  • Many are special cases:
    • Only have to make new repositories for… new repositories (likely projects)
    • Only have to git add for new files.

Quick Example

  • Let’s add bye.py
  • We’ll create a new file at the command line.
nvim bye.py
  • Add some code…
bye.py
print("Bye!")

Add, Commit, Push

  • To get the code onto Github, add to the repo:
git add bye.py
  • Commit changes to a version:
git commit -a -m "You say hello, I say goodbye"
  • Push to GitHub
git push
  • And that’s that!

Altogether

  • For a single copy/paste
git add bye.py
git commit -a -m "You say hello, I say goodbye"
git push
  • If it works, you’ll see the change on GitHub!

Again

  • Perhaps we wish to be more polite and update saying “Bye!” to saying “Goodbye!”.
bye.py
print("Goodbye!")

No Add

  • Don’t need an add this time!
git commit -a -m "Goodbye version"
git push

Using GitHub.com

  • It is also possible to edit files on GitHub.
    • This is the same as a collaborator pushing a change!
  • In the repository webpage, click on a file, like hello.py
  • This should take you to a page with a central text window showing the text in the file hello.py
  • Click, for lack of a better word, the pencil icon on the top right of the text window.
    • ✏️

Editing files

  • Click the “pencil” ✏️ top right of the text.
  • Make some edit, like:
hello.py
print("Hello, remote-to-local!")
  • Click Commit changes… in the top right.
  • In the pop-up, write commit message a la git commit -m then click Commit changes

Pulling

  • To get remote changes reflected locally, simply use
git pull
  • You can then run immediately:
python hello.py
  • As a rule, it is good to pull often, and especially before a push, if you have a collaborator or use multiple devices.

Exercise

Exercise

  • Create a new repository for this class, say “Scientific Computing”
  • I named mine “scicom” (shorter to type!)
  • Create file tax.py.
  • For each of the Python, Neovim, Shell, and NumPy versions of the tax problem:
    • Save the solution as tax.py
    • Commit with a descriptive message.

Solution