Git
Scientific Computing
Why Git?
What is “git”?
- A major command line tool for collaboration and publication.
- The state-of-the-art of storing and sharing data files.
- The basis of most modern open source tools.
Different “git”
git
is a command line utility, likepython
ornvim
- “Git” is the name of some software the supports this utility.
- We also introduce “GitHub”
What is “GitHub”?
- Was: an independent company maintaining servers for use by
git
- Now: a subsidiary of Microsoft for the same purposes.
Differentiating
git |
“GitHub” |
---|---|
Command line tool | Website |
Free and open source | Free with paid options |
Works on your own computer | Works on the internet |
Primarily used with GitHub | Primarily used with git |
Conceptualizing
- So far in school you may have:
- Turned in physical assignments in person.
- Turned in electronic assignments via Google suite.
- Turned in electornic assignments via email
- All of these are very rough on source code many data files (like the
.wav
file).
Problem Solving
- This exacerbated on:
- Group projects
- Remote projects
- Projects using more than one file (paper + presentation, often)
- Also, we face other coding challenges
- What if you
rm
the project by accident? - That is permanent deletion (no “trash” or “recycling bin”)
- What if you
Stepping Back
git
was made to solve this exact problem!- The world leading scientific computing operating system, Linux, was being developed remotely via volunteers.
- They used millions of lines of code in complex file structures.
- They had to have a way to keep track of everything!
Enter git
- In 2005 the Linux operating system engineers developed
git
:- A way to save versions of complex file systems and files
- A way to distribute changes to these complex systems.
- A way to document these changes.
- It quickly became popular in the scientific community as well.
We will
- Use
git
to:- Manage versions of our code
- Publish these versions
- Use “GitHub” to:
- Store the work of
git
online
- Store the work of
GitHub
Going Backwards
- We’ll actually start with GitHub, where you’ll need to create an account.
- We can have more fun with
git
latter if there’s a “GitHub” we can use withgit
Your Account
- To get started with GitHub, you’ll need to create a free personal account and verify your email address.
- I strongly recommend using your
.edu
email account to get educational access to otherwise premium features. - You may, of course, use a personal email. It is your account!
- I strongly recommend using your
Accounts
- Every person who uses GitHub signs in to a user account.
- Your user account is your identity on GitHub and has a username and profile.
- For example, see @octocat’s profile
- Or look at mine
Signing Up
- Navigate to GitHub.com.
- Click Sign up for GitHub
- Follow the prompts to create your personal account.
Verify
- During sign up, you’ll be asked to verify your email address.
- Without a verified email address, you won’t be able to do much, so make sure you verify your email!
Remember!
- Make sure your remember the email you use, your account name, and your password.
- As a computing professor, I get a lot of questions from students about what their name was.
- I don’t know! It was your name!
- You may want to send an email to yourself with a password hint, or take a physical note.
Repositories
Atomicity
- The atomic unit of
git
is the “repository” which is basically a file system.- We learned about file systems by navigating around them with the shell
- Like GitHub users, repositories also have names.
Uniqueness
- Give some thought to naming repositories!
- Names must be unique per user, but different users may have the same repository name.
- For example, both “pandas-dev” and “cd-public” have a “pandas” repository
- That is “pandas/pandas-dev” and “cd-public/pandas-dev”
Create New
- To understand better, let’s create a repository.
- Navigate to any GitHub page
- In the top right right past the search bar, click +
- Select New repository
Take Ownership
- Leave template blank.
- Select yourself as the owner.
Name it
- Name your repository - for now
hello-world
is great!
Set Visibility
- When you create a repository, you can choose to make the repository public or private.
- Public repositories are accessible to everyone on the internet.
- Private repositories are only accessible to you and people you explicitly share access.
- I recommend “public”!
- Easier to get help and to show off!
Finalize
- By the way - do not initialize with a README.
- Complicates our lives latter!
- You’re now ready to click:
Create repository - We’ll see a new page and return to the command line for more
git
!
Placing a Pin
- Keep this page open for a little bit latter!
- We have to set up the repository on our device before we can do anything else!
- This like collaborating with ourselves - our GitHub self on the internet
git
Remote
- We have now created a remote repository on GitHub.
- Remote as in “not on our computer we are using right now”
- As in “on a web server somewhere”
- No we will make a local repository on our own computer.
- Then connect them!
mkdir
- Make and enter directory on your computer!
- I’d name it the same as the repository, like
hello-world
mkdir hello-world cd hello-world
- I’d name it the same as the repository, like
Add files
- Let’s make a file to add.
- Like a “Hello, world!” file.
- Perhaps
nvim hello.py
- Then
- Then
:x
back to command line!
Initialize
- We currently have a local directory.
- To promote it to
git
managed repository, we do:
git init
- You can verify it worked by checking:
git status
Status
- I consult status often, yours will likely look like this:
On branch master
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
hello.py
nothing added to commit but untracked files present (use "git add" to track)
Add
- The first thing we note is that there are “Untracked files”
- While we made a
hello.py
and have it in ourhello-world
directory, it isn’t yet “tracked” bygit
! - By default,
git
only keeps track of what we tell it to! - So, we tell it to track our code!
git add hello.py
Status
- I check status again
On branch master
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: hello.py
Commit
- With
git
now aware ofhello.py
, we need to commit our changes forgit
to save them.- Similar to saving files to the file system.
- There are, of course, ways to automate this.
- This probably won’t work at first (next slide!) but try:
git commit -a -m "first commit"
“-a -m”
- Commits require a commit message (like a version name or number) so specify with
-m
- Usually I provide
-a
to commit “all” files. - I usually make some effort to make my life easier with specific commit messages, perhaps listing:
- What I’m trying to do
- Why?
Config
- If you haven’t used
git
on your system before, you’ll have to tellgit
who you are.- In
git
there are no anonymous changes - you have to sign every change you make.
- In
- You’ll be prompted to provide something like this:
git config --global user.email "you@example.com"
git config --global user.name "Your Name"
My name
- I often include which computer I’m using in my name and also don’t use a real email address.
- My GitHub account is already attached to an email address, so I use a throwaway for commits.
git config --global user.email "prof_calvin@scicom.edu"
git config --global user.name "Calvin for Class"
Looping Back
- Once you have provided your identity, you can successfuly complete a commit.
git commit -a -m "first commit"
- This will:
- Mark current code as a version, named by your commit message.
- This won’t:
- Do anything to GitHub.
Branching Out
- As a rule, repositories have branches
- This was helpful early on when different versions of software had different levels of release readiness or were working on different features.
- The default branch is usually
main
which we can use for now.
Branch Command
- The branch command can “list, create, or destroy branches”
- We mostly need to have our branch on our device have the same name as the branch on GitHub.
- Rename our brnach with
branch -M
tomain
- the expected name.
git branch -M main
ssh
Secure Shell
- We know how to deal with a very important topic:
- Security
- When moving executable code between websites and our system, we have to be careful that it is:
- Not tampered with
- Made by who said made it
Security Concession
- We will use SSH, the “Secure Shell Protocol”
- It is supported by
git
/GitHub and all major operating systems.
Keygen
- SSH is based around having “keys”
- Under the hood, these are special numbers with special properties related to primes, basically.
- We generate a special unique key we can use as a password or signature.
ssh-keygen
Prompts
- This example uses (1) the default location and (2) no passphrase.
- This is less secure but easier to manage.
- Work in a passphrase as soon as you can!
Generating public/private ed25519 key pair.
Enter file in which to save the key (/home/calvin/.ssh/id_ed25519):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/calvin/.ssh/id_ed25519
- You now have a key! Let’s find a lock.
GitHub
- The purpose of this exercise is to connect to GitHub!
- We will mostly use
git
for that, but we can check if we have a connection easily:
ssh -T git@github.com
Example
- This is what I see:
The authenticity of host 'github.com (140.82.116.3)' can't be established.
ED25519 key fingerprint is SHA256:+DiY3wvvV6TuJJhbpZisF/zLDA0zPMSvHdkr4UvCOqU.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])?
- Before saying “yes” or “no” compare versus the “public SSH key fingerprints”
- Check them here
- By default, the modern keygen uses “Ed25519” so compare those keys, and click “yes” if they match!
Connecting
- After confirming the key correctness, you will likely see this.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'github.com' (ED25519) to the list of known hosts.
git@github.com: Permission denied (publickey).
One Way
- Now we have instructed our computer to trust GitHub.
- To save our code to GitHub, we must also get GitHub to trust our computer.
- We generated our key, now we must match it to our lock on GitHub!
GitHub pt 1
- We follow this guide
- Return to GitHub and in the top right click your profile picture/avatar.
- For me, a picture of myself.
- Scroll down to “Settings” and click.
- Last in 2nd grouping, for me.
GitHub pt 2
- Within settings left-side menu click “SSH and GPG keys”
- Middle of 2nd group (Access) for me.
- Within the center block click New SSH key in the top right of the block.
Prompt
- I get a page “Add new SSH Key”.
- I provide a title (like “For SciCom class” or “Laptop”)
- Now we’ll get our key from our system to use.
- Recall we previously generated a key to the default location, like
/home/calvin/.ssh/id_ed25519
- If you don’t remember, just type
ssh-keygen
and it’ll show you.
- If you don’t remember, just type
Copy/Paste
- To get the key for GitHub, we can can
cat
that file in the command line- Copy from the command line
- Paste into the GitHub “Key” field
- Click Add SSH key
SSH Test
- After adding the key to the account, I confirm via SSH like so:
ssh -T git@github.com
- I see the following:
Hi cd-public! You've successfully authenticated, but GitHub does not provide shell access.
Push
Save your work
- My preferred way to save my work, especially when working across multiple team members or devices, is by “pushing” to GitHub.
- Usually this is only
git push
(aftercommit
) but we still have a bit of setup to do.
Origins
- We want to use
git
to attach our local repository to the GitHub repository. - This called “adding a remote origin”
- Remote as in online
- Origin as in place to get files from.
- Our GitHub repository has some SSH name we will need.
Finding Names
- In your web browser, navigate to your repository page on GitHub.
- This is where we said “place a pin”
- Locate the “Quick setup — if you’ve done this kind of thing before” heading.
- Select the “SSH” option.
- Copy the address.
git@github.com:cd-public/hello-world
Or, existing repositories.
- Find the <> Code button in the top right.
- Click it:
- Select the top “local” tab (for or local repository)
- Select the SSH option (for our SSH key)
- Copy the name, like
git@github.com:cd-public/hello-world
Add Remote
- With that name (which is just use GitHub name and repository name), return to the command line and add the remote origin.
git remote add origin git@github.com:cd-public/hello-world.git
Push
- With all that work done, you can simply:
git push -u origin main
- After a moment and some diagnostic text, you should be able to see your code on GitHub, possibly after refreshing the page!
Diagnostics
- For example, you might see the following:
Enumerating objects: 27, done.
Counting objects: 100% (27/27), done.
Delta compression using up to 22 threads
Compressing objects: 100% (15/15), done.
Writing objects: 100% (16/16), 98.54 KiB | 2.46 MiB/s, done.
Total 16 (delta 11), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (11/11), completed with 10 local objects.
To github.com:cd-public/hello-world.git
adaa3a7..ba3c794 main -> main
Everyday Use
Easier After
- A lot of what we did, we only have to do once:
git config
ssh-keygen
set origin
- Many are special cases:
- Only have to make new repositories for… new repositories (likely projects)
- Only have to
git add
for new files.
Quick Example
- Let’s add
bye.py
- We’ll create a new file at the command line.
nvim bye.py
- Add some code…
Add, Commit, Push
- To get the code onto Github, add to the repo:
git add bye.py
- Commit changes to a version:
git commit -a -m "You say hello, I say goodbye"
- Push to GitHub
git push
- And that’s that!
Altogether
- For a single copy/paste
git add bye.py
git commit -a -m "You say hello, I say goodbye"
git push
- If it works, you’ll see the change on GitHub!
Again
- Perhaps we wish to be more polite and update saying “Bye!” to saying “Goodbye!”.
No Add
- Don’t need an add this time!
git commit -a -m "Goodbye version"
git push
Using GitHub.com
- It is also possible to edit files on GitHub.
- This is the same as a collaborator pushing a change!
- In the repository webpage, click on a file, like
hello.py
- This should take you to a page with a central text window showing the text in the file
hello.py
- Click, for lack of a better word, the pencil icon on the top right of the text window.
- ✏️
Editing files
- Click the “pencil” ✏️ top right of the text.
- Make some edit, like:
- Click Commit changes… in the top right.
- In the pop-up, write commit message a la
git commit -m
then click Commit changes
Pulling
- To get remote changes reflected locally, simply use
git pull
- You can then run immediately:
python hello.py
- As a rule, it is good to pull often, and especially before a
push
, if you have a collaborator or use multiple devices.
Exercise
Exercise
- Create a new repository for this class, say “Scientific Computing”
- I named mine “scicom” (shorter to type!)
- Create file
tax.py
. - For each of the Python, Neovim, Shell, and NumPy versions of the tax problem:
- Save the solution as
tax.py
- Commit with a descriptive message.
- Save the solution as
Solution
- Here is an example repository with a
tax.py
- Here is the page showing the version history.