Detailed Table of Contents
Guidance for the item(s) below:
Given this is a first course in SE, tradition demands that we start by defining the subject. However, let's not spend a lot of time going through lengthy/formal definitions of SE. Instead, let's look at an extract from the very first chapter of a very famous SE book, with the aim of providing some inspiration, but also an appreciation of the challenges ahead.
Can explain pros and cons of software engineering
The following description of the Joys of the Programming Craft was taken (and emphasis added) from Chapter 1 of the famous book The Mythical Man-Month, by Frederick P. Brooks.
Why is programming fun? What delights may its practitioner expect as his reward?
First is the sheer joy of making things. As the child delights in his mud pie, so the adult enjoys building things, especially things of his own design. I think this delight must be an image of God's delight in making things, a delight shown in the distinctness and newness of each leaf and each snowflake.
Second is the pleasure of making things that are useful to other people. Deep within, you want others to use your work and to find it helpful. In this respect the programming system is not essentially different from the child's first clay pencil holder "for Daddy's office."
Third is the fascination of fashioning complex puzzle-like objects of interlocking moving parts and watching them work in subtle cycles, playing out the consequences of principles built in from the beginning. The programmed computer has all the fascination of the pinball machine or the jukebox mechanism, carried to the ultimate.
Fourth is the joy of always learning, which springs from the nonrepeating nature of the task. In one way or another the problem is ever new, and its solver learns something: sometimes practical, sometimes theoretical, and sometimes both.
Finally, there is the delight of working in such a tractable medium. The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by the exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures....
Yet the program construct, unlike the poet's words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. It prints results, draws pictures, produces sounds, moves arms. The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.
Programming then is fun because it gratifies creative longings built deep within us and delights sensibilities you have in common with all men.
Not all is delight, however, and knowing the inherent woes makes it easier to bear them when they appear.
First, one must perform perfectly. The computer resembles the magic of legend in this respect, too. If one character, one pause, of the incantation is not strictly in proper form, the magic doesn't work. Human beings are not accustomed to being perfect, and few areas of human activity demand it. Adjusting to the requirement for perfection is, I think, the most difficult part of learning to program.
Next, other people set one's objectives, provide one's resources, and furnish one's information. One rarely controls the circumstances of his work, or even its goal. In management terms, one's authority is not sufficient for his responsibility. It seems that in all fields, however, the jobs where things get done never have formal authority commensurate with responsibility. In practice, actual (as opposed to formal) authority is acquired from the very momentum of accomplishment.
The dependence upon others has a particular case that is especially painful for the system programmer. He depends upon other people's programs. These are often maldesigned, poorly implemented, incompletely delivered (no source code or test cases), and poorly documented. So he must spend hours studying and fixing things that in an ideal world would be complete, available, and usable.
The next woe is that designing grand concepts is fun; finding nitty little bugs is just work. With any creative activity come dreary hours of tedious, painstaking labor, and programming is no exception.
Next, one finds that debugging has a linear convergence, or worse, where one somehow expects a quadratic sort of approach to the end. So testing drags on and on, the last difficult bugs taking more time to find than the first.
The last woe, and sometimes the last straw, is that the product over which one has labored so long appears to be obsolete upon (or before) completion. Already colleagues and competitors are in hot pursuit of new and better ideas. Already the displacement of one's thought-child is not only conceived, but scheduled.
This always seems worse than it really is. The new and better product is generally not available when one completes his own; it is only talked about. It, too, will require months of development. The real tiger is never a match for the paper one, unless actual use is wanted. Then the virtues of reality have a satisfaction all their own.
Of course the technological base on which one builds is always advancing. As soon as one freezes a design, it becomes obsolete in terms of its concepts. But implementation of real products demands phasing and quantizing. The obsolescence of an implementation must be measured against other existing implementations, not against unrealized concepts. The challenge and the mission are to find real solutions to real problems on actual schedules with available resources.
This then is programming, both a tar pit in which many efforts have floundered and a creative activity with joys and woes all its own. For many, the joys far outweigh the woes....
Guidance for the item(s) below:
Now, let's switch our focus to the project management aspect of SE.
Broadly speaking, there are two approaches to doing a software project. Those two approaches are also highly relevant to the way this course is run, and how it is different from most SE courses elsewhere.
Let's learn about those two approaches early so that we can better understand how this course works.
Can explain SDLC process models
Software development goes through different stages such as requirements, analysis, design, implementation and testing. These stages are collectively known as the software development lifecycle (SDLC). There are several approaches, known as software development lifecycle models (also called software process models), that describe different ways to go through the SDLC. Each process model prescribes a 'roadmap' for the software developers to manage the development effort. The roadmap describes the aims of the development stages, the outcome of each stage, and the workflow i.e. the relationship between stages.
Can explain sequential process models
The sequential model, also called the waterfall model, views software development as a linear process, in which the project is seen as progressing through the development stages. The name waterfall stems from how the model is drawn to look like a waterfall (see below).
When one stage of the process is completed, it produces some artifacts to be used in the next stage. For example, the requirements stage produces a comprehensive list of requirements, to be used in the design phase.
A strict sequential model project moves only in the forward direction i.e., each stage is completed before starting the next. For example, once the requirements stage is over, there is no provision for revising the requirements later.
This model can work well for a project that produces software to solve a well-understood problem, in which case the requirements can remain stable and the effort can be estimated accurately. Furthermore, as each stage has a well-defined outcome, it is easy to track the progress of the project because one can gauge the project progress by monitoring which stage the project is in.
However, real-world projects often tackle problems that are not well-understood at the beginning, making them unsuitable for this model. For example, target users of a software product may not be able to state their requirements accurately at the start of the project, if they have not used a similar product before.
Can explain iterative process models
The iterative model advocates producing the software by going through several iterations. Each of the iterations could potentially go through all the stages of the SDLC, from requirements gathering to deployment.
Each iteration produces a new version of the product, building upon the version produced in the previous iteration. Feedback from each iteration is factored into the subsequent iterations. For example, if an implementation task took longer than expected, the effort estimate for a similar tasks in future iterations can be adjusted accordingly. Similarly, if a feature introduced in the current iteration was not well-received by target users, it can be removed or tweaked in the next iteration.
The iterative model can be done in breadth-first or depth-first approach.
Taking a Minesweeper game as an example,
A project can be done as a mixture of breadth-first and depth-first iterations i.e., an iteration can contain some breadth-first work as well as some depth-first work, or, some iterations can be breadth-first while others are depth-first.
Follow up notes for the item(s) above:
Scanning a TLDR version of a topic: As mentioned in 'Using this Website' page, the more important layer of information is given in bold text. For example, you can quickly scan the essential points of a topic by reading the bold text only (this could be useful when you want to quickly recap a previous topic, or to get an idea of what a topic covers without reading all the details).
Guidance for the item(s) below:
This week, you are starting your individual project (iP). As you are adding code to the iP in rapid succession, you'll need a way to keep track of all the changes you do. The tool we are going to use for that is called Git, and we need to learn Git basics pretty quickly.
Let's jump in and learn how to get started using Git in your own computer.
Guidance for the item(s) below:
First, let's learn a bit about tracking the change history of a project in general, at a higher level.
Can explain revision control
Given below is a general introduction to revision control, adapted from bryan-mercurial-guide:
Revision control is the process of managing multiple versions of a piece of information. In its simplest form, this is something that many people do by hand: every time you modify a file, save it under a new name that contains a number, each one higher than the number of the preceding version.
Manually managing multiple versions of even a single file is an error-prone task, though, so software tools to help automate this process have long been available. The earliest automated revision control tools were intended to help a single user to manage revisions of a single file. Over the past few decades, the scope of revision control tools has expanded greatly; they now manage multiple files, and help multiple people to work together. The best modern revision control tools have no problem coping with thousands of people working together on projects that consist of hundreds of thousands of files.
There are a number of reasons why you or your team might want to use an automated revision control tool for a project.
Most of these reasons are equally valid, at least in theory, whether you're working on a project by yourself, or with a hundred other people.
Revision: A revision (some seem to use it interchangeably with version while others seem to distinguish the two -- here, let us treat them as the same, for simplicity) is a state of a piece of information at a specific time that is a result of some changes to it e.g., if you modify the code and save the file, you have a new revision (or a new version) of that file.
RCS: Revision control software are the software tools that automate the process of Revision Control i.e. managing revisions of software artifacts.
Revision control software are also known as Version Control Software (VCS), and by a few other names.
Git is the most widely used RCS today. Other RCS tools include Mercurial, Subversion (SVN), Perforce, CVS (Concurrent Versions System), Bazaar, TFS (Team Foundation Server), and Clearcase.
Github is a web-based project hosting platform for projects using Git for revision control. Other similar services include GitLab, BitBucket, and SourceForge.
Can explain repositories
The repository is the database that stores the revision history. Suppose you want to apply revision control on files in a directory called ProjectFoo
. In that case, you need to set up a repo (short for repository) in the ProjectFoo
directory, which is referred to as the working directory of the repo. For example, Git uses a hidden folder named .git
inside the working directory, to store the database of the working directory's revision history.
Repository (repo for short): The database of the history of a directory being tracked by an RCS software (e.g. Git).
Working directory: the root directory revision-controlled by Git (e.g., the directory in which the repo was initialized).
You can have multiple repos in your computer, each repo revision-controlling files of a different working directory, for examples, files of different projects.
Guidance for the item(s) below:
Now that we know what RCS is in general, we can try to practice it ourselves using a specific tool i.e., Git.
The following section gives a specific scenario that includes the steps of initializing a Git repository.
If you are new to Git, you are highly recommended to follow those steps in your own computer to get some hands-on practice as you learn Git usage.
Can create a local Git repo
Let's take your first few steps in your Git journey.
0. Take a peek at the full picture(?). Optionally, if you are the sort who prefers to have some sense of the full picture before you get into the nitty-gritty details, watch the video in the panel below:
Git Overview
1. First, install Sourcetree (installation instructions), which is Git + a GUI for Git. If you prefer to use Git via the command line (i.e., without a GUI), you can install Git instead.
2. Next, create a directory for the repo (e.g., a directory named things
).
3. Then, initialize a repository in that directory.
Windows: Click File
→ Clone/New…
→ Click on + Create
button on the top menu bar.
Enter the location of the directory and click Create
.
Mac: New...
→ Create Local Repository
(or Create New Repository
) → Click ...
button to select the folder location for the repository → click the Create
button.
Go to the things
folder and observe how a hidden folder .git
has been created.
Windows: To see the hidden folders, you might have to configure Windows Explorer to show hidden files first.
Open a Git Bash Terminal.
If you installed Sourcetree, you can click the Terminal
button to open a GitBash terminal (on a Linux/Mac environment, even a regular terminal should do).
Navigate to the things
directory.
Use the command git init
which should initialize the repo.
$ cd /c/repos/things
$ git init
Initialized empty Git repository in c:/repos/things/.git/
You can use the list all command ls -a
to view all files, which should show the .git
directory that was created by the previous command.
$ ls -a
. .. .git
You can also use the git status
command to check the status of the newly-created repo. It should respond with something like the following:
$ git status
# On branch master
#
# No commits yet
#
nothing to commit (create/copy files and use "git add" to track)
As you see above, this textbook explains how to use Git via Sourcetree (a GUI client) as well as via the Git CLI. If you are new to Git, we recommend you learn both the GUI method and the CLI method -- The GUI method will help you visualize the result better while the CLI method is more universal (i.e., you will not be tied to any GUI) and more flexible/powerful.
It is fine to learn the CLI way only (using Sourcetree is optional), especially if you normally prefer to work with CLI over GUI.
Guidance for the item(s) below:
For the next few sections, the drill is the same: first learn the high-level explanation of a revision control concept, and then follow the given scenarios yourself to learn how to apply that concept using Git.
Can explain saving history
In a repo, you can specify which files to track and which files to ignore. For example, we can configure Git to ignore temporary files created during the build/test process, as it does not make sense to track their version history.
Committing saves a snapshot of the current state of the tracked files in the revision control history. Such a snapshot is also called a commit (i.e. the noun).
Commit (noun): a change (aka a revision) saved in the Git revision history.
(verb): the act of creating a commit i.e., saving a change in the working directory into the Git revision history.
When ready to commit, you first add the specific changes you want to commit to a staging area. This intermediate step allows you to commit only some changes while saving other changes for a later commit.
Stage (verb): Instructing Git to prepare a file for committing.
Can commit using Git
After initializing a repository, Git can help you with revision controlling files inside the working directory. However, it is not automatic. You need to tell Git which of your changes (aka revisions) should be committed to its memory for later use. Saving changes into Git's memory in that way is called committing and a change saved to the revision history is called a commit.
Here are the steps you can follow to learn how to create Git commits:
1. Do some changes to the content inside the working directory e.g., create a file named fruits.txt
in the things
directory and add some dummy text to it.
2. Observe how the file is detected by Git.
The file is shown as ‘unstaged’.
You can use the git status
command to check the status of the working directory.
$ git status
# On branch master
#
# No commits yet
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# a.txt
nothing added to commit but untracked files present (use "git add" to track)
3. Stage the changes to commit: Although Git has detected the file in the working directory, it will not do anything with the file unless you tell it to. Suppose you want to commit the current changes to the file. First, you should stage the file, which is how you tell Git which changes you want to include in the next commit.
When you stage a change, the change is moved to the staging area, which is a file Git uses to store information about what will go into your next commit. The staging area is also called the 'index' by the Git practitioners.
Select the fruits.txt
and click on the Stage Selected
button.
fruits.txt
should appear in the Staged files
panel now.
If Sourcetree shows a \ No newline at the end of the file
message below the staged lines (i.e., below the cherries
line in the above screenshot), that is because you did not hit enter after entering the last line of the file (hence, Git is not sure if that line is complete). To rectify, move the cursor to end of the last line in that file and hit enter (like you are adding a blank line below it). This new change will now appear as an 'unstaged' change. Stage it as well.
You can use the stage
or the add
command (they are synonyms, add
is the more popular choice) to stage files.
$ git add fruits.txt
$ git status
# On branch master
#
# No commits yet
#
# Changes to be committed:
# (use "git rm --cached <file>..." to unstage)
#
# new file: fruits.txt
#
4. Commit the staged version of fruits.txt
.
Click the Commit
button, enter a commit message e.g. add fruits.txt
into the text box, and click Commit
.
Use the commit
command to commit. The -m
switch is used to specify the commit message.
$ git commit -m "Add fruits.txt"
You can use the log
command to see the commit history.
$ git log
commit 8fd30a6910efb28bb258cd01be93e481caeab846
Author: … < … @... >
Date: Wed Jul 5 16:06:28 2017 +0800
Add fruits.txt
Note the existence of something called the master
branch. Git uses a mechanism called 'branches' to facilitate evolving file content in parallel (we'll learn git branching in a later topic). Furthermore, Git auto-creates a branch named master
(or main
) on which the commits go on by default.
Expand the BRANCHES
menu and click on the master
to view the history graph, which contains only one node at the moment, representing the commit you just added. Also note a label master
attached to the commit.
This label points to the latest commit on the master
branch.
Run the git status
command and note how the output contains the phrase on branch master
.
5. Do a few more commits.
Make some changes to fruits.txt
(e.g. add some text and delete some text). Stage the changes, and commit the changes using the same steps you followed before. You should end up with something like this.
Next, add two more files colors.txt
and shapes.txt
to the same working directory. Add a third commit to record the current state of the working directory.
You can decide what to stage and what to leave unstaged. When staging changes to commit, you can leave some files unstaged, if you wish to not include them in the next commit. In fact, Git even allows some changes in a file to be staged, while other changes in the same file to be unstaged. This flexibility is particularly useful when you want to put all related changes into a commit while leaving out unrelated changes.
6. See the revision graph: Note how commits form a path-like structure aka the revision tree/graph. In the revision graph, each commit is shown as linked to its 'parent' commit (i.e., the commit before it).
To see the revision graph, click on the History
item (listed under the WORKSPACE
section) on the menu on the right edge of Sourcetree.
The gitk
command opens a rudimentary graphical view of the revision graph.
How to undo/delete a commit?
To undo the last commit, right-click on the commit just before it, and choose Reset current branch to this commit
.
In the next dialog, choose the mode Mixed - keep working copy but reset index
option. This will make the offending commit disappear but will keep the changes that you included in that commit intact.
If you use the Soft - ...
mode instead, the last commit will be undone as before, but the changes included in that commit will stay in the staging area.
To delete the last commit entirely (i.e., undo the commit and also discard the changes included in that commit), do as above but choose the Hard - ...
mode instead.
To undo/delete last n commits, right-click on the commit just before the last n commits, and do as above.
To undo the last commit, but keep the changes in the staging area, use the following command.
$ git reset --soft HEAD~1
To undo the last commit, and remove the changes from the staging area (but not discard the changes), used --mixed
instead of --soft
.
$ git reset --mixed HEAD~1
To delete the last commit entirely (i.e., undo the commit and also discard the changes included in that commit), do as above but use the --hard
flag instead (i.e., do a hard reset).
$ git reset --hard HEAD~1
To undo/delete last n commits: HEAD~1
is used to tell Git you are targeting the commit one position before the latest commit -- in this case the target commit is the one we want to reset to, not the one we want to undo (as the command used is reset
). To undo/delete two last commits, you can use HEAD~2
, and so on.
Can set Git to ignore files
Often, there are files inside the Git working folder that you don't want to revision-control e.g., temporary log files. Follow the steps below to learn how to configure Git to ignore such files.
1. Add a file into your repo's working folder that you supposedly don't want to revision-control e.g., a file named temp.txt
. Observe how Git has detected the new file.
2. Configure Git to ignore that file:
The file should be currently listed under Unstaged files
. Right-click it and choose Ignore…
. Choose Ignore exact filename(s)
and click OK
.
Observe that a file named .gitignore
has been created in the working directory root and has the following line in it.
temp.txt
Create a file named .gitignore
in the working directory root and add the following line in it.
temp.txt
The .gitignore
file
The .gitignore
file tells Git which files to ignore when tracking revision history. That file itself can be either revision controlled or ignored.
To version control it (the more common choice – which allows you to track how the .gitignore
file changes over time), simply commit it as you would commit any other file.
To ignore it, follow the same steps you followed above when you set Git to ignore the temp.txt
file.
It supports file patterns e.g., adding temp/*.tmp
to the .gitignore
file prevents Git from tracking any .tmp
files in the temp
directory.
More information about the .gitignore
file: git-scm.com/docs/gitignore
Files recommended to be omitted from version control
*.class
, *.jar
, *.exe
(reasons: 1. no need to version control these files as they can be generated again from the source code 2. Revision control systems are optimized for tracking text-based files, not binary files.Can explain basic concepts of how RCS history is used
RCS tools store the history of the working directory as a series of commits. This means you should commit after each change that you want the RCS to 'remember'.
Each commit in a repo is a recorded point in the history of the project that is uniquely identified by an auto-generated hash e.g. a16043703f28e5b3dab95915f5c5e5bf4fdc5fc1
.
You can tag a specific commit with a more easily identifiable name e.g. v1.0.2
.
To see what changed between two points of the history, you can ask the RCS tool to diff the two commits in concern.
To restore the state of the working directory at a point in the past, you can checkout the commit in concern. i.e., you can traverse the history of the working directory simply by checking out the commits you are interested in.
Can tag commits using Git
Each Git commit is uniquely identified by a hash e.g., d670460b4b4aece5915caf5c68d12f560a9fe3e4
. As you can imagine, using such an identifier is not very convenient for our day-to-day use. As a solution, Git allows adding a more human-readable tag to a commit e.g., v1.0-beta
.
Here's how you can tag a commit in a local repo:
Right-click on the commit (in the graphical revision graph) you want to tag and choose Tag…
.
Specify the tag name e.g. v1.0
and click Add Tag
.
The added tag will appear in the revision graph view.
To add a tag to the current commit as v1.0
:
$ git tag v1.0
To view tags:
$ git tag
To learn how to add a tag to a past commit, go to the ‘Git Basics – Tagging’ page of the git-scm book and refer the ‘Tagging Later’ section.
After adding a tag to a commit, you can use the tag to refer to that commit, as an alternative to using the hash.
Annotated vs Lightweight Tags: The Git tags explained above are known as lightweight tags. There is another type of Git tags called annotated tags. See git-scm.com/book for more info.
Tags are different from commit messages, in purpose and in form. A commit message is a description of the commit that is part of the commit itself. A tags is a short name for a commit, which exists as a separate entity that points to a commit.
Can compare git revisions
Git can show you what changed in each commit.
To see which files changed in a commit, click on the commit. To see what changed in a specific file in that commit, click on the file name.
$ git show < part-of-commit-hash >
Example:
$ git show 5bc0e306
commit 5bc0e30635a754908dbdd3d2d833756cc4b52ef3
Author: … < … >
Date: Sat Jul 8 16:50:27 2017 +0800
fruits.txt: replace banana with berries
diff --git a/fruits.txt b/fruits.txt
index 15b57f7..17f4528 100644
--- a/fruits.txt
+++ b/fruits.txt
@@ -1,3 +1,3 @@
apples
-bananas
+berries
cherries
Git can also show you the difference between two points in the history of the repo.
Select the two points you want to compare using Ctrl+Click
. The differences between the two selected versions will show up in the bottom half of Sourcetree, as shown in the screenshot below.
The same method can be used to compare the current state of the working directory (which might have uncommitted changes) to a point in the history.
The diff
command can be used to view the differences between two points of the history.
git diff
: shows the changes (uncommitted) since the last commit.git diff 0023cdd..fcd6199
: shows the changes between the points indicated by commit hashes.git diff v1.0..HEAD
: shows changes that happened from the commit tagged as v1.0
to the most recent commit.Can load a specific version of a Git repo
Git can load a specific version of the history to the working directory. Note that if you have uncommitted changes in the working directory, you need to stash them first to prevent them from being overwritten.
Double-click the commit you want to load to the working directory, or right-click on that commit and choose Checkout...
.
Click OK
to the warning about ‘detached HEAD’ (similar to below).
The specified version is now loaded to the working folder, as indicated by the HEAD
label. HEAD
is a reference to the currently checked out commit.
If you checkout a commit that comes before the commit in which you added the .gitignore
file, Git will now show ignored files as ‘unstaged modifications’ because at Git hasn’t been told to ignore those files.
To go back to the latest commit, double-click it.
Use the checkout <commit-identifier>
command to change the working directory to the state it was in at a specific past commit.
git checkout v1.0
: loads the state as at commit tagged v1.0
git checkout 0023cdd
: loads the state as at commit with the hash 0023cdd
git checkout HEAD~2
: loads the state that is 2 commits behind the most recent commitFor now, you can ignore the warning about ‘detached HEAD’.
If you checkout a commit that comes before the commit in which you added the .gitignore
file, Git will now show ignored files as ‘unstaged modifications’ because at Git hasn’t been told to ignore those files.
Guidance for the item(s) below:
Having learned how to use Git in your own computer, let's also learn a bit about working with remote repositories too. While it seems like a bit 'too much' to take in one week, but we want you to start using Git in your iP (individual project) from the very beginning.
Can explain remote repositories
Remote repositories are repos that are hosted on remote computers and allow remote access. They are especially useful for sharing the revision history of a codebase among team members of a multi-person project. They can also serve as a remote backup of your codebase.
It is possible to set up your own remote repo on a server, but the easier option is to use a remote repo hosting service such as GitHub or BitBucket.
You can clone a repo to create a copy of that repo in another location on your computer. The copy will even have the revision history of the original repo i.e., identical to the original repo. For example, you can clone a remote repo onto your computer to create a local copy of the remote repo.
When you clone from a repo, the original repo is commonly referred to as the upstream repo. A repo can have multiple upstream repos. For example, let's say a repo repo1
was cloned as repo2
which was then cloned as repo3
. In this case, repo1
and repo2
are upstream repos of repo3
.
You can pull (or fetch) from one repo to another, to receive new commits in the second repo, but only if the repos have a shared history. Let's say some new commits were added to the after you cloned it, and you would like to copy over those new commits to your own clone i.e., sync your clone with the upstream repo. In that case, you can pull from the upstream repo to your clone.
You can push new commits in one repo to another repo which will copy the new commits onto the destination repo. Note that pushing to a repo requires you to have write-access to it. Furthermore, you can push between repos only if those repos have a shared history among them (i.e., one was created by copying the other at some point in the past).
Cloning, pushing, and pulling/fetching can be done between two local repos too, although it is more common for them to involve a remote repo.
A repo can work with any number of other repositories as long as they have a shared history e.g., repo1
can pull from (or push to) repo2
and repo3
if they have a shared history between them.
A fork is a remote copy of a remote repo. As you know, cloning creates a local copy of a repo. In contrast, forking creates a remote copy of a repo hosted on remote service such as GitHub. This is particularly useful if you want to play around with a GitHub repo, but you don't have write permissions to it; you can simply fork the repo and do whatever you want with the fork as you are the owner of the fork.
A pull request (PR for short) is a mechanism for contributing code to a remote repo i.e., "I'm requesting you to pull my proposed changes to your repo". It's feature provided by RCS platforms such as GitHub. For this to work, the two repos must have a shared history. The most common case is sending PRs from a fork to its repo.
Here is a scenario that includes all the concepts introduced above (click inside the slide to advance the animation):
Can clone a remote repo
Given below is an example scenario you can try yourself to learn Git cloning.
Suppose you want to clone the sample repo samplerepo-things to your computer.
Note that the URL of the GitHub project is different from the URL you need to clone a repo in that GitHub project. e.g.
GitHub project URL: https://github.com/se-edu/samplerepo-things
Git repo URL: https://github.com/se-edu/samplerepo-things.git
(note the .git
at the end)
File
→ Clone / New…
and provide the URL of the repo and the destination directory.
Can pull changes from a repo
Here's a scenario you can try in order to learn how to pull commits from another repo to yours.
1. Clone a repo (e.g., the repo used in [Git & GitHub → Clone]) to be used for this activity.
2. Delete the last few commits to simulate cloning the repo a few commits ago.
Right-click the target commit (i.e. the commit that is 2 commits behind the tip) and choose Reset current branch to this commit
.
Choose the Hard - …
option and click OK
.
This is what you will see.
Note the following (cross-refer the screenshot above):
a
: The local repo is now at this commit, marked by the master
label.b
: The origin/master
label shows what is the latest commit in the master
branch in the remote repo. origin
is the default name given to the upstream repo you cloned from. You can ignore the origin/HEAD
label for now.Use the reset
command to delete commits at the tip of the revision history.
$ git reset --hard HEAD~2
More info on the git reset
command can be found here.
Now, your local repo state is exactly how it would be if you had cloned the repo 2 commits ago, as if somebody has added two more commits to the remote repo since you cloned it.
3. Pull from the remote repo: To get those missing commits to your local repo (i.e. to sync your local repo with upstream repo) you can do a pull.
Click the Pull
button in the main menu, choose origin
and master
in the next dialog, and click OK
.
Now you should see something like this where master
and origin/master
are both pointing the same commit.
$ git pull origin
You can also do a fetch
instead of a pull
in which case the new commits will be downloaded to your repo but the working directory will remain at the current commit. To move the current state to the latest commit that was downloaded, you need to do a merge
. A pull
is a shortcut that does both those steps in one go.
When you clone a repo, Git automatically adds a remote repo named origin
to your repo configuration. As you know, you can pull commits from that repo. Furthermore, a Git repo can work with remote repos other than the one it was cloned from.
To communicate with another remote repo, you can first add it as a remote of your repo. Here is an example scenario you can follow to learn how to pull from another repo:
Open the local repo in Sourcetree. Suggested: Use your local clone of the samplerepo-things
repo.
Choose Repository
→ Repository Settings
menu option.
Add a new remote to the repo with the following values.
Remote name
: the name you want to assign to the remote repo e.g., upstream1
URL/path
: the URL of your repo (ending in .git
) that. Suggested: https://github.com/se-edu/samplerepo-things-2.git
(samplerepo-things-2
is another repo that has a shared history with samplerepo-things
)Username
: your GitHub usernameNow, you can fetch or pull (pulling will fetch the branch and merge the new code to the current branch) from the added repo as you did before but choose the remote name of the repo you want to pull from (instead of origin
):
Click the Fetch
button or the Pull
button first.
If the Remote branch to pull
dropdown is empty, click the Refresh
button on its right.
If the pull from the samplerepo-things-2
was successful, you should have received one more commit into your local repo.
Navigate to the folder containing the local repo.
Set the new remote repo as a remote of the local repo.
command: git remote add {remote_name} {remote_repo_url}
e.g., git remote add upstream1 https://github.com/johndoe/foobar.git
Now you can fetch or pull (pulling will fetch the branch and merge the new code to the current branch) from the new remote.
e.g., git fetch upstream1 master
followed by git merge upstream1/master
, or,
git pull upstream1 master
Can fork a repo
Given below is a scenario you can try in order to learn how to fork a repo:.
0. Create a GitHub account if you don't have one yet.
1. Go to the GitHub repo you want to fork e.g., samplerepo-things
2. Click on the button on the top-right corner. In the next step,
[ ] Copy the master branch only
option, so that you get copies of other branches (if any) in the repo.As you might have guessed from the above, forking is not a Git feature, but a feature provided by remote Git hosting services such as Github.
GitHub does not allow you to fork the same repo more than once to the same destination. If you want to re-fork, you need to delete the previous fork.
Can push to a remote repo
Given below is a scenario you can try in order to learn how to push commits to a remote repo hosted on GitHub:
1. Fork an existing GitHub repo (e.g., samplerepo-things) to your GitHub account.
2. Clone the fork (not the original) to your computer.
3. Commit some changes in your local repo.
4. Push the new commits to your fork on GitHub
Click the Push
button on the main menu, ensure the settings are as follows in the next dialog, and click the Push
button on the dialog.
Use the command git push origin master
. Enter your Github username and password when prompted.
5. Add a few more commits, and tag some of them.
6. Push the new commits and the tags.
Push similar to before, but ensure the [ ] Push all tags
option in the push dialog is ticked as well.
A normal push does not include tags. After pushing the commits (as before), push tags to the repo as well:
To push a specific tag:
$ git push origin v1.0b
To push all tags:
$ git push origin --tags
You can push to repos other than the one you cloned from, as long as the target repo and your repo have a shared history.
Push your repo to the new remote the usual way, but select the name of target remote instead of origin
and remember to select the Track
checkbox.
Push to the new remote the usual way e.g., git push upstream1 master
(assuming you gave the name upstream1
to the remote).
You can even push an entire local repository to GitHub, to form an entirely new remote repository. For example, you created a local repo and worked with it for a while but now you want to upload it onto GitHub (as a backup or to share it with others). The steps are given below.
1. Create an empty remote repo on GitHub.
Login to your GitHub account and choose to create a new Repo.
In the next screen, provide a name for your repo but keep the Initialize this repo ...
tick box unchecked.
Note the URL of the repo. It will be of the form https://github.com/{your_user_name}/{repo_name}.git
.
e.g., https://github.com/johndoe/foobar.git
(note the .git
at the end)
2. Add the GitHub repo URL as a remote of the local repo. You can give it the name origin
(or any other name).
3. Push the repo to the remote.
Push each branch to the new remote the usual way but use the -u
flag to inform Git that you wish to the branch.
e.g., git push -u origin master
Guidance for the item(s) below:
As you are likely to be using an IDE for the iP, let's learn at least enough about IDEs to get you started using one.
🤔 In case you are puzzled by the sudden change of topic, it's because we take an iterative approach to covering topics, as explained in the panel below:
Can explain IDEs
Professional software engineers often write code using Integrated Development Environments (IDEs). IDEs support most development-related work within the same tool (hence, the term integrated).
An IDE generally consists of:
Examples of popular IDEs:
Some web-based IDEs have appeared in recent times too e.g., Amazon's Cloud9 IDE.
Some experienced developers, in particular those with a UNIX background, prefer lightweight yet powerful text editors with scripting capabilities (e.g. Emacs) over heavier IDEs.
Guidance for the item(s) below:
As you start adding features to your project iteratively, you'll need a way to detect if the new code breaks the existing code. Next, let's learn a rather simple way to do that using a certain type of testing (we'll be learning more sophisticated methods in later weeks).
This also means we are now switching focus from the implementation aspect to the testing aspect of SE.
Can explain testing
Testing: Operating a system or component under specified conditions, observing or recording the results, and making an evaluation of some aspect of the system or component. –- source: IEEE
When testing, you execute a set of test cases. A test case specifies how to perform a test. At a minimum, it specifies the input to the software under test (SUT) and the expected behavior.
Example: A minimal test case for testing a browser:
longfile.html
located in the test data
folder.longfile.html
.Test cases can be determined based on the specification, reviewing similar existing systems, or comparing to the past behavior of the SUT.
For each test case you should do the following:
A test case failure is a mismatch between the expected behavior and the actual behavior. A failure indicates a potential defect (or a bug) -- we say 'potential' because the error could be in the test case itself.
Example: In the browser example above, a test case failure is implied if the scrollbar remains disabled after loading longfile.html
. The defect/bug causing that failure could be an uninitialized variable.
Can explain regression testing
When you modify a system, the modification may result in some unintended and undesirable effects on the system. Such an effect is called a regression.
Regression testing is the re-testing of the software to detect regressions. The typical way to detect regressions is retesting all related components, even if they had been tested before.
Regression testing is more effective when it is done frequently, after each small change. However, doing so can be prohibitively expensive if testing is done manually. Hence, regression testing is more practical when it is automated.
Can explain test automation
An automated test case can be run programmatically and the result of the test case (pass or fail) is determined programmatically. Compared to manual testing, automated testing reduces the effort required to run tests repeatedly and increases precision of testing (because manual testing is susceptible to human errors).
Can semi-automate testing of CLIs
A simple way to semi-automate testing of a CLI (Command Line Interface) app is by using input/output re-direction. Here are the high-level steps:
Let's assume you are testing a CLI app called AddressBook
. Here are the detailed steps:
Store the test input in the text file input.txt
.
Example input.txt
Store the output you expect from the SUT in another text file expected.txt
.
Example expected.txt
Run the program as given below, which will redirect the text in input.txt
as the input to AddressBook
and similarly, will redirect the output of AddressBook
to a text file output.txt
. Note that this does not require any changes in AddressBook
code.
java AddressBook < input.txt > output.txt
The way to run a CLI program differs based on the language.
e.g., In Python, assuming the code is in AddressBook.py
file, use the command
python AddressBook.py < input.txt > output.txt
If you are using Windows, use a normal MS-DOS terminal (i.e., cmd.exe
) to run the app, not a PowerShell window.
Next, you compare output.txt
with the expected.txt
. This can be done using a utility such as Windows' FC
(i.e. File Compare) command, Unix's diff
command, or a GUI tool such as WinMerge.
FC output.txt expected.txt
Note that the above technique is only suitable when testing CLI apps, and only if the exact output can be predetermined. If the output varies from one run to the other (e.g. it contains a time stamp), this technique will not work. In those cases, you need more sophisticated ways of automating tests.
Follow up notes for the item(s) above:
Congrats! You've made it to the end of this week's topics. It feels like a lot right now but now that we got an early start, this stuff will be second nature to you by the time you are done with the semester. 😃